Lexical Analysis: Comparison

The Comparison tab lets you see if the documents in a particular group are, for example, more negative than those in the data set as a whole, or more negative than documents in a set of other groups. If you have created groups based on time, the Trends view visualizes the changes.

On the Lexical Analysis window, click on the Comparison tab. If you have never set up a comparison for the present dataset, the screen will appear like this:
The lexical distribution window as it appears before a comparison is set up.


Setting Up a Comparison

Steps

  1. On the Lexical Comparison panel, click the Choose Groups... button. The Lexical Distribution Setup window opens.
    Lexical Distribution Setup window
  2. Choose an Axis by selecting one from the Axis dropdown list. If the lexical content of documents has not been measured, this list will be empty. In that case, click Cancel to close the Lexical Distribution Setup window and measure with the selected lexicon.
  3. Choose groups for comparison by selecting any groups listed at left (Shift-click or Control-click to select multiple groups) and clicking the arrow button to move them to the Groups for Comparison list. The list at left always matches the sort order of the Groups sidebar widget.
  4. Choose a Baseline. This is the group or groups against which you want to compare the others. By default the baseline is the documents you chose for comparison. To use All Documents, select its radio button. To use any of the groups listed at left, select the corresponding radio button, then select the groups and click the arrow button to move them into the Baseline list. If you select several groups, statistics are computed over all of the documents in those groups combined and displayed as the Baseline.
  5. Use the up and down buttons to the right of the Comparison Groups list to order the list however you'd like.
  6. Click OK.
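
The pooling of several baseline groups described in step 4 can be sketched outside IN-SPIRE. The following Python is a minimal illustration, not IN-SPIRE's actual implementation. It assumes that each document already has a numeric score on the chosen axis, that the summary statistic is a mean, and that a document belonging to more than one baseline group is counted once. The function name `pooled_baseline`, the group names, and the scores are all hypothetical.

```python
from statistics import mean


def pooled_baseline(groups, scores, baseline_names):
    """Mean axis score over every document in the named baseline groups.

    Assumption: a document that appears in several groups is counted once.
    """
    doc_ids = set()
    for name in baseline_names:
        doc_ids.update(groups[name])
    return mean(scores[doc_id] for doc_id in sorted(doc_ids))


# Made-up data: group name -> document ids, document id -> axis score.
groups = {"A": ["d1", "d2"], "B": ["d2", "d3"], "C": ["d4"]}
scores = {"d1": 0.10, "d2": 0.30, "d3": 0.50, "d4": 0.02}

# Baseline built from groups A and B together (documents d1, d2, d3).
baseline = pooled_baseline(groups, scores, ["A", "B"])
```

Here the baseline is computed from three distinct documents, even though the two groups list four memberships between them.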

Choosing a Comparison Type

The bottom of the Comparison section has three buttons used to choose which type of view you would like to see. Use the Groups view when you have created groups of documents not in a specific time series. Use the Trends view when your groups are organized by time. Use the Advanced view when you want to see the specific numbers used in the calculation. (Please note that prior to IN-SPIRE 5.6, Advanced view was called Distribution.)

Understanding a Comparison

In the comparison shown here, the baseline is "All Documents" and three groups created from searches are being compared with it (and with each other). The dots and scores indicate the strength of the chosen axis, with calculated values greater than zero on the blue side and values less than zero on the tan side. Since the axis in this example is "Negative", we can state that in this dataset, the terms used in the documents about Japan are less negative (more favorable) than those in the documents about Afghanistan, and the terms in the documents about weddings are much less negative.
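
The blue and tan sides can be read as the sign of a score relative to the baseline. The sketch below assumes that the plotted value is the difference between a group's mean score and the baseline's mean score. This page does not spell out the exact calculation, so treat that as an assumption. The names `axis_offset` and `side` and the sample scores are hypothetical.

```python
from statistics import mean


def axis_offset(group_scores, baseline_scores):
    """Assumed calculation: group mean minus baseline mean."""
    return mean(group_scores) - mean(baseline_scores)


def side(offset):
    """Map the sign of the offset to the side of the display described above."""
    if offset > 0:
        return "blue"
    if offset < 0:
        return "tan"
    return "center"


# Made-up scores on the chosen axis.
baseline_scores = [0.2, 0.4, 0.6, 0.0]
group_low = [0.2, 0.0]    # scores below the baseline mean
group_high = [0.6, 0.4]   # scores above the baseline mean

low_side = side(axis_offset(group_low, baseline_scores))
high_side = side(axis_offset(group_high, baseline_scores))
```

With a "Negative" axis, a group that lands on the tan side under this assumption would be less negative than the baseline.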

To see the changes in lexical content in a population over time

Create sub-groups at intervals from the main group and then visualize them using Lexical Distribution.

  1. Create a group of interesting documents.
  2. In the Groups tool, highlight only that group or groups.
  3. Open the Time Tool.
  4. Click on the Groups button.
  5. Display bar graphs rather than line graphs, for ease in selection. Choose View > Bar Graphs.
  6. Adjust the sliding time window in the bottom panel so that the upper panel displays the time period you are interested in.
  7. To select only the documents in a group in a particular time segment, hover the cursor over the bar that represents them. The magenta highlighting shows you that those documents are the potential selection. When the magenta highlighting encompasses the documents you want to select, click. The documents will be selected (that bar will now be green).
    By contrast, if you want to select all documents in a particular time period, regardless of which of the highlighted groups they are in, hover the cursor over the date. Notice that the magenta highlighting is now a vertical rectangle which extends across all groups. Click to select all documents that are highlighted magenta.
  8. Go back to the Groups tool and create a new group from the selected documents (or press Control-G on your keyboard).
  9. Repeat for each time period in which you are interested.
  10. Set up your comparison in Lexical Distribution such that the time sub-groups are ordered chronologically, so that it is easier to see the progression of changes, if there are any.

Another method for creating time-based groups is to use the Time Search method to make time bins of the entire dataset, then use the Overlap Selection button at the bottom of the Groups tool to make new selections that can be saved as new groups.

Once you have time-based groups, use the Trends view to see them on the red/green visualization with connecting lines to help you see the change in the selected axis over time. The groups will be arranged in the order specified in the Choose Groups window.
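
The idea behind time-based groups shown in chronological order can be sketched in Python. This is only an illustration under stated assumptions: each document has a date and a numeric axis score, the bins are calendar months, and each bin is summarized by its mean. The function `monthly_trend` and the sample data are hypothetical and are not part of IN-SPIRE.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_trend(docs):
    """Group (date, score) pairs by month and return mean scores in time order."""
    bins = defaultdict(list)
    for day, score in docs:
        bins[(day.year, day.month)].append(score)
    # Sorting the (year, month) keys puts the bins in chronological order.
    return [(f"{year}-{month:02d}", mean(values))
            for (year, month), values in sorted(bins.items())]


# Made-up documents, deliberately out of order.
docs = [
    (date(2011, 3, 11), 0.6),
    (date(2011, 2, 1), 0.2),
    (date(2011, 3, 20), 0.8),
    (date(2011, 4, 2), 0.4),
]
trend = monthly_trend(docs)
```

Each entry in `trend` corresponds to one time sub-group. Ordering the entries chronologically mirrors the advice to order the groups by time in the Choose Groups window.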

Using Lexical Analysis with Focus

Setting Focus to a specific topic and then measuring with a lexicon is a powerful way to see trends across time. The Trend screenshot above is not very interesting because news stories on the whole don't change their level of negativity month-to-month. Here's the same visualization with the Focus set to a search on Japan* (note that the tsunami hit in March 2011).

Note: If the Time Tool is "Filtering Time"

Lexical Distribution is responsive to the Time Tool. If the Time Tool is filtering time (Time Tool > Gear Menu > Filter Time), then the comparison will reflect the distribution of lexical scores in the active time period (the time period in the upper panel of the Time Tool). These documents will also be those that are visible in the Galaxy.

Advanced View

If you are concerned with the extremes of the distribution of lexical scores (the scores below the 25th percentile and above the 75th percentile), the Advanced view will be more helpful than the mean scores. This view shows how the inner and outer quartiles of lexical scores of each group compare to the baseline group. This might help you identify skew of a group towards lesser or greater amounts of negativity compared to the baseline. You might also identify polarization of negativity in comparison to the baseline, which would be displayed as large red blocks for the upper and lower quartiles.
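
A quartile comparison of this kind can be sketched with Python's standard `statistics.quantiles` function. This is a rough illustration, not the calculation IN-SPIRE performs. It assumes that the quartile cut points of each group are simply subtracted from those of the baseline, and it uses the "inclusive" quantile method as an arbitrary choice. The functions `quartile_shift` and `describe` and the sample scores are hypothetical.

```python
from statistics import quantiles


def quartile_shift(group_scores, baseline_scores):
    """Return (lower, upper): group quartile cut points minus the baseline's."""
    g_q1, _, g_q3 = quantiles(group_scores, n=4, method="inclusive")
    b_q1, _, b_q3 = quantiles(baseline_scores, n=4, method="inclusive")
    return g_q1 - b_q1, g_q3 - b_q3


def describe(shift):
    """Rough label for how a group differs from the baseline."""
    lower, upper = shift
    if lower > 0 and upper > 0:
        return "skewed higher"
    if lower < 0 and upper < 0:
        return "skewed lower"
    if lower < 0 and upper > 0:
        return "polarized"
    return "similar or narrower"


# Made-up scores on an arbitrary scale.
baseline_scores = list(range(11))        # 0, 1, ..., 10
high_group = [6, 7, 8, 9, 10]            # concentrated at the high end
split_group = [0, 0, 1, 9, 10, 10]       # clustered at both extremes

high_label = describe(quartile_shift(high_group, baseline_scores))
split_label = describe(quartile_shift(split_group, baseline_scores))
```

A group whose lower quartile falls below the baseline's while its upper quartile rises above it is the polarized case described above. A group whose quartiles both move in the same direction is the skewed case.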

Exporting Lexical Comparison Data

All of the numbers needed to perform your own detailed calculations of affect in the data can be saved from the Export tool. Choose Table of Data, and make sure the IN-SPIRE Tool is set to Lexical Analysis.

Other Representations of Lexical Content Scores
