Outliers

Sometimes your Galaxy visualization will contain documents that are so different from the majority of the documents in the dataset that they skew how the dataset as a whole is presented in the Galaxy. These documents are called outliers. Extreme outliers can push all the other documents into a tight pile on the opposite side of the Galaxy window. That makes it difficult to see the relationships among the documents that are usually of most interest.
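This manual does not describe how IN-SPIRE computes Galaxy positions. The following toy sketch shows why a single extreme document can crowd the rest together. It assumes that positions are linearly rescaled so that the full range of one coordinate fits the window width. The function `fit_to_window` is hypothetical and is not part of IN-SPIRE.

```python
def fit_to_window(xs, width=100.0):
    """Linearly rescale coordinates so the minimum maps to 0 and the maximum to width.

    Hypothetical stand-in for fitting a layout into the Galaxy window (assumption).
    """
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0  # avoid division by zero when all points coincide
    return [round((x - lo) / span * width, 1) for x in xs]


typical = [1.0, 2.0, 3.0, 4.0, 5.0]
print(fit_to_window(typical))            # [0.0, 25.0, 50.0, 75.0, 100.0]
print(fit_to_window(typical + [500.0]))  # [0.0, 0.2, 0.4, 0.6, 0.8, 100.0]
```

Without the outlier, the five typical documents spread across the whole width. With the outlier at 500, the same five documents fall within less than 1% of the width on one side.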

Viewing the outlier documents in the Document Viewer may reveal that they are irrelevant to the analysis and should be "set aside" from the main visualization. The Outliers panel is a "holding area" for such documents. Documents placed there do not affect the overall Galaxy visualization, but they still respond to the Query, Gist, and most other IN-SPIRE tools. You can place any selected document in the Outliers panel. Most often you will use it to remove outlier documents from the Galaxy visualization, but you can also use it when creating dataset subsets.

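After a dataset is recalculated with the outliers excluded, the outlier documents remain in the Outliers panel, where they still respond to the Query, Gist, and most other IN-SPIRE tools.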

See also 3 Steps in Creating a Subset Dataset.

Moving Outliers In and Out of the Galaxy

  1. Open the Galaxy for the dataset of interest.
  2. Use the S+ cursor, the Query tool, or another selection method to select the outliers you want to move. See Ways to Select.
  3. Click the downward arrow between the Galaxy and the Outliers panel. The selected document dots are removed from the Galaxy and appear in the Outliers panel. To see the titles of the outlier documents, select Outlier Titles from the Galaxy View menu. The Recalculate button becomes active to remind you that the Galaxy visualization needs to be recalculated. (ThemeView is also recalculated when the Galaxy is recalculated.)
  4. If you change your mind about documents in the Outliers panel and want to move them back into the Galaxy, select them in the Outliers panel and click the arrow button that moves documents back to the Galaxy. The selected outliers are moved back into the Galaxy, and the Galaxy is recalculated automatically.
  5. When you have added all the outliers you want to eliminate to the Outliers panel, click Recalculate. The Galaxy is recalculated without the outliers, and the visualization changes accordingly.
    Note: If you close the dataset (or the IN-SPIRE application) before recalculating, IN-SPIRE warns you that outlier changes are pending. You can then choose either to recalculate or to add the outliers back into the Galaxy before closing.

You can also remove documents from the Galaxy and add them to the Outliers panel from the Document Viewer.

Using the Outliers Panel

To interact with documents in the Outliers panel, use the S+ (Select Document) cursor and the V (View a Document) cursor. These cursors work the same way they do in the Galaxy, so you can select and view documents in the Outliers panel just as you would in the Galaxy. Any query of the dataset also includes the outliers, so you will probably want to examine any unexpected query hits among them.
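The following toy model illustrates the "holding area" behavior described above. Only Galaxy documents feed the layout, but queries search every document, including the outliers. The class, its methods, and the simple word-match query are all hypothetical. They are not IN-SPIRE's API or its query engine.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical model: documents are split between the Galaxy and an Outliers holding area."""

    galaxy: dict = field(default_factory=dict)    # document id -> text
    outliers: dict = field(default_factory=dict)  # document id -> text

    def move_to_outliers(self, doc_ids):
        # Like clicking the downward arrow: documents leave the Galaxy.
        for d in doc_ids:
            self.outliers[d] = self.galaxy.pop(d)

    def move_to_galaxy(self, doc_ids):
        # Like moving selected outliers back into the Galaxy.
        for d in doc_ids:
            self.galaxy[d] = self.outliers.pop(d)

    def layout_input(self):
        # Only Galaxy documents influence the recalculated visualization.
        return sorted(self.galaxy)

    def query(self, word):
        # A query searches every document, outliers included (toy whole-word match).
        everything = {**self.galaxy, **self.outliers}
        return sorted(d for d, text in everything.items() if word in text.lower().split())


ds = Dataset(galaxy={"a": "budget report", "b": "budget memo", "c": "lunch menu"})
ds.move_to_outliers(["c"])
print(ds.layout_input())  # ['a', 'b']
print(ds.query("menu"))   # ['c']
```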

To make all outlier document titles visible, from the Galaxy toolbar View menu, select Outlier Titles. Outlier documents are redrawn in a vertical list, showing the title of each.

Using Outlier Terms

You may notice terms in the Major Terms list that occur frequently in your documents but do not discriminate between them, or that are "red herrings" for the purposes of your analysis. IN-SPIRE lets you identify these terms and exclude them when the visualization is calculated. To define outlier terms, use the following procedure.

  1. Open the Galaxy for the dataset of interest.
  2. From the IN-SPIRE main toolbar Window menu, select Major Terms. The Major Terms window appears.
  3. Scroll through the alphabetical list until you find a term that you want to remove from the major terms list, and click it.
  4. Click Outlier Terms. The term appears in dark red in the Outlier Terms box on the right-hand side of the Outliers panel.
  5. When you have moved all the terms you want to eliminate to the Outlier Terms box, click Recalculate in the Galaxy window. The Galaxy is recalculated without the outlier terms, and the terms in the Outlier Terms box change from dark red to black.
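The following toy sketch shows why excluding a non-discriminating term changes what the visualization "sees". It builds simple term-count vectors and skips any term marked as an outlier term. The function `term_vectors` and its plain word-count weighting are hypothetical. IN-SPIRE's own term selection and weighting are not described here.

```python
from collections import Counter


def term_vectors(docs, outlier_terms=frozenset()):
    """Build toy term-count vectors, leaving out any term marked as an outlier term.

    Hypothetical illustration only; not IN-SPIRE's actual term weighting.
    """
    vocab = sorted({t for doc in docs for t in doc.lower().split()} - set(outlier_terms))
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors


docs = ["report oil prices", "report wheat prices", "report oil supply"]

vocab, vecs = term_vectors(docs)
print(vocab)  # ['oil', 'prices', 'report', 'supply', 'wheat']
print(vecs)   # [[1, 1, 1, 0, 0], [0, 1, 1, 0, 1], [1, 0, 1, 1, 0]]

vocab2, vecs2 = term_vectors(docs, outlier_terms={"report"})
print(vocab2)  # ['oil', 'prices', 'supply', 'wheat']
print(vecs2)   # [[1, 1, 0, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
```

Because "report" appears once in every document, it adds the same value to each vector and does nothing to separate them. Dropping it leaves only the terms that distinguish the documents.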

Using Outlier Shortcuts


The Outlier Shortcuts drop-down menu allows you to: