Galaxy: Outliers

Sometimes a Galaxy contains documents so dissimilar to the rest of the dataset that they are plotted on one side of the Galaxy while all the other documents are plotted on the opposite side. This compresses the "interesting" documents and makes it difficult to see the relationships among the very documents you most want to examine. You can place any selected document in the Outliers panel, a "holding area" where documents no longer affect the Galaxy visualization but still respond to Search, Summary, and most other IN-SPIRE tools.
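The compression effect can be seen in a toy model. The sketch below is a hypothetical illustration, not IN-SPIRE's actual layout algorithm: it assumes documents have already been reduced to one-dimensional positions and that the display maps the full range of positions linearly onto a fixed-width canvas. The names `scale_to_canvas` and `spread` are invented for this example.

```python
def scale_to_canvas(xs, width=100.0):
    """Map 1-D positions linearly onto [0, width] (min-max scaling)."""
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0  # avoid division by zero when all points coincide
    return [(x - lo) / span * width for x in xs]


def spread(xs):
    """Width of the canvas region occupied by a set of plotted points."""
    return max(xs) - min(xs)


# Hypothetical positions: five related documents and one very dissimilar one.
related = [1.0, 2.0, 3.0, 4.0, 5.0]
outlier = [100.0]

with_outlier = scale_to_canvas(related + outlier)
without_outlier = scale_to_canvas(related)

print(round(spread(with_outlier[:5]), 2))  # 4.04: related documents squeezed into a sliver
print(round(spread(without_outlier), 2))   # 100.0: related documents fill the canvas
```

With the single distant document present, the five related documents occupy about 4% of the canvas; once it is set aside and the positions are rescaled, they spread across the whole width, which is the same motivation behind moving documents to the Outliers panel and recalculating.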

The Outliers panel is also a useful triage tool. View documents in the Document Viewer, and then make any irrelevant documents outliers. You can then create a subset from the documents that remain in the Galaxy. Conversely, you can make all of the most relevant documents outliers and then create a subset from the outliers. See Creating Dataset Subsets.

In either case, the dataset must be recalculated to remove the influence of the outliers. Then, outlier documents will:

Figure: The three steps in creating a subset dataset

Moving Outliers In and Out of the Galaxy

  1. Open the Galaxy for the dataset of interest.
  2. Use the S+ cursor, Search tool, or other selection method to select the outliers you want to move. See Ways to Select Documents.
  3. Click the down-arrow button (between the Galaxy and the Outliers panel) to remove the selected documents from the Galaxy and place them in the Outliers panel. To see the titles of the outlier documents, select View > Outlier Titles on the Galaxy window. The Recalculate button becomes active to remind you that the Galaxy visualization needs to be recalculated. (ThemeView Classic is recalculated at the same time.)
  4. If you change your mind about documents in the Outliers panel and want to move them back into the Galaxy, select them in the Outliers panel and click the corresponding button to return them to the Galaxy. The selected outliers are moved back to the Galaxy, and the Galaxy is automatically recalculated.
  5. When you have added all of the outliers you want to eliminate to the Outliers panel, click the Recalculate button. The Galaxy is recalculated without the outliers, and the visualization changes accordingly.
    If you close the dataset (or the IN-SPIRE application) before recalculating, you will be warned that there are Outlier changes pending and given the choice to recalculate or add the Outliers back into the Galaxy before closing.

You can also remove documents from the Galaxy and add them to the Outliers panel from the Document Viewer.

Using the Outliers Panel

The Galaxy shows similarities and differences among documents. Sometimes a few documents are so different from the rest that their differences outweigh all the other relationships. The result may be a Galaxy with nearly all of the documents crowded into a small part of the screen. To balance the Galaxy:

  1. In the Galaxy window, select the few documents that sit apart from the rest, and open the Document Viewer to see why they are so different. For example, the documents may contain almost no text, or they may be in a different language.

  2. Click the Outliers down-arrow button on the Galaxy window. The irregular documents are moved to the Outliers panel.

  3. Click the Recalculate button. The Galaxy will re-cluster and rearrange the remaining documents.

  4. If you want to keep track of the documents you moved down, open the Groups tool and make a new group out of them while they are still selected.

Documents in the Outliers panel can be selected and viewed in the same way as documents in the Galaxy, using the selection cursor and the View Selected button. Any search of the dataset also includes the outliers, so you will probably want to examine any unexpected search hits among them.

To make all outlier document titles visible, from the Galaxy toolbar select View > Outlier Titles. Outlier documents are redrawn in a vertical list, each with its title.

Using Outlier Terms

You may notice terms on the Major Terms list that occur frequently in your documents but do not discriminate between them, or that are "red herrings" for the purposes of your analysis. Once you identify such terms, you can prevent them from influencing how the visualization is calculated.

  1. Open the Galaxy.
  2. From the IN-SPIRE main menu, select Tools > Major Terms. The Major Terms window opens, listing all of the major terms in alphabetical order. These are the terms whose relative strengths determine the positions of documents in the Galaxy and the document similarity used by any Search by Example.
  3. Click any term in the list whose influence you want to remove, and then click Outlier Terms. The term appears in dark red in the Outlier Terms box on the right-hand side of the Outliers panel. Repeat until you have moved all of the terms you want to eliminate to the Outlier Terms box.
  4. On the Galaxy window, click the Recalculate button. The Galaxy is recalculated without the Outlier Terms, and the terms in the Outlier Terms box are now black rather than dark red.
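Why removing a frequent, non-discriminating term matters can be sketched with a simple similarity measure. This is a hypothetical illustration only: it assumes documents are represented as raw term-count dictionaries compared by cosine similarity, which is not necessarily how IN-SPIRE weights or compares terms. The function name `cosine` and its `excluded` parameter are invented for this example and stand in for the Outlier Terms list.

```python
import math


def cosine(a, b, excluded=frozenset()):
    """Cosine similarity of two term-count dicts, ignoring terms in `excluded`."""
    terms = (set(a) | set(b)) - set(excluded)
    dot = sum(a.get(t, 0) * b.get(t, 0) for t in terms)
    norm_a = math.sqrt(sum(a.get(t, 0) ** 2 for t in terms))
    norm_b = math.sqrt(sum(b.get(t, 0) ** 2 for t in terms))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document with no remaining terms is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical term counts: "report" appears heavily in both documents but
# says nothing about their topics, so it acts as a "red herring" term.
doc_weather = {"report": 10, "storm": 3, "rainfall": 2}
doc_finance = {"report": 10, "earnings": 3, "stock": 2}

print(round(cosine(doc_weather, doc_finance), 3))                       # 0.885
print(round(cosine(doc_weather, doc_finance, excluded={"report"}), 3))  # 0.0
```

With the shared term included, two unrelated documents look highly similar; excluding it leaves only their topical terms, which have nothing in common. Recalculating after moving a term to Outlier Terms serves the same purpose of keeping such terms from driving document placement.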

Using Outlier Shortcuts


The Outlier Shortcuts drop-down menu allows you to: