Galaxy: Outliers
Sometimes a dataset contains documents so dissimilar to the rest that they are plotted on one side of the Galaxy while all the other documents are plotted on the opposite side. This compresses the "interesting" documents and makes it difficult to see the relationships among the very documents you most want to examine. You can place any selected document in the Outliers panel, a "holding area" for documents that do not affect the Galaxy visualization but still respond to Search, Summary, and most other IN-SPIRE tools.
The Outliers panel is a useful triage tool. View documents in the Document Viewer, and then make any irrelevant documents outliers. You then have the option of creating a subset from the documents still in the Galaxy. Conversely, you can make all the most relevant documents outliers, then create a subset from the outliers. See Creating Dataset Subsets.
In each case, to remove the influence of the outliers, the dataset needs to be recalculated. Then, outlier documents will:
- Not be deleted from the dataset.
- Be visible and accessible in the Outliers panel.
- Be available for use in the Correlation, Document Viewer, Groups, Probe, Search, and Time tools when they are selected, just as documents in the Galaxy itself are.
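The behavior described above can be pictured with a small sketch. It is an illustration only, not IN-SPIRE code: the `Dataset` class and its methods are hypothetical names, and the "layout" is reduced to the list of document IDs that would be passed to the visualization.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset; not an IN-SPIRE API."""
    docs: dict                                   # document id -> text
    outliers: set = field(default_factory=set)   # ids held in the Outliers panel
    needs_recalc: bool = False                   # mirrors the active Recalculate button

    def make_outliers(self, ids):
        # Outliers are flagged, never deleted from the dataset.
        self.outliers |= {i for i in ids if i in self.docs}
        self.needs_recalc = True

    def layout_input(self):
        # Only non-outlier documents influence the visualization.
        return sorted(d for d in self.docs if d not in self.outliers)

    def search(self, word):
        # Search covers every document, outliers included.
        w = word.lower()
        return sorted(d for d, text in self.docs.items() if w in text.lower().split())

    def recalculate(self):
        self.needs_recalc = False
        return self.layout_input()


ds = Dataset({"a": "solar power grid", "b": "solar panel cost", "c": "lorem ipsum"})
ds.make_outliers(["c"])
print(ds.search("ipsum"))   # the outlier is still found by a search
print(ds.recalculate())     # the layout is rebuilt without it
```

The point of the sketch is the separation of concerns: membership in the dataset, participation in the layout, and visibility to search are tracked independently.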
Moving Outliers In and Out of the Galaxy
- Open the Galaxy for the dataset of interest.
- Use the S+ cursor, Search tool, or other selection method to select the outliers you want to move. See Ways to Select Documents.
- Click on the down arrow button (between the Galaxy and the Outliers panel) to remove the selected documents from the Galaxy and have them appear in the Outliers panel. To see the titles of the outlier documents, on the Galaxy window, select View > Outlier Titles. The Recalculate button becomes active to remind you that the Galaxy visualization needs to be recalculated. (ThemeView Classic will also be recalculated at the same time.)
- If you change your mind about documents in the Outliers panel and want to move them back to the Galaxy, select them in the Outliers panel and click the button that returns them to the Galaxy. The selected outliers will be moved back to the Galaxy, and the Galaxy will automatically be recalculated.
- When you have added to the Outliers panel all of the outliers you want to eliminate, click the Recalculate button. The Galaxy is recalculated without the outliers. The visualization will change.
If you close the dataset (or the IN-SPIRE application) before recalculating, you will be warned that there are Outlier changes pending and given the choice to recalculate or add the Outliers back into the Galaxy before closing.
You can also remove documents from the Galaxy and add them to the Outliers panel from the Document Viewer.
Using the Outliers Panel
The Galaxy shows similarities and differences among documents. Sometimes a very few documents are so unusual that their differences outweigh the other relationships. The result may be a Galaxy with almost all of the documents plotted into a relatively small part of the screen. To balance the Galaxy:
- In the Galaxy window, select the few documents that are off by themselves, and open the Document Viewer to see why they are so different. For example, there may be almost no text in the documents, or they may be in a different language.
- Click the Outliers down arrow button on the Galaxy window. The irregular documents will be moved to the Outliers panel.
- Click the Recalculate button. The Galaxy will re-cluster and rearrange the remaining documents.
- If you want to keep track of the documents you moved down, open the Groups tool and make a new group out of them while they are still selected.
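To see why a few extreme documents squeeze the rest together, consider a one-dimensional sketch in which positions are rescaled to fit a fixed screen width. This is only an analogy under stated assumptions: IN-SPIRE's actual projection is not described here, and `scale_to_screen` and `spread` are hypothetical helper names.

```python
def scale_to_screen(values, width=100.0):
    """Min-max rescale positions so they span 0..width."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * width for v in values]


def spread(values):
    """Distance between the leftmost and rightmost position."""
    return max(values) - min(values)


typical = [1.0, 2.0, 3.0, 4.0, 5.0]    # the documents of interest
with_outlier = typical + [100.0]       # plus one very dissimilar document

# With the outlier present, the typical documents share a sliver of the screen.
before = scale_to_screen(with_outlier)[:len(typical)]
# After removing it and rescaling, they use the full width.
after = scale_to_screen(typical)

print(round(spread(before), 2), spread(after))
```

In this toy case the five typical documents occupy about 4 percent of the width while the outlier is present and the full width once it is removed, which mirrors the effect of moving documents to the Outliers panel and recalculating.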
Documents in the Outliers panel can be selected and viewed the same way as documents in the Galaxy, by using the selection cursor and View Selected. Any search of the dataset will also include outliers. You will probably want to examine unexpected search hits among the outliers.
To make all outlier document titles visible, from the Galaxy toolbar select View > Outlier Titles. Outlier documents are redrawn in a vertical list, each with its title.
Using Outlier Terms
You may notice some terms on the major terms list that occur frequently in your documents but which do not discriminate between them, or which are "red herrings" for the purposes of your analysis. If you can identify those terms, you can stop them from having an influence on how the visualization is calculated.
- Open the Galaxy.
- From the IN-SPIRE main menu, select Tools > Major Terms. The Major Terms window displays with all of the major terms in alphabetical order. These are the terms whose relative strengths determine the positioning of documents in the Galaxy and which determine document similarity for any Search by Example.

- Click on any term in the list whose influence you want to remove, and then click Outlier Terms. The term will appear in dark red in the Outlier Terms box at the right-hand side of the Outliers panel. Repeat until you have moved all of the terms you want to eliminate to the Outlier Terms box.
- On the Galaxy window, click the Recalculate button. The Galaxy is recalculated without the Outlier Terms, and the terms in the Outlier Terms box are now black rather than dark red.
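The effect of an outlier term on document similarity can be sketched as follows. The source does not describe how IN-SPIRE weights terms, so raw term counts and cosine similarity are assumptions made for illustration, and `term_vector` and `cosine` are hypothetical helpers.

```python
import math
from collections import Counter


def term_vector(text, major_terms, outlier_terms=frozenset()):
    """Count only major terms that have not been marked as outlier terms."""
    active = set(major_terms) - set(outlier_terms)
    return Counter(w for w in text.lower().split() if w in active)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


major = {"report", "solar", "wind"}
doc1 = "report report report solar"
doc2 = "report report report wind"

# "report" is frequent in both documents but does not discriminate between them.
with_term = cosine(term_vector(doc1, major), term_vector(doc2, major))
without_term = cosine(term_vector(doc1, major, {"report"}),
                      term_vector(doc2, major, {"report"}))
print(round(with_term, 2), without_term)
```

With "report" included, the two documents look nearly identical; once it is treated as an outlier term, only the discriminating terms remain and the documents separate.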
Using Outlier Shortcuts
Use the Outlier Shortcuts drop-down menu as follows:
- To temporarily move all documents except the selected ones to the Outliers panel, so that the Galaxy visualization shows only the selected documents, select Only Selected Documents.
- To return to the original visualization with all of the documents in the Galaxy, select All Documents.
- To see only grouped and selected documents in the Galaxy, select Only Colored Documents.
- To save the present view so that you can revisit it, select Save As, enter a name for the view, and click OK. The name you entered appears on the Shortcuts drop-down menu. The name of the view currently chosen from the Shortcuts drop-down menu appears just to the right of the menu, under the Galaxy.
- To load, rename, or delete shortcuts, select Manage Shortcuts. The Outlier Shortcuts window will display. Clicking Load causes the selected view to be loaded into the Galaxy.