Settings: Stopmajors

IN-SPIRE automatically determines which words from your data set are best for discriminating one document from another. Most of the time these are also the best words for describing the topical content of your documents, or "major terms". The process of determining major terms is not perfect, however, so IN-SPIRE gives you the ability to prevent a word from becoming a major term. You can create a customized Stopmajor list for a particular data set, and save it for use in subsequent analyses.

Influencing what words can be major terms

One way to stop some words from being major terms is to add them to the stopword list, but doing so effectively eliminates any reference to them, and they cannot be gisted or queried for. The Stopmajor list identifies terms that you want to be available for gisting and query, but do not want considered when IN-SPIRE is determining what major terms are used for creating the document "signatures".

Adding a term to the Stopmajor list prevents it from influencing the clustering of the documents and from appearing as a ThemeView™ peak label, since it's not part of the document's mathematical signature. However, because the Stopmajor list affects the complex statistics and term relationships that are the foundation of the IN-SPIRE text processing, it should be used sparingly.

Like Stopwords and Punctuation Rules, a Stopmajor list is associated with a data set.
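
The difference between the stopword list and the Stopmajor list described above can be illustrated with a toy sketch. This is not IN-SPIRE's actual text processing, which relies on statistics not covered here; the function name `split_vocabulary` and the case-insensitive matching are assumptions made purely for illustration.

```python
def split_vocabulary(tokens, stopwords, stopmajors):
    """Toy model: return (searchable terms, major-term candidates).

    Hypothetical helper for illustration only; IN-SPIRE's real
    processing is statistical and is not reproduced here.
    """
    stopwords = {w.lower() for w in stopwords}
    stopmajors = {w.lower() for w in stopmajors}
    # Stopwords are dropped entirely: they cannot be gisted or queried.
    searchable = {t.lower() for t in tokens} - stopwords
    # Stopmajors stay searchable but may not become major terms.
    major_candidates = searchable - stopmajors
    return searchable, major_candidates


tokens = "the report describes the reactor and the report cites reactor safety".split()
searchable, candidates = split_vocabulary(
    tokens, stopwords=["the", "and"], stopmajors=["report"]
)
print(sorted(searchable))  # ['cites', 'describes', 'reactor', 'report', 'safety']
print(sorted(candidates))  # ['cites', 'describes', 'reactor', 'safety']
```

In this sketch "report" can still be found by a query, but it is excluded from the pool of words that could contribute to a document signature.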

Accessing the Stopmajor list

The Stopmajor list is accessible from the Data Set Editor's Data Set Wizard, which will be open when you create a new data set. If it is not:

  1. Choose File > Data Sets... The Data Set Editor window opens.
  2. Click on the name of the data set of interest to select it, and click Edit, or click New. The Data Set Wizard opens.
    If a data set is open, you will be given the option to close it and continue, or cancel.
  3. If you are editing an existing data set, click Next until you see the Stopmajor list panel (for an ASCII data set, this is Step 4 of 6). If this is a new data set, you must select a data set name and data type to go on to the Stopmajor panel.

If the Data Set Wizard is already open, use the < Back and Next > buttons to find the Stopmajor panel (for an ASCII data set, this is Step 4 of 6).

Adding or removing words from the Stopmajor list

  1. Access the Stopmajor panel.
  2. Review the Stopmajor list. There are no Stopmajor words unless you specify some, so the Current Stopmajors list is empty to begin with.
    [Figure: the Stopmajors panel]
  3. To add a word, click Add... The Add Stopmajors window opens.
  4. Enter one or more terms, separated by spaces, and click Add. Notice that the terms will appear in alphabetical order in the Current Stopmajors panel of the Stopmajors window. When all the terms you want to identify as stopmajors appear there, click Done on the Add Stopmajors window to close it.
  5. Alphanumeric tokens are words that contain both letters and numbers, for example, "32B". To ensure that no alphanumeric words become major terms, select the checkbox next to Automatically use alphanumeric tokens as stopmajors.
  6. To remove a word from the Stopmajor list, click to select the word you want to remove and then click Delete. You can delete several words at a time. See Selecting multiple items for how to select several items from a list.
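
The steps above can be sketched in a few lines of Python. The sketch takes the definition of an alphanumeric token from step 5 (a word containing both letters and numbers) and mimics the Add Stopmajors behavior of accepting space-separated terms and listing them alphabetically. The function names are hypothetical, and how IN-SPIRE itself treats case or punctuation inside a token is not stated in this help page, so the choices below are assumptions.

```python
def is_alphanumeric_token(word):
    """True if the word contains at least one letter and at least one digit."""
    has_letter = any(ch.isalpha() for ch in word)
    has_digit = any(ch.isdigit() for ch in word)
    return has_letter and has_digit


def add_stopmajors(current, entry):
    """Add the space-separated terms in `entry` and return an alphabetized list.

    Assumption: ordering ignores case; ties fall back to plain string order.
    """
    merged = set(current) | set(entry.split())
    return sorted(merged, key=lambda w: (w.lower(), w))


print(is_alphanumeric_token("32B"))      # True
print(is_alphanumeric_token("reactor"))  # False
print(add_stopmajors(["report"], "memo 32B appendix"))
# ['32B', 'appendix', 'memo', 'report']
```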

Using words from another data set's Stopmajor list

  1. Access the Stopmajor panel. On the Stopmajor panel, click Load... A file chooser dialog opens to the DatasetRoot\Stopwords\ directory. Stopmajor files that are accessible here are those that have been explicitly saved for use by other data sets. See To save the Current Stopmajor list for use by other data sets.
    [Figure: the Stopmajors panel after loading a stopmajors file]
  2. Choose a Stopmajor file and click Load. The terms in the file are listed in the panel to the left, with the file name directly above.
  3. Choose which terms you want to add by selecting (highlighting) them:

    To add all terms in the file to your new Stopmajor list:
    Click in the left-hand panel, which contains the list of terms you can add.
    Press CTRL-A to select all terms in that list.

    To add a single word from the file to your new Stopmajor list:
    Click on the word in the list to select it.

    To select several terms from the list so that you can add them all simultaneously, see Adding several items from a list.
    You may notice a line like "*** webdata.stop_major ***" in the list of words. Don't be concerned; after all, this word will never be found in your data set. Do include it in the words you copy to the Current Stopmajor list in Step 4.
  4. Click the right arrow button to add the selected terms to the Current Stopmajors list.
  5. The selected terms appear in the Current Stopmajors list.

To save the Current Stopmajor list for use by other data sets

  1. On the upper right of the Stopmajor panel, click Save... A file save dialog opens.
  2. Enter a name for this stopmajor file.
    Note: The extension ".stop_major" will be automatically added to the filename you enter.
  3. Click Save. The file is saved in the Stopwords directory. For how to use this saved stopmajor file with any data set, see Using words from another data set's Stopmajor list (above).

Editing the default Stopmajor file

The default Stopmajor file is used for all new data sets. To modify this file, you can either edit it in a text editor, or use the IN-SPIRE Stopmajor panel, as above, to create the desired Stopmajor file. Using the Stopmajor panel is the preferred method, as you needn't worry about keeping the stopmajor file alphabetized; it's taken care of for you. To use the Stopmajor panel:

  1. Load the default stopmajor file that is distributed with IN-SPIRE (00000000.stop_major) into the Stopmajor panel, as above.
  2. Add all or some of the terms in the file to the Current Stopmajor list.
  3. Add and delete terms as appropriate.
  4. Save the Current Stopmajor list in the INSPIRE\DatasetRoot folder as "00000000.stop_major".
    Be careful of the number of zeroes; there are eight of them.

Alternatively, edit the default Stopmajor file in a text editor:

  1. In the main INSPIRE\DatasetRoot directory (by default this is C:\Program Files\INSPIRE\DatasetRoot), find a file named 00000000.stop_major. This file contains the default stopmajor list.
  2. Make a backup copy of 00000000.stop_major.
  3. Edit 00000000.stop_major in a text editor such as Notepad. If you edit the file with MS Word, make sure you save it as text.
    Warning: The stopmajor file is in alphabetical order and must remain so. Corrupting the stopmajor file will cause problems with the data sets.
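
If you edit the file by hand, a small script can make the backup and confirm that the terms are still in alphabetical order before IN-SPIRE reads the file. The following is a minimal sketch under several assumptions that this help page does not confirm: that the file is plain text with one term per line, that it can be read as UTF-8, and that "alphabetical" means a case-insensitive string comparison. The function names are hypothetical, and the `key` parameter lets you swap in a different ordering if yours differs.

```python
import shutil
from pathlib import Path


def backup(path):
    """Copy the file next to itself with an added .bak suffix; return the copy's path."""
    path = Path(path)
    copy = path.with_name(path.name + ".bak")
    shutil.copy2(path, copy)
    return copy


def first_out_of_order(terms, key=str.lower):
    """Return the 1-based position of the first term that sorts before
    its predecessor, or None if the terms are in order."""
    for i in range(1, len(terms)):
        if key(terms[i]) < key(terms[i - 1]):
            return i + 1
    return None


def check_file(path, key=str.lower):
    """Check a stopmajor file, assuming one term per line (blank lines ignored)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    terms = [line.strip() for line in lines if line.strip()]
    return first_out_of_order(terms, key)
```

Call `backup(...)` on 00000000.stop_major before editing and `check_file(...)` afterward; a result of `None` means no term was found out of order under the assumed ordering.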