Stopmajor

IN-SPIRE automatically determines which words in your dataset are best for discriminating one document from another. Most of the time these words, called "major terms", are also the best words for describing the topical content of your documents. The process of determining major terms is not perfect, however, so IN-SPIRE lets you prevent a word from becoming a major term. You can create a customized Stopmajor list for a particular dataset and save it for use in subsequent analyses.

Influencing which Words can be Major Terms

One way to stop some words from being major terms is to add them to the stopword list, but doing so effectively eliminates any reference to them, so they cannot be gisted or queried. The Stopmajor list identifies terms that you want to remain available for gisting and querying, but that IN-SPIRE should not consider when determining which major terms are used to create the document "signatures".

Adding a term to the Stopmajor list prevents it from influencing the clustering of the documents and from appearing as a ThemeView peak label, since it is not part of the document's mathematical signature. However, because the Stopmajor list affects the complex statistics and term relationships that are the foundation of the IN-SPIRE text processing, it should be used sparingly.
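
The difference between a stopword and a stopmajor can be illustrated with a small toy model. This is only a sketch of the concept, not IN-SPIRE's actual text processing: the function name classify_terms, the sample words, and the case-insensitive matching are all assumptions made for illustration.

```python
def classify_terms(tokens, stopwords, stopmajors):
    """Toy model: return (queryable terms, major-term candidates)."""
    # Assumption: matching is case-insensitive. The real rules may differ.
    stopwords = {w.lower() for w in stopwords}
    stopmajors = {w.lower() for w in stopmajors}

    # Stopwords are dropped entirely, so they cannot be gisted or queried.
    queryable = [t for t in tokens if t.lower() not in stopwords]

    # Stopmajors remain queryable but are never major-term candidates.
    candidates = [t for t in queryable if t.lower() not in stopmajors]
    return queryable, candidates


queryable, candidates = classify_terms(
    ["the", "reactor", "report", "coolant"],
    stopwords=["the"],
    stopmajors=["report"],
)
print(queryable)   # ['reactor', 'report', 'coolant']
print(candidates)  # ['reactor', 'coolant']
```

In this toy model, "report" can still be found by a query, but it can never contribute to a document signature or appear as a peak label.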

Like Stopwords and Punctuation Rules, a Stopmajor list is associated with a dataset.

Accessing the Stopmajor List

The Stopmajor list is accessible from the Dataset Wizard, which you open from the Dataset Editor.

  1. From the IN-SPIRE main toolbar window, select File > Datasets. The Dataset Editor window opens.
  2. Click on the name of the dataset of interest to select it, and click Edit, or click New. The Dataset Wizard opens.
    Note:  If a dataset is open, you will be given the option to close it and continue, or cancel.
  3. If you are editing an existing dataset, click Next until you see the Stopmajor list panel. If this is a new dataset, you must select a dataset name and data type to go on to the Stopmajor panel.

If the Dataset Wizard is already open, use the < Back and Next > buttons to find the Stopmajor panel.

Adding or Removing Words from the Stopmajor List

  1. Access the Stopmajor panel.
  2. Review the Stopmajor list. There are no Stopmajor words unless you specify some, so the Current Stopmajors list is empty to begin with.
  3. To add a word, click Add. The Add Stopmajors window opens.
  4. Enter one or more terms, separated by spaces, and click Add. Notice that the terms will appear in alphabetical order in the Current Stopmajors panel of the Stopmajors window. When all the terms you want to identify as stopmajors appear there, click Done on the Add Stopmajors window to close it.
  5. Alphanumeric tokens are words that contain both letters and numbers, for example, "32B". To ensure that no alphanumeric words become major terms, select the checkbox next to Automatically use alphanumeric tokens as stopmajors.
  6. To remove a word from the Stopmajor list, click to select the word you want to remove and then click Delete. You can delete several words at a time. See Selecting multiple items for how to select several items from a list.
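
Two of the steps above can be sketched in code: splitting a space-separated entry into an alphabetized list, and recognizing an alphanumeric token as defined in step 5. This is a minimal illustration only. The function names are hypothetical, and the case-insensitive sort and removal of duplicates are assumptions rather than documented IN-SPIRE behavior.

```python
def parse_stopmajor_entry(text):
    """Split a space-separated entry into a sorted list of unique terms."""
    # Assumption: duplicates are collapsed and sorting ignores case.
    return sorted(set(text.split()), key=lambda term: (term.lower(), term))


def is_alphanumeric_token(token):
    """Return True if the token has at least one letter and one digit."""
    has_letter = any(ch.isalpha() for ch in token)
    has_digit = any(ch.isdigit() for ch in token)
    return has_letter and has_digit


print(parse_stopmajor_entry("report memo appendix memo"))
# ['appendix', 'memo', 'report']
print(is_alphanumeric_token("32B"))     # True
print(is_alphanumeric_token("2024"))    # False
print(is_alphanumeric_token("report"))  # False
```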

Using Words from Another Dataset's Stopmajor List

  1. Access the Stopmajor panel. On the Stopmajor panel, click Load... A file chooser dialog opens to the DatasetRoot\Stopwords\ directory. Stopmajor files that are accessible here are those that have been explicitly saved for use by other datasets. See Saving the Current Stopmajor List for Use by Other Datasets.
  2. Choose a Stopmajor file and click Load. The terms in the file are listed in the panel to the left, with the file name directly above.
  3. Choose which terms you want to add by selecting (highlighting) them.
  4. Click the right arrow button to add the selected terms to the current Stopmajor list.
  5. The selected terms appear in the Current Stopmajors list.

Saving the Current Stopmajor List for Use by Other Datasets

  1. On the upper right of the Stopmajor panel, click Save... A file save dialog opens.
  2. Enter a name for this stopmajor file.
    Note:  The extension ".stop_major" will be automatically added to the filename you enter.
  3. Click Save. The file is saved in the Stopwords directory. For how to use this saved stopmajor file with any dataset, see Using Words from Another Dataset's Stopmajor List (above).
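
The naming convention described above can be sketched as follows. This is an illustration under stated assumptions, not IN-SPIRE code: the function names and the file name "project_terms" are hypothetical, and leaving a name unchanged when it already ends in ".stop_major" is a guess about the behavior. The example runs against a temporary folder rather than a real DatasetRoot.

```python
import tempfile
from pathlib import Path

EXTENSION = ".stop_major"


def saved_stopmajor_path(dataset_root, name):
    """Build the path a saved list would have under DatasetRoot/Stopwords."""
    # Assumption: the extension is not added twice if already present.
    if not name.endswith(EXTENSION):
        name += EXTENSION
    return Path(dataset_root) / "Stopwords" / name


def list_saved_stopmajors(dataset_root):
    """List the saved stopmajor files that a Load dialog would offer."""
    folder = Path(dataset_root) / "Stopwords"
    return sorted(p.name for p in folder.glob("*" + EXTENSION))


# Demonstration in a throwaway folder standing in for DatasetRoot.
with tempfile.TemporaryDirectory() as root:
    target = saved_stopmajor_path(root, "project_terms")
    target.parent.mkdir(parents=True)
    target.write_text("appendix\nmemo\n", encoding="utf-8")
    available = list_saved_stopmajors(root)
    print(target.name)  # project_terms.stop_major
    print(available)    # ['project_terms.stop_major']
```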

Editing the Default Stopmajor File

The default Stopmajor file is used for all new datasets. To modify this file, you can either edit it in a text editor, or use the IN-SPIRE Stopmajor panel, as above, to create the desired Stopmajor file. Using the Stopmajor panel is the preferred method, as you needn't worry about keeping the stopmajor file alphabetized; it's taken care of for you. To use the Stopmajor panel:

  1. Load the default stopmajor file that is distributed with IN-SPIRE (00000000.stop_major) into the Stopmajor panel, as above.
  2. Add all or some of the terms in the file to the Current Stopmajor list.
  3. Add and delete terms as appropriate.
  4. Save the Current Stopmajor list in the INSPIRE\DatasetRoot folder as "00000000.stop_major".
    Caution:  Be careful of the number of zeroes; there are eight of them.

Alternatively, edit the default Stopmajor file in a text editor:

  1. In the main INSPIRE\DatasetRoot directory (by default this is C:\Program Files\INSPIRE\DatasetRoot), find a file named 00000000.stop_major. This file contains the default stopmajor list.
  2. Make a backup copy of 00000000.stop_major.
  3. Edit 00000000.stop_major in a text editor such as Notepad. If you edit the file with MS Word, make sure you save it as text.
    Caution:  The stopmajor file is in alphabetical order and must remain so. Corrupting the stopmajor file will cause problems with the datasets.
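
If you edit the file by hand, a small script can make the backup and confirm that the terms are still in order before you use the file. The sketch below is only an illustration and rests on assumptions not confirmed here: that the file holds one term per line, that it is readable as UTF-8 text, and that alphabetical order ignores case. The function names and the ".bak" suffix are hypothetical. The example runs on a throwaway copy, not on the real default file.

```python
import shutil
import tempfile
from pathlib import Path


def find_out_of_order(terms):
    """Return adjacent pairs of terms that break alphabetical order."""
    # Assumption: ordering is case-insensitive.
    return [(a, b) for a, b in zip(terms, terms[1:]) if a.lower() > b.lower()]


def backup_and_check(path):
    """Copy the file to <name>.bak, then report any out-of-order pairs."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    # Assumption: one term per line; blank lines are ignored.
    lines = path.read_text(encoding="utf-8").splitlines()
    terms = [line.strip() for line in lines if line.strip()]
    return backup, find_out_of_order(terms)


# Demonstration on a temporary file with one deliberate ordering mistake.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "00000000.stop_major"
    demo.write_text("appendix\nreport\nmemo\n", encoding="utf-8")
    backup, problems = backup_and_check(demo)
    backup_existed = backup.exists()
    print(problems)  # [('report', 'memo')]
```

An empty result means every term is in order under these assumptions. Any pair that is reported shows where the file needs to be re-sorted.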