IN-SPIRE automatically determines which words from your dataset are
best for discriminating one document from another. These words, called
"major terms", are usually also the best words for describing the topical
content of your documents. The process of determining major terms is
not perfect, however, so IN-SPIRE gives you the ability to prevent a word
from becoming a major term. You can create a customized Stopmajor list
for a particular dataset, and save it for use in subsequent analyses.
Influencing which Words can be Major Terms
One way to stop some words from being major terms is to add them to
the stopword list, but doing so effectively eliminates any reference to
them, and they cannot be gisted or queried for. The Stopmajor list identifies
terms that you want to be available for gisting and query, but do not
want considered when IN-SPIRE is determining what major terms are used
for creating the document "signatures".
Adding a term to the Stopmajor list prevents it from influencing the
clustering of the documents and from appearing as a ThemeView peak label,
since it is not part of the document's mathematical signature. However,
because the Stopmajor list affects the complex statistics and term relationships
that are the foundation of the IN-SPIRE text processing, it should be
used sparingly.
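The difference between the two lists can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not IN-SPIRE's actual processing; the function name classify_terms and the sample words are hypothetical.

```python
# Hypothetical illustration; not IN-SPIRE's actual implementation.
def classify_terms(vocabulary, stopwords, stopmajors):
    """Split a vocabulary into searchable terms and major-term candidates."""
    stopwords = {w.lower() for w in stopwords}
    stopmajors = {w.lower() for w in stopmajors}
    # Stopwords are dropped entirely: they cannot be gisted or queried for.
    searchable = [t for t in vocabulary if t.lower() not in stopwords]
    # Stopmajors stay searchable but are never considered as major terms.
    candidates = [t for t in searchable if t.lower() not in stopmajors]
    return searchable, candidates


searchable, candidates = classify_terms(
    ["the", "report", "reactor", "turbine"],
    stopwords=["the"],
    stopmajors=["report"],
)
print(searchable)  # ['report', 'reactor', 'turbine']
print(candidates)  # ['reactor', 'turbine']
```

Here "report" can still be queried, but it would not contribute to a document signature or appear as a peak label.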
The Stopmajor list is accessible from the Dataset Wizard, which you open from the Dataset Editor.
From the IN-SPIRE main toolbar window select File
> Datasets. The Dataset Editor window opens.
Click on the name of the dataset of interest to select
it, and click Edit, or
click New. The Dataset
Wizard opens. Note: If
a dataset is open, you will be given the option to close it and continue,
or cancel.
If you are editing an existing dataset, click Next
until you see the Stopmajor list panel. If this is a new dataset, you
must select a dataset name and data type to go on to the Stopmajor panel.
If the Dataset Wizard is already open, use the < Back and Next >
buttons to find the Stopmajor panel.
Adding or Removing Words from the Stopmajor List
Access the Stopmajor panel.
Review the Stopmajor list. There are no Stopmajor
words unless you specify some, so the Current
Stopmajors list is empty to begin with.
To add a word, click Add.
The Add Stopmajors window opens.
Enter one or more terms, separated by spaces, and
click Add. Notice that
the terms will appear in alphabetical order in the Current Stopmajors
panel of the Stopmajors window. When all the terms you want to identify
as stopmajors appear there, click Done on the Add Stopmajors
window to close it.
Alphanumeric tokens are words that contain both letters
and numbers, for example, "32B". To ensure that no alphanumeric
words become major terms, click on the checkbox by Automatically
use alphanumeric tokens as stopmajors.
To remove a word from the Stopmajor list, click to
select the word you want to remove and then click Delete.
You can delete several words at a time. See Selecting
multiple items for how to select several items from a list.
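The steps above can be pictured as simple list operations. The following is a minimal sketch, not IN-SPIRE code: the function names are hypothetical, the case-insensitive sort order is an assumption, and the alphanumeric test follows the definition given above (a token with both letters and numbers).

```python
import re


def is_alphanumeric_token(term):
    """True if the term contains at least one letter and at least one digit."""
    return bool(re.search(r"[A-Za-z]", term)) and bool(re.search(r"\d", term))


def add_terms(current, entry):
    """Add the space-separated terms in `entry` and return an alphabetized list.

    Assumption: ordering ignores case, with the raw term as a tie-breaker.
    """
    merged = set(current) | set(entry.split())
    return sorted(merged, key=lambda t: (t.lower(), t))


def remove_terms(current, selected):
    """Return the list without the selected terms, preserving order."""
    selected = set(selected)
    return [t for t in current if t not in selected]


stopmajors = add_terms([], "page copyright article")
print(stopmajors)                             # ['article', 'copyright', 'page']
print(remove_terms(stopmajors, ["page"]))     # ['article', 'copyright']
print(is_alphanumeric_token("32B"))           # True
print(is_alphanumeric_token("2024"))          # False
```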
Using Words from Another Dataset's Stopmajor List
Access the Stopmajor panel. On the Stopmajor panel,
click Load... A file chooser dialog opens to the DatasetRoot\Stopwords\
directory. Stopmajor files that are accessible here are those that have
been explicitly saved for use by other datasets. See Saving
the Current Stopmajor List for Use by Other Datasets.
Choose a Stopmajor file and click Load.
The terms in the file are listed in the panel to the left, with the
file name directly above.
Choose which terms you want to add by selecting (highlighting)
them:
To add all terms in the file to the Current Stopmajor
list, click in the left hand panel, which contains the list of terms you
can add. Press Ctrl-A to select all terms in that list.
To add a single word from the file to the Current
Stopmajor list, click on the word in the list to select it.
To select several terms from the list so that
you can add them all simultaneously, see Selecting
Items from a List. Note: You
may notice a line like "*** webdata.stop_major ***" in the list
of words. Don't be concerned; after all, this word will never be found
in your dataset. Do include it in the words you copy to the Current
Stopmajor list in Step 4.
Click the button that adds the selected terms to
the Current Stopmajor list.
The selected terms appear in the Current Stopmajor
list.
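The effect of loading a file and copying selected terms can be sketched as follows. The on-disk layout of a Stopmajor file is not described here, so this sketch assumes one term per line; the function names are hypothetical. A marker line such as "*** webdata.stop_major ***" is treated like any other term, as the note above describes.

```python
def parse_stopmajor_lines(lines):
    """Return the non-blank lines of a Stopmajor file as a list of terms.

    Assumption: the file holds one term per line.
    """
    return [line.strip() for line in lines if line.strip()]


def copy_selected(current, loaded, selected=None):
    """Copy `selected` terms (default: all loaded terms) into the current list."""
    chosen = loaded if selected is None else [t for t in loaded if t in set(selected)]
    return sorted(set(current) | set(chosen))


# In practice the lines would come from a file, for example:
#     with open(path, encoding="utf-8") as handle:
#         loaded = parse_stopmajor_lines(handle)
loaded = parse_stopmajor_lines(["*** webdata.stop_major ***\n", "click\n", "\n", "home\n"])
print(copy_selected(["page"], loaded, selected=["home"]))  # ['home', 'page']
```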
Saving the Current Stopmajor List for Use by Other Datasets
On the upper right of the Stopmajor panel, click
Save... A file save
dialog opens.
Enter a name for this stopmajor file. Note: The
extension ".stop_major" will be automatically added to the filename
you enter.
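The naming rule in the note can be expressed as a one-line helper. This is a sketch with a hypothetical function name; whether IN-SPIRE avoids doubling an extension that you type yourself is an assumption made here, not documented behavior.

```python
def stopmajor_filename(name, extension=".stop_major"):
    """Return `name` with the Stopmajor extension appended if it is missing."""
    name = name.strip()
    # Assumption: an extension the user already typed is not added twice.
    return name if name.lower().endswith(extension) else name + extension


print(stopmajor_filename("newswire"))             # newswire.stop_major
print(stopmajor_filename("newswire.stop_major"))  # newswire.stop_major
```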
The default Stopmajor file is used for all new datasets. To modify this
file, you can either edit it in a text editor, or use the IN-SPIRE Stopmajor
panel, as above, to create the desired Stopmajor file. Using the Stopmajor
panel is the preferred method, as you needn't worry about keeping the
stopmajor file alphabetized; it's taken care of for you. To use the Stopmajor
panel:
Load the default stopmajor file that is distributed
with IN-SPIRE (00000000.stop_major) into the Stopmajor panel, as above.
Add all or some of the terms in the file to the Current
Stopmajor list.
Add and delete terms as appropriate.
Save the Current Stopmajor list in the INSPIRE\DatasetRoot
folder as "00000000.stop_major". Caution: Be
careful of the number of zeroes; there are eight of them.
Alternatively, edit the default Stopmajor file in a text editor:
In the main INSPIRE\DatasetRoot directory (by default
this is C:\Program Files\INSPIRE\DatasetRoot), find a file named 00000000.stop_major.
This file contains the default stopmajor list.
Make a backup copy of 00000000.stop_major.
Edit 00000000.stop_major in a text editor such as
Notepad. If you edit the file with MS Word, make sure you save it as text. Caution: The
stopmajor file is in alphabetical order and must remain so. Corrupting
the stopmajor file will cause problems with the datasets.
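If you edit the file by hand, a small script can make the backup and confirm that the terms are still in order before IN-SPIRE reads the file. This is a minimal sketch under stated assumptions: one term per line, and ordering by plain string comparison (the exact ordering IN-SPIRE expects is not specified here). The helper names are hypothetical.

```python
import shutil
from pathlib import Path

# Building the name from a count avoids miscounting the eight zeroes.
DEFAULT_NAME = "0" * 8 + ".stop_major"


def backup_file(path):
    """Copy the file next to itself with a '.bak' suffix and return the copy's path."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    return backup


def first_out_of_order(terms):
    """Return the index of the first term that sorts before its predecessor, or None."""
    for i in range(1, len(terms)):
        if terms[i] < terms[i - 1]:
            return i
    return None


print(DEFAULT_NAME)                                        # 00000000.stop_major
print(first_out_of_order(["article", "page", "home"]))     # 2
print(first_out_of_order(["article", "home", "page"]))     # None
```

A result other than None points at the first line to fix; sorting the terms and rewriting the file would be one way to repair it.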