Creating New Data Sets

Accessing the Data Set Editor

If IN-SPIRE is running, choose File > Data Sets... from the main IN-SPIRE menu bar. Alternatively, on startup, the Data Set Editor window opens after the splash screen has appeared.

Basic Steps

1a. To create a new data set from two existing data sets, see Merging Two Existing Data Sets.

1b. To create a new data set from source files, on the Data Set Editor window, click New. The Data Set Wizard window opens.

2. Notice that there are two tabbed panels on this screen: Data Sets and Saved Settings. The Data Sets tab is uppermost. Choose an item from the list. Each item describes either the form of the data that you already have (ASCII data set or XML data set) or the Internet search that will retrieve the data you want to use for this data set (e.g., FBIS Portal Harvest, Google Harvest, and Web Harvest). Then click Next and follow the appropriate link below.

Settings for ASCII Data Set

Settings for XML Data Set

Settings for Google Harvest

Settings for Web Harvest

3. To use a saved processing configuration, click the Saved Settings tab and choose one of the saved settings listed in the window. You can also save the settings from this data set (see step 9, below).

4. To accept the default processing options and start processing, go to step 10. To set your own processing options, click Next and go to step 5.

If you start processing at this point, without defining any fields, all the fields in your data will be lumped together for the purposes of the analysis. If this is not what you want, click Next and go to step 5.
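To illustrate what "lumped together" means here, the following minimal Python sketch contrasts analyzing every field of a record with analyzing only chosen fields. It is not IN-SPIRE code: the record layout, the field names (title, author, body), and the function text_for_analysis are hypothetical and serve only to show the idea.

```python
# Hypothetical record with delimited sections; the field names are illustrative only.
record = {
    "title": "Quarterly report",
    "author": "J. Smith",
    "body": "Sales rose in the third quarter.",
}


def text_for_analysis(record, fields=None):
    """Return the text that would be analyzed.

    With fields=None, every field is concatenated (the "lumped together" case).
    Otherwise only the named fields contribute, in the order given.
    """
    names = list(record.keys()) if fields is None else fields
    return " ".join(record[name] for name in names if name in record)


# No fields defined: all text is treated as one block.
lumped = text_for_analysis(record)

# Fields defined: only the title and body are analyzed; the author is left out.
selected = text_for_analysis(record, fields=["title", "body"])

print(lumped)
print(selected)
```

In the lumped case the author name becomes part of the analyzed text, which may or may not be what you want. Defining fields in step 5 is how you make that choice.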

Optional Settings

5. Fields. By default, IN-SPIRE uses all the text in a document for the analysis. If you have structured text documents with delimited sections, IN-SPIRE can be configured to analyze some document sections and not others. To define a field, click Add. The Field Properties window opens. See Fields for more information. There are special considerations when Using XML Data.

6. Stopwords: These are words that the text engine ignores during text processing; they are effectively eliminated from the data set. This means they will not appear in any of the tools such as Gist, and they will not be found when you query the data set. To edit the stopword list, click Next until the Stopword List window opens. To delete a word from this list, click on it to select it, then click Delete. To edit a word, select it, then click Edit. To add words to the list, click Add. For more information, see Stopwords.

7. Stopmajor list: These are terms that you want to remain available for gisting and query but do not want considered as major terms when determining document relatedness. For more information, see Stopmajor Words.

8. Punctuation Rules. You can specify how punctuation characters in the text are treated by the text engine. Click Next until the Punctuation window opens. Click on the punctuation character of interest, then click Edit. For more information, see Punctuation Rules.

9. Save settings check box: This feature allows you to save your processing settings so they can be used in subsequent visualizations. They will be available through the Saved Settings tab in the Data Set Editor. The saved settings include the document collection, formats, stopwords, and stopmajor settings. Select the Save These Data Set Settings... check box and edit the name of the setting, if you wish.
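The difference between stopwords, stopmajor terms, and punctuation rules can be sketched in a few lines of Python. This is a conceptual illustration only, not how the IN-SPIRE text engine is implemented. The word lists, the punctuation rule (replace the character with a space), and the function names are all assumptions made up for the example.

```python
# All word lists and rules below are made-up examples, not IN-SPIRE defaults.
STOPWORDS = {"the", "of", "and", "a"}   # ignored entirely
STOPMAJOR = {"report"}                  # searchable, but never a major term
PUNCT_TO_SPACE = ".,;:!?"               # hypothetical rule: treat as a space


def tokenize(text):
    """Lowercase the text, apply the punctuation rule, and split into words."""
    table = str.maketrans({ch: " " for ch in PUNCT_TO_SPACE})
    return text.lower().translate(table).split()


def process(text):
    """Return (searchable terms, candidate major terms) for one document."""
    tokens = tokenize(text)
    # Stopwords are dropped: they cannot be queried and do not appear in tools.
    searchable = [t for t in tokens if t not in STOPWORDS]
    # Stopmajor terms stay searchable but are excluded from major-term candidates.
    major_candidates = [t for t in searchable if t not in STOPMAJOR]
    return searchable, major_candidates


searchable, major_candidates = process("The report of the committee, and a summary.")
print(searchable)         # ['report', 'committee', 'summary']
print(major_candidates)   # ['committee', 'summary']
```

In this sketch, "report" can still be found by a query because it remains in the searchable list, but it would not influence how documents are judged to be related.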

Start Processing

10. Click Finish to start processing. The Processing dialog opens, informing you that the data set is being processed. Click OK. The data set appears in the list of data sets in the Data Set Editor window. You can monitor its status as it is processed by clicking the Refresh button at the top of the Data Set Editor window.