Editing Datasets

There are three ways to modify an existing dataset:

  1. You can change which documents it contains by subsetting or merging.

  2. You can change which documents it contains by changing the harvest settings of a web or Google harvest, so that the harvest retrieves different documents.

  3. For any type of dataset, you can keep the documents you have and change how they are processed. Suppose your dataset contains the right documents but has major terms that are not meaningful for the analysis, words that should be disregarded, or poorly defined fields. If it is a web harvest or Google harvest, the harvest terms themselves might also be improved to return better results. Rather than creating a new dataset with the improved settings, you can edit the existing one and reprocess it.

Accessing the Dataset Editor

  1. Open the Dataset Editor. From the main IN-SPIRE toolbar, choose File > Datasets... The Dataset Editor window appears.
    Dataset Editor Window

  2. Click on the name of the dataset you want to edit.

  3. Click Edit.
    If the selected dataset is open, you will be asked whether you want to close and edit it.

  4. Click OK to continue, and the Dataset Wizard window will appear.
    Editing a dataset, step 1

  5. If the dataset is a Web Harvest or Google Harvest, you can either reprocess the previously harvested documents with new settings or run a new harvest. To reprocess the documents currently in the selected dataset, clear the Reharvest Documents check box. To preserve a copy of the existing dataset before running a new harvest, click Cancel to stop editing, then export the dataset under a new file name.

    If the dataset you are editing is not a harvest, you will have the option of adjusting the file list by adding or removing files.

  6. Change any of the settings associated with the dataset, just as you could when you first created it. For details, see Creating New Datasets.

  7. Save your settings for future use by clicking Save These Dataset Settings in the last step of the Dataset Wizard.