Editing Datasets

There are three ways to modify an existing dataset:

  1. You can change which documents it contains by subsetting or merging.

  2. If the dataset was created by a web or Google harvest, you can change which documents it contains by changing the harvest settings so that the harvest retrieves different documents.

  3. For other types of datasets, you can keep the documents you have and change how they are processed. Suppose your dataset contains the right documents but has major terms that are not meaningful for the analysis, words that should be disregarded, or poorly defined fields. If the dataset is a web or Google harvest, improving the terms of the harvest itself might return a better result. Rather than create a new dataset with the improved settings, you can edit the existing one and reprocess it.

Accessing the Dataset Editor

  1. Open the Dataset Editor.  From the main IN-SPIRE toolbar, choose File > Datasets... The Dataset Editor window opens.

  2. Click on the name of the dataset you want to edit.

  3. Click Edit. The Dataset Wizard window opens.

  4. To reprocess the documents currently in your selected dataset, uncheck the Reharvest Documents box on the first step of the wizard.
    Note: If you uncheck the box, your current dataset will be changed. To preserve a copy of the original dataset, export it to a new file name before you reprocess it.

  5. To run a completely new data harvest, make sure the Reharvest Documents box is checked.
    Note: This does not change your currently selected dataset. Instead, an entirely new data harvest is performed, and you will have a new dataset to work with.

  6. Change any of the settings associated with the dataset, just as you would when you first created it. For details, see Creating New Datasets.

  7. Remember to save your settings for future use by checking the Save These Dataset Settings box on the last step of the Dataset Wizard.