Subsetting Data Sets

You can create a subset data set that contains some, but not all, of the documents in an existing data set. In general, there are two ways to do this: collect the documents you're interested in (for example, in a group) and make a data set from them, or remove from the Galaxy the documents you want to exclude and make a new data set from the remainder.

| Subset | Included documents | Effect of Timeslicer | Outliers |
|---|---|---|---|
| From Selection... | Selected documents, wherever they may be located. | None | Selected outliers will be included. |
| From Galaxy... | Documents visible in the Galaxy only. Move documents you want to exclude to the Outliers panel. | If the Timeslicer is on, the documents in the Galaxy are those in the current slice. | Pending outliers (still part of the Galaxy because the Galaxy has not been recalculated) will be included; processed outliers will not be. |
| From Outliers... | All processed outliers in the Outliers panel, i.e., the Galaxy has been recalculated and the Recalculate button is not active. | If the Timeslicer is on, the outliers that will be included are those processed outliers that are in the currently visible slice. | Processed outliers (now part of Outliers) will be included; pending outliers will not be, since they're still part of the Galaxy. |
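
The inclusion rules in the table can be easier to compare when written out as filters. The following Python sketch is only an illustrative model of those rules, not part of the application and not its API: the `Doc` class, its fields, and the three function names are all hypothetical, and the sketch assumes each document carries a selection flag, a time-slice flag, and an outlier state of "none", "pending", or "processed".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document; not an application object."""
    doc_id: str
    selected: bool = False
    in_current_slice: bool = True  # only consulted when the Timeslicer is on
    outlier: str = "none"          # "none", "pending", or "processed"


def subset_from_selection(docs):
    # Selected documents are included wherever they are located;
    # the Timeslicer has no effect.
    return [d.doc_id for d in docs if d.selected]


def subset_from_galaxy(docs, timeslicer_on=False):
    # Pending outliers still count as part of the Galaxy;
    # processed outliers do not.
    return [
        d.doc_id
        for d in docs
        if d.outlier != "processed"
        and (not timeslicer_on or d.in_current_slice)
    ]


def subset_from_outliers(docs, timeslicer_on=False):
    # Only processed outliers are included; with the Timeslicer on,
    # only those in the currently visible slice.
    return [
        d.doc_id
        for d in docs
        if d.outlier == "processed"
        and (not timeslicer_on or d.in_current_slice)
    ]


docs = [
    Doc("a", selected=True),
    Doc("b", outlier="pending"),
    Doc("c", selected=True, in_current_slice=False, outlier="processed"),
    Doc("d", outlier="processed"),
    Doc("e", in_current_slice=False),
]

print(subset_from_selection(docs))
print(subset_from_galaxy(docs, timeslicer_on=True))
print(subset_from_outliers(docs, timeslicer_on=True))
```

In this model, document "c" is picked up by From Selection... even though it is a processed outlier outside the current slice, while From Galaxy... and From Outliers... both skip it when the Timeslicer is on.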

How to create a subset

  1. With the data set you want to subset open, and considering the information in the table above, choose which type of subset is most convenient for you:
    Are the valuable documents selected, or grouped? Create a subset From Selection... Go to step 2a.
    Are the valuable documents all those currently visible in the Galaxy? Create a subset From Galaxy... Go to step 2b.
    Are the valuable documents all in the Outliers panel? Create a subset From Outliers... Go to step 2c.
  2. a. To create a subset from selection: Select the documents you want to include in the subset. Consider using the Group tool to create a group of these documents. As you discover documents to add, you can add them to the group. Selected documents can be in the Galaxy, the Outliers panel, or both. Choose Subset > From Selection... Go to step 3.

    b. To create a subset from the Galaxy: Select the documents you want to exclude from the subset, and move them to the Outliers panel. See outliers for more information. Click Recalculate to reprocess the data set. Choose Subset > From Galaxy...
    Or, click the create subset from outliers button above the Outliers panel. Go to step 3.

    c. To create a subset from Outliers: Select documents you want to include in the subset and move them to the Outliers panel. See outliers for more information. Click Recalculate to reprocess the data set. Choose Subset > From Outliers... Go to step 3.
  3. The Data Set Wizard window opens.
  4. Edit the name (the default is "<data set name> Subset") and any of the settings (stopwords, stopmajor list, etc.). For further information, see Stopwords, Punctuation Rules, or Stopmajor List. You shouldn't have to edit the Fields.
  5. Click Finish. The subset data set is processed and appears in the list in the Data Set Editor window.
  6. To open the subset data set, choose File > Data Sets... The Data Set Editor window opens. If the status of the subset data set is "Available", click on it to select it, and click Open.