Subsetting Data Sets
You can create a subset data set that contains some, but not all, of the
documents in an existing data set. In general, there are two ways to do this:
collect the documents you're interested in (for example, in a group) and make
a data set from them, or move the documents you want to exclude out of the
Galaxy and make a new data set of the remainder.
Subset | Included documents | Effect of Timeslicer | Outliers |
From Selection... | Selected documents, wherever they may be located. | None | Selected outliers will be included. |
From Galaxy... | Documents visible in the Galaxy only. Move documents you want to exclude to the Outliers panel. | If the Timeslicer is on, the documents in the Galaxy are those in the current slice. | Pending outliers (still part of the Galaxy because the Galaxy has not been recalculated) will be included; processed outliers will not be. |
From Outliers... | All processed outliers in the Outliers panel, i.e., the Galaxy has been recalculated and the Recalculate button is not active. | If the Timeslicer is on, the outliers that will be included are those processed outliers that are in the currently visible slice. | Processed outliers (now part of Outliers) will be included; pending outliers will not be, since they're still part of the Galaxy. |
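The rules in the table above can be restated as a small Python model. This is only an illustrative sketch, not part of the product or any scripting interface it may have: the names Doc, Location, subset, and the example documents are all hypothetical, and the sketch assumes that every selected document counts for From Selection... regardless of where it is located.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    GALAXY = "galaxy"                # ordinary document in the Galaxy
    PENDING_OUTLIER = "pending"      # moved to Outliers, Galaxy not yet recalculated
    PROCESSED_OUTLIER = "processed"  # moved to Outliers, Galaxy recalculated


@dataclass(frozen=True)
class Doc:
    doc_id: str
    location: Location
    selected: bool = False
    in_current_slice: bool = True    # only consulted when the Timeslicer is on


def subset(docs, kind, timeslicer_on=False):
    """Return the ids of the documents a subset of the given kind would contain."""

    def in_slice(doc):
        # With the Timeslicer off, every document counts as visible.
        return doc.in_current_slice or not timeslicer_on

    if kind == "selection":
        # Timeslicer has no effect; selected outliers are included.
        chosen = [d for d in docs if d.selected]
    elif kind == "galaxy":
        # Pending outliers are still part of the Galaxy; processed ones are not.
        chosen = [d for d in docs
                  if d.location is not Location.PROCESSED_OUTLIER and in_slice(d)]
    elif kind == "outliers":
        # Only processed outliers count; pending ones still belong to the Galaxy.
        chosen = [d for d in docs
                  if d.location is Location.PROCESSED_OUTLIER and in_slice(d)]
    else:
        raise ValueError(f"unknown subset kind: {kind!r}")
    return [d.doc_id for d in chosen]


# Hypothetical example data set.
docs = [
    Doc("a", Location.GALAXY, selected=True),
    Doc("b", Location.GALAXY, in_current_slice=False),
    Doc("c", Location.PENDING_OUTLIER),
    Doc("d", Location.PROCESSED_OUTLIER, selected=True, in_current_slice=False),
    Doc("e", Location.PROCESSED_OUTLIER),
]

print(subset(docs, "selection"))
print(subset(docs, "galaxy", timeslicer_on=True))
print(subset(docs, "outliers", timeslicer_on=True))
```

In this model, the pending outlier "c" ends up in the From Galaxy... subset, which is why the steps below ask you to click Recalculate before subsetting.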
How to create a subset
1. With the data set you want to subset open, and considering the information
in the table above, choose the type of subset that is most convenient for
you:
Are the valuable documents selected, or grouped? Create a subset
From Selection... Go to step 2a.
Are the valuable documents all those currently visible in the Galaxy?
Create a subset From Galaxy... Go to step 2b.
Are the valuable documents all in the Outliers panel? Create a subset
From Outliers... Go to step 2c.
2. a. To create a subset from a selection: Select
the documents you want to include in the subset. Consider
using the Group tool to create a group of these documents; as you discover
more documents to include, you can add them to the group. Selected documents
can be in the Galaxy, the Outliers panel, or both. Choose Subset
> From Selection... Go to step 3.
b. To create a subset from the Galaxy: Select
the documents you want to exclude from the subset, and
move them to the Outliers panel. See outliers
for more information. Click Recalculate to reprocess the data
set. Choose Subset > From Galaxy...
Or click the equivalent button
above the Outliers panel. Go to step 3.
c. To create a subset from Outliers: Select
the documents you want to include in the subset, and move them
to the Outliers panel. See outliers for
more information. Click Recalculate to reprocess the data set.
Choose Subset > From Outliers... Go to step 3.
3. The Data Set Wizard window opens.
4. Edit the name (the default is "<data set name> Subset")
and any of the settings (stopwords, stopmajor list, etc.). For further
information, see Stopwords, Punctuation
Rules, or Stopmajor List. You shouldn't
have to edit the Fields.
5. Click Finish. The subset data set is processed and appears
in the list in the Data Set Editor window.
6. To open the subset data set, choose File > Data Sets... The Data
Set Editor window opens. If the status of the subset data set is "Available",
click it to select it, then click Open.
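The choice made in the first step, and the default name offered by the Data Set Wizard, can be summarized in a short Python sketch. The function names and parameters here are hypothetical and exist only to illustrate the procedure; they do not correspond to anything in the product.

```python
def choose_subset_command(selected_or_grouped: bool,
                          all_visible_in_galaxy: bool,
                          all_in_outliers: bool) -> str:
    """Map the three questions from the first step to a Subset menu command."""
    if selected_or_grouped:
        return "Subset > From Selection..."
    if all_visible_in_galaxy:
        return "Subset > From Galaxy..."
    if all_in_outliers:
        return "Subset > From Outliers..."
    # None of the three cases applies yet: select, group, or move documents first.
    raise ValueError("no subset type fits the current arrangement of documents")


def default_subset_name(data_set_name: str) -> str:
    """Default name shown in the wizard: "<data set name> Subset"."""
    return f"{data_set_name} Subset"


print(choose_subset_command(False, True, False))
print(default_subset_name("Patents"))
```

The questions are checked in the same order as they appear in the procedure, so a selection takes precedence when more than one answer is yes.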