Data Sets: XML Data
Follow steps 1 and 2 of the Basic Steps listed
in Making a New Data Set, choosing XML Data Set. The following screen
appears:
- Enter a Data Set Name.
- Enter the record delimiter. The delimiter is the name of the XML tag
that encloses each record, entered without angle brackets or slash. In
your XML file(s) it appears enclosed in angle brackets, typically within
the first three lines. A corresponding closing tag, with a "/" in front
of the name, appears at or near the end of the XML record. For example,
if <DOC> occurs at the beginning of the XML file and </DOC> occurs near
the end, the IN-SPIRE record delimiter is DOC.
- Add XML files. Click Add... to add one or more files, or click
Add From Folder... to add an entire folder of files. Browse to the
folder you want to add, click on it to select it, and click Add. The
files appear in the Source files list. To remove a file from the
Source files list, click on it, then click Remove.
- Associate source files with XSLT files.
- Click on one or more source files that you wish to associate an
XSLT file with. CTRL-A will select all source files in the list.
- Click on the XSLT button. A file open dialog opens in
the IN-SPIRE sources folder.
- Navigate to your XSLT file, if it is not in the sources folder.
- Click Set. The XSLT file name appears next to the source
file with which you have associated it.
- When you're done, click Next> and go to Step 6, or to
accept all of the default settings and begin processing immediately,
click Finish.
If you start processing before you define any fields, all of the fields
in your documents will be lumped together for the purposes of the analysis.
- The Format Fields window opens.
Note that the field delimiters should not include the angle brackets.
For example, for the field <DOCUMENT_ID>, DOCUMENT_ID is the
field delimiter. See Defining Fields
for information on how to define fields.
Make sure that at least one of the XML fields that are marked "include
in computation" actually exists in most of your documents, and
that it contains significant textual content. If it does not, the
IN-SPIRE text engine may not find enough data to create a visualization,
and processing could fail.
- To accept the default settings on the following screens and begin
processing immediately, click Finish and go to Start Processing.
Otherwise, go to Optional Settings.
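Before you import, it can help to confirm outside IN-SPIRE that the record
delimiter and field delimiter you plan to enter actually match your files.
The following is a minimal sketch using Python's standard library, not part
of IN-SPIRE. The sample data, the wrapper element ROOT, the field name TEXT,
and the function name field_coverage are all hypothetical and chosen only
for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two records delimited by <DOC>...</DOC>.
# The wrapper element ROOT and the field name TEXT are illustrative only.
SAMPLE = """<ROOT>
<DOC><DOCUMENT_ID>1</DOCUMENT_ID><TEXT>First record body.</TEXT></DOC>
<DOC><DOCUMENT_ID>2</DOCUMENT_ID></DOC>
</ROOT>"""


def field_coverage(xml_text, record_delimiter, field_delimiter):
    """Return (record_count, fraction of records where the field has text)."""
    root = ET.fromstring(xml_text)
    # iter() also matches the root element itself if it is the record element.
    records = list(root.iter(record_delimiter))
    if not records:
        return 0, 0.0
    with_text = 0
    for record in records:
        field = record.find(f".//{field_delimiter}")
        # Count the field only if it exists and holds non-whitespace text.
        if field is not None and "".join(field.itertext()).strip():
            with_text += 1
    return len(records), with_text / len(records)


count, fraction = field_coverage(SAMPLE, "DOC", "TEXT")
print(f"{count} records; {fraction:.0%} contain non-empty TEXT")
```

A low fraction suggests that the field is a poor choice for "include in
computation". Note that the delimiters are passed without angle brackets,
just as they are entered in IN-SPIRE.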
Start Processing
The Processing dialog opens, informing you that the data set is being
processed. Click OK. The data set appears in the list of data
sets in the Data Set Editor window. You can monitor its status as it
is processed by clicking the Refresh button at the top of the Data Set
Editor window.
Special Considerations for Processing XML Documents
XML/DTD Files
DTD files are not currently supported. If your XML files reference
DTDs, the XML files will still be processed, but the DTD references
and DTD files will be ignored during processing.
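To find out in advance which of your files reference a DTD, a simple text
search for a DOCTYPE declaration is usually enough. The sketch below is an
illustration in Python, not an IN-SPIRE feature. The function name
references_dtd is hypothetical, and the search can report a false positive
if the string <!DOCTYPE appears inside a comment or CDATA section.

```python
import re

# Matches the start of a document type declaration,
# e.g. <!DOCTYPE DOC SYSTEM "doc.dtd">. The keyword is case-sensitive in XML.
DOCTYPE_PATTERN = re.compile(r"<!DOCTYPE\s")


def references_dtd(xml_text):
    """Return True if the text appears to contain a DOCTYPE declaration."""
    return DOCTYPE_PATTERN.search(xml_text) is not None


print(references_dtd('<?xml version="1.0"?>\n<!DOCTYPE DOC SYSTEM "doc.dtd">\n<DOC/>'))
print(references_dtd("<DOC><TITLE>No DTD here</TITLE></DOC>"))
```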
Your XML source files must be well-formed. For the rules which govern
whether XML is well-formed or not, see any XML reference book.
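One way to check a file for well-formedness before importing it is to run
it through any standard XML parser. The sketch below uses the parser in
Python's standard library. It is an assumption that this parser and the
IN-SPIRE text engine agree on every case, so treat the result as a first
check only. The function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# A record with matching tags parses; one with an unclosed tag does not.
print(is_well_formed("<DOC><TITLE>ok</TITLE></DOC>"))
print(is_well_formed("<DOC><TITLE>unclosed</DOC>"))
```

When the check fails, the error message reports the line and column where
the parser stopped, which is usually close to the mismatched or missing tag.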
Unsupported entity markers
The entity markers [ and ] are not supported at
this time.