About version 2.2

Data Sets: XML Data

Follow steps 1 and 2 of the Basic Steps listed in Making a New Data Set, choosing XML Data Set. The following screen appears:

  1. Enter a Data Set Name.
  2. Enter the record delimiter. The delimiter is the name inside the XML tag that marks the start of each record, entered without the angle brackets or slash. In the XML file, the tag itself appears as a word or words enclosed in angle brackets, typically within the first three lines of your XML file(s). A matching tag with a "/" in front of the name appears near or at the end of the XML record. For example, if <DOC> occurs at the beginning of the XML file and </DOC> occurs near the end, the IN-SPIRE record delimiter is DOC.
  3. Add XML files. Click Add... to add one or more files, or click Add From Folder... to add an entire folder of files: browse to the folder you want to add, click it to select it, and click Add. The files appear in the Source files list. To remove a file from the Source files list, click it, then click Remove.
  4. Associate source files with XSLT files.
    • Select one or more source files to associate with an XSLT file. CTRL-A selects all source files in the list.
    • Click the XSLT button. A file open dialog opens in the IN-SPIRE sources folder.
    • If your XSLT file is not in the sources folder, navigate to it.
    • Click Set. The XSLT file name appears next to the source file with which you have associated it.
  5. When you are done, click Next> and go to step 6. To accept all of the default settings and begin processing immediately, click Finish instead.
    If you start processing before you define any fields, all of the fields in your documents will be combined for the purposes of the analysis.
  6. The Format Fields window opens.

    Note that field delimiters must not include the angle brackets. For example, for the field <DOCUMENT_ID>, the field delimiter is DOCUMENT_ID. See Defining Fields for information on how to define fields.

    Warning: Make sure that at least one of the XML fields marked "include in computation" actually exists in most of your documents and contains significant textual content. If it does not, the IN-SPIRE text engine may not find enough data to create a visualization, and processing could fail.

  7. To accept the default settings on the following screens and begin processing immediately, click Finish and go to Start Processing; otherwise, go to Optional Settings.
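Before clicking Finish, it can help to confirm that your record delimiter and at least one included field actually match your data. The Python sketch below is not part of IN-SPIRE; it uses the standard-library xml.etree.ElementTree module to count the records found for a given delimiter and how many of them contain a given field with some text. The function name field_coverage, the sample data, the 20-character cut-off for "significant textual content", and reading "most of your documents" as at least half are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def field_coverage(xml_text, record_tag, field_tag, min_chars=20):
    """Count records and how many of them contain a non-trivial field.

    record_tag and field_tag are given without angle brackets, the same way
    they are entered in IN-SPIRE (e.g. "DOC", "TEXT"). min_chars is a
    hypothetical cut-off for "significant textual content".
    Returns (number_of_records, records_with_field).
    """
    root = ET.fromstring(xml_text)
    # iter() also visits the root itself, so a file whose root element is
    # the record element counts as one record.
    records = list(root.iter(record_tag))
    with_field = 0
    for record in records:
        field = record.find(".//" + field_tag)  # first matching descendant
        text = "".join(field.itertext()).strip() if field is not None else ""
        if len(text) >= min_chars:
            with_field += 1
    return len(records), with_field


sample = """<DOCS>
<DOC><DOCUMENT_ID>1</DOCUMENT_ID><TEXT>The quick brown fox jumps over the lazy dog.</TEXT></DOC>
<DOC><DOCUMENT_ID>2</DOCUMENT_ID></DOC>
</DOCS>"""

total, covered = field_coverage(sample, "DOC", "TEXT")
print(f"{covered} of {total} records have usable TEXT")
if total == 0:
    print("No records found: check the record delimiter.")
elif covered / total < 0.5:  # assumed meaning of "most"
    print("Warning: TEXT is missing or short in most records.")
```

A result of zero records usually means the record delimiter does not match the tag name in your files (tag names in XML are case-sensitive).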

Start Processing

The Processing dialog opens, informing you that the data set is being processed. Click OK. The data set appears in the list of data sets in the Data Set Editor window. To monitor its status during processing, click the Refresh button at the top of the Data Set Editor window.

Special Considerations for Processing XML Documents

XML/DTD Files

DTD files are not currently supported. If your XML files reference DTDs, the XML files will still be processed, but the DTD references and DTD files will be ignored.
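Because DTD references are ignored, you may want to know which of your files contain them. The following Python sketch (not part of IN-SPIRE) does a rough text scan for DOCTYPE declarations that carry an external SYSTEM or PUBLIC identifier and returns the DTD locations. It is a regular-expression heuristic rather than a full XML parser, and the function name dtd_references is hypothetical.

```python
import re

# Text of a DOCTYPE declaration up to its closing ">" or internal subset "[".
DOCTYPE_RE = re.compile(r"<!DOCTYPE\b[^>\[]*")
# An external identifier keyword, SYSTEM or PUBLIC.
EXTERNAL_RE = re.compile(r"\b(?:SYSTEM|PUBLIC)\b")
# A single- or double-quoted literal.
QUOTED_RE = re.compile(r"\"([^\"]*)\"|'([^']*)'")


def dtd_references(xml_text):
    """Return the DTD locations (system identifiers) referenced by xml_text."""
    refs = []
    for match in DOCTYPE_RE.finditer(xml_text):
        decl = match.group(0)
        if not EXTERNAL_RE.search(decl):
            continue  # internal subset only; no external DTD file
        literals = [dq or sq for dq, sq in QUOTED_RE.findall(decl)]
        if literals:
            # SYSTEM has one literal; PUBLIC has a public ID, then the system ID.
            refs.append(literals[-1])
    return refs


print(dtd_references('<?xml version="1.0"?>\n<!DOCTYPE DOC SYSTEM "doc.dtd">\n<DOC/>'))
```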

Warning: Your XML source files must be well-formed. For the rules that govern whether XML is well-formed, see any XML reference, such as the W3C XML 1.0 specification.
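A quick way to check well-formedness before loading files into IN-SPIRE is to try parsing them with a conforming XML parser. A minimal sketch using Python's standard-library xml.etree.ElementTree follows; the function name check_well_formed and the script usage are hypothetical, and passing this check does not guarantee that IN-SPIRE will process a file successfully.

```python
import sys
import xml.etree.ElementTree as ET


def check_well_formed(data):
    """Return None if data (str or bytes) is well-formed XML, else an error message."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # The message includes the line and column where parsing stopped.
        return f"not well-formed: {err}"
    return None


if __name__ == "__main__":
    # Usage: python check_xml.py file1.xml file2.xml ...
    for path in sys.argv[1:]:
        # Read bytes so the parser honors the file's XML encoding declaration.
        with open(path, "rb") as f:
            problem = check_well_formed(f.read())
        print(f"{path}: {problem or 'well-formed'}")
```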

Unsupported entity markers

The entity markers &lsqb; ("[") and &rsqb; ("]") are not supported at this time.
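If your files contain these entity markers, you can locate them, and optionally replace them, before processing. The sketch below is a plain string scan in Python, not an IN-SPIRE feature. Replacing the markers with the numeric character references &#91; and &#93; (standard XML for "[" and "]") assumes that IN-SPIRE accepts numeric character references, which this page does not state; the function names are hypothetical.

```python
# Unsupported entity markers and the numeric character references
# (decimal code points of "[" and "]") used here as assumed replacements.
REPLACEMENTS = {"&lsqb;": "&#91;", "&rsqb;": "&#93;"}


def find_unsupported_entities(xml_text):
    """Return (line_number, entity) pairs, one per occurrence."""
    hits = []
    for lineno, line in enumerate(xml_text.splitlines(), start=1):
        for entity in REPLACEMENTS:
            hits.extend([(lineno, entity)] * line.count(entity))
    return hits


def replace_unsupported_entities(xml_text):
    """Swap each unsupported marker for its numeric character reference."""
    for entity, replacement in REPLACEMENTS.items():
        xml_text = xml_text.replace(entity, replacement)
    return xml_text


doc = "<DOC>\n<TEXT>see &lsqb;1&rsqb;</TEXT>\n</DOC>"
print(find_unsupported_entities(doc))
print(replace_unsupported_entities(doc))
```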