U.S. Department of Energy
IN-SPIRE™ Visual Document Analysis

FAQ: What do I have to tell it about the format of my documents?

The only thing IN-SPIRE™ must know about your document collection is how to identify the beginning of each document. For example, if you had 1000 news articles, each stored in a file on disk, you would point IN-SPIRE™ to those files and specify the string of characters that occurs at the beginning of each document. If your documents contain structured fields such as titles or dates, you may identify those as well, so that during analysis they can be queried separately from the rest of the document content. For example, if a field were defined for “COUNTRY:”, IN-SPIRE™ could automatically categorize documents into separate bins, one for each unique country represented in the dataset.
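To make the idea concrete, here is a minimal Python sketch of what "a start-of-document string plus optional field tags" amounts to. It is not IN-SPIRE™ code or its configuration format; IN-SPIRE™ is set up through its own interface. The marker `<DOC>`, the `FIELD: value` line layout, and the function names `split_documents`, `extract_field`, and `bin_by_field` are all hypothetical, chosen only for illustration. The sketch splits raw text wherever the start marker appears, reads a tagged field from each document, and groups document indices by each unique field value.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def split_documents(text: str, start_marker: str) -> List[str]:
    """Split raw text into documents that each begin with start_marker.

    Hypothetical helper. Assumes the marker never appears inside document
    content; any text before the first marker is discarded.
    """
    docs = []
    pos = text.find(start_marker)
    while pos != -1:
        # The next document starts at the next occurrence of the marker.
        nxt = text.find(start_marker, pos + len(start_marker))
        end = nxt if nxt != -1 else len(text)
        docs.append(text[pos:end])
        pos = nxt
    return docs


def extract_field(doc: str, field_tag: str) -> Optional[str]:
    """Return the value on the first line starting with field_tag, or None.

    Assumes a hypothetical one-field-per-line layout such as 'COUNTRY: Kenya'.
    """
    for line in doc.splitlines():
        if line.startswith(field_tag):
            return line[len(field_tag):].strip()
    return None


def bin_by_field(docs: List[str], field_tag: str) -> Dict[str, List[int]]:
    """Group document indices into one bin per unique field value."""
    bins: Dict[str, List[int]] = defaultdict(list)
    for i, doc in enumerate(docs):
        value = extract_field(doc, field_tag)
        if value is not None:  # documents lacking the field are skipped
            bins[value].append(i)
    return dict(bins)


if __name__ == "__main__":
    raw = (
        "<DOC>\nTITLE: Harvest report\nCOUNTRY: Kenya\nMaize yields rose.\n"
        "<DOC>\nTITLE: Port expansion\nCOUNTRY: Chile\nA new terminal opened.\n"
        "<DOC>\nTITLE: Rainfall outlook\nCOUNTRY: Kenya\nHeavy rain expected.\n"
    )
    docs = split_documents(raw, "<DOC>")
    print(len(docs))                        # 3
    print(bin_by_field(docs, "COUNTRY:"))   # {'Kenya': [0, 2], 'Chile': [1]}
```

Only the start marker is required to split the collection; the field tag is optional and only adds the ability to query or bin on that field, which mirrors the division of labor described above.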
