IN-SPIRE™ Visual Document Analysis

FAQ: How are the ThemeView™ peak labels related to the cluster labels?

The ThemeView™ landscape is created by piling up the topicality of individual documents, so higher peaks generally appear in areas of high document density. However, the number, placement, and height of the peaks correlate only indirectly with the clusters, because they are based strictly on the Galaxy documents underneath rather than on the cluster groupings. The area under a peak may, and often does, include documents from multiple clusters.
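The toy sketch below illustrates why a peak can span several clusters. It is not IN-SPIRE's algorithm. It assumes one-dimensional document positions, made-up cluster letters, and a Gaussian bump per document as a stand-in for each document's topicality contribution. The surface is built from document positions alone, and cluster membership is consulted only afterwards.

```python
import math

# HYPOTHETICAL data: 1-D positions and cluster letters. Real Galaxy
# coordinates are 2-D and come from IN-SPIRE, not from this sketch.
docs = [
    (0.0, "A"), (0.5, "A"), (1.0, "A"),
    (1.2, "B"), (1.5, "B"), (5.0, "B"),
    (8.0, "C"),
]

def landscape_height(x, docs, width=0.6):
    """Sum one Gaussian bump per document (a stand-in for piling up topicality).
    Cluster labels are ignored here: only positions shape the surface."""
    return sum(math.exp(-((x - pos) / width) ** 2) for pos, _ in docs)

# Evaluate the surface on a grid from 0.0 to 9.0 and take the highest point.
grid = [i / 10 for i in range(0, 91)]
peak_x = max(grid, key=lambda x: landscape_height(x, docs))

# Clusters of the documents lying "under" the peak (within 0.5 of it).
under_peak = sorted({c for pos, c in docs if abs(pos - peak_x) <= 0.5})
print(peak_x, under_peak)
```

The tallest peak forms where cluster A and cluster B documents crowd together, so the region under it mixes both clusters, while cluster B's outlying document at 5.0 contributes almost nothing to that peak.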

In addition, the cluster centroid labels are chosen solely by highest frequency count, whereas the ThemeView™ peak labels are the words with the highest topical content in the region. Together, these two factors help explain why the ThemeView™ peak labels often differ from the cluster centroid labels.
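To show how the two labeling rules can disagree even on the same documents, here is a minimal sketch. Frequency labeling is straightforward. For "topical content," the sketch swaps in a HYPOTHETICAL stand-in score (a term's share of region tokens divided by its share of corpus tokens). This is not IN-SPIRE's topicality measure, and the sample documents are made up.

```python
from collections import Counter
from fractions import Fraction

def term_counts(docs):
    """Count whitespace-separated terms across a list of documents."""
    return Counter(w for d in docs for w in d.split())

def label_by_frequency(docs, k=1):
    """Cluster-style label: the k terms with the highest raw counts."""
    return [w for w, _ in term_counts(docs).most_common(k)]

def label_by_lift(region_docs, all_docs, k=1):
    """Region-style label using a HYPOTHETICAL topicality stand-in:
    (term's share of region tokens) / (term's share of corpus tokens)."""
    region, corpus = term_counts(region_docs), term_counts(all_docs)
    r_total, c_total = sum(region.values()), sum(corpus.values())

    def score(w):
        # Exact fractions avoid floating-point ties behaving unpredictably.
        return Fraction(region[w], r_total) / Fraction(corpus[w], c_total)

    # Highest score first; break ties by region count, then alphabetically.
    ranked = sorted(region, key=lambda w: (-score(w), -region[w], w))
    return ranked[:k]

# HYPOTHETICAL documents: "report" is common everywhere, "energy" is not.
cluster = ["report energy oil", "report energy gas", "report pipeline"]
others = ["report weather", "report sports", "report election", "report budget"]

print(label_by_frequency(cluster))               # ['report']
print(label_by_lift(cluster, cluster + others))  # ['energy']
```

A term that is frequent across the whole collection wins under pure frequency counting, while a term concentrated in the region wins under a topicality-style score. That is one way the two kinds of labels can diverge.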

