Refining Phrase Queries

The Query Tool offers two powerful features in the Words in Document tab:

Order-independent approximate phrase queries

Choice lists

Both of these features apply to phrase queries, that is, to elements of a query that are enclosed in a pair of single or double quotation marks.

Order-Independent Approximate Phrase Queries

In prior versions of IN-SPIRE, you could place an asterisk (*) between words in a query to stand for zero or one occurrence of any word. An optional one- or two-digit quantifier, N, could follow the asterisk to indicate that zero to N words could appear between the left-hand and right-hand words. Thus, the phrase query term

“Brown *3 cow”

would match

“Brown cow”

“Brown old cow”

“Brown and black cow”

“Brown and white old cow”

and so on.  You could think of these phrase query terms as order-dependent, approximate phrase query terms.
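The order-dependent gap notation can be sketched as a small matcher. This is a minimal illustration in Python, not IN-SPIRE's implementation: the function names are hypothetical, words are assumed to be runs of letters and digits compared case-insensitively, and stemming asterisks and choice lists are not handled.

```python
import re


def tokenize(text):
    # Assumption: a word is a run of letters or digits, compared in lowercase.
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_gap_phrase(phrase):
    """Split 'Brown *3 cow' into its words and the maximum gap after each word."""
    words, gaps = [], []
    for tok in phrase.lower().split():
        m = re.fullmatch(r"\*(\d{0,2})", tok)
        if m:
            # A bare '*' allows zero or one word; '*N' allows zero to N words.
            gaps[-1] = int(m.group(1)) if m.group(1) else 1
        else:
            words.append(tok)
            gaps.append(0)  # words without an asterisk between them are adjacent
    return words, gaps


def matches_gap_phrase(phrase, text):
    """True if the words of `phrase` occur in order, with the allowed gaps."""
    words, gaps = parse_gap_phrase(phrase)
    toks = tokenize(text)

    def match_from(pos, i):
        # toks[pos] must be words[i]; then try every allowed gap before words[i + 1].
        if toks[pos] != words[i]:
            return False
        if i == len(words) - 1:
            return True
        for gap in range(gaps[i] + 1):
            nxt = pos + 1 + gap
            if nxt < len(toks) and match_from(nxt, i + 1):
                return True
        return False

    return any(match_from(p, 0) for p in range(len(toks)))
```

With this sketch, `matches_gap_phrase("Brown *3 cow", "Brown and white old cow")` returns True, while a fourth intervening word makes it return False.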

Approximate phrase queries also include an order-independent form. If the first non-whitespace character after the opening single or double quote mark of a phrase query term is a tilde (~), then the words that follow, up to the closing quote mark, are searched for in an order-independent manner. To constitute a match, a run of text must contain all of the words in the phrase, in any order, along with no more than the specified or implied number of extra words. The number of extra words helps determine the range of the query term (see Defining the Range below). We call this form of approximate phrase query term an order-independent proximity query term.

The following are examples of valid order-independent proximity query terms:

“~orbit planet sun”

“~10 mayor governor president king”

“ ~3 University Ohio”

Note: You cannot mix order-dependent and order-independent notation within the same query term. A phrase query term that starts with a tilde and also has an asterisk between words is a syntax error, as is a phrase term that contains a tilde anywhere other than at its beginning.
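The tilde and syntax rules above can be expressed as a parser for the text between the quote marks. This is a minimal sketch under stated assumptions: the names `parse_proximity_term` and `QuerySyntaxError` are hypothetical, the optional count of extra words is assumed to follow the tilde directly, and choice lists are not handled.

```python
import re

DEFAULT_EXTRA_WORDS = 5  # the default stated under "Defining the Range"


class QuerySyntaxError(ValueError):
    """Hypothetical error type for malformed phrase query terms."""


def parse_proximity_term(inner):
    """Parse the text between the quote marks of a phrase query term.

    Returns (words, extra_words) for an order-independent term, or None for
    an ordinary phrase. Raises QuerySyntaxError for mixed or misplaced notation.
    """
    stripped = inner.strip()
    if not stripped.startswith("~"):
        if "~" in stripped:
            raise QuerySyntaxError("a tilde is allowed only at the start of a phrase")
        return None  # ordinary (order-dependent) phrase
    # Assumption: the optional count of extra words directly follows the tilde.
    m = re.match(r"~(\d+)?", stripped)
    extra = int(m.group(1)) if m.group(1) else DEFAULT_EXTRA_WORDS
    words = stripped[m.end():].split()
    if any("~" in w for w in words):
        raise QuerySyntaxError("a tilde is allowed only at the start of a phrase")
    # A standalone '*' or '*N' between words is order-dependent notation.
    if any(re.fullmatch(r"\*\d*", w) for w in words):
        raise QuerySyntaxError("'~' and '*' between words cannot be mixed")
    return words, extra
```

For instance, `parse_proximity_term(" ~3 University Ohio")` returns `(['University', 'Ohio'], 3)`, and a trailing stemming asterisk such as `bomb*` is accepted because it is part of a word rather than a separate token.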

You can use a trailing, or stemming, asterisk on any or all of the words in these query terms.  For example,

“ ~ bomb* buil*”

would match any of the following:

“building was bombed”

“bomber was trapped in the building”

“build a bomb”

“bombs have been built”
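The stemming asterisk amounts to a prefix comparison on individual words. The sketch below uses hypothetical helper names and assumes case-insensitive comparison; it checks only that each query word matches some word in the text and deliberately ignores the range, which is described under Defining the Range.

```python
import re


def word_matches(pattern, word):
    """Compare one query word with one text word, ignoring case.

    A trailing '*' (stemming asterisk) matches any word that starts with
    the characters before it.
    """
    pattern, word = pattern.lower(), word.lower()
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def all_words_present(patterns, text):
    """Word-level check only: each query word matches some word of `text`.

    The range limit of an order-independent term is not applied here.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    return all(any(word_matches(p, w) for w in words) for p in patterns)
```

All four example phrases above satisfy `all_words_present(["bomb*", "buil*"], ...)`, whereas an unstemmed query word such as `bomb` would not match "bombed".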

Defining the Range

The range of an order-independent proximity query term is defined as the number of individual words in the term plus the specified number of extra words.  If the number of extra words is omitted, it defaults to five.  Thus, the query term

“~2 cattle alfalfa water”

has a range of 2 + 3 = 5, and the query term

“~crop rain damage cost”

has a range of 5 + 4 = 9.
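The range rule can be written as a short function on top of a small parser. In this sketch the function name is hypothetical, the term is passed without its quote marks, and every whitespace-separated token counts as one word (choice lists are not handled here).

```python
import re

DEFAULT_EXTRA_WORDS = 5  # stated above: the extra-word count defaults to five


def term_range(term):
    """Range of an order-independent term, e.g. '~2 cattle alfalfa water'.

    `term` is the text between the quote marks. Each whitespace-separated
    word counts once; choice lists are not handled in this sketch.
    """
    m = re.fullmatch(r"\s*~(\d+)?(.*)", term)
    if m is None:
        raise ValueError("not an order-independent proximity term")
    extra = int(m.group(1)) if m.group(1) else DEFAULT_EXTRA_WORDS
    words = m.group(2).split()
    return len(words) + extra
```

For example, `term_range("~2 cattle alfalfa water")` returns 5 and `term_range("~crop rain damage cost")` returns 9, matching the arithmetic above.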

The range specifies the maximum length, in words, of a run of text that can satisfy the query. Thus, for example, the text

“The cattle became bloated after consuming too much water and alfalfa.”

would not match the first example above, because the text from “cattle” through “alfalfa” spans ten words, inclusive, and the query term has a range of only five. However, the query term

“~7 cattle alfalfa water”

would match, because it has a range of 7 + 3 = 10.
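The window test behind these examples can be sketched as follows: slide a window of `range` consecutive words over the text and check that it contains every query word. This is an illustrative approach, not necessarily how IN-SPIRE does it; the function names are hypothetical, and the sketch assumes exact, case-insensitive word matches without stemming or choice lists.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a word is a run of letters or digits, compared in lowercase.
    return re.findall(r"[a-z0-9]+", text.lower())


def proximity_match(words, extra_words, text):
    """True if every query word occurs, in any order, within some run of at
    most len(words) + extra_words consecutive words of `text`."""
    need = Counter(w.lower() for w in words)  # a repeated query word needs repeats
    span = len(words) + extra_words           # the term's range
    toks = tokenize(text)
    for start in range(len(toks)):
        window = Counter(toks[start:start + span])
        if all(window[w] >= n for w, n in need.items()):
            return True
    return False
```

Applied to the cattle sentence, `proximity_match(["cattle", "alfalfa", "water"], 2, sentence)` returns False (range 5), while the same call with 7 extra words returns True (range 10).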

Highlighting in the Document Viewers

Once you run a query, IN-SPIRE highlights the text in the Document Viewer that matches an order-independent proximity query term. Each highlighted section is the longest section of text that matches the query term within its range, beginning with the first matching word and ending with the last. For example, if we apply the query term

“~ the moon”

to the text

“and the cow jumped over the moon”

the part of the phrase that will be highlighted is shown in brackets:

and [the cow jumped over the moon]

This is because the query term has a range of 5 + 2 = 7 and the first “the” is within seven words of “moon”, so the matching text includes everything from the first “the” through “moon”, inclusive. On the other hand, the query term

“~2 the moon”

would result in the following highlighting (highlighted text in brackets):

and the cow jumped over [the moon]

This is because this query term has a range of 2 + 2 = 4, and the first “the” is too far from “moon”. So the second “the” begins the matching text.
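One way to reproduce this highlighting rule is to search for the longest run that starts and ends on query words, contains all of them, and fits within the range. The sketch below returns the lowercased words of a single such run, keeping the earliest one on ties; the function name, tokenization, and tie-breaking are assumptions, and documents with several highlighted sections are not handled.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a word is a run of letters or digits, compared in lowercase.
    return re.findall(r"[a-z0-9]+", text.lower())


def highlight_span(words, extra_words, text):
    """Return the longest run of `text` that starts and ends on a query word,
    contains every query word, and is no longer than the term's range."""
    need = Counter(w.lower() for w in words)
    span = len(words) + extra_words
    toks = tokenize(text)
    best = None
    for i in range(len(toks)):
        if toks[i] not in need:
            continue  # a highlight must begin on a matching word
        for j in range(i, min(i + span, len(toks))):
            if toks[j] not in need:
                continue  # ... and end on one
            window = Counter(toks[i:j + 1])
            if all(window[w] >= n for w, n in need.items()):
                if best is None or j - i > best[1] - best[0]:
                    best = (i, j)  # strictly longer wins; ties keep the earliest
    if best is None:
        return None
    return " ".join(toks[best[0]:best[1] + 1])
```

For the moon example, this returns "the cow jumped over the moon" with the default five extra words and "the moon" with two, as described above.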

This behavior is consistent with the way the Document Viewer’s highlight function treats ordinary, order-dependent phrase query terms that contain asterisks between words.

As always, a complete Words in Document query can consist of any combination of single-word terms, phrase terms (enclosed in single or double quotes), and order-independent proximity query terms (also enclosed in single or double quotes). The Boolean operators AND, OR, and NOT can be used between any of these terms, and parentheses can be used to set the precedence of operations.

Choice Lists

Within a phrase or order-independent proximity query term, you can now specify a choice list of words, any one of which will match a word in the text at the corresponding position in the phrase.  To do this, enclose the list of words in parentheses.  Use any non-alphanumeric character or white space to separate terms in the list. Remember to enclose the entire phrase query term in single or double quotes.

Examples:

“New York (Yankees Mets Highlanders) baseball team”

will match “New York Yankees baseball team” or “New York Mets baseball team” or “New York Highlanders baseball team”, but not “New York State baseball team”.
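A choice list turns one position in a phrase into a set of alternatives. The sketch below parses a phrase into such positions and matches it against text as an ordinary, adjacent, in-order phrase. It assumes that inside parentheses any character other than a letter, digit, or trailing stemming asterisk separates alternatives; the helper names are hypothetical.

```python
import re


def word_matches(pattern, word):
    # A trailing '*' is a stemming wildcard; otherwise the words must be equal.
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def parse_slots(phrase):
    """'New York (Yankees Mets) team' -> [['new'], ['york'], ['yankees', 'mets'], ['team']]"""
    slots = []
    for group, word in re.findall(r"\(([^)]*)\)|([^\s()]+)", phrase.lower()):
        if word:
            slots.append([word])  # an ordinary word, possibly with a trailing '*'
        else:
            # Assumption: inside a choice list, any character other than a
            # letter, digit, or trailing '*' separates the alternatives.
            slots.append(re.findall(r"[a-z0-9]+\*?", group))
    return slots


def phrase_matches(phrase, text):
    """Ordinary (adjacent, in-order) phrase match where a slot may be a choice list."""
    slots = parse_slots(phrase)
    toks = re.findall(r"[a-z0-9]+", text.lower())
    for start in range(len(toks) - len(slots) + 1):
        if all(any(word_matches(alt, toks[start + k]) for alt in slots[k])
               for k in range(len(slots))):
            return True
    return False
```

With the New York phrase, this sketch accepts the Yankees, Mets, and Highlanders variants and rejects "New York State baseball team".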

“~ Oklahoma bomb* (facility=building=courthouse) federal”

will match, for example, any of these phrases:

“bomb went off at the Oklahoma City Federal Building”

“Oklahoma federal facility was bombed”

“bombing in Oklahoma, federal officials at the courthouse”
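Combining choice lists, stemming, and the range rule means each position in the term must match its own distinct word inside a window of at most `range` words. The sketch below does this with a simple backtracking search. It is an illustration under stated assumptions (case-insensitive matching, words as runs of letters and digits, hypothetical function names), not IN-SPIRE's implementation.

```python
import re

DEFAULT_EXTRA_WORDS = 5  # default number of extra words


def tokenize(text):
    # Assumption: a word is a run of letters or digits, compared in lowercase.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_matches(pattern, word):
    # A trailing '*' is a stemming wildcard; otherwise the words must be equal.
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def parse_term(term):
    """'~N w1 (a=b=c) w2*' -> (slots, extra); each slot lists its alternatives."""
    m = re.fullmatch(r"\s*~(\d+)?(.*)", term.lower())
    if m is None:
        raise ValueError("not an order-independent proximity term")
    extra = int(m.group(1)) if m.group(1) else DEFAULT_EXTRA_WORDS
    slots = []
    for group, word in re.findall(r"\(([^)]*)\)|([^\s()]+)", m.group(2)):
        # A bare word is one slot; a choice list is one slot with alternatives.
        slots.append([word] if word else re.findall(r"[a-z0-9]+\*?", group))
    return slots, extra


def assign(slots, window, used=frozenset()):
    """Backtracking: give every slot its own distinct matching word in `window`."""
    if not slots:
        return True
    for idx, tok in enumerate(window):
        if idx not in used and any(word_matches(p, tok) for p in slots[0]):
            if assign(slots[1:], window, used | {idx}):
                return True
    return False


def proximity_match(term, text):
    slots, extra = parse_term(term)
    span = len(slots) + extra  # a choice list counts as one word
    toks = tokenize(text)
    return any(assign(slots, toks[start:start + span]) for start in range(len(toks)))
```

Under these assumptions, all three Oklahoma phrases above match the term, while a sentence in which the words are spread over more than nine words does not.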

Note that choice lists can be used in ordinary phrase query terms as well as in order-independent proximity query terms.

When counting words to determine the range of the order-independent proximity query term, a choice list counts as one word, regardless of how many words it contains.
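Because a choice list counts as one word, the range depends on the number of bare words and choice lists, not on the total number of alternatives. A minimal sketch, with a hypothetical function name and the term passed without its quote marks:

```python
import re

DEFAULT_EXTRA_WORDS = 5  # default number of extra words


def proximity_range(term):
    """Range of an order-independent term given without its quote marks.

    Each bare word and each parenthesized choice list counts as one word,
    however many alternatives the list holds.
    """
    m = re.fullmatch(r"\s*~(\d+)?(.*)", term)
    if m is None:
        raise ValueError("not an order-independent proximity term")
    extra = int(m.group(1)) if m.group(1) else DEFAULT_EXTRA_WORDS
    slots = re.findall(r"\([^)]*\)|[^\s()]+", m.group(2))
    return len(slots) + extra
```

For the Oklahoma term, the three alternatives in the choice list occupy one position, so the range is 5 + 4 = 9.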

The stemming asterisk technique can be used with words in a choice list, for example,

“Federal (officer* official judge*)”

You can use more than one choice list in a phrase query term.  Examples:

“(men women) playing (baseball softball basketball soccer)”

“~12 telescope (Hubble Palomar) photo* (asteroid comet meteor)”

4/3/06