Text Search Parameters

The following API entry points provide the ability to search for records based on the text they contain:

  • GET /posts

  • GET /posts/stats

With each of these endpoints, two parameters work together to specify text patterns that determine which results will be returned:

  • keyword – a string that can specify one or more keywords

  • simple_query – a switch to enable boolean search syntax in the keyword string

Default Behavior of the keyword Parameter

The keyword parameter is a string that can specify one or more keywords. Its default behavior is to retrieve records that contain each of the given keywords in any order. For example, the following request will retrieve posts containing the words "climate" and "change" appearing anywhere within the post:

  • GET /posts?keyword=climate+change

Results would include posts such as the following, which contain both keywords, or alternate forms of them, in either order:

  • How does the frequency of severe storms relate to climate change?

  • Bird species respond to changes in the climate.

Note that the query string of the request above uses a "+" character as the separator between words. Alternatively, the URL-encoded space "%20" may be used as the separator. A literal space character is not allowed in a URL.
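The two separator styles can be produced with Python's standard library. This is a minimal sketch rather than an official client: the helper name posts_query is hypothetical, and only the /posts path and the keyword parameter come from the text above.

```python
from urllib.parse import quote, quote_plus, urlencode


def posts_query(keywords, use_plus=True):
    """Return a relative URL for GET /posts with the given keywords.

    posts_query is an illustrative helper, not part of the API.
    """
    # quote_plus encodes a space as "+", quote encodes it as "%20".
    quote_via = quote_plus if use_plus else quote
    return "/posts?" + urlencode({"keyword": " ".join(keywords)}, quote_via=quote_via)


print(posts_query(["climate", "change"]))         # /posts?keyword=climate+change
print(posts_query(["climate", "change"], False))  # /posts?keyword=climate%20change
```

Both forms decode to the same keyword string, so either can be used.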

Case Sensitivity, Punctuation, and Grammatical Forms

Terms used with the keyword parameter are case insensitive: they will find a match regardless of capitalization.

All forms of punctuation are treated merely as separators between words and are not otherwise significant to the search.

An internal dictionary tries to match a term that can appear in multiple grammatical forms. For example, the dictionary attempts to match the singular, plural, and possessive forms of a noun to the same term. For verbs, the grammatical variations of the word may include its tenses and gerund form. This dictionary works for common terms but does not fully cover less common terms.

Note that the following examples all resolve to the same series of keywords:

  • record industry associate

  • recorded: Industrial Associations

  • recording industries. Associated

  • Records, industry-associated
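The case and punctuation rules can be approximated locally. The sketch below is only an illustration of those two rules; the function name split_terms is hypothetical, and the service's internal dictionary of grammatical forms is not reproduced, so the terms are lowercased and separated but not reduced to a common form.

```python
import re


def split_terms(text):
    """Lowercase the text and treat every run of punctuation or spaces as a separator.

    Illustrative only: the real service also maps grammatical forms
    (for example "records" and "recorded") to the same term.
    """
    return [term.lower() for term in re.findall(r"[0-9A-Za-z]+", text)]


print(split_terms("recorded: Industrial Associations"))
# ['recorded', 'industrial', 'associations']
print(split_terms("Records, industry-associated"))
# ['records', 'industry', 'associated']
```

Each example yields three terms in the same positions. Matching the differing word forms to one another is the part handled by the internal dictionary.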

Boolean Search Syntax

In the "keyword=climate+change" example given above, results included posts with the words "climate" and "change" appearing anywhere in any order. We can change this default behavior with boolean search syntax to be more precise about which combinations of keywords we wish to require. This feature is enabled when the simple_query parameter is assigned the value "true".

Specific boolean search syntax features are illustrated with examples below. A quick summary of these features appears under the Boolean Operators glossary heading.

Note that the simple_query parameter modifies the behavior of the keyword parameter. Using the simple_query parameter without the keyword parameter will have no effect. When boolean search syntax is not desired, the simple_query parameter should be omitted rather than set to "false" or any other value. When the simple_query parameter is present, assigning it a value other than "true" may produce unintended results.
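One way to follow this advice in client code is to add simple_query only when boolean syntax is wanted and to leave it out otherwise. A minimal Python sketch, assuming a hypothetical helper named search_params:

```python
from urllib.parse import urlencode


def search_params(keyword, boolean=False):
    """Return query parameters for a text search.

    search_params is an illustrative helper, not part of the API.
    simple_query is included only with the value "true" and is
    omitted entirely otherwise, never sent as "false".
    """
    params = {"keyword": keyword}
    if boolean:
        params["simple_query"] = "true"
    return params


print(urlencode(search_params("climate change")))
# keyword=climate+change
print(urlencode(search_params("climate change", boolean=True)))
# keyword=climate+change&simple_query=true
```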

Boolean Search Syntax: Quoted Keyword String

One of the boolean operations supported is a quoted string of terms that must appear in the given order. For example, we can limit the search to the phrase "climate change" with this query:

  • GET /posts?keyword="climate+change"&simple_query=true

This would not match either of the following posts because the terms are not in the given order or are interrupted by one or more intervening terms:

  • Bird species respond to changes in the climate.

  • We are witnessing a climate of change.

It would match both of the following posts because the terms are in the correct order:

  • How does the frequency of severe storms relate to climate change?

  • Did the survey consider multiple climates? Changing environments persist.

Note that variations in capitalization, grammatical forms, and punctuation are tolerated. The only restriction imposed by a quoted keyword string when using boolean search syntax is that the terms appear consecutively in the given order.
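A quoted keyword string can be assembled programmatically. The Python sketch below uses a hypothetical helper named phrase_query. It keeps "+" as the literal word separator shown in the example request and percent-encodes the quotation marks as "%22", on the assumption that the service decodes percent-escapes in the standard way.

```python
from urllib.parse import quote


def phrase_query(words):
    """Build a GET /posts URL that requires the words in the given order.

    phrase_query is an illustrative helper, not part of the API.
    """
    expression = '"' + "+".join(words) + '"'
    # safe="+" leaves the separator untouched; the quotes become %22.
    return "/posts?keyword=" + quote(expression, safe="+") + "&simple_query=true"


print(phrase_query(["climate", "change"]))
# /posts?keyword=%22climate+change%22&simple_query=true
```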

Boolean Search Syntax: Logical OR

A logical OR is a way of specifying multiple criteria such that any one of them can trigger a match. The following example will match any post containing at least one instance of either "climate" or "change" or any of their grammatical forms. The vertical bar "|" character represents the logical OR operation and separates the terms when boolean syntax is enabled.

  • GET /posts?keyword=climate|change&simple_query=true

Boolean Search Syntax: Logical AND

A logical AND is a way of specifying multiple criteria that must all be met in order to trigger a match. For example, the following request will retrieve posts containing both of the words "climate" and "change" appearing anywhere within the post. The plus character represents the logical AND operation and separates the terms when boolean syntax is enabled.

  • GET /posts?keyword=climate+change&simple_query=true

Note that this request looks like, and in theory should have the same result as, the default behavior request in which the simple_query=true parameter is not present. In practice, the two behave almost but not quite identically due to a known issue. As of 2022-11-22, a search with simple_query=true includes some metadata that is missed when the parameter is not present.

Boolean Search Syntax: Parentheses

When multiple boolean operators are combined in a single request, parentheses provide a way to explicitly control the precedence of operations. To illustrate, the order of operations in the following cases could yield different results:

  • a AND ( b OR c )

  • ( a AND b ) OR c
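That the two groupings are not equivalent can be checked by enumerating every combination of truth values. The following Python sketch is purely illustrative and independent of the API:

```python
from itertools import product


def differing_cases():
    """Return the (a, b, c) combinations where the two groupings disagree."""
    cases = []
    for a, b, c in product([False, True], repeat=3):
        left = a and (b or c)    # a AND ( b OR c )
        right = (a and b) or c   # ( a AND b ) OR c
        if left != right:
            cases.append((a, b, c))
    return cases


print(differing_cases())
# [(False, False, True), (False, True, True)]
```

The groupings disagree exactly when a is false and c is true: the first form rejects the record, while the second accepts it.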

For example, we can require either that the word "climate" appears with the word "change" or that the word "global" appears with the word "warming" with this query:

  • GET /posts?keyword=(climate+change)|(global+warming)&simple_query=true

Boolean Search Syntax: Logical NOT

The logical NOT operation can be used to exclude certain results. For example, the following request will retrieve posts containing the word "Florida" but not if the word "hurricane" is also present. Note that the "-" character represents logical NOT in this case. It is combined with the logical AND. The combination can be interpreted as "Florida" AND NOT "hurricane".

  • GET /posts?keyword=florida+-hurricane&simple_query=true

It is also possible to apply logical NOT to an entire parenthetical expression. The following example can be interpreted as "Florida" AND NEITHER "hurricane" NOR "storm".

  • GET /posts?keyword=florida+-(hurricane|storm)&simple_query=true

Note that logical NOT operates on the single expression that immediately follows it. In this sense, it is a unary prefix operator. In contrast, logical OR and logical AND are binary infix operators that operate on the two expressions that they stand between.
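The operators can be composed with small helper functions that mirror this unary and binary structure. This is a sketch only: the names all_of, any_of, exclude, and boolean_query are hypothetical, and the parentheses and vertical bar are percent-encoded on the assumption that the service decodes percent-escapes in the standard way.

```python
from urllib.parse import quote


def all_of(*parts):
    """Join expressions with "+", the binary infix AND."""
    return "+".join(parts)


def any_of(*parts):
    """Join expressions with "|", the binary infix OR, inside parentheses."""
    return "(" + "|".join(parts) + ")"


def exclude(part):
    """Prefix a single expression with "-", the unary NOT."""
    return "-" + part


def boolean_query(expression):
    """Build a GET /posts URL with boolean search syntax enabled."""
    # safe="+" keeps the AND separator literal; "(", ")" and "|" are escaped.
    return "/posts?keyword=" + quote(expression, safe="+") + "&simple_query=true"


expression = all_of("florida", exclude(any_of("hurricane", "storm")))
print(expression)
# florida+-(hurricane|storm)
print(boolean_query(expression))
# /posts?keyword=florida+-%28hurricane%7Cstorm%29&simple_query=true
```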
