Lucene Query Syntax
Lucene has a custom query syntax for querying its indexes. Here are some query examples demonstrating the query syntax.
Search for word "foo" in the title field.
title:foo
Search for phrase "foo bar" in the title field.
title:"foo bar"
Search for phrase "foo bar" in the title field AND the phrase "quick fox" in the body field.
title:"foo bar" AND body:"quick fox"
Search for either the phrase "foo bar" in the title field AND the phrase "quick fox" in the body field, or the word "fox" in the title field.
(title:"foo bar" AND body:"quick fox") OR title:fox
Search for word "foo" and not "bar" in the title field.
title:foo -title:bar
Search for any word that starts with "foo" in the title field.
title:foo*
Search for any word that starts with "foo" and ends with "bar" in the title field.
title:foo*bar
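Note that by default Lucene's query parser doesn't support using a * symbol as the first character of a search term. Leading wildcards can be enabled with QueryParser.setAllowLeadingWildcard(true), but such queries can be slow on large indexes.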
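To make the wildcard behavior concrete, here is a minimal sketch that treats a wildcard term as a regular expression over a single indexed word. It only illustrates which words such patterns match and is not how Lucene evaluates wildcard queries internally. The names WildcardSketch, toRegex and matches are hypothetical.

```java
import java.util.regex.Pattern;

class WildcardSketch {
    // Translate a wildcard term into a regular expression:
    // '*' matches any run of characters, '?' matches exactly one character,
    // and every other character is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True if the whole word matches the wildcard term.
    static boolean matches(String wildcard, String word) {
        return toRegex(wildcard).matcher(word).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("foo*", "foobar"));       // true
        System.out.println(matches("foo*bar", "fooXbar"));   // true
        System.out.println(matches("foo*bar", "foobarbaz")); // false
    }
}
```

The pattern is applied to individual indexed words, not to the whole field value, which is why title:foo* finds any title containing a word that begins with "foo".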
Lucene supports finding words that are within a specific distance of each other.
Search for "foo" and "bar" within 4 words of each other.
"foo bar"~4
Note that for proximity searches, an exact phrase match has proximity zero, and a word transposition (bar foo) needs a proximity of at least 2, because swapping two adjacent words counts as two moves.
A query such as "foo bar"~10000000 is an interesting alternative to foo AND bar.
Whilst both queries are effectively equivalent with respect to the documents that are returned, the proximity query assigns a higher score to documents for which the terms foo and bar are closer together.
The trade-off is that the proximity query is slower to perform and requires more CPU.
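One way to picture the distance in a proximity search is as the number of single-position moves needed to line a document's words up with the phrase. The sketch below assumes that model for a two-term phrase and computes the smallest distance at which the phrase would match a list of tokens. It is a simplified illustration, not Lucene's matching or scoring code, and the names SlopSketch and requiredSlop are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class SlopSketch {
    // Smallest distance at which the two-term phrase "first second" would
    // match the token list, or -1 if either term is missing.
    static int requiredSlop(List<String> tokens, String first, String second) {
        int best = -1;
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) continue;
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals(second)) continue;
                // In the phrase, `second` sits one position after `first`,
                // so shift its document position back by one before comparing.
                int slop = Math.abs((j - 1) - i);
                if (best < 0 || slop < best) best = slop;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> adjacent = Arrays.asList("foo", "bar");
        List<String> spread = Arrays.asList("foo", "a", "b", "c", "bar");
        System.out.println(requiredSlop(adjacent, "foo", "bar")); // 0
        System.out.println(requiredSlop(spread, "foo", "bar"));   // 3
    }
}
```

Under this model, "foo bar"~4 would match both token lists, since each needs a distance of at most 4, and the list with the smaller distance is the one a proximity query would rank higher.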
Solr DisMax and eDisMax query parsers can add phrase proximity matches to a user query.
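Range queries allow one to match documents whose field values fall between the lower and upper bounds specified by the query. Range queries can be inclusive of the bounds (square brackets) or exclusive of them (curly braces). Sorting is done lexicographically, so numbers indexed as plain text must be zero-padded to compare in numeric order.
mod_date:[20020101 TO 20030101]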
Solr's built-in field types are very convenient for performing range queries on numbers without requiring padding.
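The effect of lexicographic ordering on numbers is easy to see with plain string comparison. The following sketch uses String.compareTo as a stand-in for term ordering, which is an assumption made for illustration; the names LexicographicRangeSketch, inRange and pad are hypothetical.

```java
class LexicographicRangeSketch {
    // Inclusive range check based on string ordering rather than numeric value.
    static boolean inRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    // Left-pad a non-negative number with zeros to a fixed width.
    static String pad(int n, int width) {
        return String.format("%0" + width + "d", n);
    }

    public static void main(String[] args) {
        // Unpadded: "5" sorts after "10", so 5 falls outside [1 TO 10].
        System.out.println(inRange("5", "1", "10")); // false
        // Padded to the same width, string order agrees with numeric order.
        System.out.println(inRange(pad(5, 4), pad(1, 4), pad(10, 4))); // true
    }
}
```

Padding every value to the same width at index time and at query time is what makes a text-based range query behave like a numeric one.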
Query-time boosts allow one to specify which terms/clauses are "more important". The higher the boost factor, the more relevant the term will be, and therefore the higher the corresponding document scores.
A typical boosting technique is assigning higher boosts to title matches than to body content matches:
(title:foo OR title:bar)^1.5 (body:foo OR body:bar)
You should carefully examine explain output to determine the appropriate boost weights.
The official docs for the query parser syntax are here: http://lucene.apache.org/java/3_5_0/queryparsersyntax.html
The query syntax has not changed significantly since Lucene 1.3 (it is now 3.5.0).
Queries can be parsed by constructing a QueryParser object and invoking the parse() method.
Query q = new QueryParser(Version.LUCENE_CURRENT, "title", analyzer).parse(querystr);
Programmatic construction of queries
Lucene queries can also be constructed programmatically. This can be handy at times, and some queries cannot be constructed by parsing at all.
Available query objects as of 3.4.0 are:
Use the BooleanQuery object to join and nest queries.
These classes are part of the org.apache.lucene.search package.
Here's a simple example:
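String id = "123456";
BooleanQuery bq = new BooleanQuery();
// qp is a QueryParser and str the query string, both defined elsewhere.
Query query = qp.parse(str);
// Require the parsed query; a BooleanQuery with only MUST_NOT clauses matches nothing.
bq.add(query, BooleanClause.Occur.MUST);
// Exclude the document whose id field equals the given id.
bq.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.MUST_NOT);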