Your First Lucene Project

1. Start with your search results page

This might seem counter-intuitive: why start at the end rather than at the beginning, say with the database?

The answer is simple: when you've figured out what search experience you're trying to produce, half the battle is won.

Pay attention to which data needs to be displayed and how you'd like the results ranked.

2. Map your application to the Lucene model

From the search results page, determine what steps need to be taken to get your data into Lucene.

First, determine what Fields there are in a Document.

Then, if your data is in a database for example, determine which database tables and columns need to be accessed, and what SQL select statements need to be executed.
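To make the mapping concrete, it can help to write the plan down as data before touching Lucene at all. The sketch below is only an illustration: the `FieldPlan` class, the `FieldSpec` record, and the `articles` table with its columns are all hypothetical names invented for this example, not part of Lucene. The `searchable` and `displayed` flags correspond roughly to the "index it" and "store it" decisions you'll make for each Field.

```java
import java.util.List;

// Hypothetical planning sketch: none of these names come from Lucene itself.
class FieldPlan {
    // One entry per Field in the planned Document.
    record FieldSpec(String name, String sourceColumn, boolean searchable, boolean displayed) {}

    // Example plan for an imaginary "articles" table (an assumption for illustration).
    static List<FieldSpec> articlePlan() {
        return List.of(
            new FieldSpec("id", "articles.id", false, true),
            new FieldSpec("title", "articles.title", true, true),
            new FieldSpec("body", "articles.body", true, false),
            new FieldSpec("published", "articles.published_at", true, true));
    }

    // Builds the SQL select that fetches exactly the columns the plan needs.
    static String selectStatement(String table, List<FieldSpec> plan) {
        StringBuilder sql = new StringBuilder("SELECT ");
        for (int i = 0; i < plan.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(plan.get(i).sourceColumn()).append(" AS ").append(plan.get(i).name());
        }
        return sql.append(" FROM ").append(table).toString();
    }
}
```

Calling `FieldPlan.selectStatement("articles", FieldPlan.articlePlan())` gives you the select statement for the indexer, and the same plan doubles as a checklist when you later inspect the index.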

3. Write the indexing code

Whether it's files or a database that needs to be indexed, start by writing your indexer. Start out simple; don't worry about efficiency or performance for now.

When the first index has been created, browse it using Luke and make sure it looks right, i.e. all the fields are there, all documents that should be indexed have been indexed, etc.

4. Write the searching code, in a separate class

It's always a good idea to separate the searching from the indexing. The searcher should accept a query string and return a list of hits.

After you've implemented the most basic functionality, add features such as limiting the number of results displayed per page and moving between pages. Also add some field boosts where you see fit, to emphasize certain fields over others.
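Paging is mostly arithmetic on the list of hits, so it's easy to get right in isolation. Here's a minimal sketch, assuming the searcher has already returned its hits as a plain `List`; the `Pager` class and its method names are made up for this example and are not Lucene API.

```java
import java.util.List;

// Hypothetical helper for paging over a list of hits; not part of Lucene.
class Pager {
    // Returns the hits for a 1-based page number; a page past the end yields an empty list.
    static <T> List<T> page(List<T> hits, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        long from = (long) (pageNumber - 1) * pageSize;
        if (from >= hits.size()) {
            return List.of();
        }
        int end = (int) Math.min((long) hits.size(), from + pageSize);
        return hits.subList((int) from, end);
    }

    // Number of pages needed to show totalHits results, pageSize per page.
    static int pageCount(int totalHits, int pageSize) {
        return (totalHits + pageSize - 1) / pageSize;
    }
}
```

With five hits and a page size of two, `Pager.page(hits, 3, 2)` returns just the last hit and `Pager.pageCount(5, 2)` is 3, which is all you need to render "previous" and "next" links.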

5. Implement additional search functionality

By now, you have a really basic search app which takes a query from the user and spits out a list of results. You'll now want to implement any required search functionality such as filtering by permissions, sorting by date, etc.
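Before wiring this into your search code, it can be useful to pin down exactly what behaviour you expect. The sketch below does that in plain Java over an in-memory list. The `Hit` record, its `requiredRole` field, and the `HitPostProcessor` class are assumptions made up for illustration. In a real app you'd normally let Lucene apply the filter and the sort as part of the search itself rather than post-processing the hits like this.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical post-processing sketch; it only illustrates the expected behaviour.
class HitPostProcessor {
    // Assumed hit shape: an id, the role needed to see the document, and its date.
    record Hit(String id, String requiredRole, LocalDate date) {}

    // Keeps only the hits the user is allowed to see, newest first.
    static List<Hit> visibleNewestFirst(List<Hit> hits, Set<String> userRoles) {
        return hits.stream()
            .filter(h -> userRoles.contains(h.requiredRole()))
            .sorted(Comparator.comparing(Hit::date).reversed())
            .collect(Collectors.toList());
    }
}
```

Whatever mechanism you end up using, the results it returns for a given user should match what this kind of reference function produces.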

6. Ensure your search results make sense

Since you don't want to look silly in front of your boss, quickly run through some sample queries, ensuring that hits are returned when they should be, and that the order in which results are ranked makes sense to the user. You shouldn't have to go in-depth into query explanations at this stage.
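If you'd rather not eyeball the same queries every time, a tiny smoke test can do it for you. This is a rough sketch, assuming your searcher can be wrapped as a function from a query string to an ordered list of document ids; the `SearchSmokeTest` class and its `failures` method are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical smoke-test helper; the search function is whatever your searcher exposes.
class SearchSmokeTest {
    // Runs each sample query and returns those whose expected document is missing from the top n.
    static List<String> failures(Function<String, List<String>> search,
                                 Map<String, String> expectedInTop, int n) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, String> e : expectedInTop.entrySet()) {
            List<String> ids = search.apply(e.getKey());
            List<String> top = ids.subList(0, Math.min(n, ids.size()));
            if (!top.contains(e.getValue())) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }
}
```

Feed it a handful of queries along with the document you'd expect near the top for each, and an empty result means the basics are in place.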

