Welcome to Lucene Tutorial.com

Lucene is an open-source full-text search library that makes it easy to add search functionality to an application or website.

The goal of Lucene Tutorial.com is to provide a gentle introduction to Lucene.

First-time Visitors

If this is your first time here, you probably want to go straight to the 5-minute introduction to Lucene.

What's New

Added a new article on Lucene's Query Syntax.
Fixed broken links to Scoring docs.
Added instructions for compiling and running HelloLucene.java for readers who are new to Java.
TextFileIndexer.java and HelloLucene.java have been updated to the recently released Lucene 2.9.1.
Fixed broken link to Lucene Scoring docs.

My recent blog posts about Lucene

Upgrading to Lucene 3.0 - Thu, 29 Apr 2010 00:58:42 +0000

I recently upgraded a 3-year-old app from Lucene 2.1-dev to 3.0.1. Some random thoughts on the evolution of the Lucene API over the past 3 years: I miss Hits. Sigh. Hits has been deprecated for a while now, but with 3.0 it's gone. And I have to say it's a pain that it is. Where I used to pass [...]

Read more about Upgrading to Lucene 3.0

Mapping neighborhoods to street addresses via geocoding - Mon, 19 Apr 2010 22:27:34 +0000

As far as I know, none of the geocoders consistently provides neighborhood data for a given street address. Consulting the gods at Google for useful information proves elusive too. Here's a step-by-step guide to obtaining neighborhood names for your street addresses (on Ubuntu). 0. Geocode your addresses if necessary using the Yahoo, MapQuest or Google geocoders. (This means [...]

Read more about Mapping neighborhoods to street addresses via geocoding

Average length of a URL - Fri, 06 Nov 2009 23:48:39 +0000

I've always been curious about the average length of a URL, mostly when estimating the memory needed to store URLs in RAM. So I dumped the DMOZ URLs, then sorted and uniq-ed the list. I ended up with 4074300 unique URLs weighing in at 139406406 bytes, which works out to roughly 34 characters per URL.
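For readers who want to repeat the measurement, here is a minimal Java sketch, assuming the URLs sit in a plain-text file with one URL per line. The class name `UrlLengthStats` is hypothetical, and the post does not say exactly how the bytes were counted (for example, whether newline characters were included), so this version counts only the UTF-8 bytes of each distinct URL.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper class, not code from the original post.
public class UrlLengthStats {

    /**
     * Returns the average UTF-8 byte length of the distinct URLs in {@code urls}.
     * Newline characters are not counted, since each element is one URL.
     */
    static double averageLength(Collection<String> urls) {
        // A TreeSet keeps the URLs sorted and drops duplicates, like `sort | uniq`.
        TreeSet<String> unique = new TreeSet<>(urls);
        if (unique.isEmpty()) {
            return 0.0;
        }
        long totalBytes = 0;
        for (String url : unique) {
            totalBytes += url.getBytes(StandardCharsets.UTF_8).length;
        }
        return (double) totalBytes / unique.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Assumed input format: a text file with one URL per line.
            List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
            System.out.printf(Locale.ROOT, "%.2f bytes per URL%n", averageLength(lines));
        } else {
            // With no file given, just redo the division using the totals quoted in the post.
            System.out.printf(Locale.ROOT, "%.2f bytes per URL%n", 139406406.0 / 4074300);
        }
    }
}
```

Run with no arguments, it prints `34.22 bytes per URL`, matching the post's "roughly 34". Pass a file path to measure your own list; a `TreeSet` is used here only because it mirrors `sort | uniq` in a single step, and a `HashSet` would give the same average.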

Read more about Average length of a URL

Idea: 2-stage recovery of corrupt Solr/Lucene indexes - Thu, 10 Sep 2009 02:43:21 +0000

I was recently onsite with a client who happened to have a corrupt Solr/Lucene index. The CheckIndex tool (Lucene 2.4+) diagnosed the problem and offered to fix it. Except… fixing the index in this case meant losing the corrupt segment, which also happened to be the one containing over 90% of the documents. Because Solr [...]

Read more about Idea: 2-stage recovery of corrupt Solr/Lucene indexes

Using Hadoop IPC/RPC for distributed applications - Mon, 02 Jun 2008 18:59:15 +0000

Hadoop is growing into a pretty large framework: release 0.17.0 has 483 classes! Previously, I'd written about Hadoop SequenceFile. SequenceFile is part of the org.apache.hadoop.io package, and the other notably useful classes in that package are ArrayFile and MapFile, which are persistent array and dictionary data structures, respectively.

About Hadoop IPC

Here, I'm going to introduce the [...]

Read more about Using Hadoop IPC/RPC for distributed applications