Welcome to Lucene Tutorial.com
Lucene is an open-source full-text search library that makes it easy to add search functionality to an application or website. The goal of Lucene Tutorial.com is to provide a gentle introduction to Lucene.
First-time Visitors
If this is your first time here, you probably want to go straight to the 5-minute introduction to Lucene.
What's New
My recent blog posts about Lucene
Recently upgraded a 3-year-old app from Lucene 2.1-dev to 3.0.1. Some random thoughts on the evolution of the Lucene API over the past 3 years: I miss Hits. Sigh. Hits has been deprecated for a while now, but with 3.0 it's gone. And I have to say it's a pain. Where I used to pass [...]
As far as I know, none of the geocoders consistently provides neighborhood data for a given street address, and consulting the gods at Google doesn't turn up much useful information either. Here’s a step-by-step guide to obtaining neighborhood names for your street addresses (on Ubuntu). 0. Geocode your addresses if necessary using the Yahoo, MapQuest or Google geocoders (this means [...]
Read more about Mapping neighborhoods to street addresses via geocoding
I’ve always been curious what the average length of a URL is, mostly when estimating the memory needed to store URLs in RAM. So I dumped the DMOZ URLs, then sorted and uniq-ed the list. I ended up with 4074300 unique URLs weighing in at 139406406 bytes, which works out to roughly 34 characters per URL.
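The estimate above is a single division: 139406406 bytes / 4074300 URLs ≈ 34.2 bytes per URL. Below is a minimal Python sketch of the same dedupe-and-measure step. It assumes one URL per line and counts length in UTF-8 bytes without the trailing newline. That may not match exactly how the original byte total was taken. The function name `average_url_length` is hypothetical.

```python
def average_url_length(urls):
    """Return (unique_count, total_bytes, average_bytes) for an iterable of URLs.

    Like `sort | uniq`, duplicate URLs are counted once. Surrounding whitespace
    (including the newline when reading a file) is stripped, blank lines are
    skipped, and length is measured in UTF-8 bytes.
    """
    # A set removes duplicates; sorting is not needed just to compute the average.
    unique = {u.strip() for u in urls if u.strip()}
    total_bytes = sum(len(u.encode("utf-8")) for u in unique)
    count = len(unique)
    average = total_bytes / count if count else 0.0
    return count, total_bytes, average


if __name__ == "__main__":
    sample = ["http://a.com/", "http://a.com/", "http://bb.org/x"]
    print(average_url_length(sample))  # (2, 28, 14.0)
    # The DMOZ figures quoted above:
    print(round(139406406 / 4074300, 1))  # 34.2
```

For a real dump, pass an open file object (for example `open("urls.txt", encoding="utf-8")`, a hypothetical filename). The set holds every unique URL in memory. For a list the size of DMOZ, piping through `sort -u` first and summing line lengths in a single pass avoids that cost.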
I was recently onsite with a client who happened to have a corrupt Solr/Lucene index. The CheckIndex tool (Lucene 2.4+) diagnosed the problem and offered to fix it. Except… fixing the index in this case meant losing the corrupt segment, which also happened to be the one containing over 90% of the documents. Because Solr [...]
Read more about Idea: 2-stage recovery of corrupt Solr/Lucene indexes
Hadoop is growing into a pretty large framework – release 0.17.0 has 483 classes! Previously, I’d written about Hadoop SequenceFile. SequenceFile is part of the org.apache.hadoop.io package; the other notably useful classes in that package are ArrayFile and MapFile, which are persistent array and dictionary data structures respectively.
About Hadoop IPC
Here, I’m going to introduce the [...]
Read more about Using Hadoop IPC/RPC for distributed applications