A new article on indexing documents using the Xapian/Omega toolkit is now available.From the intro:
Storing and providing access to documentation and information is an ever-growing problem for many companies. There are many solutions, including wikis and structured documentation stores, but full-text indexes are often the only way to gain the information you need from a wide array of documents. Xapian is an open source tool that reads and indexes documents, including those in HTML, PDF, OpenOffice, Microsoft® Office®, and many others, and with programmable interfaces to add and extract information, including Java™ technology, allowing you to support document indexing within your IBM WebSphere®-deployed environment. Examine how to install and deploy a typical Xapian installation indexing a variety of information, then see some examples for extracting the information using the different language bindings. The process will focus on how this could be used within a typical company intranet environment. The article will also provide a quick overview of Omega, a custom tool designed to work with the Xapian infrastructure.