Lucene 2.9.2 and 3.0.1 have been released. Both are mainly bug-fix releases for their respective previous versions.
The main difference between the 2.x and 3.x lines is that version 3 drops support for Java 1.4 and has a cleaner API, because the deprecated code has been removed. This means that if you want to upgrade your Lucene JARs to version 3, you must use at least Java 1.5 and have no deprecation warnings in your code.
More details of both releases can be found in the official announcement:
Hello Lucene users,
On behalf of the Lucene development community I would like to announce the release of Lucene Java versions 3.0.1 and 2.9.2:
Both releases fix bugs in the previous versions:
- 2.9.2 is a bugfix release for the Lucene Java 2.x series, based on Java 1.4
- 3.0.1 has the same bug fix level but is for the Lucene Java 3.x series, based on Java 5.
New users of Lucene are advised to use version 3.0.1 for new developments, because it has a clean, type-safe API.
Important improvements in these releases include:
- An increased maximum number of unique terms in each index segment.
- Fixed experimental CustomScoreQuery to respect per-segment search. This introduced an API change!
- Important fixes to IndexWriter: a commit() thread-safety issue, lost document deletes in near real-time indexing.
- Bugfixes for Contrib’s Analyzers package.
- Restoration of some public methods that were lost during deprecation removal.
- The new Attribute-based TokenStream API now works correctly with different class loaders.
Both releases are fully compatible with the corresponding previous versions. We strongly recommend upgrading to 2.9.2 if you are using 2.9.1 or 2.9.0; and to 3.0.1 if you are using 3.0.0.
See core changes at
http://lucene.apache.org/java/3_0_1/changes/Changes.html
http://lucene.apache.org/java/2_9_2/changes/Changes.html
and contrib changes at
http://lucene.apache.org/java/3_0_1/changes/Contrib-Changes.html
http://lucene.apache.org/java/2_9_2/changes/Contrib-Changes.html
Binary and source distributions are available at
http://www.apache.org/dyn/closer.cgi/lucene/java/
Lucene artifacts are also available in the Maven2 repository at
http://repo1.maven.org/maven2/org/apache/lucene/
Tags: Java, Lucene, Open source
It has been a long time since my last post. I have been very busy, so unfortunately I have not had the time to write about everything I wanted to.
This week I discovered, via Twitter, a really interesting open source search project for the cloud: ElasticSearch. ElasticSearch was created by Shay Banon. It’s a RESTful search engine built on top of Lucene and very well prepared for high scalability. It includes shard merging, replication and many more features.
Lately I have been working a lot on search scalability, and what I like most about ElasticSearch so far is that it supports 4 different types of distributed requests.
The simplest one (Query and fetch) is just one request per relevant shard. Once all the requests are done, the results are merged and… that’s it!
In this type of search, each shard sends all the fields of every document it returns to the merger.
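To make the merge step concrete, here is a minimal sketch of it in plain Java. It is not ElasticSearch code: the class `QueryAndFetchMerge`, the type `FullHit` and the method `merge` are hypothetical names of mine, and I am assuming each shard has already returned its own best hits together with all their fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch of the merge step of a "query and fetch" search. All names are hypothetical. */
final class QueryAndFetchMerge {

    /** A hit as a shard would return it: its score plus every field of the document. */
    static final class FullHit {
        final int shard;
        final String id;
        final float score;
        final Map<String, String> fields;

        FullHit(int shard, String id, float score, Map<String, String> fields) {
            this.shard = shard;
            this.id = id;
            this.score = score;
            this.fields = fields;
        }
    }

    /** Puts the per-shard result lists together and keeps the k best hits overall. */
    static List<FullHit> merge(List<List<FullHit>> perShard, int k) {
        List<FullHit> all = new ArrayList<>();
        for (List<FullHit> hits : perShard) {
            all.addAll(hits);
        }
        // Highest score first
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Note that every `FullHit` carries its fields over the network, even the ones that are thrown away by the final `subList` call.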
In another search type (Query then fetch, which is not quite as simple), a first request is sent to all the shards. At this point you don’t ask for the document content. Once the results are merged, you only need to ask for the full document data of the most relevant documents, the ones you want to show.
If you have to search across lots of shards, that’s definitely the way to go (the merger only receives the fields of the documents that matter, which means less data is sent across the network).
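The two phases can be sketched roughly like this. Again, this is only an illustration under my own assumptions, not how ElasticSearch implements it: `QueryThenFetch`, `HitRef`, `mergeRefs` and `fetch` are hypothetical names, and the `fetchFromShard` function stands in for whatever network call would load a document from a given shard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Sketch of a two-phase "query then fetch" search. All names are hypothetical. */
final class QueryThenFetch {

    /** Phase-one hit: just enough to rank the document and to find it again later. */
    static final class HitRef {
        final int shard;
        final String id;
        final float score;

        HitRef(int shard, String id, float score) {
            this.shard = shard;
            this.id = id;
            this.score = score;
        }
    }

    /** Phase 1: rank the lightweight references from all shards and keep the k best. */
    static List<HitRef> mergeRefs(List<List<HitRef>> perShard, int k) {
        List<HitRef> all = new ArrayList<>();
        for (List<HitRef> refs : perShard) {
            all.addAll(refs);
        }
        // Highest score first
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    /** Phase 2: load the document content only for the hits that survived the merge. */
    static <D> List<D> fetch(List<HitRef> top, BiFunction<Integer, String, D> fetchFromShard) {
        List<D> docs = new ArrayList<>();
        for (HitRef ref : top) {
            docs.add(fetchFromShard.apply(ref.shard, ref.id));
        }
        return docs;
    }
}
```

With many shards and a small k, phase 2 touches only k documents, while phase 1 moves nothing but shard numbers, ids and scores.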
Both options present a typical problem in distributed search: relevance is calculated relative to each shard, not absolutely across all of them.
To solve this, in ElasticSearch both search options can be supplemented with an initial request, which queries for the term frequency information needed to compute an “absolute relevance”.
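As an illustration of why that extra information helps, here is a small sketch that sums per-shard counts and derives a single idf value from them. The names `GlobalTermStats`, `ShardStats`, `idf` and `globalIdf` are hypothetical, and I am assuming the classic idf formula of Lucene’s default similarity, 1 + ln(numDocs / (docFreq + 1)); I have not checked which statistics ElasticSearch actually exchanges.

```java
import java.util.List;

/** Sketch of combining per-shard term statistics into one global idf. Names are hypothetical. */
final class GlobalTermStats {

    /** What one shard would report for a single term in the initial request. */
    static final class ShardStats {
        final long docFreq; // documents on this shard that contain the term
        final long numDocs; // all documents on this shard

        ShardStats(long docFreq, long numDocs) {
            this.docFreq = docFreq;
            this.numDocs = numDocs;
        }
    }

    /** Classic idf formula (assumed here): 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Sums the counts of all shards, then computes one idf that every shard can score with. */
    static double globalIdf(List<ShardStats> shards) {
        long docFreq = 0;
        long numDocs = 0;
        for (ShardStats s : shards) {
            docFreq += s.docFreq;
            numDocs += s.numDocs;
        }
        return idf(docFreq, numDocs);
    }
}
```

If a term appears in 9 of 1000 documents on one shard and in 90 of 1000 on another, `idf` gives each shard a different weight for the same term, so their scores are not comparable. Scoring with `globalIdf` on every shard removes that difference.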
This does not come for free: you pay for it with an extra round trip (even though it can be cached). It’s good if you can avoid it. A good place to do so is at indexing time, when you decide which shard a document is added to. Choosing the shard randomly will more or less ensure that term frequencies don’t differ much among shards.
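A minimal sketch of that idea, with hypothetical names (`RandomShardRouting`, `pickShard`, `distribute`) and no relation to the way ElasticSearch routes documents: every document gets a uniformly random shard, so the documents containing any given term end up spread roughly evenly.

```java
import java.util.Random;

/** Sketch of random shard assignment at indexing time. All names are hypothetical. */
final class RandomShardRouting {

    /** Picks a shard uniformly at random for a new document. */
    static int pickShard(Random random, int numShards) {
        return random.nextInt(numShards);
    }

    /** Assigns numDocs documents to random shards and counts how many land on each one. */
    static int[] distribute(Random random, int numDocs, int numShards) {
        int[] perShard = new int[numShards];
        for (int i = 0; i < numDocs; i++) {
            perShard[pickShard(random, numShards)]++;
        }
        return perShard;
    }
}
```

Calling `distribute(new Random(), 100000, 4)` for the 100000 documents that contain some term should give each shard a count close to 25000, which is what keeps the per-shard document frequencies similar. The price is that the shard of a document can no longer be derived from the document itself, so it has to be remembered somewhere if the document must be found again later.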
I still have not had the chance to dig into the source, but I have already downloaded it from the git repository.
Anyone who wants to share experiences with ElasticSearch is more than welcome.
Tags: Cloud computing, ElasticSearch, Java, Lucene, Open source