Life, code and stuff
Posted by Marc Sturlese

23 Apr 09 JAD Java Decompiler

Today I needed to check some old Java source for which I had kept only the class files.
Finding a Java decompiler for my Ubuntu was not as easy as I thought. I couldn't find one in the repositories, and everything I found on the net was badly out of date.
JAD Java Decompiler is definitely not new stuff, but it is really easy to use and did a pretty good job for me. The problem was that almost all links pointed me to http://www.kpdus.com, but the software is not available there anymore.
In the end I found it, not just for Ubuntu but for other platforms as well.
I leave here the JAD version for Ubuntu (and other Linux distributions) that worked for me.



08 Apr 09 Lucene TrieRangeQuery

Lucene TrieRangeQuery is a cool contrib in Lucene (I think it is not yet in the official release) created by Uwe Schindler. I had heard about it before, but I really learned about it at the Lucene MeetUp at ApacheCon EU, where Uwe gave a great talk about it. As I found it a really useful feature, I will try to explain the basics.

TrieRangeQuery mainly solves some RangeQuery problems:

  • A typical RangeQuery can end in a TooManyClauses exception if the range is too large.
  • A typical RangeQuery, or even a ConstantScoreRangeQuery, is slow if it has to match large ranges or if the index is huge.

To put it simply, TrieRangeQuery searches the data values while skipping the less significant “digits”, as determined by a precision parameter.

Let’s say, for example, that we need to match thousands of six-digit numbers. With ConstantScoreRangeQuery this can be a slow process on a huge index; with TrieRangeQuery it is not. The range is divided recursively according to a precision parameter (set at index time). Numbers in the middle of the range are matched using the lowest precision, while numbers near the extremes use a higher precision. This makes the query run much faster.
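The splitting described above can be sketched in plain Java. This is only an illustration under some assumptions, not the contrib's actual code: the class and method names are made up, values are non-negative longs, and each "digit" is a group of precisionStep bits. Each result is a small sub-range of prefixes at a given shift, so the middle of the range ends up covered by a few short prefixes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; the names are hypothetical and this is not Lucene's code. */
final class TrieRangeSplitSketch {

    /**
     * Splits the inclusive range [lo, hi] of non-negative longs into prefix sub-ranges.
     * Each result is {prefixLo, prefixHi, shift} and stands for every value v
     * with prefixLo <= (v >>> shift) <= prefixHi.
     */
    static List<long[]> split(long lo, long hi, int precisionStep) {
        if (lo < 0 || lo > hi || precisionStep < 1 || precisionStep > 31) {
            throw new IllegalArgumentException("need 0 <= lo <= hi and 1 <= precisionStep <= 31");
        }
        List<long[]> out = new ArrayList<long[]>();
        long mask = (1L << precisionStep) - 1;
        int shift = 0;
        while (true) {
            boolean hasLower = (lo & mask) != 0;     // lo does not start a coarser block
            boolean hasUpper = (hi & mask) != mask;  // hi does not end a coarser block
            long nextLo = (lo >>> precisionStep) + (hasLower ? 1 : 0);
            long nextHi = (hi >>> precisionStep) - (hasUpper ? 1 : 0);
            if (nextLo > nextHi) {
                // Nothing left in the middle: match the rest at the current precision.
                out.add(new long[] {lo, hi, shift});
                return out;
            }
            // The edges are matched at the current (higher) precision ...
            if (hasLower) out.add(new long[] {lo, lo | mask, shift});
            if (hasUpper) out.add(new long[] {hi & ~mask, hi, shift});
            // ... and the middle moves on to the next, coarser level.
            lo = nextLo;
            hi = nextHi;
            shift += precisionStep;
        }
    }

    /** Number of distinct prefix terms the query would have to visit. */
    static long termCount(List<long[]> parts) {
        long count = 0;
        for (long[] p : parts) {
            count += p[1] - p[0] + 1;
        }
        return count;
    }

    /** True if value v falls inside one of the prefix sub-ranges. */
    static boolean matches(List<long[]> parts, long v) {
        for (long[] p : parts) {
            long prefix = v >>> (int) p[2];
            if (p[0] <= prefix && prefix <= p[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<long[]> parts = split(0x423L, 0x642L, 4);
        for (long[] p : parts) {
            System.out.println("shift " + p[2] + ": "
                    + Long.toHexString(p[0]) + " .. " + Long.toHexString(p[1]));
        }
        System.out.println("terms visited: " + termCount(parts));
    }
}
```

For the hexadecimal range 0x423 to 0x642 with a step of 4 bits, the sketch visits a few dozen prefix terms instead of one term per distinct value in the range, which is where the speed-up comes from.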

Depending on the precisionStep parameter given at index time, we will be able to search with more or less precision. The more precision levels we choose, the more space the Lucene document will occupy, because the field has to be indexed several times, once per precision.
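To get a feeling for that index-time cost, here is a minimal sketch that lists the prefixes one value would be indexed under. It assumes each precision level simply drops precisionStep more bits; the "shift:prefix" text format and all names are invented for the example and are not the encoding the contrib really writes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; the term format and names are hypothetical. */
final class TriePrefixTermsSketch {

    /** Returns one "shift:prefix" string per precision level for the given value. */
    static List<String> prefixTerms(long value, int valueBits, int precisionStep) {
        if (valueBits < 1 || valueBits > 64 || precisionStep < 1) {
            throw new IllegalArgumentException("need 1 <= valueBits <= 64 and precisionStep >= 1");
        }
        List<String> terms = new ArrayList<String>();
        for (int shift = 0; shift < valueBits; shift += precisionStep) {
            // Dropping the lowest 'shift' bits gives the prefix for this level.
            terms.add(shift + ":" + Long.toHexString(value >>> shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        // A 12-bit value with 4-bit steps is indexed under three prefixes.
        System.out.println(prefixTerms(0x423L, 12, 4));
        // A 64-bit value: halving the step doubles the number of indexed terms.
        System.out.println(prefixTerms(0x423L, 64, 8).size());
        System.out.println(prefixTerms(0x423L, 64, 4).size());
    }
}
```

In this sketch a smaller step means more terms per value (a bigger index) but fewer terms to visit at query time, so the parameter is a trade-off between index size and search speed.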

We need to index data in a special way to be able to search it using Lucene TrieRangeQuery: the fields must be indexed using TrieUtils. We can index numbers directly. It supports Java signed int, long, float and double. There is no loss of precision for doubles or floats. Values are not rounded when they are encoded; a long/int representation is used instead (amounts of money, for example, can be indexed as a whole number of cents).
Indexing numbers with TrieUtils lets us forget about manual padding.
We can index dates as well (from Java timestamps).
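How a double can become a long without losing precision or ordering can be shown with a well-known bit trick on the IEEE 754 representation. This is a sketch with made-up names, and I am assuming the contrib does something along these lines rather than quoting its code. A date needs no such trick, since a Java timestamp is already a long.

```java
/** Illustrative sketch only; the class and method names are hypothetical. */
final class SortableDoubleSketch {

    /** Maps a double to a long whose signed ordering matches the double ordering. */
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative doubles sort backwards as raw bits, so flip everything but the sign bit.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    /** Inverse of toSortableLong: recovers the exact original double. */
    static double fromSortableLong(long sortable) {
        if (sortable < 0) {
            sortable ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(sortable);
    }

    public static void main(String[] args) {
        double[] samples = {-2.5, -1.0, 0.0, 0.1, 19.99};
        for (double d : samples) {
            long encoded = toSortableLong(d);
            System.out.println(d + " -> " + encoded + " -> " + fromSortableLong(encoded));
        }
    }
}
```

Because the encoded values compare like the original numbers, they can be range-searched as plain longs, with no zero padding of strings and no rounding.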

As seen, Lucene TrieRangeQuery is a real step forward for the scalability of Lucene queries.



01 Apr 09 ApacheCon Europe 2009

Last week I had the chance to go to ApacheCon Europe 2009. The event took place at the Mövenpick Hotel in Amsterdam. I had a really good time there.

It was good to share use cases and experiences in person with people I had only spoken with in forums.
I spent the first two days at the hackathon researching and testing different ASF projects, with a special interest in Pig.

There were really interesting talks. I found the Mahout project especially great. I had discovered it at ApacheCon 2008 in New Orleans, where I had barely heard about it, but I paid more attention this time and it looks full of possibilities. It is used for machine learning and runs on Hadoop.
It was also good to get some info about Servlet 3.0 and to learn about the servlet doFilter function and some other stuff.
HBase is another project I was interested in. It looks good for use as a “data warehouse”, but it seems really difficult (at least at first impression) to deal with the stored data.

The meetups were very good too. There was a presentation about the new Lucene contrib TrieRangeQuery. It is still not available in the official release, but you can use it by grabbing a nightly build. In the next few days I will try to write in more detail about this and other presented projects.
