Here are the slides and the video of the presentation I gave at Apache Lucene Eurocon 2011 in Barcelona. The talk was about how we crunch and index data using Solr, Hadoop and Hive at Trovit. I paid special attention to the distributed indexing strategy.
Posts Tagged ‘Open source’
Lucene FieldCache.StringIndex and multiValued fields
Lately I’ve been doing some tests with Lucene multiValued fields and FieldCache. I loaded the FieldCache.StringIndex of a multiValued field and saw some weird behavior that I think is worth mentioning. FieldCache.StringIndex loads an int[] (order) and a String[] (lookup). The String[] contains all the terms in a field. The int[] array contains [...]
Apache Lucene EuroCon 2010
Yesterday I came back from Lucene EuroCon 2010, which took place in Prague. There were many interesting talks over those days. Some of the slides are already on SlideShare. Can’t wait for the others to be uploaded. I gave a talk on Thursday about our usage of Solr at Trovit. Covered an [...]
Lucene 2.9.2 and 3.0.1 released
Lucene 2.9.2 and 3.0.1 have been released. Both are mainly bug-fix releases over the previous versions. The main difference between the 2.x and 3.x lines is that version 3 has no support for Java 1.4 and has a cleaner API, as deprecated code has been removed. This means if you want to upgrade [...]
ElasticSearch
It has been a long time since my last post. I have been very busy, so unfortunately I have not had the time to write about everything I wanted to. This week I discovered via Twitter a really interesting open source search project for the cloud, ElasticSearch. ElasticSearch has been created by Shay Banon. It’s [...]
CloudCamp Barcelona 2009
Last Monday the first CloudCamp ever held in Barcelona took place. Although I was expecting more technical content, it was good to be there and listen to what people had to say. The first part of the event consisted of some quick explanations from different companies related to cloud computing. Basically, were [...]
Performance measurement with JMeter 2.3.3
A new release of JMeter was launched last week. JMeter 2.3.3 is a powerful Java application designed for functional testing and performance measurement of web applications, allowing you to run heavy server stress tests. I have been experimenting with it and I really liked the easy way you can set up a test [...]
Analyzing java heaps with jmap and jhat
Jmap and jhat are a couple of really useful tools for analyzing the memory consumption of a Java program. Both are included in the JDK 1.6, so there is no need to install anything extra. Jmap allows you to create a dump of the Java memory heap at any moment in the life of [...]
JAD Java Decompiler
Today I needed to check some old Java source for which I had only kept the class files. Finding a Java decompiler for my Ubuntu was not as easy a job as I thought. I couldn’t find one in the repositories, and everything I found on the net was out of date. JAD Java Decompiler [...]
Lucene TrieRangeQuery
Lucene TrieRangeQuery is a cool contrib in Lucene (I think it is not yet in the official release) created by Uwe Schindler. I had heard about it before but learned about it at the LuceneMeetUp at ApacheCon EU. Uwe gave a great speech about it. As I found it a really useful feature, I will try to explain the [...]

