<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Marc Sturlese &#187; ApacheCon</title>
	<atom:link href="http://www.marcsturlese.com/tag/apachecon/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.marcsturlese.com</link>
	<description>Life, code and stuff</description>
	<lastBuildDate>Sun, 27 Jun 2010 14:45:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Lucene TrieRangeQuery</title>
		<link>http://www.marcsturlese.com/2009/04/08/lucene-trierangequery/</link>
		<comments>http://www.marcsturlese.com/2009/04/08/lucene-trierangequery/#comments</comments>
		<pubDate>Wed, 08 Apr 2009 21:50:43 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[ApacheCon]]></category>
		<category><![CDATA[TrieRangeQuery]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=97</guid>
		<description><![CDATA[Lucene TrieRangeQuery is a cool contrib in Lucene (I think it is not yet in the official release) created by Uwe Schindler. I had heard about it before, but I really learned about it at the Lucene meetup at ApacheCon EU. Uwe gave a great speech about it. As I found it a really useful feature, I will try to explain the [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Lucene TrieRangeQuery</strong> is a cool contrib in <strong>Lucene</strong> (I think it is not yet in the official release) created by Uwe Schindler. I had heard about it before, but I really learned about it at the Lucene meetup at ApacheCon EU. Uwe gave a great speech about it. As I found it a really useful feature, I will try to explain the basics.</p>
<p><strong>TrieRangeQuery</strong> mainly solves some RangeQuery problems:</p>
<ul>
<li>A typical RangeQuery can end in a <strong>TooManyClauses</strong> exception if the range is very large.</li>
<li>A typical RangeQuery, or even a ConstantScoreRangeQuery, is slow if it has to match large ranges or if the index is huge.</li>
</ul>
<p>To explain it in an easy way, what <strong>TrieRangeQuery</strong> does is search the data values while skipping the less relevant &#8220;digits&#8221;, depending on a precision parameter.</p>
<p>Let&#8217;s say, for example, we need to classify thousands of six-digit numbers. This could be a slow process using ConstantScoreRangeQuery on a huge index, but not with <strong>TrieRangeQuery</strong>. Ranges are divided recursively according to a precision parameter (set at index time). Numbers from the middle of the range are matched using the lowest precision, while numbers near the extremes use a higher precision. This makes the query run much faster.</p>
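<p>The splitting idea can be sketched in a few lines of plain Java. This is only an illustration using decimal digits, not the code in the contrib (which, as far as I understand, works on bits). The class <code>DecimalTrieRangeSketch</code> and its <code>cover</code> method are names I made up for the example, and a block like <code>43x</code> stands for a single lower-precision term that matches every value starting with 43.</p>

```java
import java.util.StringJoiner;

class DecimalTrieRangeSketch {

    // Hypothetical helper, not part of Lucene: covers the inclusive range
    // [lo, hi] with aligned decimal blocks, always taking the largest block
    // that fits. The result is a space-separated list such as "429 43x 5xx".
    static String cover(long lo, long hi) {
        if (lo < 0 || hi < lo || hi > 1_000_000_000_000L) {
            throw new IllegalArgumentException("need 0 <= lo <= hi <= 10^12");
        }
        StringJoiner blocks = new StringJoiner(" ");
        long cur = lo;
        while (cur <= hi) {
            long size = 1;   // how many values the current block covers
            int dropped = 0; // how many trailing digits the block ignores
            // Grow the block while it stays aligned and inside the range.
            while (cur % (size * 10) == 0 && cur + size * 10 - 1 <= hi) {
                size *= 10;
                dropped++;
            }
            // Keep the leading digits and mark each ignored digit with 'x'.
            StringBuilder term = new StringBuilder(Long.toString(cur / size));
            for (int i = 0; i < dropped; i++) {
                term.append('x');
            }
            blocks.add(term);
            cur += size;
        }
        return blocks.toString();
    }

    public static void main(String[] args) {
        // Exact terms at both ends of the range, coarse blocks in the middle.
        System.out.println(cover(423, 642));
    }
}
```

<p>For the range 423 to 642, the sketch returns 22 blocks instead of 220 individual values: exact terms near both ends and the coarse <code>5xx</code> block for the whole middle.</p>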
<p>Depending on the precisionStep parameter given at index time, we will be able to search with more or less precision. The more precision levels we choose, the more space the Lucene document will occupy, because the field has to be indexed several times, once for each precision.</p>
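<p>To see why the document grows, here is a small sketch of what indexing the same field at several precisions could look like. It is an illustration under my own assumptions, not the TrieUtils API: <code>PrecisionLevelsSketch</code> and <code>termsFor</code> are hypothetical names, the term format is invented, and only non-negative values are considered. Each level drops <code>precisionStep</code> more low-order bits from a 64-bit value.</p>

```java
class PrecisionLevelsSketch {

    // Hypothetical helper, not Lucene's TrieUtils: returns one term per
    // precision level for a 64-bit value. Level i ignores the lowest
    // i * precisionStep bits, so higher levels are coarser.
    static String[] termsFor(long value, int precisionStep) {
        if (precisionStep < 1 || precisionStep > 64) {
            throw new IllegalArgumentException("precisionStep must be in 1..64");
        }
        // Number of levels needed to cover all 64 bits (rounded up).
        int levels = (64 + precisionStep - 1) / precisionStep;
        String[] terms = new String[levels];
        for (int i = 0; i < levels; i++) {
            int shift = i * precisionStep;
            // Invented term format: the shift plus the remaining high bits in hex.
            terms[i] = "shift=" + shift + ":" + Long.toHexString(value >>> shift);
        }
        return terms;
    }

    public static void main(String[] args) {
        // One line per step size: the smaller the step, the more terms.
        for (int step : new int[] {16, 8, 4}) {
            System.out.println(step + " -> " + String.join(", ", termsFor(423456L, step)));
        }
    }
}
```

<p>In this sketch a long produces 4 terms with a step of 16 bits, 8 terms with a step of 8 bits and 16 terms with a step of 4 bits, so a smaller step means more terms per document.</p>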
<p>We need to index data in a special way to be able to search it using <strong>Lucene TrieRangeQuery</strong>: we must index our fields using <strong>TrieUtils</strong>. We can index numbers directly. It supports Java signed int, long, float and double. There&#8217;s no loss of precision for doubles or floats. No rounding is applied when they are created; instead, a long/int representation is used for cents.<br />
Indexing numbers with <strong>TrieUtils</strong> lets us forget about manual padding.<br />
We can index dates as well (from Java timestamps).</p>
<p>As we have seen, <strong>Lucene TrieRangeQuery</strong> is definitely a step forward for the <strong>scalability</strong> of <strong>Lucene</strong> queries.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2009/04/08/lucene-trierangequery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ApacheCon Europe 2009</title>
		<link>http://www.marcsturlese.com/2009/04/01/apachecon-europe-2009/</link>
		<comments>http://www.marcsturlese.com/2009/04/01/apachecon-europe-2009/#comments</comments>
		<pubDate>Wed, 01 Apr 2009 21:55:54 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Random]]></category>
		<category><![CDATA[ApacheCon]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Pig]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=80</guid>
		<description><![CDATA[Last week I had the chance to go to ApacheCon Europe 2009. The event took place at the Mövenpick Hotel, Amsterdam. I had a really good time there. It was good to share use cases and experiences in person with people I had only spoken with on forums. I spent the first two days [...]]]></description>
			<content:encoded><![CDATA[<p>Last week I had the chance to go to <a title="ApacheCon Europe 2009" href="http://www.eu.apachecon.com/c/aceu2009/"><strong>ApacheCon Europe 2009</strong></a>. The event took place at the Mövenpick Hotel, Amsterdam. I had a really good time there.</p>
<p>It was good to share use cases and experiences in person with people I had only spoken with on forums.<br />
I spent the first two days at the <strong>hackathon</strong> doing some research and testing different ASF projects. I took a special interest in <a title="Pig" href="http://hadoop.apache.org/pig/" target="_blank"><strong>Pig</strong></a>.</p>
<p>There were really interesting talks. I found the <a title="Lucene mahout" href="http://lucene.apache.org/mahout/" target="_blank"><strong>Mahout</strong></a> project especially great. I had discovered it at <strong>ApacheCon</strong> 2008 in New Orleans, where I barely heard about it, but I paid more attention this time and it looks full of possibilities. It is used for machine learning and runs on <a title="Lucene" href="http://hadoop.apache.org/" target="_blank"><strong>Hadoop</strong></a>.<br />
It was also good to get some info about Servlet 3.0 and learn about the servlet doFilter function and some other stuff.<br />
<a title="HBase" href="http://hadoop.apache.org/hbase/" target="_blank"><strong>HBase</strong></a> is another project I was interested in. It looks good for use as a &#8220;data warehouse&#8221;, but it seems really difficult (at least at first impression) to deal with the stored data.</p>
<p>The meetups were very good too. There was a presentation about the new <a title="Lucene" href="http://lucene.apache.org/java/docs/" target="_blank"><strong>Lucene</strong></a> contrib <strong>TrieRangeQuery</strong>. It is still not available in the official release, but you can use it by grabbing a nightly build. In the next few days I will try to write in more detail about this and the other projects presented.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2009/04/01/apachecon-europe-2009/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
