<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Marc Sturlese</title>
	<atom:link href="http://www.marcsturlese.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.marcsturlese.com</link>
	<description>Life, code and stuff</description>
	<lastBuildDate>Tue, 29 Nov 2011 22:19:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Scaling search at Trovit with Solr and Hadoop</title>
		<link>http://www.marcsturlese.com/2011/11/29/scaling-search-at-trovit-with-solr-and-hadoop/</link>
		<comments>http://www.marcsturlese.com/2011/11/29/scaling-search-at-trovit-with-solr-and-hadoop/#comments</comments>
		<pubDate>Tue, 29 Nov 2011 22:19:18 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[EuroCon]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hive]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=228</guid>
		<description><![CDATA[Here are the slides and the video of the presentation I gave at the Apache Lucene Eurocon 2011 in Barcelona. The talk was about how we crunch and index data using Solr, Hadoop and Hive at Trovit, with special focus on the distributed indexing strategy.]]></description>
			<content:encoded><![CDATA[<p>Here are the <a title="solr hadoop" href="http://www.lucidimagination.com/sites/default/files/file/Eurocon2011/MarcSturleseTrovit.pdf" target="_blank">slides</a> and the <a title="solr hadoop" href="http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/scaling-search-trovit-solr-and-hadoop" target="_blank">video</a> of the presentation I gave at the <a title="lucene eurocon" href="http://2011.lucene-eurocon.org/" target="_blank">Apache Lucene Eurocon 2011</a> in Barcelona. The talk was about how we crunch and index data using <strong>Solr</strong>, <strong>Hadoop</strong> and Hive at Trovit, with special focus on the distributed indexing strategy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2011/11/29/scaling-search-at-trovit-with-solr-and-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using LZO Compression with Hadoop and Snow Leopard</title>
		<link>http://www.marcsturlese.com/2011/03/12/using-lzo-compression-with-hadoop-and-snow-leopard/</link>
		<comments>http://www.marcsturlese.com/2011/03/12/using-lzo-compression-with-hadoop-and-snow-leopard/#comments</comments>
		<pubDate>Sat, 12 Mar 2011 14:50:49 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[CDH]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[LZO]]></category>
		<category><![CDATA[mac]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=199</guid>
		<description><![CDATA[There are a couple of very good sites (kevinweil, hadoop-gpl-compression) about how to set up LZO compression on Hadoop 0.20, so I won&#8217;t go into detail here. I just want to mention a couple of problems I ran into because I was using Snow Leopard 10.6. First install the LZO libraries for Mac OS X from the [...]]]></description>
			<content:encoded><![CDATA[<p>There are a couple of very good sites (<a title="hadoop-lzo" href="https://github.com/kevinweil/hadoop-lzo" target="_blank">kevinweil</a>, <a title="hadoop-lzo" href="http://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/wiki/FAQ" target="_blank">hadoop-gpl-compression</a>) about how to set up <strong>LZO</strong> compression on <strong>Hadoop 0.20</strong>, so I won&#8217;t go into detail here. I just want to mention a couple of problems I ran into because I was using Snow Leopard 10.6.</p>
<p>First install the <strong>LZO</strong> libraries for Mac OS X from the tarball:</p>
<p><em>tar -xzf lzo-2.04.tar.gz<br />
cd lzo-2.04<br />
env CFLAGS="-arch x86_64" ./configure --build=x86_64-darwin --enable-shared --disable-asm --prefix=/path_to_lzo-2.04/<br />
make; make install</em></p>
<p>Then comes the problem. You have to compile the <strong>LZO</strong> support for <strong>Hadoop</strong>:</p>
<p><em>env JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/ C_INCLUDE_PATH=/path_to_lzo-2.04/include LIBRARY_PATH=/path_to_lzo-2.04/lib CFLAGS="-arch x86_64" ant clean compile-native test tar</em></p>
<p>And this error happens:<br />
<a href="http://www.marcsturlese.com/wp-content/images/lzo.png"><img class="aligncenter size-full wp-image-213" title="hadoop-lzo" src="http://www.marcsturlese.com/wp-content/images/lzo.png" alt="" width="552" height="165" /></a></p>
<p>This is because the soft link that points to the Java headers on Snow Leopard is broken and the headers are nowhere to be found. To get the proper ones you have to install the Mac OS X developer tools, which contain them. Then you just have to remove the broken soft link and create the correct one:</p>
<p><em>rm /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/include<br />
ln -s /Developer/SDKs/MacOSX10.6.sdk/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Headers /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/include</em></p>
<p>Once compiled, you&#8217;ll have the jar file and the compiled <strong>LZO</strong> native libraries (which you&#8217;ll have to add to JAVA_LIBRARY_PATH in hadoop-env.sh) in <em>~/kevinweil-hadoop-lzo-0e70051/build/native/Mac_OS_X-x86_64-64</em></p>
<p><strong>Hadoop</strong> native libraries are needed too. The default precompiled ones don&#8217;t work on Mac OS X, so you will have to compile them as well. If the headers are properly set, no problems should occur:</p>
<p><em>env JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home CFLAGS="-arch x86_64" LDFLAGS=-L/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Libraries ANT_OPTS=-d64 ant clean compile-native</em></p>
<p>The compiled native libraries (add them to JAVA_LIBRARY_PATH too) will be created in <em>~/hadoop-0.20.2+737/build/native/Mac_OS_X-x86_64-64/lib</em></p>
<p>Now you just have to set the library paths properly and configure <strong>Hadoop</strong> to use <strong>LZO</strong> compression as explained on the sites mentioned above. Then you&#8217;re done!</p>
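<p>A quick way to double-check the library paths before starting Hadoop is a tiny Java program that looks for the native libraries in <em>java.library.path</em>. This is only a sketch: the base names <em>gplcompression</em> (hadoop-lzo) and <em>hadoop</em> (Hadoop&#8217;s own native code) are my assumptions about what the loaders ask for, and <em>NativeLibCheck</em> is a hypothetical helper, not part of either project.</p>

```java
import java.io.File;

/**
 * Hypothetical helper (not part of Hadoop or hadoop-lzo): reports whether the JNI
 * libraries Hadoop needs can be found in a search path such as java.library.path.
 * The base names "hadoop" and "gplcompression" are assumptions.
 */
public class NativeLibCheck {

    /** Returns the library file if a directory in searchPath contains it, otherwise null. */
    static File find(String libBaseName, String searchPath) {
        // Platform-specific file name the running JVM would look for, e.g. libhadoop.so on Linux.
        String fileName = System.mapLibraryName(libBaseName);
        for (String dir : searchPath.split(File.pathSeparator)) {
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path", "");
        for (String lib : new String[] {"hadoop", "gplcompression"}) {
            File found = find(lib, path);
            System.out.println(lib + ": " + (found == null ? "NOT FOUND" : found.getAbsolutePath()));
        }
    }
}
```

<p>Run it with the same directories you put in JAVA_LIBRARY_PATH, e.g. <em>java -Djava.library.path=dir1:dir2 NativeLibCheck</em>; a NOT FOUND line points at the library whose directory is missing.</p>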
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2011/03/12/using-lzo-compression-with-hadoop-and-snow-leopard/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Running syslog-ng on Snow Leopard</title>
		<link>http://www.marcsturlese.com/2010/10/18/running-syslog-ng-on-snow-leopard/</link>
		<comments>http://www.marcsturlese.com/2010/10/18/running-syslog-ng-on-snow-leopard/#comments</comments>
		<pubDate>Sun, 17 Oct 2010 23:40:57 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[OS]]></category>
		<category><![CDATA[mac]]></category>
		<category><![CDATA[Snow Leopard]]></category>
		<category><![CDATA[syslog-ng]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=177</guid>
		<description><![CDATA[Last week I switched from Ubuntu to Snow Leopard. The only thing that took me a while to get working was syslog-ng. I normally use it because I like to send the logs from my Java apps to the syslog and deal with them from there. With Ubuntu, I just had to [...]]]></description>
			<content:encoded><![CDATA[<div id="_mcePaste">Last week I switched from Ubuntu to Snow Leopard. The only thing that took me a while to get working was syslog-ng. I normally use it because I like to send the logs from my Java apps to the syslog and deal with them from there. With Ubuntu, I just had to use apt-get and tell the syslog-ng conf file to accept data over UDP (it&#8217;s what log4j uses to send data to the syslog). This process was a bit more tedious on Mac OS X.</div>
<div id="_mcePaste">What I did was leave the default system log alone and use syslog-ng just to receive messages from my apps (which write to the local0 facility).</div>
<div id="_mcePaste">First of all I had to install MacPorts (version 1.9.1). Once that was done, I could install syslog-ng with a single command:</div>
<div id="_mcePaste"><strong>port install syslog-ng</strong></div>
<div id="_mcePaste">After the installation everything seems to be OK, but if you try to run syslog-ng it won&#8217;t work (at least with <strong>version 3.0.8</strong>).</div>
<div id="_mcePaste">The first error I got was that there was no conf file:</div>
<p><img class="alignnone" style="border: 0px initial initial;" src="http://www.marcsturlese.com/wp-content/images/syslog/syslog-no_conf.png" alt="" width="544" height="34" /></p>
<p>Renaming the <strong>syslog-ng.conf-dist</strong> file to <strong>syslog-ng.conf</strong> fixes this issue, but the errors keep coming:</p>
<p><img class="alignnone" src="http://www.marcsturlese.com/wp-content/images/syslog/syslog-version_error.png" alt="" width="546" height="153" /></p>
<p>Here we just have to add an @ at the beginning of the first line, right before the version keyword.<br />
So, change <strong>version 3.0</strong> to <strong>@version: 3.0</strong><br />
(Note that the &#8216;:&#8217; has to be added as well.)</p>
<p>Even with this change, syslog-ng won&#8217;t start yet. It now gives us a warning (and still fails because of other errors that I&#8217;ll mention later).</p>
<p><img class="alignnone" style="border: 0px initial initial;" src="http://www.marcsturlese.com/wp-content/images/syslog/syslog-obs.png" alt="" width="545" height="45" /></p>
<p>To make the warning disappear, as the message says, we have to replace the line:<br />
<strong> options { long_hostnames(off); sync(0); };</strong><br />
with<br />
<strong> options { long_hostnames(off); flush_lines(0); };</strong></p>
<p>We are not done yet. Run the daemon again and:</p>
<p><img class="alignnone" style="border: 0px initial initial;" title="syslog-ng" src="http://www.marcsturlese.com/wp-content/images/syslog/syslog-already_in_use.png" alt="" width="548" height="75" /></p>
<p>To fix this, I created a new destination, since the default one seems to be in use by the OS&#8217;s default system log.<br />
Comment out the line:<br />
<strong> destination syslog { file("/var/log/syslog"); };</strong><br />
and add:<br />
<strong> destination d_syslog { file("/var/log/syslog.log"); };</strong></p>
<p>After changing the destination we have to make some more changes; otherwise the conf file will be inconsistent and syslog-ng will fail with more errors.<br />
We need to apply the syslog filter to the new destination we have created. We do that by changing:<br />
<strong> log { source(src); filter(f_syslog); destination(syslog); };</strong><br />
to<br />
<strong> log { source(src); filter(f_syslog); destination(d_syslog); };</strong></p>
<p>At this point I also modified the syslog-ng filter. By default, the f_syslog filter is:<br />
<strong> filter f_syslog { not facility(authpriv, mail); };</strong><br />
I changed it so that it only accepts messages from local0, the facility my log4j is configured to send the logs to:<br />
<strong> filter f_syslog { facility(local0); };</strong><br />
Note that this step is not required to get syslog-ng working; it&#8217;s just a custom configuration.</p>
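<p>To see what &#8220;local0&#8221; means on the wire, here is a minimal Java sketch of roughly the kind of UDP packet a log4j SyslogAppender sends: a BSD-style syslog line whose priority is facility &#215; 8 + severity, so local0 (facility 16) at info level (severity 6) becomes &lt;134&gt;. The class name, tag and port 514 are illustrative assumptions, and the message only reaches syslog-ng if a source in syslog-ng.conf listens on UDP.</p>

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of sending one BSD-style syslog message to local0 over UDP. */
public class Local0Sender {

    static final int FACILITY_LOCAL0 = 16; // local0..local7 are facilities 16..23
    static final int SEVERITY_INFO = 6;

    /** The PRI value at the start of a syslog packet is facility * 8 + severity. */
    static int pri(int facility, int severity) {
        return facility * 8 + severity;
    }

    /** Builds "<PRI>tag: text" for an info-level local0 message. */
    static String format(String tag, String text) {
        return "<" + pri(FACILITY_LOCAL0, SEVERITY_INFO) + ">" + tag + ": " + text;
    }

    /** Sends the message as a single UDP datagram (514 is the usual syslog port). */
    static void send(String host, int port, String message) throws Exception {
        byte[] data = message.getBytes(StandardCharsets.US_ASCII);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length, InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        String msg = format("myapp", "hello from Java");
        System.out.println(msg); // <134>myapp: hello from Java
        send("127.0.0.1", 514, msg);
    }
}
```

<p>With the filter above, a message like this should end up in /var/log/syslog.log, while anything sent with another facility is ignored by that log path.</p>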
<p>The last error I got said that it was not possible to use tty12 (which is used in the default conf):<br />
<strong> destination console_all { file("/dev/tty12"); };</strong><br />
So I just changed it to &#8216;console&#8217;:<br />
<strong> destination console_all { file("/dev/console"); };</strong></p>
<p>Now we can start the logging daemon properly by typing:<br />
<strong> syslog-ng</strong></p>
<p><img class="alignnone" src="http://www.marcsturlese.com/wp-content/images/syslog/syslog-work.png" alt="" width="545" height="35" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2010/10/18/running-syslog-ng-on-snow-leopard/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucene FieldCache.StringIndex and multiValued fields</title>
		<link>http://www.marcsturlese.com/2010/06/27/lucene-fieldcache-stringindex-and-multivalued-fields/</link>
		<comments>http://www.marcsturlese.com/2010/06/27/lucene-fieldcache-stringindex-and-multivalued-fields/#comments</comments>
		<pubDate>Sun, 27 Jun 2010 14:42:11 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[FieldCache]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=169</guid>
		<description><![CDATA[Lately I&#8217;ve been doing some tests with Lucene multiValued fields and FieldCache. I&#8217;ve loaded a FieldCache.StringIndex of a multiValued field and seen some weird behavior that I think is worth mentioning. FieldCache.StringIndex loads an int[] (order) and a String[] (lookup). The String[] contains all the terms of a field. The int[] array contains [...]]]></description>
			<content:encoded><![CDATA[<p>Lately I&#8217;ve been doing some tests with <strong>Lucene</strong> multiValued fields and <strong>FieldCache</strong>.<br />
I&#8217;ve loaded a <strong>FieldCache.StringIndex</strong> of a multiValued field and seen some weird behavior that I think is worth mentioning.</p>
<p><strong>FieldCache.StringIndex</strong> loads an int[] (order) and a String[] (lookup). The String[] contains all the terms of a field. For each document, the int[] holds an index into the lookup array.<br />
Curiously, loading this structure for some multiValued fields in the index worked all right. However, for others it threw a RuntimeException I hadn&#8217;t seen before, saying there were more terms than documents in field &#8216;x&#8217;.</p>
<p><strong>FieldCache</strong> is a structure meant to be used on fields with a single token per document. All the trouble starts because my tests don&#8217;t respect that.<br />
<strong>FieldCache</strong> cannot handle more than one value per document. When loading <strong>FieldCache.StringIndex</strong>, it does a sanity check meant to ensure there&#8217;s no more than one term per document, but all it checks is whether the number of unique terms is greater than the number of docs. My tests produce false negatives for this check, and that is where the unexpected behavior comes from.</p>
<p>So, let&#8217;s say I have an index with 100 docs and a multiValued field with 2 values per document. If no field value is repeated anywhere in the index, there are 200 unique terms and I get the exception, due to the check done by the StringIndex. If there are just two different values and all the documents have both of them, no exception is thrown (a false negative of the check). In short, the exception is thrown only when the number of unique terms exceeds the number of docs. That explains why loading a <strong>FieldCache.StringIndex</strong> on a field with more than one term per document can either end up with a nasty exception or act as if nothing were wrong.</p>
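<p>The behavior is easier to see with a toy model. The Java sketch below is a simplified, hypothetical reconstruction of how a StringIndex-style cache is filled (it is not Lucene&#8217;s source, and <em>StringIndexModel</em> is a made-up name): terms are walked in sorted order, each document stores the position of its term, the lookup array has room for maxDoc + 1 entries, and the only guard is the &#8220;more terms than documents&#8221; check.</p>

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Simplified, hypothetical model of filling a StringIndex-style cache (not Lucene's code). */
public class StringIndexModel {

    final int[] order;     // per document: position in lookup (0 = no value)
    final String[] lookup; // unique terms in sorted order; slot 0 is reserved

    private StringIndexModel(int[] order, String[] lookup) {
        this.order = order;
        this.lookup = lookup;
    }

    /**
     * @param maxDoc   number of documents in the index
     * @param terms    unique terms of the field, already sorted
     * @param postings postings[i] = ids of the documents that contain terms[i]
     */
    static StringIndexModel load(String field, int maxDoc, String[] terms, int[][] postings) {
        int[] order = new int[maxDoc];
        String[] lookup = new String[maxDoc + 1]; // one slot per document, plus slot 0
        int t = 1;
        for (String term : terms) {
            // The only sanity check: it fires when unique terms exceed maxDoc,
            // not when a single document holds more than one term.
            if (t >= lookup.length) {
                throw new RuntimeException(
                    "there are more terms than documents in field \"" + field + "\"");
            }
            lookup[t] = term;
            for (int doc : postings[t - 1]) {
                order[doc] = t; // a later term silently overwrites an earlier one
            }
            t++;
        }
        return new StringIndexModel(order, Arrays.copyOf(lookup, t));
    }

    public static void main(String[] args) {
        int[] allDocs = IntStream.range(0, 100).toArray();

        // 100 docs, all sharing the same two values: the check passes (false negative).
        StringIndexModel shared = load("x", 100, new String[] {"a", "b"}, new int[][] {allDocs, allDocs});
        System.out.println("doc 0 -> " + shared.lookup[shared.order[0]]);

        // 100 docs with two distinct values each (200 unique terms): the check fires.
        String[] unique = IntStream.range(0, 200).mapToObj(i -> String.format("v%03d", i)).toArray(String[]::new);
        int[][] postings = IntStream.range(0, 200).mapToObj(i -> new int[] {i / 2}).toArray(int[][]::new);
        try {
            load("x", 100, unique, postings);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

<p>With two shared values every document silently ends up pointing at &#8220;b&#8221;, the last term visited, so the cache looks valid while the first value is lost; with 200 distinct values the guard fires.</p>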
<p>There have been some fixes in later <strong>Lucene</strong> versions (trunk, 3x, 3.0, 2.9 branches). Now, once the number of terms exceeds the total number of documents, the array doesn&#8217;t grow anymore, so at least no RuntimeException is thrown.</p>
<p>More info can be found in the JIRA issue <a title="Lucene FieldCache.StringIndex" href="https://issues.apache.org/jira/browse/LUCENE-2142" target="_blank"><strong>LUCENE-2142</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2010/06/27/lucene-fieldcache-stringindex-and-multivalued-fields/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Apache Lucene EuroCon 2010</title>
		<link>http://www.marcsturlese.com/2010/05/24/apache-lucene-eurocon-2010/</link>
		<comments>http://www.marcsturlese.com/2010/05/24/apache-lucene-eurocon-2010/#comments</comments>
		<pubDate>Mon, 24 May 2010 12:07:43 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[EuroCon]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=159</guid>
		<description><![CDATA[Yesterday I came back from Lucene EuroCon 2010, which took place in Prague. There were many interesting talks over these days. Some of the slides are already on SlideShare. Can&#8217;t wait for the others to be uploaded. I gave a talk on Thursday about our usage of Solr at Trovit. I covered an [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday I came back from <strong>Lucene EuroCon</strong> 2010, which took place in Prague.<br />
There were many interesting talks over these days. Some of the slides are already on SlideShare. Can&#8217;t wait for the others to be uploaded.</p>
<p>I gave a talk on Thursday about our usage of <strong>Solr</strong> at Trovit. I covered an overview of our architecture, several of our out-of-the-box and custom features, and some of the future directions we have in mind.</p>
<p>&#8220;Munching and Crunching: <strong>Lucene</strong> Index post-processing&#8221; was definitely my favourite talk. Andrzej Bialecki covered topics I had never even thought about. Among other things, there was a pretty complete explanation of index splitting, pruning and multi-tiered search.<br />
People tend to think all data processing must be done at indexing time. Andrzej showed us that a lot of good things can be done once the index is already built.</p>
<p>In an hour, Yonik explained the main features coming with the new Solr releases, &#8220;<strong>Solr</strong> 1.5 and Beyond&#8221;. The Extended DisMax query parser, a quick introduction to <strong>SolrCloud</strong>, Spatial Search, Realtime search and Field Collapsing were covered.</p>
<p>Grant Ingersoll spoke about <strong>Lucene</strong> / <strong>Solr</strong> relevance: &#8220;Practical Relevance: Tips and Tricks for Understanding and Improving Search Quality&#8221;.<br />
It was very interesting to hear about the most commonly used techniques for relevance testing:<br />
A/B tests, log analysis, empirical tests, asking, or using related projects such as Open Relevance or TREC.</p>
<p>Mark Miller talked about <strong>SolrCloud</strong>. It promises to make life much easier for admins of distributed <strong>Solr</strong> installations.</p>
<p>There were really good talks at the MeetUp as well. &#8220;How We Scaled <strong>Solr</strong> to 3+ Billion Documents&#8221; by Jason Rutherglen was the one I was looking forward to the most. I always like to hear about big <strong>Solr</strong> deployments and <strong>Hadoop</strong> usage related to <strong>Lucene</strong> and <strong>Solr</strong> indexing. I think this one is the biggest I know of.</p>
<p>So, these days have been really useful. Lots of new ideas and lots of things to test.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2010/05/24/apache-lucene-eurocon-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucene 2.9.2 and 3.0.1 released</title>
		<link>http://www.marcsturlese.com/2010/02/27/lucene-2-9-2-and-3-0-1-released/</link>
		<comments>http://www.marcsturlese.com/2010/02/27/lucene-2-9-2-and-3-0-1-released/#comments</comments>
		<pubDate>Sat, 27 Feb 2010 14:52:21 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=154</guid>
		<description><![CDATA[Lucene 2.9.2 and 3.0.1 versions have been released. Both are mainly bugfix releases of the previous versions. The main difference between versions 2 and 3 is that version 3 has no support for Java 1.4 and has a cleaner API, as deprecated stuff has been removed. This means that if you want to upgrade [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Lucene</strong> 2.9.2 and 3.0.1 versions have been released. Both are mainly bugfix releases of the previous versions.<br />
The main difference between versions 2 and 3 is that version 3 has no support for Java 1.4 and has a cleaner API, as deprecated stuff has been removed. This means that if you want to upgrade your <strong>Lucene</strong> JARs to v3 you must use at least Java 1.5 and have no deprecation warnings in your code.<br />
More details of both releases can be found in the <a title="Lucene official announcement" href="http://www.search-lucene.com/m?id=000501cab6bc$7d2cdf00$77869d00$@de||[ANNOUNCE]%20Release%20of%20Lucene%20Java%203.0.1%20and%202.9.2" target="_blank">official announcement</a>:</p>
<blockquote><p><em>Hello <strong>Lucene</strong> users,</em></p>
<p><em>On behalf of the <strong>Lucene</strong> development community I would like to announce the release of <strong>Lucene</strong> Java versions 3.0.1 and 2.9.2:</em></p>
<p><em>Both releases fix bugs in the previous versions:</em></p>
<p><em>- 2.9.2 is a bugfix release for the <strong>Lucene</strong> Java 2.x series, based on Java 1.4<br />
- 3.0.1 has the same bug fix level but is for the <strong>Lucene</strong> Java 3.x series, based on Java 5.</em></p>
<p><em>New users of <strong>Lucene</strong> are advised to use version 3.0.1 for new developments, because it has a clean, type-safe API.</em></p>
<p><em>Important improvements in these releases include:</em></p>
<p><em>- An increased maximum number of unique terms in each index segment.<br />
- Fixed experimental CustomScoreQuery to respect per-segment search. This introduced an API change!<br />
- Important fixes to IndexWriter: a commit() thread-safety issue, lost document deletes in near real-time indexing.<br />
- Bugfixes for Contrib&#8217;s Analyzers package.<br />
- Restoration of some public methods that were lost during deprecation removal.<br />
- The new Attribute-based TokenStream API now works correctly with different class loaders.</em></p>
<p><em>Both releases are fully compatible with the corresponding previous versions. We strongly recommend upgrading to 2.9.2 if you are using 2.9.1 or 2.9.0; and to 3.0.1 if you are using 3.0.0.</em></p>
<p><em>See core changes at<br />
<a title="apache lucene" href="http://lucene.apache.org/java/3_0_1/changes/Changes.html" target="_blank">http://lucene.apache.org/java/3_0_1/changes/Changes.html</a><br />
<a title="apache lucene" href="http://lucene.apache.org/java/2_9_2/changes/Changes.html" target="_blank">http://lucene.apache.org/java/2_9_2/changes/Changes.html</a></em></p>
<p><em>and contrib changes at<br />
<a title="apache lucene" href="http://lucene.apache.org/java/3_0_1/changes/Contrib-Changes.html" target="_blank">http://lucene.apache.org/java/3_0_1/changes/Contrib-Changes.html</a><br />
<a title="apache lucene" href="http://lucene.apache.org/java/2_9_2/changes/Contrib-Changes.html" target="_blank">http://lucene.apache.org/java/2_9_2/changes/Contrib-Changes.html</a></em></p>
<p><em>Binary and source distributions are available at<br />
<a title="apache lucene" href="http://www.apache.org/dyn/closer.cgi/lucene/java/" target="_blank">http://www.apache.org/dyn/closer.cgi/lucene/java/</a></em></p>
<p><em><strong>Lucene</strong> artifacts are also available in the Maven2 repository at<br />
<a title="apache lucene" href="http://repo1.maven.org/maven2/org/apache/lucene/" target="_blank">http://repo1.maven.org/maven2/org/apache/lucene/</a></em></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2010/02/27/lucene-2-9-2-and-3-0-1-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ElasticSearch</title>
		<link>http://www.marcsturlese.com/2010/02/12/elasticsearch/</link>
		<comments>http://www.marcsturlese.com/2010/02/12/elasticsearch/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 00:33:20 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[ElasticSearch]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=145</guid>
		<description><![CDATA[It has been a long time since my last post. I have been very busy, so unfortunately I have not had the time to write about everything I wanted to. This week I discovered via Twitter a really interesting open source search project, ElasticSearch for the cloud. ElasticSearch was created by Shay Banon. It&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>It has been a long time since my last post. I have been very busy, so unfortunately I have not had the time to write about everything I wanted to.</p>
<p>This week I discovered via Twitter a really interesting open source search project, <strong><a title="ElasticSearch" href="http://www.elasticsearch.com/" target="_blank">ElasticSearch for the cloud</a></strong>. <strong>ElasticSearch</strong> was created by Shay Banon. It&#8217;s a RESTful search engine built on top of <strong><a title="Lucene" href="http://lucene.apache.org/java/docs/" target="_blank">Lucene</a></strong> and very well prepared for high scalability. It includes shard merging, replication and many more features.</p>
<p>Lately I have been working a lot on search scalability, and what I like the most about <strong>ElasticSearch</strong> so far is that it allows 4 different types of distributed requests.</p>
<p>The simplest one (Query and fetch) is just one request per relevant shard. Once all the requests are done, results are merged and&#8230; that&#8217;s it!<br />
In this type of search, all fields of every document returned by every shard are sent to the merger.</p>
<p>In another search type (Query then fetch, which is not that simple), a first request is sent to all shards without asking for the document content yet. Once the results are merged, you only need to ask for the full data of the most relevant documents, the ones you want to show.<br />
If you have to search across lots of shards, that&#8217;s definitely the way to go (the merger only receives the fields of the important documents, which means less data is sent across the network).</p>
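<p>The difference between the two search types can be sketched in a few lines of Java. This is a hypothetical in-memory model, not ElasticSearch code: every shard first returns only (shard, doc, score) tuples, the merger keeps the global top k, and stored content is fetched afterwards for those k hits only. With query and fetch, the content of every hit returned by every shard would travel to the merger instead.</p>

```java
import java.util.Arrays;

/** Hypothetical in-memory model of "query then fetch" across shards (not ElasticSearch code). */
public class QueryThenFetch {

    /** Phase-1 result: identifiers and a score only, no document content. */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Merge step: pool the per-shard hits and keep the global top k by score. */
    static Hit[] mergeTopK(Hit[][] perShard, int k) {
        Hit[] all = Arrays.stream(perShard).flatMap(Arrays::stream).toArray(Hit[]::new);
        Arrays.sort(all, (a, b) -> Float.compare(b.score, a.score)); // highest score first
        return Arrays.copyOf(all, Math.min(k, all.length));
    }

    /** Phase 2: ask each shard only for the documents that will actually be shown. */
    static String[] fetch(Hit[] top, String[][] storedDocs) {
        return Arrays.stream(top).map(h -> storedDocs[h.shard][h.doc]).toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Each of 3 shards already returned its own top 2 (ids + scores, no content).
        Hit[][] perShard = {
            {new Hit(0, 3, 0.9f), new Hit(0, 1, 0.4f)},
            {new Hit(1, 0, 0.7f), new Hit(1, 2, 0.3f)},
            {new Hit(2, 5, 0.8f), new Hit(2, 4, 0.1f)},
        };
        for (Hit h : mergeTopK(perShard, 2)) {
            System.out.println("fetch shard " + h.shard + " doc " + h.doc);
        }
    }
}
```

<p>In the example, 6 hits come back from 3 shards but only 2 documents need their content fetched (shard 0 doc 3 and shard 2 doc 5). The saving grows with the number of shards.</p>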
<p>Both options share a typical problem of distributed search: relevance is calculated relative to each shard, not globally across all of them.<br />
To solve this, in <strong>ElasticSearch</strong> both search options can be preceded by an initial request that gathers the term frequency information needed for an &#8220;absolute relevance&#8221;.<br />
This is not free: you pay with an extra round trip (even if it can be cached), so it&#8217;s good if you can avoid it. A good way to do that is at indexing time, when you decide which shard a document must be added to. Choosing it randomly should more or less ensure that term frequencies won&#8217;t differ much among shards.</p>
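<p>Here is a small Java sketch of why shard-local statistics skew scores, using the classic Lucene <em>DefaultSimilarity</em> IDF, 1 + ln(numDocs / (docFreq + 1)). Whether ElasticSearch uses exactly this formula is an assumption on my part; the point is only the comparison between per-shard and summed statistics, which is what the extra initial request makes possible.</p>

```java
/** Hypothetical illustration of shard-local vs. global IDF (not ElasticSearch code). */
public class DistributedIdf {

    /** idf = 1 + ln(numDocs / (docFreq + 1)), the classic Lucene DefaultSimilarity formula. */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Global statistics: sum docFreq and numDocs over all shards, then compute one IDF. */
    static double globalIdf(long[] docFreqPerShard, long[] numDocsPerShard) {
        long docFreq = 0;
        long numDocs = 0;
        for (long df : docFreqPerShard) docFreq += df;
        for (long n : numDocsPerShard) numDocs += n;
        return idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        long[] numDocs = {1000, 1000};

        // Skewed placement: the term is rare on shard 0 and common on shard 1.
        long[] skewed = {1, 500};
        System.out.printf("skewed: local %.3f / %.3f, global %.3f%n",
            idf(skewed[0], numDocs[0]), idf(skewed[1], numDocs[1]), globalIdf(skewed, numDocs));

        // Random placement: roughly even counts, so local IDF stays close to the global one.
        long[] even = {250, 251};
        System.out.printf("random: local %.3f / %.3f, global %.3f%n",
            idf(even[0], numDocs[0]), idf(even[1], numDocs[1]), globalIdf(even, numDocs));
    }
}
```

<p>With the skewed placement the same term gets an IDF above 7 on one shard and below 2 on the other, while random placement keeps the local values within a few thousandths of the global one, so the extra round trip buys much less.</p>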
<p>I still haven&#8217;t had the chance to dig into the source, but I have already downloaded it from the <a title="ElasticSearch" href="http://github.com/elasticsearch/elasticsearch" target="_blank">git repository</a>.<br />
Anyone who wants to share experiences with <strong>ElasticSearch</strong> is more than welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2010/02/12/elasticsearch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CloudCamp Barcelona 2009</title>
		<link>http://www.marcsturlese.com/2009/06/18/cloudcamp-barcelona-2009/</link>
		<comments>http://www.marcsturlese.com/2009/06/18/cloudcamp-barcelona-2009/#comments</comments>
		<pubDate>Wed, 17 Jun 2009 23:43:49 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Abicloud]]></category>
		<category><![CDATA[CloudCamp]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Sqoop]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=134</guid>
		<description><![CDATA[Last Monday the first CloudCamp ever held in Barcelona took place. Although I was expecting more technical content, it was good to be there and listen to what people had to say. The first part of the event consisted of some quick presentations from different companies related to cloud computing. Basically, they were [...]]]></description>
			<content:encoded><![CDATA[<p>Last Monday the first <a title="CloudCamp" href="http://www.cloudcamp.com/?page_id=902" target="_blank"><strong>CloudCamp</strong></a> ever held in Barcelona took place. Although I was expecting more technical content, it was good to be there and listen to what people had to say.<br />
The first part of the event consisted of some quick presentations from different companies related to cloud computing. Basically, they explained the cloud options and advantages they offer. The one I enjoyed the most was Abiquo&#8217;s presentation of their new software, <a title="Abicloud" href="http://www.abiquo.com/en/products/abicloud" target="_blank">Abicloud</a>. Through a really nice GUI developed with Flex, Abicloud, among other things, lets you set up virtual machines, automatically configuring an Apache server, a MySQL database&#8230; with just a few drag &amp; drop actions. You can use your own machines, servers from an ISP, or even combine both. You can elastically increase or decrease the number of virtual machines, which can be very convenient for sites with high traffic peaks or for testing environments.<br />
I am not going to say more about it, as a five-minute presentation was only enough to get the main idea. Can&#8217;t wait to have some free time to start playing with it. I&#8217;ll just add that Abicloud is completely open source.</p>
<p>After the quick talks, the following topics were discussed:</p>
<ul>
<li> What guarantees do I have with <strong>Cloud Computing</strong>?</li>
<li> What legal issues are there with your data?</li>
<li> Are standards important? If so, which ones?</li>
<li> What is the benefit for a company with only a few dozen servers?</li>
<li> What is the best platform for starting a cloud hosting company?</li>
<li> Is cloud computing green? If so, what?</li>
</ul>
<p>In the end, people were divided into groups depending on which topic they wanted to go deeper into. I attended &#8220;How to develop applications that are going to run in the cloud&#8221;. There I had an interesting quick chat about application scalability and how to dump MySQL databases to <strong>HDFS</strong> using Cloudera&#8217;s tool <strong><a title="Hadoop's Sqoop" href="http://www.cloudera.com/hadoop-sqoop#getting_sqoop" target="_blank">Sqoop</a></strong>.</p>
<p><img class="aligncenter size-medium wp-image-140" title="cloudcamp" src="http://www.marcsturlese.com/wp-content/images/cloudcamp-300x72.jpg" alt="cloudcamp" width="260" height="62" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2009/06/18/cloudcamp-barcelona-2009/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance measurement with JMeter 2.3.3</title>
		<link>http://www.marcsturlese.com/2009/05/31/performance-measurement-with-jmeter-233/</link>
		<comments>http://www.marcsturlese.com/2009/05/31/performance-measurement-with-jmeter-233/#comments</comments>
		<pubDate>Sun, 31 May 2009 21:58:13 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[JMeter]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=119</guid>
		<description><![CDATA[Last week a new release of JMeter was launched. JMeter 2.3.3 is a powerful Java application designed for functional testing and performance measurement of web applications, allowing you to run server stress tests. I have been practicing with it and I really liked how easily you can set up a test [...]]]></description>
			<content:encoded><![CDATA[<p>Last week a new release of <a title="JMeter" href="http://jakarta.apache.org/jmeter/index.html" target="_blank"><strong>JMeter</strong></a> was launched. <strong><a title="Apache JMeter" href="http://jakarta.apache.org/jmeter/changes.html" target="_blank">JMeter 2.3.3</a></strong> is a powerful Java application designed for functional testing and performance measurement of web applications, allowing you to run server stress tests.<br />
I have been practicing with it and I really liked how easily you can set up a test plan and start stressing your machines to check response times when lots of threads are making requests.</p>
<p>You just need to create a .jmx file, which will contain all the information needed to make the requests: host name, port number, protocol, method, URL path, URL variables&#8230; You can even tell <strong>JMeter</strong> to read the URL variables from an external .dat file, which lets you give the variables different values for each request.<br />
The .jmx file can be written by hand, but it&#8217;s much easier to create it via <strong>JMeter&#8217;s GUI</strong>.</p>
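<p>JMeter does this parameterization for you, so no code is needed. Purely to illustrate the idea of one set of variable values per request, here is a minimal Java sketch (not JMeter code) that reads comma-separated lines and turns each one into a request URL. The file format, the base URL http://localhost:8080/search and the variable names city and type are all hypothetical.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public class DatFileParams {

    // Builds one request URL per non-empty line of the data file.
    // Assumed (hypothetical) format: comma-separated values in the same
    // order as varNames; missing trailing values become empty strings.
    static List<String> buildUrls(String baseUrl, String[] varNames, BufferedReader data)
            throws IOException {
        List<String> urls = new ArrayList<String>();
        String line;
        while ((line = data.readLine()) != null) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            String[] values = line.split(",", -1);
            StringBuilder url = new StringBuilder(baseUrl).append('?');
            for (int i = 0; i < varNames.length; i++) {
                if (i > 0) url.append('&');
                String value = i < values.length ? values[i].trim() : "";
                // URL-encode each value so spaces and symbols are safe in the query string
                url.append(varNames[i]).append('=').append(URLEncoder.encode(value, "UTF-8"));
            }
            urls.add(url.toString());
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the contents of a hypothetical searches.dat file.
        String dat = "madrid,flat\nbarcelona,house\n";
        List<String> urls = buildUrls("http://localhost:8080/search",
                new String[] {"city", "type"}, new BufferedReader(new StringReader(dat)));
        for (String url : urls) System.out.println(url);
    }
}
```

<p>Each line of the data file becomes a different request, which is what keeps a stress test from hitting the same cached result over and over.</p>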
<p>You have to tell <strong>JMeter</strong> the number of threads that will be executing requests and the number of requests per thread (the loop count). You can also leave the threads making requests indefinitely.<br />
Once a test is launched, you can see in real time the number of samples executed and the Deviation, Throughput, Average and Median of the requests made by the threads (think of each thread as a user making requests via a browser).</p>
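<p>To make those numbers more concrete, here is a minimal Java sketch of the same idea (not JMeter&#8217;s implementation): a fixed pool of threads, each making a fixed number of requests, followed by the average, median, deviation and throughput of the recorded latencies. The request is a hypothetical stand-in Runnable that just sleeps; in a real test it would be an HTTP call to the server under test. Deviation here is the population standard deviation, and throughput is samples per second of wall-clock time.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MiniLoadTest {

    // Summary of a run: the same kind of figures JMeter shows (not JMeter's code).
    static final class Stats {
        final int samples;
        final double averageMs, medianMs, deviationMs, throughputPerSec;

        Stats(int samples, double averageMs, double medianMs,
              double deviationMs, double throughputPerSec) {
            this.samples = samples;
            this.averageMs = averageMs;
            this.medianMs = medianMs;
            this.deviationMs = deviationMs;
            this.throughputPerSec = throughputPerSec;
        }

        @Override
        public String toString() {
            return String.format("samples=%d avg=%.1fms median=%.1fms dev=%.1fms throughput=%.1f/s",
                    samples, averageMs, medianMs, deviationMs, throughputPerSec);
        }
    }

    // Computes average, median, population standard deviation and throughput.
    static Stats summarize(long[] latenciesMs, long elapsedMs) {
        int n = latenciesMs.length;
        if (n == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (long l : sorted) sum += l;
        double avg = sum / n;
        double median = (n % 2 == 1) ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        double squares = 0;
        for (long l : sorted) squares += (l - avg) * (l - avg);
        double deviation = Math.sqrt(squares / n);
        double throughput = elapsedMs > 0 ? n * 1000.0 / elapsedMs : Double.NaN;
        return new Stats(n, avg, median, deviation, throughput);
    }

    // Runs `threads` simulated users, each issuing `requestsPerThread` requests.
    static Stats run(int threads, int requestsPerThread, Runnable request)
            throws InterruptedException {
        List<Long> latencies = Collections.synchronizedList(new ArrayList<Long>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    long t0 = System.nanoTime();
                    request.run(); // an exception here ends this user's loop
                    latencies.add(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        long[] values = new long[latencies.size()];
        for (int i = 0; i < values.length; i++) values[i] = latencies.get(i);
        return summarize(values, elapsedMs);
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical request: sleeps 10 ms instead of calling a real server.
        Runnable fakeRequest = () -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println(run(4, 5, fakeRequest));
    }
}
```

<p>The median is less sensitive than the average to a few very slow requests, which is why it is worth watching both while the threads hammer the server.</p>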
<p>This is just a basic test plan; the application is much more complete than this and has many more interesting features.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2009/05/31/performance-measurement-with-jmeter-233/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Analyzing java heaps with jmap and jhat</title>
		<link>http://www.marcsturlese.com/2009/05/09/analyzing-java-heaps-with-jmap-and-jhat/</link>
		<comments>http://www.marcsturlese.com/2009/05/09/analyzing-java-heaps-with-jmap-and-jhat/#comments</comments>
		<pubDate>Sat, 09 May 2009 18:46:57 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[jhat]]></category>
		<category><![CDATA[jmap]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=115</guid>
		<description><![CDATA[Jmap and jhat are a couple of really useful tools for analyzing the memory consumption of a Java program. Both are included in the JDK 1.6, so there is no need to install anything extra. Jmap allows you to create a dump of the Java memory heap at any moment in the life of [...]]]></description>
			<content:encoded><![CDATA[<p><a title="jmap" href="http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jmap.html" target="_blank"><strong>Jmap</strong></a> and <a title="jhat" href="http://java.sun.com/javase/6/docs/technotes/tools/share/jhat.html" target="_blank"><strong>jhat</strong></a> are a couple of really useful tools for analyzing the memory consumption of a Java program. Both are included in the JDK 1.6, so there is no need to install anything extra.</p>
<p><strong>Jmap</strong> allows you to create a dump of the Java memory heap at any moment in the life of your running application. The dump will contain the objects and classes in the heap at that moment (adding the live suboption, as in <strong>-dump:live,format=b,file=my_stack.bin</strong>, dumps only the reachable objects). Creating the heap dump is as easy as:</p>
<p><strong>jmap -dump:format=b,file=my_stack.bin 4365</strong></p>
<p>Where my_stack.bin is the name of the file the dump will be written to and 4365 is the pid of the Java application process.</p>
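<p>If running jmap against the process is not convenient, a HotSpot JVM can also write the same kind of dump from inside the application. A minimal sketch, assuming a HotSpot-based JDK (Java 7 or later, for getPlatformMXBean): HotSpotDiagnosticMXBean.dumpHeap writes an hprof file that jhat can open. The class name HeapDumper and the file name my_heap.hprof are illustrative; the target file must not already exist, and some JDK versions also require the name to end in .hprof.</p>

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {

    // Writes an hprof heap dump of the *current* JVM to outputFile.
    // live=true dumps only reachable objects (like jmap -dump:live,...).
    // Assumes a HotSpot JVM; fails if outputFile already exists.
    static void dumpHeap(String outputFile, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(outputFile, live);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default name; ends in .hprof to satisfy stricter JDKs.
        String file = args.length > 0 ? args[0] : "my_heap.hprof";
        dumpHeap(file, true);
        System.out.println("Heap dump written to " + file);
    }
}
```

<p>The resulting file is analyzed exactly like one produced by jmap, for example with <strong>jhat my_heap.hprof</strong>.</p>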
<p>If you are running a servlet application in a Java server and it dies with a:</p>
<p><strong>java.lang.OutOfMemoryError: Java heap space</strong></p>
<p>You can make the JVM dump the Java heap at the moment the OutOfMemoryError occurs by passing these options to the server:</p>
<p><strong>-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/sturlese/stack_test/</strong></p>
<p>This will create a .hprof file (named after the process pid, e.g. java_pid4365.hprof) containing the dump in the specified path.<br />
The HeapDumpPath parameter is not compulsory. If we don&#8217;t specify it, the dump will be created in the working directory of the server&#8217;s JVM process (in my Tomcat setup, the folder where Tomcat launches the webapps).</p>
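<p>These options normally go on the server&#8217;s startup command line, but in HotSpot both HeapDumpOnOutOfMemoryError and HeapDumpPath are manageable flags, so they can also be changed on a running JVM. A minimal sketch, assuming a HotSpot-based JDK (Java 7 or later) and that the code runs inside the server&#8217;s own JVM, for example from a hypothetical startup hook in the webapp. The class and method names are illustrative.</p>

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class OomDumpSwitch {

    private static HotSpotDiagnosticMXBean bean() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    // Equivalent to starting the JVM with
    // -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<dumpDir>,
    // but applied at runtime (only works because both flags are manageable).
    static void enableHeapDumpOnOom(String dumpDir) {
        HotSpotDiagnosticMXBean diag = bean();
        diag.setVMOption("HeapDumpPath", dumpDir);
        diag.setVMOption("HeapDumpOnOutOfMemoryError", "true");
    }

    // Reads back the current value of a HotSpot flag as a string.
    static String currentValue(String flag) {
        VMOption option = bean().getVMOption(flag);
        return option.getValue();
    }

    public static void main(String[] args) {
        String dir = args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir");
        enableHeapDumpOnOom(dir);
        System.out.println("HeapDumpOnOutOfMemoryError=" + currentValue("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath=" + currentValue("HeapDumpPath"));
    }
}
```

<p>Setting the flags on the command line is still the simpler option; the runtime route is mainly useful when the server is already running and a restart is not an option.</p>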
<p>Now we have the dump of the Java heap. To analyze it we will use <strong>jhat</strong>. Once we launch <strong>jhat</strong> on the dump, it will start an HTTP server (on port 7000 by default) that lets you browse through all the classes and objects. You will be able to check how many instances of each class were alive at the moment the dump was taken. To launch <strong>jhat</strong>:</p>
<p><strong>jhat my_stack.bin</strong></p>
<p>It&#8217;s easy to get an OutOfMemoryError when opening the heap dump: loading it can take a lot of memory if your app was using a lot of memory when the dump was taken. If you run into this problem, give jhat&#8217;s JVM as much memory as you can with the -J option:</p>
<p><strong>jhat -J-mx2000m my_stack.bin</strong></p>
<p>Now is the moment to point your web browser to <strong>http://localhost:7000</strong> and start analyzing the heap!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2009/05/09/analyzing-java-heaps-with-jmap-and-jhat/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

