<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Marc Sturlese &#187; Open source</title>
	<atom:link href="http://www.marcsturlese.com/tag/open-source/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.marcsturlese.com</link>
	<description>Life, code and stuff</description>
	<lastBuildDate>Tue, 29 Nov 2011 22:19:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Scaling search at Trovit with Solr and Hadoop</title>
		<link>http://www.marcsturlese.com/2011/11/29/scaling-search-at-trovit-with-solr-and-hadoop/</link>
		<comments>http://www.marcsturlese.com/2011/11/29/scaling-search-at-trovit-with-solr-and-hadoop/#comments</comments>
		<pubDate>Tue, 29 Nov 2011 22:19:18 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[EuroCon]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hive]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=228</guid>
		<description><![CDATA[Here are the slides and the video of the presentation I gave at the Apache Lucene Eurocon 2011 in Barcelona. The talk was about how we crunch and index data using Solr, Hadoop and Hive at Trovit. I paid special attention to the distributed indexing strategy.]]></description>
			<content:encoded><![CDATA[<p>Here are the <a title="solr hadoop" href="http://www.lucidimagination.com/sites/default/files/file/Eurocon2011/MarcSturleseTrovit.pdf" target="_blank">slides</a> and the <a title="solr hadoop" href="http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/scaling-search-trovit-solr-and-hadoop" target="_blank">video</a> of the presentation I gave at the <a title="lucene eurocon" href="http://2011.lucene-eurocon.org/" target="_blank">Apache Lucene Eurocon 2011</a> in Barcelona. The talk was about how we crunch and index data using <strong>Solr</strong>, <strong>Hadoop</strong> and Hive at Trovit. I paid special attention to the distributed indexing strategy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2011/11/29/scaling-search-at-trovit-with-solr-and-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucene FieldCache.StringIndex and multiValued fields</title>
		<link>http://www.marcsturlese.com/2010/06/27/lucene-fieldcache-stringindex-and-multivalued-fields/</link>
		<comments>http://www.marcsturlese.com/2010/06/27/lucene-fieldcache-stringindex-and-multivalued-fields/#comments</comments>
		<pubDate>Sun, 27 Jun 2010 14:42:11 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[FieldCache]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=169</guid>
		<description><![CDATA[Lately I&#8217;ve been doing some tests with Lucene multiValued fields and FieldCache. I loaded FieldCache.StringIndex for a multiValued field and saw some weird behavior which I think is worth mentioning. FieldCache.StringIndex loads an int[] (order) and a String[] (lookup). The String[] contains all the terms in a field. The int[] array contains [...]]]></description>
			<content:encoded><![CDATA[<p>Lately I&#8217;ve been doing some tests with <strong>Lucene</strong> multiValued fields and <strong>FieldCache</strong>.<br />
I loaded <strong>FieldCache.StringIndex</strong> for a multiValued field and saw some weird behavior which I think is worth mentioning.</p>
<p><strong>FieldCache.StringIndex</strong> loads an int[] (order) and a String[] (lookup). The String[] contains all the terms in a field. The int[] array contains, for each document, an index into the lookup array.<br />
It was curious to see that loading this structure worked all right for some multiValued fields in the index. For some others, however, it gave me back a RuntimeException I hadn&#8217;t seen before, saying there were more terms than documents in the field &#8216;x&#8217;.</p>
<p><strong>FieldCache</strong> is a structure meant to be used on fields with a single token per document. All the trouble starts because my tests do not respect that.<br />
<strong>FieldCache</strong> cannot handle more than one value per field. When loading <strong>FieldCache.StringIndex</strong> it runs a check to ensure there is no more than one term per field (it checks whether the number of unique terms is greater than the number of docs). In my tests I am creating false negatives of this check and seeing unexpected behavior.</p>
<p>So, let&#8217;s say I have an index with 100 docs and a multiValued field with 2 values per document. If no field value is repeated anywhere in the index (200 unique terms for 100 docs), I will get the exception. That&#8217;s due to the check done by the StringIndex. If there are just two different values and every document has both of them, no exception is thrown (a false negative of the check). So the exception is thrown only when the number of unique terms exceeds the number of docs. That explains why loading a <strong>FieldCache.StringIndex</strong> on a field with more than one term per document can either end in a nasty exception or act as if nothing is wrong.</p>
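<p>The arithmetic behind those two cases can be sketched in plain Java. This is only a simplified model of the check as I describe it above, not Lucene code: the class and method names are made up for the example, and a document is just an array of field values.</p>

```java
import java.util.Arrays;

// Simplified model of the "more terms than documents" check.
// NOT Lucene code: all names here are hypothetical.
final class StringIndexCheckSketch {

    // docs[i] holds the values of the multiValued field for document i.
    static long countUniqueTerms(String[][] docs) {
        return Arrays.stream(docs).flatMap(Arrays::stream).distinct().count();
    }

    // The modelled check only fires when unique terms outnumber documents.
    static boolean checkWouldFail(String[][] docs) {
        return countUniqueTerms(docs) > docs.length;
    }

    public static void main(String[] args) {
        int numDocs = 100;
        String[][] allDistinct = new String[numDocs][];
        String[][] sameTwoValues = new String[numDocs][];
        for (int i = 0; i != numDocs; i++) {
            // 2 values per doc, never repeated: 200 unique terms
            allDistinct[i] = new String[] {"a" + i, "b" + i};
            // 2 values per doc, always the same pair: 2 unique terms
            sameTwoValues[i] = new String[] {"red", "blue"};
        }
        System.out.println(checkWouldFail(allDistinct));   // true
        System.out.println(checkWouldFail(sameTwoValues)); // false
    }
}
```

<p>Both inputs break the one-value-per-document rule, but only the first one trips the check. The second one is the false negative.</p>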
<p>There have been some fixes in the latest <strong>Lucene</strong> versions (trunk, 3x, 3.0, 2.9 branches). The behavior now is that once the number of terms &gt; total documents, the array will not grow anymore, so at least no RuntimeException is going to happen.</p>
<p>More info can be found in the JIRA issue <a title="Lucene FieldCache.StringIndex" href="https://issues.apache.org/jira/browse/LUCENE-2142" target="_blank"><strong>LUCENE-2142</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2010/06/27/lucene-fieldcache-stringindex-and-multivalued-fields/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Apache Lucene EuroCon 2010</title>
		<link>http://www.marcsturlese.com/2010/05/24/apache-lucene-eurocon-2010/</link>
		<comments>http://www.marcsturlese.com/2010/05/24/apache-lucene-eurocon-2010/#comments</comments>
		<pubDate>Mon, 24 May 2010 12:07:43 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[EuroCon]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=159</guid>
		<description><![CDATA[Yesterday I came back from the Lucene EuroCon 2010, which took place in Prague. There were many interesting talks there these days. Some of the slides are already on SlideShare. Can&#8217;t wait for the others to be uploaded. I gave a talk on Thursday about our usage of Solr at Trovit. It covered an [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday I came back from the <strong>Lucene EuroCon</strong> 2010, which took place in Prague.<br />
There were many interesting talks there these days. Some of the slides are already on SlideShare. Can&#8217;t wait for the others to be uploaded.</p>
<p>I gave a talk on Thursday about our usage of <strong>Solr</strong> at Trovit. It covered an overview of our architecture, several of our out-of-the-box and custom features, and some of the future lines we have in mind.</p>
<p>&#8220;Munching and Crunching: <strong>Lucene</strong> Index post-processing&#8221; was definitely my favourite talk. Andrzej Bialecki covered topics I had never even thought about. Among other things there was a pretty complete explanation of index splitting, pruning and multi-tiered search.<br />
People tend to think all data processing must be done at indexing time. Andrzej showed us that a lot of good stuff can be done once the index is already built.</p>
<p>In an hour, Yonik explained the main features coming with the new Solr releases in &#8220;<strong>Solr</strong> 1.5 and Beyond&#8221;. The Extended DisMax query parser, a quick introduction to <strong>SolrCloud</strong>, Spatial Search, Real Time and Field Collapsing were covered.</p>
<p>Grant Ingersoll spoke about <strong>Lucene</strong> / <strong>Solr</strong> relevance: &#8220;Practical Relevance: Tips and Tricks for Understanding and Improving Search Quality&#8221;.<br />
It was very interesting to hear about the most commonly used techniques for relevance testing:<br />
A/B tests, log analysis, empirical tests, asking, or using related projects such as Open Relevance or TREC.</p>
<p>Mark Miller talked about <strong>SolrCloud</strong>. It promises to make life much easier for admins of distributed <strong>Solr</strong> installations.</p>
<p>There were really good topics in the MeetUp as well. &#8220;How We Scaled <strong>Solr</strong> to 3+ Billion Documents&#8221; by Jason Rutherglen was the one I was looking forward to the most. I always like to hear about big <strong>Solr</strong> deployments and <strong>Hadoop</strong> usage related to <strong>Lucene</strong> and <strong>Solr</strong> indexing. I think this one is the biggest I know of.</p>
<p>So, these days have been really useful. Many new ideas, lots of stuff to test.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2010/05/24/apache-lucene-eurocon-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucene 2.9.2 and 3.0.1 released</title>
		<link>http://www.marcsturlese.com/2010/02/27/lucene-2-9-2-and-3-0-1-released/</link>
		<comments>http://www.marcsturlese.com/2010/02/27/lucene-2-9-2-and-3-0-1-released/#comments</comments>
		<pubDate>Sat, 27 Feb 2010 14:52:21 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=154</guid>
		<description><![CDATA[Lucene 2.9.2 and 3.0.1 have been released. Both are mainly bug-fix releases for the previous versions. The main difference between the 2 and 3 series is that version 3 has no support for Java 1.4 and has a cleaner API, as deprecated stuff has been removed. This means if you want to upgrade [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Lucene</strong> 2.9.2 and 3.0.1 have been released. Both are mainly bug-fix releases for the previous versions.<br />
The main difference between the 2 and 3 series is that version 3 has no support for Java 1.4 and has a cleaner API, as deprecated stuff has been removed. This means that if you want to upgrade your <strong>Lucene</strong> JARs to v.3 you must use at least Java 1.5 and have no deprecation warnings in your code.<br />
More details of both releases can be found in the <a title="Lucene official announcement" href="http://www.search-lucene.com/m?id=000501cab6bc$7d2cdf00$77869d00$@de||[ANNOUNCE]%20Release%20of%20Lucene%20Java%203.0.1%20and%202.9.2" target="_blank">official announcement</a>:</p>
<blockquote><p><em>Hello <strong>Lucene</strong> users,</em></p>
<p><em>On behalf of the <strong>Lucene</strong> development community I would like to announce the release of <strong>Lucene</strong> Java versions 3.0.1 and 2.9.2:</em></p>
<p><em>Both releases fix bugs in the previous versions:</em></p>
<p><em>- 2.9.2 is a bugfix release for the <strong>Lucene</strong> Java 2.x series, based on Java 1.4<br />
- 3.0.1 has the same bug fix level but is for the <strong>Lucene</strong> Java 3.x series, based on Java 5.</em></p>
<p><em>New users of <strong>Lucene</strong> are advised to use version 3.0.1 for new developments, because it has a clean, type-safe API.</em></p>
<p><em>Important improvements in these releases include:</em></p>
<p><em>- An increased maximum number of unique terms in each index segment.<br />
- Fixed experimental CustomScoreQuery to respect per-segment search. This introduced an API change!<br />
- Important fixes to IndexWriter: a commit() thread-safety issue, lost document deletes in near real-time indexing.<br />
- Bugfixes for Contrib&#8217;s Analyzers package.<br />
- Restoration of some public methods that were lost during deprecation removal.<br />
- The new Attribute-based TokenStream API now works correctly with different class loaders.</em></p>
<p><em>Both releases are fully compatible with the corresponding previous versions. We strongly recommend upgrading to 2.9.2 if you are using 2.9.1 or 2.9.0; and to 3.0.1 if you are using 3.0.0.</em></p>
<p><em>See core changes at<br />
<a title="apache lucene" href="http://lucene.apache.org/java/3_0_1/changes/Changes.html" target="_blank">http://lucene.apache.org/java/3_0_1/changes/Changes.html</a><br />
<a title="apache lucene" href="http://lucene.apache.org/java/2_9_2/changes/Changes.html" target="_blank">http://lucene.apache.org/java/2_9_2/changes/Changes.html</a></em></p>
<p><em>and contrib changes at<br />
<a title="apache lucene" href="http://lucene.apache.org/java/3_0_1/changes/Contrib-Changes.html" target="_blank">http://lucene.apache.org/java/3_0_1/changes/Contrib-Changes.html</a><br />
<a title="apache lucene" href="http://lucene.apache.org/java/2_9_2/changes/Contrib-Changes.html" target="_blank">http://lucene.apache.org/java/2_9_2/changes/Contrib-Changes.html</a></em></p>
<p><em>Binary and source distributions are available at<br />
<a title="apache lucene" href="http://www.apache.org/dyn/closer.cgi/lucene/java/" target="_blank">http://www.apache.org/dyn/closer.cgi/lucene/java/</a></em></p>
<p><em><strong>Lucene</strong> artifacts are also available in the Maven2 repository at<br />
<a title="apache lucene" href="http://repo1.maven.org/maven2/org/apache/lucene/" target="_blank">http://repo1.maven.org/maven2/org/apache/lucene/</a></em></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2010/02/27/lucene-2-9-2-and-3-0-1-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ElasticSearch</title>
		<link>http://www.marcsturlese.com/2010/02/12/elasticsearch/</link>
		<comments>http://www.marcsturlese.com/2010/02/12/elasticsearch/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 00:33:20 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[ElasticSearch]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=145</guid>
		<description><![CDATA[It has been a long time since my last post. I have been very busy so, unfortunately, I have not had the time to write about everything I wished. This week I discovered via Twitter a really interesting open source search project, ElasticSearch for the cloud. ElasticSearch has been created by Shay Banon. It&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>It has been a long time since my last post. I have been very busy so, unfortunately, I have not had the time to write about everything I wished.</p>
<p>This week I discovered via Twitter a really interesting open source search project, <strong><a title="ElasticSearch" href="http://www.elasticsearch.com/" target="_blank">ElasticSearch for the cloud</a></strong>. <strong>ElasticSearch</strong> has been created by Shay Banon. It&#8217;s a RESTful search engine built on top of <strong><a title="Lucene" href="http://lucene.apache.org/java/docs/" target="_blank">Lucene</a></strong> and very well prepared for high scalability. It includes shard merging, replication and many more features.</p>
<p>Lately I have been working a lot on search scalability, and what I like the most about <strong>ElasticSearch</strong> so far is that it allows 4 different types of distributed requests.</p>
<p>The simplest one (Query and fetch) is just one request per relevant shard. Once all the requests are done, results are merged and&#8230; that&#8217;s it!<br />
In this type of search, all fields of a document are returned to the merger for all the returned documents.</p>
<p>In another search type (Query then fetch, which is not that simple), a first request is done across all shards. Here you don&#8217;t ask for the document content yet. Once the results are merged, you only need to ask for the whole document data of the most relevant documents, the ones you want to show.<br />
If you have to search across lots of shards that&#8217;s definitely the way to go (the merger will just receive the fields of the important documents, which means less data is sent across the network).</p>
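<p>To make the two phases concrete, here is a tiny in-memory sketch in Java. It is not ElasticSearch code: the Hit class, the method names and the idea of passing per-shard scores and document bodies as arrays are all assumptions made up for the example, and there is no network involved.</p>

```java
import java.util.Arrays;
import java.util.Comparator;

// In-memory sketch of a "query then fetch" style search.
// NOT ElasticSearch code: every name here is hypothetical.
final class QueryThenFetchSketch {

    // Phase-1 hit: only ids and a score, no document content.
    static final class Hit {
        final int shard;
        final int docId;
        final double score;

        Hit(int shard, int docId, double score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    // Returns the k best hits, highest score first.
    static Hit[] topK(Hit[] hits, int k) {
        Hit[] sorted = hits.clone();
        Arrays.sort(sorted, Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return Arrays.copyOf(sorted, Math.min(k, sorted.length));
    }

    // Phase 1: each shard returns its local top-k (ids and scores only),
    // then the merger keeps the global top-k.
    // scores[s][d] is the score of local document d on shard s.
    static Hit[] queryPhase(double[][] scores, int k) {
        Hit[] candidates = new Hit[0];
        for (int s = 0; s != scores.length; s++) {
            Hit[] local = new Hit[scores[s].length];
            for (int d = 0; d != local.length; d++) {
                local[d] = new Hit(s, d, scores[s][d]);
            }
            Hit[] localTop = topK(local, k);
            int oldLength = candidates.length;
            candidates = Arrays.copyOf(candidates, oldLength + localTop.length);
            System.arraycopy(localTop, 0, candidates, oldLength, localTop.length);
        }
        return topK(candidates, k);
    }

    // Phase 2: fetch the full content only for the hits that will be shown.
    static String[] fetchPhase(String[][] bodies, Hit[] top) {
        String[] docs = new String[top.length];
        for (int i = 0; i != top.length; i++) {
            docs[i] = bodies[top[i].shard][top[i].docId];
        }
        return docs;
    }

    public static void main(String[] args) {
        double[][] scores = { {0.9, 0.1}, {0.5, 0.8} };
        String[][] bodies = { {"doc A", "doc B"}, {"doc C", "doc D"} };
        Hit[] top = queryPhase(scores, 2);
        System.out.println(Arrays.toString(fetchPhase(bodies, top))); // [doc A, doc D]
    }
}
```

<p>With two shards and k = 2, four lightweight hits reach the merger but only two document bodies are ever fetched. With many shards that difference is what saves network traffic.</p>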
<p>Both options present a typical problem in distributed search: relevance is calculated relative to each shard, not absolutely across all of them.<br />
To solve this, in <strong>ElasticSearch</strong>, both search options can be supplemented with an initial request. This one queries for the term frequency information needed to allow an &#8220;absolute relevance&#8221;.<br />
This does not come for free: you pay with an extra round trip (even though it can be cached). It&#8217;s good if you can avoid it. A good way to do so is at indexing time, when you decide which shard a document must be added to. Choosing it randomly will more or less ensure that term frequencies won&#8217;t differ much among shards.</p>
<p>I still have not had the chance to dig into the source but have already downloaded it from the <a title="ElasticSearch" href="http://github.com/elasticsearch/elasticsearch" target="_blank">git repository</a>.<br />
Anyone who wants to share experiences with <strong>ElasticSearch</strong> is more than welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2010/02/12/elasticsearch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CloudCamp Barcelona 2009</title>
		<link>http://www.marcsturlese.com/2009/06/18/cloudcamp-barcelona-2009/</link>
		<comments>http://www.marcsturlese.com/2009/06/18/cloudcamp-barcelona-2009/#comments</comments>
		<pubDate>Wed, 17 Jun 2009 23:43:49 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Abicloud]]></category>
		<category><![CDATA[CloudCamp]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Sqoop]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=134</guid>
		<description><![CDATA[Last Monday the first CloudCamp ever held in Barcelona took place. Although I was expecting more technical stuff, it was good to be there and listen to what people had to say. The first part of the event consisted of some quick explanations from different companies related to cloud computing. Basically, they [...]]]></description>
			<content:encoded><![CDATA[<p>Last Monday the first <a title="CloudCamp" href="http://www.cloudcamp.com/?page_id=902" target="_blank"><strong>CloudCamp</strong></a> ever held in Barcelona took place. Although I was expecting more technical stuff, it was good to be there and listen to what people had to say.<br />
The first part of the event consisted of some quick explanations from different companies related to cloud computing. Basically, they explained the cloud choices and advantages they were offering. The one I enjoyed the most was Abiquo&#8217;s presentation of their new software, <a title="Abicloud" href="http://www.abiquo.com/en/products/abicloud" target="_blank">Abicloud</a>. Through a really nice GUI developed with Flex, Abicloud, among other stuff, allows you to set up virtual machines, automatically configuring an Apache server, a MySQL database&#8230; with just a few drag &amp; drop actions. You can use your own machines, servers from an ISP or even combine both. You can elastically increase or decrease the number of virtual machines. This can be very convenient for sites with high traffic peaks or for testing environments.<br />
I am not going to say more about it, as with a five-minute presentation I could only get the main idea. Can&#8217;t wait to have some free time to start playing with it. I will just add that Abicloud is completely open source.</p>
<p>After the quick talks, the following topics were discussed:</p>
<ul>
<li> What guarantees do I have with <strong>Cloud Computing</strong>?</li>
<li> What legal issues are there with your data?</li>
<li> Are standards important? If so, which ones?</li>
<li> What is the benefit for a company with only a few dozen servers?</li>
<li> Best platform for starting a cloud hosting company?</li>
<li> Is cloud computing green? If so, what?</li>
</ul>
<p>In the end people were divided into groups depending on which topic they wanted to go deeper into. I attended &#8220;How to develop applications that are going to run in the cloud&#8221;. There I had an interesting quick chat about application scalability and how to dump MySQL databases to <strong>HDFS</strong> using Cloudera&#8217;s tool <strong><a title="Hadoop's Sqoop" href="http://www.cloudera.com/hadoop-sqoop#getting_sqoop" target="_blank">Sqoop</a></strong>.</p>
<p><img class="aligncenter size-medium wp-image-140" title="cloudcamp" src="http://www.marcsturlese.com/wp-content/images/cloudcamp-300x72.jpg" alt="cloudcamp" width="260" height="62" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2009/06/18/cloudcamp-barcelona-2009/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance measurement with JMeter 2.3.3</title>
		<link>http://www.marcsturlese.com/2009/05/31/performance-measurement-with-jmeter-233/</link>
		<comments>http://www.marcsturlese.com/2009/05/31/performance-measurement-with-jmeter-233/#comments</comments>
		<pubDate>Sun, 31 May 2009 21:58:13 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[JMeter]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=119</guid>
		<description><![CDATA[A new release of JMeter was launched last week. JMeter 2.3.3 is a powerful Java application designed for web application functional testing and performance measurement, allowing you to run powerful server stress tests. I have been practicing with it and I really liked how easily you can set up a test [...]]]></description>
			<content:encoded><![CDATA[<p>A new release of <a title="JMeter" href="http://jakarta.apache.org/jmeter/index.html" target="_blank"><strong>JMeter</strong></a> was launched last week. <strong><a title="Apache JMeter" href="http://jakarta.apache.org/jmeter/changes.html" target="_blank">JMeter 2.3.3</a></strong> is a powerful Java application designed for web application functional testing and performance measurement, allowing you to run powerful server stress tests.<br />
I have been practicing with it and I really liked how easily you can set up a test plan and start stressing your machines to check response times when lots of threads are making requests.</p>
<p>You just need to create a .jmx file which will contain all the information needed to make the requests: host name, port number, protocol, method, URL path, URL variables&#8230; You can actually tell <strong>JMeter</strong> to read the URL variables from an external .dat file. That allows you to give the variables different values for each request.<br />
The .jmx can be written manually but it&#8217;s much easier to create it via the <strong>JMeter GUI</strong>.</p>
<p>You will have to tell <strong>JMeter</strong> the number of threads that must be executing requests and the number of requests per thread. It also allows you to leave the threads making requests indefinitely.<br />
Once a test is launched you can see in real time the number of samples that have been executed and the Deviation, Throughput, Average and Median of the requests made by the threads (think of a thread as a user making a request via a browser).</p>
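<p>JMeter computes these numbers for you. The sketch below only illustrates what they mean, using made-up response times in milliseconds. The class and method names are hypothetical, the deviation is assumed to be a population standard deviation, and JMeter&#8217;s exact formulas (for example how it measures the elapsed time used for throughput) may differ.</p>

```java
import java.util.Arrays;

// Illustrative statistics over response times; NOT JMeter code.
final class SampleStatsSketch {

    static double average(long[] millis) {
        return Arrays.stream(millis).average().orElse(0.0);
    }

    static double median(long[] millis) {
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        if (n == 0) {
            return 0.0;
        }
        // Middle value, or mean of the two middle values for an even count.
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Population standard deviation (an assumption for this sketch).
    static double deviation(long[] millis) {
        if (millis.length == 0) {
            return 0.0;
        }
        double mean = average(millis);
        double sumOfSquares = 0.0;
        for (long value : millis) {
            sumOfSquares += (value - mean) * (value - mean);
        }
        return Math.sqrt(sumOfSquares / millis.length);
    }

    // Requests completed per second over the whole test run.
    static double throughputPerSecond(int samples, long elapsedMillis) {
        return samples * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        long[] responseTimes = {100, 200, 300, 400};
        System.out.println("Average:    " + average(responseTimes));
        System.out.println("Median:     " + median(responseTimes));
        System.out.println("Deviation:  " + deviation(responseTimes));
        System.out.println("Throughput: " + throughputPerSecond(responseTimes.length, 2000) + "/s");
    }
}
```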
<p>This is just how to set up a basic test plan, but the application is far more complete than this and has many more interesting features.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2009/05/31/performance-measurement-with-jmeter-233/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Analyzing java heaps with jmap and jhat</title>
		<link>http://www.marcsturlese.com/2009/05/09/analyzing-java-heaps-with-jmap-and-jhat/</link>
		<comments>http://www.marcsturlese.com/2009/05/09/analyzing-java-heaps-with-jmap-and-jhat/#comments</comments>
		<pubDate>Sat, 09 May 2009 18:46:57 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[jhat]]></category>
		<category><![CDATA[jmap]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=115</guid>
		<description><![CDATA[Jmap and jhat are a couple of really useful tools for analyzing the memory consumption of a Java program. Both are included in the JDK 1.6 so there is no need to install any extra stuff. Jmap allows you to create a dump of the Java memory heap at any moment in the life of [...]]]></description>
			<content:encoded><![CDATA[<p><a title="jmap" href="http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jmap.html" target="_blank"><strong>Jmap</strong></a> and <a title="jhat" href="http://java.sun.com/javase/6/docs/technotes/tools/share/jhat.html" target="_blank"><strong>jhat</strong></a> are a couple of really useful tools for analyzing the memory consumption of a Java program. Both are included in the JDK 1.6 so there is no need to install any extra stuff.</p>
<p><strong>Jmap</strong> allows you to create a dump of the Java memory heap at any moment in the life of your running application. It will contain all the live objects and classes at that moment. Creating the heap dump is as easy as:</p>
<p><strong>jmap -dump:file=my_stack.bin 4365</strong></p>
<p>Where my_stack.bin is the name of the file where you want the dump and 4365 is the pid of the java application process.</p>
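<p>If you are not sure which pid to use, the application can report its own. The snippet below is a small sketch that assumes Java 9 or later (for ProcessHandle); the class and method names are hypothetical. It only builds the same jmap command line shown above and does not run it.</p>

```java
// Prints the jmap command for the current JVM process.
// Assumes Java 9+ (ProcessHandle); names are hypothetical.
final class PidForJmapSketch {

    // Pid of the JVM this code is running in.
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    // Builds the jmap command line used in this post for a given pid.
    static String jmapCommand(long pid, String dumpFile) {
        return "jmap -dump:file=" + dumpFile + " " + pid;
    }

    public static void main(String[] args) {
        System.out.println(jmapCommand(currentPid(), "my_stack.bin"));
    }
}
```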
<p>If you are running a servlet application under a java server and it ends with a:</p>
<p><strong>java.lang.OutOfMemoryError: Java heap space</strong></p>
<p>You can trigger a dump of the java heap at the OutOfMemory moment specifying these parameters to the server:</p>
<p><strong>-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/sturlese/stack_test/</strong></p>
<p>This will create a .hprof file (named after the process&#8217;s pid) containing the dump in the specified path.<br />
The HeapDumpPath param is not compulsory. If we don&#8217;t specify it, the dump will be created in the working directory of the JVM (in my case, the folder where Tomcat launches the webapps).</p>
<p>Now we have the dump of the Java heap. To analyze it we will use <strong>jhat</strong>. Once we launch <strong>jhat</strong> specifying the dump to analyze, it will start an HTTP server (on port 7000 by default) and will let you browse all the classes and objects. You will be able to check how many instances of each class were alive at the moment the heap dump was created. To launch <strong>jhat</strong>:</p>
<p><strong>jhat my_stack.bin</strong></p>
<p>It&#8217;s easy to get an OutOfMemory error when opening the heap dump. Analyzing the dump can consume a lot of memory if your app was using a lot at the moment it was taken. If you hit the problem, give <strong>jhat</strong>&#8217;s JVM as much memory as you can:</p>
<p><strong>jhat -J-mx2000m my_stack.bin</strong></p>
<p>Now is the moment to point your web browser to <strong>http://localhost:7000</strong> and start analyzing the heap!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2009/05/09/analyzing-java-heaps-with-jmap-and-jhat/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>JAD Java Decompiler</title>
		<link>http://www.marcsturlese.com/2009/04/23/jad-java-decomplier/</link>
		<comments>http://www.marcsturlese.com/2009/04/23/jad-java-decomplier/#comments</comments>
		<pubDate>Thu, 23 Apr 2009 22:23:45 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[OS]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Decompiler]]></category>
		<category><![CDATA[JAD]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=104</guid>
		<description><![CDATA[Today I needed to check some old Java source of which I had only kept the class files. Finding a Java decompiler for my Ubuntu was not as easy a job as I thought. I couldn&#8217;t find one in the repositories and everything I found on the net was out of date. JAD Java Decompiler [...]]]></description>
			<content:encoded><![CDATA[<p>Today I needed to check some old Java source of which I had only kept the class files.<br />
Finding a Java decompiler for my <strong>Ubuntu</strong> was not as easy a job as I thought. I couldn&#8217;t find one in the repositories and everything I found on the net was out of date.<br />
<strong>JAD Java Decompiler</strong> is definitely not new stuff but it is really easy to use and did a pretty good job for me. The problem was that almost all links guided me to http://www.kpdus.com but the software is not available there anymore.<br />
In the end I <a title="JAD Java Decompiler" rel="nofollow" href="http://www.varaneckas.com/jad" target="_blank">found it</a> not just for <strong>Ubuntu</strong> but for other platforms as well.<br />
I leave <a title="JAD Java Decompiler" rel="nofollow" href="http://www.marcsturlese.com/wp-content/misc/jad158e.linux.static.zip" target="_blank">here</a> the <strong>JAD</strong> version for <strong>Ubuntu</strong> (and other Linux distributions) that worked for me.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2009/04/23/jad-java-decomplier/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Lucene TrieRangeQuery</title>
		<link>http://www.marcsturlese.com/2009/04/08/lucene-trierangequery/</link>
		<comments>http://www.marcsturlese.com/2009/04/08/lucene-trierangequery/#comments</comments>
		<pubDate>Wed, 08 Apr 2009 21:50:43 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[ApacheCon]]></category>
		<category><![CDATA[TrieRangeQuery]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=97</guid>
		<description><![CDATA[Lucene TrieRangeQuery is a cool contrib in Lucene (I think it is not yet in an official release) created by Uwe Schindler. I had heard about it before but really learned about it at the Lucene MeetUp at ApacheCon EU. Uwe gave a great speech about it. As I found it a really useful feature, I will try to explain the [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Lucene TrieRangeQuery</strong> is a cool contrib in <strong>Lucene</strong> (I think it is not yet in an official release) created by Uwe Schindler. I had heard about it before but really learned about it at the Lucene MeetUp at ApacheCon EU. Uwe gave a great speech about it. As I found it a really useful feature, I will try to explain the basics.</p>
<p><strong>TrieRangeQuery</strong> mainly sorts out some RangeQuery problems:</p>
<ul>
<li>A typical RangeQuery can end in a <strong>TooManyClausesException</strong> if our ranges are too large.</li>
<li>A typical RangeQuery, or even a ConstantScoreRangeQuery, is slow if it has to classify using large ranges or if the index is huge.</li>
</ul>
<p>To explain it in an easy way, what <strong>TrieRangeQuery</strong> does is search the data values skipping the less relevant &#8220;digits&#8221;, depending on a precision parameter.</p>
<p>Let&#8217;s say for example we need to classify thousands of 6-figure numbers. This could be a slow process using ConstantScoreRangeQuery in a huge index, but not with <strong>TrieRangeQuery</strong>. Ranges will be divided recursively according to a precision parameter (set at index time). Numbers from the middle of the range will be classified using the minimum precision value while numbers from the extremes will use a higher precision. This will make the query run much faster.</p>
<p>Depending on the precisionStep parameter given at index time we will be able to search with more or less precision. The more precision levels we choose, the more space the Lucene document will occupy, because the field has to be indexed more times, once for each precision.</p>
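<p>The splitting idea can be shown with a toy model that works on decimal digits. This is not the Lucene implementation (which works on bits) and the class, method and parameter names are invented for the example. It assumes non-negative numbers, lo not greater than hi, and that every number has been indexed once per precision level (all digits, then dropping 2 digits, then dropping 4).</p>

```java
import java.util.Arrays;

// Decimal toy model of trie-style range splitting; NOT Lucene code.
final class TrieRangeSplitSketch {

    static long pow10(int digits) {
        long result = 1;
        for (int i = 0; i != digits; i++) {
            result *= 10;
        }
        return result;
    }

    // Splits the inclusive range [lo, hi] into sub-ranges {shift, from, to}.
    // "shift" is how many trailing decimal digits are ignored for that piece.
    static long[][] split(long lo, long hi, int stepDigits, int maxShift) {
        long[][] out = new long[2 * (maxShift / stepDigits) + 1][];
        int count = 0;
        long block = 1; // 10^shift
        for (int shift = 0; ; shift += stepDigits) {
            long nextBlock = block * pow10(stepDigits);
            // Largest sub-range whose bounds are aligned to the coarser level.
            long alignedLo = ((lo + nextBlock - 1) / nextBlock) * nextBlock;
            long alignedHiExclusive = ((hi + 1) / nextBlock) * nextBlock;
            if (shift + stepDigits > maxShift || alignedLo >= alignedHiExclusive) {
                out[count++] = new long[] {shift, lo, hi};
                break;
            }
            // The edges need the finer precision of the current level.
            if (lo != alignedLo) {
                out[count++] = new long[] {shift, lo, alignedLo - 1};
            }
            if (alignedHiExclusive != hi + 1) {
                out[count++] = new long[] {shift, alignedHiExclusive, hi};
            }
            // The middle is handed over to the next, coarser level.
            lo = alignedLo;
            hi = alignedHiExclusive - 1;
            block = nextBlock;
        }
        return Arrays.copyOf(out, count);
    }

    // Number of indexed terms a query would have to visit for these pieces.
    static long termsVisited(long[][] ranges) {
        long total = 0;
        for (long[] range : ranges) {
            total += (range[2] - range[1] + 1) / pow10((int) range[0]);
        }
        return total;
    }

    public static void main(String[] args) {
        long[][] ranges = split(123456, 876543, 2, 4);
        for (long[] range : ranges) {
            System.out.println("ignore " + range[0] + " digits: " + range[1] + " to " + range[2]);
        }
        System.out.println(termsVisited(ranges) + " terms instead of " + (876543 - 123456 + 1));
        // last line printed: 292 terms instead of 753088
    }
}
```

<p>In this toy run the extremes of the range are matched digit by digit, while the big middle part (130000 to 869999) is matched with only 74 coarse terms. The price is that each number has to be indexed three times.</p>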
<p>We need to index data in a special way to be able to search it using <strong>Lucene TrieRangeQuery</strong>. We must index our fields using <strong>TrieUtils</strong>. We can index numbers directly. It supports Java signed int, long, float and double. There&#8217;s no loss of precision for doubles or floats. There&#8217;s no rounding when they are created; instead a long/int representation is used for cents.<br />
Indexing numbers with <strong>TrieUtils</strong> lets us forget about manual padding.<br />
We can index dates as well (from the Java timestamp data type).</p>
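<p>As an illustration of how a double can be turned into a long without losing precision while keeping its sort order, here is one common bit trick. I am assuming the Lucene utilities do something along these lines, but take it as a sketch: the class and method names are mine, not the TrieUtils API.</p>

```java
// Order-preserving double to long conversion; a sketch, NOT the TrieUtils API.
final class SortableDoubleSketch {

    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // For negative doubles a bigger magnitude means a smaller number,
        // so flip the 63 non-sign bits to reverse their order.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    static double fromSortableLong(long sortable) {
        // The same flip undoes the transformation.
        if (sortable < 0) {
            sortable ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(sortable);
    }

    public static void main(String[] args) {
        double[] values = {-2.5, -1.0, 0.0, 3.14};
        for (double value : values) {
            long sortable = toSortableLong(value);
            System.out.println(value + " maps to " + sortable
                    + " and back to " + fromSortableLong(sortable));
        }
    }
}
```

<p>Since the long keeps every bit of the double, nothing is rounded, and comparing the longs gives the same order as comparing the original numbers.</p>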
<p>As seen, <strong>Lucene TrieRangeQuery</strong> is definitely a step forward for the <strong>scalability</strong> of <strong>Lucene</strong> queries.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2009/04/08/lucene-trierangequery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

