<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for Marc Sturlese</title>
	<atom:link href="http://www.marcsturlese.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.marcsturlese.com</link>
	<description>Life, code and stuff</description>
	<lastBuildDate>Wed, 21 Jul 2010 11:25:52 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>Comment on Index scalability using Pig by Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/03/02/index-scalability-using-pig/comment-page-1/#comment-373</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Wed, 21 Jul 2010 11:25:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=70#comment-373</guid>
		<description>Hi Pablo,
I just tested the example in the post I pointed at and it worked great. But in the end I thought that trying to build a Lucene index using Pig wasn&#039;t worth it. I mean, I think it&#039;s much easier to write your own MapReduce job to do that.
There are some examples out there.
I succeeded in building a Lucene index from data retrieved from HBase using MapReduce jobs.</description>
		<content:encoded><![CDATA[<p>Hi Pablo,<br />
I just tested the example in the post I pointed at and it worked great. But in the end I thought that trying to build a Lucene index using Pig wasn&#8217;t worth it. I mean, I think it&#8217;s much easier to write your own MapReduce job to do that.<br />
There are some examples out there.<br />
I succeeded in building a Lucene index from data retrieved from HBase using MapReduce jobs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Index scalability using Pig by Pablo Mendes</title>
		<link>http://www.marcsturlese.com/2009/03/02/index-scalability-using-pig/comment-page-1/#comment-372</link>
		<dc:creator>Pablo Mendes</dc:creator>
		<pubDate>Mon, 19 Jul 2010 09:47:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=70#comment-372</guid>
		<description>Hi Marc,
Did you give this a shot? I&#039;m in a research setting working with tons of data. This means that I try many different indexing/weighting strategies that take a long time every time. I&#039;m already using Pig to precompute some stats for me, so I thought: why not have it build the index already. :)
The main advantage for me is that Pig gives me a quick way to manipulate my input data before I give it to the index, all of that over a cluster.
So I thought that somebody must have thought of this before me. Did you get anywhere with your idea, or did you drop it for some reason?
Cheers,
Pablo</description>
		<content:encoded><![CDATA[<p>Hi Marc,<br />
Did you give this a shot? I&#8217;m in a research setting working with tons of data. This means that I try many different indexing/weighting strategies that take a long time every time. I&#8217;m already using Pig to precompute some stats for me, so I thought: why not have it build the index already. <img src='http://www.marcsturlese.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
The main advantage for me is that Pig gives me a quick way to manipulate my input data before I give it to the index, all of that over a cluster.<br />
So I thought that somebody must have thought of this before me. Did you get anywhere with your idea, or did you drop it for some reason?<br />
Cheers,<br />
Pablo</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Solr and Hadoop integration against scalability problems by Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-370</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Sun, 11 Jul 2010 21:00:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-370</guid>
		<description>Hi Yun,
As you mentioned, HDFS is not a good place to build the Lucene/Solr index, as that requires lots of operations. If you look for open source examples of MapReduce indexing you&#039;ll see that the index is built on the local file system and then uploaded to HDFS. When the whole process is done, the index/shards have to be downloaded to the local file system again.
MapReduce and HDFS are very good for building a huge index quickly. But if you need real-time updates I think it&#039;s better to apply the updates directly to the index/shards on the local file system.</description>
		<content:encoded><![CDATA[<p>Hi Yun,<br />
As you mentioned, HDFS is not a good place to build the Lucene/Solr index, as that requires lots of operations. If you look for open source examples of MapReduce indexing you&#8217;ll see that the index is built on the local file system and then uploaded to HDFS. When the whole process is done, the index/shards have to be downloaded to the local file system again.<br />
MapReduce and HDFS are very good for building a huge index quickly. But if you need real-time updates I think it&#8217;s better to apply the updates directly to the index/shards on the local file system.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Solr and Hadoop integration against scalability problems by yun chen</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-368</link>
		<dc:creator>yun chen</dc:creator>
		<pubDate>Thu, 08 Jul 2010 07:23:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-368</guid>
		<description>Thanks for your post.
I just have a question:
    As far as I know, HDFS is not suitable for frequent operations, so how do you handle updating your index for real-time search?
Looking forward to your answer!</description>
		<content:encoded><![CDATA[<p>Thanks for your post.<br />
I just have a question:<br />
    As far as I know, HDFS is not suitable for frequent operations, so how do you handle updating your index for real-time search?<br />
Looking forward to your answer!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Lucene 2.4.1 available from today by Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/03/09/lucene-241-available-from-today/comment-page-1/#comment-365</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Mon, 14 Jun 2010 22:09:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=76#comment-365</guid>
		<description>Hey Rajani,
I suppose you are working with Solr and not directly with Lucene. For correct usage of Solr sharding I would recommend you keep the same schema for all the shards.
There isn&#039;t a way to merge fields from different documents into a single doc during the Solr distributed search merging process.
I think you should consider reorganizing the data in your index. Shards are meant to scale data horizontally.</description>
		<content:encoded><![CDATA[<p>Hey Rajani,<br />
I suppose you are working with Solr and not directly with Lucene. For correct usage of Solr sharding I would recommend you keep the same schema for all the shards.<br />
There isn&#8217;t a way to merge fields from different documents into a single doc during the Solr distributed search merging process.<br />
I think you should consider reorganizing the data in your index. Shards are meant to scale data horizontally.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Lucene 2.4.1 available from today by Rajani</title>
		<link>http://www.marcsturlese.com/2009/03/09/lucene-241-available-from-today/comment-page-1/#comment-364</link>
		<dc:creator>Rajani</dc:creator>
		<pubDate>Mon, 14 Jun 2010 07:32:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=76#comment-364</guid>
		<description>Hello Marc,
I want to know a way to merge attributes across cores into a single doc, not just merge the cores; merging cores will not combine attributes into the same single doc.

I split my data into different cores with different schemas in a hierarchy.
Now I want to search across all cores, which I am doing using shards.
But I have a problem with logical operators.
I cannot perform an AND operation across cores.
The query will fail (as the data in each core is different).
Is there any way to merge attributes dynamically, so that I can fix this logical operator problem?</description>
		<content:encoded><![CDATA[<p>Hello Marc,<br />
I want to know a way to merge attributes across cores into a single doc, not just merge the cores; merging cores will not combine attributes into the same single doc.</p>
<p>I split my data into different cores with different schemas in a hierarchy.<br />
Now I want to search across all cores, which I am doing using shards.<br />
But I have a problem with logical operators.<br />
I cannot perform an AND operation across cores.<br />
The query will fail (as the data in each core is different).<br />
Is there any way to merge attributes dynamically, so that I can fix this logical operator problem?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Solr and Hadoop integration against scalability problems by Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-228</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Sat, 27 Feb 2010 18:59:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-228</guid>
		<description>Thankfully I have learned a bit more about Hadoop and Solr since I wrote the post.
What you say makes sense. However, I think there is no need to have the whole set of indexes on every node.
I imagine the scenario as:
An index shard is built on each datanode and stored in HDFS. The datanode also runs a Solr instance on its local file system. The shard has to be copied to the local file system so that Solr can search it.
With a Solr instance on another server, you could search across all the shards on the different datanodes using Solr distributed search.
What do you think?</description>
		<content:encoded><![CDATA[<p>Thankfully I have learned a bit more about Hadoop and Solr since I wrote the post.<br />
What you say makes sense. However, I think there is no need to have the whole set of indexes on every node.<br />
I imagine the scenario as:<br />
An index shard is built on each datanode and stored in HDFS. The datanode also runs a Solr instance on its local file system. The shard has to be copied to the local file system so that Solr can search it.<br />
With a Solr instance on another server, you could search across all the shards on the different datanodes using Solr distributed search.<br />
What do you think?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Solr and Hadoop integration against scalability problems by Nick Adelman</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-210</link>
		<dc:creator>Nick Adelman</dc:creator>
		<pubDate>Fri, 26 Feb 2010 03:59:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-210</guid>
		<description>I know this is an old post as of today, but I just wanted to add a comment regarding the Tomcat instance. Reading the article, and having just done some POC work with Hadoop, I think that the Tomcat instances are simply running on the same machines that are nodes in the Hadoop cluster. I don&#039;t know the details, but I think this would allow each Solr instance to leverage data locality in the cluster by working with the index(es) that exist on that particular node. If replication were configured such that all nodes in the Hadoop cluster received a copy of every index, then all of the Solr instances would be leveraging the same set of indexes. However, I&#039;m sure it is much more complicated than that, especially since it is not typical to have data replicated to all nodes in a Hadoop cluster of any significant size.</description>
		<content:encoded><![CDATA[<p>I know this is an old post as of today, but I just wanted to add a comment regarding the Tomcat instance. Reading the article, and having just done some POC work with Hadoop, I think that the Tomcat instances are simply running on the same machines that are nodes in the Hadoop cluster. I don&#8217;t know the details, but I think this would allow each Solr instance to leverage data locality in the cluster by working with the index(es) that exist on that particular node. If replication were configured such that all nodes in the Hadoop cluster received a copy of every index, then all of the Solr instances would be leveraging the same set of indexes. However, I&#8217;m sure it is much more complicated than that, especially since it is not typical to have data replicated to all nodes in a Hadoop cluster of any significant size.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on JAD Java Decomplier by ivmai</title>
		<link>http://www.marcsturlese.com/2009/04/23/jad-java-decomplier/comment-page-1/#comment-30</link>
		<dc:creator>ivmai</dc:creator>
		<pubDate>Tue, 15 Dec 2009 13:51:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=104#comment-30</guid>
		<description>Unfortunately, Jad hasn&#039;t been updated for a long time (the latest release is jad1.5.8g) while Java compilers continue to evolve... To get better decompilation results (including for Java 1.5+ classes), process the classes with the JadRetro (jadretro.sf.net) tool before decompiling them with Jad.</description>
		<content:encoded><![CDATA[<p>Unfortunately, Jad hasn&#8217;t been updated for a long time (the latest release is jad1.5.8g) while Java compilers continue to evolve&#8230; To get better decompilation results (including for Java 1.5+ classes), process the classes with the JadRetro (jadretro.sf.net) tool before decompiling them with Jad.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Analyzing java heaps with jmap and jhat by andre parodi</title>
		<link>http://www.marcsturlese.com/2009/05/09/analyzing-java-heaps-with-jmap-and-jhat/comment-page-1/#comment-29</link>
		<dc:creator>andre parodi</dc:creator>
		<pubDate>Thu, 10 Dec 2009 11:46:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=115#comment-29</guid>
		<description>If you are getting OOM errors when using jhat, try Eclipse MAT.</description>
		<content:encoded><![CDATA[<p>If you are getting OOM errors when using jhat, try Eclipse MAT.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
