<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Solr and Hadoop integration against scalability problems</title>
	<atom:link href="http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/</link>
	<description>Life, code and stuff</description>
	<lastBuildDate>Fri, 02 Sep 2011 13:27:32 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-429</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Fri, 02 Sep 2011 13:27:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-429</guid>
		<description>Well, Katta+Hadoop is more about deploying indexes. Solr+Hadoop is for index generation. There are patches to integrate Solr index deployment with Katta (https://issues.apache.org/jira/browse/SOLR-1395) as well as patches to build Solr indexes using Hadoop (https://issues.apache.org/jira/browse/SOLR-1301).
I&#039;m really interested in the Lucene/Solr search implementation on HBase:
https://issues.apache.org/jira/browse/HBASE-3529</description>
		<content:encoded><![CDATA[<p>Well, Katta+Hadoop is more about deploying indexes. Solr+Hadoop is for index generation. There are patches to integrate Solr index deployment with Katta (<a href="https://issues.apache.org/jira/browse/SOLR-1395" rel="nofollow">https://issues.apache.org/jira/browse/SOLR-1395</a>) as well as patches to build Solr indexes using Hadoop (<a href="https://issues.apache.org/jira/browse/SOLR-1301" rel="nofollow">https://issues.apache.org/jira/browse/SOLR-1301</a>).<br />
I&#8217;m really interested in the Lucene/Solr search implementation on HBase:<br />
<a href="https://issues.apache.org/jira/browse/HBASE-3529" rel="nofollow">https://issues.apache.org/jira/browse/HBASE-3529</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sinoantony</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-415</link>
		<dc:creator>sinoantony</dc:creator>
		<pubDate>Tue, 05 Jul 2011 08:55:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-415</guid>
		<description>Are there any comparisons available between Katta+Hadoop and Solr+Hadoop as big data search solutions? Thanks in advance for any advice.</description>
		<content:encoded><![CDATA[<p>Are there any comparisons available between Katta+Hadoop and Solr+Hadoop as big data search solutions? Thanks in advance for any advice.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-370</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Sun, 11 Jul 2010 21:00:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-370</guid>
		<description>Hi Yun,
As you mentioned, HDFS is not a good place to build the Lucene/Solr index, as building it requires lots of operations. If you look for open source examples of MapReduce indexing, you&#039;ll see that when building an index, it&#039;s built on the local file system and then uploaded to HDFS. When the whole process is done, the index/shards have to be downloaded to the local file system again.
MapReduce and HDFS are very good for building a huge index quickly. But if you need real-time updates, I think it&#039;s better to apply the updates directly to the index/shards placed on the local file system.</description>
		<content:encoded><![CDATA[<p>Hi Yun,<br />
As you mentioned, HDFS is not a good place to build the Lucene/Solr index, as building it requires lots of operations. If you look for open source examples of MapReduce indexing, you&#8217;ll see that when building an index, it&#8217;s built on the local file system and then uploaded to HDFS. When the whole process is done, the index/shards have to be downloaded to the local file system again.<br />
MapReduce and HDFS are very good for building a huge index quickly. But if you need real-time updates, I think it&#8217;s better to apply the updates directly to the index/shards placed on the local file system.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: yun chen</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-368</link>
		<dc:creator>yun chen</dc:creator>
		<pubDate>Thu, 08 Jul 2010 07:23:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-368</guid>
		<description>Thanks for your post.
I just have a question:
    As far as I know, HDFS is not suitable for frequent operations, so how do you handle updating your index for real-time search?
Looking forward to your answer!</description>
		<content:encoded><![CDATA[<p>Thanks for your post.<br />
I just have a question:<br />
    As far as I know, HDFS is not suitable for frequent operations, so how do you handle updating your index for real-time search?<br />
Looking forward to your answer!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-228</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Sat, 27 Feb 2010 18:59:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-228</guid>
		<description>Thankfully I have learned a bit more about Hadoop and Solr since I wrote the post.
What you say makes sense. However, I think there is no need to have the whole set of indexes on all nodes.
I imagine the scenario as:
An index shard is built on each datanode and is stored in HDFS. This datanode also contains a Solr instance on the local file system. The shard has to be copied to the local file system so that Solr can search it.
With a Solr instance on another server, you could search across all the shards on the different datanodes using Solr distributed search.
What do you think?</description>
		<content:encoded><![CDATA[<p>Thankfully I have learned a bit more about Hadoop and Solr since I wrote the post.<br />
What you say makes sense. However, I think there is no need to have the whole set of indexes on all nodes.<br />
I imagine the scenario as:<br />
An index shard is built on each datanode and is stored in HDFS. This datanode also contains a Solr instance on the local file system. The shard has to be copied to the local file system so that Solr can search it.<br />
With a Solr instance on another server, you could search across all the shards on the different datanodes using Solr distributed search.<br />
What do you think?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick Adelman</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-210</link>
		<dc:creator>Nick Adelman</dc:creator>
		<pubDate>Fri, 26 Feb 2010 03:59:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-210</guid>
		<description>I know this is an old post as of today, but I just wanted to add a comment regarding the Tomcat instance. Reading the article, and having just done some POC work with Hadoop, I think that the Tomcat instances are simply running on the same machines that are nodes in the Hadoop cluster. I don&#039;t know the details, but I think this would allow each Solr instance to leverage data locality in the cluster by working with the index(es) that exist on that particular node. If replication were configured such that all nodes in the Hadoop cluster received a copy of every index, then all of the Solr instances would be leveraging the same set of indexes. However, I&#039;m sure it is much more complicated than that, especially since it is not typical to have data replicated to all nodes in a Hadoop cluster of any significant size.</description>
		<content:encoded><![CDATA[<p>I know this is an old post as of today, but I just wanted to add a comment regarding the Tomcat instance. Reading the article, and having just done some POC work with Hadoop, I think that the Tomcat instances are simply running on the same machines that are nodes in the Hadoop cluster. I don&#8217;t know the details, but I think this would allow each Solr instance to leverage data locality in the cluster by working with the index(es) that exist on that particular node. If replication were configured such that all nodes in the Hadoop cluster received a copy of every index, then all of the Solr instances would be leveraging the same set of indexes. However, I&#8217;m sure it is much more complicated than that, especially since it is not typical to have data replicated to all nodes in a Hadoop cluster of any significant size.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

