<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Solr and Hadoop integration against scalability problems</title>
	<atom:link href="http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/</link>
	<description>Life, code and stuff</description>
	<lastBuildDate>Sat, 27 Feb 2010 18:59:32 +0100</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-228</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Sat, 27 Feb 2010 18:59:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-228</guid>
		<description>Thankfully, I have learned a bit more about Hadoop and Solr since I wrote the post.
What you say makes sense. However, I think there is no need for every node to hold the full set of indexes.
I imagine the scenario as follows:
An index shard is built on each datanode and stored in HDFS. That datanode also hosts a Solr instance on its local file system. The shard has to be copied to the local file system so that Solr can search it.
With a Solr instance on another server, you could then search across all the shards on the different datanodes using Solr distributed search.
What do you think?</description>
		<content:encoded><![CDATA[<p>Thankfully, I have learned a bit more about Hadoop and Solr since I wrote the post.<br />
What you say makes sense. However, I think there is no need for every node to hold the full set of indexes.<br />
I imagine the scenario as follows:<br />
An index shard is built on each datanode and stored in HDFS. That datanode also hosts a Solr instance on its local file system. The shard has to be copied to the local file system so that Solr can search it.<br />
With a Solr instance on another server, you could then search across all the shards on the different datanodes using Solr distributed search.<br />
What do you think?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick Adelman</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-210</link>
		<dc:creator>Nick Adelman</dc:creator>
		<pubDate>Fri, 26 Feb 2010 03:59:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-210</guid>
		<description>I know this is an old post as of today, but I just wanted to add a comment regarding the Tomcat instance. Reading the article, and having just done some POC work with Hadoop, I think that the Tomcat instances are simply running on the same machines that are nodes in the Hadoop cluster. I don&#039;t know the details, but I think this would allow each Solr instance to leverage data locality in the cluster by working with the index(es) that exist on that particular node. If replication were configured such that all nodes in the Hadoop cluster received a copy of every index, then all of the Solr instances would be leveraging the same set of indexes. However, I&#039;m sure it is much more complicated than that, especially since it is not typical to have data replicated to all nodes in a Hadoop cluster of any significant size.</description>
		<content:encoded><![CDATA[<p>I know this is an old post as of today, but I just wanted to add a comment regarding the Tomcat instance. Reading the article, and having just done some POC work with Hadoop, I think that the Tomcat instances are simply running on the same machines that are nodes in the Hadoop cluster. I don&#8217;t know the details, but I think this would allow each Solr instance to leverage data locality in the cluster by working with the index(es) that exist on that particular node. If replication were configured such that all nodes in the Hadoop cluster received a copy of every index, then all of the Solr instances would be leveraging the same set of indexes. However, I&#8217;m sure it is much more complicated than that, especially since it is not typical to have data replicated to all nodes in a Hadoop cluster of any significant size.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
