<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for Marc Sturlese</title>
	<atom:link href="http://www.marcsturlese.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.marcsturlese.com</link>
	<description>Life, code and stuff</description>
	<lastBuildDate>Fri, 02 Sep 2011 13:27:32 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>Comment on Solr and Hadoop integration against scalability problems by Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-429</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Fri, 02 Sep 2011 13:27:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-429</guid>
		<description>Well, Katta+Hadoop is more about deploying indexes. Solr+Hadoop is for index generation. There are patches to integrate Solr index deployment with Katta (https://issues.apache.org/jira/browse/SOLR-1395), and there are also patches to build Solr indexes using Hadoop (https://issues.apache.org/jira/browse/SOLR-1301).
I&#039;m really interested in the Lucene/Solr search implementation on HBase:
https://issues.apache.org/jira/browse/HBASE-3529</description>
		<content:encoded><![CDATA[<p>Well, Katta+Hadoop is more about deploying indexes. Solr+Hadoop is for index generation. There are patches to integrate Solr index deployment with Katta (<a href="https://issues.apache.org/jira/browse/SOLR-1395" rel="nofollow">https://issues.apache.org/jira/browse/SOLR-1395</a>), and there are also patches to build Solr indexes using Hadoop (<a href="https://issues.apache.org/jira/browse/SOLR-1301" rel="nofollow">https://issues.apache.org/jira/browse/SOLR-1301</a>).<br />
I&#8217;m really interested in the Lucene/Solr search implementation on HBase:<br />
<a href="https://issues.apache.org/jira/browse/HBASE-3529" rel="nofollow">https://issues.apache.org/jira/browse/HBASE-3529</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Using LZO Compression with Hadoop and Snow Leopard by Edgar Henigan</title>
		<link>http://www.marcsturlese.com/2011/03/12/using-lzo-compression-with-hadoop-and-snow-leopard/comment-page-1/#comment-428</link>
		<dc:creator>Edgar Henigan</dc:creator>
		<pubDate>Tue, 30 Aug 2011 19:48:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=199#comment-428</guid>
		<description>I wish more people would write sites like this that are actually interesting to read.  With all the fluff floating around on the internet, it is a great change of pace to read a site like yours instead.</description>
		<content:encoded><![CDATA[<p>I wish more people would write sites like this that are actually interesting to read.  With all the fluff floating around on the internet, it is a great change of pace to read a site like yours instead.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Solr and Hadoop integration against scalability problems by sinoantony</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-415</link>
		<dc:creator>sinoantony</dc:creator>
		<pubDate>Tue, 05 Jul 2011 08:55:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-415</guid>
		<description>Is there any comparison available between Katta+Hadoop and Solr+Hadoop as big data search solutions? Thanks in advance for any advice.</description>
		<content:encoded><![CDATA[<p>Is there any comparison available between Katta+Hadoop and Solr+Hadoop as big data search solutions? Thanks in advance for any advice.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Lucene FieldCache.StringIndex and multiValued fields by Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2010/06/27/lucene-fieldcache-stringindex-and-multivalued-fields/comment-page-1/#comment-386</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Thu, 06 Jan 2011 11:02:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=169#comment-386</guid>
		<description>Hey Amit,
I&#039;m not sure, but maybe since Solr 1.4 you can do something similar using the UnInvertedField class, used to facet on multiValued fields.
http://bit.ly/ig3aFC</description>
		<content:encoded><![CDATA[<p>Hey Amit,<br />
I&#8217;m not sure, but maybe since Solr 1.4 you can do something similar using the UnInvertedField class, used to facet on multiValued fields.<br />
<a href="http://bit.ly/ig3aFC" rel="nofollow">http://bit.ly/ig3aFC</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Lucene FieldCache.StringIndex and multiValued fields by Amit Nithian</title>
		<link>http://www.marcsturlese.com/2010/06/27/lucene-fieldcache-stringindex-and-multivalued-fields/comment-page-1/#comment-385</link>
		<dc:creator>Amit Nithian</dc:creator>
		<pubDate>Thu, 06 Jan 2011 05:45:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=169#comment-385</guid>
		<description>This is a good post. I have run into a similar issue before, where getting all the values of a multi-valued field was difficult. I ended up extending the FieldCacheImpl of Solr 1.2 and had this working. Of course, the API has changed, so porting this to 1.5 is a bit harder, but I am working on it. I have some applications where I need access to all the values to make some decisions, so this feature is necessary.</description>
		<content:encoded><![CDATA[<p>This is a good post. I have run into a similar issue before, where getting all the values of a multi-valued field was difficult. I ended up extending the FieldCacheImpl of Solr 1.2 and had this working. Of course, the API has changed, so porting this to 1.5 is a bit harder, but I am working on it. I have some applications where I need access to all the values to make some decisions, so this feature is necessary.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Index scalability using Pig by Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/03/02/index-scalability-using-pig/comment-page-1/#comment-373</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Wed, 21 Jul 2010 11:25:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=70#comment-373</guid>
		<description>Hi Pablo,
I just tested the example in the post I pointed at and it worked great. But in the end I thought that trying to build a Lucene index using Pig wasn&#039;t worth it. I mean, I think it&#039;s much easier to write your own MapReduce job to do that.
There are some examples out there.
I succeeded in building a Lucene index from data retrieved from HBase using MapReduce jobs.</description>
		<content:encoded><![CDATA[<p>Hi Pablo,<br />
I just tested the example in the post I pointed at and it worked great. But in the end I thought that trying to build a Lucene index using Pig wasn&#8217;t worth it. I mean, I think it&#8217;s much easier to write your own MapReduce job to do that.<br />
There are some examples out there.<br />
I succeeded in building a Lucene index from data retrieved from HBase using MapReduce jobs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Index scalability using Pig by Pablo Mendes</title>
		<link>http://www.marcsturlese.com/2009/03/02/index-scalability-using-pig/comment-page-1/#comment-372</link>
		<dc:creator>Pablo Mendes</dc:creator>
		<pubDate>Mon, 19 Jul 2010 09:47:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=70#comment-372</guid>
		<description>Hi Marc,
Did you give this a shot? I&#039;m in a research setting working with tons of data. This means that I try many different indexing/weighting strategies that take a long time every time. I&#039;m already using Pig to precompute some stats for me, so I thought: why not have it build the index already. :)
The main advantage for me is that Pig gives me a quick way to manipulate my input data before I give it to the index, all of that over a cluster.
So I thought that somebody must have thought of this before me. Did you get anywhere with your idea, or did you drop it for some reason?
Cheers,
Pablo</description>
		<content:encoded><![CDATA[<p>Hi Marc,<br />
Did you give this a shot? I&#8217;m in a research setting working with tons of data. This means that I try many different indexing/weighting strategies that take a long time every time. I&#8217;m already using Pig to precompute some stats for me, so I thought: why not have it build the index already. <img src='http://www.marcsturlese.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
The main advantage for me is that Pig gives me a quick way to manipulate my input data before I give it to the index, all of that over a cluster.<br />
So I thought that somebody must have thought of this before me. Did you get anywhere with your idea, or did you drop it for some reason?<br />
Cheers,<br />
Pablo</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Solr and Hadoop integration against scalability problems by Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-370</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Sun, 11 Jul 2010 21:00:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-370</guid>
		<description>Hi Yun,
As you mentioned, HDFS is not a good place to build the Lucene/Solr index, as that requires lots of operations. If you look for open source examples of MapReduce indexing, you&#039;ll see that the index is built on the local file system and then uploaded to HDFS. When the whole process is done, the index/shards have to be downloaded to the local file system again.
MapReduce and HDFS are very good for building a huge index quickly. But if you need real-time updates, I think it&#039;s better to apply the updates directly to the index/shards on the local file system.</description>
		<content:encoded><![CDATA[<p>Hi Yun,<br />
As you mentioned, HDFS is not a good place to build the Lucene/Solr index, as that requires lots of operations. If you look for open source examples of MapReduce indexing, you&#8217;ll see that the index is built on the local file system and then uploaded to HDFS. When the whole process is done, the index/shards have to be downloaded to the local file system again.<br />
MapReduce and HDFS are very good for building a huge index quickly. But if you need real-time updates, I think it&#8217;s better to apply the updates directly to the index/shards on the local file system.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Solr and Hadoop integration against scalability problems by yun chen</title>
		<link>http://www.marcsturlese.com/2009/02/06/solr-and-hadoop-integration-against-scalability-problems/comment-page-1/#comment-368</link>
		<dc:creator>yun chen</dc:creator>
		<pubDate>Thu, 08 Jul 2010 07:23:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=62#comment-368</guid>
		<description>Thanks for your post.
I just have a question:
As far as I know, HDFS is not suitable for frequent operations, so how do you handle index updates for real-time search?
Looking forward to your answer!</description>
		<content:encoded><![CDATA[<p>Thanks for your post.<br />
I just have a question:<br />
As far as I know, HDFS is not suitable for frequent operations, so how do you handle index updates for real-time search?<br />
Looking forward to your answer!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Lucene 2.4.1 available from today by Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/03/09/lucene-241-available-from-today/comment-page-1/#comment-365</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Mon, 14 Jun 2010 22:09:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=76#comment-365</guid>
		<description>Hey Rajani,
I suppose you are working with Solr and not directly with Lucene. For correct usage of Solr sharding, I would recommend keeping the same schema for all the shards.
There isn&#039;t a way to merge fields from different documents into a single doc during the Solr distributed search merging process.
I think you should consider reorganizing the data in your index. Shards are meant to scale data horizontally.</description>
		<content:encoded><![CDATA[<p>Hey Rajani,<br />
I suppose you are working with Solr and not directly with Lucene. For correct usage of Solr sharding, I would recommend keeping the same schema for all the shards.<br />
There isn&#8217;t a way to merge fields from different documents into a single doc during the Solr distributed search merging process.<br />
I think you should consider reorganizing the data in your index. Shards are meant to scale data horizontally.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

