<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Index scalability using Pig</title>
	<atom:link href="http://www.marcsturlese.com/2009/03/02/index-scalability-using-pig/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.marcsturlese.com/2009/03/02/index-scalability-using-pig/</link>
	<description>Life, code and stuff</description>
	<lastBuildDate>Wed, 21 Jul 2010 11:25:52 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Marc Sturlese</title>
		<link>http://www.marcsturlese.com/2009/03/02/index-scalability-using-pig/comment-page-1/#comment-373</link>
		<dc:creator>Marc Sturlese</dc:creator>
		<pubDate>Wed, 21 Jul 2010 11:25:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=70#comment-373</guid>
		<description>Hi Pablo,
I just tested the example in the post I pointed to, and it worked great. In the end, though, I decided that building a Lucene index with Pig wasn&#039;t worth it. I think it&#039;s much easier to write your own MapReduce job to do that.
There are some examples out there.
I have successfully built a Lucene index from data stored in HBase using MapReduce jobs.</description>
		<content:encoded><![CDATA[<p>Hi Pablo,<br />
I just tested the example in the post I pointed to, and it worked great. In the end, though, I decided that building a Lucene index with Pig wasn&#8217;t worth it. I think it&#8217;s much easier to write your own MapReduce job to do that.<br />
There are some examples out there.<br />
I have successfully built a Lucene index from data stored in HBase using MapReduce jobs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pablo Mendes</title>
		<link>http://www.marcsturlese.com/2009/03/02/index-scalability-using-pig/comment-page-1/#comment-372</link>
		<dc:creator>Pablo Mendes</dc:creator>
		<pubDate>Mon, 19 Jul 2010 09:47:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.marcsturlese.com/?p=70#comment-372</guid>
		<description>Hi Marc,
Did you give this a shot? I&#039;m working in a research setting with tons of data, so I try many different indexing/weighting strategies, and each run takes a long time. I&#039;m already using Pig to precompute some stats, so I thought: why not have it build the index as well? :)
The main advantage for me is that Pig gives me a quick way to manipulate my input data before I feed it to the index, all of it running on a cluster.
I figured somebody must have thought of this before me. Did you get anywhere with your idea, or did you drop it for some reason?
Cheers,
Pablo</description>
		<content:encoded><![CDATA[<p>Hi Marc,<br />
Did you give this a shot? I&#8217;m working in a research setting with tons of data, so I try many different indexing/weighting strategies, and each run takes a long time. I&#8217;m already using Pig to precompute some stats, so I thought: why not have it build the index as well? <img src='http://www.marcsturlese.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
The main advantage for me is that Pig gives me a quick way to manipulate my input data before I feed it to the index, all of it running on a cluster.<br />
I figured somebody must have thought of this before me. Did you get anywhere with your idea, or did you drop it for some reason?<br />
Cheers,<br />
Pablo</p>
]]></content:encoded>
	</item>
</channel>
</rss>
