<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Marc Sturlese &#187; Pig</title>
	<atom:link href="http://www.marcsturlese.com/tag/pig/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.marcsturlese.com</link>
	<description>Life, code and stuff</description>
	<lastBuildDate>Tue, 29 Nov 2011 22:19:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>ApacheCon Europe 2009</title>
		<link>http://www.marcsturlese.com/2009/04/01/apachecon-europe-2009/</link>
		<comments>http://www.marcsturlese.com/2009/04/01/apachecon-europe-2009/#comments</comments>
		<pubDate>Wed, 01 Apr 2009 21:55:54 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[ApacheCon]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Pig]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=80</guid>
		<description><![CDATA[Last week I had the chance to go to the ApacheCon Europe 2009. The event took place at the Mövenpick Hotel, Amsterdam. I had a really good time there. It was good to share use cases and experiences in person with people I had only spoken to in forums. I spent the first two days [...]]]></description>
			<content:encoded><![CDATA[<p>Last week I had the chance to go to the <a title="ApacheCon Europe 2009" href="http://www.eu.apachecon.com/c/aceu2009/"><strong>ApacheCon Europe 2009</strong></a>. The event took place at the Mövenpick Hotel, Amsterdam. I had a really good time there.</p>
<p>It was good to share use cases and experiences in person with people I had only spoken to in forums.<br />
I spent the first two days at the <strong>hackathon</strong> researching and testing different ASF projects, with special interest in <a title="Pig" href="http://hadoop.apache.org/pig/" target="_blank"><strong>Pig</strong></a>.</p>
<p>There were some really interesting talks. I found the <a title="Lucene mahout" href="http://lucene.apache.org/mahout/" target="_blank"><strong>Mahout</strong></a> project especially great. I had first come across it at <strong>ApacheCon</strong> 2008 in New Orleans, where I barely heard about it, but this time I paid more attention and it looks full of possibilities. It is used for machine learning and runs on top of <a title="Lucene" href="http://hadoop.apache.org/" target="_blank"><strong>Hadoop</strong></a>.<br />
It was also good to get some information about Servlet 3.0 and to learn about the doFilter method of servlet filters, among other things.<br />
<a title="HBase" href="http://hadoop.apache.org/hbase/" target="_blank"><strong>HBase</strong></a> is another project I was interested in. It looks good for use as a &#8220;data warehouse&#8221;, but working with the stored data seems really difficult (at least at first impression).</p>
<p>The meetups were very good too. There was a presentation about the new <a title="Lucene" href="http://lucene.apache.org/java/docs/" target="_blank"><strong>Lucene</strong></a> contrib <strong>TrieRangeQuery</strong>. It is not yet available in an official release, but you can use it by grabbing a nightly build. In the next few days I will try to write in more detail about this and the other projects presented.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2009/04/01/apachecon-europe-2009/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Index scalability using Pig</title>
		<link>http://www.marcsturlese.com/2009/03/02/index-scalability-using-pig/</link>
		<comments>http://www.marcsturlese.com/2009/03/02/index-scalability-using-pig/#comments</comments>
		<pubDate>Mon, 02 Mar 2009 22:37:41 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Pig]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=70</guid>
		<description><![CDATA[Here is a really interesting example of how to build an inverted index using Pig. From what I have seen in Hadoop, to create a Lucene index you must start from a text file and use MapReduce jobs to build it. Pig, however, allows you to retrieve data not just from a text file but from [...]]]></description>
			<content:encoded><![CDATA[<p>Here is a really <a title="Build index with Hadoop Pig" href="http://squarecog.wordpress.com/2009/01/17/building-an-inverted-index-with-hadoop-and-pig/" target="_blank">interesting example</a> of how to build an inverted index using <strong>Pig</strong>. From what I have seen in <strong>Hadoop</strong>, to create a <strong>Lucene index</strong> you must start from a text file and use <strong>MapReduce</strong> jobs to build it. <strong>Pig</strong>, however, allows you to retrieve data not just from a text file but from <strong>SQL databases, HBase</strong> or other data sources.</p>
<p>After looking at the example in detail, what comes to mind is whether it would be possible to create a <strong>Lucene</strong> index using <strong>Pig</strong> and <strong>MapReduce</strong> jobs that retrieve data from a distributed <strong>HBase</strong> data store&#8230; I am wondering whether there would be problems with <strong>Lucene</strong> analyzers, for example, or anything else.</p>
<p>I have read that <strong>Pig</strong> is not especially fast at accessing data. However, for indexing, this would probably be more than compensated for by the <strong>MapReduce</strong> jobs.</p>
<p>How fast would it be? I still have a lot of research and testing to do&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2009/03/02/index-scalability-using-pig/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

