<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Marc Sturlese &#187; ElasticSearch</title>
	<atom:link href="http://www.marcsturlese.com/tag/elasticsearch/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.marcsturlese.com</link>
	<description>Life, code and stuff</description>
	<lastBuildDate>Tue, 29 Nov 2011 22:19:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>ElasticSearch</title>
		<link>http://www.marcsturlese.com/2010/02/12/elasticsearch/</link>
		<comments>http://www.marcsturlese.com/2010/02/12/elasticsearch/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 00:33:20 +0000</pubDate>
		<dc:creator>Marc Sturlese</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[ElasticSearch]]></category>

		<guid isPermaLink="false">http://www.marcsturlese.com/?p=145</guid>
		<description><![CDATA[It has been a long time since my last post. I have been very busy, so unfortunately I have not had the time to write about everything I wanted to. This week I discovered via Twitter a really interesting open source search project, ElasticSearch for the cloud. ElasticSearch was created by Shay Banon. It&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>It has been a long time since my last post. I have been very busy, so unfortunately I have not had the time to write about everything I wanted to.</p>
<p>This week I discovered via Twitter a really interesting open source search project, <strong><a title="ElasticSearch" href="http://www.elasticsearch.com/" target="_blank">ElasticSearch for the cloud</a></strong>. <strong>ElasticSearch</strong> was created by Shay Banon. It&#8217;s a RESTful search engine built on top of <strong><a title="Lucene" href="http://lucene.apache.org/java/docs/" target="_blank">Lucene</a></strong> and designed for high scalability. It includes shard merging, replication and many more features.</p>
<p>Lately I have been working a lot on search scalability, and what I like most about <strong>ElasticSearch</strong> so far is that it allows 4 different types of distributed requests.</p>
<p>The simplest one (Query and fetch) is just one request per relevant shard. Once all the requests are done, the results are merged and&#8230; that&#8217;s it!<br />
In this type of search, every shard sends all the fields of each document it returns to the merger.</p>
<p>In another search type (Query then fetch, which is not quite as simple), a first request is sent to all shards. At this point you don&#8217;t ask for the document content. Once the results are merged, you only need to ask for the full document data of the most relevant documents, the ones you want to show.<br />
If you have to search across lots of shards, that&#8217;s definitely the way to go (the merger only receives the fields of the documents that matter, which means less data is sent across the network).</p>
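<p>To make the difference between these two search types concrete, here is a minimal sketch in Python. It is not ElasticSearch code: the in-memory <code>SHARDS</code> list, the precomputed scores and both function names are assumptions made up for illustration. Each function returns the merged hits plus the number of full documents that had to be shipped to the merger.</p>

```python
import heapq

# Hypothetical in-memory "shards": doc id -> (score, stored fields).
# Scores are hard-coded here; a real engine would compute them per query.
SHARDS = [
    {"a1": (0.9, {"title": "A1", "body": "x" * 100}),
     "a2": (0.2, {"title": "A2", "body": "x" * 100})},
    {"b1": (0.7, {"title": "B1", "body": "x" * 100}),
     "b2": (0.5, {"title": "B2", "body": "x" * 100})},
]


def query_and_fetch(shards, size):
    """One round trip: every shard returns its top hits with all fields."""
    received = []
    for shard in shards:
        top = heapq.nlargest(size, shard.items(), key=lambda kv: kv[1][0])
        received.extend((score, doc_id, fields) for doc_id, (score, fields) in top)
    merged = heapq.nlargest(size, received, key=lambda hit: hit[0])
    # Every received hit carried its full fields across the network.
    return [(doc_id, fields) for _, doc_id, fields in merged], len(received)


def query_then_fetch(shards, size):
    """Two round trips: ids and scores first, then fields for the winners."""
    candidates = []
    for shard_no, shard in enumerate(shards):
        top = heapq.nlargest(size, shard.items(), key=lambda kv: kv[1][0])
        candidates.extend((score, shard_no, doc_id) for doc_id, (score, _) in top)
    winners = heapq.nlargest(size, candidates)
    # Second trip: fetch the fields of the winning documents only.
    hits = [(doc_id, shards[shard_no][doc_id][1]) for _, shard_no, doc_id in winners]
    return hits, len(hits)


hits_a, shipped_a = query_and_fetch(SHARDS, 2)
hits_b, shipped_b = query_then_fetch(SHARDS, 2)
print(shipped_a, shipped_b)
```

<p>With two shards and a page size of 2, both functions return the same two documents, but the first one ships 4 full documents to the merger and the second one only 2. The gap grows with the number of shards.</p>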
<p>Both options share a typical problem of distributed search: relevance is calculated relative to each shard, not across all of them.<br />
To solve this, in <strong>ElasticSearch</strong>, both search options can be supplemented with an initial request that collects the term frequency information needed to compute an &#8220;absolute relevance&#8221;.<br />
This does not come for free: you pay with an extra round trip (even though it can be cached). It&#8217;s better if you can avoid it. A good way to do that is at indexing time, when you decide which shard a document is added to. Choosing it randomly will more or less ensure that term frequencies don&#8217;t differ much among shards.</p>
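<p>The effect of shard-local statistics can be sketched with a few lines of Python. Again, this is only an illustration under my own assumptions: the two tiny shards, the helper names and the smoothed IDF formula are hypothetical and not what ElasticSearch or Lucene compute internally.</p>

```python
import math
from collections import Counter


def doc_freqs(docs):
    """Count in how many documents each term appears."""
    df = Counter()
    for text in docs:
        df.update(set(text.split()))
    return df


def idf(term, df, n_docs):
    """A simple smoothed inverse document frequency (illustrative formula)."""
    return math.log((1 + n_docs) / (1 + df[term])) + 1


# Two hypothetical shards where the term "lucene" is unevenly distributed.
shard_a = ["lucene search", "lucene index", "lucene shard"]
shard_b = ["lucene cloud", "rest api", "json http"]

# Shard-local weights: the same term gets a different weight on each shard.
local_a = idf("lucene", doc_freqs(shard_a), len(shard_a))
local_b = idf("lucene", doc_freqs(shard_b), len(shard_b))

# Extra initial round trip: collect per-shard statistics and sum them,
# so every shard can score with the same global weight.
global_df = doc_freqs(shard_a) + doc_freqs(shard_b)
global_idf = idf("lucene", global_df, len(shard_a) + len(shard_b))

print(local_a, local_b, global_idf)
```

<p>Here the same term weighs less on the shard where it is common and more on the shard where it is rare, so scores coming from different shards are not directly comparable. Summing the statistics first gives one weight for everybody, at the cost of the extra trip.</p>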
<p>I still have not had the chance to dig into the source, but I have already downloaded it from the <a title="ElasticSearch" href="http://github.com/elasticsearch/elasticsearch" target="_blank">git repository</a>.<br />
Anyone who wants to share experiences with <strong>ElasticSearch</strong> is more than welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.marcsturlese.com/2010/02/12/elasticsearch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

