<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Sematext Blog</title>
	<atom:link href="http://blog.sematext.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.sematext.com</link>
	<description>Search, Text Analytics, Natural Language Processing</description>
	<lastBuildDate>Fri, 03 Feb 2012 06:08:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.sematext.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Sematext Blog</title>
		<link>http://blog.sematext.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.sematext.com/osd.xml" title="Sematext Blog" />
	<atom:link rel='hub' href='http://blog.sematext.com/?pushpress=hub'/>
		<item>
		<title>The New SolrCloud: Overview</title>
		<link>http://blog.sematext.com/2012/02/01/solrcloud-distributed-realtime-search/</link>
		<comments>http://blog.sematext.com/2012/02/01/solrcloud-distributed-realtime-search/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 13:32:54 +0000</pubDate>
		<dc:creator>Rafał Kuć</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[cluster]]></category>
		<category><![CDATA[distributed]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[solrcloud]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=1754</guid>
		<description><![CDATA[Just the other day we wrote about Sensei, the new distributed, real-time full-text search database built on top of Lucene and here we are again writing about another &#8220;new&#8221; distributed, real-time, full-text search server also built on top of Lucene: SolrCloud. In this post we&#8217;ll share some interesting SolrCloud bits and pieces that matter mostly [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&amp;blog=7678707&amp;post=1754&amp;subd=sematext&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Just the other day we wrote about <a href="http://blog.sematext.com/2012/01/26/sensei-distributed-realtime-semi-structured-database/">Sensei</a>, the new distributed, real-time full-text search database built on top of Lucene and here we are again writing about another &#8220;new&#8221; distributed, real-time, full-text search server also built on top of Lucene: SolrCloud.</p>
<p>In this post we&#8217;ll share some interesting SolrCloud bits and pieces that matter mostly to those working with large data and query volumes, but that all search lovers should find really interesting, too.  If you have any questions about what we wrote (or did not write!) in this post, please leave a comment &#8211; we&#8217;re good at following up to comments!  Or just ask <a href="http://twitter.com/sematext">@sematext</a>!</p>
<p>Please note that functionality described in this post is now part of <em>trunk</em> in Lucene and Solr SVN repository.  This means that it will be available when Lucene and Solr 4.0 are released, but you can also use <em>trunk</em> version just like we did, if you don&#8217;t mind living on the bleeding edge.</p>
<p>Recently, we were given the opportunity to once again work on search over big data (massive may actually be more descriptive of this data volume) stored in an HBase cluster. We needed to design a scalable search cluster capable of elastically handling future data volume growth.  Because of the huge data volume and high search rates, our search system required the index to be sharded.  We also wanted indexing to be as simple as possible, and we wanted a stable, reliable, and very fast solution. The one thing we did not want to do was reinvent the wheel.  At this point you may ask why we didn&#8217;t choose <a href="http://elasticsearch.org/">ElasticSearch</a>, especially since we use ElasticSearch a lot at <a href="http://sematext.com/">Sematext</a>.  The answer is that we started the engagement with this particular client a while back, when ElasticSearch wasn&#8217;t where it is today.  And while ElasticSearch does have a number of advantages over the old master-slave Solr, with SolrCloud being in the trunk now, Solr is again a valid choice for very large search clusters.</p>
<p>And so we took the opportunity to use SolrCloud and some of its features not present in previous versions of Solr.  In particular, we wanted to make use of Distributed Indexing and Distributed Searching, both of which SolrCloud makes possible. In the process we looked at a few JIRA issues, such as <a href="https://issues.apache.org/jira/browse/SOLR-2358">SOLR-2358</a> and <a href="https://issues.apache.org/jira/browse/SOLR-2355">SOLR-2355</a>, and we got familiar with relevant portions of SolrCloud source code.  This confirmed SolrCloud would indeed satisfy our needs for the project and here we are sharing what we&#8217;ve learned.</p>
<h4>Our Search Cluster Architecture</h4>
<p>Basically, we wanted the search cluster to look like this:</p>
<div id="attachment_1807" class="wp-caption aligncenter" style="width: 640px"><a href="http://sematext.files.wordpress.com/2012/01/distributedsolr-arch.png"><img class="size-full wp-image-1807 " title="SolrCloud App Architecture" src="http://sematext.files.wordpress.com/2012/01/distributedsolr-arch.png?w=630&#038;h=335" alt="" width="630" height="335" /></a><p class="wp-caption-text">SolrCloud App Architecture</p></div>
<p>Simple? Yes, we like simple.  Who doesn&#8217;t!  But let&#8217;s peek inside that &#8220;Solr cluster&#8221; box now.</p>
<h4>SolrCloud Features and Architecture</h4>
<p>Some of the nice things about SolrCloud are:</p>
<ul>
<li>centralized cluster configuration</li>
<li>automatic node fail-over</li>
<li>near real time search</li>
<li>leader election</li>
<li>durable writes</li>
<li>&#8230;</li>
</ul>
<p>Furthermore, SolrCloud can be configured to:</p>
<ul>
<li>have multiple index shards</li>
<li>have one or more replicas of each shard</li>
</ul>
<p>Shards and Replicas are arranged into Collections. Multiple Collections can be deployed in a single SolrCloud cluster.  A single search request can search multiple Collections at once, as long as they are compatible. The diagram below shows a high-level picture of how SolrCloud indexing works.</p>
<div id="attachment_1808" class="wp-caption aligncenter" style="width: 640px"><a href="http://sematext.files.wordpress.com/2012/01/distributedsolr-shardsreplicas.png"><img class="size-full wp-image-1808" title="SolrCloud Shards, Replicas, Replication" src="http://sematext.files.wordpress.com/2012/01/distributedsolr-shardsreplicas.png?w=630&#038;h=298" alt="" width="630" height="298" /></a><p class="wp-caption-text">SolrCloud Shards, Replicas, Replication</p></div>
<p>As the above diagram shows, documents can be sent to any SolrCloud node/instance in the SolrCloud cluster.  Documents are automatically forwarded to the appropriate Shard Leader (labeled as Shard 1 and Shard 2 in the diagram) and are sent in batches between Shards. If a Shard has one or more replicas (labeled Shard 1 replica and Shard 2 replica in the diagram), each document is also replicated to them.  Unlike in traditional master-slave Solr setups where index/shard replication is performed periodically in batches, replication in SolrCloud is done in real-time.  This is how Distributed Indexing works at a high level.  We simplified things a bit, of course &#8211; for example, there is no <a href="http://search-lucene.com/?q=zookeeper&amp;fc_project=Solr">ZooKeeper</a> or <a href="http://search-lucene.com/?q=overseer&amp;fc_project=Solr">overseer</a> shown in our diagram.</p>
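<p>To make the forwarding step concrete, here is a minimal Python sketch of the idea. It is not SolrCloud code: the node names, the two-shard layout, and the MD5-modulo routing rule are assumptions made purely for illustration, and the real implementation may pick shards differently.</p>

```python
import hashlib

# Hypothetical two-shard layout; the node names are made up for illustration.
SHARDS = {
    "shard1": {"leader": "node-a", "replicas": ["node-c"]},
    "shard2": {"leader": "node-b", "replicas": ["node-d"]},
}


def pick_shard(doc_id, shard_names):
    """Map a document id to a shard with a stable hash (an assumed scheme)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return shard_names[int(digest, 16) % len(shard_names)]


def route(doc_id):
    """Return the shard name and the nodes that receive the document.

    The leader comes first, followed by its replicas, mirroring the
    "forward to the leader, then replicate" flow described above.
    """
    name = pick_shard(doc_id, sorted(SHARDS))
    shard = SHARDS[name]
    return name, [shard["leader"]] + shard["replicas"]


print(route("doc-1"))
```

<p>Whichever node receives <em>doc-1</em>, the same shard is chosen every time, which is why a client does not need to know the cluster layout.</p>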
<h4>Setup Details</h4>
<p>All configuration files are stored in ZooKeeper.  If you are not familiar with ZooKeeper you can think of it as a distributed file system where SolrCloud configuration files are stored. When the first Solr instance in a SolrCloud cluster is started, configuration files need to be sent to ZooKeeper and one needs to specify how many shards there should be in the cluster. Then, once this Solr instance/node is running, one can start additional Solr instances/nodes and point them to the ZooKeeper instance (ZooKeeper is actually typically deployed as a quorum of 3, 5, or more instances in production environments).  And voilà &#8211; the SolrCloud cluster is up!  I must say, it&#8217;s quite simple and straightforward.</p>
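<p>As a side note on why ZooKeeper ensembles usually have an odd number of instances: ZooKeeper needs a strict majority of its servers to be reachable. The short Python sketch below only does that arithmetic; the function names are ours and are not part of any ZooKeeper API.</p>

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

<p>Three instances survive one failure, and so do four, so the fourth instance buys nothing; five instances survive two.</p>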
<p>Shard Replicas in SolrCloud serve multiple purposes.  They provide fault tolerance in the sense that when (not if!) a single Solr instance/node containing a portion of the index goes down, you still have one or more replicas of the data that was served by that instance elsewhere in the cluster, and thus you still have the whole data set and no data loss.  They also allow you to spread query load over more servers, thus making the cluster capable of handling higher query rates.</p>
<h4>Indexing</h4>
<p>As you saw above, the new SolrCloud really simplifies Distributed Indexing.  Document distribution between Shards and Replicas is automatic and real-time.  There is no master server one needs to send all documents to. A document can be sent to any SolrCloud instance and SolrCloud takes care of the rest. Because of this, there is no longer a SPOF (Single Point of Failure) in Solr.  Previously, Solr master was a SPOF in all but the most elaborate setups.</p>
<h4>Querying</h4>
<p>One can query SolrCloud a few different ways:</p>
<ul>
<li>One can query a single Shard, which is just like querying a single Solr instance.</li>
<li>The second option is to query a single Collection (i.e., search all shards holding pieces of a given Collection&#8217;s index).</li>
<li>The third option is to only query some of the Shards by specifying their addresses or names.</li>
<li>Finally, one can query multiple Collections assuming they are compatible and Solr can merge results they return.</li>
</ul>
<p>As you can see, lots of choices!</p>
<h4>Administration with Core Admin</h4>
<p>In addition to the standard <a href="http://wiki.apache.org/solr/CoreAdmin">core admin</a> parameters there are some new ones available in SolrCloud. These new parameters let one:</p>
<ul>
<li>create new Shards for an existing Collection</li>
<li>create a new Collection</li>
<li>add more nodes</li>
<li>&#8230;</li>
</ul>
<h4>The Future</h4>
<p>If you look at the New SolrCloud Design wiki page (<a href="http://wiki.apache.org/solr/NewSolrCloudDesign" target="_blank">http://wiki.apache.org/solr/NewSolrCloudDesign</a>) you will notice that not all planned features have been implemented yet. Things like cluster re-balancing and monitoring (<span style="color:#ff0000;"><em><strong>if you are using SolrCloud already and want to monitor its performance, let us know if you want early access to <a href="http://sematext.com/spm/index.html"><span style="color:#ff0000;">SPM</span></a> for SolrCloud</strong></em></span>) are still to be done.  Now that SolrCloud is in the Solr trunk, it should see more user and more developer attention.  We look forward to using SolrCloud in more projects in the future!</p>
<p>&#8211; <a href="http://twitter.com/sematext">@sematext</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2012/02/01/solrcloud-distributed-realtime-search/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d05d7fbddd69c91eefaad0b9ec624111?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">kucrafal</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2012/01/distributedsolr-arch.png" medium="image">
			<media:title type="html">SolrCloud App Architecture</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2012/01/distributedsolr-shardsreplicas.png" medium="image">
			<media:title type="html">SolrCloud Shards, Replicas, Replication</media:title>
		</media:content>
	</item>
		<item>
		<title>Sensei: distributed, realtime, semi-structured database</title>
		<link>http://blog.sematext.com/2012/01/26/sensei-distributed-realtime-semi-structured-database/</link>
		<comments>http://blog.sematext.com/2012/01/26/sensei-distributed-realtime-semi-structured-database/#comments</comments>
		<pubDate>Fri, 27 Jan 2012 04:55:32 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[distributed]]></category>
		<category><![CDATA[real-time]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[sensei]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=1787</guid>
		<description><![CDATA[Once upon a time there was no decent open-source search engine.  Then, at the very beginning of this millennium Doug Cutting gave us Lucene.  Several years later Yonik Seeley wrote Solr.  In 2010 Shay Banon released ElasticSearch.  And just a few days ago John Wang and his team at LinkedIn announced Sensei 1.0.0 (also known [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&amp;blog=7678707&amp;post=1787&amp;subd=sematext&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Once upon a time there was no decent open-source search engine.  Then, at the very beginning of this millennium Doug Cutting gave us <a href="http://lucene.apache.org/java">Lucene</a>.  Several years later Yonik Seeley wrote <a href="http://lucene.apache.org/solr">Solr</a>.  In 2010 Shay Banon released <a href="http://blog.sematext.com/2010/05/03/elastic-search-distributed-lucene/">ElasticSearch</a>.  And just a few days ago John Wang and his team at LinkedIn announced <a href="http://senseidb.com/">Sensei</a> 1.0.0 (also known as SenseiDB).  Here at <a href="http://sematext.com/">Sematext</a> we&#8217;ve been aware of Sensei for a while now (2 years?) and are happy to have one more great piece of search software available for our own needs and those of our <a href="http://sematext.com/clients/index.html">customers</a>.  As a matter of fact, we are so excited about Sensei that we&#8217;ve already started hacking on adding support for Sensei to SPM, our <a href="http://sematext.com/spm/index.html">Scalable Performance Monitoring</a> tool and service!  Since Sensei is brand new, we asked John to tell us a little more about it.</p>
<p><strong>Please tell us a bit about yourself.</strong></p>
<p>My name is John Wang, and I am the search architect at LinkedIn.com. I am the creator and the current lead for the Sensei project.</p>
<p><strong>Could you describe Sensei for us?</strong></p>
<p>Sensei is an open-source, elastic, realtime, distributed database with native support for searching and navigating both unstructured text and structured data. Sensei is designed to handle complex semi-structured queries on very large, and rapidly changing datasets.</p>
<p>It was written by the content search team at LinkedIn to support LinkedIn Homepage and Signal.</p>
<p>The core engine is also used for LinkedIn search properties, e.g. people search, recruiter system, job and company search pages.</p>
<p><strong>Why did you write Sensei instead of using Lucene or Solr?</strong></p>
<p>Sensei leverages Lucene.</p>
<p>We weren’t able to leverage Solr because of the following requirements:</p>
<ul>
<li>High update rate requirement: tens of thousands of updates per second into the system</li>
<li>A truly distributed solution: Solr’s current distributed story has a SPOF at the master, and Solr Cloud is not yet completed.</li>
<li>Complex faceting support. Not just your standard terms based faceting. We needed to facet on social graph, dynamic time ranges and many other interesting faceting scenarios. Faceting behavior also needs to be highly customizable, which is not available via Solr.</li>
</ul>
<p><strong>What does Sensei do that existing open-source search solutions don’t provide?</strong></p>
<p>Consider Sensei if your application has the following characteristics:</p>
<ul>
<li>High update rates</li>
<li>Non-trivial semi-structured query support</li>
</ul>
<p><strong>Who should stick with Solr or ElasticSearch instead of using Sensei?</strong></p>
<p>The feature sets, as well as the limitations, of all these systems don’t overlap fully. Depending on your application, if you are building on certain features in one system and it is working out, then I would suggest you stick with it. But for Sensei, our philosophy is to consider performance ahead of features.</p>
<p><strong>Are there currently any Sensei users other than LinkedIn that you are aware of?</strong></p>
<p>We have seen some activities on the mailing list indicating deployments outside of LinkedIn, but I don’t know the specifics. This is a new project and we are keeping track of its usage on <a href="http://senseidb.github.com/sensei/usage.html">http://senseidb.github.com/sensei/usage.html</a>, so let us know if you are using Sensei and want to be listed there.</p>
<p><strong>What are Sensei’s biggest weaknesses and how and when do you plan on addressing them?</strong></p>
<p>Let me address this question by providing a few limitations of Sensei:</p>
<ul>
<li>Each document inserted into Sensei must have a unique identifier (UID) of type long. Though this can be inconvenient, this is a decision made for performance reasons. We have no immediate plans for addressing this, but we are thinking about it.</li>
<li>For columns defined in numeric format, e.g. int, float, long&#8230;, we don’t yet support negative numbers. We will have support for negative numbers very soon.</li>
<li>Static schema. Dynamic schema is something we find useful, and we will support it in the near future.</li>
</ul>
<p><strong>What’s next for Sensei as a project?</strong></p>
<p>We will continue iterating on Sensei on both the performance and feature front. See below for a list of things we are looking at.</p>
<p><strong>What are some of the key features you plan on adding to Sensei in the coming months?</strong></p>
<p>This may not be a comprehensive list, but it gives you an idea of the areas we are working on:</p>
<ul>
<li>Relevance toolkit</li>
<li>Built-in time and geo type columns</li>
<li>Parent-node type documents</li>
<li>Attribute type faceting (name-value pairs)</li>
<li>Online rebalancing</li>
<li>Online reindexing</li>
<li>Parameter secondary store (e.g. activities on a document, social gestures, etc.)</li>
<li>Dynamic schemata</li>
<li>Support for aggregation functions, e.g. AVG, MIN, MAX, etc.</li>
</ul>
<p><strong>The Relevance toolkit sounds interesting.  Could you tell us a bit about what this will do and if, by any chance, this might in any way be close to the idea behind Apache Open Relevance?</strong></p>
<p>This is a big feature for 1.1.0. I am not familiar enough with Open Relevance to comment. The idea behind the relevance toolkit is to allow you to specify a model with the query. One important usage for us is to be able to perform relevance tuning against fast-flowing production data.  Waiting for things to be redeployed to production after relevance model changes does not work if the production data is changing in real-time, like tweets.</p>
<p>Maybe some specific tech questions &#8211; feel free to skip the ones that you think are not important or you just don’t feel like answering.</p>
<p><strong>What is the role of Norbert in Sensei?</strong></p>
<p>Sensei currently uses Norbert, whose maintainer is one of our main developers, as a cluster manager and as the RPC layer between a Broker and Sensei nodes. A Broker is a servlet embedded in each Sensei node. Norbert is used as a message transport to Sensei nodes. Norbert is also an elegant wrapper around Zookeeper for cluster management. We do have plans to create an abstraction around this component to allow pluggability for other cluster managers.</p>
<p><strong>When I first saw SQL-like query on Sensei’s home page I thought it was purely for illustration purposes, but now I realize it is actually very real!</strong></p>
<p>BQL &#8211; Browse Query Language &#8211; is a SQL variant for querying Sensei. It is very real, and we plan for BQL to be a standard way to query Sensei.</p>
<p><strong>Can you share with us any Sensei performance numbers?</strong></p>
<p>We have published some performance numbers at <a href="http://senseidb.com/performance.html">http://senseidb.com/performance.html</a></p>
<p>We have created a separate Github repository containing all our performance evaluation code at:</p>
<p><a href="https://github.com/kwei/search-perf">https://github.com/kwei/search-perf</a></p>
<p><strong>Does Sensei have a SPOF (Single Point Of Failure)?</strong></p>
<p>No &#8211; assuming a Sensei cluster contains more than 1 replica of each document. This is one important design goal of Sensei: every Sensei node in the cluster acts independently in both consuming data as well as handling queries. See the following answers for details.</p>
<p><strong>What has to happen for data loss to occur?</strong></p>
<p>Data loss occurs only if you have data store corruption on all replicas.  If only 1 replica is corrupted, you can always recover from other replicas.</p>
<p>Sensei by design assumes a data source that is ordered and versioned, e.g., a persistent queue. Each Sensei node persists the version for each commit. Thus, to recover, data events can be replayed from that version.</p>
<p>In production at LinkedIn, this is very handy to ensure data consistency when bouncing nodes.</p>
<p><strong>You mention recovery from other replicas and recovery by replaying data events from a specific version.  Does that mean once a copy of a document makes it into Sensei in order to recover lost replicas for that document Sensei does not need to reach out to the originator of the data and is self-sufficient, so to speak?  Or does replaying mean getting the lost data from an external data store?</strong></p>
<p>The data stream is external. So to catch-up from an older version, Sensei would just re-play the data events from the stream using this version. But if an entire data replica is lost, a manual copy from other replicas is required (for now).</p>
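<p>A toy Python sketch of this replay idea follows. It is our illustration rather than Sensei code: the <em>Node</em> class, the event tuples, and the in-memory list standing in for the external stream are all assumptions.</p>

```python
import bisect


class Node:
    """Toy index node that remembers the version of its last commit."""

    def __init__(self):
        self.docs = {}
        self.committed_version = 0

    def apply(self, event):
        """Index one (version, uid, doc) event and record its version."""
        version, uid, doc = event
        self.docs[uid] = doc
        self.committed_version = version


def catch_up(node, stream):
    """Replay every event newer than the node's last committed version.

    The stream is assumed to be ordered by version, so a binary search
    finds the first event the node has not seen yet.
    """
    versions = [event[0] for event in stream]
    start = bisect.bisect_right(versions, node.committed_version)
    for event in stream[start:]:
        node.apply(event)
    return len(stream) - start


# A stand-in for the external, ordered, versioned data stream.
stream = [(1, 10, "a"), (2, 11, "b"), (3, 10, "c")]

node = Node()
node.apply(stream[0])          # the node went down after version 1
print(catch_up(node, stream))  # number of events replayed on restart
```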
<p><strong>What happens if a node in a cluster fails?</strong></p>
<p>When a node fails, Zookeeper notifies the cluster event listeners in the cluster, which means the Brokers. Each Broker keeps the state of the current cluster node topology, and subsequent queries will be routed to the live replicas, thus avoiding sending requests to the failed node. If all nodes for one replica are down, then partial results are returned.</p>
<p><strong>What happens when the cluster reaches its capacity?  Can one simply add more nodes to the cluster and Sensei auto-magically takes care of the rest or does one have to manually rebalance shards, or&#8230;?</strong></p>
<p>Depending on how data is sharded:</p>
<p>If the over-sharding technique is used, then adding nodes to the cluster is trivial. New nodes would just specify which shards they want to handle &#8211; every node indicates in sensei.properties which partitions it should handle, e.g.,<em> sensei.node.partitions=1,3,8</em></p>
<p>If using a sharding strategy where data migration is not needed as new data is flowing into the system, e.g., sharding by time or consecutive UID, then expanding the cluster is also trivial.</p>
<p>If the sharding strategy requires data migration, e.g. mod-based sharding, then cluster rebalancing is required. This is coming in a future release, where we already have designs for online data rebalancing.  For now, one has to reindex all data in order to reshard and rebalance.</p>
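<p>To see why mod-based sharding forces data migration, consider this small Python sketch. The UID range and shard counts are made-up numbers for illustration only.</p>

```python
def moved_fraction(uids, old_shards, new_shards):
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(1 for uid in uids if uid % old_shards != uid % new_shards)
    return moved / len(uids)


# Growing a mod-sharded cluster from 4 to 5 shards.
print(moved_fraction(range(1000), 4, 5))
```

<p>With these numbers, 80% of the documents land on a different shard after the change, which is why a full reindex (or online rebalancing) is needed.</p>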
<p><strong>Since Sensei is an eventually consistent system, how does one ensure the search client gets consistent results (e.g. when paging through results or filtering results with facets)?</strong></p>
<p>On the Sensei request object, there is a routing parameter. Typically this routing parameter is set to the value of the search session. By default, Sensei applies consistent hashing on the routing parameter to make sure the same replica is used for queries with the same routing parameter or search session.</p>
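<p>For readers unfamiliar with consistent hashing, here is a generic Python sketch of the technique. It is not Sensei&#8217;s implementation: the <em>ReplicaRing</em> class, the replica names, the MD5 hash, and the number of points per replica are all assumptions for illustration.</p>

```python
import bisect
import hashlib


def _hash(value):
    """Stable integer hash of a string."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ReplicaRing:
    """Consistent-hash ring that maps a routing parameter to a replica."""

    def __init__(self, replicas, points_per_replica=50):
        # Each replica is placed on the ring at several pseudo-random points.
        self._ring = sorted(
            (_hash("{}#{}".format(replica, i)), replica)
            for replica in replicas
            for i in range(points_per_replica)
        )
        self._keys = [point for point, _ in self._ring]

    def pick(self, routing_param):
        """Return the replica owning the first ring point after the key's hash."""
        position = bisect.bisect_right(self._keys, _hash(routing_param))
        return self._ring[position % len(self._ring)][1]


ring = ReplicaRing(["replica-1", "replica-2", "replica-3"])
print(ring.pick("session-42"))
print(ring.pick("session-42"))  # same session, same replica
```

<p>Because the same session id always hashes to the same point on the ring, every page of results for that session is served by the same replica. If a replica disappears, only the sessions that were mapped to it get re-routed.</p>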
<p><strong>How does one upgrade Sensei? Can one perform an online upgrade of a Sensei cluster without shutting down the whole cluster? Are new versions of Sensei backwards compatible?</strong></p>
<p>Yes, subsets of the cluster can be shut down, and dynamic routing via Zookeeper would take place. This is very useful when we are pushing out new builds in canary mode to ensure stability and compatibility.</p>
<p><strong>Does Sensei require a schema or is it schemaless?</strong></p>
<p>Sensei requires a schema. But we do have plans to make Sensei schema dynamic like ElasticSearch.</p>
<p><strong>Does Sensei have support for things like Spatial Search, Function Queries, Parent-Child data, or JOIN?</strong></p>
<p>We have in the works a relevance toolkit which should cover features of Solr’s Function Queries.</p>
<p>We also have plans to support Spatial Search and Parent-Child data.</p>
<p>We don’t have immediate plans to support Joins.</p>
<p><strong>How does one talk to Sensei?  Are there existing client libraries?</strong></p>
<p>The Sensei cluster exposes 2 REST end-points: REST/JSON and BQL.<br />
The packaging also includes Java and Python clients (also Ruby if resourcing works out), along with JavaScript helpers for using the REST/JSON API in a web application.</p>
<p><strong>Does Sensei have an administrative/management UI?</strong></p>
<p>Sensei comes with a web application for helping with building queries against the cluster. We use it to tweak relevance models as well as instrumenting an online cluster.</p>
<p>JMX is also exposed to administer the cluster.</p>
<p>In the configuration users can turn on other types of data reporting to other clusters, e.g. RRD, log etc.</p>
<p>Big thank you to John and his team for releasing and open-sourcing Sensei and for taking the time to answer all our questions.</p>
<p>&#8211; <a href="http://twitter.com/sematext">@sematext</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2012/01/26/sensei-distributed-realtime-semi-structured-database/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>
	</item>
		<item>
		<title>Sematext Year 2011 in Review</title>
		<link>http://blog.sematext.com/2012/01/09/sematext-year-2011-in-review/</link>
		<comments>http://blog.sematext.com/2012/01/09/sematext-year-2011-in-review/#comments</comments>
		<pubDate>Mon, 09 Jan 2012 05:16:43 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[2011]]></category>
		<category><![CDATA[sematext]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=1719</guid>
		<description><![CDATA[2011 was a good year for Sematext. Here are some highlights. Products In 2011, we&#8217;ve released several new versions of our popular AutoComplete, Key Phrase Extractor, and DYM ReSearcher products and have witnessed a number of organizations adopting them. SaaS After months of hard work, we&#8217;ve opened up our Search Analytics and Performance Monitoring services [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&amp;blog=7678707&amp;post=1719&amp;subd=sematext&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>2011 was a good year for Sematext. Here are some highlights.</p>
<h3>Products</h3>
<p>In 2011, we&#8217;ve released several new versions of our popular <a href="http://sematext.com/products/autocomplete/index.html">AutoComplete</a>, <a href="http://sematext.com/products/key-phrase-extractor/index.html">Key Phrase Extractor</a>, and <a href="http://sematext.com/products/dym-researcher/index.html">DYM ReSearcher</a> products and have witnessed a number of organizations adopting them.</p>
<h3>SaaS</h3>
<p>After months of hard work, we&#8217;ve opened up our <a href="http://sematext.com/search-analytics/index.html">Search Analytics</a> and <a href="http://sematext.com/spm/index.html">Performance Monitoring</a> services to the public. Anyone can sign up for an account and use either or both of these services for free.  <strong><span style="color:#ff6600;">Yes, both services are completely free now and can be used without any restrictions</span></strong>.</p>
<h3>Services &#8211; Tech Support</h3>
<p>In addition to our standard <a href="http://sematext.com/services/index.html">consulting services</a> we&#8217;ve successfully started offering <a href="http://sematext.com/services/tech-support.html">commercial tech support</a> for Lucene and Solr.  These annual subscriptions come in several different packages and are made for those running Lucene or Solr in production and wanting immediate access to Lucene and Solr experts when things go awry.</p>
<h3>Services &#8211; Consulting Packages</h3>
<p>For those who need long-term access to Lucene or Solr experts, we started offering several levels of <a href="http://sematext.com/services/consulting-support.html">consulting support packages</a>.  What makes these packages attractive is that one gets immediate access to Lucene or Solr expert consultants while paying a lower rate in exchange for an annual commitment.</p>
<h3>Conferences</h3>
<p>We attended and presented at a number of <a href="http://blog.sematext.com/tag/conference/">conferences</a> (slides and videos) in 2011 &#8211; Lucene Revolution in San Francisco in May, Berlin Buzzwords in June, Lucene Eurocon in Barcelona in October, and Enterprise Search Summit Fall in Washington, DC in November.</p>
<h3>Open Source</h3>
<p>During our work on <a href="http://sematext.com/search-analytics/index.html">Search Analytics</a> and <a href="http://sematext.com/spm/index.html">Performance Monitoring</a> and specifically the parts of them that use HBase, we&#8217;ve forked a couple of <a href="http://sematext.com/open-source/index.html">open-source projects</a> that we put up on <a href="http://github.com/sematext">GitHub</a>.  We are looking at open-sourcing a few other things in 2012.  We&#8217;ve also contributed patches to Flume, Solr, and HBase, and while in Berlin in June we took part in our first HBase Hackathon.</p>
<h3>Team Growth</h3>
<p>Our team has roughly doubled in size.  Our people are now in 3 different time zones and 6 countries and we are looking to expand this further.  We are actively hiring and have a number of <a href="http://sematext.com/about/jobs.html">open positions</a>, from mobile development and design to system administration, sales and marketing.  Of course, we are always looking for search and big data experts.  The team didn&#8217;t grow only in number, but also in knowledge and expertise &#8211; we now have 2 Cloudera Certified Hadoop developers on staff, 2 published book authors, a recent Machine Learning and Artificial Intelligence Stanford online class &#8220;graduate&#8221;, etc.</p>
<h3>Collaboration with Academia</h3>
<p>We have partnered with a university lab on the other side of the Atlantic and have been collaborating on some interesting projects, the results of which we&#8217;ll share in 2012.</p>
<p>&nbsp;</p>
<p>Overall, 2011 was a good year.  But we are in 2012 now and it&#8217;s time to look ahead.  Looking at our crystal ball, I see more fun work and more opportunities.  We&#8217;ll do our best to make the most of them.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2012/01/09/sematext-year-2011-in-review/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>
	</item>
		<item>
		<title>Relevance Tuning and Competitive Advantage via Search Analytics</title>
		<link>http://blog.sematext.com/2012/01/06/relevance-tuning-and-competitive-advantage-via-search-analytics/</link>
		<comments>http://blog.sematext.com/2012/01/06/relevance-tuning-and-competitive-advantage-via-search-analytics/#comments</comments>
		<pubDate>Fri, 06 Jan 2012 17:00:25 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=1658</guid>
		<description><![CDATA[Here are two cool things about Search Analytics that I&#8217;d like to point out.  The slides are stolen from our Search Analytics presentation at Enterprise Search Summit 2011 in Washington DC. Search Analytics for A/B testing, relevance tuning and improvements This slide shows how Search Analytics can be used to help with A/B testing.  Concretely, in this [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&amp;blog=7678707&amp;post=1658&amp;subd=sematext&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Here are two cool things about Search Analytics that I&#8217;d like to point out.  The slides are stolen from our <a href="http://www.slideshare.net/sematext/search-analytics-at-enterprise-search-summit-fall-2011">Search Analytics presentation at Enterprise Search Summit 2011</a> in Washington DC.</p>
<h3>Search Analytics for A/B testing, relevance tuning and improvements</h3>
<p>This slide shows how <a href="http://sematext.com/search-analytics/index.html">Search Analytics</a> can be used to help with A/B testing.  Concretely, in this slide we see two Solr Dismax handlers selected on the right side.  If you are not familiar with Solr, think of a Dismax handler as an API that search applications call to execute searches.  In this example, each Dismax handler is configured differently and thus each of them ranks search hits slightly differently.  On the graph we see the MRR (see Wikipedia page for <a href="http://en.wikipedia.org/wiki/Mean_reciprocal_rank">Mean Reciprocal Rank</a> details) for both Dismax handlers and we can see that the one corresponding to the <strong>blue line is performing much better</strong>.  That is, users are clicking on search hits closer to the top of the search results page, which is one of several signals of this Dismax handler providing better relevance ranking than the other one.  Once you have a system like this in place you can add more Dismax handlers and compare 2 or more of them at a time.  As a result, <strong>with the help of Search Analytics you get actual, real feedback about any changes you make to your search engine</strong>.  Without a tool like this, you cannot really tune your search engine&#8217;s relevance well and will be doing it blindly.</p>
<div id="attachment_1654" class="wp-caption alignnone" style="width: 640px"><a href="http://sematext.files.wordpress.com/2011/11/sa-mrr-ab-testing-markup-small.png"><img class="size-full wp-image-1654" title="A/B Testing with Search Analytics" src="http://sematext.files.wordpress.com/2011/11/sa-mrr-ab-testing-markup-small.png?w=630&#038;h=167" alt="A/B Testing with Search Analytics" width="630" height="167" /></a><p class="wp-caption-text">A/B Testing with Search Analytics</p></div>
<p><em><strong>Note:</strong> while in this slide we see two Solr Dismax handlers, <strong>Sematext Search Analytics is search vendor agnostic</strong> - the same thing can be done with search powered by FAST/Microsoft Search, Attivio, Endeca/Oracle, Autonomy/HP, Vivisimo, Dieselpoint, Coveo, ElasticSearch, vanilla Lucene, Xapian, or Sphinx, or &#8230;</em></p>
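<p>To make the MRR comparison above concrete, here is a minimal sketch of how the metric could be computed from click logs. The click data and the handler names are made up for illustration, and this is not how Sematext Search Analytics itself is implemented.</p>

```python
def mean_reciprocal_rank(click_ranks):
    """Return the MRR for a list of 1-based ranks of the first clicked hit.

    A rank of None means the user clicked nothing; it contributes 0.
    """
    if not click_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in click_ranks if rank is not None)
    return total / len(click_ranks)


# Hypothetical click logs: rank of the first clicked hit, one entry per query.
handler_a_clicks = [1, 2, 1, 4, None]
handler_b_clicks = [3, 5, 2, None, None]

mrr_a = mean_reciprocal_rank(handler_a_clicks)
mrr_b = mean_reciprocal_rank(handler_b_clicks)
print(f"handler A MRR: {mrr_a:.3f}")
print(f"handler B MRR: {mrr_b:.3f}")
```

<p>A higher value means users tend to click hits nearer the top of the results page, which is the signal the graph above plots per handler.</p>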
<h3>Gaining Competitive Advantage with Search Analytics</h3>
<p>As you can see, the only way to fix or improve things, and in this case we are talking about various aspects of search experience, is by having something with which you can measure this search experience and your changes.  You need something to tell you when it&#8217;s time to improve things, and you need something that gives you feedback about your changes: Did the key metrics improve after your changes?  If so, by how much?  Did any metrics degrade? etc.</p>
<div id="attachment_1655" class="wp-caption alignnone" style="width: 640px"><a href="http://sematext.files.wordpress.com/2011/11/sa-key-takeaways.png"><img class="size-full wp-image-1655" title="Search Analytics Key Takeaways" src="http://sematext.files.wordpress.com/2011/11/sa-key-takeaways.png?w=630&#038;h=517" alt="Search Analytics Key Takeaways" width="630" height="517" /></a><p class="wp-caption-text">Search Analytics Key Takeaways</p></div>
<p>I can&#8217;t emphasize enough how important Search Analytics is and how few organizations use it or use it well and consistently.  While this may be a bit mind-boggling for those of us who live and breathe search, from your perspective this is a great thing &#8211; it means that <strong>if you are smart about using Search Analytics to improve your search engine and your users&#8217; search experience, you will gain competitive advantage</strong> and be ahead of your competitors who still don&#8217;t have or don&#8217;t use <a href="http://sematext.com/search-analytics/index.html">Search Analytics</a>!</p>
<p>If you have any questions or feedback about Search Analytics in general, please leave a comment and we&#8217;ll follow up as soon as possible!</p>
<p>&#8211; <a href="http://twitter.com/sematext">@sematext</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2012/01/06/relevance-tuning-and-competitive-advantage-via-search-analytics/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2011/11/sa-mrr-ab-testing-markup-small.png" medium="image">
			<media:title type="html">A/B Testing with Search Analytics</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2011/11/sa-key-takeaways.png" medium="image">
			<media:title type="html">Search Analytics Key Takeaways</media:title>
		</media:content>
	</item>
		<item>
		<title>Hadoop 1.0.0 &#8211; Extra Notes</title>
		<link>http://blog.sematext.com/2012/01/04/hadoop-1-0-0-extra-notes/</link>
		<comments>http://blog.sematext.com/2012/01/04/hadoop-1-0-0-extra-notes/#comments</comments>
		<pubDate>Wed, 04 Jan 2012 06:01:11 +0000</pubDate>
		<dc:creator>Alex Baranau</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[hadoop]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=1733</guid>
		<description><![CDATA[The big Hadoop 1.0.0 release has arrived.  The general notes about releases from the dev team include: security Better support for HBase (append/hsynch/hflush, and security) webhdfs (with full support for security) performance enhanced access to local files for HBase other performance enhancements, bug fixes, and features You can also find the complete release notes here and see [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&amp;blog=7678707&amp;post=1733&amp;subd=sematext&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The big Hadoop 1.0.0 release has arrived.  The <a href="http://hadoop.apache.org/common/releases.html#News">general notes</a> about releases from the dev team include:</p>
<ul>
<li>security</li>
<li>Better support for HBase (append/hsync/hflush, and security)</li>
<li>webhdfs (with full support for security)</li>
<li>performance enhanced access to local files for HBase</li>
<li>other performance enhancements, bug fixes, and features</li>
</ul>
<p>You can also find the complete release notes <a href="http://hadoop.apache.org/common/docs/r1.0.0/releasenotes.html">here</a> and see all fixes, improvements and new features included in the release. To save you time, please find below additional information about some of the items that attracted our attention from the Hadoop 1.0.0 release.</p>
<h5>Cluster Management Optimizations</h5>
<p><a href="https://issues.apache.org/jira/browse/HADOOP-7728" target="_blank">HADOOP-7728</a> &#8211; <span style="color:#993300;">hadoop-setup-conf.sh should be modified to enable task memory manager</span><br />
Adds options to manage memory usage by MR tasks. In particular, this allows setting the maximum memory usage for map and reduce tasks separately.</p>
<h5>Performance Improvements</h5>
<p><a href="https://issues.apache.org/jira/browse/HDFS-2246" target="_blank">HDFS-2246</a> &#8211; <span style="color:#993300;">Shortcut a local client reads to a Datanodes files directly</span><br />
This is a short-term solution for the issue <a href="https://issues.apache.org/jira/browse/HDFS-347" target="_blank">HDFS-347</a> &#8220;DFS read performance suboptimal when client co-located on nodes with data&#8221;, which is a hot topic in the Hadoop dev community these days. <strong>NOTE: by default this optimization is switched off</strong> (or <a href="https://issues.apache.org/jira/browse/HADOOP-7804">is it</a>? <strong>Update</strong>: it is <strong>not</strong>, see the <a href="http://blog.sematext.com/2012/01/04/hadoop-1-0-0-extra-notes/#respond">comments</a>) so some config adjustments are required to benefit from it. And you will definitely want to benefit from it: <a href="https://twitter.com/#!/otisg/statuses/144905733367009281">some reported</a> a twofold I/O performance improvement. Also highly recommended for HBase users.</p>
<p><a href="https://issues.apache.org/jira/browse/HDFS-895" target="_blank">HDFS-895</a> &#8211; <span style="color:#993300;">Allow hflush/sync to occur in parallel with new writes to the file</span><br />
Previously, if an hflush/sync was in progress, an application could not write data to the HDFS client buffer. Again, we stress this improvement for HBase users, as it increases the write throughput of the transaction log in HBase.</p>
<p><a href="https://issues.apache.org/jira/browse/MAPREDUCE-2494" target="_blank">MAPREDUCE-2494</a> &#8211; <span style="color:#993300;">Make the distributed cache delete entires using LRU priority</span><br />
When a certain threshold was reached and the distributed cache was being purged, the previous implementation deleted all entries that were not currently in use. With the new code, more hot data can be left in the cache (the percentage is configurable), which decreases cache misses.</p>
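<p>The LRU purge idea can be sketched roughly as follows. This is a toy illustration, not Hadoop code: the class name <em>LruCache</em> and the <em>keep_fraction</em> parameter are hypothetical, and the sketch ignores the "currently in use" check that the real distributed cache performs.</p>

```python
from collections import OrderedDict


class LruCache:
    """Toy size-bounded cache that purges least recently used entries first."""

    def __init__(self, max_size, keep_fraction):
        self.max_size = max_size            # purge once the total size exceeds this
        self.keep_fraction = keep_fraction  # share of max_size kept after a purge
        self.entries = OrderedDict()        # key -> size, least recently used first

    def access(self, key, size):
        # Re-inserting moves the key to the "most recently used" end.
        self.entries.pop(key, None)
        self.entries[key] = size
        if sum(self.entries.values()) > self.max_size:
            self.purge()

    def purge(self):
        # Drop the coldest entries until the cache fits the configured share.
        target = self.max_size * self.keep_fraction
        while self.entries and sum(self.entries.values()) > target:
            self.entries.popitem(last=False)


cache = LruCache(max_size=100, keep_fraction=0.8)
cache.access("a", 30)
cache.access("b", 30)
cache.access("a", 30)   # "a" becomes hot again, "b" is now the coldest entry
cache.access("c", 50)   # total is 110, so a purge runs and evicts "b"
print(list(cache.entries))
```

<p>Compared with deleting every unused entry, only the coldest data is dropped, so the recently used entries survive the purge.</p>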
<h5>New Features</h5>
<p><a href="https://issues.apache.org/jira/browse/HDFS-2316" target="_blank">HDFS-2316</a> -<span style="color:#993300;"> [umbrella] WebHDFS: a complete FileSystem implementation for accessing HDFS over HTTP</span><br />
Allows accessing HDFS over HTTP (read &amp; <strong>write</strong>)</p>
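<p>As a rough sketch of what "HDFS over HTTP" looks like, the snippet below builds WebHDFS REST URLs of the form <em>/webhdfs/v1/PATH?op=OPERATION</em>. The host name is a placeholder, port 50070 is assumed to be the NameNode HTTP port of your cluster, and the helper function is ours, not part of Hadoop. The snippet only constructs URLs and does not contact a cluster.</p>

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for the given HDFS path and operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Read a file (sent as an HTTP GET) and create a directory (sent as an HTTP PUT).
read_url = webhdfs_url("namenode.example.com", 50070, "/hbase/README.txt", "OPEN")
mkdir_url = webhdfs_url("namenode.example.com", 50070, "/tmp/new-dir", "MKDIRS")
print(read_url)
print(mkdir_url)
```

<p>Any HTTP client can then issue the request, which is what makes WebHDFS convenient for tools that cannot use the Java HDFS client.</p>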
<p><a href="https://issues.apache.org/jira/browse/MAPREDUCE-3169" target="_blank">MAPREDUCE-3169</a> &#8211; <span style="color:#993300;">Create a new MiniMRCluster equivalent which only provides client APIs cross MR1 and MR2</span><br />
A cleaner MR1- and MR2-compatible API for a mini MR cluster, to be used in unit tests.</p>
<p><a href="https://issues.apache.org/jira/browse/HADOOP-7710" target="_blank">HADOOP-7710</a> &#8211; <span style="color:#993300;">Create a script to setup application in order to create root directories for application such hbase, hcat, hive etc</span><br />
Similar to the hadoop-setup-user script, a hadoop-setup-applications script was added to set up root directories for apps to write to (/hbase, /hive, etc.)</p>
<p>Enjoy Hadoop 1.0.0 and we hope you found this quick summary useful!</p>
<p>&#8211; <a href="http://twitter.com/sematext">@sematext</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2012/01/04/hadoop-1-0-0-extra-notes/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cab7d4c69f7d86484454927e392600e0?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">abaranau</media:title>
		</media:content>
	</item>
		<item>
		<title>Lucene &amp; Solr Year 2011 in Review</title>
		<link>http://blog.sematext.com/2011/12/21/lucene-solr-year-2011-in-review/</link>
		<comments>http://blog.sematext.com/2011/12/21/lucene-solr-year-2011-in-review/#comments</comments>
		<pubDate>Thu, 22 Dec 2011 04:59:17 +0000</pubDate>
		<dc:creator>Rafał Kuć</dc:creator>
				<category><![CDATA[Lucene Ecosystem Digest]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=1692</guid>
		<description><![CDATA[The year 2011 is coming to an end and it&#8217;s time to reflect on the past 12 months.  Without further fluff, let&#8217;s look back and summarize all significant events that happened in Lucene and Solr world over the course of last dozen months. In the next few paragraphs we&#8217;ll go over major changes in Lucene [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&amp;blog=7678707&amp;post=1692&amp;subd=sematext&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The year 2011 is coming to an end and it&#8217;s time to reflect on the past 12 months.  Without further fluff, let&#8217;s look back and summarize all significant events that happened in the Lucene and Solr world over the course of the last dozen months. In the next few paragraphs we&#8217;ll go over major changes in Lucene and Solr, new blood, relevant conferences and books.</p>
<p>We should start by pointing out that this year Apache Lucene celebrated its 10 year anniversary as an Apache Software Foundation project.  Lucene itself is actually over 10 years old.  <a href="http://twitter.com/otisg">Otis</a> is one of the very few people from the early years who is still around.  While we didn&#8217;t celebrate any Solr anniversaries this year, we should note that Solr, too, has been around for quite a while and is in fact approaching its 6th year at ASF!</p>
<p>This year saw numerous changes and additions both in Lucene and Solr.  As a matter of fact, we&#8217;d venture to say we saw more changes in Lucene &amp; Solr this year than in any one year before.  In that sense, both projects are very much like wine &#8211; getting better with time. Let&#8217;s take a look at a few of the most significant changes in 2011.</p>
<p>The much anticipated Near Real-Time search (<a href="http://search-lucene.com/?q=NRT">NRT</a>) functionality has arrived.  What this means is that documents that were just added to a Lucene/Solr index can immediately be made visible in search results.  This is big!  Of course, work on NRT is still in progress, but NRT is ready and you, like a number of our clients, should start using it.</p>
<p><a href="http://wiki.apache.org/solr/FieldCollapsing">Field Collapsing</a> was one of the most watched and voted for JIRA issues for many months.  This functionality was implemented this year and now Lucene and Solr users can group results on the basis of a field or a query. In addition, you can control the groups and even calculate facets on the groups rather than on single documents. A rather powerful feature.</p>
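<p>Conceptually, grouping by a field works along the lines of the sketch below. It is a simplified in-memory illustration with made-up documents, not the Lucene/Solr implementation. In Solr itself, grouping is requested with query parameters such as <em>group=true</em>, <em>group.field</em> and <em>group.limit</em>.</p>

```python
from collections import OrderedDict


def group_results(hits, field, group_limit=1):
    """Collapse a ranked hit list into groups keyed by the value of `field`.

    Groups appear in the order of their best-ranked hit, and each group
    keeps at most `group_limit` hits.
    """
    groups = OrderedDict()
    for hit in hits:  # hits are assumed to be sorted by relevance already
        bucket = groups.setdefault(hit[field], [])
        if group_limit > len(bucket):
            bucket.append(hit)
    return groups


# Hypothetical ranked search results, best hit first.
hits = [
    {"id": 1, "site": "a.com", "score": 9.1},
    {"id": 2, "site": "b.com", "score": 8.0},
    {"id": 3, "site": "a.com", "score": 7.5},
    {"id": 4, "site": "a.com", "score": 6.0},
    {"id": 5, "site": "c.com", "score": 5.0},
]

grouped = group_results(hits, "site", group_limit=2)
for site, docs in grouped.items():
    print(site, [doc["id"] for doc in docs])
```

<p>With a limit of 2 hits per group, one dominant site can no longer fill the whole first page of results.</p>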
<p>From a Lucene user&#8217;s perspective it is also worth noting that Lucene finally got a <a href="https://issues.apache.org/jira/browse/LUCENE-3079">faceting module</a>.  Until now, faceting was available only in Solr.  If you are a pure Lucene user, you no longer need Solr to calculate facets.</p>
<p>In the past, modeling parent-child relationships in Lucene and Solr indices was not really possible &#8211; one had to flatten everything.  No longer &#8211; if you need to model a parent-child relationship in your index you can use the <a href="http://wiki.apache.org/solr/Join">Join</a> contrib module.  This Join functionality lets you join parent and child documents at query-time, while relying on some assumptions about how documents were indexed.</p>
<p>Good and broad <a href="http://wiki.apache.org/solr/LanguageAnalysis">language support</a> is hugely important for any search solution and this year was good for Lucene and Solr in that department: the <a href="http://wiki.apache.org/solr/LanguageAnalysis#Notes_about_solr.KStemFilterFactory">KStemFilter</a> English stemmer was added, full Unicode 4 support was added, new Japanese and Chinese support was added, a new stemmer-protection mechanism was added, work on reducing synonym filter RAM consumption was done, etc.  Another big addition was <a href="http://wiki.apache.org/solr/LanguageAnalysis#Notes_about_solr.HunspellStemFilterFactory">integration with Hunspell</a>, which enables language-specific processing for all languages supported by Open Office.  That&#8217;s a lot of new languages we can now handle with Lucene and Solr! There is more.</p>
<p>Lucene 3.5.0 significantly reduced the term dictionary memory footprint. Big!  Lucene now uses 3 to 5 times less memory when dealing with the term dictionary, so it consumes even less RAM.</p>
<p>If you use Lucene and need to page through a lot of results, you may run into problems. That&#8217;s why Lucene 3.5.0 introduced the <em>searchAfter</em> method, which solves the deep paging problem once and for all!</p>
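<p>The idea behind this kind of paging can be sketched as follows: instead of collecting every hit up to the requested page, the caller passes in the last hit of the previous page and only hits that rank after it are collected. The function below is a simplified stand-in written for illustration, with made-up scores and document IDs. Lucene&#8217;s own method works on a query and the last ScoreDoc of the previous page.</p>

```python
import heapq


def search_after(scored_docs, after=None, n=10):
    """Return the next n (score, doc_id) hits that rank after `after`.

    Hits are ordered by descending score, ties broken by ascending doc id.
    """
    def sort_key(hit):
        score, doc_id = hit
        return (-score, doc_id)

    candidates = (
        hit for hit in scored_docs
        if after is None or sort_key(hit) > sort_key(after)
    )
    # Only n hits are held in the heap, no matter how deep the page is.
    return heapq.nsmallest(n, candidates, key=sort_key)


# Hypothetical (score, doc_id) pairs in index order.
scored = [(1.0, 0), (3.0, 1), (2.0, 2), (3.0, 3), (0.5, 4)]

page1 = search_after(scored, n=2)
page2 = search_after(scored, after=page1[-1], n=2)
page3 = search_after(scored, after=page2[-1], n=2)
print(page1, page2, page3)
```

<p>Memory use stays proportional to the page size rather than to the page number, which is why deep pages stop being a problem.</p>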
<p>There is also a new, fast and reliable Term Vector-based highlighter that both Lucene and Solr can use.</p>
<p>Dismax is great, but the <a href="http://search-lucene.com/?q=Extended+Dismax">Extended Dismax</a> query parser added to Solr is even better &#8211; it extends Dismax query parser functionality and can further improve the quality of search results.</p>
<p>You can now also <a href="http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function">sort by function</a> (imagine sorting the results by distance from a point) and use the new spatial search with filtering.</p>
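<p>To illustrate what sorting by a function value means, here is a small sketch that orders documents by their great-circle distance from a point instead of by relevance score. The documents and coordinates are made up, and the haversine helper is our own. In Solr the same effect is obtained by putting a function such as <em>geodist()</em> in the sort parameter.</p>

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def sort_by_distance(docs, lat, lon):
    """Order documents by the value of a function (distance), not by score."""
    return sorted(docs, key=lambda d: haversine_km(lat, lon, d["lat"], d["lon"]))


# Hypothetical documents with approximate city coordinates.
docs = [
    {"name": "Barcelona", "lat": 41.39, "lon": 2.17},
    {"name": "San Francisco", "lat": 37.77, "lon": -122.42},
    {"name": "Berlin", "lat": 52.52, "lon": 13.40},
]

# Sort relative to Washington, DC.
nearest_first = sort_by_distance(docs, 38.90, -77.04)
print([doc["name"] for doc in nearest_first])
```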
<p>Solr also got the new <a href="http://wiki.apache.org/solr/Suggester">suggest</a>/autocomplete functionality based on an FST automaton, which significantly reduces the memory needed for such functionality.  If you need this for your search application, have a look at Sematext&#8217;s <a href="http://sematext.com/products/autocomplete/index.html">AutoComplete</a> &#8211; it has additional functionality that lots of our customers like.</p>
<p>While not yet officially released, the new <a href="https://issues.apache.org/jira/browse/SOLR-2700">transaction log</a> support provides Solr with a <a href="https://issues.apache.org/jira/browse/SOLR-2656">real-time get</a> operation &#8211; as soon as you add a document you can retrieve it by ID.  This will also be used for recovering nodes in SolrCloud.</p>
<p>And talking about <a href="http://wiki.apache.org/solr/SolrCloud">SolrCloud</a>&#8230;  We&#8217;ve covered SolrCloud on this blog before in <a href="http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/">Solr Digest, Spring-Summer 2011, Part 2: Solr Cloud and Near Real Time Search</a>, and we&#8217;ll be covering it again soon.  In short, SolrCloud will make it easier for people to operate larger Solr clusters by making use of more modern design principles and software components such as ZooKeeper, that make creation of distributed, cluster-based software/services easier.  Some of the core functionality is that there will be no single point of failure, any node will be able to handle any operation, there will be no traditional master-slave setup, there will be centralized cluster management and configuration, failovers will be automatic and in general things will be much more dynamic.  SolrCloud has not been released yet, but Solr developers are working on it and the codebase is seeing good progress.  We&#8217;ve used SolrCloud in a few recent engagements with our customers and were pleased by what we saw.</p>
<p>After the development of the two projects was merged back in 2010, we saw a speed-up in development and releases. Lucene and Solr committers introduced five(!) new versions of both projects! In March, Lucene and Solr 3.1 were released with Unicode 4 support, ReusableTokenStream, Spatial search, Vector-based Highlighter, Extended Dismax parser, and many more features and bug fixes. Then, after less than 3 months(!), on June 4th, version 3.2 was released. This release introduced a new and much desired results grouping module, <a href="http://search-lucene.com/?q=NRTCachingDirectory&amp;fc_project=Lucene">NRTCachingDirectory</a>, and highlighting performance improvements. Just one month later, on July 1st, Lucene and Solr 3.3 were introduced. That release included the KStem stemmer, new implementations of Spellchecker, Field Collapsing in Solr and RAM usage reduction for the autocomplete mechanism. By the end of summer there was another release, this time version 3.4, released on the 14th of September. Pure Lucene users got what Solr could do for a very long time &#8211; the long awaited faceting module contributed by IBM. Version 3.4 also included the new Join functionality, the ability to turn off query and filter caches, and faceting calculation for Field Collapsing. The last release of Lucene and Solr saw the light of day in late November. The 3.5.0 version brought a huge memory reduction when dealing with term dictionaries, deep paging support, <a href="http://search-lucene.com/?q=SearcherManager&amp;fc_project=Lucene">SearcherManager</a> and <a href="http://search-lucene.com/?q=SearcherLifetimeManager&amp;fc_project=Lucene">SearcherLifetimeManager</a> classes along with language identification provided by Tika, as well as sortMissingFirst and sortMissingLast support for TrieFields.</p>
<p>During the last 12 months we attended three major conferences focused on search and big data themes. Lucene Revolution took place in San Francisco in May. <a href="http://twitter.com/otisg">Otis</a> gave a talk titled <em>&#8220;Search Analytics: What? Why? How?&#8221;</em> (<a href="http://www.lucidimagination.com/files/Gospodnetic%20Otis%20-%20ppt%20-%20Search%20Analytics%20What%20Why%20How.pdf" target="_blank">slides</a>) during the first day. There were a number of other good talks there and the complete conference agenda is available on  <a href="http://lucenerevolution.com/2011/agenda" target="_blank">http://lucenerevolution.com/2011/agenda</a>. Some videos are available as well. Next came the Berlin Buzzwords conference, a more grass-roots conference which took place between 4th and 10th of June. Otis gave the updated version of his <em>&#8220;Search Analytics: What? Why? How?&#8221;</em>. If you want to know more, check conference official site &#8211; <a href="http://berlinbuzzwords.de" target="_blank">http://berlinbuzzwords.de</a>. The last conference focused exclusively on Lucene and Solr was Lucene Eurocon 2011 in sunny and tourist-filled Barcelona between 17th and 20th of October. And guess what &#8211; we were there again (surprise!), this time in slightly larger numbers. 
Otis gave a talk about <em>&#8220;Search Analytics: Business Value &amp; BigData NoSQL Backend&#8221;</em> (<a href="http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/search-analytics-business-value-bigdata-nosql-backend" target="_blank">video</a>, <a href="http://www.lucidimagination.com/sites/default/files/file/Eurocon2011/otis_gospodnetic_search_analytics_lucene_eurocon_2011.ppt" target="_blank">slides</a>) and <a href="http://twitter.com/kucrafal">Rafał</a> gave a talk on a pretty popular topic, <em>&#8220;Explaining &amp; Visualizing Solr &#8216;explain&#8217; information&#8221;</em> (<a href="http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/understanding-visualising-solr-explain-information">video</a>, <a href="http://www.lucidimagination.com/sites/default/files/file/Eurocon2011/Understanding%20and%20Visualizing%20Solr%20Explain%20information%20-%20Solr.pl%20-%20version%202.pdf">slides</a>).</p>
<p>No open source project can endure without regular injections of new blood. This year, the Lucene and Solr development team was joined by a number of new people whose names may look familiar to you:</p>
<ul>
<li><a href="http://search-lucene.com/?q=&amp;sort=newestOnTop&amp;fc_author=Andi+Vajda">Andi Vajda</a></li>
<li><a href="http://search-lucene.com/?q=&amp;sort=newestOnTop&amp;fc_author=Chris+Male">Chris Male</a></li>
<li><a href="http://search-lucene.com/?q=&amp;sort=newestOnTop&amp;fc_author=Dawid+Weiss">Dawid Weiss</a></li>
<li><a href="http://search-lucene.com/?q=&amp;sort=newestOnTop&amp;fc_author=Erick+Erickson">Erick Erickson</a></li>
<li><a href="http://search-lucene.com/?q=&amp;sort=newestOnTop&amp;fc_author=Jan+Høydahl">Jan Høydahl</a></li>
<li><a href="http://search-lucene.com/?q=&amp;sort=newestOnTop&amp;fc_author=Martijn+v+Groningen">Martijn van Groningen</a></li>
<li><a href="http://search-lucene.com/?q=&amp;sort=newestOnTop&amp;fc_author=Stanislaw+Osinski">Stanisław Osiński</a></li>
</ul>
<p>These 7 men are now Lucene and Solr committers, and we look forward to next year&#8217;s Year in Review post, where we hope to go over the good things they will have brought to Lucene and Solr in 2012.</p>
<p>You know an open source project is successful when a whole book is dedicated to it. You know a project is <em>very</em> successful when more than one book and more than one publisher cover it. There were no new editions of <em>Lucene in Action</em> (<a href="http://www.amazon.com/Lucene-Action-Second-Covers-Apache/dp/1933988177">amazon</a>, <a href="http://manning.com/lucene">manning</a>) this year, but our own Rafał Kuć published his <em>Solr 3.1 Cookbook</em> (<a href="http://www.amazon.com/Apache-Solr-3-1-Cookbook-Rafal/dp/1849512183">amazon</a>) in July. Rafał&#8217;s cookbook includes a number of recipes that can make your life easier when solving common problems with Apache Solr. Another book, <em>Apache Solr 3 Enterprise Search Server</em> (<a href="http://www.amazon.com/Apache-Solr-Enterprise-Search-Server/dp/1849516065">amazon</a>) by David Smiley and Eric Pugh, was published in November. It is a major update to the first edition and covers a wide range of Apache Solr functionality.</p>
<p>&#8211; <a href="http://twitter.com/sematext">@sematext</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2011/12/21/lucene-solr-year-2011-in-review/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d05d7fbddd69c91eefaad0b9ec624111?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">kucrafal</media:title>
		</media:content>
	</item>
		<item>
		<title>Search Analytics at Enterprise Search Summit Fall 2011 Presentation</title>
		<link>http://blog.sematext.com/2011/11/02/search-analytics-at-enterprise-search-summit-fall-2011-presentation/</link>
		<comments>http://blog.sematext.com/2011/11/02/search-analytics-at-enterprise-search-summit-fall-2011-presentation/#comments</comments>
		<pubDate>Wed, 02 Nov 2011 13:42:20 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[presentation]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=1652</guid>
		<description><![CDATA[Here is another take on Search Analytics, this one presented at Enterprise Search Summit Fall 2011 in Washington DC, to an audience coming mainly from US government agencies, very large enterprises, and large international companies with tens of thousands of employees worldwide. The audience was good and posed a number of good [...]]]></description>
			<content:encoded><![CDATA[<p>Here is another take on <a href="http://sematext.com/search-analytics/index.html">Search Analytics</a>, this one presented at <a href="http://enterprisesearchsummit.com/Fall2011/">Enterprise Search Summit Fall 2011</a> in Washington DC, to an audience coming mainly from US government agencies, very large enterprises, and large international companies with tens of thousands of employees worldwide. The audience was good and posed a number of good questions after the talk. The full slide deck is available below and on <a href="http://slideshare.net/sematext">Sematext@Slideshare</a>.</p>
<iframe src='http://www.slideshare.net/slideshow/embed_code/9981043' width='630' height='516'></iframe>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2011/11/02/search-analytics-at-enterprise-search-summit-fall-2011-presentation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>
	</item>
		<item>
		<title>Solr Performance Monitoring with SPM</title>
		<link>http://blog.sematext.com/2011/10/28/solr-performance-monitoring-with-spm/</link>
		<comments>http://blog.sematext.com/2011/10/28/solr-performance-monitoring-with-spm/#comments</comments>
		<pubDate>Fri, 28 Oct 2011 19:58:47 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[jvm]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[monitor]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=1635</guid>
		<description><![CDATA[Originally delivered as a Lightning Talk at Lucene Eurocon 2011 in Barcelona, this quick presentation shows how to use Sematext&#8217;s SPM service, currently free to use for an unlimited time, to monitor Solr, OS, JVM, and more. We built SPM because we wanted a good, easy-to-use tool to help us with Solr performance [...]]]></description>
			<content:encoded><![CDATA[<p>Originally delivered as a Lightning Talk at Lucene Eurocon 2011 in Barcelona, this quick presentation shows how to use Sematext&#8217;s <a href="http://sematext.com/spm/index.html">SPM service</a>, <span style="text-decoration:underline;">currently free to use for an unlimited time</span>, to <a href="http://sematext.com/spm/solr-performance-monitoring/index.html">monitor Solr, OS, JVM</a>, and more.</p>
<p>We built SPM because we wanted a good, easy-to-use tool to help us with Solr performance tuning during engagements with our numerous Solr customers. We hope you find our <a href="http://sematext.com/spm/index.html">Scalable Performance Monitoring</a> service useful! Please let us know if you have any feedback, from SPM functionality and usability to its speed. Enjoy!</p>
<iframe src='http://www.slideshare.net/slideshow/embed_code/9928603' width='630' height='516'></iframe>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2011/10/28/solr-performance-monitoring-with-spm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>
	</item>
		<item>
		<title>What&#8217;s Your Search Analytics Solution of Choice?</title>
		<link>http://blog.sematext.com/2011/10/25/whats-your-search-analytics-solution-of-choice/</link>
		<comments>http://blog.sematext.com/2011/10/25/whats-your-search-analytics-solution-of-choice/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 07:41:21 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[poll]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=1625</guid>
		<description><![CDATA[Here is a quick one.  What&#8217;s your Search Analytics tool of choice, if you have one? And if you are happy or unhappy with the solution you are using, we&#8217;d love to hear what it is about your solution that is making you happy or unhappy - please leave a comment. Thanks!]]></description>
			<content:encoded><![CDATA[<p>Here is a quick one.  What&#8217;s your <span style="text-decoration:underline;"><strong>Search</strong></span> Analytics tool of choice, if you have one?</p>
<p>And if you are <strong>happy</strong> or <strong>unhappy</strong> with the solution you are using, <strong>we&#8217;d love to hear what it is about your solution that is making you happy or unhappy</strong> - please <strong>leave a comment<em>.</em></strong></p>
<p>Thanks!</p>
<a name="pd_a_5611837"></a><div class="PDS_Poll" id="PDI_container5611837" style="display:inline-block;"></div><div id="PD_superContainer"></div><noscript><a href="http://polldaddy.com/poll/5611837">Take Our Poll</a></noscript>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2011/10/25/whats-your-search-analytics-solution-of-choice/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>
	</item>
		<item>
		<title>Search Analytics: Business Value &amp; NoSQL Backend Presentation</title>
		<link>http://blog.sematext.com/2011/10/24/search-analytics-business-value-nosql-backend-presentation/</link>
		<comments>http://blog.sematext.com/2011/10/24/search-analytics-business-value-nosql-backend-presentation/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 01:27:43 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[flume]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hbase]]></category>
		<category><![CDATA[presentation]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=1621</guid>
		<description><![CDATA[Last week involved a few late nights for some of us at Sematext &#8211; we were busy readying our Search Analytics and Scalable Performance Monitoring services, as well as putting the final touches on our Search Analytics: Business Value &#38; NoSQL Backend presentation for Lucene Eurocon in Barcelona. In the past we&#8217;ve given a [...]]]></description>
			<content:encoded><![CDATA[<p>Last week involved a few late nights for some of us at <a href="http://sematext.com/">Sematext</a> &#8211; we were busy readying our <a href="http://sematext.com/search-analytics/index.html">Search Analytics</a> and <a href="http://sematext.com/spm/index.html">Scalable Performance Monitoring</a> services, as well as putting the final touches on our <strong><em>Search Analytics: Business Value &amp; NoSQL Backend</em></strong> presentation for Lucene Eurocon in Barcelona.</p>
<p>In the past we&#8217;ve given a few other public talks about Search Analytics and you can check them all out via <a href="http://blog.sematext.com/tag/analytics/">http://blog.sematext.com/tag/analytics/</a>.</p>
<iframe src='http://www.slideshare.net/slideshow/embed_code/9864913' width='630' height='516'></iframe>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2011/10/24/search-analytics-business-value-nosql-backend-presentation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>
	</item>
	</channel>
</rss>
