<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Sematext Blog</title>
	<atom:link href="http://blog.sematext.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.sematext.com</link>
	<description>Search, Big Data, Analytics, Natural Language Processing</description>
	<lastBuildDate>Sat, 18 May 2013 01:30:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.sematext.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Sematext Blog</title>
		<link>http://blog.sematext.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.sematext.com/osd.xml" title="Sematext Blog" />
	<atom:link rel='hub' href='http://blog.sematext.com/?pushpress=hub'/>
		<item>
		<title>What&#8217;s New in SPM 1.11.0</title>
		<link>http://blog.sematext.com/2013/04/24/spm-performance-monitoring-news/</link>
		<comments>http://blog.sematext.com/2013/04/24/spm-performance-monitoring-news/#comments</comments>
		<pubDate>Wed, 24 Apr 2013 09:00:30 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[monitor]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[spm]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=2873</guid>
		<description><![CDATA[We&#8217;ve been doing quite a bit of work behind the scenes in SPM.  Here are a few new things in the most recent release &#8211; 1.11.0 from April 16, 2013: We&#8217;ve added a Standalone Monitor.  So far the only way to monitor Solr, ElasticSearch, HBase, Sensei, or JVM with SPM was by running our SPM [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2873&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>We&#8217;ve been doing quite a bit of work behind the scenes in <a href="http://sematext.com/spm/index.html">SPM</a>.  Here are a few new things in the most recent release &#8211; <a href="http://sematext.com/spm/whats-new.html">1.11.0</a> from April 16, 2013:</p>
<ul>
<li>We&#8217;ve added a <strong><a href="https://sematext.atlassian.net/wiki/display/PUBSPM/SPM+Monitor+-+Standalone">Standalone Monitor</a></strong>.  So far the only way to monitor Solr, ElasticSearch, HBase, Sensei, or JVM with SPM was by running our SPM Monitor in-process, as a Javaagent.  Starting with this version you have an additional option of running the monitor in a separate process.</li>
<li><strong>SPM URLs are now sharable</strong>. Just copy the URL from your browser while using SPM and give it to anyone who has access to the same SPM App and they&#8217;ll see the exact same view as you &#8211; this means seeing the same report, same graph, same filter selection(s), and the same time range!  Because we use SPM with a lot of our Solr and ElasticSearch consulting clients, this is huge for us (and them!), as it helps us all see the exact same view.</li>
<li>We have <strong>simplified the SPM client installation</strong> a <strong>lot</strong> and have simplified the Collectd config a bit, too.</li>
<li>Both SPM Sender and SPM Monitor have been reworked. Monitor has the ability to register new applications and Sender has the ability to pick that up.  Sender should also be running with ionice if you have ionice available.  A bit of unnecessary work was removed from Monitor, too, so it should consume even fewer resources than before.</li>
<li>We&#8217;ve added more info about SPM to the new <a href="https://sematext.atlassian.net/wiki/display/PUBSPM">SPM Wiki Space</a>.</li>
</ul>
<p>In addition to that, we&#8217;re working on:</p>
<ul>
<li><span style="line-height:13px;"><a href="http://blog.sematext.com/2013/04/23/sneak-peek-hadoop-monitoring-comes-to-spm/"><strong>Hadoop monitoring</strong></a>. This includes performance reports for both HDFS and MapReduce &#8211; NameNode, JobTracker, TaskTracker, and DataNode.</span></li>
<li><strong>SPM client packaging</strong>.  This means you&#8217;ll soon be able to install SPM client as a Deb package or RPM, and then automate with Puppet or Chef.</li>
<li>&#8230;</li>
</ul>
<p>There are a few more interesting things in the works, but we&#8217;ve got to leave something for later.  If you have not tried <a href="http://sematext.com/spm/index.html">SPM</a> yet, you should!  User feedback has been awesome and there are a number of good things on the 2013 roadmap!</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/sematext.wordpress.com/2873/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/sematext.wordpress.com/2873/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2873&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2013/04/24/spm-performance-monitoring-news/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>
	</item>
		<item>
		<title>Sneak Peek: Hadoop Monitoring comes to SPM</title>
		<link>http://blog.sematext.com/2013/04/23/sneak-peek-hadoop-monitoring-comes-to-spm/</link>
		<comments>http://blog.sematext.com/2013/04/23/sneak-peek-hadoop-monitoring-comes-to-spm/#comments</comments>
		<pubDate>Tue, 23 Apr 2013 10:00:12 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hdfs]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[monitor]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[spm]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=2879</guid>
		<description><![CDATA[When it comes to Hadoop, they say you&#8217;ve got to monitor it and then monitor it some more.  Since our own Performance Monitoring and Search Analytics services run on top of Hadoop, we figured it was time to add Hadoop performance monitoring to SPM.  So here is a sneak peek at SPM for Hadoop.  If [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2879&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>When it comes to Hadoop, they say you&#8217;ve got to monitor it and then monitor it some more.  Since our own <a href="http://sematext.com/spm/index.html">Performance Monitoring</a> and <a href="http://sematext.com/search-analytics/index.html">Search Analytics</a> services run on top of Hadoop, we figured it was time to add Hadoop performance monitoring to SPM.  So here is a sneak peek at <a href="http://sematext.com/spm/hadoop-performance-monitoring/index.html">SPM for Hadoop</a>.  If you&#8217;d like to try it on your Hadoop cluster, we&#8217;ll be sending invitations soon and you can get on the <a href="http://sematext.com/spm/hadoop-performance-monitoring/index.html">private beta list</a> starting today!</p>
<p>In the meantime, here is a small sample of pretty self-explanatory reports from SPM for Hadoop, so you can get a sense of what&#8217;s available.  There are, of course, a number of other Hadoop-specific reports included, as well as server reports, filtering, alerting, multi-user support, report sharing, etc.</p>
<p>Please don&#8217;t forget to tell us what else you would like us to monitor - <a href="https://docs.google.com/a/sematext.com/spreadsheet/viewform?formkey=dFlVbUNxOHR6UFlQem5XeGIzTjV6Qmc6MQ">select your candidates</a> - and if you like what you see and want a good monitoring tool for your Hadoop cluster, please sign up for <a href="http://sematext.com/spm/hadoop-performance-monitoring/index.html">private beta</a> now.</p>
<p style="text-align:center;"><strong>Click on any graph to see it in its full size and high quality.</strong></p>
<div id="attachment_2906" class="wp-caption aligncenter" style="width: 640px"><a href="http://sematext.files.wordpress.com/2013/04/spm-hadoop-nn-files.png"><img class="size-full wp-image-2906" title="Hadoop NameNode Files" alt="Hadoop NameNode Files" src="http://sematext.files.wordpress.com/2013/04/spm-hadoop-nn-files.png?w=630&#038;h=208" width="630" height="208" /></a><p class="wp-caption-text">Hadoop NameNode Files</p></div>
<p>.</p>
<div id="attachment_2905" class="wp-caption aligncenter" style="width: 640px"><a href="http://sematext.files.wordpress.com/2013/04/spm-hadoop-dn-readwrite.png"><img class="size-full wp-image-2905" alt="Hadoop DataNode Read-Write" src="http://sematext.files.wordpress.com/2013/04/spm-hadoop-dn-readwrite.png?w=630&#038;h=209" width="630" height="209" /></a><p class="wp-caption-text">Hadoop DataNode Read-Write</p></div>
<p>.</p>
<div id="attachment_2904" class="wp-caption aligncenter" style="width: 640px"><a href="http://sematext.files.wordpress.com/2013/04/spm-hadoop-jt-mr-runtime.png"><img class="size-full wp-image-2904" alt="Hadoop JobTracker MapReduce Runtime" src="http://sematext.files.wordpress.com/2013/04/spm-hadoop-jt-mr-runtime.png?w=630&#038;h=209" width="630" height="209" /></a><p class="wp-caption-text">Hadoop JobTracker MapReduce Runtime</p></div>
<p>.</p>
<div id="attachment_2903" class="wp-caption aligncenter" style="width: 640px"><a href="http://sematext.files.wordpress.com/2013/04/spm-hadoop-tt-tasks.png"><img class="size-full wp-image-2903" alt="Hadoop TaskTracker Tasks" src="http://sematext.files.wordpress.com/2013/04/spm-hadoop-tt-tasks.png?w=630&#038;h=194" width="630" height="194" /></a><p class="wp-caption-text">Hadoop TaskTracker Tasks</p></div>
<p>.</p>
<p>What else would you like us to monitor with <a href="http://sematext.com/spm/index.html">SPM</a>?  Please <a href="https://docs.google.com/a/sematext.com/spreadsheet/viewform?formkey=dFlVbUNxOHR6UFlQem5XeGIzTjV6Qmc6MQ">select your candidates</a>!</p>
<p>For announcements, promotions, discounts, service status, milk, cookies, and other goodies follow <a href="http://twitter.com/sematext">@sematext</a>.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/sematext.wordpress.com/2879/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/sematext.wordpress.com/2879/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2879&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2013/04/23/sneak-peek-hadoop-monitoring-comes-to-spm/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2013/04/spm-hadoop-nn-files.png" medium="image">
			<media:title type="html">Hadoop NameNode Files</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2013/04/spm-hadoop-dn-readwrite.png" medium="image">
			<media:title type="html">Hadoop DataNode Read-Write</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2013/04/spm-hadoop-jt-mr-runtime.png" medium="image">
			<media:title type="html">Hadoop JobTracker MapReduce Runtime</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2013/04/spm-hadoop-tt-tasks.png" medium="image">
			<media:title type="html">Hadoop TaskTracker Tasks</media:title>
		</media:content>
	</item>
		<item>
		<title>EC2 Neighbour Caught Stealing CPU</title>
		<link>http://blog.sematext.com/2013/04/22/ec2-neighbour-caught-stealing-cpu/</link>
		<comments>http://blog.sematext.com/2013/04/22/ec2-neighbour-caught-stealing-cpu/#comments</comments>
		<pubDate>Mon, 22 Apr 2013 16:54:08 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[monitor]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[spm]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=2867</guid>
		<description><![CDATA[We run all our services on top of AWS.  We like the flexibility and the speed of provisioning and decommissioning instances.  Unfortunately, this &#8220;new age&#8221; computing comes at a price.  Once in a while we hit an EC2 instance that has a loud, noisy neighbour.  Kind of like this: Unlike in real life, you can&#8217;t [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2867&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>We run all our services on top of AWS.  We like the flexibility and the speed of provisioning and decommissioning instances.  Unfortunately, this &#8220;new age&#8221; computing comes at a price.  Once in a while we hit an EC2 instance that has a loud, noisy neighbour.  Kind of like this:</p>
<div class="wp-caption aligncenter" style="width: 190px"><img class=" " alt="" src="http://www.studenthousing.lon.ac.uk/uploads/pics/Jack_Nicholson_in_the_Shining.jpg" width="180" height="239" /><p class="wp-caption-text">Noisy Jack Nicholson</p></div>
<p>Unlike in real life, you can&#8217;t really hear your noisy neighbours in virtualized worlds.  This is kind of good &#8211; if you don&#8217;t hear them, they won&#8217;t bother you, right? Wrong! Oh yes, they will bother you, it&#8217;s just that without proper tools you won&#8217;t really realize when they&#8217;ve become loud, how loud they got, and how much their noise is hurting you! So while it&#8217;s true you can&#8217;t hear these neighbours, you <em>can</em> see them!  Have a look at this graph from <a href="http://sematext.com/spm/index.html">SPM</a>:</p>
<div id="attachment_2886" class="wp-caption aligncenter" style="width: 640px"><a href="http://sematext.files.wordpress.com/2013/04/spm-steal-ec2.png"><img class="size-full wp-image-2886 " alt="Noisy neighbour(s) stealing your CPU" src="http://sematext.files.wordpress.com/2013/04/spm-steal-ec2.png?w=630&#038;h=243" width="630" height="243" /></a><p class="wp-caption-text">Noisy neighbour(s) stealing your CPU. Click for a larger and sharper image.</p></div>
<p>What we see here is a graph for <strong>CPU &#8220;steal time&#8221;</strong> for one of our HBase servers.  Luckily, this happens to be one of our HBase masters, which doesn&#8217;t do a ton of CPU-intensive work.  What we see is that somebody, some other VM(s) sharing the same underlying host, is stealing about 30% of the CPU that really belongs to us.  Sounds bad, doesn&#8217;t it?  What exactly does that mean?  It means that about 30% of the time, applications on this instance (i.e., in our VM) try to use the CPU and the CPU is not available.  Bummer. Of course, this happens at a very, very low level, so from the outside, without this sort of insight, everything looks OK &#8211; it&#8217;s impossible to tell whether applications are getting the CPU cycles when they need them by just looking at the applications themselves.</p>
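<p>For readers who want to see where a steal number can come from on Linux, here is a minimal sketch. It assumes the standard <code>/proc/stat</code> layout, where the aggregate <code>cpu</code> line lists jiffies for user, nice, system, idle, iowait, irq, softirq, and steal, in that order. The function names and the two sample lines are made up for illustration; this is not how SPM itself collects the metric.</p>

```python
def parse_cpu_line(line):
    """Parse the aggregate "cpu" line of /proc/stat into a dict of jiffies."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Only the first eight counters are used; guest time is already counted in user.
    values = [int(v) for v in fields[1:1 + len(names)]]
    return dict(zip(names, values))


def steal_percent(before, after):
    """Percentage of elapsed CPU time reported as steal between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total == 0:
        return 0.0
    return 100.0 * deltas["steal"] / total


# Two made-up samples (hypothetical values, not measurements from this post).
first = parse_cpu_line("cpu 1000 0 500 8000 100 0 0 400")
second = parse_cpu_line("cpu 1100 0 550 8550 100 0 0 700")
print(round(steal_percent(first, second), 1))
```

<p>On a real machine you would read the first line of <code>/proc/stat</code> twice, a few seconds apart, and pass both lines through <code>parse_cpu_line</code>.</p>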
<p><strong>So, do you know how noisy your virtual neighbours are?  Do you know how much they steal from you?</strong></p>
<p>If you want to see what your neighbour situation is, whether on AWS or in some other virtualized environment, this is what you can do:</p>
<ol>
<li>Get <a href="http://sematext.com/spm/index.html">SPM</a> (pick &#8220;Java&#8221; for your SPM Application type once you get in even if you don&#8217;t need to monitor any Java apps, yes)</li>
<li>Run the installer, but don&#8217;t bother with the &#8220;monitor&#8221; (aka SPM Monitor) piece &#8211; all you need are your CPU metrics, and for those the monitor piece doesn&#8217;t need to be running at all.</li>
<li>Go to <a href="http://apps.sematext.com/">http://apps.sematext.com/</a> and look at the &#8220;CPU &amp; Mem&#8221; tab</li>
<li>Unselect all metrics other than &#8220;steal&#8221;, as shown in the image above.  Select each server you want to check in the filter to the right of that graph (not shown in the image) to check one server at a time.</li>
<li>Make use of SPM alerts and set them up so you get notified when the CPU steal percentage goes over a certain threshold that you don&#8217;t want to tolerate. This way you&#8217;ll know when it&#8217;s time to consider moving to a new VM/instance.</li>
</ol>
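<p>The threshold idea from the last step can be sketched in a few lines. This is a generic illustration, not SPM&#8217;s alerting logic: the function name, the 25% threshold, the three-samples-in-a-row rule, and the readings are all assumptions chosen for the example.</p>

```python
def steal_alert(samples, threshold, min_consecutive=3):
    """Return True once steal stays above threshold for min_consecutive samples in a row."""
    run = 0
    for value in samples:
        # Extend the current run of high readings, or reset it on a low one.
        run = run + 1 if value > threshold else 0
        if run == min_consecutive:
            return True
    return False


# Made-up steal percentages, one per polling interval (hypothetical data).
readings = [5.0, 31.0, 12.0, 30.5, 33.0, 35.2]
print(steal_alert(readings, threshold=25.0))
```

<p>Requiring several high readings in a row is one way to avoid being paged for a single short spike. How many readings to require is a judgment call that depends on how sensitive your workload is.</p>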
<p><strong>What do you do if you find out you do have noisy neighbours?</strong></p>
<p>There are a couple of options:</p>
<ul>
<li>Be patient and hope they go to sleep or move out</li>
<li>Pack your belongings, launch a new EC2 instance, and move there after ensuring it doesn&#8217;t suffer from the same problem</li>
<li>Create more noise than your neighbour and drive him/her out instead. Yes, I just made this up.</li>
</ul>
<p>In this particular case, we&#8217;ll try the patient option first and move out only when the noise starts noticeably hurting us or we run out of patience.  Happy monitoring!</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/sematext.wordpress.com/2867/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/sematext.wordpress.com/2867/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2867&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2013/04/22/ec2-neighbour-caught-stealing-cpu/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>

		<media:content url="http://www.studenthousing.lon.ac.uk/uploads/pics/Jack_Nicholson_in_the_Shining.jpg" medium="image" />

		<media:content url="http://sematext.files.wordpress.com/2013/04/spm-steal-ec2.png" medium="image">
			<media:title type="html">Noisy neighbour(s) stealing your CPU</media:title>
		</media:content>
	</item>
		<item>
		<title>Marketing Intern Position Available</title>
		<link>http://blog.sematext.com/2013/04/22/marketing-intern-at-sematext/</link>
		<comments>http://blog.sematext.com/2013/04/22/marketing-intern-at-sematext/#comments</comments>
		<pubDate>Mon, 22 Apr 2013 14:32:13 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[job]]></category>
		<category><![CDATA[marketing]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=2869</guid>
		<description><![CDATA[We&#8217;ve been very busy at Sematext working on our flagship products &#8211; SPM, Search Analytics, and &#8230;. more (we&#8217;ll be announcing something new soon).  We&#8217;ve received great positive feedback from users and customers.  We&#8217;re now looking for a Marketing Intern to join our highly distributed team and help us further generate and drive the demand [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2869&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>We&#8217;ve been very busy at <a href="http://sematext.com/">Sematext</a> working on our flagship products &#8211; <a href="http://sematext.com/spm/index.html">SPM</a>, <a href="http://sematext.com/search-analytics/index.html">Search Analytics</a>, and &#8230;. more (we&#8217;ll be announcing something new soon).  We&#8217;ve received great positive feedback from users and customers.  We&#8217;re now looking for a <a href="http://sematext.com/about/jobs.html#marketingIntern">Marketing Intern</a> to join our highly distributed team and help us further generate and drive the demand for our products.  This is a flexible role that can be part-time or full-time, remote or local (<a href="http://sematext.com/about/contact.html">NYC</a>).  For more information about this opportunity and other open positions see our <a href="http://sematext.com/about/jobs.html">jobs page</a>.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/sematext.wordpress.com/2869/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/sematext.wordpress.com/2869/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2869&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2013/04/22/marketing-intern-at-sematext/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>
	</item>
		<item>
		<title>Poll: Using SolrCloud or Not?</title>
		<link>http://blog.sematext.com/2013/02/25/poll-solr-cloud-or-not/</link>
		<comments>http://blog.sematext.com/2013/02/25/poll-solr-cloud-or-not/#comments</comments>
		<pubDate>Mon, 25 Feb 2013 18:01:27 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[poll]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[solrcloud]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=2847</guid>
		<description><![CDATA[We know that as of February 2013, of those Solr users who follow Sematext Blog about 75% use some version of Solr 4.x.  But today we are trying to get to another interesting stat: What portion of Solr 4.x users use SolrCloud? Let&#8217;s find out!  Please tweet this to help us get more votes and better stats. Please [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2847&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>We know that as of February 2013, of those Solr users who follow <a href="http://blog.sematext.com/">Sematext Blog</a> about <a href="http://blog.sematext.com/2013/02/15/poll-which-solr-version-are-you-using/">75% use some version of Solr 4.x</a>.  But today we are trying to get to another interesting stat:</p>
<p style="text-align:center;"><strong>What portion of Solr 4.x users use SolrCloud?</strong></p>
<p style="text-align:center;"><strong></strong>Let&#8217;s find out!  Please <strong><a href="https://twitter.com/intent/tweet?source=webclient&amp;text=Poll:%20Using%20SolrCloud%20or%20Not?%20%20%20http://blog.sematext.com/2013/02/25/poll-solr-cloud-or-not/%20via%20@sematext">tweet this</a></strong> to help us get more votes and better stats.</p>
<p>Please <strong>vote only if you are using Solr 4.x</strong>.  Please <strong>do NOT vote if you are using a 1.x or 3.x version of Solr</strong>.</p>
<a name="pd_a_6920703"></a>
<div class="PDS_Poll" id="PDI_container6920703" data-settings="{&quot;url&quot;:&quot;http:\/\/static.polldaddy.com\/p\/6920703.js&quot;}" style="display:inline-block;"></div>
<div id="PD_superContainer"></div>
<noscript><a href="http://polldaddy.com/poll/6920703">Take Our Poll</a></noscript>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/sematext.wordpress.com/2847/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/sematext.wordpress.com/2847/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2847&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2013/02/25/poll-solr-cloud-or-not/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>
	</item>
		<item>
		<title>Poll: Which Solr version are you using?</title>
		<link>http://blog.sematext.com/2013/02/15/poll-which-solr-version-are-you-using/</link>
		<comments>http://blog.sematext.com/2013/02/15/poll-which-solr-version-are-you-using/#comments</comments>
		<pubDate>Fri, 15 Feb 2013 17:57:14 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[poll]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[solrcloud]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=2839</guid>
		<description><![CDATA[With Solr 4.1 recently released, let&#8217;s see which version(s) of Solr people are using.  Please tweet it to help us get more votes and better stats.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2839&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>With Solr 4.1 recently released, let&#8217;s see which version(s) of Solr people are using.  Please <a href="https://twitter.com/intent/tweet?source=webclient&amp;text=Poll:%20Which%20version%20of%20Solr%20are%20you%20using?  %20http://blog.sematext.com/2013/02/15/poll-which-solr-version-are-you-using/%20via%20@sematext">tweet it</a> to help us get more votes and better stats.</p>
<a name="pd_a_6900588"></a>
<div class="PDS_Poll" id="PDI_container6900588" data-settings="{&quot;url&quot;:&quot;http:\/\/static.polldaddy.com\/p\/6900588.js&quot;}" style="display:inline-block;"></div>
<div id="PD_superContainer"></div>
<noscript><a href="http://polldaddy.com/poll/6900588">Take Our Poll</a></noscript>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/sematext.wordpress.com/2839/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/sematext.wordpress.com/2839/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2839&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2013/02/15/poll-which-solr-version-are-you-using/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>
	</item>
		<item>
		<title>Solr vs. ElasticSearch: Part 6 &#8211; User &amp; Dev Communities</title>
		<link>http://blog.sematext.com/2013/01/22/solr-vs-elasticsearch-userdev-communities/</link>
		<comments>http://blog.sematext.com/2013/01/22/solr-vs-elasticsearch-userdev-communities/#comments</comments>
		<pubDate>Tue, 22 Jan 2013 09:34:46 +0000</pubDate>
		<dc:creator>Rafał Kuć</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[elasticsearch]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[solrcloud]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=2727</guid>
		<description><![CDATA[One of the questions after my talk during the recent ApacheCon EU was what I thought about the communities of the two search engines I was comparing. Not surprisingly, this is also a question we often address in our consulting engagements.  As a part of our Apache Solr vs ElasticSearch post series we decided to [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2727&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>One of the questions after <a href="http://blog.sematext.com/2012/11/08/slides-battle-of-the-giants-solr-4-0-vs-elasticsearch-0-20-0/" target="_blank">my talk</a> during the recent ApacheCon EU was what I thought about the communities of the two search engines I was comparing. Not surprisingly, this is also a question we often address in our <a href="http://sematext.com/services/index.html">consulting engagements</a>.  As a part of our Apache Solr vs ElasticSearch post series we decided to step away from the technical aspects of SolrCloud vs. ElasticSearch and look at the communities gathered around these two projects. If you haven&#8217;t read the previous posts about Apache Solr vs. ElasticSearch, here are pointers to all of them:</p>
<ul>
<li><a href="http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/">Solr vs. ElasticSearch: Part 1 &#8211; Overview</a></li>
<li><a href="http://blog.sematext.com/2012/09/04/solr-vs-elasticsearch-part-2-data-handling/">Solr vs. ElasticSearch: Part 2 &#8211; Data Handling</a></li>
<li><a href="http://blog.sematext.com/2012/10/01/solr-vs-elasticsearch-part-3-searching/">Solr vs. ElasticSearch: Part 3 &#8211; Searching</a></li>
<li><a href="http://blog.sematext.com/2012/10/30/solr-vs-elasticsearch-part-4-faceting/">Solr vs. ElasticSearch: Part 4 &#8211; Faceting</a></li>
<li><a title="Management API Capabilities" href="http://blog.sematext.com/2013/01/08/solr-vs-elasticsearch-part-5-management-api-capabilities/">Solr vs. ElasticSearch: Part 5 &#8211; Management API Capabilities</a></li>
<li><a href="http://blog.sematext.com/2013/01/22/solr-vs-elasticsearch-userdev-communities/">Solr vs. ElasticSearch: Part 6 &#8211; User &amp; Dev Communities Compared</a></li>
</ul>
<p><span id="more-2727"></span></p>
<h2>Users Community</h2>
<p>Let&#8217;s start by discussing the user activity around both ElasticSearch and Apache Solr.</p>
<h3>Users Activity</h3>
<p>We started working on this post right before the Christmas break of 2012. During that time we decided to see how active the user base was for both ElasticSearch and Apache Solr. To do that we used our handy <a href="http://search-lucene.com">search-lucene.com</a> service and compared the number of email messages sent to each user list. So let&#8217;s see how they stack up.</p>
<h4>Apache Solr</h4>
<p style="text-align:center;"><a href="http://blog.sematext.com/2013/01/22/solr-vs-elasticsearch-userdev-communities/solr_user/" rel="attachment wp-att-2747"><img class=" wp-image-2747 aligncenter" alt="Solr User Mailing List Activity" src="http://sematext.files.wordpress.com/2012/12/solr_user.png?w=659&#038;h=138" width="659" height="138" /></a></p>
<p>As you can see, Solr user activity varies slightly from month to month, which is perfectly understandable. Each bar on the chart represents two weeks. We can see the number of messages ranges from about 390 mails to about 770 per two weeks, which gives us between 800 and 1600 mails per month if we do a bit of rounding up. Quite impressive, I must say!</p>
<h4>ElasticSearch</h4>
<p style="text-align:center;"><a href="http://blog.sematext.com/2013/01/22/solr-vs-elasticsearch-userdev-communities/es_user/" rel="attachment wp-att-2749"><img class="aligncenter  wp-image-2749" alt="ElasticSearch User Mailing List" src="http://sematext.files.wordpress.com/2012/12/es_user.png?w=646&#038;h=147" width="646" height="147" /></a></p>
<p>Now let&#8217;s discuss the ElasticSearch side. First a few words of explanation. If you look at the above chart you might think that the ElasticSearch mailing list was silent and then users started posting in October 2012. That&#8217;s clearly not true &#8211; it is just that we didn&#8217;t add ElasticSearch to <a href="http://search-lucene.com/" target="_blank">search-lucene.com</a> until recently.  However, you can see that the number of messages during the same period of time is quite similar &#8211; both Solr and ElasticSearch saw about 670 &#8211; 730 messages during a two-week period. This gives us about 2 emails per hour on average.</p>
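<p>The hourly average quoted above is simple arithmetic. Here is a minimal sketch of it; the function and constant names are ours, and the message counts are the approximate values read off the charts:</p>
<pre>HOURS_PER_TWO_WEEKS = 14 * 24  # each bar on the chart covers two weeks

def messages_per_hour(messages_per_two_weeks):
    """Average hourly rate for a two-week message count."""
    return messages_per_two_weeks / HOURS_PER_TWO_WEEKS

# The approximate range seen on both lists during the same period
low = messages_per_hour(670)
high = messages_per_hour(730)
print(round(low, 2), round(high, 2))</pre>
<p>Both ends of the range land close to 2 messages per hour, which is where the rounded figure comes from.</p>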
<h3>Distinct Users</h3>
<p>Email volume is one thing, but I was always curious about how many different people write emails on the mailing lists. Having such a number would give us an additional understanding of the structure of the community around a particular search engine, new users, etc. However, we should not look only at this number, but also at things like the most active people on the mailing lists. In both cases we&#8217;ve looked at the same period, from 1 to 30 December 2012. We&#8217;ve used the data we index for <a href="http://www.search-lucene.com">search-lucene.com</a> to calculate these numbers.</p>
<h4>Apache Solr</h4>
<p>In the case of Apache Solr there were <strong>234</strong> unique users sending mail to the users mailing list. That is almost 8 unique users per day on average, nice <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h4>ElasticSearch</h4>
<p>In the case of ElasticSearch there were <strong>271</strong> unique users sending mail to the users mailing list. This gives us about 9 unique users per day on average, which is even nicer.</p>
<h2>Resources Available</h2>
<p>As far as available resources go, both ElasticSearch and Solr have great documentation. On the Solr wiki (<a title="Solr Wiki Address" href="http://wiki.apache.org/solr/">http://wiki.apache.org/solr/</a>) you can find information about most of the components and, of course, the tutorial for beginners. ElasticSearch is very similar, with a tutorial and a very good description of its functionality available at <a href="http://www.elasticsearch.org/">http://www.elasticsearch.org/</a>. In addition to that, there are three books published about Apache Solr (in English) and more (e.g. my <a href="http://www.packtpub.com/apache-solr-4-cookbook/book" target="_blank">Apache Solr 4 Cookbook</a>) coming soon. As of now, there are no published books about ElasticSearch, but&#8230; stay tuned <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h2>Search Trending</h2>
<p>We also decided to use uncle Google to look at search trends for Apache Solr and ElasticSearch. Let&#8217;s look at the following diagram:</p>
<p><a href="http://sematext.files.wordpress.com/2013/01/solr-vs-es-google-trends.png"><img class="alignnone size-full wp-image-2834" alt="solr-vs-es-google-trends" src="http://sematext.files.wordpress.com/2013/01/solr-vs-es-google-trends.png?w=630&#038;h=200" width="630" height="200" /></a></p>
<p>As you can see, until early 2010 there was no interest in ElasticSearch at all, at least from the point of view of users searching for it. Note that we published the <a href="http://blog.sematext.com/2010/05/03/elastic-search-distributed-lucene/" target="_blank">interview with Shay Banon</a> over two and a half years ago &#8211; back in May 2010 &#8211; before ElasticSearch registered on Google&#8217;s search trends radar! SolrCloud didn&#8217;t exist back then, so people slowly started looking for information on SolrCloud later in 2010.  The volume of searches mentioning SolrCloud is very small even today &#8211; perhaps because people tend to search for Solr and not SolrCloud.  And while SolrCloud is still a new kid on the block, searches for Solr dwarf searches for ElasticSearch despite the buzz surrounding ElasticSearch.</p>
<p>Of course, the above doesn&#8217;t say anything about the number of users of both search engines, but it definitely shows some information about the interest in these technologies.</p>
<h2>Developers and the Code</h2>
<p>If you are familiar with ElasticSearch and Solr you&#8217;ll probably know that ElasticSearch is much younger than Apache Solr. Apache Solr was created by Yonik Seeley in 2004 and donated to the Apache Software Foundation. On the other hand, the first version of ElasticSearch was released by Shay Banon in 2010. This is important to say before we talk about differences in contributors and the code itself. But getting to the point &#8211; we thought that it may be interesting to see how both Apache Solr and ElasticSearch look from a bird&#8217;s-eye perspective. To do that we&#8217;ve used the statistics and charts from <a href="http://www.ohloh.net">ohloh.net</a>. So, let&#8217;s see what they look like.</p>
<h3>Apache Solr</h3>
<h4>Code Statistics</h4>
<p>If we look at the current statistics, at the beginning of January 2013 Solr had more than 212k lines of code, with almost 7000 commits and 38 contributors. However, keep in mind that contributors are people that committed the code, not necessarily the ones that actually implemented it and provided the patch, so the actual number of contributors is much higher. The chart looks like this: <a href="http://sematext.files.wordpress.com/2013/01/lines_of_code_solr1.png"><img class="aligncenter size-full wp-image-2802" alt="lines_of_code_solr" src="http://sematext.files.wordpress.com/2013/01/lines_of_code_solr1.png?w=630"   /></a></p>
<h4>Top Contributors</h4>
<p>If we look at the top contributors we see Mark Miller on top, followed by Yonik Seeley, with Robert Muir in third place <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  <a href="http://sematext.files.wordpress.com/2013/01/top_commiters_solr.png"><img class="aligncenter size-full wp-image-2803" alt="top_commiters_solr" src="http://sematext.files.wordpress.com/2013/01/top_commiters_solr.png?w=630"   /></a></p>
<h4>Active Contributors</h4>
<p>One more interesting thing is the number of contributors that were actively involved during a given period of time. Looking at Apache Solr since 2006 we can see the following: <a href="http://sematext.files.wordpress.com/2013/01/active_commiters_solr.png"><img class="aligncenter size-full wp-image-2804" alt="active_commiters_solr" src="http://sematext.files.wordpress.com/2013/01/active_commiters_solr.png?w=630"   /></a> I think we can say that the number of active contributors grew steadily from 2006 until June 2012, with a slight drop shortly after that. However, I don&#8217;t think the number of active contributors will keep dropping; the dip is more likely due to a bit of exhaustion after releasing Apache Lucene and Solr 4.0 <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h3>ElasticSearch</h3>
<h4>Code Statistics</h4>
<p>Current code statistics for ElasticSearch show that the code base just hit 240k LOC, with about 4.2k commits and 87 contributors. <a href="http://sematext.files.wordpress.com/2013/01/lines_of_code_es.png"><img class="aligncenter size-full wp-image-2809" alt="lines_of_code_es" src="http://sematext.files.wordpress.com/2013/01/lines_of_code_es.png?w=630"   /></a></p>
<h4>Top Contributors</h4>
<p>As we&#8217;d expect, Shay Banon is the top contributor to ElasticSearch. In second place on the podium we have Martijn van Groningen, with Igor Motov in third place: <a href="http://sematext.files.wordpress.com/2013/01/top_commiters_es.png"><img class="aligncenter size-full wp-image-2810" alt="top_commiters_es" src="http://sematext.files.wordpress.com/2013/01/top_commiters_es.png?w=630"   /></a></p>
<h4>Active Contributors</h4>
<p>And finally the active contributors. We don&#8217;t have the same time frame as for Apache Solr, which is understandable as ElasticSearch is younger, but we can still see what is happening. <a href="http://sematext.files.wordpress.com/2013/01/active_commiters_es.png"><img class="aligncenter size-full wp-image-2811" alt="active_commiters_es" src="http://sematext.files.wordpress.com/2013/01/active_commiters_es.png?w=630"   /></a> As you can see, since the first quarter of 2011 the number of active contributors has varied from 5 to about 10, peaking at the same time as in Solr &#8211; 12 active contributors in June 2012.</p>
<h2>Summary</h2>
<p>As everything in this post indicates, both projects&#8217; development and user communities are strong, active, and about equal. 2013 will be an interesting year for both projects.</p>
<p>We are nearing the end of our SolrCloud vs. ElasticSearch series.  What else would you like us to cover?  Please use the comments to let us know!</p>
<p>- <a href="http://twitter.com/kucrafal">@kucrafal</a>, <a href="http://twitter.com/sematext">@sematext</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2013/01/22/solr-vs-elasticsearch-userdev-communities/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d05d7fbddd69c91eefaad0b9ec624111?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">kucrafal</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2012/12/solr_user.png" medium="image">
			<media:title type="html">Solr User Mailing List Activity</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2012/12/es_user.png" medium="image">
			<media:title type="html">ElasticSearch User Mailing List</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2013/01/solr-vs-es-google-trends.png" medium="image">
			<media:title type="html">solr-vs-es-google-trends</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2013/01/lines_of_code_solr1.png" medium="image">
			<media:title type="html">lines_of_code_solr</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2013/01/top_commiters_solr.png" medium="image">
			<media:title type="html">top_commiters_solr</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2013/01/active_commiters_solr.png" medium="image">
			<media:title type="html">active_commiters_solr</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2013/01/lines_of_code_es.png" medium="image">
			<media:title type="html">lines_of_code_es</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2013/01/top_commiters_es.png" medium="image">
			<media:title type="html">top_commiters_es</media:title>
		</media:content>

		<media:content url="http://sematext.files.wordpress.com/2013/01/active_commiters_es.png" medium="image">
			<media:title type="html">active_commiters_es</media:title>
		</media:content>
	</item>
		<item>
		<title>Solr vs ElasticSearch: Part 5 &#8211; Management API Capabilities</title>
		<link>http://blog.sematext.com/2013/01/08/solr-vs-elasticsearch-part-5-management-api-capabilities/</link>
		<comments>http://blog.sematext.com/2013/01/08/solr-vs-elasticsearch-part-5-management-api-capabilities/#comments</comments>
		<pubDate>Tue, 08 Jan 2013 14:40:04 +0000</pubDate>
		<dc:creator>Rafał Kuć</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[elasticsearch]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[solrcloud]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=2597</guid>
		<description><![CDATA[In previous posts, all listed below, we&#8217;ve discussed general architecture, full text search capabilities and facet aggregations possibilities. However, till now we have not discussed any of the administration and management options and things you can do on a live cluster without any restart. So let&#8217;s get into it and see what Apache Solr and [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2597&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>In previous posts, all listed below, we&#8217;ve discussed general architecture, full text search capabilities and faceting capabilities. However, until now we have not discussed any of the administration and management options and the things you can do on a live cluster without a restart. So let&#8217;s get into it and see what Apache Solr and ElasticSearch have to offer.</p>
<ul>
<li><a href="http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/">Solr vs. ElasticSearch: Part 1 &#8211; Overview</a></li>
<li><a href="http://blog.sematext.com/2012/09/04/solr-vs-elasticsearch-part-2-data-handling/">Solr vs. ElasticSearch: Part 2 &#8211; Data Handling</a></li>
<li><a href="http://blog.sematext.com/2012/10/01/solr-vs-elasticsearch-part-3-searching/">Solr vs. ElasticSearch: Part 3 &#8211; Searching</a></li>
<li><a href="http://blog.sematext.com/2012/10/30/solr-vs-elasticsearch-part-4-faceting/">Solr vs. ElasticSearch: Part 4 &#8211; Faceting</a></li>
<li><a title="Solr vs ElasticSearch: Part 5 – Management API Capabilities" href="http://blog.sematext.com/2013/01/08/solr-vs-elasticsearch-part-5-management-api-capabilities/">Solr vs. ElasticSearch: Part 5 &#8211; Management API Capabilities</a></li>
<li><a href="http://blog.sematext.com/2013/01/22/solr-vs-elasticsearch-userdev-communities/">Solr vs. ElasticSearch: Part 6 &#8211; User &amp; Dev Communities Compared</a></li>
</ul>
<p><span id="more-2597"></span></p>
<h2>Input/Output Format</h2>
<h3>ElasticSearch</h3>
<p>As you probably know, ElasticSearch offers a single way to talk to it &#8211; its HTTP REST API, with JSON structured queries and responses. In most cases, especially at query time, this is very handy, because it lets you precisely control the structure of your queries and thus control the logic.</p>
<h3>Apache Solr</h3>
<p>On the other hand we have Apache Solr. If you are familiar with it you know that a query is sent to Solr as URL request parameters.  This makes communication much less structured compared to the ElasticSearch JSON format. The response can be returned in one of multiple formats that are supported out of the box, like the default XML, JSON, CSV, PHP serialized, or Ruby.</p>
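<p>To make the contrast concrete, here is a minimal sketch that only builds the two requests and does not send them. The host names, the <em>docs</em> collection and index, and the <em>title</em> field are hypothetical; <em>q</em>, <em>rows</em> and <em>wt</em> are standard Solr request parameters, and the JSON body is a simple term query in the ElasticSearch query DSL.</p>
<pre>import json
from urllib.parse import urlencode

# Hypothetical endpoints, used for illustration only
SOLR_SELECT = "http://localhost:8983/solr/docs/select"
ES_SEARCH = "http://localhost:9200/docs/_search"

def solr_request(text, rows=10):
    """Solr: the whole query is flattened into URL parameters."""
    params = {"q": "title:" + text, "rows": rows, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)

def es_request(text, size=10):
    """ElasticSearch: the query is a structured JSON document."""
    body = {"query": {"term": {"title": text}}, "size": size}
    return ES_SEARCH, json.dumps(body)

print(solr_request("solr"))
es_url, es_body = es_request("solr")
print(es_url)
print(es_body)</pre>
<p>Note how the Solr query ends up as one flat, URL-encoded string, while the ElasticSearch query keeps its nesting and can grow into more complex structures without changing shape.</p>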
<h2>Statistics API</h2>
<p>Most of the time your search cluster will be fine and you won&#8217;t have any problems with it. However, there are times when you may need to see what is happening inside Apache Solr or ElasticSearch to diagnose problems, such as performance problems (hello <a href="http://sematext.com/spm/index.html">SPM</a>!), stability issues, or anything like that. In such cases, both search engines provide some amount of statistics.</p>
<h3>Apache Solr</h3>
<p>In Solr we can use JMX or HTTP requests to retrieve information about handler usage, cache statistics or information about most Solr components.</p>
<h3>ElasticSearch</h3>
<p>ElasticSearch was designed to be able to return various statistics about itself. With REST API calls we can get anything from simple information, such as cluster health or node statistics, to extensive details about indices, including merges and refreshes. The same stats are available via JMX, too.</p>
<h2>Settings API</h2>
<h3>ElasticSearch</h3>
<p>ElasticSearch allows us to modify most of the configuration values dynamically. For example, you can clear your caches (or just a specific type of cache) and you can move shards and replicas to specific nodes in your cluster. In addition to that you are also allowed to update mappings (to some extent), define warming queries (since version 0.20), etc. You can even shut down a single node or a whole cluster with a single HTTP call. Of course, these are just examples and don&#8217;t cover all the possibilities exposed by ElasticSearch.</p>
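<p>As an illustration, the sketch below builds two such calls without sending them: an index settings update and a cache clear. The node address and the <em>docs</em> index name are hypothetical, and the helper names are ours; the <em>_settings</em> and <em>_cache/clear</em> endpoints and the <em>number_of_replicas</em> setting are part of the ElasticSearch REST API.</p>
<pre>import json

ES = "http://localhost:9200"  # hypothetical node address

def update_replicas(index, replicas):
    """Request that changes the replica count of a live index."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", ES + "/" + index + "/_settings", json.dumps(body)

def clear_cache(index):
    """Request that clears the caches of a single index."""
    return "POST", ES + "/" + index + "/_cache/clear", ""

for method, url, body in (update_replicas("docs", 2), clear_cache("docs")):
    print(method, url, body)</pre>
<p>Any HTTP client can send these; no node restart is involved.</p>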
<h3>Apache Solr</h3>
<p>In the case of Apache Solr we do not (yet) have the possibility of changing configuration values (like warming queries) with API calls.</p>
<h2>Index / Collection Administration Capabilities</h2>
<p>In addition to the capabilities mentioned above, both ElasticSearch and Apache Solr provide APIs that allow us to modify our deployment when it comes to collections and indices.</p>
<h3>Apache Solr</h3>
<p>Before 4.0 we were able to manipulate cores inside our Solr instances. We could create new cores, reload them, get their status, rename them, swap two of them, and finally remove a core from the instance. With Solr 4.0, a new API was introduced that is built on top of the core admin API &#8211; the collections API. It allows us to create collections on a running SolrCloud cluster, reload them and of course delete them. As the collections API is built on top of the core admin API, if you create a new collection all the needed cores on all instances will be created. Of course, the same goes for reloading and deleting &#8211; all the affected cores will be reloaded or removed accordingly.</p>
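<p>The three collections API operations mentioned above are plain HTTP calls with an <em>action</em> parameter. The following sketch only builds the URLs; the Solr address and the <em>docs</em> collection name are hypothetical and the helper name is ours, while <em>CREATE</em>, <em>RELOAD</em>, <em>DELETE</em>, <em>numShards</em> and <em>replicationFactor</em> are names used by the collections API.</p>
<pre>from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical SolrCloud node

def collections_call(action, name, **extra):
    """Build a collections API URL for the given action and collection."""
    params = {"action": action, "name": name}
    params.update(extra)
    return SOLR + "/admin/collections?" + urlencode(params)

print(collections_call("CREATE", "docs", numShards=2, replicationFactor=2))
print(collections_call("RELOAD", "docs"))
print(collections_call("DELETE", "docs"))</pre>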
<h3>ElasticSearch</h3>
<p>In the case of ElasticSearch we can create and delete indices by running a simple HTTP command (PUT or DELETE method) with the index name we are interested in. In addition to that, with a simple API call we can increase and decrease the number of replicas without the need to shut down nodes or create new ones. With the newer ElasticSearch versions we can even manipulate shard placement with the cluster reroute API. With that API we can move shards between nodes, cancel the shard allocation process and also force shard allocation &#8211; everything on a live cluster.</p>
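<p>For the cluster reroute API, the request body is a list of commands. The sketch below builds a body with a <em>move</em> and a <em>cancel</em> command without sending it; the node address, the node names and the <em>docs</em> index are hypothetical and the helper names are ours, while the command names and their fields follow the reroute API.</p>
<pre>import json

REROUTE_URL = "http://localhost:9200/_cluster/reroute"  # hypothetical node

def move(index, shard, from_node, to_node):
    """Command that moves a started shard from one node to another."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}

def cancel(index, shard, node):
    """Command that cancels an ongoing allocation of a shard on a node."""
    return {"cancel": {"index": index, "shard": shard, "node": node}}

def reroute_request(commands):
    """Wrap commands into the POST request expected by the reroute API."""
    return "POST", REROUTE_URL, json.dumps({"commands": commands})

method, url, body = reroute_request([
    move("docs", 0, "node1", "node2"),
    cancel("docs", 1, "node3"),
])
print(method, url)
print(body)</pre>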
<h2>Query Analysis</h2>
<h3>Apache Solr</h3>
<p>If you&#8217;ve used Apache Solr you have probably come across the <em>debugQuery</em> and <em>explainOther</em> parameters. These two let us see the detailed score calculation for the documents found in the results of the given query (the <em>debugQuery</em> parameter) and for other documents we specify (the <em>explainOther</em> parameter). In addition, we can also see how the analysis process works by using the analysis handler or the analysis page of the Solr administration panel.</p>
<p>For example, this is what the debug information returned by Solr can look like:</p>
<pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
.
.
.
&lt;lst name="debug"&gt;
 &lt;str name="rawquerystring"&gt;ten&lt;/str&gt;
 &lt;str name="querystring"&gt;ten&lt;/str&gt;
 &lt;str name="parsedquery"&gt;(+DisjunctionMaxQuery((prefixTok:ten)~0.01) ())/no_coord&lt;/str&gt;
 &lt;str name="parsedquery_toString"&gt;+(prefixTok:ten)~0.01 ()&lt;/str&gt;
 &lt;str name="QParser"&gt;DisMaxQParser&lt;/str&gt;
 &lt;null name="altquerystring"/&gt;
 &lt;null name="boostfuncs"/&gt;
 &lt;lst name="timing"&gt;
  &lt;double name="time"&gt;2.0&lt;/double&gt;
  &lt;lst name="prepare"&gt;
   &lt;double name="time"&gt;1.0&lt;/double&gt;
   &lt;lst name="org.apache.solr.handler.component.QueryComponent"&gt;
    &lt;double name="time"&gt;1.0&lt;/double&gt;
   &lt;/lst&gt;
   &lt;lst name="org.apache.solr.handler.component.FacetComponent"&gt;
    &lt;double name="time"&gt;0.0&lt;/double&gt;
   &lt;/lst&gt;
   &lt;lst name="org.apache.solr.handler.component.MoreLikeThisComponent"&gt;
    &lt;double name="time"&gt;0.0&lt;/double&gt;
   &lt;/lst&gt;
   &lt;lst name="org.apache.solr.handler.component.HighlightComponent"&gt;
    &lt;double name="time"&gt;0.0&lt;/double&gt;
   &lt;/lst&gt;
   &lt;lst name="org.apache.solr.handler.component.StatsComponent"&gt;
    &lt;double name="time"&gt;0.0&lt;/double&gt;
   &lt;/lst&gt;
   &lt;lst name="org.apache.solr.handler.component.DebugComponent"&gt;
    &lt;double name="time"&gt;0.0&lt;/double&gt;
   &lt;/lst&gt;
 &lt;/lst&gt;
 &lt;lst name="process"&gt;
  &lt;double name="time"&gt;1.0&lt;/double&gt;
  &lt;lst name="org.apache.solr.handler.component.QueryComponent"&gt;
   &lt;double name="time"&gt;0.0&lt;/double&gt;
  &lt;/lst&gt;
  &lt;lst name="org.apache.solr.handler.component.FacetComponent"&gt;
   &lt;double name="time"&gt;0.0&lt;/double&gt;
  &lt;/lst&gt;
  &lt;lst name="org.apache.solr.handler.component.MoreLikeThisComponent"&gt;
   &lt;double name="time"&gt;0.0&lt;/double&gt;
  &lt;/lst&gt;
  &lt;lst name="org.apache.solr.handler.component.HighlightComponent"&gt;
   &lt;double name="time"&gt;0.0&lt;/double&gt;
  &lt;/lst&gt;
  &lt;lst name="org.apache.solr.handler.component.StatsComponent"&gt;
   &lt;double name="time"&gt;0.0&lt;/double&gt;
  &lt;/lst&gt;
  &lt;lst name="org.apache.solr.handler.component.DebugComponent"&gt;
   &lt;double name="time"&gt;1.0&lt;/double&gt;
  &lt;/lst&gt;
 &lt;/lst&gt;
&lt;/lst&gt;
&lt;lst name="explain"&gt;
 &lt;str name="Ten mices"&gt;
1.3527006 = (MATCH) sum of:
 1.3527006 = (MATCH) weight(prefixTok:ten in 35158) [DefaultSimilarity], result of:
 1.3527006 = fieldWeight in 35158, product of:
 1.4142135 = tf(freq=2.0), with freq of:
 2.0 = termFreq=2.0
 6.1216245 = idf(docFreq=6355, maxDocs=1065313)
 0.15625 = fieldNorm(doc=35158)
 &lt;/str&gt;
&lt;/lst&gt;
&lt;/lst&gt;
&lt;/response&gt;</pre>
<p>As you can see, we can get information about timings of each of the used components. In addition to that, we see the parsed query and of course the explain information showing us how the document score was calculated.</p>
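<p>The numbers in the explain section can be reproduced by hand. The sketch below assumes the classic Lucene TF-IDF formulas used by DefaultSimilarity (tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1))) and takes the field norm as given from the output above. Lucene works with single-precision floats, so the last digits may differ slightly.</p>
<pre>import math

def tf(freq):
    """Term frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)

def idf(doc_freq, max_docs):
    """Inverse document frequency factor."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

FIELD_NORM = 0.15625  # taken as-is from the explain output

# Values from the explain output above: freq=2.0, docFreq=6355, maxDocs=1065313
tf_value = tf(2.0)
idf_value = idf(6355, 1065313)
score = tf_value * idf_value * FIELD_NORM
print(round(tf_value, 6), round(idf_value, 6), round(score, 6))</pre>
<p>The product of the three factors comes out at about 1.3527, matching the fieldWeight reported for the document.</p>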
<h3>ElasticSearch</h3>
<p>ElasticSearch exposes three separate REST end-points to analyze our queries and documents and to explain a document&#8217;s score. The <em>Analyze API</em> allows us to test our analyzer on a specified text in order to see how it is processed, and is similar to the analysis page functionality of Solr. The <em>Explain API</em> provides us with information about the score calculation for a given document. Finally, the <em>Validate API </em>can validate our query to see if it is valid and how expensive it can be.</p>
<p>For example, this is what Explain API response looks like:</p>
<pre>{
 "ok" : true,
 "_index" : "docs",
 "_type" : "doc",
 "_id" : "1",
 "matched" : true,
 "explanation" : {
   "value" : 0.15342641,
   "description" : "fieldWeight(_all:document in 0), product of:",
   "details" : [ {
     "value" : 1.0,
     "description" : "tf(termFreq(_all:document)=1)"
   }, {
     "value" : 0.30685282,
     "description" : "idf(docFreq=1, maxDocs=1)"
   }, {
     "value" : 0.5,
     "description" : "fieldNorm(field=_all, doc=0)"
   } ]
 }
}</pre>
<p>You can see the description about score calculation that is returned from the Explain API.</p>
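<p>Because the explanation is plain JSON, it is easy to check programmatically. The sketch below embeds the <em>explanation</em> part of the response shown above and verifies that the top-level value is the product of its details, as the description says; the helper name is ours.</p>
<pre>import json

# The "explanation" object copied from the response above
EXPLANATION = json.loads("""
{
  "value" : 0.15342641,
  "description" : "fieldWeight(_all:document in 0), product of:",
  "details" : [
    { "value" : 1.0, "description" : "tf(termFreq(_all:document)=1)" },
    { "value" : 0.30685282, "description" : "idf(docFreq=1, maxDocs=1)" },
    { "value" : 0.5, "description" : "fieldNorm(field=_all, doc=0)" }
  ]
}
""")

def product_of_details(node):
    """Multiply the values of the direct children of an explanation node."""
    result = 1.0
    for detail in node["details"]:
        result *= detail["value"]
    return result

computed = product_of_details(EXPLANATION)
print(computed, EXPLANATION["value"])</pre>
<p>Here tf (1.0), idf (0.30685282) and the field norm (0.5) multiply to the reported score of roughly 0.1534.</p>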
<h2>Before We End</h2>
<p>There are a few more words we wanted to write before summarizing this comparison. First of all, the above mentioned APIs and possibilities are not all that is available, especially when it comes to ElasticSearch. For example, with ElasticSearch you can clear caches on the index level, you can check index and type existence, you can retrieve and manage your warming queries, clear the transaction log by running the Flush API, or even close an index or open those that were closed. We wanted to point out some differences and similarities between Apache Solr and ElasticSearch, but we didn&#8217;t want to make a summary of the documentation. <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  So, if you are interested in some functionality and you don&#8217;t know if it exists, just send a mail to the Apache Solr or ElasticSearch mailing list or leave a comment here, and we will be glad to help.</p>
<h2>Summary</h2>
<p>When we first started the Solr vs ElasticSearch series we planned to initially divide the series into five posts, which are now published. However after seeing the popularity of the series and the amount of feedback we&#8217;ve received, we decided to extend the series. You can soon expect the next part, which will be dedicated to non-technical, but deeply important and interesting aspects of both search servers. After that, we&#8217;ll get back to the technical details with the subsequent post  dedicated to score influence capabilities, describing how we can change the default Lucene scoring and influence it from configuration, during indexing time and finally during querying.</p>
<p>If you liked this post, please <a href="https://twitter.com/intent/tweet?source=webclient&amp;text=Solr vs ElasticSearch: Part 5 - API Usage Possibilities via @sematext - http://blog.sematext.com/2013/01/08/solr-vs-elasticsearch-part-5-management-api-capabilities/" target="_blank">tweet it</a>!</p>
<p>- <a href="http://twitter.com/kucrafal">@kucrafal</a>, <a href="http://twitter.com/sematext">@sematext</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2013/01/08/solr-vs-elasticsearch-part-5-management-api-capabilities/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d05d7fbddd69c91eefaad0b9ec624111?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">kucrafal</media:title>
		</media:content>
	</item>
		<item>
		<title>HBaseWD and HBaseHUT: Handy HBase Libraries Available in Public Maven Repo</title>
		<link>http://blog.sematext.com/2012/12/24/hbasewd-and-hbasehut-handy-hbase-libraries-available-in-public-maven-repo/</link>
		<comments>http://blog.sematext.com/2012/12/24/hbasewd-and-hbasehut-handy-hbase-libraries-available-in-public-maven-repo/#comments</comments>
		<pubDate>Mon, 24 Dec 2012 18:55:52 +0000</pubDate>
		<dc:creator>Alex Baranau</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[distributed]]></category>
		<category><![CDATA[hbase]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=2754</guid>
		<description><![CDATA[HBaseWD is aimed to help distribute writes of records with sequential row keys in HBase (and avoid RegionServer hotspotting). Good introduction can be found here. We recently published 0.1.0 version of the library to Sonatype public maven repository. Thus, integration in your project became much easier: &#60;repositories&#62; &#60;repository&#62; &#60;id&#62;sonatype release&#60;/id&#62; &#60;url&#62;https://oss.sonatype.org/content/repositories/releases/&#60;/url&#62; &#60;/repository&#62; &#60;/repositories&#62; &#60;dependency&#62; &#60;groupId&#62;com.sematext.hbasewd&#60;/groupId&#62; &#60;artifactId&#62;hbasewd&#60;/artifactId&#62; &#60;version&#62;0.1.0&#60;/version&#62; &#60;/dependency&#62; [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.sematext.com&#038;blog=7678707&#038;post=2754&#038;subd=sematext&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="https://github.com/sematext/HBaseWD">HBaseWD</a> aims to help distribute writes of records with sequential row keys in <a href="http://hbase.apache.org/">HBase</a> (and avoid <a href="http://hbase.apache.org/book/rowkey.design.html#timeseries">RegionServer hotspotting</a>). A good introduction can be found <a href="http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/">here</a>.</p>
<p>We recently published version 0.1.0 of the library to the Sonatype public Maven repository, which makes integrating it into your project much easier:</p>
<pre>  &lt;repositories&gt;
    &lt;repository&gt;
      &lt;id&gt;sonatype release&lt;/id&gt;
      &lt;url&gt;https://oss.sonatype.org/content/repositories/releases/&lt;/url&gt;
    &lt;/repository&gt;
  &lt;/repositories&gt;</pre>
<div>
<pre>  &lt;dependency&gt;
    &lt;groupId&gt;com.sematext.hbasewd&lt;/groupId&gt;
    &lt;artifactId&gt;hbasewd&lt;/artifactId&gt;
    &lt;version&gt;0.1.0&lt;/version&gt;
  &lt;/dependency&gt;</pre>
<p><a href="https://github.com/sematext/HBaseHUT">HBaseHUT</a> aims to help in situations when you need to update a lot of records in HBase in read-modify-write style. A good introduction can be found <a href="http://blog.sematext.com/2012/04/22/hbase-real-time-analytics-rollbacks-via-append-based-updates/">here</a>.</p>
<p>We recently published version 0.1.0 of this library to the Sonatype public Maven repository too. Integration info:</p>
<div>
<pre>  &lt;repositories&gt;
    &lt;repository&gt;
      &lt;id&gt;sonatype release&lt;/id&gt;
      &lt;url&gt;https://oss.sonatype.org/content/repositories/releases/&lt;/url&gt;
    &lt;/repository&gt;
  &lt;/repositories&gt;</pre>
<div>
<pre>  &lt;dependency&gt;
    &lt;groupId&gt;com.sematext.hbasehut&lt;/groupId&gt;
    &lt;artifactId&gt;hbasehut&lt;/artifactId&gt;
    &lt;version&gt;0.1.0&lt;/version&gt;
  &lt;/dependency&gt;</pre>
<p>For running (MR jobs) on hadoop-2.0+ (which is a part of CDH4.1+) use the 0.1.0-hadoop-2.0 version:</p>
</div>
<div>
<pre>  &lt;dependency&gt;
    &lt;groupId&gt;com.sematext.hbasehut&lt;/groupId&gt;
    &lt;artifactId&gt;hbasehut&lt;/artifactId&gt;
    &lt;version&gt;0.1.0-hadoop-2.0&lt;/version&gt;
  &lt;/dependency&gt;</pre>
</div>
<p>Thank you to all contributors and users of the libraries!</p></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2012/12/24/hbasewd-and-hbasehut-handy-hbase-libraries-available-in-public-maven-repo/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cab7d4c69f7d86484454927e392600e0?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">abaranau</media:title>
		</media:content>
	</item>
		<item>
		<title>SPM Discountorama Announcement</title>
		<link>http://blog.sematext.com/2012/12/06/spm-performance-monitoring-discounts/</link>
		<comments>http://blog.sematext.com/2012/12/06/spm-performance-monitoring-discounts/#comments</comments>
		<pubDate>Thu, 06 Dec 2012 07:18:09 +0000</pubDate>
		<dc:creator>sematext</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[announcement]]></category>
		<category><![CDATA[elasticsearch]]></category>
		<category><![CDATA[hbase]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[sensei]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[spm]]></category>

		<guid isPermaLink="false">http://blog.sematext.com/?p=2693</guid>
		<description><![CDATA[We are happy to announce the General Availability of SPM, our performance monitoring solution for Apache Solr, ElasticSearch, HBase, SenseiDB, and Java applications, and of course all system metrics. You can also vote for what else you want SPM to monitor.  Over the last N months that we&#8217;ve been running SPM we&#8217;ve received a lot [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>We are happy to announce the General Availability of <a href="http://sematext.com/spm/index.html">SPM</a>, our performance monitoring solution for Apache Solr, ElasticSearch, HBase, SenseiDB, and Java applications, and of course all system metrics. You can also vote for <a href="https://docs.google.com/a/sematext.com/spreadsheet/viewform?formkey=dFlVbUNxOHR6UFlQem5XeGIzTjV6Qmc6MQ">what else you want SPM to monitor</a>.  Over the last N months that we&#8217;ve been running SPM we&#8217;ve received a lot of good feedback (thanks!), a lot of words of encouragement (thanks!), and even a few nice quotes (another thanks!). Here is one from Jerry Yang, a Software Engineer at Walmart Labs: <em>&#8220;I have been using SPM for couple of days and it has been amazing. I learned a lot about my Solr services and was able to optimize based on the results on SPM. Great work.&#8221;</em></p>
<h3>Discount Codes</h3>
<p>Since the holiday season is coming up, we thought we&#8217;d offer a discount every week from now until the end of the year.  Each of the following discounts can be used only during &#8220;its week&#8221; specified below.  There is a limit to the number of people who can use each discount, so if you want one, don&#8217;t wait too long.  <strong>Each discount will reduce the price of SPM SaaS for 365 days after you&#8217;ve used it, which effectively means you will get the discount until the end of 2013.</strong>  Note that when you <a href="http://apps.sematext.com/users-web/register.do">register for SPM</a> you do not need to enter your credit card information.  You also don&#8217;t need to provide it when you create the SPM application for the system you want to monitor.  <strong>You enter the discount code when you create your SPM application</strong>.</p>
<ul>
<li><strong><span style="color:#ff0000;">20</span>%</strong> for the remainder of this week until the end of this Sunday, December 9: <span style="color:#339966;"><strong>NY201320</strong></span></li>
<li><strong><span style="color:#ff0000;">15</span>%</strong> for the week of December 10, 2012: <span style="color:#339966;"><strong>NY201315</strong></span></li>
<li><strong><span style="color:#ff0000;">10</span>%</strong> for the week of December 17, 2012: <span style="color:#339966;"><strong>NY201310</strong></span></li>
<li><strong><span style="color:#ff0000;">5</span>%</strong> for the week of December 24, 2012: <span style="color:#339966;"><strong>NY201305</strong></span></li>
</ul>
<p>Note that each discount code expires on Sunday at 00:00 UTC.</p>
<h3>SPM Flavours</h3>
<p>The above discounts are good for our SPM SaaS.  However, if you&#8217;d rather run SPM on your own servers, we do offer SPM On Premises &#8211; please <a href="http://sematext.com/about/contact.html">get in touch</a> if you are interested in the On Premises version.  You can also <a href="https://docs.google.com/a/sematext.com/spreadsheet/viewform?formkey=dHJfcEdLakE3aXNXYTA1Q2dQOFpWZVE6MQ">vote for SPM SaaS vs. On Premises</a> to tell us which version you prefer.</p>
<h3>SPM Plans</h3>
<p>There are a few different subscription plans available in SPM SaaS:</p>
<ul>
<li><strong>Basic</strong> plan that is <strong>free</strong> and shows the last 30 minutes of performance data</li>
<li><strong>Standard</strong> plan that shows the last 30 days of data and costs $0.035/server/hour</li>
<li><strong>Pro</strong> plan that shows the last 60 days of performance data and costs $0.070/server/hour</li>
</ul>
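<p>As a rough illustration of how the hourly rates above add up, here is a minimal Python sketch. The function name, the 30-day month, and the idea that a discount code is a flat percentage off the hourly price are assumptions made for this example; it is not a description of how SPM billing is actually computed.</p>

```python
# Hypothetical helper: estimate SPM SaaS cost from the per-server hourly
# rates listed above. How billing and discounts are really applied is an
# assumption here, not something this post specifies.
HOURLY_RATE_USD = {"basic": 0.0, "standard": 0.035, "pro": 0.070}


def estimated_cost(plan, servers, hours, discount_percent=0):
    """Return the estimated cost in USD, rounded to cents."""
    full_price = HOURLY_RATE_USD[plan] * servers * hours
    return round(full_price * (100 - discount_percent) / 100, 2)


# One server on the Standard plan for a 30-day month (720 hours).
print(estimated_cost("standard", servers=1, hours=24 * 30))  # prints 25.2

# Four servers on the Pro plan for 30 days with a 20% discount.
print(estimated_cost("pro", servers=4, hours=24 * 30, discount_percent=20))  # prints 161.28
```

<p>Under these assumptions, a single server on the Standard plan comes to roughly $25 for a 30-day month, and the Pro plan costs twice as much per server.</p>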
<p>If you have not used SPM before, here is what you can expect to see &#8211; click on the image to see a large, non-fuzzy version:</p>
<p><a href="http://sematext.com/img/products/spm/spm-solr-overview.png"><img alt="" src="http://sematext.com/img/products/spm/spm-solr-overview.png" height="505" width="845" /></a></p>
<p>We hope you will find <a href="http://sematext.com/spm/index.html">SPM</a> useful and fun to use.  We are always looking for feedback &#8211; just email <a href="mailto:spm-support@sematext.com">spm-support@sematext.com</a> or ping <a href="http://twitter.com/sematext">@sematext</a> and tell us what you like or don&#8217;t like about SPM.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sematext.com/2012/12/06/spm-performance-monitoring-discounts/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/18ddfa48f19355af2afd971dce4c899c?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">sematext</media:title>
		</media:content>

		<media:content url="http://sematext.com/img/products/spm/spm-solr-overview.png" medium="image" />
	</item>
	</channel>
</rss>
