What’s New in SPM 1.13.0

We pushed a new SPM release to production this morning and it’s loaded with goodies.  Here is a quick run-down of a few interesting ones. The slightly longer version can be found in SPM Changelog:

PagerDuty integration. If you are a PagerDuty user, your alerts from SPM can now go to your PagerDuty account where you can handle them along with all your other alerts.

Ruby & Java libraries for Custom Metrics.  We open-sourced sematext-metrics, a Ruby gem for sending Custom Metrics to SPM as well as sematext-metrics for doing the same from Java.

Coda Metrics & Ruby Metriks support.  We open-sourced sematext-metrics-reporter, a Coda’s Metrics reporter for sending Custom Metrics to SPM from Java, Scala, Clojure, and other JVM-based apps, and we’ve done the same for Metriks - the Ruby equivalent of Coda’s Metrics library.

Puppet metrics. We begged James Turnbull to marry Puppet and SPM and write a Puppet report processor thats sends each of the metrics generated by a Puppet run to SPM, which he did without us having to buy him drinks….yet.

Performance.  We’ve done a bit of work in the layer right behind the UI to make the UI a little faster.

CentOS 5.x support.  Apparently a good number of people still use CentOS 5.x, so we’ve update the SPM client SPM to work with it.  You can grab from SPM Client page.

@sematext

What’s New in Sematext Search Analytics

We’ve been busy with adding functionality and improving SPM, our Performance Monitoring service, but we’ve also been quietly working on our free Search Analytics service (internally known as SA).  As a matter of fact, not coincidentally, SPM and SA share a lot of backend components, as well as UI-level pieces.  This, of course, allows a good amount of software reuse and lets us improve both services without double the effort.

Here are some of the new things in Search Analytics:

Live Demo. Before you create your Sematext Apps account (it’s free, no need to take out your credit card) you can check out the live demo and see both SPM and SA in action.

Real-time. Previously, SA used MapReduce jobs to process the collected data and make them available as reports.  That is no longer the case. We’ve put SA on the same real-time OLAP engine that powers SPM.  This means you’ll see your graphs refresh and change before your eyes.

Dashboards. Just like we’ve added Dashboards to SPM, we’ve added them to SA, too.  You can now create custom Dashboards, pick which graphs you want on them, and where you want to put them on a Dashboard.  This is great if you want to display your search stats and trends on a large office monitor, as some SA users are already doing.  Moreover, you can put widgets from multiple SA and SPM Apps all on the same Dashboard, so you can see your performance metrics, SPM custom metrics (e.g. your KPIs), and your SA metrics on a single Dashboard, side by side.

Report/URL Sharing. Just like in SPM, we’ve made it possible to copy the URL from the browser, give it to anyone who has access to your SA reports.  When this URL is opened any filters or time selection will be automatically applied.  This makes it very easy for multiple people to easily share their “SA view” by sharing the URL instead of having to tell others which report they should look at, what time and what filters they should select, etc.

Graph Embedding and Sharing. Similar to URL Sharing (but different!), you can now share and embed individual SA graphs.  Each graph has a short URL that you can Tweet or share elsewhere.  You can also get a URL/HTML snippet and embed SA graphs in your blog, wiki, web site, etc.

User Sessions. We’ve added a User Sessions report. This report shows you the number of search sessions over time, the number of queries per session, as well as the number of distinct users using your search.  If anyone asked you to provide these numbers for your site, would you know them?  Most people would say no.  These metrics are good to know and with SA everyone will now be able to say yes to that question.

Distinct Queries. We’ve added the number of Distinct Queries to the Rate & Volume report.  Another nice metric.

Hourly Granularity. All graphs in SA now go down to hourly granularity.  This let’s you see how trends change over the course of each day.  This can lead to insights around differences in how your users use your search in the morning vs. during work hours vs. evening.

HTTPS/SSL. The SA JavaScript beacon can now use HTTPS.  This is important if your site uses HTTPS when displaying search results.  To send your search and clickstream data VIA HTTPS just replace http:// with https:// in SA JavaScript beacon.

We hope you like these changes.  Please leave a comment or let us know if you have suggestions for other improvements or new features you would like to see in Sematext Search Analytics.

- @sematext

What’s new in SPM 1.12.0

We’ve been very heads down since our last official release of SPM, as you can see from our SPM Changelog.  Here is some more info about new things you’ll find in SPM:

  • For the impatient – there is now a demo user (click on this and you will be logged in as the demo user).  This lets you into both SPM and Search Analytics even if you don’t have an account, so you can see  reports for various types of SPM Apps as well as what Search Analytics reports are like.
  • The SPM Client (aka SPM Agent) can now be installed as an RPM or a Debian packages – check SPM client page.  Until now, the installer was completely written in Bash, but using Jordan Sissel’s fpm we were able to easily put together SPM Client packages.  Moreover, you can now easily install SPM Client on Redhat, CentOS, Fedora, Debian, Ubuntu, SuSE, Amazon Linux AMI, and maybe some other smaller distros we didn’t get to test.  If you try it on some  other distro, please let us know if it worked or if you had issues, so we can help!
  • SPM for Hadoop can now be used to monitor Hadoop HDFS, MapReduce, and YARN clusters.  We have tested SPM for Hadoop with CDH (Cloudera Hadoop Distribution), but it should work just as well with Apache Hadoop, or HDP (Hortonworks Data Platform), and perhaps MapR’s Hadoop distros as well.  Of course, you can use the same Sematext Apps account to monitor any number of your clusters and clusters of any type, so it’s extremely easy to switch from looking at your Hadoop metrics, to HBase metrics, to Solr or ElasticSearch metrics.  We are working on expanding monitoring support to other technologies.  Tell us what you would like SPM to monitor for you!
  • SPM users already loved the Overview panel in SPM (see Screenshots), but the new Custom Dashboards are even cooler!  Here are some things to know about the new SPM Dashboards:
    • You can have any number of them and you can name them – there are no limits
    • You can add any SPM graph to any of your existing dashboards and it’s super easy to create a new dashboard when you realize you want the graph you are looking at on a new dashboard
    • You can drag and drop dashboard panels wherever you want, resize them, and they’ll nudge other panels around and snap onto a neat, invisible grid
    • You can add graphs from any number of different SPM applications to the same dashboard.  So you can create a dashboard that has all the graphs that are important to you and your application(s), and combine metrics from different (types of) apps in a single view.  For example, you can have a dashboard that shows the number of HBase compactions as well as ElasticSearch index merging stats on the same dashboard, right next to each other.
    • Not only can you mix and match graphs from different SPM Apps on the same dashboard, but if you are a Search Analytics user you can also have Search Analytics graphs right next to your performance graphs!  That’s powerful!
    • And for those who discovered Logsene, our soon to be announced Data & Log Analytics service, you can imagine how eventually SPM & Logsene will be able to play together and share the same dashboards!
  • SPM Clients do a good job of gathering server and application metrics for applications covered by SPM.  But what if you want to monitor something else in addition to what SPM collects and graphs?  Or what if you want to feed in some non-performance data – say some user engagement metrics or some other Business KPI?  Well, you can do that now!  We’ve added Custom Metrics to all plans and this addition is free of charge!  And guess what?  You can build graphs with your custom metrics and put them on any and however many of your Dashboards you want, so you could potentially have your KPIs be shown next to your Performance Metrics, next to your Log Analytics, next to your Search Analytics!
  • To help SPM users feed custom metrics into SPM we’ve released and open-sourced the first Sematext Metrics library for Java, with libraries for other languages to follow. Improvements welcome, pull requests welcome, support for other languages super welcome!
  • You can now share, tweet, and embed your SPM graphs as you please.  Next to each icon you will see a little “share” icon that will open up a dialog window and there you can save the displayed short link, which you can tweet, or email.  You’ll also see an HTML snippet that you can copy and paste into your blog or your Wiki.  The shared or embedded widget is cool because you can select very specific filters in SPM and the shared/embedded graph will remember them.  Furthermore, you can choose to have your graph show a specific, fixed time range or, alternatively, you can choose to always show the last N minutes, or hours, or days…. This can be quite handy for sharing with team members who may not have access to SPM for one reason or another, but do want access to these graphs – think C*Os or anyone else who needs to be fed graphs and graphs, and some more graphs.
  • Here is a cool and very handy feature that was suggested by George Stathis from Traackr, one of our early SPM users:  If you are a developer in charge of your application, your servers, your cluster, what happens when something breaks or just starts working poorly?  Many people turn to mailing lists to seek help from the open source community.  Experts on these mailing lists often ask for more information so they can provide better help.  Very often this information is in the form of graphs of various performance metrics.  So if you look at SPM you will see a little ambulance icon that, when clicked, puts together an email template that includes a short, public link to whichever graph’s ambulance icon you clicked on.  You can then incrementally keep adding links to other graphs to this email and, when you are ready, email it without leaving SPM.  This is incredibly easy and it makes it a breeze to put together an email that has a bunch of informative graphs in it.  This means you don’t have to go open your email client, you don’t have to type in the mailing list address, and you don’t have to jump between the email you are composing and SPM while you copy and paste links from SPM into your email client.  And while this functionality is really handy when you need help from experts and you want to give them as much information about your system as you can so they can help you better, you can also simply change where this email will get sent and send it to whoever you want.
  • We listen to our user really carefully (see above!).  Some SPM users would like to keep their data longer, so we’ve added Pro Silver – a new plan with longer data retention, more metrics, more everything.
  • We’ve improved SPM for ElasticSearch so it can now collect metrics even from non-HTTP nodes.
  • We have made SPM Sender, the component responsible for sending all the metrics over to us for processing, more resilient and even lighter than before.

 

We want to hear what you think and what you want!  Please use comments to leave suggestions, requests, and feedback, or simply email us or tweet @sematext.

ElasticSearch & Solr Book Giveaway at Berlin Buzzwords

We’ve given away all 3 free Berlin Buzzwords tickets, but we have more stuff to give away.  Stop by our desk at Berlin Buzzwords, say hello, and one of these could be yours to take home:

Solr 4 Cookbook

Solr 4 Cookbook

or perhaps

ElasticSearch Server

ElasticSearch Server

 

And don’t miss our 2 talks in Berlin this year!

Announcing Hadoop Monitoring in SPM

Take it from one of the must trusted names from the world of Hadoop and HBase, as well as one of the friendliest people you’ll encounter on the Hadoop conference circuit, Lars George from Cloudera:

Hadoop Club Monitoring

We’re happy to announce the immediate availability of SPM for Hadoop (see Sneak Peek: Hadoop Monitoring comes to SPM for some screenshots).  With the latest SPM release Hadoop joins Apache Solr, Apache HBase, ElasticSearch, Sensei, and the JVM as the list of technologies you can monitor with SPM. With  SPM for Hadoop you go from zero to seeing all key metrics for your Hadoop cluster metrics in just a few minutes.  Included in the reports are metrics for both HDFS and MapReduce – metrics for NameNode, JobTracker, TaskTracker, and DataNode are all included along with all the default server metrics.  The YARN version of Hadoop is also supported and includes metrics for NodeManager, ResourceManager, etc.

Don’t forget that SPM monitoring agent can run as in-process agent, as well as in standalone mode (i.e., as an external process).  Running in the standalone mode means you may not have to restart various daemons of your existing Hadoop cluster that you want to monitor (assuming you already enabled JMX), so you can quickly get to your Hadoop metrics without interrupting anything!

What else would you like us to monitor?  Please select your candidates!

Structured Logging with Rsyslog and Elasticsearch

As more and more organizations are starting to use our Performance Monitoring and Search Analytics services, we have more and more logs from various components that make up these applications.  So what do we do?  Do we just keep logging everything to files, rotating them, and grepping them when we need to troubleshoot something?  There must be something better we can do!  And indeed, there is – so much so, that we’ll soon be launching Logsene – a Log Analytics service to complement SPM.  When your applications generate a lot of logs, you’d probably want to make some sense of them by searching and/or statistics. Here’s when structured logging comes in handy, and I would like to share some thoughts and configuration examples of how you could use a popular syslog daemon like rsyslog to handle both structured and unstructured logs. Then I’m going to look at how you can take those logs, format them in JSON, and index them with Elasticsearch – for some fast and easy searching and statistics.  If you are going to Berlin Buzzwords this year and you are into logging, Logstash, ElasticSearch, or Kibana, I’ll be talking about them in my JSON logging with ElasticSearch presentation.

On structured logging

If we take an unstructured log message, like:

Joe bought 2 apples

And compare it with a similar one in JSON, like:

{“name”: “Joe”, “action”: “bought”, “item”: “apples”, “quantity”: 2}

We can immediately spot a couple of advantages and disadvantages of structured logging:
if we index these logs, it will be faster and more precise to search for “apples” in the “item” field, rather than in the whole document. At the same time, the structured log will take up more space than the unstructured one.

But in most use-cases there will be more applications that would log the same subset of fields. So if you want to search for the same user across those applications, it’s nice to be able to pinpoint the “name” field everywhere. And when you add statistics, like who’s the user buying most of our apples, that’s when structured logging really becomes useful.

Finally, it helps to have a structure when it comes to maintenance. If a new version of the application adds a new field, and your log becomes:

Joe bought 2 red apples

it might break some log-parsing, while structured logs rarely suffer from the same problem.

Enter CEE and Lumberjack: structured logging within syslog

With syslog, as defined by RFC3164, there is already a structure in the sense that there’s a priority value (severity*8 + facility), a header (timestamp and hostname) and a message. But this usually isn’t the structure we’re looking for.

CEE and Lumberjack are efforts to introduce structured logging to syslog in a backwards-compatible way. The process is quite simple: in the message part of the log, one would start with a cookie string “@cee:”, followed by an optional space and then a JSON or XML. From this point on I will talk about JSON, since it’s the format that both rsyslog and Elasticsearch prefer. Here’s a sample CEE-enhanced syslog message:

@cee: {“foo”: “bar”}

This makes it quite easy to use CEE-enhanced syslog with existing syslog libraries, although there are specific libraries like liblumberlog, which make it even easier. They’ve also defined a list of standard fields, and applications should use those fields where they’re applicable – so that you get the same field names for all applications. But the schema is free, so you can add custom fields at will.

CEE-enhanced syslog with rsyslog

rsyslog has a module named mmjsonparse for handling CEE-enhanced syslog messages. It checks for the “CEE cookie” at the beginning of the message, and then tries to parse the following JSON. If all is well, the fields from that JSON are loaded and you can then use them in templates to extract whatever information seems important. Fields from your JSON can be accessed like this: $!field-name. An example of how they can be used is shown here.

To get started, you need to have at least rsyslog version 6.6.0, and I’d recommend using version 7 or higher. If you don’t already have that, check out Adiscon’s repositories for RHEL/CentOS and Ubuntu.

Also, mmjsonparse is not enabled by default. If you use the repositories, install the rsyslog-mmjsonparse package. If you compile rsyslog from sources, specify –enable-mmjsonparse when you run the configure script. In order for that to work you’d probably have to install libjson and liblognorm first, depending on your operating system.

For a proof of concept, we can take this config:

#load needed modules
module(load=”imuxsock”) # provides support for local system logging
module(load=”imklog”) # provides kernel logging support
module(load=”mmjsonparse”) #for parsing CEE-enhanced syslog messages

#try to parse structured logs
*.* :mmjsonparse:

#define a template to print field “foo”
template(name=”justFoo” type=”list”) {
property(name=”$!foo”)
constant(value=”\n”) #we’ll separate logs with a newline
}

#and now let’s write the contents of field “foo” in a file
*.* action(type=”omfile”
template=”justFoo”
file=”/var/log/foo”)

To see things, better, you can start rsyslog in foreground and in debug mode:

rsyslogd -dn

And in another terminal, you can send a structured log, then see the value in your file:

# logger ‘@cee: {“foo”:”bar”}’
# cat /var/log/foo
bar

If we send an unstructured log, or an invalid JSON, nothing will be added

# logger ‘test’
# logger ‘@cee: test2′
# cat /var/log/foo
bar

But you can see in the debug output of rsyslog why:

mmjsonparse: no JSON cookie: ‘test’
[...]
mmjsonparse: toParse: ‘ test2′
mmjsonparse: Error parsing JSON ‘ test2′: boolean expected

Indexing logs in Elasticsearch

To index our logs in Elasticsearch, we will use an output module of rsyslog called omelasticsearch. Like mmjsonparse, it’s not compiled by default, so you will have to add the –enable-elasticsearch parameter to the configure script to get it built when you run make. If you use the repositories, you can simply install the rsyslog-elasticsearch package.

omelasticsearch expects a valid JSON from your template, to send it via HTTP to Elasticsearch. You can select individual fields, like we did in the previous scenario, but you can also select the JSON part of the message via the $!all-json property. That would produce the message part of the log, without the “CEE cookie”.

The configuration below should be good for inserting the syslog message to an Elasticsearch instance running on localhost:9200, under the index “system” and type “events“. These are the default options, and you can take a look at this tutorial if you need some info on changing them.

#load needed modules
module(load=”imuxsock”) # provides support for local system logging
module(load=”imklog”) # provides kernel logging support
module(load=”mmjsonparse”) #for parsing CEE-enhanced syslog messages
module(load=”omelasticsearch”) #for indexing to Elasticsearch

#try to parse a structured log
*.* :mmjsonparse:

#define a template to print all fields of the message
template(name=”messageToES” type=”list”) {
property(name=”$!all-json”)
}

#write the JSON message to the local ES node
*.* action(type=”omelasticsearch”
template=”messageToES”)

After restarting rsyslog, you can see your JSON will be indexed:

# logger ‘@cee: {“foo”: “bar”, “foo2″: “bar2″}’
# curl -XPOST localhost:9200/system/events/_search?q=foo2:bar2 2>/dev/null | sed s/.*_source//
” : { “foo”: “bar”, “foo2″: “bar2″ }}]}}

As for unstructured logs, $!all-json will produce a JSON with a field named “msg”, having the message as a value:

# logger test
# curl -XPOST localhost:9200/system/events/_search?q=test 2>/dev/null | sed s/.*_source//
” : { “msg”: “test” }}]}}

It’s “msg” because that’s rsyslog’s property name for the syslog message.

Including other properties

But the message isn’t the only interesting property. I would assume most would want to index other information, like the timestamp, severity, or host which generated that message.

To do that, one needs to play with templates and properties. In the future it might be made easier, but at the time of this writing (rsyslog 7.2.3), you need to manually craft a valid JSON to pass it to omelasticsearch. For example, if we want to add the timestamp and the syslogtag, a working template might look like this:

template(name=”customTemplate”
type=”list”) {
#- open the curly brackets,
#- add the timestamp field surrounded with quotes
#- add the colon which separates field from value
#- open the quotes for the timestamp itself
constant(value=”{\”timestamp\”:\”")
#- add the timestamp from the log,
# format it in RFC-3339, so that ES detects it by default
property(name=”timereported” dateFormat=”rfc3339″)
#- close the quotes for timestamp,
#- add a comma, then the syslogtag field in the same manner
constant(value=”\”,\”syslogtag\”:\”")
#- now the syslogtag field itself
# and format=”json” will ensure special characters
# are escaped so they won’t break our JSON
property(name=”syslogtag” format=”json”)
#- close the quotes for syslogtag
#- add a comma
#- then add our JSON-formatted syslog message,
# but start from the 2nd position to omit the left
# curly bracket
constant(value=”\”,”)
property(name=”$!all-json” position.from=”2″)
}

Summary

If you’re interested in searching or analyzing lots of logs, structured logging might help. And you can do it with the existing syslog libraries, via CEE-enhanced syslog. If you use a newer version of rsyslog, you can parse these logs with mmjsonparse and index them in Elasticsearch with omelasticsearch.  If you want to use Logsene, it will consume your structured logs as described in this post.

Berlin Buzzwords 2013 – Free Tickets

Sematext is a Berlin Buzzwords sponsor for the 3rd year in a row and we have three free tickets to give away.  If you are a Sematext follower, a client, an SPM or Search Analytics user, of Logsene beta tester, or simply want to go to Berlin to listen to our talks or any other talks about large scale search, storage, data analytics, NoSQL, BigData, and a few other buzzwords, we are giving away 3 free Berlin Buzzwords tickets. Get in touch!

Follow

Get every new post delivered to your Inbox.

Join 1,218 other followers