SPM Client Async Metrics Collection

Do your SPM charts ever look choppy like this?

choppy-spm-chart-slow-es

Let’s hope not!  However, if you run large Elasticsearch clusters, especially those with thousands of shards, you may suffer from Elasticsearch sometimes taking a long time to provide information about its metrics that SPM, of course, uses to render various metrics charts.  While this slowness is something Elasticsearch developers are undoubtedly working on, we’ve also proactively improved the SPM Client (aka agent, aka monitor), specifically by making metric collection from various sources asynchronous, to better cope with this slowness.

There are other goodies in more recent versions of the SPM Client, including additional optimizations (some specific to Elasticsearch, but others beneficial to all SPM users), but also some that will enable exciting new functionality we’ll be announcing next month.

To get the latest SPM Client please go to https://apps.sematext.com/spm-reports/client.do.

Custom Metrics from Node.js Apps

We recently added support for Node.js and io.js monitoring to SPM and have received great feedback.  While SPM for Node.js monitors all key Node.js metrics, most applications have additional metrics one often wants to track — things like: the number of concurrent users, the number of items placed in a shopping cart, or any other kind of IT metric, business transaction or KPI.  SPM already provides a Custom Metrics API and libraries that make shipping custom metrics from Java and from Ruby applications a snap.  But why leave Node.js behind?  Meet spm-metrics-js (it’s on Github) – the npm module for sending custom metrics from Node.js apps to SPM.  

This JavaScript module supports measurements using counters, meters, timers, and histograms. These helpers calculate values of metrics objects and ship them to SPM, where they are then turned into charts and inputs to alert rules and anomaly detection algorithms.

Here’s an example for counting users on login and logout:

Sending custom metrics is really that easy!

Now, let’s have a look at the options used when creating a custom metric object:

  • name – the name of the metric you can find in SPM’s user interface
  • aggregation – the aggregation type: ‘avg’, ‘sum’, ‘min’ or ‘max’ used in SPM’s aggregations server
  • filter1 – the SPM user interface provides two filter criteria; the value will be available in the UI as the first filter
  • filter2 – the filter value for the second filter field in SPM’s UI
  • interval – time in ms to call save() periodically. Defaults to no automatic call to save(). The save() function captures the metric and resets meters, histograms, counters or timers.
  • valueFilter – array of property names for calculated values. Only specified fields are sent to SPM (e.g. [‘count’, ‘min’, ‘max’].

Additional measurement functions are available to extend the custom metric object automatically with additional calculated properties:

  • Meter – measure rates and provide the following calculated properties:
    • mean: the average rate since the meter was started
    • count: the total of all values added to the meter
    • currentRate: the rate of the meter since the meter was started
    • 1MinuteRate: the rate of the meter biased toward the last 1 minute
    • 5MinuteRate: the rate of the meter biased toward the last 5 minutes
    • 15MinuteRate: the rate of the meter biased toward the last 15 minutes
  • Histogram – build percentile, min, max, & sum aggregations over time
    • min: the lowest observed value
    • max: the highest observed value
    • sum: the sum of all observed values
    • variance: the variance of all observed values
    • mean: the average of all observed values
    • stddev: the stddev of all observed values
    • count: the number of observed values
    • median: 50% of all values in the reservoir are at or below this value.
    • p75: see median, 75% percentile
    • p95: see median, 95% percentile
    • p99: see median, 99% percentile
    • p999: see median, 99.9% percentile
  • Timer – measures time and captures rates in an internal meter and histogram

If this is more than you actually need, we recommend selecting only the relevant properties (using the ‘valueFilter’ option). Please note that Custom Metrics are aggregated by the specified aggregation type (‘avg’, ‘sum’, ‘min’, ‘max’).  Moreover, the aggregation type for each property can be defined – for further details please check the package documentation.

Adding instrumentation always raises the question of performance; in spm-metrics-js all metrics are buffered and efficiently ship metrics to SPM in bulk using asynchronous functions. We recommend using a transmit time of 60 seconds.

Once you send custom metrics to SPM you can create alerts on them, have SPM detect and alert you about anomalies, put charts with those metrics on dashboards, share charts with those metrics publicly or just with your team or organization, etc.

custom-metric-alert

Actions for Metrics – e.g. define alerts using anomaly detection

custom-metric-dashboard

Dashboard with Custom Metric and other Metrics

Please note the free plan has no limits on the number of monitored Applications, Processes, Dashboards or Users and you can share Accounts with your whole DevOps team and integrate SPM with Slack, HipChat, PagerDuty, Webhooks, etc. If you don’t use SPM yet, grab a free account to start monitoring your Node.js and io.js applications and benefit from all standard SPM features such as alerting, anomaly detection, event and log correlation, unlimited dashboards, secure information sharing, etc. Check out spm-metrics-js (or on Github) and drop us a line (or tweet 140 characters to @sematext) — we’d love to hear from you!

SPM REST API

By popular demand…we’ve just added some new goodness to SPM with the SPM REST API.  This new API lets you:

  • create SPM Apps for monitoring (e.g. generate a new SPM App + its token during deployment)
  • list all available metrics and charts for a specific App
  • list all alerts defined for some app (threshold, anomaly or heartbeat)
  • create new alerts (of any type: heartbeat, threshold, anomaly)
  • enable/disable or delete individual alerts

As you can tell from the above, we started by exposing APIs for SPM Alerts first.  Of course, we’ll be expanding the API to expose more SPM functionality, as well as Logsene. You can see all the details — including Listing Alerts, Creating, Disabling and Deleting Alerts, Metrics API, and more are on our wiki.

Alert Creation Designer

One feature of SPM Alerts HTTP API that we do want to call out here is the Alert Creation Designer.  The easiest way to prepare alert rules to be used in API calls is by using the Create Alert dialog available in SPM.

In the example below we use Solr’s req. count metric, whose chart is under the Req. Rate & Latency report, as shown in the screenshot below.

Req Rate Latency

Above every chart in SPM there is a little pull-down menu.  From there simply click on the Create Alert option to open the Alert dialog seen below.  A new “Show API Call” link is shown at the bottom of the dialog. Once clicked, it displays the API call as a “curl” command line. You can modify attributes shown in the dialog and update the API call details by clicking on the little Refresh icon.  Once you’re happy with alert rule parameters you can copy the curl command and execute it from the terminal.

Alert

Finally, for Heartbeat Alerts, you would just click on Heart icon on any report (as displayed below) to open a similar dialog:

Heartbeat

Typically, you would prepare your alert templates this way once, and then just tweak them remotely (for example, just use different token parameter to get it applied to your other apps, adjust metric names, thresholds, etc.).

Would this make your life easier?

If this functionality looks like it will make managing alerts more efficient, then check out a Free 30-day SPM trial by signing up.  There’s no commitment and no credit card required.  SPM monitors a ton of applications, like Elasticsearch, Solr, Hadoop, Spark, HBase, Kafka, Storm, Cassandra, Node.js & io.js, and more.

Node.js and io.js Monitoring Support

Node.js and io.js are increasingly being used to run JavaScript on the server side for many types of applications, such as websites, real-time messaging and controllers for small devices with limited resources. For DevOps it is crucial to monitor the whole application stack and Node.js is rapidly becoming an important part of the stack in many organizations. Sematext has historically had a strong support for monitoring big data applications such as Elastic (aka Elasticsearch), Cassandra, Solr, Spark, Hadoop, and HBase, as well as more traditional databases, web servers like Nginx, Nginx Plus and Apache, Java applications, cache servers like Redis and Memcached, messaging middleware like everyone’s darling Kafka, etc.  With such rapid adoption of Node.js and now io.js, we’d be remiss not to add performance monitoring, alerting, and anomaly detection for them in SPM!

spm-node-io

SPM for Node.js

We’re happy to announce we’ve just added Node.js monitoring to this growing list of SPM integrations.  SPM for Node.js covers key Node.js metrics such as Event Loop, Garbage Collection, CPU, Memory and web services metrics.  All metrics are organized in out-of-the-box charts, which can be put on additional dashboards and placed next to performance charts for other parts of the application stack.

Overview for top node.js and io.js metrics

Overview for top node.js and io.js metrics

 

Of course, you can view your Node.js metrics in a larger context.  For example, here is a dashboard that shows Node.js metrics together with Elasticsearch metrics, making it easier to correlate performance across multiple tiers of the application stack.  You could also get your event and log charts on the same dashboard for an even more thorough correlation.

nodejs-elasticsearch-dashboard

Dashboard with node.js HTTP response time and Elasticsearch query latency

Needless to say, we made sure everything works for the latest versions of Node.js (0.12) and io.js (1.6). Installation is as easy as integration of any other module using npm.  If you are not using SPM yet, you can sign up with no commitment or credit card.  You have 30-days free on any new app you create.  If you are already using SPM, you can simply add a new SPM App for Node.js and see all your Node.js metrics in just a few minutes.  Don’t see something in SPM for Node.js?  Please let us know (@sematext) or comment below, we are looking for feedback!

 

HBase 0.98 Monitoring Support

HBase is a popular open-source, non-relational (NoSQL), column-oriented, distributed database that runs on top of the Hadoop Distributed File System (HDFS).  HBase is well suited for sparse data sets, which are common in many big data use cases.  Fortunately for all its users, SPM now supports monitoring, alerting and anomaly detection for HBase version 0.98.  Even those of you not running version 0.98 (here are the results for our HBase version distribution poll) are still in luck because a lot of HBase metrics captured by SPM are also in 0.94.x, 0.96.x, and even the recently released 1.0 version.  That said, HBase is one of those projects whose metrics change from version to version – some are deletes, some are added, others are modified.  If you have your own tools for monitoring HBase and are trying to monitor more than just the most basic HBase metrics, maintaining those tools must not be fun. Related to this common issue, we recently put together a “Build vs. Buy” post that weighs the pros and cons.

Here at Sematext we make heavy use of HBase.  We have recently moved from 0.94.x to 0.98.x and have been enjoying all its benefits.  Furthermore, we’ve recently updated SPM for HBase to monitor a pile of new HBase metrics.  Of course, we eat our own dog food and immediately got new and interesting insights about our own HBase clusters through some of the new metric charts.

For example, from https://apps.sematext.com/spm-reports/s/VhOltU14Cy we are now able to see the dramatic impact of major compactions on data locality (and thus HBase performance!):   (click to enlarge)

Local_Files_1

And from https://apps.sematext.com/spm-reports/s/7LU1qvs7ur we can see the number and size of HLog files over time:   (click to enlarge)

HLog

Alright, on to all the details!

Shiny, New HBase Metrics

In total, we’re talking 290 metrics: 195 for 0.98 and 95 for previous versions.  And lots of them changed in 0.98.  Here’s a summary of top-level SPM reports.  Each report listed below has one or more charts with one or more HBase metrics.  Juicy stuff.

Master:

  • Servers
  • Assign Manager
  • Balancer
  • FS
  • Snapshot

Region Server:

  • Regions & Stores
  • Requests
  • Files
  • Compact & Flush
  • Cache
  • Operations
  • Check & Mutate
  • WAL
  • Hedged Reads
  • MOB
  • Replication
  • Replication Source

Common / pre-0.98:

  • IPC
  • HBase JVM
  • UGI
  • Requests (pre-0.98)
  • Regions (pre-0.98)
  • Split (pre-0.98)
  • Memstore (pre-0.98)
  • Store (pre-0.98)
  • Compactions (pre-0.98)
  • FS (pre-0.98)
  • Block Cache (pre-0.98)

Screenshot: HBase Operation Calls & Time  (click to enlarge)

HBase_Ops_calls

Screenshot: HBase Slow Operations  (click to enlarge)

HBase_Slow_Ops

Screenshot: HBase Sync & Append Ops & Time  (click to enlarge)

HBase_Sync_Apend_Ops_Time

OK OK, how do I get all this stuff?

If you are not using SPM yet, simply sign up, create your first SPM App, and follow the directions in the UI.  You should see all your HBase metrics in a matter of minutes.  SPM is free for 30 days, requires no commitment or credit card and has no limit.  On Premises version is available as well.

If you are already using SPM, but not monitoring HBase, just create the SPM App for HBase, and follow the directions for installing the SPM agent on your HBase nodes.

If you are already using SPM for monitoring HBase, you just need to upgrade the SPM agent and configure it.

Cassandra Case Study – including Performance Monitoring

If you use Cassandra you will find some interesting insights in this Planet Cassandra case study by Sematext client Recruiting.com.  Hitendra Pratap Singh, a Cassandra Software Engineer, talks about why they decided to deploy Cassandra, other NoSQL solutions they looked at, advice for new Cassandra users, and more.

Here’s an excerpt:

Monitoring Apache Cassandra with SPM

“We started using SPM Performance Monitoring and Reporting from Sematext for Apache Solr and were impressed with the amount of real-time stats we could analyze using SPM. We expected the same amount of details for Cassandra as well and decided to go with SPM.  Some of the benefits we’ve seen from SPM include the alert notification system, graphical interface [i.e. easy to analyze], detailed stats related to JVM, and creation of our own custom metrics.

We also utilize SPM for monitoring our deployments of Apache Solr and Memcached servers.”

On the “Overview” screen found below, you can check out some Cassandra metrics, as well as various OS metrics. Specific Cassandra metrics can be drilled down by clicking on one of the tabs along the left side; these metrics include: Compactions, Bloom Filter (space used, false positives ratio), Write Requests (rate, count, latency), Pending Read Operations (read requests, read repair tasks, compactions), and more.

SPM for Cassandra Overview  (click to enlarge)

cassandra_overview_2

You can read the full version of “Recruiting.com Powers Real-Time High Throughput Application with Apache Cassandra” at Planet Cassandra.

And if you’d like to monitor Cassandra yourself (or any number of applications like Hadoop, HBase, Spark, Kafka, Elasticsearch, Solr, etc.), check out a Free 30-day trial by registering here.  There’s no commitment and no credit card required.  You can also see our Cassandra monitoring blog post for more details and screenshots.

Follow

Get every new post delivered to your Inbox.

Join 157 other followers