Integrate PagerDuty with SPM Performance Monitoring

Got Alarm Fatigue?

If so, you are not alone!  We talk to a lot of people who want to reduce the frequent “noise” from monitoring alarms.  To solve this common problem, Sematext added two things to its SPM Performance Monitoring solution. The first is anomaly detection for alerts, which dramatically reduces this noise compared with simple threshold-based alerting mechanisms. The second is PagerDuty integration, which helps DevOps teams with incident management, i.e., managing escalation and routing alerts to the right person according to defined schedules and communication channels.

“PagerDuty is an alarm aggregation and dispatching service for system administrators and support teams. It collects alerts from your monitoring tools, gives you an overall view of all of your monitoring alarms, and alerts an on-duty engineer if there’s a problem. PagerDuty allows you to build sophisticated alerting rules to determine who to contact when problems occur. You can build on-call schedules to equitably share on-call responsibilities. You can also set up multiple levels of coverage, so if the “primary” on-call person doesn’t respond to an alert in a timely fashion, it’s automatically escalated to a “secondary” person, and so on.” - Source: PagerDuty FAQ.

SPM Performance Monitoring is an enterprise-class server and application performance monitoring, alerting, and anomaly detection solution. It is available both in the cloud (SaaS) and On Premises.  SPM also integrates with Logsene Log Management and Analytics to correlate metrics, alerts, anomalies, and events with application and server logs.

Get started

Two basic setup steps are required to connect both services:

  1. In PagerDuty: Get an API Key
  2. In SPM: Enter the API Key in SPM alert settings

1) In PagerDuty:

Create a new service:

  1. In your account, under the Services tab, click “Add New Service”.
  2. Select an Escalation Policy (e.g. default).
  3. For the Integration Type, start typing “Sematext” to narrow down the list.
  4. Click the Add Service button.
  5. Once the service is created, you’ll be taken to the Service page. On this page, you’ll see the “Service API Key”, which you will need when you configure Sematext products to send events to PagerDuty. Copy the “Service API Key” to the clipboard.

2) In SPM:

  1. Open the Application Settings of your SPM App by clicking the App Settings button in the top right corner of the SPM UI.
  2. Navigate to Alerts / PagerDuty.
  3. Enter the API key from PagerDuty in the “Service API key” field.
  4. Press the Save button.

Done. Every alert from your SPM app will be forwarded to PagerDuty, where you can manage escalation policies and configure notifications to other services like HipChat, Slack, Zapier, Flowdock, and more.
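SPM sends the alerts to PagerDuty by itself once the key is saved. You may still want to confirm that a Service API Key works before pasting it into SPM, and one way to do that is to trigger a test incident yourself. The sketch below assumes PagerDuty's legacy (v1) Events API, which is the API that per-service "Service API keys" from this period worked with. The endpoint URL and the JSON fields (service_key, event_type, description, incident_key, details) come from that API. The helper names, the alert text, and the placeholder key are hypothetical. PagerDuty has since introduced a v2 Events API, so check its current documentation before you rely on this.

```python
import json
import urllib.request

# Legacy PagerDuty Events API (v1) endpoint used with per-service API keys.
PAGERDUTY_V1_URL = "https://events.pagerduty.com/generic/2010-04-15/create_event.json"


def build_trigger_event(service_key, description, incident_key=None, details=None):
    """Build a v1 'trigger' event body for the given Service API key (hypothetical helper)."""
    event = {
        "service_key": service_key,
        "event_type": "trigger",
        "description": description,
    }
    if incident_key is not None:
        # Triggers sharing an incident_key are appended to the same open incident.
        event["incident_key"] = incident_key
    if details:
        event["details"] = details
    return event


def send_test_event(event, url=PAGERDUTY_V1_URL):
    """POST the event as JSON and return the decoded response (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    event = build_trigger_event(
        "YOUR_SERVICE_API_KEY",  # placeholder: paste the key copied from PagerDuty
        "Test alert: checking the SPM to PagerDuty setup",
        incident_key="spm-setup-test",
    )
    print(json.dumps(event, indent=2))
    # Uncomment to actually create a test incident in PagerDuty:
    # print(send_test_event(event))
```

Because a fixed incident_key is used, repeated test runs add to one incident and do not open a new one each time.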

If you’ve got some feedback on this post or ideas for similar posts please let us know!

Elasticsearch Monitoring: SPM vs. Marvel

While many SPM Performance Monitoring users quickly see the benefits of SPM and adopt it in their organizations for monitoring — not just for Elasticsearch, but for their complete application stack — some Elasticsearch users evaluate SPM and compare it to Marvel from Elasticsearch.  We’ve been asked about SPM vs. Marvel enough times that we decided to put together this focused comparison to show some of the key differences and help individuals and organizations pick the right tool for their needs.

Marvel is a relatively young product that provides a detailed visualization of Elasticsearch metrics in a Kibana-based UI. It installs as an Elasticsearch plug-in and includes ‘Sense’ (a developer console), plus a replay functionality for shard allocation history.

SPM, on the other hand, offers multiple agent deployment modes, has both Cloud and On Premises versions, includes alerts and anomaly detection, is not limited to Elasticsearch monitoring, integrates with third party services, etc. The following Venn diagram shows key areas that SPM and Marvel have in common and also the areas where they differ.

[Venn diagram: SPM vs. Marvel]

Looking into the details surfaces many notable differences.  For example:

  • The SPM agent can run independently from the Elasticsearch process, so upgrading the agent does not require restarting Elasticsearch.
  • Dashboards are built around different philosophies: Marvel shows each metric in a separate chart, while SPM groups related metrics in a single chart or in adjacent charts. This puts more information in one place, so people don't need to jump between multiple views.
  • Both can show metrics from multiple nodes in a single chart: Marvel draws a separate line for each node, while SPM lets you either aggregate the values or display them separately.

The following “SPM vs. Marvel Comparison Table” is a starting point for evaluating monitoring products against an organization’s individual needs.

SPM vs. Marvel Comparison Table

Feature: SPM by Sematext vs. Marvel by Elasticsearch

Supported Applications
  • SPM: Elasticsearch, Hadoop, Spark, Kafka, Storm, Cassandra, HBase, Redis, Memcached, NGINX(+), Apache, MySQL, Solr, AWS CloudWatch, JVM, …
  • Marvel: Elasticsearch

Agent Deployment Mode
  • SPM: in-process and out-of-process (out-of-process allows for seamless updates without requiring Elasticsearch restarts)
  • Marvel: in-process (as an Elasticsearch plug-in; updates require Elasticsearch restarts)

Predefined Dashboard Graphs Organized in Groups
  • SPM: YES
  • Marvel: YES

Saving Individual Dashboards
  • SPM: Each user can store multiple dashboards, mixing charts from all applications, including both metrics and logs.
  • Marvel: The current view can be saved and reset to defaults, but these changes are global.

API for Custom Metrics and Business KPIs
  • SPM: YES
  • Marvel: NO

Extra Elasticsearch Metrics
  • SPM: NO (metrics are added based on user demand, and users can always graph them as Custom Metrics)
  • Marvel: YES (circuit breakers, ID cache, Lucene memory, ES threadpools, percolator)

OS and JVM Metrics
  • SPM: YES (+ JVM pool sizes and JVM pool utilization)
  • Marvel: YES

Correlation of Metrics with Logs, Events, Alerts, and Anomalies
  • SPM: YES (SPM and Logsene integration; ability to ingest and chart arbitrary external events)
  • Marvel: NO (Cluster Pulse displays only Elasticsearch events)

Deployment Model
  • SPM: SaaS or On Premises
  • Marvel: On Premises

Security / User Roles & Permissions
  • SPM: YES
  • Marvel: NO

Easy & Secure Sharing of Reports with Internal and External Organizations
  • SPM: YES (via short links, embeds / iframes, or email)
  • Marvel: NO

Machine Learning-based Anomaly Detection
  • SPM: YES
  • Marvel: NO

Threshold-based Alerts
  • SPM: YES
  • Marvel: NO

Heartbeat Alerts
  • SPM: YES
  • Marvel: NO

Forwarding Alerts to 3rd Parties
  • SPM: YES (e-mail, PagerDuty, Nagios / Shinken, HipChat, Slack, webhooks)
  • Marvel: NO

Metrics Aggregation
  • SPM: YES. Pre-aggregation at multiple granularity levels, including 1-minute granularity. Advantage: more efficient storage, better scaling, and faster graphing over longer time periods, at the expense of sub-minute precision.
  • Marvel: YES. Query-time aggregation only, with no write-time pre-aggregation. Advantage: 10-second precision by default, at the expense of storage size, write and read performance, and memory footprint.
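The aggregation row describes a trade-off, and the sketch below makes it concrete with write-time pre-aggregation. Raw samples taken every 10 seconds are rolled up into 1-minute buckets that hold count, min, max, and average, so each stored point replaces six raw ones. This is only an illustration of the idea and is not SPM's or Marvel's actual storage code. The function name and the sample values are hypothetical.

```python
def rollup(samples, bucket_seconds=60):
    """Pre-aggregate (timestamp, value) samples into fixed-size time buckets.

    Returns a dict mapping bucket start time -> summary (count, min, max, sum, avg).
    """
    buckets = {}
    for ts, value in sorted(samples):
        start = ts - ts % bucket_seconds  # align timestamp to its bucket boundary
        b = buckets.get(start)
        if b is None:
            buckets[start] = {"count": 1, "min": value, "max": value, "sum": value}
        else:
            b["count"] += 1
            b["min"] = min(b["min"], value)
            b["max"] = max(b["max"], value)
            b["sum"] += value
    for b in buckets.values():
        b["avg"] = b["sum"] / b["count"]
    return buckets


# Two minutes of hypothetical 10-second samples (12 raw points).
raw = [(t, float(t % 60)) for t in range(0, 120, 10)]
per_minute = rollup(raw)
print(len(raw), "raw points ->", len(per_minute), "stored points")
for start, summary in per_minute.items():
    print(start, summary["min"], summary["max"], summary["avg"])
```

Query-time aggregation works the other way round. It keeps all 12 raw points and computes summaries like these on every query. That is what preserves 10-second precision, and it is also why storage and read costs grow with the time range being graphed.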

As an aside, most of the features in this comparison table would also apply if we compared SPM to BigDesk, ElasticHQ, Statsd, Graphite, Ganglia, Nagios, Riemann, and other application-specific monitoring or alerting tools out there.

If you have any questions about this comparison or have any feedback, please let us know!

Integrating SPM Performance Monitoring with HipChat

Many agile DevOps teams rely on communication via HipChat, which provides an API and mobile apps to receive messages while away from one’s desktop. SPM Performance Monitoring’s new integration via WebHooks can forward alerts to many services, including HipChat.

To connect the two services, collect the room_id and an access token from HipChat, then build a WebHook in SPM.  The SPM Wiki explains how to get this information from HipChat and build the WebHook in SPM: Alerts – HipChat integration.
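The sketch below shows roughly what request the room_id and access token end up in. It assumes HipChat's REST API v2 room notification endpoint (POST /v2/room/{room_id}/notification, with the token passed as the auth_token query parameter) and the JSON fields message, message_format, color, and notify. The helper name and the alert text are hypothetical. The exact WebHook template that SPM uses is the one given in the SPM Wiki, not this code.

```python
import json
from urllib.parse import quote, urlencode

HIPCHAT_API = "https://api.hipchat.com/v2"


def hipchat_notification(room_id, auth_token, message, color="red", notify=True):
    """Return (url, json_body) for a HipChat v2 room notification (hypothetical helper)."""
    url = "{}/room/{}/notification?{}".format(
        HIPCHAT_API,
        quote(str(room_id), safe=""),  # room id or URL-encoded room name
        urlencode({"auth_token": auth_token}),
    )
    body = json.dumps({
        "message": message,
        "message_format": "text",  # "text" or "html"
        "color": color,            # yellow, green, red, purple, gray, or random
        "notify": notify,          # whether room members get a notification
    })
    return url, body


url, body = hipchat_notification(12345, "YOUR_ACCESS_TOKEN", "SPM alert: CPU usage high")
print(url)
print(body)
```

A POST of `body` to `url` with a `Content-Type: application/json` header delivers the message to the room.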


This whole process only takes a minute or two.  HipChat is a tool that is becoming more popular among the DevOps crowd, and here at Sematext we pride ourselves on staying on top of what our users need and expect.

Need some extra help with this setup or another app you might want to integrate?  Have ideas for other integrations we should explore? Please drop us a line, we’re here to help and listen.

Solr Presentations from Lucene/Solr Revolution 2014

Thanks to everyone who stopped by the Sematext booth at last week’s Lucene/Solr Revolution event in Washington, DC and attended our two talks, Radu Gheorghe’s “Tuning Solr for Logs” and Rafal Kuc’s “Solr Anti-Patterns”.

The attendance, questions and interest are very much appreciated.  As a company that prides itself on its Solr expertise (and Elasticsearch expertise too, for that matter), it was nice to spend a couple of days talking about search and Big Data challenges, performance monitoring, and logging with fellow experts from around the world. Here are the slides for the two talks we gave (summaries of the talks can be found here):

 

Videos of the talks will be posted here soon.  Hope to see everyone again next year!

Sematext at Lucene/Solr Revolution 2014

Going to Lucene/Solr Revolution next week — November 11-14 — in Washington, DC?  If so…Sematext will be there exhibiting AND giving two talks!  If you are going, stop by our table to say hello.  We can show you the latest versions of SPM Performance Monitoring, Logsene Log Management and Analytics, Site Search Analytics, and, of course, talk about metrics, centralized log management, Lucene, Solr, Elasticsearch, and just about any other search-related topic you might be interested in.  After all, not only have we blogged, given talks and spread the word in all sorts of ways, we’ve also written books on these subjects!

Both talks by Sematext engineers take place on Friday, November 14.  They are:

Radu Gheorghe will talk about “Tuning Solr for Logs” at 10:15 am

Summary:  Performance tuning is always nice for keeping your applications snappy and your costs down. This is especially the case for logs, social media and other stream-like data that can easily grow into terabyte territory. While you can always use SolrCloud to scale out of performance issues, this talk is about optimizing. It will answer the following questions about Solr settings: How often should you commit and merge? How can you have one collection per day/month/year/etc.? What are the performance trade-offs for these options?  There will also be a discussion around choosing the appropriate hardware.  Radu will also talk about optimizing the infrastructure when pushing logs to Solr, including tuning Apache Flume to handle large flows of logs and overall design options that also apply to other shippers, like Logstash.
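A common way to implement "one collection per day" is to derive the collection name from each log event's timestamp when indexing. At query time, you then target only the collections that overlap the requested time range. The sketch below shows this under a hypothetical logs_YYYY-MM-DD naming scheme, and the function names are also hypothetical. Only the naming and range logic is shown here. SolrCloud can search several collections in one request through a comma-separated collection parameter.

```python
from datetime import datetime, timedelta


def collection_for(ts, prefix="logs_"):
    """Name of the daily collection that a log event with timestamp ts goes into."""
    return prefix + ts.strftime("%Y-%m-%d")


def collections_for_range(start, end, prefix="logs_"):
    """All daily collections overlapping [start, end], oldest first."""
    day = start.date()
    names = []
    while day <= end.date():
        names.append(prefix + day.isoformat())
        day += timedelta(days=1)
    return names


# Indexing: route each event to its day's collection.
print(collection_for(datetime(2014, 11, 14, 10, 15)))  # logs_2014-11-14

# Searching: a query spanning three days only needs three collections, which
# could be passed to SolrCloud as collection=<comma-separated names>.
names = collections_for_range(datetime(2014, 11, 12, 8, 0), datetime(2014, 11, 14, 18, 0))
print(",".join(names))
```

This layout has a side benefit. Expiring old logs becomes a matter of dropping whole collections instead of deleting individual documents.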

Rafal Kuc will talk about “Solr Anti-Patterns” at 10:55 am

Summary:  Working as a consultant and software engineer, and helping people in various ways, Rafał has seen multiple patterns in how Solr is used and how it should be used. Consulting on best practices is common, but talking about what NOT to do is not. This talk will point out common mistakes and roads that should be avoided at all costs. It will cover use cases and guidelines around general configuration pitfalls, data modeling and what to avoid when making your data indexable, and mistakes made with queries and searching indexed data. Each use case will be illustrated by a before-and-after analysis showing the resulting changes in metrics, so the lessons are easy to remember.

20% Discount Code

If you currently use a Sematext product or have been a client in the past and want to go, drop us a line for more info.

Hope to see you in DC!

Performance Monitoring Comparison: Build vs. Buy

Using a performance monitoring system that you built yourself? You are not alone!  Many organizations monitor their applications and IT infrastructure with a bolted-together and often incompatible assortment of tools.  In larger organizations this can run to a dozen or more different tools.  Seriously.  Build vs. Buy, Do-It-Yourself (DIY), homegrown, in-house, Not Invented Here (NIH): there are almost as many terms to describe this approach as there are products to do the monitoring.

There’s a good chance you’re using tools like Statsd, Graphite, Nagios and others to stay on top of things.  But that’s a LOT of work.  And why spend all the time doing all that work yourself?  Life, as we all know, is too short.  Is gluing together N tools or building yet another custom monitoring tool really a good use of (y)our (life)time?  This also leads to the next obvious question:

Why Not Use One Monitoring Solution to Do It All?

SPM Performance Monitoring, Alerting and Anomaly Detection is a comprehensive solution that does the work of many individual monitoring tools in one powerful package.  Applications, servers, other key IT devices — even logs! — are all covered.  A partial list of monitored apps includes Elasticsearch, Solr, Hadoop, HBase, Spark, Cassandra, Kafka, Storm, Redis, NGINX Plus and NGINX.  You can even see what I’m talking about right now by checking out our SPM live demo.

In fact, as one SPM user recently told us:

“I don’t want to be a data ape and consume your data to build other reports.  I think that is one of the attractions with SPM — I can push the data to Graphite or another monitoring tool, but you already have the reports done. So my time to insight is much faster.”

There Are Some Huge Differences Between Building and Buying

If your Building approach is draining engineer time that could be better spent elsewhere, then you should consider some of the key differences between building your own monitoring “system” and using SPM, including:

  • Log & Event Correlation: SPM can aggregate, graph and correlate logs with performance metrics and alerts (via integration with Logsene Log Management and Analytics).  If you are managing your logs at all, you are most likely using a separate solution that does not integrate with your “Build” monitoring system.  Being able to see logs along with performance metrics is essential for effective troubleshooting.
  • On Premises or in Cloud: SPM offers an On Premises version in addition to SaaS.  Most app-specific monitoring tools are SaaS-only, but some organizations like their metrics and logs close to home base.
  • Native App Monitoring vs. 3rd Party Plugins: SPM monitors all apps natively.  If you are monitoring a number of individual apps via a range of 3rd party plugins then you have to deal with multiple installation and data collection mechanisms, various levels of maturity, and widely varying qualities of implementation and of reporting.
  • Anomaly Detection: SPM has support not only for heartbeat alerts and threshold-based alerts, but also for automatic machine learning based anomaly detection.  A Build system most likely does not have comprehensive anomaly detection capabilities.
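The three alert types above differ in what they compare a metric against, and the toy sketch below contrasts them. A heartbeat check fires when no data has arrived recently. A threshold check compares the latest value with a fixed limit. A simple statistical check flags values far outside the recent baseline. The z-score test is a stand-in chosen for brevity and is not SPM's machine learning-based anomaly detection. All function names, limits, and sample values are hypothetical.

```python
from statistics import mean, pstdev


def heartbeat_alert(last_seen, now, max_silence=300):
    """Fire if no data point has arrived for more than max_silence seconds."""
    return now - last_seen > max_silence


def threshold_alert(value, limit):
    """Fire whenever the value crosses a fixed limit."""
    return value > limit


def anomaly_alert(history, value, z_limit=3.0):
    """Fire when value is more than z_limit standard deviations from the recent mean."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return value != mu  # flat history: any change is unusual
    return abs(value - mu) / sigma > z_limit


# A metric that normally sits around 85 (e.g. % CPU on a consistently busy server).
history = [84, 86, 85, 87, 83, 85, 86, 84]
print(threshold_alert(86, limit=80))              # True: fires on normal load ("noise")
print(anomaly_alert(history, 86))                 # False: 86 is within the usual range
print(anomaly_alert(history, 99))                 # True: 99 is far outside the baseline
print(heartbeat_alert(last_seen=1000, now=1400))  # True: 400 s without data
```

For a metric that is always high, a static threshold keeps firing, while a baseline-relative check stays quiet until the behavior actually changes. This is the alarm-noise difference the post is about.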

And Then There is the Cost of Using All Those Different Monitoring Tools…

Cost comparisons between Building your own monitoring system and Buying a solution like SPM are not linear.  While some monitoring tools are open-source and free (though the time to configure them can be costly in its own right), commercial tools run a wide gamut of costs, infrastructure limits, data limits, time limits, pricing schemes, etc.  Just keeping track of the costs is often a job in itself.  In general, the more tools you have, the more value SPM delivers.

Here’s one scenario that will give you an idea of potential Build costs:

Build Your Own Monitoring System — Cost Scenario

  • Hourly rate:        $100 (ballpark figure; could be much higher)
  • Installation:        2 hours (very optimistic)
  • Configuration:   8 hours (very optimistic)
  • Maintenance:    2 hours/month (optimistic)
  • Upgrading:        2 days (i.e., ~20 hours)/year (IF all goes well!)
  • # of servers to run this configuration:  3 (monitoring 10 total servers*)
  • Cost per server (hardware): $1,000 each (i.e., $3,000 total)

___________________________________________________________

  • Total Cost in Year 1:        $6,200
  • Total Cost in Year 2:        $3,200 (not including any additional server purchases)
  • Total Cost in Year 3:        $3,200 (at least, though most likely higher)

And we didn’t even count the time cost to actually learn how to use all these tools!

Moreover, we used very optimistic numbers, assumed nothing will go wrong, assumed no issues like backwards incompatibilities, problems around dependencies, etc. etc. – all issues that are actually very common and can consume days and make the above costs much higher.  We do a ton of DevOps work at Sematext and, like everyone in this field, know how common this is.

* This number can vary widely. As an example, suppose you want a complete monitoring solution that can do everything SPM can do (monitoring, alerting, anomaly detection, graph emailing, embedding, etc.). In that case, 3 monitoring servers will be able to monitor fewer servers than they would with a bare-bones, incomplete monitoring tool.

SPM — Cost Scenario

  • # of servers: 10 servers (for example purposes)
  • Standard plan (our lowest cost plan beyond Free): $25/server/month
  • Time to Register and Install N agents: 1 hour (or $100 at hourly rate)

________________________________________________________________

  • Total Cost in Year 1:        $3,100
  • Total Cost in Year 2:        $3,000
  • Total Cost in Year 3:        $3,000

And these costs don’t even include any Volume Discounts that we would offer!
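The sketch below lets you recompute both scenarios with your own figures. It encodes them as two functions whose defaults are the numbers listed above; the function and parameter names are ours. It counts every Build line item in full: 12 months of maintenance and one ~20-hour upgrade cycle every year, plus installation, configuration, and hardware in Year 1 only. On that basis it gives $8,400 for Year 1 and $4,400 for each later year, which is more than the Build totals quoted above. The SPM totals come out exactly as listed.

```python
def build_cost(year, hourly_rate=100, install_h=2, config_h=8,
               maint_h_per_month=2, upgrade_h_per_year=20,
               monitoring_servers=3, server_cost=1000):
    """Cost of the do-it-yourself scenario for a given year (1-based)."""
    # Recurring labor: monthly maintenance plus yearly upgrades.
    cost = (maint_h_per_month * 12 + upgrade_h_per_year) * hourly_rate
    if year == 1:
        # One-time setup work and hardware purchases happen in the first year.
        cost += (install_h + config_h) * hourly_rate
        cost += monitoring_servers * server_cost
    return cost


def spm_cost(year, servers=10, price_per_server_month=25,
             hourly_rate=100, setup_h=1):
    """Cost of the SPM Standard plan scenario for a given year (1-based)."""
    cost = servers * price_per_server_month * 12
    if year == 1:
        cost += setup_h * hourly_rate  # registering and installing the agents
    return cost


for year in (1, 2, 3):
    print("Year", year, "Build:", build_cost(year), "SPM:", spm_cost(year))
```

Adjusting the defaults, for example to fewer maintenance hours or a higher hourly rate, shows how sensitive the Build side is to the labor assumptions.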

You can find clear, simple pricing plans for SPM right here.

Conclusion

While it’s great that there are many tools available for monitoring (some of them free) and communities built around those tools, in the DevOps world it still comes back to time.  Time to learn these tools.  Time to stay up-to-date on them.  Time to deploy.  Time to configure.  Time to maintain.  Time to assemble a bunch of disparate tools so you can monitor more than just one app.  You get the picture.  And time, as we all know, carries a cost.  With DevOps this is typically a significant cost.  So before undertaking a long and never-ending “Build” journey, it makes sense to compare all the costs, in money and time, of buying a complete monitoring solution like SPM vs. building your own system. The closer you look, the more value a tool like SPM offers you and your organization.

Try SPM for Free for 30 Days

Tired of building, and building, and building…  Try SPM Performance Monitoring for Free for 30 days by registering here.  There’s no commitment and no credit card required.

Video and Slides: Centralized Logging with Logstash and Elasticsearch

Sematext engineer and Elasticsearch / Logstash expert Rafal Kuc gave a well-received talk at the recent DevOps Days Warsaw event.  The talk was titled “From Zero to Hero – Centralized Logging with Logstash & Elasticsearch” and you can watch the video here:

And check out the slides here:

Brief Summary

Rafal talked about the common problem of digging through logs to find one particular event — or group of them.  And going even further into this pain point — what if you have lots of servers and you don’t have a single place to look for logs?  Do you really want to ssh to one or more servers and grep log files?  Of course not!  It’s 2014 and there are tools and services that help you spend less time hunting around for problems and more time actually fixing them.

To help solve this problem Rafal guided the audience through the basics of using Logstash and Elasticsearch together as the perfect combination for handling logs from multiple applications.  Attendees also learned how to set up Logstash, how to configure it to parse logs and, finally, how to send them to an Elasticsearch cluster.

Rafal also discussed tuning Elasticsearch for log management and centralized logging purposes, and showed how to easily switch between shipping logs to a self-hosted solution like Elasticsearch / Logstash / Kibana (aka ELK) and instead ship logs to Logsene Log Management and Analytics by changing a single line in Logstash configuration.
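Rafal demonstrated this pipeline with Logstash. The Python sketch below only mirrors its three steps to show what each involves. First, parse a raw log line into fields, which is the job a Logstash grok filter does. Second, turn each event into a JSON document. Third, format the batch for the Elasticsearch bulk API. The regex covers the Apache/NGINX "combined" access log format, and the function names, sample line, and index name are hypothetical. The destination is only the URL and index name that the sender posts this body to. Switching between a self-hosted Elasticsearch cluster and a hosted service that accepts the Elasticsearch API, such as Logsene, is therefore a small configuration change here too.

```python
import json
import re

# Apache/NGINX "combined" access log format.
COMBINED = re.compile(
    r'(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one access-log line into a dict of fields, or None if it doesn't match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


def bulk_body(events, index):
    """Format events as newline-delimited JSON for the Elasticsearch bulk API."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(event))                         # document line
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


sample = ('127.0.0.1 - - [20/Oct/2014:10:15:32 +0200] "GET /index.html HTTP/1.1" '
          '200 2326 "-" "curl/7.35.0"')
event = parse_line(sample)
print(bulk_body([event], index="logs-2014.10.20"))
```

Recent Elasticsearch versions accept this action line as-is. Older clusters, such as the 1.x releases current in 2014, also need a document type, e.g. a `_type` in the action line.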


Enjoy!  And thanks to everyone who attended Rafal’s talk in person and stopped by the Sematext booth.
