Introducing Akka Monitoring

Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM. Its Actors are the recommended implementation of the actor model for Scala, having replaced the actors that used to ship with Scala’s standard distribution.

How Akka Works

Actors exchange messages through Mailbox queues, Dispatchers provide various concurrency models, and Routers manage the message flow between Actors. That’s quite a lot Akka is doing for developers!

But how does one find bottlenecks in distributed Akka applications? Many Akka users already rely on the excellent open-source Kamon monitoring tool, which makes it easy to collect Akka metrics. However, and this is important: predefined visualizations, dashboards, anomaly detection, alerts, and role-based access control for the DevOps team are out of scope for Kamon, which focuses on metrics collection only. To overcome this limitation, Kamon’s design makes it possible to integrate Kamon with other monitoring tools.

Needless to say, Sematext has embraced this philosophy and contributed the Kamon backend for SPM. This gives Akka users the option to combine detailed metrics from Kamon with the visualization, alerting, anomaly detection, and team collaboration functionality offered by SPM.

The latest Kamon 0.5.x release includes the kamon-spm module and was announced on August 17th, 2015 on the Kamon blog. Here’s an excerpt:

Pavel Zalunin from Sematext contributed the new kamon-spm module, which as you might guess allows you to push metrics data to the Sematext Performance Monitor platform. This contribution is particularly special to us, given the fact that this is the first time that a commercial entity in the performance monitoring sector takes the first step to integrate with Kamon, and they did it so cleanly that we didn’t even have to ask any changes to the PR, it was just perfect. We sincerely hope that more companies follow the steps of Sematext in this matter.
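For reference, hooking Kamon up to SPM is mostly a configuration exercise. Below is a minimal, hypothetical application.conf fragment for the kamon-spm backend; the key names and the token placeholder are assumptions, so check the kamon-spm documentation for the exact settings:

```hocon
# Hypothetical configuration fragment for the kamon-spm backend (application.conf).
# Key names are assumptions -- verify them against the kamon-spm documentation.
kamon {
  spm {
    # Token of the SPM App that should receive the metrics
    token = "YOUR-SPM-APP-TOKEN"
  }
}
```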

Now let’s take a look at the result of this integration work:

  • Metrics pushed to SPM are displayed in predefined reports, including:
    • An overview of all key Akka metrics
    • Metrics for Actors, Dispatchers and Routers
    • Common metrics for CPU, Memory, Network, I/O,  JVM and Garbage Collection
  • Each chart has the “Action” menu to:
    • Define criteria for anomaly detection and alerts
    • Create scheduled email reports
    • Securely share charts with read-only links
    • Embed charts into custom dashboards
  • A single SPM App can take metrics from multiple hosts to monitor a whole cluster; filters by Host, Actor, Dispatcher, and Router make it easy to drill down to the relevant piece of information.
  • All other SPM features are available for Akka users, too.  For example:

Akka_overview

Akka Metrics Overview

Actor Metrics

Actors send and receive messages, therefore the key metrics for Actors are:

  • Time in Mailbox
    Messages wait in the Mailbox to be processed; high Time in Mailbox values indicate potential delays in processing.
  • Processing Time
    The time Actors need to process the messages they receive; use this to discover slow Actors.
  • Mailbox Size
    A large Mailbox Size could indicate pending operations, e.g. when it is constantly growing.
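If Mailbox Size keeps growing, one common mitigation is to cap it with a bounded mailbox, so senders block (or time out) instead of memory growing without limit. Here is a minimal sketch using Akka’s standard mailbox configuration keys; the actor path is just an example:

```hocon
# application.conf: define a bounded mailbox and assign it to one actor
bounded-mailbox {
  mailbox-type = "akka.dispatch.BoundedMailbox"
  mailbox-capacity = 1000          # maximum number of queued messages
  mailbox-push-timeout-time = 10s  # how long senders wait when the mailbox is full
}

akka.actor.deployment {
  /my-slow-actor {                 # example actor path
    mailbox = bounded-mailbox
  }
}
```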

Each of the above metrics is presented in aggregate for all Actors, but one can also use SPM’s filtering feature to view each Actor’s metrics separately, or select one or more specific Actors and visualize only their metrics.  Filtering by Host is also possible, as shown below.

Akka_actors

Akka Actors

Dispatcher Metrics

In Akka, a Dispatcher is what makes Actors ‘tick’. Each Actor is associated with a particular Dispatcher (the default one is used if no explicit Dispatcher is set), and each Dispatcher is associated with a particular Executor: a Thread Pool or a Fork Join Pool. The SPM Dispatcher report shows information about both Executor types:

  • Fork Join Pool
  • Thread Pool Executor
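To make an Actor show up under its own Dispatcher in these reports, give it a dedicated dispatcher in configuration. A minimal sketch using Akka’s standard dispatcher keys (the dispatcher name is just an example):

```hocon
# application.conf: a custom dispatcher backed by a fork-join pool
my-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 2       # minimum number of threads
    parallelism-factor = 2.0  # threads per available processor
    parallelism-max = 10      # maximum number of threads
  }
  throughput = 100  # messages an actor may process before the thread moves on
}
```

An Actor is then bound to it in code, e.g. system.actorOf(Props[MyActor].withDispatcher("my-dispatcher"), "my-actor").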

All metrics can be filtered by Host and Dispatcher.

Akka_dispatchers

Akka Dispatchers

Router Metrics

Routers can be used to efficiently route messages to destination Actors, called Routees. The key Router metrics are:

  • Routing Time – time to route a message to the selected destination
  • Time in Mailbox – time a routed message spends in the Routee’s Mailbox
  • Processing Time – time the Routee Actor spends processing routed messages
  • Error Count – number of errors encountered while Routees process routed messages

For all these metrics, lower values are better, of course.
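Routers themselves are typically declared in configuration, which is also where the Routee pool that these metrics aggregate over is defined. A minimal sketch using Akka’s standard deployment keys (the actor path and pool size are examples):

```hocon
# application.conf: deploy a round-robin pool of 5 routees
akka.actor.deployment {
  /parent/worker-router {   # example actor path
    router = round-robin-pool
    nr-of-instances = 5
  }
}
```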

Akka_routers

Akka Routers

You can set Alerts and enable Anomaly Detection for any Akka or OS metrics you see in SPM and you can create custom Dashboards with any combination of charts, whether from your Akka apps or other apps monitored by SPM.

We hope you like this new addition to SPM.  Got ideas how we could make it more useful for you?  Let us know via comments, email, or @sematext.

Not using SPM yet? Check out the free 30-day SPM trial by registering here (ping us if you’re a startup, a non-profit, or an educational institution – we’ve got special pricing for you!).  There’s no commitment and no credit card required.  SPM monitors a ton of applications, like Elasticsearch, Solr, Cassandra, Hadoop, Spark, Node.js (open-source), Docker (get the open-source Docker image), CoreOS, RancherOS and more.

Introducing AppMap

[Note: This post is part of a series on Transaction Tracing — links to the other posts are at the bottom on this post]

As mentioned in the Transaction Tracing for Performance Bottleneck Detection and Transaction Tracing Reports and Traces posts, when you enable Transaction Tracing in SPM you will also automatically get:

  • Request Throughput
  • Request Latency
  • Error & Exception Rates
  • AppMap

Today we’re happy to officially introduce AppMap. What’s AppMap? As you can see below, an AppMap is a map-like visual representation of your complete application architecture. AppMaps show which components are communicating with which components, at what throughput and latency, at what network speed, and whether there are any errors between them.  Connections to external services and databases are also captured and visualized.

As such, AppMaps help you:

  • Instantly see your whole architecture and its operational state and health
  • Bring new team members up to speed by showing them the current architecture instead of outdated architecture diagrams
  • Keep the whole team up to date about the latest architecture

AppMap1_annotated

Things to note:

  • Errors and exceptions are shown in red when they are detected
  • Components are color-coded:
    • Orange components represent external HTTP services
    • Green components are databases (e.g., SQL server has its own shade of green; other databases have their own shades)
    • Blue components are other SPM Apps (e.g., Elasticsearch has its own shade, etc., etc.)
  • Arrows between components have variable thickness – thicker arrows mean bigger throughput (rpm).
  • Arrows also vary in opacity – greater opacity means lower latency.

Clicking on any of the components on the AppMap shows more details about that component, such as:

  • Overall Throughput, Latency, Error and Exception rates (also shown as sparklines)
  • Incoming and Outgoing connections and Throughput and Latency between them
  • List of Hosts/Nodes when an SPM App is selected with Throughput, Latency, Error and Exception rates for each of them

AppMap2_annotated

If you’d like to see an AppMap for your applications, simply enable Transaction Tracing in SPM.

That’s it!

Not using SPM yet, but would like to trace your apps? Easy: register here — there is no commitment and you can leave your credit card in your wallet.  You get 30 days Free for new SPM Apps so even if you don’t end up falling in love with SPM for monitoring, alerting and anomaly detection, or Logsene for your logs, you can use the Distributed Transaction Tracing to quickly speed up your apps!  Oh, and if you are a young startup, a small or non-profit organization, or an educational institution, ask us for a discount (see special pricing)!

——-

Here are the other posts in our Transaction Tracing series:

Transaction Tracing Reports and Traces

[Note: This post is part of a series on Transaction Tracing — links to the other posts are at the bottom of this post]

If you missed the Distributed Transaction Tracing Intro, here’s the key bit you should know:

Distributed Transaction Tracing is great for:

  • Pinpointing root causes of poor application performance
  • Finding the slowest parts of your application
  • Tracing requests across networks and apps (hence “Distributed”!)
  • Both Java and Scala apps

It’s also worth repeating that enabling Transaction Tracing provides more than just transaction traces. You also get:

  • Your app’s Request Throughput, Response Latency, plus Error & Exception Rates
  • AppMap, which shows how various components in your infrastructure communicate with each other

Now let’s run through a few reports Transaction Tracing provides in SPM.

Top 10 Slowest / Fastest Controllers

Under the new Transactions tab you will first see an overview like this:

transaction-tracing-overview-annotated

On the left side we see the 10 slowest Controllers (actually methods inside them).  You can also see Top 10 Controllers by throughput or time consumed.

On the right side you can see request latency and throughput.

Not shown in this screenshot are a few more charts that show counts and rates of errors, exceptions, and requests that resulted in a 4XX or 5XX response code.

Top 10 Slowest Transactions

Clicking on one of the controllers shows the slowest transactions for that controller, as seen below:

transaction-tracing-slowest-transactions

Failed transactions are those that resulted in an error, exception, 4XX or 5XX error code.

As you can tell from this screenshot, these transactions are clickable.  Clicking on one shows details about the transaction, including all request parameters, the response code, the exact URL, start and stop times, and, of course, the actual call trace itself, shown below:

transaction-tracing-trace

Component Counting and Timing

Transaction tracing distinguishes between various components, such as JSPs, SQL, JPA, HTTP, etc.  It counts calls in those components and keeps track of how much time was spent in each of them.  This means that if your database calls are slow, for example, this report will show that and you’ll know what you need to optimize.

transation-tracing-components-count-duration

The little green “Logs” button in the top-right corner is not associated with transaction tracing, but it’s worth describing.  If you ship your logs to Logsene, this button pulls your log chart, as well as the actual application logs, into the SPM UI, thus allowing you to troubleshoot performance issues much, much faster!

Transaction Component Breakdown

Similar to the above Components chart, SPM shows the component call count and execution duration breakdown in a tabular view.

transaction-tracing-component-breakdown

Here are the key points about SPM’s transaction tracing:

  • Transaction Tracing does not require you to modify any source code – the instrumentation is done automatically, at the JVM bytecode level
  • Transaction Tracing is currently available for Java and Scala applications running inside the JVM
  • We support deep insight into specific technologies listed in SPM Transaction Tracing documentation
  • You’ll want to grab the latest version of the SPM client (it has some optimizations, too!)
  • You’ll need to use the SPM monitor in the embedded (aka javaagent) mode, not standalone
  • To add Transaction Tracing to your own custom apps you can easily create custom pointcuts

Not using SPM yet, but would like to trace your apps? Easy: register here — there is no commitment and you can leave your credit card in your wallet.  You get 30 days Free for new SPM Apps so even if you don’t end up falling in love with SPM for monitoring, alerting and anomaly detection, or Logsene for your logs, you can use the Distributed Transaction Tracing to quickly speed up your apps!  Oh, and if you are a young startup, a small or non-profit organization, or an educational institution, ask us for a discount (see special pricing)!

——-

Here are the other posts in our Transaction Tracing series:

Transaction Tracing for Performance Bottleneck Detection

[Note: This post is part of a series on Transaction Tracing — links to the other posts are at the bottom of this post]

When you’re building a monitoring solution or evaluating existing ones, what do you look for?  Probably these four core aspects of functionality:

  1. Collection and display of metrics
  2. Alerting based on metric values and anomalies
  3. Collection and display of server and application logs and other types of events
  4. Alerting based on log patterns and metrics extracted from logs

But there is really one more juicy piece of functionality one should look for:

  • Distributed Transaction Tracing

This can be especially useful in Microservices architectures where complex applications are composed of multiple components and services talking to each other over the network while servicing user requests.  As a matter of fact, Dennis Callaghan, senior analyst of infrastructure software at 451 Research, points out:

Microservices solve a lot of challenges, and that’s why they are becoming the standard architecture both within and between applications. We anticipate accelerated adoption of microservices in enterprises this year. But those enterprises need two things in order to effectively monitor microservices architectures. One is the ability to see application and transaction behavior and trace transactions across these increasingly complex and distributed environments. The other is an APM economic model that makes sense and reflects the need to monitor many more smaller instances.

SPM has always had the “APM economic model that makes sense and reflects the need to monitor many more smaller instances”, which is basically the metered model where you pay only for what you use.  This post is about the other key part highlighted in Dennis Callaghan’s statement: “ability to see application and transaction behavior and trace transactions across these increasingly complex and distributed environments”.


How to Add Performance Monitoring to Node.js and io.js Applications

We have been using Node.js here at Sematext and, since eating one’s own dogfood is healthy, we wanted to be able to monitor our Node.js apps with SPM (we are in the performance monitoring and big data biz). So, the first thing to do in such a case is to add monitoring capabilities for the technology we use in-house (like we did for Java, Solr, Elasticsearch, Kafka, HBase, NGINX, and others).  For example, we monitor Kibana 4 servers (based on Node.js), which we have in production for our “1-click ELK stack”.

You may have seen our post about SPM for Node.js, but I thought I’d share a bit about how we monitor Node.js to help others facing the same DevOps challenges when introducing new Node.js apps, or the additional challenge of operating large deployments with a mix of technologies in the application stack:

1) npm i spm-agent-nodejs

It’s open-sourced on GitHub: sematext/spm-agent-nodejs

2) add a new SPM App for Node.js — each monitored App is identified by its App-Token (and yes — there is an API to automate this step)

3) set the Environment variable for the application token

export SPM_TOKEN=YOUR_TOKEN

4) add one line to the beginning of your source code when using Node.js (io.js has a better option; see below)

var spmAgent = require('spm-agent-nodejs')

5) Run your app; after about 60 seconds you should start seeing metrics in SPM

At this point, what do I get? I can see pre-defined metric charts like these, with about 5 minutes of work :)

nodejs_1

I saved time already: there’s no need to define metric queries, widgets, or dashboards.

Now I can set up alerts on Latency or Garbage Collection, or have anomaly detection tell me when the number of Workers in a dynamic queue changes drastically. I typically set ‘Algolerts’ (basically machine-learning-based anomaly detection) to get notified (e.g. via PagerDuty) when a service suddenly slows down, because they produce less noise than regular threshold alerts. In addition, I recommend adding Heartbeat alerts for each monitored service to be notified of any server outages or network problems.

In our case, where a Node.js app runs tasks on Elasticsearch, it makes sense to create a custom dashboard to see Elasticsearch and Node.js metrics together (see the second screenshot). Of course, this is applicable to other applications in the stack, like NGINX, Redis, or HAProxy, and it can be combined with Docker container metrics.

nodejs_2

In fact, you can use the same application token for multiple servers to see how your whole cluster behaves in the “birds eye view” (a kind of top + df showing the health of all your servers).

Now, let’s have a look at how the procedure differs when using io.js …

io.js Supports Preloading Modules

When we use io.js’s preload command-line option (-r), we can add instrumentation without adding the require statement for ‘spm-agent-nodejs’ to the source code.

That’s why Step 4 can be done even better with io.js (>1.6):

iojs -r "./spm-agent-nodejs" yourApp.js

This is just a little feature but it shows how the io.js community is listening to the needs of users and is able to release such things quickly.

If you want to try io.js, here is how to install it:

npm i n -g

n io 2.4

The ‘node’ executable is now linked to ‘iojs’. To switch back to node 0.12, simply use

n 0.12

I hope this helps.  If you’d like to see some Node.js / io.js metrics that are currently not captured by SPM, please ping me on Twitter (@seti321) or drop me an email.  Or, even better, simply open an issue here: https://github.com/sematext/spm-agent-nodejs/  Enjoy!

Centralized Log Management and Monitoring for CoreOS Clusters

[Note: We’re holding Docker Monitoring and Docker Logging webinars in September — sign up today!]

If you’ve got an interest in things like CoreOS, logs and monitoring then you should check out our previous CoreOS-related posts on Monitoring Core OS Clusters and how to get CoreOS logs into ELK in 5 minutes.  And they are only the start of SPM integrations with CoreOS!  Case in point: we have recently optimized the SPM setup on CoreOS and integrated a logging gateway to Logsene into the SPM Agent for Docker.  And that’s not all…

In this post we want to share the current state of CoreOS Monitoring and Log Management from Sematext so you know what’s coming — and you know about things that might be helpful for your organization, such as:

  1. Feature Overview
  2. Fleet Units for SPM
  3. How to Set Up Monitoring and Logging Services

1. Feature Overview

  • Quick setup
    • add monitoring and logging for the whole cluster in 5 minutes
  • Collection of Performance Metrics for the CoreOS Cluster
    • Metrics for all CoreOS cluster nodes (hosts)
      • CPU, Memory, Disk usage
    • Detailed metrics for all containers on each host
      • CPU, Memory, Limits, Failures, Network and Disk I/O, …
    • Anomaly detection and alerts for all metrics
    • Anomaly detection and alerts for all logs
  • Correlated Container Events, Metrics and Logs
    • Docker Events like start/stop/destroy are related to deployments, maintenance, or sometimes to errors and unwanted restarts; correlating metrics, events, and logs is the natural way to discover problems using SPM.

Docker Events

  • Centralized configuration via etcd
    • There is often a mix of configuration in environment variables, static settings in cloud configuration files, and combinations of confd and etcd. We decided to store all settings in etcd, so the settings are made only once and are easy to access.
  • Automatic Log Collection
    • Logging gateway Integrated into SPM Agent
      • SPM Agent for Docker includes a logging gateway service that receives log messages via TCP.  Service discovery is handled via etcd (where the exposed TCP port is stored). All received messages are parsed, and the following formats are supported:
        • journalctl -o short | short-iso | json
        • integrated messages parser (e.g. for dockerd time, level and message text)
        • line delimited JSON
        • plain text messages
        • In cases where the parsing fails, the gateway adds a timestamp and keeps the message 1:1.
      • The logging gateway is configured with the Logsene App Token. Because it accepts plain TCP input, it is compatible with most Unix tools, e.g. journalctl -o json -n 10 | netcat localhost 9000
      • SPM for Docker collects all logs from containers directly from the Docker API. The logging gateway is typically used for system logs – or anything else configured in journald (see “Log forwarding service” below)
      • The transmission to Logsene receivers is encrypted via HTTPS.
    • Log forwarding service
      • The log forwarding service streams logs to the logging gateway by pulling them from journald. In addition, it saves the ‘last log time’ so it can recover after a service restart. Most people take this for granted, but not all logging services have such a recovery function.  There are many tools that just capture the current log stream. Often people realize this only when they miss logs one day because of a reboot, network outage, software update, etc.  But these are exactly the types of situations where you would like to know what is going on!
SPM integrations into CoreOS

SPM integrations into CoreOS

2. Fleet Units for SPM

SPM agent services are installed via fleet (a distributed init system) across the whole cluster. Let’s look at the unit files before we fire them up into the cloud.

The first unit file, spm-agent.service, starts SPM Agent for Docker. It reads the SPM and Logsene app tokens and the logging gateway port from etcd, and it starts on every CoreOS host (a global unit).

spm-agent.service

Fleet Unit File – SPM Agent incl. Log Gateway: spm-agent.service

The second unit file, logsene.service, forwards logs from journald to the logging gateway running as part of spm-agent-docker. All fields stored in the journal (down to the source-code level and line numbers provided by Go modules) are then available in Logsene.

logsene-service

Fleet Unit File – Log forwarder: logsene.service

3. Set Up Monitoring and Logging Services

Preparation:

  1. Get a free account at apps.sematext.com
  2. Create an SPM App of type “Docker” and copy the SPM Application Token
  3. Store the configuration in etcd
# PREPARATION
# set your application tokens for SPM and Logsene
export SPM_TOKEN=YOUR-SPM-TOKEN
export LOGSENE_TOKEN=YOUR-LOGSENE-TOKEN
# set the port for the Logsene gateway
export LG_PORT=9000
# store the tokens in etcd
# please note: the same keys are used in the unit files!
etcdctl set /sematext.com/myapp/spm/token $SPM_TOKEN
etcdctl set /sematext.com/myapp/logsene/token $LOGSENE_TOKEN
etcdctl set /sematext.com/myapp/logsene/gateway_port $LG_PORT
 

Download the fleet unit files and start the services via fleetctl:

# INSTALLATION
# Download the unit file for SPM
wget https://raw.githubusercontent.com/sematext/spm-agent-docker/master/coreos/spm-agent.service
# Start SPM Agent in the whole cluster
fleetctl load spm-agent.service; fleetctl start spm-agent.service
# Download the unit file for Logsene
wget https://raw.githubusercontent.com/sematext/spm-agent-docker/master/coreos/logsene.service
# Start the log forwarding service
fleetctl load logsene.service; fleetctl start logsene.service

Check the installation

systemctl status spm-agent.service
systemctl status logsene.service

Send a few log lines to see them in Logsene.

journalctl -o json -n 10 | ncat localhost 9000

After about a minute you should see Metrics in SPM and Logs in Logsene.

Core-OS-BEV

Cluster Health in ‘Birds Eye View’

docker-overview-2

Host and Container Metrics Overview for the whole cluster

logs

Logs and Metrics

Open-Source Resources

Some of the things described here are open-sourced, e.g. SPM Agent for Docker (sematext/spm-agent-docker on GitHub).

Summary – What this gets you

Here’s what this setup provides for you:

  • Operating System metrics of each CoreOS cluster node
  • Container and Host Metrics on each node
  • All Logs from Docker containers and Hosts (via journald)
  • Docker Events from all nodes
  • CoreOS logs from all nodes

Having this setup allows you to take full advantage of SPM and Logsene by defining intelligent alerts for metrics and logs (delivered via channels like e-mail, PagerDuty, Slack, HipChat, or any WebHook), as well as by correlating performance metrics, events, logs, and alerts.

Running CoreOS? Need any help getting CoreOS metrics and/or logs into SPM & Logsene?  Let us know!  Oh, and if you’re a small startup — ping @sematext — you can get a good discount on both SPM and Logsene!

Tomcat Monitoring SPM Integration

This old cat, Apache Tomcat, has been around for ages, but it’s still very much alive!  It’s at version 8, with version 7.x still maintained and new development happening on version 9.0.  The other day we added support for Tomcat monitoring to the growing list of SPM integrations, so if you run Tomcat and want to take a peek at all its juicy metrics, give SPM for Tomcat a go!  Note that SPM supports both Tomcat 7.x and 8.x.

Before you jump to the screenshot below, read this: you may reeeeeally want to enable Transaction Tracing in the SPM agent running on your Tomcat boxes.  Why?  Because it will help you find bottlenecks in your web application by tracing transactions (think HTTP requests + method calls + network calls + DB calls + ….).  It will also build a whole map of all your applications talking to each other, with information about latency, request rate, and error and exception rates between all components!  Check this out (and just click it to enlarge):

AppMap

Everyone loves maps.  It’s human. How about charts? Some of us have a thing for charts. Here are some charts with various Tomcat metrics, courtesy of SPM:

Overview  (click to enlarge)

Tomcat_overview_2

Session Counters  (click to enlarge)

Tomcat_Session_Counters

Cache Usage  (click to enlarge)

Tomcat_Sessions_4

Threads (Threads/Connections)  (click to enlarge)

Tomcat_threads_4

Requests  (click to enlarge)

Tomcat_Requests

Hope you like this new addition to SPM.  Got ideas how we could make it more useful for you?  Let us know via comments, email, or @sematext.

Not using SPM yet? Check out the free 30-day SPM trial by registering here (ping us if you’re a startup, a non-profit, or an educational institution – we’ve got special pricing for you!).  There’s no commitment and no credit card required.  SPM monitors a ton of applications, like Elasticsearch, Solr, Hadoop, Spark, Node.js & io.js (open-source), Docker (get the open-source Docker image), CoreOS, and more.
