Elasticsearch Training in London

3 Elasticsearch Classes in London



Elasticsearch for Developers ……. April 4-5

Elasticsearch for Logging ……… April 6

Elasticsearch Operations …….  April 6

All classes cover Elasticsearch 2.x

Hands-on — lab exercises follow each class section

Early bird pricing until February 29

Add a second seat for 50% off


Course overviews are on our Elasticsearch Training page.

Want a training in your city or on-site?  Let us know!

Attendees in all three workshops will go through several sequences of short lectures followed by interactive, group, hands-on exercises. There will be Q&A sessions in each workshop after each such lecture-practicum block.

Got any questions or suggestions for the course? Just drop us a line or hit us @sematext!

Lastly, if you can’t make it…watch this space or follow @sematext — we’ll be adding more Elasticsearch training workshops in the US, Europe and possibly other locations in the coming months.  We are also known worldwide for Elasticsearch Consulting Services, and Elasticsearch Production Support.
We hope to see you in London in April!

Core Solr Training in London


April 4 & 5 — Covers Solr 5.x

Hands-on — lab exercises follow each class section

Early bird pricing until February 29

Add a second seat for 50% off

Sematext is running a 2-day, very comprehensive, hands-on workshop in London on April 4 & 5 for Developers and DevOps who want to configure, tune and manage Solr at scale.

The workshop will be taught by Sematext engineer — and author of Solr books — Rafał Kuć. Attendees will go through several sequences of short lectures followed by interactive, group, hands-on exercises. There will be a Q&A session after each such lecture-practicum block.  See details, including training overview.



Target audience:

Developers who want to configure, tune and manage Solr at scale and learn about a wide array of Solr features this training covers in its 23 chapters – we mean it when we say this is comprehensive!

What you’ll get out of it:

In two days of training Rafal will:

  1. Bring Solr novices to the level where he/she would be comfortable with taking Solr to production
  2. Give experienced Solr users proven and practical advice based on years of experience designing, tuning, and operating numerous Solr clusters to help with their most advanced and pressing issues

When & Where:

  • Dates: April 4-5 (Monday & Tuesday)
  • Time: 9:00 am to 5:00 pm
  • Place: Imparando City of London Training Centre — 56 Commercial Road, Aldgate, London, E1 1LP (see map)
  • Cost: GBP £845.00 “Early Bird” rate (valid through February 29) and GBP £1.045.00 afterward.  There’s also a 50% discount for the purchase of a 2nd seat! (limit of 1 discounted seat per full-price seat)
  • Food/Drinks: Light morning & afternoon refreshments and Lunch will be provided

Got any questions or suggestions for the course? Just drop us a line or hit us @sematext!

Lastly, if you can’t make it…watch this space or follow @sematext — we’ll be adding more Solr training workshops in the US, Europe and possibly other locations in the coming months.  We are also known worldwide for our Solr Consulting Services and Solr Production Support.


Hope to see you in the London in April!  See detailed info about this training.



2015 in Review

Another year is behind us, and it’s been another good year for us at Sematext.  Here are the highlights in the chronological order.  If you prefer looking non-chronological overview, look further below.


We started the year by doing a ton of publishing on the blog – about Solr-Redis, about SPM and Slack, about Solr vs. Elasticsearch – always a popular topic, Spark, Kafka, Cassandra, Solr, etc.  Logsene being ELK as a Service means we made sure users have the freedom and flexibility to create custom Elasticsearch Index Templates in Logsene.


We added Account Sharing to all our products, thus making it easier to share SPM, Logsene, and Site Search Analytics apps by teams.  We made a big contribution to Kafka 0.8.2 by reworking pretty much all Kafka metrics and making them much more useful and consumable by Kafka monitoring agents.  We also added support for HAProxy monitoring to SPM.


We announced Node.js / io.js monitoring.  This was a release of our first Node.js-based monitoring monitoring agent – spm-agent-nodejs, and our first open-source agent.  The development of this agent resulted in creation of spm-agent an extensible framework for Node.js-based monitoring agents.  HBase is one of those systems with tons of metrics and with metrics that change a lot from release to release, so we updated our HBase monitoring support for HBase 0.98.


The SPM REST API was announced in April, and a couple of weeks later the spm-metrics-js npm module for sending custom metrics from Node.js apps to SPM was released on Github.


A number of us from several different countries gathered in Krakow in May.  The excuse was to give a talk about Tuning Elasticsearch Indexing Pipeline for Logs at GeeCon and give away our eBook – Log Management & Analytics – A Quick Guide to Logging Basics while sponsoring GeeCon, but in reality it was really more about Żubrówka and Vișinată, it turned out.  Sematext grew a little in May with 3 engineers from 3 countries joining us in a span of a couple of weeks.  We were only dozen people before that, so this was decent growth for us.

Right after Krakow some of us went to Berlin to give another talk: Solr and Elasticsearch – Side by Side with Elasticsearch and Solr: Performance and Scalability.  While in Berlin we held our first public Elasticsearch training and, following that, quickly hopped over to Hamburg to give a talk at a local search meetup.


In June we gave a talk on the other side of the Atlantic – in NYC – Beyond POC: Processing Metrics, Logs and Traces … at Scale.  We were conference sponsors there as well and took part in the panel about microservices.  We published our second eBook – Elasticsearch Monitoring Essentials eBook.  The two most important June happenings were the announcement of Docker monitoring – SPM for Docker – our solution for monitoring Docker containers, as well as complete, seamless integration of Kibana 4 into Logsene.  We’ve added Servers View to SPM and Logsene got much needed Alerting and Anomaly Detection, as well as Saved Searches and Scheduled Reporting.


In July we announced public Solr and Elasticsearch trainings, both in New York City, scheduled for October.  We built and open-sourced Logsene Command Line Interfacelogsene-cli – and we added Tomcat monitoring integration to SPM.


At Sematext we use Akka, among other things, and in August we introduced Akka monitoring integration for SPM and open-sourced the Kamon backend for SPM.  We also worked on and announced Transaction Tracing that lets you easily find slow transactions and bottlenecks that caused their slowness, along with AppMaps, which are a wonderful way to visualize all your infrastructure along applications running on it and see, in real-time, which apps and servers are communicating, how much, how often there are errors in each app, and so on.


In September we held our first 2 webinars on Docker Monitoring and Docker Logging.  You can watch them both in Sematext’s YouTube channel.


We presented From zero to production hero: Log Analysis with Elasticsearch at O’Reilly’s Velocity conference in New York and then Large Scale Log Analytics with Solr at Lucene/Solr Revolution in Austin.  After Texas we came back to New York for our Solr and Elasticsearch trainings.


Logsene users got Live Tail in November, while SPM users welcomed the new Top Database Operations report.  Live Tail comes in very handy when you want to semi-passively watch out for errors (or other types of logs) without having to constantly search for them.  While most SPM users have SPM monitoring agents on their various backend components, Top Database Operations gives them the ability to gain more insight in performance between the front-end/web applications and backend servers like Solr, Elasticsearch, or other databases by putting the monitoring agents on applications that act as clients for those backend services.  We worked with O’Reilly and produced a 3-hour Working with Elasticsearch Training Video.


We finished off 2015 by adding MongoDB monitoring to SPM, joining Docker’s ETP Program for Logging, further integrating monitoring and logging, ensuring Logsene works with Grafana, writing about monitoring Solr on Docker, publishing the popular Top 10 Node.js Metrics to Watch, as well as a SPM vs. New Relic APM comparison.  

Pivoting the above and grouping it by our products and services:


  • Live Tail
  • Alerting
  • Anomaly Detection
  • logsene-cli + logsene.js + logagent-js
  • Saved Searches
  • Scheduled Email Reporting
  • Integrated Kibana
  • Compatibility with Grafana
  • Search AutoComplete
  • Powerful click-and-filter
  • Native charting of numerical fields
  • Account Sharing



  • Transaction Tracing
  • SPM Tracing API
  • AppMap
  • NetMap
  • On Demand Profiling
  • Integration with Logsene
  • Expanded monitoring for Elasticsearch, Solr, HBase, and Kafka
  • Added monitoring for Docker, Node.js, Akka, MongoDB, HAProxy, and Tomcat
  • Birds Eye Servers View
  • Account Sharing



  • Docker Monitoring
  • Docker Logging



  • Elasticsearch training in Berlin
  • Solr and Elasticsearch trainings in New York



  • Elasticsearch Monitoring Essentials
  • Log Management & Analytics – A Quick Guide to Logging Basics


Talks / Presentations / Conferences:

  • Lucene/Solr Revolution, Austin, TX – Large Scale Log Analytics with Solr
  • Velocity Conference, NYC, NY – Log Analysis with Elasticsearch
  • Berlin Buzzwords, Berlin, Germany – Side by Side with Elasticsearch and Solr: Performance and Scalability
  • GeeCon, Krakow, Poland – Tuning Elasticsearch Indexing Pipeline for Logs
  • DevOps Days, Warsaw, Poland – Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
  • DevOps Expo, NYC, NY – Process Metrics, Logs, and Traces at Scale



All numbers are up  – our SPM and Logsene signups are up, product revenue is up a few hundred percent from last year, we’ve nearly doubled our blogging volume, our site traffic is up,we’ve made several UI-level facelifts for both apps.sematext.com and www.sematext.com, our team has grown, we’ve increased the number of our Solr and Elasticsearch Production Support customers, and we’ve added Solr and Elasticsearch Training to the list of our professional services.


Kafka Real-time Stream Multi-topic Catch up Trick

Half of the world, Sematext included, seems to be using Kafka.

Kafka is the spinal cord that connects various components in SPM, Site Search Analytics, and Logsene.  If Kafka breaks, we’re in trouble (but we have anomaly detection all over the place to catch issues early).  In many Kafka deployments, ours included, the most recent data is the most valuable.  Consider the case of Kafka in SPM, which processes massive amounts of performance metrics for monitoring applications and servers.  Clearly, in a performance monitoring system you primarily care about current performance numbers.  Thus, if SPM’s Kafka pipeline were to break and we restore it, what we’d really like to avoid is processing all data sequentially, oldest to newest.  What we’d prefer is processing new metrics data first and then processing older data using any spare capacity we have in order to “fill the gap” caused by Kafka downtime.

Here’s a very quick “video” that show this in action:

Kafka Catch Up
Kafka Catch Up


How does this work?

We asked about it back in 2013, but didn’t really get good tips.  Shortly after that we implemented the following logic that’s been working well for us, as you can see in the animation above.

The catch up logic assumes having multiple topics to consume from and one of these topics being the “active” topic to which producer is publishing messages. Consumer sets which topic is active, although Producer can also set it if it has not already been set. The active topic is set in ZooKeeper.

Consumer looks at the lag by looking at the timestamp that Producer adds to each message published to Kafka. If the lag is over N minutes then Consumer starts paying attention to the offset.  If the offset starts getting smaller and keeps getting smaller M times in a row, then Consumer knows we are able to keep up (i.e. the offset is not getting bigger) and sets another topic as active. This signals to Producer to switch publishing to this new topic, while Consumer keeps consuming from all topics.

As the result, Consumer is able to consume both new data and the delayed/old data and avoid not having fresh data while we are in catch-up mode busy processing the backlog.  Consuming from one topic is what causes new data to be processed (this corresponds to the right-most part of the chart above “moving forward”), and consuming from the other topic is where we get data for filling in the gap.

If you run Kafka and want a good monitoring tool for Kafka, check out SPM for Kafka monitoring.


Poll: How do you ship your Logs?

Recently, a few people from Sematext’s Logsene team debated about how useful the “structured” part of syslog logs (those using the RFC5424 format) is to people.  Or has shipping logs in other structured formats, like JSON, made RFC5424 irrelevant to most people?  Here is a quick poll to help us all get some insight into that.

NOTE: the question is not what your logs looks like initially when they are generated, but about what structure they have when you ship them to a centralized logging service, whether running within your organization or a logging SaaS like Logsene.

If you choose “None of the above” please do leave a comment to help spread knowledge about alternatives.


SPM Client Async Metrics Collection

Do your SPM charts ever look choppy like this?


Let’s hope not!  However, if you run large Elasticsearch clusters, especially those with thousands of shards, you may suffer from Elasticsearch sometimes taking a long time to provide information about its metrics that SPM, of course, uses to render various metrics charts.  While this slowness is something Elasticsearch developers are undoubtedly working on, we’ve also proactively improved the SPM Client (aka agent, aka monitor), specifically by making metric collection from various sources asynchronous, to better cope with this slowness.

There are other goodies in more recent versions of the SPM Client, including additional optimizations (some specific to Elasticsearch, but others beneficial to all SPM users), but also some that will enable exciting new functionality we’ll be announcing next month.

To get the latest SPM Client please go to https://apps.sematext.com/spm-reports/client.do.

Custom Metrics from Node.js Apps

We recently added support for Node.js and io.js monitoring to SPM and have received great feedback.  While SPM for Node.js monitors all key Node.js metrics, most applications have additional metrics one often wants to track — things like: the number of concurrent users, the number of items placed in a shopping cart, or any other kind of IT metric, business transaction or KPI.  SPM already provides a Custom Metrics API and libraries that make shipping custom metrics from Java and from Ruby applications a snap.  But why leave Node.js behind?  Meet spm-metrics-js (it’s on Github) – the npm module for sending custom metrics from Node.js apps to SPM.  

This JavaScript module supports measurements using counters, meters, timers, and histograms. These helpers calculate values of metrics objects and ship them to SPM, where they are then turned into charts and inputs to alert rules and anomaly detection algorithms.

Here’s an example for counting users on login and logout:

Sending custom metrics is really that easy!

Now, let’s have a look at the options used when creating a custom metric object:

  • name – the name of the metric you can find in SPM’s user interface
  • aggregation – the aggregation type: ‘avg’, ‘sum’, ‘min’ or ‘max’ used in SPM’s aggregations server
  • filter1 – the SPM user interface provides two filter criteria; the value will be available in the UI as the first filter
  • filter2 – the filter value for the second filter field in SPM’s UI
  • interval – time in ms to call save() periodically. Defaults to no automatic call to save(). The save() function captures the metric and resets meters, histograms, counters or timers.
  • valueFilter – array of property names for calculated values. Only specified fields are sent to SPM (e.g. [‘count’, ‘min’, ‘max’].

Additional measurement functions are available to extend the custom metric object automatically with additional calculated properties:

  • Meter – measure rates and provide the following calculated properties:
    • mean: the average rate since the meter was started
    • count: the total of all values added to the meter
    • currentRate: the rate of the meter since the meter was started
    • 1MinuteRate: the rate of the meter biased toward the last 1 minute
    • 5MinuteRate: the rate of the meter biased toward the last 5 minutes
    • 15MinuteRate: the rate of the meter biased toward the last 15 minutes
  • Histogram – build percentile, min, max, & sum aggregations over time
    • min: the lowest observed value
    • max: the highest observed value
    • sum: the sum of all observed values
    • variance: the variance of all observed values
    • mean: the average of all observed values
    • stddev: the stddev of all observed values
    • count: the number of observed values
    • median: 50% of all values in the reservoir are at or below this value.
    • p75: see median, 75% percentile
    • p95: see median, 95% percentile
    • p99: see median, 99% percentile
    • p999: see median, 99.9% percentile
  • Timer – measures time and captures rates in an internal meter and histogram

If this is more than you actually need, we recommend selecting only the relevant properties (using the ‘valueFilter’ option). Please note that Custom Metrics are aggregated by the specified aggregation type (‘avg’, ‘sum’, ‘min’, ‘max’).  Moreover, the aggregation type for each property can be defined – for further details please check the package documentation.

Adding instrumentation always raises the question of performance; in spm-metrics-js all metrics are buffered and efficiently ship metrics to SPM in bulk using asynchronous functions. We recommend using a transmit time of 60 seconds.

Once you send custom metrics to SPM you can create alerts on them, have SPM detect and alert you about anomalies, put charts with those metrics on dashboards, share charts with those metrics publicly or just with your team or organization, etc.

Actions for Metrics – e.g. define alerts using anomaly detection
Dashboard with Custom Metric and other Metrics

Please note the free plan has no limits on the number of monitored Applications, Processes, Dashboards or Users and you can share Accounts with your whole DevOps team and integrate SPM with Slack, HipChat, PagerDuty, Webhooks, etc. If you don’t use SPM yet, grab a free account to start monitoring your Node.js and io.js applications and benefit from all standard SPM features such as alerting, anomaly detection, event and log correlation, unlimited dashboards, secure information sharing, etc. Check out spm-metrics-js (or on Github) and drop us a line (or tweet 140 characters to @sematext) — we’d love to hear from you!