For the last several months, I’ve been working with the Elasticsearch-Logstash-Kibana (often referred to as ELK) software stack. The details that drove us to deploy this technology are briefly covered in an internal slide deck, which I’ve posted in a scrubbed version at Slideshare. I’m happy to report that we now have a functioning development environment happily consuming a half gigabyte of compressed perimeter firewall log data a day, from which we’ve derived insights that our current SIEM platform has been unable to frame, let alone answer. In this post, I’ll cover a few of the core elements that make up our deployment. This lays the foundation for later deep dives into what has worked for us, what has worked less well, and where we hope to go.
Logstash is the core log processing technology for taking log data from a variety of different sources, parsing and normalizing the data, and sending it along to a data store for analysis. Logstash is a veritable Swiss Army knife when it comes to data following the timestamp + event paradigm. The logstash software provides a flexible pipeline of customizable plugins for getting data (input plugins), processing that data (filter and codec plugins), and sending that data to repositories (output plugins). I was particularly drawn to logstash’s grok filter as an easy way to write patterns that don’t require a bunch of impossible-to-maintain regular expressions. I still have nightmares about a particularly nasty bit of perl code I wrote during my consulting days for processing BlueCoat proxy logs and generating analysis reports, a scenario to which many perl hackers can relate.
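To give a feel for grok, here’s a minimal filter sketch for a hypothetical space-delimited firewall log line such as `2014-03-01T12:00:00Z DENY 192.168.1.10 10.0.0.5 443`. The log format and field names are illustrative, not our actual perimeter firewall schema; the `%{...}` tokens reference grok’s stock pattern library rather than hand-rolled regexes:

```
filter {
  grok {
    match => [ "message",
      "%{TIMESTAMP_ISO8601:timestamp} %{WORD:action} %{IP:src_ip} %{IP:dst_ip} %{INT:dst_port}" ]
  }
}
```

Each named token becomes a structured field on the event (`action`, `src_ip`, and so on), which is what makes the downstream querying so much more pleasant than grepping raw lines.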
Redis is an awesome in-memory database/caching system. It provides an ultra-fast caching layer between the inputs (bound only by disk and network I/O) and the CPU-intensive processing layer. The pattern of logstash reader -> redis cache -> logstash processor is a very common one and has worked well for us. Redis gives us a dead simple buffer that allows events to flow without data loss while the infrastructure grows and contracts (more on this later).
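The buffering pattern itself is just a FIFO list shared between the two logstash tiers. Here’s a minimal sketch of that flow in Python, using a `deque` as a stand-in for the Redis list; in a real deployment the shipper side would be `redis.lpush(...)` and the processor side `redis.brpop(...)` against a shared key (key and field names here are illustrative):

```python
import json
from collections import deque

# Stand-in for the Redis list; the reader and processor tiers would
# normally share this via a Redis key rather than process memory.
buffer = deque()

def ship(raw_line):
    # Shipper side: wrap the raw event and push onto the head of the list
    # (the equivalent of LPUSH in Redis).
    buffer.appendleft(json.dumps({"message": raw_line}))

def index_one():
    # Processor side: pop from the tail (the equivalent of BRPOP),
    # preserving arrival order.
    return json.loads(buffer.pop())["message"] if buffer else None

ship("Mar  1 12:00:00 fw1 DENY tcp 192.168.1.10 -> 10.0.0.5:443")
ship("Mar  1 12:00:01 fw1 ALLOW tcp 192.168.1.11 -> 10.0.0.6:80")
print(index_one())  # events come back out in arrival order
```

Because the list simply grows when the processors fall behind and drains when they catch up, the two tiers can be scaled independently without dropping events.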
Elasticsearch is a distributed search engine built around the Apache Lucene core search technology. Elasticsearch allows you to scale your search cluster horizontally (add more boxes) extremely easily, creating an aggregate processing pool that can execute queries across billions of documents (or log entries) in times measured in milliseconds. The description of Elasticsearch as a search engine can be limiting, however. We’ve used scan and scroll queries in Elasticsearch to pull huge volumes of bulk data out of the platform. I look at Elasticsearch as a one-stop shop for most of my log querying needs and haven’t seen a need to throw any data into RDBMS systems such as MariaDB or Postgres.
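The scan-and-scroll idea is worth a sketch: open a scroll, then repeatedly fetch the next batch by cursor until a fetch comes back empty. The Python below simulates that loop against an in-memory list standing in for the index; a real client would issue the equivalent HTTP calls to the Elasticsearch scroll endpoints (function names and batch sizes here are illustrative):

```python
# In-memory stand-in for an index holding many documents.
documents = [{"_id": i, "message": "log entry %d" % i} for i in range(10)]

scrolls = {}

def start_scroll(batch_size):
    # Open a "scroll context": remember where we are and the page size.
    scrolls["cursor"] = (0, batch_size)
    return "scroll-id-1"  # illustrative scroll id

def next_batch(scroll_id):
    # Fetch the next page and advance the cursor.
    offset, size = scrolls["cursor"]
    scrolls["cursor"] = (offset + size, size)
    return documents[offset:offset + size]

def scan_all(batch_size=3):
    # The bulk-export loop: keep pulling batches until one comes back empty.
    scroll_id = start_scroll(batch_size)
    while True:
        batch = next_batch(scroll_id)
        if not batch:
            break
        yield from batch

print(sum(1 for _ in scan_all()))  # prints 10: every document, 3 at a time
```

The win over a normal paginated search is that the cursor walks a consistent snapshot of the index, so a bulk export sees every document exactly once even while new log events keep arriving.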
Chef and Vagrant
world and pulled in the open source, Seattle-based Chef configuration management tool to manage my instances. I’m enthralled with the power of Chef to completely build out infrastructures with a single command, with high reliability, full infrastructure-as-code documentation, and a repeatable build process. Vagrant enables me to specify the bare computing resources and deploy a relatively complex multi-tier cluster on VMware with a single command in one moment, then tear down the entire environment and deploy out on AWS for public consumption. Which brings me to the next element…
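The VMware-one-moment, AWS-the-next trick comes from defining more than one provider for the same machine. A minimal multi-provider Vagrantfile sketch might look like the following; the box names, sizing, and recipe name are illustrative, not our actual cluster definition:

```ruby
# Illustrative multi-provider Vagrantfile: `vagrant up --provider=vmware_fusion`
# or `vagrant up --provider=aws` builds the same node from one definition.
Vagrant.configure("2") do |config|
  config.vm.define "logstash" do |node|
    node.vm.box = "precise64"

    node.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"] = "2048"
    end

    node.vm.provider "aws" do |aws, override|
      aws.instance_type = "m1.small"
      override.vm.box = "dummy"  # the aws plugin ignores the local box
    end

    # Chef does the actual build-out, so the node converges identically
    # regardless of which provider launched it.
    node.vm.provision "chef_solo" do |chef|
      chef.add_recipe "logstash"
    end
  end
end
```

Since Chef owns the provisioning step, the provider choice only decides where the VM lives, not what ends up running on it.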