
How to override the reported hostname in Ganglia

Posted by admin |

In the globals section of gmond.conf, put:

override_hostname = your.preferred.hostname
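For reference, a sketch of how that line sits in gmond.conf (the hostname is a placeholder and the other options are elided):

```
globals {
  ...
  override_hostname = your.preferred.hostname
}
```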

Nov 23, 2015 01:52

Keep your tmux sessions after the client has been upgraded

Posted by admin |

I started an upgrade of a machine and decided to do it over tmux so that losing the SSH connection would be no big deal. However, when I came back I couldn't reattach to the tmux session; the client had been upgraded, so it refused to connect to the old session. In that case you can do this:

pgrep tmux
<process id>
/proc/<process id>/exe attach
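Assuming a single tmux server is running, the two steps can be collapsed into a one-liner (`pgrep -o` picks the oldest matching process, which should be the server):

```shell
# /proc/<pid>/exe is a symlink to the exact binary the old server was started
# from, so invoking it gives a client whose protocol version matches the server.
/proc/"$(pgrep -o tmux)"/exe attach
```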


Pretty awesome hack if you need tmux working and don't want to lose all your sessions:

Read more: Link - tmux - protocol version mismatch (client 8, server 6) when trying to upgrade - Unix & Linux Stack Exchange

Nov 19, 2015 04:59

yEd - a very nice graphing and diagramming tool

Posted by admin |

Written in Java, so it should run on many operating systems. Really nicely done: you get good results quickly.

Example (I suck at graphics though):

Nov 19, 2015 07:30

"inner-dest plugin username not found": You need to use 3.6+ of syslog-ng

Posted by admin |

If you get the message:

Error parsing afmongodb, inner-dest plugin username not found

in syslog-ng when trying to connect to MongoDB, you are likely running a version of syslog-ng that has support for MongoDB, but not for MongoDB authentication. An example of such a version is 3.5.3-1, which as of this writing is the one in the standard repositories for Ubuntu 14.04 LTS. Versions 3.6+ are supposed to support MongoDB authentication.

Revision: 3.5.3-1 [@9695e81] (Ubuntu/14.04)
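For reference, a mongodb destination with authentication on syslog-ng 3.6+ might look like the sketch below. The server, database, collection and credentials are all placeholders, and the exact option set should be checked against your version's documentation (`syslog-ng --version` shows what you are running):

```
destination d_mongodb {
    mongodb(
        servers("localhost:27017")
        database("syslog")
        collection("messages")
        username("syslog")
        password("secret")
    );
};
```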
Nov 16, 2015 01:31

Open source logging, analysis and monitoring tools

Posted by admin |


An attempt to structure what open source logging and monitoring tools are available. I've just started checking out this area.

This article will first put structure to logging and monitoring needs and then list what is available, with short descriptions, categorized.

The use case for my analysis is an outfit of a dozen or so publicly reachable machines, with in-house custom-built services reachable over HTTP as REST, JSON-RPC and web pages. Supporting these services are database servers holding hundreds of gigabytes of data, and a couple of other servers specific to the business.

A high-level overview of the field may look like this:

Logs and metrics -> Aggregation -> Monitoring -> Notification

Logs and metrics -> Aggregation -> Storage ->  Log analysis

Availability, sanity and fixing

So, why should you monitor servers and log data from them? The reasons can be divided into ensuring the availability of your systems, ensuring their sanity, and fixing them:

Availability (Monitoring)

Are the servers on-line and the components working? How would you know? You could have:

  • Alarms sent when the monitoring system detects services not working at all or other critical conditions
  • As an aside, you could also consider a bit of "monitor-less monitoring": let the customers do the monitoring, and give them a way to quickly indicate that something isn't running smoothly. For example, a form that submits the problem along with an automatic indication of which machine/service the report concerns, or just a note on where to file a ticket.
  • There is probably a minimum good set of monitor info you want from the system in general: CPU, memory, disk space, open file descriptors.
  • There should be a place where you can see graphs of the last seven days of monitoring output.
  • Monitoring of application-level services, such as those running under a process manager like pm2 or supervisord. At a minimum, memory consumption and status per process.


Sanity (Monitoring)

Even if a system is available and responding to the customer's actions, it may not be producing accurate results.

  • No instrumentation is needed on the servers for this; simply monitor the services from another machine by making HTTP requests. Check the response time and the accuracy of the result. This will also catch network connectivity issues. It is similar to end-to-end, integration and regression tests, but on live data.
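A minimal external probe along these lines, sketched with curl (the URL is a placeholder for whatever health endpoint your service exposes):

```shell
# Sketch of an external availability check; the URL is a placeholder.
# -s silences progress output, -o /dev/null discards the body,
# --max-time bounds how long we are willing to wait, and
# -w prints the HTTP status code and the total time taken.
url="https://example.com/health"
curl -s -o /dev/null --max-time 5 -w '%{http_code} %{time_total}\n' "$url"
```

Run from cron on a machine outside the monitored environment, and alert when the status is not 200 or the time exceeds your threshold.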

Fixing (Logging)

  • Why did the problem come about? Traceback and error logging, and comparing logs from different subsystems. There ought to be ready-made instrumentation for services used on the machine: PostgreSQL, MongoDB, Nginx and such. It is important to make sure the systems log enough information, especially your own software. If the space requirements get big, be aggressive with log rotation. There are a number of standardized log formats:

Standardized log records

There are a couple of standards for the format of log records. I believe RFC 5424 is more modern than RFC 3164, and GELF, with log data encoded in JSON, is becoming a bit of a de facto standard in newer systems (Graylog, log4j).


Logging should answer the questions, in increasing order of ambition:

  • When and for how long? - When did the problem occur and how long did it persist?
  • How? - How did the problem manifest itself (e.g. out of memory, out of file descriptors)?
  • Why? - Why did the problem come about?

Data interfaces/aggregators

Ok, going back to the diagrams:

Logs and metrics -> Aggregation -> Monitoring -> Notification

Logs and metrics -> Aggregation -> Storage ->  Log analysis

Firstly, data needs to be made available for aggregation. In some cases this means making accessible log messages that are already being produced. In other cases it means introducing new data-collecting services (metrics).

Writing logs to a log file that nobody knows about does not count as making data available. However writing to a well known logging service makes data available. A process that finds log files and then reads from them also makes data available.

Data interfaces/aggregators software


Analyzers and visualizers - monitoring

After you have the data, you may want to monitor and react to events and unusual circumstances. A monitoring tool can react when thresholds are reached, and can often calculate and compare values, also over some (limited) time. There are also possibilities for visualization.

Analyzers and visualizers - logging

There are basically two kinds of analysis: one of time-series data, where graphs are of help; the other of events such as errors, which is more textual data. A tool for the time-series kind will typically:

  1. Store numeric time-series data
  2. Render graphs of this data on demand


Protocol brokers

These translate one protocol into another, or aggregate (which makes them a bit like the category further up). The contents of this category are just a selection of tools I found, mostly for inspiration if/when I need to fit pieces together with some programming/adapting.

Storage back ends

Usually bundled in or required by log analyzers

  • PostgreSQL
  • MongoDB
  • InfluxDB - event database
  • RRD - round robin database
  • Whisper - part of Graphite, which in its turn uses storage back ends.

Mega systems - all in one



Nov 14, 2015 12:20

Installing syslog-ng on Ubuntu 14.04 LTS

Posted by admin |

You need to explicitly install syslog-ng-core


I think this workaround should do it:

apt-get install syslog-ng syslog-ng-core

Read more: Link - Bug #1242173 “syslog-ng package fails to install” : Bugs : syslog-ng package : Ubuntu

Nov 13, 2015 12:10

jq: How to filter an array and add suffixes to values

Posted by admin |

Given the output of the jlist CLI command from pm2, you could filter it like this:

jq '.[]|{service:(.name| . += "memory"),  mem: .monit.memory},{ cpu: .monit.cpu, status: .pm2_env.status}'


.[] basically means "for each object in the array". The pipe after it is executed once per array element.

(.name| . += "memory") means: pipe the value of the name key into the next expression, refer to it there with a dot, and append the string "memory" to it.
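To see it in action, here is a sketch with a minimal stand-in for the pm2 jlist output (values invented; field names follow the filter above):

```shell
# A minimal stand-in for `pm2 jlist` output; -c prints one compact object per line.
echo '[{"name":"api","monit":{"memory":1024,"cpu":2},"pm2_env":{"status":"online"}}]' |
  jq -c '.[]|{service:(.name| . += "memory"), mem: .monit.memory},{cpu: .monit.cpu, status: .pm2_env.status}'
# {"service":"apimemory","mem":1024}
# {"cpu":2,"status":"online"}
```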

Nov 11, 2015 03:36

How to get a terminal window unstuck after broken pipe to SSH session

Posted by admin |

Type ~. (i.e. tilde, period) at the beginning of a line.

Read more: Link - What can I do when my SSH session is stuck? - Ask Different

Nov 10, 2015 10:27

The configuration language in the widgets of riemann-dash

Posted by admin |

Pattern matching is done with the "=~" operator, and "%" is the wildcard. "|" can be used to give alternative values. I haven't used pipe yet, but I believe that alternative values are grouped inside a pair of parentheses: "(val1|val2)". See monitoring-setup/dashboard.json at master · algernon/monitoring-setup

service =~ "process:%:cpu:percent"

"Rows" and "Columns" decide whether the data will be presented row-wise or column-wise per host.

Nov 10, 2015 12:40

Monitoring with Riemann & riemann-dash

Posted by admin |

This blog post is a work in progress

Riemann is a monitoring system written in Clojure. It can aggregate streams of data, called events; it can filter, combine and, over limited time windows, integrate data; and it can send notifications over e-mail and SMS.

Riemann is scripted in Clojure, a Lisp-like language that runs on the Java virtual machine (JVM). Riemann is used to monitor the health and status of services such as Linux machines, databases and web servers. It is used e.g. by The Guardian newspaper; see guardian/riemann-config.

Riemann is not primarily a logger, although inside Riemann you can define a stream that is sent to logging.


There is an ecosystem surrounding the Riemann server, consisting of data sources, clients and riemann-dash, a rather nice-looking and easily configurable web GUI for monitoring (although I currently think I have some problems with memory leaks when running it in Google Chrome).

Config language of Riemann-dash

Riemann-dash has a configuration language, formally documented here:

reimann/Query.g at master · jdmaturen/reimann

(That is an old version of Riemann, but it has the language definition as of now.)

In order to select what to show in riemann-dash, you put the field you want to match against. Like so:

service = "cpu"

...if you want to monitor the CPU from the data generated by riemann-health.

The output will then be the metric field, which seems to be the default and maybe the only choice. I am not sure how to change which field's value is displayed.

See: Four Hours with Riemann

Pattern matching is done with the "=~" operator, and "%" is the wildcard. "|" can be used to give alternative values. I haven't used pipe yet, but I believe that alternative values are grouped inside a pair of parentheses: "(val1|val2)". See monitoring-setup/dashboard.json at master · algernon/monitoring-setup

Here is a working example of how to get metrics from Supermann, a plugin to Supervisord that sends data to Riemann:

service =~ "process:%:cpu:percent"

"Rows" and "Columns" decide whether the data will be presented row-wise or column-wise per host.

How to save settings from riemann-dash

You need to have a JSON file to save to. You can just add the path to a config file when starting riemann-dash.

riemann-dash ./settings/config.rb

But what should that config file contain? On Ubuntu 15.04 there is an example config.rb file in

and it would be nice to just copy that, but it does not work as a whole. The local config file is only part of the config; you can include just the changes from the defaults that you want.

Make a copy of config.rb and delete all lines except the "[:ws_config]" one. Write something like this:

[:ws_config] = "/path/writable/by/riemann-dash/config/dir/config.json"

Riemann-dash will save its settings to that JSON file, which should of course be in a directory writable by the user that runs riemann-dash. I'd advise versioning that config file with e.g. git, since it is easy to overwrite config changes that you actually wanted to keep.




Nov 07, 2015 01:05