Tutorial Part 1: Checking connections for a basic web app

Hello World

Suppose you have a basic web app, HWaaS (Hello World as a Service, naturally).

It returns a different translation of “Hello World” on every request, and accepts new translations via POST requests.

  • The translations are stored in a PostgreSQL database.
  • memcached is used to keep a cache of pre-rendered “Hello World” HTML pages.
  • Optionally, if a user pushes a certain button and no translation in their language is available in the PostgreSQL DB, a request is sent to the Google Translate API to get an automatically translated version of the page.
  • The Squid HTTP proxy sits between the app and the Translate API to cache requests (varied by language) and avoid hitting Google’s rate limiting.

Why use conn-check?

Our HWaaS example service depends not only on 3 internal services but also on a completely external one (the Google Translate API). Any number of problems, from network routing, firewall configuration and bad service configuration to external outages, could cause issues after a new deployment (or at any time really, but we’ll address that later in the Nagios section).

conn-check can verify connections to these dependencies using not just basic TCP/UDP connects, but also service-specific checks, with authentication where needed, timeouts, and even permissions (e.g. can user A access DB schema B).

Yet another YAML file

conn-check is configured using a YAML file containing a list of checks to perform in parallel (by default, but this too is configurable with a CLI option).

Here’s an example file (it could be called hwaas-cc.yaml):

- type: postgresql
  host: gibson.hwaas.internal
  port: 5432
  username: hwaas
  password: 123456asdf
  database: hwaas_production
- type: memcached
  host: freeside.hwaas.internal
  port: 11211
- type: http
  url: https://www.googleapis.com/language/translate/v2?q=Hello%20World&target=de&source=en&key=BLAH
  proxy_host: countzero.hwaas.internal
  proxy_port: 8080
  expected_code: 200

Let’s examine those checks.

PostgreSQL

- type: postgresql
  host: gibson.hwaas.internal
  port: 5432
  username: hwaas
  password: 123456asdf
  database: hwaas_production

type: This one doesn’t require much explanation, except that you can use either postgresql or postgres (many checks have aliases); see the readme.

host, port: The host to connect to is always required; if port is not supplied, the default PostgreSQL port of 5432 is used.

username, password: Auth details are required and important when used with…

database: This is the PostgreSQL database (schema) to attempt to switch to, which also verifies that username has permission to access it.
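
As a hedged sketch of the alias and default port described above, the same check could be written with the postgres alias and no port line, in which case the default of 5432 would apply. The host, credentials and database name are the tutorial's own placeholder values.

```yaml
# Same PostgreSQL check as above, using the "postgres" type alias.
# port is omitted, so the default of 5432 is used.
- type: postgres
  host: gibson.hwaas.internal
  username: hwaas
  password: 123456asdf
  database: hwaas_production
```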

memcached

- type: memcached
  host: freeside.hwaas.internal
  port: 11211

type: memcache or memcached are valid, see the readme.

host, port: If port isn’t supplied the memcached default 11211 is used instead.

HTTP

- type: http
  url: https://www.googleapis.com/language/translate/v2?q=Hello%20World&target=de&source=en&key=BLAH
  proxy_host: countzero.hwaas.internal
  proxy_port: 8080
  expected_code: 200

type: http or https are valid, see the readme.

url: As we’re doing a simple GET to the Translate API I’ve included the key in the querystring, but you could also include auth details as HTTP headers using the headers check option.

proxy_host, proxy_port: We supply the host/port of our Squid proxy here. We could instead use the proxy_url check option to define the proxy as a standard HTTP URL (which makes it possible to define an HTTPS proxy).

expected_code: This is the status code we expect to get back from the service if the request was successful; anything other than 200 in this case will cause the check to fail.
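
The two alternatives mentioned above (headers and proxy_url) might be combined as in the sketch below. It assumes that headers takes a mapping of header names to values and that proxy_url takes a full URL. The header name and key value are placeholders, not something the Translate API is confirmed to accept, so check the readme and the API documentation before relying on it.

```yaml
# Sketch only: a variant of the HTTP check using headers and proxy_url.
- type: http
  url: https://www.googleapis.com/language/translate/v2?q=Hello%20World&target=de&source=en
  # Assumed shape: a mapping of header name to value.
  # X-Example-Api-Key is a hypothetical header name.
  headers:
    X-Example-Api-Key: BLAH
  # Replaces proxy_host/proxy_port with a single URL.
  proxy_url: http://countzero.hwaas.internal:8080
  expected_code: 200
```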

Using conn-check with Nagios

conn-check output tries to stay as close as possible to the Nagios plugin guidelines so that it can be used as a regular Nagios check for more constant monitoring of your service deployment (not just ad-hoc at deploy time).

Example NRPE config files, assuming conn-check is system installed:

# /etc/nagios/nrpe.d/check_conn_check.cfg
command[conn_check]=/usr/bin/conn-check --max-timeout=10 --exclude-tags=no-nagios /var/conn-check/hwaas-cc.yaml


# /var/lib/nagios/export/service__hwaas_conn_check.cfg
define service {
    use                             active-service
    host_name                       hwaas-web1.internal
    service_description             connection checks with conn-check
    check_command                   check_nrpe!conn_check
    servicegroups                   web,hwaas
}

A few arguments to note:

--max-timeout=10: This sets the global timeout to 10 seconds, so conn-check reports an error if the total time for all checks combined exceeds 10s, which is the default maximum time Nagios allows a plugin to run.

This way we still get all the individual check results back even if one of them went above the threshold.

--exclude-tags=no-nagios: Although optional, this allows you to exclude any check tagged with no-nagios, which is especially handy for checks to external/third-party services that you don’t want to be hit constantly by Nagios.

For example, if we didn’t want Nagios to hit Google every few minutes:

- type: http
  url: https://www.googleapis.com/language/translate/v2?q=Hello%20World&target=de&source=en&key=BLAH
  proxy_host: countzero.hwaas.internal
  proxy_port: 8080
  expected_code: 200
  tags: [no-nagios]