Tutorial Part 1: Checking connections for a basic web app
=========================================================

Hello World
-----------

Suppose you have the basic webapp *HWaaS* (Hello World as a Service, naturally).
It returns a different translation of "Hello World" on every request, and
accepts new translations via ``POST`` requests.

* The translations are stored in a *PostgreSQL* database.
* *memcached* is used to keep a cache of pre-rendered "Hello World" HTML pages.
* Optionally, requests are sent to the Google Translate API to get an
  automatically translated version of the page in the user's language, if they
  push a certain button and a translation in their language isn't available in
  the *PostgreSQL* DB.
* The *Squid* HTTP proxy sits between it and the Translate API to cache
  requests (varied by language), to avoid hitting Google's rate limiting.

Why use conn-check?
-------------------

Our *HWaaS* example service depends on not only 3 internal services, but also
a completely external service (the Google Translate API). Any number of
issues, from network routing, firewall configuration and bad service
configuration to external outages, could cause problems after a new deployment
(or at any time really, but we'll address that later in :ref:`nagios`).

*conn-check* can verify connections to these dependencies using not just basic
TCP/UDP connects, but also service-specific ones, with authentication where
needed, timeouts, and even permissions (e.g. can *user A* access
*DB schema B*).

Yet another YAML file
---------------------

conn-check is configured using a YAML file containing a list of checks to
perform in parallel (by default, but this too is configurable with a CLI
option).

Here's an example file (it could be called ``hwaas-cc.yaml``):

.. code-block:: yaml

    - type: postgresql
      host: gibson.hwaas.internal
      port: 5432
      username: hwaas
      password: 123456asdf
      database: hwaas_production
    - type: memcached
      host: freeside.hwaas.internal
      port: 11211
    - type: http
      url: https://www.googleapis.com/language/translate/v2?q=Hello%20World&target=de&source=en&key=BLAH
      proxy_host: countzero.hwaas.internal
      proxy_port: 8080
      expected_code: 200

Let's examine those checks..
----------------------------

PostgreSQL
``````````

.. code-block:: yaml

    - type: postgresql
      host: gibson.hwaas.internal
      port: 5432
      username: hwaas
      password: 123456asdf
      database: hwaas_production

*type*: This one doesn't require much explanation, except the fact that you
can use either ``postgresql`` or ``postgres`` (many checks have aliases), see
the readme.

*host*, *port*: The host to connect to is, understandably, always required. If
the port is not supplied, the default psql port of ``5432`` is used.

*username*, *password*: Auth details are required and important when used
with…

…*database*: This is the psql schema to attempt to switch to, which also
checks that *username* has permission to access it.

memcached
`````````

.. code-block:: yaml

    - type: memcached
      host: freeside.hwaas.internal
      port: 11211

*type*: ``memcache`` or ``memcached`` are valid, see the readme.

*host*, *port*: If port isn't supplied the memcached default ``11211`` is used
instead.

HTTP
````

.. code-block:: yaml

    - type: http
      url: https://www.googleapis.com/language/translate/v2?q=Hello%20World&target=de&source=en&key=BLAH
      proxy_host: countzero.hwaas.internal
      proxy_port: 8080
      expected_code: 200

*type*: ``http`` or ``https`` are valid, see the readme.

*url*: As we're doing a simple GET to the Translate API I've included the
``key`` in the querystring, but you could also include auth details as HTTP
headers using the ``headers`` check option.

*proxy_host*, *proxy_port*: We supply the host/port of our Squid proxy here;
we could also use the ``proxy_url`` check option instead to define the proxy
as a standard HTTP URL (which makes it possible to define an HTTPS proxy).

*expected_code*: This is the status code we expect to get back from the
service if the request was successful; anything other than ``200`` in this
case will cause the check to fail.

.. _nagios:

Using conn-check with Nagios
----------------------------

conn-check output tries to stay as close as possible to the Nagios plugin
guidelines so that it can be used as a regular Nagios check for more constant
monitoring of your service deployment (not just ad-hoc at deploy time).

Example NRPE config files, assuming ``conn-check`` is system installed::

    # /etc/nagios/nrpe.d/check_conn_check.cfg
    command[conn_check]=/usr/bin/conn-check --max-timeout=9 --exclude-tags=no-nagios /var/conn-check/hwaas-cc.yaml

    # /var/lib/nagios/export/service__hwaas_conn_check.cfg
    define service {
        use                  active-service
        host_name            hwaas-web1.internal
        service_description  connection checks with conn-check
        check_command        check_nrpe!conn_check
        servicegroups        web,hwaas
    }

A few arguments to note:

``--max-timeout=9``: This sets the global timeout to 9 seconds, which means
conn-check will error if the total time for all checks combined goes above 9s.
That keeps the run under the default maximum time Nagios allows a plugin to
run, 10s. This way we still get all the individual check results back even if
one of them went above the threshold.

``--exclude-tags=no-nagios``: Although optional, this allows you to exclude
any check tagged with ``no-nagios``, which is especially handy for checks to
external/third-party services that you don't want to be hit constantly by
Nagios.

For example, if we didn't want Nagios to hit Google every few minutes:

.. code-block:: yaml

    - type: http
      url: https://www.googleapis.com/language/translate/v2?q=Hello%20World&target=de&source=en&key=BLAH
      proxy_host: countzero.hwaas.internal
      proxy_port: 8080
      expected_code: 200
      tags: [no-nagios]
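
Putting the pieces together, a sketch of the complete ``hwaas-cc.yaml`` with
only the external check tagged might look like the following. It uses only the
checks and options already shown in this tutorial, and the ``#`` comments are
added here for illustration. With ``--exclude-tags=no-nagios`` the Nagios run
would then cover just the PostgreSQL and memcached checks, while a run without
that flag (for example at deploy time) would still exercise all three:

.. code-block:: yaml

    # Internal services: left untagged, so Nagios checks them on every run.
    - type: postgresql
      host: gibson.hwaas.internal
      port: 5432
      username: hwaas
      password: 123456asdf
      database: hwaas_production
    - type: memcached
      host: freeside.hwaas.internal
      port: 11211
    # External service: tagged so that --exclude-tags=no-nagios skips it.
    - type: http
      url: https://www.googleapis.com/language/translate/v2?q=Hello%20World&target=de&source=en&key=BLAH
      proxy_host: countzero.hwaas.internal
      proxy_port: 8080
      expected_code: 200
      tags: [no-nagios]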