Node Task Registry Design
=========================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

****************************
Node Tasks
****************************
#####################
Basic Task Design
#####################
Warble nodes can have one or more (or all) tasks assigned to them. Each
task consists of a target to test, as well as what to test and how to go
about that, encapsulated in a payload object. Each check you wish to
perform requires an associated task, but may be performed by multiple
nodes. Thus, testing whether your main web site works on port 80
requires a task, as does a test for https on port 443, as they are
technically two distinct targets. Specific tasks may have optional tests
built into them, for instance an SSL certificate check on an https site.
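
As a rough illustration of the "one task per target" rule described above, the sketch below models a task as a target plus a payload. The ``Task`` class, its field names, and the ``check_certificate`` payload key are hypothetical and are not taken from Warble's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Task:
    """Hypothetical task record: one target plus a payload describing the check."""
    name: str
    target: str  # what to test, written here as host:port
    payload: Dict[str, Any] = field(default_factory=dict)  # what to test and how


# Port 80 and port 443 are distinct targets, so each needs its own task.
http_task = Task("web-http", "www.example.org:80", {"type": "http"})
https_task = Task(
    "web-https",
    "www.example.org:443",
    # Optional test built into the task (assumed payload key).
    {"type": "https", "check_certificate": True},
)
tasks = [http_task, https_task]
```

Either task could then be handed to several nodes without being duplicated, since the task itself carries no node-specific state in this sketch.
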
#####################
Task status
#####################
A task can be enabled, disabled, or muted. Disabling a task
prevents it from running on nodes, whereas a muted task is still
performed by nodes, but its alerts are silenced. Muting is useful
when you still need to monitor a situation but do not need to be
reminded whenever the test results change.
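
The three states can be summarized as two independent questions: does the task run, and does it alert? The following is a minimal sketch of that reading. The ``TaskStatus`` enum and both helper functions are illustrative names, not part of Warble's API.

```python
from enum import Enum


class TaskStatus(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    MUTED = "muted"


def runs_on_nodes(status: TaskStatus) -> bool:
    """Only a disabled task is skipped by nodes."""
    return status is not TaskStatus.DISABLED


def sends_alerts(status: TaskStatus) -> bool:
    """Muted tasks still run, but their alerts are silenced."""
    return status is TaskStatus.ENABLED
```
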
#####################
Task sensitivity
#####################
A task can also have a specific sensitivity set. Sensitivity denotes
how failures are treated, and when to alert about state changes:

- **low**: Alerting only happens if all currently active nodes agree that
  the test has failed, e.g. the service is down completely.
- **default**: Alerting happens if a majority of nodes agree that the test
  has failed. This is the default behavior and balances out the need for
  speedy alerting versus the need for fewer false positives.
- **high**: Alerting happens if more than one node sees failures. While
  more sensitive than the default, it still removes a fair bit of false
  positives by requiring confirmation of a reported failure by at least
  one other node.
- **twitchy**: Alerting happens if any node registers a failure.
  This may be useful for services that have guaranteed service level
  agreements, but can lead to a lot of false positives.

Note that if you run Warble with only one or very few nodes attached,
the sensitivity levels may differ very little in when alerting happens,
because the quorum is defined relative to the number of nodes that are
active at any given time.
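
To make the quorum rules concrete, the sketch below computes how many failing nodes each sensitivity level would require. It is one possible reading of the descriptions above and not Warble's actual implementation. In particular, capping the threshold at the number of active nodes is an assumption made here so that a single-node setup can still alert, and the function names are hypothetical.

```python
def failures_needed(sensitivity: str, active_nodes: int) -> int:
    """Return the number of failing nodes required before alerting."""
    if active_nodes < 1:
        raise ValueError("at least one active node is required")
    thresholds = {
        "low": active_nodes,               # every active node must agree
        "default": active_nodes // 2 + 1,  # strict majority
        "high": 2,                         # one node plus one confirmation
        "twitchy": 1,                      # any single node
    }
    # Assumption: never demand more failures than there are active nodes.
    return min(thresholds[sensitivity], active_nodes)


def quorum_reached(sensitivity: str, failed_nodes: int, active_nodes: int) -> bool:
    """True when enough nodes report a failure for this sensitivity level."""
    return failed_nodes >= failures_needed(sensitivity, active_nodes)
```

With five active nodes the levels require 5, 3, 2, and 1 failing nodes respectively. With a single active node, every level collapses to 1 under the capping assumption, which is why the levels behave almost identically in small setups.
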
*******************
Task Categories
*******************
Each task is assigned a task category, which helps you separate tasks
into easily recognizable groups and access definitions.
Each task category has a distinct alerting and escalation path, meaning
you can assign different teams to different categories, and have alerts
go to that team, independent of other task categories. This can be
useful for having front-end issues go to a specific team, while back-end
issues go to another team.
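
Per-category alert routing can be pictured as a simple lookup from category to recipients. The sketch below is illustrative only: the category names, the addresses, and the ``recipients_for`` helper are made up for this example and do not reflect how Warble stores its alerting options.

```python
from typing import Dict, List

# Hypothetical routing table: each category has its own alert recipients.
ALERT_ROUTES: Dict[str, List[str]] = {
    "front-end": ["frontend-team@example.org"],
    "back-end": ["backend-team@example.org"],
}


def recipients_for(category: str) -> List[str]:
    """Return the recipients for one category, independent of all others."""
    return list(ALERT_ROUTES.get(category, []))
```
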
#####################
Task Category Access
#####################
Users can be assigned the following access levels to categories, on a
per-user basis:

1. Read-only access: The user can read and analyze test results, but
   cannot edit or remove tasks, nor see the specific payload details
   (thus, if you add a test with credentials, users with read-only
   access cannot see the credentials).
2. Read/write access: The user can read, modify, and remove existing
   tests. They can also add new tests to the category.
3. Admin access: The user can, besides the permissions listed above, also
   modify or remove the category altogether or change its alerting
   options. This access level should generally be reserved for power
   users only.

Note that super users on the system (such as the account you create at
setup) can freely access and modify any aspect of the tasks and
categories.
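
Because each access level includes the permissions of the one below it, the levels can be modeled as an ordered scale. The following is a minimal sketch under that assumption. The ``Access`` enum, the action names, and ``is_allowed`` are hypothetical, and granting payload visibility from read/write upward is inferred from the read-only restriction rather than stated explicitly.

```python
from enum import IntEnum


class Access(IntEnum):
    NONE = 0
    READ_ONLY = 1
    READ_WRITE = 2
    ADMIN = 3


# Minimum level required per action (action names are illustrative).
REQUIRED = {
    "view_results": Access.READ_ONLY,
    "view_payload": Access.READ_WRITE,  # assumption: hidden only from read-only users
    "add_task": Access.READ_WRITE,
    "edit_task": Access.READ_WRITE,
    "remove_task": Access.READ_WRITE,
    "edit_category": Access.ADMIN,
    "remove_category": Access.ADMIN,
    "edit_alerting": Access.ADMIN,
}


def is_allowed(action: str, level: Access, superuser: bool = False) -> bool:
    """Super users bypass category-level checks entirely."""
    if superuser:
        return True
    return level >= REQUIRED[action]
```
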