15 Commits

Author SHA1 Message Date
Alan Campbell
0ea950f819 Add upstart example 2017-03-20 00:59:10 -04:00
Matej Kramny
0e93d140e8 Merge pull request #61 from mightyfree/patch-1
Update link to example.cachet-monitor.service under Init Script section
2017-03-03 10:36:10 -08:00
mightyfree
aacd04b2b8 Update link to example.cachet-monitor.service under Init Script section
Update link to example.cachet-monitor.service under Init Script Section. Previous relative link 404'd. Updated with absolute path to example.cachet-monitor.service (https://github.com/CastawayLabs/cachet-monitor/blob/master/example.cachet-monitor.service).
2017-03-03 13:29:52 -05:00
Matej Kramny
3a68b19633 Merge pull request #60 from matunixe/patch-2
Add init script setup
2017-03-02 11:05:58 -08:00
Mathias B
423c8d3a23 Add init script setup
Since PR #59 we need to update the documentation to explain clearly how to use the example file.
2017-03-02 18:49:16 +01:00
Matej Kramny
f48b5feb11 Rename exemple.cachet-monitor.service to example.cachet-monitor.service 2017-03-01 11:35:48 -08:00
Matej Kramny
b7f7f934ec Merge pull request #59 from matunixe/patch-1
Add a file init exemple
2017-03-01 11:29:04 -08:00
Mathias B
927aca5ac0 Add a file init exemple
Here is a Systemd init file, tweak it to your needs!
2017-03-01 17:24:48 +01:00
Matej Kramny
18705d1faf update readme 2017-02-13 14:07:25 -08:00
Matej Kramny
dab2264c7a Comment out unused code 2017-02-12 20:05:04 -08:00
Matej Kramny
021871b763 Add contribution section & code of conduct 2017-02-12 19:55:58 -08:00
Matej Kramny
698781afec update readme 2017-02-12 19:50:03 -08:00
Matej Kramny
e6d8d31fa5 update examples 2017-02-12 17:43:20 -08:00
Matej Kramny
6a51993296 update readme, remove tcp/icmp 2017-02-12 17:35:09 -08:00
Matej Kramny
8aae002623 DNS check 2017-02-12 13:39:37 -08:00
12 changed files with 477 additions and 116 deletions

74
CODE_OF_CONDUCT.md Normal file

@@ -0,0 +1,74 @@
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, gender identity and expression, level of experience,
nationality, personal appearance, race, religion, or sexual identity and
orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at management@castawaylabs.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at [http://contributor-covenant.org/version/1/4][version]
[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/


@@ -23,7 +23,6 @@ const usage = `cachet-monitor
Usage:
cachet-monitor (-c PATH | --config PATH) [--log=LOGPATH] [--name=NAME] [--immediate]
cachet-monitor -h | --help | --version
cachet-monitor print-config
Arguments:
PATH path to config.json
@@ -39,7 +38,6 @@ Options:
-h --help Show this screen.
--version Show version
--immediate Tick immediately (by default waits for first defined interval)
print-config Print example configuration
Environment variables:
CACHET_API override API url from configuration
@@ -179,14 +177,6 @@ func getConfiguration(path string) (*cachet.CachetMonitor, error) {
var s cachet.DNSMonitor
err = mapstructure.Decode(rawMonitor, &s)
t = &s
case "icmp":
var s cachet.ICMPMonitor
err = mapstructure.Decode(rawMonitor, &s)
t = &s
case "tcp":
var s cachet.TCPMonitor
err = mapstructure.Decode(rawMonitor, &s)
t = &s
default:
logrus.Errorf("Invalid monitor type (index: %d) %v", index, monType)
continue
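
The type switch above maps each raw monitor entry to a concrete struct by its `type` field and skips entries it does not recognise. A minimal standard-library sketch of the same idea is shown here. It uses `encoding/json` instead of `mapstructure`, and `httpMonitor`, `dnsMonitor`, and `decodeMonitors` are hypothetical names for illustration, not part of cachet-monitor. It also assumes that an empty type falls back to `http`, as the comment in `monitor.go` suggests.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// httpMonitor and dnsMonitor are hypothetical stand-ins for the real monitor structs.
type httpMonitor struct {
	Name   string
	Target string
	Method string
}

type dnsMonitor struct {
	Name     string
	Target   string
	Question string
}

// decodeMonitors picks a concrete struct per entry based on its "type" field.
// Unknown types are reported and skipped rather than aborting the whole load.
func decodeMonitors(raw []byte) ([]interface{}, error) {
	var items []json.RawMessage
	if err := json.Unmarshal(raw, &items); err != nil {
		return nil, err
	}

	var monitors []interface{}
	for index, item := range items {
		var head struct {
			Type string `json:"type"`
		}
		if err := json.Unmarshal(item, &head); err != nil {
			return nil, err
		}

		var target interface{}
		switch head.Type {
		case "", "http": // assumption: a missing type means http
			target = &httpMonitor{}
		case "dns":
			target = &dnsMonitor{}
		default:
			fmt.Printf("invalid monitor type (index: %d) %v\n", index, head.Type)
			continue
		}
		if err := json.Unmarshal(item, target); err != nil {
			return nil, err
		}
		monitors = append(monitors, target)
	}
	return monitors, nil
}

func main() {
	raw := []byte(`[
		{"name": "google", "target": "https://google.com", "method": "POST"},
		{"name": "dns", "type": "dns", "target": "matej.me.", "question": "mx"},
		{"name": "legacy", "type": "tcp", "target": "example.org:80"}
	]`)
	monitors, err := decodeMonitors(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(monitors), "monitors loaded")
}
```

With the `icmp` and `tcp` cases removed, entries of those types would take the `default` branch and be skipped with a logged error.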

116
dns.go

@@ -1,5 +1,121 @@
package cachet
import (
"net"
"regexp"
"strings"
"github.com/Sirupsen/logrus"
"github.com/miekg/dns"
)
type DNSAnswer struct {
Regex string
regexp *regexp.Regexp
Exact string
}
type DNSMonitor struct {
AbstractMonitor `mapstructure:",squash"`
// IP:port format or blank to use system defined DNS
DNS string
// A(default), AAAA, MX, ...
Question string
question uint16
Answers []DNSAnswer
}
func (monitor *DNSMonitor) Validate() []string {
errs := monitor.AbstractMonitor.Validate()
if len(monitor.DNS) == 0 {
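config, err := dns.ClientConfigFromFile("/etc/resolv.conf")
// config is nil when resolv.conf cannot be read; fall through to the default server below
if err == nil && len(config.Servers) > 0 {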
monitor.DNS = net.JoinHostPort(config.Servers[0], config.Port)
}
}
if len(monitor.DNS) == 0 {
monitor.DNS = "8.8.8.8:53"
}
if len(monitor.Question) == 0 {
monitor.Question = "A"
}
monitor.Question = strings.ToUpper(monitor.Question)
monitor.question = findDNSType(monitor.Question)
if monitor.question == 0 {
errs = append(errs, "Could not look up DNS question type")
}
for i, a := range monitor.Answers {
if len(a.Regex) > 0 {
re, err := regexp.Compile(a.Regex)
if err != nil {
errs = append(errs, "Invalid DNS answer regex: "+err.Error())
}
monitor.Answers[i].regexp = re
}
}
return errs
}
func (monitor *DNSMonitor) test() bool {
m := new(dns.Msg)
m.SetQuestion(dns.Fqdn(monitor.Target), monitor.question)
m.RecursionDesired = true
c := new(dns.Client)
r, _, err := c.Exchange(m, monitor.DNS)
if err != nil {
logrus.Warnf("DNS error: %v", err)
return false
}
if r.Rcode != dns.RcodeSuccess {
return false
}
for _, check := range monitor.Answers {
found := false
for _, answer := range r.Answer {
found = matchAnswer(answer, check)
if found {
break
}
}
if !found {
logrus.Warnf("DNS check failed: %v. Not found in any of %v", check, r.Answer)
return false
}
}
return true
}
func findDNSType(t string) uint16 {
for rr, strType := range dns.TypeToString {
if t == strType {
return rr
}
}
return 0
}
func matchAnswer(answer dns.RR, check DNSAnswer) bool {
fields := []string{}
for i := 0; i < dns.NumField(answer); i++ {
fields = append(fields, dns.Field(answer, i+1))
}
str := strings.Join(fields, " ")
if check.regexp != nil {
return check.regexp.MatchString(str)
}
return str == check.Exact
}


@@ -0,0 +1,20 @@
[Unit]
Description=Cachet Monitor
After=syslog.target
After=network.target
#After=mysqld.service
#After=postgresql.service
#After=memcached.service
#After=redis.service
[Service]
Type=simple
User=root
Group=root
WorkingDirectory=/root
ExecStart=/root/cachet-monitor -c /etc/cachet-monitor.yaml
Restart=always
Environment=USER=root HOME=/root
[Install]
WantedBy=multi-user.target


@@ -2,21 +2,58 @@
"api": {
"url": "https://demo.cachethq.io/api/v1",
"token": "9yMHsdioQosnyVK4iCVR",
"insecure": true
"insecure": false
},
"date_format": "02/01/2006 15:04:05 MST",
"monitors": [
{
"name": "google",
"url": "https://google.com",
"threshold": 80,
"target": "https://google.com",
"strict": true,
"method": "POST",
"component_id": 1,
"interval": 10,
"timeout": 5,
"metric_id": 4,
"template": {
"investigating": {
"subject": "{{ .Monitor.Name }} - {{ .SystemName }}",
"message": "{{ .Monitor.Name }} check **failed** (server time: {{ .now }})\n\n{{ .FailReason }}"
},
"fixed": {
"subject": "I HAVE BEEN FIXED"
}
},
"interval": 1,
"timeout": 1,
"threshold": 80,
"headers": {
"Authorization": "Basic <hash>"
},
"expected_status_code": 200,
"strict_tls": true
"expected_body": "P.*NG"
},
{
"name": "dns",
"target": "matej.me.",
"question": "mx",
"type": "dns",
"component_id": 2,
"interval": 1,
"timeout": 1,
"dns": "8.8.4.4:53",
"answers": [
{
"regex": "[1-9] alt[1-9].aspmx.l.google.com."
},
{
"exact": "10 aspmx2.googlemail.com."
},
{
"exact": "1 aspmx.l.google.com."
},
{
"exact": "10 aspmx3.googlemail.com."
}
]
}
]
}


@@ -1,14 +1,65 @@
api:
# cachet url
url: https://demo.cachethq.io/api/v1
# cachet api token
token: 9yMHsdioQosnyVK4iCVR
insecure: false
# https://golang.org/src/time/format.go#L57
date_format: 02/01/2006 15:04:05 MST
monitors:
# http monitor example
- name: google
# test url
target: https://google.com
threshold: 80
# strict certificate checking for https
strict: true
# HTTP method
method: POST
# set to update component (either component_id or metric_id are required)
component_id: 1
interval: 10
timeout: 5
# set to post lag to cachet metric (graph)
metric_id: 4
# custom templates (see readme for details)
template:
investigating:
subject: "{{ .Monitor.Name }} - {{ .SystemName }}"
message: "{{ .Monitor.Name }} check **failed** (server time: {{ .now }})\n\n{{ .FailReason }}"
fixed:
subject: "I HAVE BEEN FIXED"
# seconds between checks
interval: 1
# seconds for timeout
timeout: 1
# If % of downtime is over this threshold, open an incident
threshold: 80
# custom HTTP headers
headers:
Authorization: Basic <hash>
# expected status code (either status code or body must be supplied)
expected_status_code: 200
strict: true
# regex to match body
expected_body: "P.*NG"
# dns monitor example
- name: dns
# fqdn
target: matej.me.
# question type (A/AAAA/CNAME/...)
question: mx
type: dns
# set component_id/metric_id
component_id: 2
# poll every 1s
interval: 1
timeout: 1
# custom DNS server (defaults to system)
dns: 8.8.4.4:53
answers:
# exact/regex check
- regex: "[1-9] alt[1-9].aspmx.l.google.com."
- exact: 10 aspmx2.googlemail.com.
- exact: 1 aspmx.l.google.com.
- exact: 10 aspmx3.googlemail.com.

14
example.upstart.conf Normal file

@@ -0,0 +1,14 @@
description "Cachet Monitor"
start on startup
env USER=root
env HOME=/root
setuid root
setgid root
chdir /root
script
exec cachet-monitor -c /cachet-monitor.json --immediate
end script


@@ -1,5 +0,0 @@
package cachet
type ICMPMonitor struct {
AbstractMonitor `mapstructure:",squash"`
}


@@ -28,7 +28,7 @@ type AbstractMonitor struct {
Name string
Target string
// (default)http, tcp, dns, icmp
// (default)http / dns
Type string
Strict bool
@@ -50,10 +50,10 @@ type AbstractMonitor struct {
// lag / average(lagHistory) * 100 = percentage above average lag
// PerformanceThreshold sets the % limit above which this monitor will trigger degraded-performance
PerformanceThreshold float32
// PerformanceThreshold float32
history []bool
lagHistory []float32
history []bool
// lagHistory []float32
lastFailReason string
incident *Incident
config *CachetMonitor
@@ -141,7 +141,6 @@ func (mon *AbstractMonitor) ClockStop() {
func (mon *AbstractMonitor) test() bool { return false }
// TODO: test
func (mon *AbstractMonitor) tick(iface MonitorInterface) {
reqStart := getMs()
up := iface.test()
@@ -153,7 +152,7 @@ func (mon *AbstractMonitor) tick(iface MonitorInterface) {
}
if len(mon.history) == histSize-1 {
logrus.Warnf("%v is now saturated\n", mon.Name)
logrus.Warnf("%v is now saturated", mon.Name)
}
if len(mon.history) >= histSize {
mon.history = mon.history[len(mon.history)-(histSize-1):]
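
The hunk above keeps `mon.history` bounded to `histSize` entries, and the `threshold` setting is described as a percentage of downtime. How the two are combined is not shown in this diff, so the following is only a sketch under assumptions: `histSize` is set to 10 for illustration, and `window`, `record`, and `downtimePercent` are hypothetical names.

```go
package main

import "fmt"

// histSize is an assumed value; the real constant is not shown in this diff.
const histSize = 10

// window is a hypothetical stand-in for the history kept by a monitor.
type window struct {
	history []bool
}

// record trims the history the same way the hunk above does, then appends the new sample.
func (w *window) record(up bool) {
	if len(w.history) >= histSize {
		w.history = w.history[len(w.history)-(histSize-1):]
	}
	w.history = append(w.history, up)
}

// downtimePercent returns the share of failed samples in the current window.
func (w *window) downtimePercent() float32 {
	if len(w.history) == 0 {
		return 0
	}
	failures := 0
	for _, up := range w.history {
		if !up {
			failures++
		}
	}
	return float32(failures) / float32(len(w.history)) * 100
}

func main() {
	w := &window{}
	for i := 0; i < 12; i++ {
		w.record(i < 3) // first three checks succeed, the rest fail
	}
	fmt.Printf("%d samples, %.0f%% downtime\n", len(w.history), w.downtimePercent())
}
```

With a `threshold` of 80, a window like the one built in `main` would be over the limit, since nine of the ten retained samples are failures.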

7
monitor_test.go Normal file

@@ -0,0 +1,7 @@
package cachet
import (
"testing"
)
func TestAnalyseData(t *testing.T) {}

213
readme.md

@@ -1,103 +1,176 @@
![screenshot](https://castawaylabs.github.io/cachet-monitor/screenshot.png)
Features
--------
## Features
- [x] Creates & Resolves Incidents
- [x] Check URLs by response code and/or body contents
- [x] Posts monitor lag to cachet graphs
- [x] HTTP Checks (body/status code)
- [x] DNS Checks
- [x] Updates Component to Partial Outage
- [x] Updates Component to Major Outage if already in Partial Outage (works well with distributed monitoring)
- [x] Updates Component to Major Outage if already in Partial Outage (works with distributed monitors)
- [x] Can be run on multiple servers and geo regions
Configuration
-------------
## Example Configuration
```
{
// URL for the API. Note: Must end with /api/v1
"api_url": "https://<cachet domain>/api/v1",
// Your API token for Cachet
"api_token": "<cachet api token>",
// optional, false default, set if your certificate is self-signed/untrusted
"insecure_api": false,
"monitors": [{
// required, friendly name for your monitor
"name": "Name of your monitor",
// required, url to probe
"url": "Ping URL",
// optional, http method (defaults GET)
"method": "get",
// optional, http Headers to add (default none)
"headers": [
// specify Name and Value of Http-Header, eg. Authorization
{ "name": "Authorization", "value": "Basic <hash>" }
],
// self-signed ssl certificate
"strict_tls": true,
// seconds between checks
"interval": 10,
// seconds for http timeout
"timeout": 5,
// post lag to cachet metric (graph)
// note either metric ID or component ID are required
"metric_id": <metric id>,
// post incidents to this component
"component_id": <component id>,
// If % of downtime is over this threshold, open an incident
"threshold": 80,
// optional, expected status code (either status code or body must be supplied)
"expected_status_code": 200,
// optional, regular expression to match body content
"expected_body": "P.*NG"
}],
// optional, system name to identify bot (uses hostname by default)
"system_name": "",
// optional, defaults to stdout
"log_path": ""
}
**Note:** configuration can be in JSON or YAML format. See the example files [`example.config.json`](https://github.com/CastawayLabs/cachet-monitor/blob/master/example.config.json) and [`example.config.yaml`](https://github.com/CastawayLabs/cachet-monitor/blob/master/example.config.yml).
```yaml
api:
# cachet url
url: https://demo.cachethq.io/api/v1
# cachet api token
token: 9yMHsdioQosnyVK4iCVR
insecure: false
# https://golang.org/src/time/format.go#L57
date_format: 02/01/2006 15:04:05 MST
monitors:
# http monitor example
- name: google
# test url
target: https://google.com
# strict certificate checking for https
strict: true
# HTTP method
method: POST
# set to update component (either component_id or metric_id are required)
component_id: 1
# set to post lag to cachet metric (graph)
metric_id: 4
# custom templates (see readme for details)
# leave empty for defaults
template:
investigating:
subject: "{{ .Monitor.Name }} - {{ .SystemName }}"
message: "{{ .Monitor.Name }} check **failed** (server time: {{ .now }})\n\n{{ .FailReason }}"
fixed:
subject: "I HAVE BEEN FIXED"
# seconds between checks
interval: 1
# seconds for timeout
timeout: 1
# If % of downtime is over this threshold, open an incident
threshold: 80
# custom HTTP headers
headers:
Authorization: Basic <hash>
# expected status code (either status code or body must be supplied)
expected_status_code: 200
# regex to match body
expected_body: "P.*NG"
# dns monitor example
- name: dns
# fqdn
target: matej.me.
# question type (A/AAAA/CNAME/...)
question: mx
type: dns
# set component_id/metric_id
component_id: 2
# poll every 1s
interval: 1
timeout: 1
# custom DNS server (defaults to system)
dns: 8.8.4.4:53
answers:
# exact/regex check
- regex: "[1-9] alt[1-9].aspmx.l.google.com."
- exact: 10 aspmx2.googlemail.com.
- exact: 1 aspmx.l.google.com.
- exact: 10 aspmx3.googlemail.com.
```
Installation
------------
## Installation
1. Download binary from [release page](https://github.com/CastawayLabs/cachet-monitor/releases)
2. Create your configuration ([example](https://raw.githubusercontent.com/CastawayLabs/cachet-monitor/master/example.config.json))
3. `cachet-monitor -c /etc/cachet-monitor.config.json`
2. Create a configuration
3. `cachet-monitor -c /etc/cachet-monitor.yaml`
pro tip: run in background using `nohup cachet-monitor -c /etc/cachet-monitor.yaml > /var/log/cachet-monitor.log 2>&1 &`
```
Usage of cachet-monitor:
-c="/etc/cachet-monitor.config.json": Config path
-log="": Log path
-name="": System Name
Usage:
cachet-monitor (-c PATH | --config PATH) [--log=LOGPATH] [--name=NAME] [--immediate]
cachet-monitor -h | --help | --version
Arguments:
PATH path to config.json
LOGPATH path to log output (defaults to STDOUT)
NAME name of this logger
Examples:
cachet-monitor -c /root/cachet-monitor.json
cachet-monitor -c /root/cachet-monitor.json --log=/var/log/cachet-monitor.log --name="development machine"
Options:
-c PATH.json --config PATH Path to configuration file
-h --help Show this screen.
--version Show version
--immediate Tick immediately (by default waits for first defined interval)
Environment variables:
CACHET_API override API url from configuration
CACHET_TOKEN override API token from configuration
CACHET_DEV set to enable dev logging
```
Environment variables
---------------------
## Init script
| Name | Example Value | Description |
| ------------ | ------------------------------ | --------------------------- |
| CACHET_API | http://demo.cachethq.io/api/v1 | URL endpoint for cachet api |
| CACHET_TOKEN | APIToken123 | API Authentication token |
| CACHET_DEV | 1 | Strips logging |
If your system is running systemd (like Debian, Ubuntu 16.04, Fedora or Archlinux) you can use the provided example file: [example.cachet-monitor.service](https://github.com/CastawayLabs/cachet-monitor/blob/master/example.cachet-monitor.service).
Vision and goals
----------------
1. Simply put it in the right place with `cp example.cachet-monitor.service /etc/systemd/system/cachet-monitor.service`
2. Then run `systemctl daemon-reload` in your terminal to reload the systemd configuration
3. Finally you can start cachet-monitor on every startup with `systemctl enable cachet-monitor.service`! 👍
## Templates
This package makes use of [`text/template`](https://godoc.org/text/template). [Default HTTP template](https://github.com/CastawayLabs/cachet-monitor/blob/master/http.go#L14)
The following variables are available:
| Root objects | Description
| ------------- | -----------------
| `.SystemName` | system name
| `.API` | `api` object from configuration
| `.Monitor` | `monitor` object from configuration
| `.now` | formatted date string
| Monitor variables |
| ------------------ |
| `.Name` |
| `.Target` |
| `.Type` |
| `.Strict` |
| `.MetricID` |
| ... |
All monitor variables are available from `monitor.go`
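
As a rough illustration of how these variables resolve, here is a minimal `text/template` sketch. It assumes the template data is a map with the keys listed above (the lowercase `.now` suggests a map key rather than a struct field); `monitorInfo` and `render` are hypothetical names, and the sample values are made up.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// monitorInfo is a hypothetical stand-in for the monitor object from the configuration.
type monitorInfo struct {
	Name   string
	Target string
}

// render parses tpl and executes it against data, returning the resulting text.
func render(tpl string, data map[string]interface{}) (string, error) {
	t, err := template.New("incident").Parse(tpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	data := map[string]interface{}{
		"SystemName": "dev-box",
		"Monitor":    monitorInfo{Name: "google", Target: "https://google.com"},
		"now":        "20/03/2017 00:59:10 UTC",
		"FailReason": "unexpected status code",
	}
	subject, err := render("{{ .Monitor.Name }} - {{ .SystemName }}", data)
	if err != nil {
		panic(err)
	}
	fmt.Println(subject) // prints "google - dev-box"
}
```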
## Vision and goals
We made this tool because we felt the need to have our own monitoring software (leveraging Cachet).
The idea is a stateless program which collects data and pushes it to a central cachet instance.
This gives us power to have an army of geographically distributed loggers and reveal issues in both latency & downtime on client websites.
Package usage
-------------
## Package usage
When using `cachet-monitor` as a package in another program, you should follow what `cli/main.go` does. It is important to call `ValidateConfiguration` on `CachetMonitor` and all the monitors inside.
When using `cachet-monitor` as a package in another program, you should follow what `cli/main.go` does. It is important to call `Validate` on `CachetMonitor` and all the monitors inside.
[API Documentation](https://godoc.org/github.com/CastawayLabs/cachet-monitor)
# Contributions welcome
We'll happily accept contributions for the following (non exhaustive list).
- Implement ICMP check
- Implement TCP check
- Any bug fixes / code improvements
- Test cases
## License
MIT License
@@ -120,4 +193,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
SOFTWARE.

15
tcp.go

@@ -1,15 +0,0 @@
package cachet
type TCPMonitor struct {
AbstractMonitor `mapstructure:",squash"`
// same as output from net.JoinHostPort
// defaults to parsed config from /etc/resolv.conf when empty
DNSServer string
// Will be converted to FQDN
Domain string
Type string
// expected answers (regex)
Expect []string
}