30 Commits

Author SHA1 Message Date
Matej Kramny
5f664959db fix cachet api returning bad response code 2019-07-27 23:57:06 +07:00
Matej Kramny
49c009eb30 remove unused imports 2019-06-02 13:34:59 +08:00
Matej Kramny
80d424722b fix panics 2019-06-02 13:34:38 +08:00
Matej Kramny
162d55b3f3 huge refactor
- extendable backends
- better project structure
- better cli interface
2019-02-20 11:14:45 +08:00
Matej Kramny
df31238a1f Merge pull request #66 from CastawayLabs/feature/upstart-example
Add upstart example
2019-01-08 10:05:23 +08:00
Matej Kramny
a6b879bcee Merge pull request #99 from osallou/fix_set_version
add version information
2019-01-08 10:05:05 +08:00
Matej Kramny
c4f7544640 Merge pull request #92 from srabouin/master
Close HTTP requests
2019-01-08 10:04:19 +08:00
Matej Kramny
ae3a18591f Merge pull request #85 from tolbxela/patch-1
The table in the Templates section was corrected
2019-01-08 10:03:45 +08:00
Matej Kramny
7cbc234c42 Merge pull request #84 from sgreene570/master
Add license and cleanup readme
2019-01-08 10:03:33 +08:00
Matej Kramny
0b63b0e63d Merge pull request #81 from axnsan12/master
fix component status parsing
2019-01-08 10:03:05 +08:00
Olivier Sallou
5d34e1cf38 add version information
Sets at build time the version information with -ldflags "-X
main.version=3.0.0" for example, the --version will display software
version
2018-10-02 10:40:44 +02:00
Steve Rabouin
581b1465e6 Close HTTP requests
Resolves #75
2018-05-04 22:01:30 -04:00
Tolbxela Bot
8bf7a9921e The table in the Templates section was corrected 2017-10-24 11:43:12 +02:00
Stephen Greene
f17ee284a9 Add license and cleanup readme 2017-10-12 21:02:07 -04:00
Cristi Vîjdea
c2c9898d68 fix component status parsing
The Cachet API returns component status as a string,
but cachet-monitor attempts to parse it as int,
resulting in "cannot unmarshal string into Go value of type int".
This applies at least as early as cachet 2.3.12 - 29/06/2017.
2017-10-05 21:02:13 +00:00
Alan Campbell
0ea950f819 Add upstart example 2017-03-20 00:59:10 -04:00
Matej Kramny
0e93d140e8 Merge pull request #61 from mightyfree/patch-1
Update link to example.cachet-monitor.service under Init Script section
2017-03-03 10:36:10 -08:00
mightyfree
aacd04b2b8 Update link to example.cachet-monitor.service under Init Script section
Update link to example.cachet-monitor.service under Init Script Section. Previous relative link 404'd. Updated with absolute path to example.cachet-monitor.service (https://github.com/CastawayLabs/cachet-monitor/blob/master/example.cachet-monitor.service).
2017-03-03 13:29:52 -05:00
Matej Kramny
3a68b19633 Merge pull request #60 from matunixe/patch-2
Add init script setup
2017-03-02 11:05:58 -08:00
Mathias B
423c8d3a23 Add init script setup
Since PR #59 we need to update the documentation to explain clearly how to use the example file.
2017-03-02 18:49:16 +01:00
Matej Kramny
f48b5feb11 Rename exemple.cachet-monitor.service to example.cachet-monitor.service 2017-03-01 11:35:48 -08:00
Matej Kramny
b7f7f934ec Merge pull request #59 from matunixe/patch-1
Add a file init exemple
2017-03-01 11:29:04 -08:00
Mathias B
927aca5ac0 Add a file init exemple
Here is a Systemd init file, tweak it to your needs!
2017-03-01 17:24:48 +01:00
Matej Kramny
18705d1faf update readme 2017-02-13 14:07:25 -08:00
Matej Kramny
dab2264c7a Comment out unused code 2017-02-12 20:05:04 -08:00
Matej Kramny
021871b763 Add contribution section & code of conduct 2017-02-12 19:55:58 -08:00
Matej Kramny
698781afec update readme 2017-02-12 19:50:03 -08:00
Matej Kramny
e6d8d31fa5 update examples 2017-02-12 17:43:20 -08:00
Matej Kramny
6a51993296 update readme, remove tcp/icmp 2017-02-12 17:35:09 -08:00
Matej Kramny
8aae002623 DNS check 2017-02-12 13:39:37 -08:00
28 changed files with 1518 additions and 823 deletions
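
Commit 5d34e1cf38 in the list above describes setting the version at build time with `-ldflags "-X main.version=3.0.0"`. As a minimal sketch of how that linker flag is usually wired up in Go: the variable name `version` in package `main` comes from the commit message, while the `"dev"` fallback and the `versionString` helper are assumptions made for illustration, not code from this repository.

```go
package main

import "fmt"

// version is meant to be overwritten at link time, for example:
//
//	go build -ldflags "-X main.version=3.0.0"
//
// "dev" is an assumed fallback for builds made without the flag.
var version = "dev"

// versionString is a hypothetical helper that formats the value
// the way a --version flag might print it.
func versionString() string {
	return "cachet-monitor " + version
}

func main() {
	fmt.Println(versionString())
}
```

The `-X` flag only works on package-level string variables, which is why `version` is declared as a plain `var` rather than a constant.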

2
.gitignore vendored

@@ -1,3 +1,5 @@
 /config.yml
 /config.json
 examples/
+vendor/
+cachet-monitor

74
CODE_OF_CONDUCT.md Normal file

@@ -0,0 +1,74 @@
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, gender identity and expression, level of experience,
nationality, personal appearance, race, religion, or sexual identity and
orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at management@castawaylabs.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at [http://contributor-covenant.org/version/1/4][version]
[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/

21
LICENSE.txt Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2017 Castaway Labs LLC
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

79
api.go

@@ -1,79 +0,0 @@
package cachet
import (
"bytes"
"crypto/tls"
"encoding/json"
"errors"
"net/http"
"strconv"
"time"
"github.com/Sirupsen/logrus"
)
type CachetAPI struct {
URL string `json:"url"`
Token string `json:"token"`
Insecure bool `json:"insecure"`
}
type CachetResponse struct {
Data json.RawMessage `json:"data"`
}
// TODO: test
func (api CachetAPI) Ping() error {
resp, _, err := api.NewRequest("GET", "/ping", nil)
if err != nil {
return err
}
if resp.StatusCode != 200 {
return errors.New("API Responded with non-200 status code")
}
return nil
}
// SendMetric adds a data point to a cachet monitor
func (api CachetAPI) SendMetric(id int, lag int64) {
logrus.Debugf("Sending lag metric ID:%d RTT %vms", id, lag)
jsonBytes, _ := json.Marshal(map[string]interface{}{
"value": lag,
"timestamp": time.Now().Unix(),
})
resp, _, err := api.NewRequest("POST", "/metrics/"+strconv.Itoa(id)+"/points", jsonBytes)
if err != nil || resp.StatusCode != 200 {
logrus.Warnf("Could not log metric! ID: %d, err: %v", id, err)
}
}
// TODO: test
// NewRequest wraps http.NewRequest
func (api CachetAPI) NewRequest(requestType, url string, reqBody []byte) (*http.Response, CachetResponse, error) {
req, err := http.NewRequest(requestType, api.URL+url, bytes.NewBuffer(reqBody))
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-Cachet-Token", api.Token)
transport := http.DefaultTransport.(*http.Transport)
transport.TLSClientConfig = &tls.Config{InsecureSkipVerify: api.Insecure}
client := &http.Client{
Transport: transport,
}
res, err := client.Do(req)
if err != nil {
return nil, CachetResponse{}, err
}
var body struct {
Data json.RawMessage `json:"data"`
}
err = json.NewDecoder(res.Body).Decode(&body)
return res, body, err
}

304
backends/cachet/backend.go Normal file

@@ -0,0 +1,304 @@
package cachetbackend
import (
"bytes"
"crypto/tls"
"encoding/json"
"errors"
"fmt"
"net/http"
"strconv"
"strings"
"time"
"github.com/castawaylabs/cachet-monitor/monitors"
"github.com/sirupsen/logrus"
)
const DefaultTimeFormat = "15:04:05 Jan 2 MST"
type CachetBackend struct {
URL string `json:"url" yaml:"url"`
Token string `json:"token" yaml:"token"`
Insecure bool `json:"insecure" yaml:"insecure"`
DateFormat string `json:"date_format" yaml:"date_format"`
}
type CachetResponse struct {
Data json.RawMessage `json:"data"`
}
func (api CachetBackend) ValidateMonitor(mon *monitors.AbstractMonitor) []string {
errs := []string{}
params := mon.Params
componentID, componentIDOk := params["component_id"]
metricID, metricIDOk := params["metric_id"]
if !componentIDOk && !metricIDOk {
errs = append(errs, "component_id and metric_id is unset")
}
if _, ok := componentID.(int); !ok && componentIDOk {
errs = append(errs, "component_id not integer")
}
if _, ok := metricID.(int); !ok && metricIDOk {
errs = append(errs, "metric_id not integer")
}
return errs
}
func (api CachetBackend) Validate() []string {
errs := []string{}
if len(api.URL) == 0 {
errs = append(errs, "Cachet API URL invalid")
}
if len(api.Token) == 0 {
errs = append(errs, "Cachet API Token invalid")
}
if len(api.DateFormat) == 0 {
api.DateFormat = DefaultTimeFormat
}
return errs
}
// TODO: test
func (api CachetBackend) Ping() error {
resp, _, err := api.NewRequest("GET", "/ping", nil)
if err != nil {
return err
}
if resp.StatusCode != 200 {
return errors.New("API Responded with non-200 status code")
}
defer resp.Body.Close()
return nil
}
// TODO: test
// NewRequest wraps http.NewRequest
func (api CachetBackend) NewRequest(requestType, url string, reqBody []byte) (*http.Response, interface{}, error) {
req, err := http.NewRequest(requestType, api.URL+url, bytes.NewBuffer(reqBody))
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-Cachet-Token", api.Token)
transport := http.DefaultTransport.(*http.Transport)
transport.TLSClientConfig = &tls.Config{InsecureSkipVerify: api.Insecure}
client := &http.Client{
Transport: transport,
}
res, err := client.Do(req)
if err != nil {
return nil, CachetResponse{}, err
}
defer res.Body.Close()
defer req.Body.Close()
var body CachetResponse
err = json.NewDecoder(res.Body).Decode(&body)
return res, body, err
}
func (mon CachetBackend) Describe() []string {
features := []string{"Cachet API"}
return features
}
func (api CachetBackend) SendMetric(monitor monitors.MonitorInterface, lag int64) error {
mon := monitor.GetMonitor()
if _, ok := mon.Params["metric_id"]; !ok {
return nil
}
metricID := mon.Params["metric_id"].(int)
// report lag
logrus.Debugf("Sending lag metric ID: %d RTT %vms", metricID, lag)
jsonBytes, _ := json.Marshal(map[string]interface{}{
"value": lag,
"timestamp": time.Now().Unix(),
})
resp, _, err := api.NewRequest("POST", "/metrics/"+strconv.Itoa(metricID)+"/points", jsonBytes)
if err != nil || resp.StatusCode != 200 {
logrus.Warnf("Could not log metric! ID: %d, err: %v", metricID, err)
}
if resp != nil && resp.Body != nil {
defer resp.Body.Close()
}
return nil
}
func (api CachetBackend) UpdateMonitor(mon monitors.MonitorInterface, status, previousStatus monitors.MonitorStatus, errs []error) error {
monitor := mon.GetMonitor()
l := logrus.WithFields(logrus.Fields{
"monitor": monitor.Name,
"time": time.Now().Format(api.DateFormat),
})
errors := make([]string, len(errs))
for i, err := range errs {
errors[i] = err.Error()
}
fmt.Println("errs", errs)
componentID := monitor.Params["component_id"].(int)
incident, err := api.findIncident(componentID)
if err != nil {
l.Errorf("Couldn't find existing incidents: %v", err)
}
if incident == nil {
// create a new one
incident = &Incident{
Name: "",
ComponentID: componentID,
Message: "",
Notify: true,
}
} else {
// find component status
component, err := api.getComponent(incident.ComponentID)
if err != nil {
panic(err)
}
incident.ComponentStatus = component.Status
}
tpls := monitor.Template
tplData := api.getTemplateData(monitor)
var tpl monitors.MessageTemplate
if status == monitors.MonitorStatusDown {
tpl = tpls.Investigating
tplData["FailReason"] = strings.Join(errors, "\n - ")
l.Warnf("updating component. Monitor is down: %v", tplData["FailReason"])
} else {
// was down, created an incident, its now ok, make it resolved.
tpl = tpls.Fixed
l.Warn("Resolving incident")
}
tplData["incident"] = incident
subject, message := tpl.Exec(tplData)
if incident.ID == 0 {
incident.Name = subject
incident.Message = message
} else {
incident.Message += "\n\n---\n\n" + subject + ":\n\n" + message
}
if status == monitors.MonitorStatusDown && (incident.ComponentStatus == 0 || incident.ComponentStatus > 2) {
incident.Status = 1
fmt.Println("incident status", incident.ComponentStatus)
if incident.ComponentStatus >= 3 {
// major outage
incident.ComponentStatus = 4
} else {
incident.ComponentStatus = 3
}
} else if status == monitors.MonitorStatusUp {
incident.Status = 4
incident.ComponentStatus = 1
}
incident.Notify = true
// create/update incident
if err := incident.Send(api); err != nil {
l.Errorf("Error sending incident: %v", err)
return err
}
return nil
}
func (api CachetBackend) Tick(monitor monitors.MonitorInterface, status monitors.MonitorStatus, errs []error, lag int64) {
mon := monitor.GetMonitor()
if mon.GetLastStatus() == status || status == monitors.MonitorStatusNotSaturated {
return
}
logrus.Infof("updating backend for monitor")
lastStatus := mon.UpdateLastStatus(status)
api.UpdateMonitor(monitor, status, lastStatus, errs)
if _, ok := mon.Params["metric_id"]; ok && lag > 0 {
api.SendMetric(monitor, lag)
}
}
func (api CachetBackend) getComponent(componentID int) (*Component, error) {
resp, body, err := api.NewRequest("GET", "/components/"+strconv.Itoa(componentID), nil)
if err != nil {
return nil, err
}
var data *Component
if err := json.Unmarshal(body.(CachetResponse).Data, &data); err != nil {
return nil, fmt.Errorf("Cannot decode component: %v", err)
}
if resp.StatusCode != 200 {
return nil, fmt.Errorf("Could not get component! %v", err)
}
return data, nil
}
func (api CachetBackend) findIncident(componentID int) (*Incident, error) {
// fetch watching, identified & investigating
statuses := []int{3, 2, 1}
for _, status := range statuses {
incidents, err := api.findIncidents(componentID, status)
if err != nil {
return nil, err
}
for _, incident := range incidents {
incident.Status = status
return incident, nil
}
}
return nil, nil
}
func (api CachetBackend) findIncidents(componentID int, status int) ([]*Incident, error) {
resp, body, err := api.NewRequest("GET", "/incidents?component_Id="+strconv.Itoa(componentID)+"&status="+strconv.Itoa(status), nil)
if err != nil {
return nil, err
}
if resp.StatusCode != http.StatusOK {
return nil, fmt.Errorf("GET /incidents returned %d", resp.StatusCode)
}
var data []*Incident
if err := json.Unmarshal(body.(CachetResponse).Data, &data); err != nil {
return nil, fmt.Errorf("Cannot find incidents: %v", err)
}
if resp.StatusCode != 200 {
return nil, fmt.Errorf("Could not fetch incidents! %v", err)
}
return data, nil
}
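
The status branch near the end of `UpdateMonitor` above is easier to follow when pulled out into a pure function. The sketch below mirrors that branch as written in the diff. The function name `nextStatus` and the named constants are hypothetical, and the meanings attached to the numbers (incident 1 = investigating, 4 = fixed; component 1 = operational, 3 = partial outage, 4 = major outage) are assumptions based on the inline comments and on Cachet's usual status codes, so they should be checked against the Cachet version in use.

```go
package main

import "fmt"

// Status codes as assumed in this sketch.
const (
	incidentInvestigating = 1
	incidentFixed         = 4

	componentOperational   = 1
	componentPartialOutage = 3
	componentMajorOutage   = 4
)

// nextStatus returns the incident and component status to send, given
// whether the monitor is down and the component's current status.
// "Up" is modelled as !down. Cases the original branch does not match
// (down with component status 1 or 2) return the inputs unchanged.
func nextStatus(down bool, incidentStatus, componentStatus int) (int, int) {
	switch {
	case down && (componentStatus == 0 || componentStatus > 2):
		if componentStatus >= 3 {
			// Already in an outage state: escalate to major outage.
			return incidentInvestigating, componentMajorOutage
		}
		return incidentInvestigating, componentPartialOutage
	case !down:
		return incidentFixed, componentOperational
	}
	return incidentStatus, componentStatus
}

func main() {
	fmt.Println(nextStatus(true, 0, 0))
	fmt.Println(nextStatus(true, 1, 3))
	fmt.Println(nextStatus(false, 1, 4))
}
```

Written this way, the case where a monitor goes down while the component status is 1 or 2 stands out: the branch leaves both values untouched, which may or may not be the intended behaviour.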


@@ -0,0 +1,14 @@
package cachetbackend
// Incident Cachet data model
type Component struct {
ID int `json:"id"`
Name string `json:"name"`
Message string `json:"message"`
Status int `json:"status"`
Visible int `json:"visible"`
Notify bool `json:"notify"`
ComponentID int `json:"component_id"`
ComponentStatus int `json:"component_status"`
}
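
Commit c2c9898d68 in this range reports that the Cachet API can return a component's status as a string, which fails to decode into an `int` field such as `Status` above. One way to tolerate both encodings is a small custom type, sketched below. The names `flexInt`, `component` and `decodeComponent` are hypothetical and this is not the fix that the commit itself applied.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// flexInt is a hypothetical integer type that decodes from 2 or "2".
type flexInt int

// UnmarshalJSON accepts a JSON number or a quoted decimal string.
func (f *flexInt) UnmarshalJSON(b []byte) error {
	// Try a plain JSON number first.
	var n int
	if err := json.Unmarshal(b, &n); err == nil {
		*f = flexInt(n)
		return nil
	}
	// Fall back to a quoted number such as "2".
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return fmt.Errorf("status is neither number nor string: %w", err)
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return fmt.Errorf("status %q is not an integer: %w", s, err)
	}
	*f = flexInt(n)
	return nil
}

// component is a trimmed stand-in for the Component model.
type component struct {
	ID     int     `json:"id"`
	Status flexInt `json:"status"`
}

// decodeComponent parses a JSON document into a component.
func decodeComponent(data string) (component, error) {
	var c component
	err := json.Unmarshal([]byte(data), &c)
	return c, err
}

func main() {
	for _, in := range []string{`{"id":1,"status":2}`, `{"id":1,"status":"2"}`} {
		c, err := decodeComponent(in)
		fmt.Println(c.Status, err)
	}
}
```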


@@ -0,0 +1,67 @@
package cachetbackend
import (
"encoding/json"
"fmt"
"strconv"
"time"
"github.com/castawaylabs/cachet-monitor/backends"
"github.com/castawaylabs/cachet-monitor/monitors"
)
// "github.com/sirupsen/logrus"
// Incident Cachet data model
type Incident struct {
ID int `json:"id"`
Name string `json:"name"`
Message string `json:"message"`
Status int `json:"status"`
Visible int `json:"visible"`
Notify bool `json:"notify"`
ComponentID int `json:"component_id"`
ComponentStatus int `json:"component_status"`
}
// Send - Create or Update incident
func (incident *Incident) Send(backend backends.BackendInterface) error {
requestURL := "/incidents"
requestMethod := "POST"
jsonBytes, _ := json.Marshal(incident)
if incident.ID > 0 {
// create an incident update
requestMethod = "PUT"
requestURL += "/" + strconv.Itoa(incident.ID)
}
resp, body, err := backend.NewRequest(requestMethod, requestURL, jsonBytes)
if err != nil {
return err
}
var data struct {
ID int `json:"id"`
}
if err := json.Unmarshal(body.(CachetResponse).Data, &data); err != nil {
return fmt.Errorf("Cannot parse incident body: %v, %v", err, string(body.(CachetResponse).Data))
}
incident.ID = data.ID
if resp.StatusCode != 200 {
return fmt.Errorf("Could not update/create incident!")
}
return nil
}
func (api *CachetBackend) getTemplateData(monitor *monitors.AbstractMonitor) map[string]interface{} {
return map[string]interface{}{
// "SystemName": monitor.config.SystemName,
"Monitor": monitor,
"now": time.Now().Format(api.DateFormat),
// "incident": monitor.incident,
}
}
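
`Incident.Send` above chooses between creating and updating an incident based on whether an ID is already set. That decision can be isolated as shown in this small sketch; the helper name `incidentRequest` is hypothetical, while the methods and paths are the ones used in the diff.

```go
package main

import (
	"fmt"
	"strconv"
)

// incidentRequest returns the HTTP method and path for saving an incident.
// An ID of 0 is treated as "not yet created", matching the check in Send.
func incidentRequest(id int) (method, path string) {
	if id > 0 {
		// Existing incident: update it in place.
		return "PUT", "/incidents/" + strconv.Itoa(id)
	}
	// No ID yet: create a new incident.
	return "POST", "/incidents"
}

func main() {
	fmt.Println(incidentRequest(0))
	fmt.Println(incidentRequest(42))
}
```

Keeping the choice in a function without side effects makes it straightforward to unit test without a running Cachet instance.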

19
backends/interface.go Normal file
View File

@@ -0,0 +1,19 @@
package backends
import (
"net/http"
"github.com/castawaylabs/cachet-monitor/monitors"
)
type BackendInterface interface {
Ping() error
Tick(monitor monitors.MonitorInterface, status monitors.MonitorStatus, errs []error, lag int64)
SendMetric(monitor monitors.MonitorInterface, lag int64) error
UpdateMonitor(monitor monitors.MonitorInterface, status, previousStatus monitors.MonitorStatus, errs []error) error
NewRequest(requestType, url string, reqBody []byte) (*http.Response, interface{}, error)
Describe() []string
Validate() []string
ValidateMonitor(monitor *monitors.AbstractMonitor) []string
}
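
One detail worth noting for implementers of `Validate() []string`: in the `CachetBackend` diff above, `Validate` is declared on a value receiver and assigns a default to `api.DateFormat`. In Go, an assignment made through a value receiver modifies a copy, so that default appears not to reach the caller. The sketch below demonstrates the difference with two hypothetical stand-in types; neither is part of the repository.

```go
package main

import "fmt"

const defaultTimeFormat = "15:04:05 Jan 2 MST"

// valueBackend and pointerBackend are hypothetical stand-ins
// for a backend configuration struct.
type valueBackend struct{ DateFormat string }
type pointerBackend struct{ DateFormat string }

// Validate on a value receiver: the default is written to a copy
// and discarded when the method returns.
func (b valueBackend) Validate() []string {
	if b.DateFormat == "" {
		b.DateFormat = defaultTimeFormat
	}
	return nil
}

// Validate on a pointer receiver: the default is stored on the
// caller's value.
func (b *pointerBackend) Validate() []string {
	if b.DateFormat == "" {
		b.DateFormat = defaultTimeFormat
	}
	return nil
}

func main() {
	v := &valueBackend{}
	v.Validate()
	p := &pointerBackend{}
	p.Validate()
	fmt.Printf("value receiver: %q, pointer receiver: %q\n", v.DateFormat, p.DateFormat)
}
```

If the default is meant to persist, a pointer receiver (or applying defaults when the configuration is decoded) would be one way to achieve that.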


@@ -1,206 +0,0 @@
package main
import (
"encoding/json"
"errors"
"io/ioutil"
"net/http"
"net/url"
"os"
"os/signal"
"strings"
"sync"
"github.com/Sirupsen/logrus"
cachet "github.com/castawaylabs/cachet-monitor"
docopt "github.com/docopt/docopt-go"
"github.com/mitchellh/mapstructure"
"gopkg.in/yaml.v2"
)
const usage = `cachet-monitor
Usage:
cachet-monitor (-c PATH | --config PATH) [--log=LOGPATH] [--name=NAME] [--immediate]
cachet-monitor -h | --help | --version
cachet-monitor print-config
Arguments:
PATH path to config.json
LOGPATH path to log output (defaults to STDOUT)
NAME name of this logger
Examples:
cachet-monitor -c /root/cachet-monitor.json
cachet-monitor -c /root/cachet-monitor.json --log=/var/log/cachet-monitor.log --name="development machine"
Options:
-c PATH.json --config PATH Path to configuration file
-h --help Show this screen.
--version Show version
--immediate Tick immediately (by default waits for first defined interval)
print-config Print example configuration
Environment varaibles:
CACHET_API override API url from configuration
CACHET_TOKEN override API token from configuration
CACHET_DEV set to enable dev logging`
func main() {
arguments, _ := docopt.Parse(usage, nil, true, "cachet-monitor", false)
cfg, err := getConfiguration(arguments["--config"].(string))
if err != nil {
logrus.Panicf("Unable to start (reading config): %v", err)
}
if immediate, ok := arguments["--immediate"]; ok {
cfg.Immediate = immediate.(bool)
}
if name := arguments["--name"]; name != nil {
cfg.SystemName = name.(string)
}
logrus.SetOutput(getLogger(arguments["--log"]))
if len(os.Getenv("CACHET_API")) > 0 {
cfg.API.URL = os.Getenv("CACHET_API")
}
if len(os.Getenv("CACHET_TOKEN")) > 0 {
cfg.API.Token = os.Getenv("CACHET_TOKEN")
}
if len(os.Getenv("CACHET_DEV")) > 0 {
logrus.SetLevel(logrus.DebugLevel)
}
if valid := cfg.Validate(); !valid {
logrus.Errorf("Invalid configuration")
os.Exit(1)
}
logrus.Debug("Configuration valid")
logrus.Infof("System: %s", cfg.SystemName)
logrus.Infof("API: %s", cfg.API.URL)
logrus.Infof("Monitors: %d\n", len(cfg.Monitors))
logrus.Infof("Pinging cachet")
if err := cfg.API.Ping(); err != nil {
logrus.Errorf("Cannot ping cachet!\n%v", err)
os.Exit(1)
}
logrus.Infof("Ping OK")
wg := &sync.WaitGroup{}
for index, monitor := range cfg.Monitors {
logrus.Infof("Starting Monitor #%d: ", index)
logrus.Infof("Features: \n - %v", strings.Join(monitor.Describe(), "\n - "))
go monitor.ClockStart(cfg, monitor, wg)
}
signals := make(chan os.Signal, 1)
signal.Notify(signals, os.Interrupt, os.Kill)
<-signals
logrus.Warnf("Abort: Waiting monitors to finish")
for _, mon := range cfg.Monitors {
mon.GetMonitor().ClockStop()
}
wg.Wait()
}
func getLogger(logPath interface{}) *os.File {
if logPath == nil || len(logPath.(string)) == 0 {
return os.Stdout
}
file, err := os.Create(logPath.(string))
if err != nil {
logrus.Errorf("Unable to open file '%v' for logging: \n%v", logPath, err)
os.Exit(1)
}
return file
}
func getConfiguration(path string) (*cachet.CachetMonitor, error) {
var cfg cachet.CachetMonitor
var data []byte
// test if its a url
url, err := url.ParseRequestURI(path)
if err == nil && len(url.Scheme) > 0 {
// download config
response, err := http.Get(path)
if err != nil {
logrus.Warn("Unable to download network configuration")
return nil, err
}
defer response.Body.Close()
data, _ = ioutil.ReadAll(response.Body)
logrus.Info("Downloaded network configuration.")
} else {
data, err = ioutil.ReadFile(path)
if err != nil {
return nil, errors.New("Unable to open file: '" + path + "'")
}
}
if strings.HasSuffix(path, ".yaml") || strings.HasSuffix(path, ".yml") {
err = yaml.Unmarshal(data, &cfg)
} else {
err = json.Unmarshal(data, &cfg)
}
if err != nil {
logrus.Warnf("Unable to parse configuration file")
}
cfg.Monitors = make([]cachet.MonitorInterface, len(cfg.RawMonitors))
for index, rawMonitor := range cfg.RawMonitors {
var t cachet.MonitorInterface
var err error
// get default type
monType := cachet.GetMonitorType("")
if t, ok := rawMonitor["type"].(string); ok {
monType = cachet.GetMonitorType(t)
}
switch monType {
case "http":
var s cachet.HTTPMonitor
err = mapstructure.Decode(rawMonitor, &s)
t = &s
case "dns":
var s cachet.DNSMonitor
err = mapstructure.Decode(rawMonitor, &s)
t = &s
case "icmp":
var s cachet.ICMPMonitor
err = mapstructure.Decode(rawMonitor, &s)
t = &s
case "tcp":
var s cachet.TCPMonitor
err = mapstructure.Decode(rawMonitor, &s)
t = &s
default:
logrus.Errorf("Invalid monitor type (index: %d) %v", index, monType)
continue
}
t.GetMonitor().Type = monType
if err != nil {
logrus.Errorf("Unable to unmarshal monitor to type (index: %d): %v", index, err)
continue
}
cfg.Monitors[index] = t
}
return &cfg, err
}

115
cli/root.go Normal file

@@ -0,0 +1,115 @@
package main
import (
"os"
"os/signal"
"strings"
"sync"
cachet "github.com/castawaylabs/cachet-monitor"
"github.com/sirupsen/logrus"
"github.com/spf13/cobra"
)
var cfgFile string
// rootCmd represents the base command when called without any subcommands
var rootCmd = &cobra.Command{
Use: "cmd",
Short: "cachet-monitor",
// Uncomment the following line if your bare application
// has an action associated with it:
Run: func(cmd *cobra.Command, args []string) {
Action(cmd, args)
},
}
func main() {
if err := rootCmd.Execute(); err != nil {
panic(err)
}
}
func init() {
// Here you will define your flags and configuration settings.
// Cobra supports persistent flags, which, if defined here,
// will be global for your application.
pf := rootCmd.PersistentFlags()
pf.StringVarP(&cfgFile, "config", "c", "", "config file (default is $(pwd)/config.yml)")
pf.String("log", "", "log output")
pf.String("format", "text", "log format [text/json]")
pf.String("name", "", "machine name")
pf.Bool("immediate", false, "Tick immediately (by default waits for first defined")
}
func Action(cmd *cobra.Command, args []string) {
cfg, err := cachet.New(cfgFile)
if err != nil {
logrus.Panicf("Unable to start (reading config): %v", err)
}
if immediate, err := cmd.Flags().GetBool("immediate"); err == nil && immediate {
cfg.Immediate = immediate
}
if name, err := cmd.Flags().GetString("name"); err == nil && len(name) > 0 {
cfg.SystemName = name
}
logrus.SetOutput(getLogger(cmd))
if format, err := cmd.Flags().GetString("format"); err == nil && format == "json" {
logrus.SetFormatter(&logrus.JSONFormatter{})
}
if valid := cfg.Validate(); !valid {
logrus.Errorf("Invalid configuration")
os.Exit(1)
}
logrus.Debug("Configuration valid")
logrus.Infof("System: %s", cfg.SystemName)
// logrus.Infof("API: %s", cfg.API.URL)
logrus.Infof("Monitors: %d", len(cfg.Monitors))
logrus.Infof("Backend: %v", strings.Join(cfg.Backend.Describe(), "\n - "))
logrus.Infof("Pinging backend")
if err := cfg.Backend.Ping(); err != nil {
logrus.Errorf("Cannot ping backend!\n%v", err)
// os.Exit(1)
}
logrus.Infof("Ping OK")
logrus.Warnf("Starting!")
wg := &sync.WaitGroup{}
for index, monitor := range cfg.Monitors {
logrus.Infof("Starting Monitor #%d: ", index)
logrus.Infof("Features: \n - %v", strings.Join(monitor.Describe(), "\n - "))
go monitor.Start(monitor.GetTestFunc(), wg, cfg.Backend.Tick, cfg.Immediate)
}
signals := make(chan os.Signal, 1)
signal.Notify(signals, os.Interrupt, os.Kill)
<-signals
logrus.Warnf("Abort: Waiting for monitors to finish")
for _, mon := range cfg.Monitors {
mon.GetMonitor().Stop()
}
wg.Wait()
}
func getLogger(cmd *cobra.Command) *os.File {
logPath, _ := cmd.Flags().GetString("log")
if len(logPath) == 0 {
return os.Stdout
}
file, err := os.Create(logPath)
if err != nil {
logrus.Errorf("Unable to open file '%v' for logging: \n%v", logPath, err)
os.Exit(1)
}
return file
}
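
The shutdown sequence in `Action` above (wait for a signal, stop every monitor, then `wg.Wait()`) can be sketched in isolation as follows. The `worker` type and `runUntil` function are hypothetical, and a plain channel stands in for the OS signal so the example runs without sending a real interrupt. Note that `os.Kill` (SIGKILL) cannot be caught by a process, so passing it to `signal.Notify` has no effect for that signal.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// worker is a hypothetical stand-in for a monitor: it ticks until
// its stop channel is closed.
type worker struct {
	stop chan struct{}
}

// start launches the worker loop and registers it with the WaitGroup.
func (w *worker) start(interval time.Duration, wg *sync.WaitGroup) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				// A real monitor would run its check here.
			case <-w.stop:
				return
			}
		}
	}()
}

// runUntil starts n workers, blocks until done is closed, then stops
// every worker and waits for all goroutines to exit.
func runUntil(n int, done <-chan struct{}) []*worker {
	wg := &sync.WaitGroup{}
	workers := make([]*worker, n)
	for i := range workers {
		workers[i] = &worker{stop: make(chan struct{})}
		workers[i].start(time.Millisecond, wg)
	}

	<-done // in the real program this is the signal channel
	for _, w := range workers {
		close(w.stop)
	}
	wg.Wait()
	return workers
}

func main() {
	done := make(chan struct{})
	go func() {
		time.Sleep(20 * time.Millisecond)
		close(done)
	}()
	workers := runUntil(3, done)
	fmt.Println("stopped workers:", len(workers))
}
```

Calling `wg.Add` before the goroutine starts, as done here, avoids a race where `wg.Wait()` could return before a late-starting worker has registered itself.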

144
config.go

@@ -1,24 +1,132 @@
 package cachet
 import (
+"encoding/json"
+"errors"
+"io/ioutil"
 "net"
+"net/http"
+"net/url"
 "os"
 "strings"
-"time"
-"github.com/Sirupsen/logrus"
+"github.com/castawaylabs/cachet-monitor/backends"
+cachetbackend "github.com/castawaylabs/cachet-monitor/backends/cachet"
+"github.com/castawaylabs/cachet-monitor/monitors"
+"github.com/mitchellh/mapstructure"
+"github.com/sirupsen/logrus"
+yaml "gopkg.in/yaml.v2"
 )
 type CachetMonitor struct {
-SystemName string `json:"system_name" yaml:"system_name"`
-DateFormat string `json:"date_format" yaml:"date_format"`
-API CachetAPI `json:"api"`
 RawMonitors []map[string]interface{} `json:"monitors" yaml:"monitors"`
+RawBackend map[string]interface{} `json:"backend" yaml:"backend"`
-Monitors []MonitorInterface `json:"-" yaml:"-"`
+SystemName string `json:"system_name" yaml:"system_name"`
+Backend backends.BackendInterface `json:"-" yaml:"-"`
+Monitors []monitors.MonitorInterface `json:"-" yaml:"-"`
 Immediate bool `json:"-" yaml:"-"`
 }
+func New(path string) (*CachetMonitor, error) {
+var cfg *CachetMonitor
+var data []byte
+// test if its a url
+url, err := url.ParseRequestURI(path)
+if err == nil && len(url.Scheme) > 0 {
+// download config
+response, err := http.Get(path)
+if err != nil {
+logrus.Warn("Unable to download network configuration")
+return nil, err
+}
+defer response.Body.Close()
+data, _ = ioutil.ReadAll(response.Body)
+logrus.Info("Downloaded network configuration.")
+} else {
+data, err = ioutil.ReadFile(path)
+if err != nil {
+return nil, errors.New("Unable to open file: '" + path + "'")
+}
+}
+if strings.HasSuffix(path, ".yaml") || strings.HasSuffix(path, ".yml") {
+err = yaml.Unmarshal(data, &cfg)
+} else {
+err = json.Unmarshal(data, &cfg)
+}
+if err != nil {
+logrus.Warnf("Unable to parse configuration file")
+return nil, err
+}
+// get default type
+if backend, ok := cfg.RawBackend["type"].(string); !ok {
+err = errors.New("Cannot determine backend type")
+} else {
+switch backend {
+case "cachet":
+var backend cachetbackend.CachetBackend
+err = mapstructure.Decode(cfg.RawBackend, &backend)
+cfg.Backend = &backend
+// backend.config = cfg
+default:
+err = errors.New("Invalid backend type: %v" + backend)
+}
+}
+if errs := cfg.Backend.Validate(); len(errs) > 0 {
+logrus.Errorf("Backend validation errors: %v", errs)
+os.Exit(1)
+}
+if err != nil {
+logrus.Errorf("Unable to unmarshal backend: %v", err)
+return nil, err
+}
+cfg.Monitors = make([]monitors.MonitorInterface, len(cfg.RawMonitors))
+for index, rawMonitor := range cfg.RawMonitors {
+var t monitors.MonitorInterface
+// get default type
+monType := GetMonitorType("")
+if t, ok := rawMonitor["type"].(string); ok {
+monType = GetMonitorType(t)
+}
+switch monType {
+case "http":
+var mon monitors.HTTPMonitor
+err = mapstructure.Decode(rawMonitor, &mon)
+t = &mon
+case "dns":
+var mon monitors.DNSMonitor
+err = mapstructure.Decode(rawMonitor, &mon)
+t = &mon
+default:
+logrus.Errorf("Invalid monitor type (index: %d) %v", index, monType)
+continue
+}
+if err != nil {
+logrus.Errorf("Unable to unmarshal monitor to type (index: %d): %v", index, err)
+continue
+}
+mon := t.GetMonitor()
+mon.Type = monType
+cfg.Monitors[index] = t
+}
+return cfg, err
+}
 // Validate configuration
 func (cfg *CachetMonitor) Validate() bool {
 valid := true
@@ -28,22 +136,13 @@ func (cfg *CachetMonitor) Validate() bool {
 cfg.SystemName = getHostname()
 }
-if len(cfg.DateFormat) == 0 {
-cfg.DateFormat = DefaultTimeFormat
-}
-if len(cfg.API.Token) == 0 || len(cfg.API.URL) == 0 {
-logrus.Warnf("API URL or API Token missing.\nGet help at https://github.com/castawaylabs/cachet-monitor")
-valid = false
-}
 if len(cfg.Monitors) == 0 {
 logrus.Warnf("No monitors defined!\nSee help for example configuration")
 valid = false
 }
 for index, monitor := range cfg.Monitors {
-if errs := monitor.Validate(); len(errs) > 0 {
+if errs := monitor.Validate(cfg.Backend.ValidateMonitor); len(errs) > 0 {
 logrus.Warnf("Monitor validation errors (index %d): %v", index, "\n - "+strings.Join(errs, "\n - "))
 valid = false
 }
@@ -67,10 +166,6 @@ func getHostname() string {
return addrs[0].String() return addrs[0].String()
} }
func getMs() int64 {
return time.Now().UnixNano() / int64(time.Millisecond)
}
func GetMonitorType(t string) string { func GetMonitorType(t string) string {
if len(t) == 0 { if len(t) == 0 {
return "http" return "http"
@@ -78,12 +173,3 @@ func GetMonitorType(t string) string {
return strings.ToLower(t) return strings.ToLower(t)
} }
func getTemplateData(monitor *AbstractMonitor) map[string]interface{} {
return map[string]interface{}{
"SystemName": monitor.config.SystemName,
"API": monitor.config.API,
"Monitor": monitor,
"now": time.Now().Format(monitor.config.DateFormat),
}
}
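
The monitor-type dispatch in the config hunk above can be sketched in isolation. This is a minimal illustration under assumptions, not the project's code: `resolveType` is a hypothetical helper, and the real loader decodes the raw map with `mapstructure` instead of only reporting whether the type is supported.

```go
package main

import (
	"fmt"
	"strings"
)

// GetMonitorType mirrors the helper shown in the diff: an empty type
// defaults to "http", anything else is lower-cased.
func GetMonitorType(t string) string {
	if len(t) == 0 {
		return "http"
	}
	return strings.ToLower(t)
}

// resolveType is a hypothetical helper (not in the source). It reads the
// optional "type" key from a raw monitor map and reports whether the
// resulting type is one of the two handled by the loader's switch.
func resolveType(raw map[string]interface{}) (string, bool) {
	monType := GetMonitorType("")
	if t, ok := raw["type"].(string); ok {
		monType = GetMonitorType(t)
	}
	switch monType {
	case "http", "dns":
		return monType, true
	default:
		return monType, false
	}
}

func main() {
	rawMonitors := []map[string]interface{}{
		{"name": "google"},              // no type: falls back to http
		{"name": "dns", "type": "DNS"},  // case-insensitive
		{"name": "ping", "type": "icmp"}, // not handled by the switch
	}
	for _, raw := range rawMonitors {
		monType, ok := resolveType(raw)
		fmt.Printf("%v -> %s (supported: %t)\n", raw["name"], monType, ok)
	}
}
```

Note that in the loader above, an unsupported or undecodable monitor is skipped with `continue`, which leaves a nil entry at that index of `cfg.Monitors`; callers may want to guard against that.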

5
dns.go

@@ -1,5 +0,0 @@
package cachet
type DNSMonitor struct {
AbstractMonitor `mapstructure:",squash"`
}


@@ -0,0 +1,20 @@
[Unit]
Description=Cachet Monitor
After=syslog.target
After=network.target
#After=mysqld.service
#After=postgresql.service
#After=memcached.service
#After=redis.service
[Service]
Type=simple
User=root
Group=root
WorkingDirectory=/root
ExecStart=/root/cachet-monitor -c /etc/cachet-monitor.yaml
Restart=always
Environment=USER=root HOME=/root
[Install]
WantedBy=multi-user.target


@@ -2,21 +2,58 @@
"api": { "api": {
"url": "https://demo.cachethq.io/api/v1", "url": "https://demo.cachethq.io/api/v1",
"token": "9yMHsdioQosnyVK4iCVR", "token": "9yMHsdioQosnyVK4iCVR",
"insecure": true "insecure": false
}, },
"date_format": "02/01/2006 15:04:05 MST",
"monitors": [ "monitors": [
{ {
"name": "google", "name": "google",
"url": "https://google.com", "target": "https://google.com",
"threshold": 80, "strict": true,
"method": "POST",
"component_id": 1, "component_id": 1,
"interval": 10, "metric_id": 4,
"timeout": 5, "template": {
"investigating": {
"subject": "{{ .Monitor.Name }} - {{ .SystemName }}",
"message": "{{ .Monitor.Name }} check **failed** (server time: {{ .now }})\n\n{{ .FailReason }}"
},
"fixed": {
"subject": "I HAVE BEEN FIXED"
}
},
"interval": 1,
"timeout": 1,
"threshold": 80,
"headers": { "headers": {
"Authorization": "Basic <hash>" "Authorization": "Basic <hash>"
}, },
"expected_status_code": 200, "expected_status_code": 200,
"strict_tls": true "expected_body": "P.*NG"
},
{
"name": "dns",
"target": "matej.me.",
"question": "mx",
"type": "dns",
"component_id": 2,
"interval": 1,
"timeout": 1,
"dns": "8.8.4.4:53",
"answers": [
{
"regex": "[1-9] alt[1-9].aspmx.l.google.com."
},
{
"exact": "10 aspmx2.googlemail.com."
},
{
"exact": "1 aspmx.l.google.com."
},
{
"exact": "10 aspmx3.googlemail.com."
}
]
} }
] ]
} }


@@ -1,14 +1,65 @@
api: api:
# cachet url
url: https://demo.cachethq.io/api/v1 url: https://demo.cachethq.io/api/v1
# cachet api token
token: 9yMHsdioQosnyVK4iCVR token: 9yMHsdioQosnyVK4iCVR
insecure: false
# https://golang.org/src/time/format.go#L57
date_format: 02/01/2006 15:04:05 MST
monitors: monitors:
# http monitor example
- name: google - name: google
# test url
target: https://google.com target: https://google.com
threshold: 80 # strict certificate checking for https
strict: true
# HTTP method
method: POST
# set to update component (either component_id or metric_id is required)
component_id: 1 component_id: 1
interval: 10 # set to post lag to cachet metric (graph)
timeout: 5 metric_id: 4
# custom templates (see readme for details)
template:
investigating:
subject: "{{ .Monitor.Name }} - {{ .SystemName }}"
message: "{{ .Monitor.Name }} check **failed** (server time: {{ .now }})\n\n{{ .FailReason }}"
fixed:
subject: "I HAVE BEEN FIXED"
# seconds between checks
interval: 1
# seconds for timeout
timeout: 1
# If % of downtime is over this threshold, open an incident
threshold: 80
# custom HTTP headers
headers: headers:
Authorization: Basic <hash> Authorization: Basic <hash>
# expected status code (either status code or body must be supplied)
expected_status_code: 200 expected_status_code: 200
strict: true # regex to match body
expected_body: "P.*NG"
# dns monitor example
- name: dns
# fqdn
target: matej.me.
# question type (A/AAAA/CNAME/...)
question: mx
type: dns
# set component_id/metric_id
component_id: 2
# poll every 1s
interval: 1
timeout: 1
# custom DNS server (defaults to system)
dns: 8.8.4.4:53
answers:
# exact/regex check
- regex: "[1-9] alt[1-9].aspmx.l.google.com."
- exact: 10 aspmx2.googlemail.com.
- exact: 1 aspmx.l.google.com.
- exact: 10 aspmx3.googlemail.com.

14
example.upstart.conf Normal file

@@ -0,0 +1,14 @@
description "Cachet Monitor"
start on startup
env USER=root
env HOME=/root
setuid root
setgid root
chdir /root
script
exec cachet-monitor -c /cachet-monitor.json --immediate
end script

17
go.mod Normal file

@@ -0,0 +1,17 @@
module github.com/castawaylabs/cachet-monitor
go 1.12
require (
github.com/gizak/termui v2.3.0+incompatible // indirect
github.com/maruel/panicparse v1.2.1 // indirect
github.com/mattn/go-runewidth v0.0.4 // indirect
github.com/miekg/dns v1.1.13
github.com/mitchellh/go-wordwrap v1.0.0 // indirect
github.com/mitchellh/mapstructure v1.1.2
github.com/nsf/termbox-go v0.0.0-20190325093121-288510b9734e // indirect
github.com/sirupsen/logrus v1.4.2
github.com/spf13/cobra v0.0.4
golang.org/x/net v0.0.0-20190522155817-f3200d17e092 // indirect
gopkg.in/yaml.v2 v2.2.2
)

63
go.sum Normal file

@@ -0,0 +1,63 @@
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
github.com/armon/consul-api v0.0.0-20180202201655-eb2c6b5be1b6/go.mod h1:grANhF5doyWs3UAsr3K4I6qtAmlQcZDesFNEHPZAzj8=
github.com/coreos/etcd v3.3.10+incompatible/go.mod h1:uF7uidLiAD3TWHmW31ZFd/JWoc32PjwdhPthX9715RE=
github.com/coreos/go-etcd v2.0.0+incompatible/go.mod h1:Jez6KQU2B/sWsbdaef3ED8NzMklzPG4d5KIOhIy30Tk=
github.com/coreos/go-semver v0.2.0/go.mod h1:nnelYz7RCh+5ahJtPPxZlU+153eP4D4r3EedlOD2RNk=
github.com/cpuguy83/go-md2man v1.0.10/go.mod h1:SmD6nW6nTyfqj6ABTjUi3V3JVMnlJmwcJI5acqYI6dE=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/fsnotify/fsnotify v1.4.7/go.mod h1:jwhsz4b93w/PPRr/qN1Yymfu8t87LnFCMoQvtojpjFo=
github.com/gizak/termui v2.3.0+incompatible h1:S8wJoNumYfc/rR5UezUM4HsPEo3RJh0LKdiuDWQpjqw=
github.com/gizak/termui v2.3.0+incompatible/go.mod h1:PkJoWUt/zacQKysNfQtcw1RW+eK2SxkieVBtl+4ovLA=
github.com/hashicorp/hcl v1.0.0/go.mod h1:E5yfLk+7swimpb2L/Alb/PJmXilQ/rhwaUYs4T20WEQ=
github.com/inconshreveable/mousetrap v1.0.0/go.mod h1:PxqpIevigyE2G7u3NXJIT2ANytuPF1OarO4DADm73n8=
github.com/konsorten/go-windows-terminal-sequences v1.0.1/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ=
github.com/magiconair/properties v1.8.0/go.mod h1:PppfXfuXeibc/6YijjN8zIbojt8czPbwD3XqdrwzmxQ=
github.com/maruel/panicparse v1.2.1 h1:mNlHGiakrixj+AwF/qRpTwnj+zsWYPRLQ7wRqnJsfO0=
github.com/maruel/panicparse v1.2.1/go.mod h1:vszMjr5QQ4F5FSRfraldcIA/BCw5xrdLL+zEcU2nRBs=
github.com/mattn/go-colorable v0.1.1/go.mod h1:FuOcm+DKB9mbwrcAfNl7/TZVBZ6rcnceauSikq3lYCQ=
github.com/mattn/go-isatty v0.0.5/go.mod h1:Iq45c/XA43vh69/j3iqttzPXn0bhXyGjM0Hdxcsrc5s=
github.com/mattn/go-isatty v0.0.7/go.mod h1:Iq45c/XA43vh69/j3iqttzPXn0bhXyGjM0Hdxcsrc5s=
github.com/mattn/go-runewidth v0.0.4 h1:2BvfKmzob6Bmd4YsL0zygOqfdFnK7GR4QL06Do4/p7Y=
github.com/mattn/go-runewidth v0.0.4/go.mod h1:LwmH8dsx7+W8Uxz3IHJYH5QSwggIsqBzpuz5H//U1FU=
github.com/mgutz/ansi v0.0.0-20170206155736-9520e82c474b/go.mod h1:01TrycV0kFyexm33Z7vhZRXopbI8J3TDReVlkTgMUxE=
github.com/miekg/dns v1.1.13 h1:x7DQtkU0cedzeS8TD36tT/w1Hm4rDtfCaYYAHE7TTBI=
github.com/miekg/dns v1.1.13/go.mod h1:W1PPwlIAgtquWBMBEV9nkV9Cazfe8ScdGz/Lj7v3Nrg=
github.com/mitchellh/go-homedir v1.1.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0=
github.com/mitchellh/go-wordwrap v1.0.0 h1:6GlHJ/LTGMrIJbwgdqdl2eEH8o+Exx/0m8ir9Gns0u4=
github.com/mitchellh/go-wordwrap v1.0.0/go.mod h1:ZXFpozHsX6DPmq2I0TCekCxypsnAUbP2oI0UX1GXzOo=
github.com/mitchellh/mapstructure v1.1.2 h1:fmNYVwqnSfB9mZU6OS2O6GsXM+wcskZDuKQzvN1EDeE=
github.com/mitchellh/mapstructure v1.1.2/go.mod h1:FVVH3fgwuzCH5S8UJGiWEs2h04kUh9fWfEaFds41c1Y=
github.com/nsf/termbox-go v0.0.0-20190325093121-288510b9734e h1:Vbib8wJAaMEF9jusI/kMSYMr/LtRzM7+F9MJgt/nH8k=
github.com/nsf/termbox-go v0.0.0-20190325093121-288510b9734e/go.mod h1:IuKpRQcYE1Tfu+oAQqaLisqDeXgjyyltCfsaoYN18NQ=
github.com/pelletier/go-toml v1.2.0/go.mod h1:5z9KED0ma1S8pY6P1sdut58dfprrGBbd/94hg7ilaic=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/russross/blackfriday v1.5.2/go.mod h1:JO/DiYxRf+HjHt06OyowR9PTA263kcR/rfWxYHBV53g=
github.com/sirupsen/logrus v1.4.2 h1:SPIRibHv4MatM3XXNO2BJeFLZwZ2LvZgfQ5+UNI2im4=
github.com/sirupsen/logrus v1.4.2/go.mod h1:tLMulIdttU9McNUspp0xgXVQah82FyeX6MwdIuYE2rE=
github.com/spf13/afero v1.1.2/go.mod h1:j4pytiNVoe2o6bmDsKpLACNPDBIoEAkihy7loJ1B0CQ=
github.com/spf13/cast v1.3.0/go.mod h1:Qx5cxh0v+4UWYiBimWS+eyWzqEqokIECu5etghLkUJE=
github.com/spf13/cobra v0.0.4 h1:S0tLZ3VOKl2Te0hpq8+ke0eSJPfCnNTPiDlsfwi1/NE=
github.com/spf13/cobra v0.0.4/go.mod h1:3K3wKZymM7VvHMDS9+Akkh4K60UwM26emMESw8tLCHU=
github.com/spf13/jwalterweatherman v1.0.0/go.mod h1:cQK4TGJAtQXfYWX+Ddv3mKDzgVb68N+wFjFa4jdeBTo=
github.com/spf13/pflag v1.0.3 h1:zPAT6CGy6wXeQ7NtTnaTerfKOsV6V6F8agHXFiazDkg=
github.com/spf13/pflag v1.0.3/go.mod h1:DYY7MBk1bdzusC3SYhjObp+wFpr4gzcvqqNjLnInEg4=
github.com/spf13/viper v1.3.2/go.mod h1:ZiWeW+zYFKm7srdB9IoDzzZXaJaI5eL9QjNiN/DMA2s=
github.com/stretchr/objx v0.1.1/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
github.com/ugorji/go/codec v0.0.0-20181204163529-d75b2dcb6bc8/go.mod h1:VFNgLljTbGfSG7qAOspJ7OScBnGdDN/yBr0sguwnwf0=
github.com/xordataexchange/crypt v0.0.3-0.20170626215501-b2862e3d0a77/go.mod h1:aYKd//L2LvnjZzWKhF00oedf4jCCReLcmhLdhm1A27Q=
golang.org/x/crypto v0.0.0-20181203042331-505ab145d0a9 h1:mKdxBk7AujPs8kU4m80U72y/zjbZ3UcXC7dClwKbUI0=
golang.org/x/crypto v0.0.0-20181203042331-505ab145d0a9/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2 h1:VklqNMn3ovrHsnt90PveolxSbWFaJdECFbxSq0Mqo2M=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/net v0.0.0-20190522155817-f3200d17e092 h1:4QSRKanuywn15aTZvI/mIDEgPQpswuFndXpOj3rKEco=
golang.org/x/net v0.0.0-20190522155817-f3200d17e092/go.mod h1:HSz+uSET+XFnRR8LxR5pz3Of3rY3CfYBVs4xY44aLks=
golang.org/x/sys v0.0.0-20181205085412-a5c9d58dba9a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190222072716-a9d3bda3a223/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190422165155-953cdadca894 h1:Cz4ceDQGXuKRnVBDTS23GTn/pU5OE2C0WrNTOYK1Uuc=
golang.org/x/sys v0.0.0-20190422165155-953cdadca894/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v2 v2.2.2 h1:ZCJp+EgiOT7lHqUV2J862kp8Qj64Jo6az82+3Td9dZw=
gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=


@@ -1,5 +0,0 @@
package cachet
type ICMPMonitor struct {
AbstractMonitor `mapstructure:",squash"`
}


@@ -1,112 +0,0 @@
package cachet
import (
"encoding/json"
"fmt"
"strconv"
"github.com/Sirupsen/logrus"
)
// Incident Cachet data model
type Incident struct {
ID int `json:"id"`
Name string `json:"name"`
Message string `json:"message"`
Status int `json:"status"`
Visible int `json:"visible"`
Notify bool `json:"notify"`
ComponentID int `json:"component_id"`
ComponentStatus int `json:"component_status"`
}
// Send - Create or Update incident
func (incident *Incident) Send(cfg *CachetMonitor) error {
switch incident.Status {
case 1, 2, 3:
// partial outage
incident.ComponentStatus = 3
componentStatus, err := incident.GetComponentStatus(cfg)
if componentStatus == 3 {
// major outage
incident.ComponentStatus = 4
}
if err != nil {
logrus.Warnf("cannot fetch component: %v", err)
}
case 4:
// fixed
incident.ComponentStatus = 1
}
requestType := "POST"
requestURL := "/incidents"
if incident.ID > 0 {
requestType = "PUT"
requestURL += "/" + strconv.Itoa(incident.ID)
}
jsonBytes, _ := json.Marshal(incident)
resp, body, err := cfg.API.NewRequest(requestType, requestURL, jsonBytes)
if err != nil {
return err
}
var data struct {
ID int `json:"id"`
}
if err := json.Unmarshal(body.Data, &data); err != nil {
return fmt.Errorf("Cannot parse incident body: %v, %v", err, string(body.Data))
}
incident.ID = data.ID
if resp.StatusCode != 200 {
return fmt.Errorf("Could not create/update incident!")
}
return nil
}
func (incident *Incident) GetComponentStatus(cfg *CachetMonitor) (int, error) {
resp, body, err := cfg.API.NewRequest("GET", "/components/"+strconv.Itoa(incident.ComponentID), nil)
if err != nil {
return 0, err
}
if resp.StatusCode != 200 {
return 0, fmt.Errorf("Invalid status code. Received %d", resp.StatusCode)
}
var data struct {
Status int `json:"status"`
}
if err := json.Unmarshal(body.Data, &data); err != nil {
return 0, fmt.Errorf("Cannot parse component body: %v. Err = %v", string(body.Data), err)
}
return data.Status, nil
}
// SetInvestigating sets status to Investigating
func (incident *Incident) SetInvestigating() {
incident.Status = 1
}
// SetIdentified sets status to Identified
func (incident *Incident) SetIdentified() {
incident.Status = 2
}
// SetWatching sets status to Watching
func (incident *Incident) SetWatching() {
incident.Status = 3
}
// SetFixed sets status to Fixed
func (incident *Incident) SetFixed() {
incident.Status = 4
}
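
The component-status escalation in the removed `Send` method (partial outage first, major outage if the component is already in partial outage, operational once fixed) can be sketched as a pure function. The function and constant names below are hypothetical; the numeric codes are the ones used in the removed file, and returning the current status for unknown incident states is an assumption of this sketch.

```go
package main

import "fmt"

// Status codes as they appear in the removed incident.go.
const (
	componentOperational   = 1 // set when the incident is fixed
	componentPartialOutage = 3
	componentMajorOutage   = 4
	incidentFixed          = 4
)

// componentStatusFor is a hypothetical helper. Given the incident status
// and the component's current status, it returns the component status
// that Send would report.
func componentStatusFor(incidentStatus, current int) int {
	switch incidentStatus {
	case 1, 2, 3: // investigating, identified, watching
		if current == componentPartialOutage {
			// another monitor already reported a partial outage
			return componentMajorOutage
		}
		return componentPartialOutage
	case incidentFixed:
		return componentOperational
	}
	// assumption: leave the component untouched for unknown incident states
	return current
}

func main() {
	fmt.Println(componentStatusFor(1, componentOperational))
	fmt.Println(componentStatusFor(1, componentPartialOutage))
	fmt.Println(componentStatusFor(incidentFixed, componentMajorOutage))
}
```

This is what makes the "Major Outage if already in Partial Outage" feature work with distributed monitors: the second monitor to open an incident sees the first monitor's partial outage and escalates.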

10
make.sh Executable file

@@ -0,0 +1,10 @@
#!/bin/bash
set -e
if [ "$1" == "test" ]; then
reflex -r '\.go$' -s -d none -- sh -c 'go test ./...'
fi
reflex -r '\.go$' -s -d none -- sh -c 'go build -o ./cachet-monitor ./cli/ && ./cachet-monitor -c config.yml'
exit 0


@@ -1,253 +0,0 @@
package cachet
import (
"sync"
"time"
"github.com/Sirupsen/logrus"
)
const DefaultInterval = time.Second * 60
const DefaultTimeout = time.Second
const DefaultTimeFormat = "15:04:05 Jan 2 MST"
const HistorySize = 10
type MonitorInterface interface {
ClockStart(*CachetMonitor, MonitorInterface, *sync.WaitGroup)
ClockStop()
tick(MonitorInterface)
test() bool
Validate() []string
GetMonitor() *AbstractMonitor
Describe() []string
}
// AbstractMonitor data model
type AbstractMonitor struct {
Name string
Target string
// (default)http, tcp, dns, icmp
Type string
Strict bool
Interval time.Duration
Timeout time.Duration
MetricID int `mapstructure:"metric_id"`
ComponentID int `mapstructure:"component_id"`
// Templating stuff
Template struct {
Investigating MessageTemplate
Fixed MessageTemplate
}
// Threshold = percentage / number of down incidents
Threshold float32
ThresholdCount bool `mapstructure:"threshold_count"`
// lag / average(lagHistory) * 100 = percentage above average lag
// PerformanceThreshold sets the % limit above which this monitor will trigger degraded-performance
PerformanceThreshold float32
history []bool
lagHistory []float32
lastFailReason string
incident *Incident
config *CachetMonitor
// Closed when mon.Stop() is called
stopC chan bool
}
func (mon *AbstractMonitor) Validate() []string {
errs := []string{}
if len(mon.Name) == 0 {
errs = append(errs, "Name is required")
}
if mon.Interval < 1 {
mon.Interval = DefaultInterval
}
if mon.Timeout < 1 {
mon.Timeout = DefaultTimeout
}
if mon.Timeout > mon.Interval {
errs = append(errs, "Timeout greater than interval")
}
if mon.ComponentID == 0 && mon.MetricID == 0 {
errs = append(errs, "component_id & metric_id are unset")
}
if mon.Threshold <= 0 {
mon.Threshold = 100
}
if err := mon.Template.Fixed.Compile(); err != nil {
errs = append(errs, "Could not compile \"fixed\" template: "+err.Error())
}
if err := mon.Template.Investigating.Compile(); err != nil {
errs = append(errs, "Could not compile \"investigating\" template: "+err.Error())
}
return errs
}
func (mon *AbstractMonitor) GetMonitor() *AbstractMonitor {
return mon
}
func (mon *AbstractMonitor) Describe() []string {
features := []string{"Type: " + mon.Type}
if len(mon.Name) > 0 {
features = append(features, "Name: "+mon.Name)
}
return features
}
func (mon *AbstractMonitor) ClockStart(cfg *CachetMonitor, iface MonitorInterface, wg *sync.WaitGroup) {
wg.Add(1)
mon.config = cfg
mon.stopC = make(chan bool)
if cfg.Immediate {
mon.tick(iface)
}
ticker := time.NewTicker(mon.Interval * time.Second)
for {
select {
case <-ticker.C:
mon.tick(iface)
case <-mon.stopC:
wg.Done()
return
}
}
}
func (mon *AbstractMonitor) ClockStop() {
select {
case <-mon.stopC:
return
default:
close(mon.stopC)
}
}
func (mon *AbstractMonitor) test() bool { return false }
// TODO: test
func (mon *AbstractMonitor) tick(iface MonitorInterface) {
reqStart := getMs()
up := iface.test()
lag := getMs() - reqStart
histSize := HistorySize
if mon.ThresholdCount {
histSize = int(mon.Threshold)
}
if len(mon.history) == histSize-1 {
logrus.Warnf("%v is now saturated\n", mon.Name)
}
if len(mon.history) >= histSize {
mon.history = mon.history[len(mon.history)-(histSize-1):]
}
mon.history = append(mon.history, up)
mon.AnalyseData()
// report lag
if mon.MetricID > 0 {
go mon.config.API.SendMetric(mon.MetricID, lag)
}
}
// TODO: test
// AnalyseData decides if the monitor is statistically up or down and creates / resolves an incident
func (mon *AbstractMonitor) AnalyseData() {
// look at the past few incidents
numDown := 0
for _, wasUp := range mon.history {
if wasUp == false {
numDown++
}
}
t := (float32(numDown) / float32(len(mon.history))) * 100
l := logrus.WithFields(logrus.Fields{
"monitor": mon.Name,
"time": time.Now().Format(mon.config.DateFormat),
})
if numDown == 0 {
l.Printf("monitor is up")
} else if mon.ThresholdCount {
l.Printf("monitor down %d/%d", numDown, int(mon.Threshold))
} else {
l.Printf("monitor down %.2f%%/%.2f%%", t, mon.Threshold)
}
histSize := HistorySize
if mon.ThresholdCount {
histSize = int(mon.Threshold)
}
if len(mon.history) != histSize {
// not saturated
return
}
triggered := (mon.ThresholdCount && numDown == int(mon.Threshold)) || (!mon.ThresholdCount && t > mon.Threshold)
if triggered && mon.incident == nil {
// create incident
tplData := getTemplateData(mon)
tplData["FailReason"] = mon.lastFailReason
subject, message := mon.Template.Investigating.Exec(tplData)
mon.incident = &Incident{
Name: subject,
ComponentID: mon.ComponentID,
Message: message,
Notify: true,
}
// is down, create an incident
l.Warnf("creating incident. Monitor is down: %v", mon.lastFailReason)
// set investigating status
mon.incident.SetInvestigating()
// create/update incident
if err := mon.incident.Send(mon.config); err != nil {
l.Printf("Error sending incident: %v", err)
}
return
}
// still triggered or no incident
if triggered || mon.incident == nil {
return
}
// was down, created an incident, its now ok, make it resolved.
l.Warn("Resolving incident")
// resolve incident
tplData := getTemplateData(mon)
tplData["incident"] = mon.incident
subject, message := mon.Template.Fixed.Exec(tplData)
mon.incident.Name = subject
mon.incident.Message = message
mon.incident.SetFixed()
if err := mon.incident.Send(mon.config); err != nil {
l.Printf("Error sending incident: %v", err)
}
mon.lastFailReason = ""
mon.incident = nil
}

143
monitors/dns.go Normal file

@@ -0,0 +1,143 @@
package monitors
import (
"errors"
"net"
"regexp"
"strings"
"github.com/miekg/dns"
"github.com/sirupsen/logrus"
)
// Investigating template
var defaultDNSInvestigatingTpl = MessageTemplate{
Subject: `{{ .Monitor.Name }} - {{ .SystemName }}`,
Message: `{{ .Monitor.Name }} DNS check **failed** (server time: {{ .now }})
{{ .FailReason }}`,
}
// Fixed template
var defaultDNSFixedTpl = MessageTemplate{
Subject: `{{ .Monitor.Name }} - {{ .SystemName }}`,
Message: `**Resolved** - {{ .now }}
- - -
{{ .incident.Message }}`,
}
type DNSAnswer struct {
Regex string
regexp *regexp.Regexp
Exact string
}
type DNSMonitor struct {
AbstractMonitor `mapstructure:",squash"`
// IP:port format or blank to use system defined DNS
DNS string
// A(default), AAAA, MX, ...
Question string
question uint16
Answers []DNSAnswer
}
func (monitor *DNSMonitor) Validate(validate backendValidateFunc) []string {
monitor.Template.Investigating.SetDefault(defaultDNSInvestigatingTpl)
monitor.Template.Fixed.SetDefault(defaultDNSFixedTpl)
errs := monitor.AbstractMonitor.Validate(validate)
if len(monitor.DNS) == 0 {
// fall back to the system resolver; skip it if resolv.conf cannot be read
config, err := dns.ClientConfigFromFile("/etc/resolv.conf")
if err == nil && len(config.Servers) > 0 {
monitor.DNS = net.JoinHostPort(config.Servers[0], config.Port)
}
}
if len(monitor.DNS) == 0 {
monitor.DNS = "8.8.8.8:53"
}
if len(monitor.Question) == 0 {
monitor.Question = "A"
}
monitor.Question = strings.ToUpper(monitor.Question)
monitor.question = findDNSType(monitor.Question)
if monitor.question == 0 {
errs = append(errs, "Could not look up DNS question type")
}
for i, a := range monitor.Answers {
if len(a.Regex) > 0 {
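// report an invalid regex instead of silently falling back to an exact match
re, err := regexp.Compile(a.Regex)
if err != nil {
errs = append(errs, "Could not compile DNS answer regex: "+err.Error())
continue
}
monitor.Answers[i].regexp = re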
}
}
return errs
}
func (monitor *DNSMonitor) test() (bool, []error) {
m := new(dns.Msg)
m.SetQuestion(dns.Fqdn(monitor.Target), monitor.question)
m.RecursionDesired = true
c := new(dns.Client)
r, _, err := c.Exchange(m, monitor.DNS)
if err != nil {
logrus.Warnf("DNS error: %v", err)
return false, []error{err}
}
if r.Rcode != dns.RcodeSuccess {
return false, []error{errors.New("Invalid status code returned")}
}
for _, check := range monitor.Answers {
found := false
for _, answer := range r.Answer {
found = matchAnswer(answer, check)
if found {
break
}
}
if !found {
logrus.Warnf("DNS check failed: %v. Not found in any of %v", check, r.Answer)
return false, []error{errors.New("Record not found")}
}
}
return true, nil
}
func findDNSType(t string) uint16 {
for rr, strType := range dns.TypeToString {
if t == strType {
return rr
}
}
return 0
}
func matchAnswer(answer dns.RR, check DNSAnswer) bool {
fields := []string{}
for i := 0; i < dns.NumField(answer); i++ {
fields = append(fields, dns.Field(answer, i+1))
}
str := strings.Join(fields, " ")
if check.regexp != nil {
return check.regexp.Match([]byte(str))
}
return str == check.Exact
}
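
The exact/regex answer matching above can be tried without a DNS server. The sketch below is an assumption-laden simplification: `answerCheck` and `matches` are hypothetical names, the record is passed in as the already-joined string (for MX: preference, then host), and the regex is compiled per call rather than once during validation.

```go
package main

import (
	"fmt"
	"regexp"
)

// answerCheck is a hypothetical stand-in for the DNSAnswer config entry.
type answerCheck struct {
	Regex string
	Exact string
}

// matches reports whether a record string satisfies the check.
// A non-empty Regex takes precedence over Exact, as in matchAnswer.
func matches(record string, check answerCheck) (bool, error) {
	if check.Regex != "" {
		re, err := regexp.Compile(check.Regex)
		if err != nil {
			return false, err
		}
		return re.MatchString(record), nil
	}
	return record == check.Exact, nil
}

func main() {
	regexCheck := answerCheck{Regex: "[1-9] alt[1-9].aspmx.l.google.com."}
	exactCheck := answerCheck{Exact: "10 aspmx2.googlemail.com."}

	for _, record := range []string{
		"5 alt1.aspmx.l.google.com.",
		"15 alt1.aspmx.l.google.com.", // also matches: the pattern is unanchored
		"10 aspmx2.googlemail.com.",
	} {
		byRegex, _ := matches(record, regexCheck)
		byExact, _ := matches(record, exactCheck)
		fmt.Printf("%q regex=%t exact=%t\n", record, byRegex, byExact)
	}
}
```

Because the pattern is not anchored, a two-digit preference such as `15` still matches on its trailing digit; wrapping the pattern in `^` and `$` is one way to tighten it if that matters.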


@@ -1,7 +1,8 @@
package cachet package monitors
import ( import (
"crypto/tls" "crypto/tls"
"errors"
"io/ioutil" "io/ioutil"
"net/http" "net/http"
"regexp" "regexp"
@@ -13,7 +14,7 @@ import (
// Investigating template // Investigating template
var defaultHTTPInvestigatingTpl = MessageTemplate{ var defaultHTTPInvestigatingTpl = MessageTemplate{
Subject: `{{ .Monitor.Name }} - {{ .SystemName }}`, Subject: `{{ .Monitor.Name }} - {{ .SystemName }}`,
Message: `{{ .Monitor.Name }} check **failed** (server time: {{ .now }}) Message: `{{ .Monitor.Name }} HTTP check **failed** (server time: {{ .now }})
{{ .FailReason }}`, {{ .FailReason }}`,
} }
@@ -41,14 +42,16 @@ type HTTPMonitor struct {
} }
// TODO: test // TODO: test
func (monitor *HTTPMonitor) test() bool { func (monitor *HTTPMonitor) test() (bool, []error) {
req, err := http.NewRequest(monitor.Method, monitor.Target, nil) req, err := http.NewRequest(monitor.Method, monitor.Target, nil)
for k, v := range monitor.Headers { for k, v := range monitor.Headers {
req.Header.Add(k, v) req.Header.Add(k, v)
} }
transport := http.DefaultTransport.(*http.Transport) transport := http.DefaultTransport.(*http.Transport)
transport.TLSClientConfig = &tls.Config{InsecureSkipVerify: monitor.Strict == false} transport.TLSClientConfig = &tls.Config{
InsecureSkipVerify: monitor.Strict == false,
}
client := &http.Client{ client := &http.Client{
Timeout: time.Duration(monitor.Timeout * time.Second), Timeout: time.Duration(monitor.Timeout * time.Second),
Transport: transport, Transport: transport,
@@ -56,40 +59,38 @@ func (monitor *HTTPMonitor) test() bool {
resp, err := client.Do(req) resp, err := client.Do(req)
if err != nil { if err != nil {
monitor.lastFailReason = err.Error() return false, []error{err}
return false
} }
defer resp.Body.Close() defer resp.Body.Close()
if monitor.ExpectedStatusCode > 0 && resp.StatusCode != monitor.ExpectedStatusCode { if monitor.ExpectedStatusCode > 0 && resp.StatusCode != monitor.ExpectedStatusCode {
monitor.lastFailReason = "Expected HTTP response status: " + strconv.Itoa(monitor.ExpectedStatusCode) + ", got: " + strconv.Itoa(resp.StatusCode) fail := "Expected HTTP response status: " + strconv.Itoa(monitor.ExpectedStatusCode) + ", got: " + strconv.Itoa(resp.StatusCode)
return false return false, []error{errors.New(fail)}
} }
if monitor.bodyRegexp != nil { if monitor.bodyRegexp != nil {
// check response body // check response body
responseBody, err := ioutil.ReadAll(resp.Body) responseBody, err := ioutil.ReadAll(resp.Body)
if err != nil { if err != nil {
monitor.lastFailReason = err.Error() return false, []error{err}
return false
} }
if !monitor.bodyRegexp.Match(responseBody) { if !monitor.bodyRegexp.Match(responseBody) {
monitor.lastFailReason = "Unexpected body: " + string(responseBody) + ".\nExpected to match: " + monitor.ExpectedBody fail := "Unexpected body: " + string(responseBody) + ".\nExpected to match: " + monitor.ExpectedBody
return false return false, []error{errors.New(fail)}
} }
} }
return true return true, nil
} }
// TODO: test // TODO: test
func (mon *HTTPMonitor) Validate() []string { func (mon *HTTPMonitor) Validate(validate backendValidateFunc) []string {
mon.Template.Investigating.SetDefault(defaultHTTPInvestigatingTpl) mon.Template.Investigating.SetDefault(defaultHTTPInvestigatingTpl)
mon.Template.Fixed.SetDefault(defaultHTTPFixedTpl) mon.Template.Fixed.SetDefault(defaultHTTPFixedTpl)
errs := mon.AbstractMonitor.Validate() errs := mon.AbstractMonitor.Validate(validate)
if len(mon.ExpectedBody) > 0 { if len(mon.ExpectedBody) > 0 {
exp, err := regexp.Compile(mon.ExpectedBody) exp, err := regexp.Compile(mon.ExpectedBody)

257
monitors/monitor.go Normal file

@@ -0,0 +1,257 @@
package monitors
import (
"sync"
"time"
"github.com/sirupsen/logrus"
)
const DefaultInterval = time.Second * 60
const DefaultTimeout = time.Second
const HistorySize = 10
type MonitorStatus string
const (
MonitorStatusUp MonitorStatus = "up"
MonitorStatusDown MonitorStatus = "down"
MonitorStatusNotSaturated MonitorStatus = "unsaturated"
)
type backendValidateFunc = func(monitor *AbstractMonitor) []string
type MonitorTestFunc func() (up bool, errs []error)
type MonitorTickFunc func(monitor MonitorInterface, status MonitorStatus, errs []error, lag int64)
type MonitorInterface interface {
Start(MonitorTestFunc, *sync.WaitGroup, MonitorTickFunc, bool)
Stop()
tick(MonitorTestFunc) (status MonitorStatus, errors []error, lag int64)
test() (bool, []error)
Validate(validate backendValidateFunc) []string
Describe() []string
GetMonitor() *AbstractMonitor
GetTestFunc() MonitorTestFunc
GetLastStatus() MonitorStatus
UpdateLastStatus(status MonitorStatus) (old MonitorStatus)
}
// AbstractMonitor data model
type AbstractMonitor struct {
Name string
Target string
// (default)http / dns
Type string
Strict bool
Interval time.Duration
Timeout time.Duration
Params map[string]interface{}
// Templating stuff
Template MonitorTemplates
// Threshold = percentage / number of down incidents
Threshold float32
ThresholdCount bool `mapstructure:"threshold_count"`
// lag / average(lagHistory) * 100 = percentage above average lag
// PerformanceThreshold sets the % limit above which this monitor will trigger degraded-performance
// PerformanceThreshold float32
history []bool
lastStatus MonitorStatus
// Closed when mon.Stop() is called
stopC chan bool
}
func (mon *AbstractMonitor) Validate(validate backendValidateFunc) []string {
errs := []string{}
if len(mon.Name) == 0 {
errs = append(errs, "Name is required")
}
if mon.Interval < 1 {
mon.Interval = DefaultInterval
}
if mon.Timeout < 1 {
mon.Timeout = DefaultTimeout
}
if mon.Timeout > mon.Interval {
errs = append(errs, "Timeout greater than interval")
}
// get the backend to validate the monitor
errs = append(errs, validate(mon)...)
if mon.Threshold <= 0 {
mon.Threshold = 100
}
// if len(mon.Template.Fixed.Message) == 0 || len(mon.Template.Fixed.Subject) == 0 {
// errs = append(errs, "\"fixed\" template empty/missing")
// }
// if len(mon.Template.Investigating.Message) == 0 || len(mon.Template.Investigating.Subject) == 0 {
// errs = append(errs, "\"investigating\" template empty/missing")
// }
if err := mon.Template.Fixed.Compile(); err != nil {
errs = append(errs, "Could not compile \"fixed\" template: "+err.Error())
}
if err := mon.Template.Investigating.Compile(); err != nil {
errs = append(errs, "Could not compile \"investigating\" template: "+err.Error())
}
return errs
}
func (mon *AbstractMonitor) GetMonitor() *AbstractMonitor {
return mon
}
func (mon *AbstractMonitor) Describe() []string {
features := []string{"Type: " + mon.Type}
if len(mon.Name) > 0 {
features = append(features, "Name: "+mon.Name)
}
return features
}
func (mon *AbstractMonitor) Start(testFunc MonitorTestFunc, wg *sync.WaitGroup, tickFunc MonitorTickFunc, immediate bool) {
wg.Add(1)
mon.stopC = make(chan bool)
if immediate {
status, errs, lag := mon.tick(testFunc)
tickFunc(mon, status, errs, lag)
}
ticker := time.NewTicker(mon.Interval * time.Second)
for {
select {
case <-ticker.C:
status, errs, lag := mon.tick(testFunc)
tickFunc(mon, status, errs, lag)
case <-mon.stopC:
wg.Done()
return
}
}
}
func (mon *AbstractMonitor) Stop() {
select {
case <-mon.stopC:
return
default:
close(mon.stopC)
}
}
func (mon *AbstractMonitor) tick(testFunc MonitorTestFunc) (status MonitorStatus, errors []error, lag int64) {
reqStart := getMs()
up, errs := testFunc()
lag = getMs() - reqStart
histSize := HistorySize
if mon.ThresholdCount {
histSize = int(mon.Threshold)
}
if len(mon.history) == histSize-1 {
logrus.WithFields(logrus.Fields{
"monitor": mon.Name,
}).Warn("monitor saturated")
}
if len(mon.history) >= histSize {
mon.history = mon.history[len(mon.history)-(histSize-1):]
}
mon.history = append(mon.history, up)
status = mon.GetStatus()
errors = errs
return
}
// TODO: test
// GetStatus decides if the monitor is statistically up or down based on its recent history
func (mon *AbstractMonitor) GetStatus() MonitorStatus {
numDown := 0
for _, wasUp := range mon.history {
if wasUp == false {
numDown++
}
}
t := (float32(numDown) / float32(len(mon.history))) * 100
logFields := logrus.Fields{"monitor": mon.Name}
// stop reporting time for jsonformatter, it's there by default
if _, ok := logrus.StandardLogger().Formatter.(*logrus.JSONFormatter); !ok {
logFields["t"] = time.Now()
}
l := logrus.WithFields(logFields)
symbol := "⚠️"
if t == 100 {
symbol = "❌"
}
if numDown == 0 {
l.Printf("👍 up")
} else if mon.ThresholdCount {
l.Printf("%v down (%d/%d)", symbol, numDown, int(mon.Threshold))
} else {
l.Printf("%v down %.0f%%/%.0f%%", symbol, t, mon.Threshold)
}
histSize := HistorySize
if mon.ThresholdCount {
histSize = int(mon.Threshold)
}
if len(mon.history) != histSize {
// not saturated
return MonitorStatusNotSaturated
}
var down bool
if mon.ThresholdCount {
down = numDown >= int(mon.Threshold)
} else {
down = t >= mon.Threshold
}
if !down {
return MonitorStatusUp
}
return MonitorStatusDown
}
func (mon *AbstractMonitor) GetTestFunc() MonitorTestFunc {
return mon.test
}
func (mon *AbstractMonitor) GetLastStatus() MonitorStatus {
return mon.lastStatus
}
func (mon *AbstractMonitor) UpdateLastStatus(status MonitorStatus) (old MonitorStatus) {
old = mon.lastStatus
mon.lastStatus = status
return
}
func (mon *AbstractMonitor) test() (bool, []error) { return false, nil }
func getMs() int64 {
return time.Now().UnixNano() / int64(time.Millisecond)
}
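
The sliding history window and percentage threshold used by `tick` and `GetStatus` can be sketched on their own. This is a simplified illustration, not the project's code: `window`, `record`, and `status` are hypothetical names, and the `threshold_count` mode and logging are left out.

```go
package main

import "fmt"

// historySize mirrors the HistorySize constant from the diff.
const historySize = 10

// window is a hypothetical container for recent check results.
type window struct {
	history []bool
}

// record appends a check result, keeping at most historySize entries.
func (w *window) record(up bool) {
	if len(w.history) >= historySize {
		// drop the oldest entries so the append below lands on historySize
		w.history = w.history[len(w.history)-(historySize-1):]
	}
	w.history = append(w.history, up)
}

// status returns "unsaturated" until the window is full, then "down" when
// the percentage of failed checks reaches the threshold, otherwise "up".
func (w *window) status(threshold float32) string {
	if len(w.history) != historySize {
		return "unsaturated"
	}
	numDown := 0
	for _, wasUp := range w.history {
		if !wasUp {
			numDown++
		}
	}
	pct := float32(numDown) / float32(len(w.history)) * 100
	if pct >= threshold {
		return "down"
	}
	return "up"
}

func main() {
	w := &window{}
	for i := 0; i < 15; i++ {
		w.record(i < 5) // the first five checks succeed, the rest fail
		fmt.Printf("check %2d: %s\n", i+1, w.status(80))
	}
}
```

A practical consequence of the saturation rule is that no status other than "unsaturated" is reported until ten checks have run, so with a long interval the first incident can only open after ten intervals.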


@@ -1,10 +1,15 @@
package cachet package monitors
import ( import (
"bytes" "bytes"
"text/template" "text/template"
) )
type MonitorTemplates struct {
Investigating MessageTemplate
Fixed MessageTemplate
}
type MessageTemplate struct { type MessageTemplate struct {
Subject string `json:"subject"` Subject string `json:"subject"`
Message string `json:"message"` Message string `json:"message"`

232
readme.md

@@ -1,123 +1,173 @@
![screenshot](https://castawaylabs.github.io/cachet-monitor/screenshot.png) ![screenshot](https://castawaylabs.github.io/cachet-monitor/screenshot.png)
Features ## Features
--------
- [x] Creates & Resolves Incidents - [x] Creates & Resolves Incidents
- [x] Check URLs by response code and/or body contents
- [x] Posts monitor lag to cachet graphs - [x] Posts monitor lag to cachet graphs
- [x] HTTP Checks (body/status code)
- [x] DNS Checks
- [x] Updates Component to Partial Outage - [x] Updates Component to Partial Outage
- [x] Updates Component to Major Outage if already in Partial Outage (works well with distributed monitoring) - [x] Updates Component to Major Outage if already in Partial Outage (works with distributed monitors)
- [x] Can be run on multiple servers and geo regions - [x] Can be run on multiple servers and geo regions
Configuration ## Example Configuration
-------------
``` **Note:** configuration can be in json or yaml format. [`example.config.json`](https://github.com/CastawayLabs/cachet-monitor/blob/master/example.config.json), [`example.config.yaml`](https://github.com/CastawayLabs/cachet-monitor/blob/master/example.config.yml) files.
{
// URL for the API. Note: Must end with /api/v1 ```yaml
"api_url": "https://<cachet domain>/api/v1", api:
// Your API token for Cachet # cachet url
"api_token": "<cachet api token>", url: https://demo.cachethq.io/api/v1
// optional, false default, set if your certificate is self-signed/untrusted # cachet api token
"insecure_api": false, token: 9yMHsdioQosnyVK4iCVR
"monitors": [{ insecure: false
// required, friendly name for your monitor # https://golang.org/src/time/format.go#L57
"name": "Name of your monitor", date_format: 02/01/2006 15:04:05 MST
// required, url to probe monitors:
"url": "Ping URL", # http monitor example
// optional, http method (defaults GET) - name: google
"method": "get", # test url
// optional, http Headers to add (default none) target: https://google.com
"headers": [ # strict certificate checking for https
// specify Name and Value of Http-Header, eg. Authorization strict: true
{ "name": "Authorization", "value": "Basic <hash>" } # HTTP method
], method: POST
// self-signed ssl certificate
"strict_tls": true, # set to update component (either component_id or metric_id are required)
// seconds between checks component_id: 1
"interval": 10, # set to post lag to cachet metric (graph)
// seconds for http timeout metric_id: 4
"timeout": 5,
// post lag to cachet metric (graph) # custom templates (see readme for details)
// note either metric ID or component ID are required # leave empty for defaults
"metric_id": <metric id>, template:
// post incidents to this component investigating:
"component_id": <component id>, subject: "{{ .Monitor.Name }} - {{ .SystemName }}"
// If % of downtime is over this threshold, open an incident message: "{{ .Monitor.Name }} check **failed** (server time: {{ .now }})\n\n{{ .FailReason }}"
"threshold": 80, fixed:
// optional, expected status code (either status code or body must be supplied) subject: "I HAVE BEEN FIXED"
"expected_status_code": 200,
// optional, regular expression to match body content # seconds between checks
"expected_body": "P.*NG" interval: 1
}], # seconds for timeout
// optional, system name to identify bot (uses hostname by default) timeout: 1
"system_name": "", # If % of downtime is over this threshold, open an incident
// optional, defaults to stdout threshold: 80
"log_path": ""
} # custom HTTP headers
headers:
Authorization: Basic <hash>
# expected status code (either status code or body must be supplied)
expected_status_code: 200
# regex to match body
expected_body: "P.*NG"
# dns monitor example
- name: dns
# fqdn
target: matej.me.
# question type (A/AAAA/CNAME/...)
question: mx
type: dns
# set component_id/metric_id
component_id: 2
# poll every 1s
interval: 1
timeout: 1
# custom DNS server (defaults to system)
dns: 8.8.4.4:53
answers:
# exact/regex check
- regex: "[1-9] alt[1-9].aspmx.l.google.com."
- exact: 10 aspmx2.googlemail.com.
- exact: 1 aspmx.l.google.com.
- exact: 10 aspmx3.googlemail.com.
``` ```
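
The `exact` and `regex` answer checks in the DNS example can be illustrated with a small sketch. The `answerCheck` type and its `matches` method are hypothetical and are not taken from the package; how the monitor combines several checks into an overall result is not modelled here.

```go
package main

import (
	"fmt"
	"regexp"
)

// answerCheck is a hypothetical stand-in for one entry under `answers`.
type answerCheck struct {
	Exact string
	Regex string
}

// matches reports whether a DNS answer satisfies the check.
// A non-empty Regex takes precedence over Exact in this sketch.
func (c answerCheck) matches(answer string) (bool, error) {
	if c.Regex != "" {
		return regexp.MatchString(c.Regex, answer)
	}
	return c.Exact == answer, nil
}

func main() {
	checks := []answerCheck{
		{Regex: "[1-9] alt[1-9].aspmx.l.google.com."},
		{Exact: "1 aspmx.l.google.com."},
	}
	for _, c := range checks {
		ok, err := c.matches("1 aspmx.l.google.com.")
		fmt.Println(ok, err)
	}
}
```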
Installation ## Installation
------------
1. Download binary from [release page](https://github.com/CastawayLabs/cachet-monitor/releases) 1. Download binary from [release page](https://github.com/CastawayLabs/cachet-monitor/releases)
2. Create your configuration ([example](https://raw.githubusercontent.com/CastawayLabs/cachet-monitor/master/example.config.json)) 2. Add the binary to an executable path (/usr/bin, etc.)
3. `cachet-monitor -c /etc/cachet-monitor.config.json` 3. Create a configuration following provided examples
4. `cachet-monitor -c /etc/cachet-monitor.yaml`
pro tip: run in background using `nohup cachet-monitor 2>&1 > /var/log/cachet-monitor.log &` pro tip: run in background using `nohup cachet-monitor 2>&1 > /var/log/cachet-monitor.log &`, or use a tmux/screen session
``` ```
Usage of cachet-monitor: Usage:
-c="/etc/cachet-monitor.config.json": Config path cachet-monitor (-c PATH | --config PATH) [--log=LOGPATH] [--name=NAME] [--immediate]
-log="": Log path cachet-monitor -h | --help | --version
-name="": System Name
Arguments:
PATH path to config.json
LOGPATH path to log output (defaults to STDOUT)
NAME name of this logger
Examples:
cachet-monitor -c /root/cachet-monitor.json
cachet-monitor -c /root/cachet-monitor.json --log=/var/log/cachet-monitor.log --name="development machine"
Options:
-c PATH.json --config PATH Path to configuration file
-h --help Show this screen.
--version Show version
--immediate Tick immediately (by default waits for first defined interval)
Environment variables:
CACHET_API override API url from configuration
CACHET_TOKEN override API token from configuration
CACHET_DEV set to enable dev logging
``` ```
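
The environment variables listed in the usage text override values from the configuration file. A rough sketch of that override is shown here, assuming an empty variable means "keep the configured value". The `apiConfig` type and `applyEnvOverrides` function are hypothetical and do not come from the package.

```go
package main

import (
	"fmt"
	"os"
)

// apiConfig is a hypothetical stand-in for the `api` section of the config.
type apiConfig struct {
	URL   string
	Token string
}

// applyEnvOverrides replaces configured values with CACHET_API and
// CACHET_TOKEN when they are set. getenv is injected so it can be faked.
func applyEnvOverrides(cfg apiConfig, getenv func(string) string) apiConfig {
	if v := getenv("CACHET_API"); v != "" {
		cfg.URL = v
	}
	if v := getenv("CACHET_TOKEN"); v != "" {
		cfg.Token = v
	}
	return cfg
}

func main() {
	cfg := apiConfig{URL: "https://demo.cachethq.io/api/v1", Token: "from-config"}
	cfg = applyEnvOverrides(cfg, os.Getenv)
	fmt.Println(cfg.URL)
}
```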
Environment variables ## Init script
---------------------
| Name | Example Value | Description | If your system is running systemd (like Debian, Ubuntu 16.04, Fedora, RHEL7, or Archlinux) you can use the provided example file: [example.cachet-monitor.service](https://github.com/CastawayLabs/cachet-monitor/blob/master/example.cachet-monitor.service).
| ------------ | ------------------------------ | --------------------------- |
| CACHET_API | http://demo.cachethq.io/api/v1 | URL endpoint for cachet api |
| CACHET_TOKEN | APIToken123 | API Authentication token |
| CACHET_DEV | 1 | Strips logging |
Vision and goals 1. Simply put it in the right place with `cp example.cachet-monitor.service /etc/systemd/system/cachet-monitor.service`
---------------- 2. Then do a `systemctl daemon-reload` in your terminal to update Systemd configuration
3. Finally you can start cachet-monitor on every startup with `systemctl enable cachet-monitor.service`! 👍
## Templates
This package makes use of [`text/template`](https://godoc.org/text/template). [Default HTTP template](https://github.com/CastawayLabs/cachet-monitor/blob/master/http.go#L14)
The following variables are available:
| Root objects | Description |
| ------------- | ------------------------------------|
| `.SystemName` | system name |
| `.API` | `api` object from configuration |
| `.Monitor` | `monitor` object from configuration |
| `.now` | formatted date string |
| Monitor variables |
| ------------------ |
| `.Name` |
| `.Target` |
| `.Type` |
| `.Strict` |
| `.MetricID` |
| ... |
All monitor variables are available from `monitor.go`
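
As an illustration of how these variables resolve, here is a minimal `text/template` sketch. The `monitor` struct, the `render` helper, and the sample values are assumptions made for the example; the data is passed as a map because the lowercase `.now` key can only be looked up as a map entry, and the package's real rendering code may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// monitor is a hypothetical stand-in for the monitor configuration object.
type monitor struct {
	Name   string
	Target string
}

// render parses a template string and executes it against the given data.
func render(text string, data map[string]interface{}) (string, error) {
	tmpl, err := template.New("message").Parse(text)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	data := map[string]interface{}{
		"SystemName": "example-host",
		"Monitor":    monitor{Name: "google", Target: "https://google.com"},
		"now":        "27/07/2019 23:57:06 UTC",
	}
	out, err := render("{{ .Monitor.Name }} - {{ .SystemName }} ({{ .now }})", data)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```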
## Vision and goals
We made this tool because we felt the need to have our own monitoring software (leveraging on Cachet). We made this tool because we felt the need to have our own monitoring software (leveraging on Cachet).
The idea is a stateless program which collects data and pushes it to a central cachet instance. The idea is a stateless program which collects data and pushes it to a central cachet instance.
This gives us power to have an army of geographically distributed loggers and reveal issues in both latency & downtime on client websites. This gives us power to have an army of geographically distributed loggers and reveal issues in both latency & downtime on client websites.
Package usage ## Package usage
-------------
When using `cachet-monitor` as a package in another program, you should follow what `cli/main.go` does. It is important to call `ValidateConfiguration` on `CachetMonitor` and all the monitors inside. When using `cachet-monitor` as a package in another program, you should follow what `cli/main.go` does. It is important to call `Validate` on `CachetMonitor` and all the monitors inside.
[API Documentation](https://godoc.org/github.com/CastawayLabs/cachet-monitor) [API Documentation](https://godoc.org/github.com/CastawayLabs/cachet-monitor)
## License # Contributions welcome
MIT License We'll happily accept contributions for the following (non-exhaustive list).
Copyright (c) 2016 Castaway Labs LLC - Implement ICMP check
- Implement TCP check
Permission is hereby granted, free of charge, to any person obtaining a copy - Any bug fixes / code improvements
of this software and associated documentation files (the "Software"), to deal - Test cases
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

15
tcp.go

@@ -1,15 +0,0 @@
package cachet
type TCPMonitor struct {
AbstractMonitor `mapstructure:",squash"`
// same as output from net.JoinHostPort
// defaults to parsed config from /etc/resolv.conf when empty
DNSServer string
// Will be converted to FQDN
Domain string
Type string
// expected answers (regex)
Expect []string
}