Prometheus is a flexible monitoring solution that has been in development since 2012. The software stores all its data in a time series database and offers a multi-dimensional data model and a powerful query language for generating reports on the monitored resources.
This tutorial makes no assumptions about previous knowledge, other than:
You are comfortable with a Linux operating system, specifically Ubuntu 20.04
You are able to ssh into your node, as all operations will be done from the command line
Monitoring is immensely important for ensuring the liveness and reliability of your infrastructure. If your validator is not signing blocks, it will eventually get slashed, losing you and your delegators part of their SCRT balance. The same applies to full nodes: it is important that they are able to serve queries, because if they are down, the performance of dApps and other applications will be limited.
Monitoring is best done by a dedicated piece of software that provides both analytics and alerts. Some of those options are laid out below to help you set them up. Consider relying on more than one monitoring solution and leveraging external RPCs to secure your setup even further.
Prometheus
Grafana
Docker
PagerDuty
GoAccess
As Prometheus only collects metrics that are exposed to it, we want to extend its capabilities by adding Node Exporter, a tool that collects information about the system, including CPU, disk, and memory usage, and exposes it for scraping.
Download the latest version of Node Exporter:
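For example, assuming version 1.2.2 (check the Node Exporter releases page for the current version):
wget https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz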
Unpack the downloaded archive. This will create a directory node_exporter-1.2.2.linux-amd64, containing the executable, a readme and a license file:
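For instance:
tar xvfz node_exporter-1.2.2.linux-amd64.tar.gz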
Copy the binary file into the directory /usr/local/bin and set the ownership to the user you created previously:
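A sketch of these two steps, assuming the service user is called node_exporter:
sudo cp node_exporter-1.2.2.linux-amd64/node_exporter /usr/local/bin
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter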
Remove the leftover files of Node Exporter, as they are not needed any longer:
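For example:
rm -rf node_exporter-1.2.2.linux-amd64 node_exporter-1.2.2.linux-amd64.tar.gz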
To run Node Exporter automatically on each boot, a Systemd service file is required. Create the following file by opening it in Nano:
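A common location for the unit file is /etc/systemd/system:
sudo nano /etc/systemd/system/node_exporter.service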
Copy the following information in the service file, save it and exit Nano:
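A minimal unit file sketch, assuming the node_exporter user and group created earlier:
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target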
Collectors are used to gather information about the system. By default a set of collectors is activated. You can see the details about the set in the README file. If you want to use a specific set of collectors, you can define them in the ExecStart section of the service. Collectors are enabled by providing a --collector.<name> flag. Collectors that are enabled by default can be disabled by providing a --no-collector.<name> flag.
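For example, an ExecStart line that enables the systemd collector and disables the wifi collector could look like this:
ExecStart=/usr/local/bin/node_exporter --collector.systemd --no-collector.wifi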
Reload Systemd to use the newly defined service:
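That is:
sudo systemctl daemon-reload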
Run Node Exporter by typing the following command:
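Assuming the unit file name used above:
sudo systemctl start node_exporter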
Verify that the software has been started successfully:
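For example:
sudo systemctl status node_exporter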
You will see an output like this, showing you the status active (running) as well as the main PID of the application:
If everything is working, enable Node Exporter to be started on each boot of the server:
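For example:
sudo systemctl enable node_exporter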
Install Grafana on our instance; it will query our Prometheus server.
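One common way to install it on Ubuntu is from Grafana's own APT repository (a sketch; check the Grafana documentation for the currently recommended repository and key handling):
sudo apt-get install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install grafana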
Enable the automatic start of Grafana by systemd:
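With systemd this is typically:
sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server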
Grafana is running now, and we can connect to it at http://your.server.ip:3000. The default user and password is admin / admin.
Now you have to create a Prometheus data source:
Click the Grafana logo to open the sidebar.
Click “Data Sources” in the sidebar.
Choose “Add New”.
Select “Prometheus” as the data source.
Set the Prometheus server URL (in our case: http://localhost:9090/).
Click “Add” to test the connection and to save the new data source.
Docker and Docker Compose will allow you to run the required monitoring applications with a few commands. These instructions will run the following:
Grafana on port 3000: An open source interactive analytics dashboard.
Prometheus on port 9090: An open source metric collector.
Node Exporter on port 9100: An open source hardware metric exporter.
Finally, we're going to install a basic dashboard for Cosmos SDK nodes. For further reference in these steps, see: https://github.com/zhangyelong/cosmos-dashboard
After restarting your node, you should be able to access the Tendermint metrics (default port is 26660): http://localhost:26660
Append a job under the scrape_configs section of your prometheus.yml:
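A minimal sketch of such a job, assuming the node exposes Tendermint metrics on the same host at port 26660 (the job name is illustrative):
  - job_name: 'secret-node'
    static_configs:
      - targets: ['localhost:26660']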
Copy and paste the Grafana Dashboard ID 11036 or the content of cosmos-dashboard.json, then click Load to complete the import.
Set chain-id to secret-3
You're done!
Download and unpack the latest release of Prometheus:
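For example, assuming version 2.29.2 (check the Prometheus releases page for the current version):
wget https://github.com/prometheus/prometheus/releases/download/v2.29.2/prometheus-2.29.2.linux-amd64.tar.gz
tar xvfz prometheus-2.29.2.linux-amd64.tar.gz
cd prometheus-2.29.2.linux-amd64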
The following two binaries are in the directory:
prometheus - the main Prometheus binary
promtool - a utility for checking the configuration
The following two folders (which contain the web interface, configuration file examples and the license) are in the directory:
consoles
console_libraries
Copy the binary files into the /usr/local/bin/ directory:
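For example:
sudo cp prometheus promtool /usr/local/bin/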
Set the ownership of these files to the prometheus user previously created:
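For example:
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool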
Copy the consoles and console_libraries directories to /etc/prometheus:
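For example:
sudo cp -r consoles /etc/prometheus
sudo cp -r console_libraries /etc/prometheus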
Set the ownership of the two folders, as well as of all files that they contain, to our prometheus user:
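For example:
sudo chown -R prometheus:prometheus /etc/prometheus/consoles
sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries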
In our home folder, remove the source files that are not needed anymore:
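Assuming the archive was downloaded to the home folder and the version used above:
cd ..
rm -rf prometheus-2.29.2.linux-amd64 prometheus-2.29.2.linux-amd64.tar.gz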
The docker images expose the following ports:
3000: Grafana. Your main dashboard. Default login is admin/admin.
9090: Prometheus. Access to this port should be restricted.
9100: Node Exporter. Access to this port should be restricted.
Your secret node metrics on port 26660 should also be restricted.
If you followed the basic security guide, these ports are already restricted. You will need to allow the grafana port:
sudo ufw allow 3000
You can also allow access from a specific IP if desired:
sudo ufw allow from 123.123.123.123 to any port 3000
Clone the node_tooling repository and descend into the monitoring folder:
In the Prometheus folder, modify cosmos.yaml, replacing NODE_IP with the IP of your node. (If your node is on the docker host machine, use 172.17.0.1.)
Replace the default Prometheus config with the modified cosmos.yaml
You will need to install docker and docker-compose.
The following instructions assume Ubuntu 20.04 on an x86-64 CPU.
Update the apt package index and install packages to allow apt to use a repository over HTTPS:
Add Docker’s official GPG key:
Set up the Docker stable repository:
Install Docker:
Test the installation:
Download the current stable release of Docker Compose:
Apply executable permissions to the binary:
Test the installation:
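Taken together, these steps typically correspond to the following commands (a sketch based on the official Docker documentation for Ubuntu; versions and key handling may have changed since, so check the Docker docs for current instructions):
# Update apt and install prerequisites
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Set up the stable repository
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine and test it
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
sudo docker run hello-world
# Install Docker Compose (replace 1.29.2 with the current stable release), make it executable and test it
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version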
The dashboard for Cosmos SDK nodes is pre-installed. To use it:
Enable Tendermint metrics in your secret-node
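This is usually done in the node's Tendermint configuration file (commonly ~/.secretd/config/config.toml, though the exact path depends on your setup) by turning on the Prometheus instrumentation and then restarting the node:
[instrumentation]
prometheus = true
prometheus_listen_addr = ":26660"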
After restarting your node, you should be able to access the Tendermint metrics (default port is 26660):
If you did not replace NODE_IP with the IP of your node in the Prometheus config, do so now. If your node is on the docker host machine, use 172.17.0.1.
Log in to Grafana and open the Cosmos Dashboard from the dashboards page.
Set the chain-id to secret-3
Start the containers deploying the monitoring stack (Grafana + Prometheus + Node Exporter):
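Assuming the docker-compose file provided in the node_tooling/monitoring folder, this is typically:
docker-compose up -d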
Log in to Grafana at http://your-ip:3000 with username admin and password admin.
The containers will restart automatically after rebooting unless they are stopped manually.
From the node_tooling/monitoring directory:
Configure Nginx to format logs and set up a server block.
Open the Nginx configuration file:
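For example:
sudo nano /etc/nginx/nginx.conf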
Add the following log format into your http group in nginx:
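A sketch of a suitable log format (this is the conventional combined-style format; adjust the fields to taste):
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" "$http_user_agent"';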
Warning: this logs the user's IP address directly. It is not recommended to do it in this fashion; if possible, anonymize the address as shown below.
(optional) Instead anonymize IP addresses in logs:
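One common approach is to mask the last octet of IPv4 addresses (and the host part of IPv6 addresses) with a map block and log the masked variable instead; a sketch:
map $remote_addr $remote_addr_anon {
    ~(?P<ip>\d+\.\d+\.\d+)\.    $ip.0;
    ~(?P<ip>[^:]+:[^:]+):       $ip::;
    default                     0.0.0.0;
}

log_format anonymized '$remote_addr_anon - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" "$http_user_agent"';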
Configure a server block:
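A minimal, hypothetical server block that writes access logs in the format defined above (the domain and upstream are placeholders; adapt them to what your server actually serves):
server {
    listen 80;
    server_name example.com;
    access_log /var/log/nginx/access.log main;

    location / {
        proxy_pass http://localhost:26657;
    }
}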
Test the new configuration:
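For example:
sudo nginx -t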
Reload Nginx to apply changes:
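For example:
sudo systemctl reload nginx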
Log rotation in Nginx is a process for managing log files to prevent them from becoming excessively large and consuming too much disk space. As Nginx continuously logs web requests, these files can grow rapidly. Without rotation, they can lead to performance issues and make log analysis more difficult. The default setting for log rotation is daily, which means that the logs GoAccess can use for its reporting also only cover one day. To increase that timeframe, do the following:
Edit log rotation configuration:
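The Nginx rotation rules on Ubuntu normally live in /etc/logrotate.d/nginx:
sudo nano /etc/logrotate.d/nginx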
Add the following configuration; change monthly to daily or weekly if you need daily or weekly rotation of the logs.
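A sketch based on the stock Ubuntu logrotate file for Nginx, with the interval set to monthly:
/var/log/nginx/*.log {
    monthly
    missingok
    rotate 12
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        invoke-rc.d nginx rotate >/dev/null 2>&1
    endscript
}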
Apply the new rotation configuration:
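One way to check that the new rules take effect immediately is to force a rotation run:
sudo logrotate -f /etc/logrotate.d/nginx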
Generate an HTML report:
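For example, writing the report into the web root (the output path is an example; pick any location your web server can serve):
goaccess /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED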
If you wish to automate this, use crontab to generate recurring reports:
Open crontab for editing (use sudo, otherwise crontab will not have access to the log file):
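For example:
sudo crontab -e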
Add the line to automate hourly report generation:
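For example, regenerating the report at the start of every hour (same assumptions as above):
0 * * * * goaccess /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED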
Before configuring Nginx, install GoAccess, a real-time web log analyzer.
Update your package lists:
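For example:
sudo apt update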
Install GoAccess:
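GoAccess is available from the Ubuntu repositories:
sudo apt install goaccess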
GoAccess is a powerful tool when it comes to providing usage statistics for your endpoints.
This tutorial will guide you through configuring Nginx for logging, anonymizing logs, monitoring web traffic with GoAccess, and setting up log rotation for Nginx logs.
This guide is intended for intermediate users who are familiar with Linux, Nginx, and using the command-line interface.
You will need to create new users for running Prometheus securely. This can be done as follows:
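For example, creating dedicated system users without home directories or login shells for Prometheus and Node Exporter:
sudo useradd --no-create-home --shell /bin/false prometheus
sudo useradd --no-create-home --shell /bin/false node_exporter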
Create the directories for storing the Prometheus binaries and its config files:
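Commonly /etc/prometheus is used for the configuration and /var/lib/prometheus for the time series data:
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus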
Set the ownership of these directories to our prometheus user, to make sure that Prometheus can access these folders:
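For example:
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus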
Prior to using Prometheus, it needs basic configuration, so we need to create a configuration file named prometheus.yml.
The configuration file of Prometheus is written in YAML, which strictly forbids the use of tabs. If your file is incorrectly formatted, Prometheus will not start. Be careful when you edit it.
Open the file prometheus.yml in a text editor:
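For example:
sudo nano /etc/prometheus/prometheus.yml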
Prometheus’ configuration file is divided into three parts: global, rule_files, and scrape_configs.
In the global part we can find the general configuration of Prometheus: scrape_interval defines how often Prometheus scrapes targets, and evaluation_interval controls how often the software will evaluate rules. Rules are used to create new time series and for the generation of alerts.
The rule_files block contains information about the location of any rules we want the Prometheus server to load.
The last block of the configuration file is named scrape_configs and contains the information about which resources Prometheus monitors.
Our file should look like this example:
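A sketch of such a file, reflecting the settings described below:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# rule_files:
#   - "first.rules"
#   - "second.rules"

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']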
The global scrape_interval is set to 15 seconds, which is enough for most use cases.
We do not have any rule_files yet, so the lines are commented out and start with a #.
In the scrape_configs part we have defined our first exporter. It is Prometheus that monitors itself. As we want to have more precise information about the state of our Prometheus server, we reduced the scrape_interval to 5 seconds for this job. The parameters static_configs and targets determine where the exporters are running. In our case it is the same server, so we use localhost and the port 9090.
As Prometheus scrapes only exporters that are defined in the scrape_configs part of the configuration file, we have to add Node Exporter to the file, as we did for Prometheus itself.
We add the following part below the configuration for scraping Prometheus:
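A sketch of that additional job (the job name is illustrative):
  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']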
Overwrite the global scrape interval again and set it to 5 seconds. As we are scraping the data from the same server that Prometheus is running on, we can use localhost with the default port of Node Exporter: 9100.
If you want to scrape data from a remote host, you have to replace localhost with the IP address of the remote server.
Set the ownership of the file to our prometheus user:
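For example:
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml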
Our Prometheus server is ready to run for the first time.
Start Prometheus directly from the command line with the following command, which executes the binary file as our prometheus user:
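A sketch of that command, using the directories set up earlier:
sudo -u prometheus /usr/local/bin/prometheus --config.file /etc/prometheus/prometheus.yml --storage.tsdb.path /var/lib/prometheus/ --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries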
The server starts displaying multiple status messages and the information that the server has started:
Open your browser and type http://IP.OF.YOUR.SERVER:9090 to access the Prometheus interface. If everything is working, we end the task by pressing CTRL + C on our keyboard.
If you get an error message when you start the server, double-check your configuration file for possible YAML syntax errors. The error message will tell you what to check.
The server is working now, but it cannot yet be launched automatically at boot. To achieve this, we have to create a new systemd configuration file that will tell your OS which services it should launch automatically during the boot process.
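For example:
sudo nano /etc/systemd/system/prometheus.service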
The service file tells systemd to run Prometheus as prometheus and specifies the path of the configuration files.
Copy the following information in the file and save it, then exit the editor:
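A typical unit file sketch, using the users and paths from earlier in this guide:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target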
To use the new service, reload systemd:
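For example:
sudo systemctl daemon-reload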
We enable the service so that it will be loaded automatically during boot:
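For example:
sudo systemctl enable prometheus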
Start Prometheus:
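For example:
sudo systemctl start prometheus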
Your Prometheus server is ready to be used.
We have now installed Prometheus to monitor your instance. Prometheus provides a basic web server running on http://your.server.ip:9090 that provides access to the data collected by the software.
Tip: For all information about the configuration of Prometheus, you may check the official Prometheus documentation.