Everything you should know about the Prometheus Grafana dashboard

Are you worried about the performance of your systems and the dozens of applications running on them? Stop worrying: you are in the right place to learn about two of the most widely used monitoring tools, Prometheus and Grafana.

Prometheus is an open-source monitoring system that collects real-time metrics and is used as the data source in Grafana for visualization.

In this tutorial, you will learn everything you should know about the Prometheus Grafana dashboard.

Let’s get started.

Prerequisites

Prometheus installed on your Ubuntu machine

What is Prometheus?

Prometheus is a powerful, open-source monitoring system that collects real-time, time-series metrics from services and stores them in memory and on local disk in its own efficient custom format (a time-series database). It is also used for alerting.

Each time series in the database is identified by a metric name and a set of key-value pairs called labels. Prometheus is written in Go and runs queries using a flexible, read-only query language called PromQL. It provides visualization through its built-in expression browser, and it works well with Grafana dashboards and alert notifications.

Client libraries are available for many languages, such as Java, Python, Scala, and Ruby, and there are multiple integrations available for Prometheus.

In software, binaries are compiled code that lets you install a program without having to compile its source code.

In software, a library is a collection of non-volatile resources used by computer programs, often for software development. These resources can include configuration data, documentation, help data, message templates, pre-written code and subroutines, classes, values, or type specifications.

  • The syntax of a Prometheus metric is shown below. The notation consists of the metric name followed by key-value pairs, also known as labels, in curly braces.
# Notation of time series
<metric name>{<label name>=<label value>, ...}
# Example
node_boot_time{instance="localhost:9000",job="node_exporter"}
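
To make the notation concrete, here is a minimal Python sketch that renders a metric name and a dictionary of labels in the form shown above. The helper name format_series is hypothetical and is not part of any Prometheus library, and the sketch assumes label values contain no quotes, backslashes, or newlines that would need escaping.

```python
def format_series(metric_name, labels):
    """Render a metric name and its labels in Prometheus time-series notation."""
    # Sort the labels so the same label set always renders identically.
    # Assumption: label values need no escaping (no quotes or backslashes).
    pairs = ",".join(f'{name}="{value}"' for name, value in sorted(labels.items()))
    return f"{metric_name}{{{pairs}}}" if pairs else metric_name


series = format_series(
    "node_boot_time",
    {"instance": "localhost:9000", "job": "node_exporter"},
)
print(series)
```

Running it prints the same series as the example above; a metric without labels renders as the bare metric name.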

How does Prometheus work?

Prometheus collects metrics from monitored targets by scraping their HTTP metrics endpoints, which are listed in the Prometheus configuration file. A single Prometheus server can ingest up to one million samples per second, depending on the hardware it runs on.
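
The scrape model can be illustrated with a small, self-contained Python sketch that uses only the standard library. It starts a throwaway HTTP server that exposes a /metrics endpoint, then fetches and parses it the way a scraper would. The metric names (demo_requests_total, demo_up), the MetricsHandler class, and the scrape function are all hypothetical, and the parser is a deliberate simplification: it ignores optional timestamps and label escaping that the real text format allows.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical payload in the Prometheus text exposition format.
PAYLOAD = (
    "# HELP demo_requests_total Total requests handled.\n"
    "# TYPE demo_requests_total counter\n"
    'demo_requests_total{method="GET"} 42\n'
    "demo_up 1\n"
)


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the payload on /metrics, like an instrumented target would."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = PAYLOAD.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def scrape(url):
    """Fetch a metrics endpoint and return {series: value} for each sample."""
    # Bypass any system proxy settings, since the target is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        text = response.read().decode("utf-8")
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    samples = scrape(f"http://127.0.0.1:{server.server_port}/metrics")
finally:
    server.shutdown()
    server.server_close()
print(samples)
```

Prometheus repeats this kind of fetch for every configured target at each scrape interval and stores the returned samples with a timestamp.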

Using Exporters and Prometheus in Prometheus Configuration file

  • The Prometheus configuration file is written in YAML and is stored at /etc/prometheus/prometheus.yml. By default it looks like the following.
# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# To scrape metrics from Prometheus itself, add the block below.

scrape_configs:
  - job_name: 'prometheus'              # Prometheus will scrape itself
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter_metrics'  # Node Exporter is included as a second target
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']
  • After you configure or update the Prometheus configuration file, you can reload it with the command below, without restarting Prometheus.
kill -SIGHUP <pid>
  • To check the configuration that the server has loaded, open the Status > Configuration page in the Prometheus web UI.
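
If you prefer to trigger the SIGHUP reload from a script, the same signal can be sent from Python. This is a minimal sketch for POSIX systems; the function name reload_config is hypothetical, and you have to supply the PID of the running Prometheus process yourself.

```python
import os
import signal


def reload_config(pid):
    """Ask the process with the given PID to reload its configuration."""
    # Equivalent to `kill -SIGHUP <pid>`. Raises ProcessLookupError if the
    # PID does not exist and PermissionError if you may not signal it.
    os.kill(pid, signal.SIGHUP)
```

For example, reload_config(12345) would signal a Prometheus server whose PID is 12345 (a made-up value here).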

Using EC2 as Service Discovery mechanism in Prometheus Configuration file

The Prometheus server continuously pulls metrics from jobs or apps, but at times it cannot reach a server because of NAT or a firewall. In that case the job can push its metrics to Pushgateway, and Prometheus scrapes Pushgateway instead.

  • Pushgateway is used as an intermediary service. You can also query the Prometheus server using PromQL and visualize the data in its own web UI and in Grafana dashboards.
  • Service discovery automatically detects devices and services offered on a computer network. For example, to discover AWS EC2 instances automatically, add the following configuration to the Prometheus configuration file.
# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# To scrape metrics from EC2 instances discovered in a region, add the block below.

scrape_configs:
  - job_name: 'AWS EC2'              # Prometheus will scrape AWS EC2
    ec2_sd_configs:
      - region: us-east-1
        access_key: ACCESS_KEY
        secret_key: SECRET_KEY
        port: 9100

Using Kubernetes as Service Discovery mechanism in Prometheus Configuration file

  • To discover targets in Kubernetes, add a job in the format below.
- job_name: 'kubernetes service endpoints'
  kubernetes_sd_configs:
    - role: endpoints        # required: the kind of Kubernetes object to discover
      # api_server can be omitted when Prometheus runs inside the cluster
      api_server: https://kube-master.prometheus.com

Using Configuration file Service Discovery mechanism in Prometheus Configuration file

  • Create a file named targets.json that lists the targets and their labels.
[
  {
    "targets": ["myslave:9104"],
    "labels": {
      "env": "prod",
      "job": "mysql_slave"
    }
  }
]
  • To use targets.json for service discovery, add the configuration below.
scrape_configs:
  - job_name: 'dummy'              # Prometheus will scrape file
    file_sd_configs:
      - files:
         - targets.json
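
Because Prometheus watches the files listed under file_sd_configs and picks up changes, the targets file is often generated by a script. Below is a minimal Python sketch of such a generator. The function name write_targets is hypothetical; the target and labels are the ones used in this section, and the demo writes into a temporary directory rather than a real Prometheus directory.

```python
import json
import os
import tempfile


def write_targets(path, targets, labels):
    """Write one file-based service discovery target group to `path` as JSON."""
    groups = [{"targets": list(targets), "labels": dict(labels)}]
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first and then rename it, so a reader
    # never sees a half-written file.
    with tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False, suffix=".tmp"
    ) as tmp:
        json.dump(groups, tmp, indent=2)
        tmp.write("\n")
    os.replace(tmp.name, path)
    return groups


# Demo: write the file into a throwaway directory and show its content.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "targets.json")
write_targets(demo_path, ["myslave:9104"], {"env": "prod", "job": "mysql_slave"})
with open(demo_path) as handle:
    print(handle.read())
```

The rename step is a design choice: replacing the file in one operation avoids Prometheus reading a partially written list of targets.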

Prometheus and Python Client Library

from flask import Flask
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Gauge, Histogram, generate_latest

app = Flask(__name__)

# Counter: total requests, partitioned by method, endpoint, and status code
REQUESTS = Counter('http_requests_total', 'Total HTTP Requests(Count)', ['method', 'endpoint', 'status_code'])

# Gauge: number of requests currently being handled
IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests')

# Histogram: distribution of request latency
TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)')

@app.route('/')
@TIMINGS.time()
@IN_PROGRESS.track_inprogress()
def hello_world():
    REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc()
    return 'Hello World'

# Expose the collected metrics so that Prometheus can scrape them
@app.route('/metrics')
def metrics():
    return generate_latest(), 200, {'Content-Type': CONTENT_TYPE_LATEST}

if __name__ == "__main__":
    app.run(host='127.0.0.1', port=4455, debug=True)

Prometheus Alerting

Prometheus alerting is divided into two parts: alerting rules and Alertmanager.

Alerting rules: allow you to define alert conditions and send the resulting alerts to an external service.

  • Rules are loaded by the Prometheus server, and you can configure the rule files in /etc/prometheus/prometheus.yml
rule_files:
- "/etc/prometheus/alert.rules"
  • Create a file named alert.rules as shown below.
groups:
- name: Important Instance
  rules:
  # Alert for any instance that is unreachable for more than 5 minutes
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: Machine not available

Alertmanager: handles all the alerts fired by Prometheus servers, including grouping, routing, and deduplication of alerts. It routes alerts to receivers such as PagerDuty, Opsgenie, email, and Slack.

  • The configuration of alertmanager is stored at /etc/alertmanager/alertmanager.yml
global:
  smtp_smarthost: 'localhost:25'
  smtp_from: 'support@automateinfra.com'
  smtp_auth_username: 'shanky'
  smtp_auth_password: 'password123'

templates:
- '/etc/alertmanager/template/*.tmpl'

route:
  repeat_interval: 1h
  receiver: operations-team

receivers:
- name: 'operations-team'
  email_configs:
  - to: 'shanky@automateinfra.com'
  slack_configs:
  - api_url: https://hooks.slack.com/services/xxxxx/xxxxxxxx/xxxxx
    channel: ''
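
To give a feel for what grouping and deduplication mean, here is a simplified Python sketch. It is an illustration of the idea only, not Alertmanager's actual implementation: alerts with identical label sets are treated as one alert, and the remaining alerts are grouped by the labels you choose, similar in spirit to the group_by setting of a route. The function name group_alerts and the sample alerts are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Deduplicate alerts by label set, then group them by selected labels."""
    unique = {}
    for alert in alerts:
        # Two alerts with identical labels are treated as the same alert.
        fingerprint = tuple(sorted(alert["labels"].items()))
        unique[fingerprint] = alert

    groups = defaultdict(list)
    for alert in unique.values():
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"labels": {"alertname": "InstanceDown", "instance": "web-1", "severity": "critical"}},
    {"labels": {"alertname": "InstanceDown", "instance": "web-1", "severity": "critical"}},  # duplicate
    {"labels": {"alertname": "InstanceDown", "instance": "web-2", "severity": "critical"}},
    {"labels": {"alertname": "DiskFull", "instance": "db-1", "severity": "warning"}},
]
grouped = group_alerts(alerts, group_by=["alertname"])
for key, members in grouped.items():
    print(key, [member["labels"]["instance"] for member in members])
```

Grouping by alertname means the operations team would receive one notification covering both unreachable web instances instead of one notification per instance.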

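Alert States:

  • Inactive: the alert condition is not currently met
  • Pending: the condition is met but has not yet held for the duration set in for
  • Firing: the condition has held for at least the duration set in for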
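
The way an alert moves between these states can be sketched in a few lines of Python. This is a simplified model, not Prometheus code: it assumes the rule is evaluated at a fixed interval and that each evaluation only reports whether the alert expression returned a result. The function name alert_state and its parameters are hypothetical.

```python
def alert_state(condition_history, for_seconds, interval_seconds):
    """Return the alert state after a series of rule evaluations.

    condition_history: list of booleans, one per evaluation, oldest first.
    True means the alert expression returned a result at that evaluation.
    """
    active_for = None  # seconds since the condition first became true
    for condition_met in condition_history:
        if not condition_met:
            active_for = None  # condition cleared, so the alert resets
        elif active_for is None:
            active_for = 0  # condition just became true
        else:
            active_for += interval_seconds
    if active_for is None:
        return "inactive"
    return "firing" if active_for >= for_seconds else "pending"


# A rule with "for: 5m" (300 seconds), evaluated once a minute.
print(alert_state([False, False], 300, 60))
print(alert_state([True, True, True], 300, 60))
print(alert_state([True] * 6, 300, 60))
print(alert_state([True] * 6 + [False], 300, 60))
```

With these inputs the four calls return inactive, pending, firing, and inactive in that order: the alert only fires once the condition has held for the full five minutes, and it resets as soon as the condition clears.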