Prometheus Study Notes

I. Introduction to Prometheus

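Prometheus (https://prometheus.io) is an open-source framework for system monitoring and alerting. It was created at SoundCloud in 2012 and has been developed as a community open-source project ever since. In 2016, Prometheus officially joined the Cloud Native Computing Foundation (CNCF), becoming the second project hosted by the foundation after Kubernetes.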

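As a new-generation monitoring system, Prometheus has the following features:

A powerful multi-dimensional data model:
1. Time series are identified by a metric name and a set of key/value pairs (labels).
2. Any metric can carry arbitrary multi-dimensional labels.
3. The data model is flexible; metric names do not have to be encoded as dot-separated strings.
4. Data can be aggregated, sliced, and diced.
5. Sample values are double-precision floats, and label values may contain any Unicode character.
A flexible and powerful query language (PromQL, the Prometheus Query Language): a single query can multiply, add, join, and take quantiles across multiple metrics.
Easy to operate: the Prometheus server is a single binary that works locally and does not depend on distributed storage, so each Prometheus server node is autonomous.
Efficient: each sample takes about 3.5 bytes on average, and a single Prometheus server can handle millions of metrics.
Pull-based collection of time series data, which prevents a misbehaving server from pushing bad metrics to it.
Pushing is also supported by way of a Push gateway, which the Prometheus server then scrapes.
Monitoring targets are obtained through service discovery or static configuration.
Multiple options for graphing and dashboards.
Easy to scale.
Federation: one Prometheus server can scrape the metrics of another Prometheus server.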

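Note that because scraped samples can be lost, Prometheus is not suitable for scenarios that require 100% accurate data, such as billing systems. For recording time series data, however, it offers strong query capabilities. Prometheus is also a good fit for microservice architectures.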
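
To make the data model described above more concrete, here is a minimal Python sketch of how a time series is identified by its metric name plus its label set. The metric name http_requests_total, the labels, and the values are hypothetical examples; this only illustrates the idea and says nothing about how Prometheus stores data internally.

```python
from typing import Dict, FrozenSet, List, Tuple

# A series is identified by its metric name plus the full set of labels.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Build an order-independent identity for a time series."""
    return name, frozenset(labels.items())


def render(name: str, labels: Dict[str, str]) -> str:
    """Render a series in the usual name{label="value",...} notation."""
    if not labels:
        return name
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{body}}}"


# Hypothetical samples: (metric name, labels, float value).
samples = [
    ("http_requests_total", {"method": "GET", "status": "200"}, 1027.0),
    ("http_requests_total", {"method": "POST", "status": "200"}, 12.0),
    ("http_requests_total", {"status": "200", "method": "GET"}, 3.0),
]

store: Dict[SeriesKey, List[float]] = {}
for name, labels, value in samples:
    store.setdefault(series_key(name, labels), []).append(value)

# Label order does not matter, so the first and third samples
# land in the same series and the store holds two series in total.
for (name, labels), values in store.items():
    print(render(name, dict(labels)), values)
```

Changing any label value produces a different series, which is why labels with many distinct values multiply the number of series a server has to hold.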

II. Prometheus Components and Architecture

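The Prometheus ecosystem consists of multiple components, many of which are optional:

Prometheus server: the core component, which scrapes and stores time series data.
Client libraries: generate metrics for the service being monitored and expose them to the Prometheus server. When the Prometheus server pulls, they return the current state of the metrics.
Push gateway: mainly for short-lived jobs. Such jobs may exit before the Prometheus server gets a chance to pull from them, so they push their metrics to the Push gateway at intervals, and the Prometheus server then pulls the metrics from the Push gateway. This approach is mainly for service-level metrics; machine-level metrics should be collected with node exporter.
Exporters: expose the metrics of existing third-party services to Prometheus.
Alertmanager: the alerting component, which runs separately from the Prometheus server. The Prometheus server sends alerts to Alertmanager according to its alerting rules. Alertmanager deduplicates and groups the alerts it receives, then routes them to the right receiver, such as email, Slack, PagerDuty, HipChat, OpsGenie, or a webhook. It also supports inhibition and silences to reduce noise.
Various supporting tools.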
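
As an illustration of the Push gateway flow described above, the following sketch builds the HTTP request a short-lived job could send before exiting. It assumes a Pushgateway on its default port 9091 and uses only the Python standard library; the job name and metric are hypothetical, and in practice an official client library would normally do this for you.

```python
import urllib.parse
import urllib.request
from typing import List


def build_push_request(gateway_url: str, job: str, lines: List[str]) -> urllib.request.Request:
    """Build a PUT request that replaces all metrics of the given job group."""
    # The body uses the text exposition format and must end with a newline.
    body = "\n".join(lines) + "\n"
    url = f"{gateway_url.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push(request: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status


# Hypothetical job and metric; call push(request) to actually send it.
request = build_push_request(
    "http://localhost:9091",
    "nightly_backup",
    ["backup_last_success_timestamp_seconds 1600000000"],
)
print(request.get_method(), request.full_url)
```

PUT replaces every metric in the job's group, while POST replaces only metrics with the same names, so PUT is the simpler choice for a job that reports a complete set each run.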

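See the architecture diagram on the official Prometheus site (the image is not reproduced here).

The general workflow:

1. The Prometheus server periodically pulls metrics from the configured jobs/exporters, from the Pushgateway, or from other Prometheus servers.
2. The Prometheus server stores the collected metrics locally and evaluates the defined alert.rules (alerting rules), recording new time series or pushing alerts to Alertmanager.
3. Alertmanager processes the alerts it receives according to its configuration file and routes the resulting notifications to the matching receivers (email, PagerDuty, and so on).
4. A visualization front end such as Grafana displays the collected data.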
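
Step 1 of the workflow relies on every target exposing its metrics over HTTP. The sketch below is a deliberately small stand-in for an exporter, written with the Python standard library only; the metric name demo_scrapes_total and port 8000 are made up for the example, and a real service would more likely use an official client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory state for the single demo counter.
SCRAPES = {"count": 0}


def render_metrics() -> str:
    """Render the current state in the Prometheus text exposition format."""
    return (
        "# HELP demo_scrapes_total Number of times /metrics has been scraped.\n"
        "# TYPE demo_scrapes_total counter\n"
        f"demo_scrapes_total {SCRAPES['count']}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        SCRAPES["count"] += 1
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, answering scrapes on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and adding host:8000 as a target in a scrape job would let the Prometheus server pull demo_scrapes_total at each scrape interval.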

Lab environment

Prometheus server 192.168.53.6

Grafana server 192.168.53.13

Monitored server 192.168.53.13

Download Prometheus

Download the Prometheus package prometheus-2.22.0.linux-amd64.tar.gz from the official site, https://prometheus.io/download/, and extract it:

tar xf prometheus-*.tar.gz -C /usr/local/

ln -s /usr/local/prometheus-2.22.0.linux-amd64 /usr/local/prometheus

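Configure Prometheus

Besides pulling metrics from jobs/exporters, a Prometheus server can also pull metrics from other Prometheus servers, including itself. Having a Prometheus server collect its own data is of limited practical use, but it makes a good demonstration for getting to know Prometheus.

The configuration (/usr/local/prometheus/prometheus.yml) is as follows: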

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
    - targets: ['localhost:9090']
    
  - job_name: 'aliyb.web'
    # Monitor the remote machine listed under targets
    static_configs:
    - targets: ['192.168.53.13:9090']

Start Prometheus

/usr/local/prometheus/prometheus --config.file="/usr/local/prometheus/prometheus.yml" &

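Open http://192.168.53.13:9090 in a browser to reach the web UI.

Prometheus status information: http://192.168.53.13:9090/status

The metrics Prometheus exposes about itself are available at http://192.168.53.13:9090/metrics

On the main page of the web UI, you can look up metrics by typing a keyword or expression.

Expression documentation: https://prometheus.io/docs/prometheus/latest/querying/basics/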
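
The same expressions can also be evaluated outside the web UI through the Prometheus HTTP API. The following is a minimal sketch using only the Python standard library; it assumes the server address used in these notes and an instant query that returns a vector (for example up), and it skips the other result types.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def query_url(base_url: str, promql: str) -> str:
    """Build the URL of an instant query against /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Turn a vector result into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each entry looks like {"metric": {...}, "value": [timestamp, "string value"]}.
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


def instant_query(base_url: str, promql: str, timeout: float = 5.0):
    """Run the query over HTTP and return the parsed result."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))


# Example call (requires a reachable server):
# instant_query("http://192.168.53.13:9090", "up")
```

A value of 1 for up means the last scrape of that target succeeded, and 0 means it failed.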

Monitoring a remote Linux host

Install the node_exporter component on the remote Linux host

Download: https://prometheus.io/download

tar xf node_exporter-0.16.0.linux-amd64.tar.gz -C /usr/local/
ln -s /usr/local/node_exporter-0.16.0.linux-amd64/ /usr/local/node_exporter

Start node_exporter to collect data

/usr/local/node_exporter/node_exporter &
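
Once node_exporter is running, its output can be checked by hand before pointing Prometheus at it. The sketch below fetches and parses the text it serves, assuming the default node_exporter port 9100 and sample lines without trailing timestamps; the parser is intentionally simplistic and is not a full implementation of the exposition format.

```python
import urllib.request
from typing import Dict


def parse_samples(text: str) -> Dict[str, float]:
    """Map each series (name plus labels) to its value, skipping comments."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_samples(host: str, port: int = 9100, timeout: float = 5.0) -> Dict[str, float]:
    """Fetch /metrics from an exporter and parse it."""
    with urllib.request.urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return parse_samples(resp.read().decode("utf-8"))


# Example call (requires a reachable exporter):
# fetch_samples("192.168.53.13")["node_load1"]
```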

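Monitoring a remote MySQL server

How to monitor other services

Beyond what node_exporter provides, you can collect additional data as needed.

Install the mysqld_exporter component

Step 1: download the component to the Linux server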

wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.12.1/mysqld_exporter-0.12.1.linux-amd64.tar.gz
tar xf mysqld_exporter-0.12.1.linux-amd64.tar.gz -C /usr/local/
ln -s /usr/local/mysqld_exporter-0.12.1.linux-amd64 /usr/local/mysqld_exporter

Create a MySQL account for collecting data

mysql> grant select, replication client, process on *.* to 'mysql_monitor'@'localhost' identified by 'mysql_pass_random123';
mysql> flush privileges;

Create a configuration file containing the user name and password for the connection

vim /usr/local/mysqld_exporter/.my.cnf
[client]
user=mysql_monitor
password=mysql_pass_random123

Start mysqld_exporter

nohup /usr/local/mysqld_exporter/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter/.my.cnf &

On the Prometheus server, add a scrape job for the MySQL host to the configuration file

# vim /usr/local/prometheus/prometheus.yml
- job_name: 'agent_mysql'
  static_configs:
  - targets: ['192.168.53.13:9104']
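
After adding a job and restarting Prometheus, it helps to confirm that the new target is actually being scraped. The sketch below reads the /api/v1/targets endpoint of the Prometheus HTTP API and lists targets whose health is not "up"; the server address is the one used in these notes, and the function names are made up for the example.

```python
import json
import urllib.request
from typing import List, Tuple


def unhealthy_targets(payload: dict) -> List[Tuple[str, str, str]]:
    """Return (job, scrape URL, last error) for every target that is not up."""
    return [
        (t["labels"].get("job", ""), t["scrapeUrl"], t.get("lastError", ""))
        for t in payload["data"]["activeTargets"]
        if t.get("health") != "up"
    ]


def check_targets(base_url: str, timeout: float = 5.0) -> List[Tuple[str, str, str]]:
    """Fetch the target list from a Prometheus server and filter it."""
    url = f"{base_url.rstrip('/')}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return unhealthy_targets(json.load(resp))


# Example call (requires a reachable server):
# check_targets("http://192.168.53.13:9090")
```

An empty list means every configured target, including agent_mysql, answered its last scrape.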

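Grafana visualization tool

What is Grafana

Grafana is an open-source metrics analytics and visualization tool. It queries and analyzes collected data, presents it visually, and can also send alerts. https://grafana.com

Install Grafana

wget https://dl.grafana.com/oss/release/grafana-7.3.0-1.x86_64.rpm
sudo yum install grafana-7.3.0-1.x86_64.rpm

Web UI: http://192.168.53.6:3000/login

Default credentials: admin/admin
Add the Prometheus data source

Add data source -> Prometheus -> enter the data source name and the IP address and port of the Prometheus server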
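
The same data source can be created without the UI through Grafana's HTTP API. The sketch below builds a POST to /api/datasources with basic authentication; it assumes the default admin/admin credentials and the addresses used in these notes, and the function name is made up for the example. Treat it as a starting point, since field requirements can differ between Grafana versions.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url: str, prometheus_url: str, name: str = "Prometheus",
                       user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build the request that registers a Prometheus data source in Grafana."""
    payload = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend talks to Prometheus, not the browser
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


# Example (send it with urllib.request.urlopen(request, timeout=5)):
request = datasource_request("http://192.168.53.6:3000", "http://192.168.53.13:9090")
print(request.get_method(), request.full_url)
```

Naming the data source Prometheus from the start matches what imported community dashboards commonly expect.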
Create graphs for the newly added data source

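Displaying MySQL monitoring data in Grafana

On the Grafana server, edit the configuration file, then download and install the MySQL monitoring dashboards. These are JSON files that can be thought of as monitoring templates written by other developers.

Reference: https://github.com/percona/grafana-dashboards

Append the following three lines to the end of the Grafana configuration file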
vim /etc/grafana/grafana.ini
[dashboards.json]
enabled=true
path=/var/lib/grafana/dashboards

cd /var/lib/grafana
git clone https://github.com/percona/grafana-dashboards.git
cp -r grafana-dashboards/dashboards /var/lib/grafana/
systemctl restart grafana-server

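Import the JSON file in Grafana: choose to upload MySQL Overview.json, finish the settings, then click Import
Set the data source

After clicking Import, Grafana reports that the Prometheus data source cannot be found. The JSON files look for a data source named Prometheus by default, but the data source created earlier is named Prometheus_data
Rename the data source

Rename the existing Prometheus_data data source to Prometheus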
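
If renaming the data source is not an option, the dashboard JSON can be adjusted before import instead. The sketch below rewrites every string-valued "datasource" field in a dashboard document; it assumes the older dashboard format in which data sources are referenced by name, and the helper name is made up for the example.

```python
import json


def set_datasource(node, name: str):
    """Return a copy of the dashboard with every string datasource set to name."""
    if isinstance(node, dict):
        return {
            key: name if key == "datasource" and isinstance(value, str)
            else set_datasource(value, name)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [set_datasource(item, name) for item in node]
    return node


def rewrite_file(src: str, dst: str, name: str) -> None:
    """Read a dashboard JSON file, rewrite its data sources, and save the result."""
    with open(src, encoding="utf-8") as fh:
        dashboard = json.load(fh)
    with open(dst, "w", encoding="utf-8") as fh:
        json.dump(set_datasource(dashboard, name), fh, indent=2)


# Example: rewrite_file("MySQL Overview.json", "MySQL Overview.local.json", "Prometheus_data")
```

Non-string values such as null are left alone, so panels that rely on the default data source keep doing so.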

Grafana + OneAlert alerting

OneAlert: http://www.onealert.com  Ruixiangyun (睿象云): https://caweb.aiops.com/
Create an application and obtain its access_token
Configure a webhook in Grafana: http://api.aiops.com/alert/api/event/grafana/v1/${access_token}

PS: a guide written by someone else: https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/quickstart