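**Prometheus** is an **open-source monitoring system and time series database** hosted by the **CNCF (Cloud Native Computing Foundation)**. It is widely used to monitor **Kubernetes, applications, and system resources**. It is built around a **pull-based** model: Prometheus scrapes metrics from its targets, which fits a metrics-based monitoring approach.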

---

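## 🧠 What is Prometheus? A quick overview

Prometheus provides:

* **Time series database**: periodically scrapes metrics from each service and stores them as time series
* **PromQL query language**: queries and aggregates metrics (similar in role to SQL)
* **Alerting**: evaluates alert rules and sends notifications through Alertmanager
* **Visualization integration**: works with Grafana to build dashboards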

---

## 🔧 Architecture (simplified)

```text
   ┌───────────────┐
   │ Your App      │◄────┐
   │ /metrics      │     │
   └───────────────┘     │
   ┌───────────────┐     │
   │ Node Exporter │     │  <-- system metrics
   └───────────────┘     │
                         ▼
                  ┌─────────────┐
                  │ Prometheus  │ <-- scrapes every endpoint above
                  └─────────────┘
                         │
           ┌─────────────┴────────────┐
           ▼                          ▼
    Grafana Dashboard           Alertmanager (notifications: Email, Slack, Line, Opsgenie…)
```

---

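## 📊 How to monitor with Prometheus

### 1. **Deploy Prometheus**

* On Kubernetes, install it with the `Prometheus Operator`
* Or run it as a single standalone instance, configured through a `.yml` file

### 2. **Configure `scrape_configs`**

List the services to scrape in `prometheus.yml`: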

```yaml
scrape_configs:
  - job_name: 'my-app'
    static_configs:
      - targets: ['localhost:8080']
```

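🔁 Prometheus then scrapes the target's `/metrics` endpoint on a fixed interval. The built-in default `scrape_interval` is 1 minute; a common setting is 15 seconds (`global.scrape_interval: 15s`).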

---

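### 3. **Expose metrics from the application**

The application needs to expose a `/metrics` endpoint, usually through a Prometheus client library:

* Python: `prometheus_client`
* Node.js: `prom-client`
* Go: the official `client_golang` library (its `promhttp` package)
* Java: `micrometer + spring-boot-actuator`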
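
To make the shape of a `/metrics` response concrete, here is a minimal sketch that uses only the Python standard library rather than any of the client libraries listed above. The `REQUESTS` dictionary, the `render_metrics` and `serve` helpers, and the label names are hypothetical, chosen for illustration; a real service would normally let a client library do this work.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counter, keyed by (method, status) label values.
REQUESTS = {("GET", "200"): 0, ("POST", "500"): 0}


def render_metrics(counters):
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(counters.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block forever, serving /metrics on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` would expose the endpoint on port 8080, which is the port assumed by the `localhost:8080` scrape target in step 2. Each scrape then reads the current counter values as plain text, one line per label combination.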

---

### 4. **Query metrics with PromQL**

For example:

```promql
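# Current value of each series of the counter for job "my-app"
http_requests_total{job="my-app"}
# Per-second rate of increase, averaged over the last minute
rate(cpu_usage_seconds_total[1m])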
```
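
To give a feel for what `rate()` computes, the following is a simplified sketch in Python, not Prometheus's actual implementation. The function name `simple_rate` and the sample data are made up for illustration. It divides the counter's increase by the time between the first and last sample in the window and treats any drop in value as a counter reset; the real `rate()` additionally extrapolates toward the window boundaries, which this sketch leaves out.

```python
def simple_rate(samples, window_seconds):
    """Approximate per-second increase of a counter over the trailing window.

    samples: list of (timestamp_seconds, value) pairs, sorted by timestamp.
    Returns None when fewer than two samples fall inside the window.
    """
    end = samples[-1][0]
    in_window = [s for s in samples if s[0] >= end - window_seconds]
    if len(in_window) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        # A drop means the counter was reset (for example after a restart),
        # so the whole new value counts as increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = in_window[-1][0] - in_window[0][0]
    return increase / elapsed


# Hypothetical counter scraped every 15 seconds, growing by 30 per scrape.
steady = [(10, 100), (25, 130), (40, 160), (55, 190)]
print(simple_rate(steady, 60))  # 2.0
```

Because the result is a per-second average, it stays comparable across different window sizes and scrape intervals, which is why counters are usually wrapped in `rate()` before graphing or alerting.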

---

### 5. **Set up alerts**

Alert rules are PromQL expressions that trigger an alert when they match, for example:

```yaml
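# Fires when a container uses more than 0.9 CPU cores (90% of one core) for 2 minutes.
- alert: HighCPU
  expr: rate(container_cpu_usage_seconds_total[1m]) > 0.9
  for: 2m
  labels:
    severity: critical
  annotations:
    description: "CPU usage has stayed above 90% of one core"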
```
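
The `for: 2m` clause means the alert does not fire the moment the expression matches. It first sits in a pending state and only fires once the condition has held continuously for the full duration. The sketch below models that behaviour for a single series; the function name `alert_states` and the 30-second evaluation interval are assumptions for illustration, and it ignores per-series tracking and other rule options.

```python
def alert_states(condition_by_eval, eval_interval_s, for_s):
    """Return the alert state after each rule evaluation.

    condition_by_eval: list of booleans, one per evaluation
                       (True means the alert expression matched).
    """
    states = []
    active_since = None  # when the current unbroken streak of matches began
    for i, matched in enumerate(condition_by_eval):
        now = i * eval_interval_s
        if not matched:
            # Any miss resets the streak, so the wait starts over.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# Evaluations every 30 s; the condition briefly clears at the third evaluation.
history = [True, True, False, True, True, True, True, True]
print(alert_states(history, eval_interval_s=30, for_s=120))
```

In this run the short dip resets the timer, so the alert only reaches `firing` at the last evaluation, 120 seconds after the condition became true again. This is how `for` filters out brief spikes.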

---

### 6. **Notification and visualization**

* With **Alertmanager**: send notifications to Slack, Email, Line, PagerDuty…
* With **Grafana**: build dashboards with charts

---

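## ✅ Pros

* Easy to deploy and extend
* Powerful query language (PromQL)
* Works well with Kubernetes and microservice architectures
* Exporters let it monitor many kinds of systems (MySQL, Redis, Nginx, Node, Pod, and more)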

---

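## 🚫 Cons

* Not designed for long-term data retention (the default retention is 15 days)
* No built-in distributed querying across instances (this requires Thanos or Cortex)
* With a large number of targets, performance parameters need tuning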
