
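**Prometheus** is an **open-source monitoring system and time series database** hosted by the **CNCF (Cloud Native Computing Foundation)**. It is widely used to monitor **Kubernetes, applications, and system resources**. Its design is **pull-based**: Prometheus fetches data from its targets, in line with metrics-based monitoring.

---

## 🧠 What is Prometheus? In short

Prometheus is:

* **A time series database**: it periodically scrapes monitoring metrics from each service and stores them as time series
* **A query language, PromQL**: it can query and aggregate metrics (loosely comparable to SQL)
* **An alerting system**: it supports condition-based alerts (Alert Rules) and notifications (through Alertmanager)
* **A visualization source**: it pairs with Grafana for dashboards

---

## 🔧 Architecture (simplified)

```text
┌───────────────┐
│ Your App      │◄────┐
│ /metrics      │     │
└───────────────┘     │
┌───────────────┐     │
│ Node Exporter │     │  <-- system metrics
└───────────────┘     │
                      ▼
               ┌─────────────┐
               │ Prometheus  │  <-- scrapes every endpoint above
               └─────────────┘
                      │
        ┌─────────────┴────────────┐
        ▼                          ▼
Grafana Dashboard            Alertmanager (notifications: Email, Slack, Line, Opsgenie…)
```

---

## 📊 How to monitor with Prometheus

### 1. **Deploy Prometheus**

* On K8s, install it with the `Prometheus Operator`
* Or run it directly on a single machine (configured through a .yml file)

### 2. **Configure scrape targets (`scrape_configs`)**

List the services to scrape in `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'my-app'                # becomes the job="my-app" label
    static_configs:
      - targets: ['localhost:8080']   # host:port; the path defaults to /metrics
```

🔁 Prometheus scrapes each target's `/metrics` endpoint at the configured `scrape_interval`. The built-in default is 1 minute; the sample config shipped with Prometheus sets it to 15 seconds.

---

### 3. **Expose metrics from the application**

The application must expose a `/metrics` endpoint, usually through a Prometheus client library:

* Python: `prometheus_client`
* Node.js: `prom-client`
* Go: the official `client_golang` library (its `promhttp` package provides the HTTP handler)
* Java: `micrometer + spring-boot-actuator`

---

### 4. **Query metrics with PromQL**

For example:

```promql
# Current request counters for the my-app job
http_requests_total{job="my-app"}
# Per-second rate of CPU time, averaged over the last minute
rate(cpu_usage_seconds_total[1m])
```

---

### 5. **Set up alerts**

Alerts are PromQL expressions placed in a rule file, for example:

```yaml
groups:
  - name: example-alerts   # hypothetical group name
    rules:
      - alert: HighCPU
        expr: rate(container_cpu_usage_seconds_total[1m]) > 0.9
        for: 2m
        labels:
          severity: critical
        annotations:
          description: "CPU usage has stayed above 90% of one core"
```

The threshold `0.9` means 0.9 CPU cores, so it corresponds to 90% only for a container limited to a single core.

---

### 6. **Notification and visualization**

* With **Alertmanager**: send to Slack, Email, Line, PagerDuty…
* With **Grafana**: build charts and dashboards

---

## ✅ Advantages

* Easy to deploy and extend
* Powerful query language (PromQL)
* Good fit for Kubernetes and microservice architectures
* With exporters, it can monitor many systems (MySQL, Redis, Nginx, Node, Pod, and so on)

---

## 🚫 Disadvantages

* Not designed for long-term data storage (default retention is 15 days, adjustable with `--storage.tsdb.retention.time`)
* No built-in distributed querying (requires Thanos or Cortex)
* With a large number of targets, performance parameters need tuning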
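
---

Step 3 above says the application must expose a `/metrics` endpoint. As a rough sketch of what such an endpoint returns when Prometheus scrapes it, the following uses only the Python standard library rather than `prometheus_client`. The `Counter` class, `AppHandler`, and `serve` are hypothetical stand-ins written for illustration, not the client library's API, and port 8080 is assumed only to match the `localhost:8080` target in the scrape config. A real service would normally use the client library listed for its language.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """Hypothetical minimal counter; not the prometheus_client API."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0.0
        self._lock = threading.Lock()

    def inc(self, amount=1.0):
        # Counters only ever go up; rate() in PromQL relies on that.
        if amount < 0:
            raise ValueError("counters can only increase")
        with self._lock:
            self._value += amount

    def render(self):
        # Prometheus text exposition format: HELP line, TYPE line, sample.
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )


REQUESTS = Counter("http_requests_total", "Total HTTP requests handled.")


class AppHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            # The scrape itself is served here and is not counted.
            body = REQUESTS.render().encode("utf-8")
            content_type = "text/plain; version=0.0.4; charset=utf-8"
        else:
            # Every other path stands in for real application traffic.
            REQUESTS.inc()
            body = b"ok\n"
            content_type = "text/plain; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    # Assumed port, chosen to match the 'localhost:8080' scrape target.
    HTTPServer(("", port), AppHandler).serve_forever()


if __name__ == "__main__":
    serve()
```

With the script running, requesting any path other than `/metrics` increments the counter, and `/metrics` returns the `# HELP`, `# TYPE`, and sample lines that Prometheus parses on each scrape. The client libraries add labels, other metric types, and process metrics on top of this same text format.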