Infrastructure Monitoring¶
Overview¶
This directory contains the monitoring configuration for the infrastructure. The monitoring system tracks the health and performance of the infrastructure components, including AWS resources, Kubernetes clusters, and networking.
Components¶
- Prometheus: Collects and stores metrics
- Grafana: Visualizes metrics and provides dashboards
- Alertmanager: Manages alerts and notifications
- CloudWatch Exporter: Provides AWS-specific metrics
- Node Exporter: Provides host-level metrics
- Blackbox Exporter: Provides network probing metrics
Metrics¶
The following metrics are collected:
- AWS resource usage
- Kubernetes cluster health
- Node resource usage
- Network connectivity
- Service availability
- Cost metrics
- Security metrics
Dashboards¶
The following dashboards are available:
- Infrastructure Overview: General infrastructure metrics
- AWS Resources: Detailed AWS resource metrics
- Kubernetes: Kubernetes cluster metrics
- Network: Network connectivity metrics
- Cost: Cost analysis metrics
- Security: Security posture metrics
Setup¶
The monitoring system is automatically set up during the infrastructure provisioning. You can also set it up manually:
terraform apply -target=module.monitoring
Accessing Dashboards¶
Once the monitoring system is running, you can access the dashboards at:
- Grafana:
https://grafana.<your-domain> - Prometheus:
https://prometheus.<your-domain>
Configuration¶
The monitoring configuration can be customized by editing the following files:
terraform/modules/monitoring/*.tf: Terraform configurationterraform/modules/monitoring/templates/*.yaml: Kubernetes resourcesterraform/modules/monitoring/dashboards/*.json: Grafana dashboards
Integration with Logging¶
The monitoring system is integrated with the logging system. Metrics about log volume and errors are available in the monitoring dashboards.