Prometheus is a powerful tool for collecting metrics from your applications and keeping an eye on your microservices. For Kubernetes clusters and other workloads, it can monitor at the granularity of individual pods.
Grafana is a visualization tool that reads metrics from Prometheus and plots them on dashboards so you can see how your microservices are performing. You can also use it to create alerts on dimensions such as uptime or error rate.
Installation and Configuration
Step 1: Configuring the Prometheus Monitor
To set up Prometheus on Linux, follow these steps (on macOS, download the darwin archive instead). Visit the official Prometheus website to find the most recent stable release. Then run the following commands, replacing <version> with the release number without the leading "v" (the release tag in the URL carries the "v", the file name does not).
# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v<version>/prometheus-<version>.linux-amd64.tar.gz
# Extract the downloaded archive
tar -xvf prometheus-<version>.linux-amd64.tar.gz
# Navigate to the Prometheus directory
cd prometheus-<version>.linux-amd64/
Next, edit the prometheus.yml file:
global:
  scrape_interval: 15s # How frequently to scrape metrics
scrape_configs:
  - job_name: 'microservices' # Name of the monitoring job
    static_configs:
      - targets: ['localhost:9090'] # Endpoint to monitor (replace with your target's address)
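When you have more than one service to scrape, it can help to generate this file instead of editing it by hand. The following is a minimal sketch using only the Python standard library; the function name `render_prometheus_config` and the example ports 8080 and 8081 are assumptions made for illustration, not part of Prometheus.

```python
def render_prometheus_config(job_name, targets, scrape_interval="15s"):
    """Return prometheus.yml text with a single static scrape job.

    Assumes job_name and targets contain no quote characters,
    since values are inserted into the YAML without escaping.
    """
    target_list = ", ".join(f"'{target}'" for target in targets)
    return (
        "global:\n"
        f"  scrape_interval: {scrape_interval}\n"
        "scrape_configs:\n"
        f"  - job_name: '{job_name}'\n"
        "    static_configs:\n"
        f"      - targets: [{target_list}]\n"
    )


if __name__ == "__main__":
    # Hypothetical targets: two services exposing /metrics on these ports.
    print(render_prometheus_config("microservices", ["localhost:8080", "localhost:8081"]))
```

Write the output to prometheus.yml and validate it with promtool check config before starting Prometheus.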
Start Prometheus:
./prometheus --config.file=prometheus.yml
Prometheus should be up and running at http://localhost:9090.
Step 2: Using Prometheus Pod Monitor
On Kubernetes, create a PodMonitor resource to monitor the pods. PodMonitor is a custom resource provided by the Prometheus Operator, so the operator must be installed in the cluster:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: microservices-pod-monitor
spec:
  selector:
    matchLabels:
      app: my-microservice
  namespaceSelector:
    matchNames:
      - my-namespace
  podMetricsEndpoints:
    - port: http
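If you manage several services, the same manifest can be produced programmatically. Below is a small sketch that builds the PodMonitor above as a Python dictionary and prints it as JSON, which kubectl apply accepts in place of YAML. The helper name `pod_monitor_manifest` is hypothetical; the field names are the ones shown in the manifest above.

```python
import json


def pod_monitor_manifest(name, app_label, namespace, port="http"):
    """Build a PodMonitor manifest equivalent to the YAML shown above."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PodMonitor",
        "metadata": {"name": name},
        "spec": {
            # Select pods carrying the label app=<app_label>.
            "selector": {"matchLabels": {"app": app_label}},
            # Only look for pods in this namespace.
            "namespaceSelector": {"matchNames": [namespace]},
            # Name of the container port that serves metrics.
            "podMetricsEndpoints": [{"port": port}],
        },
    }


if __name__ == "__main__":
    manifest = pod_monitor_manifest(
        "microservices-pod-monitor", "my-microservice", "my-namespace"
    )
    print(json.dumps(manifest, indent=2))
```

One way to use it, assuming the script is saved as podmonitor.py: python podmonitor.py | kubectl apply -f -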
Step 3: Install Grafana for Visualization and Uptime Monitoring
To install Grafana, run the following commands, replacing <version> with the version you want from the official Grafana download page:
# Download Grafana
wget https://dl.grafana.com/oss/release/grafana-<version>.linux-amd64.tar.gz
# Extract the downloaded file
tar -zxvf grafana-<version>.linux-amd64.tar.gz
# Move into the Grafana directory
cd grafana-<version>/
Start Grafana by running the following command:
./bin/grafana-server
Grafana should be up and running at http://localhost:3000. To log in, enter "admin" as both the username and the password. Grafana then prompts you to set a new password.
Step 4: Connecting Grafana to Prometheus
- Navigate to Configuration > Data Sources in Grafana.
- Select Add Data Source > Prometheus.
- Enter the Prometheus URL (http://localhost:9090).
- Click Save & Test to confirm the connection.
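The same data source can also be registered through Grafana's HTTP API, which is convenient for repeatable setups. The sketch below only builds the request for the POST /api/datasources endpoint and does not send it. The function name `build_datasource_request` and the token value are placeholders assumed for this example; use a service account token created in your own Grafana instance.

```python
import json
import urllib.request


def build_datasource_request(grafana_url, prometheus_url, api_token):
    """Build (but do not send) a request that adds Prometheus as a data source."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus on the browser's behalf
        "isDefault": True,
    }
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_datasource_request(
        "http://localhost:3000", "http://localhost:9090", "YOUR_API_TOKEN"
    )
    print(request.get_method(), request.full_url)
    # To actually create the data source, send it:
    # urllib.request.urlopen(request)
```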
Step 5: Creating Dashboards and Setting Up a Grafana Uptime Monitor
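- Create a New Dashboard:
- Go to Create > Dashboard.
- Click Add new panel.
- Enter a query, for example the per-service request rate over the last five minutes:
sum(rate(http_requests_total[5m])) by (service)
- Set Up a Grafana Uptime Monitor: add a panel that tracks the status of your microservices. The up metric is 1 when the last scrape of a target succeeded and 0 when it failed: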
up{job="microservices"}
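The queries above can also be run outside Grafana through the Prometheus HTTP API at /api/v1/query, which is a handy way to check what a panel will show. The following sketch builds the query URL and extracts the targets whose up value is 0 from a response body. The helper names and the sample response are illustrative assumptions; the response shape follows the instant-query format of the Prometheus API.

```python
import json
import urllib.parse


def build_query_url(prometheus_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{query_string}"


def down_instances(response_body):
    """Return the instance labels of all samples whose value is 0."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [
        sample["metric"].get("instance", "unknown")
        for sample in payload["data"]["result"]
        # Each value is a [timestamp, "string value"] pair.
        if float(sample["value"][1]) == 0.0
    ]


if __name__ == "__main__":
    print(build_query_url("http://localhost:9090", 'up{job="microservices"}'))
    # Hand-written sample response, not real output.
    sample_body = json.dumps({
        "status": "success",
        "data": {"resultType": "vector", "result": [
            {"metric": {"instance": "localhost:8080"}, "value": [1700000000, "1"]},
            {"metric": {"instance": "localhost:8081"}, "value": [1700000000, "0"]},
        ]},
    })
    print(down_instances(sample_body))
```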
Step 6: Building Alerts in Grafana
- Open the panel you want to alert on.
- Go to Alert > Create Alert Rule.
- Define the alert conditions (e.g., alert if CPU usage exceeds 70% for more than 5 minutes).
- Set up notification channels to receive alerts through your preferred channels, such as email or Slack.
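To make the "exceeds 70% for more than 5 minutes" condition concrete, here is a simplified sketch of the idea: the alert fires only when the value has stayed above the threshold continuously for the whole period, so a short spike does not trigger it. This is an illustration of the concept under assumed inputs, not Grafana's actual evaluation logic; the function name `should_fire` is hypothetical.

```python
def should_fire(samples, threshold, for_seconds):
    """Decide whether an alert should fire.

    samples: list of (timestamp_seconds, value) pairs sorted by time.
    Returns True when the most recent unbroken run of values above
    the threshold has lasted at least for_seconds.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # start of a new breach
        else:
            breach_start = None  # any dip resets the timer
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= for_seconds


if __name__ == "__main__":
    # CPU usage sampled once a minute, in percent.
    sustained = [(t, 80.0) for t in range(0, 301, 60)]
    with_dip = [(0, 80.0), (60, 80.0), (120, 50.0), (180, 80.0), (240, 80.0), (300, 80.0)]
    print(should_fire(sustained, 70.0, 300))
    print(should_fire(with_dip, 70.0, 300))
```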
Important Metrics to Monitor
To keep your microservices performant and reliable, track the following metrics:
- Request Rate: Graph requests over time to identify unusual spikes or dips.
- Error Rate: Track failed requests so you can detect problems before they affect your users.
- Latency: Measure a service's request/response time and look for bottlenecks when latency is high.
- Resource Usage: Ensure that essential resources (CPU, memory, and disk) are not overloaded.
- Uptime: Regularly check service uptime to ensure your services are consistently available.
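As a rough illustration of how some of these numbers are derived from raw samples, the sketch below computes a per-second request rate from two counter readings, an error percentage, and an uptime percentage from a series of up values. It is deliberately simplified: unlike PromQL's rate(), it does not handle counter resets or extrapolation, and all function names are assumptions made for this example.

```python
def per_second_rate(first, last):
    """Average per-second increase between two (timestamp, counter) samples."""
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


def error_percentage(total_requests, failed_requests):
    """Share of failed requests, in percent."""
    if total_requests == 0:
        return 0.0
    return 100.0 * failed_requests / total_requests


def uptime_percentage(up_samples):
    """Share of scrapes in which the target was up (value 1), in percent."""
    if not up_samples:
        return 0.0
    return 100.0 * sum(1 for value in up_samples if value == 1) / len(up_samples)


if __name__ == "__main__":
    # http_requests_total went from 100 to 400 over 300 seconds.
    print(per_second_rate((0, 100), (300, 400)))
    print(error_percentage(200, 5))
    print(uptime_percentage([1, 1, 0, 1]))
```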
Troubleshooting
When working with Prometheus and Grafana, you may run into issues with data collection, dashboards, or alerts. Here are the most common ones and how to resolve them.
1. Prometheus Troubleshooting
Issue: Prometheus Not Starting
Solutions:
- Run promtool check config /path/to/prometheus.yml to validate the configuration file. Look for any syntax errors or misconfigurations.
- Ensure that the default port (9090) used by Prometheus is not being used by another application. If it is, start Prometheus on a different port with the --web.listen-address flag.
- Make sure Prometheus has the required permissions to access the necessary files and directories.
Issue: Prometheus Not Scraping Targets
Solutions:
- Check the prometheus.yml file to confirm that the target endpoints are correctly defined, including the right IP addresses and port numbers.
- Check network connectivity. Use tools like curl or ping to test connectivity to your target endpoints from the Prometheus server. Ensure firewalls or network policies are not blocking the connections.
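Both checks above can be scripted. The sketch below validates that a target is written as host:port and then tries to open a TCP connection to it, which is roughly what Prometheus needs before it can scrape. Run it from the Prometheus server so it sees the same network path. The helper names are hypothetical, and bracketed IPv6 addresses are not handled.

```python
import socket


def parse_target(target):
    """Split a 'host:port' target string and validate its shape."""
    host, _, port = target.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {target!r}")
    return host, int(port)


def is_reachable(target, timeout=2.0):
    """Return True if a TCP connection to the target can be opened."""
    host, port = parse_target(target)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Replace with the targets from your prometheus.yml.
    for target in ["localhost:9090"]:
        status = "reachable" if is_reachable(target) else "NOT reachable"
        print(f"{target}: {status}")
```

A successful TCP connection only shows that the port is open. To confirm the endpoint actually serves metrics, request its /metrics path with curl.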
Issue: High Memory Usage
Solutions:
- Adjust data retention. Consider reducing the --storage.tsdb.retention.time setting to lower the amount of data Prometheus retains.
- Optimize queries. Review your queries for inefficiencies, and use functions like rate() or increase() to limit the volume of data processed.
2. Grafana Troubleshooting
Issue: Grafana Dashboard Not Loading
Solutions:
- Clear browser cache or use a different browser to see if the problem persists.
- Check installed plugins (Configuration > Plugins) in Grafana and look for any plugins that are outdated or causing conflicts. Update or disable plugins if needed.
Issue: Data Sources Not Connecting
Solutions:
- Verify data source settings. Go to Configuration > Data Sources in Grafana and make sure all the settings, including the URL, authentication credentials, and access method, are correct.
- Test connectivity. Use the Save & Test button in the Data Source settings to check if Grafana can connect to the data source.
Issue: Alerting Not Working
Solutions:
- Review the alert rules. Go to Alerting > Alert Rules and verify that all alert conditions and thresholds are correctly configured.
- Check notification channels. Make sure the notification channels are properly configured and that Grafana has access to external services.
Conclusion
Prometheus and Grafana together form a powerful duo for monitoring and managing modern applications, especially in environments where reliability, performance, and real-time insights are critical. Prometheus excels in collecting and storing detailed metrics across your entire system, providing a robust foundation for understanding what’s happening under the hood of your applications. With Grafana, on the other hand, it's possible to convert the complex data into visually intuitive dashboards that make it easier for your team to keep an eye on things in real time.
By following the steps in this article, you get a monitoring setup that lets you track the performance of your apps, spot problems early, and confirm that everything keeps running as intended.
If you need more features in monitoring, check out ITSyndicate Monitoring Solutions.