Molnár Líviusz
Molnár Líviusz's Blog




Springboot Monitoring Guide

Logs, Metrics and Traces in unity

Molnár Líviusz · Aug 1, 2022 · 5 min read



The purpose of this guide is to show how to configure a complete monitoring system around Spring Boot services in a cloud-native environment, without having to modify or extend the application's source code.

Monitoring Components Covered

  • Application Metrics
  • SpringBoot specific dashboard
  • Alerts
  • Logs (optional)
  • Traces (optional)

Tools Used
  • Prometheus
  • Alertmanager
  • Grafana
  • Loki (optional)
  • Tempo (optional)

Basic vs Full version

The basic version uses only application metrics for monitoring, while the full version also encompasses logs and traces for clearer visibility and faster incident response, and visualizes everything in one dashboard.
Everything that belongs to the full version only will be marked as "optional".


In this guide I'm using OpenShift's native log collector (Fluentd) and merely redirecting it into Loki. Therefore, if you choose to implement the full version instead of the application-metrics-only (basic) one, parts of this guide are OpenShift specific.

Final Product Preview

Full version

Full version: the logs are at the upper-left corner, the traces at the upper-right, and the bottom half shows basic statistics from application metrics.
Full version: JVM-specific metrics visualized, like memory, load, etc.
Full version: if we click on one trace, we can see it in detail as shown above. These are all the spans related to one user request, from the controller, through the business logic, all the way to the database.

Basic version

Basic version: basic statistics about the application with JVM metrics.
Basic version: basic JVM metrics at the top, GC metrics at the bottom.
Basic version: database connection metrics.

Step 1: Spring Boot Actuator

Add to app

For the app to give us application metrics in Prometheus format, we need to add Spring Boot Actuator (together with the Micrometer Prometheus registry) to the pom.xml:
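A minimal sketch of the required dependencies — Actuator provides the endpoints, and Micrometer's Prometheus registry enables the Prometheus-format output:

```xml
<!-- Spring Boot Actuator: exposes /actuator endpoints (health, metrics) -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Micrometer Prometheus registry: enables the /actuator/prometheus endpoint -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```

Note that by default only a few endpoints are exposed over HTTP; the prometheus endpoint has to be included via `management.endpoints.web.exposure.include`.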


More about spring boot actuator

Naming our service

For the dashboard to be able to navigate between services easily, we have to add a name to them at the Spring Boot level. We can add it in the application.properties file, or as an environment variable in the ConfigMap. This example is for the ConfigMap:
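A minimal ConfigMap sketch (the ConfigMap name and the value are illustrative); Spring Boot's relaxed binding maps the `SPRING_APPLICATION_NAME` environment variable to `spring.application.name`:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config            # hypothetical name
data:
  # Maps to spring.application.name via Spring Boot's relaxed binding
  SPRING_APPLICATION_NAME: my-app
```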


Make it secure

To reduce the attack surface, we redirect the Prometheus endpoint to a different port than the service's web-facing one (again in the ConfigMap):
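A sketch of the relevant ConfigMap entries, assuming the web-facing port is 8080 and the management port is 8081:

```yaml
data:
  # Serve actuator endpoints on a separate management port,
  # not on the service's web-facing port (assumed 8080)
  MANAGEMENT_SERVER_PORT: "8081"
  # Expose only what monitoring needs
  MANAGEMENT_ENDPOINTS_WEB_EXPOSURE_INCLUDE: health,prometheus
```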


Step 2: Add Liveness and Readiness Probes

For services, we (and Kubernetes) want to know when they are alive, and, after start, when they are ready to take requests. These are the purposes of the liveness and readiness probes. Spring Boot Actuator also gives these to us; we only have to specify in the PodTemplate, under "containers", where to look for them:

        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8081
            scheme: HTTP
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8081
            scheme: HTTP

An example DeploymentConfig is given in example-deployment.yaml.

Step 3: Prometheus Service Discovery

For Prometheus to scrape our service's application metrics, we have to provide a ServiceMonitor, where we specify where to look for our app and which endpoint to scrape. An example is given in the files under the name example-service-monitor.yaml.
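A minimal sketch of such a ServiceMonitor — the metadata, label, and port name are assumptions, so adjust them to your own Service:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor           # hypothetical name
spec:
  selector:
    matchLabels:
      app: my-app                # must match the Service's labels
  endpoints:
    - port: management           # the Service port exposing actuator (8081)
      path: /actuator/prometheus
```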

Step 4: OpenTelemetry-Java-Agent (Optional)

What is an OpenTelemetry agent?

It is an auto-instrumentation Java agent jar, which can be added to any Java service. It attaches to the JVM directly and exports logs, metrics, and traces in standardized formats. This time we are only using its trace-exporting capability, because we get far more useful metrics from Spring Boot Actuator, and using the OpenShift log collector doesn't require additional dependencies the way the OpenTelemetry agent's log forwarding does.

Add OpenTelemetry-Java-agent to the jar

java -javaagent:path/to/opentelemetry-javaagent.jar \
     -jar myapp.jar

Necessary configurations


We also have to add a name to the service for the traces. We can either add it here, at launch, or in the configmap.yaml:

At Launch

java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.service.name=my-app \
     -jar myapp.jar

Or in the ConfigMap
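A sketch of the ConfigMap entry — the OpenTelemetry agent reads the standard `OTEL_SERVICE_NAME` environment variable; the value is illustrative:

```yaml
data:
  # Service name attached to every exported trace
  OTEL_SERVICE_NAME: my-app
```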


I prefer to centralize everything in the ConfigMaps.


We are sending the traces straight into Tempo:
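A sketch of the corresponding ConfigMap entries, assuming Tempo's OTLP receiver is reachable in-cluster at `tempo:4317` (these are the agent's standard `OTEL_*` variables):

```yaml
data:
  # Export only traces; Actuator covers metrics, Fluentd covers logs
  OTEL_TRACES_EXPORTER: otlp
  OTEL_METRICS_EXPORTER: none
  OTEL_LOGS_EXPORTER: none
  # Assumed in-cluster Tempo endpoint
  OTEL_EXPORTER_OTLP_ENDPOINT: http://tempo:4317
```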


More about java-agent instrumentation

Step 5: Routing logs to Loki (Optional)

In this guide I'm not using Promtail, which is Loki's default log collector; instead I'm using OpenShift's native, already-working solution (Fluentd) and redirecting it into Loki with a ClusterLogForwarder. The important lines:

   outputs:
     - name: myLoki
       type: "loki"
       url: http://loki:3100/loki/api/v1/push
   inputs:
     - name: my-logs
       application:
         namespaces:
           - my-namespace
   pipelines:
     - name: my-app
       inputRefs:
         - my-logs
       outputRefs:
         - myLoki
A full example is given in cluster-log-forwarder.yaml.

Step 6: Importing the Grafana Dashboard

In Grafana: Left Side Menu -> Create -> Import. There are two versions of the dashboard (basic/full). We have to configure the dashboard JSON and insert the right datasource IDs (for Prometheus, Loki, Tempo). The procedure is the same for all of them.

Datasource syntax is dependent on Grafana version

If Grafana < 8.5

"datasource": "myPrometheusDatasource"

You can just copy the name of your Prometheus datasource there.

If Grafana 8.5+

Dashboard file is given in this format.

"datasource": {
    "type": "prometheus",
    "uid": "ur3OkZ97z"
}

Here you have to change the uid to your Prometheus datasource's uid.

Tip: replace it manually in one place in dashboard edit mode, then in the JSON replace all (Ctrl + H) occurrences of the old uid with the new one.

Step 7: Alerts for detecting problems

We can't afford a dedicated person whose only job is looking at dashboards and trying to find problems; instead we have alerts that fire and reach out to us when a certain bad condition occurs. These alerts are defined in Prometheus rules, which are composite expressions calculated from application metrics.
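As a minimal sketch, a PrometheusRule that fires when a scrape target has been down for a minute — the metadata, job label, and threshold are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts            # hypothetical name
spec:
  groups:
    - name: spring-boot
      rules:
        - alert: InstanceDown
          # `up` is 1 while Prometheus can scrape the target, 0 otherwise
          expr: up{job="my-app"} == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "Instance {{ $labels.instance }} is down"
```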

General predefined alerts for Spring Boot applications can be found in the alerting/prometheus-rule.yaml file.

In that file there is one specific use-case alert, where I demonstrate how to write an alert that only fires during working hours. For this functionality we also have to configure the Alertmanager; an example file is given for that as well.

Good alerts are really application specific, therefore there are only a select few we can generalize; maintainers also have to take their own share in creating these alerts.