Spring Boot Monitoring Guide

Logs, Metrics and Traces in unity

Molnár Líviusz · Aug 1, 2022 · 5 min read


The purpose of this guide is to show how we can configure an entire monitoring system around Spring Boot services in a cloud-native environment, without the need to modify or extend the base source code.

Monitoring Components Covered

  • Application Metrics
  • SpringBoot specific dashboard
  • Alerts
  • Logs (optional)
  • Traces (optional)

Prerequisites

  • Prometheus
  • Alertmanager
  • Grafana
  • Loki (optional)
  • Tempo (optional)

Basic vs Full version

The basic version only uses application metrics for monitoring, while the full version also encompasses logs and traces for clearer visibility and faster incident response, and visualizes everything in one dashboard.
Everything that is only part of the full version will be marked as "optional".

Disclaimer

In this guide I'm using OpenShift's native log collector (Fluentd) and merely redirecting its output into Loki, so this guide is partly OpenShift specific in case you choose to implement the full version instead of the application-metrics-only (basic) one.

Final Product Preview

Full version

  • Logs in the upper-left corner, traces in the upper-right, and basic statistics from application metrics in the bottom half.
  • JVM-specific metrics visualized, like memory, load, etc.
  • Clicking on one trace shows it in detail: all the spans related to one user request, from the controller, through the business logic, all the way to the database.

Basic version

  • Basic statistics about the application with JVM metrics.
  • Basic JVM metrics at the top, GC metrics at the bottom.
  • Database connection metrics.

Step 1: Spring Boot Actuator

Add to app

For the app to give us application metrics in Prometheus format, we need to add Spring Boot Actuator, together with the Micrometer Prometheus registry that backs the /actuator/prometheus endpoint, to the pom.xml:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

More about Spring Boot Actuator

Naming our service

For the dashboard to be able to navigate between services easily, we have to add a name to them at the Spring Boot level. We can add it in the application.properties file, or as an environment variable in the configmap file. This example is for the configmap file:

SPRING_APPLICATION_NAME: my-app

Make it secure

To reduce the attack surface, we will expose the Prometheus endpoint on a different port from the service's web-facing one (it's in the configmap file):

MANAGEMENT_SERVER_PORT: '8081'
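Depending on the Spring Boot version, the prometheus endpoint may also need to be exposed explicitly; a minimal sketch for the same configmap, using Spring's relaxed binding of management.endpoints.web.exposure.include:

MANAGEMENT_ENDPOINTS_WEB_EXPOSURE_INCLUDE: 'health,prometheus'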

Step 2: Add Liveness and Readiness Probes

For services, we (and Kubernetes) want to know when they are alive and, after startup, when they are ready to take requests. That is the purpose of the liveness and readiness probes. Spring Boot Actuator also gives these to us; we only have to specify in the PodTemplate, under "containers", where to look for them:

readinessProbe:
    httpGet:
        path: /actuator/health/readiness
        port: 8081
        scheme: HTTP
livenessProbe:
    httpGet:
        path: /actuator/health/liveness
        port: 8081
        scheme: HTTP

An example deploymentConfig is given in example-deployment.yaml.
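For orientation, a minimal sketch of where the probes sit in the container spec; the container name, image and web port are placeholders:

spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          ports:
            - containerPort: 8080   # web-facing port
            - containerPort: 8081   # management port for the actuator endpoints
          readinessProbe:           # as defined above
            httpGet: { path: /actuator/health/readiness, port: 8081, scheme: HTTP }
          livenessProbe:
            httpGet: { path: /actuator/health/liveness, port: 8081, scheme: HTTP }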

Step 3: Prometheus Service Discovery

For Prometheus to scrape our service's application metrics, we have to provide a ServiceMonitor, in which we specify where to look for our app and which endpoint to scrape. An example is given in the files under the name example-service-monitor.yaml.
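A minimal sketch of such a ServiceMonitor, assuming the Service carries the label app: my-app and names its management port management:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app            # must match the labels on the Service
  endpoints:
    - port: management       # named Service port pointing at 8081
      path: /actuator/prometheus
      interval: 30s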

Step 4: OpenTelemetry-Java-Agent (Optional)

What is an OpenTelemetry agent?

It is an auto-instrumentation Java agent: a jar file that can be attached to any Java service, sits directly on the JVM, and exports logs, metrics and traces in standardized formats. This time we are only using the trace exporter capability, because we get far more useful metrics from Spring Boot Actuator, and using the OpenShift log collector doesn't require additional dependencies the way the OpenTelemetry agent's log forwarding capability does.

Add the OpenTelemetry Java agent to the startup command

java -javaagent:path/to/opentelemetry-javaagent.jar \
     -jar myapp.jar
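In a container, the same can be done without touching the start command, for example via JAVA_TOOL_OPTIONS in the configmap; the agent's path inside the image is an assumption:

JAVA_TOOL_OPTIONS: '-javaagent:/otel/opentelemetry-javaagent.jar'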

Necessary configurations

Service name

We also have to add a name to the service for the traces. We can add it either here or in the configmap.yaml:

At launch

java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.resource.attributes=service.name=your-service-name \
     -jar myapp.jar

Or in the configmap

OTEL_SERVICE_NAME=your-service-name

I prefer to centralize everything in the configmaps.

Routing

We are sending the traces straight into Tempo:

OTEL_TRACES_EXPORTER: otlp
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: example-tempo:4317
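Since we only use the agent for traces, its other exporters can be switched off explicitly with the standard OpenTelemetry SDK variables (depending on the agent version, some of these may already be the default):

OTEL_METRICS_EXPORTER: none
OTEL_LOGS_EXPORTER: none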

More about Java agent instrumentation

Step 5: Routing logs to Loki (optional)

In this guide, I'm not using Promtail, which is Loki's default log collection agent; instead I'm using OpenShift's native, already working solution (Fluentd) and redirecting it into Loki with a ClusterLogForwarder. The important lines:

outputs:
   - name: myLoki
     type: "loki"
     url: http://loki:3100/loki/api/v1/push
inputs: 
   - name: my-logs
     application:
        namespaces:
        - my-namespace
pipelines:
   - name: my-app
     inputRefs:
      - my-logs
     outputRefs:
      - myLoki

An example is given in cluster-log-forwarder.yaml.
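For context, these sections live under spec: of the ClusterLogForwarder resource; with the classic OpenShift Logging operator the instance is named and namespaced as below (verify against your operator version):

apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs: []     # as shown above
  inputs: []
  pipelines: []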

Step 6: Importing the Grafana Dashboard

In the Grafana left side menu: Create -> Import. There are two versions of the dashboard (basic/full). We have to configure the dashboard JSON and insert the right datasource ids (for Prometheus, Loki, Tempo). The procedure is the same for all of them.

The datasource syntax depends on the Grafana version.

If Grafana < 8.5

"datasource": myPrometheusDatasource

You can just copy the name of your Prometheus datasource there.

If Grafana 8.5+

The dashboard file is given in this format.

"datasource": {
    "type": "prometheus",
    "uid": "ur3OkZ97z"
}

Here you have to change the uid to your Prometheus datasource's uid.

Tip: replace it manually in one place in dashboard edit mode, then in the JSON replace all (Ctrl + H) occurrences of the old uid with the new one.

Step 7: Alerts for detecting problems

We can't afford a dedicated person whose only job is staring at dashboards and trying to find problems; instead we have alerts that fire and reach out to us when a certain bad condition occurs. These alerts are defined in PrometheusRule resources, as composite expressions calculated from application metrics.

General predefined alerts for Spring Boot applications can be found in the alerting/prometheus-rule.yaml file.
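To give a flavor of what such a rule looks like, here is a minimal PrometheusRule sketch; the metric names come from Micrometer's JVM metrics, while the 90% threshold and the grouping by job are my own assumptions:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: spring-boot-alerts
spec:
  groups:
    - name: spring-boot
      rules:
        - alert: HighHeapUsage
          expr: sum by (job) (jvm_memory_used_bytes{area="heap"}) / sum by (job) (jvm_memory_max_bytes{area="heap"}) > 0.9
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: JVM heap usage is above 90% on {{ $labels.job }}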

In that file, there is one specific use-case alert, where I demonstrate how to write an alert that only fires during working hours. For this functionality, we also have to configure Alertmanager; an example file is given for that as well.
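A sketch of the Alertmanager side, assuming Alertmanager 0.24+ with time_intervals support; the interval name and receivers are placeholders:

route:
  receiver: team-default
  routes:
    - receiver: team-pager
      matchers:
        - severity="warning"
      mute_time_intervals:
        - outside-working-hours
time_intervals:
  - name: outside-working-hours
    time_intervals:
      - weekdays: ['saturday', 'sunday']
      - times:
          - start_time: '17:00'
            end_time: '24:00'
          - start_time: '00:00'
            end_time: '09:00'
receivers:
  - name: team-default
  - name: team-pager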

Good alerts are really application specific, therefore there are only a select few we can generalize; maintainers also have to take their own share in creating these alerts.