r/OpenTelemetry

▲ 60 r/OpenTelemetry+1 crossposts

OpenTelemetry signals from first principles

There's a lot of high-noise, low-value content around OpenTelemetry out there, so I've tried to put together the simplest description I could by incrementally building up from needs that arise in your systems. I hope it might help cut through some of the less obvious concepts like context propagation and exponential histograms.

The format is very loosely pinched from "The Little ..." series :)

kodraus.github.io
u/KodrAus — 9 days ago
▲ 28 r/OpenTelemetry+7 crossposts

I added dedicated OpenShift support to KubeShark.

Mini recap:

KubeShark is my Kubernetes skill for Claude Code and Codex.

It helps AI agents generate, review, and refactor Kubernetes manifests without falling into the usual LLM traps: missing security contexts, deprecated API versions, broken selectors, wildcard RBAC, unsafe probes, missing resource requests, and rollout configs that look okay but fail under real traffic.

The important part is that KubeShark is failure-mode-first. It doesn't just tell the model “write good Kubernetes”. It forces the model to reason about what can go wrong before it generates YAML, and then to return validation and rollback guidance as part of the answer.

That matters a lot with Kubernetes, because many bad manifests are accepted by the API server and only fail later at runtime.
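A concrete illustration of that point (my own example, not taken from KubeShark): this Deployment is perfectly valid as far as the API server is concerned, but the rollout stalls at runtime because the readiness probe targets a port nothing listens on:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: nginx:1.27
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080   # container listens on 80, so the probe always fails
```

The API server accepts this without complaint; only the stalled rollout reveals the mistake.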

Repo: https://github.com/LukasNiessen/kubernetes-skill

---

Now what’s new:

KubeShark now has special dedicated OpenShift support.

When the task involves OpenShift, OKD, ROSA, ARO, Routes, SCCs, OLM, ImageStreams, or oc, KubeShark switches into OpenShift-aware guidance.

This matters because OpenShift is Kubernetes, but with important platform behavior that generic Kubernetes YAML often ignores.

Common LLM mistakes include:

  • hardcoding runAsUser: 1000
  • assuming root-capable images will run
  • telling users to edit default SCCs
  • granting anyuid or privileged too broadly
  • using Ingress-controller annotations on OpenShift Routes
  • forgetting to validate with oc
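A minimal sketch of the arbitrary-UID point (the image name is a placeholder): instead of hardcoding runAsUser, leave the UID unset and let OpenShift's restricted SCC assign one from the namespace's allocated range, keeping only the constraints that matter:

```yaml
# Container securityContext that plays well with OpenShift's restricted SCC:
# no hardcoded runAsUser, so the platform assigns an arbitrary UID
# from the namespace's range.
containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    securityContext:
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
      seccompProfile:
        type: RuntimeDefault
```

The corollary is that the image itself must be built to run as a non-root, non-fixed UID (e.g. group-writable paths) rather than assuming UID 1000.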

Example guidance KubeShark now keeps in mind:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: app
spec:
  to:
    kind: Service
    name: app
  tls:
    termination: edge

It also knows to treat OpenShift Routes, SCCs, arbitrary UID containers, and OLM-managed resources as first-class concerns.

So instead of generic Kubernetes advice, you get OpenShift-aware manifest generation and review.

u/trolleid — 13 days ago

I'm trying to set up a self-hosted OTel log/trace/metric sink and dashboard for a small set of web and worker apps. I've tried ClickStack, Grafana, and now OpenObserve, and all three appear to have roughly the same general feature set for showing OTel data.

But one piece they all seem to lack, which feels nuts, is a standard "tail" and keyword search for logs like you find in Seq, Papertrail, and other log systems. Everything is "run this query" in some log query syntax that I definitely don't want to have to learn while triaging a system issue.

So - do you have a preferred OTel solution that's inexpensive to self host at a small scale and a log interface that matches the sort of features purely log focused apps provide?

Thanks!

reddit.com
u/jakenuts- — 9 days ago
▲ 36 r/OpenTelemetry+4 crossposts

I built a repo of ready-to-run OpenTelemetry Collector configs (Prometheus, Jaeger, Dynatrace, Datadog, Loki, k8s), feedback welcome

I just open-sourced a collection of ready-to-run OpenTelemetry Collector configurations, because finding complete, working configs for your specific backend always takes hours of trial and error.

It now includes examples for:

  • Prometheus
  • Jaeger
  • Grafana Loki
  • Dynatrace
  • Datadog
  • Kubernetes Operator
  • Kubernetes Pod Annotation Scraping (with full relabeling)
  • Debug (no backend needed, perfect for local dev)

Each example includes Docker Compose so you can run it in 60 seconds.

The k8s pod annotation scraping example includes relabeling for prometheus.io/scrape, prometheus.io/port, and prometheus.io/path annotations, the config everyone googles when setting up k8s monitoring.
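For reference, that relabeling usually looks something like this inside the Collector's prometheus receiver (a sketch of the common pattern, not necessarily the repo's exact config; note that `$` must be escaped as `$$` in Collector configs):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: k8s-pods
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # Only scrape pods annotated prometheus.io/scrape: "true"
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: "true"
            # Override the metrics path from prometheus.io/path
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
              action: replace
              target_label: __metrics_path__
              regex: (.+)
            # Override the scrape port from prometheus.io/port
            - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
              action: replace
              target_label: __address__
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
```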

I also actively contribute to the OpenTelemetry open source project; I recently got PRs merged into open-telemetry/otel-arrow and have PRs open in opentelemetry-android, opentelemetry-helm-charts, and opentelemetry-dotnet-instrumentation.

https://github.com/Cloud-Architect-Emma/opentelemetry-collector-examples

Feedback and contributions welcome! ⭐ if it's useful.

#OpenTelemetry #DevOps #Observability #Kubernetes #SRE #Monitoring #CloudNative #OpenSource

u/EmmaOpu — 4 days ago
▲ 9 r/OpenTelemetry+2 crossposts

ClickHouse is a beast for observability, but dumping raw, un-enriched OTel data into it can lead to massive storage costs and messy queries. We just launched a native OTLP connection for GlassFlow that moves that processing upstream, enriching and filtering OTel spans before they hit the table.

The goal is to keep the dashboards fast without the overhead of massive background merges or complex SQL views. Check out the setup we’re using for enriched OTel pipelines. What’s your biggest bottleneck when querying raw OTel data in ClickHouse? 🤔

u/Marksfik — 11 days ago
▲ 44 r/OpenTelemetry+1 crossposts

CNCF TOC votes in favor of OTel Graduation

The CNCF Technical Oversight Committee has voted to approve the OTel due diligence document.

This is one of the final steps towards graduation: the thorough due diligence, which included interviews with end users and resolution of the recommendations given in previous steps, has been finished and approved by the TOC 🎉

github.com
u/jpkroehling — 6 days ago

Hey all,

I’ve been working on a lightweight tool called mqtt2otel and thought it might be useful for some of you here.

It basically connects MQTT-based IoT setups with the OpenTelemetry ecosystem. It subscribes to MQTT topics, lets you process/enrich the messages, and then exports them as OTel metrics/logs.

Why I built it:

  • MQTT is great for IoT, but it doesn't integrate nicely with modern observability stacks, especially for logs and traces.
  • Built-in options for consuming, parsing, processing, and enriching MQTT messages inside dashboard systems are often limited and tightly coupled to those systems, making them hard to change later.
  • OpenTelemetry is everywhere now, but it isn't really designed for IoT ingestion.
  • Many architectures are already built on the OpenTelemetry stack, which gives you a nice abstraction over the various dashboard tools.

So this bridges the gap.

What it does:

  • Subscribe to MQTT topics
  • Transform / enrich messages (add metadata like location, device info, etc.)
  • Export as OpenTelemetry metrics or logs

Would love to get feedback or ideas 🙌

Web: https://mqtt2otel.org

GitHub: https://github.com/OSgAgA/mqtt2otel

reddit.com
u/OSgAgA42 — 11 days ago

How to convert Prometheus Remote Write metrics from Kafka into OTEL semantic conventions?

I’m trying to get OpenShift metrics into OTEL semantic conventions while keeping an OTel Collector after Kafka.

My understanding is that if Prometheus Remote Write data is received directly by the OTel Prometheus Remote Write receiver and exported as OTLP, the metrics are converted into OTEL metric format/semantic conventions where applicable.

However, our current pipeline is:

OpenShift Prometheus Remote Write -> Metricbeat -> Kafka -> OTel Kafka Receiver -> OTLP Exporter

The problem is that I don’t think the OTel Kafka receiver can decode Prometheus Remote Write payloads the same way the Prometheus Remote Write receiver does.

Has anyone implemented this architecture successfully with Kafka in the middle?

Specifically:
- Can the Kafka receiver process Prometheus Remote Write payloads correctly?
- Is there a way to preserve/convert to OTEL semantic conventions after Kafka?
- Should the data be converted to OTLP before it reaches Kafka instead?

TL;DR:
How do you convert Prometheus Remote Write metrics coming from Kafka into proper OTEL metrics/semantic conventions using an OTel Collector after Kafka?
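One way to sketch the third option from the question (convert to OTLP before Kafka), assuming the contrib Collector's prometheusremotewrite receiver and the otlp_proto Kafka encoding; treat component names and fields as things to verify against your Collector version:

```yaml
# Collector A: in front of Kafka; receives Remote Write, publishes OTLP
receivers:
  prometheusremotewrite:
    endpoint: 0.0.0.0:9090
exporters:
  kafka:
    brokers: ["kafka:9092"]
    topic: otlp_metrics
    encoding: otlp_proto
service:
  pipelines:
    metrics:
      receivers: [prometheusremotewrite]
      exporters: [kafka]
---
# Collector B: after Kafka; consumes the already-converted OTLP
receivers:
  kafka:
    brokers: ["kafka:9092"]
    topic: otlp_metrics
    encoding: otlp_proto
exporters:
  otlp:
    endpoint: backend:4317
service:
  pipelines:
    metrics:
      receivers: [kafka]
      exporters: [otlp]
```

With this shape the Remote Write to OTLP conversion happens once, before Kafka, and the Kafka receiver only has to decode a payload format it natively understands.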

reddit.com
u/13hyperdragoons — 4 days ago

Decomposing OpenTelemetry Collector Configuration for Maintainability | OllyGarden Blog

This is one trick I tell people that surprises them most of the time: "the Collector can do this?"

This one took a while to write. The idea came up during OTel Night here in Berlin, when I noticed that decomposing the config isn't only helpful for keeping your sanity; it also lets small chunks be tested independently.
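One such trick, as a sketch (assuming a reasonably recent Collector): the binary accepts repeated --config flags and merges the files, so shared plumbing and individual pipelines can live in separate, independently testable files:

```yaml
# base.yaml - shared plumbing
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  debug: {}

# traces.yaml - just the traces pipeline, in its own file
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```

Started with `otelcol --config=base.yaml --config=traces.yaml`, the two files are merged into a single effective configuration.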

ollygarden.com
u/jpkroehling — 3 days ago