Live observability laboratory

This site observes itself.

Every click, every page load, every API call generates real OpenTelemetry traces and metrics — the same signals you'd find in any production system. The Telemetry tab lets you watch your own session flow through the stack in real time.

Built with OpenTelemetry, ClickHouse, and a custom Spring Boot backend running inside Kubernetes — fully instrumented, end to end.

What does this telemetry cost? Live cost comparison across Datadog, Dynatrace, Grafana, Dash0, and 3 more.
The data pipeline

Your Browser

Every click, scroll, and page load is an event.

OpenTelemetry

Each event becomes a structured trace with timing and context.

ClickHouse

Every trace from this site lands here. Compressed, indexed, queryable in milliseconds.

This Page

Live metrics, traces, and logs — anyone can see them on the Telemetry tab.
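The pipeline above can be sketched end to end: a browser event becomes an OpenTelemetry-style span record, which lands as a row in ClickHouse. This is a minimal illustration, not the site's actual code; the field names loosely follow OTel conventions, and the `otel_spans` table and its columns are hypothetical (the real exporter schema is much richer).

```python
import time
import uuid

def click_to_span(target: str, duration_ms: float, parent: str = "") -> dict:
    """Model a browser click as a minimal OTel-style span record."""
    return {
        "trace_id": uuid.uuid4().hex,      # 128-bit trace id, hex-encoded
        "span_id": uuid.uuid4().hex[:16],  # 64-bit span id
        "parent_span_id": parent,
        "name": f"click {target}",
        "start_time_unix_nano": time.time_ns(),
        "duration_ms": duration_ms,
        "attributes": {"event.type": "click", "target": target},
    }

def span_to_clickhouse_row(span: dict) -> str:
    """Render the span as an INSERT against a hypothetical otel_spans table."""
    return (
        "INSERT INTO otel_spans "
        "(TraceId, SpanId, SpanName, DurationMs) VALUES "
        f"('{span['trace_id']}', '{span['span_id']}', "
        f"'{span['name']}', {span['duration_ms']})"
    )

span = click_to_span("#telemetry-tab", 12.4)
print(span_to_clickhouse_row(span))
```

In the real pipeline the OTel SDK and Collector handle batching, context propagation, and the ClickHouse exporter's schema; the point here is just the shape of the event-to-row transformation.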

Observability · Platform Engineering

THE DERIVED ONTOLOGY: MIGRATIONS AT SCALE

Agentic Workflow — Kibana to Grafana Dashboard Migration
Featured Article · O11y Alchemy

The Derived Ontology: From Theory to Working Migrations

How a canonical ontology layer turned autonomous dashboard migration from theory into a working system across ELK, LGTM, and Dash0

In January I wrote about ontologies for vendor-agnostic observability migration — the idea that a canonical semantic layer could bridge the conceptual gaps between platforms. That post was theory. This one is about what happened when we built it and pointed it at real infrastructure.

The short version: we now have autonomous, AI-driven migration of dashboards and alert rules across three observability stacks — ELK (Elasticsearch/Kibana), LGTM (Loki/Grafana/Tempo/Mimir), and Dash0 — using derived ontological mappings that an AI agent traverses at runtime. Not scripts. Not templates. An agent that reads a source dashboard, understands what each panel means, and rebuilds it idiomatically on the target platform.
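The idea of a canonical layer the agent traverses can be sketched as follows. Each source panel is normalized into a platform-neutral intent, then re-expressed idiomatically per target. Everything here is illustrative (the `CanonicalPanel` type, the query templates), not the system described in the article:

```python
from dataclasses import dataclass

@dataclass
class CanonicalPanel:
    metric: str       # semantic metric name, e.g. "http.server.duration"
    aggregation: str  # "p95", "avg", "rate", ...
    group_by: str     # dimension to split on

def to_grafana_promql(p: CanonicalPanel) -> str:
    # LGTM target: express the intent as PromQL against Mimir
    if p.aggregation == "p95":
        return (f'histogram_quantile(0.95, sum by (le, {p.group_by}) '
                f'(rate({p.metric.replace(".", "_")}_bucket[5m])))')
    return f'{p.aggregation} by ({p.group_by}) ({p.metric.replace(".", "_")})'

def to_kibana_agg(p: CanonicalPanel) -> dict:
    # ELK target: the same intent as an Elasticsearch aggregation body
    if p.aggregation == "p95":
        agg = {"percentiles": {"field": p.metric, "percents": [95]}}
    else:
        agg = {p.aggregation: {"field": p.metric}}
    return {"aggs": {"by_dim": {"terms": {"field": p.group_by},
                                "aggs": {"value": agg}}}}

panel = CanonicalPanel("http.server.duration", "p95", "service.name")
print(to_grafana_promql(panel))
print(to_kibana_agg(panel))
```

The key property: neither emitter knows about the other. The canonical panel carries the meaning, and each target renders that meaning in its own idiom rather than transliterating the source's syntax.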

OpenTelemetry · Agentic AI · Platform Migration · MCP

From the Blog

O11y Alchemy · 7 Articles

Agentic AI, distributed tracing, and the future of observability

I'm going to make a claim that I haven't seen anyone else make explicitly, and then back it up with empirical evidence from three production experiments.

Anti-patterns in distributed systems produce characteristic geometric signatures in trace topology space. These signatures are invariant across programming languages, runtimes, and instrumentation strategies. A system that classifies trace geometry can detect anti-patterns without knowing anything about the underlying implementation.
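To make "geometric signature" concrete, here is a toy sketch: reduce a trace to a few topology features (depth, fan-out, sibling repetition) and classify on those alone, with no knowledge of the implementation. The features and the threshold are illustrative, not the article's actual model:

```python
from collections import Counter

def geometry_signature(spans: list[dict]) -> dict:
    """Reduce a trace to topology features: depth, max fan-out, and
    sibling repetition (same-named children under one parent)."""
    children: dict = {}
    for s in spans:
        children.setdefault(s.get("parent"), []).append(s)

    def depth(span_id, d=1):
        kids = children.get(span_id, [])
        return d if not kids else max(depth(k["id"], d + 1) for k in kids)

    roots = children.get(None, [])
    max_repeat = 0
    for sibs in children.values():
        counts = Counter(s["name"] for s in sibs)
        max_repeat = max(max_repeat, max(counts.values(), default=0))
    return {"depth": max(depth(r["id"]) for r in roots),
            "max_fanout": max((len(v) for v in children.values()), default=0),
            "max_sibling_repeat": max_repeat}

def looks_like_n_plus_one(sig: dict, threshold: int = 10) -> bool:
    # N+1 shape: a flat burst of near-identical children under one parent
    return sig["max_sibling_repeat"] >= threshold

# Synthetic trace: one request span fanning out into 25 identical queries
trace = [{"id": "root", "parent": None, "name": "GET /orders"}]
trace += [{"id": f"q{i}", "parent": "root", "name": "SELECT order_items"}
          for i in range(25)]
sig = geometry_signature(trace)
print(sig, looks_like_n_plus_one(sig))  # depth 2, fan-out 25 -> True
```

Note that nothing in the classifier mentions SQL, ORMs, or a language runtime; the same shape would flag an N+1 produced by Python, Java, or Go instrumentation, which is the invariance claim in miniature.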

The autonomous remediation market is expanding rapidly, with vendors demonstrating AI systems that automatically detect and fix production issues. However, a critical foundational question often goes unasked: is the platform's architecture open or closed?

This distinction concerns architectural composability — the ability to extend systems with new capabilities, swap out components, and integrate with arbitrary backends. This choice determines whether you're building on scalable infrastructure or betting your operations on a vendor's roadmap.

Traditional observability tools provide dashboards for human investigation. MCP inverts this by exposing observability platforms as programmatic APIs that AI agents can query, correlate, and act upon.

This enables a shift across three observability generations: Observability 1.0 was dashboard-centric monitoring with manual investigation. Observability 2.0 was rich event storage with ad-hoc querying. Observability 3.0 is AI agents autonomously querying telemetry to detect patterns and take action. The Model Context Protocol provides standardized tool interfaces enabling AI agents to interact with any observability backend, infrastructure system, or development tool.
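To make the inversion concrete, here is a minimal sketch of the MCP tool surface: a server advertises tools via `tools/list` (each with a name, description, and JSON Schema input) and executes them via `tools/call`. The tool name, schema, and stubbed handler below are hypothetical, not from any real observability server:

```python
import json

TOOLS = {
    "query_traces": {
        "description": "Return spans matching a service and minimum duration.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "service": {"type": "string"},
                "min_duration_ms": {"type": "number"},
            },
            "required": ["service"],
        },
    },
}

def handle_tools_list() -> dict:
    """Respond to a tools/list request with the advertised descriptors."""
    return {"tools": [{"name": n, **spec} for n, spec in TOOLS.items()]}

def handle_tools_call(name: str, arguments: dict) -> dict:
    """Dispatch a tools/call request; a real server would query the backend."""
    if name not in TOOLS:
        return {"isError": True,
                "content": [{"type": "text", "text": "unknown tool"}]}
    stub = {"service": arguments["service"], "matches": 3}  # stubbed result
    return {"content": [{"type": "text", "text": json.dumps(stub)}]}

print(json.dumps(handle_tools_list(), indent=2))
print(handle_tools_call("query_traces", {"service": "checkout"}))
```

Because the agent discovers tools and their schemas at runtime, the same agent can sit in front of any backend that speaks this protocol, which is what makes the Observability 3.0 framing more than vendor-specific chat.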

Conventional observability workflows operate reactively. Teams instrument services, transmit telemetry data to platforms, construct dashboards, establish alerts, and await system failures. When issues arise, engineers manually examine traces, correlate logs, and rely on accumulated knowledge about previous incidents.

This approach has limits. Distributed system complexity exceeds human analytical capacity. Performance problems like N+1 queries, memory leaks, and retry cascades often remain undetected until significant damage occurs. The proposition: what if observability systems could actively identify emerging anti-patterns, assess their probability and consequences, and respond preemptively?

Major observability vendors — Dynatrace, Datadog, New Relic, and Honeycomb — launched AI capabilities in 2024-2025. However, a consistent pattern emerged: these systems permit generous read access while restricting write capabilities.

Users can ask "what problems are open?" but cannot say "create a dashboard for this incident." They can query metrics but cannot configure alerts. The AI can observe but it cannot operate, and that limitation caps agentic AI's potential, even though vendor caution around write operations makes business sense.

Observability platform migrations are notoriously painful. Whether moving from Datadog to Dynatrace, New Relic to Grafana, or any other combination, the process typically involves exporting dashboards as JSON, manually mapping fields between schemas, rewriting queries in the target language, rebuilding what doesn't translate, and hoping nothing breaks.

This is syntactic translation — moving symbols between systems without understanding what they mean. The result: migrations take months, cost more than expected, and leave gaps that only surface in production.
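The failure mode of syntactic translation is easy to demonstrate: a field-by-field rename only covers concepts both platforms share, and everything else silently falls through. The mapping table and field names below are made up for illustration:

```python
# Hypothetical source-platform -> target-platform field renames
FIELD_MAP = {
    "service": "service_name",
    "env": "environment",
    "status": "level",
}

def translate_filter(source: dict) -> tuple[dict, list[str]]:
    """Rename the fields we can; report the ones with no target concept."""
    translated, gaps = {}, []
    for field, value in source.items():
        if field in FIELD_MAP:
            translated[FIELD_MAP[field]] = value
        else:
            gaps.append(field)  # needs semantic, not syntactic, handling
    return translated, gaps

query = {"service": "checkout", "status": "error", "@custom_tag": "1"}
result, gaps = translate_filter(query)
print(result)  # the fields the rename table could handle
print(gaps)    # the ones that would silently break in production
```

A semantic (ontology-backed) migration replaces the rename table with an intermediate representation of what each field means, so the gaps become explicit mapping decisions instead of production surprises.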

For the past decade, the DevOps market has consolidated around platforms. Harness, GitLab Ultimate, GitHub Enterprise, CircleCI — they all sell the same promise: unified DevOps under one roof. Before platforms, teams stitched together Jenkins, custom scripts, and a dozen point solutions. It was fragile and expensive to maintain. Platforms offered integration, governance, and a single pane of glass.

But platforms come with tradeoffs: vendor lock-in, lowest-common-denominator features, and pricing that scales with headcount rather than value. Now there's an alternative.