Live observability laboratory

This site observes itself.

Every click, every page load, every API call generates real OpenTelemetry traces and metrics — the same signals you'd find in any production system. The Telemetry tab lets you watch your own session flow through the stack in real time.

Built with OpenTelemetry, ClickHouse, and a custom Spring Boot backend running inside Kubernetes — fully instrumented, end to end.

What does this telemetry cost? Live cost comparison across Datadog, Dynatrace, Grafana, Dash0, and 3 more.
The data pipeline

Your Browser

Every click, scroll, and page load is an event.

OpenTelemetry

Each event becomes a structured trace with timing and context.

ClickHouse

Every trace from this site lands here. Compressed, indexed, queryable in milliseconds.

This Page

Live metrics, traces, and logs — anyone can see them on the Telemetry tab.
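The pipeline above can be sketched end to end: a browser event becomes an OpenTelemetry-style span record, which lands as a row in ClickHouse. This is a minimal illustration, not the site's actual code; the field names loosely follow OTel conventions, and the `otel_spans` table and its columns are hypothetical (the real exporter schema is much richer).

```python
import time
import uuid

def click_to_span(target: str, duration_ms: float, parent: str = "") -> dict:
    """Model a browser click as a minimal OTel-style span record."""
    return {
        "trace_id": uuid.uuid4().hex,      # 128-bit trace id, hex-encoded
        "span_id": uuid.uuid4().hex[:16],  # 64-bit span id
        "parent_span_id": parent,
        "name": f"click {target}",
        "start_time_unix_nano": time.time_ns(),
        "duration_ms": duration_ms,
        "attributes": {"event.type": "click", "target": target},
    }

def span_to_clickhouse_row(span: dict) -> str:
    """Render the span as an INSERT against a hypothetical otel_spans table."""
    return (
        "INSERT INTO otel_spans "
        "(TraceId, SpanId, SpanName, DurationMs) VALUES "
        f"('{span['trace_id']}', '{span['span_id']}', "
        f"'{span['name']}', {span['duration_ms']})"
    )

span = click_to_span("#telemetry-tab", 12.4)
print(span_to_clickhouse_row(span))
```

In the real pipeline the OTel SDK and Collector handle batching, context propagation, and the ClickHouse exporter's schema; the point here is just the shape of the event-to-row transformation.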

Observability · Platform Engineering

THE DERIVED ONTOLOGY: MIGRATIONS AT SCALE

Agentic Workflow — Kibana to Grafana Dashboard Migration
Featured Article · O11y Alchemy

The Derived Ontology: From Theory to Working Migrations

How a canonical ontology layer turned autonomous dashboard migration from theory into a working system across ELK, LGTM, and Dash0

In January I wrote about ontologies for vendor-agnostic observability migration — the idea that a canonical semantic layer could bridge the conceptual gaps between platforms. That post was theory. This one is about what happened when we built it and pointed it at real infrastructure.

The short version: we now have autonomous, AI-driven migration of dashboards and alert rules across three observability stacks — ELK (Elasticsearch/Kibana), LGTM (Loki/Grafana/Tempo/Mimir), and Dash0 — using derived ontological mappings that an AI agent traverses at runtime. Not scripts. Not templates. An agent that reads a source dashboard, understands what each panel means, and rebuilds it idiomatically on the target platform.
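The idea of a canonical layer the agent traverses can be sketched as follows. Each source panel is normalized into a platform-neutral intent, then re-expressed idiomatically per target. Everything here is illustrative (the `CanonicalPanel` type, the query templates), not the system described in the article:

```python
from dataclasses import dataclass

@dataclass
class CanonicalPanel:
    metric: str       # semantic metric name, e.g. "http.server.duration"
    aggregation: str  # "p95", "avg", "rate", ...
    group_by: str     # dimension to split on

def to_grafana_promql(p: CanonicalPanel) -> str:
    # LGTM target: express the intent as PromQL against Mimir
    if p.aggregation == "p95":
        return (f'histogram_quantile(0.95, sum by (le, {p.group_by}) '
                f'(rate({p.metric.replace(".", "_")}_bucket[5m])))')
    return f'{p.aggregation} by ({p.group_by}) ({p.metric.replace(".", "_")})'

def to_kibana_agg(p: CanonicalPanel) -> dict:
    # ELK target: the same intent as an Elasticsearch aggregation body
    if p.aggregation == "p95":
        agg = {"percentiles": {"field": p.metric, "percents": [95]}}
    else:
        agg = {p.aggregation: {"field": p.metric}}
    return {"aggs": {"by_dim": {"terms": {"field": p.group_by},
                                "aggs": {"value": agg}}}}

panel = CanonicalPanel("http.server.duration", "p95", "service.name")
print(to_grafana_promql(panel))
print(to_kibana_agg(panel))
```

The key property: neither emitter knows about the other. The canonical panel carries the meaning, and each target renders that meaning in its own idiom rather than transliterating the source's syntax.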

OpenTelemetry · Agentic AI · Platform Migration · MCP

From the Blog

O11y Alchemy · 7 Articles

Agentic AI, distributed tracing, and the future of observability

I'm going to make a claim that I haven't seen anyone else make explicitly, and then back it up with empirical evidence from three production experiments.

Anti-patterns in distributed systems produce characteristic geometric signatures in trace topology space. These signatures are invariant across programming languages, runtimes, and instrumentation strategies. A system that classifies trace geometry can detect anti-patterns without knowing anything about the underlying implementation.
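To make "geometric signature" concrete, here is a toy sketch: reduce a trace to a few topology features (depth, fan-out, sibling repetition) and classify on those alone, with no knowledge of the implementation. The features and the threshold are illustrative, not the article's actual model:

```python
from collections import Counter

def geometry_signature(spans: list[dict]) -> dict:
    """Reduce a trace to topology features: depth, max fan-out, and
    sibling repetition (same-named children under one parent)."""
    children: dict = {}
    for s in spans:
        children.setdefault(s.get("parent"), []).append(s)

    def depth(span_id, d=1):
        kids = children.get(span_id, [])
        return d if not kids else max(depth(k["id"], d + 1) for k in kids)

    roots = children.get(None, [])
    max_repeat = 0
    for sibs in children.values():
        counts = Counter(s["name"] for s in sibs)
        max_repeat = max(max_repeat, max(counts.values(), default=0))
    return {"depth": max(depth(r["id"]) for r in roots),
            "max_fanout": max((len(v) for v in children.values()), default=0),
            "max_sibling_repeat": max_repeat}

def looks_like_n_plus_one(sig: dict, threshold: int = 10) -> bool:
    # N+1 shape: a flat burst of near-identical children under one parent
    return sig["max_sibling_repeat"] >= threshold

# Synthetic trace: one request span fanning out into 25 identical queries
trace = [{"id": "root", "parent": None, "name": "GET /orders"}]
trace += [{"id": f"q{i}", "parent": "root", "name": "SELECT order_items"}
          for i in range(25)]
sig = geometry_signature(trace)
print(sig, looks_like_n_plus_one(sig))  # depth 2, fan-out 25 -> True
```

Note that nothing in the classifier mentions SQL, ORMs, or a language runtime; the same shape would flag an N+1 produced by Python, Java, or Go instrumentation, which is the invariance claim in miniature.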

The autonomous remediation market is expanding rapidly, with vendors demonstrating AI systems that automatically detect and fix production issues. However, a critical foundational question often goes unasked: is the platform's architecture open or closed?

This distinction concerns architectural composability — the ability to extend systems with new capabilities, swap out components, and integrate with arbitrary backends. This choice determines whether you're building on scalable infrastructure or betting your operations on a vendor's roadmap.

Traditional observability tools provide dashboards for human investigation. MCP inverts this by exposing observability platforms as programmatic APIs that AI agents can query, correlate, and act upon.

This enables a shift across three observability generations: Observability 1.0 was dashboard-centric monitoring with manual investigation. Observability 2.0 was rich event storage with ad-hoc querying. Observability 3.0 is AI agents autonomously querying telemetry to detect patterns and take action. The Model Context Protocol provides standardized tool interfaces enabling AI agents to interact with any observability backend, infrastructure system, or development tool.
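To make the inversion concrete, here is a minimal sketch of the MCP tool surface: a server advertises tools via `tools/list` (each with a name, description, and JSON Schema input) and executes them via `tools/call`. The tool name, schema, and stubbed handler below are hypothetical, not from any real observability server:

```python
import json

TOOLS = {
    "query_traces": {
        "description": "Return spans matching a service and minimum duration.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "service": {"type": "string"},
                "min_duration_ms": {"type": "number"},
            },
            "required": ["service"],
        },
    },
}

def handle_tools_list() -> dict:
    """Respond to a tools/list request with the advertised descriptors."""
    return {"tools": [{"name": n, **spec} for n, spec in TOOLS.items()]}

def handle_tools_call(name: str, arguments: dict) -> dict:
    """Dispatch a tools/call request; a real server would query the backend."""
    if name not in TOOLS:
        return {"isError": True,
                "content": [{"type": "text", "text": "unknown tool"}]}
    stub = {"service": arguments["service"], "matches": 3}  # stubbed result
    return {"content": [{"type": "text", "text": json.dumps(stub)}]}

print(json.dumps(handle_tools_list(), indent=2))
print(handle_tools_call("query_traces", {"service": "checkout"}))
```

Because the agent discovers tools and their schemas at runtime, the same agent can sit in front of any backend that speaks this protocol, which is what makes the Observability 3.0 framing more than vendor-specific chat.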

Conventional observability workflows operate reactively. Teams instrument services, transmit telemetry data to platforms, construct dashboards, establish alerts, and await system failures. When issues arise, engineers manually examine traces, correlate logs, and rely on accumulated knowledge about previous incidents.

This approach has limits. Distributed system complexity exceeds human analytical capacity. Performance problems like N+1 queries, memory leaks, and retry cascades often remain undetected until significant damage occurs. The proposition: what if observability systems could actively identify emerging anti-patterns, assess their probability and consequences, and respond preemptively?

Major observability vendors — Dynatrace, Datadog, New Relic, and Honeycomb — launched AI capabilities in 2024-2025. However, a consistent pattern emerged: these systems permit generous read access while restricting write capabilities.

Users can ask "what problems are open?" but cannot say "create a dashboard for this incident." They can query metrics but cannot configure alerts. The AI can observe but it cannot operate, and that limitation caps agentic AI's potential, even though vendor caution around write operations makes business sense.

Observability platform migrations are notoriously painful. Whether moving from Datadog to Dynatrace, New Relic to Grafana, or any other combination, the process typically involves exporting dashboards as JSON, manually mapping fields between schemas, rewriting queries in the target language, rebuilding what doesn't translate, and hoping nothing breaks.

This is syntactic translation — moving symbols between systems without understanding what they mean. The result: migrations take months, cost more than expected, and leave gaps that only surface in production.
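The failure mode of syntactic translation is easy to demonstrate: a field-by-field rename only covers concepts both platforms share, and everything else silently falls through. The mapping table and field names below are made up for illustration:

```python
# Hypothetical source-platform -> target-platform field renames
FIELD_MAP = {
    "service": "service_name",
    "env": "environment",
    "status": "level",
}

def translate_filter(source: dict) -> tuple[dict, list[str]]:
    """Rename the fields we can; report the ones with no target concept."""
    translated, gaps = {}, []
    for field, value in source.items():
        if field in FIELD_MAP:
            translated[FIELD_MAP[field]] = value
        else:
            gaps.append(field)  # needs semantic, not syntactic, handling
    return translated, gaps

query = {"service": "checkout", "status": "error", "@custom_tag": "1"}
result, gaps = translate_filter(query)
print(result)  # the fields the rename table could handle
print(gaps)    # the ones that would silently break in production
```

A semantic (ontology-backed) migration replaces the rename table with an intermediate representation of what each field means, so the gaps become explicit mapping decisions instead of production surprises.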

For the past decade, the DevOps market has consolidated around platforms. Harness, GitLab Ultimate, GitHub Enterprise, CircleCI — they all sell the same promise: unified DevOps under one roof. Before platforms, teams stitched together Jenkins, custom scripts, and a dozen point solutions. It was fragile and expensive to maintain. Platforms offered integration, governance, and a single pane of glass.

But platforms come with tradeoffs: vendor lock-in, lowest-common-denominator features, and pricing that scales with headcount rather than value. Now there's an alternative.