Observability Specs

An observability component configures monitoring, tracing, logging, and alerting for agents and workflows. Observability specs define what data to collect, how to collect it, and when to alert.


Creating an Observability Spec

Coming Soon

The Python SDK for local development is not yet publicly available.

from flow_sdk.cli_client import CLIClient

# `config` is the SDK client configuration, created elsewhere (not shown here).
client = CLIClient(config)

obs = client.observability.create({
    "name": "Production Agent Monitoring",
    "slug": "production-agent-monitoring",
    "description": "Full observability for production-deployed agents",
    "trace_config": {
        "enabled": True,
        "sample_rate": 1.0,
        "include_inputs": True,
        "include_outputs": True,
        "include_tool_calls": True,
        "propagate_context": True,
    },
    "metric_config": {
        "enabled": True,
        "custom_metrics": [
            {
                "name": "tokens_used",
                "type": "counter",
                "description": "Total tokens consumed",
                "labels": ["model", "agent"],
            },
            {
                "name": "response_latency_ms",
                "type": "histogram",
                "description": "Agent response latency",
                "buckets": [100, 250, 500, 1000, 2500, 5000, 10000],
            },
            {
                "name": "tool_call_success_rate",
                "type": "gauge",
                "description": "Rolling success rate for tool calls",
            },
        ],
    },
    "log_config": {
        "level": "info",
        "include_prompts": False,
        "include_tool_results": True,
        "structured": True,
        "retention_days": 30,
    },
    "tags": ["production", "monitoring"],
})

To create the spec in the UI instead, navigate to Components > Observability > New, configure the trace, metric, and log settings, and click Create.


Configuration

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name |
| slug | string | Yes | URL-safe identifier |
| trace_config | dict | No | Distributed tracing configuration |
| metric_config | dict | No | Metrics collection configuration |
| log_config | dict | No | Logging configuration |
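The field rules in the table can be checked locally before a spec is submitted. The following is a minimal sketch, not part of the SDK: `validate_spec` is a hypothetical helper, and the slug pattern (lowercase letters, digits, and hyphens) is an assumption based on the "URL-safe identifier" description and the slugs used in the examples.

```python
import re

# Hypothetical helper: the field rules mirror the Configuration table, but the
# exact slug format accepted by the platform is an assumption.
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
REQUIRED_FIELDS = ("name", "slug")
OPTIONAL_DICT_FIELDS = ("trace_config", "metric_config", "log_config")


def validate_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec looks valid."""
    errors = []
    # Required fields must be present and non-empty strings.
    for field in REQUIRED_FIELDS:
        value = spec.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{field}: required non-empty string")
    # Only check the slug format when a slug was actually supplied.
    slug = spec.get("slug")
    if isinstance(slug, str) and slug.strip() and not SLUG_PATTERN.match(slug):
        errors.append("slug: expected lowercase letters, digits, and hyphens")
    # Optional sections, when present, must be dicts.
    for field in OPTIONAL_DICT_FIELDS:
        if field in spec and not isinstance(spec[field], dict):
            errors.append(f"{field}: expected a dict")
    return errors


print(validate_spec({"name": "Prod", "slug": "Prod Monitoring"}))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.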

Trace Configuration

The trace_config section controls distributed tracing, which tracks the full execution path of an agent or workflow request.

| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable tracing |
| sample_rate | float | 1.0 | Fraction of requests to trace (0.0 - 1.0) |
| include_inputs | bool | true | Record input data in spans |
| include_outputs | bool | true | Record output data in spans |
| include_tool_calls | bool | true | Create child spans for each tool call |
| propagate_context | bool | true | Propagate trace context across component boundaries |

What Gets Traced

A single agent execution generates a trace with multiple spans:

Agent Execution (root span)
├── Prompt Rendering
├── LLM Inference
│   ├── Model: gpt-4o
│   ├── Tokens: 1,250 in / 380 out
│   └── Latency: 1,200ms
├── Tool Call: search-knowledge-base
│   ├── Parameters: {"query": "refund policy"}
│   ├── Result: 3 articles found
│   └── Latency: 450ms
├── LLM Inference (second turn)
│   └── ...
└── Response Generation

# High-traffic production: sample 10% of requests
trace_config={
    "enabled": True,
    "sample_rate": 0.1,
    "include_inputs": True,
    "include_outputs": False,  # Reduce storage for high volume
    "include_tool_calls": True,
}

Sample rate in production

For high-traffic agents, set sample_rate below 1.0 to reduce observability costs while still capturing representative data. A rate of 0.1 (10%) is usually sufficient for identifying patterns.
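To make the effect of sample_rate concrete, here is a minimal sketch of one common way to implement ratio-based sampling: hash the trace ID and keep the trace if the hash falls below the configured fraction. How the platform actually selects traces is not documented here, so `should_sample` and the trace ID format are assumptions for illustration only.

```python
import hashlib


def should_sample(trace_id: str, sample_rate: float) -> bool:
    """Decide deterministically whether a trace is kept at the given rate."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and keep the
    # trace when it falls below the sample_rate fraction of that range.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < sample_rate * 2**64


# Hypothetical trace IDs; roughly 10% should be kept at sample_rate=0.1.
kept = sum(should_sample(f"trace-{i}", 0.1) for i in range(10_000))
print(f"kept {kept} of 10000 traces")
```

Because the decision depends only on the trace ID, every component that sees the same ID reaches the same answer, which keeps sampled traces complete when context is propagated across components.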


Metric Configuration

The metric_config section defines custom metrics collected during execution.

Metric Types

| Type | Description | Example |
|---|---|---|
| counter | Monotonically increasing value | Total requests, total tokens |
| gauge | Value that can go up or down | Active sessions, queue depth |
| histogram | Distribution of values across buckets | Response latency, token counts |
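The difference between the three types is easiest to see in code. The classes below are a minimal in-memory sketch of the semantics in the table, not the platform's implementation. In particular, the histogram here stores per-bucket counts with one extra overflow slot; whether the platform stores per-bucket or cumulative counts is not stated in this document.

```python
from bisect import bisect_left


class Counter:
    """Monotonically increasing value."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters cannot decrease")
        self.value += amount


class Gauge:
    """Value that can go up or down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per bucket; the last slot catches values above every bound."""

    def __init__(self, buckets):
        self.bounds = sorted(buckets)
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value):
        # bisect_left returns the index of the first bound >= value,
        # i.e. the smallest bucket whose upper bound contains the value.
        self.counts[bisect_left(self.bounds, value)] += 1


latency = Histogram([100, 250, 500, 1000, 2500, 5000, 10000])
for ms in (80, 100, 450, 1200, 12000):
    latency.observe(ms)
print(latency.counts)  # → [2, 0, 1, 0, 1, 0, 0, 1]
```

The bucket bounds are the ones used for response_latency_ms in the creation example. The 12000 ms observation lands in the overflow slot because it exceeds the largest bound.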

Built-in Metrics

The platform automatically collects these metrics for all components:

| Metric | Type | Description |
|---|---|---|
| execution_count | counter | Total executions |
| execution_success | counter | Successful executions |
| execution_failure | counter | Failed executions |
| execution_duration_ms | histogram | Execution duration |

Custom Metrics

Define additional metrics specific to your use case:

metric_config={
    "enabled": True,
    "custom_metrics": [
        {
            "name": "tokens_input",
            "type": "counter",
            "description": "Input tokens consumed",
            "labels": ["model", "agent_slug"],
        },
        {
            "name": "tokens_output",
            "type": "counter",
            "description": "Output tokens generated",
            "labels": ["model", "agent_slug"],
        },
        {
            "name": "tool_call_duration_ms",
            "type": "histogram",
            "description": "Individual tool call duration",
            "labels": ["tool_slug"],
            "buckets": [50, 100, 250, 500, 1000, 5000],
        },
        {
            "name": "escalation_rate",
            "type": "gauge",
            "description": "Percentage of conversations that escalated to a human",
            "labels": ["agent_slug"],
        },
    ],
}

Log Configuration

The log_config section controls what gets written to logs during execution.

| Field | Type | Default | Description |
|---|---|---|---|
| level | string | info | Minimum log level: debug, info, warning, error |
| include_prompts | bool | false | Log full prompt content (may contain sensitive data) |
| include_tool_results | bool | true | Log tool call results |
| structured | bool | true | Use JSON structured logging |
| retention_days | int | 30 | How long to retain logs |

# Development: verbose logging
log_config={
    "level": "debug",
    "include_prompts": True,
    "include_tool_results": True,
    "structured": True,
    "retention_days": 7,
}

# Production: minimal logging
log_config={
    "level": "warning",
    "include_prompts": False,      # Avoid logging sensitive prompt data
    "include_tool_results": False,  # Reduce log volume
    "structured": True,
    "retention_days": 90,
}

Sensitive data in logs

Setting include_prompts=True logs the full prompt content, which may contain customer data, PII, or internal instructions. Use this setting only in development environments or with appropriate data handling policies.
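A small pre-submission check can catch this setting before it reaches production. The sketch below is a hypothetical lint helper, not an SDK feature. It assumes production specs carry a "production" tag, as in the creation example, and the 7-day retention threshold is an arbitrary illustration rather than a platform rule.

```python
# Assumed policy value for illustration only; adjust to your own data handling rules.
MAX_PROMPT_RETENTION_DAYS = 7


def prompt_logging_warnings(spec: dict) -> list[str]:
    """Flag specs that log full prompts while looking like production specs."""
    log_config = spec.get("log_config", {})
    if not log_config.get("include_prompts", False):
        return []
    warnings = []
    if "production" in spec.get("tags", []):
        warnings.append("include_prompts is enabled on a spec tagged 'production'")
    # retention_days defaults to 30 according to the log_config table.
    if log_config.get("retention_days", 30) > MAX_PROMPT_RETENTION_DAYS:
        warnings.append(
            f"prompts would be retained for more than {MAX_PROMPT_RETENTION_DAYS} days"
        )
    return warnings


risky = {"log_config": {"include_prompts": True}, "tags": ["production"]}
for warning in prompt_logging_warnings(risky):
    print(warning)
```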


Connecting to Components

Reference an observability spec from an agent using observability_ref:

Coming Soon

The Python SDK for local development is not yet publicly available.

agent = client.agents.create({
    "name": "Production Support Agent",
    "slug": "production-support-agent",
    "model_ref": model.id,
    "tool_refs": [...],
    "observability_ref": obs.id,  # Attach observability spec
})

Environment-Specific Configurations

Create different observability specs for different environments:

Coming Soon

The Python SDK for local development is not yet publicly available.

# Development: full visibility
dev_obs = client.observability.create({
    "name": "Dev Monitoring",
    "slug": "dev-monitoring",
    "trace_config": {"enabled": True, "sample_rate": 1.0,
                     "include_inputs": True, "include_outputs": True},
    "metric_config": {"enabled": True, "custom_metrics": [...]},
    "log_config": {"level": "debug", "include_prompts": True, "retention_days": 7},
})

# Staging: production-like with full tracing
staging_obs = client.observability.create({
    "name": "Staging Monitoring",
    "slug": "staging-monitoring",
    "trace_config": {"enabled": True, "sample_rate": 1.0,
                     "include_inputs": True, "include_outputs": True},
    "metric_config": {"enabled": True, "custom_metrics": [...]},
    "log_config": {"level": "info", "include_prompts": False, "retention_days": 14},
})

# Production: sampled tracing, minimal logging
prod_obs = client.observability.create({
    "name": "Production Monitoring",
    "slug": "production-monitoring",
    "trace_config": {"enabled": True, "sample_rate": 0.1,
                     "include_inputs": True, "include_outputs": False},
    "metric_config": {"enabled": True, "custom_metrics": [...]},
    "log_config": {"level": "warning", "include_prompts": False, "retention_days": 90},
})
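The three specs above share most of their settings, so one way to keep them in sync is to derive each from a common base with per-environment overrides. The following is a sketch under that assumption. `deep_merge`, `build_spec`, and the `BASE_SPEC` and `OVERRIDES` names are illustrative and not part of the SDK, and custom_metrics is omitted because the source elides it. The resulting dicts could then be passed to the create call shown above.

```python
import copy

# Shared defaults; the values match the staging spec above.
BASE_SPEC = {
    "trace_config": {"enabled": True, "sample_rate": 1.0,
                     "include_inputs": True, "include_outputs": True},
    "metric_config": {"enabled": True},
    "log_config": {"level": "info", "include_prompts": False, "retention_days": 14},
}

# Only the settings that differ from the base are listed per environment.
OVERRIDES = {
    "dev": {"log_config": {"level": "debug", "include_prompts": True, "retention_days": 7}},
    "staging": {},
    "production": {
        "trace_config": {"sample_rate": 0.1, "include_outputs": False},
        "log_config": {"level": "warning", "retention_days": 90},
    },
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied recursively to nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def build_spec(env: str) -> dict:
    """Build the full spec dict for one environment."""
    spec = deep_merge(BASE_SPEC, OVERRIDES[env])
    spec["name"] = f"{env.capitalize()} Monitoring"
    spec["slug"] = f"{env}-monitoring"
    return spec


print(build_spec("production")["trace_config"])
```

Merging into a deep copy keeps the base untouched, so building one environment's spec cannot leak settings into another.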

Alert Definitions

Define alerts within the metric config to get notified when metrics cross thresholds:

metric_config={
    "enabled": True,
    "custom_metrics": [...],
    "alerts": [
        {
            "name": "High Error Rate",
            "metric": "execution_failure",
            "condition": "rate_5m > 0.05",
            "severity": "critical",
            "description": "More than 5% of executions are failing",
        },
        {
            "name": "Slow Responses",
            "metric": "execution_duration_ms",
            "condition": "p95 > 10000",
            "severity": "warning",
            "description": "95th percentile latency exceeds 10 seconds",
        },
        {
            "name": "Token Budget Exceeded",
            "metric": "tokens_output",
            "condition": "sum_1h > 1000000",
            "severity": "warning",
            "description": "Output token usage exceeded 1M in the last hour",
        },
    ],
}

| Alert Field | Description |
|---|---|
| name | Human-readable alert name |
| metric | Metric to monitor (built-in or custom) |
| condition | Threshold expression |
| severity | info, warning, critical |
| description | What the alert means and potential actions |
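To illustrate how a threshold expression such as "p95 > 10000" might be evaluated, the sketch below parses a condition into an aggregate name, a comparison, and a threshold, then checks it against precomputed aggregate values. The grammar is inferred from the three examples above and is an assumption; the platform's real expression syntax and its aggregation windows (rate_5m, sum_1h) are not specified in this document. All function names are hypothetical.

```python
import math
import operator
import re

# Assumed grammar: "<aggregate> <operator> <number>", inferred from the examples.
CONDITION_RE = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<|==)\s*([0-9]*\.?[0-9]+)\s*$")
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
             "<=": operator.le, "==": operator.eq}


def parse_condition(condition: str):
    """Split a condition into (aggregate name, comparison function, threshold)."""
    match = CONDITION_RE.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    aggregate, op, threshold = match.groups()
    return aggregate, OPERATORS[op], float(threshold)


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def is_firing(condition: str, aggregates: dict) -> bool:
    """Evaluate a condition against already-computed aggregate values."""
    aggregate, compare, threshold = parse_condition(condition)
    return compare(aggregates[aggregate], threshold)


# Hypothetical execution_duration_ms samples for one evaluation window.
durations_ms = [900, 1200, 1500, 2000, 11000, 12500, 800, 950, 1100, 1300]
aggregates = {"p95": percentile(durations_ms, 95)}
print(is_firing("p95 > 10000", aggregates))  # → True
```

Separating parsing from aggregation keeps the expression handling independent of how each aggregate is computed over its window.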