Observability Specs

An observability component configures monitoring, tracing, logging, and alerting for agents and workflows. Observability specs define what data to collect, how to collect it, and when to alert.

Creating an Observability Spec

Python | UI

Coming Soon

The Python SDK for local development is not yet publicly available.

from flow_sdk.cli_client import CLIClient

client = CLIClient(config)  # config: your SDK configuration, defined elsewhere (not shown here)

obs = client.observability.create({
    "name": "Production Agent Monitoring",
    "slug": "production-agent-monitoring",
    "description": "Full observability for production-deployed agents",
    "trace_config": {
        "enabled": True,
        "sample_rate": 1.0,
        "include_inputs": True,
        "include_outputs": True,
        "include_tool_calls": True,
        "propagate_context": True,
    },
    "metric_config": {
        "enabled": True,
        "custom_metrics": [
            {
                "name": "tokens_used",
                "type": "counter",
                "description": "Total tokens consumed",
                "labels": ["model", "agent"],
            },
            {
                "name": "response_latency_ms",
                "type": "histogram",
                "description": "Agent response latency",
                "buckets": [100, 250, 500, 1000, 2500, 5000, 10000],
            },
            {
                "name": "tool_call_success_rate",
                "type": "gauge",
                "description": "Rolling success rate for tool calls",
            },
        ],
    },
    "log_config": {
        "level": "info",
        "include_prompts": False,
        "include_tool_results": True,
        "structured": True,
        "retention_days": 30,
    },
    "tags": ["production", "monitoring"],
})

In the UI, navigate to Components > Observability > New, configure the trace, metric, and log settings, then click Create.

Configuration

Field	Type	Required	Description
`name`	string	Yes	Display name
`slug`	string	Yes	URL-safe identifier
`trace_config`	dict	No	Distributed tracing configuration
`metric_config`	dict	No	Metrics collection configuration
`log_config`	dict	No	Logging configuration
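The table above can be turned into a small pre-flight check before a spec is submitted. The sketch below is illustrative only: `validate_spec` is a hypothetical helper, not part of the SDK, and the slug pattern assumes that "URL-safe" means lowercase letters, digits, and single hyphens.

```python
import re

# Assumption: "URL-safe" means lowercase letters, digits, and single hyphens.
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
REQUIRED_FIELDS = ("name", "slug")
OPTIONAL_DICT_FIELDS = ("trace_config", "metric_config", "log_config")


def validate_spec(spec: dict) -> list[str]:
    """Return a list of problems found in an observability spec dict."""
    problems = []
    # Required fields must be present and non-empty strings.
    for field in REQUIRED_FIELDS:
        value = spec.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing required field: {field}")
    # The slug must match the assumed URL-safe pattern.
    slug = spec.get("slug")
    if isinstance(slug, str) and slug and not SLUG_PATTERN.match(slug):
        problems.append(f"slug is not URL-safe: {slug!r}")
    # Optional sections, when present, must be dicts.
    for field in OPTIONAL_DICT_FIELDS:
        if field in spec and not isinstance(spec[field], dict):
            problems.append(f"{field} must be a dict")
    return problems


print(validate_spec({"name": "Production Agent Monitoring",
                     "slug": "production-agent-monitoring"}))
```

An empty list means the spec passed these local checks. The platform may apply further validation that is not modeled here.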

Trace Configuration

The trace_config section controls distributed tracing, which tracks the full execution path of an agent or workflow request.

Field	Type	Default	Description
`enabled`	bool	`true`	Enable tracing
`sample_rate`	float	`1.0`	Fraction of requests to trace (0.0 to 1.0)
`include_inputs`	bool	`true`	Record input data in spans
`include_outputs`	bool	`true`	Record output data in spans
`include_tool_calls`	bool	`true`	Create child spans for each tool call
`propagate_context`	bool	`true`	Propagate trace context across component boundaries

What Gets Traced

A single agent execution generates a trace with multiple spans:

Agent Execution (root span)
├── Prompt Rendering
├── LLM Inference
│   ├── Model: gpt-4o
│   ├── Tokens: 1,250 in / 380 out
│   └── Latency: 1,200ms
├── Tool Call: search-knowledge-base
│   ├── Parameters: {"query": "refund policy"}
│   ├── Result: 3 articles found
│   └── Latency: 450ms
├── LLM Inference (second turn)
│   └── ...
└── Response Generation
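To make the parent/child structure concrete, the tree above can be modeled as nested span objects. This is a minimal sketch for illustration: the `Span` class and `walk` helper are hypothetical and do not reflect the platform's internal trace format. Only the latencies shown in the tree are filled in.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Span:
    """Hypothetical in-memory model of one span in a trace."""
    name: str
    latency_ms: Optional[int] = None
    attributes: dict = field(default_factory=dict)
    children: list["Span"] = field(default_factory=list)


def walk(span: Span, depth: int = 0) -> Iterator[tuple[int, Span]]:
    """Yield (depth, span) pairs in depth-first order, root first."""
    yield depth, span
    for child in span.children:
        yield from walk(child, depth + 1)


trace = Span("Agent Execution", children=[
    Span("Prompt Rendering"),
    Span("LLM Inference", latency_ms=1200,
         attributes={"model": "gpt-4o", "tokens_in": 1250, "tokens_out": 380}),
    Span("Tool Call: search-knowledge-base", latency_ms=450,
         attributes={"parameters": {"query": "refund policy"}}),
    Span("Response Generation"),
])

for depth, span in walk(trace):
    timing = f" ({span.latency_ms}ms)" if span.latency_ms is not None else ""
    print("  " * depth + span.name + timing)
```

With `include_tool_calls` disabled, the tool call would presumably not appear as its own child span.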

# High-traffic production: sample 10% of requests
trace_config = {
    "enabled": True,
    "sample_rate": 0.1,
    "include_inputs": True,
    "include_outputs": False,  # Reduce storage for high volume
    "include_tool_calls": True,
}

Sample rate in production

For high-traffic agents, set sample_rate below 1.0 to reduce observability costs while still capturing representative data. A rate of 0.1 (10% of requests) is usually enough to identify patterns.
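The effect of `sample_rate` can be illustrated with a deterministic sampling function. The platform's actual sampling strategy is not documented here, so treat this as a sketch of one common approach (hashing a trace ID into the range 0 to 1), with `should_trace` being a hypothetical name.

```python
import hashlib


def should_trace(trace_id: str, sample_rate: float) -> bool:
    """Decide whether to trace a request; the same trace_id always gets the same answer."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Keep the top 53 bits so the value maps exactly onto a float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < sample_rate


kept = sum(should_trace(f"req-{i}", 0.1) for i in range(10_000))
print(f"traced {kept} of 10000 requests")
```

Hashing rather than drawing a random number keeps the decision stable for a given trace ID, which matters when several components must agree on whether a request is traced.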

Metric Configuration

The metric_config section defines custom metrics collected during execution.

Metric Types

Type	Description	Example
`counter`	Monotonically increasing value	Total requests, total tokens
`gauge`	Value that can go up or down	Active sessions, queue depth
`histogram`	Distribution of values across buckets	Response latency, token counts
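The difference between the three types is easiest to see in code. The classes below are a simplified sketch, not the platform's implementation. In particular, the histogram assumes inclusive upper bounds with a final overflow bucket, which may differ from how the platform assigns values to buckets.

```python
from bisect import bisect_left


class Counter:
    """Monotonically increasing value."""
    def __init__(self) -> None:
        self.value = 0

    def inc(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Value that can go up or down."""
    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class Histogram:
    """Counts observations per bucket; the last slot holds values above every bound."""
    def __init__(self, buckets: list[float]) -> None:
        self.bounds = sorted(buckets)
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value: float) -> None:
        # bisect_left finds the first bound >= value (inclusive upper bounds).
        self.counts[bisect_left(self.bounds, value)] += 1


latency = Histogram([100, 250, 500, 1000, 2500, 5000, 10000])
for sample_ms in (80, 100, 450, 1200, 20000):
    latency.observe(sample_ms)
print(latency.counts)
```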

Built-in Metrics

The platform automatically collects these metrics for all components:

Metric	Type	Description
`execution_count`	counter	Total executions
`execution_success`	counter	Successful executions
`execution_failure`	counter	Failed executions
`execution_duration_ms`	histogram	Execution duration
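Because the first three built-in metrics are raw counters, ratios such as a failure rate have to be derived from them. A minimal sketch, assuming that `execution_count` is the sum of `execution_success` and `execution_failure` over the same window (`failure_ratio` is a hypothetical helper):

```python
def failure_ratio(execution_count: int, execution_failure: int) -> float:
    """Fraction of executions that failed; 0.0 when nothing has run yet."""
    if execution_count == 0:
        return 0.0
    return execution_failure / execution_count


# 10 failures out of 200 executions in the window being examined.
print(failure_ratio(execution_count=200, execution_failure=10))
```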

Custom Metrics

Define additional metrics specific to your use case:

metric_config = {
    "enabled": True,
    "custom_metrics": [
        {
            "name": "tokens_input",
            "type": "counter",
            "description": "Input tokens consumed",
            "labels": ["model", "agent_slug"],
        },
        {
            "name": "tokens_output",
            "type": "counter",
            "description": "Output tokens generated",
            "labels": ["model", "agent_slug"],
        },
        {
            "name": "tool_call_duration_ms",
            "type": "histogram",
            "description": "Individual tool call duration",
            "labels": ["tool_slug"],
            "buckets": [50, 100, 250, 500, 1000, 5000],
        },
        {
            "name": "escalation_rate",
            "type": "gauge",
            "description": "Percentage of conversations that escalated to a human",
            "labels": ["agent_slug"],
        },
    ],
}

Log Configuration

The log_config section controls what gets written to logs during execution.

Field	Type	Default	Description
`level`	string	`info`	Minimum log level: `debug`, `info`, `warning`, `error`
`include_prompts`	bool	`false`	Log full prompt content (may contain sensitive data)
`include_tool_results`	bool	`true`	Log tool call results
`structured`	bool	`true`	Use JSON structured logging
`retention_days`	int	`30`	Number of days to retain logs

# Development: verbose logging
log_config = {
    "level": "debug",
    "include_prompts": True,
    "include_tool_results": True,
    "structured": True,
    "retention_days": 7,
}

# Production: minimal logging
log_config = {
    "level": "warning",
    "include_prompts": False,      # Avoid logging sensitive prompt data
    "include_tool_results": False,  # Reduce log volume
    "structured": True,
    "retention_days": 90,
}
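To show what `level` and `structured` mean in practice, the sketch below applies a `log_config` dict to Python's standard `logging` module. It illustrates the semantics only: `build_logger` and `JsonFormatter` are hypothetical names, and the JSON field names are assumptions rather than the platform's actual log schema.

```python
import io
import json
import logging

# Map the spec's level strings to standard library log levels.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON object per line (assumed field names)."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name: str, log_config: dict, stream) -> logging.Logger:
    """Configure a logger from a log_config dict, using the documented defaults."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(LEVELS[log_config.get("level", "info")])
    handler = logging.StreamHandler(stream)
    if log_config.get("structured", True):
        handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
logger = build_logger("agent", {"level": "warning", "structured": True}, buffer)
logger.info("tool call finished")     # below the minimum level, dropped
logger.warning("tool call timed out")  # at the minimum level, written
print(buffer.getvalue(), end="")
```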

Sensitive data in logs

Setting include_prompts=True logs the full prompt content, which may contain customer data, PII, or internal instructions. Use this setting only in development environments or with appropriate data handling policies.

Connecting to Components

Reference an observability spec from an agent using observability_ref:

agent = client.agents.create({
    "name": "Production Support Agent",
    "slug": "production-support-agent",
    "model_ref": model.id,
    "tool_refs": [...],
    "observability_ref": obs.id,  # Attach observability spec
})

Environment-Specific Configurations

Create different observability specs for different environments:

# Development: full visibility
dev_obs = client.observability.create({
    "name": "Dev Monitoring",
    "slug": "dev-monitoring",
    "trace_config": {"enabled": True, "sample_rate": 1.0,
                     "include_inputs": True, "include_outputs": True},
    "metric_config": {"enabled": True, "custom_metrics": [...]},
    "log_config": {"level": "debug", "include_prompts": True, "retention_days": 7},
})

# Staging: production-like with full tracing
staging_obs = client.observability.create({
    "name": "Staging Monitoring",
    "slug": "staging-monitoring",
    "trace_config": {"enabled": True, "sample_rate": 1.0,
                     "include_inputs": True, "include_outputs": True},
    "metric_config": {"enabled": True, "custom_metrics": [...]},
    "log_config": {"level": "info", "include_prompts": False, "retention_days": 14},
})

# Production: sampled tracing, minimal logging
prod_obs = client.observability.create({
    "name": "Production Monitoring",
    "slug": "production-monitoring",
    "trace_config": {"enabled": True, "sample_rate": 0.1,
                     "include_inputs": True, "include_outputs": False},
    "metric_config": {"enabled": True, "custom_metrics": [...]},
    "log_config": {"level": "warning", "include_prompts": False, "retention_days": 90},
})
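The three specs above share most of their settings, so one way to keep them in sync is to derive each from a common base plus per-environment overrides. This is a sketch of that pattern in plain Python: `merge`, `BASE`, `OVERRIDES`, and `spec_for` are illustrative names, and `metric_config` is omitted for brevity. The resulting dict could then be passed to `client.observability.create`.

```python
from copy import deepcopy


def merge(base: dict, overrides: dict) -> dict:
    """Recursively merge overrides into a copy of base, leaving base untouched."""
    result = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# Staging values from the example above serve as the shared baseline.
BASE = {
    "trace_config": {"enabled": True, "sample_rate": 1.0,
                     "include_inputs": True, "include_outputs": True},
    "log_config": {"level": "info", "include_prompts": False, "retention_days": 14},
}

OVERRIDES = {
    "dev": {
        "name": "Dev Monitoring", "slug": "dev-monitoring",
        "log_config": {"level": "debug", "include_prompts": True, "retention_days": 7},
    },
    "staging": {"name": "Staging Monitoring", "slug": "staging-monitoring"},
    "production": {
        "name": "Production Monitoring", "slug": "production-monitoring",
        "trace_config": {"sample_rate": 0.1, "include_outputs": False},
        "log_config": {"level": "warning", "retention_days": 90},
    },
}


def spec_for(environment: str) -> dict:
    """Build the full spec dict for one environment."""
    return merge(BASE, OVERRIDES[environment])


print(spec_for("production")["trace_config"])
```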

Alert Definitions

Define alerts in metric_config to be notified when a metric crosses a threshold:

metric_config = {
    "enabled": True,
    "custom_metrics": [...],
    "alerts": [
        {
            "name": "High Error Rate",
            "metric": "execution_failure",
            "condition": "rate_5m > 0.05",
            "severity": "critical",
            "description": "More than 5% of executions are failing",
        },
        {
            "name": "Slow Responses",
            "metric": "execution_duration_ms",
            "condition": "p95 > 10000",
            "severity": "warning",
            "description": "95th percentile latency exceeds 10 seconds",
        },
        {
            "name": "Token Budget Exceeded",
            "metric": "tokens_output",
            "condition": "sum_1h > 1000000",
            "severity": "warning",
            "description": "Output token usage exceeded 1M in the last hour",
        },
    ],
}

Alert Field	Description
`name`	Human-readable alert name
`metric`	Metric to monitor (built-in or custom)
`condition`	Threshold expression evaluated against the metric, such as `p95 > 10000`
`severity`	Alert severity: `info`, `warning`, or `critical`
`description`	What the alert means and potential actions
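The grammar of `condition` is not specified on this page. Judging only from the examples above, each expression has the shape `<aggregate> <operator> <threshold>`. The sketch below parses that shape and evaluates percentile conditions such as `p95 > 10000` against a list of samples. All names are hypothetical, the nearest-rank percentile method is an assumption, and windowed aggregates such as `rate_5m` or `sum_1h` are parsed but not evaluated.

```python
import operator
import re

CONDITION = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<)\s*([0-9.]+)\s*$")
OPERATORS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def parse_condition(condition: str) -> tuple[str, str, float]:
    """Split a condition into (aggregate, operator, threshold)."""
    match = CONDITION.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    aggregate, op, threshold = match.groups()
    return aggregate, op, float(threshold)


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile, using integer arithmetic for the rank."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max((pct * len(ordered) + 99) // 100, 1)  # ceil(pct * n / 100)
    return ordered[rank - 1]


def alert_fires(condition: str, samples: list[float]) -> bool:
    """Evaluate a percentile condition (for example 'p95 > 10000') over samples."""
    aggregate, op, threshold = parse_condition(condition)
    if not (aggregate.startswith("p") and aggregate[1:].isdigit()):
        raise ValueError("this sketch only evaluates percentile aggregates")
    return OPERATORS[op](percentile(samples, int(aggregate[1:])), threshold)


durations_ms = [800] * 18 + [12000, 15000]
print(alert_fires("p95 > 10000", durations_ms))
```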