Observability Specs¶
An observability component configures monitoring, tracing, logging, and alerting for agents and workflows. Observability specs define what data to collect, how to collect it, and when to alert.
Creating an Observability Spec¶
Coming Soon
The Python SDK for local development is not yet publicly available.
from flow_sdk.cli_client import CLIClient

# `config` is assumed to be an SDK configuration object created elsewhere.
client = CLIClient(config)

obs = client.observability.create({
    "name": "Production Agent Monitoring",
    "slug": "production-agent-monitoring",
    "description": "Full observability for production-deployed agents",
    "trace_config": {
        "enabled": True,
        "sample_rate": 1.0,
        "include_inputs": True,
        "include_outputs": True,
        "include_tool_calls": True,
        "propagate_context": True,
    },
    "metric_config": {
        "enabled": True,
        "custom_metrics": [
            {
                "name": "tokens_used",
                "type": "counter",
                "description": "Total tokens consumed",
                "labels": ["model", "agent"],
            },
            {
                "name": "response_latency_ms",
                "type": "histogram",
                "description": "Agent response latency",
                "buckets": [100, 250, 500, 1000, 2500, 5000, 10000],
            },
            {
                "name": "tool_call_success_rate",
                "type": "gauge",
                "description": "Rolling success rate for tool calls",
            },
        ],
    },
    "log_config": {
        "level": "info",
        "include_prompts": False,
        "include_tool_results": True,
        "structured": True,
        "retention_days": 30,
    },
    "tags": ["production", "monitoring"],
})
To create a spec from the UI instead, navigate to Components > Observability > New, configure the trace, metric, and log settings, then click Create.
Configuration¶
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name |
| slug | string | Yes | URL-safe identifier |
| trace_config | dict | No | Distributed tracing configuration |
| metric_config | dict | No | Metrics collection configuration |
| log_config | dict | No | Logging configuration |
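A spec can be sanity-checked locally before it is submitted. The sketch below is a minimal, hypothetical validator: the function name `validate_observability_spec` and the slug pattern are assumptions, not part of the SDK. Only the required fields, the optional dict sections, and the documented 0.0 to 1.0 range for `sample_rate` come from this page.

```python
import re

# Assumption: "URL-safe" means lowercase letters, digits, and single hyphens.
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def validate_observability_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec looks valid."""
    problems = []

    # name and slug are the only required fields.
    for field in ("name", "slug"):
        if not isinstance(spec.get(field), str) or not spec[field].strip():
            problems.append(f"'{field}' is required and must be a non-empty string")

    slug = spec.get("slug")
    if isinstance(slug, str) and slug.strip() and not SLUG_PATTERN.match(slug):
        problems.append(f"slug '{slug}' is not URL-safe")

    # The three config sections are optional, but must be dicts when present.
    for section in ("trace_config", "metric_config", "log_config"):
        if section in spec and not isinstance(spec[section], dict):
            problems.append(f"'{section}' must be a dict")

    trace = spec.get("trace_config")
    if isinstance(trace, dict):
        rate = trace.get("sample_rate", 1.0)
        is_number = isinstance(rate, (int, float)) and not isinstance(rate, bool)
        if not is_number or not 0.0 <= rate <= 1.0:
            problems.append("trace_config.sample_rate must be between 0.0 and 1.0")

    return problems


print(validate_observability_spec({"name": "Prod", "slug": "Bad Slug"}))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.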
Trace Configuration¶
The trace_config section controls distributed tracing, which tracks the full execution path of an agent or workflow request.
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable tracing |
| sample_rate | float | 1.0 | Fraction of requests to trace (0.0 - 1.0) |
| include_inputs | bool | true | Record input data in spans |
| include_outputs | bool | true | Record output data in spans |
| include_tool_calls | bool | true | Create child spans for each tool call |
| propagate_context | bool | true | Propagate trace context across component boundaries |
What Gets Traced¶
A single agent execution generates a trace with multiple spans:
Agent Execution (root span)
├── Prompt Rendering
├── LLM Inference
│ ├── Model: gpt-4o
│ ├── Tokens: 1,250 in / 380 out
│ └── Latency: 1,200ms
├── Tool Call: search-knowledge-base
│ ├── Parameters: {"query": "refund policy"}
│ ├── Result: 3 articles found
│ └── Latency: 450ms
├── LLM Inference (second turn)
│ └── ...
└── Response Generation
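The tree above can be modelled as a parent span holding a list of child spans. The following is only an illustrative sketch: the `Span` class and `render` helper are hypothetical names and do not reflect how the platform stores traces internally.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    latency_ms: int = 0
    children: list["Span"] = field(default_factory=list)

    def child(self, name: str, latency_ms: int = 0) -> "Span":
        """Create a child span, attach it, and return it so calls can be nested."""
        span = Span(name, latency_ms)
        self.children.append(span)
        return span


def render(span: Span, depth: int = 0) -> list[str]:
    """Flatten the span tree into indented lines, parents before children."""
    lines = [f"{'  ' * depth}{span.name} ({span.latency_ms}ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


root = Span("Agent Execution")
root.child("Prompt Rendering")
root.child("LLM Inference", latency_ms=1200)
root.child("Tool Call: search-knowledge-base", latency_ms=450)
print("\n".join(render(root)))
```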
# High-traffic production: sample 10% of requests
trace_config={
    "enabled": True,
    "sample_rate": 0.1,
    "include_inputs": True,
    "include_outputs": False,  # Reduce storage for high volume
    "include_tool_calls": True,
}
Sample rate in production
For high-traffic agents, set sample_rate below 1.0 to reduce observability costs while still capturing representative data. A rate of 0.1 (10%) is usually sufficient for identifying patterns.
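To make the effect of `sample_rate` concrete, here is one common way a sampler can be implemented: hash the trace ID and keep the trace when the hash falls below the configured fraction. This is a sketch under assumptions. The page does not say how the platform samples, and `should_sample` is a hypothetical helper.

```python
import hashlib


def should_sample(trace_id: str, sample_rate: float) -> bool:
    """Decide deterministically whether a trace is kept at the given sample rate."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    # Map the trace ID to a stable integer in [0, 2**64) so that every
    # component seeing the same trace ID makes the same keep/drop decision.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(sample_rate * 2**64)


# At sample_rate=0.1, roughly one trace ID in ten is kept.
kept = sum(should_sample(f"trace-{i}", 0.1) for i in range(10_000))
print(f"kept {kept} of 10000 traces")
```

Hashing the trace ID instead of drawing a random number per request keeps a trace either fully recorded or fully dropped, which matters when trace context is propagated across components.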
Metric Configuration¶
The metric_config section defines custom metrics collected during execution.
Metric Types¶
| Type | Description | Example |
|---|---|---|
| counter | Monotonically increasing value | Total requests, total tokens |
| gauge | Value that can go up or down | Active sessions, queue depth |
| histogram | Distribution of values across buckets | Response latency, token counts |
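The difference between the three types is easiest to see in code. The classes below are a minimal in-memory sketch, not the platform's implementation. The class names are hypothetical, and the histogram stores per-bucket counts where each bucket holds values up to and including its upper bound.

```python
from bisect import bisect_left


class Counter:
    """Monotonically increasing value."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Value that can go up or down."""

    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per bucket, using the configured upper bounds."""

    def __init__(self, buckets):
        self.bounds = sorted(buckets)
        # One extra slot for observations above the largest bound.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value):
        # bisect_left finds the first bound that is >= value.
        self.counts[bisect_left(self.bounds, value)] += 1


tokens = Counter()
tokens.inc(1250)
tokens.inc(380)

latency = Histogram([100, 250, 500, 1000, 2500, 5000, 10000])
for ms in (450, 1200, 90, 12000):
    latency.observe(ms)

print(tokens.value, latency.counts)
```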
Built-in Metrics¶
The platform automatically collects these metrics for all components:
| Metric | Type | Description |
|---|---|---|
| execution_count | counter | Total executions |
| execution_success | counter | Successful executions |
| execution_failure | counter | Failed executions |
| execution_duration_ms | histogram | Execution duration |
Custom Metrics¶
Define additional metrics specific to your use case:
metric_config={
    "enabled": True,
    "custom_metrics": [
        {
            "name": "tokens_input",
            "type": "counter",
            "description": "Input tokens consumed",
            "labels": ["model", "agent_slug"],
        },
        {
            "name": "tokens_output",
            "type": "counter",
            "description": "Output tokens generated",
            "labels": ["model", "agent_slug"],
        },
        {
            "name": "tool_call_duration_ms",
            "type": "histogram",
            "description": "Individual tool call duration",
            "labels": ["tool_slug"],
            "buckets": [50, 100, 250, 500, 1000, 5000],
        },
        {
            "name": "escalation_rate",
            "type": "gauge",
            "description": "Percentage of conversations that escalated to human",
            "labels": ["agent_slug"],
        },
    ],
}
Log Configuration¶
The log_config section controls what gets written to logs during execution.
| Field | Type | Default | Description |
|---|---|---|---|
| level | string | info | Minimum log level: debug, info, warning, error |
| include_prompts | bool | false | Log full prompt content (may contain sensitive data) |
| include_tool_results | bool | true | Log tool call results |
| structured | bool | true | Use JSON structured logging |
| retention_days | int | 30 | How long to retain logs |
# Development: verbose logging
log_config={
    "level": "debug",
    "include_prompts": True,
    "include_tool_results": True,
    "structured": True,
    "retention_days": 7,
}

# Production: minimal logging
log_config={
    "level": "warning",
    "include_prompts": False,  # Avoid logging sensitive prompt data
    "include_tool_results": False,  # Reduce log volume
    "structured": True,
    "retention_days": 90,
}
Sensitive data in logs
Setting include_prompts=True logs the full prompt content, which may contain customer data, PII, or internal instructions. Use this setting only in development environments or with appropriate data handling policies.
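As an illustration of how `level`, `structured`, and `include_prompts` interact, the sketch below applies a `log_config` dict to Python's standard `logging` module. It is not the platform's logging pipeline: `JsonFormatter`, `build_logger`, and the `prompt` extra field are hypothetical names chosen for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def __init__(self, include_prompts: bool = False):
        super().__init__()
        self.include_prompts = include_prompts

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "prompt" is a hypothetical extra field; it is dropped unless opted in.
        prompt = getattr(record, "prompt", None)
        if prompt is not None and self.include_prompts:
            payload["prompt"] = prompt
        return json.dumps(payload)


def build_logger(name: str, log_config: dict) -> logging.Logger:
    """Create a logger that honours level, structured, and include_prompts."""
    logger = logging.getLogger(name)
    logger.setLevel(log_config.get("level", "info").upper())
    handler = logging.StreamHandler()
    if log_config.get("structured", True):
        handler.setFormatter(JsonFormatter(log_config.get("include_prompts", False)))
    logger.handlers = [handler]
    logger.propagate = False
    return logger


logger = build_logger("agent", {"level": "warning", "include_prompts": False})
logger.info("not emitted at warning level")
logger.warning("tool call failed", extra={"prompt": "sensitive text"})
```

With `include_prompts` left at `False`, the warning is still written but the prompt text never reaches the log line.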
Connecting to Components¶
Reference an observability spec from an agent using observability_ref:
agent = client.agents.create({
    "name": "Production Support Agent",
    "slug": "production-support-agent",
    "model_ref": model.id,
    "tool_refs": [...],
    "observability_ref": obs.id,  # Attach observability spec
})
Environment-Specific Configurations¶
Create different observability specs for different environments:
# Development: full visibility
dev_obs = client.observability.create({
    "name": "Dev Monitoring",
    "slug": "dev-monitoring",
    "trace_config": {"enabled": True, "sample_rate": 1.0,
                     "include_inputs": True, "include_outputs": True},
    "metric_config": {"enabled": True, "custom_metrics": [...]},
    "log_config": {"level": "debug", "include_prompts": True, "retention_days": 7},
})

# Staging: production-like with full tracing
staging_obs = client.observability.create({
    "name": "Staging Monitoring",
    "slug": "staging-monitoring",
    "trace_config": {"enabled": True, "sample_rate": 1.0,
                     "include_inputs": True, "include_outputs": True},
    "metric_config": {"enabled": True, "custom_metrics": [...]},
    "log_config": {"level": "info", "include_prompts": False, "retention_days": 14},
})

# Production: sampled tracing, minimal logging
prod_obs = client.observability.create({
    "name": "Production Monitoring",
    "slug": "production-monitoring",
    "trace_config": {"enabled": True, "sample_rate": 0.1,
                     "include_inputs": True, "include_outputs": False},
    "metric_config": {"enabled": True, "custom_metrics": [...]},
    "log_config": {"level": "warning", "include_prompts": False, "retention_days": 90},
})
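Because the three specs above differ in only a handful of values, the payloads can be generated from a shared template. The helper below is a hypothetical sketch: `ENVIRONMENTS` and `build_observability_spec` are not part of the SDK, and it builds plain dicts only, so it runs without a client.

```python
# Settings that differ between environments, copied from the three specs above.
ENVIRONMENTS = {
    "dev": {
        "label": "Dev", "sample_rate": 1.0, "include_outputs": True,
        "level": "debug", "include_prompts": True, "retention_days": 7,
    },
    "staging": {
        "label": "Staging", "sample_rate": 1.0, "include_outputs": True,
        "level": "info", "include_prompts": False, "retention_days": 14,
    },
    "production": {
        "label": "Production", "sample_rate": 0.1, "include_outputs": False,
        "level": "warning", "include_prompts": False, "retention_days": 90,
    },
}


def build_observability_spec(env: str, custom_metrics: list) -> dict:
    """Build the payload for one environment from the shared template."""
    settings = ENVIRONMENTS[env]  # raises KeyError for an unknown environment
    return {
        "name": f"{settings['label']} Monitoring",
        "slug": f"{settings['label'].lower()}-monitoring",
        "trace_config": {
            "enabled": True,
            "sample_rate": settings["sample_rate"],
            "include_inputs": True,
            "include_outputs": settings["include_outputs"],
        },
        "metric_config": {"enabled": True, "custom_metrics": custom_metrics},
        "log_config": {
            "level": settings["level"],
            "include_prompts": settings["include_prompts"],
            "retention_days": settings["retention_days"],
        },
    }


prod_spec = build_observability_spec("production", custom_metrics=[])
print(prod_spec["slug"], prod_spec["trace_config"]["sample_rate"])
```

The resulting dict has the same shape as the payloads passed to `client.observability.create` above, so the per-environment differences live in one table instead of three near-identical calls.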
Alert Definitions¶
Define alerts within the metric config to get notified when metrics cross thresholds:
metric_config={
    "enabled": True,
    "custom_metrics": [...],
    "alerts": [
        {
            "name": "High Error Rate",
            "metric": "execution_failure",
            "condition": "rate_5m > 0.05",
            "severity": "critical",
            "description": "More than 5% of executions are failing",
        },
        {
            "name": "Slow Responses",
            "metric": "execution_duration_ms",
            "condition": "p95 > 10000",
            "severity": "warning",
            "description": "95th percentile latency exceeds 10 seconds",
        },
        {
            "name": "Token Budget Exceeded",
            "metric": "tokens_output",
            "condition": "sum_1h > 1000000",
            "severity": "warning",
            "description": "Output token usage exceeded 1M in the last hour",
        },
    ],
}
| Alert Field | Description |
|---|---|
| name | Human-readable alert name |
| metric | Metric to monitor (built-in or custom) |
| condition | Threshold expression |
| severity | info, warning, critical |
| description | What the alert means and potential actions |
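The grammar of `condition` is not specified here. Judging only from the three examples above, a condition appears to take the form `<aggregate> <operator> <threshold>`. The sketch below parses that assumed form and evaluates it against precomputed aggregate values. `parse_condition`, `percentile`, and `is_firing` are hypothetical helpers, and the percentile uses the nearest-rank method, which may differ from what the platform computes.

```python
import math
import operator
import re

# Assumption: a condition looks like "<aggregate> <operator> <threshold>".
CONDITION = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<)\s*([0-9]*\.?[0-9]+)\s*$")
OPERATORS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def parse_condition(condition: str):
    """Split a condition into (aggregate name, comparison function, threshold)."""
    match = CONDITION.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    aggregate, op, threshold = match.groups()
    return aggregate, OPERATORS[op], float(threshold)


def percentile(values, pct: int):
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def is_firing(alert: dict, aggregates: dict) -> bool:
    """Return True when the alert's condition holds for the given aggregates."""
    aggregate, compare, threshold = parse_condition(alert["condition"])
    return compare(aggregates[aggregate], threshold)


durations_ms = [500, 700, 900, 12000]
slow = {"name": "Slow Responses", "condition": "p95 > 10000"}
print(is_firing(slow, {"p95": percentile(durations_ms, 95)}))
```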