GatewayClient
Coming Soon: Standalone Usage
Standalone GatewayClient usage (outside the platform) is not yet publicly available and is planned for an upcoming release. If your code runs inside a Hosted Service or Code Block, the AI Gateway is already available via flow.gateway on FlowSDK; no additional setup is needed.
GatewayClient provides direct async access to the Manifest Platform AI Gateway for chat completions, streaming, embeddings, and model discovery. The gateway exposes an OpenAI-compatible API that routes requests to configured providers (OpenAI, Anthropic, Google, and others) with built-in authentication, rate limiting, and usage tracking.
FlowSDK vs GatewayClient
If your code runs inside a Hosted Service or Code Block, use flow.gateway on FlowSDK instead; it wraps GatewayClient with automatic configuration. Use GatewayClient directly when you need standalone AI Gateway access from scripts or applications outside the platform runtime.
Initialization
from flow_sdk import GatewayClient

# From environment variables (PLATFORM_API_URL, AUTH_TOKEN)
gateway = GatewayClient()

# Explicit configuration
gateway = GatewayClient(
    base_url="https://api.flow.marut.cloud",
    auth_token="your-jwt-token",
    timeout=60.0,
)
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `PLATFORM_API_URL` env var | Platform API base URL |
| `auth_token` | `str` | `AUTH_TOKEN` env var | JWT authentication token |
| `timeout` | `float` | `60.0` | Request timeout in seconds |
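The fallback from explicit arguments to environment variables can be sketched as below. This is an illustrative helper, not the SDK's actual implementation: the name `resolve_gateway_config` and its `env` parameter are hypothetical, and only the variable names `PLATFORM_API_URL` and `AUTH_TOKEN` and the 60-second default come from the table above. It may be useful for failing early with a clear message before the client is constructed.

```python
import os
from typing import Mapping, Optional


def resolve_gateway_config(
    base_url: Optional[str] = None,
    auth_token: Optional[str] = None,
    timeout: float = 60.0,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Resolve constructor arguments, falling back to environment variables.

    Hypothetical helper: it mirrors the documented defaults but is not
    part of flow_sdk.
    """
    env = os.environ if env is None else env
    resolved_url = base_url or env.get("PLATFORM_API_URL")
    resolved_token = auth_token or env.get("AUTH_TOKEN")

    # Collect the names of any settings that are still unset.
    missing = [
        name
        for name, value in (
            ("PLATFORM_API_URL", resolved_url),
            ("AUTH_TOKEN", resolved_token),
        )
        if not value
    ]
    if missing:
        raise ValueError(f"Missing gateway configuration: {', '.join(missing)}")

    return {
        "base_url": resolved_url,
        "auth_token": resolved_token,
        "timeout": timeout,
    }
```

Under these assumptions, the result can be passed straight to the constructor, for example `GatewayClient(**resolve_gateway_config())`.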
Context Manager
GatewayClient supports async context management for automatic cleanup:
async with GatewayClient() as gateway:
    response = await gateway.chat_completion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
# HTTP client is automatically closed
Chat Completions
Non-Streaming
response = await gateway.chat_completion(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in one paragraph."},
    ],
    temperature=0.7,
    max_tokens=256,
)

# Response follows the OpenAI format
print(response["choices"][0]["message"]["content"])
print(f"Tokens used: {response['usage']['total_tokens']}")
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | `str` | Yes | Model identifier (e.g., openai/gpt-4o, anthropic/claude-sonnet-4-20250514) |
| `messages` | `list[dict]` | Yes | Conversation messages with role and content |
| `temperature` | `float` | No | Sampling temperature (0.0 to 2.0) |
| `max_tokens` | `int` | No | Maximum tokens to generate |
| `top_p` | `float` | No | Nucleus sampling probability |
Response Format
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}
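Since the response is a plain dictionary, a small accessor can keep the indexing in one place. The sketch below works only on the response shape shown above; `extract_reply` is a hypothetical name, not an SDK function, and treating a missing `usage` block as zero tokens is an assumption.

```python
from typing import Any, Mapping, Optional, Tuple


def extract_reply(response: Mapping[str, Any]) -> Tuple[str, Optional[str], int]:
    """Return (content, finish_reason, total_tokens) from a chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")

    first = choices[0]
    # A null or missing content value is normalized to an empty string.
    content = (first.get("message") or {}).get("content") or ""
    finish_reason = first.get("finish_reason")
    # Assumption: a missing usage block counts as zero tokens.
    total_tokens = (response.get("usage") or {}).get("total_tokens", 0)
    return content, finish_reason, total_tokens
```

Checking `finish_reason` for a value other than `"stop"` is one way to notice output that was cut short, for example by `max_tokens`.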
Streaming
Stream tokens as they are generated using server-sent events:
async for chunk in gateway.stream_chat_completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a short story about a robot."}],
    max_tokens=500,
):
    delta = chunk["choices"][0]["delta"]
    content = delta.get("content", "")
    print(content, end="", flush=True)

print()  # newline after streaming completes
Each chunk follows the OpenAI streaming format:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Once"
      },
      "finish_reason": null
    }
  ]
}
Collecting a Full Streamed Response
full_content = ""
async for chunk in gateway.stream_chat_completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "List 5 programming languages."}],
):
    delta = chunk["choices"][0]["delta"]
    content = delta.get("content", "")
    full_content += content

print(full_content)
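A slightly more defensive version of the collection loop is sketched below. It assumes that some chunks may carry an empty `choices` list or a `delta` whose `content` is missing or null, as can happen in OpenAI-style streams; whether the gateway emits such chunks is not confirmed by this page. `collect_stream` and `fake_stream` are hypothetical names, and `fake_stream` merely stands in for `gateway.stream_chat_completion(...)` so the example runs without a gateway.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


async def collect_stream(chunks: AsyncIterator[Dict[str, Any]]) -> str:
    """Join the text deltas of an OpenAI-style chunk stream into one string."""
    parts: List[str] = []
    async for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # assumed possible: a chunk without any choices
        delta = choices[0].get("delta") or {}
        # `or ""` covers both a missing key and an explicit null value.
        parts.append(delta.get("content") or "")
    return "".join(parts)


async def fake_stream() -> AsyncIterator[Dict[str, Any]]:
    """Stand-in for the gateway stream, for illustration only."""
    yield {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]}
    yield {"choices": [{"index": 0, "delta": {"content": "Once"}, "finish_reason": None}]}
    yield {"choices": [{"index": 0, "delta": {"content": " upon"}, "finish_reason": None}]}
    yield {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}


if __name__ == "__main__":
    print(asyncio.run(collect_stream(fake_stream())))
```

Collecting the parts in a list and joining once avoids repeated string concatenation on long responses.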
Embeddings
Generate vector embeddings for text:
Response Format
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, ...]
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
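A common next step is comparing two embeddings. The sketch below works only on the response shape shown above and does not call the gateway; `vectors_from_response` and `cosine_similarity` are hypothetical helper names, and sorting by `index` assumes that field reflects the order of the inputs.

```python
import math
from typing import Any, List, Mapping, Sequence


def vectors_from_response(response: Mapping[str, Any]) -> List[List[float]]:
    """Return the embedding vectors ordered by their `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [list(item["embedding"]) for item in items]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Values near 1.0 indicate vectors pointing in nearly the same direction, and values near 0.0 indicate unrelated directions.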
Model Discovery
List All Models
List Provider Models
# List all models from a specific provider
openai_models = await gateway.list_provider_models("openai")
print(f"Provider: {openai_models['provider']}")
print(f"Total models: {openai_models['total']}")
for model in openai_models["models"]:
    print(f" {model['id']}")
List Known Providers
providers = await gateway.list_known_providers()
print(providers)
# ["openai", "anthropic", "google", "azure", "mistral", ...]
Model Naming Convention
Models are identified using the provider/model-name format:
| Provider | Example Models |
|---|---|
| OpenAI | openai/gpt-4o, openai/gpt-4o-mini, openai/text-embedding-3-small |
| Anthropic | anthropic/claude-sonnet-4-20250514, anthropic/claude-haiku-4-20250414 |
| Google | google/gemini-2.0-flash |
| Mistral | mistral/mistral-large-latest |
The gateway handles provider routing, API key management, and format translation automatically. Your code always uses the same OpenAI-compatible interface regardless of the underlying provider.
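The provider/model-name convention can be checked on the client side before a request is sent. The following is a minimal sketch; `split_model_id` is a hypothetical helper rather than part of the SDK, and splitting on the first slash only is an assumption about how model names containing further slashes should be treated.

```python
from typing import Tuple


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split 'provider/model-name' into (provider, model_name)."""
    # partition() splits on the first "/" only, so any later slashes
    # stay in the model name.
    provider, separator, name = model_id.partition("/")
    if not separator or not provider or not name:
        raise ValueError(
            f"expected 'provider/model-name', got {model_id!r}"
        )
    return provider, name
```

For example, `split_model_id("openai/gpt-4o")` yields the provider `openai` and the model name `gpt-4o`, while a bare `gpt-4o` raises `ValueError`.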
Error Handling
GatewayClient raises typed exceptions based on HTTP status codes:
from flow_sdk.gateway_client import (
    GatewayClient,
    GatewayClientError,
    GatewayAuthError,
    GatewayRateLimitError,
    GatewayModelNotFoundError,
    GatewayProviderError,
)

try:
    response = await gateway.chat_completion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
except GatewayAuthError:
    print("Authentication failed -- check your token")
except GatewayModelNotFoundError:
    print("Model not available -- check model ID")
except GatewayRateLimitError:
    print("Rate limited -- implement backoff and retry")
except GatewayProviderError:
    print("Upstream provider error -- try again later")
except GatewayClientError as e:
    print(f"Gateway error: {e}")
Exception Reference
| Exception | HTTP Status | When |
|---|---|---|
| `GatewayAuthError` | 401, 403 | Invalid or expired token, insufficient permissions |
| `GatewayModelNotFoundError` | 404 | Model ID not recognized or not available for your org |
| `GatewayRateLimitError` | 429 | Request rate or token quota exceeded |
| `GatewayProviderError` | 502 | Upstream LLM provider returned an error |
| `GatewayClientError` | Other | Base class for all gateway errors |
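For rate-limit errors, retrying with exponential backoff is a common approach. The sketch below is one possible implementation, not something the SDK is known to provide: `call_with_backoff` and its parameters are hypothetical, and the exception types to retry on are passed in so the example runs without flow_sdk installed.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def call_with_backoff(
    call: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Await call(), retrying on the given exception types with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    attempt = 1
    while True:
        try:
            return await call()
        except retry_on:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential growth, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads out retries from concurrent callers.
            await asyncio.sleep(random.uniform(0, delay))
            attempt += 1
```

With the exceptions listed above, a call might look like `await call_with_backoff(lambda: gateway.chat_completion(model=..., messages=...), retry_on=(GatewayRateLimitError,))`. Authentication and model-not-found errors are usually not worth retrying, since a repeat request would fail the same way.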
Usage Tracking
Every request through the gateway is tracked for billing and observability. Token usage is returned in the response usage field:
response = await gateway.chat_completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

usage = response["usage"]
print(f"Prompt tokens: {usage['prompt_tokens']}")
print(f"Completion tokens: {usage['completion_tokens']}")
print(f"Total tokens: {usage['total_tokens']}")
Usage data is automatically attributed to your organization and workspace for cost allocation.
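If a script makes several calls, the per-response usage fields can also be summed locally, for instance to log a per-run total. The sketch below is a client-side convenience only and does not replace the gateway's own tracking; `UsageTotals` is a hypothetical name, and counting missing fields as zero is an assumption (the embeddings response above has no `completion_tokens`).

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class UsageTotals:
    """Running totals of token usage across several gateway responses."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    requests: int = 0

    def add(self, response: Mapping[str, Any]) -> None:
        """Add the usage block of one response to the running totals."""
        usage = response.get("usage") or {}
        # Assumption: fields absent from a response count as zero.
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.total_tokens += usage.get("total_tokens", 0)
        self.requests += 1
```

A typical pattern is to create one `UsageTotals()` per run and call `totals.add(response)` after each request.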
Resource Cleanup
Always close the client when you are done, or use the async context manager:
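One way to make cleanup hard to forget is to route all work through a small wrapper built on the async context manager. The sketch below relies only on the `async with` support described earlier; `run_with_client` is a hypothetical helper, and the explicit close method is not used here because its name does not appear in this section.

```python
import asyncio
from typing import Any, Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_with_client(
    client_factory: Callable[[], Any],
    work: Callable[[Any], Awaitable[T]],
) -> T:
    """Create a client, run `work` with it, and always release it afterwards."""
    # The context manager exits on both success and failure, so the
    # client is released even when `work` raises.
    async with client_factory() as client:
        return await work(client)
```

Under these assumptions, usage might look like `await run_with_client(GatewayClient, lambda gw: gw.chat_completion(model=..., messages=...))`.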