GatewayClient

Coming Soon — Standalone Usage

Standalone GatewayClient usage (outside the platform) is not yet publicly available; it is planned for an upcoming release. If your code runs inside a Hosted Service or Code Block, the AI Gateway is already available via flow.gateway on FlowSDK, with no additional setup needed.

GatewayClient provides direct async access to the Manifest Platform AI Gateway for chat completions, streaming, embeddings, and model discovery. The gateway exposes an OpenAI-compatible API that routes requests to configured providers (OpenAI, Anthropic, Google, and others) with built-in authentication, rate limiting, and usage tracking.

FlowSDK vs GatewayClient

If your code runs inside a Hosted Service or Code Block, use flow.gateway on FlowSDK instead; it wraps GatewayClient with automatic configuration. Use GatewayClient directly when you need standalone AI Gateway access from scripts or applications outside the platform runtime.

Initialization

from flow_sdk import GatewayClient

# From environment variables (PLATFORM_API_URL, AUTH_TOKEN)
gateway = GatewayClient()

# Explicit configuration
gateway = GatewayClient(
    base_url="https://api.flow.marut.cloud",
    auth_token="your-jwt-token",
    timeout=60.0,
)

Constructor Parameters

Parameter	Type	Default	Description
`base_url`	`str`	`PLATFORM_API_URL` env var	Platform API base URL
`auth_token`	`str`	`AUTH_TOKEN` env var	JWT authentication token
`timeout`	`float`	`60.0`	Request timeout in seconds
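The fallback behaviour in the table can be sketched without the SDK. The snippet below is a minimal illustration, not GatewayClient's actual implementation: `GatewayConfig` and `resolve_gateway_config` are hypothetical names, and raising `ValueError` when a value is missing is an assumption about reasonable behaviour.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatewayConfig:
    """Hypothetical container mirroring the three constructor parameters."""
    base_url: str
    auth_token: str
    timeout: float = 60.0


def resolve_gateway_config(
    base_url: Optional[str] = None,
    auth_token: Optional[str] = None,
    timeout: float = 60.0,
) -> GatewayConfig:
    """Prefer explicit arguments, then fall back to environment variables."""
    resolved_url = base_url or os.environ.get("PLATFORM_API_URL")
    resolved_token = auth_token or os.environ.get("AUTH_TOKEN")
    if not resolved_url:
        raise ValueError("Set PLATFORM_API_URL or pass base_url")
    if not resolved_token:
        raise ValueError("Set AUTH_TOKEN or pass auth_token")
    return GatewayConfig(resolved_url, resolved_token, timeout)
```

Validating configuration up front like this surfaces a missing token at startup rather than on the first request.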

Context Manager

GatewayClient supports async context management for automatic cleanup:

async with GatewayClient() as gateway:
    response = await gateway.chat_completion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
# HTTP client is automatically closed

Chat Completions

Non-Streaming

response = await gateway.chat_completion(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in one paragraph."},
    ],
    temperature=0.7,
    max_tokens=256,
)

# Response follows the OpenAI format
print(response["choices"][0]["message"]["content"])
print(f"Tokens used: {response['usage']['total_tokens']}")

Parameters

Parameter	Type	Required	Description
`model`	`str`	Yes	Model identifier (e.g., `openai/gpt-4o`, `anthropic/claude-sonnet-4-20250514`)
`messages`	`list[dict]`	Yes	Conversation messages with `role` and `content`
`temperature`	`float`	No	Sampling temperature (0.0 to 2.0)
`max_tokens`	`int`	No	Maximum tokens to generate
`top_p`	`float`	No	Nucleus sampling probability

Response Format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}
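Because the response is a plain dictionary, a small defensive accessor avoids `KeyError` or `IndexError` when a field is missing. This is a sketch with hypothetical helper names (`first_message_text`, `was_truncated`). It assumes the gateway reports `finish_reason` as `"length"` when `max_tokens` cuts the output short, as the OpenAI format does.

```python
from typing import Any


def first_message_text(response: dict[str, Any]) -> str:
    """Return the first choice's text, or "" if the response has none."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return message.get("content") or ""


def was_truncated(response: dict[str, Any]) -> bool:
    """True when generation stopped because the token limit was reached."""
    choices = response.get("choices") or []
    return bool(choices) and choices[0].get("finish_reason") == "length"
```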

Streaming

Stream tokens as they are generated using server-sent events:

async for chunk in gateway.stream_chat_completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a short story about a robot."}],
    max_tokens=500,
):
    delta = chunk["choices"][0]["delta"]
    content = delta.get("content") or ""  # content may be absent or null
    print(content, end="", flush=True)

print()  # newline after streaming completes

Each chunk follows the OpenAI streaming format:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Once"
      },
      "finish_reason": null
    }
  ]
}
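`stream_chat_completion` handles the transport for you, but it can help to see how chunks like the one above typically arrive over server-sent events. The sketch below assumes the OpenAI wire convention (`data: <json>` lines terminated by `data: [DONE]`). Whether the gateway emits exactly this framing is an assumption, and `iter_sse_chunks` is a hypothetical helper, not part of the SDK.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield decoded JSON chunks from SSE text lines until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel (OpenAI convention)
        yield json.loads(payload)
```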

Collecting a Full Streamed Response

full_content = ""
async for chunk in gateway.stream_chat_completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "List 5 programming languages."}],
):
    delta = chunk["choices"][0]["delta"]
    content = delta.get("content") or ""  # content may be absent or null
    full_content += content

print(full_content)
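The collection loop can be wrapped in a reusable coroutine that also tolerates chunks with no choices or no `content`, and records the final `finish_reason`. This is a sketch: `collect_stream` is a hypothetical helper, and `fake_stream` is a stand-in for `gateway.stream_chat_completion(...)` so the example runs without the SDK.

```python
import asyncio
from typing import Any, AsyncIterator, Optional


async def collect_stream(
    chunks: AsyncIterator[dict[str, Any]],
) -> tuple[str, Optional[str]]:
    """Join streamed deltas into one string and return the last finish_reason."""
    parts: list[str] = []
    finish_reason: Optional[str] = None
    async for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a trailing chunk that carries no choices
        choice = choices[0]
        content = (choice.get("delta") or {}).get("content")
        if content:
            parts.append(content)
        if choice.get("finish_reason") is not None:
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


async def fake_stream() -> AsyncIterator[dict[str, Any]]:
    """Stand-in for a real stream, shaped like the chunk format shown earlier."""
    yield {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]}
    yield {"choices": [{"index": 0, "delta": {"content": "Once"}, "finish_reason": None}]}
    yield {"choices": [{"index": 0, "delta": {"content": " upon"}, "finish_reason": None}]}
    yield {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}
    yield {"choices": []}
```

Appending to a list and joining once avoids repeated string concatenation on long responses.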

Embeddings

Generate vector embeddings for text:

Single string:

result = await gateway.embedding(
    model="openai/text-embedding-3-small",
    input="Manifest Platform documentation",
)
vector = result["data"][0]["embedding"]
print(f"Dimensions: {len(vector)}")

Batch:

result = await gateway.embedding(
    model="openai/text-embedding-3-small",
    input=[
        "First document to embed",
        "Second document to embed",
        "Third document to embed",
    ],
)
for item in result["data"]:
    print(f"Index {item['index']}: {len(item['embedding'])} dimensions")

Response Format

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, ...]
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
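A common next step is comparing embedding vectors. The sketch below pulls vectors out of a response in input order and computes cosine similarity in plain Python. Both helper names are hypothetical, and cosine similarity is used here as a typical choice rather than something the gateway prescribes.

```python
import math
from typing import Any, Sequence


def vectors_in_input_order(result: dict[str, Any]) -> list[list[float]]:
    """Return embeddings sorted by their `index` field."""
    items = sorted(result["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for zero vectors; report 0
    return dot / (norm_a * norm_b)
```

For large batches, a vectorized library would be faster; the pure-Python version keeps the example dependency-free.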

Model Discovery

List All Models

models = await gateway.list_models()
for model in models:
    print(model["id"])

List Provider Models

# List all models from a specific provider
openai_models = await gateway.list_provider_models("openai")
print(f"Provider: {openai_models['provider']}")
print(f"Total models: {openai_models['total']}")
for model in openai_models["models"]:
    print(f"  {model['id']}")

List Known Providers

providers = await gateway.list_known_providers()
print(providers)
# ["openai", "anthropic", "google", "azure", "mistral", ...]

Model Naming Convention

Models are identified using the provider/model-name format:

Provider	Example Models
OpenAI	`openai/gpt-4o`, `openai/gpt-4o-mini`, `openai/text-embedding-3-small`
Anthropic	`anthropic/claude-sonnet-4-20250514`, `anthropic/claude-haiku-4-20250414`
Google	`google/gemini-2.0-flash`
Mistral	`mistral/mistral-large-latest`

The gateway handles provider routing, API key management, and format translation automatically. Your code always uses the same OpenAI-compatible interface regardless of the underlying provider.
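The naming convention makes it straightforward to separate the provider from the model name on the client side, for example to group the IDs returned by model discovery. This is a sketch with hypothetical helpers. Splitting on the first `/` only is an assumption, chosen so that any later slashes stay in the model name.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split "provider/model-name" into (provider, model_name)."""
    provider, separator, name = model_id.partition("/")
    if not separator or not provider or not name:
        raise ValueError(f"Expected 'provider/model-name', got {model_id!r}")
    return provider, name


def group_by_provider(model_ids: list[str]) -> dict[str, list[str]]:
    """Group model names under their provider prefix, preserving order."""
    grouped: dict[str, list[str]] = {}
    for model_id in model_ids:
        provider, name = split_model_id(model_id)
        grouped.setdefault(provider, []).append(name)
    return grouped
```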

Error Handling

GatewayClient raises typed exceptions based on HTTP status codes:

from flow_sdk.gateway_client import (
    GatewayClient,
    GatewayClientError,
    GatewayAuthError,
    GatewayRateLimitError,
    GatewayModelNotFoundError,
    GatewayProviderError,
)

try:
    response = await gateway.chat_completion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
except GatewayAuthError:
    print("Authentication failed -- check your token")
except GatewayModelNotFoundError:
    print("Model not available -- check model ID")
except GatewayRateLimitError:
    print("Rate limited -- implement backoff and retry")
except GatewayProviderError:
    print("Upstream provider error -- try again later")
except GatewayClientError as e:
    print(f"Gateway error: {e}")

Exception Reference

Exception	HTTP Status	When
`GatewayAuthError`	401, 403	Invalid or expired token, insufficient permissions
`GatewayModelNotFoundError`	404	Model ID not recognized or not available for your org
`GatewayRateLimitError`	429	Request rate or token quota exceeded
`GatewayProviderError`	502	Upstream LLM provider returned an error
`GatewayClientError`	Other	Base class for all gateway errors
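Rate-limit and upstream provider errors are usually transient, so they are natural candidates for retry with exponential backoff. The sketch below is generic. `call_with_backoff` is a hypothetical helper, the retryable exception types are passed in by the caller, and the attempt count and delays are illustrative defaults rather than gateway recommendations.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def call_with_backoff(
    make_call: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[Exception], ...],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Await make_call(), retrying the listed exceptions with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return await make_call()
        except retry_on:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential ceiling, capped, with full jitter to spread retries.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0.0, ceiling))
            attempt += 1
```

With the SDK available, a call might look like `await call_with_backoff(lambda: gateway.chat_completion(model=..., messages=...), retry_on=(GatewayRateLimitError, GatewayProviderError))`. Authentication and model-not-found errors are left out of `retry_on` because retrying them is unlikely to help.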

Usage Tracking

Every request through the gateway is tracked for billing and observability. Token usage is returned in the `usage` field of each response:

response = await gateway.chat_completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

usage = response["usage"]
print(f"Prompt tokens:     {usage['prompt_tokens']}")
print(f"Completion tokens: {usage['completion_tokens']}")
print(f"Total tokens:      {usage['total_tokens']}")

Usage data is automatically attributed to your organization and workspace for cost allocation.
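For client-side visibility, the per-response `usage` fields can also be summed across calls within a script. This is a sketch: `UsageTotals` is a hypothetical helper that only aggregates what each response already reports and is independent of the platform's own attribution.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class UsageTotals:
    """Running token totals across several gateway responses."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    requests: int = 0

    def add(self, response: dict[str, Any]) -> None:
        """Fold one response's usage block into the totals."""
        usage = response.get("usage") or {}
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        # Embedding responses report no completion_tokens; treat as zero.
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.total_tokens += usage.get("total_tokens", 0)
        self.requests += 1
```

Call `totals.add(response)` after each request and log the totals at the end of a batch job.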

Resource Cleanup

Always close the client when you are done, or use the async context manager:

Context manager:

async with GatewayClient() as gateway:
    response = await gateway.chat_completion(...)

Manual close:

gateway = GatewayClient()
try:
    response = await gateway.chat_completion(...)
finally:
    await gateway.close()