
GatewayClient

Coming Soon — Standalone Usage

Standalone GatewayClient usage (outside the platform) is not yet publicly available; it will ship in an upcoming release. If your code runs inside a Hosted Service or Code Block, the AI Gateway is already available via flow.gateway on FlowSDK, with no additional setup needed.

GatewayClient provides direct async access to the Manifest Platform AI Gateway for chat completions, streaming, embeddings, and model discovery. The gateway exposes an OpenAI-compatible API that routes requests to configured providers (OpenAI, Anthropic, Google, and others) with built-in authentication, rate limiting, and usage tracking.

FlowSDK vs GatewayClient

If your code runs inside a hosted service or code block, use flow.gateway on FlowSDK instead; it wraps GatewayClient with automatic configuration. Use GatewayClient directly when you need standalone AI Gateway access from scripts or applications outside the platform runtime.

Initialization

from flow_sdk import GatewayClient

# From environment variables (PLATFORM_API_URL, AUTH_TOKEN)
gateway = GatewayClient()

# Explicit configuration
gateway = GatewayClient(
    base_url="https://api.flow.marut.cloud",
    auth_token="your-jwt-token",
    timeout=60.0,
)

Constructor Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| base_url | str | PLATFORM_API_URL env var | Platform API base URL |
| auth_token | str | AUTH_TOKEN env var | JWT authentication token |
| timeout | float | 60.0 | Request timeout in seconds |
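The table implies a resolution order: explicit arguments first, then environment variables. A minimal sketch of that order, assuming explicit values always win and that a missing base URL is an error; resolve_gateway_config is a hypothetical helper, not part of flow_sdk:

```python
import os
from typing import Optional


def resolve_gateway_config(
    base_url: Optional[str] = None,
    auth_token: Optional[str] = None,
    timeout: float = 60.0,
) -> dict:
    """Hypothetical helper: explicit arguments override environment variables."""
    resolved_url = base_url or os.environ.get("PLATFORM_API_URL")
    resolved_token = auth_token or os.environ.get("AUTH_TOKEN")
    if not resolved_url:
        # Assumption: the client cannot do anything useful without a base URL.
        raise ValueError("Set base_url or the PLATFORM_API_URL environment variable")
    return {"base_url": resolved_url, "auth_token": resolved_token, "timeout": timeout}


# Explicit values take precedence over PLATFORM_API_URL / AUTH_TOKEN.
config = resolve_gateway_config(
    base_url="https://api.flow.marut.cloud",
    auth_token="your-jwt-token",
)
print(config)
```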

Context Manager

GatewayClient supports async context management for automatic cleanup:

async with GatewayClient() as gateway:
    response = await gateway.chat_completion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
# HTTP client is automatically closed
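Conceptually, async with calls the object's __aenter__ method on entry and __aexit__ on exit, and __aexit__ runs even when the body raises. The sketch below uses a hypothetical SketchClient stand-in (the real GatewayClient's internals are not documented here) to show why cleanup still happens after an error:

```python
import asyncio


class SketchClient:
    """Hypothetical stand-in showing how async cleanup is wired up."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        # A real client would close its HTTP connection pool here.
        self.closed = True

    async def __aenter__(self) -> "SketchClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and when the body raises; returning None re-raises.
        await self.close()


async def main() -> SketchClient:
    try:
        async with SketchClient() as client:
            raise RuntimeError("request failed")
    except RuntimeError:
        pass
    return client


client = asyncio.run(main())
print(client.closed)  # True
```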

Chat Completions

Non-Streaming

response = await gateway.chat_completion(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in one paragraph."},
    ],
    temperature=0.7,
    max_tokens=256,
)

# Response follows the OpenAI format
print(response["choices"][0]["message"]["content"])
print(f"Tokens used: {response['usage']['total_tokens']}")

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | Model identifier (e.g., openai/gpt-4o, anthropic/claude-sonnet-4-20250514) |
| messages | list[dict] | Yes | Conversation messages, each with role and content |
| temperature | float | No | Sampling temperature (0.0 to 2.0) |
| max_tokens | int | No | Maximum number of tokens to generate |
| top_p | float | No | Nucleus sampling probability |
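To make the parameter constraints concrete, here is a sketch of how they could map onto an OpenAI-style request body. It assumes optional parameters are simply omitted when unset; build_chat_payload is hypothetical and only checks what the table states (role and content on every message, temperature between 0.0 and 2.0):

```python
from typing import Optional


def build_chat_payload(
    model: str,
    messages: list[dict],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
) -> dict:
    """Hypothetical helper: assemble an OpenAI-style chat request body."""
    for message in messages:
        # Each message needs both a role and content.
        if "role" not in message or "content" not in message:
            raise ValueError(f"message is missing role or content: {message!r}")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    payload: dict = {"model": model, "messages": messages}
    # Send optional sampling parameters only when the caller sets them.
    optional = {"temperature": temperature, "max_tokens": max_tokens, "top_p": top_p}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_chat_payload(
    "openai/gpt-4o",
    [{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(payload)
```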

Response Format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}
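Because the response is a plain dict, a small helper can pull out the fields most callers need. A sketch using the sample response above; summarize_completion is a hypothetical name, and treating finish_reason "length" as truncation assumes the gateway passes through OpenAI's convention:

```python
def summarize_completion(response: dict) -> dict:
    """Extract the first choice's text, finish reason, and token usage."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "content": choice["message"].get("content") or "",
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens", 0),
        # In the OpenAI format, "length" means max_tokens cut the reply short.
        "truncated": choice.get("finish_reason") == "length",
    }


sample = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "model": "openai/gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Quantum computing is..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 42, "completion_tokens": 128, "total_tokens": 170},
}
print(summarize_completion(sample))
```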

Streaming

Stream tokens as they are generated. The gateway delivers them as server-sent events:

async for chunk in gateway.stream_chat_completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a short story about a robot."}],
    max_tokens=500,
):
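    if not chunk.get("choices"):
        continue  # some chunks may carry no choices
    delta = chunk["choices"][0]["delta"]
    content = delta.get("content") or ""  # content may be missing or null
    print(content, end="", flush=True)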

print()  # newline after streaming completes

Each chunk follows the OpenAI streaming format:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Once"
      },
      "finish_reason": null
    }
  ]
}
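stream_chat_completion already yields parsed chunk dicts. For context, OpenAI-compatible endpoints send each chunk as a "data: {json}" server-sent-event line and end the stream with "data: [DONE]". Assuming the gateway follows that convention on the wire, decoding raw lines looks roughly like this (parse_sse_lines is hypothetical):

```python
import json
from typing import Iterable, Iterator


def parse_sse_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Decode OpenAI-style SSE lines into chunk dicts."""
    for line in lines:
        line = line.strip()
        # Skip blank separators, comments such as ": keep-alive", and non-data fields.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(data)


raw = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Once"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " upon"}, "finish_reason": null}]}',
    "",
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content") or "" for chunk in parse_sse_lines(raw)
)
print(text)  # Once upon
```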

Collecting a Full Streamed Response

full_content = ""
async for chunk in gateway.stream_chat_completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "List 5 programming languages."}],
):
    delta = chunk["choices"][0]["delta"]
    full_content += delta.get("content") or ""  # treat missing or null content as empty

print(full_content)
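For long streams, appending pieces to a list and joining once avoids repeated string copies, and keeping the last non-null finish_reason tells you why generation stopped. A sketch in which the hypothetical fake_stream stands in for gateway.stream_chat_completion, with chunks shaped like the OpenAI format shown earlier:

```python
import asyncio
from typing import AsyncIterator, Optional


async def fake_stream() -> AsyncIterator[dict]:
    # Stand-in for gateway.stream_chat_completion(...).
    yield {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]}
    yield {"choices": [{"index": 0, "delta": {"content": "Python, "}, "finish_reason": None}]}
    yield {"choices": [{"index": 0, "delta": {"content": "Go"}, "finish_reason": None}]}
    yield {"choices": [{"index": 0, "delta": {"content": None}, "finish_reason": "stop"}]}


async def collect(stream: AsyncIterator[dict]) -> tuple[str, Optional[str]]:
    parts: list[str] = []
    finish_reason = None
    async for chunk in stream:
        if not chunk.get("choices"):
            continue  # e.g. a chunk that carries no choices
        choice = chunk["choices"][0]
        parts.append(choice["delta"].get("content") or "")
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason


text, reason = asyncio.run(collect(fake_stream()))
print(text, reason)  # Python, Go stop
```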

Embeddings

Generate vector embeddings for text:

# Single input
result = await gateway.embedding(
    model="openai/text-embedding-3-small",
    input="Manifest Platform documentation",
)
vector = result["data"][0]["embedding"]
print(f"Dimensions: {len(vector)}")

# Batch input: pass a list to embed several texts in one request
result = await gateway.embedding(
    model="openai/text-embedding-3-small",
    input=[
        "First document to embed",
        "Second document to embed",
        "Third document to embed",
    ],
)
for item in result["data"]:
    print(f"Index {item['index']}: {len(item['embedding'])} dimensions")
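For large corpora you may want to split inputs across several requests. Per-request input limits vary by provider and the gateway's limit is not documented here, so the default batch size below is an assumption; batched is a hypothetical helper:

```python
from typing import Iterator, Sequence


def batched(texts: Sequence[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


docs = [f"Document {i}" for i in range(5)]
print([len(batch) for batch in batched(docs, batch_size=2)])  # [2, 2, 1]

# Inside an async function, each batch would become one request:
# for batch in batched(texts):
#     result = await gateway.embedding(model="openai/text-embedding-3-small", input=batch)
```

In the OpenAI format, each item's index is relative to its own request, so add the batch's starting offset when merging results.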

Response Format

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, ...]
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
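Embedding vectors are commonly compared with cosine similarity. A pure-Python sketch that works on the data list above; the toy 3-dimensional vectors stand in for real embeddings, which have many more dimensions:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


result = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [1.0, 0.0, 0.0]},
        {"object": "embedding", "index": 1, "embedding": [1.0, 1.0, 0.0]},
    ],
}
# Sort by index so vectors line up with the input order.
vectors = [item["embedding"] for item in sorted(result["data"], key=lambda d: d["index"])]
print(round(cosine_similarity(vectors[0], vectors[1]), 4))  # 0.7071
```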

Model Discovery

List All Models

models = await gateway.list_models()
for model in models:
    print(model["id"])

List Provider Models

# List all models from a specific provider
openai_models = await gateway.list_provider_models("openai")
print(f"Provider: {openai_models['provider']}")
print(f"Total models: {openai_models['total']}")
for model in openai_models["models"]:
    print(f"  {model['id']}")

List Known Providers

providers = await gateway.list_known_providers()
print(providers)
# ["openai", "anthropic", "google", "azure", "mistral", ...]

Model Naming Convention

Models are identified using the provider/model-name format:

| Provider | Example Models |
|---|---|
| OpenAI | openai/gpt-4o, openai/gpt-4o-mini, openai/text-embedding-3-small |
| Anthropic | anthropic/claude-sonnet-4-20250514, anthropic/claude-haiku-4-20250414 |
| Google | google/gemini-2.0-flash |
| Mistral | mistral/mistral-large-latest |
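Since every model ID follows provider/model-name, grouping a model list by provider comes down to splitting on the first slash. A sketch with hypothetical helper names; splitting only once keeps any further slashes inside the model name intact:

```python
from collections import defaultdict


def split_model_id(model_id: str) -> tuple[str, str]:
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected provider/model-name, got {model_id!r}")
    return provider, name


def group_by_provider(model_ids: list[str]) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for model_id in model_ids:
        provider, name = split_model_id(model_id)
        groups[provider].append(name)
    return dict(groups)


ids = ["openai/gpt-4o", "openai/gpt-4o-mini", "google/gemini-2.0-flash"]
print(group_by_provider(ids))
# {'openai': ['gpt-4o', 'gpt-4o-mini'], 'google': ['gemini-2.0-flash']}
```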

The gateway handles provider routing, API key management, and format translation automatically. Your code always uses the same OpenAI-compatible interface regardless of the underlying provider.


Error Handling

GatewayClient raises typed exceptions based on HTTP status codes:

from flow_sdk.gateway_client import (
    GatewayClient,
    GatewayClientError,
    GatewayAuthError,
    GatewayRateLimitError,
    GatewayModelNotFoundError,
    GatewayProviderError,
)

try:
    response = await gateway.chat_completion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
except GatewayAuthError:
    print("Authentication failed -- check your token")
except GatewayModelNotFoundError:
    print("Model not available -- check model ID")
except GatewayRateLimitError:
    print("Rate limited -- implement backoff and retry")
except GatewayProviderError:
    print("Upstream provider error -- try again later")
except GatewayClientError as e:
    print(f"Gateway error: {e}")

Exception Reference

| Exception | HTTP Status | When |
|---|---|---|
| GatewayAuthError | 401, 403 | Invalid or expired token, or insufficient permissions |
| GatewayModelNotFoundError | 404 | Model ID not recognized or not available for your organization |
| GatewayRateLimitError | 429 | Request rate or token quota exceeded |
| GatewayProviderError | 502 | Upstream LLM provider returned an error |
| GatewayClientError | Other | Base class for all gateway errors; raised for any other error status |
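The table amounts to a lookup from HTTP status to exception class. A sketch of that mapping with locally defined stand-in classes (the real ones live in flow_sdk.gateway_client); error_for_status is hypothetical, and the SDK's internal logic may differ:

```python
class GatewayClientError(Exception):
    """Base class for all gateway errors."""


class GatewayAuthError(GatewayClientError):
    pass


class GatewayModelNotFoundError(GatewayClientError):
    pass


class GatewayRateLimitError(GatewayClientError):
    pass


class GatewayProviderError(GatewayClientError):
    pass


STATUS_TO_ERROR = {
    401: GatewayAuthError,
    403: GatewayAuthError,
    404: GatewayModelNotFoundError,
    429: GatewayRateLimitError,
    502: GatewayProviderError,
}


def error_for_status(status: int, message: str) -> GatewayClientError:
    # Any status not listed falls back to the base class.
    error_cls = STATUS_TO_ERROR.get(status, GatewayClientError)
    return error_cls(f"HTTP {status}: {message}")


err = error_for_status(429, "quota exceeded")
print(type(err).__name__, isinstance(err, GatewayClientError))
# GatewayRateLimitError True
```

Because every specific class inherits from GatewayClientError, an except clause for the base class must come after the more specific ones.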

Usage Tracking

Every request through the gateway is tracked for billing and observability. Token usage is returned in the response usage field:

response = await gateway.chat_completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

usage = response["usage"]
print(f"Prompt tokens:     {usage['prompt_tokens']}")
print(f"Completion tokens: {usage['completion_tokens']}")
print(f"Total tokens:      {usage['total_tokens']}")

Usage data is automatically attributed to your organization and workspace for cost allocation.
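To track your own consumption over a session, you can sum the usage fields across responses. A sketch using collections.Counter; total_usage is a hypothetical name, and the sample dicts mimic the chat and embedding response formats above:

```python
from collections import Counter


def total_usage(responses: list[dict]) -> dict[str, int]:
    totals: Counter = Counter()
    for response in responses:
        usage = response.get("usage", {})
        # Embedding responses omit completion_tokens, so missing keys count as zero.
        for key in ("prompt_tokens", "completion_tokens", "total_tokens"):
            totals[key] += usage.get(key, 0)
    return dict(totals)


responses = [
    {"usage": {"prompt_tokens": 42, "completion_tokens": 128, "total_tokens": 170}},
    {"usage": {"prompt_tokens": 5, "total_tokens": 5}},
]
print(total_usage(responses))
# {'prompt_tokens': 47, 'completion_tokens': 128, 'total_tokens': 175}
```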


Resource Cleanup

Always close the client when you are done, either with the async context manager or by calling close() explicitly:

# Option 1: async context manager
async with GatewayClient() as gateway:
    response = await gateway.chat_completion(...)

# Option 2: explicit close
gateway = GatewayClient()
try:
    response = await gateway.chat_completion(...)
finally:
    await gateway.close()