Creating Services (Beta)

This guide walks through creating a hosted service, configuring it, managing versions, and publishing.

Creating a Service

From the UI

1. Navigate to Hosted Services in the left sidebar.
2. Click New Service.
3. Fill in the required fields:
   - Name: human-readable label (e.g., "Document Processor")
   - Slug: URL-safe identifier, lowercase alphanumeric with hyphens (e.g., document-processor). Immutable after creation.
   - Execution Mode: invocation (default) or persistent.
4. Click Create.

The service starts in draft status with version 1 automatically created.
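Because the slug is immutable after creation, it can help to check a candidate locally before submitting the form. The following is a minimal sketch: the exact rule the platform enforces is not documented here, so the pattern (no leading, trailing, or doubled hyphens) and the helper names `is_valid_slug` and `slugify` are assumptions for illustration.

```python
import re

# Assumed rule: lowercase letters and digits, with single hyphens between
# groups; no leading, trailing, or doubled hyphens. The platform's actual
# validation may be stricter or looser.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Return True if the slug matches the assumed URL-safe pattern."""
    return SLUG_PATTERN.fullmatch(slug) is not None


def slugify(name: str) -> str:
    """Derive a candidate slug from a human-readable service name."""
    lowered = name.strip().lower()
    # Collapse every run of non-alphanumeric characters into one hyphen,
    # then drop any hyphen left at either end.
    return re.sub(r"[^a-z0-9]+", "-", lowered).strip("-")
```

For example, `slugify("Document Processor")` yields `document-processor`, which passes `is_valid_slug`.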

From the SDK

Python

Coming Soon

The Python SDK for local development is not yet publicly available.

import httpx

from flow_sdk.cli_client import CLIClient
from flow_sdk.config import FlowConfig

config = FlowConfig.load()
client = CLIClient(config)

# Create the service via the management API
resp = httpx.post(
    f"{config.platform_url}/api/v1/orgs/{config.org_id}/workspaces/{config.workspace_id}/hosted-services",
    headers={"Authorization": f"Bearer {config.access_token}"},
    json={
        "slug": "document-processor",
        "name": "Document Processor",
        "execution_mode": "invocation",
        # Required for invocation mode; get it from Admin > Deployments > Rings
        "ring_id": "uuid-of-ring",
        "description": "Processes and summarizes uploaded documents",
    },
)
resp.raise_for_status()  # stop here on a 4xx/5xx response instead of parsing an error body
service = resp.json()
print(f"Service created: {service['id']}")

ring_id is required for invocation mode

ring_id is required when execution_mode is invocation. Get the ring ID from Admin > Deployments > Rings or via the rings API.

CLI

# Service creation is managed through the UI or direct API calls.
# Use curl with your platform credentials:
curl -X POST \
  "https://api.flow.marut.cloud/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "document-processor",
    "name": "Document Processor",
    "execution_mode": "invocation",
    "ring_id": "uuid-of-ring"
  }'
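The `ring_id` rule can also be enforced on the client before any request is sent. This is a minimal sketch that builds the same JSON body as the examples above; `build_create_payload` is a hypothetical helper, not part of the SDK, and the set of accepted modes is taken from the two modes this guide names.

```python
from typing import Any, Dict, Optional

# The two execution modes named in this guide.
EXECUTION_MODES = ("invocation", "persistent")


def build_create_payload(
    slug: str,
    name: str,
    execution_mode: str = "invocation",
    ring_id: Optional[str] = None,
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the request body for creating a hosted service."""
    if execution_mode not in EXECUTION_MODES:
        raise ValueError(f"unknown execution_mode: {execution_mode!r}")
    # ring_id is required whenever execution_mode is "invocation".
    if execution_mode == "invocation" and not ring_id:
        raise ValueError("ring_id is required for invocation mode")

    payload: Dict[str, Any] = {
        "slug": slug,
        "name": name,
        "execution_mode": execution_mode,
    }
    # Optional fields are only sent when the caller provides them.
    if ring_id is not None:
        payload["ring_id"] = ring_id
    if description is not None:
        payload["description"] = description
    return payload
```

The returned dictionary can be passed as the `json=` argument of the `httpx.post` call, or serialized for `curl -d`.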

Service Configuration

Execution Mode

Choose the execution mode at creation time. It can still be changed afterward while the service is in draft status.

Mode	Best for	Cold start	Scaling
`invocation`	Stateless, fast operations	None	Automatic per-request
`persistent`	Stateful, ML inference, high throughput	Container startup time	Replica-based autoscaling

Persistent Mode Settings

When using persistent execution mode, you can configure deployment parameters:

{
  "slug": "ml-inference",
  "name": "ML Inference Service",
  "execution_mode": "persistent",
  "min_replicas": 1,
  "max_replicas": 5,
  "concurrency_per_replica": 80,
  "scale_threshold": null,
  "startup_timeout_seconds": 120,
  "base_image": null,
  "system_packages": ["libgomp1"]
}

Field	Description	Default
`min_replicas`	Minimum running instances (0 allows scale-to-zero)	`0`
`max_replicas`	Maximum instances under load	`10`
`concurrency_per_replica`	Max concurrent requests per instance	`80`
`scale_threshold`	Custom scaling metric threshold	auto
`startup_timeout_seconds`	How long to wait for container startup	`null`
`base_image`	Custom Docker base image	Platform default
`system_packages`	OS packages to install in the container	`[]`
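To see what a partial configuration resolves to, the defaults in the table can be merged with your overrides locally. This is a sketch under stated assumptions: `resolve_persistent_settings` is a hypothetical helper, `None` stands in for the "auto" and "platform default" entries, and the range checks (non-negative minimum, maximum not below minimum) are plausible sanity checks rather than documented platform rules.

```python
import copy
from typing import Any, Dict

# Defaults transcribed from the table above. None represents "auto" for
# scale_threshold and "platform default" for base_image (an assumption).
PERSISTENT_DEFAULTS: Dict[str, Any] = {
    "min_replicas": 0,
    "max_replicas": 10,
    "concurrency_per_replica": 80,
    "scale_threshold": None,
    "startup_timeout_seconds": None,
    "base_image": None,
    "system_packages": [],
}


def resolve_persistent_settings(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user overrides onto the documented defaults and sanity-check them."""
    unknown = set(overrides) - set(PERSISTENT_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")

    # Deep-copy so the shared default list is never mutated by callers.
    settings = {**copy.deepcopy(PERSISTENT_DEFAULTS), **overrides}

    # Assumed sanity checks; the platform may enforce different limits.
    if settings["min_replicas"] < 0:
        raise ValueError("min_replicas must be >= 0")
    if settings["max_replicas"] < settings["min_replicas"]:
        raise ValueError("max_replicas must be >= min_replicas")
    return settings
```

Calling it with only `min_replicas` and `max_replicas` set fills in `concurrency_per_replica` as 80 and `system_packages` as an empty list.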

Connector Bindings

Services can bind to platform connectors for database access, external APIs, and storage:

{
  "connector_bindings": {
    "primary_db": {
      "connector_instance_id": "uuid-of-postgres-instance",
      "role": "read_write"
    },
    "s3_storage": {
      "connector_instance_id": "uuid-of-s3-instance",
      "role": "read_write"
    }
  }
}

Bound connectors are available to your endpoint code at runtime through the execution context.
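When a service has several bindings, generating the `connector_bindings` object from a compact mapping avoids hand-editing nested JSON. The sketch below assumes only the structure shown in the example above; `build_connector_bindings` is a hypothetical helper, and `read_write` is the only role this guide shows, so no role validation is attempted.

```python
import json
from typing import Dict, Tuple


def build_connector_bindings(bindings: Dict[str, Tuple[str, str]]) -> dict:
    """Turn {binding_name: (connector_instance_id, role)} into the JSON shape above."""
    result = {}
    for binding_name, (instance_id, role) in bindings.items():
        if not binding_name:
            raise ValueError("binding name must not be empty")
        result[binding_name] = {
            "connector_instance_id": instance_id,
            "role": role,
        }
    return {"connector_bindings": result}


if __name__ == "__main__":
    body = build_connector_bindings(
        {
            "primary_db": ("uuid-of-postgres-instance", "read_write"),
            "s3_storage": ("uuid-of-s3-instance", "read_write"),
        }
    )
    print(json.dumps(body, indent=2))
```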

Environment and Ring Assignment

Services can be scoped to a specific environment and deployment ring:

{
  "environment_id": "uuid-of-environment",
  "ring_id": "uuid-of-deployment-ring"
}

Version Management

Versions are the unit of deployment for hosted services. Once published, a version is an immutable snapshot of its endpoint definitions; a draft version remains editable.

Version Lifecycle

stateDiagram-v2
    [*] --> Draft: Create version
    Draft --> Active: Publish
    Active --> Draining: New version published
    Draining --> Archived: Traffic drained
    Archived --> [*]
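The lifecycle in the diagram can be modeled as a small transition table, which is handy for client-side checks or tests. This is an illustrative sketch: the status strings are the lowercase forms of the diagram states, and the event names (`publish`, `new_version_published`, `traffic_drained`) are labels invented here, not API values.

```python
# (current status, event) -> next status, transcribed from the diagram.
TRANSITIONS = {
    ("draft", "publish"): "active",
    ("active", "new_version_published"): "draining",
    ("draining", "traffic_drained"): "archived",
}


def next_status(status: str, event: str) -> str:
    """Return the status that follows `status` when `event` occurs."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"no transition from {status!r} on {event!r}") from None


def is_terminal(status: str) -> bool:
    """A status is terminal when no transition leaves it."""
    return all(source != status for source, _ in TRANSITIONS)
```

Under this model `archived` is the only terminal status, and there is no path from `active` back to `draft`.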

Creating a New Version

When you need to update endpoints, create a new version:

Python

# org_id, ws_id, service_id, and token are assumed to be defined by the caller
resp = httpx.post(
    f"{config.platform_url}/api/v1/orgs/{org_id}/workspaces/{ws_id}"
    f"/hosted-services/{service_id}/versions",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()  # stop here on a 4xx/5xx response
version = resp.json()
print(f"Created version {version['version']} (status: {version['status']})")

CLI

curl -X POST \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/versions" \
  -H "Authorization: Bearer $TOKEN"

The new version is in draft status. You can add, edit, and remove endpoints on a draft version.

Publishing a Version

Publishing freezes the endpoint definitions into an immutable snapshot and makes the version live:

Python

# org_id, ws_id, service_id, version_id, and token are assumed to be defined by the caller
resp = httpx.post(
    f"{config.platform_url}/api/v1/orgs/{org_id}/workspaces/{ws_id}"
    f"/hosted-services/{service_id}/versions/{version_id}/publish",
    headers={"Authorization": f"Bearer {token}"},
    json={"traffic_percent": 100},
)
resp.raise_for_status()  # stop here on a 4xx/5xx response
published = resp.json()
print(f"Version {published['version']} is now active")

CLI

curl -X POST \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/versions/$VERSION_ID/publish" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"traffic_percent": 100}'

Publishing is irreversible

Once a version is published, its endpoint definitions cannot be changed. To make changes, create a new version.

When a version is published, the platform:

1. Freezes all endpoint definitions into the version's endpoint_snapshot.
2. Generates a skill_snapshot for agent tool discovery.
3. Sets the version as the service's active version.
4. Marks the previous active version as draining.
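The four effects of publishing can be illustrated with a small in-memory model. This is a sketch of the described behavior, not the platform's implementation: the `Version` and `Service` classes, the endpoint shape (`{"name": ...}`), and the content of `skill_snapshot` (just the endpoint names) are assumptions made for the example.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Version:
    number: int
    endpoints: List[dict] = field(default_factory=list)
    status: str = "draft"
    endpoint_snapshot: Optional[Tuple[dict, ...]] = None
    skill_snapshot: Optional[Tuple[str, ...]] = None


@dataclass
class Service:
    active: Optional[Version] = None


def publish(service: Service, version: Version) -> None:
    """Apply the four publish effects to an in-memory service."""
    if version.status != "draft":
        raise ValueError("only draft versions can be published")

    # 1. Freeze the endpoint definitions into an independent copy.
    version.endpoint_snapshot = tuple(copy.deepcopy(version.endpoints))
    # 2. Placeholder skill snapshot: endpoint names only (hypothetical format).
    version.skill_snapshot = tuple(ep["name"] for ep in version.endpoints)
    # 3. Make this version the service's active version.
    previous = service.active
    version.status = "active"
    service.active = version
    # 4. Mark the previously active version, if any, as draining.
    if previous is not None:
        previous.status = "draining"
```

Because the snapshot is a deep copy, later edits to `version.endpoints` in this model do not affect `endpoint_snapshot`, which mirrors the immutability described above.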

Traffic Splitting

When publishing, you can control how much traffic the new version receives:

{"traffic_percent": 50}

This allows canary deployments where the new version receives a percentage of traffic while the previous version handles the rest.
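To make the arithmetic of a split concrete, the sketch below computes each version's share and shows one common way to assign requests deterministically by hashing a request identifier. How the platform actually routes traffic is not described in this guide; `split_traffic` and `route` are hypothetical helpers that only illustrate the idea of a percentage-based canary.

```python
import hashlib
from typing import Dict


def split_traffic(traffic_percent: int) -> Dict[str, int]:
    """Return the percentage of traffic for the new and previous versions."""
    if not 0 <= traffic_percent <= 100:
        raise ValueError("traffic_percent must be between 0 and 100")
    return {"new": traffic_percent, "previous": 100 - traffic_percent}


def route(request_id: str, traffic_percent: int) -> str:
    """Deterministically assign a request to the 'new' or 'previous' version."""
    if not 0 <= traffic_percent <= 100:
        raise ValueError("traffic_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the request to a stable bucket in the range 0..99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "new" if bucket < traffic_percent else "previous"
```

With `traffic_percent` set to 100 every request goes to the new version, and with 0 none do; the same `request_id` always lands in the same bucket.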

Deploying Persistent Services

Persistent-mode services require an explicit build and deploy step after publishing.

Build and Deploy

# Build and deploy in one step (default)
curl -X POST \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/deploy" \
  -H "Authorization: Bearer $TOKEN"

# Build only (without deploying)
curl -X POST \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/build" \
  -H "Authorization: Bearer $TOKEN"

# Deploy a pre-built image (build=false)
curl -X POST \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/deploy?build=false" \
  -H "Authorization: Bearer $TOKEN"
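The three curl variants differ only in the final path segment and the optional `build=false` query parameter. A minimal Python sketch using the standard library can build the equivalent request object; `deployment_request` is a hypothetical helper, and the request is constructed but not sent, so it can be inspected before use.

```python
from urllib.parse import urlencode
from urllib.request import Request


def deployment_request(
    api_url: str,
    org_id: str,
    ws_id: str,
    service_id: str,
    token: str,
    action: str = "deploy",
    build: bool = True,
) -> Request:
    """Build a POST request for the build or deploy endpoints shown above."""
    if action not in ("deploy", "build"):
        raise ValueError(f"unknown action: {action!r}")

    url = (
        f"{api_url}/api/v1/orgs/{org_id}/workspaces/{ws_id}"
        f"/hosted-services/{service_id}/{action}"
    )
    # build=false only applies to deploy: it deploys a pre-built image.
    if action == "deploy" and not build:
        url += "?" + urlencode({"build": "false"})

    return Request(url, method="POST", headers={"Authorization": f"Bearer {token}"})
```

Passing the result to `urllib.request.urlopen` would send it; `build=False` with `action="deploy"` corresponds to the third curl command.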

Deployment States

Persistent services move through deployment states:

State	Description
`building`	Container image is being built
`built`	Image built, ready to deploy
`deploying`	Deploying to Cloud Run
`active`	Running and serving traffic
`failed`	Build or deploy failed
`draining`	Being replaced by a new deployment

Checking Deployment Status

curl "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/deployment-status" \
  -H "Authorization: Bearer $TOKEN"

Response:

{
  "deployment_state": "active",
  "health": "healthy",
  "message": null,
  "synced": true
}
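A deploy script usually polls this endpoint until the service is `active` or `failed`. The sketch below takes the status fetcher as a parameter, so the HTTP call stays in your own code and the loop can be tested without a network. `wait_for_deployment` is a hypothetical helper; treating `building`, `built`, and `deploying` as in-progress states, and any other non-terminal state as an error, is an assumption based on the state table.

```python
import time
from typing import Callable

# States that mean "keep waiting" (assumed from the deployment state table).
IN_PROGRESS = {"building", "built", "deploying"}


def wait_for_deployment(
    fetch_status: Callable[[], dict],
    timeout_seconds: float = 300.0,
    interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() until the deployment is active, failed, or timed out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        state = status["deployment_state"]
        if state == "active":
            return status
        if state == "failed":
            raise RuntimeError(f"deployment failed: {status.get('message')}")
        if state not in IN_PROGRESS:
            raise RuntimeError(f"unexpected deployment state: {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"deployment still {state!r} after {timeout_seconds}s")
        sleep(interval_seconds)
```

In real use, `fetch_status` would wrap the GET request above and return its parsed JSON body.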

Tearing Down

To remove a persistent deployment (stops the running container):

curl -X POST \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/teardown" \
  -H "Authorization: Bearer $TOKEN"

Cloud Credentials

Persistent services deploy to cloud infrastructure. Credentials can be configured at two levels:

Per-Service Credentials

curl -X PUT \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/service-credentials" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "cloud_provider": "gcp",
    "cloud_credentials_json": "{...GCP service account key...}",
    "cloud_project_id": "my-project",
    "cloud_region": "us-central1"
  }'

Organization-Level Credentials

Shared across all services in the organization:

curl -X PUT \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/credentials" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "cloud_provider": "gcp",
    "cloud_credentials_json": "{...}",
    "cloud_project_id": "org-project"
  }'

Credential precedence

Per-service credentials take priority over organization-level credentials. This lets you run most services on shared infrastructure while isolating specific services to dedicated projects.
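The precedence rule amounts to a simple fallback. The sketch below assumes that a per-service credential set replaces the organization-level one as a whole rather than being merged field by field; the guide does not say which applies, and `resolve_credentials` is a hypothetical helper for illustration.

```python
from typing import Optional


def resolve_credentials(
    service_credentials: Optional[dict],
    org_credentials: Optional[dict],
) -> dict:
    """Return the credentials a deployment would use under the precedence rule."""
    # Per-service credentials win whenever they are configured.
    if service_credentials:
        return service_credentials
    # Otherwise fall back to the organization-level credentials.
    if org_credentials:
        return org_credentials
    raise LookupError("no cloud credentials configured at either level")
```

A service with its own entry deploys to its dedicated project, while every other service falls through to the shared organization project.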

Generated Client Stubs

The platform auto-generates typed client code for your published services:

Python

curl "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/stubs/python" \
  -H "Authorization: Bearer $TOKEN" \
  -o client.py

TypeScript

curl "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/stubs/typescript" \
  -H "Authorization: Bearer $TOKEN" \
  -o client.ts

The generated stubs include typed methods for each endpoint, matching your request/response schemas.