Creating Services (Beta)

This guide walks through creating a hosted service, configuring it, managing versions, and publishing.

Creating a Service

From the UI

  1. Navigate to Hosted Services in the left sidebar.
  2. Click New Service.
  3. Fill in the required fields:
     - Name -- human-readable label (e.g., "Document Processor")
     - Slug -- URL-safe identifier, lowercase alphanumeric with hyphens (e.g., document-processor). Immutable after creation.
     - Execution Mode -- invocation (default) or persistent.
  4. Click Create.

The service starts in draft status with version 1 automatically created.
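
Because the slug is immutable after creation, it can be worth checking it locally before submitting. The following is a minimal sketch: the exact pattern is an assumption derived from the "lowercase alphanumeric with hyphens" description (the platform may also enforce length limits or reserved names), and `is_valid_slug` is a hypothetical helper, not part of the platform.

```python
import re

# Assumed rule: groups of lowercase letters/digits separated by single hyphens.
# The platform's real validation may be stricter or looser than this.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Return True if the slug satisfies the assumed URL-safe pattern."""
    return SLUG_PATTERN.fullmatch(slug) is not None


print(is_valid_slug("document-processor"))  # True
print(is_valid_slug("Document Processor"))  # False
```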

From the SDK

Coming Soon

The Python SDK for local development is not yet publicly available.

import httpx

from flow_sdk.cli_client import CLIClient
from flow_sdk.config import FlowConfig

config = FlowConfig.load()
client = CLIClient(config)

# Create via the management API
resp = httpx.post(
    f"{config.platform_url}/api/v1/orgs/{config.org_id}/workspaces/{config.workspace_id}/hosted-services",
    headers={"Authorization": f"Bearer {config.access_token}"},
    json={
        "slug": "document-processor",
        "name": "Document Processor",
        "execution_mode": "invocation",
        "ring_id": "uuid-of-ring",  # required for invocation mode; get it from Admin > Deployments > Rings
        "description": "Processes and summarizes uploaded documents",
    },
)
resp.raise_for_status()  # surface HTTP errors before reading the body
service = resp.json()
print(f"Service created: {service['id']}")

ring_id is required for invocation mode

ring_id is required when execution_mode is invocation. Get the ring ID from Admin > Deployments > Rings or via the rings API.

# Service creation is managed through the UI or direct API calls.
# Use curl with your platform credentials:
curl -X POST \
  "https://api.flow.marut.cloud/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "document-processor",
    "name": "Document Processor",
    "execution_mode": "invocation",
    "ring_id": "uuid-of-ring"
  }'
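
Since a missing ring_id only surfaces as an API error, the request body can be assembled and checked before it is sent. This is a sketch under assumptions: `build_create_payload` is a hypothetical helper, and the only rules it encodes are the ones stated above (two execution modes, ring_id required for invocation).

```python
from typing import Optional


def build_create_payload(
    slug: str,
    name: str,
    execution_mode: str = "invocation",
    ring_id: Optional[str] = None,
) -> dict:
    """Assemble the create-service JSON body, failing early on a missing ring_id."""
    if execution_mode not in ("invocation", "persistent"):
        raise ValueError(f"unknown execution_mode: {execution_mode!r}")
    if execution_mode == "invocation" and not ring_id:
        raise ValueError("ring_id is required when execution_mode is 'invocation'")
    payload = {"slug": slug, "name": name, "execution_mode": execution_mode}
    if ring_id:
        payload["ring_id"] = ring_id
    return payload


payload = build_create_payload(
    "document-processor", "Document Processor", ring_id="uuid-of-ring"
)
print(payload)
```

The resulting dictionary can be passed as the `json=` argument of the HTTP call or serialized for the curl `-d` body.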

Service Configuration

Execution Mode

Choose the execution mode at creation time. It can still be changed later while the service is in draft status.

| Mode | Best for | Cold start | Scaling |
| --- | --- | --- | --- |
| invocation | Stateless, fast operations | None | Automatic per-request |
| persistent | Stateful, ML inference, high throughput | Container startup time | Replica-based autoscaling |

Persistent Mode Settings

When using persistent execution mode, you can configure deployment parameters:

{
  "slug": "ml-inference",
  "name": "ML Inference Service",
  "execution_mode": "persistent",
  "min_replicas": 1,
  "max_replicas": 5,
  "concurrency_per_replica": 80,
  "scale_threshold": null,
  "startup_timeout_seconds": 120,
  "base_image": null,
  "system_packages": ["libgomp1"]
}

| Field | Description | Default |
| --- | --- | --- |
| min_replicas | Minimum running instances (0 allows scale-to-zero) | 0 |
| max_replicas | Maximum instances under load | 10 |
| concurrency_per_replica | Max concurrent requests per instance | 80 |
| scale_threshold | Custom scaling metric threshold | auto |
| startup_timeout_seconds | How long to wait for container startup | null |
| base_image | Custom Docker base image | Platform default |
| system_packages | OS packages to install in the container | [] |
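
The settings and their defaults can be modeled locally to catch obvious mistakes before sending them. This is only a sketch: `PersistentSettings` is a hypothetical class, `None` is assumed to stand for "auto" or "platform default", and the sanity checks in `validate` are assumptions rather than documented platform rules.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PersistentSettings:
    # Defaults mirror the documented table; None is assumed to mean
    # "auto" (scale_threshold) or "platform default" (timeout, base_image).
    min_replicas: int = 0
    max_replicas: int = 10
    concurrency_per_replica: int = 80
    scale_threshold: Optional[float] = None
    startup_timeout_seconds: Optional[int] = None
    base_image: Optional[str] = None
    system_packages: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError on values that cannot be meaningful (assumed rules)."""
        if self.min_replicas < 0:
            raise ValueError("min_replicas must be >= 0")
        if self.max_replicas < max(self.min_replicas, 1):
            raise ValueError("max_replicas must be >= 1 and >= min_replicas")
        if self.concurrency_per_replica < 1:
            raise ValueError("concurrency_per_replica must be >= 1")


settings = PersistentSettings(
    min_replicas=1,
    max_replicas=5,
    startup_timeout_seconds=120,
    system_packages=["libgomp1"],
)
settings.validate()
print(asdict(settings))
```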

Connector Bindings

Services can bind to platform connectors for database access, external APIs, and storage:

{
  "connector_bindings": {
    "primary_db": {
      "connector_instance_id": "uuid-of-postgres-instance",
      "role": "read_write"
    },
    "s3_storage": {
      "connector_instance_id": "uuid-of-s3-instance",
      "role": "read_write"
    }
  }
}

Bound connectors are available to your endpoint code at runtime through the execution context.
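
A quick shape check on the bindings object can catch typos before the configuration is submitted. The sketch below assumes only what the JSON example shows (each alias maps to an object with connector_instance_id and role); `validate_connector_bindings` is a hypothetical helper, and it deliberately does not guess which role values the platform accepts.

```python
def validate_connector_bindings(bindings: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for alias, binding in bindings.items():
        for key in ("connector_instance_id", "role"):
            value = binding.get(key) if isinstance(binding, dict) else None
            if not isinstance(value, str) or not value:
                problems.append(f"{alias}: missing or empty {key!r}")
    return problems


bindings = {
    "primary_db": {
        "connector_instance_id": "uuid-of-postgres-instance",
        "role": "read_write",
    },
    "s3_storage": {"connector_instance_id": "uuid-of-s3-instance"},
}
print(validate_connector_bindings(bindings))
```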

Environment and Ring Assignment

Services can be scoped to a specific environment and deployment ring:

{
  "environment_id": "uuid-of-environment",
  "ring_id": "uuid-of-deployment-ring"
}

Version Management

Versions are the unit of deployment for hosted services. Each version is an immutable snapshot of endpoint definitions.

Version Lifecycle

stateDiagram-v2
    [*] --> Draft: Create version
    Draft --> Active: Publish
    Active --> Draining: New version published
    Draining --> Archived: Traffic drained
    Archived --> [*]
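
The lifecycle in the diagram can be expressed as a small transition table, which is handy for client-side checks or tests. This is an illustrative sketch; the lowercase status strings are an assumption based on the "draft" and "draining" values mentioned in this guide, and `can_transition` is a hypothetical helper.

```python
# Allowed next states for each version status, taken from the diagram.
ALLOWED_TRANSITIONS = {
    "draft": {"active"},        # Publish
    "active": {"draining"},     # New version published
    "draining": {"archived"},   # Traffic drained
    "archived": set(),          # Terminal
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the lifecycle allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


print(can_transition("draft", "active"))    # True
print(can_transition("active", "draft"))    # False
```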

Creating a New Version

When you need to update endpoints, create a new version:

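# Assumes config, org_id, ws_id, service_id, and token are already defined,
# e.g., from FlowConfig and the create-service response.
resp = httpx.post(
    f"{config.platform_url}/api/v1/orgs/{org_id}/workspaces/{ws_id}"
    f"/hosted-services/{service_id}/versions",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()  # surface HTTP errors before reading the body
version = resp.json()
print(f"Created version {version['version']} (status: {version['status']})")

curl -X POST \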
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/versions" \
  -H "Authorization: Bearer $TOKEN"

The new version is in draft status. You can add, edit, and remove endpoints on a draft version.

Publishing a Version

Publishing freezes the endpoint definitions into an immutable snapshot and makes the version live:

# Assumes config, org_id, ws_id, service_id, version_id, and token are already defined.
resp = httpx.post(
    f"{config.platform_url}/api/v1/orgs/{org_id}/workspaces/{ws_id}"
    f"/hosted-services/{service_id}/versions/{version_id}/publish",
    headers={"Authorization": f"Bearer {token}"},
    json={"traffic_percent": 100},
)
resp.raise_for_status()  # surface HTTP errors before reading the body
published = resp.json()
print(f"Version {published['version']} is now active")

curl -X POST \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/versions/$VERSION_ID/publish" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"traffic_percent": 100}'

Publishing is irreversible

Once a version is published, its endpoint definitions cannot be changed. To make changes, create a new version.

When a version is published, the platform:

  1. Freezes all endpoint definitions into the version's endpoint_snapshot.
  2. Generates a skill_snapshot for agent tool discovery.
  3. Sets the version as the service's active version.
  4. Marks the previous active version as draining.

Traffic Splitting

When publishing, you can control how much traffic the new version receives:

{"traffic_percent": 50}

This allows canary deployments where the new version receives a percentage of traffic while the previous version handles the rest.
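
To illustrate what a percentage split means per request, here is a simplified weighted-routing sketch. It assumes independent random selection per request, which may not match how the platform actually routes (for example, it might use sticky or hash-based assignment); `pick_version` is a hypothetical function.

```python
import random


def pick_version(new_version, previous_version, traffic_percent, rng=random):
    """Route one request: new version with probability traffic_percent / 100."""
    if not 0 <= traffic_percent <= 100:
        raise ValueError("traffic_percent must be between 0 and 100")
    # rng.random() is in [0, 1), so 100 always picks new and 0 never does.
    if rng.random() * 100 < traffic_percent:
        return new_version
    return previous_version


rng = random.Random(0)  # seeded so the demo is repeatable
hits = sum(pick_version("v2", "v1", 50, rng) == "v2" for _ in range(10_000))
print(f"v2 received {hits / 100:.1f}% of 10,000 simulated requests")
```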

Deploying Persistent Services

Persistent-mode services require an explicit build and deploy step after publishing.

Build and Deploy

# Build and deploy in one step (default)
curl -X POST \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/deploy" \
  -H "Authorization: Bearer $TOKEN"

# Build only (without deploying)
curl -X POST \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/build" \
  -H "Authorization: Bearer $TOKEN"

# Deploy a pre-built image (build=false)
curl -X POST \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/deploy?build=false" \
  -H "Authorization: Bearer $TOKEN"

Deployment States

Persistent services move through deployment states:

| State | Description |
| --- | --- |
| building | Container image is being built |
| built | Image built, ready to deploy |
| deploying | Deploying to Cloud Run |
| active | Running and serving traffic |
| failed | Build or deploy failed |
| draining | Being replaced by a new deployment |

Checking Deployment Status

curl "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/deployment-status" \
  -H "Authorization: Bearer $TOKEN"

Response:

{
  "deployment_state": "active",
  "health": "healthy",
  "message": null,
  "synced": true
}
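
Since a build and deploy takes time, a client will typically poll this endpoint until the deployment settles. The sketch below shows one way to do that with an injectable `fetch_status` callable, so it runs without network access. Treating active and failed as the states to stop on is an assumption, as are the function name, timeout, and interval.

```python
import time
from typing import Callable

# Assumed stopping states: the deployment is either serving or has failed.
TERMINAL_STATES = {"active", "failed"}


def wait_for_deployment(
    fetch_status: Callable[[], dict],
    timeout_seconds: float = 600.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status until deployment_state is terminal or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status["deployment_state"] in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"still {status['deployment_state']!r} after {timeout_seconds}s"
            )
        sleep(poll_interval)


# Demo with canned responses instead of real HTTP calls.
canned = iter(
    [
        {"deployment_state": "building"},
        {"deployment_state": "deploying"},
        {"deployment_state": "active", "health": "healthy"},
    ]
)
final = wait_for_deployment(lambda: next(canned), sleep=lambda _: None)
print(final["deployment_state"])
```

In real use, `fetch_status` would wrap a GET request to the deployment-status endpoint shown above and return the parsed JSON.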

Tearing Down

To remove a persistent deployment (stops the running container):

curl -X POST \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/teardown" \
  -H "Authorization: Bearer $TOKEN"

Cloud Credentials

Persistent services deploy to cloud infrastructure. Credentials can be configured at two levels:

Per-Service Credentials

curl -X PUT \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/service-credentials" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "cloud_provider": "gcp",
    "cloud_credentials_json": "{...GCP service account key...}",
    "cloud_project_id": "my-project",
    "cloud_region": "us-central1"
  }'

Organization-Level Credentials

Shared across all services in the organization:

curl -X PUT \
  "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/credentials" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "cloud_provider": "gcp",
    "cloud_credentials_json": "{...}",
    "cloud_project_id": "org-project"
  }'

Credential precedence

Per-service credentials take priority over organization-level credentials. This lets you run most services on shared infrastructure while isolating specific services to dedicated projects.
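
The precedence rule can be summarized as a simple fallback. This sketch assumes the per-service record replaces the organization-level record as a whole (no field-by-field merging), which this guide does not specify; `resolve_credentials` is a hypothetical function.

```python
from typing import Optional


def resolve_credentials(
    service_credentials: Optional[dict],
    org_credentials: Optional[dict],
) -> dict:
    """Pick per-service credentials if present, else fall back to the org's."""
    if service_credentials:
        return service_credentials
    if org_credentials:
        return org_credentials
    raise LookupError("no cloud credentials configured for this service")


org = {"cloud_provider": "gcp", "cloud_project_id": "org-project"}
dedicated = {"cloud_provider": "gcp", "cloud_project_id": "my-project"}

print(resolve_credentials(dedicated, org)["cloud_project_id"])  # my-project
print(resolve_credentials(None, org)["cloud_project_id"])       # org-project
```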

Generated Client Stubs

The platform auto-generates typed client code for your published services:

curl "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/stubs/python" \
  -H "Authorization: Bearer $TOKEN" \
  -o client.py

curl "$API_URL/api/v1/orgs/$ORG_ID/workspaces/$WS_ID/hosted-services/$SERVICE_ID/stubs/typescript" \
  -H "Authorization: Bearer $TOKEN" \
  -o client.ts

The generated stubs include typed methods for each endpoint, matching your request/response schemas.