
Vector Search

Vector search enables semantic queries over dataset records using embeddings. Instead of matching exact keywords, vector search finds records that are similar in meaning to a query -- powering RAG (Retrieval-Augmented Generation) pipelines, document retrieval, recommendation systems, and deduplication.


How Vector Search Works

graph LR
    Query["User Query<br/>'How do I reset my password?'"] --> Embed["Embedding Model<br/>text-embedding-3-small"]
    Embed --> Vector["Query Vector<br/>[0.12, -0.34, 0.56, ...]"]
    Vector --> Search["Vector Search<br/>(cosine similarity)"]
    Index["Vector Index<br/>(stored embeddings)"] --> Search
    Search --> Results["Top-K Results<br/>(nearest neighbors)"]

  1. The user's query is converted to a vector using an embedding model
  2. The vector is compared against stored embeddings using a distance metric
  3. The most similar records are returned, ranked by distance
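Steps 2 and 3 can be illustrated with an exhaustive (brute-force) nearest-neighbor search in pure Python. This is a sketch, not the platform's implementation: the 3-dimensional vectors and record IDs are made up, step 1 is assumed to have already produced query_vector, and production vector indexes typically avoid scanning every stored vector.

```python
import heapq
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Steps 2 and 3: score every stored vector, keep the k with the smallest distance."""
    scored = ((cosine_distance(query, vec), rec_id) for rec_id, vec in index.items())
    return [(rec_id, dist) for dist, rec_id in heapq.nsmallest(k, scored)]


# Toy 3-dimensional "embeddings"; real models produce 1536 or more dimensions.
index = {
    "password-reset": [0.9, 0.1, 0.0],
    "billing-faq": [0.0, 0.2, 0.9],
    "login-help": [0.8, 0.3, 0.1],
}
query_vector = [1.0, 0.2, 0.0]  # assumed output of step 1 (embedding the query)

for rec_id, dist in top_k(query_vector, index, k=2):
    print(f"{rec_id}: {dist:.3f}")
```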

Embeddings

An embedding is a fixed-length array of numbers that represents the semantic meaning of text. Similar texts produce similar embeddings, which enables similarity-based retrieval.

Embedding Models

Manifest Platform supports embedding generation through the AI Gateway. Common models:

| Model | Dimensions | Best For |
|---|---|---|
| text-embedding-3-small | 1536 | General-purpose, cost-effective |
| text-embedding-3-large | 3072 | Higher accuracy, larger index |
| text-embedding-ada-002 | 1536 | Legacy compatibility |

Model consistency

Always use the same embedding model for indexing and querying. Mixing models produces incompatible vectors and meaningless similarity scores.


Creating Vector-Enabled Datasets

To use vector search, your dataset needs a column that stores embedding vectors. You can populate this column through connector sources or by generating embeddings from text fields.

Schema with Vector Field

Coming Soon

The Python SDK for local development is not yet publicly available.

from flow_sdk.cli_client import CLIClient

# `config` is a placeholder for your SDK configuration object.
client = CLIClient(config)

dataset = client.datasets.create({
    "name": "Knowledge Base",
    "slug": "knowledge-base",
    "description": "Support articles with vector embeddings for semantic search",
    "tags": ["rag", "knowledge-base"],
})

# Note: Schema definitions (e.g., vector fields like "embedding: vector(1536)")
# are registered separately via the dataset schema API, not inline at creation time.

Storing Vectors via Connector Write Jobs

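Use connector write jobs to insert records with embeddings. This and the following SDK examples assume sdk is an initialized SDK client, and ... inside a vector literal marks elided values (a real embedding has the full 1536 floats):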

result = sdk.connector_instances.submit_write_job(
    instance_id="uuid-of-postgres-instance",
    scope_type="organization",
    scope_id="uuid-of-org",
    operations=[
        {
            "kind": "vector",
            "index_id": "uuid-of-vector-index",
            "vectors": [
                {
                    "id": "article-001",
                    "values": [0.12, -0.34, 0.56, ...],  # 1536 dimensions
                    "metadata": {
                        "title": "How to Reset Your Password",
                        "category": "account",
                        "content": "To reset your password, go to Settings..."
                    }
                },
                {
                    "id": "article-002",
                    "values": [0.23, -0.45, 0.67, ...],
                    "metadata": {
                        "title": "Billing FAQ",
                        "category": "billing",
                        "content": "We accept all major credit cards..."
                    }
                }
            ]
        }
    ],
)

print(f"Write job submitted: {result.job_id}")
print(f"Poll status at: {result.poll_url}")


Vector Similarity Queries

Find records semantically similar to a query vector:

result = sdk.datasets.query(
    dataset_id="uuid-of-dataset",
    vector_field="embedding",
    vector_query_embedding=[0.12, -0.34, 0.56, ...],  # query vector
    vector_top_k=10,
    vector_distance_metric="cosine",
)

for row in result.rows:
    print(f"  {row['title']} (distance: {row.get('_distance', 'N/A')})")

Distance Metrics

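| Metric | Description | When to Use |
|---|---|---|
| cosine | Cosine distance (1 - cosine similarity); ignores vector magnitude | Default. A safe choice for most text embeddings |
| l2 | Euclidean (L2) distance | When vector magnitude carries meaning |
| inner_product | Dot product | Unit-normalized embeddings, where it ranks results the same as cosine; on unnormalized vectors, longer vectors score higher |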
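The differences are easiest to see with two vectors that point the same way but have different lengths. A pure-Python sketch of the three metrics (illustrative only; how the platform reports inner_product scores in _distance is not covered here):

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, regardless of length."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def l2_distance(a: list[float], b: list[float]) -> float:
    """Straight-line (Euclidean) distance; sensitive to vector length."""
    return math.dist(a, b)


def inner_product(a: list[float], b: list[float]) -> float:
    """Dot product: a similarity score (higher is closer), not a distance."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    n = math.hypot(*v)
    return [x / n for x in v]


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice as long

print(cosine_distance(a, b))  # 0.0: direction only
print(l2_distance(a, b))      # 5.0: sees the length difference
print(inner_product(a, b))    # 50.0: grows with length

# For unit-length vectors, inner product equals cosine similarity,
# so both metrics rank neighbors identically.
ua, uc = normalize(a), normalize([4.0, 3.0])
print(inner_product(ua, uc), 1.0 - cosine_distance(ua, uc))
```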

Combining Vector and Scalar Filters

You can filter results using both vector similarity and traditional field filters:

result = sdk.datasets.query(
    dataset_id="uuid-of-dataset",
    # Vector search
    vector_field="embedding",
    vector_query_embedding=[0.12, -0.34, 0.56, ...],
    vector_top_k=20,
    vector_distance_metric="cosine",
    # Scalar filter -- only search within "billing" category
    filters=[
        {"field": "category", "operator": "eq", "value": "billing"},
    ],
)

This first filters by category = "billing", then ranks the matching records by vector similarity.
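The order of operations can be illustrated with a small in-memory sketch: scalar filter first, then vector ranking over the survivors. The rows and field names are hypothetical and this is not how the service implements it. It shows two consequences: a record that is closest overall can still be excluded by the filter, and fewer than top_k results come back when few records match.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def filtered_search(rows: list[dict], query_vector: list[float], category: str, top_k: int) -> list[dict]:
    """Pre-filter on a scalar field, then rank only the matching rows by distance."""
    candidates = [r for r in rows if r["category"] == category]  # scalar filter first
    candidates.sort(key=lambda r: cosine_distance(query_vector, r["embedding"]))
    return candidates[:top_k]


rows = [
    {"id": "a1", "category": "account", "embedding": [1.0, 0.0]},
    {"id": "b1", "category": "billing", "embedding": [0.7, 0.7]},
    {"id": "b2", "category": "billing", "embedding": [0.0, 1.0]},
]
hits = filtered_search(rows, [1.0, 0.1], "billing", top_k=20)
print([r["id"] for r in hits])  # ['b1', 'b2']: 'a1' is closest overall but filtered out
```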

Vector-Based Sorting

Sort results by distance to a target vector using the vector_sort parameter:

result = sdk.datasets.query(
    dataset_id="uuid-of-dataset",
    sort_by="embedding",
    sort_order="asc",           # asc = nearest first
    vector_sort={
        "target_vector": [0.12, -0.34, 0.56, ...],
        "distance_metric": "cosine",
    },
    limit=10,
)

Distance-Based Filtering

Filter records by their distance from a target vector:

result = sdk.datasets.query(
    dataset_id="uuid-of-dataset",
    filters=[
        {
            "field": "embedding",
            "operator": "lte",
            "value": 0.3,
            "vector_distance": {
                "target_vector": [0.12, -0.34, 0.56, ...],
                "distance_metric": "cosine",
            },
        }
    ],
)

This returns only records within a cosine distance of 0.3 from the target vector.


Use Cases

RAG (Retrieval-Augmented Generation)

The most common vector search use case. An agent retrieves relevant context from a knowledge base before generating a response.

graph LR
    Q["User Question"] --> E1["Embed Query"]
    E1 --> VS["Vector Search<br/>Knowledge Base"]
    VS --> Context["Top-K Documents"]
    Context --> Agent["Agent"]
    Q --> Agent
    Agent --> Answer["Grounded Answer"]

A typical RAG flow:

  1. User asks a question
  2. The question is embedded using the same model as the index
  3. Vector search retrieves the 5-10 most relevant documents
  4. Retrieved documents are included in the agent's context
  5. The agent generates an answer grounded in the retrieved content
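Step 4 is mostly string assembly. A minimal sketch, assuming the retrieved rows are already ranked nearest-first and carry title and content fields like the metadata in the write-job example; build_rag_prompt and the prompt wording are hypothetical and should be adapted to your agent:

```python
def build_rag_prompt(question: str, docs: list[dict], max_docs: int = 5) -> str:
    """Place the top retrieved documents into the agent's context as numbered sources."""
    context_blocks = []
    for i, doc in enumerate(docs[:max_docs], start=1):
        context_blocks.append(f"[{i}] {doc['title']}\n{doc['content']}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their number. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


retrieved = [
    {"title": "How to Reset Your Password", "content": "To reset your password, go to Settings..."},
    {"title": "Billing FAQ", "content": "We accept all major credit cards..."},
]
prompt = build_rag_prompt("How do I reset my password?", retrieved)
print(prompt)
```

Numbering the sources gives the agent a simple way to cite which retrieved document supports each part of its answer.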

Semantic Deduplication

Find near-duplicate records by searching for vectors with very high similarity:

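# `records` is a placeholder: an iterable of dicts with "id" and "embedding" keys.
duplicates = {}
for record in records:
    # Search for the nearest neighbors of this record's embedding
    result = sdk.datasets.query(
        dataset_id="uuid-of-dataset",
        vector_field="embedding",
        vector_query_embedding=record["embedding"],
        vector_top_k=5,
        vector_distance_metric="cosine",
    )
    # Records with cosine distance < 0.05 are likely duplicates;
    # exclude the record itself, which matches at distance ~0.
    duplicates[record["id"]] = [
        r for r in result.rows
        if r["_distance"] < 0.05 and r["id"] != record["id"]
    ]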
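For a small dataset, or to sanity-check the 0.05 threshold, the same idea can be run locally by comparing every pair of embeddings. This sketch is quadratic in the number of records, so it is only practical for small sets; the records below are made up:

```python
import itertools
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def find_near_duplicates(records: list[dict], threshold: float = 0.05) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, distance) for every pair closer than `threshold`."""
    pairs = []
    for a, b in itertools.combinations(records, 2):  # each unordered pair once
        d = cosine_distance(a["embedding"], b["embedding"])
        if d < threshold:
            pairs.append((a["id"], b["id"], d))
    return pairs


records = [
    {"id": "r1", "embedding": [0.90, 0.10, 0.00]},
    {"id": "r2", "embedding": [0.91, 0.11, 0.00]},  # near-copy of r1
    {"id": "r3", "embedding": [0.00, 0.20, 0.90]},
]
for id_a, id_b, d in find_near_duplicates(records):
    print(id_a, id_b, round(d, 5))
```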

Recommendation

Find items similar to a user's preferences by embedding user behavior and comparing against item embeddings:

# Embed the user's recent interactions.
# `embed` is a placeholder for your embedding call; use the same model as the index.
user_vector = embed(user_interaction_history)

# Find similar items
result = sdk.datasets.query(
    dataset_id="product-catalog",
    vector_field="product_embedding",
    vector_query_embedding=user_vector,
    vector_top_k=20,
    filters=[
        {"field": "in_stock", "operator": "eq", "value": True},
    ],
)
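The embed(user_interaction_history) call above leaves open how behavior becomes a vector. One simple option, sketched here as an assumption rather than the platform's method, is to average the embeddings of the items the user interacted with and normalize the result. The output could then be passed as vector_query_embedding:

```python
import math


def user_profile_vector(item_embeddings: list[list[float]]) -> list[float]:
    """Average the embeddings of items a user interacted with, then L2-normalize.

    One simple way to build a query vector; recency weighting or embedding a
    text summary of the history are common alternatives.
    """
    if not item_embeddings:
        raise ValueError("need at least one interaction to build a profile")
    dims = len(item_embeddings[0])
    count = len(item_embeddings)
    mean = [sum(vec[i] for vec in item_embeddings) / count for i in range(dims)]
    norm = math.hypot(*mean)
    # Avoid dividing by zero if the embeddings cancel out exactly.
    return [x / norm for x in mean] if norm else mean


viewed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy item embeddings
print(user_profile_vector(viewed))
```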

Vector API

The vector index API provides direct REST access for managing indexes and vectors, independent of the dataset query layer. Paths in the tables below are relative to https://api.flow.marut.cloud/api/v1/orgs/{org_id}.

Index Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /vector/indexes | Create a new vector index |
| GET | /vector/indexes | List indexes (paginated) |
| GET | /vector/indexes/{index_id} | Get index details and metadata |
| DELETE | /vector/indexes/{index_id} | Delete an index and all its vectors |

Create Index

import httpx

org_id = "uuid-of-org"      # placeholder: your organization ID
token = "YOUR_API_TOKEN"    # placeholder: a valid bearer token

response = httpx.post(
    f"https://api.flow.marut.cloud/api/v1/orgs/{org_id}/vector/indexes",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "name": "knowledge-base",
        "dimensions": 1536,
        "metric": "cosine",
        "workspace_id": "uuid-of-workspace",
        "description": "Support articles for RAG",
        "embedding_model": "text-embedding-3-small",
    },
)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
index = response.json()
print(f"Created index: {index['id']}")

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Index name (1-200 chars) |
| dimensions | integer | Yes | Vector dimensionality; must match the embedding model (1-4096) |
| metric | string | No | Distance metric: cosine (default), l2, inner_product |
| workspace_id | UUID | No | Scope index to a specific workspace |
| description | string | No | Human-readable description |
| embedding_model | string | No | Embedding model used (e.g., text-embedding-3-small) |
| metadata_schema | object | No | JSON schema for vector metadata fields |
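A wrong dimensions value tends to surface only once vectors are upserted, so it can help to check the request body before sending it. A minimal sketch based on the constraints in the table above; validate_index_request is a hypothetical helper, and the model sizes are the default output sizes from the embedding model table (text-embedding-3 models can also be asked for shorter vectors, which this check would flag):

```python
VALID_METRICS = {"cosine", "l2", "inner_product"}

# Default output sizes, from the embedding model table.
MODEL_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def validate_index_request(body: dict) -> list[str]:
    """Return a list of problems with a create-index body (empty if none found)."""
    errors = []
    name = body.get("name", "")
    if not 1 <= len(name) <= 200:
        errors.append("name must be 1-200 characters")
    dims = body.get("dimensions")
    if isinstance(dims, bool) or not isinstance(dims, int) or not 1 <= dims <= 4096:
        errors.append("dimensions must be an integer between 1 and 4096")
    metric = body.get("metric", "cosine")  # cosine is the documented default
    if metric not in VALID_METRICS:
        errors.append(f"unknown metric {metric!r}")
    model = body.get("embedding_model")
    if model in MODEL_DIMENSIONS and dims != MODEL_DIMENSIONS[model]:
        errors.append(f"{model} produces {MODEL_DIMENSIONS[model]} dimensions, not {dims}")
    return errors


print(validate_index_request({
    "name": "knowledge-base",
    "dimensions": 3072,
    "embedding_model": "text-embedding-3-small",
}))  # ['text-embedding-3-small produces 1536 dimensions, not 3072']
```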

Vector Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /vector/indexes/{index_id}/upsert | Insert or update vectors |
| POST | /vector/indexes/{index_id}/search | Semantic similarity search |
| POST | /vector/indexes/{index_id}/delete | Delete vectors by ID |

Upsert Vectors

response = httpx.post(
    f"https://api.flow.marut.cloud/api/v1/orgs/{org_id}/vector/indexes/{index_id}/upsert",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "vectors": [
            {
                "id": "article-001",
                "embedding": [0.12, -0.34, 0.56, ...],  # 1536 floats
                "content": "To reset your password, go to Settings...",
                "metadata": {"category": "account", "title": "Password Reset"},
            },
        ]
    },
)
print(response.json())  # {"upserted": 1}

| Field | Type | Required | Description |
|---|---|---|---|
| vectors | array | Yes | List of vector objects |
| vectors[].id | string | No | Stable identifier; auto-generated if omitted |
| vectors[].embedding | float[] | Yes | The embedding vector (must match index dimensions) |
| vectors[].content | string | No | Source text associated with the vector |
| vectors[].metadata | object | No | Arbitrary key-value metadata for filtering |
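When indexing many documents, sending every vector in a single request may be impractical. A sketch that splits vectors into request bodies for the upsert endpoint; upsert_payloads is a hypothetical helper, and the batch size of 100 is an assumption, not a documented limit:

```python
from typing import Iterator


def upsert_payloads(vectors: list[dict], batch_size: int = 100) -> Iterator[dict]:
    """Yield upsert request bodies of at most `batch_size` vectors each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(vectors), batch_size):
        yield {"vectors": vectors[start:start + batch_size]}


vectors = [{"id": f"article-{i:03d}", "embedding": [0.0] * 1536} for i in range(250)]
batches = list(upsert_payloads(vectors))
print([len(b["vectors"]) for b in batches])  # [100, 100, 50]
```

Each body would be posted to the upsert endpoint in turn. Supplying stable id values, rather than relying on auto-generated ones, should let a retried batch overwrite its earlier vectors instead of adding duplicates, given the insert-or-update semantics of upsert.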

Search Vectors

response = httpx.post(
    f"https://api.flow.marut.cloud/api/v1/orgs/{org_id}/vector/indexes/{index_id}/search",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "query_embedding": [0.12, -0.34, 0.56, ...],
        "top_k": 5,
        "filter_metadata": {"category": "account"},
    },
)
for result in response.json()["results"]:
    print(f"  {result['id']} (distance: {result['distance']})")

| Field | Type | Required | Description |
|---|---|---|---|
| query_embedding | float[] | Yes | The query vector to search against |
| top_k | integer | No | Number of nearest neighbors to return (default: 10, max: 1000) |
| filter_metadata | object | No | Metadata key/value pairs to pre-filter candidates before ranking |

Delete Vectors

httpx.post(
    f"https://api.flow.marut.cloud/api/v1/orgs/{org_id}/vector/indexes/{index_id}/delete",
    headers={"Authorization": f"Bearer {token}"},
    json={"vector_ids": ["article-001", "article-002"]},
)

Best Practices

Embedding Quality

  • Chunk text appropriately -- For long documents, split into paragraphs or sections (300-500 tokens each). Embedding an entire 10-page document into one vector loses detail.
  • Include metadata in chunks -- Prepend the document title or section header to each chunk before embedding for better retrieval.
  • Normalize consistently -- Use the same preprocessing (lowercasing, whitespace normalization) during both indexing and querying.
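A minimal chunking sketch that splits on blank lines, packs paragraphs up to a size limit, and prepends the document title to every chunk. It counts whitespace-separated words as a rough stand-in for tokens (an assumption; use your embedding model's tokenizer for exact limits), and chunk_document is a hypothetical helper name:

```python
def chunk_document(title: str, text: str, max_words: int = 300) -> list[str]:
    """Split text on blank lines, pack paragraphs into chunks of at most
    max_words words, and prefix each chunk with the document title."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    chunks: list[list[str]] = []
    current: list[str] = []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        # Hard-split any paragraph longer than the limit into pieces.
        for start in range(0, len(words), max_words):
            piece = words[start:start + max_words]
            # Start a new chunk if this piece would overflow the current one.
            if current and len(current) + len(piece) > max_words:
                chunks.append(current)
                current = []
            current.extend(piece)
    if current:
        chunks.append(current)
    # Prepend the title so every chunk carries its document context.
    return [f"{title}\n\n{' '.join(chunk_words)}" for chunk_words in chunks]


doc = "\n\n".join(["alpha " * 200, "beta " * 200, "gamma " * 50])
chunks = chunk_document("Password Reset Guide", doc)
print([len(c.split()) for c in chunks])  # [203, 253]: 3 title words + body words
```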

Index Performance

  • Choose dimensions wisely -- text-embedding-3-small (1536 dims) is a good default. Only use text-embedding-3-large (3072 dims) if you need higher accuracy and can afford the storage/latency cost.
  • Use top_k judiciously -- Retrieving 5-10 results is usually sufficient for RAG. Larger top_k values increase latency without proportional quality improvement.
  • Combine with scalar filters -- Pre-filter on metadata fields (category, date, status) before vector ranking to reduce the search space and improve relevance.

Cost Management

  • Cache embeddings -- Embedding the same text repeatedly wastes API calls. Store computed embeddings alongside the source text.
  • Batch embedding requests -- When indexing many documents, batch them into single API calls (most providers support this).
  • Monitor index size -- Each vector consumes storage proportional to its dimensionality. A million 1536-dimension float32 vectors take roughly 6 GB (1,000,000 × 1536 × 4 bytes) before index overhead and metadata.
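The caching, batching, and sizing advice above can be sketched together. CachedEmbedder and embed_batch are hypothetical names: embed_batch stands in for whatever batch embedding call your provider offers, and the cache is an in-memory dict (in real use you would persist it and keep one cache per embedding model). The storage estimate counts raw float32 values only:

```python
import hashlib
from typing import Callable


def estimate_raw_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw float32 storage only; index structures and metadata add more."""
    return num_vectors * dimensions * bytes_per_value


print(estimate_raw_bytes(1_000_000, 1536) / 1e9)  # 6.144 (GB)


class CachedEmbedder:
    """Wrap a batch embedding function with an in-memory cache keyed by text hash."""

    def __init__(self, embed_batch: Callable[[list[str]], list[list[float]]]):
        self._embed_batch = embed_batch  # takes N strings, returns N vectors in order
        self._cache: dict[str, list[float]] = {}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, texts: list[str]) -> list[list[float]]:
        # Collect unique uncached texts and embed them in one batched call.
        missing = list(dict.fromkeys(t for t in texts if self._key(t) not in self._cache))
        if missing:
            for text, vector in zip(missing, self._embed_batch(missing)):
                self._cache[self._key(text)] = vector
        return [self._cache[self._key(t)] for t in texts]


calls = []


def fake_embed_batch(texts: list[str]) -> list[list[float]]:
    """Stand-in provider call that records each batch it receives."""
    calls.append(list(texts))
    return [[float(len(t))] for t in texts]


embedder = CachedEmbedder(fake_embed_batch)
embedder.embed(["a", "bb", "a"])
embedder.embed(["bb", "ccc"])
print(calls)  # [['a', 'bb'], ['ccc']]: repeats and cached texts are never re-sent
```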

Vector dimensions must match

All vectors in an index must have the same number of dimensions. You cannot mix embeddings from different models in the same vector field. If you switch embedding models, you must re-embed all existing records.
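A small pre-upsert check catches mixed dimensions before they reach the index. check_dimensions is a hypothetical helper; expected should be the dimensions value the index was created with:

```python
def check_dimensions(vectors: list[dict], expected: int) -> None:
    """Raise ValueError if any embedding's length differs from the index's dimensions."""
    bad = [
        (v.get("id", f"#{i}"), len(v["embedding"]))
        for i, v in enumerate(vectors)
        if len(v["embedding"]) != expected
    ]
    if bad:
        details = ", ".join(f"{vid} has {n}" for vid, n in bad)
        raise ValueError(f"expected {expected} dimensions: {details}")


ok = [{"id": "a", "embedding": [0.0] * 1536}]
check_dimensions(ok, 1536)  # passes silently

mixed = ok + [{"id": "b", "embedding": [0.0] * 3072}]  # e.g. a text-embedding-3-large vector
error_message = None
try:
    check_dimensions(mixed, 1536)
except ValueError as err:
    error_message = str(err)
print(error_message)  # expected 1536 dimensions: b has 3072
```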