Vector Search¶
Vector search enables semantic queries over dataset records using embeddings. Instead of matching exact keywords, vector search finds records that are similar in meaning to a query -- powering RAG (Retrieval-Augmented Generation) pipelines, document retrieval, recommendation systems, and deduplication.
How Vector Search Works¶
graph LR
Query["User Query<br/>'How do I reset my password?'"] --> Embed["Embedding Model<br/>text-embedding-3-small"]
Embed --> Vector["Query Vector<br/>[0.12, -0.34, 0.56, ...]"]
Vector --> Search["Vector Search<br/>(cosine similarity)"]
Index["Vector Index<br/>(stored embeddings)"] --> Search
Search --> Results["Top-K Results<br/>(nearest neighbors)"]
- The user's query is converted to a vector using an embedding model
- The vector is compared against stored embeddings using a distance metric
- The most similar records are returned, ranked by distance
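The three steps above can be illustrated with a minimal brute-force sketch in plain Python. It is not a description of how the platform implements search; the toy 3-dimensional vectors, the record IDs, and the helper names `cosine_distance` and `top_k` are assumptions made for illustration only.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_vector, index, k=2):
    """Brute-force nearest-neighbor search over an in-memory index."""
    scored = [
        (cosine_distance(query_vector, vector), record_id)
        for record_id, vector in index.items()
    ]
    scored.sort()  # smallest distance first
    return [(record_id, distance) for distance, record_id in scored[:k]]

# Toy "embeddings" (real ones have hundreds or thousands of dimensions).
index = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq": [0.0, 0.2, 0.9],
    "change-email": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedded user query

results = top_k(query, index, k=2)
for record_id, distance in results:
    print(f"{record_id}: {distance:.3f}")
```

The record whose vector points in nearly the same direction as the query ranks first, and the unrelated record is left out of the top two.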
Embeddings¶
An embedding is a fixed-length array of numbers that represents the semantic meaning of text. Similar texts produce similar embeddings, which enables similarity-based retrieval.
Embedding Models¶
Manifest Platform supports embedding generation through the AI Gateway. Common models:
| Model | Dimensions | Best For |
|---|---|---|
| text-embedding-3-small | 1536 | General-purpose, cost-effective |
| text-embedding-3-large | 3072 | Higher accuracy, larger index |
| text-embedding-ada-002 | 1536 | Legacy compatibility |
Model consistency
Always use the same embedding model for indexing and querying. Mixing models produces incompatible vectors and meaningless similarity scores.
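A dimension check alone does not catch a model mismatch, because two of the models in the table above share 1536 dimensions. The sketch below shows one way to guard against this on the client side, assuming you record which model built the index; `MODEL_DIMENSIONS` and `check_query_compatible` are hypothetical names, not part of any SDK.

```python
# Dimensions copied from the model table above.
MODEL_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def check_query_compatible(index_model, query_model, query_vector):
    """Raise ValueError if a query vector cannot be compared against the index."""
    if query_model != index_model:
        raise ValueError(
            f"index was built with {index_model!r}, query uses {query_model!r}"
        )
    expected = MODEL_DIMENSIONS[index_model]
    if len(query_vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(query_vector)}")

# Same dimensionality, different models: still rejected.
try:
    check_query_compatible(
        "text-embedding-3-small", "text-embedding-ada-002", [0.0] * 1536
    )
except ValueError as err:
    print(f"rejected: {err}")
```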
Creating Vector-Enabled Datasets¶
To use vector search, your dataset needs a column that stores embedding vectors. You can populate this column through connector sources or by generating embeddings from text fields.
Schema with Vector Field¶
Coming Soon
The Python SDK for local development is not yet publicly available.
from flow_sdk.cli_client import CLIClient

client = CLIClient(config)
dataset = client.datasets.create({
    "name": "Knowledge Base",
    "slug": "knowledge-base",
    "description": "Support articles with vector embeddings for semantic search",
    "tags": ["rag", "knowledge-base"],
})
# Note: Schema definitions (e.g., vector fields like "embedding: vector(1536)")
# are registered separately via the dataset schema API, not inline at creation time.
Storing Vectors via Connector Write Jobs¶
Use connector write jobs to insert records with embeddings:
result = sdk.connector_instances.submit_write_job(
    instance_id="uuid-of-postgres-instance",
    scope_type="organization",
    scope_id="uuid-of-org",
    operations=[
        {
            "kind": "vector",
            "index_id": "uuid-of-vector-index",
            "vectors": [
                {
                    "id": "article-001",
                    "values": [0.12, -0.34, 0.56, ...],  # 1536 dimensions
                    "metadata": {
                        "title": "How to Reset Your Password",
                        "category": "account",
                        "content": "To reset your password, go to Settings..."
                    }
                },
                {
                    "id": "article-002",
                    "values": [0.23, -0.45, 0.67, ...],
                    "metadata": {
                        "title": "Billing FAQ",
                        "category": "billing",
                        "content": "We accept all major credit cards..."
                    }
                }
            ]
        }
    ],
)
print(f"Write job submitted: {result.job_id}")
print(f"Poll status at: {result.poll_url}")
Querying with Vector Search¶
Semantic Similarity Search¶
Find records semantically similar to a query vector:
result = sdk.datasets.query(
    dataset_id="uuid-of-dataset",
    vector_field="embedding",
    vector_query_embedding=[0.12, -0.34, 0.56, ...],  # query vector
    vector_top_k=10,
    vector_distance_metric="cosine",
)
for row in result.rows:
    print(f" {row['title']} (distance: {row.get('_distance', 'N/A')})")
Distance Metrics¶
| Metric | Description | When to Use |
|---|---|---|
| cosine | Cosine distance (1 - cosine similarity) | Default. Best for normalized embeddings |
| l2 | Euclidean (L2) distance | When magnitude matters |
| inner_product | Dot product | When vectors are not normalized |
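To make the differences between the metrics concrete, the sketch below computes all three for small hand-picked vectors. The helper names and sample vectors are illustrative assumptions; the formulas are the standard definitions, not a statement about the platform's internals.

```python
import math

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / (norm_a * norm_b)

def normalize(v):
    norm = math.sqrt(dot_product(v, v))
    return [x / norm for x in v]

# Same direction, different magnitude.
a = [3.0, 4.0]
b = [6.0, 8.0]
print(f"cosine: {cosine_distance(a, b):.3f}")  # ignores magnitude
print(f"l2: {l2_distance(a, b):.3f}")          # sensitive to magnitude
print(f"dot: {dot_product(a, b):.1f}")         # grows with magnitude

# For unit-length vectors, squared L2 distance equals 2 * cosine distance,
# so both metrics produce the same ranking.
u = normalize([3.0, 4.0])
v = normalize([1.0, 0.0])
print(math.isclose(l2_distance(u, v) ** 2, 2 * cosine_distance(u, v)))
```

The practical consequence is that the choice of metric only changes rankings when vector magnitudes differ.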
Combining Vector and Scalar Filters¶
You can filter results using both vector similarity and traditional field filters:
result = sdk.datasets.query(
    dataset_id="uuid-of-dataset",
    # Vector search
    vector_field="embedding",
    vector_query_embedding=[0.12, -0.34, 0.56, ...],
    vector_top_k=20,
    vector_distance_metric="cosine",
    # Scalar filter -- only search within "billing" category
    filters=[
        {"field": "category", "operator": "eq", "value": "billing"},
    ],
)
This first filters by category = "billing", then ranks the matching records by vector similarity.
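The filter-then-rank order described above can be mimicked in memory as follows. This is only a sketch of the semantics, with made-up records and a hypothetical `filtered_search` helper; it says nothing about how the query engine executes the plan.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def filtered_search(records, query_vector, category, top_k):
    # Step 1: the scalar filter narrows the candidate set.
    candidates = [r for r in records if r["category"] == category]
    # Step 2: only the surviving candidates are ranked by distance.
    candidates.sort(key=lambda r: cosine_distance(r["embedding"], query_vector))
    return candidates[:top_k]

records = [
    {"id": "a1", "category": "billing", "embedding": [0.9, 0.1]},
    {"id": "a2", "category": "account", "embedding": [1.0, 0.0]},
    {"id": "a3", "category": "billing", "embedding": [0.1, 0.9]},
]

hits = filtered_search(records, [1.0, 0.0], category="billing", top_k=20)
print([r["id"] for r in hits])
```

In this toy data, `a2` is the closest vector overall but never appears, because it is removed before ranking. The result also contains fewer rows than `top_k` when few records pass the filter.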
Vector-Based Sorting¶
Sort results by distance to a target vector using the vector_sort parameter:
result = sdk.datasets.query(
    dataset_id="uuid-of-dataset",
    sort_by="embedding",
    sort_order="asc",  # asc = nearest first
    vector_sort={
        "target_vector": [0.12, -0.34, 0.56, ...],
        "distance_metric": "cosine",
    },
    limit=10,
)
Distance-Based Filtering¶
Filter records by their distance from a target vector:
result = sdk.datasets.query(
    dataset_id="uuid-of-dataset",
    filters=[
        {
            "field": "embedding",
            "operator": "lte",
            "value": 0.3,
            "vector_distance": {
                "target_vector": [0.12, -0.34, 0.56, ...],
                "distance_metric": "cosine",
            },
        }
    ],
)
This returns only records within a cosine distance of 0.3 from the target vector.
Use Cases¶
RAG (Retrieval-Augmented Generation)¶
RAG is the most common vector search use case: an agent retrieves relevant context from a knowledge base before generating a response.
graph LR
Q["User Question"] --> E1["Embed Query"]
E1 --> VS["Vector Search<br/>Knowledge Base"]
VS --> Context["Top-K Documents"]
Context --> Agent["Agent"]
Q --> Agent
Agent --> Answer["Grounded Answer"]
A typical RAG flow:
- User asks a question
- The question is embedded using the same model as the index
- Vector search retrieves the 5-10 most relevant documents
- Retrieved documents are included in the agent's context
- The agent generates an answer grounded in the retrieved content
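The step where retrieved documents are placed in the agent's context can be sketched as plain string assembly. The row shape (`title` and `content` keys), the prompt wording, and the `build_rag_prompt` helper are assumptions for illustration; adapt them to whatever your agent expects.

```python
def build_rag_prompt(question, documents, max_documents=5):
    """Assemble a prompt that grounds the answer in retrieved documents."""
    context_blocks = []
    for number, doc in enumerate(documents[:max_documents], start=1):
        # Numbering the sources lets the agent cite them in its answer.
        context_blocks.append(f"[{number}] {doc['title']}\n{doc['content']}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Rows as they might come back from a vector search, nearest first.
retrieved = [
    {"title": "How to Reset Your Password",
     "content": "To reset your password, go to Settings..."},
    {"title": "Billing FAQ",
     "content": "We accept all major credit cards..."},
]

prompt = build_rag_prompt("How do I reset my password?", retrieved)
print(prompt)
```

Capping the number of documents keeps the prompt within the agent's context budget, which is one reason a small top-K is usually enough.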
Semantic Deduplication¶
Find near-duplicate records by searching for vectors with very high similarity:
# For each record, search for similar records
result = sdk.datasets.query(
    dataset_id="uuid-of-dataset",
    vector_field="embedding",
    vector_query_embedding=record_embedding,
    vector_top_k=5,
    vector_distance_metric="cosine",
)
# Records with distance < 0.05 are likely duplicates
duplicates = [r for r in result.rows if r["_distance"] < 0.05 and r["id"] != record_id]
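Running the query above for every record yields pairs of near-duplicates, and the same pair can surface several times. One possible way to turn those pairs into duplicate groups is a small union-find pass, sketched below. The `(id_a, id_b, distance)` tuple format and the `group_duplicates` helper are assumptions, and the 0.05 threshold simply mirrors the example above.

```python
def group_duplicates(neighbor_pairs, threshold=0.05):
    """Group record IDs that are linked by pairs closer than the threshold."""
    parent = {}

    def find(record_id):
        parent.setdefault(record_id, record_id)
        while parent[record_id] != record_id:
            # Path halving keeps lookups fast on long chains.
            parent[record_id] = parent[parent[record_id]]
            record_id = parent[record_id]
        return record_id

    for id_a, id_b, distance in neighbor_pairs:
        if id_a != id_b and distance < threshold:
            root_a, root_b = find(id_a), find(id_b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for record_id in list(parent):
        groups.setdefault(find(record_id), set()).add(record_id)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# (record, neighbor, distance) tuples collected from per-record searches.
pairs = [
    ("a", "b", 0.01),
    ("b", "c", 0.03),
    ("c", "d", 0.40),  # too far apart to count as a duplicate
    ("e", "e", 0.00),  # a record always matches itself; ignored
]
duplicate_groups = group_duplicates(pairs)
print(duplicate_groups)
```

Note that grouping is transitive: `a` and `c` end up together even though they were never compared directly, which may or may not be what you want for a given dataset.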
Recommendation¶
Find items similar to a user's preferences by embedding user behavior and comparing against item embeddings:
# Embed the user's recent interactions
user_vector = embed(user_interaction_history)
# Find similar items
result = sdk.datasets.query(
    dataset_id="product-catalog",
    vector_field="product_embedding",
    vector_query_embedding=user_vector,
    vector_top_k=20,
    filters=[
        {"field": "in_stock", "operator": "eq", "value": True},
    ],
)
Vector API¶
The vector index API provides direct REST access for managing indexes and vectors, independent of the dataset query layer. All paths are under /orgs/{org_id}/vector.
Index Endpoints¶
| Method | Path | Description |
|---|---|---|
| POST | /vector/indexes | Create a new vector index |
| GET | /vector/indexes | List indexes (paginated) |
| GET | /vector/indexes/{index_id} | Get index details and metadata |
| DELETE | /vector/indexes/{index_id} | Delete an index and all its vectors |
Create Index¶
import httpx

response = httpx.post(
    # f-string so that {org_id} is interpolated into the URL
    f"https://api.flow.marut.cloud/api/v1/orgs/{org_id}/vector/indexes",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "name": "knowledge-base",
        "dimensions": 1536,
        "metric": "cosine",
        "workspace_id": "uuid-of-workspace",
        "description": "Support articles for RAG",
        "embedding_model": "text-embedding-3-small",
    },
)
index = response.json()
print(f"Created index: {index['id']}")
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Index name (1-200 chars) |
| dimensions | integer | Yes | Vector dimensionality, must match embedding model (1-4096) |
| metric | string | No | Distance metric: cosine (default), l2, inner_product |
| workspace_id | UUID | No | Scope index to a specific workspace |
| description | string | No | Human-readable description |
| embedding_model | string | No | Embedding model used (e.g., text-embedding-3-small) |
| metadata_schema | object | No | JSON schema for vector metadata fields |
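The constraints in the table above can be checked on the client before sending the request, which gives faster feedback than a rejected API call. The sketch below covers only the limits stated in the table; `validate_index_payload` is a hypothetical helper, and the server may enforce additional rules.

```python
ALLOWED_METRICS = {"cosine", "l2", "inner_product"}

def validate_index_payload(payload):
    """Return a list of problems found in a create-index request body."""
    errors = []

    name = payload.get("name")
    if not isinstance(name, str) or not 1 <= len(name) <= 200:
        errors.append("name must be a string of 1-200 characters")

    dimensions = payload.get("dimensions")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if (
        not isinstance(dimensions, int)
        or isinstance(dimensions, bool)
        or not 1 <= dimensions <= 4096
    ):
        errors.append("dimensions must be an integer between 1 and 4096")

    # metric is optional and defaults to cosine.
    if payload.get("metric", "cosine") not in ALLOWED_METRICS:
        errors.append("metric must be one of: cosine, l2, inner_product")

    return errors

print(validate_index_payload({"name": "knowledge-base", "dimensions": 1536}))
print(validate_index_payload({"name": "", "dimensions": 8192, "metric": "manhattan"}))
```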
Vector Endpoints¶
| Method | Path | Description |
|---|---|---|
| POST | /vector/indexes/{index_id}/upsert | Insert or update vectors |
| POST | /vector/indexes/{index_id}/search | Semantic similarity search |
| POST | /vector/indexes/{index_id}/delete | Delete vectors by ID |
Upsert Vectors¶
response = httpx.post(
    f"https://api.flow.marut.cloud/api/v1/orgs/{org_id}/vector/indexes/{index_id}/upsert",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "vectors": [
            {
                "id": "article-001",
                "embedding": [0.12, -0.34, 0.56, ...],  # 1536 floats
                "content": "To reset your password, go to Settings...",
                "metadata": {"category": "account", "title": "Password Reset"},
            },
        ]
    },
)
print(response.json())  # {"upserted": 1}
| Field | Type | Required | Description |
|---|---|---|---|
| vectors | array | Yes | List of vector objects |
| vectors[].id | string | No | Stable identifier; auto-generated if omitted |
| vectors[].embedding | float[] | Yes | The embedding vector (must match index dimensions) |
| vectors[].content | string | No | Source text associated with the vector |
| vectors[].metadata | object | No | Arbitrary key-value metadata for filtering |
Search Vectors¶
response = httpx.post(
    f"https://api.flow.marut.cloud/api/v1/orgs/{org_id}/vector/indexes/{index_id}/search",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "query_embedding": [0.12, -0.34, 0.56, ...],
        "top_k": 5,
        "filter_metadata": {"category": "account"},
    },
)
for result in response.json()["results"]:
    print(f" {result['id']} (distance: {result['distance']})")
| Field | Type | Required | Description |
|---|---|---|---|
| query_embedding | float[] | Yes | The query vector to search against |
| top_k | integer | No | Number of nearest neighbors to return (default: 10, max: 1000) |
| filter_metadata | object | No | Metadata key/value pairs to pre-filter candidates before ranking |
Delete Vectors¶
httpx.post(
    f"https://api.flow.marut.cloud/api/v1/orgs/{org_id}/vector/indexes/{index_id}/delete",
    headers={"Authorization": f"Bearer {token}"},
    json={"vector_ids": ["article-001", "article-002"]},
)
Best Practices¶
Embedding Quality¶
- Chunk text appropriately -- For long documents, split into paragraphs or sections (300-500 tokens each). Embedding an entire 10-page document into one vector loses detail.
- Include metadata in chunks -- Prepend the document title or section header to each chunk before embedding for better retrieval.
- Normalize consistently -- Use the same preprocessing (lowercasing, whitespace normalization) during both indexing and querying.
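The chunking and title-prepending advice above might look like the sketch below. It packs whole paragraphs into chunks and counts whitespace-separated words as a rough stand-in for tokens, which is an assumption: real token counts depend on the embedding model's tokenizer. `chunk_document` and the 400-word default are illustrative choices.

```python
def chunk_document(title, text, max_words=400):
    """Split text into paragraph-aligned chunks, each prefixed with the title."""
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue  # skip empty paragraphs
        if current and count + len(words) > max_words:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(f"{title}\n\n" + "\n\n".join(current))
            current, count = [], 0
        # Re-joining the words also normalizes whitespace consistently.
        current.append(" ".join(words))
        count += len(words)
    if current:
        chunks.append(f"{title}\n\n" + "\n\n".join(current))
    return chunks

# Three 250-word paragraphs: no two fit together under the 400-word budget.
document = "\n\n".join(["word " * 250] * 3)
chunks = chunk_document("Password Reset Guide", document)
print(len(chunks))
print(chunks[0][:40])
```

A single paragraph longer than `max_words` is kept whole in this sketch; a production splitter would likely break it further, for example at sentence boundaries.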
Index Performance¶
- Choose dimensions wisely -- text-embedding-3-small (1536 dims) is a good default. Only use text-embedding-3-large (3072 dims) if you need higher accuracy and can afford the storage/latency cost.
- Use top_k judiciously -- Retrieving 5-10 results is usually sufficient for RAG. Larger top_k values increase latency without proportional quality improvement.
- Combine with scalar filters -- Pre-filter on metadata fields (category, date, status) before vector ranking to reduce the search space and improve relevance.
Cost Management¶
- Cache embeddings -- Embedding the same text repeatedly wastes API calls. Store computed embeddings alongside the source text.
- Batch embedding requests -- When indexing many documents, batch them into single API calls (most providers support this).
- Monitor index size -- Each vector consumes storage proportional to its dimensionality. Assuming 4-byte (float32) values, one million 1536-dimension vectors take roughly 6 GB (1,000,000 x 1536 x 4 bytes), before metadata and index overhead.
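The three practices above can be combined in a small client-side helper, sketched below. `EmbeddingCache`, `batched`, and `estimate_vector_bytes` are hypothetical names, the in-memory dict stands in for whatever store you use, and `fake_embed_batch` is a placeholder for a real batch embedding call. The storage estimate assumes 4 bytes per value and ignores metadata and index overhead.

```python
import hashlib

def estimate_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 by default."""
    return num_vectors * dimensions * bytes_per_value

def batched(items, batch_size):
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

class EmbeddingCache:
    """Embed each distinct text once, sending uncached texts in batches."""

    def __init__(self, embed_batch, batch_size=100):
        self._embed_batch = embed_batch  # callable: list[str] -> list[vector]
        self._batch_size = batch_size
        self._cache = {}

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, texts):
        missing, seen = [], set()
        for text in texts:
            key = self._key(text)
            if key not in self._cache and key not in seen:
                seen.add(key)
                missing.append(text)
        for batch in batched(missing, self._batch_size):
            for text, vector in zip(batch, self._embed_batch(batch)):
                self._cache[self._key(text)] = vector
        return [self._cache[self._key(text)] for text in texts]

calls = []  # records the size of every batch sent to the "provider"

def fake_embed_batch(texts):
    calls.append(len(texts))
    return [[float(len(text))] for text in texts]

cache = EmbeddingCache(fake_embed_batch, batch_size=2)
cache.embed(["a", "bb", "ccc", "a"])  # three distinct texts, two batches
cache.embed(["bb", "dddd"])           # only "dddd" is new
print(calls)
print(estimate_vector_bytes(1_000_000, 1536) / 1e9)  # 6.144
```

Keying the cache on a hash of the text means any change in preprocessing produces a new entry, so the cache stays consistent with the normalization you apply before embedding.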
Vector dimensions must match
All vectors in an index must have the same number of dimensions. You cannot mix embeddings from different models in the same vector field. If you switch embedding models, you must re-embed all existing records.