Query documents using similarity search with optional reranking.
Primary endpoint for semantic search, question-answering, and RAG (Retrieval-Augmented Generation) applications. Returns documents ranked by relevance to the query text with similarity scores.
Query Types:
semantic (default): Approximate nearest-neighbor search using HNSW over cosine similarity of document embeddings. Optimal for question-answering, conceptual search, and finding semantically related content without requiring exact keyword matches.
lexical: Keyword-based text search (BM25 algorithm). Optimal for exact phrase matching, proper nouns, and scenarios where keyword presence is more important than semantic similarity.
hybrid: Combines semantic and lexical approaches with weighted scoring. Provides maximum recall by identifying documents matching either semantically or lexically.
Metadata Filtering: Narrow the search scope by applying metadata filters (e.g., search only documents where category: "technical"). Only indexed fields can be used for filtering. Filters are applied before similarity search for optimal efficiency.
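The filter-then-search order described above can be illustrated with a small local sketch. It is not the service's implementation: it uses brute-force cosine similarity instead of HNSW, and the document layout (`id`, `embedding`, `metadata`) and the equality-only filter dictionary are assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(docs, query_vec, metadata_filter, top_k=3):
    # Step 1: keep only documents whose metadata matches every filter key.
    # (Hypothetical equality-only filter; the real filter syntax may differ.)
    candidates = [
        d for d in docs
        if all(d["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    # Step 2: score only the surviving candidates, highest similarity first.
    scored = sorted(
        ((cosine(d["embedding"], query_vec), d["id"]) for d in candidates),
        reverse=True,
    )
    return [(doc_id, round(score, 3)) for score, doc_id in scored[:top_k]]

DOCS = [
    {"id": "a", "embedding": [1.0, 0.0], "metadata": {"category": "technical"}},
    {"id": "b", "embedding": [0.9, 0.1], "metadata": {"category": "marketing"}},
    {"id": "c", "embedding": [0.0, 1.0], "metadata": {"category": "technical"}},
]

results = filtered_search(DOCS, [1.0, 0.0], {"category": "technical"})
print(results)
```

Document "b" is close to the query vector but never gets scored, because the filter removes it before the similarity step.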
Reranking (Advanced): Optionally enhance result quality using a cross-encoder reranking model. The reranker rescores the initial results using a more sophisticated model that evaluates the complete query-document pair (not solely embeddings). This adds 100-500ms latency but significantly improves precision for high-stakes applications.
Reranking Strategy: Set top_k higher than the desired final count (e.g., 50) to retrieve more
candidates from the initial search. Then configure rerank_top_n to the desired final count (e.g., 10)
to return only the most relevant documents after reranking. This two-stage approach maximizes both recall
and precision.
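The two-stage approach can be sketched locally as follows. Only the parameter names `top_k` and `rerank_top_n` come from the description above. The two scoring functions are toy stand-ins: word overlap plays the role of the initial search, and a phrase-aware scorer plays the role of the cross-encoder.

```python
def retrieve_then_rerank(query, docs, first_stage, reranker, top_k=50, rerank_top_n=10):
    # Stage 1: a cheap scorer casts a wide net and keeps top_k candidates.
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:top_k]
    # Stage 2: a costlier scorer looks at each (query, document) pair
    # and only the best rerank_top_n survive.
    reranked = sorted(candidates, key=lambda d: reranker(query, d), reverse=True)
    return reranked[:rerank_top_n]

def word_overlap(query, doc):
    """Toy first-stage score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def phrase_aware(query, doc):
    """Toy reranker: word overlap plus a bonus when the exact phrase appears."""
    bonus = 2 if query.lower() in doc.lower() else 0
    return bonus + word_overlap(query, doc)

DOCS = [
    "How to reset your password",
    "Reset password links expire after one hour",
    "Billing and invoices",
    "Choosing a strong password",
]

top = retrieve_then_rerank(
    "reset password", DOCS, word_overlap, phrase_aware, top_k=3, rerank_top_n=2
)
print(top)
```

The first stage cannot tell the first two documents apart (both share two words with the query). The second stage promotes the one containing the exact phrase, which is the kind of precision gain the wider candidate pool makes possible.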
Performance Metrics: The response includes detailed timing breakdowns (embedding generation time, index query time, reranking time) to facilitate search pipeline optimization and latency analysis.
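As a rough sketch of how the timing breakdown might be used, the helper below finds the stage that dominates total latency. The key names (`embedding_ms`, `index_query_ms`, `rerank_ms`) and the sample values are hypothetical, so check them against the actual response schema.

```python
def slowest_stage(timings_ms):
    """Return (stage_name, share_of_total) for the most expensive stage."""
    total = sum(timings_ms.values())
    stage = max(timings_ms, key=timings_ms.get)
    share = timings_ms[stage] / total if total else 0.0
    return stage, share

# Hypothetical timing breakdown, in milliseconds.
timings = {"embedding_ms": 20.0, "index_query_ms": 30.0, "rerank_ms": 150.0}

stage, share = slowest_stage(timings)
print(f"{stage} accounts for {share:.0%} of measured time")
```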
Similarity Scores: Each result includes a score field indicating relevance. Higher scores indicate
greater relevance. Score ranges and semantics vary by query type (semantic scores use cosine similarity,
lexical scores use BM25, hybrid scores combine both approaches).
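Because cosine similarity and BM25 live on different scales, any weighted combination has to bring them onto a common range first. The sketch below shows one common approach: min-max normalization followed by a weighted sum. The source does not state the service's fusion method, so both the normalization and the `alpha` weight are assumptions.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def hybrid_scores(semantic, lexical, alpha=0.5):
    """Weighted sum of normalized scores; alpha is the semantic weight."""
    sem, lex = min_max(semantic), min_max(lexical)
    # A document found by only one method contributes 0.0 for the other.
    return {
        doc_id: alpha * sem.get(doc_id, 0.0) + (1 - alpha) * lex.get(doc_id, 0.0)
        for doc_id in set(sem) | set(lex)
    }

semantic = {"a": 0.9, "b": 0.5, "c": 0.1}   # cosine similarities
lexical = {"b": 12.0, "c": 4.0, "d": 8.0}   # BM25 scores

combined = hybrid_scores(semantic, lexical)
ranked = sorted(combined, key=combined.get, reverse=True)
print(ranked)
```

Document "b" ranks first because it scores well under both methods, while "a" and "d" each appear in only one result list. One consequence of this scale difference is that raw scores from different query types should not be compared against a single threshold.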
The name of the vector store
Request to query documents.
Query content for automatic embedding
Query type: semantic, lexical, or hybrid
semantic, lexical, hybrid
Number of search results to return
x >= 1
Metadata filter expression
Reranking configuration. Presence enables reranking; omit to disable. Pass an empty object ({}) to enable reranking with system defaults.
[Deprecated: use rerank_config] Enable reranking of search results
[Deprecated: use rerank_config.model] Reranking model to use
[Deprecated: use rerank_config.top_n] Number of results after reranking
x >= 1
[Deprecated: use rerank_config.instruction] Custom instruction for reranker
Include embedding vectors in response
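The helper below is a sketch of moving a request body from the deprecated flat reranking fields to `rerank_config`. The sub-fields `model`, `top_n`, and `instruction` come from the deprecation notes above, and `rerank_top_n` appears in the reranking strategy description. The legacy names `rerank`, `rerank_model`, and `rerank_instruction`, along with `query`, are assumptions, since the parameter names are not shown in this listing.

```python
# Assumed legacy field name -> rerank_config sub-field.
LEGACY_TO_CONFIG = {
    "rerank_model": "model",
    "rerank_top_n": "top_n",
    "rerank_instruction": "instruction",
}

def migrate_rerank_fields(request):
    """Return a copy of the request that uses rerank_config only."""
    migrated = {
        k: v for k, v in request.items()
        if k != "rerank" and k not in LEGACY_TO_CONFIG
    }
    enabled = bool(request.get("rerank")) or "rerank_config" in request
    if not enabled:
        # Omitting rerank_config disables reranking.
        return migrated
    config = dict(request.get("rerank_config", {}))
    for old, new in LEGACY_TO_CONFIG.items():
        # Values already set in rerank_config win over legacy fields.
        if old in request and new not in config:
            config[new] = request[old]
    migrated["rerank_config"] = config
    return migrated

legacy = {"query": "how do I rotate keys?", "top_k": 50, "rerank": True, "rerank_top_n": 10}
modern = migrate_rerank_fields(legacy)
print(modern)
```

A legacy request that enables reranking without any other options migrates to an empty `rerank_config` object, which per the description above enables reranking with system defaults.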