Recall pipeline

A single recall request triggers three parallel searches in knowmind and fuses the results. This page describes the stages, their selection criteria and their order.

Why knowmind combines three searches

A pure full-text search finds literal matches but misses paraphrased content. A pure vector search finds paraphrases but tends to surface hits that are topically similar yet factually wrong. A knowledge graph finds tightly related entities without ever having read their content. knowmind combines all three and reranks the fused result with a cross-encoder, which makes recall more robust than any single method.

The stages

A single recall request passes through the stages below. Each retrieval stage produces its own ranking, RRF fuses the rankings, and the cross-encoder reranks the top results.

BM25 (full text): PostgreSQL full-text index with a tenant-specific language configuration (default: German). Finds literal matches quickly and is robust against typos.
Vector search: embeddings with intfloat/multilingual-e5-large (1024 dimensions), HNSW index in pgvector. Finds semantically similar content regardless of word choice.
Graph hops: Personalized PageRank across the Knowledge Graph in Neo4j, starting from the nodes returned by the first two stages. Default depth: two hops. Surfaces entities that are not in the question but are tightly connected (people to projects, contracts to clients).
Identity and relation stage: heuristics for questions that target a specific entity or a typical relation — "Who is …", "Which contracts with …".
Reciprocal-Rank Fusion: the rankings from each stage are fused with RRF (k=60). Hits that perform well in multiple stages float to the top.
Cross-encoder rerank: the top candidates pass through a BGE reranker. This is the most expensive step, but it runs only on a short list, and the final ranking is noticeably more precise than that of any single stage.
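
The last two stages can be sketched in a few lines of Python. This is a minimal illustration, not knowmind's actual code: the function names, the memory IDs, and the `score_pair` callable (a stand-in for a cross-encoder such as the BGE reranker) are assumptions made for the example. Only the RRF constant k=60 comes from the description above.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal-Rank Fusion: score(d) = sum of 1 / (k + rank) over all
    rankings that contain d, with ranks starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to keep the order stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank_top(
    query: str,
    fused: Sequence[str],
    score_pair: Callable[[str, str], float],
    top_n: int = 20,
) -> List[str]:
    """Rescore only the first top_n fused hits with the (expensive) pairwise
    scorer and return that short list in its new order."""
    short_list = list(fused[:top_n])
    return sorted(short_list, key=lambda d: score_pair(query, d), reverse=True)


# Hypothetical rankings from three retrieval stages (IDs are made up).
full_text = ["m1", "m2", "m3"]
vector = ["m2", "m4", "m1"]
graph = ["m5", "m2"]

fused = rrf_fuse([full_text, vector, graph])
# -> ['m2', 'm1', 'm5', 'm4', 'm3']  ("m2" appears in all three rankings)
```

Because the pairwise scorer is called once per candidate, keeping `top_n` small is what bounds the cost of the rerank step. In a real setup, `score_pair` would look up the memory text for the given ID and pass the (question, text) pair to the cross-encoder.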

What this means in practice

You do not need to guess whether a question is "semantic" or "factual" — the pipeline combines both perspectives.
Relations in the Knowledge Graph pay off. An isolated memory is findable; one with three relations brings more context on related questions.
Multilingual corpora work because the embedding model is multilingual. The language of the question and the language of the memory do not have to match.
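
The point above about relations paying off can be illustrated with a toy Personalized PageRank in plain Python. This is a sketch under assumptions, not the Neo4j implementation: the graph, the node names, the damping factor of 0.85, and the use of two propagation steps as a stand-in for the two-hop default are all chosen for the example.

```python
from typing import Dict, Iterable, List


def personalized_pagerank(
    edges: Dict[str, List[str]],
    seeds: Iterable[str],
    damping: float = 0.85,
    steps: int = 2,
) -> Dict[str, float]:
    """Truncated power iteration: in every step each node passes `damping` of
    its score to its neighbours, and the remaining mass restarts at the seeds."""
    seed_list = list(seeds)
    restart = {s: 1.0 / len(seed_list) for s in seed_list}
    scores = dict(restart)
    for _ in range(steps):
        nxt = {s: (1.0 - damping) * p for s, p in restart.items()}
        for node, score in scores.items():
            neighbours = edges.get(node, [])
            if not neighbours:
                # Dangling node: hand its mass back to the seeds.
                for s, p in restart.items():
                    nxt[s] = nxt.get(s, 0.0) + damping * score * p
                continue
            share = damping * score / len(neighbours)
            for nb in neighbours:
                nxt[nb] = nxt.get(nb, 0.0) + share
        scores = nxt
    return scores


# Hypothetical graph: one memory with relations, one without (undirected edges).
edges = {
    "memo_kickoff": ["person_anna", "project_atlas"],
    "person_anna": ["memo_kickoff", "project_atlas"],
    "project_atlas": ["memo_kickoff", "person_anna", "contract_42"],
    "contract_42": ["project_atlas"],
    "memo_isolated": [],
}

# Assume the text and vector stages returned only the node for Anna.
scores = personalized_pagerank(edges, seeds=["person_anna"])
```

Starting from `person_anna`, the connected memory, the project, and the contract two hops away all receive a score, while `memo_isolated` receives none. The isolated memory remains findable through the text and vector stages, but the graph stage cannot contribute it as context.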

Benchmarks

The pipeline is evaluated against a public benchmark: github.com/Schubeler-Consulting/knowmind-benchmark. It provides reproducible comparisons against full-text search, classic vector RAG, and filename search; on the corpus documented there, the knowmind pipeline reaches noticeably higher recall.
