The VectorStore trait provides a backend-agnostic interface for storing and searching vector embeddings. This is the foundation layer for building RAG (Retrieval-Augmented Generation) applications.
Key Features:
Backend-agnostic API supporting multiple vector databases
Type-safe error handling with Result[A]
Metadata filtering with composable DSL
Batch operations for efficient bulk processing
Built-in statistics and monitoring
Current Backends:
SQLite - File-based or in-memory storage (default)
pgvector - PostgreSQL with pgvector extension (production-ready)
Qdrant - Cloud-native vector database via REST API
Quick Start

```scala
import org.llm4s.vectorstore._

// Create an in-memory store
val store = VectorStoreFactory.inMemory()
  .fold(e => throw new RuntimeException(s"Failed: ${e.formatted}"), identity)

// Store a vector
val record = VectorRecord(
  id = "doc-1",
  embedding = Array(0.1f, 0.2f, 0.3f),
  content = Some("Hello world"),
  metadata = Map("type" -> "greeting")
)
store.upsert(record)

// Search for similar vectors (search returns a Result, so foreach yields the hit list)
val results = store.search(
  queryVector = Array(0.1f, 0.2f, 0.3f),
  topK = 5
)
results.foreach { hits =>
  hits.foreach(scored => println(s"${scored.record.id}: ${scored.score}"))
}

// Clean up
store.close()
```
File-Based Store
```scala
import org.llm4s.vectorstore._

// Create a persistent store
val store = VectorStoreFactory.sqlite("/path/to/vectors.db")
  .fold(e => throw new RuntimeException(s"Failed: ${e.formatted}"), identity)

// Use the store...

store.close()
```
pgvector

```scala
import org.llm4s.vectorstore._

// Local PostgreSQL with defaults
val store = VectorStoreFactory.pgvector()
  .fold(e => throw new RuntimeException(s"Failed: ${e.formatted}"), identity)

// Or with explicit connection settings
val store2 = VectorStoreFactory.pgvector(
  connectionString = "jdbc:postgresql://localhost:5432/mydb",
  user = "postgres",
  password = "secret",
  tableName = "embeddings"
).fold(...)

// Create HNSW index for faster search (optional)
store.asInstanceOf[PgVectorStore].createHnswIndex()

store.close()
```
Setup: Requires PostgreSQL with pgvector extension:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
Qdrant
```scala
import org.llm4s.vectorstore._

// Local Qdrant (docker run -p 6333:6333 qdrant/qdrant)
val store = VectorStoreFactory.qdrant()
  .fold(e => throw new RuntimeException(s"Failed: ${e.formatted}"), identity)

// Or Qdrant Cloud
val cloudStore = VectorStoreFactory.qdrantCloud(
  cloudUrl = "https://your-cluster.qdrant.io",
  apiKey = "your-api-key",
  collectionName = "my_vectors"
).fold(...)

store.close()
```
Core Concepts
VectorRecord
A VectorRecord represents a single entry in the vector store:
```scala
// With explicit ID
val record1 = VectorRecord(
  id = "doc-123",
  embedding = Array(0.1f, 0.2f, 0.3f),
  content = Some("Document text"),
  metadata = Map("source" -> "wiki", "lang" -> "en")
)

// With auto-generated ID
val record2 = VectorRecord.create(
  embedding = Array(0.1f, 0.2f, 0.3f),
  content = Some("Another document")
)

// Add metadata fluently
val record3 = VectorRecord("id", Array(1.0f))
  .withMetadata("key1", "value1")
  .withMetadata(Map("key2" -> "value2", "key3" -> "value3"))
```
ScoredRecord
Search results include similarity scores:
```scala
final case class ScoredRecord(
  record: VectorRecord,
  score: Double // 0.0 to 1.0, higher is more similar
)
```
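Since scores are normalized to the 0.0-1.0 range, a simple post-filter can drop weak matches. A minimal sketch (the 0.8 threshold is arbitrary, and `hits` stands for a retrieved `Seq[ScoredRecord]`, both illustrative assumptions):

```scala
// Keep only strong matches; scores closer to 1.0 are more similar
def strongMatches(hits: Seq[ScoredRecord], minScore: Double = 0.8): Seq[ScoredRecord] =
  hits.filter(_.score >= minScore)
```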
Operations
CRUD Operations
```scala
// Single record operations
store.upsert(record)   // Insert or replace
store.get("doc-id")    // Retrieve by ID
store.delete("doc-id") // Delete by ID

// Batch operations (more efficient)
store.upsertBatch(records) // Insert/replace multiple
store.getBatch(ids)        // Retrieve multiple
store.deleteBatch(ids)     // Delete multiple

// Clear all records
store.clear()
```
Search
```scala
// Basic search
val results = store.search(
  queryVector = embeddingVector,
  topK = 10
)

// Search with metadata filter
val filter = MetadataFilter.Equals("type", "document")
val filtered = store.search(
  queryVector = embeddingVector,
  topK = 10,
  filter = Some(filter)
)
```
Listing and Pagination
```scala
// List all records
val all = store.list()

// Paginate results
val page1 = store.list(limit = 10, offset = 0)
val page2 = store.list(limit = 10, offset = 10)

// List with filter
val docs = store.list(filter = Some(MetadataFilter.Equals("type", "doc")))
```
Metadata Filtering

The MetadataFilter DSL allows composing complex filters:
Basic Filters
```scala
import org.llm4s.vectorstore.MetadataFilter._

// Exact match
val byType = Equals("type", "document")

// Contains substring
val byContent = Contains("summary", "Scala")

// Has key (any value)
val hasAuthor = HasKey("author")

// Value in set
val byLang = In("lang", Set("en", "es", "fr"))
```
Combining Filters
```scala
// AND - both conditions must match
val andFilter = Equals("type", "doc").and(Equals("lang", "en"))

// OR - either condition can match
val orFilter = Equals("type", "doc").or(Equals("type", "article"))

// NOT - negate a filter
val notFilter = !Equals("archived", "true")

// Complex combinations
val complex = Equals("type", "doc")
  .and(Equals("lang", "en").or(Equals("lang", "es")))
  .and(!Equals("draft", "true"))
```
Using Filters
```scala
// In search
store.search(queryVector, topK = 10, filter = Some(byType))

// In list
store.list(filter = Some(complex))

// In count
store.count(filter = Some(byType))

// Delete by filter
store.deleteByFilter(Equals("archived", "true"))
```
Factory Configuration
Using VectorStoreFactory
```scala
import org.llm4s.vectorstore._

// In-memory (default)
val memStore = VectorStoreFactory.inMemory()

// File-based SQLite
val fileStore = VectorStoreFactory.sqlite("/path/to/db.sqlite")

// From provider name
val store = VectorStoreFactory.create("sqlite", path = Some("/path/to/db.sqlite"))

// From config object
val config = VectorStoreFactory.Config.sqlite("/path/to/db.sqlite")
val configStore = VectorStoreFactory.create(config)
```
Hybrid Search

Hybrid search combines vector similarity (semantic) with BM25 keyword matching for better retrieval quality. LLM4S provides a HybridSearcher that fuses results from both search types.
```scala
import org.llm4s.vectorstore._

// Create stores
val vectorStore = VectorStoreFactory.inMemory().getOrElse(???)
val keywordIndex = KeywordIndex.inMemory().getOrElse(???)

// Create hybrid searcher
val searcher = HybridSearcher(vectorStore, keywordIndex)

// Index documents in both stores
val embedding = Array(0.1f, 0.2f, 0.3f)
vectorStore.upsert(VectorRecord("doc-1", embedding, Some("Scala programming language")))
keywordIndex.index(KeywordDocument("doc-1", "Scala programming language"))

// Search with both vector and keyword
val results = searcher.search(
  queryEmbedding = embedding,
  queryText = "Scala",
  topK = 10
)

results.foreach { r =>
  println(s"${r.id}: ${r.score} (vector: ${r.vectorScore}, keyword: ${r.keywordScore})")
}

searcher.close()
```
Keyword Index

The KeywordIndex provides BM25-scored full-text search using SQLite FTS5:
```scala
import org.llm4s.vectorstore._

// Create keyword index
val index = KeywordIndex.inMemory().getOrElse(???)

// Index documents
index.index(KeywordDocument("doc-1", "Scala is a programming language"))
index.index(KeywordDocument("doc-2", "Python is also popular"))

// Search with highlights
val results = index.searchWithHighlights("programming", topK = 5)
results.foreach { r =>
  println(s"${r.id}: ${r.score}")
  r.highlights.foreach(h => println(s"  ...${h}..."))
}

index.close()
```
Factory Methods
```scala
// In-memory (development)
val memSearcher = HybridSearcher.inMemory().getOrElse(???)

// File-based SQLite
val fileSearcher = HybridSearcher.sqlite(
  vectorDbPath = "/path/to/vectors.db",
  keywordDbPath = "/path/to/keywords.db"
).getOrElse(???)

// From configuration
val config = HybridSearcher.Config()
  .withVectorStore(VectorStoreFactory.Config.pgvector(...))
  .withRRF(k = 60)
val configSearcher = HybridSearcher(config).getOrElse(???)
```
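The `withRRF(k = 60)` setting above refers to Reciprocal Rank Fusion, which merges ranked lists by summing `1 / (k + rank)` contributions across lists. A minimal sketch of the idea (not the library's internals; `rankings` is a hypothetical sequence of ID lists, each ordered best-first):

```scala
// RRF: each list contributes 1 / (k + rank) for every document it ranks
// (rank is 1-based here). Larger k flattens the advantage of top ranks;
// 60 is the value suggested in the original RRF paper.
def rrfFuse(rankings: Seq[Seq[String]], k: Int = 60): Seq[(String, Double)] =
  rankings
    .flatMap(_.zipWithIndex.map { case (id, i) => id -> 1.0 / (k + i + 1) })
    .groupMapReduce(_._1)(_._2)(_ + _) // sum contributions per document ID
    .toSeq
    .sortBy(-_._2)                     // best fused score first
```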
Reranking
Reranking improves retrieval quality by re-scoring initial search results using a more powerful model (like Cohere’s cross-encoder). This is particularly useful when you retrieve many candidates and want to refine the ranking.
Quick Start
```scala
import org.llm4s.vectorstore._
import org.llm4s.reranker._

// Create hybrid searcher and reranker
val searcher = HybridSearcher.inMemory().getOrElse(???)
val reranker = RerankerFactory.cohere(apiKey = "your-cohere-api-key")

// Search with reranking
val results = searcher.searchWithReranking(
  queryEmbedding = embedding,
  queryText = "What is Scala?",
  topK = 5,        // Final results to return
  rerankTopK = 50, // Candidates to rerank
  reranker = Some(reranker)
)

results.foreach { r =>
  println(s"${r.id}: ${r.score}")
}
```
Reranker Options
```scala
import org.llm4s.reranker._

// Cohere reranker (recommended for production)
val cohereReranker = RerankerFactory.cohere(
  apiKey = "your-api-key",
  model = "rerank-english-v3.0", // or rerank-multilingual-v3.0
  baseUrl = "https://api.cohere.ai"
)

// Passthrough reranker (no-op, preserves original order)
val passthrough = RerankerFactory.passthrough

// From environment variables
// Set RERANK_PROVIDER=cohere, COHERE_API_KEY=xxx
val fromEnv = RerankerFactory.fromEnv(configReader)
```
A reranker can also be invoked directly with a RerankRequest:

```scala
import org.llm4s.reranker._

val reranker = RerankerFactory.cohere(apiKey = "xxx")

val request = RerankRequest(
  query = "What is Scala?",
  documents = Seq(
    "Scala is a programming language",
    "Python is popular for ML",
    "Scala runs on the JVM"
  ),
  topK = Some(2)
)

val response = reranker.rerank(request)
response.foreach { r =>
  r.results.foreach { result =>
    println(s"[${result.index}] ${result.score}: ${result.document}")
  }
}
```
Configuration
| Environment Variable | Description | Default |
|---|---|---|
| RERANK_PROVIDER | Provider: cohere, none | none |
| COHERE_API_KEY | Cohere API key | - |
| COHERE_RERANK_MODEL | Model name | rerank-english-v3.0 |
| COHERE_RERANK_BASE_URL | API base URL | https://api.cohere.ai |
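For example, to select the Cohere reranker through the environment before calling `RerankerFactory.fromEnv` (shell sketch; the values are placeholders):

```bash
export RERANK_PROVIDER=cohere
export COHERE_API_KEY=your-api-key
export COHERE_RERANK_MODEL=rerank-multilingual-v3.0  # optional; defaults to rerank-english-v3.0
```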
Document Chunking
Document chunking splits text into manageable pieces for embedding and retrieval. LLM4S provides multiple chunking strategies optimized for different content types.
Quick Start
```scala
import org.llm4s.chunking._

// Create a sentence-aware chunker (recommended)
val chunker = ChunkerFactory.sentence()

// Chunk a document
val chunks = chunker.chunk(
  documentText,
  ChunkingConfig(
    targetSize = 800, // Target chunk size in characters
    maxSize = 1200,   // Hard limit for chunk size
    overlap = 150     // Overlap between consecutive chunks
  )
)

chunks.foreach { chunk =>
  println(s"[${chunk.index}] ${chunk.content.take(50)}...")
}
```
Chunking Strategies
```scala
import org.llm4s.chunking._

// Sentence-aware chunking (recommended for most text)
// Respects sentence boundaries for semantic coherence
val sentenceChunker = ChunkerFactory.sentence()

// Simple character-based chunking
// Fast but may split mid-sentence
val simpleChunker = ChunkerFactory.simple()

// Auto-detect based on content
// Detects markdown and chooses appropriate strategy
val autoChunker = ChunkerFactory.auto(documentText)

// By strategy name
val chunker = ChunkerFactory.create("sentence") // or "simple"
```
ChunkingConfig ships with presets and supports fully custom settings:

```scala
import org.llm4s.chunking._

// Default: 800 char target, 150 overlap
val defaultConfig = ChunkingConfig.default

// Small chunks: 400 char target, 75 overlap
// Better for precise retrieval
val smallConfig = ChunkingConfig.small

// Large chunks: 1500 char target, 250 overlap
// Better for broader context
val largeConfig = ChunkingConfig.large

// No overlap
val noOverlapConfig = ChunkingConfig.noOverlap

// Custom configuration
val customConfig = ChunkingConfig(
  targetSize = 600,
  maxSize = 900,
  overlap = 100,
  minChunkSize = 50,
  preserveCodeBlocks = true,
  preserveHeadings = true
)
```
With Source Metadata
```scala
import org.llm4s.chunking._

val chunker = ChunkerFactory.sentence()

// Chunks include source file in metadata
val chunks = chunker.chunkWithSource(
  text = documentText,
  sourceFile = "docs/guide.md",
  config = ChunkingConfig.default
)

chunks.foreach { chunk =>
  println(s"From: ${chunk.metadata.sourceFile.getOrElse("unknown")}")
  println(s"Content: ${chunk.content.take(50)}...")
}
```
Chunking Best Practices
| Content Type | Recommended Strategy | Config |
|---|---|---|
| Prose/articles | sentence | default |
| Technical docs | sentence | large |
| Code files | simple | Custom with no overlap |
| Q&A pairs | sentence | small |
| Mixed content | auto | default |
Tips:
Use overlap for context continuity in retrieval
Smaller chunks improve retrieval precision but may lose context
Larger chunks preserve context but may dilute relevance
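A short sketch applying these recommendations (assuming `articleText`, `sourceText`, and `mixedText` are document strings loaded elsewhere):

```scala
import org.llm4s.chunking._

// Prose article: sentence-aware chunking with default sizes
val articleChunks = ChunkerFactory.sentence().chunk(articleText, ChunkingConfig.default)

// Code file: simple chunking with no overlap, so lines are not duplicated across chunks
val codeChunks = ChunkerFactory.simple().chunk(sourceText, ChunkingConfig.noOverlap)

// Mixed content: let auto-detection pick the strategy
val mixedChunks = ChunkerFactory.auto(mixedText).chunk(mixedText, ChunkingConfig.default)
```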
Complete RAG Pipeline

```scala
import org.llm4s.vectorstore._
import org.llm4s.llmconnect.{LLMConnect, EmbeddingClient}

// 1. Create embedding client and vector store
val embeddingClient = EmbeddingClient.fromEnv().getOrElse(???)
val vectorStore = VectorStoreFactory.inMemory().getOrElse(???)

// 2. Ingest documents
val documents = Seq(
  "Scala is a programming language",
  "LLM4S provides LLM integration",
  "Vector stores enable semantic search"
)

documents.zipWithIndex.foreach { case (doc, idx) =>
  val embedding = embeddingClient.embed(doc).getOrElse(???)
  vectorStore.upsert(VectorRecord(
    id = s"doc-$idx",
    embedding = embedding,
    content = Some(doc)
  ))
}

// 3. Query with retrieval
val query = "What is Scala?"
val queryEmbedding = embeddingClient.embed(query).getOrElse(???)
val relevant = vectorStore.search(queryEmbedding, topK = 3).getOrElse(Seq.empty)

// 4. Augment prompt with context
val context = relevant.map(_.record.content.getOrElse("")).mkString("\n")
val prompt = s"""Based on the following context:
$context
Answer this question: $query"""

// 5. Generate response
val llm = LLMConnect.fromEnv().getOrElse(???)
val response = llm.complete(prompt)
```
Best Practices
Resource Management
Always close stores when done:
```scala
val store = VectorStoreFactory.inMemory().getOrElse(???)

// Use scala.util.Using for automatic cleanup
scala.util.Using.resource(new java.io.Closeable {
  def close(): Unit = store.close()
}) { _ =>
  // Use the store
}
```
Batch Operations
Use batch operations for efficiency:
```scala
// Good - single batch call
store.upsertBatch(records)

// Less efficient - individual calls
records.foreach(store.upsert)
```
Error Handling
All operations return Result[A]:
```scala
store.search(query, topK = 10) match {
  case Right(results) => results.foreach(r => println(r.score))
  case Left(error)    => println(s"Search failed: ${error.formatted}")
}

// Or use for-comprehension
for {
  results <- store.search(query, topK = 10)
  count   <- store.count()
} yield (results, count)
```
Metadata Design
Design metadata for your filtering needs:
```scala
// Good - filterable metadata
VectorRecord(
  id = "doc-1",
  embedding = embedding,
  metadata = Map(
    "type" -> "article",
    "source" -> "wikipedia",
    "lang" -> "en",
    "year" -> "2024"
  )
)

// Then filter efficiently
store.search(query, topK = 10, filter = Some(
  Equals("type", "article").and(Equals("lang", "en"))
))
```
Performance Considerations
SQLite Backend
The SQLite backend is suitable for:
Development and testing
Small to medium datasets (~100K vectors)
Single-machine deployments
Scenarios where simplicity is preferred
Limitations:
Vector similarity computed in Scala (not hardware-accelerated); see the sketch after this list
All candidate vectors loaded into memory during search
No built-in sharding or replication
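A brute-force search of this kind reduces to a linear scan with a per-candidate similarity computation, roughly like this sketch (illustrative only, not the library's actual code):

```scala
// Cosine similarity between two equal-length vectors: O(d) per pair,
// so scanning n candidate vectors costs O(n * d) per query.
def cosine(a: Array[Float], b: Array[Float]): Double = {
  require(a.length == b.length, "dimension mismatch")
  var dot = 0.0; var normA = 0.0; var normB = 0.0
  var i = 0
  while (i < a.length) {
    dot   += a(i) * b(i)
    normA += a(i) * a(i)
    normB += b(i) * b(i)
    i += 1
  }
  dot / (math.sqrt(normA) * math.sqrt(normB))
}
```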
pgvector Backend
PostgreSQL with pgvector is ideal for:
Production workloads with existing PostgreSQL infrastructure
Medium to large datasets (millions of vectors)
Teams familiar with PostgreSQL operations
Applications requiring ACID transactions
Features:
HNSW indexing for fast approximate nearest neighbor search
Connection pooling with HikariCP
Native vector operations in PostgreSQL
Excellent SQL tooling and monitoring
Setup:
```bash
# Enable pgvector extension
psql -c "CREATE EXTENSION IF NOT EXISTS vector;"
```
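After enabling the extension, the HNSW index that `createHnswIndex()` sets up can also be created by hand. A sketch assuming a table named `embeddings` with a vector column named `embedding` (adjust names and operator class to your schema):

```bash
# Create an HNSW index using cosine distance
psql -c "CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops);"
```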
Qdrant Backend
Qdrant is recommended for:
High-performance production workloads
Large-scale deployments (billions of vectors)
Cloud-native architectures
Teams wanting managed vector database service
Features:
Cloud-native architecture with horizontal scaling
REST and gRPC APIs
Rich filtering on payload fields
Snapshot and backup capabilities
Managed cloud offering available
Setup:
```bash
# Local development with Docker
docker run -p 6333:6333 qdrant/qdrant

# Or use Qdrant Cloud for production
```
Evaluation

Once you have a RAG pipeline, use the evaluation framework to measure and improve retrieval quality.
Quick Evaluation
```scala
import org.llm4s.rag.evaluation._

val evaluator = new RAGASEvaluator(llmClient)

val sample = EvalSample(
  question = "What is Scala?",
  answer = generatedAnswer,
  contexts = retrievedChunks,
  groundTruth = Some("Scala is a programming language.")
)

val metrics = evaluator.evaluate(sample)
metrics.foreach { m =>
  println(s"RAGAS Score: ${m.ragasScore}")
  println(s"Faithfulness: ${m.faithfulness}")
}
```
Benchmarking Different Configurations
Compare chunking strategies, fusion methods, and embedding providers: