# LLM4S Roadmap

The single source of truth for LLM4S project status and future direction.


## Quick Status

- **Version:** 0.1.0-SNAPSHOT (Pre-release)
- **Stability:** Active development, API stabilizing
- **Target:** v1.0.0 (Production-Ready)
- **Timeline:** Q2-Q3 2025

## What’s Complete

### Core Platform Features

| Category | Feature | Status | Documentation |
|---|---|---|---|
| LLM Connectivity | Multi-Provider Support | ✅ Complete | Providers |
| | OpenAI Integration | ✅ Complete | Basic Usage |
| | Anthropic Integration | ✅ Complete | Providers |
| | Azure OpenAI Integration | ✅ Complete | Providers |
| | Ollama (Local Models) | ✅ Complete | Providers |
| | Streaming Responses | ✅ Complete | Streaming |
| | Model Metadata API | ✅ Complete | API Reference |
| Content | Image Generation | ✅ Complete | Image Generation |
| | Speech-to-Text (STT) | ✅ Complete | Speech |
| | Text-to-Speech (TTS) | ✅ Complete | Speech |
| | Embeddings API | ✅ Complete | Embeddings |
| Tools | Tool Calling API | ✅ Complete | Tools |
| | MCP Server Support | ✅ Complete | MCP |
| | Built-in Tools Module | ✅ Complete | Examples |
| | Workspace Isolation | ✅ Complete | Workspace |
| Infrastructure | Type-Safe Configuration | ✅ Complete | Configuration |
| | Result-Based Errors (see sketch below) | ✅ Complete | Error Handling |
| | Langfuse Observability | ✅ Complete | Observability |
| | Cross-Version (2.13/3.x) | ✅ Complete | Installation |
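
For orientation, the Result-Based Errors row refers to an error channel carried in the return type rather than in exceptions. Here is a minimal sketch of that pattern; `LLMError`, `Completion`, and `complete` are illustrative stand-ins, not the library's published names:

```scala
// Sketch of the Result-based error pattern; all names here are
// illustrative stand-ins, not the published LLM4S API.
object ResultSketch {
  sealed trait LLMError
  final case class RateLimited(retryAfterSeconds: Int) extends LLMError
  final case class ProviderError(message: String)      extends LLMError

  final case class Completion(text: String)
  type Result[A] = Either[LLMError, A]

  def complete(prompt: String): Result[Completion] =
    if (prompt.nonEmpty) Right(Completion(s"echo: $prompt"))
    else Left(ProviderError("empty prompt"))

  // Failures short-circuit through for-comprehensions instead of throwing.
  val answer: Result[String] =
    for {
      first  <- complete("Summarise the roadmap")
      second <- complete(first.text)
    } yield second.text

  def main(args: Array[String]): Unit = println(answer)
}
```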

### Agent Framework

The agent framework extends core LLM4S with advanced capabilities. Detailed design →

| Phase | Feature | Status | Key Capabilities |
|---|---|---|---|
| 1.0 | Core Agent | ✅ Complete | Basic execution, tool calling, streaming |
| 1.1 | Conversations | ✅ Complete | Immutable state, continueConversation(), pruning (see sketch below) |
| 1.2 | Guardrails | ✅ Complete | Input/output validation, LLM-as-Judge |
| 1.3 | Handoffs | ✅ Complete | Agent-to-agent delegation, context preservation |
| 1.4 | Memory | ✅ Complete | Short/long-term memory, SQLite, vector search |
| 2.1 | Streaming Events | ✅ Complete | Lifecycle events, runWithEvents() |
| 2.2 | Async Tools | ✅ Complete | Parallel execution strategies |
| 3.2 | Built-in Tools | ✅ Complete | DateTime, Calculator, HTTP, file ops |
| 4.1 | Reasoning Modes | ✅ Complete | Extended thinking for o1/o3, Claude |
| 4.3 | Serialization | ✅ Complete | AgentState save/load to JSON |
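
The Conversations and Streaming Events rows suggest an API shape like the sketch below. `continueConversation` and `runWithEvents` are the method names the table itself lists; every other type and signature here is an assumption for illustration only:

```scala
// Illustrative only: `Agent`, `AgentState`, and `AgentEvent` are assumed
// shapes; continueConversation and runWithEvents are the names the
// roadmap table mentions.
object AgentSketch {
  final case class AgentState(messages: Vector[String])
  sealed trait AgentEvent
  final case class ToolCalled(name: String) extends AgentEvent
  final case class TextDelta(chunk: String) extends AgentEvent

  final class Agent {
    // Immutable state: each turn returns a new AgentState.
    def continueConversation(state: AgentState, userInput: String): AgentState =
      state.copy(messages = state.messages :+ s"user: $userInput")

    // Lifecycle events surfaced through a callback while the agent runs.
    def runWithEvents(state: AgentState)(onEvent: AgentEvent => Unit): AgentState = {
      onEvent(ToolCalled("calculator"))
      onEvent(TextDelta("The answer is..."))
      state.copy(messages = state.messages :+ "assistant: The answer is...")
    }
  }

  def main(args: Array[String]): Unit = {
    val agent = new Agent
    val s1 = agent.continueConversation(AgentState(Vector.empty), "hello")
    agent.runWithEvents(s1)(event => println(event))
  }
}
```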

## Production Readiness

### The Seven Pillars

Production readiness is measured across seven pillars:

```mermaid
graph TD
    PR[Production Readiness]
    PR --> P1[Testing & Quality]
    PR --> P2[API Stability]
    PR --> P3[Performance]
    PR --> P4[Security]
    PR --> P5[Documentation]
    PR --> P6[Observability]
    PR --> P7[Community]
```

### Pillar Status

| Pillar | Goal | Status | Key Deliverable |
|---|---|---|---|
| Testing & Quality | Catch bugs before runtime | 🚧 In Progress | 80%+ coverage target |
| API Stability | Safe upgrades with clear compatibility | 🚧 In Progress | MiMa checks, SemVer policy |
| Performance | Predictable behavior under load | 📋 Planned | JMH benchmarks, baselines |
| Security | Prevent data leaks, audit data flows | 📋 Planned | Threat model, dependency scanning |
| Documentation | Clone to working example quickly | 🚧 In Progress | Complete guides, Scaladoc |
| Observability | See what’s happening in production | ✅ Complete | Langfuse, structured logging |
| Community | Healthy contributor ecosystem | 🚧 In Progress | 10+ contributors target |

### Known Limitations (v1.0)

- Tool registries are not serialized; tools must be re-attached when restoring AgentState (see the sketch below)
- Reasoning modes are provider-specific and may not be available on all models
- Memory stores have size and TTL limits; long-term retention belongs in external systems
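
A minimal sketch of the save/restore pattern the first limitation implies; `ToolRegistry`, `save`, and `load` are hypothetical shapes, and only the re-attach requirement itself comes from the roadmap:

```scala
// Hypothetical shapes for illustration; only the "re-attach tools after
// restore" behaviour is taken from the roadmap.
object RestoreSketch {
  final case class ToolRegistry(tools: Map[String, String => String])
  final case class AgentState(messagesJson: String, tools: Option[ToolRegistry])

  // Serialization keeps the conversation but drops the tool registry.
  def save(state: AgentState): String = state.messagesJson
  def load(json: String): AgentState  = AgentState(json, tools = None)

  def main(args: Array[String]): Unit = {
    val registry = ToolRegistry(Map("calculator" -> ((expr: String) => s"eval($expr)")))
    val restored = load(save(AgentState("""{"messages":[]}""", Some(registry))))
    // Tool registries are not serialized, so re-attach after restoring:
    val ready = restored.copy(tools = Some(registry))
    println(ready.tools.map(_.tools.keySet))
  }
}
```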

## What’s In Progress

| Feature | Progress | Notes |
|---|---|---|
| RAG Core Engine | ✅ Complete | Retrieval pipeline shipped |
| RAG Evaluation | ✅ Complete | RAGAS metrics + benchmarking harness |
| RAG Benchmarking | ✅ Complete | Chunking, fusion, embedding comparison |
| RAG in a Box | ✅ Complete | Separate project: 194 tests, production-ready |
| MCP Full Implementation | ~50% | Full protocol, server implementation |
| Advanced Embeddings | ~60% | Multi-provider support, caching |
| Enhanced Observability | Planning | Plugin architecture, multi-backend |

## RAG Pipeline Roadmap

The RAG pipeline follows a 5-phase roadmap toward a production-grade retrieval system. The core library (Phases 1-3) lives in LLM4S; deployment tooling (Phase 5) is provided by the separate RAG in a Box project.

### Phase 1: Foundation ✅ COMPLETE

| Component | Status | Notes |
|---|---|---|
| VectorStore Abstraction | ✅ Complete | Backend-agnostic trait |
| SQLite Backend | ✅ Complete | File-based and in-memory |
| pgvector Backend | ✅ Complete | PostgreSQL + pgvector extension |
| Qdrant Backend | ✅ Complete | REST API, local + cloud |
| BM25 Keyword Index | ✅ Complete | SQLite FTS5 + PostgreSQL native full-text search |
| Hybrid Search Fusion | ✅ Complete | RRF + weighted score strategies (RRF sketched below) |
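
Reciprocal Rank Fusion (RRF), the fusion strategy named above, scores each document as the sum of 1 / (k + rank) across result lists, with k = 60 by convention. The following is a self-contained illustration of the algorithm, not LLM4S's implementation:

```scala
// Reciprocal Rank Fusion: score(d) = sum over result lists of
// 1 / (k + rank(d)), where rank is 1-based and k = 60 by convention.
object RrfSketch {
  def fuse(rankings: Seq[Seq[String]], k: Int = 60): Seq[(String, Double)] =
    rankings
      .flatMap(_.zipWithIndex.map { case (doc, idx) => doc -> 1.0 / (k + idx + 1) })
      .groupMapReduce(_._1)(_._2)(_ + _) // sum contributions per document
      .toSeq
      .sortBy(-_._2)

  def main(args: Array[String]): Unit = {
    val vectorHits  = Seq("doc-a", "doc-b", "doc-c") // dense / embedding ranking
    val keywordHits = Seq("doc-b", "doc-d", "doc-a") // BM25 ranking
    fuse(Seq(vectorHits, keywordHits)).foreach(println) // doc-b and doc-a rise to the top
  }
}
```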

### Phase 2: Intelligence ✅ COMPLETE

| Component | Status | Notes |
|---|---|---|
| Reranking Pipeline | ✅ Complete | Cohere + LLM-based reranking |
| Document Chunking | ✅ Complete | Simple, sentence-aware, markdown-aware, semantic chunkers (see sketch below) |
| Ollama Embeddings | ✅ Complete | Local embedding support |
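
To make "sentence-aware" concrete, here is a minimal chunker in that spirit, assuming a simple character budget: split on sentence boundaries, then pack sentences into chunks that stay under the budget. Illustrative only; the library's chunkers may differ:

```scala
// Minimal sentence-aware chunking sketch: never split mid-sentence,
// greedily pack sentences until the size budget would be exceeded.
object ChunkSketch {
  def chunk(text: String, maxChars: Int = 200): Vector[String] = {
    val sentences = text.split("(?<=[.!?])\\s+").toVector
    sentences.foldLeft(Vector.empty[String]) { (chunks, s) =>
      chunks.lastOption match {
        case Some(last) if last.length + 1 + s.length <= maxChars =>
          chunks.init :+ (last + " " + s) // extend the current chunk
        case _ =>
          chunks :+ s // start a new chunk
      }
    }
  }

  def main(args: Array[String]): Unit =
    chunk("First sentence. Second one! A third? And a fourth.", maxChars = 30)
      .foreach(c => println(s"[$c]"))
}
```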

### Phase 3: Evaluation & Quality ✅ CORE COMPLETE

| Component | Status | Notes |
|---|---|---|
| RAGAS Evaluation | ✅ Complete | Faithfulness, answer relevancy, context precision/recall metrics (worked example below) |
| RAG Benchmarking Harness | ✅ Complete | Systematic comparison of chunking, fusion, embedding strategies |
| RAG-Specific Guardrails | ✅ Complete | PII detection/masking, prompt injection, grounding, context relevance, source attribution, topic boundary |
| RAG Cost Tracking | 📋 Planned | Per-query cost, latency percentiles (p50/p95/p99) |
| Embedding Drift Detection | 📋 Planned | Monitor embedding quality over time |
| Prompt Tuning & Optimization | 📋 Planned | Systematic prompt improvement, A/B testing, performance tracking |
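
As a worked example of one RAGAS metric: context precision averages precision@k over the ranks at which a retrieved chunk is relevant. The snippet below is an illustrative reimplementation of that standard formula, not the library's code:

```scala
// Context precision: mean of precision@k taken at each rank k where the
// retrieved chunk is relevant to the question.
object ContextPrecisionSketch {
  def contextPrecision(relevance: Seq[Boolean]): Double = {
    val precisionsAtRelevant = relevance.zipWithIndex.collect {
      case (true, idx) =>
        val k = idx + 1
        relevance.take(k).count(identity).toDouble / k // precision@k
    }
    if (precisionsAtRelevant.isEmpty) 0.0
    else precisionsAtRelevant.sum / precisionsAtRelevant.size
  }

  def main(args: Array[String]): Unit =
    // Chunks at ranks 1 and 3 were relevant: (1/1 + 2/3) / 2 ≈ 0.83
    println(contextPrecision(Seq(true, false, true, false)))
}
```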

### Phase 4: Extended Integrations 📋 PLANNED

| Component | Status | Notes |
|---|---|---|
| Milvus Backend | 📋 Planned | GPU-accelerated vector search |
| Pinecone Backend | 📋 Planned | Cloud-managed vector DB |
| Cohere Embeddings | 📋 Planned | Multilingual embed-v3 |
| ONNX Embeddings | 📋 Planned | Local sentence-transformers |
| Embedding Cache | 📋 Planned | Reduce redundant embedding calls |
| Metadata Extraction | 📋 Planned | Titles, TOC, links, code blocks |

### Phase 5: RAG in a Box Server ✅ COMPLETE

RAG in a Box is a production-ready RAG server built on the LLM4S framework, providing a complete REST API for document ingestion, semantic search, and AI-powered question answering.

| Component | Status | Notes |
|---|---|---|
| REST API Layer | ✅ Complete | Document ingestion, query endpoints, streaming responses |
| Docker Compose | ✅ Complete | Single-command deployment with PostgreSQL/pgvector |
| Kubernetes | ✅ Complete | Namespace, ConfigMap, Secrets, Deployments, Ingress templates |
| Admin UI | ✅ Complete | Vue.js dashboard with real-time stats, document browser, chunking preview |
| Security | ✅ Complete | JWT auth, PBKDF2 password hashing, CORS, input validation |
| Observability | ✅ Complete | Prometheus metrics, health checks (/health, /ready, /live; probe example below) |
| CI/CD | ✅ Complete | 194 backend tests, security scanning (OWASP, Anchore) |

Key Features:

- Multi-format document ingestion (text, markdown, PDF, URLs)
- Configurable chunking strategies (simple, sentence, markdown, semantic)
- Collection-based organization with per-collection settings
- Hybrid search with RRF fusion (vector + keyword)
- Multi-provider support for embeddings (OpenAI, Voyage, Ollama) and LLMs (OpenAI, Anthropic, Ollama)
- Production infrastructure: HikariCP connection pooling, structured logging, graceful shutdown

Repository: github.com/llm4s/rag_in_a_box
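
The health endpoints listed in the Observability row can be probed with the plain JDK HTTP client; the host and port below are assumptions for a local Docker Compose deployment:

```scala
// Probe the documented health endpoints (/health, /ready, /live).
// localhost:8080 is an assumed local deployment address.
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import scala.util.Try

object HealthCheck {
  def main(args: Array[String]): Unit = {
    val client = HttpClient.newHttpClient()
    for (path <- Seq("/health", "/ready", "/live")) {
      val request = HttpRequest.newBuilder(URI.create(s"http://localhost:8080$path")).GET().build()
      val status  = Try(client.send(request, HttpResponse.BodyHandlers.ofString()).statusCode())
      println(s"$path -> $status")
    }
  }
}
```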


## What’s Planned

### Near Term (Q1-Q2 2025)

| Feature | Priority | Description |
|---|---|---|
| RAG Vector Integrations | ✅ Done | SQLite, pgvector, Qdrant complete |
| RAG Hybrid Search | ✅ Done | BM25 + vector fusion with RRF |
| RAG Reranking Pipeline | ✅ Done | Cohere cross-encoder + LLM-based |
| RAG Document Chunking | ✅ Done | Sentence-aware, semantic, markdown chunking |
| RAGAS Evaluation | ✅ Done | Context precision/recall, faithfulness, answer relevancy |
| RAG Benchmarking Harness | ✅ Done | Systematic comparison of RAG configurations |
| RAG Guardrails | ✅ Done | PII detection/masking, prompt injection, grounding, context relevance, source attribution, topic boundary |
| Reliable Calling | P0 | Retry with backoff, circuit breakers, deadlines (see sketch below) |
| Performance Benchmarks | P1 | JMH framework, baseline metrics |
| Security Audit | P1 | Threat model, vulnerability scanning |
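
To make the P0 Reliable Calling item concrete, here is a sketch of retry with exponential backoff bounded by an overall deadline; the eventual LLM4S API may look quite different:

```scala
// Retry with exponential backoff plus a wall-clock deadline.
// Illustrative sketch of the planned behaviour, not the planned API.
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object RetrySketch {
  def withRetry[A](maxAttempts: Int, initialDelayMs: Long, deadline: Long)(op: () => A): Try[A] = {
    @tailrec
    def loop(attempt: Int, delayMs: Long): Try[A] =
      Try(op()) match {
        case s @ Success(_) => s
        case Failure(_) if attempt < maxAttempts && System.currentTimeMillis() < deadline =>
          Thread.sleep(delayMs)
          loop(attempt + 1, delayMs * 2) // exponential backoff
        case f => f // out of attempts or past the deadline
      }
    loop(1, initialDelayMs)
  }

  def main(args: Array[String]): Unit = {
    var calls = 0
    val result = withRetry(maxAttempts = 4, initialDelayMs = 100,
                           deadline = System.currentTimeMillis() + 5000) { () =>
      calls += 1
      if (calls < 3) sys.error("transient provider error") else "ok"
    }
    println(s"result=$result after $calls calls")
  }
}
```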

### Medium Term (H2 2025)

| Feature | Priority | Description |
|---|---|---|
| Prompt Tuning & Optimization | P1 | Systematic prompt improvement, A/B testing, variant tracking |
| RAG Cost & Latency Tracking | P1 | Per-query metrics, embedding drift detection |
| Prompt Management | P2 | Template system with variable substitution |
| Caching Layer | P2 | LLM response + embedding caching for cost/latency (sketch below) |
| Cost Tracking | P2 | Token usage tracking and estimation |
| Provider Expansion | P2 | Cohere, Mistral, Gemini, LiteLLM |
| Extended Vector DBs | P2 | Milvus (GPU), Pinecone (cloud) |
| ONNX Embeddings | P2 | Local sentence-transformers runtime |
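
The planned caching layer amounts to memoizing expensive calls by content hash so repeated chunks are not re-embedded. A minimal sketch, illustrative rather than the planned design:

```scala
// Embedding cache sketch: key by SHA-256 of the text so identical chunks
// hit the cache instead of the embedding provider.
import java.security.MessageDigest
import scala.collection.concurrent.TrieMap

object EmbeddingCacheSketch {
  private val cache = TrieMap.empty[String, Vector[Float]]

  private def sha256(s: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_)).mkString

  def embed(text: String)(compute: String => Vector[Float]): Vector[Float] =
    cache.getOrElseUpdate(sha256(text), compute(text))

  def main(args: Array[String]): Unit = {
    var providerCalls = 0
    def fakeEmbed(t: String): Vector[Float] = { providerCalls += 1; Vector(t.length.toFloat) }
    embed("hello")(fakeEmbed)
    embed("hello")(fakeEmbed) // cache hit, no second provider call
    println(s"provider called $providerCalls time(s)")
  }
}
```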

### Long Term (Post-1.0)

| Feature | Description |
|---|---|
| Fine-tuning Support | Model adaptation, LoRA integration |
| Workflow Engines | Camunda/Temporal integration |
| Plugin Architecture | Community-contributed providers and tools |
| Advanced Multi-Agent | DAG orchestration, complex workflows |
| RAG in a Box | ✅ Complete; see the RAG in a Box section above |

## Timeline Overview

```mermaid
graph LR
    M1[Months 1-2<br/>Testing + API] --> M2[Months 2-3<br/>Integration + Perf]
    M2 --> M3[Months 3-4<br/>Security + Reliability]
    M3 --> M4[Month 5<br/>Stabilization]
    M4 --> M5[Month 6+<br/>v1.0.0 Launch]
```

| Phase | Focus | Key Outcomes |
|---|---|---|
| Months 1-2 | Testing + API audit | Coverage audit, public API documented |
| Months 2-3 | Integration + performance | Integration tests, JMH benchmarks |
| Months 3-4 | Security + reliability | Threat model, reliable calling |
| Month 5 | Stabilization | RC releases, API freeze |
| Month 6+ | Launch | v1.0.0 release, ecosystem growth |

## Reference Deployments

Three deployment patterns will be documented for production use:

| Pattern | Use Case | Key Components |
|---|---|---|
| Laptop/Dev | Experiments, learning | Single-node, local Ollama or single provider, console tracing |
| Small K8s | Single-tenant production | llm4s app + workspace, Langfuse, pgvector, K8s secrets |
| Enterprise VPC | Multi-tenant, regulated | Private networking, Vault, centralized logging, audit trails |

## Success Metrics (v1.0 Targets)

| Category | Metric | Current | Target |
|---|---|---|---|
| Quality | Statement Coverage | ~21% | 80% |
| Quality | Critical Bugs | - | 0 |
| Community | Contributors | 6 | 10+ |
| Adoption | Maven Downloads | - | 500/mo |
| Docs | Scaladoc Coverage | <50% | 100% |

## Design Documents

Detailed technical designs are in docs/design:

| Document | Purpose |
|---|---|
| Agent Framework Roadmap | Comprehensive agent feature comparison and roadmap |
| Phase 1.1: Conversations | Functional conversation management design |
| Phase 1.2: Guardrails | Input/output validation framework |
| Phase 1.3: Handoffs | Agent-to-agent delegation |
| Phase 1.4: Memory | Short/long-term memory system |
| Phase 2.1: Streaming | Agent lifecycle events |
| Phase 2.2: Async Tools | Parallel tool execution |
| Phase 3.2: Built-in Tools | Standard tool library |
| Phase 4.1: Reasoning | Extended thinking support |
| Phase 4.3: Serialization | State persistence |

## Get Involved


## Release Schedule

| Type | Frequency |
|---|---|
| SNAPSHOT builds | Weekly |
| Feature previews | Monthly |
| Milestone releases | Quarterly |
| v1.0.0 | Q2-Q3 2025 |

After 1.0.0: Semantic Versioning with MiMa binary compatibility checks.
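
A typical setup looks like the following. The plugin coordinates are sbt-mima-plugin's published ones (the version shown is an example); the artifact coordinates are illustrative assumptions:

```scala
// project/plugins.sbt
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "1.1.3")

// build.sbt: compare the current build against the last stable release
// (organization/artifact below are assumptions for illustration).
mimaPreviousArtifacts := Set("org.llm4s" %% "llm4s" % "1.0.0")
```

Running `sbt mimaReportBinaryIssues` then fails the build on binary-incompatible changes.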