org.llm4s.rag.loader

Load all documents from a directory.

Supports recursive directory traversal and file filtering by extension. Each file is loaded using FileLoader, inheriting its version and hint detection.

Value parameters

extensions: File extensions to include (without leading dot)
maxDepth: Maximum recursion depth (0 = current directory only)
metadata: Additional metadata to attach to all documents
path: Path to the directory
recursive: Whether to recurse into subdirectories
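
The interplay of extensions, recursive and maxDepth can be sketched with plain java.nio, independent of llm4s. This is a minimal illustration, not the library's implementation: DirectoryScanSketch and listFiles are hypothetical names, and treating recursive = false as equivalent to a depth of 0 is an assumption.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object DirectoryScanSketch {
  // Hypothetical helper (not part of llm4s): list regular files under `root`
  // whose extension, without the leading dot, is contained in `extensions`.
  def listFiles(
      root: Path,
      extensions: Set[String],
      recursive: Boolean = true,
      maxDepth: Int = 10
  ): List[Path] = {
    // Files.walk treats the start directory as depth 0 and its direct entries as depth 1,
    // so "0 = current directory only" corresponds to a walk depth of 1.
    val walkDepth =
      if (recursive) math.min(maxDepth.toLong + 1, Int.MaxValue.toLong).toInt else 1
    val stream = Files.walk(root, walkDepth)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .filter { p =>
          val name = p.getFileName.toString
          val dot  = name.lastIndexOf('.')
          dot >= 0 && extensions.contains(name.substring(dot + 1).toLowerCase)
        }
        .toList
        .sortBy(_.toString)
    } finally stream.close() // the stream holds open directory handles
  }
}
```

Calling DirectoryScanSketch.listFiles(root, Set("md"), maxDepth = 0) would return only the Markdown files directly inside root.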

Attributes

Companion: object
Supertypes: trait Serializable, trait Product, trait Equals, trait DocumentLoader, class Object, trait Matchable, class Any

Attributes

Companion: class
Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
Self type: DirectoryLoader.type

A document ready for RAG ingestion.

Documents represent content from any source (files, URLs, databases, APIs) in a normalized form ready for chunking and embedding.

Value parameters

content: The text content of the document
hints: Optional processing hints suggested by the loader
id: Unique identifier for this document
metadata: Key-value metadata (source, author, timestamp, etc.)
version: Optional version for change detection

Attributes

Companion: object
Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any

Attributes

Companion: class
Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
Self type: Document.type

Processing hints that loaders can suggest to the RAG pipeline.

Hints are optional suggestions: the pipeline may ignore them based on global configuration or other factors. They let loaders pass domain-specific optimization recommendations to the pipeline.

Value parameters

batchSize: Suggested batch size for embedding (for rate limiting)
chunkingConfig: Suggested chunking configuration
chunkingStrategy: Suggested chunking strategy for this document type
customHints: Additional loader-specific hints
priority: Processing priority (higher = process first)
skipReason: If set, suggests this document should be skipped with reason
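
As a rough illustration of how a pipeline might act on priority and skipReason, the sketch below drops documents that carry a skip reason and orders the rest by descending priority. Hinted and plan are hypothetical names used only here, and the actual pipeline may apply hints differently or ignore them.

```scala
object HintOrderingSketch {
  // Hypothetical stand-in carrying only the two hints used in this sketch.
  final case class Hinted(id: String, priority: Int = 0, skipReason: Option[String] = None)

  // Drop documents with a skip reason, then process higher priorities first.
  // sortBy is stable, so documents with equal priority keep their input order.
  def plan(docs: Seq[Hinted]): Seq[String] =
    docs.filter(_.skipReason.isEmpty).sortBy(d => -d.priority).map(_.id)
}
```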

Attributes

Companion: object
Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any

Attributes

Companion: class
Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
Self type: DocumentHints.type

Abstraction for loading documents from any source into the RAG pipeline.

DocumentLoader provides a unified interface for various document sources:

Files and directories
URLs and web content
Cloud storage (S3, GCS, Azure Blob)
Databases and APIs
Custom sources

Key design principles:

Streaming support via Iterator for large document sets
Graceful error handling with LoadResult for partial failures
Optional hints for processing optimization
Composability through the ++ operator

Usage:

// At build time - pre-ingest documents
val rag = RAG.builder()
  .withDocuments(DirectoryLoader("./docs"))
  .build()

// At ingest time - add documents later
rag.ingest(UrlLoader(urls))

// Combine loaders
val combined = DirectoryLoader("./docs") ++ UrlLoader(urls)
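
The streaming and composability principles can be illustrated without the library. The sketch below uses a simplified stand-in, SimpleLoader, whose items are plain strings; the real DocumentLoader yields load results instead, and its exact method names are not shown on this page, so everything here is an assumption made for illustration.

```scala
object LoaderCompositionSketch {
  // Simplified stand-in for a loader: each item is just a String.
  trait SimpleLoader { self =>
    def load(): Iterator[String]

    // Concatenate two loaders. Iterator's ++ takes its argument by name,
    // so `other.load()` is not evaluated until the left side is exhausted.
    def ++(other: SimpleLoader): SimpleLoader = new SimpleLoader {
      def load(): Iterator[String] = self.load() ++ other.load()
    }
  }

  // Hypothetical factory wrapping an in-memory sequence.
  def fromSeq(items: Seq[String]): SimpleLoader = new SimpleLoader {
    def load(): Iterator[String] = items.iterator
  }
}
```

Because load() returns an Iterator, a combined loader never needs to hold every document in memory at once.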

Attributes

Supertypes: class Object, trait Matchable, class Any
Known subtypes: class DirectoryLoader, class FileLoader, class TextLoader, class UrlLoader

Factory and combinators for DocumentLoaders.

Attributes

Supertypes: class Object, trait Matchable, class Any
Self type: DocumentLoaders.type

Registry for tracking indexed documents.

Used by sync operations to determine which documents have been indexed, their versions, and to detect changes (adds, updates, deletes).
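
Detecting adds, updates, and deletes amounts to comparing what the registry has recorded with what a loader currently yields. The sketch below does this over two maps from document id to content hash. Changes and diff are hypothetical names, and the real registry interface is not shown on this page.

```scala
object SyncDiffSketch {
  // Hypothetical result type mirroring the adds / updates / deletes wording above.
  final case class Changes(
      added: Set[String],
      updated: Set[String],
      deleted: Set[String],
      unchanged: Set[String]
  )

  // `indexed` is what the registry recorded; `current` is what the source yields now.
  // Both map document id -> content hash.
  def diff(indexed: Map[String, String], current: Map[String, String]): Changes = {
    val added   = current.keySet -- indexed.keySet
    val deleted = indexed.keySet -- current.keySet
    val common  = current.keySet & indexed.keySet
    val (unchanged, updated) = common.partition(id => indexed(id) == current(id))
    Changes(added, updated, deleted, unchanged)
  }
}
```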

Attributes

Supertypes: class Object, trait Matchable, class Any
Known subtypes: class InMemoryDocumentRegistry

Version information for change detection.

Used by sync operations to determine if a document has changed since it was last indexed.

Value parameters

contentHash: SHA-256 hash of the content
etag: Optional HTTP ETag for URL sources
timestamp: Optional last modified timestamp (epoch ms)
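
A SHA-256 content hash can be computed with the JDK alone. The sketch below assumes UTF-8 encoding and lowercase hex output; the page does not state which encoding or output format the library uses, and ContentHashSketch is a hypothetical name.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object ContentHashSketch {
  // Hash the UTF-8 bytes of `content` and render the digest as lowercase hex.
  def sha256Hex(content: String): String = {
    val digest = MessageDigest
      .getInstance("SHA-256")
      .digest(content.getBytes(StandardCharsets.UTF_8))
    digest.map(b => f"${b & 0xff}%02x").mkString
  }
}
```

Two documents with identical content produce the same hash, so comparing hashes is enough to tell whether re-indexing is needed.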

Attributes

Companion: object
Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any

Attributes

Companion: class
Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
Self type: DocumentVersion.type

Load a single file as a document.

Supports all file types handled by UniversalExtractor:

Text files (.txt, .md, .json, .xml, .html)
PDF documents
Word documents (.docx)

Automatically detects appropriate chunking hints based on file extension. Includes version information (content hash + file timestamp) for sync operations.

Value parameters

metadata: Additional metadata to attach
path: Path to the file

Attributes

Companion: object
Supertypes: trait Serializable, trait Product, trait Equals, trait DocumentLoader, class Object, trait Matchable, class Any

Attributes

Companion: class
Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
Self type: FileLoader.type

In-memory implementation of DocumentRegistry.

Suitable for development and testing. Data is lost on restart.

Attributes

Companion: object
Supertypes: trait DocumentRegistry, class Object, trait Matchable, class Any

Attributes

Companion: class
Supertypes: class Object, trait Matchable, class Any
Self type: InMemoryDocumentRegistry.type

Result of loading a single document.

Represents either a successfully loaded document, a loading error, or an intentionally skipped document.

Attributes

Companion: object
Supertypes: class Object, trait Matchable, class Any
Known subtypes: class Failure, class Skipped, class Success

Attributes

Companion: trait
Supertypes: trait Sum, trait Mirror, class Object, trait Matchable, class Any
Self type: LoadResult.type

Aggregated loading statistics.

Value parameters

errors: List of error details for debugging
failed: Number of documents that failed to load
skipped: Number of documents intentionally skipped
successful: Number of documents successfully loaded
totalAttempted: Total documents attempted to load
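
The relationship between per-document results and these counters can be sketched as a fold. The types below are simplified stand-ins for the Success, Failure, and Skipped outcomes of LoadResult. Their fields, the Stats class, and summarize are assumptions for illustration, not the library's definitions.

```scala
object LoadStatsSketch {
  // Simplified stand-ins for the three outcomes; field names are assumptions.
  sealed trait Result
  final case class Success(docId: String) extends Result
  final case class Failure(source: String, error: String) extends Result
  final case class Skipped(source: String, reason: String) extends Result

  // Hypothetical counterpart to the value parameters listed above.
  final case class Stats(
      totalAttempted: Int,
      successful: Int,
      failed: Int,
      skipped: Int,
      errors: List[String]
  )

  // Consume the results once, incrementing the matching counter for each outcome.
  def summarize(results: Iterator[Result]): Stats =
    results.foldLeft(Stats(0, 0, 0, 0, Nil)) { (s, r) =>
      val next = s.copy(totalAttempted = s.totalAttempted + 1)
      r match {
        case Success(_)        => next.copy(successful = next.successful + 1)
        case Failure(src, err) => next.copy(failed = next.failed + 1, errors = next.errors :+ s"$src: $err")
        case Skipped(_, _)     => next.copy(skipped = next.skipped + 1)
      }
    }
}
```

Modeling the outcomes as a sealed hierarchy lets the compiler warn when a match misses a case, which is useful when partial failures must not be silently dropped.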

Attributes

Companion: object
Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any

Attributes

Companion: class
Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
Self type: LoadStats.type

Configuration for document loading behavior.

Controls how documents are loaded, processed, and tracked by the RAG pipeline.

Value parameters

batchSize: Documents per embedding batch
enableVersioning: Track versions for sync operations
failFast: If true, stop on the first error; if false, continue and collect all errors
parallelism: Maximum concurrent document processing
skipEmptyDocuments: Whether to skip documents with empty content
useHints: Whether to use loader hints for processing
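
The difference between the two failFast modes can be sketched over a stream of Either values, where Left carries an error message and Right a loaded document. FailFastSketch and run are hypothetical names, and how the library actually surfaces the first error (return value or exception) is not stated on this page.

```scala
object FailFastSketch {
  // Consume results lazily. With failFast = true, stop right after the first error;
  // otherwise keep going and collect every error. Returns (loaded, errors).
  def run(
      results: Iterator[Either[String, String]],
      failFast: Boolean
  ): (List[String], List[String]) = {
    val loaded = List.newBuilder[String]
    val errors = List.newBuilder[String]
    var stop   = false
    while (!stop && results.hasNext) {
      results.next() match {
        case Right(doc) => loaded += doc
        case Left(err) =>
          errors += err
          if (failFast) stop = true
      }
    }
    (loaded.result(), errors.result())
  }
}
```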

Attributes

Companion: object
Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any

Attributes

Companion: class
Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
Self type: LoadingConfig.type

Statistics for sync operations.

Value parameters

added: New documents added
deleted: Documents removed
unchanged: Documents with no changes
updated: Existing documents updated

Attributes

Companion: object
Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any

Attributes

Companion: class
Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
Self type: SyncStats.type

Load documents from raw text content.

Useful for:

Programmatically created content
Database records
API responses
Testing

Value parameters

documents: Documents to load

Attributes

Companion: object
Supertypes: trait Serializable, trait Product, trait Equals, trait DocumentLoader, class Object, trait Matchable, class Any

Attributes

Companion: class
Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
Self type: TextLoader.type

Builder for constructing TextLoader fluently.

Attributes

Supertypes: class Object, trait Matchable, class Any

Load documents from URLs.

Supports HTTP/HTTPS URLs with configurable timeouts and headers. Includes ETag-based version detection for efficient sync operations.

Value parameters

headers: HTTP headers to send with requests
metadata: Additional metadata to attach
retryCount: Number of retry attempts for failed requests
timeoutMs: Connection and read timeout in milliseconds
urls: URLs to load
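
One common way to implement ETag-based change detection is a conditional GET: send the previously seen ETag in an If-None-Match header and treat a 304 Not Modified response as "unchanged". The sketch below only builds such a request with the JDK's java.net.http API. Whether UrlLoader works this way internally is an assumption, and ConditionalRequestSketch and build are hypothetical names.

```scala
import java.net.URI
import java.net.http.HttpRequest
import java.time.Duration

object ConditionalRequestSketch {
  // Build a GET request with caller-supplied headers, a timeout, and, when a
  // previous ETag is known, an If-None-Match header for conditional fetching.
  def build(
      url: String,
      knownEtag: Option[String],
      headers: Map[String, String] = Map.empty,
      timeoutMs: Long = 30000L
  ): HttpRequest = {
    val base = HttpRequest
      .newBuilder(URI.create(url))
      .timeout(Duration.ofMillis(timeoutMs))
      .GET()
    val withHeaders = headers.foldLeft(base) { case (b, (k, v)) => b.header(k, v) }
    val finalBuilder =
      knownEtag.fold(withHeaders)(tag => withHeaders.header("If-None-Match", tag))
    finalBuilder.build()
  }
}
```

A server that supports ETags can answer such a request with 304 and an empty body, which avoids downloading and re-hashing content that has not changed.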

Attributes

Companion: object
Supertypes: trait Serializable, trait Product, trait Equals, trait DocumentLoader, class Object, trait Matchable, class Any

Attributes

Companion: class
Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
Self type: UrlLoader.type
