org.llm4s.rag.loader
Members list
Type members
Classlikes
Load all documents from a directory.
Supports recursive directory traversal and file filtering by extension. Each file is loaded using FileLoader, inheriting its version and hint detection.
Value parameters
- extensions: File extensions to include (without leading dot)
- maxDepth: Maximum recursion depth (0 = current directory only)
- metadata: Additional metadata to attach to all documents
- path: Path to the directory
- recursive: Whether to recurse into subdirectories
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, trait DocumentLoader, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: DirectoryLoader.type
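The traversal options described for the directory loader (extension filter, `recursive`, `maxDepth`) can be illustrated with the JDK alone. The sketch below is not the llm4s implementation; `DirectoryWalkSketch` and `listFiles` are hypothetical names, and the mapping of `maxDepth = 0` onto `Files.walk` depth 1 is an assumption drawn from the "0 = current directory only" wording.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._
import scala.util.Using

object DirectoryWalkSketch {

  /** Hypothetical helper: lists regular files under `dir` whose extension
    * (without the leading dot) is in `extensions`.
    */
  def listFiles(dir: Path, extensions: Set[String], recursive: Boolean, maxDepth: Int): List[Path] = {
    // Files.walk counts the start directory as depth 0 and its direct children
    // as depth 1, so "0 = current directory only" becomes a walk depth of 1.
    val walkDepth =
      if (recursive) math.min(maxDepth.toLong + 1L, Int.MaxValue.toLong).toInt
      else 1

    // The stream holds open directory handles, so it is closed via Using.
    Using.resource(Files.walk(dir, walkDepth)) { stream =>
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .filter { p =>
          val name = p.getFileName.toString
          val dot  = name.lastIndexOf('.')
          dot >= 0 && extensions.contains(name.substring(dot + 1).toLowerCase)
        }
        .toList
        .sortBy(_.toString) // deterministic order for the caller
    }
  }
}
```

Each returned path would then be handed to a single-file loader; that step is omitted here because it depends on the library's own types.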
A document ready for RAG ingestion.
Documents represent content from any source (files, URLs, databases, APIs) in a normalized form ready for chunking and embedding.
Value parameters
- content: The text content of the document
- hints: Optional processing hints suggested by the loader
- id: Unique identifier for this document
- metadata: Key-value metadata (source, author, timestamp, etc.)
- version: Optional version for change detection
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Processing hints that loaders can suggest to the RAG pipeline.
Hints are optional suggestions - the pipeline may ignore them based on global configuration or other factors. They allow loaders to provide domain-specific optimization recommendations.
Value parameters
- batchSize: Suggested batch size for embedding (for rate limiting)
- chunkingConfig: Suggested chunking configuration
- chunkingStrategy: Suggested chunking strategy for this document type
- customHints: Additional loader-specific hints
- priority: Processing priority (higher = process first)
- skipReason: If set, suggests this document should be skipped, with the given reason
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: DocumentHints.type
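Because hints are suggestions that the pipeline may ignore, it can help to see how a hint might be reconciled with a global setting. The following is a minimal sketch under stated assumptions: `Hints` is a simplified, hypothetical stand-in that covers only three of the documented fields, and the field types and both helper functions are invented for illustration.

```scala
object HintsSketch {

  // Hypothetical stand-in for the documented hints; field types are assumptions.
  final case class Hints(
    batchSize: Option[Int] = None,
    priority: Int = 0,
    skipReason: Option[String] = None
  )

  // A hint is honoured only when the global switch allows it;
  // otherwise the global default wins.
  def effectiveBatchSize(globalDefault: Int, useHints: Boolean, hints: Option[Hints]): Int =
    if (useHints) hints.flatMap(_.batchSize).getOrElse(globalDefault)
    else globalDefault

  // Higher priority is processed first; entries carrying a skipReason are dropped.
  // sortBy is stable, so equal priorities keep their input order.
  def processingOrder[A](docs: List[(A, Hints)]): List[A] =
    docs
      .filter { case (_, h) => h.skipReason.isEmpty }
      .sortBy { case (_, h) => h.priority }(Ordering[Int].reverse)
      .map { case (doc, _) => doc }
}
```

Keeping the decision in one function makes it clear that the loader only suggests and the pipeline decides.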
Abstraction for loading documents from any source into the RAG pipeline.
DocumentLoader provides a unified interface for various document sources:
- Files and directories
- URLs and web content
- Cloud storage (S3, GCS, Azure Blob)
- Databases and APIs
- Custom sources
Key design principles:
- Streaming support via Iterator for large document sets
- Graceful error handling with LoadResult for partial failures
- Optional hints for processing optimization
- Composability through the ++ operator
Usage:
// At build time - pre-ingest documents
val rag = RAG.builder()
  .withDocuments(DirectoryLoader("./docs"))
  .build()

// At ingest time - add documents later
rag.ingest(UrlLoader(urls))

// Combine loaders
val combined = DirectoryLoader("./docs") ++ UrlLoader(urls)
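The design principles listed above (Iterator-based streaming, per-document results, composition with `++`) can be mimicked in a few lines. This sketch is self-contained and deliberately does not use the library's types: `Loader`, `Result`, and `fromResults` are hypothetical, and `Either[String, String]` merely stands in for a richer result type.

```scala
object LoaderSketch {

  // Hypothetical result type: Left carries an error message, Right a loaded text.
  type Result = Either[String, String]

  trait Loader { self =>
    // Documents are produced lazily so large sources need not fit in memory.
    def load(): Iterator[Result]

    // Iterator.++ takes its argument by name, so `other` is not asked
    // for documents until this loader's iterator is exhausted.
    def ++(other: Loader): Loader = new Loader {
      def load(): Iterator[Result] = self.load() ++ other.load()
    }
  }

  // Small factory used only to exercise the combinator.
  def fromResults(results: Result*): Loader = new Loader {
    def load(): Iterator[Result] = results.iterator
  }
}
```

A failed document appears as a `Left` in the stream rather than aborting it, which is one way to model partial failure.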
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes
Factory and combinators for DocumentLoaders.
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type: DocumentLoaders.type
Registry for tracking indexed documents.
Used by sync operations to determine which documents have been indexed, their versions, and to detect changes (adds, updates, deletes).
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes: class InMemoryDocumentRegistry
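The change detection described for the registry (adds, updates, deletes) amounts to comparing what was indexed with what a loader currently sees. A minimal sketch, assuming both sides can be reduced to a map from document id to content hash; `SyncDiffSketch`, `Diff`, and `diff` are hypothetical names and not the registry's actual interface.

```scala
object SyncDiffSketch {

  // Hypothetical summary of one comparison, expressed as sets of document ids.
  final case class Diff(
    added: Set[String],
    updated: Set[String],
    deleted: Set[String],
    unchanged: Set[String]
  )

  // indexed: id -> hash recorded at the last sync
  // current: id -> hash of what the loader sees now
  def diff(indexed: Map[String, String], current: Map[String, String]): Diff = {
    val added   = current.keySet -- indexed.keySet
    val deleted = indexed.keySet -- current.keySet
    val common  = current.keySet.intersect(indexed.keySet)
    // An id present on both sides counts as updated only if its hash differs.
    val (unchanged, updated) = common.partition(id => indexed(id) == current(id))
    Diff(added, updated, deleted, unchanged)
  }
}
```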
Version information for change detection.
Used by sync operations to determine if a document has changed since it was last indexed.
Value parameters
- contentHash: SHA-256 hash of the content
- etag: Optional HTTP ETag for URL sources
- timestamp: Optional last modified timestamp (epoch ms)
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: DocumentVersion.type
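A SHA-256 content hash of the kind described for `contentHash` can be computed with the JDK's `MessageDigest`. The sketch below is illustrative only: the choice of UTF-8 bytes and lowercase hex output is an assumption, and `ContentHashSketch`, `sha256Hex`, and `hasChanged` are hypothetical helpers rather than library functions.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object ContentHashSketch {

  // Hashes the UTF-8 bytes of the content and renders the digest as lowercase hex.
  def sha256Hex(content: String): String = {
    val digest = MessageDigest
      .getInstance("SHA-256")
      .digest(content.getBytes(StandardCharsets.UTF_8))
    // Mask each signed byte to 0..255 before formatting it as two hex digits.
    digest.map(b => f"${b & 0xff}%02x").mkString
  }

  // A document counts as changed when its current hash differs from the stored one.
  def hasChanged(previousHash: String, content: String): Boolean =
    sha256Hex(content) != previousHash
}
```

Comparing hashes avoids re-embedding documents whose bytes are identical even when their timestamps have moved.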
Load a single file as a document.
Supports all file types handled by UniversalExtractor:
- Text files (.txt, .md, .json, .xml, .html)
- PDF documents
- Word documents (.docx)
Automatically detects appropriate chunking hints based on file extension. Includes version information (content hash + file timestamp) for sync operations.
Value parameters
- metadata: Additional metadata to attach
- path: Path to the file
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, trait DocumentLoader, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: FileLoader.type
In-memory implementation of DocumentRegistry.
Suitable for development and testing. Data is lost on restart.
Attributes
- Companion: object
- Supertypes
Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type
Result of loading a single document.
Attributes
- Companion: trait
- Supertypes: trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type: LoadResult.type
Aggregated loading statistics.
Value parameters
- errors: List of error details for debugging
- failed: Number of documents that failed to load
- skipped: Number of documents intentionally skipped
- successful: Number of documents successfully loaded
- totalAttempted: Total number of documents attempted
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
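The counters above can be derived by folding over per-document results. The sketch below assumes three outcome cases (loaded, failed, skipped); these case names, the `Stats` class, and the `(source, message)` shape of `errors` are hypothetical and are not taken from the library.

```scala
object LoadStatsSketch {

  // Hypothetical per-document outcomes; the real result type may differ.
  sealed trait Outcome
  final case class Loaded(id: String) extends Outcome
  final case class Failed(source: String, error: String) extends Outcome
  final case class Skipped(source: String, reason: String) extends Outcome

  // Mirrors the documented counters; errors are kept as (source, message) pairs.
  final case class Stats(
    totalAttempted: Int,
    successful: Int,
    failed: Int,
    skipped: Int,
    errors: List[(String, String)]
  )

  // Single pass over the iterator, so it also works for streamed results.
  def summarize(outcomes: Iterator[Outcome]): Stats = {
    val folded = outcomes.foldLeft(Stats(0, 0, 0, 0, Nil)) { (acc, outcome) =>
      val next = acc.copy(totalAttempted = acc.totalAttempted + 1)
      outcome match {
        case _: Loaded        => next.copy(successful = next.successful + 1)
        case Failed(src, err) => next.copy(failed = next.failed + 1, errors = (src, err) :: next.errors)
        case _: Skipped       => next.copy(skipped = next.skipped + 1)
      }
    }
    // Errors were prepended during the fold; restore encounter order.
    folded.copy(errors = folded.errors.reverse)
  }
}
```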
Configuration for document loading behavior.
Controls how documents are loaded, processed, and tracked by the RAG pipeline.
Value parameters
- batchSize: Documents per embedding batch
- enableVersioning: Track versions for sync operations
- failFast: Stop on the first error instead of continuing and collecting all errors
- parallelism: Maximum concurrent document processing
- skipEmptyDocuments: Whether to skip documents with empty content
- useHints: Whether to use loader hints for processing
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: LoadingConfig.type
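The difference between stopping on the first error and collecting all errors, as described for `failFast`, can be sketched over a plain iterator of `Either` values. `FailFastSketch.run` is a hypothetical helper and makes no claim about how the pipeline actually applies the flag.

```scala
object FailFastSketch {

  // Consumes attempts in order and returns (successes, errors).
  // With failFast = true, consumption stops right after the first error.
  def run[A](attempts: Iterator[Either[String, A]], failFast: Boolean): (List[A], List[String]) = {
    val ok     = List.newBuilder[A]
    val errors = List.newBuilder[String]
    var stop   = false
    while (!stop && attempts.hasNext) {
      attempts.next() match {
        case Right(value) =>
          ok += value
        case Left(error) =>
          errors += error
          if (failFast) stop = true
      }
    }
    (ok.result(), errors.result())
  }
}
```

Because the iterator is consumed lazily, fail-fast mode never touches the documents that come after the first failure.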
Statistics for sync operations.
Value parameters
- added: New documents added
- deleted: Documents removed
- unchanged: Documents with no changes
- updated: Existing documents updated
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Load documents from raw text content.
Useful for:
- Programmatically created content
- Database records
- API responses
- Testing
Value parameters
- documents: Documents to load
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, trait DocumentLoader, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: TextLoader.type
Builder for constructing TextLoader fluently.
Attributes
- Supertypes: class Object, trait Matchable, class Any
Load documents from URLs.
Supports HTTP/HTTPS URLs with configurable timeouts and headers. Includes ETag-based version detection for efficient sync operations.
Value parameters
- headers: HTTP headers to send with requests
- metadata: Additional metadata to attach
- retryCount: Number of retry attempts for failed requests
- timeoutMs: Connection and read timeout in milliseconds
- urls: URLs to load
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, trait DocumentLoader, class Object, trait Matchable, class Any
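Retrying failed requests, as `retryCount` suggests, can be expressed independently of any HTTP client. The sketch below is a generic helper under one assumption that the documentation does not confirm: `retryCount` is read as the number of additional attempts after the first failure. `RetrySketch` and `withRetries` are hypothetical names, and no backoff or error classification is attempted.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object RetrySketch {

  // Runs `request` once, then up to `retryCount` more times while it keeps throwing.
  // Returns the first success, or the last failure once the retries are used up.
  @tailrec
  def withRetries[A](retryCount: Int)(request: () => A): Try[A] =
    Try(request()) match {
      case success @ Success(_)         => success
      case Failure(_) if retryCount > 0 => withRetries(retryCount - 1)(request)
      case failure @ Failure(_)         => failure
    }
}
```

In a URL loader the `request` function would wrap the actual fetch, with the timeout and headers applied inside it.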