DocumentLoader

org.llm4s.rag.loader.DocumentLoader

Abstraction for loading documents from any source into the RAG pipeline.

DocumentLoader provides a unified interface for various document sources:

  • Files and directories
  • URLs and web content
  • Cloud storage (S3, GCS, Azure Blob)
  • Databases and APIs
  • Custom sources

Key design principles:

  • Streaming support via Iterator for large document sets
  • Graceful error handling with LoadResult for partial failures
  • Optional hints for processing optimization
  • Composability through the ++ operator

Usage:

// At build time - pre-ingest documents
val rag = RAG.builder()
  .withDocuments(DirectoryLoader("./docs"))
  .build()

// At ingest time - add documents later
rag.ingest(UrlLoader(urls))

// Combine loaders
val combined = DirectoryLoader("./docs") ++ UrlLoader(urls)
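
The "Custom sources" case listed above can be sketched as a small in-memory loader. This is an illustration under stated assumptions, not the library's implementation: Doc, Result, Loaded, Failed, Loader and InMemoryLoader are simplified, hypothetical stand-ins that only mirror the three members documented on this page (load, description, estimatedCount). The real llm4s types may differ.

```scala
// Simplified stand-ins for illustration only; the real llm4s types may differ.
final case class Doc(id: String, content: String)

sealed trait Result
final case class Loaded(doc: Doc) extends Result
final case class Failed(source: String, reason: String) extends Result

trait Loader {
  def load(): Iterator[Result]
  def description: String
  def estimatedCount: Option[Int] = None
}

// A hypothetical custom source: documents held in memory as (id, content) pairs.
final class InMemoryLoader(entries: Seq[(String, String)]) extends Loader {
  def description: String = s"InMemoryLoader(${entries.size} entries)"

  // Blank content is reported as a failure instead of aborting the whole load.
  def load(): Iterator[Result] =
    entries.iterator.map { case (id, content) =>
      if (content.trim.isEmpty) Failed(id, "empty content")
      else Loaded(Doc(id, content))
    }

  // The count is cheap to compute here, so it is reported.
  override def estimatedCount: Option[Int] = Some(entries.size)
}

val notes = new InMemoryLoader(Seq("a" -> "first note", "b" -> "  ", "c" -> "third note"))
val results = notes.load().toList
```

Because load() maps over an iterator, nothing is read until the caller pulls results, which is what keeps large sources from being held in memory all at once.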

Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Abstract methods

def description: String

Human-readable description of this loader.

Used for logging and debugging.

def load(): Iterator[LoadResult]

Load documents from this source.

Returns an iterator of LoadResult for streaming large document sets. Each result is either a successfully loaded document or a loading error. This allows processing to continue even when some documents fail.

Returns

Iterator of load results (successes and failures)
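
A caller might consume such an iterator in one pass, keeping the successes and recording the failures. The sketch below assumes a two-case result type (Loaded and Failed); these names and fields are hypothetical stand-ins, and the real LoadResult may expose different cases.

```scala
// Hypothetical stand-ins for a load result; not the library's actual types.
sealed trait Result
final case class Loaded(id: String, content: String) extends Result
final case class Failed(source: String, reason: String) extends Result

// Stand-in for a loader's load(): the second item fails, the others succeed.
def load(): Iterator[Result] = Iterator(
  Loaded("a.md", "alpha"),
  Failed("b.md", "file not readable"),
  Loaded("c.md", "gamma")
)

// Walk the iterator once, separating successes from failures.
val outcome = load().foldLeft((Vector.empty[Loaded], Vector.empty[Failed])) {
  case ((ok, bad), d: Loaded) => (ok :+ d, bad)
  case ((ok, bad), f: Failed) => (ok, bad :+ f)
}
val docs = outcome._1
val errors = outcome._2
```

The failed item ends up in errors while the documents on either side of it are still collected, which is the partial-failure behavior described above.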

Concrete methods

Combine this loader with another via the ++ operator.

Creates a composite loader that loads from both sources.
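
One way such a composite could work is to concatenate the two result iterators lazily. The sketch below is an assumption about the technique, not the library's code: Loader and ListLoader are hypothetical, and String stands in for the real result type to keep the example short.

```scala
trait Loader { self =>
  def load(): Iterator[String] // stand-in element type; real loaders yield LoadResult
  def description: String

  // Hypothetical composition: drain this loader first, then the other.
  def ++(other: Loader): Loader = new Loader {
    // Iterator.++ takes its argument by name, so other.load() is only
    // called once the first iterator is exhausted.
    def load(): Iterator[String] = self.load() ++ other.load()
    def description: String = s"${self.description} ++ ${other.description}"
  }
}

final class ListLoader(name: String, items: List[String]) extends Loader {
  def load(): Iterator[String] = items.iterator
  def description: String = name
}

val combined = new ListLoader("docs", List("a", "b")) ++ new ListLoader("urls", List("c"))
val all = combined.load().toList
```

Since the result of ++ is itself a Loader, compositions can be chained to combine more than two sources.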

def estimatedCount: Option[Int]

Estimated number of documents (if known).

Used for progress reporting and resource allocation. Returns None if the count is unknown or expensive to compute.
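
As an illustration of how an optional count might be used, the sketch below formats a progress label and combines two estimates. Both helpers (progress and combinedEstimate) are hypothetical and not part of the documented API. Treating a combined count as unknown when either side is unknown is an assumption made here.

```scala
// Hypothetical helper: a combined count is known only when both parts are known.
def combinedEstimate(a: Option[Int], b: Option[Int]): Option[Int] =
  for { x <- a; y <- b } yield x + y

// Progress label: a percentage when the total is known, a running count otherwise.
def progress(done: Int, estimated: Option[Int]): String = estimated match {
  case Some(total) if total > 0 => s"$done/$total (${done * 100 / total}%)"
  case _                        => s"$done loaded"
}
```

For example, progress(5, Some(20)) yields "5/20 (25%)", while progress(5, None) falls back to "5 loaded".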
