org.llm4s.rag.loader.s3.S3DocumentSource
See the S3DocumentSource companion object
final case class S3DocumentSource(bucket: String, prefix: String, region: String, extensions: Set[String], credentials: Option[AwsCredentialsProvider], metadata: Map[String, String], endpointOverride: Option[String]) extends SyncableSource
Document source for AWS S3.
Reads documents from an S3 bucket with support for:
- Prefix filtering (e.g., "docs/", "reports/2024/")
- File extension filtering
- Automatic change detection via ETags
- Pagination for large buckets
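The prefix and extension filters listed above can be modeled as a simple key predicate. This is a sketch, not S3DocumentSource's own code: the name `KeyFilter` is hypothetical, and it assumes extensions may be written with or without a leading dot and match case-insensitively. In a real listing, S3 applies the prefix server-side through the ListObjectsV2 `Prefix` parameter, so only the extension check would need to run on the client.

```scala
// Hypothetical predicate modeling prefix + extension filtering of S3 keys.
// Assumption: extensions may carry a leading dot and compare case-insensitively.
object KeyFilter {
  private def normalize(ext: String): String = ext.stripPrefix(".").toLowerCase

  // Text after the last '.' in the key, lower-cased; "" if there is no dot.
  private def extensionOf(key: String): String =
    key.lastIndexOf('.') match {
      case -1 => ""
      case i  => key.substring(i + 1).toLowerCase
    }

  /** True if `key` is under `prefix` and, when `extensions` is non-empty,
    * its extension is one of them (an empty set means "all files"). */
  def accepts(key: String, prefix: String, extensions: Set[String]): Boolean = {
    val wanted = extensions.map(normalize)
    key.startsWith(prefix) && (wanted.isEmpty || wanted.contains(extensionOf(key)))
  }
}
```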
Authentication uses the AWS credential chain by default:
- Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
- System properties
- AWS credentials file (~/.aws/credentials)
- IAM role (for EC2/Lambda)
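The credential chain above works on a first-match-wins basis. The sketch below models only that idea in plain Scala and is not the AWS SDK: `Creds`, `CredentialChain`, and the injected maps are hypothetical stand-ins so the logic runs without real credentials. The environment variable names are the ones listed above, and `aws.accessKeyId` / `aws.secretAccessKey` are the system property names read by the AWS SDK for Java v2. The SDK's own default provider has more steps (profile file, container and instance credentials) and its own fixed order.

```scala
// Simplified, hypothetical model of a credential provider chain.
// Each step returns Some(credentials) or None; the first Some wins.
final case class Creds(accessKeyId: String, secretAccessKey: String)

object CredentialChain {
  type Step = () => Option[Creds]

  // Environment variables, injected as a map for testability.
  def fromEnv(env: Map[String, String]): Step = () =>
    for {
      id     <- env.get("AWS_ACCESS_KEY_ID")
      secret <- env.get("AWS_SECRET_ACCESS_KEY")
    } yield Creds(id, secret)

  // System property names used by the AWS SDK for Java v2.
  def fromSystemProps(props: Map[String, String]): Step = () =>
    for {
      id     <- props.get("aws.accessKeyId")
      secret <- props.get("aws.secretAccessKey")
    } yield Creds(id, secret)

  /** Try each step in order, stopping at the first that yields credentials. */
  def resolve(steps: List[Step]): Option[Creds] =
    steps.iterator.map(step => step()).collectFirst { case Some(c) => c }
}
```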
Usage:
val source = S3DocumentSource("my-bucket", prefix = "docs/")
val loader = SourceBackedLoader(source)
rag.sync(loader)
Value parameters
- bucket: S3 bucket name
- credentials: Optional credentials provider (default: AWS credential chain)
- endpointOverride: Optional endpoint override (for LocalStack, MinIO, etc.)
- extensions: File extensions to include (empty = all files)
- metadata: Additional metadata to attach to all documents
- prefix: Key prefix to filter objects (e.g., "docs/", "reports/")
- region: AWS region (default: us-east-1)
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Members list
Human-readable description of this source.
Used for logging and debugging (e.g., "S3(my-bucket/docs/)")
Estimated number of documents in this source, if known.
Used for progress reporting. Return None if unknown or expensive to compute.
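S3 offers no cheap way to count the objects under a prefix (the listing has to be paged through), so a source like this may well report None; this page does not say what S3DocumentSource actually returns. A hypothetical progress formatter that honors the Option contract (`Progress.render` is an illustrative name, not part of the library):

```scala
// Hypothetical consumer of an Option-valued estimate: None means "unknown",
// so progress falls back to a bare count instead of a percentage.
object Progress {
  def render(processed: Int, estimated: Option[Int]): String =
    estimated.filter(_ > 0) match {
      case Some(total) =>
        // Clamp at 100% in case the estimate was too low.
        val pct = math.min(100, processed * 100 / total)
        s"$processed/$total ($pct%)"
      case None =>
        s"$processed documents"
    }
}
```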
Get version information for change detection.
This should return quickly without reading the full document content. For S3, use the ETag; for filesystems, use content hash + mtime.
Value parameters
- ref: Document reference
Returns
- Version info for comparison, or error if unavailable
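ETag-based change detection, as described above, amounts to diffing two snapshots of key-to-ETag pairs: one recorded at the previous sync and one from the current listing. When an S3 object is overwritten with different content its ETag changes, so a differing ETag is treated as a modification. `ChangeDetection` and `Diff` are hypothetical names; how the library actually stores and compares versions is not shown on this page.

```scala
// Hypothetical sketch: compare key -> ETag snapshots from two syncs.
object ChangeDetection {
  final case class Diff(added: Set[String], modified: Set[String], deleted: Set[String])

  def diff(previous: Map[String, String], current: Map[String, String]): Diff = {
    val added   = current.keySet -- previous.keySet
    val deleted = previous.keySet -- current.keySet
    // Present in both snapshots but with a different ETag => content changed.
    val modified = current.collect {
      case (key, etag) if previous.get(key).exists(_ != etag) => key
    }.toSet
    Diff(added, modified, deleted)
  }
}
```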
List all document references in this source.
Returns an iterator for streaming large document sets. Each element is either a successful DocumentRef or an error.
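Listing a large bucket means following continuation tokens: each ListObjectsV2 response returns at most 1,000 keys plus a token for the next page. The sketch below shows how such pages could be exposed lazily as an iterator of `Either` values, matching the "successful reference or error" contract above. `PagedListing`, `Page`, and `fetchPage` are hypothetical; plain keys stand in for DocumentRef and a String stands in for the error type.

```scala
// Hypothetical sketch of a lazily paginated listing.
// `fetchPage` stands in for one ListObjectsV2 call: it takes an optional
// continuation token and returns that page's keys plus the next token.
object PagedListing {
  final case class Page(keys: List[String], next: Option[String])

  def listAll(fetchPage: Option[String] => Either[String, Page]): Iterator[Either[String, String]] = {
    // State: Some(token) = fetch this page (Some(None) = first page); None = done.
    val pages: Iterator[Either[String, Page]] =
      Iterator.unfold[Either[String, Page], Option[Option[String]]](Some(None)) {
        case None => None
        case Some(token) =>
          fetchPage(token) match {
            case r @ Right(page) => Some((r, page.next.map(Some(_))))
            case l @ Left(_)     => Some((l, None)) // emit the error, then stop
          }
      }
    // Flatten pages into one stream of per-key results.
    pages.flatMap {
      case Right(page) => page.keys.iterator.map(Right(_))
      case Left(err)   => Iterator.single(Left(err))
    }
  }
}
```

Because `Iterator.unfold` is lazy, a page is only requested when the consumer reaches it.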
Read document content into memory.
Value parameters
- ref: Document reference from listDocuments()
Returns
- Raw document bytes or an error
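Reading a document into memory boils down to draining a stream into a byte array, closing it, and turning failures into an error value. A minimal sketch using `scala.util.Using` (Scala 2.13+) and `InputStream.readAllBytes` (Java 9+); `ReadAll` and the String error type are hypothetical, since the real error type is not shown on this page.

```scala
import java.io.InputStream
import scala.util.{Failure, Success, Using}

// Hypothetical helper: open a stream, read it fully, always close it,
// and map any exception (including one thrown while opening) to a Left.
object ReadAll {
  def readAll(open: () => InputStream): Either[String, Array[Byte]] =
    Using(open())(_.readAllBytes()) match {
      case Success(bytes) => Right(bytes)
      case Failure(e)     => Left(s"read failed: ${e.getMessage}")
    }
}
```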
Read document content as a stream.
Use this for large documents to avoid loading everything into memory. The caller is responsible for closing the returned stream.
Default implementation wraps readDocument; override for true streaming.
Value parameters
- ref: Document reference from listDocuments()
Returns
- InputStream for the document content or an error
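The documented default wraps the in-memory bytes, so by itself it does not reduce memory use; only an override that hands back the underlying response stream would. The sketch mirrors that default with hypothetical names (`StreamDefault`, string refs, a String error type) and shows the caller side closing the stream with `Using`, as the contract requires. Whether S3DocumentSource overrides the default is not stated here.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.util.{Failure, Success, Using}

// Hypothetical mirror of the documented default: the "stream" is just the
// in-memory bytes from readDocument wrapped in a ByteArrayInputStream.
object StreamDefault {
  // Stand-in for readDocument; refs are plain strings here.
  def readDocument(ref: String): Either[String, Array[Byte]] =
    if (ref.nonEmpty) Right(s"contents of $ref".getBytes("UTF-8"))
    else Left("empty ref")

  def readDocumentStream(ref: String): Either[String, InputStream] =
    readDocument(ref).map(bytes => new ByteArrayInputStream(bytes))

  // Caller side: the caller owns the stream; Using closes it even if reading throws.
  def readText(ref: String): Either[String, String] =
    readDocumentStream(ref).flatMap { in =>
      Using(in)(s => new String(s.readAllBytes(), "UTF-8")) match {
        case Success(text) => Right(text)
        case Failure(e)    => Left(e.getMessage)
      }
    }
}
```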
Create a copy with an endpoint override (for LocalStack, MinIO).
Create a copy with a different extensions filter.
Create a copy with a different prefix.
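The three copy helpers above follow the usual case-class pattern: return a modified copy and leave the original untouched. `SourceConfig` and its method names are hypothetical, since the member names are not preserved on this page; `http://localhost:4566` is LocalStack's default edge endpoint.

```scala
// Hypothetical miniature of copy-style helpers on an immutable config.
// Each method returns a new value via `copy`; the receiver is unchanged.
final case class SourceConfig(
  bucket: String,
  prefix: String = "",
  extensions: Set[String] = Set.empty,
  endpointOverride: Option[String] = None
) {
  def withPrefix(p: String): SourceConfig = copy(prefix = p)
  def withExtensions(exts: Set[String]): SourceConfig = copy(extensions = exts)
  def withEndpoint(url: String): SourceConfig = copy(endpointOverride = Some(url))
}
```

Chaining the helpers builds a variant for local testing without mutating the base configuration.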
Inherited from: Product
Inherited from: Product