package org.llm4s.llmconnect.extractors
Members list
Type members
Classlikes
object UniversalExtractor
Extracts text and multimedia content from files of various formats.
MIME type detection is performed by Apache Tika. Supported formats include:
- PDF (via PDFBox)
- DOCX (via Apache POI)
- Plain text and other text types
- Images (via ImageIO)
- Unknown types (Tika fallback)
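The format list above amounts to routing on the detected MIME type. The Scala sketch below illustrates only that routing idea. `Backend`, its cases, and `BackendRouting.backendFor` are hypothetical names and are not part of the llm4s API. The sketch also assumes the MIME type has already been detected (for example by Tika) and is passed in as a plain string using standard type names.

```scala
// Hypothetical sketch: map a detected MIME type to the backend the
// documentation lists for it. Names are illustrative, not the llm4s API.
sealed trait Backend
case object PdfBox extends Backend
case object Poi extends Backend
case object PlainText extends Backend
case object ImageIo extends Backend
case object TikaFallback extends Backend

object BackendRouting {
  // Standard MIME type for .docx files.
  private val DocxMime =
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

  def backendFor(mime: String): Backend = mime match {
    case "application/pdf"           => PdfBox       // PDF via PDFBox
    case DocxMime                    => Poi          // DOCX via Apache POI
    case m if m.startsWith("text/")  => PlainText    // plain text and other text types
    case m if m.startsWith("image/") => ImageIo      // images via ImageIO
    case _                           => TikaFallback // anything else
  }
}
```

Because `Backend` is sealed, the compiler can warn about any non-exhaustive match when code later dispatches on the chosen backend.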
Three entry points are provided:
- extract: text-only extraction from a file path
- extractAny: multimedia-aware extraction returning an Extracted ADT
- extractFromBytes / extractFromStream: source-agnostic text extraction from raw bytes (or a stream of bytes)
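Here is a minimal sketch of how these entry points might relate. Several names are assumptions. `ExtractedSketch`, `TextContent`, `ImageContent`, `StreamDelegationSketch`, and the `Either[String, String]` return type are hypothetical stand-ins, not the real `Extracted` ADT or llm4s signatures. The byte-level extractor here also just decodes UTF-8 and does no format detection. The point of the sketch is that a stream entry point can buffer its input and reuse the byte-based path.

```scala
import java.io.InputStream
import java.nio.charset.StandardCharsets

// Hypothetical multimedia-aware result type; the real Extracted ADT may
// have different cases and fields.
sealed trait ExtractedSketch
final case class TextContent(text: String) extends ExtractedSketch
final case class ImageContent(bytes: Array[Byte], mimeType: String) extends ExtractedSketch

object StreamDelegationSketch {
  // Stand-in for source-agnostic extraction: only decodes UTF-8 text.
  def extractFromBytes(bytes: Array[Byte]): Either[String, String] =
    if (bytes.isEmpty) Left("empty input")
    else Right(new String(bytes, StandardCharsets.UTF_8))

  // Buffer the whole stream (Java 9+ readAllBytes), then reuse the byte path.
  def extractFromStream(in: InputStream): Either[String, String] =
    try extractFromBytes(in.readAllBytes())
    finally in.close()

  // Callers of a multimedia-aware API pattern-match on the result cases.
  def describe(e: ExtractedSketch): String = e match {
    case TextContent(t)     => s"text(${t.length} chars)"
    case ImageContent(b, m) => s"image($m, ${b.length} bytes)"
  }
}
```

Delegating from the stream variant to the byte variant keeps a single extraction path. The trade-off is that the entire input is held in memory before extraction starts.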
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type: UniversalExtractor.type