org.llm4s.llmconnect.extractors

Members list

Type members

Classlikes

Extracts text and multimedia content from files of various formats.

MIME type detection is performed by Apache Tika. Supported formats include:

  • PDF (via PDFBox)
  • DOCX (via Apache POI)
  • Plain text and other text types
  • Images (via ImageIO)
  • Unknown types (Tika fallback)
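
The mapping from a detected MIME type to a backend, as listed above, can be pictured with a small sketch. Everything in it (MimeDispatchSketch, Backend, backendFor) is a hypothetical name chosen for illustration and is not taken from the library; the real dispatch logic may differ, and the MIME string is assumed to have been detected already (for example by Tika).

```scala
import java.util.Locale

// Hypothetical sketch: none of these names come from org.llm4s.
object MimeDispatchSketch {
  sealed trait Backend
  case object PdfBox       extends Backend // PDF
  case object ApachePoi    extends Backend // DOCX
  case object PlainText    extends Backend // text/* types
  case object ImageIo      extends Backend // image/* types
  case object TikaFallback extends Backend // anything else

  // Standard MIME type for .docx documents.
  private val DocxMime =
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

  // Picks a backend from an already-detected MIME type string.
  def backendFor(mimeType: String): Backend =
    mimeType.toLowerCase(Locale.ROOT) match {
      case "application/pdf"           => PdfBox
      case DocxMime                    => ApachePoi
      case m if m.startsWith("text/")  => PlainText
      case m if m.startsWith("image/") => ImageIo
      case _                           => TikaFallback
    }
}
```

Under these assumptions, MimeDispatchSketch.backendFor("text/markdown") selects PlainText, and an unrecognised type such as "application/zip" falls through to TikaFallback.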

Three entry points are provided:

  • extract -- text-only extraction from a file path
  • extractAny -- multimedia-aware extraction returning an Extracted ADT
  • extractFromBytes / extractFromStream -- source-agnostic text extraction from raw bytes or an input stream
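
This page does not show the signatures of these entry points, so the following is only a sketch of how such an API could be shaped. All names (EntryPointsSketch, ExtractedSketch, textFromBytes, textFromStream) are hypothetical, the error type is simplified to a String, and input is assumed to be UTF-8 text. It illustrates two ideas from the list: a sealed ADT for multimedia-aware results, and a stream variant that delegates to the byte-array variant so extraction does not depend on where the data came from. It relies on InputStream.readAllBytes, which requires Java 9 or later.

```scala
import java.io.InputStream
import java.nio.charset.StandardCharsets
import scala.util.Try

// Hypothetical sketch: these are NOT the library's real signatures.
object EntryPointsSketch {
  // Stand-in for a multimedia-aware result ADT.
  sealed trait ExtractedSketch
  final case class TextContent(text: String)             extends ExtractedSketch
  final case class ImageContent(width: Int, height: Int) extends ExtractedSketch

  // Source-agnostic text extraction: only the bytes matter.
  // Assumes UTF-8 plain text; a real extractor would detect the format first.
  def textFromBytes(bytes: Array[Byte]): Either[String, String] =
    if (bytes.isEmpty) Left("empty input")
    else Right(new String(bytes, StandardCharsets.UTF_8))

  // Reads the whole stream, then delegates to the byte-based variant.
  def textFromStream(in: InputStream): Either[String, String] =
    Try(in.readAllBytes()).toEither.left
      .map(e => s"read failed: ${e.getMessage}")
      .flatMap(textFromBytes)

  // Callers can pattern match exhaustively on the sealed ADT.
  def describe(e: ExtractedSketch): String = e match {
    case TextContent(t)     => s"text (${t.length} chars)"
    case ImageContent(w, h) => s"image (${w}x${h})"
  }
}
```

Routing the stream variant through the byte variant keeps a single extraction code path, at the cost of buffering the whole input in memory.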

Attributes

Supertypes
class Object
trait Matchable
class Any
Self type