UniversalExtractor

org.llm4s.llmconnect.extractors.UniversalExtractor

Supertypes
class Object
trait Matchable
class Any

Members list

Type members

Classlikes

final case class AudioContent(samples: Array[Float], sampleRate: Int) extends Extracted

Supertypes
trait Serializable
trait Product
trait Equals
trait Extracted
class Object
trait Matchable
class Any

sealed trait Extracted

Supertypes
class Object
trait Matchable
class Any
Known subtypes
class AudioContent
class ImageContent
class TextContent
class VideoContent

final case class ImageContent(image: BufferedImage) extends Extracted

Supertypes
trait Serializable
trait Product
trait Equals
trait Extracted
class Object
trait Matchable
class Any

final case class TextContent(text: String) extends Extracted

Supertypes
trait Serializable
trait Product
trait Equals
trait Extracted
class Object
trait Matchable
class Any

final case class VideoContent(frames: Seq[BufferedImage], fps: Int) extends Extracted

Supertypes
trait Serializable
trait Product
trait Equals
trait Extracted
class Object
trait Matchable
class Any
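Because Extracted is sealed, results from extractAny can be handled with an exhaustive pattern match. The sketch below mirrors the four documented case classes locally so it compiles without llm4s on the classpath; the summarize helper is hypothetical, and treating AudioContent.samples as mono (one sample per time step) is an assumption, not something this page states.

```scala
import java.awt.image.BufferedImage

// Local stand-ins that mirror the documented ADT so this sketch compiles on its own.
// With llm4s on the classpath you would import the nested types from
// org.llm4s.llmconnect.extractors.UniversalExtractor instead.
sealed trait Extracted
final case class TextContent(text: String) extends Extracted
final case class ImageContent(image: BufferedImage) extends Extracted
final case class AudioContent(samples: Array[Float], sampleRate: Int) extends Extracted
final case class VideoContent(frames: Seq[BufferedImage], fps: Int) extends Extracted

// Hypothetical helper: one-line summary per modality. Because Extracted is sealed,
// the compiler can warn if a subtype is left unhandled.
def summarize(e: Extracted): String = e match {
  case TextContent(text) =>
    s"text: ${text.length} chars"
  case ImageContent(img) =>
    s"image: ${img.getWidth()}x${img.getHeight()}"
  case AudioContent(samples, rate) =>
    // Assumes mono samples; interleaved multi-channel data would also
    // need dividing by the channel count.
    val millis = if (rate > 0) samples.length.toLong * 1000 / rate else 0L
    s"audio: $millis ms at $rate Hz"
  case VideoContent(frames, fps) =>
    val millis = if (fps > 0) frames.size.toLong * 1000 / fps else 0L
    s"video: ${frames.size} frames, $millis ms at $fps fps"
}
```

Durations are computed in whole milliseconds with integer arithmetic, which keeps the output independent of locale-specific decimal formatting.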

Value members

Concrete methods

def detectMimeType(content: Array[Byte], filename: String): String

Detect MIME type from bytes and filename.

Value parameters

content

Raw document bytes (first few KB are sufficient)

filename

Filename hint for detection

Returns

Detected MIME type string
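To illustrate how bytes and a filename can be combined, here is a minimal sketch of a detector that checks a few well-known magic numbers and falls back to the file extension. It is not the llm4s implementation: the name sniffMimeType, the signature table, and the extension map are all assumptions chosen for illustration.

```scala
// Hypothetical magic-number table; a real detector would recognise far more formats.
val magicNumbers: Seq[(Array[Byte], String)] = Seq(
  "%PDF".getBytes("US-ASCII") -> "application/pdf",
  Array(0x89, 0x50, 0x4E, 0x47).map(_.toByte) -> "image/png",
  Array(0xFF, 0xD8, 0xFF).map(_.toByte) -> "image/jpeg",
  Array(0x50, 0x4B, 0x03, 0x04).map(_.toByte) -> "application/zip"
)

// Hypothetical extension hints, used when the bytes are inconclusive.
val byExtension: Map[String, String] = Map(
  "pdf" -> "application/pdf",
  "docx" -> "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "txt" -> "text/plain",
  "html" -> "text/html",
  "json" -> "application/json",
  "png" -> "image/png"
)

def sniffMimeType(content: Array[Byte], filename: String): String = {
  // Only the leading bytes matter, which is why a short prefix of the document suffices.
  val fromBytes = magicNumbers.collectFirst {
    case (sig, mime) if content.startsWith(sig) => mime
  }
  val ext = filename.lastIndexOf('.') match {
    case -1 => ""
    case i  => filename.substring(i + 1).toLowerCase(java.util.Locale.ROOT)
  }
  val fromName = byExtension.get(ext)
  // A ZIP signature is ambiguous (DOCX, XLSX and JAR are all ZIP containers),
  // so the filename hint wins when the bytes only say "zip".
  (fromBytes, fromName) match {
    case (Some("application/zip"), Some(named)) => named
    case (Some(sniffed), _)                     => sniffed
    case (None, Some(named))                    => named
    case (None, None)                           => "application/octet-stream"
  }
}
```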

def extract(inputPath: String): Either[ExtractorError, String]
def extractAny(inputPath: String): Either[ExtractorError, Extracted]
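Since extract and extractAny return Either, batch processing reduces to splitting failures from successes. The extractAll helper below is hypothetical and generic in the error and result types, so it compiles without llm4s; in real code you would pass something like UniversalExtractor.extract as the extract argument.

```scala
// Hypothetical helper: run any path-based extractor over many files and
// separate (path, error) pairs from (path, result) pairs, preserving input order.
def extractAll[E, A](paths: Seq[String])(extract: String => Either[E, A]): (Seq[(String, E)], Seq[(String, A)]) =
  paths.partitionMap[(String, E), (String, A)] { p =>
    extract(p) match {
      case Left(err)  => Left(p -> err)
      case Right(out) => Right(p -> out)
    }
  }
```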
def extractFromBytes(content: Array[Byte], filename: String, mimeType: Option[String]): Either[ExtractorError, String]

Extract text from raw bytes.

This method enables source-agnostic document extraction: the same extraction logic works for documents obtained from S3, HTTP responses, databases, or any other source that yields bytes.

Value parameters

content

Raw document bytes

filename

Filename for MIME type detection (e.g., "report.pdf")

mimeType

Optional explicit MIME type (skips detection if provided)

Returns

Extracted text content or an error
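The documented rule that an explicit mimeType skips detection can be expressed with Option.getOrElse, whose argument is evaluated only when the option is empty. The sketch below is a heavily simplified, hypothetical stand-in (extractFromBytesSketch is not a library function): it handles only text/* types, uses String in place of ExtractorError, and takes the detector as a parameter playing the role of detectMimeType.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for extractFromBytes; NOT the llm4s implementation.
def extractFromBytesSketch(
    content: Array[Byte],
    filename: String,
    mimeType: Option[String]
)(detect: (Array[Byte], String) => String): Either[String, String] = {
  // An explicit MIME type wins; detection runs only when none is supplied.
  val mime = mimeType.getOrElse(detect(content, filename))
  if (mime.startsWith("text/")) Right(new String(content, StandardCharsets.UTF_8))
  else Left(s"no extractor for $mime in this sketch")
}
```

The bytes themselves can come from anywhere (a file read, an HTTP response body, a database blob), which is the point of the source-agnostic design.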

def extractFromStream(input: InputStream, filename: String, mimeType: Option[String]): Either[ExtractorError, String]

Extract text from an InputStream.

Note: This reads the entire stream into memory for processing. The caller is responsible for closing the stream after this method returns.

Value parameters

filename

Filename for MIME type detection

input

InputStream to read from

mimeType

Optional explicit MIME type

Returns

Extracted text content or an error
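Because the caller owns the stream, scala.util.Using is a natural fit: it closes the resource whether the body returns normally or throws. In the sketch below, withOwnedStream and readWholeStreamAsText are hypothetical names; in real code the body would be something like in => UniversalExtractor.extractFromStream(in, "report.pdf", None).

```scala
import java.io.InputStream
import java.nio.charset.StandardCharsets
import scala.util.{Try, Using}

// Opens the stream, runs `use`, and always closes the stream afterwards,
// satisfying the "caller closes the stream" contract described above.
def withOwnedStream[A](open: => InputStream)(use: InputStream => A): Try[A] =
  Using(open)(use)

// Hypothetical stand-in for the library call: like the documented method, it buffers
// the whole stream in memory (InputStream.readAllBytes, Java 9+) and does not close it.
def readWholeStreamAsText(in: InputStream): Either[String, String] =
  Right(new String(in.readAllBytes(), StandardCharsets.UTF_8))
```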

def isTextLike(mime: String): Boolean
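isTextLike is undocumented here, so its exact rules are unknown. As a rough illustration only, a predicate of this kind typically normalises the MIME string and accepts text/* plus common textual application types; the function name and the type list below are assumptions.

```scala
import java.util.Locale

// Hypothetical list of application/* types treated as textual; an assumption.
val textLikeApplicationTypes: Set[String] = Set(
  "application/json", "application/xml", "application/javascript",
  "application/x-yaml", "application/csv"
)

def isTextLikeSketch(mime: String): Boolean = {
  // Drop parameters such as "; charset=utf-8" and normalise case before comparing.
  val base = mime.split(";", 2)(0).trim.toLowerCase(Locale.ROOT)
  base.startsWith("text/") ||
  textLikeApplicationTypes.contains(base) ||
  base.endsWith("+json") || base.endsWith("+xml")
}
```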