DBx Core - Vector Store Abstraction Layer

DBx Core provides a provider-agnostic abstraction layer for vector databases, starting with PostgreSQL/pgvector support.

Features

Provider Abstraction: Clean trait-based design for supporting multiple vector stores
PostgreSQL/pgvector: First-class support with automatic setup and health checks
Connection Pooling: Efficient resource management with HikariCP
SQL Injection Prevention: Robust identifier validation and parameterized queries
Transaction Support: Atomic operations with automatic rollback on errors
Type Safety: Comprehensive error modeling with Either-based error handling
LLM4S Integration: Seamless integration with the Result type system
Logging: Structured logging with SLF4J

Configuration

DBx Core uses environment variables for configuration:

# Required
export PG_HOST=localhost
export PG_PORT=5432
export PG_DATABASE=mydb
export PG_USER=myuser
export PG_PASSWORD=mypassword
export PG_SSLMODE=disable  # or require, prefer

# Optional
export PG_SCHEMA=dbx  # defaults to 'dbx'

Usage

Basic Initialization

import org.llm4s.llmconnect.DbxClient
import org.llm4s.llmconnect.config.dbx.DbxConfig

// Load configuration
val config = DbxConfig.load()

// Create client
val client = new DbxClient(config)

// Initialize database
client.initCore() match {
  case Right(report) =>
    println(s"Initialized! PGVector version: ${report.pgvectorVersion}")
  case Left(error) =>
    println(s"Failed: ${error.message}")
}

// Always close when done
client.close()

Using with LLM4S Result Type

import org.llm4s.llmconnect.dbx.DbxErrorBridge._

// Convert to Result type
val result: Result[CoreHealthReport] = client.initCoreAsResult()

// Or use the extension method
val result2 = client.initCore().asResult

Connection Pool Statistics

val stats = client.getPoolStats()
println(s"Active connections: ${stats.activeConnections}")
println(s"Idle connections: ${stats.idleConnections}")
println(s"Total connections: ${stats.totalConnections}")

Vector Operations (Coming Soon)

The following interfaces are defined and ready for implementation:

// Create a collection
val config = CollectionConfig(
  name = "embeddings",
  dimension = 1536,
  distanceMetric = DistanceMetric.Cosine
)

// Store a vector
val request = StoreVectorRequest(
  embedding = Vector(0.1f, 0.2f, ...),
  metadata = Map("source" -> "document.pdf"),
  content = Some("Original text")
)

// Search for similar vectors
val searchRequest = SearchVectorRequest(
  queryEmbedding = Vector(0.1f, 0.2f, ...),
  limit = 10,
  threshold = Some(0.8f)
)

Architecture

Core Components

DbxClient: Main client interface with connection pooling
DbxProvider: Provider abstraction trait
PGVectorProvider: PostgreSQL/pgvector implementation
ConnectionPool: HikariCP-based connection management
SqlSafetyUtils: SQL injection prevention utilities

Error Hierarchy

DbxError (sealed trait)
├── ConfigError - Configuration issues
├── ConnectionError - Network/connection failures (recoverable)
├── PgvectorMissing - Extension not installed
├── SchemaError - Schema/identifier validation failures
├── PermissionError - Database permission issues
└── WriteError - Write operation failures (recoverable)

Safety Features

SQL Identifier Validation
- Only allows alphanumeric, underscore, and dollar signs
- Maximum 63 characters (PostgreSQL limit)
- Proper quoting and escaping
Connection Pooling
- Maximum 10 connections
- Minimum 2 idle connections
- 30-second connection timeout
- Automatic connection testing
Transaction Management
- Automatic rollback on errors
- Proper resource cleanup
- Connection returning to pool

Security Considerations

SQL Injection Prevention

DBx Core implements comprehensive SQL injection prevention measures:

Identifier Validation: All schema names, table names, and identifiers are validated using SqlSafetyUtils:
- Only alphanumeric characters, underscores, and dollar signs allowed
- Maximum 63 characters (PostgreSQL limit)
- No special characters that could break out of identifiers
- Proper quoting using PostgreSQL’s quote_ident() equivalent

Parameterized Queries: All user-provided values are passed as parameters, never concatenated:

// Safe: Version is parameterized
val sql = s"insert into $qualifiedName(pgvector_version) values (?)"
ps.setString(1, version)

// Never do this:
// val sql = s"insert into table values ('$userInput')" // VULNERABLE!

Schema/Table Name Safety: Configuration validation ensures safe identifiers:

// Validated on startup
SqlSafetyUtils.validateIdentifier(schema)  // Rejects dangerous input
SqlSafetyUtils.validateIdentifier(table)   // Prevents injection

Required PostgreSQL Permissions

The database user configured for DBx Core requires the following permissions:

-- Minimum required permissions
GRANT CONNECT ON DATABASE your_database TO dbx_user;
GRANT CREATE ON DATABASE your_database TO dbx_user;  -- For schema creation
GRANT USAGE ON SCHEMA dbx TO dbx_user;               -- After schema exists
GRANT CREATE ON SCHEMA dbx TO dbx_user;              -- For table creation
GRANT ALL ON ALL TABLES IN SCHEMA dbx TO dbx_user;  -- For data operations

-- For pgvector operations (when implemented)
GRANT USAGE ON SCHEMA vector TO dbx_user;            -- If pgvector is in separate schema

Connection Security

SSL/TLS Support: Configure SSL mode via PG_SSLMODE:
- require: Always use SSL (recommended for production)
- verify-ca: Verify server certificate
- verify-full: Full certificate verification including hostname
- disable: No SSL (development only)
Connection Pool Security:
- Connections are tested before use with SELECT 1
- Maximum lifetime of 30 minutes prevents stale connections
- Automatic cleanup of leaked connections
- Connection timeout prevents resource exhaustion
Credential Management:
- Never log passwords or sensitive connection details
- Use environment variables for configuration
- Consider using credential rotation in production
- Use tools like HashiCorp Vault for secret management

Error Handling Security

Information Disclosure Prevention:
- Database errors are categorized and sanitized
- Internal details are logged but not exposed to clients
- Generic error messages for authentication failures
Resource Exhaustion Protection:
- Connection pool limits prevent connection exhaustion
- Transaction timeouts prevent long-running operations
- Automatic rollback on errors prevents data corruption

Configuration Security

Validation on Startup: All configuration is validated before use:
- Port ranges checked (1-65535)
- SSL modes validated against allowed values
- Schema/table names checked for SQL injection risks
Fail-Safe Defaults:
- Schema defaults to ‘dbx’ if not specified
- Connections auto-commit by default unless in transaction
- Pool automatically closes connections on shutdown

Best Practices

Principle of Least Privilege: Grant only necessary database permissions
Network Isolation: Use private networks or VPCs when possible
Audit Logging: Enable PostgreSQL audit logging for sensitive operations
Regular Updates: Keep PostgreSQL and pgvector extension updated
Monitor Connections: Track active connections and unusual patterns
Backup Strategy: Regular backups before schema changes

Requirements

PostgreSQL 12+
pgvector extension (for vector operations)
Java 11+
Scala 2.13 or 3.x

Testing

Run the test suite:

sbt test

# Specific tests
sbt "testOnly org.llm4s.llmconnect.utils.dbx.SqlSafetyUtilsTest"
sbt "testOnly org.llm4s.llmconnect.config.dbx.DbxConfigTest"