DBx Core - Vector Store Abstraction Layer
DBx Core - Vector Store Abstraction Layer
DBx Core provides a provider-agnostic abstraction layer for vector databases, starting with PostgreSQL/pgvector support.
Features
- Provider Abstraction: Clean trait-based design for supporting multiple vector stores
- PostgreSQL/pgvector: First-class support with automatic setup and health checks
- Connection Pooling: Efficient resource management with HikariCP
- SQL Injection Prevention: Robust identifier validation and parameterized queries
- Transaction Support: Atomic operations with automatic rollback on errors
- Type Safety: Comprehensive error modeling with Either-based error handling
- LLM4S Integration: Seamless integration with the Result type system
- Logging: Structured logging with SLF4J
Configuration
DBx Core uses environment variables for configuration:
# Required
export PG_HOST=localhost
export PG_PORT=5432
export PG_DATABASE=mydb
export PG_USER=myuser
export PG_PASSWORD=mypassword
export PG_SSLMODE=disable # or require, prefer
# Optional
export PG_SCHEMA=dbx # defaults to 'dbx'
Usage
Basic Initialization
import org.llm4s.llmconnect.DbxClient
import org.llm4s.llmconnect.config.dbx.DbxConfig
// Load configuration
val config = DbxConfig.load()
// Create client
val client = new DbxClient(config)
// Initialize database
client.initCore() match {
case Right(report) =>
println(s"Initialized! PGVector version: ${report.pgvectorVersion}")
case Left(error) =>
println(s"Failed: ${error.message}")
}
// Always close when done
client.close()
Using with LLM4S Result Type
import org.llm4s.llmconnect.dbx.DbxErrorBridge._
// Convert to Result type
val result: Result[CoreHealthReport] = client.initCoreAsResult()
// Or use the extension method
val result2 = client.initCore().asResult
Connection Pool Statistics
val stats = client.getPoolStats()
println(s"Active connections: ${stats.activeConnections}")
println(s"Idle connections: ${stats.idleConnections}")
println(s"Total connections: ${stats.totalConnections}")
Vector Operations (Coming Soon)
The following interfaces are defined and ready for implementation:
// Create a collection
val config = CollectionConfig(
name = "embeddings",
dimension = 1536,
distanceMetric = DistanceMetric.Cosine
)
// Store a vector
val request = StoreVectorRequest(
embedding = Vector(0.1f, 0.2f, ...),
metadata = Map("source" -> "document.pdf"),
content = Some("Original text")
)
// Search for similar vectors
val searchRequest = SearchVectorRequest(
queryEmbedding = Vector(0.1f, 0.2f, ...),
limit = 10,
threshold = Some(0.8f)
)
Architecture
Core Components
- DbxClient: Main client interface with connection pooling
- DbxProvider: Provider abstraction trait
- PGVectorProvider: PostgreSQL/pgvector implementation
- ConnectionPool: HikariCP-based connection management
- SqlSafetyUtils: SQL injection prevention utilities
Error Hierarchy
DbxError (sealed trait)
├── ConfigError - Configuration issues
├── ConnectionError - Network/connection failures (recoverable)
├── PgvectorMissing - Extension not installed
├── SchemaError - Schema/identifier validation failures
├── PermissionError - Database permission issues
└── WriteError - Write operation failures (recoverable)
Safety Features
- SQL Identifier Validation
- Only allows alphanumeric, underscore, and dollar signs
- Maximum 63 characters (PostgreSQL limit)
- Proper quoting and escaping
- Connection Pooling
- Maximum 10 connections
- Minimum 2 idle connections
- 30-second connection timeout
- Automatic connection testing
- Transaction Management
- Automatic rollback on errors
- Proper resource cleanup
- Connection returning to pool
Security Considerations
SQL Injection Prevention
DBx Core implements comprehensive SQL injection prevention measures:
- Identifier Validation: All schema names, table names, and identifiers are validated using
SqlSafetyUtils
:- Only alphanumeric characters, underscores, and dollar signs allowed
- Maximum 63 characters (PostgreSQL limit)
- No special characters that could break out of identifiers
- Proper quoting using PostgreSQL’s quote_ident() equivalent
- Parameterized Queries: All user-provided values are passed as parameters, never concatenated:
// Safe: Version is parameterized val sql = s"insert into $qualifiedName(pgvector_version) values (?)" ps.setString(1, version) // Never do this: // val sql = s"insert into table values ('$userInput')" // VULNERABLE!
- Schema/Table Name Safety: Configuration validation ensures safe identifiers:
// Validated on startup SqlSafetyUtils.validateIdentifier(schema) // Rejects dangerous input SqlSafetyUtils.validateIdentifier(table) // Prevents injection
Required PostgreSQL Permissions
The database user configured for DBx Core requires the following permissions:
-- Minimum required permissions
GRANT CONNECT ON DATABASE your_database TO dbx_user;
GRANT CREATE ON DATABASE your_database TO dbx_user; -- For schema creation
GRANT USAGE ON SCHEMA dbx TO dbx_user; -- After schema exists
GRANT CREATE ON SCHEMA dbx TO dbx_user; -- For table creation
GRANT ALL ON ALL TABLES IN SCHEMA dbx TO dbx_user; -- For data operations
-- For pgvector operations (when implemented)
GRANT USAGE ON SCHEMA vector TO dbx_user; -- If pgvector is in separate schema
Connection Security
- SSL/TLS Support: Configure SSL mode via
PG_SSLMODE
:require
: Always use SSL (recommended for production)verify-ca
: Verify server certificateverify-full
: Full certificate verification including hostnamedisable
: No SSL (development only)
- Connection Pool Security:
- Connections are tested before use with
SELECT 1
- Maximum lifetime of 30 minutes prevents stale connections
- Automatic cleanup of leaked connections
- Connection timeout prevents resource exhaustion
- Connections are tested before use with
- Credential Management:
- Never log passwords or sensitive connection details
- Use environment variables for configuration
- Consider using credential rotation in production
- Use tools like HashiCorp Vault for secret management
Error Handling Security
- Information Disclosure Prevention:
- Database errors are categorized and sanitized
- Internal details are logged but not exposed to clients
- Generic error messages for authentication failures
- Resource Exhaustion Protection:
- Connection pool limits prevent connection exhaustion
- Transaction timeouts prevent long-running operations
- Automatic rollback on errors prevents data corruption
Configuration Security
- Validation on Startup: All configuration is validated before use:
- Port ranges checked (1-65535)
- SSL modes validated against allowed values
- Schema/table names checked for SQL injection risks
- Fail-Safe Defaults:
- Schema defaults to ‘dbx’ if not specified
- Connections auto-commit by default unless in transaction
- Pool automatically closes connections on shutdown
Best Practices
- Principle of Least Privilege: Grant only necessary database permissions
- Network Isolation: Use private networks or VPCs when possible
- Audit Logging: Enable PostgreSQL audit logging for sensitive operations
- Regular Updates: Keep PostgreSQL and pgvector extension updated
- Monitor Connections: Track active connections and unusual patterns
- Backup Strategy: Regular backups before schema changes
Requirements
- PostgreSQL 12+
- pgvector extension (for vector operations)
- Java 11+
- Scala 2.13 or 3.x
Testing
Run the test suite:
sbt test
# Specific tests
sbt "testOnly org.llm4s.llmconnect.utils.dbx.SqlSafetyUtilsTest"
sbt "testOnly org.llm4s.llmconnect.config.dbx.DbxConfigTest"
Future Enhancements
- Implement vector storage operations
- Add support for Pinecone
- Add support for Weaviate
- Add support for Qdrant
- Implement batch operations
- Add index management
- Add backup/restore functionality
- Add metrics and monitoring