Providers Guide

LLM4S supports multiple LLM providers out of the box. Choose your provider, configure it, and start building.

Table of contents

  1. Supported Providers
  2. Provider Selection
    1. How It Works
    2. Available Models
  3. OpenAI
    1. Setup
    2. Configuration
    3. Available Models
    4. Costs
    5. Tips
  4. Anthropic
    1. Setup
    2. Configuration
    3. Available Models
    4. Costs
    5. Tips
  5. Google Gemini
    1. Setup
    2. Configuration
    3. Available Models
    4. Costs
    5. Tips
  6. Azure OpenAI
    1. Setup
    2. Configuration
    3. Available Models
    4. Costs
    5. Tips
  7. DeepSeek
    1. Setup
    2. Configuration
    3. Available Models
    4. Costs
    5. Tips
  8. Cohere
    1. Setup
    2. Configuration
    3. Available Models
    4. Costs
    5. Tips
  9. Ollama (Local Models)
    1. Setup
    2. Configuration
    3. Available Models
    4. Costs
    5. Tips
  10. API Key Management
    1. Security Best Practices
    2. Using Keys Safely in Code
  11. Base URL Customization
    1. When to Use Custom Base URLs
    2. Setting Custom URLs
    3. Via application.conf
  12. Provider Comparison Table
    1. Which Provider Should I Use?
  13. Multiple Providers in One App
  14. Troubleshooting
    1. “Invalid API Key”
    2. “Connection refused”
    3. “Model not found”
    4. “Rate limit exceeded”

Supported Providers

LLM4S supports seven LLM providers out of the box: six cloud services plus local models via Ollama:

| Provider | Type | Best For | Setup |
|----------|------|----------|-------|
| OpenAI | Cloud | GPT-4, o1 reasoning, most popular | Medium |
| Anthropic | Cloud | Claude Opus, best for reasoning | Medium |
| Google Gemini | Cloud | Free tier, Gemini 2.0 models | Medium |
| Azure OpenAI | Cloud | Enterprise deployments, VPC isolation | Hard |
| DeepSeek | Cloud | Cost-effective, reasoning models | Easy |
| Cohere | Cloud | Production RAG, low latency | Easy |
| Ollama | Local | Private, no API key, offline | Easy |

Provider Selection

How It Works

LLM4S automatically selects the provider based on your LLM_MODEL setting:

```bash
# Format: <provider>/<model-name>
LLM_MODEL=openai/gpt-4o              # Uses OpenAI
LLM_MODEL=anthropic/claude-opus-4-6  # Uses Anthropic
LLM_MODEL=ollama/mistral             # Uses Ollama
```
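
In code, the same `Llm4sConfig` and `LLMConnect` entry points used later in this guide resolve that setting to a concrete client. A minimal bootstrap sketch (the import paths are assumptions; check them against your LLM4S version):

```scala
// Sketch: with LLM_MODEL=openai/gpt-4o in the environment, this resolves
// to the OpenAI client. Import paths below are assumptions.
import org.llm4s.config.Llm4sConfig
import org.llm4s.llmconnect.LLMConnect

val clientResult = for {
  providerConfig <- Llm4sConfig.provider()        // reads LLM_MODEL + matching API key
  client         <- LLMConnect.getClient(providerConfig)
} yield client                                    // Right(client) or Left(config error)
```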

Available Models

See MODEL_METADATA.md for the complete model list. Quick reference:

OpenAI: gpt-4o, gpt-4-turbo, gpt-3.5-turbo

Anthropic: claude-opus-4-6, claude-sonnet-4-5-latest, claude-haiku-3-5

Google Gemini: gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash

DeepSeek: deepseek-chat, deepseek-reasoner

Cohere: command-r-plus, command-r

Ollama: mistral, llama2, neural-chat, nomic-embed-text (100+ models)


OpenAI

Setup

  1. Get an API key from platform.openai.com/api-keys
  2. Set environment variables:
```bash
export LLM_MODEL=openai/gpt-4o
export OPENAI_API_KEY=sk-proj-...
```

  3. (Optional) Organization ID for multi-workspace accounts:

```bash
export OPENAI_ORGANIZATION=org-...
```

  4. (Optional) Custom API base URL for Azure or proxy:

```bash
export OPENAI_BASE_URL=https://api.openai.com/v1  # Default
```

Configuration

In application.conf:

```hocon
llm {
  providers {
    openai {
      api-key = ${?OPENAI_API_KEY}
      base-url = "https://api.openai.com/v1"
      organization = ${?OPENAI_ORGANIZATION}
    }
  }
}
```

Available Models

  • Latest: gpt-4o, gpt-4o-mini
  • Reasoning: o1-preview, o1-mini
  • Turbo: gpt-4-turbo
  • Legacy: gpt-3.5-turbo

Costs

See OpenAI Pricing. Generally:

  • gpt-4o: $2.50 per 1M input tokens / $10 per 1M output tokens
  • gpt-3.5-turbo: $0.50 per 1M input tokens / $1.50 per 1M output tokens

Tips

  • Use gpt-4o-mini for cost-effective applications
  • Use o1 for complex reasoning and math
  • Batching API available for high-volume use
  • Vision support in gpt-4o
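
Putting the pieces together, a complete OpenAI round trip looks like the sketch below. `Llm4sConfig.provider()`, `LLMConnect.getClient`, and `client.complete(conversation)` appear elsewhere in this guide; the `Conversation`/`UserMessage` types, import paths, and the `completion.message.content` accessor are assumptions about the LLM4S API:

```scala
import org.llm4s.config.Llm4sConfig                              // assumed path
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.llmconnect.model.{ Conversation, UserMessage }  // assumed types

// Expects LLM_MODEL=openai/gpt-4o and OPENAI_API_KEY in the environment.
val result = for {
  providerConfig <- Llm4sConfig.provider()
  client         <- LLMConnect.getClient(providerConfig)
  completion     <- client.complete(Conversation(Seq(UserMessage("Say hello from Scala"))))
} yield completion

// Assumes the result type is Either-based, as in the examples later in this guide.
result match {
  case Right(completion) => println(completion.message.content)  // assumed accessor
  case Left(error)       => println(s"Request failed: $error")
}
```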

Anthropic

Setup

  1. Get an API key from console.anthropic.com
  2. Set environment variables:
```bash
export LLM_MODEL=anthropic/claude-opus-4-6
export ANTHROPIC_API_KEY=sk-ant-...
```

  3. (Optional) Custom API base URL:

```bash
export ANTHROPIC_BASE_URL=https://api.anthropic.com
```

Configuration

In application.conf:

```hocon
llm {
  providers {
    anthropic {
      api-key = ${?ANTHROPIC_API_KEY}
      base-url = "https://api.anthropic.com"
      version = "2023-06-01"  # API version
    }
  }
}
```

Available Models

  • Best Quality: claude-opus-4-6 (200K context)
  • Balanced: claude-sonnet-4-5-latest (200K context)
  • Fast: claude-haiku-3-5 (200K context)

Costs

  • claude-opus-4-6: $3 per 1M input tokens / $15 per 1M output tokens
  • claude-sonnet: $3 per 1M input tokens / $15 per 1M output tokens
  • claude-haiku: $0.80 per 1M input tokens / $4 per 1M output tokens

Claude models generally score higher on reasoning benchmarks.

Tips

  • All Claude models have 200K context window
  • Exceptional at writing and analysis tasks
  • Excellent vision capabilities
  • Supports prompt caching for repeated queries

Google Gemini

Setup

  1. Get an API key from aistudio.google.com/apikey
    • Free tier available (60 requests per minute)
  2. Set environment variables:
```bash
export LLM_MODEL=gemini/gemini-2.0-flash
export GOOGLE_API_KEY=your-api-key
```

  3. (Optional) Custom API base URL:

```bash
export GEMINI_BASE_URL=https://generativelanguage.googleapis.com/v1beta
```

Configuration

In application.conf:

```hocon
llm {
  providers {
    gemini {
      api-key = ${?GOOGLE_API_KEY}
      base-url = "https://generativelanguage.googleapis.com/v1beta"
    }
  }
}
```

Available Models

  • Latest: gemini-2.0-flash (1M context)
  • Advanced: gemini-1.5-pro (1M context)
  • Fast: gemini-1.5-flash (1M context)

Costs

  • Free tier: 60 requests/minute, 2M free tokens/month
  • Paid: pay as you go (roughly $0.075-$1.50 per 1M input tokens, depending on model)

Great for cost-conscious projects and high-volume applications.

Tips

  • Free tier perfect for development and testing
  • 1M context window for processing large documents
  • Very fast inference latency
  • Strong code generation capabilities

Azure OpenAI

Setup

  1. Create resource in Azure Portal
  2. Deploy model (e.g., gpt-4o) to get deployment name
  3. Get credentials from Azure Portal → Keys & Endpoint
  4. Set environment variables:
```bash
export LLM_MODEL=azure/gpt-4o
export AZURE_API_KEY=your-azure-key
export AZURE_API_BASE=https://your-resource.openai.azure.com
export AZURE_DEPLOYMENT_NAME=gpt-4o
export AZURE_API_VERSION=2024-02-15-preview
```

Configuration

In application.conf:

```hocon
llm {
  providers {
    azure {
      api-key = ${?AZURE_API_KEY}
      api-base = ${?AZURE_API_BASE}
      deployment-name = ${?AZURE_DEPLOYMENT_NAME}
      api-version = "2024-02-15-preview"
    }
  }
}
```

Available Models

Same as OpenAI (via Azure deployment). Choose models when deploying:

  • gpt-4o
  • gpt-4-turbo
  • gpt-35-turbo

Costs

Similar to OpenAI but often bundled with enterprise agreements.

Tips

  • Use for VPC-isolated workloads
  • Enterprise support available
  • Same API as OpenAI (easy migration)
  • Reserve capacity for predictable costs

DeepSeek

Setup

  1. Get API key from platform.deepseek.com
  2. Set environment variables:
```bash
export LLM_MODEL=deepseek/deepseek-chat
export DEEPSEEK_API_KEY=sk-...
```

Configuration

In application.conf:

```hocon
llm {
  providers {
    deepseek {
      api-key = ${?DEEPSEEK_API_KEY}
      base-url = "https://api.deepseek.com"
    }
  }
}
```

Available Models

  • Chat: deepseek-chat (best for general use)
  • Reasoning: deepseek-reasoner (extended thinking)

Costs

Very competitive: roughly $0.14 per 1M input tokens and $0.28 per 1M output tokens.

Tips

  • Excellent cost/performance ratio
  • Reasoning model rivals GPT-4o
  • Good for translations and multilingual tasks
  • Supports very long contexts

Cohere

Setup

  1. Get API key from dashboard.cohere.com
  2. Set environment variables:
```bash
export LLM_MODEL=cohere/command-r-plus
export COHERE_API_KEY=your-key
```

Configuration

In application.conf:

```hocon
llm {
  providers {
    cohere {
      api-key = ${?COHERE_API_KEY}
      base-url = "https://api.cohere.com"
    }
  }
}
```

Available Models

  • Best: command-r-plus (advanced reasoning)
  • Standard: command-r (balanced)

Costs

Competitive for production RAG use cases.

Tips

  • Optimized for retrieval-augmented generation
  • Fast token generation for streaming
  • Safe and reliable for enterprise use

Ollama (Local Models)

Setup

  1. Install Ollama from ollama.ai
  2. Pull a model:
```bash
ollama pull mistral        # Downloads model
ollama serve               # Runs on http://localhost:11434
```

  3. Set environment variables:

```bash
export LLM_MODEL=ollama/mistral
export OLLAMA_BASE_URL=http://localhost:11434
```

No API key needed!

Configuration

In application.conf:

```hocon
llm {
  providers {
    ollama {
      base-url = "http://localhost:11434"
    }
  }
}
```

Available Models

100+ models available:

  • Small: phi, neural-chat (~4GB)
  • Medium: mistral, llama2 (~13GB)
  • Large: llama2-70b (~40GB)
  • Specialized: neural-chat, orca, wizard-math

Run ollama list to see installed models.

Costs

Free! Just compute (CPU or GPU needed).

Tips

  • Perfect for development and testing
  • Works offline (no internet needed)
  • Use GPU for faster inference
  • Ideal for sensitive data (runs locally)
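
Because provider selection is driven entirely by `LLM_MODEL`, no code changes are needed to move from a cloud provider to Ollama; the usual bootstrap simply resolves to the local server. A sketch under the same API assumptions as the earlier examples:

```scala
// With LLM_MODEL=ollama/mistral and OLLAMA_BASE_URL=http://localhost:11434,
// the same bootstrap used for cloud providers yields a local client.
val localClient = for {
  providerConfig <- Llm4sConfig.provider()   // no API key required for Ollama
  client         <- LLMConnect.getClient(providerConfig)
} yield client
```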

API Key Management

Security Best Practices

Never commit API keys!

  1. Use environment variables:

```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
```

  2. Use a .env file (add it to .gitignore):

```bash
# .env (NOT committed to git)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```

  3. Use CI/CD secrets:

```yaml
# GitHub Actions
- uses: actions/setup-java@v3
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

  4. Rotate keys regularly on provider dashboards

Using Keys Safely in Code

Good:

```scala
// Keys from env/config - never hardcoded
val providerConfig = Llm4sConfig.provider()
```

Bad:

```scala
// ❌ Never do this!
val key = "sk-proj-abc123..."  // Hardcoded
sys.env.get("OPENAI_API_KEY")  // Outside config boundary
```

Base URL Customization

When to Use Custom Base URLs

  • Reverse proxy or load balancer
  • VPC endpoint for security
  • Azure OpenAI or self-hosted setup
  • Provider migration (e.g., moving from OpenAI to an OpenAI-compatible API)

Setting Custom URLs

```bash
# OpenAI
export OPENAI_BASE_URL=https://api.openai.com/v1

# Anthropic
export ANTHROPIC_BASE_URL=https://api.anthropic.com

# Azure OpenAI
export AZURE_API_BASE=https://your-resource.openai.azure.com

# Ollama
export OLLAMA_BASE_URL=http://localhost:11434

# Gemini
export GEMINI_BASE_URL=https://generativelanguage.googleapis.com/v1beta

# Cohere
export COHERE_BASE_URL=https://api.cohere.com

# DeepSeek
export DEEPSEEK_BASE_URL=https://api.deepseek.com
```

Via application.conf

```hocon
llm {
  providers {
    openai {
      api-key = ${?OPENAI_API_KEY}
      # Literal default first; in HOCON a later assignment wins:
      base-url = "https://proxy.example.com/openai"
      # Optional override, applied only when OPENAI_BASE_URL is set:
      base-url = ${?OPENAI_BASE_URL}
    }
  }
}
```
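
Because `${?VAR}` substitutions only apply when the variable is set, and later assignments win, the order above matters. A quick way to check what actually resolved, using the standard Typesafe Config API (the config path matches the examples above):

```scala
import com.typesafe.config.ConfigFactory

// Loads application.conf (plus system properties and reference.conf).
val config  = ConfigFactory.load()
val baseUrl = config.getString("llm.providers.openai.base-url")
println(s"OpenAI requests will be sent to: $baseUrl")
// Prints OPENAI_BASE_URL when that variable is set, otherwise the literal default.
```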

Provider Comparison Table

| Feature | OpenAI | Anthropic | Gemini | Azure | DeepSeek | Cohere | Ollama |
|---------|--------|-----------|--------|-------|----------|--------|--------|
| Setup Difficulty | Easy | Easy | Easy | Hard | Easy | Easy | Medium |
| API Key Required | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Free Tier | Limited | Limited | ✅ Generous | ❌ | Limited | Limited | Free (local) |
| Local Option | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Context Window | 128K | 200K | 1M | 128K | 4K-32K | 8K | Model-specific |
| Vision Support | ✅ | ✅ | ✅ | ✅ | ⚠️ Limited | ❌ | Model-specific |
| Function Calling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ⚠️ Limited |
| Reasoning Models | ✅ o1 | ❌ | ❌ | ✅ (via OpenAI) | ✅ deepseek-reasoner | ❌ | ❌ |
| Enterprise Support | ✅ | ✅ | ✅ | ✅ | ⚠️ | ✅ | N/A |
| Cost (Budget) | Medium | Medium | 🏆 Low | High | 🏆 Very Low | Low | Free |
| Speed | Fast | Medium | 🏆 Very Fast | Medium | Fast | Medium | Varies |
| Reliability | 🏆 Enterprise | 🏆 Enterprise | Good | 🏆 Enterprise | Good | Good | Local |

Which Provider Should I Use?

  • Getting started? → Try Gemini (free tier) or Ollama (local)
  • Production API? → OpenAI (most stable) or Anthropic (best reasoning)
  • Cost-conscious? → DeepSeek or Ollama
  • Enterprise? → Azure OpenAI or Anthropic
  • Private data? → Ollama (runs locally)
  • Reasoning tasks? → Anthropic Claude or DeepSeek reasoner
  • Vision/multimodal? → OpenAI GPT-4o or Anthropic Claude

Multiple Providers in One App

Switch providers at runtime:

```scala
for {
  // Get configured provider from environment
  providerConfig <- Llm4sConfig.provider()
  client <- LLMConnect.getClient(providerConfig)
} yield {
  // Use the available provider
  client.complete(conversation)
}
```

This enables:

  • Fallback logic - use OpenAI, fall back to Anthropic (see the sketch after this list)
  • A/B testing - Compare provider outputs
  • Cost optimization - Use cheapest available provider
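
A minimal fallback sketch, assuming LLM4S results are Either-based (so `orElse` composes them). `ProviderConfig`, `openaiConfig`, and `anthropicConfig` are hypothetical stand-ins for configs built by your configuration layer, since this guide only shows the env-driven `Llm4sConfig.provider()`:

```scala
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.llmconnect.model.{ Conversation, UserMessage }  // assumed paths/types

val conversation = Conversation(Seq(UserMessage("Summarize: ...")))

// Completes against one provider; only getClient/complete come from this guide.
def complete(config: ProviderConfig) =   // ProviderConfig: hypothetical type name
  for {
    client     <- LLMConnect.getClient(config)
    completion <- client.complete(conversation)
  } yield completion

// Try OpenAI first; on failure (bad key, outage, rate limit) fall back to Anthropic.
val completion = complete(openaiConfig).orElse(complete(anthropicConfig))
```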

Troubleshooting

“Invalid API Key”

```bash
# Verify key is set
echo $OPENAI_API_KEY

# Check key format (starts with correct prefix)
# OpenAI: sk-proj-* or sk-*
# Anthropic: sk-ant-*
# Gemini: a long alphanumeric string
```

“Connection refused”

For local providers (Ollama):

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama
ollama serve
```

“Model not found”

```bash
# Verify model name and provider
export LLM_MODEL=openai/gpt-4o  # Correct format

# Check available models
# OpenAI: https://platform.openai.com/docs/models
# Anthropic: https://docs.anthropic.com/claude/reference/models
```

“Rate limit exceeded”

Use provider-specific strategies:

  • OpenAI: Wait before retrying, use batching API
  • Gemini: Upgrade from free tier
  • Ollama: Increase system resources or use GPU
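
Beyond provider-specific fixes, a small retry-with-backoff wrapper helps absorb transient rate limits. This is plain Scala around any Either-returning call, not an LLM4S API:

```scala
import scala.annotation.tailrec

// Retry an Either-returning call with exponential backoff between attempts.
// Note: this retries on any Left; in real code, match on rate-limit errors only.
@tailrec
def withRetry[E, A](attempts: Int, delayMs: Long = 1000L)(call: () => Either[E, A]): Either[E, A] =
  call() match {
    case Left(_) if attempts > 1 =>
      Thread.sleep(delayMs)                      // crude backoff; fine for scripts
      withRetry(attempts - 1, delayMs * 2)(call)
    case result => result
  }

// Usage sketch: retry a completion up to 3 times (1s, then 2s delays).
// val result = withRetry(attempts = 3)(() => client.complete(conversation))
```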