import org.llm4s.config.Llm4sConfig
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.llmconnect.model.UserMessage

object HelloLLM extends App {
  val result = for {
    providerConfig <- Llm4sConfig.provider()
    client         <- LLMConnect.getClient(providerConfig)
    response       <- client.complete(
      messages = List(UserMessage("What is Scala?")),
      model = None
    )
  } yield response

  result match {
    case Right(completion) => println(s"Response: ${completion.content}")
    case Left(error)       => println(s"Error: $error")
  }
}
Run It
# Make sure your API key is configured
export LLM_MODEL=openai/gpt-4o
export OPENAI_API_KEY=sk-...
sbt run
Expected Output
Response: Scala is a high-level programming language that combines
object-oriented and functional programming paradigms. It runs on the
JVM and is known for its strong type system and concurrency support.
import org.llm4s.config.Llm4sConfig
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.llmconnect.model._

object ConversationExample extends App {
  val result = for {
    providerConfig <- Llm4sConfig.provider()
    client         <- LLMConnect.getClient(providerConfig)
    response       <- client.complete(
      messages = List(
        SystemMessage("You are a helpful programming tutor."),
        UserMessage("What is Scala?"),
        AssistantMessage("Scala is a high-level programming language..."),
        UserMessage("How does it compare to Java?")
      ),
      model = None
    )
  } yield response

  result.fold(
    error      => println(s"Error: $error"),
    completion => println(s"Response: ${completion.content}")
  )
}
import org.llm4s.config.Llm4sConfig
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.llmconnect.model.UserMessage
import org.llm4s.toolapi.{ToolFunction, ToolRegistry}
import org.llm4s.agent.Agent

object ToolExample extends App {
  // Define a simple tool
  def getWeather(location: String): String =
    s"The weather in $location is sunny and 72°F"

  val weatherTool = ToolFunction(
    name = "get_weather",
    description = "Get current weather for a location",
    function = getWeather _
  )

  val result = for {
    providerConfig <- Llm4sConfig.provider()
    client         <- LLMConnect.getClient(providerConfig)
    tools           = new ToolRegistry(Seq(weatherTool))
    agent           = new Agent(client)
    state          <- agent.run("What's the weather in Paris?", tools)
  } yield state

  result.fold(
    error => println(s"Error: $error"),
    state => println(s"Final response: ${state.finalResponse}")
  )
}
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.llmconnect.model._
import org.llm4s.config.Llm4sConfig

object StreamingExample extends App {
  val result = for {
    providerConfig <- Llm4sConfig.provider()
    client         <- LLMConnect.getClient(providerConfig)
    completion     <- client.streamComplete(
      conversation = Conversation(Seq(UserMessage("Write a short poem about Scala")))
    ) { chunk =>
      chunk.content.foreach(print) // Print each token as it arrives
    }
  } yield completion

  result.fold(
    error => println(s"Error: $error"),
    _     => println("\nDone!")
  )
}
Output appears token by token in real time, just like ChatGPT!
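If you also want the full text once the stream finishes, you can accumulate the chunks while printing them. This is a minimal sketch that reuses the streamComplete call from the example above; the chunk handling mirrors that example, and the buffering around it is plain Scala.

import org.llm4s.config.Llm4sConfig
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.llmconnect.model._

object StreamingCollectExample extends App {
  val buffer = new StringBuilder

  val result = for {
    providerConfig <- Llm4sConfig.provider()
    client         <- LLMConnect.getClient(providerConfig)
    completion     <- client.streamComplete(
      conversation = Conversation(Seq(UserMessage("Write a short poem about Scala")))
    ) { chunk =>
      chunk.content.foreach { delta =>
        print(delta)         // show each token as it arrives
        buffer.append(delta) // and keep it for later use
      }
    }
  } yield completion

  result.fold(
    error => println(s"Error: $error"),
    _     => println(s"\n\nCollected ${buffer.length} characters")
  )
}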
import org.llm4s.config.Llm4sConfig
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.llmconnect.model._
import org.llm4s.toolapi.{ToolFunction, ToolRegistry}
import org.llm4s.agent.Agent

object ComprehensiveExample extends App {
  // Define tools
  def calculate(expression: String): String = {
    // Simple calculator (use proper eval in production!)
    s"Result: $expression = 42"
  }

  val calcTool = ToolFunction(
    name = "calculate",
    description = "Evaluate a mathematical expression",
    function = calculate _
  )

  // Main program
  println("🚀 Starting LLM4S Example...")

  val result = for {
    providerConfig <- Llm4sConfig.provider()
    client         <- LLMConnect.getClient(providerConfig)
    tools           = new ToolRegistry(Seq(calcTool))
    agent           = new Agent(client)
    // Run agent with tool support
    state          <- agent.run("What is 6 times 7? Please use the calculator.", tools)
  } yield state

  result match {
    case Right(state) =>
      println("✅ Success!")
      println(s"Response: ${state.finalResponse}")
      println(s"Messages exchanged: ${state.messages.length}")
    case Left(error) =>
      println(s"❌ Error: $error")
      System.exit(1)
  }
}
for {
  providerConfig <- Llm4sConfig.provider()
  client         <- LLMConnect.getClient(providerConfig)
  response       <- client.complete(
    Conversation(Seq(
      SystemMessage("You are an expert in..."),
      UserMessage("Question")
    ))
  )
} yield response.content
1. Reuse the Client

Create the client once and reuse it for every request:

// ✅ Good: Create once, reuse
val clientResult = for {
  providerConfig <- Llm4sConfig.provider()
  client         <- LLMConnect.getClient(providerConfig)
} yield client

clientResult match {
  case Right(client) =>
    // Reuse for multiple requests
    (1 to 10).foreach { i =>
      client.complete(Conversation(Seq(UserMessage(s"Question $i"))))
    }
  case Left(error) =>
    println(s"Error: $error")
}

// ❌ Bad: Creating new client each time (wasteful)
(1 to 10).foreach { i =>
  for {
    providerConfig <- Llm4sConfig.provider()
    client         <- LLMConnect.getClient(providerConfig) // Don't do this!
    response       <- client.complete(Conversation(Seq(UserMessage(s"Q$i"))))
  } yield response
}
2. Use Streaming for Long Responses
Streaming gets you the first token faster and improves perceived latency:
import org.llm4s.llmconnect.model._

val streamResult = for {
  providerConfig <- Llm4sConfig.provider()
  client         <- LLMConnect.getClient(providerConfig)
  completion     <- client.streamComplete(
    Conversation(Seq(UserMessage("Write a long essay about Scala")))
  ) { chunk =>
    chunk.content.foreach(print) // Prints as tokens arrive
  }
} yield completion

streamResult match {
  case Right(completion) =>
    println(s"\nCompleted with ${completion.usage.map(_.totalTokens).getOrElse(0)} tokens")
  case Left(error) =>
    println(s"Error: $error")
}
3. Parallelize Independent Requests

Independent queries can run concurrently using standard Scala Futures:

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

val clientResult = for {
  providerConfig <- Llm4sConfig.provider()
  client         <- LLMConnect.getClient(providerConfig)
} yield client

clientResult match {
  case Right(client) =>
    val queries = List(
      "What is Scala?",
      "What is functional programming?",
      "What is the JVM?"
    )

    // Run all queries in parallel
    val futures = queries.map { query =>
      Future {
        client.complete(Conversation(Seq(UserMessage(query))))
      }
    }

    val results = Await.result(Future.sequence(futures), 30.seconds)

    results.foreach {
      case Right(response) => println(response.content)
      case Left(error)     => println(s"Error: $error")
    }
  case Left(error) =>
    println(s"Error: $error")
}
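The global execution context is fine for a handful of calls, but if you fan out many requests at once, a bounded thread pool keeps you from exceeding your provider's rate limits. The following is only a sketch: it assumes client and queries are already in scope as in the example above, and the pool size of 4 is an arbitrary choice, not an LLM4S setting.

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Bounded pool: at most 4 requests in flight at a time (4 is arbitrary).
val pool = Executors.newFixedThreadPool(4)
implicit val boundedEc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

val boundedFutures = queries.map { query =>
  Future(client.complete(Conversation(Seq(UserMessage(query)))))
}

try {
  Await.result(Future.sequence(boundedFutures), 2.minutes).foreach {
    case Right(response) => println(response.content)
    case Left(error)     => println(s"Error: $error")
  }
} finally {
  pool.shutdown()
}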
4. Set Appropriate Timeouts
Different operations need different timeouts:
# In application.conf
llm4s {
  # Short timeout for quick queries
  request-timeout = 15 seconds

  # For long-form generation
  # request-timeout = 60 seconds
}
Or override per request:
// Note: Per-request timeouts are configured via application.conf or provider settings.
// The complete method uses the configured timeout automatically.
val response = client.complete(Conversation(Seq(UserMessage("Quick question"))))
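If you need a hard deadline for one specific call on top of the configured timeout (for example, to fail fast inside a request handler), one option is to enforce it on the calling side with plain Scala concurrency. This is just a sketch, not an LLM4S feature: it stops waiting after the deadline but does not cancel the underlying HTTP request, and it assumes client is the instance created earlier.

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.Try
import ExecutionContext.Implicits.global
import org.llm4s.llmconnect.model._

val deadline = 10.seconds

// Run the blocking call on another thread and stop waiting after the deadline.
val answer = Try {
  Await.result(
    Future(client.complete(Conversation(Seq(UserMessage("Quick question"))))),
    deadline
  )
}

answer.fold(
  timeout => println(s"Gave up after $deadline: ${timeout.getMessage}"),
  {
    case Right(completion) => println(completion.content)
    case Left(error)       => println(s"Error: $error")
  }
)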
5. Use Cheaper Models for Development
# Development: Fast and cheap
export LLM_MODEL=openai/gpt-4o-mini   # 60x cheaper than gpt-4

# Or free with Ollama
export LLM_MODEL=ollama/llama3.2
# Production: Use when quality matters
export LLM_MODEL=openai/gpt-4o
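Because the model is chosen through the environment, the same code runs unchanged in both setups. As a small sanity check (purely illustrative, just reading the LLM_MODEL variable shown above), you can log which model the process actually picked up at startup:

// Log the active model so dev and prod runs are easy to tell apart.
val activeModel = sys.env.getOrElse("LLM_MODEL", "<not set>")
println(s"Using model: $activeModel")

if (activeModel.endsWith("-mini") || activeModel.startsWith("ollama/"))
  println("Note: running with a development-tier model")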
6. Batch Embeddings
When generating embeddings for RAG:
// ✅ Good: Batch processing
val documents = List("doc1", "doc2", /* ... */ "doc1000")
val batchSize = 100

val allEmbeddings = documents.grouped(batchSize).flatMap { batch =>
  embedder.embed(batch) match {
    case Right(embeddings) => embeddings
    case Left(error) =>
      println(s"Batch failed: $error")
      List.empty
  }
}.toList

// ❌ Bad: One at a time (slow, expensive)
val embeddings = documents.map { doc =>
  embedder.embed(List(doc))
}
7. Monitor Token Usage
Track costs in production:
val response = client.complete(Conversation(messages))

response match {
  case Right(completion) =>
    // Note: usage returns Option[TokenUsage], so these are Option[Int]
    println(s"Prompt tokens: ${completion.usage.map(_.promptTokens)}")
    println(s"Completion tokens: ${completion.usage.map(_.completionTokens)}")
    println(s"Total tokens: ${completion.usage.map(_.totalTokens)}")
  case Left(error) =>
    println(s"Error: $error")
}
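To turn those counts into a rough cost figure, you can accumulate usage across calls and apply your provider's per-token prices. A sketch under assumptions: the prices below are placeholders (check your provider's pricing page), and TokenUsage is assumed to live in the model package used throughout these examples.

import org.llm4s.llmconnect.model._

// Placeholder prices in USD per 1K tokens -- substitute your provider's real rates.
val promptPricePer1k     = 0.005
val completionPricePer1k = 0.015

var totalPromptTokens     = 0
var totalCompletionTokens = 0

// Call this after each request with completion.usage.
def recordUsage(usage: Option[TokenUsage]): Unit =
  usage.foreach { u =>
    totalPromptTokens     += u.promptTokens
    totalCompletionTokens += u.completionTokens
  }

def estimatedCostUsd: Double =
  totalPromptTokens / 1000.0 * promptPricePer1k +
    totalCompletionTokens / 1000.0 * completionPricePer1k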
Next Steps
Great job! You’ve written your first LLM4S programs. Now explore: