Best Open-Source Language Models for Tool Calling in 2025 (With Ollama Setup Guide)
As AI continues to evolve, developers are pushing language models beyond simple text generation. One key feature in high demand is tool calling — the ability for a model to select and execute predefined functions based on natural language input.
While GPT-4 and Claude 3 handle this with ease, open-source alternatives have traditionally lagged behind. But with advances in models like OpenChat, DeepSeek, and Qwen, open-source LLMs are now catching up.
So, which open-source models are actually good at tool calling in 2025? And how do you run them locally using Ollama? Let’s break down the problem, explore top model choices, and offer a step-by-step solution.
1. The Problem: Tool Calling in Open-Source LLMs
Tool calling requires structured reasoning: the model must parse a user's request, map it to a tool, and generate the correct arguments in a predefined schema (usually JSON).
Challenges include:
- Hallucinating tool names or incorrect parameters
- Malformed JSON output
- Lack of structured training for function invocation
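These failure modes can be caught before a tool is ever executed by adding a thin validation layer between the model and your functions. A minimal Python sketch, assuming a hypothetical tool registry (`TOOLS`) that mirrors the weather/news examples used later in this article:

```python
import json

# Hypothetical registry: tool name -> the exact parameter names it accepts.
TOOLS = {"get_weather": {"city"}, "get_news": {"topic"}}

def validate_tool_call(raw: str):
    """Reject malformed JSON, hallucinated tool names, and bad parameters."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None, "malformed JSON"
    name = call.get("tool")
    if name not in TOOLS:
        return None, f"unknown tool: {name!r}"
    params = call.get("parameters", {})
    if set(params) != TOOLS[name]:
        return None, f"bad parameters for {name}: {sorted(params)}"
    return call, None

# A well-formed call passes; a hallucinated tool name is rejected.
good, err = validate_tool_call('{"tool": "get_weather", "parameters": {"city": "Chicago"}}')
bad, err2 = validate_tool_call('{"tool": "launch_rocket", "parameters": {}}')
```

Running validation on every model response turns silent failures into explicit, loggable errors.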
As one developer put it: “GPT-4 can do it, but I want an open-source option. Anything close?”
2. Understanding Tool Calling
Tool calling enables an LLM to:
- Recognize user intent
- Select an appropriate tool/function
- Return structured arguments (e.g., JSON)
Use cases include AI assistants, coding agents, web automation, and chatbot plugins.
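The three steps above map naturally onto a small dispatcher: the model handles intent recognition and tool selection, and the host application routes the structured arguments to a real function. A sketch with hypothetical stub implementations:

```python
import json

# Hypothetical local implementations backing the tools.
def get_weather(city: str) -> str:
    return f"72°F and sunny in {city}"  # stub; a real app would call a weather API

def get_news(topic: str) -> str:
    return f"Top headlines about {topic}"  # stub

DISPATCH = {"get_weather": get_weather, "get_news": get_news}

def run_tool_call(model_output: str) -> str:
    """Take the model's JSON tool call and execute the matching function."""
    call = json.loads(model_output)
    func = DISPATCH[call["tool"]]
    return func(**call["parameters"])

result = run_tool_call('{"tool": "get_weather", "parameters": {"city": "Chicago"}}')
```

The model never executes anything itself; it only emits JSON, and your code decides what actually runs.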
3. Best Open-Source Models for Tool Calling
✅ OpenChat 3.5 / 3.6
Strengths: Instruction-following, low hallucination, schema-adherence
Command: ollama run openchat
✅ DeepSeek-V2
Strengths: Code and function reasoning, nested tool support
Command: ollama run deepseek-v2
✅ Qwen1.5 / Qwen2
Strengths: Multilingual, accurate JSON handling
Command: ollama run qwen
✅ CodeGemma / Code Llama
Strengths: Schema parsing, code structure understanding
Note: Best for IDEs and developer tools; may require schema priming.
✅ Phi-3
Strengths: Works on 8GB RAM, reliable for simple calls
Command: ollama run phi3
4. Tips to Improve Tool-Calling Accuracy
- Use a strong system prompt: “You are an AI assistant. Only respond in valid JSON using the tools provided.”
- Define tools with clarity:
[
  {
    "name": "get_weather",
    "description": "Returns weather info",
    "parameters": { "city": "string" }
  }
]
- Fine-tune using LoRA: If your use case is specialized, fine-tuning yields higher precision.
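The first two tips can be combined in code: embed the tool definitions directly in the system prompt (schema priming), so the model sees both the instruction and the exact schema in one place. A minimal sketch; the prompt wording and tool list are illustrative:

```python
import json

tools = [
    {
        "name": "get_weather",
        "description": "Returns weather info",
        "parameters": {"city": "string"},
    },
]

def build_system_prompt(tools: list) -> str:
    """Embed the tool schema in the system prompt (schema priming)."""
    return (
        "You are an AI assistant. Only respond in valid JSON using the tools provided.\n"
        "Tools:\n" + json.dumps(tools, indent=2)
    )

prompt = build_system_prompt(tools)
```

Generating the prompt from the same tool list your dispatcher uses keeps the schema the model sees and the schema you validate against from drifting apart.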
5. How to Run These Models with Ollama
- Install Ollama:
  curl -fsSL https://ollama.com/install.sh | sh
- Run a model:
  ollama run openchat
  ollama run deepseek-coder
  ollama run phi3
- Test with a sample function call and check the output.
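The test step can be scripted against Ollama's local REST API. A minimal sketch, assuming an Ollama server running on the default localhost:11434 and using the /api/chat endpoint with JSON-output mode ("format": "json"); the model name and prompts are illustrative:

```python
import json
import urllib.request

def build_request(model: str, system_prompt: str, user_msg: str) -> dict:
    # "format": "json" asks Ollama to constrain the reply to valid JSON.
    return {
        "model": model,
        "stream": False,
        "format": "json",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_msg},
        ],
    }

def chat(payload: dict) -> str:
    """POST to a locally running Ollama server (default port 11434)."""
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

payload = build_request("openchat", "Only respond in valid JSON.", "Weather in Chicago?")
# reply = chat(payload)  # uncomment once `ollama run openchat` is available locally
```

No API key is involved; the request never leaves your machine.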
6. Community Insights
- OpenChat 3.5 is widely praised for accurate tool selection.
- DeepSeek-V2 handles structured multi-parameter functions well.
- Phi-3 works surprisingly well on entry-level hardware.
- Qwen1.5 needs more memory but delivers solid accuracy.
Recommended toolkits to pair with these models include CrewAI, LangGraph, and OpenDevin.
7. Conclusion
Tool calling is no longer limited to proprietary models. In 2025, open-source LLMs like OpenChat, DeepSeek, and Qwen provide reliable function calling, especially when paired with clear prompts and local runners like Ollama.
Model recommendations based on use case:
- For lightweight usage: Phi-3 or DeepSeek-Coder 1.3B
- For accurate JSON output: OpenChat 3.6 or DeepSeek-V2
- For complex apps: Qwen1.5 or CodeGemma (with more RAM)
All you need is Ollama to get started with these models locally — no API key or cloud required.
8. Sample Prompt for Testing
System Prompt:
You are an AI agent. Only use the tools provided below. Return a JSON object with the selected tool and parameters.
Tools:
[
  {
    "name": "get_weather",
    "description": "Gets the weather for a specific city",
    "parameters": { "city": "string" }
  },
  {
    "name": "get_news",
    "description": "Fetches top news headlines based on a topic",
    "parameters": { "topic": "string" }
  }
]
User: What’s the weather like in Chicago today?
Expected Output:
{
  "tool": "get_weather",
  "parameters": {
    "city": "Chicago"
  }
}
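When scripting this check, compare parsed JSON objects rather than raw strings, so differences in whitespace or key order don't cause false failures. A small sketch:

```python
import json

expected = {"tool": "get_weather", "parameters": {"city": "Chicago"}}

def matches_expected(model_reply: str) -> bool:
    """Compare as parsed objects so formatting differences don't matter."""
    try:
        return json.loads(model_reply) == expected
    except json.JSONDecodeError:
        return False

# Same object, different formatting and key order: still a match.
ok = matches_expected('{ "parameters": {"city": "Chicago"}, "tool": "get_weather" }')
```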
FAQ
What is tool calling in LLMs?
Tool calling allows a language model to choose and invoke a predefined function (like getting the weather or accessing data) by formatting input parameters correctly — typically in JSON.
Which open-source models are best for tool calling?
Top models include OpenChat 3.5/3.6, DeepSeek-V2, Qwen1.5, CodeGemma, and Phi-3, depending on your hardware and use case.
Can I run these models locally?
Yes. You can run these models locally using Ollama with a simple command-line install, even on modest hardware.