Run AI Models Locally - Ollama Quickstart

Get started with local AI development. Learn to install and use Ollama to run powerful AI models on your own machine for enhanced privacy, speed, and cost-efficiency.

Cloud-based AI APIs are powerful, but they're not always the right choice. Sending sensitive data to third-party servers raises privacy concerns, recurring API costs can become unpredictable, and network latency can slow down your applications. For many developers, these limitations are deal-breakers.

Ollama solves these problems by making it simple to run state-of-the-art Large Language Models directly on your own hardware. This open-source tool streamlines downloading, running, and managing LLMs on macOS, Windows, or Linux, putting you in complete control of your AI development stack.

This tutorial will get you up and running with Ollama quickly. You'll learn to install the tool, download your first model, and interact with it through both command line and Python. By the end, you'll have a fully operational local AI setup ready for building private, cost-effective applications.

Tutorial Goals

  • Understand the pros and cons of running AI models locally
  • Install Ollama and run your first AI model locally
  • Use the Ollama Python SDK to control AI models from Python

Why Run AI Models Locally?

Running AI models on your own machine offers several compelling advantages over cloud APIs:

Complete data privacy - Your sensitive documents, proprietary code, and user data never leave your machine. Perfect for healthcare apps, legal document analysis, or any application handling confidential information.

Zero ongoing costs - No per-token charges or usage limits. Once you have the hardware, experiment freely without worrying about API bills. Great for development, testing, or high-volume applications.

Works offline - Build AI features that function without internet. Essential for edge devices, remote locations, or applications that need guaranteed uptime regardless of network conditions.

Instant responses - Eliminate network latency for snappy user experiences (if you have the required hardware). Ideal for real-time code assistants, interactive chatbots, or applications where every millisecond matters.

Full control - No rate limits, no API changes breaking your app, and no vendor lock-in. You decide when to update models and how to configure them.

How to Get Started with Ollama

Getting Ollama running takes just a few minutes. Let's install it, grab a model, and start chatting.

Step 1: Installation

Install Ollama for your operating system. On macOS, Homebrew is the fastest way:

brew install --cask ollama

After installation, you'll see an Ollama icon in your menu bar - it's now running as a background service.
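
On Linux, the official install script from ollama.com sets up the same background service (Windows users can download the installer from ollama.com instead):

curl -fsSL https://ollama.com/install.sh | sh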

Verify everything worked by checking the version:

ollama --version
Output
ollama version is 0.9.6

Step 2: Download Your First Model

Now let's grab a capable model to work with. We'll use qwen3:4b - it's fast, smart, and perfect for getting started:

ollama pull qwen3:4b

This downloads the 4-billion-parameter Qwen3 model (about 2.6 GB) to your machine. It'll take a few minutes depending on your internet speed.

Step 3: Start Chatting

Time to test your new AI assistant. Start an interactive chat:

ollama run qwen3:4b

You'll see a >>> prompt. Ask it anything:

>>> Explain what a Large Language Model is in one sentence.
Output
A Large Language Model is a type of artificial intelligence system that has been trained on vast amounts of text data to understand and generate human-like text.

Useful commands:

  • ollama list - See all your downloaded models
  • /bye - Exit the chat session

After exiting the chat, confirm the model is installed with ollama list:

ollama list
Output
NAME                                      ID              SIZE      MODIFIED
qwen3:4b                                  2bfd38a7daaf    2.6 GB    30 seconds ago
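
Under the hood, Ollama also serves a local REST API on port 11434; it's what the Python SDK in the next section talks to. You can poke it directly with curl, hitting the /api/chat endpoint with streaming disabled so you get back a single JSON response:

curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:4b",
  "messages": [{"role": "user", "content": "Say hello in five words."}],
  "stream": false
}'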

Control AI Models with Python

The command line is great for quick tests, but real applications need to control AI models through code. The Ollama Python SDK makes this straightforward. Let's install it:

pip install ollama
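
Before building anything, you can sanity-check that the SDK reaches the local server by listing your installed models from Python. A small sketch (attribute names here follow recent versions of the SDK):

import ollama

# Should print the models you pulled earlier, e.g. qwen3:4b
for m in ollama.list().models:
    print(m.model)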

Your First AI-Powered Program

Open a new Jupyter Notebook and let's build a simple AI client. This example shows the core workflow: connect to Ollama, send a prompt, and get a response:

from ollama import ChatResponse, chat

# Send a single chat message to the local qwen3:4b model
response: ChatResponse = chat(
    model="qwen3:4b",
    messages=[
        {
            "role": "user",
            "content": "Explain what a Large Language Model is in one sentence.",
        },
    ],
    think=False,  # respond directly, without a visible reasoning step
)

# The reply text lives in response.message.content
print(response.message.content)
Output
A Large Language Model is a type of artificial intelligence system that has been trained on vast amounts of text data to understand and generate human-like text.

Here's what each part does:

  • Import the SDK - from ollama import ChatResponse, chat gives us the tools to talk to our local model
  • Call the model - chat() sends our prompt to the qwen3:4b model running locally
  • Structure the conversation - The messages format with role and content is the standard way to chat with AI models
  • Get the response - response.message.content contains the AI's answer
  • Set thinking mode - think=False tells the model to respond directly without showing its reasoning process
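
For longer replies, you'll often want to print text as the model generates it instead of waiting for the full response. The SDK supports this with stream=True, which turns chat() into an iterator of partial responses; a minimal sketch:

from ollama import chat

stream = chat(
    model="qwen3:4b",
    messages=[
        {"role": "user", "content": "Explain what a Large Language Model is in one sentence."},
    ],
    think=False,
    stream=True,
)

# Each chunk carries the next fragment of the reply
for chunk in stream:
    print(chunk.message.content, end="", flush=True)
print()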

That's it! You now have the foundation for building any AI-powered application. This programmatic control opens the door to more advanced systems, starting with learning how to craft effective prompts in the next tutorial on Prompt Engineering.

Where to Learn More?

  1. Ollama's Docs
  2. Ollama's GitHub Repository

Conclusion

You just broke free from cloud AI dependency! In a few minutes, you've set up your own local AI powerhouse. Here's what you can now do:

  • Run AI models locally - No more API keys, no more usage limits
  • Keep your data private - Everything stays on your machine
  • Build with Python - Integrate AI into any application
  • Experiment freely - Test ideas without worrying about costs

This is your foundation for serious local AI development. You have the engine running - now you need to learn how to drive it effectively. Ready to unlock your model's full potential? Next up: Prompt Engineering - where you'll master the art of getting what you want from your AI models.
