Run AI Models Locally - Ollama Quickstart

Get started with local AI development. Learn to install and use Ollama to run powerful AI models on your own machine for enhanced privacy, speed, and cost-efficiency.

Cloud-based AI APIs are powerful, but they're not always the right choice. Sending sensitive data to third-party servers raises privacy concerns, recurring API costs can become unpredictable, and network latency can slow down your applications. For many developers, these limitations are deal-breakers.

Ollama solves these problems by making it simple to run state-of-the-art Large Language Models (LLMs) directly on your own hardware. This open-source tool streamlines downloading, running, and managing LLMs on macOS, Windows, or Linux, putting you in complete control of your AI development stack.

This tutorial will get you up and running with Ollama quickly. You'll learn to install the tool, download your first model, and interact with it through both command line and Python. By the end, you'll have a fully operational local AI setup ready for building private, cost-effective applications.

Tutorial Goals

Understand the pros and cons of running AI models locally
Install Ollama and run your first AI model locally
Use the Ollama Python SDK to control AI models from Python

Why Run AI Models Locally?

Running AI models on your own machine offers several compelling advantages over cloud APIs:

• Complete data privacy - Your sensitive documents, proprietary code, and user data never leave your machine. Perfect for healthcare apps, legal document analysis, or any application handling confidential information.

• No per-token costs - No per-token charges or usage limits. Once you have the hardware, you can experiment freely without worrying about API bills. Great for development, testing, or high-volume applications.

• Works offline - Build AI features that function without internet. Essential for edge devices, remote locations, or applications that need guaranteed uptime regardless of network conditions.

• Lower latency - Eliminate network round-trips for snappy user experiences (if you have the required hardware). Ideal for real-time code assistants, interactive chatbots, or applications where every millisecond matters.

• Full control - No rate limits, no API changes breaking your app, and no vendor lock-in. You decide when to update models and how to configure them.

How to Get Started with Ollama

Hardware Requirements

Running AI models locally isn't exactly free. You need decent hardware (at least 16GB of RAM and 8GB of VRAM) to run the models. Want to run state-of-the-art models? Some of the larger models need 160GB+ of VRAM, even in compressed (quantized) format.
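A quick way to estimate whether a model fits your hardware is a rough rule of thumb: weight size in bytes ≈ parameters x bits per weight / 8. This is our own approximation, not an Ollama formula. The sketch below applies it. It ignores the context (KV) cache and runtime overhead, so treat the results as lower bounds.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough size of a model's weights in gigabytes (1 GB = 1e9 bytes).

    Ignores the KV cache, activations, and runtime overhead, so actual
    memory use will be somewhat higher than this estimate.
    """
    bytes_total = num_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


# Compare unquantized 16-bit weights with 4-bit quantized weights.
for params, label in [(4e9, "4B"), (70e9, "70B")]:
    fp16 = estimate_weight_memory_gb(params, 16)
    q4 = estimate_weight_memory_gb(params, 4)
    print(f"{label}: ~{fp16:.0f} GB at 16-bit, ~{q4:.0f} GB at 4-bit")
# 4B: ~8 GB at 16-bit, ~2 GB at 4-bit
# 70B: ~140 GB at 16-bit, ~35 GB at 4-bit
```

By this estimate, a 4-billion-parameter model at 4 bits needs roughly 2 GB for its weights. That is in the same ballpark as the 2.6 GB that `ollama list` reports for qwen3:4b later in this tutorial. Real quantization formats typically spend a little more than the nominal bit width per weight.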

Getting Ollama running takes just a few minutes. Let's install it, grab a model, and start chatting.

Step 1: Installation

Install Ollama for your operating system. Installers for macOS, Windows, and Linux are available on the Ollama website.

On macOS, Homebrew is the fastest way:

brew install --cask ollama

After installation, launch the Ollama app. An Ollama icon appears in your menu bar, and Ollama keeps running as a background service.

Verify everything worked by checking the version:

ollama --version

Output

ollama version is 0.9.6
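The background service is a local HTTP server, by default at http://localhost:11434. You can also confirm it is running from Python. The sketch below calls the server's `/api/version` endpoint using only the standard library. The helper name `get_ollama_version` is hypothetical, and it assumes the default host and port.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_HOST = "http://localhost:11434"  # Ollama's default local address


def get_ollama_version(host: str = OLLAMA_HOST, timeout: float = 2.0) -> Optional[str]:
    """Return the server's version string, or None if it can't be reached."""
    try:
        with urllib.request.urlopen(f"{host}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (OSError, ValueError):
        # OSError covers connection refused, timeouts, and HTTP errors;
        # ValueError covers a reply that isn't valid JSON.
        return None


version = get_ollama_version()
if version:
    print(f"Ollama server is running, version {version}")
else:
    print("Ollama server not reachable; is the app or `ollama serve` running?")
```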

Step 2: Download Your First Model

Now let's grab a capable model to work with. We'll use qwen3:4b - it's fast, smart, and perfect for getting started:

ollama pull qwen3:4b

This downloads the 4-billion parameter Qwen3 model to your machine. It'll take a few minutes depending on your internet speed.
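Model references like `qwen3:4b` follow a `name:tag` pattern. Here the tag selects the 4B variant, and if you omit the tag, Ollama assumes `latest`. The hypothetical helper below is only an illustration of that naming rule, not part of Ollama. It splits a reference into its two parts.

```python
from __future__ import annotations


def parse_model_ref(ref: str) -> tuple[str, str]:
    """Split an Ollama-style model reference into (name, tag).

    Only the last path segment is searched for a tag, so a namespace
    prefix such as "someuser/" stays part of the name.
    """
    prefix, slash, last = ref.rpartition("/")
    name, _, tag = last.partition(":")
    return prefix + slash + name, tag or "latest"  # default tag is "latest"


print(parse_model_ref("qwen3:4b"))  # ('qwen3', '4b')
print(parse_model_ref("qwen3"))     # ('qwen3', 'latest')
```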

Pro Tip

Want to try more models? Hundreds are available at the Ollama library for free. Be sure to check their licenses.

Step 3: Start Chatting

Time to test your new AI assistant. Start an interactive chat:

ollama run qwen3:4b

You'll see a >>> prompt. Ask it anything:

>>> Explain what a Large Language Model is in one sentence.

Output

A Large Language Model is a type of artificial intelligence system that has been trained on vast amounts of text data to understand and generate human-like text.
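Under the hood, the CLI talks to the local Ollama server over HTTP. The sketch below sends the same prompt to the server's `/api/generate` endpoint with only the Python standard library. The helper names are hypothetical. It assumes the default address, and the `think` field assumes a recent Ollama version that supports thinking models.

```python
from __future__ import annotations

import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # default address of the local server


def build_generate_request(model: str, prompt: str, host: str = OLLAMA_HOST) -> urllib.request.Request:
    """Build a non-streaming POST request for the /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
        "think": False,   # ask thinking models to skip the visible reasoning step
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, host: str = OLLAMA_HOST) -> str | None:
    """Send the prompt and return the model's text, or None on failure."""
    request = build_generate_request(model, prompt, host)
    try:
        with urllib.request.urlopen(request, timeout=120) as resp:
            return json.load(resp)["response"]  # generated text lives here
    except (OSError, ValueError, KeyError):
        return None


answer = generate("qwen3:4b", "Explain what a Large Language Model is in one sentence.")
print(answer if answer is not None else "Could not reach the Ollama server.")
```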

Useful commands:

ollama list - See all your downloaded models
/bye - Exit the chat session

ollama list

Output

NAME                                      ID              SIZE      MODIFIED
qwen3:4b                                  2bfd38a7daaf    2.6 GB    30 seconds ago
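The same information is available programmatically. The sketch below reads the server's `/api/tags` endpoint, which lists downloaded models, and prints a table similar to `ollama list`. The helper names are hypothetical. The size formatting uses decimal gigabytes, which is our assumption about how the CLI rounds.

```python
from __future__ import annotations

import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # default address of the local server


def format_size(num_bytes: int) -> str:
    """Format a byte count in decimal GB or MB, e.g. 2_600_000_000 -> '2.6 GB'."""
    if num_bytes >= 1e9:
        return f"{num_bytes / 1e9:.1f} GB"
    return f"{num_bytes / 1e6:.0f} MB"


def list_local_models(host: str = OLLAMA_HOST) -> list[tuple[str, str]]:
    """Return (name, size) pairs for downloaded models, or [] if unreachable."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
            models = json.load(resp).get("models", [])
    except (OSError, ValueError):
        return []
    return [(m["name"], format_size(m["size"])) for m in models]


for name, size in list_local_models():
    print(f"{name:<40} {size}")
```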

Control AI Models with Python

Project Setup

You can find the complete code on GitHub: Ollama Quickstart Notebook

The command line is great for quick tests, but real applications need to control AI models through code. The Ollama Python SDK makes this straightforward. Let's install it:

pip install ollama
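To confirm the install worked, you can check the installed package version with the standard library's `importlib.metadata`. The helper name below is hypothetical. If `chat()` later rejects the `think` argument, upgrading with `pip install -U ollama` is a reasonable first step.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str | None:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


sdk_version = installed_version("ollama")
if sdk_version:
    print(f"ollama SDK {sdk_version} is installed")
else:
    print("ollama SDK not installed; run: pip install ollama")
```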

Your First AI-Powered Program

Open a new Jupyter Notebook (or any Python file) and build a simple AI client. This example shows the core workflow: connect to the local Ollama server, send a prompt, and get a response:

from ollama import ChatResponse, chat
 
response: ChatResponse = chat(
    model="qwen3:4b",
    messages=[
        {
            "role": "user",
            "content": "Explain what a Large Language Model is in one sentence.",
        },
    ],
    think=False,
)
 
print(response.message.content)

Output

A Large Language Model is a type of artificial intelligence system that has been trained on vast amounts of text data to understand and generate human-like text.

Here's what each part does:

Import the SDK - from ollama import ChatResponse, chat gives us the tools to talk to our local model
Call the model - chat() sends our prompt to the qwen3:4b model running locally
Structure the conversation - The messages format with role and content is the standard way to chat with AI models
Get the response - response.message.content contains the AI's answer
Set thinking mode - think=False tells the model to respond directly without showing its reasoning process
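Because `messages` is just a list of role/content dictionaries, a multi-turn conversation means appending each exchange and sending the whole list again. The helper below is a hypothetical sketch in plain Python that needs no SDK. The roles it accepts reflect Ollama's chat format, which also includes a `tool` role for tool calling.

```python
from __future__ import annotations

# Roles accepted by this sketch, following Ollama's chat message format.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def add_message(history: list[dict[str, str]], role: str, content: str) -> list[dict[str, str]]:
    """Return a new history list with one more message appended."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unexpected role: {role!r}")
    return history + [{"role": role, "content": content}]


history = []
history = add_message(history, "system", "Answer in one sentence.")
history = add_message(history, "user", "What is a Large Language Model?")
# After chat(model="qwen3:4b", messages=history, think=False) returns,
# append its reply so the next turn carries the earlier context:
history = add_message(history, "assistant", "<the model's previous answer>")
history = add_message(history, "user", "Give me one example of one.")

for message in history:
    print(f"{message['role']:>9}: {message['content']}")
```

The model keeps no memory between calls, so passing the full `history` as the `messages` argument to `chat()` is what gives it the earlier turns as context.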

That's it! You now have the foundation for building any AI-powered application. This programmatic control opens the door to more advanced systems, starting with learning how to craft effective prompts in the next tutorial on Prompt Engineering.

Conclusion

You just broke free from cloud AI dependency! In a few minutes, you've set up your own local AI powerhouse. Here's what you can now do:

Run AI models locally - No more API keys, no more usage limits
Keep your data private - Everything stays on your machine
Build with Python - Integrate AI into any application
Experiment freely - Test ideas without worrying about costs

This is your foundation for serious local AI development. You have the engine running - now you need to learn how to drive it effectively. Ready to unlock your model's full potential? Next up: Prompt Engineering - where you'll master the art of getting what you want from your AI models.
