
Step 3b: Deep Dive into Local LLMs & Ollama

In the previous step, we talked about giving our Agent a “Brain”. Usually, developers use cloud services like OpenAI (ChatGPT) or Google Gemini. However, there is a powerful alternative: Running the AI locally on your own computer.

This guide explains what that means, why it matters, and how to do it using a tool called Ollama.


1. Cloud vs. Local AI: What’s the difference?

Imagine you need to solve a complex math problem. You have two options:

Option A: Call a Genius Friend (Cloud AI)

You phone someone far smarter than you and read them the problem. The answer is usually excellent, but you need a working connection (internet), the call may cost money (API keys and usage fees), and your friend hears everything you ask (your data leaves your computer).

Option B: Read a Textbook Yourself (Local AI)

You work it out from the books on your own shelf. It works offline, costs nothing per question, and nothing leaves your room, but the quality of the answer is limited by the books you own (the model size your hardware can run).


2. What is Ollama?

Running an AI model directly (using Python libraries such as PyTorch or Hugging Face Transformers) is complicated: it means installing large dependencies and GPU drivers and writing a fair amount of code yourself.

Ollama is a tool that simplifies this. Think of it like a “Video Game Player” for AI Models.

You just tell Ollama “Run Llama 3”, and it handles all the complex math and hardware setup for you.


3. Step-by-Step Setup

Step 1: Install Ollama

  1. Go to ollama.com.
  2. Download the version for your OS (Mac, Windows, or Linux).
  3. Install it like any normal application.

Step 2: “Pull” (Download) a Model

In the terminal (or Command Prompt on Windows), you need to download a brain. Ollama calls this “pulling” a model. One of the most popular openly available models is Llama 3 (created by Meta).

Run this command:

ollama pull llama3

Note: The default llama3 tag is the 8-billion-parameter model; the download is about 4.7 GB of compressed (quantized) neural network weights.

Step 3: Test it

Once downloaded, you can chat with it directly in your terminal:

ollama run llama3

You will see a prompt. Type “Hi, who are you?” and it should reply! (Type /bye to exit).


4. Connecting Python to Ollama

Now that Ollama is running, how does our Python code talk to it?

Ollama runs a local HTTP server in the background on port 11434. It listens for requests, just like a website does.
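
You can check this yourself before writing any agent code. Here is a small sketch, using only the standard library and assuming Ollama is running at its default address, that asks the server which models it has installed (Ollama's built-in /api/tags endpoint):

import urllib.request
import json

# Ask the local Ollama server which models are available
url = "http://localhost:11434/api/tags"

with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode('utf-8'))

# Each entry describes one downloaded model (e.g. "llama3:latest")
for model in data.get('models', []):
    print(model['name'])

If this prints llama3, the server is up and the code in the next section will be able to reach it.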

The Code Explanation

In agent_core.py, we added this function:

import urllib.request
import json

def _query_ollama(self, user_input):
    # 1. The Address: We look for Ollama on our own computer (localhost)
    url = "http://localhost:11434/api/generate"
    
    # 2. The Message: We prepare the data (Model name + Prompt)
    data = {
        "model": "llama3",       # The specific brain to use
        "prompt": user_input,    # What we want it to do
        "stream": False          # False = Wait for full answer, True = Typewriter effect
    }
    
    # 3. Sending the Letter: We convert data to JSON and send it
    req = urllib.request.Request(
        url,
        data=json.dumps(data).encode('utf-8'),
        headers={'Content-Type': 'application/json'}  # tell the server we are sending JSON
    )
    
    # 4. Reading the Reply
    with urllib.request.urlopen(req) as response:
        result = json.loads(response.read().decode('utf-8'))
        return result.get('response', '')
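
For reference, here is a rough sketch of what the streaming variant ("stream": True) could look like. It is not part of agent_core.py; it assumes the same endpoint and simply reads the newline-delimited JSON chunks that Ollama sends back, printing each fragment as it arrives:

import urllib.request
import json

url = "http://localhost:11434/api/generate"
data = {"model": "llama3", "prompt": "Hi, who are you?", "stream": True}

req = urllib.request.Request(
    url,
    data=json.dumps(data).encode('utf-8'),
    headers={'Content-Type': 'application/json'}
)

with urllib.request.urlopen(req) as response:
    # With streaming on, Ollama sends one JSON object per line
    for line in response:
        chunk = json.loads(line.decode('utf-8'))
        print(chunk.get('response', ''), end='', flush=True)
        if chunk.get('done'):
            break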

Why use urllib instead of import ollama?

There is an official Python client library for Ollama (a sketch of it follows this list for comparison), but using standard tools (urllib) is better for learning because:

  1. No Installation: It works on every Python setup without pip install.
  2. Transparency: You see exactly how the “API Request” works (URL + Data -> Response).
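
For comparison, the same request through the official client would look roughly like the sketch below. This is illustrative only: it assumes you have run pip install ollama, and the exact return type varies between library versions.

# Rough equivalent using the official client (pip install ollama).
import ollama

result = ollama.generate(model='llama3', prompt='Hi, who are you?')

# Depending on the library version the result is a dict or a typed object;
# both expose the generated text under 'response'.
print(result['response'])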

5. Summary

By adding Ollama to our Education Assistant, we created a Hybrid Agent:

  1. It tries the Cloud (Gemini/OpenAI) first, because those models are generally the most capable.
  2. If that fails (no keys/internet), it switches to Local (Ollama/Llama3).
  3. If even that fails, it falls back to Regex (simple rules).

This makes your application incredibly robust and accessible!
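
To make that fallback order concrete, here is a minimal sketch of how the chain can be structured. The helper names _query_cloud and _query_regex are illustrative placeholders; only _query_ollama was shown above, and your own agent_core.py may organise this differently.

def respond(self, user_input):
    # 1. Cloud first: usually the most capable, but needs API keys and internet
    try:
        return self._query_cloud(user_input)
    except Exception:
        pass  # no keys, no network, or the service rejected the request

    # 2. Local second: requires Ollama running on port 11434 with a pulled model
    try:
        return self._query_ollama(user_input)
    except Exception:
        pass  # Ollama is not installed or not running

    # 3. Last resort: simple regex rules that always work
    return self._query_regex(user_input)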