Large language models (LLMs) have proven to be a powerful way to interact with digital information in a conversational manner. LLMs allow anyone to easily retrieve a well-formatted and expert response on a variety of topics. They can supply recipes, help outline a rough draft, and much more.
As a software engineer, I'm interested in how LLMs can help write code.
Ollama
Most commercial models (think ChatGPT, Gemini, Copilot) run in the cloud. The benefit of running in the cloud is that the "cloud" is most likely a rack of Nvidia H100s: incredible (and expensive) GPUs with 80 GB of high-bandwidth memory each. With hardware like that, responses come back fast enough that the latency of your network connection is most of what you notice.
Conversely, Ollama was created as a tool for people to run LLMs on local hardware. The downside of running an LLM locally is that you probably don't have an 80 GB GPU.
Large language models are... well, large. In order to run a high-parameter model, you would ideally be able to load the entire model into your GPU's memory. If you don't have a GPU, then you probably won't be able to run many models efficiently (or at all).
If you're lucky enough to have a newer MacBook with lots of unified memory, or a newer high-VRAM GPU, you can still get reasonably quick and useful responses out of a local LLM using Ollama.
To install on macOS, I use the Homebrew package:
brew install ollama
In order to run ollama locally, you can execute ollama serve. However, I would recommend running:
brew services start ollama
This command registers the formula as a service that launches at login. Any time you start your computer, the Ollama server will already be running and ready to accept HTTP requests.
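To confirm the server is up, you can hit it with curl; Ollama listens on localhost:11434 by default:
# should respond with "Ollama is running"
curl http://localhost:11434
# or ask for the server version as JSON
curl http://localhost:11434/api/version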
To pull your first model, peruse the Ollama model repo and run the following:
# ollama pull <model>:<tag>
ollama pull deepseek-r1:14b
Or, run the model directly (pull and execute):
# ollama run <model>:<tag>
ollama run deepseek-r1:14b
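Once a model has been pulled, you can also talk to it over the HTTP API mentioned above. Here is a minimal sketch of a request, assuming the deepseek-r1:14b model from the previous step and the default port; stream is set to false so the reply comes back as a single JSON object:
# generate a one-shot completion from the local model
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:14b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
Editor integrations, including the Neovim plugin below, speak to this same HTTP API under the hood.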
Error Downloading New Models
In some cases (such as behind a corporate proxy), you may need to stop the brew service and run the Ollama server manually in order to download new models. This is probably a quirk of the background service not inheriting your shell's environment, proxy settings included.
brew services stop ollama && ollama serve
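If the proxy itself is the problem, it may also help to hand your proxy settings to the manually-run server. Ollama respects the standard HTTPS_PROXY variable; the URL below is just a placeholder:
# substitute your company's proxy for the placeholder URL
HTTPS_PROXY=http://proxy.example.com:8080 ollama serve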
Integrating with Neovim
Using a separate window from your text editor or IDE can sometimes feel like an interruption to your workflow. I interact with Ollama-hosted models directly from Neovim, and have found the codecompanion.nvim plugin to be a promising integration.
You can set up a keyboard shortcut to run the :CodeCompanionChat command, which opens a new buffer inside Neovim. That buffer acts as a chat interface to the Ollama server, and you can easily yank and paste code and other text without leaving Neovim.
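Assuming the plugin is installed and configured to use its Ollama adapter, you can even jump straight into a chat from the shell:
# open Neovim and immediately run the chat command (requires codecompanion.nvim to be set up)
nvim -c 'CodeCompanionChat'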