LLM

10:32 PM, December 21, 2024 · Daniel Tompkins


Large language models (LLMs) have proven to be a powerful way to interact with digital information in a conversational manner. LLMs allow anyone to easily retrieve a well-formatted and expert response on a variety of topics. They can supply recipes, help outline a rough draft, and much more.

As a software engineer, I'm interested in how LLMs can help write code.

Ollama

Most commercial models (think ChatGPT, Gemini, Copilot) run in the cloud. The benefit of running in the cloud is that the "cloud" is most likely a series of Nvidia H100s. These are incredible (and expensive) GPUs with 80 GB of high-bandwidth memory. This essentially lowers the model's response time to the latency of your network connection.

Conversely, Ollama was created as a tool for people to run LLMs on local hardware. The downside of running an LLM on your local hardware is that you probably don't have an 80 GB GPU.

Large language models are... well, large. To run a high-parameter model, you would ideally be able to load the entire model into your GPU's memory; a 14-billion-parameter model quantized to 4 bits, for example, still needs roughly 7 to 9 GB of memory before accounting for context. If you don't have a GPU, then you probably won't be able to run many models efficiently (or at all).

If you're lucky enough to have a newer MacBook (with lots of unified memory) or a newer high-VRAM GPU, you can still get reasonably quick and useful responses out of a local LLM using Ollama.

To install on macOS, I use the Homebrew package:

brew install ollama

To run Ollama locally, you can execute ollama serve. However, I would recommend running:

brew services start ollama

This command registers the formula to launch at login. Any time you start your computer, you will be able to make HTTP requests to the Ollama server.
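As a quick sanity check, you can hit the server's generate endpoint from the terminal. This sketch assumes Ollama's default port of 11434 and that a model such as deepseek-r1:14b has already been pulled (covered below):

# Send a prompt to the locally running Ollama server (default port 11434).
# Assumes deepseek-r1:14b has already been pulled.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:14b",
  "prompt": "Write a one-line description of Ollama.",
  "stream": false
}'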

To pull your first model, peruse the Ollama model repo and run the following:

# ollama pull <model>:<tag>
ollama pull deepseek-r1:14b

Or, run the model directly (pull and execute):

# ollama run <model>:<tag>
ollama run deepseek-r1:14b
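Either way, it's worth confirming that the model actually landed on disk; ollama list shows everything installed locally:

# List locally installed models, their tags, and sizes
ollama list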

Error Downloading New Models

In some cases (like behind a company proxy), you may need to stop the brew service and run the Ollama server manually in order to download new models. This is probably a bug or a quirk.

brew services stop ollama && ollama serve
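If downloads still fail behind the proxy, it may help to point the manually started server at the proxy explicitly. As I understand it, Ollama respects the standard HTTPS_PROXY environment variable; the address below is just a placeholder:

# Placeholder proxy address; substitute your company's proxy
HTTPS_PROXY=http://proxy.example.com:8080 ollama serve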

Integrating with Neovim

Using a separate window from your text editor or IDE can sometimes feel like an interruption to your workflow. I interact with the Ollama LLM from Neovim, and have found the codecompanion.nvim plugin to be a promising integration.

You can set up a keyboard shortcut to run the :CodeCompanionChat command, which opens a new buffer inside Neovim. That buffer acts as a chat interface to the Ollama server, so you can easily yank and paste code and other text without leaving Neovim.