tl;dr
- Ollama: CLI + API interface, open source, technical users, Python client
- GPT4All: User-friendly GUI, open source, local RAG support
- LM Studio: Polished GUI with user modes, performance metrics, extensive model options
- Jan: Minimalist interface, open source, flexible model importing, developer-friendly
Text completion has been a challenging task for decades. Predictive-text tools such as T9 were commonly used on mobile phones, but they were not very powerful and had a limited vocabulary that only expanded based on the user's input.
LLMs (large language models) have made a huge leap in the field of text completion. They are able to generate high-quality completions, corrections, or translations that are sometimes hard to distinguish from human-written text. The technology gained a lot of attention after the release of ChatGPT in November 2022, although the technology that powered it had been available two years earlier.
The evolution of the most recent LLMs continues mostly behind closed doors, but there are several efforts to bring the technology to the local level. Over the past few years, several projects have created sparks of hope for the future of local LLMs, most notably LLaMA, Mistral 7B, and DeepSeek R1, which can be found in the Chatbot Arena LLM Leaderboard.
There are several limitations to running LLMs, such as the need for a powerful GPU and the tooling required to run them. Nvidia is currently the leading GPU provider thanks to its CUDA technology, but there are also efforts to bring the technology to other platforms, such as AMD's ROCm and Apple's MLX.
The release of "small" language models with 1B to 3B parameters and methods such as quantization have made it possible to run LLMs on local hardware where no GPU is available.
The bubble of projects built around LLMs has created a lot of opportunities for developers and researchers to get started with the technology. In this article, I want to highlight some of the tools and resources that are available to get started with local LLMs in 2025.
Ollama
Ollama is a command-line tool for running LLMs locally, available for macOS, Linux, and Windows. The best way to get started is on the download page, where you can find the binaries for your operating system.
After installing Ollama, you should be able to run the ollama command:
This will show you the available commands and options. You should start by downloading a model from the list of available models; for example, gemma2:2b or llama3.2:1b are good choices even when you don't have a GPU available:
ollama pull gemma2:2b
After downloading the model, you can interact with it using the following command:
ollama run gemma2:2b
You can also expose the model as a REST API using the ollama serve command and interact with it using HTTP requests or the ollama client, which is available on PyPI.
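If you prefer Python, here is a minimal sketch of how this might look with the ollama client; it assumes the Ollama server is running locally on its default address and that you have already pulled gemma2:2b as shown above:

```python
# pip install ollama
import ollama

# Assumes `ollama serve` (or the desktop app) is running locally and
# that gemma2:2b has already been pulled as shown above.
response = ollama.chat(
    model="gemma2:2b",
    messages=[{"role": "user", "content": "Explain quantization in one sentence."}],
)

print(response["message"]["content"])
```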
GPT4All
This platform is better suited for users who prefer managing configurations through a graphical interface. It is available for macOS, Linux, and Windows. The best way to get started is on the download page, where you can find the binaries for your operating system.
After installing GPT4All, you should be able to start the application. The next thing you need to do is download a model:
Once you are done, you can switch to the chat tab, load the model, and interact with it:
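If you want to script GPT4All instead of using the GUI, the project also provides Python bindings on PyPI. Here is a minimal sketch; the model file name is only an example, so replace it with one from the model list in the application:

```python
# pip install gpt4all
from gpt4all import GPT4All

# The model file name is just an example; any model from the GPT4All
# model list works and is downloaded on first use.
model = GPT4All("Llama-3.2-1B-Instruct-Q4_0.gguf")

with model.chat_session():
    print(model.generate("Summarize what a local LLM is.", max_tokens=128))
```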
LM Studio
LM Studio is another graphical user interface (GUI) application, available for macOS, Linux, and Windows. The best way to get started is on the download page, where you can find the binaries for your operating system.
After installing LM Studio, you should be able to start the application. The first thing you need to do is download a model for the chat. You can do this by clicking on the gear icon in the bottom right corner and selecting the model you want to download:
Once you are done, you can switch to the chat tab, load the model, and interact with it:
The interface has multiple modes: User, Power User, and Developer. The User mode is the easiest to use, while the Developer mode gives you more control over the model, along with details about resource usage and performance.
Another feature is the ability to start a local development server that exposes the model as a REST API, which is compatible with the OpenAI API. This allows you to interact with the model using HTTP requests on your local machine.
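Once the server is running, any OpenAI-compatible client can talk to it. Here is a minimal sketch using the openai Python package; the port is LM Studio's default, and the model name is just an example that should match a model you have loaded:

```python
# pip install openai
from openai import OpenAI

# LM Studio's local server listens on port 1234 by default; the API key
# is not checked, but the client requires a value. The model name must
# match a model loaded in LM Studio.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = client.chat.completions.create(
    model="gemma-2-2b-instruct",
    messages=[{"role": "user", "content": "What can I do with a local LLM?"}],
)

print(completion.choices[0].message.content)
```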
Jan
Jan is available for macOS, Linux, and Windows. The best way to get started is on the download page, where you can find the binaries for your operating system. Jan is similar to LM Studio: you can use it for chat and development (with an OpenAI-compatible REST API), but it has a more minimalistic interface, and the source code is available on GitHub.
After the installation, you can use the model library to add a model and start chatting with it. It also has options to load models from Hugging Face or to import models you have already downloaded.
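Since Jan's server is also OpenAI-compatible, the interaction looks the same as with LM Studio; only the base URL and model identifier change. Here is a sketch using plain HTTP requests; the port and model name are assumptions, so check the local API server settings in Jan for the actual values:

```python
# pip install requests
import requests

# The port and model identifier are assumptions; enable Jan's local API
# server and check its settings for the values on your machine.
resp = requests.post(
    "http://localhost:1337/v1/chat/completions",
    json={
        "model": "llama3.2-1b-instruct",
        "messages": [{"role": "user", "content": "Write a haiku about local LLMs."}],
    },
)

print(resp.json()["choices"][0]["message"]["content"])
```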
Conclusion
Having a GPU allows you to get faster responses and run larger models, but it is not a requirement to get started with local LLMs.
For those just getting started, I recommend trying these models, which perform well on CPU-only hardware:
- LLaMA3.2 1B and 3B - Great balance of speed and capability, works well for general chat
- Gemma2 2B - Google's offering with impressive reasoning for its size
- Phi3 3.8B - Microsoft's model with strong performance on structured tasks
- DeepSeek R1 1.5B - A small model that uses chain-of-thought reasoning
This article shows some of the tools that are available to host and interact with local LLMs. The technology is evolving rapidly, and so are the tools for managing and interacting with it. The screenshots in the examples above were created using models loaded only on the CPU.