My fully offline AI-assisted Linux development machine

By Deepu K Sasidharan on May 11, 2026. Originally published on DEV.to.
Originally published at deepu.tech.

One of my most popular posts of all time was when I wrote about my beautiful Linux development machine in 2019. I followed that up in 2021 with my sleek and modern Linux development machine. Since then, a lot has changed in my setup.

I moved from Fedora and KDE to a mostly vanilla Arch Linux setup. I moved from a traditional desktop environment to niri, a scrolling Wayland compositor. And of course, like every developer out there, my workflow now has AI in it. But this time, I wanted something a bit different: AI-assisted development that can run fully offline on my own machine.

Yes, local AI coding on Linux. What is not to love here?

This is not a tutorial on how to reproduce every single bit of my setup. My full personal configuration is private because it has too much machine-specific and personal stuff. But I'm making a stripped-down public version with the bare minimum needed for Arch, niri, DMS, OpenCode, and llama.cpp at deepu105/archdots.

This post is more about the current shape of my Linux development machine and why I ended up with this stack.

This is my primary machine for everything described below.

Machine configuration

The configuration of the machine is quite crucial for this setup. Running a browser, a few IDEs, Docker, terminals, and local LLMs is not exactly a light workload.

My current machine is an ASUS ROG Flow Z13 2025 model. It is a weird little beast. It is technically a tablet, but it has enough CPU, GPU, and memory to behave like a mobile workstation.

Here is the current setup.

The memory is the most interesting part here. For normal development work, 32GB is still fine and 64GB is great. But for local AI work, memory changes everything. A 27B quantized model, a large context window, Docker, Chrome, and an editor can happily eat memory like there is no tomorrow.

Having that much unified memory means the machine can run a useful local coding model without feeling like a science experiment. That is a big deal.
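As a rough sanity check on why memory matters so much, here is the back-of-the-envelope math for the weights of a 27B model at Q8_0. The ~8.5 bits per weight figure is the standard Q8_0 block layout (8-bit weights plus a shared scale), so treat the result as an estimate:

```shell
# Estimate weight memory for a 27B model quantized to Q8_0.
# Assumption: Q8_0 uses ~8.5 bits (~1.06 bytes) per parameter.
params_b=27           # parameters, in billions
bytes_per_param=106   # ~1.06 bytes/param, scaled by 100 for integer math
echo "$(( params_b * bytes_per_param / 100 )) GB and up, for the weights alone"
```

That lines up with the roughly 27 GiB Q8_0 file in the benchmark section further down, and it is before the KV cache for a large context window claims its share.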

Operating system

I praised Fedora in the previous posts, and I still think Fedora is one of the best Linux distributions for most developers. Updates are smooth, new packages land often, and it mostly stays out of the way.

But this time I went with vanilla Arch Linux. So yes, I use Arch btw! 😉 I know, rolling release and all that. I have been using Linux long enough to know what I was signing up for.

The main reason was simple: I wanted the latest kernel, Mesa, ROCm-adjacent bits, Wayland tools, and desktop packages without waiting for the next distro release. New hardware like the Flow Z13 usually benefits from being closer to the bleeding edge. Arch gives me that. Well, OK, I also fell in love with the sexy new compositors like niri and Hyprland, and Arch is a great way to run those without waiting for backports. I started with Hyprland, but I ended up liking niri better for my workflow, and Arch made it easy to switch and experiment.

My installation is still fairly boring, and I mean that as a compliment.

I also use Topgrade to keep the system updated. My private config even wires it into DankMaterialShell, so I can see available updates from the bar and trigger an update in Kitty for everything on the system: pacman/AUR, brew, cargo, npm, VS Code plugins, Docker images, and so on.

Again, quite simple, at least in my eyes.

Desktop environment, or lack of one

This is probably the biggest change from my previous setup. I no longer run GNOME or KDE as my main desktop. I use niri, which is a scrollable tiling Wayland compositor.

If you have not used niri, the workflow is quite different from a regular tiling window manager. Instead of forcing everything into a fixed grid, windows live in columns and you scroll horizontally across them. It sounds odd until it clicks. Once it clicks, it feels very natural on ultrawide monitors and laptop displays. I especially love the touchpad gestures for switching workspaces and moving windows around. It is a very fluid way to manage windows.

Scrolling workspaces in niri

My current session looks like this.

Niri and DMS

Niri gives me the compositor. DMS gives me the desktop shell pieces that I would otherwise have to stitch together myself.

DMS replaces a lot of the usual Wayland desktop plumbing:

This is the kind of stuff where I do not want to maintain five different tools and a bunch of scripts if one project does the job well enough. DMS is still young, but it is already quite useful, especially with niri. It's also quite extensible, and I have already started adding tools that I want, like a locally saved TODO widget.

The Flow Z13 also needs some special handling. I have fixes for ASUS hotkeys, touchpad behavior, keyboard backlight, Thunderbolt rescans, and Wi-Fi quirks in my private config. The public archdots repo will only carry the reusable bits. This is Linux on new hardware, so of course there are quirks. What is a Linux experience without glitches, right?

Development tools

My development tools are still mostly boring, in a good way. These are subjective choices, and they do not matter as long as you are comfortable with your tools.

My development tools

Shell: I use Zsh with zinit, Powerlevel10k, zoxide, and fzf. I still use a bunch of aliases for Git, Docker, package management, Jekyll, and local AI tools.

Terminal: I use Kitty. I have tabs, splits, clipboard bindings, quick access terminal, and a few custom keybindings. It is fast, it works well on Wayland, and it does not get in my way.

Editors: I use Neovim with LazyVim as my default editor. I still use Visual Studio Code depending on the project and what I am testing.

Toolchains: I use SDKMAN! for JDKs, NVM for Node.js, rustup for Rust, Bun, Go, Python, Deno, and the usual Linux build tools.

DevOps: Docker, Docker Compose, kubectl, kdash, Terraform, Distrobox, and so on. Some come from pacman or AUR, some from Homebrew, and some from language-specific installers.

Offline AI-assisted development

Now to the fun part.

I use cloud AI tools as well, and they are useful. But I also wanted a setup where I can code with an AI assistant without sending code, prompts, logs, or half-written ideas to a remote API. Not because every project is secret, but because local-first tooling is a good capability to have, especially in a world that is heading towards techno-oligarchy.

My current stack is OpenCode as the coding agent, llama.cpp with ROCm as the local model server, and OpenRouter as a cloud backup for heavier tasks.

Here is my OpenCode provider config:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama.cpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama.cpp ROCm (local)",
      "options": {
        "baseURL": "http://127.0.0.1:18080/v1"
      },
      "models": {
        "qwen3-6-27b-q8-0": {
          "name": "Qwen3.6 27B Q8_0 (local ROCm)",
          "limit": {
            "context": 262144,
            "output": 16384
          }
        },
        "qwen3-6-27b-q6-k": ...,
        "qwen3-6-27b-q4-k-m": ...,
        "gemma-4-31b-it-q4-k-m": ...,
        "gemma-4-31b-it-q8-0": ...
      }
    },
    "openrouter": {
      "models": {
        "moonshotai/kimi-k2.6": {
          "name": "Kimi K2.6 (OpenRouter backup)",
          "limit": {
            "context": 262144,
            "output": 16384
          }
        },
        "deepseek/deepseek-v4-pro": {
          "name": "DeepSeek V4 Pro (OpenRouter backup)",
          "limit": {
            "context": 1048576,
            "output": 384000
          }
        }
      }
    }
  }
}

I start the local model server with an alias.

llamaServer

That points to a small script. It lets me pick a GGUF model, context size, and reasoning mode. It remembers the last choice, so most of the time I just start it and get going.
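The private script is more elaborate, but a minimal sketch of the same idea could look like this. The paths and the default model are placeholders I made up, and the llama-server flags mirror the invocation shown later in this post; in my real script an fzf menu picks the GGUF model first:

```shell
#!/usr/bin/env bash
# Minimal sketch of a llamaServer-style launcher (hypothetical paths and
# defaults; the real script adds an fzf picker and a reasoning prompt).
set -euo pipefail

state_file="${STATE_FILE:-$HOME/.cache/llama-last}"

build_cmd() {
  local model="$1" ctx="${2:-262144}" reasoning="${3:-on}"
  # Remember the choice so the next run can default to it.
  mkdir -p "$(dirname "$state_file")"
  printf '%s\n' "$model" > "$state_file"
  printf 'llama-server --model %s --ctx-size %s --host 127.0.0.1 --port 18080 --n-gpu-layers 999 --reasoning %s\n' \
    "$model" "$ctx" "$reasoning"
}

# Default to the remembered model, or a placeholder on the first run.
model="$(cat "$state_file" 2>/dev/null || echo "$HOME/models/Qwen3.6-27B-Q8_0.gguf")"
cmd="$(build_cmd "$model" 262144 on)"
echo "$cmd"
# eval "$cmd"    # the real script execs the server here
```

The reasoning toggle is the part I touch most often; flipping it per session from a prompt beats editing a long command every time.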

The default model and context right now are:

Qwen3.6-27B-Q8_0.gguf - 256k context

Here is a quick llama-bench comparison of the local models on my machine. The numbers are tokens per second with ROCm, full GPU offload, flash attention, f16 KV cache, a 4096-token prompt, a 256-token generation, and 3 repetitions.
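For reference, the invocation behind those numbers looks roughly like this. The flag names are real llama-bench options; the model path is a placeholder, and the sketch only prints the command so it is harmless to run anywhere:

```shell
# Assemble the llama-bench command for one model (printed, not executed).
model="${MODEL:-$HOME/models/Qwen3.6-27B-Q8_0.gguf}"   # placeholder path
args=(
  -m "$model"
  -ngl 999            # full GPU offload
  -fa 1               # flash attention
  -ctk f16 -ctv f16   # f16 KV cache
  -p 4096             # 4096-token prompt
  -n 256              # 256-token generation
  -r 3                # 3 repetitions
)
echo "llama-bench ${args[*]}"
# llama-bench "${args[@]}"   # run on the actual machine, with ROCm
```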

Model           Quantization   Size        Prompt tokens/s   Generation tokens/s
Qwen3.6 27B     Q4_K_M         15.40 GiB   260.06            10.41
Qwen3.6 27B     Q6_K           20.56 GiB   279.37            8.70
Qwen3.6 27B     Q8_0           26.62 GiB   260.12            7.18
Gemma 4 31B IT  Q4_K_M         17.39 GiB   209.57            9.12
Gemma 4 31B IT  Q8_0           30.38 GiB   202.31            6.19

The full context is 256k tokens. Here is a benchmark with full context for the Qwen variants.

Model           Quantization   Size        Prompt+Generation tokens/s
Qwen3.6 27B     Q4_K_M         15.40 GiB   67.15
Qwen3.6 27B     Q6_K           20.56 GiB   65.77
Qwen3.6 27B     Q8_0           26.62 GiB   64.34

Running Qwen3.6 27B Q8_0 with 256k context in reasoning mode loads around 70% of the GPU memory in my setup and gives around 64 tokens/s for prompt+generation. That is quite good for a local model with that much context.
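To make that 64 tokens/s concrete, here is the worst-case arithmetic for a prompt that actually fills the entire window:

```shell
# Time to push a completely full 256k-token context through at ~64 tokens/s.
ctx=262144
tps=64
echo "$(( ctx / tps )) seconds, roughly $(( ctx / tps / 60 )) minutes"
```

In practice sessions rarely come anywhere near the full window, so this is a ceiling, not a typical wait.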

The llama.cpp build is also automated with a small script.

cmake -S /mnt/work/Workspace/llms/llama.cpp \
  -B /mnt/work/Workspace/llms/llama.cpp/build-hip \
  -G Ninja \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS=gfx1151 \
  -DCMAKE_BUILD_TYPE=Release

cmake --build /mnt/work/Workspace/llms/llama.cpp/build-hip \
  --config Release \
  -j "$(nproc)" \
  --target llama-server llama-bench

The server runs like this under the hood.

ROCBLAS_USE_HIPBLASLT=1 llama-server \
  --model "$model" \
  --alias "$alias_name" \
  --host 127.0.0.1 \
  --port 18080 \
  --ctx-size "$ctx" \
  --n-gpu-layers 999 \
  --flash-attn on \
  --no-mmap \
  --cache-type-k f16 \
  --cache-type-v f16 \
  --batch-size 4096 \
  --ubatch-size 512 \
  --reasoning "$reasoning"

Once the server is running, OpenCode talks to it like it would talk to any OpenAI-compatible provider. The difference is that the whole loop stays on my machine.
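If you want to see that loop without OpenCode in the middle, plain curl against the same endpoint works. This sketch assumes the server was started with the qwen3-6-27b-q8-0 alias from the config above:

```shell
# Chat completion against the local OpenAI-compatible endpoint.
payload='{
  "model": "qwen3-6-27b-q8-0",
  "messages": [{"role": "user", "content": "Write a haiku about Arch Linux."}]
}'
# --max-time keeps this from hanging when the server is down.
curl -s --max-time 5 \
  -H 'Content-Type: application/json' \
  -d "$payload" \
  http://127.0.0.1:18080/v1/chat/completions || echo "llama-server is not running"
```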

It's very elegant IMO!

I do not only use local models, though. For complex tasks, I also use frontier models through OpenRouter, mostly Kimi K2.6 and DeepSeek V4. Occasionally I use Copilot CLI, and at work I use Claude Code as well.

For the harness, I prefer OpenCode. I do not see any noticeable performance difference between Claude Code and OpenCode with Kimi or DeepSeek for the kind of coding tasks I do, which is mostly open source projects in Rust and TypeScript. That might vary for other people, of course, but for me OpenCode has been quite good and I especially prefer its UX over others. I'm trying Pi on the side as well to see if I keep it in the mix.

Why local AI coding matters to me

Local AI is not a replacement for everything. The best hosted models are still better for many tasks, especially when you need maximum reasoning quality or very fast responses. But local models have their own sweet spot.

For me, the advantages are clear.

But there are tradeoffs.

So no, I do not think everyone should run a local coding model. But if you enjoy owning your stack and you have the hardware for it, it is a very satisfying setup.

The AI workflow

My usual workflow is quite simple.

  1. Start the local model server with llamaServer.
  2. Pick the model and context preset if I want to change it.
  3. Start opencode in the repository and pick a model if I want to change it.
  4. Ask it to inspect the codebase before making changes.
  5. Let it edit, test, and iterate, while I review the changes remotely through the opencode-telegram-bot on Telegram.

For small tasks, I turn reasoning off because it makes tool-heavy work faster. For design questions, debugging, or code review, I turn reasoning on. The script makes that a prompt instead of forcing me to remember a long command.

This is the kind of boring automation I like. It removes friction without hiding what is actually happening.

Productivity and media tools

Most of my productivity stack did not change much.

Browser: Google Chrome is still my primary browser. I also keep Firefox around.

Password management: I use Bitwarden and a YubiKey.

Communication: Zoom, Signal, Telegram, and the usual suspects.

Screen capture: DMS screenshot plugin, screen recorder plugin, and OBS Studio when I need more control.

Images and video: Gimp, Inkscape, Kdenlive, and a few Flatpak utilities like Upscayl and Buzz.

File manager: Dolphin, because KDE apps are still excellent even when KDE is not my main desktop.

What is still not perfect

Of course, not everything is perfect. This is bleeding-edge Linux, on a new ASUS convertible, with a new AMD chip, a Wayland compositor, and a local AI stack. If everything worked perfectly on day one, I would be suspicious.

Some current rough edges are below.

None of these are deal breakers for me. Most are either already fixed in my private config or on my TODO list.

Conclusion

This is easily the most interesting Linux machine I have used so far. My 2019 setup was beautiful, my 2021 setup was sleek, and this one feels like a proper local-first AI development workstation.

Vanilla Arch gives me the latest bits. Niri gives me a workflow that fits both the tiny built-in screen and my ultrawide monitor. DMS gives me the desktop polish without a full desktop environment. And OpenCode plus llama.cpp gives me an AI coding assistant that can run without the cloud.

It is not the right setup for everyone. If you want a machine that never asks you to think about kernels, ROCm, compositor configs, or model files, this is probably not it. But for me, this is exactly the kind of developer machine that sparks joy.

The right tool for the right job.

If you like this article, please leave a like or a comment.

You can follow me on Bluesky and LinkedIn.