Code with HF Inference Providers and llama.cpp
Use the latest open weight models via Hugging Face Inference Providers:
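The launch command is not shown here; one plausible shape is sketched below. The subcommand and the absence of flags are assumptions based on the fast-agent executable described later in this document, so check `fast-agent --help` for the invocation your installed version actually supports.

```shell
# Sketch: start an interactive fast-agent session (invocation shape is an
# assumption, not taken from this document).
fast-agent go
```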
This starts fast-agent pre-configured with a coding agent and filesystem search sub-agent.
By default you will be prompted for the coding model to use, with gpt-oss-120b used for search.
To change these defaults use:
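The exact override syntax is not given here; a hypothetical example follows. The `--model` flag and the `provider.model-name` identifier format are assumptions for illustration only.

```shell
# Hypothetical: select the coding model up front instead of being prompted.
# The model identifier below is a placeholder, not a real model string.
fast-agent go --model my-provider.my-model
```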
The agent has a minimal system prompt and a set of tools for accessing the shell, filesystem and fast-agent services. The system prompt includes AGENTS.md if present. Customise the agent by modifying .fast-agent/agent-cards/dev.md.
Use /skills to discover, add, remove and update skills. Use /connect to connect to MCP Servers.
Use the compaction-strategies skill to set up your preferred compaction scheme (if any).
Installation
fast-agent requires Python 3.13 or above. Install with:
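The install command is omitted above; a sketch follows, assuming the package is published on PyPI as fast-agent-mcp and installed with uv's tool installer.

```shell
# Install the fast-agent CLI into an isolated uv tool environment
# (package name fast-agent-mcp is an assumption).
uv tool install fast-agent-mcp
```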
Or a specific version of python:
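With uv, a specific interpreter can be pinned for the tool environment via the `--python` flag; this sketch assumes the same fast-agent-mcp package name as above.

```shell
# Pin the interpreter version used for the tool's environment.
uv tool install fast-agent-mcp --python 3.13
```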
This installs the fast-agent executable.
llama.cpp
To use models hosted locally with llama.cpp, start llama-server with your chosen model and then run:
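The commands are not shown here; the sketch below is one plausible sequence. The llama-server flags (`-m`, `--port`) are standard llama.cpp options, but the model path is a placeholder and the `--url` flag on the fast-agent side is an assumption, so verify it against `fast-agent --help`.

```shell
# Serve a local GGUF model with llama.cpp's OpenAI-compatible server
# (model path is a placeholder).
llama-server -m ./my-model.gguf --port 8080

# Then point fast-agent at the local endpoint. The --url flag here is an
# assumption about the fast-agent CLI, not confirmed by this document.
fast-agent go --url http://localhost:8080/v1
```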
This reads the model parameters (e.g. context window size) from the llama.cpp server and configures the fast-agent model settings accordingly.

Create a model overlay file for future use, or start immediately with "Start now".