OpenClaw is free. The model behind it is where your money goes. The open-source assistant formerly known as Clawdbot (then Moltbot) runs on your own machine and talks to you over Telegram, Discord, or WhatsApp — but every message, heartbeat, and scheduled task is an LLM API call. Anthropic has blocked subscription-tier usage for OpenClaw, so the default path is metered frontier-API pricing, and an always-on agent burns tokens around the clock. The usual escapes are rate-limited free tiers or running Ollama on your own GPU.
There's a third option: OpenClaw treats the model as a config entry. Any OpenAI-compatible endpoint works as a custom provider — and the ParalonCloud inference API is exactly that, free during the early phase, serving Qwen 3.6 27B with native tool calling from a distributed fleet of independent GPU providers. Here's the complete setup, including the part most guides skip: making a 32k-context model genuinely livable in OpenClaw.
1. Install OpenClaw
curl -fsSL https://openclaw.ai/install.sh | bash
openclaw onboard --install-daemon
The onboarding wizard sets up the gateway and walks you through connecting a chat channel (Telegram via @BotFather is the classic first one). When it asks about a model provider, you can skip it — we're about to add our own.
2. Get a free API key
Sign in to the Console, go to API keys, click Create key, and copy the prlc_ token. No credit card — the API is free right now. Then confirm what the network is serving:
curl -s https://paraloncloud.com/v1/models \
  -H "Authorization: Bearer prlc_your_api_key"
{
  "data": [
    { "id": "qwen3.6-27b", "object": "model", "max_context": 33792 }
  ]
}
The id field is what goes in the config. Details in the authentication docs.
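
If you would rather check the id programmatically than by eye, here is a minimal Python sketch. It assumes the response body has the shape shown above; served_models and CONFIGURED_ID are hypothetical names, and the sample body is pasted in rather than fetched so the snippet runs offline.

```python
import json

# Response body copied from the /v1/models call above (not fetched live).
SAMPLE_BODY = """
{
  "data": [
    { "id": "qwen3.6-27b", "object": "model", "max_context": 33792 }
  ]
}
"""


def served_models(body: str) -> dict:
    """Map each served model id to its advertised max_context (None if missing)."""
    payload = json.loads(body)
    return {entry["id"]: entry.get("max_context") for entry in payload.get("data", [])}


CONFIGURED_ID = "qwen3.6-27b"  # hypothetical: the id you plan to put in openclaw.json

models = served_models(SAMPLE_BODY)
if CONFIGURED_ID in models:
    print(f"{CONFIGURED_ID} is served, max_context={models[CONFIGURED_ID]}")
else:
    print(f"{CONFIGURED_ID} not found; served ids: {sorted(models)}")
```

The comparison is an exact string match on purpose: a mismatched id shows up later as a 404 model not found.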
3. Add the provider to openclaw.json
OpenClaw reads custom providers from models.providers in ~/.openclaw/openclaw.json. Add this and set the model as your primary:
{
  "agents": {
    "defaults": {
      "model": { "primary": "paralon/qwen3.6-27b" },
      "compaction": { "reserveTokens": 4096 }
    }
  },
  "models": {
    "providers": {
      "paralon": {
        "baseUrl": "https://paraloncloud.com/v1",
        "apiKey": "${PARALON_API_KEY}",
        "api": "openai-completions",
        "timeoutSeconds": 300,
        "models": [
          {
            "id": "qwen3.6-27b",
            "name": "Qwen 3.6 27B",
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 32768,
            "maxTokens": 2048
          }
        ]
      }
    }
  }
}
Export PARALON_API_KEY in the environment the gateway runs in, then restart the gateway so it picks up the config. The ${...} reference keeps the key out of the file.
Three settings in that config do more work than they appear to:
- "compaction": { "reserveTokens": 4096 } is the real overflow fix. OpenClaw reserves headroom at the top of the window for its reply and compaction bookkeeping, and the internal default is 16384: half of a 32k window gone before you say anything. On a frontier model with a 200k window that's noise; on 32k it's the difference between a usable prompt and constant context_length_exceeded. Dropping it to 4096 gives you roughly 28.7k of usable prompt instead of 16.4k. Keep the rule reserveTokens ≥ maxTokens; with maxTokens at 2048 you're well inside it.
- "contextWindow": 32768 is not optional. If you omit it, OpenClaw assumes a 200k window, happily builds prompts far past what the model accepts, and you get context_length_exceeded errors mid-conversation. Declaring the real window also makes OpenClaw cap tool results automatically so a single web page or file read can't blow the budget.
- "api": "openai-completions" tells OpenClaw to speak the standard Chat Completions dialect. On any non-OpenAI host it automatically downgrades OpenAI-only quirks (like the developer role) to what a vLLM-class backend expects, so no extra compat flags are needed.
4. Talk to it
Message your bot. OpenClaw routes the conversation through paralon/qwen3.6-27b, and Qwen 3.6's native function calling drives the agentic parts — web search, file operations, scheduled tasks — through real tool_calls, validated end-to-end on this API.
Living well inside 32k
Honesty section. OpenClaw refuses to run models under a 16k window, so 32k clears the bar — but OpenClaw's system prompt plus tool schemas can occupy roughly half of it before you type a word. That's plenty for a personal assistant exchanging messages and running tools; it is not infinite scrollback. Three habits keep it smooth:
- /context detail shows exactly what's eating the window: per-tool schemas, skills, injected files. Disable skills you don't use; every one you drop is tokens back.
- /compact summarizes older history into a compact entry. Do it when conversations run long, or let OpenClaw's auto-compaction handle it.
- Keep workspace files lean. Anything injected into the system prompt (persona files, notes) is paid on every single call.
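
To put rough numbers on the budget, here is a small Python sketch. It assumes the prompt budget is simply contextWindow minus reserveTokens, and it takes the "roughly half" overhead figure at face value; both are estimates, and /context detail reports the real overhead for your setup. The helper names are hypothetical, not part of OpenClaw.

```python
CONTEXT_WINDOW = 32768   # contextWindow from the provider entry
DEFAULT_RESERVE = 16384  # OpenClaw's internal default, per this guide
TUNED_RESERVE = 4096     # compaction.reserveTokens from the config
MAX_TOKENS = 2048        # maxTokens from the config


def usable_prompt_tokens(context_window: int, reserve_tokens: int) -> int:
    """Prompt budget once the reserve is set aside (assumed: window - reserve)."""
    if not 0 <= reserve_tokens <= context_window:
        raise ValueError("reserve_tokens must be between 0 and context_window")
    return context_window - reserve_tokens


def history_budget(context_window: int, reserve_tokens: int, fixed_overhead: int) -> int:
    """Tokens left for history and tool results after fixed prompt overhead."""
    return max(0, usable_prompt_tokens(context_window, reserve_tokens) - fixed_overhead)


# Assumption: system prompt plus tool schemas take about half the window.
ESTIMATED_OVERHEAD = CONTEXT_WINDOW // 2

print(usable_prompt_tokens(CONTEXT_WINDOW, DEFAULT_RESERVE))  # 16384
print(usable_prompt_tokens(CONTEXT_WINDOW, TUNED_RESERVE))    # 28672
print(TUNED_RESERVE >= MAX_TOKENS)                            # True
print(history_budget(CONTEXT_WINDOW, DEFAULT_RESERVE, ESTIMATED_OVERHEAD))  # 0
print(history_budget(CONTEXT_WINDOW, TUNED_RESERVE, ESTIMATED_OVERHEAD))    # 12288
```

Under these assumptions the default reserve leaves nothing for conversation history, while the tuned reserve leaves about 12k tokens, and every skill or workspace file you trim adds directly to that figure.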
Troubleshooting
| Symptom | Fix |
|---|---|
| context_length_exceeded | first check contextWindow is set to 32768; if it still overflows on 32k, lower compaction.reserveTokens to 4096 (the internal default 16384 eats half the window) |
| Gateway refuses the model (16k minimum) | the window resolved too low; declare contextWindow explicitly |
| 401 Unauthorized | wrong or revoked key — check PARALON_API_KEY is set in the gateway's environment, rotate in the Console |
| 404 model not found | the id must match /v1/models exactly (qwen3.6-27b) |
| Tools print raw JSON instead of executing | add "params": { "extra_body": { "tool_choice": "required" } } under agents.defaults.models["paralon/qwen3.6-27b"] |
| Replies cut off | maxTokens is set to 2048 on purpose (short, chat-sized replies); raise it if you want longer answers, keeping reserveTokens ≥ maxTokens |
| Slow responses time out | raise the provider-level timeoutSeconds (not the agent timeout) |
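
The tool_choice fix in the table names a nested path that is easy to misplace. Here is a minimal Python sketch of where it lands, assuming the config has been loaded into a dict and that agents.defaults.models is keyed by the provider/model reference exactly as the table row states. require_tool_calls is a hypothetical helper, not an OpenClaw function.

```python
import json


def require_tool_calls(config: dict, model_ref: str) -> dict:
    """Set params.extra_body.tool_choice = "required" for one model reference."""
    models = (
        config.setdefault("agents", {})
        .setdefault("defaults", {})
        .setdefault("models", {})
    )
    params = models.setdefault(model_ref, {}).setdefault("params", {})
    params.setdefault("extra_body", {})["tool_choice"] = "required"
    return config


# Trimmed stand-in for openclaw.json; only the keys relevant here are shown.
config = {"agents": {"defaults": {"model": {"primary": "paralon/qwen3.6-27b"}}}}
require_tool_calls(config, "paralon/qwen3.6-27b")

# Print just the per-model overrides to see the resulting nesting.
print(json.dumps(config["agents"]["defaults"]["models"], indent=2))
```

The helper only adds keys and leaves existing settings such as model.primary untouched, so it should be safe to apply to a config that already has other per-model overrides.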
Why this pairing works
OpenClaw's economics are brutal with metered frontier APIs precisely because it's always on — heartbeats, cron jobs, proactive check-ins. A free OpenAI-compatible endpoint flips that from a liability into the whole point. And the same models.providers recipe works for any backend, so nothing locks you in: swap baseUrl, key, and model ID and you've moved.
The ParalonCloud inference API is live, OpenAI-compatible, and free while the network grows — served not from one datacenter but from independent GPU providers worldwide. Your assistant, your machine, an open model, and an open cloud behind it.



