OpenClaw is free. The model behind it is where your money goes. The open-source assistant formerly known as Clawdbot (then Moltbot) runs on your own machine and talks to you over Telegram, Discord, or WhatsApp — but every message, heartbeat, and scheduled task is an LLM API call. Anthropic has blocked subscription-tier usage for OpenClaw, so the default path is metered frontier-API pricing, and an always-on agent burns tokens around the clock. The usual escapes are rate-limited free tiers or running Ollama on your own GPU.

There's a third option: OpenClaw treats the model as a config entry. Any OpenAI-compatible endpoint works as a custom provider — and the ParalonCloud inference API is exactly that, free during the early phase, serving Qwen 3.6 27B with native tool calling from a distributed fleet of independent GPU providers. Here's the complete setup, including the part most guides skip: making a 32k-context model genuinely livable in OpenClaw.

1. Install OpenClaw

curl -fsSL https://openclaw.ai/install.sh | bash
openclaw onboard --install-daemon

The onboarding wizard sets up the gateway and walks you through connecting a chat channel (Telegram via @BotFather is the classic first one). When it asks about a model provider, you can skip it — we're about to add our own.

2. Get a free API key

Sign in to the Console, go to API keys, click Create key, and copy the prlc_ token. No credit card — the API is free right now. Then confirm what the network is serving:

curl -s https://paraloncloud.com/v1/models \
  -H "Authorization: Bearer prlc_your_api_key"

{
  "data": [
    { "id": "qwen3.6-27b", "object": "model", "max_context": 33792 }
  ]
}

The id field is what goes in the config. Details in the authentication docs.

3. Add the provider to `openclaw.json`

OpenClaw reads custom providers from models.providers in ~/.openclaw/openclaw.json. Add this and set the model as your primary:

{
  "agents": {
    "defaults": {
      "model": { "primary": "paralon/qwen3.6-27b" },
      "compaction": { "reserveTokens": 4096 }
    }
  },
  "models": {
    "providers": {
      "paralon": {
        "baseUrl": "https://paraloncloud.com/v1",
        "apiKey": "${PARALON_API_KEY}",
        "api": "openai-completions",
        "timeoutSeconds": 300,
        "models": [
          {
            "id": "qwen3.6-27b",
            "name": "Qwen 3.6 27B",
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 32768,
            "maxTokens": 2048
          }
        ]
      }
    }
  }
}

Restart the gateway so it picks up the config, and export PARALON_API_KEY in the environment the gateway runs in — the ${...} reference keeps the key out of the file.

Three things in there do more work than they look like:

"compaction": { "reserveTokens": 4096 } is the real overflow fix. OpenClaw reserves headroom at the top of the window for its reply and compaction bookkeeping, and the internal default is 16384 — half of a 32k window gone before you say anything. On a frontier model with a 200k window that's noise; on 32k it's the difference between a usable prompt and constant context_length_exceeded. Dropping it to 4096 gives you roughly 28.6k of usable prompt instead of 16.4k. Keep the rule reserveTokens ≥ maxTokens — with maxTokens at 2048 you're well inside it.
"contextWindow": 32768 is not optional. If you omit it, OpenClaw assumes a 200k window, happily builds prompts far past what the model accepts, and you get context_length_exceeded errors mid-conversation. Declaring the real window also makes OpenClaw cap tool results automatically so a single web page or file read can't blow the budget.
"api": "openai-completions" tells OpenClaw to speak the standard Chat Completions dialect. On any non-OpenAI host it automatically downgrades OpenAI-only quirks (like the developer role) to what a vLLM-class backend expects — no extra compat flags needed.

4. Talk to it

Message your bot. OpenClaw routes the conversation through paralon/qwen3.6-27b, and Qwen 3.6's native function calling drives the agentic parts — web search, file operations, scheduled tasks — through real tool_calls, validated end-to-end on this API.

Living well inside 32k

Honesty section. OpenClaw refuses to run models under a 16k window, so 32k clears the bar — but OpenClaw's system prompt plus tool schemas can occupy roughly half of it before you type a word. That's plenty for a personal assistant exchanging messages and running tools; it is not infinite scrollback. Three habits keep it smooth:

/context detail shows exactly what's eating the window — per-tool schemas, skills, injected files. Disable skills you don't use; every one you drop is tokens back.
/compact summarizes older history into a compact entry. Do it when conversations run long, or let OpenClaw's auto-compaction handle it.
Keep workspace files lean. Anything injected into the system prompt (persona files, notes) is paid on every single call.

Troubleshooting

Symptom	Fix
`context_length_exceeded`	first check `contextWindow` is set to `32768`; if it still overflows on 32k, lower `compaction.reserveTokens` to `4096` (the internal default `16384` eats half the window)
Gateway refuses the model (16k minimum)	the window resolved too low; declare `contextWindow` explicitly
401 Unauthorized	wrong or revoked key — check `PARALON_API_KEY` is set in the gateway's environment, rotate in the Console
404 model not found	the `id` must match `/v1/models` exactly (`qwen3.6-27b`)
Tools print raw JSON instead of executing	add `"params": { "extra_body": { "tool_choice": "required" } }` under `agents.defaults.models["paralon/qwen3.6-27b"]`
Replies cut off	`maxTokens` is set to `2048` on purpose (short, chat-sized replies); raise it if you want longer answers, keeping `reserveTokens ≥ maxTokens`
Slow responses time out	raise the provider-level `timeoutSeconds` (not the agent timeout)

Why this pairing works

OpenClaw's economics are brutal with metered frontier APIs precisely because it's always on — heartbeats, cron jobs, proactive check-ins. A free OpenAI-compatible endpoint flips that from a liability into the whole point. And the same models.providers recipe works for any backend, so nothing locks you in: swap baseUrl, key, and model ID and you've moved.

The ParalonCloud inference API is live, OpenAI-compatible, and free while the network grows — served not from one datacenter but from independent GPU providers worldwide. Your assistant, your machine, an open model, and an open cloud behind it.

Run OpenClaw on a Free LLM API (Full Qwen 3.6 Setup)

1. Install OpenClaw

2. Get a free API key

3. Add the provider to `openclaw.json`

4. Talk to it

Living well inside 32k

Troubleshooting

Why this pairing works

Related Articles

Run pi, the Open-Source Coding Agent, on Your Own LLM API

Run AI on Your Own Infrastructure: Why Self-Hosted Inference Is the Enterprise Default

Rent GPUs From Inside Claude — the ParalonCloud MCP Server

1. Install OpenClaw

2. Get a free API key

3. Add the provider to openclaw.json

4. Talk to it

Living well inside 32k

Troubleshooting

Why this pairing works

Related Articles

Run pi, the Open-Source Coding Agent, on Your Own LLM API

Run AI on Your Own Infrastructure: Why Self-Hosted Inference Is the Enterprise Default

Rent GPUs From Inside Claude — the ParalonCloud MCP Server

3. Add the provider to `openclaw.json`