News & Updates6 min read

Run OpenClaw on a Free LLM API (Full Qwen 3.6 Setup)

OpenClaw is free — the model behind it usually isn't. Here's the complete openclaw.json custom-provider setup to run OpenClaw on ParalonCloud's free OpenAI-compatible inference API, with Qwen 3.6 27B doing the thinking and an honest guide to living inside a 32k context window.

OpenClaw connected to ParalonCloud's free OpenAI-compatible inference API

OpenClaw is free. The model behind it is where your money goes. The open-source assistant formerly known as Clawdbot (then Moltbot) runs on your own machine and talks to you over Telegram, Discord, or WhatsApp — but every message, heartbeat, and scheduled task is an LLM API call. Anthropic has blocked subscription-tier usage for OpenClaw, so the default path is metered frontier-API pricing, and an always-on agent burns tokens around the clock. The usual escapes are rate-limited free tiers or running Ollama on your own GPU.

There's a third option: OpenClaw treats the model as a config entry. Any OpenAI-compatible endpoint works as a custom provider — and the ParalonCloud inference API is exactly that, free during the early phase, serving Qwen 3.6 27B with native tool calling from a distributed fleet of independent GPU providers. Here's the complete setup, including the part most guides skip: making a 32k-context model genuinely livable in OpenClaw.

1. Install OpenClaw

curl -fsSL https://openclaw.ai/install.sh | bash
openclaw onboard --install-daemon

The onboarding wizard sets up the gateway and walks you through connecting a chat channel (Telegram via @BotFather is the classic first one). When it asks about a model provider, you can skip it — we're about to add our own.

2. Get a free API key

Sign in to the Console, go to API keys, click Create key, and copy the prlc_ token. No credit card — the API is free right now. Then confirm what the network is serving:

curl -s https://paraloncloud.com/v1/models \
  -H "Authorization: Bearer prlc_your_api_key"
{
  "data": [
    { "id": "qwen3.6-27b", "object": "model", "max_context": 33792 }
  ]
}

The id field is what goes in the config. Details in the authentication docs.

3. Add the provider to openclaw.json

OpenClaw reads custom providers from models.providers in ~/.openclaw/openclaw.json. Add this and set the model as your primary:

{
  "agents": {
    "defaults": {
      "model": { "primary": "paralon/qwen3.6-27b" },
      "compaction": { "reserveTokens": 4096 }
    }
  },
  "models": {
    "providers": {
      "paralon": {
        "baseUrl": "https://paraloncloud.com/v1",
        "apiKey": "${PARALON_API_KEY}",
        "api": "openai-completions",
        "timeoutSeconds": 300,
        "models": [
          {
            "id": "qwen3.6-27b",
            "name": "Qwen 3.6 27B",
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 32768,
            "maxTokens": 2048
          }
        ]
      }
    }
  }
}

Restart the gateway so it picks up the config, and export PARALON_API_KEY in the environment the gateway runs in — the ${...} reference keeps the key out of the file.

Three things in there do more work than they look like:

  • "compaction": { "reserveTokens": 4096 } is the real overflow fix. OpenClaw reserves headroom at the top of the window for its reply and compaction bookkeeping, and the internal default is 16384 — half of a 32k window gone before you say anything. On a frontier model with a 200k window that's noise; on 32k it's the difference between a usable prompt and constant context_length_exceeded. Dropping it to 4096 gives you roughly 28.6k of usable prompt instead of 16.4k. Keep the rule reserveTokens ≥ maxTokens — with maxTokens at 2048 you're well inside it.
  • "contextWindow": 32768 is not optional. If you omit it, OpenClaw assumes a 200k window, happily builds prompts far past what the model accepts, and you get context_length_exceeded errors mid-conversation. Declaring the real window also makes OpenClaw cap tool results automatically so a single web page or file read can't blow the budget.
  • "api": "openai-completions" tells OpenClaw to speak the standard Chat Completions dialect. On any non-OpenAI host it automatically downgrades OpenAI-only quirks (like the developer role) to what a vLLM-class backend expects — no extra compat flags needed.

4. Talk to it

Message your bot. OpenClaw routes the conversation through paralon/qwen3.6-27b, and Qwen 3.6's native function calling drives the agentic parts — web search, file operations, scheduled tasks — through real tool_calls, validated end-to-end on this API.

Living well inside 32k

Honesty section. OpenClaw refuses to run models under a 16k window, so 32k clears the bar — but OpenClaw's system prompt plus tool schemas can occupy roughly half of it before you type a word. That's plenty for a personal assistant exchanging messages and running tools; it is not infinite scrollback. Three habits keep it smooth:

  • /context detail shows exactly what's eating the window — per-tool schemas, skills, injected files. Disable skills you don't use; every one you drop is tokens back.
  • /compact summarizes older history into a compact entry. Do it when conversations run long, or let OpenClaw's auto-compaction handle it.
  • Keep workspace files lean. Anything injected into the system prompt (persona files, notes) is paid on every single call.

Troubleshooting

SymptomFix
context_length_exceededfirst check contextWindow is set to 32768; if it still overflows on 32k, lower compaction.reserveTokens to 4096 (the internal default 16384 eats half the window)
Gateway refuses the model (16k minimum)the window resolved too low; declare contextWindow explicitly
401 Unauthorizedwrong or revoked key — check PARALON_API_KEY is set in the gateway's environment, rotate in the Console
404 model not foundthe id must match /v1/models exactly (qwen3.6-27b)
Tools print raw JSON instead of executingadd "params": { "extra_body": { "tool_choice": "required" } } under agents.defaults.models["paralon/qwen3.6-27b"]
Replies cut offmaxTokens is set to 2048 on purpose (short, chat-sized replies); raise it if you want longer answers, keeping reserveTokens ≥ maxTokens
Slow responses time outraise the provider-level timeoutSeconds (not the agent timeout)

Why this pairing works

OpenClaw's economics are brutal with metered frontier APIs precisely because it's always on — heartbeats, cron jobs, proactive check-ins. A free OpenAI-compatible endpoint flips that from a liability into the whole point. And the same models.providers recipe works for any backend, so nothing locks you in: swap baseUrl, key, and model ID and you've moved.

The ParalonCloud inference API is live, OpenAI-compatible, and free while the network grows — served not from one datacenter but from independent GPU providers worldwide. Your assistant, your machine, an open model, and an open cloud behind it.

Keep reading

Related Articles