News & Updates4 min read

Run pi, the Open-Source Coding Agent, on Your Own LLM API

pi is an open-source coding agent in the spirit of Claude Code — except you choose the model behind it. Here's the full setup to point pi at any OpenAI-compatible endpoint, using ParalonCloud's inference API and Qwen 3.6 27B as the worked example.

pi coding agent connected to ParalonCloud's OpenAI-compatible inference API

Coding agents are the best thing that happened to terminals in years — and most of them are welded shut to one vendor's API. pi is the exception worth knowing: a lean, open-source coding agent (the author's tagline: "There are many coding agents, but this one is mine") that treats the model as a config entry, not a subscription. Point it at any OpenAI-compatible endpoint and it works.

That makes it a perfect match for an inference API like ours. ParalonCloud serves open models over a standard OpenAI-compatible Chat Completions API — so pi + ParalonCloud gives you a real agentic coding loop (read files, run commands, edit code, iterate) with no Anthropic or OpenAI account anywhere in the stack. Here's the whole setup, end to end.

1. Install pi

curl -fsSL https://pi.dev/install.sh | sh

The installer needs Node.js 22.19+. Under the hood it's the @earendil-works/pi-coding-agent npm package, and the command it gives you is just pi.

2. Ask the API what models it serves

Don't guess model IDs — list them:

curl -s https://paraloncloud.com/v1/models \
  -H "Authorization: Bearer prlc_your_api_key" | python3 -m json.tool
{
  "data": [
    { "id": "qwen3.6-27b", "name": "Qwen 3.6 27B", "max_context": 32768 }
  ]
}

The id field is what you'll put in pi's config. No key yet? Create one in the Console — sign in, go to API keys, click Create key, and copy the prlc_ token it gives you. Full details in the authentication docs.

3. Register the provider in ~/.pi/agent/models.json

pi ships with the big providers built in. Anything else — your company's vLLM box, a fine-tune behind a proxy, or ParalonCloud — goes into models.json as a custom provider:

{
  "providers": {
    "paralon": {
      "baseUrl": "https://paraloncloud.com/v1",
      "api": "openai-completions",
      "apiKey": "$PARALON_API_KEY",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "qwen3.6-27b",
          "name": "Qwen 3.6 27B",
          "input": ["text"],
          "contextWindow": 32768,
          "maxTokens": 8192
        }
      ]
    }
  }
}

The two compat flags matter for any non-OpenAI backend, not just ours: they tell pi to send the system prompt as a plain system role (instead of OpenAI's newer developer role) and to skip the reasoning_effort parameter. With those set, pi speaks exactly the dialect a vLLM-class server expects.

The apiKey field accepts an environment variable reference ("$PARALON_API_KEY") or even a shell command ("!security find-generic-password -ws paralon"), so the key never has to live in the file. If you do inline it, chmod 600 the file.

4. Code

# interactive session
pi --provider paralon --model qwen3.6-27b

# one-shot, non-interactive
pi --provider paralon --model qwen3.6-27b -p "write a hello world HTTP server in Go"

Inside a session, /model switches models on the fly — pi re-reads models.json every time, no restart. pi --list-models paralon shows what it sees. And if you settle on a daily driver, an alias saves the flags:

alias pip='pi --provider paralon --model qwen3.6-27b'

Why Qwen 3.6 27B behind it

An agent is only as good as its tool calling, and Qwen 3.6 27B is currently the strongest agentic-coding model we serve: it runs with native function calling enabled (tool_calls responses, validated end-to-end through this exact API) and a 32k context window — enough for pi to hold a real working set of files plus the conversation. You can see live model specs anytime via /v1/models.

Troubleshooting

SymptomFix
No models availablemodels.json missing or invalid JSON; check pi --list-models paralon
401 Unauthorizedwrong or revoked key — verify $PARALON_API_KEY, rotate from the dashboard
Error mentioning developer roleset compat.supportsDeveloperRole: false
Error on reasoning_effortset compat.supportsReasoningEffort: false
404 model not foundthe id in models.json must match /v1/models exactly
Truncated answersraise maxTokens

The same recipe works for any OpenAI-compatible backend — swap baseUrl, key, and model IDs. But if you want to skip running your own GPUs entirely: the ParalonCloud inference API is live, OpenAI-compatible, and served by a distributed fleet of independent GPU providers. Two lines of config, and your coding agent runs on the open cloud.

Related Articles