Coding agents are the best thing that happened to terminals in years — and most of them are welded shut to one vendor's API. pi is the exception worth knowing: a lean, open-source coding agent (the author's tagline: "There are many coding agents, but this one is mine") that treats the model as a config entry, not a subscription. Point it at any OpenAI-compatible endpoint and it works.
That makes it a perfect match for an inference API like ours. ParalonCloud serves open models over a standard OpenAI-compatible Chat Completions API — so pi + ParalonCloud gives you a real agentic coding loop (read files, run commands, edit code, iterate) with no Anthropic or OpenAI account anywhere in the stack. Here's the whole setup, end to end.
1. Install pi
curl -fsSL https://pi.dev/install.sh | sh
The installer needs Node.js 22.19+. Under the hood it's the @earendil-works/pi-coding-agent npm package, and the command it gives you is just pi.
2. Ask the API what models it serves
Don't guess model IDs — list them:
curl -s https://paraloncloud.com/v1/models \
-H "Authorization: Bearer prlc_your_api_key" | python3 -m json.tool
{
"data": [
{ "id": "qwen3.6-27b", "name": "Qwen 3.6 27B", "max_context": 32768 }
]
}
The id field is what you'll put in pi's config. No key yet? Create one in the Console — sign in, go to API keys, click Create key, and copy the prlc_ token it gives you. Full details in the authentication docs.
3. Register the provider in ~/.pi/agent/models.json
pi ships with the big providers built in. Anything else — your company's vLLM box, a fine-tune behind a proxy, or ParalonCloud — goes into models.json as a custom provider:
{
"providers": {
"paralon": {
"baseUrl": "https://paraloncloud.com/v1",
"api": "openai-completions",
"apiKey": "$PARALON_API_KEY",
"compat": {
"supportsDeveloperRole": false,
"supportsReasoningEffort": false
},
"models": [
{
"id": "qwen3.6-27b",
"name": "Qwen 3.6 27B",
"input": ["text"],
"contextWindow": 32768,
"maxTokens": 8192
}
]
}
}
}
The two compat flags matter for any non-OpenAI backend, not just ours: they tell pi to send the system prompt as a plain system role (instead of OpenAI's newer developer role) and to skip the reasoning_effort parameter. With those set, pi speaks exactly the dialect a vLLM-class server expects.
The apiKey field accepts an environment variable reference ("$PARALON_API_KEY") or even a shell command ("!security find-generic-password -ws paralon"), so the key never has to live in the file. If you do inline it, chmod 600 the file.
4. Code
# interactive session
pi --provider paralon --model qwen3.6-27b
# one-shot, non-interactive
pi --provider paralon --model qwen3.6-27b -p "write a hello world HTTP server in Go"
Inside a session, /model switches models on the fly — pi re-reads models.json every time, no restart. pi --list-models paralon shows what it sees. And if you settle on a daily driver, an alias saves the flags:
alias pip='pi --provider paralon --model qwen3.6-27b'
Why Qwen 3.6 27B behind it
An agent is only as good as its tool calling, and Qwen 3.6 27B is currently the strongest agentic-coding model we serve: it runs with native function calling enabled (tool_calls responses, validated end-to-end through this exact API) and a 32k context window — enough for pi to hold a real working set of files plus the conversation. You can see live model specs anytime via /v1/models.
Troubleshooting
| Symptom | Fix |
|---|---|
No models available | models.json missing or invalid JSON; check pi --list-models paralon |
| 401 Unauthorized | wrong or revoked key — verify $PARALON_API_KEY, rotate from the dashboard |
Error mentioning developer role | set compat.supportsDeveloperRole: false |
Error on reasoning_effort | set compat.supportsReasoningEffort: false |
| 404 model not found | the id in models.json must match /v1/models exactly |
| Truncated answers | raise maxTokens |
The same recipe works for any OpenAI-compatible backend — swap baseUrl, key, and model IDs. But if you want to skip running your own GPUs entirely: the ParalonCloud inference API is live, OpenAI-compatible, and served by a distributed fleet of independent GPU providers. Two lines of config, and your coding agent runs on the open cloud.



