Coding agents are the best thing that happened to terminals in years — and most of them are welded shut to one vendor's API. pi is the exception worth knowing: a lean, open-source coding agent (the author's tagline: "There are many coding agents, but this one is mine") that treats the model as a config entry, not a subscription. Point it at any OpenAI-compatible endpoint and it works.

That makes it a perfect match for an inference API like ours. ParalonCloud serves open models over a standard OpenAI-compatible Chat Completions API — so pi + ParalonCloud gives you a real agentic coding loop (read files, run commands, edit code, iterate) with no Anthropic or OpenAI account anywhere in the stack. Here's the whole setup, end to end.

1. Install pi

curl -fsSL https://pi.dev/install.sh | sh

The installer needs Node.js 22.19+. Under the hood it's the @earendil-works/pi-coding-agent npm package, and the command it gives you is just pi.

2. Ask the API what models it serves

Don't guess model IDs — list them:

curl -s https://paraloncloud.com/v1/models \
  -H "Authorization: Bearer prlc_your_api_key" | python3 -m json.tool

{
  "data": [
    { "id": "qwen3.6-27b", "name": "Qwen 3.6 27B", "max_context": 32768 }
  ]
}

The id field is what you'll put in pi's config. No key yet? Create one in the Console — sign in, go to API keys, click Create key, and copy the prlc_ token it gives you. Full details in the authentication docs.

3. Register the provider in `~/.pi/agent/models.json`

pi ships with the big providers built in. Anything else — your company's vLLM box, a fine-tune behind a proxy, or ParalonCloud — goes into models.json as a custom provider:

{
  "providers": {
    "paralon": {
      "baseUrl": "https://paraloncloud.com/v1",
      "api": "openai-completions",
      "apiKey": "$PARALON_API_KEY",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "qwen3.6-27b",
          "name": "Qwen 3.6 27B",
          "input": ["text"],
          "contextWindow": 32768,
          "maxTokens": 8192
        }
      ]
    }
  }
}

The two compat flags matter for any non-OpenAI backend, not just ours: they tell pi to send the system prompt as a plain system role (instead of OpenAI's newer developer role) and to skip the reasoning_effort parameter. With those set, pi speaks exactly the dialect a vLLM-class server expects.

The apiKey field accepts an environment variable reference ("$PARALON_API_KEY") or even a shell command ("!security find-generic-password -ws paralon"), so the key never has to live in the file. If you do inline it, chmod 600 the file.

4. Code

# interactive session
pi --provider paralon --model qwen3.6-27b

# one-shot, non-interactive
pi --provider paralon --model qwen3.6-27b -p "write a hello world HTTP server in Go"

Inside a session, /model switches models on the fly — pi re-reads models.json every time, no restart. pi --list-models paralon shows what it sees. And if you settle on a daily driver, an alias saves the flags:

alias pip='pi --provider paralon --model qwen3.6-27b'

Why Qwen 3.6 27B behind it

An agent is only as good as its tool calling, and Qwen 3.6 27B is currently the strongest agentic-coding model we serve: it runs with native function calling enabled (tool_calls responses, validated end-to-end through this exact API) and a 32k context window — enough for pi to hold a real working set of files plus the conversation. You can see live model specs anytime via /v1/models.

Troubleshooting

Symptom	Fix
`No models available`	`models.json` missing or invalid JSON; check `pi --list-models paralon`
401 Unauthorized	wrong or revoked key — verify `$PARALON_API_KEY`, rotate from the dashboard
Error mentioning `developer` role	set `compat.supportsDeveloperRole: false`
Error on `reasoning_effort`	set `compat.supportsReasoningEffort: false`
404 model not found	the `id` in `models.json` must match `/v1/models` exactly
Truncated answers	raise `maxTokens`

The same recipe works for any OpenAI-compatible backend — swap baseUrl, key, and model IDs. But if you want to skip running your own GPUs entirely: the ParalonCloud inference API is live, OpenAI-compatible, and served by a distributed fleet of independent GPU providers. Two lines of config, and your coding agent runs on the open cloud.

Run pi, the Open-Source Coding Agent, on Your Own LLM API

1. Install pi

2. Ask the API what models it serves

3. Register the provider in `~/.pi/agent/models.json`

4. Code

Why Qwen 3.6 27B behind it

Troubleshooting

Related Articles

Run AI on Your Own Infrastructure: Why Self-Hosted Inference Is the Enterprise Default

One VRAM Number Can't Schedule LLMs Across Mixed Consumer GPUs

Run OpenClaw on a Free LLM API (Full Qwen 3.6 Setup)

1. Install pi

2. Ask the API what models it serves

3. Register the provider in ~/.pi/agent/models.json

4. Code

Why Qwen 3.6 27B behind it

Troubleshooting

Related Articles

Run AI on Your Own Infrastructure: Why Self-Hosted Inference Is the Enterprise Default

One VRAM Number Can't Schedule LLMs Across Mixed Consumer GPUs

Run OpenClaw on a Free LLM API (Full Qwen 3.6 Setup)

3. Register the provider in `~/.pi/agent/models.json`