Run open LLMs through an OpenAI-compatible API, served on a decentralized network of GPU nodes.

The ParalonCloud Inference API lets you call open large language models over HTTP — no GPU to provision, no model to download. It's OpenAI-compatible, so any tool or SDK that speaks the OpenAI API works by changing two lines: the base URL and the key. Requests are routed to GPU nodes across the network that can serve the model you ask for.

Base URL

https://paraloncloud.com/v1

The API mirrors the OpenAI surface (/v1/chat/completions and /v1/models) and authenticates with a bearer key sent in the Authorization header.
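As a rough illustration of the bearer-key scheme, here is a minimal sketch that builds a request for the models endpoint using only the Python standard library. The helper names (build_models_request, model_ids, list_models) are hypothetical. The response shape is assumed to follow OpenAI's list format, with a "data" array of objects that each carry an "id"; check the Models page for the actual body.

```python
import json
import urllib.request

BASE_URL = "https://paraloncloud.com/v1"


def build_models_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for the models endpoint."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload: dict) -> list[str]:
    """Extract model ids from a list response.

    Assumption: the body looks like OpenAI's
    {"object": "list", "data": [{"id": "..."}, ...]}.
    """
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(api_key: str) -> list[str]:
    """Send the request and return the ids of the models it reports."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return model_ids(json.load(resp))
```

Calling list_models("prlc_your_key_here") would perform the actual network call; the two helpers above it can be exercised offline.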

Quick start

Point the official OpenAI Python SDK at ParalonCloud:

from openai import OpenAI

client = OpenAI(
    api_key="prlc_your_key_here",
    base_url="https://paraloncloud.com/v1",
)

response = client.chat.completions.create(
    model="qwen3.6-27b",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

Or with curl:

curl https://paraloncloud.com/v1/chat/completions \
  -H "Authorization: Bearer prlc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-27b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
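If you would rather not install an SDK or hardcode the key, the same call can be sketched with the Python standard library. This is an illustration under two assumptions: the environment variable name PARALON_API_KEY is hypothetical, and the response is assumed to follow OpenAI's chat completion shape (choices[0].message.content), as in the SDK example above.

```python
import json
import os
import urllib.request

BASE_URL = "https://paraloncloud.com/v1"


def build_chat_request(prompt, model="qwen3.6-27b", api_key=None):
    """Build (but do not send) a POST request equivalent to the curl call."""
    # PARALON_API_KEY is a hypothetical variable name, not one defined by the docs.
    key = api_key or os.environ["PARALON_API_KEY"]
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_reply(payload: dict) -> str:
    """Return the first choice's text, assuming the OpenAI response shape."""
    return payload["choices"][0]["message"]["content"]


def chat(prompt: str) -> str:
    """Send the prompt and return the model's reply."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        return first_reply(json.load(resp))
```

With the variable exported in your shell, print(chat("Hello!")) sends the same request as the curl example.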

What's next

Authentication — create an API key and make your first authenticated call.
Chat completions — the request and response format, parameters, and streaming.
Models — list available models and pick the right one.

Need GPUs too?

The same prlc_ key also works with the GPU rental API: browse available hardware, start a rental, and stop it, all over HTTP. One key and one balance cover both APIs, so you can build agents that call models and rent the GPUs to run them.

Try it without code

Want to test a model before you write any code? Open the Playground to chat with the available models in your browser, or head to the Console to create keys and watch your usage.

Inference is free right now while the network is in its early phase — build without metering. See Credits & Billing for how usage will be metered later.