Inference API
Run open LLMs through an OpenAI-compatible API, served on a decentralized network of GPU nodes.
The ParalonCloud Inference API lets you call open large language models over HTTP — no GPU to provision, no model to download. It's OpenAI-compatible, so any tool or SDK that speaks the OpenAI API works by changing two lines: the base URL and the key. Requests are routed to GPU nodes across the network that can serve the model you ask for.
Base URL
https://paraloncloud.com/v1
The API mirrors the OpenAI surface (/v1/chat/completions, /v1/models) and authenticates with a bearer key sent in the Authorization header.
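To make the URL and header layout concrete, here is a minimal sketch that uses only the Python standard library. The helper names `build_request` and `send` are illustrative, not part of any SDK, and the sketch assumes endpoint paths are appended directly to the base URL, as in the curl example on this page.

```python
import json
import urllib.request

BASE_URL = "https://paraloncloud.com/v1"


def build_request(path, api_key, payload=None):
    """Build (but do not send) a request to the API.

    path is relative to the base URL, e.g. "/models" or "/chat/completions".
    build_request is a hypothetical helper written for this example.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        # Supplying a JSON body turns the request into a POST.
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def send(request):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling `send(build_request("/models", "prlc_your_key_here"))` should return the model list, assuming the response body follows the OpenAI format.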
Quick start
Point the official OpenAI Python SDK at ParalonCloud:
from openai import OpenAI

client = OpenAI(
    api_key="prlc_your_key_here",
    base_url="https://paraloncloud.com/v1",
)

response = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Or with curl:
curl https://paraloncloud.com/v1/chat/completions \
  -H "Authorization: Bearer prlc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
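If you would rather receive the reply as it is generated, the OpenAI SDK's streaming mode is the natural next step. The sketch below assumes the endpoint accepts `stream=True` and returns OpenAI-style chunks. The environment variable `PARALON_API_KEY` and the helper names `collect_text` and `stream_reply` are made up for this example.

```python
import os


def collect_text(chunks):
    """Join the text deltas from an iterable of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices or an empty delta (often the last one).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


def stream_reply(prompt, model="qwen3-8b"):
    """Request a streamed completion and return the assembled text."""
    # Imported here so collect_text can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["PARALON_API_KEY"],  # hypothetical variable name
        base_url="https://paraloncloud.com/v1",
    )
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # assumption: OpenAI-style streaming is supported
    )
    return collect_text(stream)
```

Reading the key from the environment keeps it out of source control. To show output as it arrives, print each delta inside the loop instead of collecting the parts.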
What's next
- Authentication — create an API key and make your first authenticated call.
- Chat completions — the request and response format, parameters, and streaming.
- Models — list available models and pick the right one.
Try it without code
Want to test a model before you write any code? Open the Playground to chat with the available models in your browser, or head to the Console to create keys and watch your usage.
Inference is currently free while the network is in its early phase, so your usage is not metered. See Credits & Billing for how usage will be metered later.