Models
List the open LLMs available on the ParalonCloud Inference API and pick the right one for your workload.
ParalonCloud serves a selection of open large language models. Because the network is powered by independent GPU nodes, the exact catalog changes over time as nodes and models come online — so the list is best read live, not memorized.
List models programmatically
Call /v1/models to get the current catalog in OpenAI format:
curl https://paraloncloud.com/v1/models \
-H "Authorization: Bearer prlc_your_key_here"
{
"object": "list",
"data": [
{ "id": "qwen3-8b", "object": "model", "max_context": 32768 }
]
}
Use the id field as the model value in your chat completion requests.
Browse in the dashboard
You can also see the available models — along with their context length and the GPU memory they need — in the Inference console and try them interactively in the Playground.
Choosing a model
- Smaller models respond faster and cost less per token — good for high-volume or latency-sensitive tasks.
- Larger models are more capable on hard reasoning and long-context tasks, at higher cost and latency.
- Context length (
max_context) caps how much text — prompt plus reply — a single request can hold. Pick a model whose context comfortably fits your longest prompts.
Availability depends on which provider nodes are online, so a model may appear or disappear as the network shifts. Always read
/v1/modelsat runtime rather than hard-coding the catalog.