Documentation

Models

List the open LLMs available on the ParalonCloud Inference API and pick the right one for your workload.

ParalonCloud serves a selection of open large language models. Because the network is powered by independent GPU nodes, the exact catalog changes over time as nodes and models come online — so the list is best read live, not memorized.

List models programmatically

Call /v1/models to get the current catalog in OpenAI format:

curl https://paraloncloud.com/v1/models \
  -H "Authorization: Bearer prlc_your_key_here"
{
  "object": "list",
  "data": [
    { "id": "qwen3-8b", "object": "model", "max_context": 32768 }
  ]
}

Use the id field as the model value in your chat completion requests.

Browse in the dashboard

You can also see the available models — along with their context length and the GPU memory they need — in the Inference console and try them interactively in the Playground.

Choosing a model

  • Smaller models respond faster and cost less per token — good for high-volume or latency-sensitive tasks.
  • Larger models are more capable on hard reasoning and long-context tasks, at higher cost and latency.
  • Context length (max_context) caps how much text — prompt plus reply — a single request can hold. Pick a model whose context comfortably fits your longest prompts.

Availability depends on which provider nodes are online, so a model may appear or disappear as the network shifts. Always read /v1/models at runtime rather than hard-coding the catalog.