
# Supported models

> The base models Flash can fine-tune and serve, with sizes, reasoning, and token prices.

Flash trains a [**LoRA adapter**](/how-flash-works) on top of a base model chosen
from a curated catalog. List the live catalog from the CLI, then set the base
model with one line in your config:

```bash theme={null}
flash models
```

```toml theme={null}
model = "Qwen/Qwen3.5-4B"   # required — see `flash models` for the options
```

`flash models` is always the source of truth. The table below mirrors the
current catalog.

## Catalog

Every catalog model supports [**both SFT and GRPO**](/guides/training#choose-sft-or-grpo), and every one supports
hybrid reasoning ("thinking").

| Model (`model =`)      |              Parameters | Algorithms | Reasoning |
| ---------------------- | ----------------------: | ---------- | --------- |
| `Qwen/Qwen3.5-0.8B`    |                    0.9B | SFT, GRPO  | Hybrid    |
| `openbmb/MiniCPM5-1B`  |                    1.2B | SFT, GRPO  | Hybrid    |
| `Qwen/Qwen3.5-2B`      |                    2.3B | SFT, GRPO  | Hybrid    |
| `Qwen/Qwen3.5-4B`      |                    4.7B | SFT, GRPO  | Hybrid    |
| `Qwen/Qwen3.5-9B`      |                    9.7B | SFT, GRPO  | Hybrid    |
| `Qwen/Qwen3.6-35B-A3B` | 35B total / \~3B active | SFT, GRPO  | Hybrid    |

The default, `train.lora_rank = 32`, is the safest choice across the catalog. If
you raise the rank, validate your config before starting a long run: Flash
checks that an adapter of that rank can be deployed on the selected base model.
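In TOML, the dotted key `train.lora_rank` refers to the `lora_rank` key inside a `train` table, so the default can be written as a `[train]` section. A minimal sketch, assuming the rest of your config follows the examples on this page (the model choice is only an example):

```toml theme={null}
model = "Qwen/Qwen3.5-4B"

[train]
lora_rank = 32   # the default; validate before a long run if you raise it
```

Writing `train.lora_rank = 32` before any table header is equivalent TOML.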

## Serving prices

Serving is billed per token after deployment. Prompt and completion tokens have
separate per-model rates, and prompt tokens served from cache are billed at the
model's lower cached-prompt rate. The prices below are **per 1M tokens**; your
[Freesolo billing dashboard](/platform#billing) shows the charges you actually accrue.

| Model                  | Prompt / 1M | Completion / 1M | Cached prompt / 1M |
| ---------------------- | ----------: | --------------: | -----------------: |
| `Qwen/Qwen3.5-0.8B`    |     \$0.012 |         \$0.060 |           \$0.0024 |
| `openbmb/MiniCPM5-1B`  |     \$0.012 |         \$0.060 |           \$0.0024 |
| `Qwen/Qwen3.5-2B`      |     \$0.024 |         \$0.120 |           \$0.0048 |
| `Qwen/Qwen3.5-4B`      |     \$0.036 |         \$0.180 |           \$0.0072 |
| `Qwen/Qwen3.5-9B`      |     \$0.120 |         \$0.180 |           \$0.0240 |
| `Qwen/Qwen3.6-35B-A3B` |     \$0.180 |         \$1.200 |           \$0.0600 |

The **Cached prompt** column is the rate for prompt tokens served from the
**prefix cache**. Prefix caching is automatic, so there is nothing to enable. See
[Billing](/guides/deploy-and-chat#billing) for how prefix caching works and when
it applies.
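As a worked example, you can estimate a cost from the per-1M rates above. This sketch assumes cached tokens are a subset of the prompt tokens and are billed only at the cached rate, and it ignores any rounding the billing dashboard applies. Let $P$ be the prompt tokens, $K$ the cached prompt tokens, $C$ the completion tokens, and $r_p$, $r_k$, $r_c$ the prompt, cached-prompt, and completion rates per 1M tokens. For `Qwen/Qwen3.5-4B` with an illustrative (not measured) 1,000,000 prompt tokens, 400,000 of them cached, and 200,000 completion tokens:

```latex theme={null}
\begin{aligned}
\text{cost} &= \frac{(P - K)\,r_p + K\,r_k + C\,r_c}{10^6} \\
            &= \frac{600{,}000 \times 0.036 + 400{,}000 \times 0.0072 + 200{,}000 \times 0.180}{10^6} \\
            &= 0.0216 + 0.00288 + 0.036 \\
            &= \$0.06048
\end{aligned}
```

With no cache hits, the same traffic would cost $0.036 + 0.036 = \$0.072$, so in this example the prefix cache saves $\$0.01152$.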

### Notes per model

* **`Qwen/Qwen3.5-0.8B`**: the smallest Qwen3.5 model and the cheapest for smoke
  tests and fast iteration.
* **`openbmb/MiniCPM5-1B`**: an on-device-class small model on a standard Llama
  architecture.
* **`Qwen/Qwen3.5-2B`**: a small, capable step up when 0.8B underfits.
* **`Qwen/Qwen3.5-4B`**: a balanced starting point for most tasks.
* **`Qwen/Qwen3.5-9B`**: the largest dense catalog model, useful when quality
  matters more than the run cost.
* **`Qwen/Qwen3.6-35B-A3B`**: a Mixture-of-Experts checkpoint (35B total, \~3B
  active per token) and the largest model in the catalog.

<Note>
  The Qwen3.5 entries are **text-only** fine-tunes: the checkpoints are natively
  multimodal, but Flash trains and serves the language model only.
</Note>

## Reasoning ("thinking") models

Every catalog model is **hybrid-reasoning**: it can run with or without an
explicit reasoning step. Turn reasoning on for a run with one line in your
config:

```toml theme={null}
thinking = true
```

See the [`thinking` field](/reference/configuration#top-level) in the
configuration reference.
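Because `thinking` is a top-level field, place it before any `[table]` header. In TOML, every key that follows a header such as `[train]` belongs to that table. A minimal sketch combining it with a base model choice (the model and the `[train]` section here are only examples):

```toml theme={null}
model = "Qwen/Qwen3.5-9B"
thinking = true      # top-level: turn on the reasoning step for this run

[train]              # keys below this header belong to the train table
lora_rank = 32
```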

## Choosing a model

<CardGroup cols={2}>
  <Card title="Start small" icon="feather">
    Use `Qwen/Qwen3.5-0.8B` or `2B` to validate your setup and data
    cheaply. Get a run working before you scale.
  </Card>

  <Card title="Scale up for quality" icon="arrow-up-right-dots">
    Move to `Qwen/Qwen3.5-4B` or `9B` once the task is wired up
    and you want stronger results. Larger models cost more per run.
  </Card>
</CardGroup>
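In config terms, the start-small workflow comes down to changing one line between runs. A minimal sketch, assuming everything else in your config stays the same:

```toml theme={null}
# First run: validate your setup and data cheaply.
model = "Qwen/Qwen3.5-0.8B"

# Later run: once the task works, point the same line at a larger model.
# model = "Qwen/Qwen3.5-4B"
```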

<Card title="Pick a model, then train" icon="dumbbell" href="/guides/training#pick-a-base-model">
  Set `model` in your config and submit a run.
</Card>
