flash models is always the source of truth. The table below mirrors the
current catalog.
Catalog
Every catalog model supports both SFT and GRPO, and all are hybrid reasoning (“thinking”) capable.

| Model (model =) | Parameters | Algorithms | Reasoning |
|---|---|---|---|
| Qwen/Qwen3.5-0.8B | 0.9B | SFT, GRPO | Hybrid |
| openbmb/MiniCPM5-1B | 1.2B | SFT, GRPO | Hybrid |
| Qwen/Qwen3.5-2B | 2.3B | SFT, GRPO | Hybrid |
| Qwen/Qwen3.5-4B | 4.7B | SFT, GRPO | Hybrid |
| Qwen/Qwen3.5-9B | 9.7B | SFT, GRPO | Hybrid |
| Qwen/Qwen3.6-35B-A3B | 35B total / ~3B active | SFT, GRPO | Hybrid |

train.lora_rank = 32 is the safest choice across the catalog. If
you raise the rank, validate before a long run: Flash checks that the resulting
adapter can be deployed for the selected base model.
Serving prices
Serving is billed per token after deployment. Prompt and completion tokens have separate per-model rates, and cached prompt tokens use the model’s cached-input rate. The prices below are per 1M tokens; your Freesolo billing dashboard shows the charges you actually accrue.

| Model | Prompt / 1M | Completion / 1M | Cached prompt / 1M |
|---|---|---|---|
| Qwen/Qwen3.5-0.8B | $0.012 | $0.060 | $0.0024 |
| openbmb/MiniCPM5-1B | $0.012 | $0.060 | $0.0024 |
| Qwen/Qwen3.5-2B | $0.024 | $0.120 | $0.0048 |
| Qwen/Qwen3.5-4B | $0.036 | $0.180 | $0.0072 |
| Qwen/Qwen3.5-9B | $0.120 | $0.180 | $0.0240 |
| Qwen/Qwen3.6-35B-A3B | $0.180 | $1.200 | $0.0600 |

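
To make the per-token arithmetic concrete, here is a minimal Python sketch that estimates a serving bill from the rates in the table. The function name `estimate_serving_cost` and the `PRICES` dictionary are hypothetical helpers, not part of Flash. The sketch also assumes that cached prompt tokens are a subset of the prompt tokens and are billed at the cached rate instead of the regular prompt rate; check your billing dashboard for the charges that actually apply.

```python
# Per-1M-token serving rates in USD, copied from the table above.
PRICES = {
    "Qwen/Qwen3.5-0.8B":    {"prompt": 0.012, "completion": 0.060, "cached": 0.0024},
    "openbmb/MiniCPM5-1B":  {"prompt": 0.012, "completion": 0.060, "cached": 0.0024},
    "Qwen/Qwen3.5-2B":      {"prompt": 0.024, "completion": 0.120, "cached": 0.0048},
    "Qwen/Qwen3.5-4B":      {"prompt": 0.036, "completion": 0.180, "cached": 0.0072},
    "Qwen/Qwen3.5-9B":      {"prompt": 0.120, "completion": 0.180, "cached": 0.0240},
    "Qwen/Qwen3.6-35B-A3B": {"prompt": 0.180, "completion": 1.200, "cached": 0.0600},
}

TOKENS_PER_UNIT = 1_000_000  # rates are quoted per 1M tokens


def estimate_serving_cost(model, prompt_tokens, completion_tokens, cached_prompt_tokens=0):
    """Estimate the serving cost in USD for one workload (hypothetical helper).

    Assumption: cached_prompt_tokens is counted inside prompt_tokens and is
    billed at the cached rate instead of the regular prompt rate.
    """
    if min(prompt_tokens, completion_tokens, cached_prompt_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_prompt_tokens > prompt_tokens:
        raise ValueError("cached_prompt_tokens cannot exceed prompt_tokens")

    rates = PRICES[model]
    uncached_prompt_tokens = prompt_tokens - cached_prompt_tokens
    total = (
        uncached_prompt_tokens * rates["prompt"]
        + cached_prompt_tokens * rates["cached"]
        + completion_tokens * rates["completion"]
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 2M prompt tokens (half of them cached) and 500k completion tokens on the 4B model.
    cost = estimate_serving_cost("Qwen/Qwen3.5-4B", 2_000_000, 500_000, 1_000_000)
    print(f"${cost:.4f}")
```

Under these assumptions the example workload breaks down as $0.036 for uncached prompt tokens, $0.0072 for cached prompt tokens, and $0.090 for completion tokens.
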
Notes per model
- Qwen/Qwen3.5-0.8B: the smallest Qwen3.5. Cheapest for smoke tests and fast iteration.
- openbmb/MiniCPM5-1B: an on-device-class small model on a standard Llama architecture.
- Qwen/Qwen3.5-2B: a small, capable step up when 0.8B underfits.
- Qwen/Qwen3.5-4B: a balanced starting point for most tasks.
- Qwen/Qwen3.5-9B: the largest dense catalog model, useful when quality matters more than the run cost.
- Qwen/Qwen3.6-35B-A3B: a Mixture-of-Experts checkpoint (35B total, ~3B active per token) and the largest model in the catalog.

The Qwen3.5 entries are text-only fine-tunes: the checkpoints are natively
multimodal, but Flash trains and serves the language model only.
Reasoning (“thinking”) models
Every catalog model is hybrid-reasoning: it can run with or without an explicit reasoning step. Turn reasoning on for a run with one line in your config; see the thinking field in the
configuration reference.
Choosing a model
Start small
Use Qwen/Qwen3.5-0.8B or 2B to validate your setup and data cheaply. Get a run working before you scale.

Scale up for quality
Move to Qwen/Qwen3.5-4B or 9B once the task is wired up and you want stronger results. Larger models cost more per run.

Pick a model, then train
Set model in your config and submit a run.