Supported models - Freesolo Docs

Flash trains a LoRA adapter on top of a base model chosen from a curated catalog. List the live catalog from the CLI:

```shell
flash models
```

Then set the base model with one line in your config:

```toml
model = "Qwen/Qwen3.5-4B"   # required; run `flash models` to see the options
```

`flash models` is always the source of truth. The table below mirrors the catalog at the time of writing.

Catalog

Every catalog model supports both SFT and GRPO, and every one is a hybrid reasoning (“thinking”) model.

Model (`model =`)	Parameters	Algorithms	Reasoning
`Qwen/Qwen3.5-0.8B`	0.9B	SFT, GRPO	Hybrid
`openbmb/MiniCPM5-1B`	1.2B	SFT, GRPO	Hybrid
`Qwen/Qwen3.5-2B`	2.3B	SFT, GRPO	Hybrid
`Qwen/Qwen3.5-4B`	4.7B	SFT, GRPO	Hybrid
`Qwen/Qwen3.5-9B`	9.7B	SFT, GRPO	Hybrid
`Qwen/Qwen3.6-35B-A3B`	35B total / ~3B active	SFT, GRPO	Hybrid
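
If you want to catch a mistyped `model` value locally before submitting, a small check against the IDs in the table is enough. The sketch below uses a hypothetical helper, `check_model`, which is not a Flash API. Its hard-coded list can go stale, so `flash models` stays authoritative.

```python
import difflib

# Model IDs copied from the catalog table on this page. `flash models` is the
# live, authoritative list, so this local copy may drift over time.
CATALOG = [
    "Qwen/Qwen3.5-0.8B",
    "openbmb/MiniCPM5-1B",
    "Qwen/Qwen3.5-2B",
    "Qwen/Qwen3.5-4B",
    "Qwen/Qwen3.5-9B",
    "Qwen/Qwen3.6-35B-A3B",
]


def check_model(model_id: str) -> str:
    """Return model_id if it is in the local catalog copy, else raise ValueError.

    Hypothetical helper: on a miss, it suggests the closest catalog IDs.
    """
    if model_id in CATALOG:
        return model_id
    suggestions = difflib.get_close_matches(model_id, CATALOG, n=3)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"{model_id!r} is not in the catalog.{hint}")


if __name__ == "__main__":
    print(check_model("Qwen/Qwen3.5-4B"))
    try:
        check_model("Qwen/Qwen3.5-4b")  # wrong case on the size suffix
    except ValueError as err:
        print(err)
```

The fuzzy suggestion from `difflib` is only a convenience for typos such as a lowercase `b`. An exact match is still required to pass.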

The default `train.lora_rank = 32` is the safest choice across the catalog. If you raise the rank, validate your config before committing to a long run: Flash checks that the resulting adapter can be deployed on the selected base model.

Serving prices

Serving is billed per token after deployment. Prompt and completion tokens have separate per-model rates, and cached prompt tokens use the model’s cached-input rate. The prices below are per 1M tokens; your Freesolo billing dashboard shows the charges you actually accrue.

Model	Prompt ($ per 1M tokens)	Completion ($ per 1M tokens)	Cached prompt ($ per 1M tokens)
`Qwen/Qwen3.5-0.8B`	$0.012	$0.060	$0.0024
`openbmb/MiniCPM5-1B`	$0.012	$0.060	$0.0024
`Qwen/Qwen3.5-2B`	$0.024	$0.120	$0.0048
`Qwen/Qwen3.5-4B`	$0.036	$0.180	$0.0072
`Qwen/Qwen3.5-9B`	$0.120	$0.180	$0.0240
`Qwen/Qwen3.6-35B-A3B`	$0.180	$1.200	$0.0600

The Cached prompt column is the rate charged for prompt tokens served from the prefix cache. Prefix caching is automatic; see Billing for how it works and when it applies.
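
Each token class has its own rate, so a rough serving-cost estimate is a weighted sum divided by one million: (uncached prompt × prompt rate + cached prompt × cached rate + completion × completion rate) / 1M. The sketch below hard-codes the rates from the table. It assumes cached prompt tokens are billed only at the cached rate and not also at the full prompt rate, which is how the paragraph above reads. `estimate_cost` is a hypothetical helper, and your billing dashboard remains the authority on actual charges.

```python
from decimal import Decimal

# USD rates per 1M tokens, copied from the Serving prices table:
# (prompt, completion, cached prompt)
RATES = {
    "Qwen/Qwen3.5-0.8B": (Decimal("0.012"), Decimal("0.060"), Decimal("0.0024")),
    "openbmb/MiniCPM5-1B": (Decimal("0.012"), Decimal("0.060"), Decimal("0.0024")),
    "Qwen/Qwen3.5-2B": (Decimal("0.024"), Decimal("0.120"), Decimal("0.0048")),
    "Qwen/Qwen3.5-4B": (Decimal("0.036"), Decimal("0.180"), Decimal("0.0072")),
    "Qwen/Qwen3.5-9B": (Decimal("0.120"), Decimal("0.180"), Decimal("0.0240")),
    "Qwen/Qwen3.6-35B-A3B": (Decimal("0.180"), Decimal("1.200"), Decimal("0.0600")),
}
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int,
                  cached_prompt_tokens: int = 0) -> Decimal:
    """Estimate serving cost in USD (hypothetical helper).

    cached_prompt_tokens is the subset of prompt_tokens served from the
    prefix cache; those tokens are charged at the cached rate only.
    """
    if not 0 <= cached_prompt_tokens <= prompt_tokens:
        raise ValueError("cached_prompt_tokens must be between 0 and prompt_tokens")
    prompt_rate, completion_rate, cached_rate = RATES[model]
    uncached = prompt_tokens - cached_prompt_tokens
    total = (uncached * prompt_rate
             + cached_prompt_tokens * cached_rate
             + completion_tokens * completion_rate)
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    cost = estimate_cost("Qwen/Qwen3.5-4B", prompt_tokens=1_000_000,
                         completion_tokens=200_000, cached_prompt_tokens=400_000)
    print(f"${cost}")  # $0.06048
```

`Decimal` keeps the per-token arithmetic exact, which avoids floating-point drift when summing many small charges.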

Notes per model

`Qwen/Qwen3.5-0.8B`: the smallest Qwen3.5 model. Cheapest for smoke tests and fast iteration.
`openbmb/MiniCPM5-1B`: a small, on-device-class model built on a standard Llama architecture.
`Qwen/Qwen3.5-2B`: a small but capable step up when 0.8B underfits.
`Qwen/Qwen3.5-4B`: a balanced starting point for most tasks.
`Qwen/Qwen3.5-9B`: the largest dense model in the catalog, useful when quality matters more than run cost.
`Qwen/Qwen3.6-35B-A3B`: a Mixture-of-Experts checkpoint (35B total parameters, ~3B active per token) and the largest model in the catalog.

Flash fine-tunes the Qwen3.5 entries as text-only models: the checkpoints are natively multimodal, but Flash trains and serves only the language model.

Reasoning (“thinking”) models

Every catalog model is hybrid-reasoning: it can run with or without an explicit reasoning step. Turn reasoning on for a run with one line in your config:

```toml
thinking = true
```

See the `thinking` field in the configuration reference.

Choosing a model

Start small

Use `Qwen/Qwen3.5-0.8B` or `Qwen/Qwen3.5-2B` to validate your setup and data cheaply. Get one run working end to end before you scale up.

Scale up for quality

Move to `Qwen/Qwen3.5-4B` or `Qwen/Qwen3.5-9B` once the task is wired up and you want stronger results. Larger models cost more per run.

Pick a model, then train

Set `model` in your config and submit a run.
