Flash trains a LoRA adapter on top of a curated catalog of base models. You set the base model with one line in your config, and list the live catalog from the CLI:
flash models
model = "Qwen/Qwen3.5-4B"   # required — see `flash models` for the options
flash models is always the source of truth. The table below mirrors the current catalog.

Catalog

Every catalog model supports both SFT and GRPO, and every one is a hybrid-reasoning (“thinking”) model.
| Model (model =) | Parameters | Algorithms | Reasoning |
| --- | --- | --- | --- |
| Qwen/Qwen3.5-0.8B | 0.9B | SFT, GRPO | Hybrid |
| openbmb/MiniCPM5-1B | 1.2B | SFT, GRPO | Hybrid |
| Qwen/Qwen3.5-2B | 2.3B | SFT, GRPO | Hybrid |
| Qwen/Qwen3.5-4B | 4.7B | SFT, GRPO | Hybrid |
| Qwen/Qwen3.5-9B | 9.7B | SFT, GRPO | Hybrid |
| Qwen/Qwen3.6-35B-A3B | 35B total / ~3B active | SFT, GRPO | Hybrid |
The default train.lora_rank = 32 is the safest choice across the catalog. If you raise the rank, validate before a long run: Flash checks that the resulting adapter can be deployed for the selected base model.

Serving prices

Serving is billed per token after deployment. Prompt and completion tokens have separate per-model rates, and cached prompt tokens use the model’s cached-input rate. The prices below are per 1M tokens; your Freesolo billing dashboard shows the charges you actually accrue.
| Model | Prompt / 1M | Completion / 1M | Cached prompt / 1M |
| --- | --- | --- | --- |
| Qwen/Qwen3.5-0.8B | $0.012 | $0.060 | $0.0024 |
| openbmb/MiniCPM5-1B | $0.012 | $0.060 | $0.0024 |
| Qwen/Qwen3.5-2B | $0.024 | $0.120 | $0.0048 |
| Qwen/Qwen3.5-4B | $0.036 | $0.180 | $0.0072 |
| Qwen/Qwen3.5-9B | $0.120 | $0.180 | $0.0240 |
| Qwen/Qwen3.6-35B-A3B | $0.180 | $1.200 | $0.0600 |
The Cached prompt column is the rate for prompt tokens served from the prefix cache, which is automatic. See Billing for how prefix caching works and when it applies.
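As a rough illustration of how the per-1M rates combine, the sketch below estimates a serving bill from token counts. It is not part of Flash: `estimate_serving_cost` and `PRICES_PER_1M` are hypothetical names, the rates are copied from the table above and may lag the live catalog, and it assumes cached prompt tokens are billed at the cached rate instead of the prompt rate. Your billing dashboard remains the authority on actual charges.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing table on this page:
# (prompt, completion, cached prompt). Illustrative only; rates may change.
PRICES_PER_1M = {
    "Qwen/Qwen3.5-0.8B": ("0.012", "0.060", "0.0024"),
    "openbmb/MiniCPM5-1B": ("0.012", "0.060", "0.0024"),
    "Qwen/Qwen3.5-2B": ("0.024", "0.120", "0.0048"),
    "Qwen/Qwen3.5-4B": ("0.036", "0.180", "0.0072"),
    "Qwen/Qwen3.5-9B": ("0.120", "0.180", "0.0240"),
    "Qwen/Qwen3.6-35B-A3B": ("0.180", "1.200", "0.0600"),
}


def estimate_serving_cost(model, uncached_prompt_tokens, completion_tokens,
                          cached_prompt_tokens=0):
    """Return an estimated serving cost in USD as a Decimal.

    Assumption: each prompt token is billed exactly once, either at the
    prompt rate (uncached) or at the cached-prompt rate (prefix-cache hit).
    """
    prompt_rate, completion_rate, cached_rate = (
        Decimal(rate) for rate in PRICES_PER_1M[model]
    )
    total_per_million = (
        uncached_prompt_tokens * prompt_rate
        + completion_tokens * completion_rate
        + cached_prompt_tokens * cached_rate
    )
    return total_per_million / Decimal(1_000_000)


if __name__ == "__main__":
    # 2M uncached prompt tokens, 0.5M completion tokens, 1M cached prompt tokens.
    cost = estimate_serving_cost(
        "Qwen/Qwen3.5-4B",
        uncached_prompt_tokens=2_000_000,
        completion_tokens=500_000,
        cached_prompt_tokens=1_000_000,
    )
    print(f"${cost:.4f}")  # prints $0.1692
```

For that example the three terms are $0.072 (prompt), $0.090 (completion), and $0.0072 (cached prompt). `Decimal` is used so that small per-token rates do not pick up floating-point rounding noise.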

Notes per model

  • Qwen/Qwen3.5-0.8B: the smallest Qwen3.5. Cheapest for smoke tests and fast iteration.
  • openbmb/MiniCPM5-1B: an on-device-class small model on a standard Llama architecture.
  • Qwen/Qwen3.5-2B: a small, capable step up when 0.8B underfits.
  • Qwen/Qwen3.5-4B: a balanced starting point for most tasks.
  • Qwen/Qwen3.5-9B: the largest dense catalog model, useful when quality matters more than run cost.
  • Qwen/Qwen3.6-35B-A3B: a Mixture-of-Experts checkpoint (35B total, ~3B active per token) and the largest model in the catalog.
The Qwen3.5 entries are text-only fine-tunes: the checkpoints are natively multimodal, but Flash trains and serves the language model only.

Reasoning (“thinking”) models

Every catalog model is hybrid-reasoning: it can run with or without an explicit reasoning step. Turn reasoning on for a run with one line in your config:
thinking = true
See the thinking field in the configuration reference.

Choosing a model

Start small

Use Qwen/Qwen3.5-0.8B or Qwen/Qwen3.5-2B to validate your setup and data cheaply. Get a run working before you scale.

Scale up for quality

Move to Qwen/Qwen3.5-4B or Qwen/Qwen3.5-9B once the task is wired up and you want stronger results. Larger models cost more per run.

Pick a model, then train

Set model in your config and submit a run.