Training - Freesolo Docs

A training run is described by a single TOML config and submitted with flash train. Flash runs the job on managed infrastructure, supervises it, and streams checkpoints and logs back to you. This guide covers the config and the run lifecycle; see the configuration reference for every field.

Pick a base model

Flash trains a LoRA adapter on top of a supported base model. To list the catalog of base model IDs and their parameter sizes, run the command below (see Supported models for each model's algorithms, reasoning support, and pricing):

flash models

Set your choice at the top of the config:

model = "Qwen/Qwen3.5-4B"

Choose SFT or GRPO

Set algorithm to sft or grpo:

algorithm = "sft"

sft: supervised fine-tuning, for when you already have the answers. The model imitates the prompt/answer pairs in your environment’s dataset.
grpo: reinforcement learning, for when there’s no fixed answer to copy. Your environment’s reward scores each completion.

Both are driven by the same environment. See how Flash works for the difference.
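
To make the grpo case more concrete: GRPO samples several completions per prompt (the group_size setting), scores each one with the reward, and compares each completion against the rest of its own group. The Python sketch below shows that group-relative normalization. It illustrates the general GRPO idea only; how Flash computes advantages internally is not documented here, and the function name and example rewards are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one prompt's rewards against the group's mean and spread.

    Illustrative only: a generic GRPO-style normalization, not Flash's code.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 completions sampled from a single prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean get a positive advantage and those below get a negative one. If every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal, which is why a reward that separates good completions from bad ones matters.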

Anatomy of a config

model = "Qwen/Qwen3.5-4B"
algorithm = "sft"
# thinking = true        # opt into reasoning mode (thinking-capable models only)

[environment]
id = "your-org/your-env"

[train]
epochs = 3                           # SFT is epoch-driven; GRPO is step-driven (set steps = N instead)
max_examples = 1000                  # SFT: rows to train on (set to your dataset size)
lora_rank = 32
lora_alpha = 64
# learning_rate = 1e-4
# batch_size = 8
# group_size = 8                     # GRPO: completions sampled per prompt

[wandb]
# project = "my-project"             # optional Weights & Biases logging

Infrastructure is managed

You choose the model, algorithm, environment, and [train] settings. Flash chooses the training resources, retry path, checkpointing, and artifact storage for the run.

Validate before you submit

--dry-run parses and validates the config locally and prints the resolved job spec:

flash train config.toml --dry-run

Submit the run

flash train config.toml

By default, flash train follows the logs until the run finishes. Useful flags:

Flag	Effect
`--background`	Submit and return immediately instead of following logs
`--dry-run`	Validate locally without submitting
`--cost`	Print the pre-flight USD cost and exit (no submit)
`--set key=value`	Override a config value (repeatable)
`--config other.toml`	Deep-merge additional TOML for config composition (repeatable)

# override values at submit time
flash train config.toml --set train.steps=300 --set train.lora_rank=16
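
One way to picture how these overrides compose is a recursive dictionary merge followed by dotted-key assignments. The Python sketch below models that under stated assumptions: later values win, nested tables are merged key by key, and --set keys are dotted paths into the config. Flash's actual precedence rules, list handling, and value parsing are not specified here, and deep_merge and set_dotted are hypothetical helper names.

```python
import copy


def deep_merge(base, overlay):
    """Return a new dict with overlay merged into base, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def set_dotted(config, dotted_key, value):
    """Assign value at a dotted path such as 'train.steps', creating tables as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value


base = {"model": "Qwen/Qwen3.5-4B", "algorithm": "grpo",
        "train": {"lora_rank": 32, "lora_alpha": 64}}
extra = {"train": {"group_size": 8}}          # stands in for --config other.toml

resolved = deep_merge(base, extra)
set_dotted(resolved, "train.steps", 300)      # stands in for --set train.steps=300
set_dotted(resolved, "train.lora_rank", 16)   # stands in for --set train.lora_rank=16
```

Because this is only a model of the behavior, the reliable way to confirm what Flash actually resolves is to add --dry-run to the same command and read the printed job spec.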

Cost and billing

Before a run is submitted, Flash checks your org’s prepaid balance against the pre-flight estimate. A successful run is billed at the quoted Flash cost. A run cancelled after training starts is repriced to the steps it reached. Setup time is reported but not billed. Preview the estimate before you submit:

flash train config.toml --cost

See Cost and billing for what affects cost, how to read the preview output, and how cancellations are repriced.

Monitor a run

Pressing Ctrl-C during flash train only detaches your terminal from the logs; the run keeps going on Freesolo. Use these commands to check on a run or stop it:

flash runs                        # all your runs: state, cost, model
flash status <run-id>             # status JSON, including the cost record
flash status <run-id> --follow    # poll status until completion, without replaying logs
flash log <run-id>                # print the full log snapshot
flash log <run-id> --follow       # stream logs until completion
flash cancel <run-id>             # cancel a run

After training

When a run reaches done, serve it with flash deploy and talk to it with flash chat.
