flash train. The only required
fields are model and environment.id. algorithm defaults to sft, and
everything else has a sensible default. Validate any config locally with
flash train config.toml --dry-run.
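As a minimal sketch of the above, a config that sets only the two required fields might look like the following. The model name and environment id are the placeholder examples used in this reference, not recommendations.

```toml
# config.toml: smallest config, with only the required fields set.
# algorithm is omitted, so it falls back to its default, sft.
model = "Qwen/Qwen3.5-4B"

[environment]
id = "your-org/your-env"  # placeholder; use your published environment id
```

Saved as config.toml, this is the kind of file that flash train config.toml --dry-run validates locally.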
Top level
| Key | Type | Default | Description |
|---|---|---|---|
| model | string | (required) | Base model to train a LoRA adapter on, e.g. Qwen/Qwen3.5-4B. See flash models. |
| algorithm | string | sft | sft (supervised): learns by imitating example completions you provide. grpo (RL): learns from a reward you define. See SFT vs GRPO. |
| thinking | bool | false | Enable reasoning mode (thinking-capable models only). The reasoning trace shares the max_tokens budget with the answer, so raise max_tokens (and max_length) when enabling it; see Troubleshooting. |
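The interaction between thinking and the token budget can be sketched as below. This assumes that max_tokens and max_length are set under [train], as the layout of this reference suggests, and the numbers are illustrative rather than recommended values.

```toml
# Top-level keys must appear before any [table] header in TOML.
model = "Qwen/Qwen3.5-4B"
algorithm = "grpo"
thinking = true            # thinking-capable models only

[environment]
id = "your-org/your-env"   # placeholder

[train]
# Assumed placement and illustrative values: the reasoning trace and the
# answer share max_tokens, so both limits are raised together.
max_tokens = 4096
max_length = 8192
```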
[environment]
| Key | Type | Default | Description |
|---|---|---|---|
| id | string | (required) | Published Freesolo environment id, your-org/your-env. |
| params | table | {} | Keyword arguments passed to the env’s load_environment(**kwargs). split is also honored by Flash for packaged env datasets (see Datasets): it selects dataset/<split>.jsonl, and a missing split file is an error (no silent train.jsonl fallback). |
| pip | list[string] | [] | Extra third-party pip requirements the worker installs before importing your environment (an escape hatch for deps your environment.py imports). |
| secrets | list[string] | [] | Environment variable names to forward to the worker as runtime secrets. Values are read from your shell, .env, or .env.local at submit time. |
[environment].pip is the worker install path for reward and environment
dependencies. Do not rely on pyproject.toml, requirements.txt, or lockfiles
inside the published environment artifact for managed training installs. Those
files may describe your local development environment, but runtime packages
belong in this config list.
Never put secret values in [environment.params]; it becomes part of the run
spec. Every name you list under [environment].secrets must
resolve to a value in your shell, .env, or .env.local when you submit.
Reserved platform secret names such as FREESOLO_API_KEY, HF_TOKEN,
GITHUB_TOKEN, RUN_ID, and HF_REPO are owned by Flash.
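Put together, an [environment] section following these rules might look like the sketch below. The package name, the secret name, and the difficulty parameter are hypothetical; only split has the documented meaning.

```toml
[environment]
id = "your-org/your-env"       # placeholder
pip = ["rapidfuzz"]            # hypothetical runtime dependency of environment.py
secrets = ["JUDGE_API_KEY"]    # hypothetical NAME only; the value is read from
                               # your shell, .env, or .env.local at submit time

[environment.params]
split = "train"                # selects dataset/train.jsonl for packaged datasets
difficulty = "easy"            # hypothetical kwarg for load_environment(**kwargs)
```

Note that nothing in [environment.params] is a secret value, since the table becomes part of the run spec.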
[train]
| Key | Type | Default | Description |
|---|---|---|---|
| lora_rank | int | 32 | LoRA rank; higher means a larger adapter with more capacity and cost. Use the default unless you have a reason to increase it; Flash validates deployability for the selected base model. |
| lora_alpha | int | 64 | LoRA alpha. |
| learning_rate | float | recipe default | Optimizer learning rate. |
| batch_size | int | recipe default | Training batch size. |
| max_length | int | recipe default | Max sequence length. |
| save_every | int | recipe default | Checkpoint cadence (steps). Every save is uploaded: if a save fires while a previous upload is still in flight, it is queued (newest wins) and uploaded next, and the final checkpoint is flushed at train end, so saves are not skipped on a slow uplink. |
| init_from_adapter | string | - | Warm-start from an existing adapter. Use the run id that flash status prints, e.g. init_from_adapter = "<run-id>". To warm-start from a saved checkpoint instead of the run-level adapter, use the exact short ref listed by flash checkpoints: "<run-id>/step-N". Longer storage refs are rejected in config; use the short refs printed by Flash. |
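A [train] section that keeps the LoRA defaults and warm-starts from a checkpoint could be sketched as follows. The save_every value is illustrative, and the adapter ref is left as the placeholder form shown in the table; substitute the short ref that flash checkpoints prints.

```toml
[train]
lora_rank = 32       # documented default
lora_alpha = 64      # documented default
save_every = 50      # illustrative checkpoint cadence, in steps

# Warm-start from a saved checkpoint using the short ref form.
# For the run-level adapter, use "<run-id>" on its own instead.
init_from_adapter = "<run-id>/step-N"
```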
SFT-specific
| Key | Type | Description |
|---|---|---|
| epochs | int | Number of epochs (alternative to GRPO steps). |
| max_steps | int | Cap on optimizer steps (0 = no cap). |
| max_examples | int | Truncate the training dataset to N examples (0 = no cap). |
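For illustration, an SFT run that trains on a truncated dataset might set these keys as below. This assumes the SFT-specific keys live in the [train] table alongside the shared ones; the values are examples only.

```toml
# Top level: algorithm = "sft" (also the default when omitted).

[train]
epochs = 3            # illustrative
max_steps = 0         # 0 = no cap on optimizer steps
max_examples = 1000   # illustrative: keep only the first 1000 examples
```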
GRPO-specific
| Key | Type | Description |
|---|---|---|
| steps | int | Number of training steps (default 150). |
| group_size | int | Completions sampled per prompt. |
| temperature | float | Sampling temperature for rollouts. |
| max_tokens | int | Max tokens per rollout completion. |
| kl_penalty_coef | float | Strength of the penalty that keeps the trained model from drifting too far from the base model. |
| advantage_clip | float | Recorded for recipe parity but currently a no-op (TRL clips the importance ratio, not the advantage value). |
| thinking_length_penalty_coef | float | Penalty on reasoning length. |
| stop_sequences | list[string] | Stop sequences for generation. |
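A GRPO run using these keys could look roughly like the sketch below. As with the SFT keys, placement under [train] is an assumption, and every value other than the documented steps default is illustrative.

```toml
# Top level: algorithm = "grpo"

[train]
steps = 150                     # documented default
group_size = 8                  # illustrative: completions sampled per prompt
temperature = 1.0               # illustrative rollout sampling temperature
max_tokens = 1024               # illustrative cap per rollout completion
kl_penalty_coef = 0.05          # illustrative: larger keeps the model closer to base
stop_sequences = ["</answer>"]  # hypothetical stop sequence
```

advantage_clip is left out here because the table describes it as a no-op.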
Managed infrastructure
Flash chooses the training resources, retry path, wall-clock limits, checkpointing, and run artifact storage. Those platform-managed fields may appear in resolved specs and run status for observability, but they are not config knobs.
[worker_env] (advanced)
[worker_env] passes non-secret string values into the worker environment and is
serialized into the run spec and artifacts. Use it only for harmless labels or
feature flags. Secret-looking keys and values are rejected; runtime secrets
belong in [environment].secrets.
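A [worker_env] table used as described might look like this. Both key names are hypothetical; the point is that the values are plain, non-secret strings.

```toml
[worker_env]
# Hypothetical label and feature flag. Values are strings and end up in the
# run spec and artifacts, so nothing sensitive belongs here.
EXPERIMENT_LABEL = "baseline-v2"
ENABLE_EXTRA_LOGGING = "1"
```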
[wandb]
Optional Weights & Biases logging labels. These values are
non-secret; set WANDB_API_KEY in your local environment when submitting a run
to enable logging to your own W&B account.
| Key | Type | Description |
|---|---|---|
| project | string | W&B project name. |
| run_name | string | W&B run name. |
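For example, a [wandb] section might be filled in as follows; the project and run names are hypothetical labels.

```toml
[wandb]
project = "flash-experiments"    # hypothetical W&B project name
run_name = "qwen-sft-baseline"   # hypothetical W&B run name
# WANDB_API_KEY is not set here; it comes from your local environment
# when you submit the run.
```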
Overrides & composition
Any value can be set at submit time without editing the file. The override flag is --set (repeatable, dotted keys):