A run is described by a TOML config passed to flash train. The only required fields are model and environment.id. algorithm defaults to sft, and everything else has a sensible default. Validate any config locally with flash train config.toml --dry-run.
```toml
model = "Qwen/Qwen3.5-4B"
algorithm = "sft"

[environment]
id = "your-org/your-env"

[train]
epochs = 3
max_examples = 1000
lora_rank = 32
```
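The required-field rule can be sketched in a few lines of Python. This only illustrates the rule as stated on this page (model and environment.id are required, algorithm defaults to sft). check_required and with_defaults are hypothetical helpers, not part of Flash, and flash train config.toml --dry-run remains the real validator.

```python
def check_required(config: dict) -> list[str]:
    """Return the dotted names of required keys missing from a parsed config."""
    missing = []
    if not config.get("model"):
        missing.append("model")
    if not config.get("environment", {}).get("id"):
        missing.append("environment.id")
    return missing


def with_defaults(config: dict) -> dict:
    """Fill in the one default this page states explicitly: algorithm = "sft"."""
    return {"algorithm": "sft", **config}


# The config dict mirrors the TOML example; hypothetical helper names throughout.
config = {"model": "Qwen/Qwen3.5-4B", "environment": {"id": "your-org/your-env"}}
print(check_required(config))              # []
print(with_defaults(config)["algorithm"])  # sft
```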

Top level

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | (required) | Base model to train a LoRA adapter on, e.g. Qwen/Qwen3.5-4B. See flash models. |
| algorithm | string | sft | sft (supervised): learns by imitating example completions you provide. grpo (RL): learns from a reward you define. See SFT vs GRPO. |
| thinking | bool | false | Enable reasoning mode (thinking-capable models only). The reasoning trace shares the max_tokens budget with the answer, so raise max_tokens (and max_length) when enabling it; see Troubleshooting. |

[environment]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| id | string | (required) | Published Freesolo environment id, your-org/your-env. |
| params | table | {} | Keyword arguments passed to the env’s load_environment(**kwargs). split is also honored by Flash for packaged env datasets (see Datasets): it selects dataset/<split>.jsonl, and a missing split file is an error (no silent train.jsonl fallback). |
| pip | list[string] | [] | Extra third-party pip requirements the worker installs before importing your environment (an escape hatch for deps your environment.py imports). |
| secrets | list[string] | [] | Environment variable names to forward to the worker as runtime secrets. Values are read from your shell, .env, or .env.local at submit time. |
```toml
[environment]
id = "your-org/your-env"
pip = ["openai>=1.0.0"]
secrets = ["SERVICE_API_KEY"]
```
Environment code reads the secret normally:
```python
import os

# Raises KeyError if SERVICE_API_KEY was not forwarded to the worker.
service_api_key = os.environ["SERVICE_API_KEY"]
```
[environment].pip is the worker install path for reward and environment dependencies. Do not rely on pyproject.toml, requirements.txt, or lockfiles inside the published environment artifact for managed training installs. Those files may describe your local development environment, but runtime packages belong in this config list.

Never put secret values in [environment.params]; it becomes part of the run spec. Every name you list under [environment].secrets must resolve to a value in your shell, .env, or .env.local when you submit. Reserved platform secret names such as FREESOLO_API_KEY, HF_TOKEN, GITHUB_TOKEN, RUN_ID, and HF_REPO are owned by Flash.
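A pre-submit check for the secrets rule might look like the sketch below. It is an illustration only: check_secret_names is a hypothetical helper, it inspects just the mapping it is given (the process environment by default) and does not read .env or .env.local, and this page does not say how Flash itself treats a reserved name, so the sketch merely flags them.

```python
import os

# Names this page lists as reserved. The page introduces them with "such as",
# so treat this set as incomplete.
RESERVED = {"FREESOLO_API_KEY", "HF_TOKEN", "GITHUB_TOKEN", "RUN_ID", "HF_REPO"}


def check_secret_names(names, env=None):
    """Split secret names into (missing, reserved) against an environment mapping."""
    env = os.environ if env is None else env
    missing = [n for n in names if not env.get(n)]   # unset or empty
    reserved = [n for n in names if n in RESERVED]   # owned by Flash per this page
    return missing, reserved


missing, reserved = check_secret_names(
    ["SERVICE_API_KEY", "HF_TOKEN"],
    env={"HF_TOKEN": "x"},  # stand-in mapping so the example is deterministic
)
print(missing)   # ['SERVICE_API_KEY']
print(reserved)  # ['HF_TOKEN']
```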

[train]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| lora_rank | int | 32 | LoRA rank; higher means a larger adapter with more capacity and cost. Use the default unless you have a reason to increase it; Flash validates deployability for the selected base model. |
| lora_alpha | int | 64 | LoRA alpha. |
| learning_rate | float | recipe default | Optimizer learning rate. |
| batch_size | int | recipe default | Training batch size. |
| max_length | int | recipe default | Max sequence length. |
| save_every | int | recipe default | Checkpoint cadence (steps). Every save is uploaded: if a save fires while a previous upload is still in flight it is queued (newest wins) and uploaded next, and the final checkpoint is flushed at train end. Saves are not skipped on a slow uplink. |
| init_from_adapter | string | - | Warm-start from an existing adapter. Use the run id that flash status prints, e.g. init_from_adapter = "<run-id>". To warm-start from a saved checkpoint instead of the run-level adapter, use the exact short ref listed by flash checkpoints: "<run-id>/step-N". Longer storage refs are rejected in config; use the short refs printed by Flash. |
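The two accepted init_from_adapter shapes, a bare run id or "<run-id>/step-N", can be told apart with a small parser. This is a sketch under an assumption that this page does not confirm: that run ids never contain a slash. parse_adapter_ref is a hypothetical helper, and the ids in the example are made up.

```python
import re

# "<run-id>/step-N", assuming the run id itself contains no "/".
_STEP_REF = re.compile(r"^(?P<run_id>[^/]+)/step-(?P<step>\d+)$")


def parse_adapter_ref(ref: str):
    """Return (run_id, step); step is None for a run-level adapter ref."""
    match = _STEP_REF.match(ref)
    if match:
        return match.group("run_id"), int(match.group("step"))
    if ref and "/" not in ref:
        return ref, None
    raise ValueError(f"not a short adapter ref: {ref!r}")


print(parse_adapter_ref("abc123"))          # ('abc123', None)
print(parse_adapter_ref("abc123/step-50"))  # ('abc123', 50)
```

Anything with extra path segments raises ValueError in this sketch, which mirrors the rule that longer storage refs are rejected.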

SFT-specific

| Key | Type | Description |
| --- | --- | --- |
| epochs | int | Number of epochs (alternative to GRPO steps). |
| max_steps | int | Cap on optimizer steps (0 = no cap). |
| max_examples | int | Truncate the training dataset to N examples (0 = no cap). |
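How the three SFT limits might interact can be estimated with simple arithmetic. The sketch assumes one optimizer step per batch (no gradient accumulation) and a partial final batch that still counts as a step; neither is confirmed by this page, and the recipe may differ. planned_sft_steps is a hypothetical helper, its 0 defaults are the sketch's own choice, and the batch size of 16 is a made-up value.

```python
import math


def planned_sft_steps(dataset_size, batch_size, epochs, max_examples=0, max_steps=0):
    """Estimate optimizer steps, treating 0 as "no cap" for both limits."""
    # max_examples truncates the dataset before any batching happens.
    examples = dataset_size if max_examples == 0 else min(dataset_size, max_examples)
    steps = math.ceil(examples / batch_size) * epochs
    # max_steps caps the total afterwards.
    return steps if max_steps == 0 else min(steps, max_steps)


print(planned_sft_steps(5000, batch_size=16, epochs=3, max_examples=1000))  # 189
print(planned_sft_steps(5000, batch_size=16, epochs=3, max_examples=1000, max_steps=100))  # 100
```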

GRPO-specific

| Key | Type | Description |
| --- | --- | --- |
| steps | int | Number of training steps (default 150). |
| group_size | int | Completions sampled per prompt. |
| temperature | float | Sampling temperature for rollouts. |
| max_tokens | int | Max tokens per rollout completion. |
| kl_penalty_coef | float | Strength of the penalty that keeps the trained model from drifting too far from the base model. |
| advantage_clip | float | Recorded for recipe parity but currently a no-op (TRL clips the importance ratio, not the advantage value). |
| thinking_length_penalty_coef | float | Penalty on reasoning length. |
| stop_sequences | list[string] | Stop sequences for generation. |
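To see why group_size matters, it helps to recall how GRPO is commonly described: each completion's reward is scored relative to the other completions sampled for the same prompt. The sketch below shows that textbook normalization. It is not taken from Flash's recipe, which may differ (some implementations skip the standard-deviation scaling), and group_advantages is a hypothetical helper.

```python
from statistics import fmean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled completions."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std over the group
    # eps keeps the division defined when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, group_size = 4, rewards from a hypothetical reward function.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, 1.0, -1.0]
```

When every completion in a group earns the same reward, all advantages are zero, so that prompt contributes no learning signal in this formulation.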

Managed infrastructure

Flash chooses the training resources, retry path, wall-clock limits, checkpointing, and run artifact storage. Those platform-managed fields may appear in resolved specs and run status for observability, but they are not config knobs.

[worker_env] (advanced)

[worker_env] passes non-secret string values into the worker environment and is serialized into the run spec and artifacts. Use it only for harmless labels or feature flags. Secret-looking keys and values are rejected; runtime secrets belong in [environment].secrets.
```toml
[worker_env]
FEATURE_FLAG = "enabled"
```

[wandb]

Optional Weights & Biases logging labels. These values are non-secret; set WANDB_API_KEY in your local environment when submitting a run to enable logging to your own W&B account.
| Key | Type | Description |
| --- | --- | --- |
| project | string | W&B project name. |
| run_name | string | W&B run name. |

Overrides & composition

Any value can be set at submit time without editing the file. The override flag is --set (repeatable, dotted keys):
```shell
flash train config.toml --set train.steps=300 --set train.lora_rank=16
flash train base.toml --config overlay.toml      # deep-merge extra TOML
```
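The two mechanisms, deep-merging an overlay and applying dotted --set keys, can be modeled on plain dictionaries. This is a sketch of the general technique, not Flash's implementation: deep_merge and apply_set are hypothetical helpers, the value parsing (integers, otherwise strings) is a guess, and the order in which Flash applies --config and --set is not stated on this page.

```python
import copy


def deep_merge(base: dict, overlay: dict) -> dict:
    """Return a new dict where overlay wins and nested tables merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_set(config: dict, assignment: str) -> dict:
    """Apply one 'dotted.key=value' override to a copy of config."""
    dotted, raw = assignment.split("=", 1)
    try:
        value = int(raw)
    except ValueError:
        value = raw  # anything non-integer stays a string in this sketch
    result = copy.deepcopy(config)
    node = result
    *parents, leaf = dotted.split(".")
    for part in parents:
        node = node.setdefault(part, {})  # create intermediate tables as needed
    node[leaf] = value
    return result


base = {"model": "Qwen/Qwen3.5-4B", "train": {"steps": 150, "lora_rank": 32}}
overlay = {"train": {"lora_rank": 16}}
config = deep_merge(base, overlay)
config = apply_set(config, "train.steps=300")
print(config["train"])  # {'steps': 300, 'lora_rank': 16}
```

Both helpers return copies, so the base config stays untouched and the same base can be combined with several overlays.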