# Configuration reference

> Every field in a Flash training config (TOML).

A run is described by a TOML config passed to `flash train`. The only required
fields are `model` and `environment.id`; `algorithm` defaults to `sft`, and
every other field falls back to a default. Validate any config locally with
`flash train config.toml --dry-run`.

```toml
model = "Qwen/Qwen3.5-4B"
algorithm = "sft"

[environment]
id = "your-org/your-env"

[train]
epochs = 3
max_examples = 1000
lora_rank = 32
```

## Top level

| Key         | Type   | Default    | Description                                                                                                                                                                                                                              |
| ----------- | ------ | ---------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`     | string | (required) | Base model to train a LoRA adapter on, e.g. `Qwen/Qwen3.5-4B`. See [`flash models`](/reference/models).                                                                                                                                  |
| `algorithm` | string | `sft`      | `sft` (supervised): learns by imitating example completions you provide. `grpo` (RL): learns from a reward you define. See [SFT vs GRPO](/guides/training#choose-sft-or-grpo).                                                           |
| `thinking`  | bool   | `false`    | Enable reasoning mode (thinking-capable models only). The reasoning trace shares the `max_tokens` budget with the answer, so raise `max_tokens` (and `max_length`) when enabling it — see [Troubleshooting](/reference/troubleshooting). |

## `[environment]`

| Key       | Type          | Default    | Description                                                                                                                                                                                                                                                                                   |
| --------- | ------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `id`      | string        | (required) | [Published Freesolo environment](/guides/environments#publish-it) id, `your-org/your-env`.                                                                                                                                                                                                    |
| `params`  | table         | `{}`       | Keyword arguments passed to the env's `load_environment(**kwargs)`. `split` is also honored by Flash for packaged env datasets — see [Datasets](/guides/datasets#load-sidecars): it selects `dataset/<split>.jsonl`, and a missing split file is an error (no silent `train.jsonl` fallback). |
| `pip`     | list\[string] | `[]`       | Extra third-party pip requirements the worker installs before importing your environment (an escape hatch for deps your `environment.py` imports).                                                                                                                                            |
| `secrets` | list\[string] | `[]`       | Environment variable names to forward to the worker as runtime secrets. Values are read from your shell, `.env`, or `.env.local` at submit time.                                                                                                                                              |

```toml
[environment]
id = "your-org/your-env"
pip = ["openai>=1.0.0"]
secrets = ["SERVICE_API_KEY"]
```

Environment code reads the secret normally:

```python
import os

service_api_key = os.environ["SERVICE_API_KEY"]
```

`[environment].pip` is how the worker installs reward and environment
dependencies. Do not rely on `pyproject.toml`, `requirements.txt`, or lockfiles
inside the published environment artifact for managed training installs. Those
files may describe your local development environment, but any package needed at
runtime belongs in this list.

Never put secret values in `[environment.params]`; it becomes part of the run
spec. Every name you list under `[environment].secrets` must
resolve to a value in your shell, `.env`, or `.env.local` when you submit.
Reserved platform secret names such as `FREESOLO_API_KEY`, `HF_TOKEN`,
`GITHUB_TOKEN`, `RUN_ID`, and `HF_REPO` are owned by Flash.

## `[train]`

| Key                 | Type   | Default        | Description                                                                                                                                                                                                                                                                                                                                                 |
| ------------------- | ------ | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `lora_rank`         | int    | `32`           | LoRA rank; higher means a larger adapter with more capacity and cost. Use the default unless you have a reason to increase it; Flash validates deployability for the selected base model.                                                                                                                                                                   |
| `lora_alpha`        | int    | `64`           | LoRA alpha.                                                                                                                                                                                                                                                                                                                                                 |
| `learning_rate`     | float  | recipe default | Optimizer learning rate.                                                                                                                                                                                                                                                                                                                                    |
| `batch_size`        | int    | recipe default | Training batch size.                                                                                                                                                                                                                                                                                                                                        |
| `max_length`        | int    | recipe default | Max sequence length.                                                                                                                                                                                                                                                                                                                                        |
| `save_every`        | int    | recipe default | Checkpoint cadence (steps). Every save is uploaded: if a save fires while a previous upload is still in flight it is queued (newest wins) and uploaded next, and the final checkpoint is flushed at train end — saves are not skipped on a slow uplink.                                                                                                     |
| `init_from_adapter` | string | -              | Warm-start from an existing adapter. Use the run id that `flash status` prints, e.g. `init_from_adapter = "<run-id>"`. To warm-start from a saved checkpoint instead of the run-level adapter, use the exact short ref listed by `flash checkpoints`: `"<run-id>/step-N"`. Longer storage refs are rejected in config; use the short refs printed by Flash. |

### SFT-specific

| Key            | Type | Description                                                 |
| -------------- | ---- | ----------------------------------------------------------- |
| `epochs`       | int  | Number of epochs (alternative to GRPO `steps`).             |
| `max_steps`    | int  | Cap on optimizer steps (`0` = no cap).                      |
| `max_examples` | int  | Truncate the training dataset to N examples (`0` = no cap). |

### GRPO-specific

| Key                            | Type          | Description                                                                                                 |
| ------------------------------ | ------------- | ----------------------------------------------------------------------------------------------------------- |
| `steps`                        | int           | Number of training steps (default 150).                                                                     |
| `group_size`                   | int           | Completions sampled per prompt.                                                                             |
| `temperature`                  | float         | Sampling temperature for rollouts.                                                                          |
| `max_tokens`                   | int           | Max tokens per rollout completion.                                                                          |
| `kl_penalty_coef`              | float         | Strength of the penalty that keeps the trained model from drifting too far from the base model.             |
| `advantage_clip`               | float         | Recorded for recipe parity but currently a no-op (TRL clips the importance ratio, not the advantage value). |
| `thinking_length_penalty_coef` | float         | Penalty on reasoning length.                                                                                |
| `stop_sequences`               | list\[string] | Stop sequences for generation.                                                                              |

## Managed infrastructure

Flash chooses the training resources, retry path, wall-clock limits,
checkpointing, and run artifact storage. Those platform-managed fields may
appear in resolved specs and run status for observability, but they are not
config knobs.

## `[worker_env]` (advanced)

`[worker_env]` passes non-secret string values into the worker environment and is
serialized into the run spec and artifacts. Use it only for harmless labels or
feature flags. Secret-looking keys and values are rejected; runtime secrets
belong in `[environment].secrets`.

```toml
[worker_env]
FEATURE_FLAG = "enabled"
```

## `[wandb]`

Optional [Weights & Biases](https://wandb.ai) logging labels. These values are
non-secret; set `WANDB_API_KEY` in your local environment when submitting a run
to enable logging to your own W\&B account.

| Key        | Type   | Description        |
| ---------- | ------ | ------------------ |
| `project`  | string | W\&B project name. |
| `run_name` | string | W\&B run name.     |

## Overrides & composition

Any value can be set at submit time without editing the file. Use `--set` for
individual values (repeatable, with dotted keys) and `--config` to deep-merge an
extra TOML file over the base config:

```bash
flash train config.toml --set train.steps=300 --set train.lora_rank=16
flash train base.toml --config overlay.toml      # deep-merge extra TOML
```
