flash env setup scaffolds a starter project into the current directory. A run is fully described by what lands on disk: an environment (your task and how it’s scored) and a config (how to train on it). Every file is plain text you can read, diff, and version-control. Rerunning is safe: any file that already exists is left untouched.
./
├── environment.py        # the task + reward (a Freesolo environment)
├── dataset/
│   └── train.jsonl       # a tiny starter dataset (input/output rows)
├── configs/
│   ├── sft.toml          # an SFT (supervised) training config
│   └── rl.toml           # a GRPO (RL) training config
└── TRAINING.md           # a playbook for the AI agent driving your runs
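The rerun-safety described above (a file that already exists is left untouched) can be pictured with a small sketch. This is an illustration of the behavior only, not Flash's actual implementation; the helper name `write_if_absent` is hypothetical.

```python
from pathlib import Path


def write_if_absent(path: Path, content: str) -> bool:
    """Create `path` with `content` unless it already exists.

    Returns True if the file was written, False if it was left untouched.
    Hypothetical helper for illustration; not part of the Flash CLI.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" raises FileExistsError instead of overwriting,
        # so edits already on disk are never clobbered.
        with path.open("x", encoding="utf-8") as f:
            f.write(content)
    except FileExistsError:
        return False
    return True
```

Under this model, a second run reports `False` for every file you have already edited and fills in only the files that are missing.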
What it is: your environment, the single source of truth for what the model practices on and how it’s graded. It defines load_environment(), which returns a Freesolo EnvironmentSingleTurn (or EnvironmentMultiTurn) carrying a dataset and a score_response reward. This is the file you edit first.
When Flash reads it: flash env push packages and uploads it; flash train --cost may import it locally to count training examples; and on every run the worker imports it and calls load_environment(**params). The scaffolded starter loads its rows from dataset/train.jsonl.
See the full scaffolded file (StarterEnv with build_prompt_messages and score_response) in Environments.
What it is: a tiny starter dataset of input/output rows that the scaffolded environment.py loads. Replace it with your real training rows before a real run. See Datasets.
When Flash reads it: your environment.py reads it on the worker at run time, and locally when flash train --cost counts its rows. --dry-run validates only the config and does not read the dataset. flash env push uploads the dataset/ folder with the environment.
dataset/train.jsonl
{"input":"What is 2 + 2?","output":"4"}
{"input":"What is 3 + 5?","output":"8"}
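Before a real run it can help to check that every row has the `input` and `output` keys shown above. The sketch below is plain Python and does not use the Freesolo API; `load_rows` and `exact_match_reward` are hypothetical names, and the exact-match rule is only one possible way to grade a response.

```python
import json
from pathlib import Path


def load_rows(path):
    """Read input/output rows from a JSONL file, skipping blank lines."""
    rows = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        row = json.loads(line)
        # Each row is expected to be an object with both keys present.
        missing = {"input", "output"} - row.keys()
        if missing:
            raise ValueError(f"{path}:{line_no} is missing {sorted(missing)}")
        rows.append(row)
    return rows


def exact_match_reward(response: str, expected: str) -> float:
    """Hypothetical reward: 1.0 if the trimmed strings match, else 0.0."""
    return 1.0 if response.strip() == expected.strip() else 0.0
```

Calling `load_rows("dataset/train.jsonl")` on the starter file would return two rows; a row missing either key raises a `ValueError` that names the offending line.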
What it is: an SFT training config for supervised fine-tuning on the input/output pairs in your environment’s dataset. You set model, the [environment] id, and the [train] knobs (epochs, lora_rank); the training infrastructure and artifact storage are managed for you. Copy it per experiment.
When Flash reads it: every flash train, --dry-run, and --cost parses this file into the resolved job spec.
configs/sft.toml
model = "Qwen/Qwen3.5-4B"
algorithm = "sft"

[environment]
id = ""   # paste the id returned by `flash env push --name my-env .`

[train]
epochs = 3
max_examples = 1000
lora_rank = 32
What it is: the same config shape with algorithm = "grpo", for GRPO (RL) training that optimizes against your environment’s score_response reward. Keep both configs and pick one at train time.
When Flash reads it: same as sft.toml; pass it to flash train to run GRPO instead of SFT.
flash train configs/rl.toml
What it is: a playbook for the AI coding agent you point at this project. It covers how to design the reward, what to read in a run’s output, and how to decide whether a run actually improved the model. It also includes current CLI usage and mitigations for common Flash issues.
When it travels: if you publish the whole scaffolded folder, flash env push includes .md sidecars, so TRAINING.md can travel with the environment source in the Hub for humans and coding agents.

Packaging an environment as a folder

The scaffolded environment.py is enough to publish on its own. Once your task grows data files or helper modules, move it into a folder with environment.py at the root and publish the whole folder.
math/
├── environment.py        # defines load_environment()
├── helpers.py            # optional sibling modules, imported by environment.py
└── dataset/
    ├── train.jsonl       # input/output records
    └── eval.jsonl
What it is: any sibling Python modules your environment.py imports. Keep imports either from installed packages or from files in the folder you publish.
When Flash reads it: uploaded with the folder by flash env push and importable on the worker.
What it is: sidecar data files (.jsonl, .json, .csv, and more) that your environment loads. See Datasets for the full list.
When Flash reads it: your environment code reads it (for example via load_task_examples(...)); flash env push includes the dataset/ folder in the uploaded artifact.
dataset/train.jsonl
{"input":"What is 2 + 2?","output":"4"}
{"input":"What is 3 + 5?","output":"8"}
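Since a published folder needs environment.py at its root and that file must define load_environment(), a quick pre-publish check can catch a misplaced or renamed file. The sketch below parses the file with the standard-library `ast` module without importing it; `defines_load_environment` is a hypothetical helper, not a Flash command.

```python
import ast
from pathlib import Path


def defines_load_environment(folder) -> bool:
    """True if <folder>/environment.py has a top-level load_environment function.

    Hypothetical pre-publish check; it parses the source and never executes it.
    """
    env_file = Path(folder) / "environment.py"
    if not env_file.is_file():
        return False
    tree = ast.parse(env_file.read_text(encoding="utf-8"))
    # Only module-level definitions count; nested functions are ignored.
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name == "load_environment"
        for node in tree.body
    )
```

Parsing rather than importing keeps the check cheap and side-effect free, at the cost of missing a load_environment that is assigned or re-exported rather than defined with `def`.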