flash env setup scaffolds a starter project into the current directory. A
run is fully described by what lands on disk: an environment (your task and
how it’s scored) and a config (how to train on it). Every file is plain text
you can read, diff, and version-control. Rerunning is safe: any file that already
exists is left untouched.
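The skip-if-present behavior can be pictured with a short sketch. This is not Flash's implementation: scaffold_file is a hypothetical helper, and the only thing it assumes is the rule stated above, that a path which already exists is never overwritten.

```python
from pathlib import Path


def scaffold_file(path: Path, content: str) -> bool:
    """Write content to path only if nothing exists there yet.

    Returns True when the file was created and False when it was skipped.
    Hypothetical helper for illustration; not part of the Flash CLI.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" raises FileExistsError instead of truncating an existing file.
        with path.open("x", encoding="utf-8") as handle:
            handle.write(content)
    except FileExistsError:
        return False
    return True
```

Calling it twice on the same path creates the file the first time and leaves any later edits in place the second time.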
environment.py
What it is: your environment, the single source of truth for what the model
practices on and how it’s graded. It defines load_environment(), which returns
a Freesolo EnvironmentSingleTurn (or EnvironmentMultiTurn) carrying a dataset
and a score_response reward. This is the file you edit first.
When Flash reads it: flash env push packages and uploads it; flash train --cost
may import it locally to count training examples; and on every run the worker
imports it and calls load_environment(**params). The scaffolded starter loads
its rows from dataset/train.jsonl.
See the scaffolded file in full (StarterEnv with build_prompt_messages and
score_response) in Environments.
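As a rough picture of the shape described above, the sketch below has load_environment(**params) return an object that carries a dataset and a score_response reward. It deliberately does not import Freesolo: SketchEnv is a plain stand-in, and the score_response signature, the exact-match reward, and the dataset_path parameter are all assumptions made for illustration. The real scaffolded file is the one shown in Environments.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchEnv:
    """Stand-in for illustration only; NOT the Freesolo EnvironmentSingleTurn."""

    dataset: list[dict]

    def score_response(self, response: str, expected: str) -> float:
        # Hypothetical reward: 1.0 for an exact match after trimming, else 0.0.
        return 1.0 if response.strip() == expected.strip() else 0.0


def load_environment(dataset_path: str = "dataset/train.jsonl", **params) -> SketchEnv:
    """Read one JSON object per non-blank line and wrap the rows in the stand-in.

    Extra keyword arguments are accepted and ignored, mirroring the
    load_environment(**params) call the worker makes.
    """
    lines = Path(dataset_path).read_text(encoding="utf-8").splitlines()
    rows = [json.loads(line) for line in lines if line.strip()]
    return SketchEnv(dataset=rows)
```

The point of the sketch is the division of labor: the dataset says what the model practices on, and score_response says how each answer is graded.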
dataset/train.jsonl
What it is: a tiny starter dataset of input/output rows that the scaffolded
environment.py loads. Replace it with your real training rows before a real
run. See Datasets.
When Flash reads it: your environment.py reads it on the worker at run time
(and locally when flash train --cost counts its rows; --dry-run validates only
the config and does not read it); flash env push uploads the dataset/ folder
with the environment.
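Before swapping in real training rows, it can help to check the file locally. The sketch below assumes one JSON object per line with string keys named "input" and "output"; those key names are inferred from the "input/output rows" description and may differ from the actual scaffold. count_valid_rows is a hypothetical helper, not a Flash command.

```python
import json
from pathlib import Path

# Assumed field names; adjust to match the rows your environment reads.
REQUIRED_KEYS = ("input", "output")


def count_valid_rows(path: Path) -> int:
    """Return the number of rows, raising ValueError on the first bad line."""
    count = 0
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines between rows
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_number}: invalid JSON") from exc
            if not isinstance(row, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            missing = [key for key in REQUIRED_KEYS if key not in row]
            if missing:
                raise ValueError(f"line {line_number}: missing keys {missing}")
            count += 1
    return count
```

Reporting the line number makes it quick to find the offending row in a large file.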
configs/sft.toml
What it is: an SFT training config for supervised fine-tuning on the
input/output pairs in your environment’s dataset. You set model, the
[environment] id, and the [train] knobs (epochs, lora_rank); the training
infrastructure and artifact storage are managed for you. Copy it per
experiment.
When Flash reads it: every flash train invocation, including --dry-run and
--cost, parses this file into the resolved job spec.
configs/rl.toml
What it is: the same config shape with algorithm = "grpo", for GRPO (RL)
training that optimizes against your environment’s score_response reward. Keep
both and pick one at train time.
When Flash reads it: same as sft.toml; pass it to flash train to run GRPO
instead of SFT.
TRAINING.md
What it is: a playbook for the AI coding agent you point at this project. It
covers how to design the reward, what to read in a run’s output, and how to
decide whether a run actually improved the model. It includes current CLI usage
and mitigations for common Flash issues.
When it travels: if you publish the whole scaffolded folder, flash env push
includes .md sidecars, so TRAINING.md can travel with the environment source in
the Hub for humans and coding agents.
Packaging an environment as a folder
The scaffolded environment.py is enough to publish on its own. Once your task
grows data files or helper modules, move it into a folder with
environment.py at the root and publish the whole folder.
helpers.py
What it is: any sibling Python modules your environment.py imports. Import only
from installed packages or from files in the folder you publish.
When Flash reads it: uploaded with the folder by flash env push and importable
on the worker.
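The import rule above can be checked locally before publishing. The sketch below is a hypothetical lint, not a Flash feature: classify_imports parses the entry file with ast and labels each top-level import as a sibling file in the folder, a package installed in the current interpreter, or missing. It assumes absolute imports and that the local interpreter has the same packages as the worker, which may not hold.

```python
import ast
import importlib.util
from pathlib import Path


def classify_imports(folder: Path, entry: str = "environment.py") -> dict[str, str]:
    """Map each top-level imported name to 'sibling', 'installed', or 'missing'."""
    tree = ast.parse((folder / entry).read_text(encoding="utf-8"))
    names: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])

    result: dict[str, str] = {}
    for name in sorted(names):
        # A module or package that sits next to the entry file ships with the folder.
        if (folder / f"{name}.py").is_file() or (folder / name).is_dir():
            result[name] = "sibling"
        elif importlib.util.find_spec(name) is not None:
            result[name] = "installed"
        else:
            result[name] = "missing"
    return result
```

Anything labeled missing would need to be added to the published folder or replaced with an installed package.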
dataset/
What it is: sidecar data files (.jsonl, .json, .csv, and more) that your
environment loads. See Datasets for the full list.
When Flash reads it: your environment code reads it (for example via
load_task_examples(...)); flash env push includes the dataset/ folder in the
uploaded artifact.