# Environment model

> Your task as code: the dataset, interaction, and reward that make up the one part of a run you write.

An **environment** is a small Python module that packages everything Flash needs to teach and grade your model: the data it practices on, how it interacts, and how its answers are scored. It is the single source of truth for *what the model learns* and *what counts as good*.

## What an environment packages

Every environment bundles three things behind one `load_environment()` entrypoint:

* **Dataset:** The prompts your model practices on, with optional gold answers. Authored as `input`/`output` records. See [Datasets](/guides/datasets).
* **Interaction:** How the model engages: a single prompt and response, or a multi-turn exchange. This is the environment class you subclass.
* **Reward:** `score_response` looks at an answer and returns a `RewardResult` score. This score is the teacher.

The same environment drives **SFT** (learn from gold answers), **GRPO** (learn from reward scores), and **eval**: you swap one line in the config and the environment stays put.

For SFT, each row's `output` is the gold completion Flash trains on, appended after the environment's initial episode so system prompts and tool transcripts stay part of the example. See [Datasets](/guides/datasets#message-shaped-sft-targets) for the `output` shapes (a scalar answer, or a message trajectory for multi-turn and tool-use imitation).

## The one thing you write

You write `environment.py`; Flash owns the rest of the loop.


| You write | Flash runs |
| --- | --- |
| The dataset, interaction, and reward in `environment.py` | The training loop: sampling prompts, generating model attempts (rollouts), applying the algorithm |
| (nothing) | Managed compute, batching, checkpointing, and auto-retry |
| (nothing) | A versioned Environments Hub and [per-token serving](/guides/deploy-and-chat#billing) |

The quality of the environment's dataset and reward sets the ceiling on what training can achieve. For the full split of responsibilities, see [How Flash works](/how-flash-works).

## Where it sits in the training loop

The environment supplies two of the loop's steps:

1. The training loop pulls a prompt from your environment's dataset.
2. The current model attempts an answer, following your environment's interaction model (single response, or several turns).
3. `score_response` returns a `RewardResult`. That score is the signal the algorithm learns from.

See [How Flash works](/how-flash-works#the-training-loop) for the algorithm side of the loop.

## Single-turn and multi-turn

The interaction model is set by which base class you subclass:

* **`EnvironmentSingleTurn`:** prompt in, completion out, reward computed. Most tasks start here. If your prompt messages do not include a `system` message, the SDK prepends the run's prompt text as one.
* **`EnvironmentMultiTurn`:** for conversations or tool use, where the model takes several steps before the whole sequence (its trajectory) is scored. You implement the episode hooks: `start_episode` (opening messages), `step_episode` (react to each action and decide whether the episode continues), `max_episode_turns` (the bound), and `score_episode` (reward the trajectory). See [Multi-turn environments](/guides/environments#multi-turn-environments) for the full loop, an action protocol, and the stateless-step pattern.

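
To make the division of labor between the four episode hooks concrete, here is a minimal, self-contained sketch of how a driver loop might call them. It does not import or subclass the Freesolo SDK: `GuessNumberEnv`, `StepResult`, `run_episode`, and `binary_search_policy` are hypothetical names, and every method signature and return type is an assumption made for illustration. Only the hook names come from the text above; the real signatures are in the Multi-turn environments guide.

```python
from dataclasses import dataclass
from typing import Callable

Message = dict[str, str]


@dataclass
class StepResult:
    """Hypothetical return value for step_episode (not the Freesolo type)."""
    messages: list[Message]  # environment replies to append to the transcript
    done: bool               # True when the episode should stop


class GuessNumberEnv:
    """Toy stand-in that mirrors the four hook names; signatures are assumed."""

    max_episode_turns = 4  # upper bound on model actions per episode

    def __init__(self, target: int) -> None:
        self.target = target

    def start_episode(self) -> list[Message]:
        # Opening messages shown to the model before its first action.
        return [{"role": "user", "content": "Guess my number between 1 and 10."}]

    def step_episode(self, action: str) -> StepResult:
        # React to one model action and decide whether the episode continues.
        guess = int(action)
        if guess == self.target:
            return StepResult([{"role": "user", "content": "Correct."}], done=True)
        hint = "Higher." if guess < self.target else "Lower."
        return StepResult([{"role": "user", "content": hint}], done=False)

    def score_episode(self, transcript: list[Message]) -> float:
        # Reward the whole trajectory: 1.0 if the final reply confirms a hit.
        return 1.0 if transcript[-1]["content"] == "Correct." else 0.0


def run_episode(
    env: GuessNumberEnv, policy: Callable[[list[Message]], str]
) -> tuple[list[Message], float]:
    """Drive one episode: open, alternate action and reaction, then score."""
    transcript = env.start_episode()
    for _ in range(env.max_episode_turns):
        action = policy(transcript)
        transcript.append({"role": "assistant", "content": action})
        result = env.step_episode(action)
        transcript.extend(result.messages)
        if result.done:
            break
    return transcript, env.score_episode(transcript)


def binary_search_policy(transcript: list[Message]) -> str:
    """Scripted stand-in for the model: narrows the range using the hints."""
    low, high = 1, 10
    guess = None
    for message in transcript:
        if message["role"] == "assistant":
            guess = int(message["content"])
        elif message["content"] == "Higher.":
            low = guess + 1
        elif message["content"] == "Lower.":
            high = guess - 1
    return str((low + high) // 2)
```

Calling `run_episode(GuessNumberEnv(target=7), binary_search_policy)` plays the episode to completion within the turn bound. In a real environment the policy is the model being trained, and the score would presumably be wrapped in a `RewardResult`, as in the single-turn case.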

A minimal single-turn environment:

```python environment.py
from freesolo.datasets import TaskExample
from freesolo.environments import EnvironmentSingleTurn, RewardResult


class CustomEnv(EnvironmentSingleTurn):
    # Dataset: one input/output record with a gold answer.
    dataset = [{"input": "What is 2 + 2?", "output": "4"}]

    def build_prompt_messages(self, example: TaskExample, prompt_text: str):
        # Interaction: send the record's input as a single user message.
        return [{"role": "user", "content": example.input}]

    def score_response(self, example: TaskExample, response_text: str) -> RewardResult:
        # Reward: 1.0 when the gold answer appears in the response, else 0.0.
        expected = str(example.output or "").strip()
        return RewardResult(score=1.0 if expected and expected in response_text else 0.0, threshold=1.0)


def load_environment(**kwargs) -> CustomEnv:
    return CustomEnv()
```

See [Environments](/guides/environments) for the full SDK: prompt builders, loading dataset files, parameters, and secrets.

With `thinking = true`, `response_text` is the answer text by default; it also exposes the separated reasoning trace and raw output when a reward needs them (see [Environments](/guides/environments#use-the-sdk)).

## From local file to managed run

An environment is authored locally but has to be reachable by id when training runs on managed infrastructure. The lifecycle is four steps:

1. Write `environment.py` with a `load_environment()` that returns a Freesolo environment.
2. `flash env push --name my-env .` packages the folder and uploads it to the managed Environments Hub, which versions it and prints an id (`your-org/my-env`).
3. Put that id in your config's `[environment] id`. Optional `[environment.params]` are passed to `load_environment(**params)`.
4. The worker installs your published environment, imports `environment.py`, and calls `load_environment(**params)` to drive the run.

## Next

* [Environments](/guides/environments): The full SDK: author a dataset and reward, then publish it.
* [Datasets](/guides/datasets): Package task records and data files inside an environment.
* Where the environment lives on disk and what Flash reads.
* [How Flash works](/how-flash-works): The full run loop the environment plugs into.