
# Environment model

> Your task as code: the dataset, interaction, and reward that make up the one part of a run you write.

An **environment** is a small Python module that packages everything Flash needs
to teach and grade your model: the data it practices on, how it interacts, and
how its answers are scored. It is the single source of truth for *what the model
learns* and *what counts as good*.

## What an environment packages

Every environment bundles three things behind one `load_environment()` entrypoint:

<CardGroup cols={3}>
  <Card title="A dataset" icon="table">
    The prompts your model practices on, with optional gold answers. Authored as
    `input`/`output` records. See [Datasets](/guides/datasets).
  </Card>

  <Card title="An interaction model" icon="comments">
    How the model engages: a single prompt and response, or a multi-turn
    exchange. This is the environment class you subclass.
  </Card>

  <Card title="A reward" icon="trophy">
    `score_response` looks at an answer and returns a `RewardResult` score. This
    score is the teacher.
  </Card>
</CardGroup>

The same environment drives **SFT** (learn from gold answers), **GRPO** (learn
from reward scores), and **eval**: you swap one line in the config and the
environment stays unchanged.

For SFT, each row's `output` is the gold completion Flash trains on, appended
after the environment's initial episode so system prompts and tool transcripts
stay part of the example. See
[Datasets](/guides/datasets#message-shaped-sft-targets) for the `output` shapes
(a scalar answer, or a message trajectory for multi-turn and tool-use imitation).
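As a rough illustration of that appending step, the sketch below assembles an SFT example from plain dictionaries. `build_sft_messages` is a hypothetical helper, not part of the Freesolo SDK, and the list-of-messages shape used for a trajectory `output` is an assumption; the Datasets guide remains the reference for the real shapes.

```python
from typing import Union

Message = dict[str, str]


def build_sft_messages(
    initial: list[Message], output: Union[str, list[Message]]
) -> list[Message]:
    """Hypothetical helper: append the gold completion after the initial episode."""
    if isinstance(output, str):
        # Scalar gold answer: treated here as a single assistant turn.
        gold = [{"role": "assistant", "content": output}]
    else:
        # Message trajectory: assumed to already be role/content dictionaries.
        gold = list(output)
    # The system prompt and user turn stay part of the training example.
    return list(initial) + gold


initial_episode = [
    {"role": "system", "content": "Answer with a single number."},
    {"role": "user", "content": "What is 2 + 2?"},
]
sft_example = build_sft_messages(initial_episode, "4")
```

In this sketch the gold completion always lands after the opening messages, which mirrors why system prompts and tool transcripts remain visible in the example.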

## The one thing you write

You write `environment.py`; Flash owns the rest of the loop.

| You write                                                | Flash runs                                                                                        |
| -------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| The dataset, interaction, and reward in `environment.py` | The training loop: sampling prompts, generating model attempts (rollouts), applying the algorithm |
| (nothing)                                                | Managed compute, batching, checkpointing, and auto-retry                                          |
| (nothing)                                                | A versioned Environments Hub and [per-token serving](/guides/deploy-and-chat#billing)             |

The quality of the environment's dataset and reward sets the ceiling on what
training can achieve. For the full split of responsibilities, see
[How Flash works](/how-flash-works).

## Where it sits in the training loop

The environment supplies the prompt and the reward, and its interaction model
shapes the rollout in between:

<Steps>
  <Step title="Flash samples a prompt">
    The training loop pulls a prompt from your environment's dataset.
  </Step>

  <Step title="The model produces a rollout">
    The current model attempts an answer, following your environment's
    interaction model (single response, or several turns).
  </Step>

  <Step title="Your reward scores it">
    `score_response` returns a `RewardResult`. That score is the signal the
    algorithm learns from.
  </Step>
</Steps>

See [How Flash works](/how-flash-works#the-training-loop) for the algorithm side
of the loop.
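The three steps above can be sketched in plain Python. Everything here is a stand-in: `toy_model`, `score`, and `run_step` are hypothetical names, and the real loop (batching, the SFT or GRPO update) is run by Flash rather than by your code.

```python
import random

# Stand-in for the environment's dataset of input/output records.
dataset = [{"input": "What is 2 + 2?", "output": "4"}]


def toy_model(prompt: str) -> str:
    """Stand-in for the current model; it always gives the same answer."""
    return "The answer is 4."


def score(example: dict, response_text: str) -> float:
    """Stand-in reward: 1.0 if the gold answer appears in the response."""
    expected = str(example.get("output") or "").strip()
    return 1.0 if expected and expected in response_text else 0.0


def run_step(rng: random.Random) -> float:
    example = rng.choice(dataset)          # 1. sample a prompt
    rollout = toy_model(example["input"])  # 2. the model produces a rollout
    return score(example, rollout)         # 3. the reward scores it


reward = run_step(random.Random(0))
```

Only the dataset and the scoring function correspond to code you would write; the sampling and generation lines stand for work Flash performs.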

## Single-turn and multi-turn

The interaction model is set by which base class you subclass:

* **`EnvironmentSingleTurn`:** prompt in, completion out, reward computed. Most
  tasks start here. If your prompt messages do not include a `system` message,
  the SDK prepends the run's prompt text as one.
* **`EnvironmentMultiTurn`:** for conversations or tool use, where the model takes
  several steps before the whole sequence (its trajectory) is scored. You
  implement the episode hooks: `start_episode` (opening messages),
  `step_episode` (react to each action and decide whether the episode continues),
  `max_episode_turns` (the bound), and `score_episode` (reward the trajectory).
  See [Multi-turn environments](/guides/environments#multi-turn-environments) for
  the full loop, an action protocol, and the stateless-step pattern.

A minimal single-turn environment:

```python environment.py
from freesolo.datasets import TaskExample
from freesolo.environments import EnvironmentSingleTurn, RewardResult

class CustomEnv(EnvironmentSingleTurn):
    dataset = [{"input": "What is 2 + 2?", "output": "4"}]

    def build_prompt_messages(self, example: TaskExample, prompt_text: str):
        return [{"role": "user", "content": example.input}]

    def score_response(self, example: TaskExample, response_text: str) -> RewardResult:
        expected = str(example.output or "").strip()
        return RewardResult(score=1.0 if expected and expected in response_text else 0.0, threshold=1.0)

def load_environment(**kwargs) -> CustomEnv:
    return CustomEnv()
```
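The multi-turn hooks listed above can be pictured the same way. The sketch below uses a plain Python class instead of `EnvironmentMultiTurn`, so it runs without the SDK. The hook names come from this page, but their signatures, the `(reply, done)` return value, and the `run_episode` driver are assumptions made for illustration.

```python
class GuessNumberEpisode:
    """Plain-Python stand-in, not the real EnvironmentMultiTurn base class."""

    def __init__(self, secret: int):
        self.secret = secret

    def max_episode_turns(self) -> int:
        # The bound on how many actions the model may take.
        return 3

    def start_episode(self) -> list[dict]:
        # Opening messages shown to the model.
        return [{"role": "user", "content": "Guess a number from 1 to 10."}]

    def step_episode(self, action: str) -> tuple[dict, bool]:
        """React to one model action and return (reply, done)."""
        text = action.strip()
        if not text.isdigit():
            return {"role": "user", "content": "Please reply with a number."}, False
        guess = int(text)
        if guess == self.secret:
            return {"role": "user", "content": "correct"}, True
        hint = "higher" if guess < self.secret else "lower"
        return {"role": "user", "content": hint}, False

    def score_episode(self, trajectory: list[dict]) -> float:
        # Reward the whole trajectory: 1.0 only if it ended on a correct guess.
        return 1.0 if trajectory[-1]["content"] == "correct" else 0.0


def run_episode(env: GuessNumberEpisode, policy) -> float:
    """Hypothetical driver showing the order in which the hooks are used."""
    trajectory = env.start_episode()
    for _ in range(env.max_episode_turns()):
        action = policy(trajectory)
        trajectory.append({"role": "assistant", "content": action})
        reply, done = env.step_episode(action)
        trajectory.append(reply)
        if done:
            break
    return env.score_episode(trajectory)


# A scripted policy stands in for the model.
scripted = iter(["5", "8", "7"])
episode_score = run_episode(GuessNumberEpisode(secret=7), lambda _t: next(scripted))
```

The reward is computed once, over the full trajectory, which is the main difference from the single-turn case where each response is scored on its own.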

See [Environments](/guides/environments) for the full SDK: prompt builders,
loading dataset files, parameters, and secrets.

With `thinking = true`, `response_text` is the answer text by default; it also
exposes the separated reasoning trace and raw output when a reward needs them
(see [Environments](/guides/environments#use-the-sdk)).

## From local file to managed run

An environment is authored locally but has to be reachable by id when training
runs on managed infrastructure. The lifecycle is four steps:

<Steps>
  <Step title="Author it locally">
    Write `environment.py` with a `load_environment()` that returns a Freesolo
    environment.
  </Step>

  <Step title="Publish it">
    `flash env push --name my-env .` packages the folder and uploads it to the
    managed Environments Hub, which versions it and prints an id
    (`your-org/my-env`).
  </Step>

  <Step title="Reference it by id">
    Put that id in your config's `[environment] id`. Optional
    `[environment.params]` are passed to `load_environment(**params)`.
  </Step>

  <Step title="Flash loads it at run time">
    The worker installs your published environment, imports `environment.py`,
    and calls `load_environment(**params)` to drive the run.
  </Step>
</Steps>
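The last two steps amount to keyword arguments flowing from the config into `load_environment`. The sketch below mimics that with a plain dictionary standing in for the parsed config. `SketchEnv` and the `difficulty` and `max_examples` parameters are hypothetical, chosen only to show the flow.

```python
# Parsed equivalent of a config fragment such as:
#
#   [environment]
#   id = "your-org/my-env"
#
#   [environment.params]
#   difficulty = "hard"     # hypothetical parameter
#   max_examples = 100      # hypothetical parameter
config = {
    "environment": {
        "id": "your-org/my-env",
        "params": {"difficulty": "hard", "max_examples": 100},
    }
}


class SketchEnv:
    """Stand-in for a Freesolo environment class."""

    def __init__(self, difficulty: str, max_examples: int):
        self.difficulty = difficulty
        self.max_examples = max_examples


def load_environment(
    difficulty: str = "easy", max_examples: int = 10, **kwargs
) -> SketchEnv:
    # Defaults keep the environment loadable when no params are configured.
    return SketchEnv(difficulty=difficulty, max_examples=max_examples)


# What the worker is described as doing: call load_environment(**params).
params = config["environment"].get("params", {})
env = load_environment(**params)
```

Giving every parameter a default is a design choice in this sketch: the same module then loads with or without an `[environment.params]` table.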

## Next

<CardGroup cols={2}>
  <Card title="Build an environment" icon="flask" href="/guides/environments">
    The full SDK: author a dataset and reward, then publish it.
  </Card>

  <Card title="Datasets" icon="table" href="/guides/datasets">
    Package task records and data files inside an environment.
  </Card>

  <Card title="Explore the directory" icon="folder-tree" href="/directory-structure">
    Where the environment lives on disk and what Flash reads.
  </Card>

  <Card title="How Flash works" icon="diagram-project" href="/how-flash-works">
    The full run loop the environment plugs into.
  </Card>
</CardGroup>
