Your First Forecast (Mock Data) #
This is the simpliest path to perform a forecast. All data is pre-committed, so there’s nothing to download or prepare. The walkthrough has three steps.
Prerequisites #
A working M4MD repo. [TODO: formalize this]
Step 0: Understand the mock data #
The mock dataset represents a single park unit (ELDO) with two vegetation strata (A and B), three sites per stratum, and three transects per site — surveyed annually from 2000 to 2025. Each transect records how many of 100 sample points hit a plant (y_hits out of n_points = 100). A single climate covariate, precipitation (ppt), declines gradually over the training period and drives the vegetation response.
There are three files, all pre-committed to assets/_data/:
| File | What it contains |
|---|---|
pg-hits.csv | Transect-level vegetation observations (one row per transect-year) |
pg-covariates.csv | Annual precipitation per site (one row per site-year) |
pg-covariates-scenarios.csv | Future precipitation under three climate scenarios × three GCM model runs |
You don’t need to generate the data — it’s already in the repo. The script that created it is at forecasting/getting-started/generate-mock-data.R if you want to see exactly how it was built or regenerate it.Training observations (pg-hits.csv)
#
Each row is one transect in one year. The model fits to y_hits as a binomial outcome.
| stratum | park_unit | event_year | transect | site | n_points | y_hits |
|---|---|---|---|---|---|---|
| A | ELDO | 2000 | 1 | A1 | 100 | 65 |
| A | ELDO | 2000 | 2 | A1 | 100 | 64 |
| A | ELDO | 2000 | 3 | A1 | 100 | 57 |
| … | … | … | … | … | … | … |
Climate covariates (pg-covariates.csv)
#
One precipitation value per site per year — shared across all transects at that site.
| park_unit | stratum | site | event_year | ppt |
|---|---|---|---|---|
| ELDO | A | A1 | 2000 | 476 |
| ELDO | A | A1 | 2001 | 500 |
| ELDO | A | A1 | 2002 | 518 |
| … | … | … | … | … |
Future scenarios (pg-covariates-scenarios.csv)
#
Three climate trajectories (continued decline, flat, increasing), each with three GCM model runs, covering 2026–2040.
| scenario_name | model_run_name | park_unit | stratum | site | event_year | ppt |
|---|---|---|---|---|---|---|
| continued_decline | Mock-GCM-1 | ELDO | A | A1 | 2026 | 380 |
| continued_decline | Mock-GCM-1 | ELDO | A | A1 | 2027 | 405 |
| flat | Mock-GCM-1 | ELDO | A | A1 | 2026 | 456 |
| … | … | … | … | … | … | … |
What the data looks like #
Vegetation hit rate tracks precipitation closely, with stratum A starting at a higher baseline (~73%) than stratum B (~35%). Both strata decline as precipitation falls.

The precipitation–vegetation relationship is what the model learns and uses to generate forecasts.

The three future scenarios bracket the range of plausible climate outcomes. Each has three GCM model runs (small offsets representing ensemble spread).

Step 1: Fit the Model #
The mock training data (assets/_data/pg-*.csv) are accompanying configuration files are already in this repo. Let’s take a look at them before fitting the pipeline.
Run the fitting pipeline to generate the model artifacts that the forecasting
pipeline will read:
Rscript analysis-pipeline.R assets/_config/M4MD/ELDO/mock-cover.yml \
--save-forecast-inputs
Step 2: Inspect the Output (Recommended) #
Before forecasting, it is worth browsing the artifacts the fitting step wrote:
assets/_output/M4MD/ELDO/mock-cover/<run-id>/04-forecast/forecast-inputs/
Step 3: Run the Forecast #
Rscript forecasting/forecast/forecast-pipeline.R \
--config forecasting/getting-started/mock-forecast-config.yaml