
Your First Forecast (Mock Data)

This is the simplest path to running a forecast. All data is pre-committed to the repo, so there’s nothing to download or prepare. The walkthrough has three steps: understand the mock data, fit the model, and run the forecast.

Prerequisites

A working checkout of the M4MD repository, with R installed (the steps below call Rscript from the repository root). [TODO: formalize this]

Step 0: Understand the mock data

The mock dataset represents a single park unit (ELDO) with two vegetation strata (A and B), three sites per stratum, and three transects per site — surveyed annually from 2000 to 2025. Each transect records how many of 100 sample points hit a plant (y_hits out of n_points = 100). A single climate covariate, precipitation (ppt), declines gradually over the training period and drives the vegetation response.

There are three files, all pre-committed to assets/_data/:

| File | What it contains |
|---|---|
| pg-hits.csv | Transect-level vegetation observations (one row per transect-year) |
| pg-covariates.csv | Annual precipitation per site (one row per site-year) |
| pg-covariates-scenarios.csv | Future precipitation under three climate scenarios × three GCM model runs |

You don’t need to generate the data; it’s already in the repo. If you want to see exactly how it was built, or to regenerate it, the script that created it is at forecasting/getting-started/generate-mock-data.R.
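The design described above fixes how many rows each file should have. That holds only if the design is fully balanced, meaning every transect is surveyed every year and every scenario run covers every site-year. This page implies a balanced design but doesn't state it outright.

The Python sketch below computes those counts. When run from the repository root, it also compares them against the committed files. The constant names and the `count_data_rows` helper are illustrative, not part of M4MD.

```python
import csv
from pathlib import Path

# Design from Step 0, ASSUMING a fully balanced layout (no missing surveys).
STRATA = 2                        # A and B, both within ELDO
SITES_PER_STRATUM = 3
TRANSECTS_PER_SITE = 3
N_TRAIN_YEARS = 2025 - 2000 + 1   # 26 survey years
N_FUTURE_YEARS = 2040 - 2026 + 1  # 15 forecast years
N_SCENARIOS = 3
N_GCM_RUNS = 3

N_SITES = STRATA * SITES_PER_STRATUM          # 6 sites
N_TRANSECTS = N_SITES * TRANSECTS_PER_SITE    # 18 transects

EXPECTED_ROWS = {
    "pg-hits.csv": N_TRANSECTS * N_TRAIN_YEARS,                  # 468
    "pg-covariates.csv": N_SITES * N_TRAIN_YEARS,                # 156
    "pg-covariates-scenarios.csv":
        N_SCENARIOS * N_GCM_RUNS * N_SITES * N_FUTURE_YEARS,    # 810
}


def count_data_rows(path):
    """Number of data rows in a CSV file, not counting the header."""
    with open(path, newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


if __name__ == "__main__":
    data_dir = Path("assets/_data")
    for name, expected in EXPECTED_ROWS.items():
        path = data_dir / name
        found = count_data_rows(path) if path.exists() else "missing"
        print(f"{name}: expected {expected}, found {found}")
```

A mismatch doesn't necessarily mean something is broken. It may just mean the generator skips some combinations, which generate-mock-data.R would confirm.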

Training observations (pg-hits.csv)

Each row is one transect in one year. The model treats y_hits as a binomial outcome with n_points trials.

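| stratum | park_unit | event_year | transect | site | n_points | y_hits |
|---|---|---|---|---|---|---|
| A | ELDO | 2000 | 1 | A1 | 100 | 65 |
| A | ELDO | 2000 | 2 | A1 | 100 | 64 |
| A | ELDO | 2000 | 3 | A1 | 100 | 57 |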
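To make "binomial outcome" concrete: each row is n_points = 100 trials, of which y_hits are successes. The sketch below works on the three rows above. It computes a single pooled hit probability (total hits divided by total points, which is the maximum-likelihood estimate when every row shares one probability) and the binomial log-likelihood at that value.

This is a minimal illustration only. It does not reproduce the M4MD model, whose link function, covariate terms, and priors aren't described on this page.

```python
import csv
import io
import math

# The three example rows above, as CSV (header order taken from the table).
SAMPLE = """stratum,park_unit,event_year,transect,site,n_points,y_hits
A,ELDO,2000,1,A1,100,65
A,ELDO,2000,2,A1,100,64
A,ELDO,2000,3,A1,100,57
"""


def binomial_log_pmf(k, n, p):
    """log P(Y = k) for Y ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
hits = [int(r["y_hits"]) for r in rows]
points = [int(r["n_points"]) for r in rows]

# If all rows share one hit probability p, its maximum-likelihood
# estimate is total hits / total points.
p_hat = sum(hits) / sum(points)
loglik = sum(binomial_log_pmf(k, n, p_hat) for k, n in zip(hits, points))
print(f"pooled hit rate = {p_hat:.2f}, binomial log-likelihood = {loglik:.2f}")
```

Here the pooled rate is 186 / 300 = 0.62. A fitted model would replace the single p with a per-row probability that depends on covariates such as ppt.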

Climate covariates (pg-covariates.csv)

One precipitation value per site per year — shared across all transects at that site.

| park_unit | stratum | site | event_year | ppt |
|---|---|---|---|---|
| ELDO | A | A1 | 2000 | 476 |
| ELDO | A | A1 | 2001 | 500 |
| ELDO | A | A1 | 2002 | 518 |
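Because every transect at a site shares one ppt value, attaching covariates to observations is a many-to-one join on the columns the two files have in common. The pipeline's own join isn't shown here. This sketch assumes that park_unit, stratum, site, and event_year together identify one covariate row, and it uses the excerpt rows from the tables above. `join_ppt` is a hypothetical helper.

```python
import csv
import io

HITS = """stratum,park_unit,event_year,transect,site,n_points,y_hits
A,ELDO,2000,1,A1,100,65
A,ELDO,2000,2,A1,100,64
A,ELDO,2000,3,A1,100,57
"""
COVARIATES = """park_unit,stratum,site,event_year,ppt
ELDO,A,A1,2000,476
ELDO,A,A1,2001,500
ELDO,A,A1,2002,518
"""

# ASSUMED join key: the columns both files share.
KEY = ("park_unit", "stratum", "site", "event_year")


def join_ppt(hits_rows, cov_rows):
    """Attach the site-year ppt value to every transect row (many-to-one)."""
    ppt_by_key = {}
    for r in cov_rows:
        k = tuple(r[c] for c in KEY)
        if k in ppt_by_key:
            raise ValueError(f"duplicate covariate row for {k}")
        ppt_by_key[k] = float(r["ppt"])
    joined = []
    for r in hits_rows:
        k = tuple(r[c] for c in KEY)
        if k not in ppt_by_key:
            raise KeyError(f"no covariate row for {k}")
        joined.append({**r, "ppt": ppt_by_key[k]})
    return joined


hits = list(csv.DictReader(io.StringIO(HITS)))
cov = list(csv.DictReader(io.StringIO(COVARIATES)))
for row in join_ppt(hits, cov):
    print(row["site"], row["transect"], row["y_hits"], row["ppt"])
```

The sketch raises an error on a missing or duplicated key instead of silently dropping rows. That way a gap in the covariates shows up immediately.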

Future scenarios (pg-covariates-scenarios.csv)

Three climate trajectories (continued decline, flat, increasing), each with three GCM model runs, covering 2026–2040.

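| scenario_name | model_run_name | park_unit | stratum | site | event_year | ppt |
|---|---|---|---|---|---|---|
| continued_decline | Mock-GCM-1 | ELDO | A | A1 | 2026 | 380 |
| continued_decline | Mock-GCM-1 | ELDO | A | A1 | 2027 | 405 |
| flat | Mock-GCM-1 | ELDO | A | A1 | 2026 | 456 |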
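Each combination of scenario_name, model_run_name, and site is one ppt trajectory over 2026 to 2040. If every run covers every site, the full file holds 3 × 3 = 9 trajectories per site.

The sketch below groups rows into trajectories and reports any that are missing forecast years. On the three excerpt rows it flags every trajectory as incomplete, which is expected for an excerpt. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Excerpt rows from the table above; the real file is pg-covariates-scenarios.csv.
SAMPLE = """scenario_name,model_run_name,park_unit,stratum,site,event_year,ppt
continued_decline,Mock-GCM-1,ELDO,A,A1,2026,380
continued_decline,Mock-GCM-1,ELDO,A,A1,2027,405
flat,Mock-GCM-1,ELDO,A,A1,2026,456
"""


def group_trajectories(rows):
    """Map (scenario_name, model_run_name, site) -> {event_year: ppt}."""
    traj = defaultdict(dict)
    for r in rows:
        key = (r["scenario_name"], r["model_run_name"], r["site"])
        traj[key][int(r["event_year"])] = float(r["ppt"])
    return dict(traj)


def missing_years(traj, start=2026, end=2040):
    """For each trajectory lacking any year in start..end, list the missing years."""
    wanted = set(range(start, end + 1))
    gaps = {}
    for key, by_year in traj.items():
        missing = sorted(wanted - set(by_year))
        if missing:
            gaps[key] = missing
    return gaps


traj = group_trajectories(csv.DictReader(io.StringIO(SAMPLE)))
print(f"{len(traj)} trajectories in the excerpt")
for key, years in missing_years(traj).items():
    print(key, "missing", len(years), "year(s)")
```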

What the data looks like

Vegetation hit rate tracks precipitation closely, with stratum A starting at a higher baseline (~73%) than stratum B (~35%). Both strata decline as precipitation falls.
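A per-stratum hit rate can be computed as a pooled rate: total y_hits divided by total n_points for each stratum and year. Because every transect has n_points = 100, this equals the mean of the transect-level rates.

Here is a sketch of that aggregation, assuming the column names in the pg-hits.csv table. The plotting code behind the figures isn't shown, so treating this as its exact aggregation is an assumption.

```python
import csv
import io
from collections import defaultdict

# Excerpt rows from the pg-hits.csv table.
SAMPLE = """stratum,park_unit,event_year,transect,site,n_points,y_hits
A,ELDO,2000,1,A1,100,65
A,ELDO,2000,2,A1,100,64
A,ELDO,2000,3,A1,100,57
"""


def hit_rate_by_stratum_year(rows):
    """Pooled hit rate per (stratum, event_year): sum(y_hits) / sum(n_points)."""
    hits = defaultdict(int)
    points = defaultdict(int)
    for r in rows:
        key = (r["stratum"], int(r["event_year"]))
        hits[key] += int(r["y_hits"])
        points[key] += int(r["n_points"])
    return {key: hits[key] / points[key] for key in hits}


rates = hit_rate_by_stratum_year(csv.DictReader(io.StringIO(SAMPLE)))
for (stratum, year), rate in sorted(rates.items()):
    print(stratum, year, f"{rate:.0%}")
```

To run it on the committed data, pass a csv.DictReader over an open assets/_data/pg-hits.csv instead of the excerpt. The excerpt covers only site A1 in 2000, so its 62% is not the stratum baseline.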

[Figure: Vegetation hit rate by stratum over time, with precipitation overlaid]

The precipitation–vegetation relationship is what the model learns and uses to generate forecasts.

[Figure: Hit rate vs. precipitation scatter plot]

The three future scenarios span a range of climate outcomes: continued decline, flat, and increasing precipitation. Each has three GCM model runs that differ by small offsets, standing in for ensemble spread.
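One simple way to summarize ensemble spread is the range of ppt across the GCM runs for each scenario, site, and year. The sketch below computes the minimum, the maximum, and the number of runs. It is not how M4MD summarizes spread, which this page doesn't describe, and `ensemble_spread` is a hypothetical helper. With only the excerpt rows from the scenarios table, each key has a single run, so minimum equals maximum.

```python
import csv
import io
from collections import defaultdict

# Excerpt rows from the scenarios table; the real file has three runs per scenario.
SAMPLE = """scenario_name,model_run_name,park_unit,stratum,site,event_year,ppt
continued_decline,Mock-GCM-1,ELDO,A,A1,2026,380
continued_decline,Mock-GCM-1,ELDO,A,A1,2027,405
flat,Mock-GCM-1,ELDO,A,A1,2026,456
"""


def ensemble_spread(rows):
    """Map (scenario_name, site, event_year) -> (min ppt, max ppt, number of runs)."""
    values = defaultdict(list)
    for r in rows:
        key = (r["scenario_name"], r["site"], int(r["event_year"]))
        values[key].append(float(r["ppt"]))
    return {key: (min(v), max(v), len(v)) for key, v in values.items()}


spread = ensemble_spread(csv.DictReader(io.StringIO(SAMPLE)))
for key, (low, high, n_runs) in sorted(spread.items()):
    print(key, f"ppt {low:.0f} to {high:.0f} across {n_runs} run(s)")
```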

[Figure: Climate scenarios: historical and three future trajectories]

Step 1: Fit the Model

The mock training data (assets/_data/pg-*.csv) and the accompanying configuration files are already in this repo; take a look at them before fitting the model. Then run the fitting pipeline to generate the model artifacts that the forecasting pipeline will read:

Rscript analysis-pipeline.R assets/_config/M4MD/ELDO/mock-cover.yml \
  --save-forecast-inputs

Before forecasting, it is worth browsing the artifacts the fitting step wrote:

assets/_output/M4MD/ELDO/mock-cover/<run-id>/04-forecast/forecast-inputs/
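The `<run-id>` segment suggests that each fit gets its own directory. If you've fitted more than once, a small helper can find the newest run. This sketch picks the run directory with the latest modification time that contains `04-forecast/forecast-inputs/`. It makes no assumption about how run IDs are formatted, and `latest_forecast_inputs` is a hypothetical name.

```python
from pathlib import Path


def latest_forecast_inputs(output_root):
    """Return the forecast-inputs directory of the newest run under output_root.

    "Newest" means the run directory with the latest modification time.
    Returns None if output_root doesn't exist or holds no fitted runs.
    """
    root = Path(output_root)
    if not root.is_dir():
        return None
    runs = [d for d in root.iterdir()
            if (d / "04-forecast" / "forecast-inputs").is_dir()]
    if not runs:
        return None
    newest = max(runs, key=lambda d: d.stat().st_mtime)
    return newest / "04-forecast" / "forecast-inputs"


if __name__ == "__main__":
    inputs = latest_forecast_inputs("assets/_output/M4MD/ELDO/mock-cover")
    if inputs is None:
        print("No fitted runs found; run Step 1 first.")
    else:
        for path in sorted(inputs.rglob("*")):
            print(path.relative_to(inputs))
```

A directory's modification time changes only when entries directly inside it are added or removed. Treat "newest" as approximate if you edit run directories by hand.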

Step 2: Run the Forecast

Rscript forecasting/forecast/forecast-pipeline.R \
  --config forecasting/getting-started/mock-forecast-config.yaml
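To rerun the whole walkthrough in one go, you can chain the two commands so the forecast only runs after a successful fit. This sketch uses Python's `subprocess.run` with `check=True`, which raises `CalledProcessError` on a non-zero exit. It uses only the commands and flags shown above and assumes you run it from the repository root. `run_walkthrough` is a hypothetical helper, not part of the repo.

```python
import subprocess

# Commands copied verbatim from Steps 1 and 2; run from the repository root.
FIT_CMD = [
    "Rscript", "analysis-pipeline.R",
    "assets/_config/M4MD/ELDO/mock-cover.yml",
    "--save-forecast-inputs",
]
FORECAST_CMD = [
    "Rscript", "forecasting/forecast/forecast-pipeline.R",
    "--config", "forecasting/getting-started/mock-forecast-config.yaml",
]


def run_walkthrough(run=subprocess.run):
    """Fit, then forecast; stop at the first command that exits non-zero."""
    for cmd in (FIT_CMD, FORECAST_CMD):
        print("+", " ".join(cmd))
        run(cmd, check=True)
```

Call `run_walkthrough()` to execute both steps. The `run` parameter exists only so the sequencing can be checked without R installed.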

Reading Your Outputs