uas-patterns · PIE intelligence

Forecast Accountability

PIE issues confident probabilistic forecasts every day. This page asks the uncomfortable question back: how many of them does it actually grade, and are the graded ones calibrated? The page reads prediction_outcomes live from the intel API; everything below is computed in the browser from that payload.

Descriptive accountability metric — not a forecast

This page measures the engine’s self-grading coverage and calibration, not future events. A high grade rate means more forecasts were checked against intel evidence; it is not a claim about what will happen. Calibration figures computed on a small graded set are indicative, not conclusive.

Headline metrics (values load live from the API): Resolved · Grade rate · Mass unverified · Brier (graded)
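
Of the headline figures, the Brier score has a standard definition: the mean squared difference between the predicted probability and the 0/1 outcome. The sketch below shows that calculation restricted to graded predictions. The record shape (a `probability` field and a `status` field holding `confirmed` or `refuted`) is an assumption for illustration, not the documented prediction_outcomes schema, and the page's own computation may differ in detail.

```python
# Assumed record shape (hypothetical): {"probability": float, "status": str}.
# Only "confirmed" and "refuted" count as graded; any other status is skipped.
OUTCOME = {"confirmed": 1.0, "refuted": 0.0}


def brier_score(records):
    """Mean squared error between predicted probability and the 0/1 outcome.

    Returns None when no record carries a graded verdict.
    """
    errors = [
        (r["probability"] - OUTCOME[r["status"]]) ** 2
        for r in records
        if r["status"] in OUTCOME
    ]
    return sum(errors) / len(errors) if errors else None


sample = [
    {"probability": 0.9, "status": "confirmed"},     # squared error 0.01
    {"probability": 0.8, "status": "refuted"},       # squared error 0.64
    {"probability": 0.6, "status": "unresolvable"},  # ungraded, ignored
]
score = brier_score(sample)
```

Lower is better: 0 is a perfect score, and always answering 0.5 scores 0.25. With only a handful of graded records the figure moves a lot with each new verdict.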

The accountability funnel

How many resolved predictions survive to a real verdict (confirmed/refuted).
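
A minimal sketch of the funnel arithmetic, assuming each resolved record has a `status` field and that the grade rate is the share of resolved predictions ending in `confirmed` or `refuted`. The field name, the other status values, and that definition of the rate are assumptions; the resolver's actual formula is not shown on this page.

```python
from collections import Counter

# Statuses treated as a real verdict, per the caption above.
VERDICTS = ("confirmed", "refuted")


def funnel(records):
    """Count resolved vs graded predictions and derive the grade rate."""
    counts = Counter(r["status"] for r in records)
    resolved = sum(counts.values())
    graded = sum(counts[v] for v in VERDICTS)
    return {
        "resolved": resolved,
        "graded": graded,
        "ungraded": resolved - graded,
        # Guard against an empty payload rather than dividing by zero.
        "grade_rate": graded / resolved if resolved else 0.0,
    }


# Hypothetical payload: two verdicts, two predictions lost before grading.
example = funnel([
    {"status": "confirmed"},
    {"status": "refuted"},
    {"status": "unresolvable"},
    {"status": "expired"},
])
```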

Resolved calls — the actual verdicts

Every graded prediction: what PIE called, the confidence it assigned, and the open-source evidence that confirmed or refuted it. Newest first.

Why predictions go ungraded

Most ungraded predictions are lost to evidence-matching failures (unresolvable or legacy expired) or to ambiguous partial signals, not to wrong calls.

Grading coverage by impact tier

Graded (green) vs ungraded (red), split by the stakes of the call.

Where the engine places its confidence

Histogram of predicted probability across all resolved forecasts.
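
One way to build such a histogram is to count probabilities into equal-width bins over [0, 1]. The bin count of 10 and the function name are illustrative choices, not taken from the page's code.

```python
def confidence_histogram(probabilities, bins=10):
    """Count probabilities into `bins` equal-width buckets covering [0, 1]."""
    counts = [0] * bins
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # p == 1.0 would index one past the end, so clamp it into the last bin.
        idx = min(int(p * bins), bins - 1)
        counts[idx] += 1
    return counts


hist = confidence_histogram([0.05, 0.55, 0.55, 1.0])
```

A histogram piled up near 1.0 alongside a low grade rate would mean most of the engine's confident calls are never checked.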

Reliability diagram

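Each point is a probability bucket: x = predicted, y = observed frequency. Points from a well-calibrated engine sit on the dashed diagonal (y = x); points below it indicate overconfidence, points above it underconfidence. Bubble size = sample count.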
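
The points of a reliability diagram can be derived by bucketing graded predictions and comparing each bucket's mean predicted probability with its share of confirmed outcomes. This is a sketch under assumptions: the `probability` and `status` field names, the five equal-width buckets, and the use of the bucket mean for x are all illustrative.

```python
def reliability_buckets(records, bins=5):
    """Return one point per non-empty bucket: mean predicted, observed rate, n."""
    outcome = {"confirmed": 1, "refuted": 0}
    # Per bucket: [sum of predicted probabilities, confirmed count, sample count]
    acc = [[0.0, 0, 0] for _ in range(bins)]
    for r in records:
        if r["status"] not in outcome:
            continue  # ungraded predictions carry no observed outcome
        i = min(int(r["probability"] * bins), bins - 1)
        acc[i][0] += r["probability"]
        acc[i][1] += outcome[r["status"]]
        acc[i][2] += 1
    return [
        {"predicted": p_sum / n, "observed": hits / n, "n": n}
        for p_sum, hits, n in acc
        if n
    ]


points = reliability_buckets([
    {"probability": 0.9, "status": "confirmed"},
    {"probability": 0.9, "status": "refuted"},
    {"probability": 0.1, "status": "refuted"},
    {"probability": 0.5, "status": "unresolvable"},  # skipped
])
```

In this toy input the 0.9 bucket is observed at 0.5, so its point falls below the diagonal, but with n = 2 that says very little.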

Grading rate by domain

Where forecasts go to die — volume vs. how often they close.

Domain	Total	Graded	Rate

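
The Domain / Total / Graded / Rate columns amount to a group-by over the resolved records. The sketch below assumes each record has a `domain` field and a `status` field; both names and the sample domain labels are hypothetical. Passing a different key would produce the same split for another grouping, such as impact tier.

```python
from collections import defaultdict


def grade_rate_by(records, key="domain"):
    """Return (group, total, graded, rate) rows, largest volume first."""
    totals = defaultdict(lambda: [0, 0])  # group -> [total, graded]
    for r in records:
        row = totals[r[key]]
        row[0] += 1
        # A bool adds as 0 or 1, so this counts confirmed/refuted verdicts.
        row[1] += r["status"] in ("confirmed", "refuted")
    rows = [
        (name, total, graded, graded / total)
        for name, (total, graded) in totals.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical domains "alpha" and "beta", used only for illustration.
table = grade_rate_by([
    {"domain": "beta", "status": "confirmed"},
    {"domain": "alpha", "status": "confirmed"},
    {"domain": "alpha", "status": "unresolvable"},
    {"domain": "alpha", "status": "unresolvable"},
])
```

Sorting by volume puts the busiest domains first, which makes a high-volume, low-rate domain easy to spot.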

Source: prediction_outcomes via /api/data, generated by pipeline/prediction_resolver.py. The resolver grades each prediction by matching it against the intel article DB; the grade_rate it emits is the headline number here.