MaP Model Watch

Which model, at which reasoning level, for the work in front of you — across both families in use.

Answer two questions and this page returns a recommendation in the exact form a launch handoff needs. Beneath the picker sits what it rests on: what’s on the shelf, what each constraint actually costs, and how much of the published evidence anyone outside the vendor has checked.

01 · Pick a launch

What should I run this on?

Nothing here is a lookup table. Every answer is re-derived from the two questions, so the recommendation can always say which one fired — and so can you, when you override it.
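One way to picture "re-derived, not looked up" is a function that returns its reason along with its result. The sketch below is hypothetical throughout: this section does not state the two questions, so `Answers`, its fields `q1` and `q2`, the rule bodies, and the placeholder outputs are stand-ins rather than the page's actual logic.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Answers:
    q1: str  # placeholder for the first question's answer (assumption)
    q2: str  # placeholder for the second question's answer (assumption)


@dataclass(frozen=True)
class Recommendation:
    model: str                      # placeholder label, not a real model name
    effort: str                     # placeholder label, not a real tier name
    fired: str                      # which question decided the outcome
    override_reason: Optional[str] = None


def recommend(a: Answers) -> Recommendation:
    """Derive the recommendation from the answers and record the deciding one."""
    # Hypothetical rules: the real conditions are not given in this section.
    if a.q1 == "a":
        return Recommendation("model-x", "effort-low", fired="q1")
    return Recommendation("model-y", "effort-high", fired="q2")


def override(rec: Recommendation, effort: str, reason: str) -> Recommendation:
    """Change the effort by hand while keeping the record of what fired."""
    return replace(rec, effort=effort, override_reason=reason)
```

Because `fired` travels with the result, an override made through `override` still shows which question produced the original recommendation.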

02 · The two questions

What sets the dials

How effort is derived

Anti-anchoring

The step down

Deliberately not inputs

Three things that feel like they should move the dials and don’t. Each one was tried and rejected; the reasoning lives on the decision page.

03 · The shelf

What’s available, and what it actually costs you

On a subscription, per-token price is the number that binds least. The two that bind are the Fable weekly cap and how much context a Codex session can really hold — so those get columns, not footnotes.

04 · The evidence

What’s published, and who has checked it

Real published scores only. Where there is no number, the cell is empty rather than filled with an estimate — a gap you can see is more useful than a figure you can’t trust. Colour marks who stands behind each one.
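The "empty rather than estimated" rule can be sketched as a cell type whose score is optional and whose backing is required whenever a score is present. The names `Backing`, `Cell`, and `render`, and the two backing categories, are assumptions made for illustration; the page only says that colour marks who stands behind each number.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Backing(Enum):
    # Assumed categories; the page's actual colour legend is not shown here.
    VENDOR_ONLY = "vendor-reported"
    INDEPENDENT = "independently checked"


@dataclass(frozen=True)
class Cell:
    score: Optional[float] = None       # None means nothing has been published
    backing: Optional[Backing] = None   # who stands behind the score

    def __post_init__(self) -> None:
        # A number with nobody behind it is exactly what the table refuses to show.
        if self.score is not None and self.backing is None:
            raise ValueError("a published score needs a backing")


def render(cell: Cell) -> str:
    """Return the display text: empty for a gap, never a filled-in estimate."""
    if cell.score is None:
        return ""
    return f"{cell.score:g} [{cell.backing.value}]"
```

Making the gap a distinct value (`None`) rather than a default number keeps a missing score from being mistaken for a low one.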

05 · The effort dial

Two sliders, one logic

The tiers are named differently on each surface and top out at different points, but the two questions map onto both in the same way.

Conceptual — shape only, no measured values

Where effort buys something

Three shapes, not three datasets. On mechanical work the line is flat and already high — there is nothing for effort to buy. On build work it climbs to high and plateaus. On an open map it is still climbing at the top of the slider, which is the only place the top of the slider earns its cost. Model Watch 1.0 plotted this with a numeric axis and forty-five invented points; the shape was the only real content, so the numbers are gone.
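The three shapes can be written down without any numbers, which is the point of the paragraph above. This is a minimal sketch under that reading; the names `Work`, `effort_shape`, and `top_of_slider_earns_cost` are hypothetical and carry no measured values.

```python
from enum import Enum


class Work(Enum):
    MECHANICAL = "mechanical"
    BUILD = "build"
    OPEN_MAP = "open map"


def effort_shape(work: Work) -> str:
    """Describe how results respond to more effort, in words only."""
    if work is Work.MECHANICAL:
        return "flat and already high"
    if work is Work.BUILD:
        return "climbs to high, then plateaus"
    return "still climbing at the top of the slider"


def top_of_slider_earns_cost(work: Work) -> bool:
    """Only the open-map shape is still rising at the top of the slider."""
    return work is Work.OPEN_MAP
```

For example, `top_of_slider_earns_cost(Work.BUILD)` is `False`, because the build curve has already plateaued at high.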

06 · Recent movement

What changed

07 · Open questions

What we don’t know yet

These are carried forward until a review resolves them. An open question here is a live reason to hold a recommendation loosely.