The graded evidence base
These are the studies behind Green Life — 31 of them, across instruction, cognition, self-regulation, and assessment. Each carries a grade for evidence quality, the study and sample, the measured effect, and why we use it. We discount effects for the real world ("the Voltage Drop") and say so where the research is mixed.
Grades (A+ down) weigh study design, replication, and bias — the bar medicine uses — not the prestige of the source. An A− means strong, well-replicated evidence honestly discounted for classroom reality. Drawn from the Engine of Trust research appendix.
Students of teachers trained in Cognitively Guided Instruction (CGI) outperformed controls in number facts, problem solving, and confidence. The long-term effect grows with grade: g≈0.02 primary → g≈0.16 intermediate, five years out. What Works Clearinghouse: moderate evidence. Honest scope: CGI is teacher professional development (PD), not a script; the effect depends on sustained training (a Voltage Drop).
Why we use it: We start math with what the child already knows and build on how they actually think — assessment and teaching become one act.
Carpenter, Fennema, Peterson, Chiang & Loef 1989, American Educational Research Journal 26(4) 499–531 (RCT) · Schoen, Rhoads, Perez, Tazaz & Secada 2024, Journal of Research on Educational Effectiveness (5-year follow-up) · Randomized controlled trial (40 first-grade teachers) + 22-school randomized PD offer tracked 5 years
Explicit, systematic phonics has a moderate, well-replicated benefit (d≈0.41; ~0.55 when begun in kindergarten) and is the approach the International Dyslexia Association (IDA) recommends for early/struggling readers. Honest scope: the Orton-Gillingham (OG) *brand* itself shows a positive but non-significant effect (ES≈0.22, p=.40) and isn't proven superior to other structured-phonics programs, so we frame it as 'Structured Literacy in the OG tradition.'
Why we use it: We teach the sound–symbol code of English deliberately, with multisensory practice, and from the earliest stage. This is the part of the evidence that is strongest.
National Reading Panel 2000 (NICHD) — systematic phonics · Stevens, Austin, Moore, Scammacca, Boucher & Vaughn 2021, Exceptional Children 87(4) — OG meta-analysis · International Dyslexia Association — Structured Literacy · NRP: 38 high-quality studies / 66 comparisons · Stevens: meta-analysis of OG interventions
Elementary read-aloud-with-Socratic-discussion studies show rising critical-thinking behaviors over time; metacognitive instruction d≈0.69 in Hattie's synthesis — a broad synthesis figure cited as supporting context, not a block-specific claim (discounted for Voltage Drop)
Why we use it: Question-before-answer activates the reasoning network and pairs naturally with read-aloud — which is exactly why the afternoon block fuses Socratic discussion with the read-aloud text.
Socratic/dialogic discussion + metacognition research; Hattie metacognitive-strategies synthesis · Elementary studies (read-aloud + Socratic discussion); Hattie: 2,100+ meta-analyses, ~300M students
Guided play beats direct instruction on early math (g=0.24), shape knowledge (g=0.63) and task-switching (g=0.40), and beats free play on spatial vocabulary (g=0.93) (Skene 2022). Spacing learning across the day beats massing it (d=0.54), and the advantage grows for long-term retention. A word that takes ~10–20+ rote repetitions can be learned in ~2–8 playful multimodal encounters — each carries sound, image, action and social meaning at once. Honest scope (Voltage Drop): the win comes from GUIDED play — Alfieri 2011 found UNGUIDED discovery does worse than explicit instruction (d=−0.38). Effect sizes are real but modest, not a fixed multiplier — there is no credible hundred-fold figure.
Why we use it: For a fixed minute of instruction, guided play returns more durable learning: fewer repetitions to mastery, lower mental load (spacing), stronger transfer. This is the engine behind a play-woven day, and why focus blocks are playful and short rather than long and didactic. It is the Concrete and Representational stages of our CPA convention delivered as play. Complement to the free-play card: free play builds the learner; guided play teaches the lesson.
Skene et al. 2022, Child Development 93(4) (guided-play meta-analysis, 39 studies) · Alfieri et al. 2011, J. Educational Psychology (164 studies; discovery vs. instruction) · Distributed-Practice Effect on Classroom Learning, Behavioral Sciences 2025 (meta-analytic review) · Bisson et al. 2014 (multimodal incidental vocabulary) · Zosh, Hopkins et al. 2017, LEGO Foundation evidence review · Skene: 39 studies / ~3,893 children ages 1–8 · Alfieri: 164 studies · Distributed practice: classroom meta-analysis, N>3,000 · Bisson: multimodal exposure experiments
Acceleration ≈ +0.23 on academic achievement and ≈ +0.34 on math attitudes (meta-analytic); recognized evidence-based practice, with variability across techniques
Why we use it: Matching pace and depth to the child keeps capable learners engaged and builds durable reasoning tools (memory palace, Socratic habit) rather than busywork.
Acceleration meta-analyses; 'A Nation Empowered' (2015) research synthesis · Meta-analyses of high-ability/accelerated learners across decades
Digital literacy linked to better learning outcomes (mediated by self-efficacy & digital climate); programs succeed where paired with teacher PD, not tech alone
Why we use it: Naming and practicing attention control early builds the metacognitive 'flashlight' the whole curriculum relies on; media literacy is now a recognized K-12 standard in a growing number of states.
K-12 media/digital-literacy research (2024–25); attention-restoration & digital-literacy outcome studies · Systematic reviews of school digital-literacy programs; state K-12 media-literacy mandates (CA, NJ, DE, TX)
CGI improves problem-solving and number sense; durable effect modest (g≈0.16) and PD-dependent — discounted for real-classroom Voltage Drop
Why we use it: Concept before procedure builds understanding; short daily reps build fluency without burnout. The tool's Story Problems tab is the CGI piece; the Practice Game is the fluency piece.
Cognitively Guided Instruction (Carpenter et al.); concrete–pictorial–abstract (Bruner); distributed practice for fluency · CGI: multi-year RCTs, K–3 teachers & students
The cleanest transfer case in the dossier: Singapore's CPA approach produced attainment gains when transplanted to England under a randomized trial — rare for any curriculum. It rests on cognitive load theory: the concrete and pictorial stages give a novice a schema before symbols overload working memory. Honest scope (Voltage Drop): the strongest evidence is for CPA in math; applying it to reading (sounds/play → graphemes → fluent reading) is a sound extension of the same load-management principle, but the reading base is the science of reading, not 'CPA' by that name.
Why we use it: In the pilot every concept node carries a cpa_stage (concrete / representational / abstract); prompts match the stage and the system never offers a child a stage more abstract than the one they've mastered. It's the spine under both math and the reading tapping pages.
Hall et al. 2019, Frontiers in Education (Maths-No-Problem CPA, English RCT) · Singapore MOE Primary Mathematics Syllabus (CPA-anchored) · Bruner's enactive–iconic–symbolic foundation · Randomized English trial of a CPA program; Singapore (CPA-anchored) tops PISA, TIMSS & PIRLS simultaneously
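The stage rule described in this card can be pictured with a short sketch. It is illustrative only: the Python names (CpaStage, allowed_stages, pick_prompt) are hypothetical, and treating the three stages as strictly ordered, with every child having mastered at least the concrete stage, is our assumption rather than a description of the pilot's code.

```python
from enum import IntEnum


class CpaStage(IntEnum):
    """CPA stages, ordered from least to most abstract (ordering is assumed)."""
    CONCRETE = 1
    REPRESENTATIONAL = 2
    ABSTRACT = 3


def allowed_stages(mastered: CpaStage) -> list[CpaStage]:
    """Stages that may be offered: none more abstract than the mastered one."""
    return [stage for stage in CpaStage if stage <= mastered]


def pick_prompt(prompts: dict[CpaStage, str], mastered: CpaStage) -> str:
    """Return the prompt for the most abstract allowed stage that has one."""
    for stage in reversed(allowed_stages(mastered)):
        if stage in prompts:
            return prompts[stage]
    raise LookupError("no prompt available at or below the mastered stage")


# Hypothetical concept node with prompts at two of the three stages.
example_prompts = {
    CpaStage.CONCRETE: "Show 3 + 4 with the counting blocks.",
    CpaStage.ABSTRACT: "What is 3 + 4?",
}
print(pick_prompt(example_prompts, CpaStage.REPRESENTATIONAL))
```

In this sketch a child who has mastered the representational stage is given the concrete prompt, because the only more advanced prompt on the node is abstract and therefore withheld.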
One of the most replicated findings in cognitive psychology. 'The Non-Adopted A': top systems (Singapore, Japan) deliver spacing only implicitly via the annual calendar — none has a concept-level spacing engine, which is exactly the lane Green Life seizes. Pays off most visibly in math facts (spaced 2-minute checks beat massed drilling for durable automaticity). Honest scope (Voltage Drop): most spacing studies are lab-based — classroom-scale dosage curves are less settled — and the evidence is largely on neurotypical samples; optimal spacing for atypical memory profiles is a question the platform can study.
Why we use it: In the pilot every concept carries four fields (first_learned, last_encounter, last_retrieval_success, next_encounter_target) and a tiny computeNextGap function schedules each child's next review — arithmetic, no LLM — and both Command Centers surface 'due today.'
Cepeda, Pashler et al. 2006, Psychological Bulletin (distributed-practice meta-analysis) · Dunlosky et al. 2013 (effective learning techniques review) · 317 experiments / 839 effect sizes (Cepeda 2006); ranked a top-utility technique by Dunlosky 2013
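To make the scheduling idea concrete, here is a minimal sketch of how a record with the four fields named above could feed a gap calculation. It is not the pilot's computeNextGap: the expanding-interval rule (double the gap after a successful recall, reset to one day after a miss, cap at 60 days), the boolean type of last_retrieval_success, and all Python names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MIN_GAP_DAYS = 1   # assumed floor, used after a failed recall
MAX_GAP_DAYS = 60  # assumed cap on how far apart reviews can drift


@dataclass
class ConceptRecord:
    first_learned: date
    last_encounter: date
    last_retrieval_success: bool  # assumed to be a simple yes/no
    next_encounter_target: date


def compute_next_gap(previous_gap_days: int, retrieval_success: bool) -> int:
    """Plain arithmetic: double the gap on success, reset it on a miss."""
    if not retrieval_success:
        return MIN_GAP_DAYS
    return min(max(previous_gap_days, MIN_GAP_DAYS) * 2, MAX_GAP_DAYS)


def record_review(record: ConceptRecord, today: date, success: bool) -> ConceptRecord:
    """Return an updated record after a review held on `today`."""
    previous_gap = (record.next_encounter_target - record.last_encounter).days
    gap = compute_next_gap(previous_gap, success)
    return ConceptRecord(
        first_learned=record.first_learned,
        last_encounter=today,
        last_retrieval_success=success,
        next_encounter_target=today + timedelta(days=gap),
    )


def due_today(records: list[ConceptRecord], today: date) -> list[ConceptRecord]:
    """Concepts whose target date has arrived (the 'due today' list)."""
    return [r for r in records if r.next_encounter_target <= today]
```

Under these assumptions, a concept last seen on 1 January with a 2-day gap and recalled successfully on 3 January would next come due on 7 January; a miss on that day would bring it back on 8 January.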
The distinction that upgrades any spiral: revisiting is re-exposure; retrieving is the A-grade mechanism — same calendar slot, dramatically different retention. It pairs with spacing (the engine schedules WHEN, retrieval defines HOW — a recall check, not a re-teach). Low-stakes and frequent beats high-stakes and rare. Honest scope (Voltage Drop): effect size varies with test format and feedback timing, and retrieval works best once material is initially understood — it consolidates learning, it doesn't replace first teaching. For a struggling child: retrieve once, then re-teach with a worked example.
Why we use it: In the pilot each concept serves a retrieval prompt ('What do you remember about…?') before any re-teach; the child self-scores or the guide taps 0/1/2, and that result feeds the spacing gap. In reading, a correct tap-and-blend or sound-drill recall counts as retrieval.
Roediger & Karpicke 2006, Psychological Science (test-enhanced learning) · Rowland 2014 (testing-effect meta-analysis) · Dunlosky et al. 2013 · Roediger & Karpicke: 61% vs 40% recall at one week (tested vs re-study); confirmed across hundreds of studies (Rowland 2014)
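The retrieve-first flow in this card can be sketched as a small scoring step. The meaning given to the 0/1/2 scale here (0 = nothing recalled, 1 = partial, 2 = full recall), the rule that only a 0 triggers a re-teach, and the rule that only a 2 counts as a success are assumptions for illustration; the function and field names are hypothetical.

```python
from dataclasses import dataclass

VALID_SCORES = (0, 1, 2)  # assumed meaning: none, partial, full recall


@dataclass(frozen=True)
class RetrievalResult:
    prompt: str              # the recall question asked before any re-teach
    score: int               # self-scored by the child or tapped by the guide
    reteach: bool            # follow up with a worked example?
    counts_as_success: bool  # the signal a scheduler could consume


def retrieval_prompt(concept: str) -> str:
    """Ask for recall first, in the card's own wording."""
    return f"What do you remember about {concept}?"


def score_retrieval(concept: str, score: int) -> RetrievalResult:
    """Turn a 0/1/2 tap into a re-teach decision and a success flag."""
    if score not in VALID_SCORES:
        raise ValueError("score must be 0, 1, or 2")
    return RetrievalResult(
        prompt=retrieval_prompt(concept),
        score=score,
        reteach=(score == 0),            # retrieve once, then re-teach
        counts_as_success=(score == 2),  # assumed threshold
    )


result = score_retrieval("doubles facts", 0)
print(result.prompt, "| re-teach:", result.reteach)
```

The point of the design is ordering: the recall attempt always happens before any re-teaching, and the score is recorded either way.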
Resolves a real tension: pure discovery/Socratic teaching can overload novices, but a prompt aimed at the child's current schema IS a worked example in disguise. The rule: the earlier the developmental level, the closer the prompt sits to a full worked example; the later, the more it opens into genuine Socratic challenge. It underwrites CPA — the concrete and pictorial stages exist precisely to manage load before symbols arrive. Honest scope (Voltage Drop): the strong-form claim that minimally-guided instruction 'does not work' (Kirschner, Sweller & Clark 2006) is a B-grade polemic with live scholarly disagreement; the worked-example mechanism under it is A-grade — we lean on the mechanism, not the polemic (which is why this card is A− while CPA, spacing and retrieval are A).
Why we use it: In the pilot the Sage/guide prompt style is keyed to developmental level — full worked example at the earliest levels ('I'll show you, then we do it together') fading to open Socratic ('What makes you say that?') at the top. The numbered-finger tapping aid in reading is worked-example fading applied to a motor skill.
Sweller 1988, Cognitive Science (cognitive load & worked examples; replicated meta-analytically since) · the expertise-reversal effect · Kirschner, Sweller & Clark 2006 (minimal-guidance critique) · Sweller's foundational worked-example studies (1988), replicated across decades; expertise-reversal effect established
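The fading rule can be illustrated with a simple mapping from developmental level to prompt style. Everything specific here is an assumption: the five-level scale, the three equal bands, the function name, and the middle "faded" prompt, which is invented for the example. Only the first and last prompt wordings come from the card.

```python
def prompt_style(level: int, top_level: int = 5) -> str:
    """Map a developmental level (1 = earliest) to a guidance style.

    Early levels get a full worked example, the top levels get an open
    Socratic question, and the band between gets a partly faded example.
    """
    if top_level < 2 or not 1 <= level <= top_level:
        raise ValueError("level must be between 1 and top_level (top_level >= 2)")
    progress = (level - 1) / (top_level - 1)  # 0.0 at earliest, 1.0 at top
    if progress < 1 / 3:
        return "worked_example"
    if progress < 2 / 3:
        return "faded_example"
    return "open_socratic"


EXAMPLE_PROMPTS = {
    "worked_example": "I'll show you, then we do it together.",
    "faded_example": "I'll start it; you finish the last step.",  # invented
    "open_socratic": "What makes you say that?",
}

for level in (1, 3, 5):
    print(level, EXAMPLE_PROMPTS[prompt_style(level)])
```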
Time in nature reduces rumination 12–15%; behavioral engagement (on-task participation, effort) is a distinct, measurable construct; flow is the deep-end form of presence
Why we use it: This ring exists to catch the decline in presence; engagement collapses from 74% in 5th grade to 34% in 11th.
Bratman et al. 2015, PNAS (primary) · Fredricks, Blumenfeld & Paris 2004 · Csikszentmihalyi (flow) · n=38 controlled experiment + engagement-construct literature
Metacognition: effect size 0.69 in Hattie's synthesis, among the stronger learning strategies it reports; high curiosity activates the hippocampus + dopamine reward circuit and improves memory, even for incidental material
Why we use it: A real question is gradeable because curiosity literally switches the brain's memory system on.
Hattie 2009 — metacognition (primary) · Gruber, Gelman & Ranganath 2014 · 800+ meta-analyses (Hattie synthesis) + fMRI
Prosocial behavior predicts 20–28% higher group performance; cooperative learning ~0.53 SD on achievement, plus gains in self-esteem and peer relationships
Why we use it: Helping others doesn't trade off against learning — it drives it.
Podsakoff et al. 2009 (primary) · Johnson & Johnson cooperative-learning meta-analyses · OCB meta-analysis + cooperative-learning meta-analyses
Perseverance-of-effort is a substantially stronger predictor than consistency of interest; process-praise (effort, strategy) is the mechanism. Honest scope: grit alone explains only ~4% of outcome variance
Why we use it: We measure the part of grit the evidence actually supports — effort and strategy, not innate stick-to-it-iveness.
Deci & Ryan, Self-Determination Theory (primary) · grit meta-analysis · Dweck (growth mindset) · 127 studies, 45,485 participants + 40 yrs of SDT research
Belonging activates brain reward (and pain) circuitry; a <30-minute belonging intervention halved the racial achievement gap over three years — possibly the strongest single data point in the set
Why we use it: Belonging isn't soft — the brain treats it like food, and a small, well-timed signal can change a trajectory.
Eisenberger 2012; Baumeister & Leary 1995 (primary) · Walton & Cohen 2011, Science · fMRI + RCT followed 3+ years
Physical activity improves attention, on-task behavior, and academic performance; serves as a functional behavior assessment (FBA) replacement behavior (replaces shutting down), effect ≈0.60
Why we use it: A short movement burst resets attention and discharges restless energy before focused work.
Hillman, Erickson & Kramer 2008; CDC 2010 review of 50 studies · Review of ~50 studies
Rumination reduced 12–15%; reduced activity in the subgenual prefrontal cortex; FBA replacement behavior (replaces withdrawal, sensory overload), effect ≈0.50
Why we use it: Brief time in nature quiets the self-critical mental loop that crowds out learning.
Bratman et al. 2015, PNAS · n=38, controlled experiment
5 min/day of cyclic sighing (double inhale, long exhale) produced the greatest mood improvement and largest drop in respiratory rate — outperforming box breathing, cyclic hyperventilation, AND mindfulness meditation; in the classroom it serves as the FBA replacement behavior for fidgeting and rocking
Why we use it: The extended exhale is the body's fastest off-switch for stress. We teach cyclic sighing specifically because, head-to-head, it's the most effective method we have.
Yilmaz Balban, Spiegel, Huberman et al. 2023, Cell Reports Medicine · n=111, randomized controlled trial
Supports a child's return to the window of tolerance before re-engaging with a task; FBA replacement behavior (replaces escalation, disruption), effect ≈0.69
Why we use it: A calm, regulated adult lends their nervous system until the child can self-regulate.
Rosanbalm & Murray 2018 (co-regulation); Kuypers, Zones of Regulation · Practice-based research synthesis
More less-structured time predicts better self-directed executive function (Barker), while more structured time predicts worse. Recess raises on-task behavior and is tied to better classroom behavior and math/reading achievement — more recess, not less, helps. Managed risky play is linked to lower anxiety (Brussoni), and the decades-long decline in children's independent play tracks the rise in youth anxiety and depression (Gray). Honest scope (Voltage Drop): most of this evidence is correlational, not randomized; and on narrow academic targets free play is the weakest of three approaches — guided play and direct instruction teach specific content faster (Skene). Free play builds the child who can learn; it is not a delivery system for today's lesson.
Why we use it: Self-direction, self-regulation, resilience, and mental health are built through child-directed play, not assigned on a worksheet. Protected breaks also buy focus back — a 45-min reading block after movement beats 60 min of fading attention — so the trimmed instructional minutes are not lost. The guide's job during free play is safety and restraint, not running stations. (Garden Check stays its own outdoor intervention: nature + movement, distinct from free play.)
Barker, Semenov, Michaelson, Provan, Snyder & Munakata 2014, Frontiers in Psychology 5:593 · Skene et al. 2022, Child Development 93(4) (guided vs. free play meta-analysis) · Gray, Lancy & Bjorklund 2023, Journal of Pediatrics 260:113352 · Brussoni et al. 2015, IJERPH 12(6) (risky outdoor play review) · Barker: 70 six–seven-yr-olds, time-use correlational · Skene: 39 studies / ~3,893 children ages 1–8 · Gray: convergent-evidence synthesis · Brussoni: 21-study systematic review
Formative assessment produces effect sizes of 0.40–0.70 — outperforming standardized testing as a driver of learning
Why we use it: Naming what a child did well is itself the intervention: assessment and teaching become one act.
Black & Wiliam 1998 · Synthesis of 250+ studies
Autonomy, competence and relatedness together → 30–45% higher intrinsic motivation
Why we use it: Rating your own growth builds the self-evaluation muscle that outlasts any grade.
Deci & Ryan, Self-Determination Theory (40+ yrs) · Large multi-study literature
Reflective journaling builds metacognitive awareness and intrinsic motivation; daily gratitude journaling raised optimism, life satisfaction and positive affect in middle-schoolers; expressive writing yields small but real well-being benefits (r≈.08)
Why we use it: Where affect labeling names a feeling in the moment, journaling is the end-of-day reflection that builds self-awareness and lets a child savor what went well.
Reflective journaling & metacognition; Froh, Sefick & Emmons 2008 (gratitude); Pennebaker / Frattaroli 2006 (expressive writing) · Froh: RCT, 221 sixth–seventh graders · Frattaroli: meta-analysis, 146 studies
Process-praise (effort, strategy) builds a growth mindset; teacher expectations measurably shape outcomes
Why we use it: Praising the process, not the child's ability, is what makes the encouragement stick.
Rosenthal & Jacobson (Pygmalion); Dweck (growth mindset) · Classic + replicated
Prosocial behavior predicts 20–28% higher group performance; social acknowledgment activates brain reward circuitry
Why we use it: Being seen helping wires helping to reward — and it lands hardest when spoken face to face.
Podsakoff et al. 2009; Eisenberger 2012; Baumeister & Leary 1995 · OCB meta-analysis + fMRI
A <30-minute belonging intervention halved the racial achievement gap over three years and improved health and career outcomes into adulthood
Why we use it: A small, well-timed signal that you belong here can change a student's whole trajectory.
Walton & Cohen 2011, Science · RCT, followed 3+ years
Consistent gains in science achievement (emerging in math & literacy when standards-aligned); greener schoolyards improve attention, lower physiological stress, and raise prosocial behavior
Why we use it: Working in the garden every day turns science into something a child does with their hands — and the green setting itself restores attention and calms stress.
Williams & Dixon 2013 (garden-based learning synthesis); green-schoolyard intervention research; Bratman et al. 2015 · Synthesis of 48 studies (1990–2010) + RCT
School financial education produces sizeable gains in financial knowledge (stronger for younger learners, at 'teachable moments'); a running store makes arithmetic concrete — counting, change, pricing — and lifts math engagement
Why we use it: A daily store is a real teachable moment: children practice money sense and applied math by actually earning, pricing, and making change.
Kaiser & Menkhoff 2019 (school financial-education meta-analysis); classroom-economy / applied-math research · Meta-analysis incl. RCTs
Project-based learning (PBL): 0.44 SD additional progress (~5 extra months/year); +8 percentage points earning college-credit AP scores; with cooperative learning 0.59; movement integrated into content up to 0.94 across 83 studies. Honest caveat (Duke 2021): effect depends on implementation quality (a Voltage Drop)
Why we use it: Real projects drive an engagement a worksheet can't, provided the design is implemented well.
Chen & Yang PBL meta-analysis (66 studies) + Knowledge in Action RCTs (Lucas Education Research); Duke et al. 2021 · Meta-analysis (66 studies) + cluster RCTs
Predictable visual structure lowers anxiety and frees working memory for learning; especially supportive for neurodivergent learners
Why we use it: When a child can see what's coming, the brain stops spending energy on uncertainty.
Visual supports / predictable routines (UDL; trauma-informed practice) · Practice-based evidence
We don't cherry-pick. Where an effect is modest or PD-dependent, the card says so. Where the research is mixed, we tell you. This is the evidence standard we hold our own methods to — and the reason you can trust the rest.