Gamifying the read-along lesson: a 4-day A/B test that lifted day-zero activation by 2.7 points
SpeakX's Read-along lesson asks early learners to read text aloud while audio plays. The original was static: text, audio, a progress bar. I designed a gamified variant with points, streaks, and a per-word success animation, and ran it head-to-head against control for four days, with ~2K paid users per bucket per day. The variant won on activation every single day.
Day-zero activation is the metric that decides whether a paid user stays.
SpeakX's paid economics live and die at D0. A user who starts and finishes their first exercise on the day they pay is dramatically less likely to cancel inside the refund window, and dramatically more likely to come back on D1. The Read-along lesson sits early in the path for early learners, and it had been static for a long time: text on screen, audio plays, a thin progress bar moves. It worked, but it didn't pull.
I had a hypothesis I'd been wanting to test cleanly: that adding visible rewards (points, streaks, a per-word success animation) would lift the rate at which users who land on the lesson actually start and finish it. Not a redesign of the surface, but a behavioural overlay on the same content.
Plenty of users were reaching the lesson. Fewer were starting it. Fewer still were finishing.
Pre-test, the static Read-along lesson was holding D0 activation in the low 80s. Healthy on its own, but the gap between "user opens lesson" and "user actually presses start" had been flat for months, and completion was lower still. Inside the same 24-hour window, ~19% of users were cancelling, not because of a product defect but because nothing on D0 had given them a reason to feel they'd already started.
The thesis: the lesson rewarded effort silently. Audio played, text scrolled, a bar inched. Nothing on screen told the user they were doing well while they did it. That silence was costing us at the start, the finish, and the cancellation step.
Test had to be clean. Variant had to be additive, not disruptive.
- Same content, same audio, same lesson length. If I changed the underlying lesson, I couldn't attribute lift to the gamification layer. The variant was a strict overlay on identical content.
- ~2K paid users per bucket per day. Sample was honest but not infinite. I planned for at least 4 days of stable data so we wouldn't be reading single-day noise.
- Children and early learners. The rewards had to feel earned and age-appropriate: celebratory, not casino-like. No coin-shower animations, no FOMO timers.
- No engineering for new lesson types. Points, streaks and the per-word animation had to fit inside the existing Read-along renderer with minimal scope.
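A rough way to see why the plan called for at least four days: the smallest lift a test can reliably detect shrinks with the square root of the sample. The sketch below uses the standard normal-approximation formula for a two-proportion test. It assumes an 82% baseline (the write-up only says "low 80s"), a two-sided alpha of 0.05, 80% power, and that each day's ~2K users per bucket are distinct people, so four days pool to ~8K per arm. The function name is illustrative, not part of any SpeakX tooling.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(baseline: float, n_per_arm: int,
                          alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate absolute minimum detectable effect (MDE) for a
    two-sided two-proportion test with equal arms.

    Uses the baseline variance p(1 - p) for both arms (normal approximation).
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided significance threshold
    z_beta = z.inv_cdf(power)             # quantile for the desired power
    se = sqrt(2 * baseline * (1 - baseline) / n_per_arm)
    return (z_alpha + z_beta) * se


# Assumed baseline in the "low 80s" and ~2K users per bucket per day.
one_day = min_detectable_effect(0.82, 2_000)
four_days = min_detectable_effect(0.82, 8_000)
print(f"1 day:  {one_day:.1%}")
print(f"4 days: {four_days:.1%}")
```

Under those assumptions, a single day can only reliably detect a lift of roughly 3.4 points, while four pooled days bring that down to roughly 1.7 points, comfortably below a 2.7-point lift.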
The lesson was rewarding the right behaviour. It just wasn't showing it.
I sat with [X] sessions of children running through the static Read-along. The pattern was consistent: kids would read along, sometimes well, sometimes mumbling, and there was no moment of feedback. The audio kept moving regardless. Parents watching over their shoulder couldn't tell whether the child had done the thing or not, and neither could the child.
The reframe was small. The lesson was already detecting per-word success behind the scenes for scoring. We just weren't surfacing that detection. Every successfully read word was already a recorded event in the data, and each one was a micro-reward sitting there unseen.
"He read it. I don't know if the app knows he read it."
Parent of a 7-year-old, watching a Read-along session
The hypothesis I committed to.
Add three things on top of the existing lesson: a points counter that ticked up on each successful word, a streak indicator that built across consecutive correct words, and a small per-word success animation at the moment of detection. Hold everything else constant. Run a clean A/B against control.
Run a 50/50 split, four days, ~2K paid users per bucket per day.
The test ran from June 17 to June 20, 2025. Variant got the gamified lesson; control got the existing static one. Same audience filter, same lesson set, same entry points.
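The write-up doesn't say how users were split between buckets. One common approach, shown here purely as a sketch, is to hash a stable user ID together with an experiment key, so the same user always sees the same version and separate experiments get independent splits. The experiment key `read_along_gamified` and the function `assign_bucket` are hypothetical names, not SpeakX's actual implementation.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str = "read_along_gamified") -> str:
    """Deterministically assign a user to 'variant' or 'control' (50/50).

    Hypothetical helper: hashing experiment key + user ID keeps a user in
    the same bucket across sessions and decorrelates different experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    slot = int.from_bytes(digest[:8], "big") % 100  # 100 slots, 0..99
    return "variant" if slot < 50 else "control"


print(assign_bucket("user-123"))
```

Because assignment is a pure function of the IDs, it needs no stored lookup table, and re-running the analysis later reproduces the same buckets.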
- Primary metric: D0 Active Exe Start: did the user actually press start on the lesson on the day they paid.
- Secondary: D0 Active Exe Completed, D0 cancellations, D0 time spent.
- Guardrail: D0 GT 6 Lessons Start: were users still moving through the lesson plan, or were they getting stuck inside one rewarding lesson.
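The metric definitions above reduce to simple per-bucket rates over paid users. The sketch below assumes a hypothetical per-user record (field names are illustrative, not SpeakX's schema), reads "GT 6" as strictly more than six lessons started on D0, and reports D0 time spent as a mean in minutes.

```python
from dataclasses import dataclass


@dataclass
class D0Record:
    """One paid user's day-zero outcome (hypothetical schema)."""
    bucket: str               # "variant" or "control"
    started_exercise: bool    # pressed start on the lesson on the day they paid
    completed_exercise: bool  # finished it the same day
    cancelled: bool           # cancelled on D0
    lessons_started: int      # lessons started on D0
    time_spent_min: float     # D0 time spent, in minutes


def d0_metrics(records: list[D0Record], bucket: str) -> dict[str, float]:
    """Compute primary, secondary, and guardrail D0 metrics for one bucket."""
    rows = [r for r in records if r.bucket == bucket]
    n = len(rows)
    if n == 0:
        raise ValueError(f"no users in bucket {bucket!r}")
    return {
        "active_exe_start": sum(r.started_exercise for r in rows) / n,
        "active_exe_completed": sum(r.completed_exercise for r in rows) / n,
        "cancellations": sum(r.cancelled for r in rows) / n,
        "avg_time_spent_min": sum(r.time_spent_min for r in rows) / n,
        # Guardrail: share of users who started more than 6 lessons on D0.
        "gt_6_lessons_start": sum(r.lessons_started > 6 for r in rows) / n,
    }
```

Computing control and variant with the same function keeps the comparison symmetric: any difference comes from the data, not from two slightly different metric definitions.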
The trade-off I named upfront: if the per-word reward made the first lesson feel good enough on its own, we might see fewer users push past 6 lessons on D0. I wanted to know if we were trading depth for activation. The answer turned out to be: a little, yes, and I want to be honest about it below.
What shipped into the test, what got cut, what's queued for v2.
What shipped into the variant. Live points counter (per word), streak indicator with a small break/reset animation, and a per-word success micro-animation tied to the existing speech-detection signal. No new lesson type, no new audio, no new copy in the lesson itself.
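Because the speech-detection signal already existed, the overlay only needs to consume it. A minimal sketch of the reward state, assuming the detector emits one boolean per word, one point per successful word, and a streak that resets on any miss. `RewardState` and the event names are hypothetical; rendering the animations is left out.

```python
from dataclasses import dataclass, field


@dataclass
class RewardState:
    """Per-lesson reward overlay driven by a per-word detection signal."""
    points: int = 0
    streak: int = 0
    events: list[str] = field(default_factory=list)  # UI cues to render, in order

    def on_word(self, detected: bool) -> None:
        if detected:
            self.points += 1                      # assumption: 1 point per word
            self.streak += 1
            self.events.append("word_success")    # per-word micro-animation
        else:
            if self.streak > 0:
                self.events.append("streak_break")  # break/reset animation
            self.streak = 0


state = RewardState()
for ok in [True, True, False, True]:
    state.on_word(ok)
print(state.points, state.streak)  # 3 1
```

Points never decrease, so a miss only costs the streak. That keeps the feedback celebratory rather than punitive, which matters for young readers.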
What didn't. A "best streak today" carry-over across lessons was descoped. We wanted the test surface contained to a single lesson. End-of-day recap and parent-facing streak summary were also held back.
v2 targets. Carry streaks across lessons in a session, surface a daily streak to parents, and re-test specifically against the 6-lessons guardrail to see whether end-of-lesson "next lesson" framing closes the depth gap.
Measured on Jun 20, 2025. Direction held all four days.
The honest trade-off. D0 GT 6 Lessons Start moved the wrong way: 48.5% on variant vs 50.2% on control, a 1.7-point dip. Modest, but real. The likely read: the gamified lesson is satisfying enough on its own that some users who would have pushed past 6 lessons stopped earlier, already feeling rewarded. That's the trade-off to fix in v2: keep the activation lift, recover the depth.
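Whether a 1.7-point gap stands out from noise depends on the sample behind it, and the write-up doesn't say whether 48.5% vs 50.2% is a single-day or pooled figure. A pooled two-proportion z-test is one standard way to check; it is shown here as a sketch, not as the analysis the team actually ran. The per-arm sizes (~2K for one day, ~8K for four days of distinct users) are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(p1: float, p2: float, n1: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Guardrail rates from the write-up; the per-arm n values are assumptions.
for n in (2_000, 8_000):
    z, p = two_proportion_z(0.485, 0.502, n, n)
    print(f"n={n}: z={z:.2f}, p={p:.3f}")
```

Under these assumptions, the gap on ~2K users per arm gives |z| of about 1.1, well within single-day noise, while the same gap pooled over ~8K per arm gives |z| of about 2.15 (p ≈ 0.03). That is one more reason the four-day window matters for reading the guardrail as well as the lift.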
Across all four days the variant won activation, won completion, and lowered cancellation. The lift is modest; it's not a miracle. But it's consistent, it points the same way on every primary and secondary metric, and it held beyond single-day noise.
"This is the only test we ran in that window where we can cleanly say a design change caused the lift. The numbers aren't huge, but they're clean."
Paramjeet Singh · Product Manager, SpeakX
What I'd do differently. What I'll carry forward.
What I'd do differently. I'd have planned the depth-recovery experiment as part of the same shipping plan, not as a v2 follow-up. The 6-lessons dip was foreseeable. The moment you make a single lesson feel rewarding enough on its own, you should also make the next-lesson handoff feel like the natural continuation of that reward. I left that on the table.
What I underestimated. How much the cancellation drop mattered. I'd designed for activation. The 1.7-point cancellation drop is, in revenue terms, the most valuable number on the page. It compounds across the entire paid base, not just D0. Reward visibility didn't only pull users into the lesson; it gave wavering users a reason to feel they'd already started.
What I'll carry forward. The smallest test that isolates one variable wins. I was under pressure to ship the bigger gamification shell. The boring, narrow variant is the one I can stand behind a year later, because it's the only one whose lift I can cleanly attribute to design. That clarity is worth more than a bigger, uncontrolled launch.