A new paper in Science — Gullich et al., “Recent discoveries on the acquisition of the highest levels of human performance” — looked at top performers in athletics, science, math, and music. The headline finding is one most parents would like to hear: among the highest adult achievers, peak performance is negatively associated with early performance. Don’t push your kid to specialize early; let them play in the yard.
Andrew Gelman flagged the paper this week, gently, with the kind of sigh the rest of us should probably learn to share. The math, once unpacked, says almost the opposite of what the abstract says. The reason is Berkson’s paradox, and it is doing nearly all of the work.
A Picture in Two Frames
Suppose you want to be a successful film actor. You can be unusually talented, or you can be unusually good-looking. Either path will do; both is nice but unnecessary, and the world produces children who are mostly neither. Assume that in the general population the two traits, talent and looks, are uncorrelated.
Now sample only the successful actors. You will find — and Joseph Berkson noticed this in 1946 — that talent and looks are negatively correlated within your sample. The successful actors who aren’t especially good-looking had to be unusually talented to make it; the ones who aren’t especially talented had to be unusually photogenic. The negative correlation is an artifact of the selection rule. It tells you nothing about the underlying joint distribution of talent and beauty. It tells you only that you conditioned on something.
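A small simulation makes the artifact visible. This is only a sketch under stated assumptions: talent and looks are drawn as independent standard normal scores, "success" means clearing a cutoff on at least one of them, and the names `simulate_actors` and `cutoff` are invented for the illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_actors(n=100_000, cutoff=1.5, seed=0):
    """Return (correlation in the population, correlation among the 'successful')."""
    rng = random.Random(seed)
    # Assumption: talent and looks are independent standard normal scores.
    people = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Assumption: an actor succeeds by clearing the cutoff on at least one trait.
    stars = [(t, l) for t, l in people if t > cutoff or l > cutoff]
    return pearson(*zip(*people)), pearson(*zip(*stars))


pop_r, star_r = simulate_actors()
print(f"whole population:  r = {pop_r:+.2f}")
print(f"successful actors: r = {star_r:+.2f}")
```

The first correlation should sit near zero and the second should come out clearly negative, even though nothing in the simulated population links the two traits. Only the filter does.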
This is Berkson’s paradox. It is one of the most reliable engines of spurious findings in observational research, and the structure transposes cleanly onto the Gullich paper.
Same Story, Different Sport
Replace “successful actor” with “adult-elite performer in athletics, science, math, or music.” The paths into adult eliteness aren’t literally talent and looks, but they don’t have to be; they only have to be more than one. You can be a top junior who matured into a top adult, or a less-distinguished junior who developed late, or a middle-of-the-pack kid who caught a break. To wind up in the adult-elite sample, you need at least one of these paths to have worked for you. Conditional on adult eliteness, the paths will look negatively correlated, even if in the general population they are independent or, more plausibly, positively correlated.
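The sign flip survives even when the ingredients are positively correlated to begin with. The sketch below is a toy model, not the paper's data: it assumes adult performance is the sum of an early score and a later-growth score, that the two have a population correlation of `rho` (set to 0.3 here as a made-up value), and that "adult elite" means the top 1% of the sum. All names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_careers(n=100_000, rho=0.3, top_share=0.01, seed=0):
    """Return (population correlation, correlation among adult elites)."""
    rng = random.Random(seed)
    kids = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        early = z1
        # Assumption: later growth is positively correlated (rho) with early score.
        later = rho * z1 + (1 - rho ** 2) ** 0.5 * z2
        kids.append((early, later))
    # Assumption: adult performance is early score plus later growth,
    # and the adult elite are the top `top_share` of that sum.
    adult = sorted(e + g for e, g in kids)
    cutoff = adult[n - int(n * top_share)]
    elites = [(e, g) for e, g in kids if e + g >= cutoff]
    return pearson(*zip(*kids)), pearson(*zip(*elites))


pop_r, elite_r = simulate_careers()
print(f"all children: r(early, later growth) = {pop_r:+.2f}")
print(f"adult elites: r(early, later growth) = {elite_r:+.2f}")
```

In the full simulated population the correlation is positive by construction; within the selected top slice it should turn negative, because anyone admitted with a modest early score must have had large later growth, and vice versa.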
The conditional you want runs the other way. The question that matters is not “among adult elites, who was an elite kid?” It is “among elite kids, what fraction become adult elites?” That second number is roughly forty times larger than the base rate in the general population. Early performance is hugely predictive of late performance. The paper’s reported direction comes from running the conditional backwards: \(P(\text{elite kid} \mid \text{elite adult}) \ne P(\text{elite adult} \mid \text{elite kid})\). These are different objects, and the relationship between them depends on the base rates of “elite kid” and “elite adult” in the general population. The first quantity can fall while the second remains enormous. It does.
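Bayes' rule is what ties the two conditionals together through those base rates. As a sketch, take the roughly forty-fold figure above at face value and assume, purely for illustration, that 1% of children count as elite kids (that 1% is a made-up number, not one from the paper):

\[
P(\text{elite kid} \mid \text{elite adult})
= \frac{P(\text{elite adult} \mid \text{elite kid})}{P(\text{elite adult})}\, P(\text{elite kid})
\approx 40 \times 0.01 = 0.40
\]

Under those assumptions, 60% of elite adults were not elite kids, even though an elite kid is forty times more likely than a random child to become an elite adult. Both statements are true at once; they simply answer different questions.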
The Selection Is the Measurement
This will be familiar furniture for readers here. The selection rule for who gets into the dataset — “is an adult elite, yes or no” — is itself a classifier. The dataset is whatever the classifier admitted. Its internal correlations are statements about the joint behavior of the classifier and the underlying variables, not statements about the underlying variables alone. There is no act of measurement that can extract a population-level fact about early-vs-late development from a sample whose membership was determined by the very outcome you are trying to study.
I have written variations on this for a long time. Back in 2014 I argued that interest groups strategically pick the fights they enter, so that the observed correlation between their stated positions and policy outcomes overstates their influence — the analyst, I wrote then, has simultaneously more data and less information than the people being studied. The same year, I argued that the things we notice are selected by the fact of being noticed, so that pointing at the noticing as the cause of the noticeable runs the conditional backwards. Both arguments were Berkson’s paradox in non-technical clothes. I didn’t know, until reading the commentary on Gullich this week, that the abstract version has carried Berkson’s name since 1946. I’m glad to have the vocabulary.
This is, more or less, the same point I was making yesterday in a different costume: the classifier that decides who counts as elite does not merely describe eliteness, it constitutes the dataset whose correlations you are about to report. Once you condition on adult-elite, you cannot un-condition on it by running better statistics on the resulting sample. (This is a point Maggie and I keep making in a less athletic register.)
The Footnote Most Parents Want
The cultural reception is its own measurement problem. Parents reading the press release will hear: “Don’t push your kid into early specialization; the data say it backfires.” What the data actually say is: “We sampled adults who made it to the top of their field and noticed that, among them, early bloomers and late bloomers are both represented.” Those are different propositions. The first is normative parenting advice; the second is a description of how Berkson’s paradox interacts with a sample selected on the outcome variable. (Over a hundred again, John.) The slippage between them is exactly the cultural use of statistical findings that the field could afford to be a little more careful about.
The Right Question
There is a useful question hiding here. It is not whether early eliteness predicts late eliteness (it does, robustly) but what fraction of elite adults came up through each path, and how the relative size of those paths varies across domains. Track that and you start to see something interesting about how achievement gets sorted across human lifetimes. Treat it as a horse race between paths and you have reproduced Berkson’s paradox at scale. The cost has been a steady drip of advice that doesn’t quite mean what it says.
With that, I leave you with this.