If you’ve spent any time reading about astrology and science, you’ve probably encountered some version of this sentence: “Astrology was scientifically tested and failed.” Often, though not always, the study being referenced is Shawn Carlson’s 1985 experiment, published in Nature — one of the most prestigious scientific journals in the world, and not a venue that publishes astrology research lightly.
The Carlson experiment has become shorthand for “astrology doesn’t work” in a way that few single studies achieve. It’s cited in textbooks, Wikipedia articles, skeptic literature, and popular science writing as the definitive test. What’s less often discussed is what the study actually did, how it was designed, what its detailed results were, and what, if anything, it left untested.
This matters not because the study should be dismissed. Its design was unusually rigorous, and its central finding has held up. But “the definitive test of astrology” and “a rigorous test of one specific astrological claim, designed with astrologer input” are different descriptions, and the difference is where the more interesting analysis lives.
What Carlson Actually Tested
Shawn Carlson, then a physics graduate student at UC Berkeley, designed a double-blind experiment to test a specific claim made by the National Council for Geocosmic Research (NCGR), a major astrological organization that had agreed to participate in the test’s design — a crucial detail often left out of summaries.
The claim being tested was this: given a person’s natal chart (birth date, time, and location, used to calculate planetary positions) and three personality profiles — one belonging to the person whose chart it was, and two belonging to other people — could a competent astrologer match the correct profile to the correct chart at a rate significantly above chance (which would be 1 in 3, or 33.3%)?
The personality profiles were generated using the California Psychological Inventory (CPI), a well-validated personality assessment instrument. A total of 116 subjects completed the CPI and provided birth data. Twenty-eight astrologers, selected by the NCGR as competent practitioners (not random members of the public, an important point that gets lost in many retellings), were given the subjects’ natal charts, each accompanied by three CPI profiles, and asked to identify which profile belonged to the chart’s owner.
Crucially, the experimental design was reviewed and approved by the astrologers before the experiment ran. The NCGR’s representatives confirmed that natal chart-to-CPI-profile matching was a fair test of what astrology claims to be able to do, and that the CPI was an appropriate instrument. This collaborative design process is why the study carries more weight than an experiment designed unilaterally by skeptics — the test was, by the participants’ own agreement beforehand, a fair one.
The Results
The astrologers predicted they would achieve a match rate of at least 50%, well above the 33.3% chance baseline. Carlson and the participating astrologers had agreed in advance that this would constitute a meaningful success threshold.
The actual result: astrologers correctly matched charts to profiles 33.1% of the time — statistically indistinguishable from the 33.3% chance rate. Subjects themselves, asked to choose which of three CPI profiles best described them (a separate part of the experiment, testing self-recognition rather than astrological skill), performed at 33.4% — also at chance.
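To make “statistically indistinguishable from chance” concrete, here is a minimal sketch of an exact one-sided binomial test against the 1-in-3 baseline. The text reports only percentages, so the trial count (116, assuming one match per subject) and the hit counts below are illustrative assumptions rather than Carlson’s raw data, and the function names are hypothetical.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return float(sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)))


def one_sided_p_value(hits: int, trials: int, chance: float = 1 / 3) -> float:
    """Probability of at least `hits` correct matches if every pick were a pure guess."""
    return upper_tail(hits, trials, chance)


# Illustrative counts only: 116 trials (one match per subject) is an assumption.
trials = 116
for hits in (38, 58):  # roughly 33% and the predicted 50%
    print(f"{hits}/{trials} = {hits / trials:.1%}: p = {one_sided_p_value(hits, trials):.2g}")
```

Under these assumptions, a hit rate near 33% gives a p-value nowhere near any conventional threshold, while the 50% rate the astrologers predicted would have been very unlikely to arise from guessing alone.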
The astrologers’ confidence in their own matches did not correlate with accuracy. High-confidence matches were no more likely to be correct than low-confidence ones — a finding that is itself notable, because confidence-accuracy correlation is often used as informal evidence of genuine skill in other domains.
By every measure the study was designed to assess, and by the standard the astrologers themselves had agreed to in advance, the result was a clear failure of the tested claim.
What the Study Did Not Test
This is the part of the Carlson experiment that gets least attention, and it matters for understanding what the result actually establishes.
The test used natal charts and CPI personality profiles specifically: it asked whether a chart could be matched to a standardized psychometric instrument. It did not test predictive claims (transits, timing, forecasting) or relational claims (synastry, compatibility), and it did not test systems other than Western tropical astrology as represented by the chart format the study used.
The astrologers worked from charts alone, without consultation. Real astrological practice, as most practitioners describe it, typically involves conversation with the client — follow-up questions, observed reactions, contextual information about the person’s life. The Carlson design removed all of this, testing chart interpretation as a purely abstract exercise. Whether this is a fair simplification or an unfair removal of the practice’s actual working conditions is genuinely debated. Carlson’s position, and the position of most researchers in this area, is that if astrology works because of the chart’s information content, removing the consultation context shouldn’t matter — the chart should still carry signal. If it only “works” through the consultation dynamic, that’s informative too, just about a different mechanism (closer to cold reading) than the one being claimed.
The personality instrument was the CPI, not a Big Five measure or any astrology-specific instrument. Some astrologers have argued that the CPI’s dimensions don’t map cleanly onto the personality dimensions astrology claims to describe — though this argument is weakened by the fact that the NCGR approved the CPI’s use before the experiment ran.
The sample was 28 astrologers and 116 subjects. That’s a reasonable size for detecting the claimed effect (a jump from 33% to 50% or more is a large effect, detectable with moderate samples), but it’s not a massive study, and it tested one cohort of astrologers, using one methodology, at one point in time.
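The claim that a jump from 33% to 50% is detectable with a moderate sample can be checked with an exact binomial power calculation. This is a minimal sketch under stated assumptions: each subject contributes one independent three-way match (so 116 trials), the test is one-sided at alpha = 0.05, and the function names are hypothetical; the study’s own analysis may have been set up differently.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p); returns 0.0 when k > n."""
    return float(sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)))


def critical_hits(n: int, chance: float = 1 / 3, alpha: float = 0.05) -> int:
    """Smallest hit count whose one-sided probability under pure chance is <= alpha."""
    k = 0
    while upper_tail(k, n, chance) > alpha:  # terminates: upper_tail(n + 1, ...) == 0
        k += 1
    return k


def power(n: int, true_rate: float, chance: float = 1 / 3, alpha: float = 0.05) -> float:
    """Probability of rejecting 'pure chance' if the true hit rate is `true_rate`."""
    return upper_tail(critical_hits(n, chance, alpha), n, true_rate)


for n in (30, 60, 116):
    print(f"n={n}: significant at >= {critical_hits(n)} hits, power at 50% = {power(n, 0.5):.2f}")
```

Under these assumptions, around 116 trials give a high probability of detecting a true 50% hit rate, while much smaller samples would often miss it; a subtler effect would need considerably more data.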
Why the Study Has Held Up
Despite the limitations above, the Carlson experiment has not been successfully challenged on methodological grounds in the nearly four decades since publication — which is itself notable, because the astrological community had every incentive to identify design flaws if they existed, and had agreed to the design in advance, removing the most common objection (that skeptics rigged the test).
Several smaller studies using similar matching paradigms (testing whether astrologers, given charts, can outperform chance at identifying corresponding personality data, occupation, or other life details) have generally replicated the null result. A 1985 review by astrology researcher Geoffrey Dean covered a substantial body of similar matching studies and found a consistent pattern of chance-level performance.
This consistency across multiple matching-paradigm studies, not just Carlson’s alone, is what gives the null result its weight. Carlson’s study is the most famous because of its Nature publication and collaborative design, but it’s part of a broader pattern rather than an outlier finding.
What “Failed Test” Actually Means Here
The honest summary of the Carlson experiment is this: when astrologers agreed in advance on a fair test of whether natal charts could be matched to standardized personality profiles at above-chance rates, and that test was run under double-blind conditions, the result was chance-level performance. This is a real, meaningful negative finding about a specific, well-defined claim.
What it does not establish: that astrology has zero effects of any kind on anyone, that the experience of astrological consultation has no value, that other astrological claims (synastry, timing, electional astrology, or non-Western systems like BaZi or Vedic astrology) would produce the same result if tested with equal rigor, or that astrologers cannot produce personality descriptions that feel meaningful to clients (a separate question, addressed by Barnum effect and cold reading research, where the evidence runs in a different direction).
The Carlson experiment tested one specific, well-operationalized claim and found it didn’t hold. That’s worth taking seriously — it’s strong evidence against natal-chart-to-personality matching specifically. It’s weaker evidence, by extension only, against the wider universe of divination claims, many of which are structured differently and would require their own equally rigorous tests, most of which have not been conducted. The Vernon Clark studies, conducted decades earlier with a different design, produced different — though also contested — results, and are worth understanding as part of the same broader picture.
What a Fair Reading Requires
The Carlson experiment is sometimes invoked as though it settled the entire question of whether divination systems have any validity. It didn’t, and treating it that way oversells a well-designed but narrow result.
It’s also sometimes dismissed by astrology proponents as flawed or rigged — a characterization that doesn’t hold up given the collaborative design process and the lack of successful methodological challenges over nearly forty years.
The accurate position is the least satisfying one: this was a good test of a specific claim, the claim failed, and the result has generalized reasonably well to similar claims tested with similar methods. Claims with different structures — particularly those involving timing, synthesis across multiple systems, or outcomes other than static personality matching — remain largely untested. “Astrology failed a test” is true. “Astrology was definitively disproven” overstates what any single experiment, however well-designed, can establish.