Chapter 1 – The Sample with the Built-in Bias
This chapter dives into the concept of sample bias, illustrating how a seemingly impressive statistic can be misleading due to the way the sample is selected.
Some of the key points:
- The Misleading Average: The chapter opens with a Time magazine report that the average Yale graduate from the class of 1924 earns $25,111 a year (adjusted for inflation, $25,000 in 1924 is equivalent to $443,275 in 2024, according to dollartime.com). The figure is presented as evidence of the financial success associated with a Yale education. However, the author notes that it is “surprisingly precise” and “quite improbably salubrious”, because “it is not particularly probable that you know your own income for last year so precisely as that unless it was all derived from salary” and “$25,000 incomes are not often all salary”. Furthermore, the figure is simply asserted, with no supporting evidence.
- Unrepresentative Sample: The author reveals that this figure is based on a survey of Yale graduates, but the respondents are not representative of the entire class. The sample is biased because it only includes those who:
  - Could be located (likely excluding less successful individuals who may have moved frequently or fallen out of touch).
  - Chose to respond to the survey (likely excluding those who are embarrassed by their lower incomes).
- The Missing Figures: The chapter emphasizes that the missing figures – the incomes of those not included in the sample – are crucial for understanding the true average income of the Yale class of 1924. The excluded individuals are likely to have lower incomes, which would significantly lower the overall average.
- The Illusion of Precision: The author points out that the precise figure of $25,111 creates an illusion of accuracy, making the statistic seem more reliable than it actually is. In reality, the biased sample makes the figure unreliable and unrepresentative of the entire class.
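To see how the two filters above (being located, then choosing to respond) can push a reported average upward, here is a small simulation. It is a hypothetical sketch, not something from the book: the log-normal income distribution, the class size, and the response probabilities that rise with income rank are all invented assumptions chosen only to illustrate the mechanism. The survey's average comes out precise to the dollar, yet it overstates the class average because the lower earners are filtered out.

```python
import random
import statistics


def simulate_class(n=2000, seed=1924):
    """Hypothetical class: incomes drawn from a right-skewed log-normal distribution."""
    rng = random.Random(seed)
    return [rng.lognormvariate(9.0, 0.8) for _ in range(n)]


def survey(incomes, seed=0):
    """Return the incomes of graduates who are both located and willing to answer.

    Assumption (hypothetical): the chance of being found and the chance of
    responding both increase with a graduate's income rank in the class.
    """
    rng = random.Random(seed)
    # Indices of graduates ordered from lowest to highest income.
    order = sorted(range(len(incomes)), key=incomes.__getitem__)
    respondents = []
    for rank, i in enumerate(order):
        pct = rank / (len(incomes) - 1)   # 0.0 = lowest earner, 1.0 = highest
        p_found = 0.5 + 0.4 * pct         # hypothetical: richer graduates are easier to locate
        p_answer = 0.3 + 0.6 * pct        # hypothetical: richer graduates are readier to reply
        if rng.random() < p_found and rng.random() < p_answer:
            respondents.append(incomes[i])
    return respondents


if __name__ == "__main__":
    incomes = simulate_class()
    respondents = survey(incomes)
    # The survey mean is reported to the dollar, but it is biased upward.
    print(f"Whole-class mean:  ${statistics.mean(incomes):,.0f} (n={len(incomes)})")
    print(f"Respondents' mean: ${statistics.mean(respondents):,.0f} (n={len(respondents)})")
```

Because the missing graduates are disproportionately the lower earners, the respondents' mean lands above the whole-class mean. Making the reported figure more precise does nothing to correct that gap.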
Main takeaway:
The main takeaway from the chapter is that statistics can be easily manipulated through biased sampling. When presented with a statistic, it’s crucial to consider how the sample was selected and whether it truly represents the population it claims to represent. Always question the methodology behind the numbers to avoid being misled by inaccurate or incomplete data.