Wednesday, June 10, 2015

Small Groups Are More Likely to Excel, and That Means Nothing

A Megan McArdle piece this week on why the US can't necessarily become Norway referenced an old Marginal Revolution post by Alex Tabarrok which pointed out that the reason why small schools often appear in lists of top performing schools may well be mostly probability rather than performance. It's a good post, but as soon as I read it I wanted to play with some visualizations, because I think it's the sort of thing people can grasp a lot more easily with some visuals.

Say we're looking at ranked performance of any type. Since this is inspired by Tabarrok's piece on schools, we'll say there are schoolchildren who've taken a standardized test and are now ranked in terms of percentile: 1st to 99th.

I created a completely random data set with 1,000 data points using Excel's RANDBETWEEN() function. So I now have a list of the scores of 1,000 students in my imaginary town. This is a totally average town. Its 1,000 students have an average test score in the 50th percentile. If we look at how many students fall into each decile, it's pretty clean.
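For anyone who would rather follow along in code than in a spreadsheet, here is a minimal Python sketch of the same setup. It is not the Excel sheet used for this post: the function names `make_scores` and `decile_counts`, the seed, and the choice of whole-number scores from 0 to 99 are all assumptions made for illustration.

```python
import random
from collections import Counter

def make_scores(n_students=1000, seed=2015):
    """Return n_students random whole-number scores from 0 to 99 inclusive.

    The range and the seed are assumptions; random.randint includes both
    endpoints, much like a RANDBETWEEN(0, 99) call would.
    """
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(n_students)]

def decile_counts(scores):
    """Count how many scores land in each decile: 0-9, 10-19, ..., 90-99."""
    counts = Counter(score // 10 for score in scores)
    return [counts.get(decile, 0) for decile in range(10)]

scores = make_scores()
print("average score:", sum(scores) / len(scores))
for decile, count in enumerate(decile_counts(scores)):
    print(f"{decile * 10:2d}-{decile * 10 + 9:2d}: {count} students")
```

With 1,000 students, each decile should hold somewhere near 100 of them, though the exact counts depend on the seed.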


Now let's start to break the students into groups. Remember, our students are one long list of random numbers from 0 to 99, so we're going to take that list, which is in no order, and start dividing it. The differences between those divisions will be nothing but statistical noise. We cut them into groups of 500 and then into groups of 250.


You see what's happening. Even though these group assignments are totally random, we now have a five-point spread between our best and worst "school" of 250 students within our town of 1,000. As we break down into smaller and smaller groups, the probable range of variation becomes wider.
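The splitting step can be sketched in a few lines of Python. This is an illustration under assumptions, not the original spreadsheet: `group_means` is a hypothetical helper, the seed is arbitrary, and the scores are regenerated here so the block runs on its own.

```python
import random

def group_means(scores, group_size):
    """Cut the list, in its existing order, into consecutive groups and average each."""
    means = []
    for start in range(0, len(scores), group_size):
        group = scores[start:start + group_size]
        means.append(sum(group) / len(group))
    return means

# Assumed setup: 1,000 random scores from 0 to 99, in no particular order.
rng = random.Random(42)
scores = [rng.randint(0, 99) for _ in range(1000)]

for size in (500, 250, 100, 50):
    means = group_means(scores, size)
    spread = max(means) - min(means)
    print(f"groups of {size:3d}: best minus worst = {spread:.1f} points")
```

Because each group of 500 here is made of two of the groups of 250, halving the groups can never shrink the gap between best and worst; it can only hold it steady or widen it.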

Obviously, with 1,000 students, half of whom are above the 50th percentile and half of whom are below, you could randomly assign them to two groups of 500 and have one group be all students below the 50th percentile and the other all students above the 50th percentile, putting their average scores fifty points apart (25 versus 75), but that is very, very, very unlikely.

As the groups get smaller, the chances of outliers become higher.


This difference in score distribution based on group size becomes important if we compare groups of different size.

Let's now say we have four towns, each with 1,000 students.

Town A has two schools of 500 students each.

Town B has four schools of 250 students each.

Town C has ten schools of 100 students each.

Town D has 20 schools of 50 students each.

Remember, each town has exactly the same set of student scores and all differences in school average scores are based on random variation: which students are randomly assigned to each school.

The county now decides to put together an honor roll of the best 10 schools in the county. Which schools are on the list?


Town D with its 50-student schools holds seven of the top ten slots on the Honor Roll. Town C with its 100-student schools gets two slots. Town B with 250-student schools gets one slot. The highest performing 500-student school is number 15 out of 36.

Of course, the flip side is that the bottom ten has exactly the same breakdown: seven schools from Town D, two schools from Town C, one school from Town B.
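The four-town honor roll is easy to rerun in code. The sketch below is an assumption-laden stand-in for the spreadsheet, not a reproduction of it: `SCHOOL_SIZES` and `build_county` are made-up names, the seed is arbitrary, and each town gets its own random shuffle of the same 1,000 scores.

```python
import random
from collections import Counter

# School size in each town, as described above; every town has 1,000 students.
SCHOOL_SIZES = {"A": 500, "B": 250, "C": 100, "D": 50}

def build_county(scores, rng):
    """Shuffle the same scores once per town, then cut them into equal schools.

    Returns a list of (average score, town, school size) tuples.
    """
    schools = []
    for town, size in SCHOOL_SIZES.items():
        shuffled = scores[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), size):
            school = shuffled[start:start + size]
            schools.append((sum(school) / len(school), town, size))
    return schools

rng = random.Random(7)  # arbitrary seed, chosen only for repeatability
scores = [rng.randint(0, 99) for _ in range(1000)]
ranked = sorted(build_county(scores, rng), reverse=True)  # best average first

top_ten = Counter(town for _, town, _ in ranked[:10])
bottom_ten = Counter(town for _, town, _ in ranked[-10:])
print("top ten by town:   ", dict(top_ten))
print("bottom ten by town:", dict(bottom_ten))
```

The exact split will shift from one seed to the next, so don't expect a given run to land on seven, two, and one. Rerunning with several seeds is a quick way to see how often the small schools crowd both ends of the list.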

The larger the school is, the more statistically representative it will be of the population as a whole. Thus, 500-student schools are very, very close to the average of the town as a whole. However, when we break the population up into smaller groups, random chance starts to play a larger role. It's far more likely that you get a disproportionate share of high-performing students assigned to a group of 50 than to a group of 500.
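There is a standard formula behind this: the typical random wobble in a group's average shrinks with the square root of the group's size. Here is a rough Python sketch of it, assuming scores are whole numbers spread evenly from 0 to 99 and that schools are carved out of a town of 1,000 without replacement. The function name `sd_of_school_mean` is made up for this example.

```python
import math

def sd_of_school_mean(school_size, town_size=1000, low=0, high=99):
    """Standard deviation of a school's average score under purely random assignment.

    Assumes evenly spread whole-number scores from low to high (an idealization,
    not the town's actual sample).
    """
    n_values = high - low + 1
    # Standard deviation of a single score drawn from a discrete uniform range.
    score_sd = math.sqrt((n_values ** 2 - 1) / 12)
    # Finite-population correction: a school is drawn from the town's fixed
    # pool of students, so very large schools are pinned close to the town mean.
    correction = math.sqrt((town_size - school_size) / (town_size - 1))
    return score_sd / math.sqrt(school_size) * correction

for size in (500, 250, 100, 50):
    print(f"{size:3d} students: typical wobble of about {sd_of_school_mean(size):.2f} points")
```

Under these assumptions a 50-student school's average wobbles by about four points, while a 500-student school's wobbles by about one, which is why the small schools end up at both extremes.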

You can see the same effect with the classic coin toss. Flip a coin ten times, and there's a decent chance you'll get a lopsided result. Flip it a hundred times and you'll start to get a lot closer to 50/50 in your results. A friend of mine who was a high school math teacher used to assign a project where students had to flip a coin a large number of times and record the result. He could usually detect the students who tried to fake their results because in an effort to show "realistic" results they would show too even a distribution of heads and tails within small groups of tosses. It's actually fairly likely that when tossing a coin a hundred times you'll have it come up heads four or five times in a row at some point. But students trying to save time by faking results seldom had these longer runs of luck.
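The claim about streaks can be checked exactly rather than by flipping coins. The sketch below assumes a fair coin and independent tosses; `prob_head_run` is a hypothetical helper that tracks the current streak of heads one toss at a time.

```python
def prob_head_run(n_tosses, run_length):
    """Exact chance that n fair tosses contain at least run_length heads in a row."""
    # state[k] is the probability that no qualifying run has happened yet and
    # the sequence so far ends in exactly k heads in a row.
    state = [0.0] * run_length
    state[0] = 1.0
    for _ in range(n_tosses):
        new_state = [0.0] * run_length
        for k, p in enumerate(state):
            new_state[0] += p / 2          # tails resets the streak to zero
            if k + 1 < run_length:
                new_state[k + 1] += p / 2  # heads extends the streak by one
            # otherwise heads completes the run, and that probability drops
            # out of the "no run yet" table for good
        state = new_state
    return 1.0 - sum(state)

for run in (4, 5, 6):
    chance = prob_head_run(100, run)
    print(f"at least {run} heads in a row within 100 tosses: {chance:.1%}")
```

Comparing these figures against a student's record is roughly the teacher's test in numbers: a hand-made sequence with no long streaks is less plausible than it looks.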

Now, all of this is simply looking at what happens when we take a population of students that already varies and assign them to schools of different sizes: large schools will look more average while the highest and lowest performing schools will be from among the smallest size ones. However, in real life, there's also an averaging effect to large organizations when it comes to performance. If you had a small, independent school teaching only 50 students, a lot of your success would rely on a small number of teachers and how well they did their jobs. Having one or two really great teachers could make your whole school look amazing. Whereas, having a couple of really bad teachers could tank your scores. In a really large school, a lot of those effects would average out. You might have one teacher who does a really great job with her twenty students, but if those students are hidden in among 480 others, the effect will be a lot smaller.

So when we're looking at information of this kind, it's very important to look at the sizes of the groups involved. Ranking groups of radically different sizes (such as schools that vary in size by a factor of ten) can result in a lot of the difference you think you are seeing being the result of statistical noise.

2 comments:

  1. Yeah, I was a little skeptical when my kids' high school (less than 100 students) was ranked number 1 in the diocese in terms of SAT scores, but never could have figured out specifically why I should have been. : )

  2. "The larger the school is, the more statistically representative it will be of the population as a whole."

    Only under your assumption of randomly assigned sets of idealized students. Is this what happens in reality? I hardly think so. Some kids, maybe most, end up in the schools they attend by accidents of geography; some get where they are by more or less heroic efforts of their parents.

    Such efforts would affect small schools more than larger schools, in so far as the addition or subtraction of a single student can skew the results among 50 students much more dramatically than among 1000.

    Sure, it's important to note that a lot of these variations are just noise - assuming the schools are otherwise similar. But that's a big assumption, once you start adding in the traditionally smaller religious and experimental schools.

    I'd say that all these size analyses - school, class, hours at school, number of days at school, number of hours of homework and so on - are so much hand-waving meant to distract. The real issue is whether the student is treated as a moral actor, as an immortal soul, and taught with the love of friendship, or viewed as 'graded' 'inputs' destined to see their value solely as 'human resources'. The graded classroom model does the latter, while the Greeks and the Scholastics did the former.
