Mythbusters and the Importance of Variation

Mythbusters, Adam Savage and Jamie Hyneman

I love the Mythbusters.

I’ve watched their show since the first season, and unlike some other shows of a skeptical persuasion, it’s maintained its edge, humor and integrity. But I’ve noticed a rather disturbing trend (or at least a disappointing outcome that has now occurred more than once). You see, Adam, Jamie and the build team often do a rigorous job of designing experiments, and they demonstrate proper caution in drawing conclusions. But I would prefer to see them demonstrate true statistical procedures (or at least allude to them) when presenting their conclusions.

In 2005, the Mythbusters tested the myth that yawning is contagious. It was a sloppy experiment from the start, fraught with potential confounding variables and an utter lack of experimental control. After some revisions the crew finally got the experiment together and carried it out with a rather rigorous design. I was so pleased with the design and procedures that I decided to show this episode to my statistical methods class as a way to encapsulate the work that goes into experimental design, and the controls and manipulations you must put in place.

The second reason why I selected this episode as a good sample for my students was the poor conclusion the crew arrived at. Click this link to view the clip.

In this clip you can view the raw data alongside the proportions Adam has calculated. Contrary to what Adam and Jamie conclude, the results do not ‘confirm’ the myth. I could explain why, but someone else has already done a swell job of that. Adam and Jamie replicated this error in judgment in their most recent episode, “No Pain, No Gain,” where they tested the myth that women have a higher threshold than men for tolerating sustained pain. They assembled 25 women and 25 men, had each participant place a hand in a bowl of ice water, and told them to state when pain began and to pull their hand out when they couldn’t stand it any longer, all the while timing the interval between the introduction of the pain stimulus and each event. In the end, Adam reveals that the women held their hands in the water for an average of 100.4 seconds, and the men only 84.3 seconds.

So, is the myth confirmed? Perhaps, but I’m unsatisfied. Unlike with the yawning myth, Adam did not reveal the raw data, so I’m unable to conduct the calculations that had been done for the yawning myth, where it was determined that the P value was not significant. Why is this important? To begin, a standard arithmetic mean is calculated by taking the sum of all of the values in a given data set (ΣX) and dividing it by the number of values within the set (n). It is a very common measure of central tendency: a statistical procedure that aims to summarize a set of data by examining what is typical within it. Here’s the problem with relying solely on the mean for a between-group analysis. Say I have two groups, with each member producing a score on a test. In group A the scores are {95, 90, 90, 50, 50}. In group B the scores are {80, 75, 75, 75, 70}. If you’ve done the math as I described above, you’ll notice that both groups have a mean of 75. By examining the data, however, you’ll see that the two data sets are vastly different. The mean fails to encapsulate the variation within the data set, or how much each score typically deviates from the mean of the distribution.
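To make the point concrete, here’s a quick sketch in Python (just the standard library’s statistics module) showing that the two hypothetical groups above share a mean but differ wildly in spread:

```python
import statistics

# The two hypothetical groups from the example above
group_a = [95, 90, 90, 50, 50]
group_b = [80, 75, 75, 75, 70]

# Both groups share the same arithmetic mean (sum / n)...
mean_a = statistics.mean(group_a)  # 75
mean_b = statistics.mean(group_b)  # 75

# ...but their spread is very different (sample standard deviation)
sd_a = statistics.stdev(group_a)   # about 22.9
sd_b = statistics.stdev(group_b)   # about 3.5

print(f"Group A: mean={mean_a}, sd={sd_a:.1f}")
print(f"Group B: mean={mean_b}, sd={sd_b:.1f}")
```

The standard deviation is exactly the “typical deviation from the mean” the paragraph above describes, and it’s what a bare mean throws away.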

Another example of the problem with relying on a comparison of means alone involves the effect of an outlier: a score that deviates dramatically from the typical range of values. Say you have two groups, and the members of each group are being scored on a test. In group A the scores are {80, 72, 78, 83, 74}. The scores in group B are {79, 76, 80, 72, 157}. If you calculate the mean for both groups you’ll find that group A has a mean of 77.4 and group B a mean of 92.8, a difference of 15.4 points. The seemingly large difference in means can be deceptive: if you were to remove the largest value from each data set (which would eliminate the effect of the outlier in group B), the two groups would appear very similar in their averages. Means, more so than other measures of central tendency, are easily dragged up or down by extreme outliers.
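The same kind of sketch shows the outlier at work: the means diverge, while the medians, a measure of central tendency that resists outliers, barely move:

```python
import statistics

group_a = [80, 72, 78, 83, 74]
group_b = [79, 76, 80, 72, 157]  # 157 is the outlier

# The means differ by 15.4 points, driven almost entirely by 157
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
print(mean_a, mean_b)  # 77.4 92.8

# The medians of the very same data are nearly identical
med_a, med_b = statistics.median(group_a), statistics.median(group_b)
print(med_a, med_b)    # 78 79
```

A single extreme score moved one group’s mean by over 15 points while leaving its median essentially untouched.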

If the only information you had available were the means in either of these scenarios, you might conclude that the first pair of groups did not differ and that the second pair did, and in both conclusions you’d be wrong. Perhaps it really is impressive that the women lasted (on average) about 16 seconds longer than the men. But the difference might also come down to one or two women who beat the clock, coupled with one or two men who barely lasted a moment.

Significance tests help us tell the difference between a random fluctuation in the data and a systematic difference that is not likely due to chance. Why, oh why, do the Mythbusters, with what I’m certain is a more than decent team of researchers and consultants, ignore this simple procedure? Granted, they don’t have to imbue each episode with a lesson in statistics, but at the very least they could let us know whether the results are significant by stating as much. What I admire (and continue to admire) about the show is how they make science both accessible and fun for a lay audience. However, the mistake often made when presenting science to the public is to gloss over the more technical procedures. Perhaps it’s out of concern that the audience will get lost in the details, but I’m not advocating a step-by-step t test, just a nod to how important statistical hypothesis testing procedures are in the evaluation of experimental findings. Most especially if they want the audience to accept that a myth has truly been confirmed.
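Since the show never released its raw data, here is a minimal stdlib-only sketch of the kind of test I mean: Welch’s two-sample t statistic, run on the made-up outlier groups from the earlier example (a full test would also derive a p-value from the t distribution, which I omit here):

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Welch's two-sample t statistic: the gap between the means,
    scaled by the combined standard error. Roughly, |t| well above 2
    hints at a systematic difference; values near 0 suggest the gap
    could easily be a chance fluctuation."""
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    se = math.sqrt(v1 / len(sample1) + v2 / len(sample2))
    return (m1 - m2) / se

# Applied to the outlier example above: the 15.4-point gap in means
# is dwarfed by group B's huge variance, so t lands nowhere near 2.
t = welch_t([80, 72, 78, 83, 74], [79, 76, 80, 72, 157])
print(round(t, 2))  # -0.95
```

That one number is exactly the “nod” I’m asking for: it says whether an observed gap in means is large relative to the variation in the data, which a bare comparison of averages never can.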


9 comments to Mythbusters and the Importance of Variation

  • Douglass Smith

    Thanks for pointing that out, Lisa. I’d love to hear a response from them. Although I’m not trained in statistics, I know enough to have been skeptical of some of their tests in the past — especially those with what appear to me to be low sample sizes.

    I do love the Mythbusters, however; they give at least the impression of being devoted amateurs who are interested in doing the right thing, albeit of course at the mercy of having to produce engaging drama.

  • Couldn’t agree more! Not every myth investigation generates enough data for stats testing, but many of them do. I think throwing in some more educational tidbits about how to generate conclusions from raw numbers here and there (every three or four episodes) would not hurt the show.

  • malendras

    The Mythbusters do edit these things down for a reason – most people won’t watch a show where they show all the raw numbers and math. Stephen Hawking was told “every equation halves sales [of his book]” so I’d imagine they’re doing the same thing in the show. They rarely show all the tests they do, editing it for time, pacing, and viewer interest. It would be really cool if they published the numbers on their site, or put those shots up on the site, but putting them into the show would probably alienate more viewers than it would impress.

  • Clinton Freeman

    I signed on to a Mythbusters fan website that I haven’t written on in 5 years, and one of the first questions was exactly the one Lisa raised. Someone asked if the raw data from this segment could be provided, and I asked to be given it too. If I get it I’ll pass it on.
    My problem with the segment is the assumption that keeping your hand in ice water has a direct correlation to your ability to tolerate pain, especially when the difference in results could just as likely be explained by differences in subject compliance and/or suggestive signals from Jamie.
    My biggest criticism of the show for the last few years is that they test fewer actual myths and mostly just reproduce special effects (particularly ones where they get to blow things up).

  • Clinton Freeman

    They just posted a notice about this video to their Facebook page.

    It doesn’t really address the questions raised here, but it did remind me that you can post questions to them there, too.

  • PonderingTurtle

    The other thing they were sloppy with in these tests, also evident in their redhead vs. non-redhead pain testing, is motivation. If people have the right motivation, they will tolerate a lot more pain than they would with no motivation. For example, if they gave people, say, $150 for lasting a minute and a half, I suspect the average would be much higher.

    In the tests they seemed to tell the redheads that they were expected to have a lower pain tolerance than non-redheads. No one wants to be thought of as weak-willed, so the redheads had something to prove, while the non-redheads (especially if they weren’t told this) had nothing to prove and less motivation to endure the pain. So they ended up with a large difference between the groups.

    So as a redhead I don’t think that I, at least, have a low pain tolerance, but they had some serious control issues in their study if they want to claim it had any validity. They introduced confounding variables by telling the subjects how the groups would be broken down.

  • I’d also like to point out that their experiment testing the effect of swearing on pain tolerance was flawed due to the failure to randomize the order of the control and treatment.

  • Malik B.

    It’s also important not to make an inferential error by thinking that women tolerating more localized pain due to a hand in an ice bath must necessarily mean that they can sustain an overall higher amount of pain (in general) than men.

    Given the experimental design, I would hardly say that the myth, as stated, could confidently be called busted or confirmed, though we might be able to say something about variation in cold tolerance between the sexes.

  • Cesar Rabak


    I only watched the show today, and immediately got concerned about the statistical approach. I would like to add two points to the critique:

    1) Since they consulted with Stanford researchers about methods of safely inflicting pain on the subjects, they could also have gotten a peek at how to compare the results!
    2) I think the better procedure for comparing the means of the outcome would have been to report them with 95% confidence intervals and to apply an appropriate statistical test for the equality of the two means.

    My second point above is in addition to your comments about examining the data for outliers (from watching the show, it appears one lady reached the 3-minute limit).

    For statistics classes, this ‘simple’ experiment also raises another (IMNSHO) important issue with this class of tests: the upper limit is censored (in this case for safety reasons).

    And last but not least, in the episode aired here, the results for the time it took the subjects to begin feeling pain were not shown, and without them you can’t recover the real time the subjects sustained the pain.
