Gotham Skeptic Survey Results, Part 2 of 3: Bias in Testing

Instead of continuing my thorough examination of the data from Gotham Skeptic’s first survey, I’d like to respond to comments the instrument has received, in particular those concerning the Likert-type item “Skepticism is the same as Atheism.” When the survey was first published to the blog, a few comments emerged stating that the instrument was a biased one. Then, upon publication of the first of these three parts reporting the results, additional comments were made regarding this item in particular, once more equating the phrasing of this item with instrument bias. If I might quote one comment directly: “The questions on atheism led me to think that this was a survey which was already biased towards assuming that skepticism = atheism and I wasn’t keen on contributing to a study that seemed to assume I was an atheist.” I’m quite glad these comments (and others like them) were made, as they have given me an opportunity to segue into the topic of bias in testing (a subject that is part of the curriculum I teach in statistical measurement and experimental psychology).

I would like to begin by stating where the item came from. Quite some time ago, when I first inquired about getting more involved with the NYC Skeptics, I met with co-founder and NYCS president, Michael Feldman. During that conversation we discussed how often we are mistaken for atheists when stating that we are skeptics. In the time since I have become more involved with NYCS, and now with Gotham Skeptic, I’ve met fellow skeptics who have expressed a similar disappointment. Yet many skeptics still seemed to present skepticism as synonymous with atheism, and skeptical organizations as having an atheistic agenda. When I decided to develop some polls for the Gotham Skeptic, I wanted to explore this further. After the first poll, taken in the Fall of 2008 and presented in the first issue of the Gotham Skeptic, I found that there did seem to be a higher incidence of atheistic beliefs within the skeptical community than outside of it. So, in designing this second survey, I wrote more pointed questions to assess not only the incidence of atheism amongst skeptics, but also the nature of the relationship between such beliefs and personal identity.

It is regrettable that one would walk away with the impression that the item presumed the statement it made to be true, and is therefore a biased item, perhaps most especially because the claim being made does not fit the criteria for instrument bias. For an instrument (such as a survey, test of knowledge, psychological assessment, etc.) to be biased, it must limit the capacity of the person completing it to share their true point of view or demonstrate their true ability. Often bias is discovered when a test or instrument systematically prevents a group or groups from performing as others do. Item bias, however, does not mean that a respondent disagreed with an item; it means that there was unequal opportunity to disagree (or to agree) with the item.

Now, there was most certainly bias represented within this instrument, depending on one’s interpretation of what was being measured. For example, as this survey was offered through a link on this blog, only readers of this blog would have come across it, so the available pool of participants was the relatively small (at least when compared with national surveys) group of individuals reading the blog regularly. Moreover, as participation was completely voluntary, only those individuals who chose to submit the form, after already being in the select group with access to the link, would contribute to the data. Therefore, from the start the survey was targeting a convenience sample, and then suffered potential additional nonresponse bias from the individuals who chose not to participate.
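To make the self-selection problem concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and none comes from the Gotham Skeptic survey: it assumes a hypothetical population in which 30% hold some opinion, and assumes that people holding it are twice as likely to submit the form as those who do not.

```python
import random


def simulate_self_selection(n=10_000, seed=0):
    """Compare the true rate of an opinion with the rate seen among volunteers.

    All probabilities below are assumptions chosen for illustration only.
    """
    rng = random.Random(seed)

    # Hypothetical population: each person holds the opinion with probability 0.30.
    population = [rng.random() < 0.30 for _ in range(n)]

    # Assumed response behavior: holders of the opinion respond 20% of the
    # time, everyone else 10% of the time.
    respondents = [
        holds for holds in population
        if rng.random() < (0.20 if holds else 0.10)
    ]

    true_rate = sum(population) / len(population)
    observed_rate = sum(respondents) / len(respondents)
    return true_rate, observed_rate


true_rate, observed_rate = simulate_self_selection()
print(f"true rate: {true_rate:.2f}, rate among respondents: {observed_rate:.2f}")
```

Under these assumptions the rate among respondents comes out well above the true rate, even though no individual item was worded unfairly. The distortion comes entirely from who chose to answer.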

Bias in design refers, as stated earlier, to items (or the instrument as a whole) that systematically exclude a group or groups in some way. My favorite example of this is comically presented on The Colbert Report when Stephen Colbert interviews members of the House of Representatives. He would ask each member if George Bush was a ‘great’ president or ‘the greatest’ president. This is an example of such bias because it prevents people who believe George Bush to be something other than ‘great’ or ‘the greatest’ from answering honestly. This is an extreme case, but to demonstrate the point another way, take the example below:

Example #1

I actually saw this as a homework question given by a former colleague of mine. The task was to tabulate data (provided separately) and summarize the data by answering these questions. Answering the first two questions seems simple enough, as students were provided data about the sex of the respondents and their sandwich preferences. However, a student would need to know what a Reform Jew is, and what it means to keep Kosher, in order to answer the third part of the assignment. A student might be quite able to examine a cross-tabulation summary, but without this prior background knowledge, they would be unable to complete the remainder of the assignment successfully. Had the course offered instruction in both statistical analysis of data AND common practices of different sects within Judaism, then this question would not be inappropriate. For example, I recently asked my students on a midterm exam the following question:

[Image: midterm exam question]

Now, whether our readers would be able to answer this question or not does not make the item biased. The item was designed to measure whether my students had learned the difference between simple and multiple regression (a topic I lectured on, and had scheduled readings for), which made it both appropriate and necessary.

When designing items for an instrument, one needs to take into consideration the ability of each item to accurately measure the construct under investigation, within the population for which it is intended. Biased items would include those not having sufficient categories to satisfy all possible points of view (e.g., Select your religion: A) Christian B) Catholic C) Protestant) or items that require specialized knowledge (and not knowledge expected within your population under investigation) in order to participate fully (e.g., Do you agree that in the World Series, substitutions may be made while the ball is alive?).

Likert items are usually a safe way to go, but can be polarizing. The very nature of a Likert-type item is that the item makes an affirmative statement and allows the participant to agree or disagree with the statement along a 5-point scale (extremes being defined by 1 and 5). In order to assess attitudes, beliefs, opinions and the like, Likert items are often necessary to measure what points of view a respondent would agree or disagree with and, from there, what relationships one can observe amongst those who agree or disagree with certain items. The claim that the Likert item on this survey, “Skepticism is the same as Atheism,” is biased because it presumes skepticism IS the same thing as atheism, or that a skeptic must be an atheist, is not entirely valid. The item does not presume anything; it offers respondents the ability to agree or disagree with it. That a person would disagree with the item does not mean the item prevented them from demonstrating their point of view. Their task would have been to ensure their point of view was heard by responding in disagreement.
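As a rough illustration of how responses to a single 5-point Likert item might be summarized, here is a small Python sketch. The function name and the sample responses are hypothetical and are not the survey’s data; collapsing 1 and 2 into “disagree” and 4 and 5 into “agree” is one common convention, assumed here for simplicity.

```python
from collections import Counter


def summarize_likert(responses):
    """Collapse 1-5 Likert responses into disagree / neutral / agree shares."""
    if not responses:
        raise ValueError("no responses to summarize")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")

    counts = Counter(responses)
    n = len(responses)
    return {
        "disagree": (counts[1] + counts[2]) / n,  # strongly disagree + disagree
        "neutral": counts[3] / n,
        "agree": (counts[4] + counts[5]) / n,     # agree + strongly agree
    }


# Made-up responses to one item, for illustration only.
sample = [1, 1, 2, 3, 4, 5, 1, 2, 2, 1]
print(summarize_likert(sample))
```

A respondent who rejects the statement is counted in the “disagree” share, which is exactly how their point of view enters the data.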

It is true the item could have been phrased in different ways. Perhaps it could have been phrased “Skepticism is not the same thing as atheism”, but perhaps then the 12% of respondents who agreed with the item would have claimed it to be exclusionary. It might also have been phrased as a question: “Do you believe skepticism is the same thing as atheism? Yes, No or Maybe.” Honestly, any of these options, including the design I decided upon, should have elicited the same results, because they are all measuring the same thing. In each case, no one would have been prevented from sharing their true point of view on the matter.

There are such things as wording effects. Take, for example, the following two Likert items (1 = strongly disagree, 5 = strongly agree):

“Homosexuals should have the equal rights to marriage”

“Homosexuals should be able to get married”

Upon examination, both items would appear the same, and if someone were to agree with the first, they should likely also agree with the second. However, notice the use of the phrase ‘equal rights’ in the first example. This is semi-loaded, and one might find people agreeing with the first and disagreeing with the second, because it is probably harder to disagree that a group should have equal rights than it would be to disagree that people should ‘be able to get married’.

Admittedly, loaded items are a bit more challenging to avoid in instrument design. Having designed several instruments myself, and having consulted on several my students have designed, I do sometimes find, when I review an item that previously sounded fair and reasonable, that it only sounded so because I agreed with the item (or because I strongly disagreed with the item, and my efforts to present an oppositional point of view resulted in unintentional wording effects). It is possible that my decision to phrase the atheism item as I did was motivated by my disagreement with it, to ensure the allowance of an alternative point of view.

In the end, I do think the word bias is tossed around a bit too often when the appropriate criteria are not sufficiently met. For a survey to be biased, it should be demonstrable that its items prevent an accurate representation of the population being investigated. I should hope we all ensure our voices are heard in polls, surveys and other forms of research, especially in cases when we disagree with a viewpoint being presented. If we walk away because we (dare I say falsely?) believe presumptions are being made, the resulting data are contaminated (and potentially skewed) by such exclusion of participation.

1 comment to Gotham Skeptic Survey Results, Part 2 of 3: Bias in Testing

  • Hi, Lisa. Thanks for the explanation.

    I wasn’t the one complaining about a specific question, but I noted what I perceived as bias in the phrasing of the questions in general. Just looking at the “skepticism == atheism” point for a moment, though, consider the first three Likert questions (after the general demographics; numbers are mine, and the survey questions are not numbered):

    1. Skepticism is the same as athiesm. [sic]

    2. Being a skeptics [sic] means you don’t believe in anything.

    3. It is impossible to be a skeptic and believe in a God.

    You have the first three questions that are all in the same vein, and no questions that directly explore a different angle or other nuances. The survey finishes with another such question, leaving the subject with a final thought (and perhaps affecting the subject’s review and revision):

    16. Being a skeptic means you don’t believe in god and accept the theory of evolution as scientific fact.

    Now, while it’s true that none of these questions prevent the subject from giving an answer that faithfully (excuse the term) reflects her thoughts on this, neither do the questions explore any middle ground or alternative thoughts. There’s also such a thing as a leading question, and putting these three together at the beginning of the survey arguably has the effect of leading the respondent — as we saw with the commenter who chose not to do the survey, rather than to be led.

    There are questions in the middle of the survey (the ones about the “forces in the universe that science cannot explain”) that start to go in a more exploratory direction, but it’s possible (perhaps likely) that the first three questions have already biased the answers to these (“Gee, maybe I’m not really a skeptic if I believe in this. Hm…”).

    To claim an unbiased survey, it’s not sufficient to show that each question, individually, can be answered truthfully by all subjects. First, that doesn’t show that each question, individually, lends no bias to the survey (only that none lends a specific kind of bias), and second, it doesn’t show what bias the set of questions, taken together, gives.

    That said, it has to be true that essentially any survey is biased, because the choice of wording for the questions, the grouping and order of the questions, and the manner and context in which the survey is given all introduce bias. One can decide which kinds of bias one most cares to avoid, and one can design the survey to avoid those, but other bias will still be there, and will likely be exacerbated by the avoidance of the first. It’s like the slapstick routine where the dead guy’s leg is sticking up in the air, and pushing that leg down makes an arm stick up instead.
