Campus News : July 2011
OPINION

Measuring the Social Sciences accurately

By Professor John Rossiter*

Measurement in the social sciences has gone off the rails due to an overemphasis on statistics. A new and non-statistical approach is needed.

Measurement in all sciences consists of three stages: construct definition, measure selection, and the obtaining of observations or scores from the measure. The dominant approach in social sciences such as sociology, social psychology, occupational psychology and management is known as psychometrics, but the psychometric approach covers only the latter two stages, dealing with the purely statistical relationship between the measure and its scores. The vital relationship between the first two stages, in which the content of the definition must be closely translated into the content of the measure, is either handled carelessly or ignored altogether. Instead, psychometricians rush to produce impressive statistical evidence as proof that the measure is valid. Statistical validity has largely replaced basic content validity in social science measurement.

Insufficient care in establishing the content validity of a measure results in serious mismeasurement of social science constructs. The mismeasurement is masked by the impressive, though often much-massaged, statistical evidence accompanying the measure, which leads other researchers to adopt it. Journal editors then publish misleading findings based on measures that were never highly content-valid in the first place.

A societally important example of mismeasurement is the construct of social class. One's social class membership is a major predictor of mental health, physical health and, more mundanely, retail shopping behaviour. A household's (or an individual's) social class category can only be fully and validly measured qualitatively.
Valid measurement of social class requires in-home interviews by an expert sociological interviewer, who must not only ask many questions about the head or heads of the household's modes of work and living but also assess the prestige value of the residential area and of the householder's visible possessions. These include furnishings, books, art objects, work clothes and leisure clothes and, of course, the car or cars that they drive.

This is all too difficult, time-consuming and inconvenient for social science researchers. They have therefore abandoned the construct of social class and replaced it with a construct called "socioeconomic status", which is measured conveniently but inadequately by the single variable of occupational status. The Australian Bureau of Statistics, for example, employs an occupational status scale developed by researchers at ANU as the single measure of what is supposed to be not only one's economic wealth rank in Australia but also one's social standing in the community.

Occupational status, however, is a very poor measure of socioeconomic status and an even less adequate measure of social class. These days we have plumbers and electricians earning more than many of those with university degrees. Also, the swelling of modern occupations in information technology and service industries has blurred the old status divisions, a blurring made worse by the inflated and politically correct job titles that most people are now allowed to use.

A second example of mismeasurement concerns people's moral, social and political values. The last Coalition Government tried to include in its infamous Citizenship Test statements of "Australian values" to which prospective immigrants would have had to pledge their agreement. These statements were supposed to comprehensively represent measures of uniquely Australian values. This was a hopeless exercise from the start.
How, for example, can one adequately define in a single statement complex Australian values like "mateship", "not taking yourself too seriously or coming off as a 'tall poppy'", or the "just-within-the-law larrikinism" that we so much admire? Questionnaire items to capture such subtle beliefs can be validly written only after thorough qualitative research, conducted and analysed by an expert. Such items certainly cannot be devised by politicians, and neither can they be validly developed by psychometricians.

Another example of mismeasurement is in the organisational psychology area of job satisfaction. It was recently reported in The Australian that only about a third of Australian workers, 36 percent, were "happy" at work. This low incidence contrasts sharply with the very high incidence of Australian workers reporting that they are "satisfied" with their job. For instance, a national survey conducted by ANU's Research School of Social Sciences, admittedly almost a decade ago in 2003, found that nearly all Australian workers, 89 percent, gave a positive rating on job satisfaction, that is, a rating above the midpoint of 5 on an 11-point scale where 0 = "extremely dissatisfied" and 10 = "extremely satisfied".

The key question here is really a theoretical one: is it happiness at work or satisfaction with one's job that most determines productivity, promotion and job tenure? What is clear is that the occupational psychology researcher cannot regard job satisfaction as being the same as happiness at work, and cannot get away with measuring, as in the studies above, just one of these constructs.

The conventional psychometric approach has also led to the problem of employing too many items to measure basic or "concrete" constructs. Psychometricians have long argued that any single item is inherently fraught with "measurement error", although they never specify what this error is.
Multiple items, however, will inevitably include items that stray from the original construct. A psychometrician attempting to measure how much you like or dislike Weet-Bix, for example, would not only ask you to rate how much you like or dislike it, which is a perfectly adequate single-item measure, but also how unpleasant or pleasant you believe Weet-Bix to be and how bad or good you think it is. These last two items are extraneous and off-target. Pleasantness is not a good synonym for likeability, and goodness, when coupled with badness, implies an entirely different judgment about the nutritional value of the product. Averaging the scores from the two unnecessary items with the single valid item can only worsen the accuracy of the total multiple-item score.

High content validity pertains not only to item content but also to the content of the answer scale. By far the most popular answer scale, or rating scale, in the social sciences is the Likert scale. This is typically a seven-category bipolar answer scale ranging from "strongly disagree" at one end, through "neither" in the middle, to "strongly agree" at the other. However, the Likert answer scale has very poor content validity: it is not at all clear what disagreeing with the item statement implies. Take, for example, someone who, when presented with the opinion-poll item "the NSW Liberal Party is well prepared to govern the state", ticks "strongly disagree". Does strong disagreement mean that the person believes the Liberal Party is "prepared" but not "well prepared"? Or that the now-in-opposition NSW Labor Party was the better prepared? People will of course tick or click an answer on every Likert scale presented to them, but if the answer is on the "disagree" side, the researcher has no idea what it means.
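The Weet-Bix point above, that averaging in off-construct items can degrade a measure rather than improve it, can be illustrated with a small simulation. Everything in the sketch below is an illustrative assumption of mine (the item loadings, the noise levels, the sample size), not data from any study: one item tracks true liking directly, while two "pleasant" and "good" items load mostly on a separate judgment.

```python
import random
import statistics

random.seed(0)
N = 10_000

def corr(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

# True liking for the product, and a separate (e.g. healthiness) judgment
truth = [random.gauss(0, 1) for _ in range(N)]
other = [random.gauss(0, 1) for _ in range(N)]

# The one content-valid item: a direct like/dislike rating (truth plus noise)
valid_item = [t + random.gauss(0, 0.5) for t in truth]

# Two off-construct items ("unpleasant/pleasant", "bad/good") that load
# mostly on the separate judgment; the 0.3/0.7 loadings are assumed
off_item_1 = [0.3 * t + 0.7 * o + random.gauss(0, 0.5)
              for t, o in zip(truth, other)]
off_item_2 = [0.3 * t + 0.7 * o + random.gauss(0, 0.5)
              for t, o in zip(truth, other)]

# The multiple-item composite: the average of all three items
composite = [(a + b + c) / 3
             for a, b, c in zip(valid_item, off_item_1, off_item_2)]

print(f"single valid item vs true liking:    r = {corr(valid_item, truth):.2f}")
print(f"three-item composite vs true liking: r = {corr(composite, truth):.2f}")
```

Under these assumptions the three-item composite correlates noticeably less with true liking than the single valid item does: adding items helps only when each added item actually measures the target construct.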
Many respondents, in fact, opt for the middle answer category of "neither", which is also ambiguous because it could mean "neutral" or something quite different, namely "I don't know". These content-validity problems with current measures of social science constructs are hidden in the statistics.

I have spent the last decade as a research academic and research practitioner developing, indeed inventing, a new measurement procedure for the social sciences. This is a six-step procedure called by the acronym C-OAR-SE (a deliberate pun on the statistical over-refinement inherent in the psychometric approach). My C-OAR-SE theory argues that the only requirement for a measure is expert-assessed high content validity of both the items and the answer scale. I plan to spend the rest of my academic career re-doing some of the major studies in social psychology, such as those mentioned in this article, with properly valid new measures.

JR

* John Rossiter is a research professor in the Institute for Innovation in Business and Social Research at the University of Wollongong. His new book, Measurement for the Social Sciences, was published this year by Springer.