I am often amused and sometimes alarmed by the way data and statistics are handled by firms, public officials and particularly the media. The erroneous use of data is at best a result of ignorance on the part of these people and at worst a deliberate misuse to manipulate public opinion.
One area that exemplifies this misuse is the subject of the road toll. Let me concoct an example to demonstrate what I mean.
In the mythical state of Dystopia the annual death toll from road accidents over the last ten years has been as follows:
Year    99/00  00/01  01/02  02/03  03/04  04/05  05/06  06/07  07/08  08/09
Deaths    321    346    307    339    341    329    348    328    336    345
Over this period the average road toll has been 334 with a standard deviation of 12.9. (If you are not familiar with statistics, the standard deviation measures how widely the values are spread around the average. But don’t worry about it. I will give an example later of the impact of this measure.)
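For readers who would like to check the arithmetic, here is a minimal Python sketch using only the standard library. The variable names are my own. Note that the figure of 12.9 corresponds to the sample standard deviation (dividing by n - 1); the population version (statistics.pstdev) would give roughly 12.2.

```python
from statistics import mean, stdev

# Annual road deaths in Dystopia, 1999/00 through 2008/09 (from the table above).
road_toll = [321, 346, 307, 339, 341, 329, 348, 328, 336, 345]

average = mean(road_toll)   # arithmetic mean of the ten years
spread = stdev(road_toll)   # sample standard deviation (divides by n - 1)

print(f"average road toll:  {average:.0f}")
print(f"standard deviation: {spread:.1f}")
```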
In 2003/04, in the face of a road toll that had risen two years running, the Minister for Police, in a fit of high moral dudgeon and ministerial paternalism, accused the motorists of Dystopia of being careless drivers given to speeding and drink driving. As a result he determined to mount a serious campaign to stop this aberrant behaviour and, hopefully, save the lives of a few citizens.
At the end of the next year he trumpeted the success of his intervention. Apparently, his acute understanding of the problem had to be recognised because there were 12 fewer deaths. He had obviously found the solution and more of the same would create even better outcomes. It then became very difficult to explain the jump to 348 deaths the following year.
At the end of 2005/06 the Government was thrown out of office. The new police minister was more concerned with the rising number of “break and enters” in the cities of Dystopia and therefore took the focus off the road toll and put more resources into policing suburbia. Yet strangely the road toll fell dramatically. What was going on here?
Well, as you would expect intuitively, even though the average road toll is 334, chance (reflected in the dispersal of the data as measured by the standard deviation) makes it very unlikely that any particular year’s outcome will be exactly 334. All we know is that, over an extended period, outcomes will tend towards that average, with some years higher and some years lower. If the Minister had taken no action then, all things being equal, he had a far better than even chance (in fact slightly more than a 70% chance) that the road toll would have fallen anyway. That is because the 2003/04 figure of 341 was about half a standard deviation above the mean.
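The 70% figure can be reproduced with a short sketch. It rests on assumptions that are mine for the sake of illustration: each year's toll is an independent draw from a normal distribution with mean 334 and standard deviation 12.9, and the fact that deaths are whole numbers is ignored.

```python
from statistics import NormalDist

# Assumed model: yearly toll ~ Normal(mean=334, sd=12.9), years independent.
toll_model = NormalDist(mu=334, sigma=12.9)

# Probability that the following year comes in below the 2003/04 figure of 341
# if nothing at all is done.
p_lower_next_year = toll_model.cdf(341)

print(f"chance of a lower toll next year: {p_lower_next_year:.0%}")
```

Under those assumptions the result comes out at a little over 70%, with no ministerial campaign required.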
The standard deviation tells us that, if the data are normally distributed, there is a 68% likelihood that the road toll in any given year will fall between 321 and 347. It is very difficult to determine the impact of the Minister’s intervention in the face of this probabilistic distribution. So the road toll goes up a few percent and the Minister berates us for being poor drivers, and it goes down a few percent and he tells us how marvellously effective his interventions have been. More than likely the natural variability in the data has had more impact than anything he might have done.
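To put a number on that natural variability, here is a rough sketch under the same simplifying assumptions (a normal distribution, with consecutive years treated as independent, which is my assumption and not a property of real road data). It checks the one-standard-deviation band and then asks how often a fall of 12 or more deaths, the size of the Minister's "success", would turn up between two consecutive years by chance alone.

```python
from math import sqrt
from statistics import NormalDist

mean_toll, sd_toll = 334, 12.9   # figures quoted in the text

# The band one standard deviation either side of the mean.
low, high = mean_toll - sd_toll, mean_toll + sd_toll
model = NormalDist(mean_toll, sd_toll)
p_within_band = model.cdf(high) - model.cdf(low)

# Assumption: two consecutive years are independent draws from the same
# distribution, so their difference is normal with mean 0 and sd * sqrt(2).
year_on_year_change = NormalDist(0, sd_toll * sqrt(2))
p_drop_of_12_or_more = year_on_year_change.cdf(-12)

print(f"one-sd band: {low:.0f} to {high:.0f} ({p_within_band:.0%} of years)")
print(f"chance of a fall of 12 or more by luck: {p_drop_of_12_or_more:.0%}")
```

On these assumptions a fall of that size happens about one year in four with no intervention whatsoever.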
(Although for this simple example I have assumed a normal distribution of road deaths around a constant average, this is not quite the case. Because of population growth the Minister should expect some increase and if, for example, the economy is improving and car ownership per capita is increasing, that would contribute to an increase as well.)
When you get a result well below the mean, the next result is likely to be higher. When you get a result that is way above the mean, the next is probably going to be lower. This is known as regression to the mean. I read somewhere about someone training pilots to land aircraft. He had come to the conclusion that the best response was to castigate those who landed badly, because when he did this their next attempts were invariably better. He had tried praising those who landed well, but very often their next attempt was worse. Therefore he had concluded that punishing bad performance was more effective than rewarding good performance. But his strategy, again, was overtaken by probabilities, and the causation he implied was, at best, very dubious!
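The instructor's mistake is easy to reproduce in a toy simulation. The model below is entirely hypothetical: every landing score is pure luck around a fixed skill level, and feedback is not modelled at all, so it cannot possibly influence the next landing. The thresholds of plus and minus one are arbitrary choices of mine.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical model: each landing score is the same skill level (0) plus luck.
# Praise and punishment do not appear anywhere in the model.
scores = [rng.gauss(0, 1) for _ in range(100_000)]

after_bad, after_good = [], []
for this_landing, next_landing in zip(scores, scores[1:]):
    change = next_landing - this_landing
    if this_landing < -1:       # a poor landing (would have been castigated)
        after_bad.append(change)
    elif this_landing > 1:      # a fine landing (would have been praised)
        after_good.append(change)

print(f"average change after a bad landing:  {mean(after_bad):+.2f}")
print(f"average change after a good landing: {mean(after_good):+.2f}")
```

Bad landings are followed by improvement and good ones by deterioration, even though nobody in the simulation was ever praised or punished.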
Another trick often used by the media and by those with a vested interest in increasing fear amongst citizens (for example, drug companies) is to quote relative statistics without revealing the absolute base. You might read, for example, that the effect of smoking even one joint of marijuana is to increase your risk of succumbing to schizophrenia by 40%. Although not wanting to promote such behaviour, I should still tell you what they don’t tell you. The actual risk of developing schizophrenia is something in the order of one percent, so a 40% increase lifts it to about 1.4 percent. Consequently the elevated risk is still very minor.
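The gap between the relative and the absolute figure is plain arithmetic, sketched below. The one percent baseline and the 40% increase are the illustrative figures used above, not measured values, and the variable names are mine.

```python
baseline_risk = 0.01        # roughly one person in a hundred (illustrative)
relative_increase = 0.40    # the headline "40% higher risk"

elevated_risk = baseline_risk * (1 + relative_increase)
absolute_increase = elevated_risk - baseline_risk

print(f"baseline risk:     {baseline_risk:.1%}")
print(f"elevated risk:     {elevated_risk:.1%}")
print(f"absolute increase: {absolute_increase * 100:.1f} percentage points")
print(f"extra cases per 1,000 people: {absolute_increase * 1000:.0f}")
```

A headline of "40% higher risk" and a footnote of "four extra cases per thousand" describe exactly the same numbers.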
As someone said, “Fear sells.” Therefore there are always players wanting to exaggerate risk (for commercial benefits).
Finally, one of the most common mistakes with statistics is to imply causation where there is only correlation. As per the above example, if marijuana usage seems to correlate with an increased risk of schizophrenia, our natural inclination is to assume a causal connection. And of course this is not always the case.
I can remember, as a student, a lecturer telling us that a strong correlation existed in the UK between the consumption of bananas and the birth rate. Did this imply a causal relation? Well no! It transpires that in the UK bananas are a luxury item. Consequently, when the economy is doing well, people consume more bananas. Likewise, when people are more optimistic about economic outcomes they tend to have more children. Thus both outcomes (the consumption of bananas and the propensity to have more children) are related to the economic conditions of the time. Those expecting that their fertility might be increased by the consumption of bananas would be greatly disappointed!
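A small simulation shows how a hidden common cause manufactures a correlation. Everything here is invented for illustration: the "economy", "bananas" and "births" series are synthetic numbers, built so that the economy drives both of the others and bananas have no effect on births at all.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable

# Synthetic data: the economy influences both series; they never touch each other.
economy = [rng.gauss(0, 1) for _ in range(5000)]
bananas = [e + rng.gauss(0, 0.5) for e in economy]
births = [e + rng.gauss(0, 0.5) for e in economy]

raw = pearson(bananas, births)

# Strip out the economy's contribution (its coefficient is 1 because we built
# the data that way), then correlate what is left over.
bananas_left = [b - e for b, e in zip(bananas, economy)]
births_left = [b - e for b, e in zip(births, economy)]
adjusted = pearson(bananas_left, births_left)

print(f"bananas vs births, raw:               {raw:.2f}")
print(f"bananas vs births, economy removed:   {adjusted:.2f}")
```

The raw correlation is strong, yet once the economy is accounted for it all but disappears, which is just what you would expect when neither series causes the other.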
This, then, is just a cautionary tale to warn you that you must take great care both with your own use of statistics and with your interpretation of statistics used by others. Look to the source. The media will inevitably try to sensationalise things, and drug companies (and lawyers, insurance companies, security firms etc.) will want to make you afraid and thus inveigle you into using their products. Politicians are mainly ignorant of statistical processes and will be prone to use dubious figures to have you re-elect them! They are, however, pretty well across the probability of that outcome!