Tag Archive | statistics

YouGov Distorts its Latest Election Poll to Get Headlines

With the longest British election campaign in full swing, the pollsters are getting more attention than usual. It is disappointing, then, to see a major polling company falling into bad statistical habits.

Certainty

YouGov has conducted a poll of 1800 people that reverses the Conservative lead of a few days ago, which itself reversed Labour’s lead from the previous poll. You might suspect that these random-seeming swings, separated by only a few days, are indeed just random sampling error, but YouGov is happy to imply otherwise with its headline “Labour Lead at 4%”. That headline has been repeated, without any caveats, by The Sunday Times under the banner “Labour races into 4-point lead after Miliband’s TV success”. But it isn’t true. Read More…
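
To get a feel for how much of a 4-point swing could be pure sampling noise, here is a rough sketch in Python (the 36% vote share is an illustrative assumption, not a figure from the poll):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for one party's share."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1800   # YouGov's sample size
p = 0.36   # illustrative vote share (assumed)
moe = margin_of_error(p, n)
print(f"single-share margin of error: +/-{moe:.1%}")  # about +/-2.2 points
# The *lead* between two parties is roughly twice as uncertain, so a
# 4-point change between successive polls is well within random noise.
```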

Ofsted Fails to Reach a Grade B in Statistics

Michael Wilshaw’s Ofsted has made a fool of itself yet again as it publishes a report which says more about its naive approach to statistics than it does about the progress of the most able students.

Very Disappointed

Ofsted is not happy. Its 2013 report on the progress of those students who achieved level 5 in their Key Stage 2 exams made some recommendations. Apparently, Ofsted were unhappy then that less than a quarter of those achieving the highest level in Maths and English went on to achieve a B grade or above in their GCSEs, and two years later nothing has improved.

Notwithstanding that two years is hardly enough time to see improvements when the children involved had already been through twelve years of education, is it a reasonable complaint? Read More…

Wilshaw’s Annual Report Reminds Us of Ofsted Weaknesses

It is the season of heart-warming tradition, joy, pre-Christmas sales and crass householders generating enough global warming for a whole town with their shameless lighting displays. So what better way is there to prepare for the holidays than unwrapping the education sector’s annual ticking-off, as Michael Wilshaw issues his Annual Report on Schools?

Wilshaw, forever fighting the urge to tell us how he single-handedly turned his Hackney school into an outstanding beacon of excellence by recruiting middle-class students from out of town, this year ripped into schools with small sixth forms. Apparently, students attending small school sixth forms “achieve considerably poorer results than those in larger sixth forms”.

Very Small Sixth Forms

Wilshaw said: Read More…

Relax – Latest Health Scare is Just a Scare

The Sunday Telegraph reported this morning that:

“Stroke victims who are admitted to hospital are far more likely to die if they are treated outside central London, an investigation has found. The NHS statistics show survival rates for stroke victims sent to central London hospitals are 54 per cent higher than for those in some parts of the country.”

Fortunately, the ‘Health Correspondent’ Laura Donnelly has got her statistical knickers in a twist. She goes on to write:

“The death rate within 30 days of admission for stroke is 14.6 per cent in the capital’s central sites, according to analysis of the nine years’ data ending 2009 – compared with rates of more than 22 per cent in industrial cities and manufacturing towns”

So, in the poor industrial towns the survival rate is 100% – 22% = 78%. If London is 54% better (154% as good) that makes the survival rate in London 0.78 times 1.54 = 120%. Wow! In London, for every 100 stroke victims taken to hospital, 120 of them survive!

It is the death rate of the industrial towns that is higher: 22% divided by 14.6% is about 151%, or roughly 50% worse (the quoted 54% presumably comes from the unrounded figures). But the vast majority of stroke victims survive, so the difference in survival rates is not the same number at all.

Actually, of course, the survival rate in London is 100% – 14.6% = 85.4%. The ratio of survival rates is then 85.4% divided by 78% = 109.5%. So London survival is better by a rather less whopping 9.5%.

Not 54%, then.
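
The arithmetic above can be checked in a few lines of Python, using the death rates quoted in the article:

```python
london_death = 0.146  # 30-day death rate, central London hospitals
towns_death = 0.22    # "more than 22 per cent" in industrial towns

# Ratio of DEATH rates -- the likely source of the Telegraph's "54%"
death_ratio = towns_death / london_death                 # about 1.51 here

# Ratio of SURVIVAL rates -- what the article's wording actually implies
survival_ratio = (1 - london_death) / (1 - towns_death)  # about 1.095

print(f"death rates:    {death_ratio:.2f}x")
print(f"survival rates: {survival_ratio:.3f}x")
```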

Never mind, Laura Donnelly, there will be a real scandal along soon if you wish hard enough. Perhaps an introductory high-school statistics text could be put on your Christmas list this year?

The BMA’s Attack on Smokers with Made-Up ‘Evidence’

The British Medical Association, the Physicians’ trade union, has called for a ban on smoking in private cars, to add to the 2007 restriction on smoking in enclosed ‘public’ places. Their reason is that such a ban is a small price to pay to reduce the exposure of minors carried as passengers. Apparently, the level of toxins detectable in a smoker’s car is 23 times that found in pre-2007 smoky pubs. The Evening Standard published a piece supporting the proposed ban, deciding that “Demanding that people stop driving in a self-generated cloud of poisonous gas doesn’t seem a big ask.”

But it does seem a ‘big ask’.

For objections to this illiberal proposal, the first has to be the good word of the BMA itself, which campaigned heavily before the 2007 ban. Its representatives, sent out to TV and radio stations, laughed pompously at the objectors’ warnings of the slippery slope such a ban would put the UK on: it was only to protect innocent pub-goers, they said, who had no choice in what they breathed while exercising their alcohol-imbibing rights. Didn’t they deserve to be protected?

That alcohol was by far the most hazardous part of a Friday night seemed to have been considered unimportant when smoking was coming to be seen as antisocial and presented an easy target of opportunity.

 The law banning smoking in public places also overlooked that most of the affected locations were, in fact, privately owned, and no person was obliged to visit a smoking pub or restaurant.

For the Children

It turns out that the ridiculed ‘slippery slope’ argument was spot on, and campaigners are now openly pushing towards an outdoor ban and whispering about the move into homes. Of course, it will be phrased as if a ban were ‘for the children’, but you can be sure it will be a blanket ban that is demanded, as is the case for the current BMA proposal, for ease of enforcement.

Or just because smoker-persecutors are the new witch-finders, bolstered by the political momentum to go for the criminalisation of tobacco, openly and without embarrassment.

Although I am a lifelong non-smoker who has benefitted from smoke-free pubs and offices, I have fundamental objections to the BMA’s demand:

First is the liberty one. It is no business of doctors, in my mind, to stop me doing risky things if the risk to others is small. I am not prevented from climbing mountains, paragliding, skating on icy lakes, driving a car at high speed (with passengers), or drinking alcohol. The BMA has not yet suggested banning these activities, but since it has a habit of sliding down the slippery slope, I am drawing the line at this ban on smoking.

Fake Figures

The second objection is the ‘23 times more toxic than the smoky pub’ figure that is being quoted without attribution. The Today programme speaker explained that this was the case even when no smoking was actually happening, since the ‘toxins’ soaked the inside of the car. Is this serious? Just what was measured? And did it really equate to 23 times more hazardous than a pub atmosphere? It seemed remarkably suspicious, and the BMA today withdrew its claim. (See the Factcheck site for a debunking of this media myth, with sources.)

The argument for a wholesale ban on in-car smoking is a practical one: that it will be easier to enforce ‘for the children’ if everyone is banned. This will catch people without children, and people who will never carry children in their cars. In a spirit of tolerance, only dangerous activities should be prohibited – anything else is just illiberal. Why ban people from smoking in their own car just to make enforcement easier? (Foreign laws against smoking in cars forbid only smoking with a child present.) It will clearly result in the bulk of prosecutions being of drivers of cars with no children, since they are the ones who will most disagree with such a law.

Slippery Slope

The final criticism is made with the slippery-slope argument. Usually a poor excuse for logic, it applies here because the current debate has used the existing ban on smoking in work vehicles and public spaces as justification. One can easily imagine a future in which the argument moves on to be that the only place a child can still breathe smoke is in the home, a dangerous loophole that must be closed.

Intolerance breeds intolerance by emboldening the nannies who build on victory after victory, incrementally moving us from the traditional character of this country, where a free person may partake of anything not specifically banned, to the continental tradition, where you may do only what is specifically permitted, with everything else forbidden.

The BMA took a fake fact about smoking with children in the car and made it into an attack on all smokers. Anyone who values the freedom to choose what risks to take for themselves will be wise to protest this move – if this moves onto the statute book, the nanny campaigners will not stop there in their programme to save us from ourselves.

Passive Smoking, Active Disinformation

Much as I enjoy smoke free pubs and restaurants, I always took the view that I had a free choice of where to go of an evening if I wanted to avoid cigarette smoke. Admittedly, there were few locations that banned smoking, but that was a commercial decision of the proprietors.

Key, for those who see the ban on smoking in enclosed public spaces as a small start towards banning smoking anywhere, is any evidence that smoking in private homes and cars impinges on the rights of powerless third parties. So the news that passive smoking increases the risk of stillbirths by a whopping 13%, and that of birth malformations by 23%, was reported widely.

The BBC news site quoted the press release freely:

“Fathers-to-be should stop smoking to protect their unborn child from the risk of stillbirth or birth defects, scientists say. They looked at 19 previous studies from around the world.
A UK expert said it was ‘vital’ women knew the risks of second-hand smoke.”

Vital that women knew the risks? So what are the risks? The paper (abstract) was not primary research, but combined data from multiple studies, which sounds good. But most of the studies were either of poor quality or did not address the desired health outcomes. In the end, it came down to 19 studies with four separate outcome measures. Two of them, the risk of miscarriage and the risk of perinatal or neonatal death, came out negative: no increased risk. The other two came out with the 13% (4 studies) and 23% (7 studies) increased risk.

So, the news reports could have started with headlines of “Passive Smoking Does Not Cause Miscarriage” or “New Study Produces Contradictory Results”, or even “We’re Trying Really Hard But We Still Can’t Prove Passive Smoking is Particularly Dangerous”. Although I can’t imagine researchers from the UK Tobacco Control Research Network policy advocacy group pushing that last one!

Statistical Significance

When researchers attribute risks to particular behaviours, they calculate not only the best estimate of the increased risk (eg an odds ratio of 1.13, or an increase of 13%), but also the high and low limits within which they are confident that the ‘true’ risk lies. Any measurement will have uncertainties, and to be confident that a risk is real it must be repeatable: that is, doing the whole study again will produce the same result.

Obviously, you can’t wait for the next study before you publish, so you use the mathematics of chance to see what the results might have been if things had gone slightly differently during the study. The outcome, then, is not a single ‘best’ figure, but a ‘confidence interval’ within which the ‘true’ result would lie 95% of the time (and outside which it would fall 5% of the time).
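
A small simulation illustrates what ‘95% of the time’ means; this is a sketch under assumed conditions (normally distributed data, and the 1.96 multiplier of the normal approximation):

```python
import random
import statistics

def ci_covers_true_mean(true_mean=0.0, sigma=1.0, n=50):
    """Draw one sample; ask whether its 95% CI contains the true mean."""
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    return mean - 1.96 * se <= true_mean <= mean + 1.96 * se

random.seed(1)
trials = 2000
coverage = sum(ci_covers_true_mean() for _ in range(trials)) / trials
print(f"intervals covering the true value: {coverage:.1%}")  # near 95%
```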

Confidence Tricks

The study found that two of the outcomes had confidence intervals that started below an odds ratio of 1. That is, there is a real chance that there was no risk at all, even though the ‘best’ figure was higher. So the results are dismissed as not significant.

What of the other two? Stillbirth came out as 1.01 – 1.26 (middle value 1.13), with malformations as 1.09 – 1.38 (middle value 1.23). So, even without a further look, stillbirths could be increased by perhaps 1%, or as much as 26%. We can’t tell which, but we can tell that presenting 13% as the figure is misleading.

But it is worse than that. The researchers looked at many outcomes and picked out for publicity the ones which gave the wanted results, which makes it far more likely that you will find significance somewhere. As an example, let’s say that you roll four dice. The chance that any one of them will come up a six is 1/6 (about 17%), but the chance that at least one of the four will come up a six is much greater, at about 52%.
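
The dice figure follows from the complement rule: the chance of no six at all on four dice is (5/6) to the fourth power, so:

```python
p_no_six = (5 / 6) ** 4            # probability no die shows a six
p_at_least_one_six = 1 - p_no_six  # complement: at least one six
print(f"{p_at_least_one_six:.1%}") # about 52%
```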

For the researchers to be confident in an overall claim such as ‘passive smoking causes harm to unborn babies’, an allowance must be made (e.g. the Bonferroni correction) for each of these multiple comparisons to get the overall confidence back up to 95%. For the four tests here, the interval half-widths should be stretched by a factor of around 1.27, so they become:

Stillbirth odds ratio: 0.98 – 1.30
Congenital malformation odds ratio: 1.06 – 1.43

Note that the stillbirth interval now includes an odds ratio of 1 (i.e. no risk at all), and the malformation interval only just clears it. On this stricter test, the headline stillbirth result is not significant, and the malformation result is marginal at best.
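
One way to apply the correction in code; this is a sketch, and the choice to work on the log-odds scale (where odds-ratio intervals are roughly symmetric) is my assumption:

```python
import math
from statistics import NormalDist

def bonferroni_widened(low, high, m=4, conf=0.95):
    """Stretch a 95% CI for an odds ratio to allow for m comparisons."""
    z_single = NormalDist().inv_cdf(1 - (1 - conf) / 2)       # 1.96
    z_multi = NormalDist().inv_cdf(1 - (1 - conf) / (2 * m))  # ~2.50
    centre = math.sqrt(low * high)                 # geometric midpoint
    half = math.log(high / low) / 2 * (z_multi / z_single)
    return centre * math.exp(-half), centre * math.exp(half)

print(bonferroni_widened(1.01, 1.26))  # stillbirth: dips below 1
print(bonferroni_widened(1.09, 1.38))  # malformation: stays just above 1
```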

Two Bites at the Cherry

The upshot is this. If you use statistical arguments to judge outcomes, you should know that the more measurements you make the more likely you are to come up with spurious results, so you should make allowances for it.

The headline should have been, at best, “Our Research Was Too Underpowered to be Sure of Anything, but it is Worth Asking for More Funding“.

Unlikely to be reported in the papers, but honest.

The Double Standards of Professional Contrarians

I normally avoid leaving comments on online newspaper articles as I don’t enjoy the anonymous behaviour of participants: rudeness, ignorance and unwillingness to engage in proper debate. But I did get stuck in to one of James Delingpole’s Telegraph Blog entries. (My spell-checker wants to replace ‘Delingpole‘ with ‘Delinquent‘. I’m tempted.)

Delingpole seriously embarrassed himself in the BBC’s Horizon programme Science Under Attack when he debated climate change with Paul Nurse, a Nobel Prize-winning scientist. Specifically, Delingpole described his climate change ‘journalism’ as interpreting interpretation: he didn’t read scientific papers, not even the abstracts.

More specifically, he has found a few people who share his biases and then uses their writings as evidence for his own opinions, as they use his to buttress theirs. ‘Science’ is a word used often, but the scientific method seems to be unknown to them as they resort to rhetoric instead. It seems that winning an ill-natured argument is far more important to them than actually being right. (They fervently believe they are right, of course, though they make no effort to develop secure lines of reasoning, relying on the whole list of pseudo-science techniques described here.)

The comments on the blog entries are even less nuanced, as they don’t even try to use rhetorical tricks and deceptions. If you have ever had so little going on in your life that you feel able to interact with the low-lifes that inhabit these sites, then you may skip to the end.

But this is the nature of argument from those that worship the self-important journalists such as Delingpole. Insults are the order of the day: anonymous posters are just rude. If you come up with a good argument, data that disproves a statement or even just try to act as a moderating influence, then expect to get flamed.

Ignore reasoned arguments

Tell the poster that their sort of person makes you sick and you can’t believe how much they wriggle and squirm in a proper debate. Tell them how thin skinned they are. If you are lucky, they will be distracted by your bilge and not notice that you had no answer to their line of argument.

Consensus Plays No Part in Science

If anyone has the front to point out that the specialists in the field are virtually unanimous in their judgements, so you are likely to be mistaken, bang on about the ‘fact’ that consensus plays no part in science. This is a great move, since you can act as an expert in your own right at the same time as denying real experts know anything about the reality of the science. It is, of course, nonsense. Science does not have authorities that pass judgement on theories when there is disagreement. The only way for tentative theories to enter the canon of accepted principles is for them to be debated back and forth along with the data in journals and at conferences, until everyone has had their objections answered and consensus is reached. Far from ‘consensus plays no part in science’, a lack of consensus is fatal to the progression of a scientific theory. Consensus is the only way in science.

Apply Different Standards of Evidence to Opponents

Appear to carefully pick apart statistical inferences with which you disagree, then slip in a non sequitur based on an absence of evidence. For instance, challenge the last fifty years of warming by selecting your data from one of the regularly occurring decades where the warming slows or stops for a few years, and say that there is no statistically significant warming. If there is warming, pick a new start year that is especially warm and try again to fit a negative gradient. Ignore the fact that the correlation is very weak (r = 0.1) and insignificant. Try the line that since warming is not proven, cooling must be happening. And add an insult as a diversion so no one notices the sleight of hand.

Libel the Experts

Repeatedly point out that some of the experts are actually computer modellers, chemists or physicists, not ‘climate experts’, and make claims that they are in the pay of large governmental and NGO conspiracies. Refer to your own sources as ‘renowned climate experts’, even if they are retired engineers or computer modellers. (‘Renowned’ is the give-away term, as no reputable scientists refer to anyone as renowned.)

Quote Your Own Consensus

Quote a big, long list of scientists who signed up to an online statement supporting your view, but don’t worry if none of them are actually working in a related field of study. As long as they give academic titles and put PhD after their name, they are scientists, right? And don’t call it a consensus, as you have already claimed that consensus is not part of science.

Hide Contrary Views

To force recent posts that challenged your statements off the bottom of the first page, find a contrarian web site and cut and paste large chunks of it into your posts. This has the bonus of not requiring any thought whatsoever on your part. When the offending posts have disappeared, you can repeat what you wrote before, secure in the knowledge that new readers will not see that there are good reasons not to trust what you say.

The Lesson

This was the first time I tried to sustain interest in a blog comments section for a couple of days, and there were over a thousand posts in that time (some commenters seemed to post continually day and night – didn’t their mothers tell them to come up out of the basement and go to bed?)

I tried to direct arguments towards a discussion of evidence, towards an understanding of the statistical limits of certainty, towards the problematical bias of picking an opinion and searching out individuals who support that idea instead of dispassionately assessing opinions and evidence in the round. But it was for naught.

Delingpole told Paul Nurse in the Horizon programme that he didn’t read proper research papers, because peer-to-peer review (clever, huh!) was an improvement on peer-review because it allowed journalists and anyone with an interest to get stuck in.

And he said it with a straight face!

Election Poll News: No Change Really

Looking through last week’s papers (our paper delivery boy made it through all the snow disruption last month, but seems to have forgotten today now the weather’s fine and the sky is blue) I found a story I had missed.

The small front page story started with

“Gordon Brown has insisted Labour could still win the general election outright as another poll showed the Tory lead narrowing.

Research by ICM for the Sunday Telegraph put David Cameron’s party down one point since last month on 39%.”

I know it might indicate something about my personality type, but the story irritated me. Down one point, with a sample size of a measly thousand?

Now, results of course vary from sample to sample in a predictably random way, which puts a limit on the reliability of any judgements made from the data from just one sample. But how much? Trust the data to within 0.1%? Or 10%?

Here’s the maths bit — skip to the next paragraph if it’s not your thing.

If the poll results can be approximated by a Poisson distribution (a reasonable assumption), then the variance of the number of people preferring one party is equal to the mean number choosing that category. For opinion polls we don’t know this mean, but it is close to the reported figure, i.e. 39% of the 1002 people in the sample, or about 390. With this variance, we can be about 95% sure that the number choosing Tory in repeated samples would be 390 (39%) plus or minus twice the square root of the variance (about 40 votes, or 4%).
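
The sums can be reproduced in a few lines; the binomial version, which pollsters usually quote, is included for comparison:

```python
import math

n = 1002      # ICM sample size
share = 0.39  # reported Conservative share

# Poisson approximation: variance of the count equals its mean
count = share * n
poisson_moe = 2 * math.sqrt(count) / n                 # ~4 points

# Usual binomial treatment of a proportion
binomial_moe = 2 * math.sqrt(share * (1 - share) / n)  # ~3 points

print(f"Poisson:  +/-{poisson_moe:.1%}")
print(f"binomial: +/-{binomial_moe:.1%}")
```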

And the result is:

So the real result is “The Conservatives polled between 35% and 43%, which is consistent with no change at all.” OK, not a good headline, but even newspapers have an obligation to at least try to be right. I’m sure papers used to put this sort of information at the foot of the article (where hardly anyone would see it), even if the article writer ignored such a basic check. But to have every outlet from the newspapers to the TV news run a similar story is laziness.

Every newsroom must have someone with enough maths skill to do this right, surely?

Mass Produced Target Grades

Sixth form students will by now have dragged themselves through the January exam series. They can relax until scores are released in March, when most will be judged according to their college ‘target grades’. And it is likely to be a miserable experience for most.

I used to talk to my students and get to know their individual strengths and weaknesses. I would encourage those who I perceived were studying hard, and chide those who were just attending class without the necessary intellectual engagement. Reports to parents and managers were based on my professional opinion of each child. But not any more.

Teachers still get to know each of their charges, but their professional judgements are now routinely tempered by the knowledge that performance against their grade target trumps all other information.

Value Added

Target grades are now the ubiquitous tool of comparative assessment in schools: Key Stage 3 results are used to predict GCSE grades, while GCSE grade averages are used to compute the most likely grade a student might achieve at A Level. This is a good process for working out whether the school is doing a good job, since if a year-group cohort gains a mean score above the mean predicted grade, then the group has learned more than could reasonably have been expected. The school has thus recorded some value added, in the language of education.

Using the same data for individual teachers is only likely to be reliable over a period of several years, since the sample sizes from individual classes are much smaller, leading to more variation from year to year.

Blinded by Numbers

The big problem stems from applying these statistics to individual students. It is very easy to calculate an expected grade from a single child’s previous achievement, but with a sample size of just one, the precision is poor. The reliability stemming from a cohort in the hundreds is lost, and the prediction is routinely in error by a whole grade or so. (See my post Physics Exams Too Easy, Says Ofqual.)

Now, this would not be a problem if these figures were just another piece of the puzzle to be understood by the teacher, but OFSTED, the government overlord of teaching standards, thinks students should know these rough predictions, and be challenged to achieve them. And leaned on if they don’t come up to scratch.

Once upon a time, I got to know my own students, and made judgements as to their individual abilities and potentials, and assessed their effort accordingly. Not perfect, but at least both teacher and student were in the loop.

Forget the Child – Press the Button and Set That Target

Now, each student is given a grade to achieve by the end of a two year course, during which they will mature and develop. If they are very lucky, they will get several target grades which take into account the historical difficulties of each subject they are studying. If not, as is happening more commonly now, they will get a single grade to span the range from Photography and Media to Chemistry and Maths. And to make the target aspirational, a grade will be added to ensure that only a quarter of students will be able to meet their targets, with poor reports and disciplinary procedures for those souls unlucky enough to keep missing impossible targets.

Advantages

  • Simple and cheap to operate.
  • Keeps OFSTED happy.

Disadvantages

  • No educational merit.
  • Can turn keen students into serial target-missers.

An open and shut case for school managers. Shame about the children.

Claims of HIV Vaccine Success are Premature

HIV vaccine
The excitement is palpable — the vaccine that nobody thought would work “appeared to lower the rate of HIV infection by 31.2 percent compared to placebo”, according to the press release, although quoting all three significant figures sets my inner sceptic on edge.

The BBC story improved things marginally, reporting a rounded percentage:

“Scientists announced last month that a combination of vaccines gave a 31% level of protection in trials among 16,000 heterosexuals aged 18-30.
Doubts had been raised about whether the finding was significant.
But new data published at a conference in Paris indicates that, while small scale, the findings are robust and statistically significant.”

Exciting, but while statistically significant sounds very scientific and reliable (the results were published in the New England Journal of Medicine (NEJM) no less), the journalists should have read the report itself. The figures reveal a little sleight of hand.

The study randomised over 16 000 people to either the vaccine program or a placebo, with none of the participants knowing what they had (a double blind trial — the best sort). The randomisation here is key, as the study was rather underpowered and was only likely to produce a marginal result at best, and any deviation from this randomisation may have introduced biases that were hard to spot.

The researchers actually carried out three statistical tests on the data from the trial: intention to treat (ITT), per-protocol and modified intention to treat (mITT) analyses. The first two look at those participants who were enrolled (ITT) or completed the treatments (per-protocol), and so preserve the randomisation of patients. These both failed to show a statistically significant benefit from the vaccine.

The mITT process removed several people from the analysis, breaking the randomisation and, in doing so, producing a statistically significant result. Even then, the benefit was hardly a clear-cut 31.2%: the confidence interval for the benefit (the range in which the researchers were confident the true figure lay) was 1%–52%, with the lower bound staying positive only because of the conventional choice of 95% confidence intervals made by statisticians over the years.
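
The shape of that interval can be reproduced approximately. The infection counts below are assumptions for illustration (numbers of roughly the size reported for the trial), and the standard error formula is the usual normal approximation for a log relative risk:

```python
import math

# Assumed illustrative counts (roughly the published mITT figures):
inf_vax, n_vax = 51, 8197  # infections, participants (vaccine arm)
inf_plc, n_plc = 74, 8198  # infections, participants (placebo arm)

rr = (inf_vax / n_vax) / (inf_plc / n_plc)  # relative risk of infection
efficacy = 1 - rr                           # about 31%

# Approximate 95% CI via the normal approximation on log(RR)
se = math.sqrt(1/inf_vax - 1/n_vax + 1/inf_plc - 1/n_plc)
low = 1 - rr * math.exp(1.96 * se)    # lower bound of efficacy, near 0
high = 1 - rr * math.exp(-1.96 * se)  # upper bound of efficacy, ~52%
print(f"efficacy {efficacy:.0%}, 95% CI {low:.0%} to {high:.0%}")
```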

With the two tests that avoided the possibility of bias ignored by the press (since they didn’t make the press release), and the remaining result showing that the vaccine could have had almost no benefit at all, the publicity seems a little unjustified.