Why learning lessons from PISA is as hard as predicting who will win a football match
by David Spiegelhalter, Winton Professor for the Public Understanding of Risk, University of Cambridge
I am a statistician and I love bucket-loads of data. So I jumped at the chance to make a BBC Radio 4 programme on PISA’s testing of 500,000 school students across 65 countries. But after talking to many people and examining past PISA reports and the new data released on December 3rd 2013, I have been left with fairly serious concerns about its use of statistical evidence.
I should first say that I admire PISA’s organisation and thoroughness, and in particular I like the style of maths questions used in the 2012 tests. These focus on problem-solving and require skills in number, magnitude, length, area, estimation, reading off graphs, percentages and so on. These are concrete tasks rather than more abstract algebraic work, and it would be great if UK 15-year-olds could take such questions in their stride.
Many people we spoke to said PISA was valuable for within-country comparisons, but there was repeated scepticism about comparing countries. Some of this concerns the ability to translate questions, although this may be less of an issue with the maths items. Here are some issues that arose.
Can we be confident about the numbers?
PISA’s complex and rather opaque statistical methods have been criticised by a range of statisticians, who believe they may introduce both bias and extra random variability. Each student answers only around a third of the questions, and ‘plausible values’ are imputed for the missing responses, based on estimates of the questions’ difficulties obtained from a sample of students. The ‘complete’ data (in fact 5 plausible sets) are then analysed using complex survey techniques to produce the country scores and hence the rankings. PISA maintains that the choice of questions is not important, and assumes each question presents the same level of difficulty to every 15-year-old tested – Professor Svend Kreiner of Copenhagen University has published an analysis claiming that questions are not of fixed difficulty, and that the choice of questions can make a huge difference.
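To see why the choice of questions can matter, here is a toy simulation in the spirit of the Rasch model PISA uses (though PISA's actual scaling is far more elaborate). One hypothetical student, whose chance of answering an item correctly depends on the gap between their ability and the item's difficulty, sits many different random 10-item 'booklets'; the raw percent-correct score varies considerably from booklet to booklet. All the numbers here are illustrative assumptions, not PISA's.

```python
import math
import random
import statistics

random.seed(1)

# Toy Rasch model: P(correct) = logistic(ability - difficulty).
def p_correct(ability, difficulty):
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

difficulties = [random.gauss(0, 1.5) for _ in range(30)]  # items vary in difficulty
ability = 0.5                                             # one student's true ability

# The student sits many different random 10-item booklets,
# each scored by raw percent correct.
scores = []
for _ in range(2000):
    booklet = random.sample(difficulties, 10)
    correct = sum(random.random() < p_correct(ability, d) for d in booklet)
    scores.append(100 * correct / 10)

print("mean score:", round(statistics.mean(scores), 1))
print("spread (sd):", round(statistics.stdev(scores), 1))
```

The point of the sketch is simply that when items differ in difficulty and each student sees only a subset, the subset drawn matters – which is why any imputation scheme that treats difficulties as fixed, known quantities is doing a lot of work.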
PISA does present the uncertainty in the scores and ranks – for example the UK rank in the 65 countries is said to be between 23 and 31 – but I believe that the imputation of plausible values, based on an over-simplistic model and assuming the ‘difficulties’ are fixed, known quantities, will underestimate, to an unknown extent, the appropriate uncertainty in the scores and rankings.
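A small simulation shows how sensitive rank intervals are to the assumed standard error of the scores. Sixty-five hypothetical countries have true scores packed closely together; we follow one mid-table country and ask where its observed rank falls 95% of the time, first with a nominal standard error and then with that error doubled, as it might be if imputation uncertainty is understated. All figures are illustrative guesses, not PISA's.

```python
import random

random.seed(2)

# 65 hypothetical countries with closely packed true scores (illustrative numbers).
true_scores = sorted(random.gauss(480, 40) for _ in range(65))
target = 40  # follow one mid-table country

def rank_interval(se, sims=4000):
    """Approximate 95% interval for the target country's observed rank."""
    ranks = []
    for _ in range(sims):
        observed = [s + random.gauss(0, se) for s in true_scores]
        rank = sorted(observed, reverse=True).index(observed[target]) + 1
        ranks.append(rank)
    ranks.sort()
    return ranks[int(0.025 * sims)], ranks[int(0.975 * sims)]

lo1, hi1 = rank_interval(3)   # nominal standard error
lo2, hi2 = rank_interval(6)   # doubled SE, e.g. understated imputation uncertainty
print("nominal SE rank interval:", (lo1, hi1))
print("doubled SE rank interval:", (lo2, hi2))
```

Doubling the standard error visibly widens the rank interval, which is why the claimed 23–31 range for the UK may itself be too narrow if the scores' uncertainty is understated.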
Can we tell why countries differ and change?
PISA tends to draw strong conclusions about what is causing differences and changes in performance, but these are based on retrospective analysis rather than on setting up testable hypotheses. Some conclusions, such as the benefits of better use of resources and increased teacher status, should be uncontroversial. Other findings are valuable, such as identifying that the Asian countries that excel at the maths problem-solving in fact have a rather abstract curriculum: this suggests to me that it’s their primary education that is giving them a strong sense of number and magnitude.
The Figure reproduced below comes from the PISA Summary report: it is clear that countries that did well in 2003 tended to go down (apart from the star Asian contenders), while those that did badly in 2003 tended to go up (the correlation is -0.6). This is exactly the pattern expected when much of the influence on the ranking is due to random variation, and is known as ‘regression-to-the-mean’; it reinforces my feeling that the precision of the estimates is not as great as claimed. When this pattern is observed, one should be very cautious about ascribing reasons for changes. While, with hindsight, any pundit can construct a reason why a football team lost a match, it’s not so easy to say what will make them win the next one.
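Regression-to-the-mean is easy to reproduce in a toy simulation. Give each hypothetical country a stable 'true' score, add independent measurement noise to its 2003 and 2012 results, and the correlation between the 2003 score and the 2003-to-2012 change comes out negative – even though no country has truly changed at all. The spread figures below are illustrative guesses, not PISA's.

```python
import math
import random

random.seed(3)

# Each country has a stable 'true' score; each PISA round adds
# independent measurement noise (illustrative numbers).
true = [random.gauss(490, 25) for _ in range(65)]
s2003 = [t + random.gauss(0, 15) for t in true]
s2012 = [t + random.gauss(0, 15) for t in true]
change = [b - a for a, b in zip(s2003, s2012)]

def corr(x, y):
    """Pearson correlation, computed from scratch."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = corr(s2003, change)
print("correlation of 2003 score with change:", round(r, 2))  # negative
```

The negative correlation arises purely because a country that scored high in 2003 probably had favourable noise that year, which is unlikely to recur – no educational explanation is needed.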
What is PISA measuring anyway?
The additional data from PISA can provide fascinating insights: South Korea is high on attainment but bottom out of 65 countries for the question “I am happy at school”, and star-performers Finland and Estonia are only just above South Korea. Right at the bottom of the league table for “I am satisfied with my school” come Japan, Korea and Macao-China, while for “I enjoy getting good grades”, Japan, Korea and Vietnam are bottom. In contrast the UK fares rather well on these questions.
The crucial issue is that PISA provides performance indicators – no more and no less. It does not measure the functioning or the quality of an education system and, if PISA measures anything, it is the ability to do PISA tests. But the UK health service has shown that aligning policy along a few performance indicators can be damaging – we need to look at the whole picture.
In summary, PISA is a very valuable resource and has a huge amount to offer educational research. But my personal feeling is that PISA is over-confident in its conclusions and there may be some cherry-picking of evidence, particularly of reasons for changes. While international comparisons can inspire fine aspirations, policies should not be imported wholesale without careful testing in the home environment.