4 MAY 2020

An important new dataset emerged a few weeks ago: the Oxford COVID-19 Government Response Tracker, or OxCGRT. Every day, variables on the types of COVID-19 measures governments are introducing, continuing or discontinuing are updated. This important but painstaking work is done by around 100 analysts. This first graph illustrates the values of the overall restrictions stringency indicator developed by the Oxford group. This indicator covers movement restrictions in eight areas: schools, workplaces, public events, gatherings, public transport, home confinement, travel within the country, and international travel. Here I show South Africa's values, and the median daily values across 65 other countries which by 1 May had experienced 51 or more days of the pandemic - by 1 May South Africa's experience came to 51 days. My starting point for each country is the day the fifth Covid-19 case was found. Cases, as opposed to deaths, are in many ways a poor basis for comparing countries, because different countries miss different proportions of infected people, as testing systems vary so enormously. However, as an indicator of when Covid-19 obtained a foothold in a country, the day of the fifth case seems like a reasonable standard to use. What does the graph tell us? Clearly, it confirms the claim one often hears that South Africa has been particularly quick and stringent when it comes to imposing movement restrictions.
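The alignment described above (start each country on the day of its fifth case, then take the median across countries for each day) can be sketched roughly as follows. This is only an illustration in Python, not the code behind the graph; the function names and the tiny made-up series are assumptions.

```python
from statistics import median

def day_of_fifth_case(cumulative_cases):
    """Index of the first day with at least five cumulative cases, or None."""
    for day, cases in enumerate(cumulative_cases):
        if cases >= 5:
            return day
    return None

def align_on_fifth_case(cumulative_cases, stringency):
    """Keep stringency values from the day of the fifth case onwards."""
    start = day_of_fifth_case(cumulative_cases)
    return [] if start is None else stringency[start:]

def median_by_day(aligned_series, min_days):
    """Median per day across countries with at least min_days of data."""
    kept = [s for s in aligned_series if len(s) >= min_days]
    return [median(s[d] for s in kept) for d in range(min_days)]

# Made-up example: three hypothetical countries, one with too few days.
a = align_on_fifth_case([1, 3, 5, 9, 20], [0, 10, 20, 40, 60])
b = align_on_fifth_case([0, 6, 8, 30], [5, 15, 50, 70])
c = align_on_fifth_case([2, 4, 4, 7], [0, 0, 30, 80])
medians = median_by_day([a, b, c], min_days=3)
print(medians)
```

Country c is dropped from the median because it has fewer than three days since its fifth case, which mirrors the 51-day cut-off used for the comparison group.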

The OxCGRT dataset can be downloaded at link here.

A couple of weeks back I got a Stellenbosch Working Paper out, where I present an initial analysis of the data. Of course, so many things are changing that if I redid the analysis now, some things may look a bit different. Since I wrote the paper, more variables have been added to the OxCGRT data, and the formula for the overall stringency indicator has been improved. My paper is at link here. I'll definitely be doing more analysis of this kind. Comparing ourselves internationally is one way of grounding in evidence the debates about the awful trade-offs between reducing infections and saving livelihoods.

This second graph is similar to graphs I've posted previously. The OxCGRT dataset includes WHO values on cases and deaths per country. 'Other country' here refers to the other 65 countries.

10 APRIL 2020


This first graph is one that is often generated, but here there's a special focus on where South Africa fits in. Of the 90 countries, our situation is not that unusual. The data are from the ECDC: link here. The Stata code for automating the graphs is at link here.

This is a graph which I've not seen anywhere else. Deaths relative to total population is sometimes reported, yet COVID-19 deaths relative to how many people generally die in the country each year is arguably more informative, as this takes into account age distribution, but also reflects the country's and the health system's general capacity to deal with death. Here only 88 countries are covered, due to some missing values in the denominator (deaths per year in a recent year). The denominator data I used can be found here: link here.

Like the previous graph, but here South Africa is not highlighted. What is interesting when one looks at COVID-19 deaths relative to deaths in general is that smaller countries which have received limited coverage in the news, such as Belgium (BEL) and Netherlands (NLD), come to the fore. The highest value, that of Spain (ESP), translates into 3% of that country's annual deaths. China's values are exceptionally low, because infections were largely limited to just one part of that country's enormous population.

This graph is not easy to interpret, but it seems interesting. Sometimes one sees deaths divided by cases to produce a COVID-19 mortality rate. But this has been strongly criticised, as so many cases go undetected. Here the reverse happens: cases are divided by deaths. I've cautiously assumed that in general a higher ratio reflects a better willingness and capacity to test. I could have used lagged case values, as death would generally occur many days after detection of the virus, but I didn't want to complicate the graph too much. The graph suggests South Africa has fared relatively well in terms of detecting cases, and by implication testing. What the ECDC data do not include is the total number of tests (many of which are negative, and hence not 'cases').

15 MARCH 2020

Fascinating article, and good use of the available data. link here. Even if it's only roughly correct that bringing social distancing forward by one day equals a 40% reduction in cases over the longer term, that's still amazing.

I had a look at daily data on the pandemic (link here), and then pulled in some basic pre-COVID19 mortality stats. I'm sure a graph such as this one has already been produced, but I couldn't find it. This graph focusses on deaths, not infections. Data on deaths can be considered far more reliable than data on infections, as implied by Pueyo's analysis. The horizontal axis is days since the country's first death. Among 48 countries with deaths in the data (up to 18 March), the median lag between the first observed case of infection and the first death is 62 days. If South Africa were to follow that pattern, the first death would occur only at the start of May. The vertical axis in the graph is cumulative deaths divided by how many people generally die per year in the country. In China, the 3,242 deaths in the data come to just 0.03% of China's approximately 9.9 million deaths in a typical recent year. And in China the curve has been flattening out. South Korea's curve, while it reaches much further up than China's, is not that much steeper than China's. This reflects a relatively good capacity to slow down the spread and save lives through good hospitalisation. Europe's curves, on the other hand, are much steeper. Italy's has already reached 0.4% of a typical year's deaths, and that's after just 25 days since the first death. Spain's curve has risen the fastest of all. All this is depressing, and the question is what our South African curve will look like a few months from now. We need to learn as much as we can from other countries, while realising that we face unique challenges which are unlike those of the countries in the graph. Taking action early to delay or limit the spread seems really important. (The seven countries in the graph are the ones that had reached a level of at least 0.025% on the vertical by yesterday.)
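The vertical axis calculation can be checked with a few lines of Python. This is a minimal sketch using only the two China figures quoted above; the function name is an assumption.

```python
def deaths_as_share_of_annual(covid_deaths, annual_deaths):
    """Cumulative COVID-19 deaths as a percentage of typical annual deaths."""
    return 100.0 * covid_deaths / annual_deaths

# Figures quoted in the post for China.
china_share = deaths_as_share_of_annual(3242, 9_900_000)
print(f"{china_share:.3f}%")
```

A country would enter the graph once this value reaches the 0.025% threshold mentioned at the end of the post.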

Here's a magnified version of the same graph.

Correction on my Comment 1. South Korea's curve clearly doesn't reach up beyond China's, at least not yet, but it is correct that in terms of steepness, it looks more like China's than that of, say, Italy.

The following graph is drawn from data which is strategically important right now. Epidemiologists require data from contact surveys when they model how quickly viruses spread. In these surveys, researchers collect information from people on the number and types of contacts they have in a day. The graph represents data from the South African survey being used by institutions such as Imperial College London to produce COVID-19 projections for several countries (google 'The global impact of COVID-19 and strategies for mitigation and suppression'). The survey seems to be the only South African contacts survey which helps in these types of projections. The microdata from the survey do not seem to be publicly available. I used a table from the relevant 2011 research article by Johnstone-Robertson et al to produce the graph (link here). The contacts reflected in the graph seem to refer to physical contacts and people spoken with. From an education planning perspective, what stands out is that around two-thirds of the contacts of people aged 5 to 19 are linked to schooling, either what happens in the actual school, or commuting to and from school. What is also interesting is that contacts in the school for youths aged 15 to 19 are about 50% higher than those for younger children. This is probably linked to subject teaching, whereby at the secondary level students move from classroom to classroom, regrouping as they move, depending on their subject choices.

19 JANUARY 2020

The blue curve in the following graph uses the PVGIS data source. This curve indicates how much solar radiation one could expect, in terms of Watts per square metre, from a given set of solar panels on a typical day in June in Pretoria. Basically you tell the PVGIS website where in the world you are, what kind of solar panels you have, and then you obtain a dataset, with one observation per 15-minute interval, for the last ten years, on how much incoming solar radiation there has been. Very cool. How do they build this dataset? They use satellite data to derive estimates of how much solar radiation has been hitting the Earth's surface at different points in time. Largely this is about combining satellite data on cloud coverage with additional information on things like the angle of the sun. You could even use the data to find out how overcast it was on a particular date, and at a particular time, some years ago.

The PVGIS data source: link here.

Some politics... PVGIS is a European Union project. One of the drawbacks of Brexit, according to the journal Nature, is that research of this kind will become more difficult after the United Kingdom has withdrawn from the EU. See link here.

The graph in the initial post is from an Excel file downloadable at link here. Feel free to use it and distribute it. It could be useful both for a teacher wanting to use the PVGIS data to teach data use (and geometry and angles!), and for someone (like myself) planning to buy solar panels for the home.

These are five different solar radiation trends drawing from the PVGIS data. The best (highest) is the one you obtain by investing in two-axis mountings. That means you have panels which are able to move in any direction, throughout the day, to follow the sun. However, this is very costly for most homes and businesses. That's a pity, because it means most people have to settle for something that provides 25% less power. That is fixed axis panels. The two fixed axis curves are very similar. The first uses angles PVGIS has decided would be optimal for Pretoria - panels which are tilted 30 degrees up, and minus 172 degrees from South, meaning around 8 degrees to the right (East) of North. The second uses the actual angles of my roof - I'm very fortunate that my roof is very close to the optimum. Clearly August or September is the best time of the year for generating solar power in Pretoria. Just before the summer rains, but with days which have become relatively long after the winter.

Here are the optimised fixed axis patterns by month for the country's three largest cities (so comparable to the yellow curve in the Pretoria graph). Johannesburg gets the most sunshine of the three (and it's obviously a lot like Pretoria). What's also good about Johannesburg is that the radiation received is evenly distributed across the year. Durban gets a bit less due to more cloud cover. Cape Town, which receives less radiation in total than Johannesburg, but more than Durban, has a very uneven distribution across the year. This is due to clear summers and cloudy winters. I picked the suburb of Goodwood, which is not too close to Table Mountain. Table Mountain would influence incoming radiation in areas close to the mountain, both because of the mountain itself, and because of clouds concentrated around the mountain.

Interestingly, in all three cities the PVGIS data indicate an upward trend in radiation, using the assumption of optimised fixed axis panels. I suppose this might be linked to climate change. Less cloud cover, so more radiation. If so, climate change is making it a bit easier to reduce reliance on fossil fuels (electricity off the grid), which is causing climate change in the first place.

14 JANUARY 2020

This post corrects something the Sunday Times said, which misquotes what I provided to the paper. But it's also illustrative of the widespread confusion around concepts such as the dropout rate. The following is the end of a 13 January Sunday Times article headed 'Signs system is improving'. The highlighted part is incorrect.

The relevant bit of what I provided Sunday Times with, via e-mail, is the following: 'South Africans are understandably concerned about youths who do not successfully complete twelve years of education, which in South Africa largely translates into obtaining the Matric. According to Stats SA household data, 12% of young South Africans do not successfully complete Grade 9, the figures for Grade 10 being 16% and Grade 11, 27%. Around 70% of youths get to sit in a Grade 12 class. About 45% of youths do not obtain the Matric, meaning 55% do. If one takes into account colleges, then around 58% of youths complete the Matric or something equivalent to it. The reason why colleges do not make more of a difference is that most college students already have a Matric. Statistics that are clearly out of line with these household-based figures are likely to be wrong.'

Note the key difference. I'm saying, for instance, '27% of young South Africans do not successfully complete Grade 11', while the Sunday Times is essentially saying (incorrectly quoting me) that there is a dropout of 27% in Grade 11. Those two statements are very different.

Here's a graph which explains how I get to my statistics, and hopefully clarifies what they mean. I'm using just one source: the 2018 General Household Survey (GHS) microdata of Stats SA. This type of household data is BY FAR the simplest and most reliable data source for understanding these kinds of things at a national level. In future, I hope, we will have publicly available administrative data from the schooling system, at the student level, with anonymisation of individuals. That would be an incredibly powerful source for getting even more accurate statistics, and above all statistics at a provincial, district and municipal level. The GHS sample is fine for national analysis, and often okay for province-level analysis, especially if you use data from several years. But forget anything below the level of province. The sample is too small. It's this graph which informs my 12% not successfully completing Grade 9, 16% not successfully completing Grade 10, 27% not successfully completing Grade 11, and 45% not successfully completing Grade 12. Basically, that's the gap between the top of the graph (100%) and the best (highest) point in each curve. If you look closely, you may wonder, for instance, why I say 12% and not 9% for Grade 9. This is due to other household data I've also looked at, but my figures correspond pretty closely to this graph.

The statistics I gave to Sunday Times are NOT dropout rates. Dropout rates refer to movements BETWEEN JUST TWO YEARS, not the cumulative notion as in, for instance, 'the percentage of youths who successfully complete Grade 11'. The latter is cumulative in the sense that those who did not complete Grade 11 may have fallen out of the system in Grade 10, they may have fallen out in Grade 9, and so on. So what are the correct dropout rates per grade for South Africa? you may ask. The answer is that THERE ARE NO PROPERLY CALCULATED DROPOUT RATES FOR SOUTH AFRICA. Anywhere. The data requirements for a proper calculation are really demanding - see the UNESCO specs at link here. The problem for us in South Africa, and for most developing countries, is largely that we do not have excellent administrative data on HOW MANY LEARNERS IN ANY YEAR ARE REPEATERS. Without this, one cannot do the calculation properly. Okay, provincial and national education departments now have huge databases with individual learner records, but there are problems with these data, for instance inconsistencies over time in terms of the learner ID, which is important for examining repetition. Moreover, capacity to use the data is limited, and imperfect data require even better capacity, so one can impute, and so on. It's not as if no analysis occurred. See for instance link here. But the bottom line is that we are still several steps away from having truly accurate dropout rates per grade going down to, say, the level of education district.
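To make the cumulative point concrete, here is a rough Python sketch that takes the cumulative non-completion figures quoted earlier (12%, 16%, 27%, 45%) and works out what share of those who completed one grade did not go on to complete the next. This is NOT a dropout rate in the UNESCO sense, as it ignores repetition entirely, and the function name is an assumption. It only shows why a cumulative 27% cannot be read as something that happens within Grade 11 alone.

```python
# Cumulative percentages of youths NOT successfully completing each grade,
# as quoted in the post.
not_completed = {9: 12.0, 10: 16.0, 11: 27.0, 12: 45.0}

def conditional_non_completion(cumulative):
    """Share of those completing the previous grade who do not complete this one.

    Illustration only: not a UNESCO-style dropout rate (no repeater data).
    """
    grades = sorted(cumulative)
    result = {}
    for prev, cur in zip(grades, grades[1:]):
        completed_prev = 100.0 - cumulative[prev]
        lost_between = cumulative[cur] - cumulative[prev]
        result[cur] = 100.0 * lost_between / completed_prev
    return result

conditional = conditional_non_completion(not_completed)
for grade, pct in conditional.items():
    print(f"Grade {grade}: {pct:.1f}% of those completing Grade {grade - 1}")
```

Of the 27 percentage points not completing Grade 11, 16 had already been lost by Grade 10, so the between-grade figure is far smaller than 27%.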

There are ways of getting around the data problems and arriving at rough estimates of the dropout rate, which MUST BE SPECIFIC TO A GRADE. The latter is important. One should avoid talking about something like the 'overall dropout rate at the secondary level'. That could lead to confusion, especially if your method is not properly explained. The UNESCO definition applies to GRADE-SPECIFIC rates, and there is no accepted methodology for aggregating across grades. Below is a table from a 2011 report of mine which uses 2007-2008 NIDS household data to arrive at national estimates of the dropout rate - see the second-last column (report available at link here). Given that in general grade attainment has improved since then, one can be pretty sure that these dropout rates would be considerably lower now. I've seen unpublished figures of 9% and 11% for grades 10 and 11 for 2013-2014, based on a preliminary analysis of the DBE's national learner database (LURITS). Clearly, an 11% dropout rate in Grade 11 is MUCH lower than the 27% figure the Sunday Times has attributed to me.

30 DECEMBER 2019

It's the time of the year when Matric and school attainment reach the news headlines. In the coming weeks, there will be a lot of debate around what the schooling system should do better. One of many calls will be for the system to get more youths to successfully complete the Matric. It's a legitimate call, and it's one the National Development Plan makes. However, as before, I'd argue one needs to take into account a key fact: we are roughly on a par with other middle income countries when it comes to successful completion of upper secondary schooling. Moreover, we have been improving against this indicator, at a rate which seems satisfactory. Put differently, we are already on the right path as far as THIS indicator is concerned. Should the questions then not be the following?: (1) Can we expect continued progress against the upper secondary attainment indicator with existing driving factors (and what are those factors)? (2) Are there other indicators we should worry more about? The following graph uses UIS.Stat values for the indicator 'Completion rate for upper secondary education (household survey data)'. UIS.Stat says the following about the ages used: 'The percentage of a cohort of children or young people aged 3-5 years above the intended age for the last grade of each level of education who have completed that grade.' So in the case of South Africa, this would probably be youths aged 21 to 23 who have successfully completed Grade 12. For this graph, I used the most recent indicator value per country from the period 2013 to 2018. The graph illustrates that there is a fairly systematic relationship between average income per capita for a country and successful completion of upper secondary education. The trendline is logarithmic, and the R squared of .67 reflects a relatively tight fit. The trendline suggests South Africa (ZAF), with an indicator value of 48%, is about 10 percentage points below where it should be. We should sit at around 58%. 
But even if we were there, the current complaint that we are not getting enough youths to complete twelve years of education is likely to persist. We want every young person to obtain a Matric, or something equivalent. In fact, some countries of our level of development are not too far from 100%. Peru sits at 83%.

The next graph uses Stats SA General Household Survey (GHS) microdata and suggests we are closer to the 'norm' of 58% than the previous graph suggests. The 48% for South Africa seen in the previous graph is for 2016 and, I presume, ages 21 to 23. If we look at 2018 in the next graph, and extend the age range to 24, we get 56%. Extending the age range basically means taking into account that many youth obtain the Matric fairly late. This graph uses a moving average spanning three years, so the 2018 age 24 value is actually the average across ages 23 to 25. I did that to prevent 'outliers', strange peaks and troughs resulting from the sample-based nature of the data. Of course, the Stats SA graph also shows us that successful completion of Grade 12 has been continually rising, by around one percentage point a year. So perhaps we're not so far from the norm currently (i.e. 58%), and the trend is an upward one.
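The smoothing described above, where the age 24 value is the average across ages 23 to 25, is a centred moving average. A minimal Python sketch follows; the completion percentages are made up for illustration and are not GHS values.

```python
def centred_moving_average(values_by_age, age, window=3):
    """Average of the values for the ages centred on `age` (window must be odd)."""
    half = window // 2
    ages = range(age - half, age + half + 1)
    return sum(values_by_age[a] for a in ages) / window

# Hypothetical completion percentages by single year of age.
completion_by_age = {23: 54.0, 24: 58.0, 25: 56.0}
smoothed_24 = centred_moving_average(completion_by_age, 24)
print(smoothed_24)
```

The unsmoothed age 24 value of 58 here is pulled down to 56 by its neighbours, which is the kind of sample-driven peak the smoothing is meant to dampen.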

The next graph uses yet another data source: the IPUMS-International online data repository, easily the largest source of normalised household data from around the world. That source doesn't have all the household microdata that UIS.Stat makes use of, but it's interesting nonetheless. I focussed above all on non-rich G20 countries which had data from 2000 or a later year. I obviously used the most recent data per country. For South Africa, I used our 2018 GHS data. A big problem is that the graph compares different years, for instance South Africa 2018 to China 2000. In fact, China has improved and recently gets 61% of youths to complete upper secondary (see the first graph). But the next graph is still useful. Someone in China aged 40 in 2018 (so 22 in 2000) is far less likely to have a 'Matric' than someone aged 40 in South Africa in 2018, it seems. The stats would be 28% for China and around 50% for South Africa. What this kind of graph also permits is a measure of how quickly countries are improving. The Indonesia 2010 curve, a fairly steep one, moves from 35% at age 36 to 52% at age 21. That points to an improvement of 1.1 percentage points a year, around how quickly we are improving in South Africa.
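The Indonesian rate of improvement is simply the change in completion across the two age cohorts divided by the number of years separating them. A small Python check, with an assumed function name:

```python
def improvement_per_year(older_age, older_value, younger_age, younger_value):
    """Percentage points gained per year, comparing an older and a younger cohort."""
    return (younger_value - older_value) / (older_age - younger_age)

# Indonesia 2010 curve: 35% at age 36, 52% at age 21.
indonesia_rate = improvement_per_year(36, 35.0, 21, 52.0)
print(round(indonesia_rate, 1))
```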

This next graph focusses on something completely different: the percentage of children in around Grade 3 who attain a grade-appropriate 'minimum proficiency level' in reading. This is a UN Sustainable Development Goal (SDG) indicator, and many people have been working on improving definitions and the data for measuring this. The graph illustrates what is now widely known in South Africa: we are VERY far behind when it comes to the quality of schooling. What is less known is that here too, we have been improving about as fast as any country can. Here an informed reader would wonder what the hell I'm talking about, as the official PIRLS trend says no improvement between 2011 and 2016. This is not correct, and I'll be publishing shortly a working paper which explains why, and which explains that we in fact saw among the steepest improvements of all PIRLS countries. The IEA, the international body which runs PIRLS, has in the last month agreed that the flat trend published in 2017 is not reliable. The inaccuracy lies not with the 2016 values (which point to the widely-quoted 78% of children not reaching a minimum level), but with the earlier 2011 value. In short, the graph confirms what many say, namely that we need to focus particularly strongly on the basics of literacy. Without that, we are unlikely to see continued and meaningful improvements in the percentage of youths completing upper secondary schooling, or getting the Matric.

Some points about the source of the data in the previous graph: I've used Nadir Altinok's valuable work as a basis for arriving at statistics for the 188 countries appearing in the graph. Altinok's dataset is available on the UIS website: best is to google the accompanying document, 'GAML Brief 6: The Anchoring Method for Indicator 4.1.1 Reporting' - the URL keeps on changing. What I did is I cleaned up the Altinok data as there were clear errors, for instance relating to South Africa - Altinok seems to have confused the PIRLS and prePIRLS scales. My full report on this is not released by UIS yet, but there is a five-page summary at link here

I've not got even close to discussing properly the questions I've posed at the start of this post. To some extent those questions are answered in various places, but the debate remains, for me, a fairly under-developed one. We need to be looking at how other countries design policies to cater for the needs of youths who do not complete upper secondary schooling. What role do qualifications below Grade 12 play? How are academic and non-academic streams and institutions complementing each other? The South African household data clearly do not support the common notion that youths without Matric get no jobs at all - see for instance my slides at link here. And the graphs in this post suggest that at current rates of improvement, we'll be at Peru's 83% upper secondary completion in around 27 years time, so in 2046. We need a more comprehensive approach to assisting youths in the transition from school to post-school life.
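The 2046 figure follows from a simple linear projection. The sketch below assumes a starting value of 56% (the 2018 GHS figure for age 24 discussed earlier), a gain of one percentage point a year, and counting from 2019; these inputs are a reading of the post rather than values stated alongside the projection, and the function name is hypothetical.

```python
import math

def year_target_reached(current_value, target_value, points_per_year, current_year):
    """Years needed, and the calendar year, to reach a target at a constant rate."""
    years = math.ceil((target_value - current_value) / points_per_year)
    return years, current_year + years

# Assumed inputs: 56% now, Peru's 83% as target, 1 point a year, from 2019.
years_needed, target_year = year_target_reached(56.0, 83.0, 1.0, 2019)
print(years_needed, target_year)
```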

16 NOVEMBER 2019

This appeared in the Finance Minister's October 2019 MTBPS speech. The 66% struck me as high.

I've worked a lot with the average educator pay, and publicly employed educators account for around 31% of the national and provincial government compensation of employees category (based for instance on Treasury's latest EPRE files and the Budget Review). I get an average real (inflation-adjusted) increase per educator between 2006 and 2018 of about 40%. There are many ways of getting this figure. For instance, one can focus on just the policy on increases over time. Or one can look at actual increases as seen in the payroll data. And then of course one must adjust for inflation (CPI), using Stats SA figures. Whether one uses policy or the payroll data, one gets to around 40%. Details on methods and data can be found in the following report recently made public by the Department of Basic Education, via their new 'Research Repository', available at link here.

So what's happening? Have educators been receiving smaller increases than other public employees? Would this account for the large difference between the 66% and 40% values? Or is the 66% value incorrect? I suspected the latter. Having checked the full MTBPS report (available on the Treasury website) in more detail, I'm now pretty certain the correct figure is not 66%. Below is the graph from the MTBPS report which is used to arrive at the 66%. Below that, I've pasted a piece of Excel work I did where I entered the values from the brown bars, and incremented over time. I get an overall increase of 44%, not 66%. A figure of 44% seems more believable. Even with the fact that notches have been 1.0% apart for educators, against 1.5% for other public servants (up until the 2018 wage agreement eliminated this discrepancy, in favour of educators), the earnings increase gap could not have been as large as what is suggested by the 66%.

The Excel screenshot.
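Incrementing annual increases over time means compounding them, not adding them up. The Python sketch below shows the mechanics only. The annual percentages are hypothetical placeholders, NOT the values from the MTBPS graph, and the function name is an assumption.

```python
def compounded_increase(annual_increases_pct):
    """Overall percentage increase after applying each annual increase in turn."""
    factor = 1.0
    for pct in annual_increases_pct:
        factor *= 1.0 + pct / 100.0
    return (factor - 1.0) * 100.0

# Hypothetical: a real increase of 3% in each of twelve years.
hypothetical_years = [3.0] * 12
overall = compounded_increase(hypothetical_years)
print(f"{overall:.1f}% compounded versus {sum(hypothetical_years):.0f}% if simply added")
```

With real bar values entered in place of the placeholders, the same function would reproduce the kind of overall figure obtained in the Excel work.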

This 66% value is not unimportant. The extent to which increases in the pay of public servants are behind the current fiscal crisis is a vital and emotive issue.

22 MAY 2019

One hears 'Fourth Industrial Revolution' a lot nowadays, at least in South Africa. In April 2018, President Ramaphosa set up a Fourth Industrial Revolution Commission. According to Wikipedia, the term originates from 'Industrie 4.0', used by German government planners some time before 2011 to describe changes in manufacturing technologies arising out of innovations such as artificial intelligence. 'Fourth Industrial Revolution' as a term only took off after Klaus Schwab, head of the World Economic Forum since 1971, published a book in 2016 with the title 'Fourth Industrial Revolution'. I've come across at least two areas of controversy around this concept, and many feel quite passionate about the issue. First, the more academic question is whether technology changes occurring now are really so abrupt that one can speak of a new revolution, of a FOURTH revolution, as opposed to just an evolution of the THIRD. (Some background: the First Industrial Revolution was largely about steam engines, the Second about phones and electricity, and the Third about computers.) Secondly, how can one use this concept to advance development and reduce inequalities in South Africa? What are the right and wrong ways of going about this? How does one ensure that this is really developmental, and not merely something for technophiles to 'play with' (technophiles and 'playing' can bring about development, but not necessarily)? How do we ensure a development agenda is what drives government action, and not commercial interests? The graph below comes out of some Google searches. Each bar is the number of occurrences of the term 'Fourth Industrial Revolution', or some equivalent, in the language of each country, per 1000 occurrences of the word 'skills', in English or some other language. I needed a denominator ('skills') as some governments simply make more things public through the internet.
So a search in Google might be as follows: "Industry 4.0" site: That would find the number of times 'Industry 4.0' appeared on Kenyan government websites. That turned out to be 524. I then also looked for '4th Industrial Revolution' and 'Fourth Industrial Revolution' and came to 593 - clearly Kenya uses the term 'Industry 4.0' most. And if you divide that by 30,000 instances of the word 'skills' (Google uses rounding), you get the 19.8 for Kenya seen in the graph. In South Africa the two values are 7,801 and 438,000, giving a ratio of 17.8. I selected countries we'd probably want to compare ourselves with. Clearly, South Africa emerges as a frequent user of the term, but not as the greatest. What is interesting, though, is that the term has by no means been universally adopted by countries which invest a lot in technological innovation. The USA, France and India barely use the term. So for those saying we in South Africa have jumped on some strange bandwagon, the graph says that's not the case. China, Germany and Malaysia, for instance, seem to have embraced the term to an even greater degree. And for those saying you can't talk about technological progress without using the term, the graph says no, a lot of countries do quite fine without it. If you meet a Mexican and want to talk about the Fourth Industrial Revolution, you may get a blank stare.
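The values in the graph are occurrences of the term per 1000 occurrences of 'skills'. A short Python check of the two worked figures, using the hit counts quoted above (the function name is an assumption):

```python
def per_thousand(term_hits, denominator_hits):
    """Occurrences of a term per 1000 occurrences of the denominator word."""
    return 1000.0 * term_hits / denominator_hits

kenya = per_thousand(593, 30_000)
south_africa = per_thousand(7_801, 438_000)
print(round(kenya, 1), round(south_africa, 1))
```

Because Google rounds large hit counts, the denominators are approximate, so the ratios are best read as rough orders of magnitude.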

This graph shows there is something unusual about South Africa. Where other countries have embraced 'Industry 4.0', we've gone for 'Fourth Industrial Revolution'. Again, probably worth taking into account if you get to talk to non-South Africans.

While the WEF uses 'Fourth Industrial Revolution' a hell of a lot, the World Bank and UNESCO hardly do, though obviously the latter two deal with issues of technological change. The fact that the WEF value exceeds 1000 means they use 'Fourth Industrial Revolution' more than the common word 'skills'. In fact, they're three times as likely to use the former as the latter. That is a hell of a lot. The World Bank and UNESCO, on the other hand, are on a par with Brazil or Australia, both lukewarm embracers of the term. There are interesting strategy differences, bordering on ideological differences, between the WEF and the other two global organisations, not only with respect to technology change, but also education policy - to find a paper I did on this, google "Standardised testing and examinations at the primary level". Interestingly, 'Industry 4.0' comes to just one-third of the occurrences of 'Fourth Industrial Revolution' or '4th Industrial Revolution' in the WEF, which makes their language look a lot like the unusual South African practice seen in the previous graph. If I were to speculate, perhaps the Kenyans were influenced by official German development aid people rather directly, while we got into this via participation in the WEF.

For a while I thought the uptake of the term in South Africa was concentrated in the education sector. The next graph indicates this is obviously not the case. The Department of Trade and Industry and the Department of Labour clearly come out on top.

My Excel file with the background values, and the terms I searched for in the different languages, is available at link here.

22 APRIL 2019

Holidays means time to look at data which are not directly work-related. This post is not about data ON education, but data FOR education. The map below draws from one of the world's most important datasets, yet one which is small and simple enough for teachers to use with students, at least at the upper secondary level. The dataset is that of the Global Historical Climatology Network (GHCN), based at the Arizona State University. This dataset is used extensively by scientists around the world for climate modelling. The data are fairly accessible, but the typical hoops apply: it's a bit difficult to work out which files one needs, and how different files do or do not link up with each other is something one in part has to discover on one's own. I've uploaded a simplified version of the dataset, in Excel, to facilitate easier access (zipped file of 2.3 MB, URL below). The map below (done using Stata 14) illustrates 1,744 weather stations which had enough data for this analysis. I've coloured the points according to the temperature change over the 50 years 1965 to 2015. Basically, I calculated the trend for each of the twelve months, then found the weighted average across the twelve, using the number of non-missing values as my weight.
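
The weighting step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the trend per month is taken to be an ordinary least-squares slope, the data layout (a dictionary of month to year-temperature pairs) is hypothetical, and the numbers are made up rather than taken from the GHCN file.

```python
def ols_slope(points):
    """Least-squares slope of temperature on year for (year, temp) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def station_trend(monthly):
    """Average the monthly slopes, weighting each by its non-missing count."""
    weighted_sum = 0.0
    total_weight = 0
    for series in monthly.values():
        points = [(year, temp) for year, temp in series if temp is not None]
        if len(points) < 2:
            continue  # no trend can be fitted to fewer than two values
        weighted_sum += len(points) * ols_slope(points)
        total_weight += len(points)
    return weighted_sum / total_weight


# Made-up example with two months only; None marks a missing value.
example = {
    1: [(2000, 20.0), (2001, 20.1), (2002, 20.2), (2003, 20.3)],
    7: [(2000, 10.0), (2001, None), (2002, 10.4)],
}
trend = station_trend(example)  # degrees per year
```

Months with more complete records pull the station average towards their own slope, which is the point of using the non-missing counts as weights.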

The South African school curriculum, like that of many countries, now incorporates climate change and environmental issues in general into several subjects, as seen here. Grade 11 is, in many ways, the year when South African students really get to grapple with these topics, so perhaps that is a good grade for using the GHCN data. In addition, the subject mathematics requires students to understand data. Introducing students to environmental data is, for me, an activity that fits nicely into the Fourth Industrial Revolution, a concept which is now popping up everywhere in the South African policy debates. To cope with the future, students need a solid understanding of environmental change, they need to learn to question, for instance by examining raw data themselves as opposed to just accepting the analysis of others, and we all need scientifically-minded (but also politically savvy) youths to push for the economic and industrial change we need for sustaining our species.

Back to the GHCN data, this map provides a sense of the density of weather stations in each country, counting just the 1,744 with fifty years of data. In a way, this map also illustrates capacity for and commitment to science over a longer period. China stands out. It also seems like a few Francophone African countries have done particularly well when it comes to gauging temperature change.

There are lots of maps which use the GHCN data, but I don't recall seeing one like this. Here I've used the average across the stations in each country. Of course this is more playing with the data than serious analysis. Many countries have just one weather station. But 'playing' with data is important and is the first step towards deeper analysis. The patterns are fairly clear: countries in the north of Eurasia are experiencing particularly fast temperature changes. But there are also regions in the tropics with large increases: Arabian Peninsula, North Africa, South East Asia.

Among the South African weather stations, Cape Town offers the longest time series of data. Here the January-February annual average for around 160 years is shown. The curve shows a 15-year moving average. The pattern of fairly steep rises after 1960, and especially 1990, is a pattern seen all over the world. I had to trawl through a number of reports and articles before I discovered exactly how the monthly average temperatures in the GHCN data are calculated. At most stations, a daily average is found by simply averaging the daily minimum and maximum. Measurements are taken two metres above the ground. Then a monthly average is calculated using the daily averages.
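
The calculation chain described above can be sketched as follows. The function names are mine, and I assume a centred moving average here; a trailing window is equally possible, and the post does not say which was used for the curve.

```python
def daily_mean(t_min, t_max):
    """Daily average as the midpoint of the daily minimum and maximum."""
    return (t_min + t_max) / 2


def monthly_mean(daily_min_max):
    """Monthly average of the daily averages, from (min, max) pairs."""
    return sum(daily_mean(lo, hi) for lo, hi in daily_min_max) / len(daily_min_max)


def centred_moving_average(values, window=15):
    """Centred moving average; assumes an odd window and no missing values."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


# Made-up illustration: two days of readings, then a 20-year toy series.
two_day_month = monthly_mean([(10, 20), (12, 24)])
smoothed = centred_moving_average(list(range(20)), window=15)
```

With a centred 15-year window, the first and last seven years of a series have no smoothed value.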

Here's the winter graph for Cape Town.

The largest and most consistent temperature rise seen at a South African weather station is that of Marion Island, an uninhabited South African island 1,800 km south-east of Port Elizabeth.

Finally, here's the linear trend for six South African urban centres, using January-February figures. The steepest upward slope is that of Kimberley, which has overtaken Durban. The smallest increase is that for Upington, which is good for Upingtonians given how hot this town generally is.

28 JANUARY 2019

The following is a graph from the last DBE report on the outcomes of the 2018 Grade 12 examinations (disclosure: I contributed this graph, and a few other things, to the report). The graph shows what no-one disputes: more and more youths are obtaining the National Senior Certificate (NSC), and qualifying for Bachelors-level studies at a university. Jonathan Jansen, and he's not the only one, looks at this and wonders: Does this upward trend really indicate that the schooling system is improving? Jansen, in a January article titled 'Annual matric results - a sleight of hand', says the following: 'As any beginning statistics student will tell you, no system that is chronically dysfunctional for 80% of its schools makes that kind of leap without dramatic and observable system-wide improvements in teaching and learning.' There are of course widely known observable system-wide indicators of improvements, and I don't know why Jansen doesn't seem to recognise them. Does he not trust the TIMSS results, for instance? If so, why not? The terrain of international assessments can be confusing and sometimes first impressions can deceive. What I want to do with this post is to reiterate the argument that there have been improvements in the quality of learning in South Africa's schools. But I want to do this in a fresh way, in part by asking questions which should be asked about PIRLS. But some background: Why is it important to understand whether learning outcomes are improving? It is important because if there is improvement, it is likely to be fairly invisible. Learning outcomes have always improved rather slowly, even in the most ambitious countries. If superficial observation suggests there has been no improvement, but improvement has in fact been happening, one can end up saying the current policies, for instance the curriculum, should be dumped in favour of something else. The intentions may be good, but the impatience can lead to undesirable instability, and can even harm a schooling system.

But before I get to the observable improvements below Grade 12, here's a link to a Mail & Guardian article of mine where I explain what concerns the Matric results should prompt. There are lots of things to worry about. I just don't think we need to worry about whether there's some nefarious plot by government to use data to create an impression of education improvements when there is none: link here.

Here's a document which explains the details behind my graph. These details are easily confusing, unless one looks through them really here.

Here's a new graph I came up with. It shows the South Africa trends for three international testing programmes: TIMSS, SACMEQ and PIRLS. I've focussed on mathematics in the first two programmes, but had I focussed on their other subject, the findings wouldn't change substantially. The starting value for each programme is set fairly arbitrarily, to facilitate the reading of the graph. The difference between two measurement points in any programme is expressed in standard deviations. For example, TIMSS Grade 9 mathematics improved from 289 in 2002 to 352 in 2011, which is 0.70 standard deviations, given that the standard deviation of the results of South African students is 90 TIMSS points ([352 - 289] / 90 = 0.70). Hence the TIMSS curve moves from 0.20 to 0.90 (0.20 is the arbitrary starting point). The grey bar represents an improvement of 0.08 standard deviations a year. Historical trends suggest that improvements beyond this 'speed limit' don't exist. Thus we should be suspicious of trends that exceed this slope. A key intention with this graph is to raise questions around the PIRLS trends. The lack of improvement between 2011 and 2016 received much media attention. However, if one looks back at the earlier PIRLS result for Grade 4 of 2006, serious questions arise. The 2006 to 2011 improvement exceeds the 'speed limit' by a wide margin: 0.13 standard deviations a year against 0.08 standard deviations. The obvious follow-up question is: Is the 2011 PIRLS value not too high? If it is, one would perhaps not have a suspiciously steep 2006 to 2011 slope and one would see some improvement over the 2011 to 2016. One would perhaps have something more like a 0.06 slope a year improvement over the overall 2006 to 2016 period, a trend which would be credible in terms of the global 'speed limit' and would be in line with what has been seen in TIMSS and SACMEQ. Researchers could look more deeply into this using easily available data - I haven't had time for this yet.
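
The arithmetic behind the TIMSS curve and the 'speed limit' comparison can be checked with a short Python sketch. The function and variable names are mine; the figures are those quoted above, and the PIRLS annual gain is entered directly as quoted because the underlying scores are not given in this post.

```python
SPEED_LIMIT = 0.08  # standard deviations per year, the grey bar in the graph


def gain_in_sd(start_score, end_score, sd):
    """Score change between two measurement points, in standard deviations."""
    return (end_score - start_score) / sd


# TIMSS Grade 9 mathematics, 2002 to 2011, South African SD of 90 points.
timss_gain = gain_in_sd(289, 352, 90)
timss_per_year = timss_gain / (2011 - 2002)
curve_end = 0.20 + timss_gain  # 0.20 is the arbitrary starting point

# PIRLS 2006 to 2011 annual gain, as quoted in the text.
pirls_per_year = 0.13

print(round(timss_gain, 2), round(timss_per_year, 3), round(curve_end, 2))
print("TIMSS above speed limit:", timss_per_year > SPEED_LIMIT)
print("PIRLS above speed limit:", pirls_per_year > SPEED_LIMIT)
```

On these figures the TIMSS improvement sits just under the limit, while the 2006 to 2011 PIRLS slope is well above it.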

Though I haven't found the time to dig around in the PIRLS data, I've had the chance to look at TIMSS. One critical yet rather straightforward way of checking how comparable results are over time is to compare learners with similar socio-economic backgrounds across different periods. If there is improvement within those groups, in particular less advantaged groups, then that is good news and it probably confirms that there's been no distortion due to a greater or lesser representation of particular groups. This type of check has not been done with the publicly available PIRLS results. My check using TIMSS data appears on p. 7 of a background report that fed into the 2017 'High Level Panel' findings: link here. Note there are two reasons why the gap between 2011 and 2015 is smaller in the graph than that between 2002 and 2011. One is that the gaps represent different lengths of time. Another is that progress seemed to have slowed after 2011.

Finally, there is a myriad of issues which one should ideally take into account when using test data to make comparisons across countries, or over time. I summarised these issues, and how countries and global organisations could improve the capacity to make credible comparisons, in a recent UIS report (link here). To take one example, levels of grade repetition influence comparisons. If repetition declines, one would be, in a sense, double-counting a larger number of academically struggling learners in an earlier year and one could end up seeing an improvement when there was in fact none, at least not in terms of age cohorts, and ultimately what we're interested in is age cohorts. I found the following map fascinating. There are systematic differences across the world when it comes to grade repetition patterns at the primary level. In particular, Francophone Africa displays very high levels of grade repetition at the upper primary level. This has the potential to influence across-country comparisons in significant ways.

26 DECEMBER 2018

While there's been a surge in research into early childhood development, in SA and beyond, there's still considerable confusion around how many children attend some form of ECD (or pre-school) institution. Stats SA data reliably tell us that currently around 2.4 million children do, though many official reports, and UNESCO, reflect a figure less than half of this. A colleague who works with data in other developing countries told me it's common for developing countries to under-estimate ECD participation. So if ECD participation is higher than what is widely believed, what are the implications? Basically, it means we have to worry less about expanding access and more about improving the quality of what already exists. The whole reason why ECD is receiving so much attention is that the right stimulation and nutrition in the years before school are proven to be about the most cost-effective way of improving learning at the primary school level. There's one interesting trend that has apparently not been explained yet, but probably could be using the available data (I haven't had time to look into this yet). There was a massive expansion in ECD participation between around 2007 and 2012. Thereafter, participation remained roughly stable, at around 2.4 million. What caused this expansion? It was probably a mix of an improved financial situation in households, combined with an increase in the public subsidisation of ECD. The graph below reflects the trend since 1998. One important reality today is that around 73% of children already attend some pre-primary institution during the year prior to Grade R (and Grade R coverage stands at around 95% currently). The NDP wants to take both Grade R and pre-Grade R to 100%. The 73% figure means we don't have too far to go at that level, yet the way the NDP target is discussed sometimes creates the impression we are starting from quite a low base. Clearly we aren't. There's a lot there already. We need to understand what's there, and find ways of reaching the 100%, while improving quality.

AfricaCheck used my analysis, and essentially agreed with it. See link here.

Full report available at link here.

This map shows that enrolment for children aged four at the start of the year, according to Community Survey 2016 data, was fairly widespread in many of the poorer parts of the country. Provincial values, from highest to lowest, are: Limpopo 87%; Free State and Eastern Cape 82%; Gauteng 81%; Mpumalanga and North West 77%; Western Cape 73%; Northern Cape and KwaZulu-Natal 71%. However, the relatively low level of participation in KZN, our second-largest province in terms of population, stands out.

Here's a more fine-grained picture using the extremely valuable 2016 Community Survey data (collected by Stats SA, available via DataFirst). This trend is confirmed by the National Income Dynamics Study (NIDS) data collected by SALDRU at UCT. Because both these datasets include the year AND MONTH of birth of each child, one is able to gain a much clearer picture of who participates at different levels of ECD. One should ideally avoid statements such as 'X% of five-year-olds are in Grade R', without specifying what you mean by five-year-old. Having turned five at the START of the year? Turning five DURING the year in question?


Here's more unpacking of the 2016 Community Survey - Stats SA makes the data available via UCT's DataFirst. Here the focus is on the questions relating to household satisfaction with municipal services. The official reports from the survey in this regard are quite good, but there's a lot more one finds by trawling through the data. For instance, there are a couple of telling patterns in this first map. The survey asks households what their largest gripe with the municipality is ('difficulties facing the municipality'), out of a list of 20 options. The three most selected options are 'Lack of safe and reliable water supply' (18%), 'Lack of/inadequate employment opportunities' (12%) and 'Cost of electricity' (11%). In the 111 municipalities (of 213 in total) coloured blue, the water complaint accounts for 39% of households. These 111 municipalities tend to be in the east of the country, and are all non-metro. The 111 account for 40% of the national population. People in the eight metro municipalities (accounting for 42% of the national population) have rather different concerns. In five metros, concerns around the cost of electricity come out on top. In the other three, it's 'Lack of/inadequate employment opportunities' (Johannesburg), 'Violence and crime' (Cape Town), 'Inadequate sanitation/sewerage/toilet services' (Mangaung) - Mangaung is 'Other' in the map as it's the only municipality with sanitation as a top issue. Water supply concerns top the list for just 6% of households in the metros - highest is Mangaung, with 9%, and lowest Cape Town, with 3%. The urban-rural divide seen here is not so much a question of unequal access to electricity. In the 111 blue municipalities, 89% of households have access to some form of electricity, against 93% in the metros. This is a relatively small difference. All the blue in the map seems to be driven by serious water supply problems. Interesting also are the 39 municipalities where the most common option was 'None', meaning no complaint. These are concentrated in two provinces: Northern Cape (a remarkable 45% of households) and Western Cape (21%).

Corruption receives a lot of media coverage, so to what extent does corruption top the list of complaints? Obviously, corruption overlaps to a large degree with the other complaints. For instance, poor water supply could be linked to corruption. Where households identify corruption as the largest problem, this could be because they read the news more, or because their education helps them see the political problems behind, say, water cuts. Only 2.3% of households in South Africa identified corruption as the most worrying problem in 2016. The three provinces above the national average were Gauteng (3.1%), followed by North West and Mpumalanga (both 2.2%). Male-headed households were more likely to complain about corruption than female-headed households (2.6% against 1.8%). As one might expect, complaining about corruption is positively correlated with level of education. In households where at least one person had a post-school qualification, the figure was 4.1%, against 2.2% for just Grade 12 (Matric) and 1.5% for less than Matric. While this correlation held for all four population groups, there were large differences across these groups. For instance, if one focusses just on households with a post-school qualification, the figures were: white (7.2%), Indian (6.9%), coloured (4.5%), black African (2.6%). So what can we tell from the map? The geographical spread is not as systematic as in the previous map. One noteworthy pattern seems to be that among the eight metros, the three in which the ANC lost its majority in 2016 (Johannesburg, Tshwane, Nelson Mandela) are also the only three where complaints about corruption were at 3.0% or more. One cannot draw hard conclusions from such a superficial analysis, but the pattern seems noteworthy nonetheless.

Here's the list of all the complaints. Note my 'Other' in the map is options 2, 5, 6 or 12 on this list, not the original 'Other'.


For a while statistics on religion in South Africa have been hard to come by. Census 2001 included religion questions, but not Census 2011. Hence Wikipedia, in its 'Religion in South Africa' page, makes use of the old Census 2001 data. In fact, Statistics South Africa has gone back to collecting religion data, though hardly anyone seems to use it. From 2013, the General Household Survey (GHS) has collected religion data. But what's much better, both in terms of the depth of the questions, and the size of the sample, is Stats SA's 2016 Community Survey. That's what I used for the following graph. But first, how have I defined being religious in the data? There are 23 faith categories. Four of these categories are counted as non-religious: 'No religious affiliation/belief' (89% of the non-religious), 'Do not know' (10%), Atheism (0.8%) and Agnosticism (0.5%). Clearly, of these four categories 'No religious affiliation/belief' is the largest. The 2.5% of responses which are missing are not used in the analysis. Finally, I've limited the analysis to adults aged 18 and above. Each bubble in the graph has an area proportional to the weighted respondents. What is interesting is that in South Africa, more education is associated with being more religious. In general all educational groups display high levels of religiosity - the average is never lower than 83%. But those with more education are in general more religious. Most religious are those with some post-school qualification, specifically those with 14 to 16 years of education (red bubbles). But the few with 17 (Masters) or 19 (PhD) years of education break the pattern, and display relatively low levels of religiosity. The patterns seen here are interesting because studies of data from other countries tend to display the reverse: more education is associated with less religiosity. A big but... those studies cover mostly developed countries, and seldom African countries.

In very few municipalities is there a MAJORITY religion. There are a few municipalities in KwaZulu-Natal, and one in Limpopo, where African Churches account for more than 50% of responses. In Northern Cape, Khâi-Ma is mostly Catholic, while Karoo Hoogland is mainly Reformed Church.

This map indicates what the largest religious category is per municipality. It is striking how prominent the 'African Christian Church' category is, even in the more urban Gauteng.

The right-hand one-third of the next graph, which roughly represents the middle class, looks rather different to the left-hand two-thirds. The African Christian Churches clearly have a limited following in the middle class. The Reformed Church, on the other hand, is particularly strong in the middle class relative to the poor and working class. My category 'Sundry' emerges as quite large for the middle class. This is accounted for mainly by Hinduism (1.9% of the middle class), Judaism (1.7%) and Jehovah's Witness (1.1%).

South Africa is religiously diverse. Of 23 religious categories referred to in the 2016 Community Survey, 15 are selected by at least 1.0% of the adult population. These 15, plus 'Sundry' for the remaining 8, are shown in the next graph. If more younger adherents is an indication of a growing religious category, then the graph suggests growth is occurring particularly for No religious affiliation, Pentecostal/Evangelistic, and African Christian Church. Shrinkage would then be seen for Reformed Church, Anglican/Episcopalian and Methodist. Clearly the category I've labelled 'African Christian Church' is the largest. The full description of this category in the questionnaire runs as follows: 'African Independent Church/African Initiated Church (e.g. Zion Christian Church; Apostolic Church; African Nazareth Baptist Church/Shembe)'. The fact that this one category refers to various churches is of course a reminder that to some degree classifying religions is a bit arbitrary. The divisions between or within them will often be debatable.

In this graph South Africa follows a global pattern: women tend to be considerably more religious than men. We also see that around age 30 is when people are least religious.

Here we see that the positive correlation between education and religiosity is true for black African respondents, but not for the other three groups. We also see that at all education levels, coloured and Indian respondents emerge as the most religious. Among those with Matric (12 years) or less, black African respondents emerge as the least religious. Of course religiosity is extremely difficult to define, let alone measure. These stats have to be interpreted carefully. Importantly, the question in the survey is 'How would you describe [your] MAIN religious affiliation/belief?'. Then, for Christians, there is a further question: 'Which Christian denomination or church, if any, [do you] identify with most closely?'.

11 JUNE 2018

This graph is from a recent article written by myself and Stephen Taylor and published in the Journal of African Economies. It shows what happened to the mathematics marks of schools which were moved into better provinces when provincial boundaries changed after 2005. Economists love these kinds of 'natural experiments' because they offer firm evidence of the effects of certain things. In this instance, it is largely about moving from North West to Gauteng, or from Limpopo to Mpumalanga. The vertical axis reflects top students in the moving schools AFTER ONE HAS TAKEN INTO ACCOUNT 'GAMING' OF THE SYSTEM BY MEANS OF KEEPING WEAKER LEARNERS OUT OF GRADE 12, AND OUT OF THE MATHEMATICS CLASS. So the trend is not indicating that schools got better at keeping weaker learners away from the exam. Teaching actually improved. So what does this mean for policy? Above all, it confirms that the way a province organises education is important. It can make a difference between more and fewer learners achieving marks that will give them access to mathematically-oriented programmes at university. Provinces matter! The media and legislators should worry more about the quality of provincial departments, for instance by looking at the quality and coherence of the plans, annual reports and websites of these departments. I know that sounds a bit boring, but as a good colleague of mine once said, 'When the work gets boring, you know the real work has begun'. Read the abstract or (better still) the article itself to get a sense of what the better provinces did better.

The abstract is at link here. Message or mail me if you want the article and don't have access to the journal.

2 APRIL 2018

Here's a graph from a recent paper I did with a colleague, Carol Nuga Deliwe. In any education system, one should monitor the degree of cheating in exams and tests. A useful rough idea of this is obtained by analysing patterns in the test data (assuming one has data on each question). Jacob and Levitt, inspiration for the 2010 movie Freakonomics, developed methods for this and found that in around 5% of Chicago's schools cheating occurred. We find cheating in 0% to 9% of schools, depending on country, in the SACMEQ 2000 and 2007 data. What came as quite a surprise is that levels of suspected cheating are quite highly correlated with a relevant and well-known World Bank indicator of general corruption. That's what the graph shows (values for 2000). What this suggests is that cheating in tests is in part linked to the general culture around how much rule-breaking is tolerated in the society. Paper at link here.

Grade 6! In this case, mathematics.

9 JANUARY 2018

We need fact-checkers like Africa Check. link here.

8 JANUARY 2018

Someone from Africa Check asked the logical question: What is the national curve for the graph two posts down? - that graph has only the nine provincial curves. Here's the national curve. It also uses pooled 2014 to 2016 data. The key thing is the 'ceiling' of the curve, which sits at around 52%. The curve is lower to the left (ages 20 and 21) because those youths have not achieved their Matric yet, though historical trends clearly suggest they will. This 52% is a bit lower than the 55% I referred to earlier, but one should remember that this national curve would have as its most recent influences the year-end examinations of 2013 to 2015 (the General Household Survey is conducted in June, so for instance the 2016 GHS would not reflect the 2016 year-end examinations). The exact figure I got using the schools (not household) data is 53.4%, applicable after the 2013 examinations. Given that the general trend for Grade 12 attainment has been upwards, I've concluded that currently we're sitting at around 55%. Importantly, there are data issues. The GHS data have confidence intervals because they are sample-based. Government's age-specific enrolment data have some problems, and they're not widely available for public scrutiny (in the way the Snap Survey data, which do not include age, are). But the data issues aren't large enough to cause us too much uncertainty, and these data issues are common across most developing countries. What is clearly not defensible using any data source are statistics such as 38% of youths complete Grade 12. Note that what I am NOT including in my 55% of youths figure is youths who obtain something equivalent to the NSC outside a school, such as a TVET college NC(V)4 qualification, WITHOUT PREVIOUSLY OBTAINING THE NSC FROM A SCHOOL. If one includes such youths, the figure rises by around 2 percentage points, so for instance 55% becomes 57%.

7 JANUARY 2018

The other way of calculating successful completion of Grade 12 is to take ONE AGE COHORT from an earlier year, for instance all 14-year-olds, where we are certain virtually all children are enrolled, and then to divide Grade 12 passes by that denominator. One does need to take into account several factors, including the small percentage of 14-year-olds who are not enrolled, supplementary Grade 12 examinations, part-time examination candidates, and so on. What one should NOT do is to divide Grade 12 passes by, say, all Grade 10 learners or all Grade 2 learners from an earlier year. Why? The key reason is REPEATERS. For instance, around 20% of Grade 10 learners in recent years are repeating that grade. This means that if you use ALL Grade 10 learners as your denominator, you are unduly inflating your denominator, and that will produce a rather low ratio. The data, whether it's household data or our school data, show fairly conclusively that around 55% of youths get to pass Grade 12 nowadays. That's higher than the figures of around 38% that have been quoted in various places. Moreover, it's clear from the data that this percentage has been on the rise. The details are in a DBE report published in 2016 which I did a lot of work on: link here.
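
To see how the choice of denominator moves the ratio, here is a small Python sketch. Apart from the 20% repeater share mentioned above, every number in it is hypothetical and chosen only to make the arithmetic easy to follow; none of them come from DBE data.

```python
# Hypothetical counts, for illustration only.
grade12_passes = 550_000
cohort_of_14_year_olds = 1_000_000   # one age cohort from an earlier year
grade10_enrolment = 1_100_000        # ALL Grade 10 learners, repeaters included
repeater_share = 0.20                # share of Grade 10 learners repeating


def completion_ratio(passes, denominator):
    """Grade 12 passes as a percentage of the chosen denominator."""
    return 100 * passes / denominator


cohort_ratio = completion_ratio(grade12_passes, cohort_of_14_year_olds)
naive_ratio = completion_ratio(grade12_passes, grade10_enrolment)
repeaters = round(grade10_enrolment * repeater_share)

print(cohort_ratio, naive_ratio, repeaters)
```

In this made-up example the learners counted in Grade 10 for a second time are enough to push the enrolment figure above the size of a single age cohort, so the ratio drops by five percentage points purely because of the denominator.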

7 JANUARY 2018

This is a crucial graph. There is no simple way of explaining successful Grade 12 attainment, because youths attain at different ages. The easiest way of looking at this issue is to simply ask what percentage of youths at a variety of ages has successfully completed Grade 12. This is effectively what this graph does, by province. The data are General Household Survey data, available via UCT's DataFirst portal. What seems indisputable is that in Gauteng, a far higher percentage of youths have completed Grade 12. Of course some of this could be due to Matriculants being more likely to migrate to Gauteng than non-Matriculants. However, there is a completely different way of doing the calculation, using just school data, and that's what I explain in the post one up. Doing the calculation this other way still produces the conclusion that Gauteng enjoys the highest secondary completion ratio, not Free State, and not Western Cape. The reason why Free State produces the highest 'pass rate', in the sense of successful Matriculants divided by those who sit for the examination, is that in this province, dropping out before Grade 12 is particularly high. (The graph uses GHS data pooled across the three years 2014 to 2016. This is necessary to avoid jumpiness in the curves caused by the sample-based nature of the GHS data.)
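
The basic calculation behind a curve like this can be sketched in Python as follows. The record layout and the numbers are hypothetical (the real GHS files have their own variable names, which I don't reproduce here); the idea is simply a weighted share per age, computed on records pooled across survey years.

```python
from collections import defaultdict

# Hypothetical pooled records: (age, survey weight, has completed Grade 12).
records = [
    (20, 300.0, False),
    (20, 100.0, True),
    (24, 150.0, True),
    (24, 250.0, True),
    (24, 400.0, False),
]


def completion_by_age(pooled_records):
    """Weighted percentage of youths at each age who have completed Grade 12."""
    completed = defaultdict(float)
    total = defaultdict(float)
    for age, weight, has_grade_12 in pooled_records:
        total[age] += weight
        if has_grade_12:
            completed[age] += weight
    return {age: 100 * completed[age] / total[age] for age in sorted(total)}


shares = completion_by_age(records)
print(shares)
```

Pooling several survey years means more respondents per age, which is what smooths out the jumpiness mentioned above.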

7 JANUARY 2018

I agree that we pay far too little attention to youths who 'fall out the system' by never obtaining a Matric qualification or anything equivalent. Around 45% of youths 'fall out' currently. However, what does one do about this? It is implied by Katharine Child's article that we must simply get more youths to complete twelve years of education. I agree that is part of the solution. However, this doesn't happen overnight. As pointed out in the Department of Basic Education's main 2017 examinations report, completion of secondary schooling in South Africa is approximately on a par with that of other middle income countries. Our completion figures are better than those of Tunisia, Egypt, Indonesia and Uruguay. Even the United States only gets around 90% of youths to complete twelve years of education. A part of the solution should be, I believe, to have a qualification below Grade 12 so that we don't leave youths entirely without any widely recognised proof of their schooling. We are an unusual country in not having some qualification below the end of the secondary level. The 2014 report of the Ministerial committee looking into the Grade 12 examinations recommended that a Grade 9 exit certificate be introduced. Somehow that important recommendation has got lost in our lively education policy debates. In a couple of posts appearing above I argue why I don't agree with the 'true pass rates' quoted in this Business Day article (they are in fact around 17 percentage points higher if one examines the data carefully). I also explain why neither Free State nor Western Cape is best at getting youths to obtain the Matric. The top province is in fact Gauteng. I also don't agree that the number of high school dropouts is 'skyrocketing', which means increasing rapidly. In fact, the proportion of youths who drop out has been declining gradually over many years: link here.

29 DECEMBER 2017

This graph goes with the post immediately below.

29 DECEMBER 2017

link here. Fact-checking is so important, and often not that difficult. Business Day has just come out with an article on a report by the Institute of Race Relations (IRR). I can't find the IRR report, but maybe it's not published yet. According to Business Day, a key conclusion of IRR's report into poor science results in schools is that only 18% of schools have science labs, and that this largely explains the under-performance of many schools. It is logical and humane to argue that science labs should be available, in particular at the secondary level. But there are problems with the IRR's apparent analysis (I'm getting it second-hand via Business Day, but I have no reason to believe BD would misrepresent the IRR report). Firstly, 18% of ALL schools, including primary schools, may not have science labs. That sounds credible. But the TIMSS data (which IRR seems to have used in their analysis) indicates that 49% of Grade 9 students had a science lab in their school in 2015. TIMSS countries with lower figures than South Africa are Hungary (30%) and Lithuania (11%), countries that score much better in science than us. That said, we are almost at the bottom of the TIMSS rankings when it comes to access to science labs. But are science labs THE solution to poor science performance in South Africa? My graph (one post up) suggests focussing on labs as THE 'magic bullet' here would be a mistake. If one compares each student's mathematics and science scores in Grade 9 TIMSS (2015), it becomes clear that having a lab seems to add just 8 points to the science score, for a student with a specific mathematics score. This is a small improvement in the larger picture. There are many students WITH access to labs who perform very poorly in science. As many as 55% of these students perform below 400, the so-called 'low international benchmark'. Conversely, there are many students WITHOUT access to labs who perform rather well. 
All this supports the evidence that the most powerful levers we have for pushing up learning outcomes, including science results, are a better reading and numeracy foundation at the primary level, more focussed support to teachers on how to teach, and a general culture of accountability and discipline amongst all actors in the schooling system. Magic bullets can seem attractive, but at least when it comes to schooling we need to think broadly, and we need to make use of the data we have.
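The 'points added by a lab, for a student with a specific mathematics score' comparison can be read as a regression of the science score on the mathematics score plus a lab indicator. What follows is only a minimal sketch of that idea, not the calculation behind the graph. The ten records are synthetic (they are not TIMSS data) and are deliberately constructed with an 8-point lab effect so that there is something to recover. A real TIMSS analysis would also need to deal with sampling weights and plausible values, which this sketch ignores.

```python
import numpy as np

# Synthetic illustration only: these are NOT TIMSS records.
# Science is built as a linear function of mathematics plus an
# 8-point bonus for students whose school has a lab.
maths = np.array([300.0, 340.0, 380.0, 420.0, 460.0, 500.0,
                  320.0, 360.0, 400.0, 440.0])
has_lab = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
science = 20.0 + 0.9 * maths + 8.0 * has_lab

# Design matrix: intercept, mathematics score, lab dummy.
X = np.column_stack([np.ones_like(maths), maths, has_lab])

# Ordinary least squares; the third coefficient is the difference in
# science score between lab and no-lab students with the same maths score.
coef, *_ = np.linalg.lstsq(X, science, rcond=None)
intercept, maths_slope, lab_effect = coef

print(f"science points associated with a lab: {lab_effect:.1f}")
```

The point of holding the mathematics score constant is that students in schools with labs tend to differ in many other ways, so a raw comparison of averages would overstate what the lab itself contributes.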

An important point someone reminded me of: Amongst the 51% without fully-fledged laboratories would be students who actually do view and even conduct science experiments using science kits and mobile labs. It seems there are no national stats on this phenomenon.

20 DECEMBER 2017

When I produced these three graphs I was stunned by what I saw. I used Census 2011 data (the '10% sample', available for downloading via DataFirst), but General Household Survey data going up to 2016 indicate that the patterns persist. Obviously, the Census 2011 data have far more observations, hence I'm using that here. Surely the Christmas present that many South African children really need is a pair of glasses. This must explain some of the learning inequalities in schools. Eye glass use amongst white and Indian children, and the gender breakdown of this, is about in line with what you'd find in developed (rich) countries. But on average, these children are many times more likely to have eye glasses than their black African and coloured compatriots, and the gap increases precisely at those ages where having glasses becomes particularly important for learning. The data suggest that some provincial governments are better than others at doing the right thing: identifying who needs glasses and supplying the need through a public intervention. Yet in every province large race-based differences persist, in an area of service delivery where the cost of the intervention would be relatively small.

11 NOVEMBER 2017

These graphs go with the discussion one post down.

I've redone the five graphs after realising that Treasury's spreadsheets reflect, for 2010/11 and before, budget programmes, in particular TVET colleges, which were moved to the national level in 2015. That means that even if one uses the most recently released values for each financial year (this one should always do), one still finds that 2010/11 and 2011/12 are not comparable. It's not apples and apples. So one has to adjust the figures for 2010/11 and before. I did that, and in general the graphs don't look that different, though the situation appears to improve a bit (because the base years now have downwardly adjusted figures).

11 NOVEMBER 2017

National Treasury's spreadsheets, published each year, which reproduce key figures from the Estimates of Provincial Revenue and Expenditure (EPRE), are useful and under-utilised. It's difficult to find something this good on Ministry of Finance websites in other developing countries with decentralised spending. One thing the Treasury figures indicate is that very serious financial squeezes in education are currently experienced by all provinces, not just Gauteng (something which one may conclude from reading the article below). In the graphs I'm posting above, it's clear that a loss of purchasing power is something most provinces have been experiencing for some years (it's all provinces if you consider purchasing power in per learner terms as enrolments have been rising). Clearly, CPI is not a good basis for calculating the purchasing power of provincial education departments. The actual costs faced by these departments are rising much faster than CPI, largely due to above-inflation wage increments. But there are other factors too which contribute to basic education's financial squeeze. Health costs have risen enormously, meaning this sector is accounting for an increasing percentage of overall provincial spending. More than ever, we need good analysis of the development and poverty alleviation implications of budget reprioritisation. I'm not saying the current realignments in favour of health are wrong, but if they are right, we need to know why. Despite the declining purchasing power of the public schooling system, in recent years public spending on the sector has actually risen as a percentage of GDP. That's largely because economic growth has been so slow. link here.
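To make the point about CPI concrete, here is a minimal sketch of how the choice of deflator changes the purchasing power picture. All numbers are hypothetical and chosen only for illustration. They are not Treasury, EPRE or Stats SA figures, and 'education_cost_index' is an assumed stand-in for whatever index of wage and other costs one might construct for the sector.

```python
def real_index(nominal, price_index):
    """Nominal spending expressed in base-year prices, as an index (first year = 100)."""
    base_nominal, base_price = nominal[0], price_index[0]
    return [
        100.0 * (n / base_nominal) / (p / base_price)
        for n, p in zip(nominal, price_index)
    ]

# Hypothetical four-year series, NOT actual budget or price data.
nominal_spend = [100.0, 107.0, 114.0, 122.0]
cpi = [100.0, 106.0, 112.0, 119.0]
education_cost_index = [100.0, 108.0, 117.0, 126.0]  # assumed to rise faster than CPI

real_by_cpi = real_index(nominal_spend, cpi)
real_by_sector_costs = real_index(nominal_spend, education_cost_index)

for year, (a, b) in enumerate(zip(real_by_cpi, real_by_sector_costs)):
    print(f"year {year}: CPI-deflated {a:.1f}, sector-cost-deflated {b:.1f}")
```

With these made-up numbers the CPI-deflated series drifts slightly upwards whilst the series deflated by sector costs falls, which is the kind of divergence the paragraph above describes. Dividing each year's value by enrolment would give the per learner version.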

7 OCTOBER 2017

My response to link here appears below...

For an article on mathematics and science innovation, one might have expected a more rigorous engagement with the numbers. The claim that only one out of every five science, engineering and technology enrolments will graduate seems strange and is at odds with publicly available figures produced recently by Van Broekhuizen and the Council for Higher Education, which point to a graduation ratio of around 50% in these subjects (the National Council for Innovation report is not available on the web yet, so I can't check exactly what the article's source says). Okay, average performance in schools is still way too low. Agreed. But you can't relegate improvements over the last ten years to a footnote on 'a percentage point here and there' when international testing systems (TIMSS and SACMEQ) have shown improvements which several South African researchers (including myself) argue are fast by middle income country standards. If the author had engaged properly with these numbers, and perhaps challenged them, it would have been a far more interesting and valuable article. In support of the notion of a non-static system, my own analysis points to an increase of around 60% since 2008 in the number of black matriculants with high-level mathematics competencies. On journal publications, okay, Mouton has pointed to the problem of predatory journal publications, but do the numbers he has released wipe out the 1996 to 2015 tripling in research outputs? I haven't done the maths on this, but I highly doubt that this would be the case. The fact that relatively privileged kids in independent schools do much better than poor kids in no-fee public schools has little meaning on its own. In every country the rich do considerably better than the non-rich in school because of home background advantages. The crucial question is the size of South Africa's test score inequality relative to that of other countries. 
TIMSS figures suggest that whilst this inequality in South Africa is not unlike what one finds in other developing countries, what is clearly problematic are low scores across the entire range, even for the relatively rich. Whilst the middle class in South Africa fares better than the poor, even our middle class lags behind similar groups in other countries. I agree that our inequalities are horrendous, and that we have a long way to go, but this article's portrayal of a static gloom is not helpful.


These three maps provide a picture of how the school quintiles are distributed across the country. Each cell is not a school, but a geographical area within which one or more schools can be found. I find using these cells (or tessellations, or honeycombs) far more useful than points per school. The problem with the latter is that in densely populated areas, school points end up covering each other, meaning you can't see what's going on. One thing I learnt from these maps is how Eastern Cape essentially placed all their quintile 1 (poorest category) schools in what used to be Transkei, meaning there are virtually no quintile 1 schools in the rest of the province. Limpopo also has a few areas (often quite poor ones) with no quintile 1 schools. This made a big difference up to 2012, when quintile 1 schools were funded considerably more than quintile 2 and 3 schools (in terms of the non-personnel 'school allocation'). But since then, quintiles 1 to 3 funding has been equalised. I found out that two publicly available sources have somewhat different data on school quintiles: the 2016 master list of schools on the Department of Basic Education's (DBE) website, and 2016 Snap Survey data shared by the DBE via the DataFirst portal. I used the Snap source, as the quintile data found there is clearly more updated. For instance, the Snap data reflect the recent re-classification of many quintiles 4 and 5 schools into lower quintiles in Mpumalanga, a step taken to justify more spending on schools considered to be under-funded, and unable to raise sufficient revenue through school fees. Feel free to use these maps if you are working in this area (but acknowledge the source). Stata (which I use for statistical analysis) now has rather useful spatial analysis tools. I've put together a small tutorial which can be found if you google "Stata .do file with tips and tricks for producing maps".
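For readers who don't use Stata, the idea of replacing overlapping school points with honeycomb cells can be sketched in a few lines of Python. This is not the method used for the maps above. It is a generic hexagonal binning routine, and it makes three assumptions: school coordinates are already projected to a flat x/y plane in the same units as 'size', the school records are hypothetical, and each cell is coloured by its most common quintile (the post doesn't say how cells with several schools were summarised).

```python
import math
from collections import Counter, defaultdict

def hex_cell(x, y, size):
    """Axial (q, r) coordinates of the pointy-top hexagon containing point (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 * y / 3) / size
    s = -q - r
    # Round in cube coordinates, then repair the component with the
    # largest rounding error so that q + r + s stays equal to zero.
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr

def quintile_by_cell(schools, size):
    """Map each occupied cell to the most common quintile amongst its schools."""
    cells = defaultdict(Counter)
    for x, y, quintile in schools:
        cells[hex_cell(x, y, size)][quintile] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in cells.items()}

# Hypothetical schools as (x, y, quintile); the first three sit almost on
# top of each other, which is exactly where point maps become unreadable.
schools = [(0.0, 0.0, 1), (0.1, 0.1, 1), (0.2, -0.1, 3), (50.0, 50.0, 5)]
cell_quintiles = quintile_by_cell(schools, size=10.0)
print(cell_quintiles)
```

Three overlapping points collapse into one cell with a single value, so dense urban areas no longer hide what is going on.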

The materials are at link here, under the 2013 entries. It's the 'producing maps' one. All the best.

17 JULY 2017

Came out in Business Day here. There are some edits I wasn't involved in (what's a 'nonpublic data analyst', hopefully not me!), so here's the link to what I submitted: link here.

My paper with the details behind this is available at link here.

I'm even more convinced it was ARV access and not the child support grant after reading an important academic article by Myer et al available at link here. That article found that an increase in ARV access roughly during the period I'm looking at led to increased pregnancies in South Africa of a magnitude roughly in line with the increases suggested by the enrolment and birth registrations data.

1 MAY 2017

Thanks to all these public holidays I've managed to complete something the environmentalist geek in me has wanted to complete for a while. How green is an electric car in the South African context? Does riding in a minibus taxi carry a larger or smaller carbon footprint than riding on the Gautrain? A more legible PDF version at link here. All the formulas and sources at link here.

9 APRIL 2017

Two important statistics on educational attainment. In the last three or so years, around 88% of youths have successfully completed Grade 9, whilst 55% have successfully completed Grade 12. The second stat has improved rather substantially in recent years. The first has remained stubbornly fixed, more or less, for many years. My own sense is we should worry a bit more about the first of these numbers.

9 APRIL 2017

This flows from the previous two posts below. There I looked at Grade 12 attainment. Here I look at Grade 9 attainment. Nationally, around 12% of youths did not successfully complete Grade 9 in recent years. I think that is worth worrying a bit more about. Grade 9 is the end of the General Education and Training band, or what could be considered truly basic education in South Africa. The situation is especially bad in Eastern Cape. Perhaps some of the worrying currently directed towards youths not getting the Matric could be shifted towards the problem of youths not even completing Grade 9 successfully. Note how late many youths complete Grade 9 - the peak for Gauteng and KwaZulu-Natal is 21 years, for instance. (Source here is three years of General Household Survey data: 2013, 2014 and 2015.)

9 APRIL 2017

One post below I explain that around 55% of South African youths obtain a Grade 12 certificate (the 'Matric'). Despite the fact that other analysts have obtained lower figures, which I would argue are clearly incorrect, for a lot of people I talk to, 55% seems like a low figure. How can 45% of youths NOT be obtaining a Matric, they ask? Can the problem really be this large? The explanation lies partly in the fact that many of these people are Gautengers, and in Gauteng the situation is not that bad. In that province around 63% of youths have been getting the Matric in recent years. However, the situation is much worse in certain other parts of the country. That's what the map below shows. Here I've used Census 2011 data (the 10% sample version), which is a bit dated, but is great insofar as it allows for analysis at a very local level. In certain municipalities in the 'three Capes', fewer than 30% of youths had the Matric. Importantly, this is due in part to dropping out from school, but also to some degree to migration. More educated people are more likely to move away from remote and rural areas. The worst municipality in 2011 was the Free State municipality of Mohokare. Here only 15% of youths had a Matric in 2011. Northern KwaZulu-Natal is interesting. Many different data sources have pointed to the fact that this province has for a long time been relatively good at ensuring that youths do not drop out of school, and that they obtain the Matric. (In calculating the stats for this graph, I've used the method I outlined one post down, where I select the age with the highest attainment figure.)

9 APRIL 2017

Statistics on grade attainment (How many youths successfully complete Grade X?) are often wrong. There are three main reasons for this. First, those who use just household data (generally a good option) frequently trip over the following question: What age should I use? (Should I ask what proportion of 16 year olds have completed Grade 9, for example? Or should I use age 18?) My advice would be to use ALL ages, and then find the maximum, whilst controlling a bit for 'bumps' in the data resulting from the sample-based nature of the data. Thus in the graph below, which uses General Household Survey data 2012-2014 (the average across the three years), the conclusion would be that 49% of youths in Northern Cape successfully complete Grade 12. The smooth curve is a simple polynomial trendline (easily produced in Excel). You would expect a concave curve (a hill as opposed to a valley). At younger ages, many youths haven't obtained their Matric yet. At older ages, youths are too old to have benefitted from expansions to the schooling system. Look where the peak is: at age 25. This is indicative of how late many obtain their Matric. This might look a bit complicated, but I can't think of a better way of using household data to answer the question of how many youths successfully complete X years of education. The second reason why weird attainment stats may arise is if you compare, say, the official number of Grade 12 (Matric) passes per year to the number of, say, 18 year olds in the population based on official Stats SA population estimates. Here the problem is that there are large discrepancies between the Matric and population numbers. If you want to see my conclusions on this matter, you can go to link here or link here. The bottom line is you can't meaningfully divide the one by the other. 
The third reason is that if you go the route a few have gone and divide, say, the number of Matrics in 2016 by the number of Grade 1 learners in 2004 (12 years earlier) you end up, again, with a meaningless statistic as you're double-counting a lot of Grade 1 pupils, namely the repeaters. Unless you have reliable repeater values from 2004 (and we don't), you can't do the calculation that way. If one takes all these factors into account, and if one spends quite a lot of time analysing the relevant data (some of which is public via e.g. DataFirst, some of which I have access to via my work in DBE), one finds that around 55% of youths are getting a Matric in recent years - there are a number of lower, and incorrect, figures floating around. Full details at link here.
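The 'use all ages, smooth, then take the maximum' approach described for household data can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the graph: the percentages are hypothetical (they are not GHS values), and the quadratic fit from numpy stands in for the Excel polynomial trendline, whose order isn't specified in the post.

```python
import numpy as np

# Hypothetical percentage of youths at each age who have completed Grade 12.
# NOT General Household Survey values; the up-and-down 'bumps' imitate
# the noise one gets from sample-based data.
ages = np.arange(18, 33)
pct_completed = np.array([35.3, 37.2, 42.5, 43.2, 47.3, 46.8, 49.7, 48.0,
                          49.7, 46.8, 47.3, 43.2, 42.5, 37.2, 35.3])

# Fit a concave trendline (degree 2 is an assumption) and evaluate it at each age.
coeffs = np.polyfit(ages, pct_completed, deg=2)
trend = np.polyval(coeffs, ages)

# The attainment estimate is the peak of the smoothed curve,
# not the highest raw value, which may just be a bump.
peak_idx = int(np.argmax(trend))
peak_age = int(ages[peak_idx])
peak_value = float(trend[peak_idx])

print(f"raw maximum: {pct_completed.max():.1f}%")
print(f"smoothed peak: {peak_value:.1f}% at age {peak_age}")
```

In this made-up example the raw maximum occurs at ages 24 and 26, but the smoothed peak sits at age 25 and is a little lower, which is the reason for controlling for the bumps before reading off the figure.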

17 DECEMBER 2016

This graph goes with the post one down. South Africa is ZAF.

17 DECEMBER 2016

Some inputs on the South Africa TIMSS here.

27 OCTOBER 2016

I put this graph together, using UNESCO data, to gauge where we sit as South Africa (ZAF) when it comes to the public funding of tertiary (higher) education relative to other countries. What is clearly true, and this is often raised in the current debates, is that overall funding of higher education relative to the economy is rather low. Just 0.73% of GDP is public spending on higher education. The average across the other 34 countries I selected is 1.15%, so 1.6 times our figure. However, the main problem is not that state spending per student is too low, at least in terms of the UNESCO indicator of spending per student relative to GDP per capita. Here our figure of 38% is in position 14 (out of 35 countries). We fare far better against this indicator than Brazil (BRA), Russia (RUS), Thailand (THA) and Indonesia (IDN). But, on the other hand, we fare worse than Malaysia (MYS) and India (IND). (I'm deliberately not referring to the average across the 35 countries here as that comes to almost 100% mainly as a few poor countries display extremely high values, the highest being 1725% for Malawi, which is off the graph.) What does this suggest? Yes, we need to spend more on higher education, but mainly by enrolling many more students. This seems a much more urgent priority than raising public spending per student. I'm not saying the latter is not also a good idea, to some extent, but in the bigger picture the really big challenge is to have more students. An obvious question would be: But that is spending per student relative to the average income of the population, what about actual spending? Here we don't do too badly either. We're in position 16 amongst the 35, better than Brazil, Thailand and even Russia, but worse than Mexico and Malaysia. That's with respect to US dollars adjusted to take into account the cost of living (so PPP). 
If we instead use unadjusted US dollars our position is a bit worse: we drop to position 18 (still above Russia and Thailand, but below Brazil). Obviously these types of cross-country comparisons only suggest things. They aren't conclusive. But cross-country comparisons are important, and they should inform our policy debates to a larger degree. One step forward is that we are now amongst the 47% of countries that report tertiary education spending to UNESCO. Not doing so can be seen as indicative of weak capacity in the country to monitor spending. South Africa has these figures on UNESCO's system starting in 2013, after having had nothing for almost 20 years (and the figures from the last apartheid years jump around so much they must be wrong). The 38% per student spending over GDP per capita figure I refer to above looks right to me. I obtained 43% for 2012 in my own attempt to run the calculation (p. 235 at link here).
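The argument that the real gap is in enrolments rather than in spending per student follows from a simple identity: public spending as a share of GDP equals per student spending relative to GDP per capita, multiplied by students as a share of the population. The sketch below applies that identity to the two South African figures quoted above (0.73% and 38%). It assumes the two UNESCO indicators refer to the same year and the same definition of spending, which the post doesn't confirm, so the implied enrolment share should be read as indicative only.

```python
def implied_enrolment_share(spend_share_of_gdp, per_student_over_gdp_per_capita):
    """Students as a share of the population implied by the two spending ratios.

    Identity: (spending / GDP) = (spending per student / GDP per capita)
                                 * (students / population)
    """
    return spend_share_of_gdp / per_student_over_gdp_per_capita

# Figures quoted in the post for South Africa; treating them as
# directly comparable is an assumption.
za_share = implied_enrolment_share(0.0073, 0.38)
print(f"implied tertiary students per 100 people: {100 * za_share:.1f}")
```

Seen this way, a low overall spending figure combined with a middling per student figure can only come from a relatively small number of students.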

How did I select the 35 countries? I took the 114 countries with per student spending values for any of the years in the 2011 to 2014 period and then manually selected countries South Africans are likely to have some knowledge of and against whom we are occasionally compared. There are 14 high income countries and 21 developing countries amongst the 35. Above all, I didn't want to clutter the graph. If we use data from all the 114 countries, the picture does not change much. We end up in position 44 of 114 in terms of per student spending relative to GDP per capita, for instance. The dotted line in the graph is a trendline. The fact that it is almost horizontal indicates that there is not a clear positive or negative correlation between the two indicators. Put differently, the pattern for the 35 countries does not suggest that when you expand your higher education sector relative to the rest of the economy, you should increase or decrease your per student spending (relative to GDP per capita). All this is relatively easy analysis. To really get a firmer grasp of where we are going, and what is possible, when it comes to higher education funding, we need to understand a lot better how public (and even private) spending is spread across students, relative to the socio-economic status of those students. There's very little such research available currently. We could do a lot more with the data we have.

28 JULY 2016

I couldn't resist running the list of local government election candidates (off the IEC's website) through some Stata analysis (obviously other people have done similar things)... Altogether 40,565 people are standing for election. Of these, 14,931 are only list candidates, so they are not linked to specific wards. 15,637 are only appearing as ward candidates. The remaining 9,997 are standing for both wards and as list candidates. Of the 25,634 linked to wards, 3,098 are standing in more than one ward. The record here is Michelle Maria Calitz (National People's Party), who is standing in 116 of the 262 wards in Cape Town. The EFF is fielding the youngest and the oldest candidates, namely Sisonke Jaca (who will just have turned 18 on election day) and Arnold Mthuthuzeli Specman (age 90). Both are from Eastern Cape. Twenty-nine candidates are standing for two parties, for instance for different parties in two different wards. The presence of larger parties is not as predominant as many might imagine. The five largest parties in terms of number of candidates (ANC, EFF, DA, IFP, COPE) account for just 65% of all candidates. Turning to age, as one might expect the EFF has the youngest candidates (see graph). The EFF comes out on top amongst the Big Five in terms of percentage of women (see the graph). If one looks at party (Big Five) and province, and limits the analysis to instances where there are at least 20 candidates, then ANC in Northern Cape comes out best in terms of the percentage of women (54%) whilst the worst is IFP in Limpopo (24%).
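The list-only, ward-only and both breakdown is a simple set partition. The sketch below shows the logic in Python rather than Stata. The candidate IDs are hypothetical placeholders, not IEC records, and the last lines just check that the counts quoted above add up.

```python
def split_candidates(list_ids, ward_ids):
    """Partition candidates into list-only, ward-only and both."""
    list_ids, ward_ids = set(list_ids), set(ward_ids)
    return {
        "list_only": list_ids - ward_ids,
        "ward_only": ward_ids - list_ids,
        "both": list_ids & ward_ids,
    }

# Hypothetical candidate IDs, NOT taken from the IEC file.
groups = split_candidates(list_ids=["A", "B", "C"], ward_ids=["C", "D"])
print({name: sorted(ids) for name, ids in groups.items()})

# Consistency check on the counts quoted in the post.
list_only, ward_only, both = 14_931, 15_637, 9_997
total_candidates = list_only + ward_only + both
linked_to_wards = ward_only + both
print(total_candidates, linked_to_wards)
```

The three groups sum to the 40,565 candidates, and the ward-only and both groups together give the 25,634 candidates linked to wards.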


28 JUNE 2016

Valid points on the problem of exaggerating the graduate unemployment phenomenon in South Africa at link here (see my comment on the News24 page).

20 JUNE 2016

I like this map. Above all, it confirms that schooling in South Africa is very often not a question of primary schooling up to Grade 7, then secondary schooling from Grade 8 to 12. The apartheid legacy is strong in the sense that if you're in what used to be Transkei, the chances are strong that you'll go up to Grade 9 in one school and then (with luck) move to another Grade 10 to 12 school. This structural reality lies behind many patterns, such as relatively low participation in schooling in Eastern Cape beyond Grade 9. Or the rather good levels of textbook access and academic (ANA) performance in Grade 9 in Eastern Cape, at least compared to other poor provinces. Grade 9 enjoys a certain 'final grade' status. (The map is from link here.)

The names of schools and the grades they offer, plus a lot more, are available in the Snap Survey downloads obtainable via link here.

20 JUNE 2016

The technical paper on understanding the mathematics and science trends in Grade 12 is uploaded at link here.

25 MAY 2016

Here's the comment I entered on the SAIRR page (see their press release below): I find a lot of the analysis produced by SAIRR interesting and engaging, but last February's FastFacts was incredibly weak, and fundamentally incorrect in many respects. Let me deal with your five bullets here on the schooling sector (my area of specialisation). Yes, poverty is a determinant of the quality of education a child will receive at school, in every country in the world where poverty exists. That's not news. Teachers shy away from socio-economically disadvantaged schools, poor learners get less support in the home, and so on. But how does this support your 'schools present the biggest obstacle' hypothesis? True, just under half of South Africa's youths don't complete twelve years of schooling, but this is roughly in line with the situation in other middle income countries. Even in the US, about 10% of youths don't complete twelve years. So what is your argument? School completion in South Africa is a matter that deserves serious attention, but I'm afraid you didn't add any value here. Yes, race-based inequities in education remain huge. The question is, though, whether these gaps are narrowing. An article I wrote in response to your FastFacts provides evidence that they are. The worst mistake you make is to argue that the quality of mathematics in schools is declining, on the basis of just two clearly incomparable statistics, and in contradiction of the widely publicised TIMSS 2002 to 2011 trends. More on that in my article. Independent school enrolments have indeed grown, from around 3% to 4% of all enrolments between 2005 and 2015. Here I suspect frustration with problems in public schools, and there are many problems, has played a role, but hard evidence here seems scarce, and these trends barely support your central hypothesis. My concerns are not just academic. When the SAIRR misrepresents educational trends in the country it influences investors, both foreign and local. 
My article can be found at link here.

The article with my response is also on the City Press website: link here.

8 MAY 2016

The Statistician-General's response to Stephen Taylor at link here is, as I see it, less a disagreement with Taylor's argument and more a questioning of whether South Africa is succeeding in getting the balance right between school education and post-school education. Specifically, have we perhaps over-prioritised schooling at the expense of post-school education? My graph (see link here) suggests that in terms of per student spending we're more or less in line with other upper middle income (UMC) countries, but could be over-prioritising, in terms of enrolments, upper secondary schooling whilst under-prioritising the tertiary level.

1 MAY 2016

Important opinion piece which corrects a few alarming, but wrong, perceptions around supposed declines in the education levels of young South Africans: link here.

27 APRIL 2016

I think the title takes the argument way too far. (One doesn't get to choose these titles oneself!) But I still stand by the rest of the article. [Article available at link here.]

27 APRIL 2016

link here [Opinion piece available at link here.]

27 APRIL 2016

link here [Article available at link here.]