Show me the way you talk and I will tell you who you are. Mobile phone metadata carry valuable information about users’ behaviour and characteristics, such as the ability to read and write. This raises the question: can they help countries monitor literacy? In this project we complemented a major household survey with mobile phone metadata to improve the measurement of illiteracy in Senegal. Using mobile phone metadata as auxiliary data is a promising alternative to other administrative data sources, which are often unavailable in developing countries.
Illiterate people call rather than send text messages
In its 2030 Agenda for Sustainable Development, the world community committed to leave no one behind. This is a bold promise considering how data on, for instance, poverty and education are currently collected: censuses are held every 10 years, and although surveys are undertaken more frequently, they cover only a small fraction of the population. This leaves room for local hotspots to develop unnoticed, and it deprives policy makers of the possibility to plan evidence-based, targeted and timely interventions in these places.
Mobile communication has become a global phenomenon and is now part of many people’s everyday life around the world. As a result, the data collected for billing purposes (e.g. who talks or sends a message to whom, for how long and in which place), known as call detail records (CDRs), have become a valuable source of information on users’ habits and characteristics. While mobile communication is gradually shifting towards internet-based services, text messages and calls remain the main channels of communication in many parts of the world. When communication patterns are split into text messages and calls, literacy becomes a natural indicator of choice: people with limited ability to read and write may prefer a call over a text message.
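To make the call-versus-text idea concrete, the following minimal sketch computes, for each area, the share of communication events that are calls. The record layout (`area_id`, `event_type`) and the toy records are assumptions made for illustration only; real CDRs are richer, and this is not the feature set used in the study.

```python
from collections import defaultdict

def call_share_by_area(records):
    """Return, per area, the share of events that are calls.

    `records` is an iterable of (area_id, event_type) tuples, where
    event_type is "call" or "sms" (a hypothetical, simplified layout).
    """
    calls = defaultdict(int)
    total = defaultdict(int)
    for area_id, event_type in records:
        if event_type not in ("call", "sms"):
            continue  # ignore anything that is neither a call nor a text
        total[area_id] += 1
        if event_type == "call":
            calls[area_id] += 1
    return {area: calls[area] / total[area] for area in total}

# Toy records, invented for illustration only
toy_records = [
    ("A", "call"), ("A", "call"), ("A", "call"), ("A", "sms"),
    ("B", "call"), ("B", "sms"), ("B", "sms"), ("B", "sms"),
]
shares = call_share_by_area(toy_records)
```

Under the hypothesis described above, an area with a high call share (such as "A" in the toy data) would be a candidate for lower literacy, although many other factors can influence the choice between a call and a text.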
Augmenting surveys with call detail records
We conducted a study to test the feasibility and usability of CDRs for improving measurements of illiteracy in developing countries. The proposed method augments traditional surveys with CDRs in order to combine the advantages of both data sources. First, traditional surveys provide a rigorous methodology. For example, the survey used in this study applies a stratified two-stage cluster sampling design in which a predefined number of census districts per region are randomly selected. The household lists within those census districts are then updated, and a certain share of those households is randomly selected for interviewing. Second, CDRs provide larger sample sizes as well as greater geographical coverage.
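The two-stage design described above can be sketched in a few lines. This is a simplified illustration, not the survey's actual procedure: the sampling frame layout, the function name `two_stage_sample` and its parameters are hypothetical, the household list update between the two stages is omitted, and both stages use simple random selection.

```python
import random

def two_stage_sample(frame, districts_per_region, household_share, seed=0):
    """Draw a stratified two-stage cluster sample.

    `frame` maps region -> {district -> [household_id, ...]}
    (a hypothetical layout). Regions act as strata.
    """
    rng = random.Random(seed)
    sample = {}
    for region, districts in frame.items():
        # Stage 1: randomly select a fixed number of districts per region
        k = min(districts_per_region, len(districts))
        chosen = rng.sample(sorted(districts), k)
        for district in chosen:
            households = districts[district]
            # Stage 2: randomly select a share of the district's households
            n = min(len(households), max(1, round(household_share * len(households))))
            sample[(region, district)] = rng.sample(households, n)
    return sample

# Toy frame, invented for illustration: 20 households per district
toy_frame = {
    "North": {f"N{d}": [f"N{d}-h{i}" for i in range(20)] for d in range(5)},
    "South": {f"S{d}": [f"S{d}-h{i}" for i in range(20)] for d in range(4)},
}
sample = two_stage_sample(toy_frame, districts_per_region=2, household_share=0.25)
```

Districts that are not drawn in the first stage contribute no interviews at all, which is why such a survey on its own cannot describe every locality.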
We combined three ingredients: CDRs from Senegal in 2013, kindly provided by Orange/Sonatel as part of the Data for Development (D4D) Challenge; survey data from the Demographic and Health Survey (DHS) in Senegal; and a modified small area model. The outputs are literacy estimates for each commune in Senegal, including those in which no survey interviews could be conducted (Figure 1).
Figure 1: Map of communes in Senegal showing the literacy rate of women at commune level
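How auxiliary data can yield estimates for communes without interviews is easiest to see in a deliberately simplified sketch. The code below fits a straight line between a CDR-derived covariate and the survey literacy rate in surveyed communes, then predicts the rate for every commune. This is a plain regression-based stand-in, not the modified small area model used in the study; the function names, the single covariate and all numbers are assumptions invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one covariate: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict_literacy(cdr_feature, survey_rate):
    """Predict a literacy rate for every commune that has a CDR covariate.

    `cdr_feature` maps commune -> covariate (here: share of texts);
    `survey_rate` maps surveyed commune -> literacy rate from the survey.
    """
    surveyed = [c for c in cdr_feature if c in survey_rate]
    intercept, slope = fit_line(
        [cdr_feature[c] for c in surveyed],
        [survey_rate[c] for c in surveyed],
    )
    # Clamp predictions to the valid range for a rate
    return {c: min(1.0, max(0.0, intercept + slope * x))
            for c, x in cdr_feature.items()}

# Toy values, invented for illustration; "c4" has no survey interviews
cdr_feature = {"c1": 0.1, "c2": 0.2, "c3": 0.3, "c4": 0.4}
survey_rate = {"c1": 0.2, "c2": 0.4, "c3": 0.6}
estimates = predict_literacy(cdr_feature, survey_rate)
```

Small area models generally go further than this: they typically combine the direct survey estimate with the model-based prediction and account for the uncertainty of each, which the sketch leaves out.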
The project has shown that CDRs carry information on the socio-demographic characteristics of their users and that statistical methodologies well established in official statistics can make use of them as auxiliary data. Most intriguingly, traditional statistical data such as surveys and censuses on the one hand and CDRs on the other complement each other well in their strengths and weaknesses. Because CDRs are already collected for billing purposes, alternative uses carry almost zero marginal production costs, and those costs are largely insensitive to scale. So why not put them to use?