How is data collected and tabulated for Pakistan's 2017 census?

Published April 7, 2017
An official from the Pakistan Bureau of Statistics collects information from a resident during a census as security personnel guard them in Peshawar on March 15.— AFP

A population and household census is underway in the country after a delay of nine years. In the nearly two decades since the last census, in 1998, the ways in which census data are collected and analysed have changed greatly across the globe.

Historically, censuses have been conducted manually by teams of enumerators and statisticians who gather, compile, and analyse data using paper-based forms. However, this classical method is no longer the only way governments collect and analyse information about their populations.

Census information is now increasingly being gathered through online questionnaires, toll-free telephone numbers, and pre-paid envelopes.

None of these methods is being used in the 2017 census. The Pakistan Bureau of Statistics (PBS), the federal authority responsible for the task, feels there is no guarantee that such questionnaires would be filled out and returned. “Literacy matters,” says a PBS official.

Though the PBS is collecting census data manually, it is using Optical Character Recognition (OCR) technology to convert the data into a machine-readable format and transfer it onto computers.

The OCR system provides full alphanumeric recognition of printed or handwritten characters at electronic speed.

The version available to the Bureau has been upgraded with an Intelligent Character Recognition (ICR) feature, which recognises characters in image data, in particular alphanumeric text. It turns images of handwritten or printed characters into ASCII data (a machine-readable format).
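To make the idea of "turning images of characters into machine-readable data" concrete, here is a toy sketch in Python. It is not the Bureau's ICR engine, whose internals the article does not describe: it assumes each character has already been cut out of the scanned form and binarised into a small 3x5 bitmap, and it uses simple nearest-template matching (a much cruder technique than real ICR models). The template set, the `recognize` function and the sample glyph are all hypothetical.

```python
# Toy character recogniser: compares a binarised glyph against stored
# templates and returns the closest character. Real ICR engines use trained
# models and richer features; this only illustrates image -> character code.

# Hypothetical 3x5 templates ("#" = dark pixel, "." = blank).
TEMPLATES: dict[str, tuple[str, ...]] = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "7": ("###", "..#", ".#.", ".#.", ".#."),
}


def hamming(a: tuple[str, ...], b: tuple[str, ...]) -> int:
    """Count pixels that differ between two equal-sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(glyph: tuple[str, ...]) -> tuple[str, int]:
    """Return the best-matching character and its pixel distance."""
    best = min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))
    return best, hamming(glyph, TEMPLATES[best])


# A slightly smudged handwritten "7" (one stray pixel in the fourth row).
scanned = ("###", "..#", ".#.", "##.", ".#.")
char, distance = recognize(scanned)
print(char, distance, ord(char))  # 7 1 55
```

The final `ord(char)` shows the step the article describes: once the image is matched to a character, what gets stored is its character code (55 for "7" in ASCII), which computers can tabulate directly.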

The OCR technology used by the Bureau has also been updated to handle data in Urdu.


OCR technology is not only effective at converting handwritten or typed characters into a machine-readable format for tabulation and compilation, but also helps cut costs.

The United Nations Statistics Division calculates that using OCR imaging can save up to two percent of the total cost of a census and requires fewer staff for data analysis.

However, OCR is not as accurate as the Optical Mark Recognition (OMR) technology used for data collection in the 1998 census. Data-entry operators at the Bureau are therefore required to check all forms manually before they are converted into machine-readable format. The operators work in batches of 120.

The OCR machines also have a built-in automatic error-detection system.
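The manual checking in batches of 120 and the automatic error detection can be pictured together as a simple review queue. The Python sketch below is only an illustration under stated assumptions: the article gives the batch size but says nothing about how the machines detect errors, so the confidence threshold, the age rule, and the `Field`, `batches` and `flag_field` names are all hypothetical, not PBS settings.

```python
from dataclasses import dataclass

BATCH_SIZE = 120        # batch size mentioned in the article
MIN_CONFIDENCE = 0.90   # hypothetical threshold, not a PBS figure
MAX_AGE = 125           # hypothetical plausibility limit for an age field


@dataclass
class Field:
    name: str
    value: str
    confidence: float  # recogniser's confidence in [0, 1]


def batches(forms: list, size: int = BATCH_SIZE) -> list[list]:
    """Split scanned forms into consecutive batches for manual review."""
    return [forms[i:i + size] for i in range(0, len(forms), size)]


def flag_field(field: Field) -> list[str]:
    """Return the reasons (if any) a field should go back to an operator."""
    reasons = []
    if field.confidence < MIN_CONFIDENCE:
        reasons.append("low confidence")
    # Example consistency rule: an age must be a whole number in a sane range.
    if field.name == "age" and not (field.value.isdigit() and int(field.value) <= MAX_AGE):
        reasons.append("age out of range")
    return reasons


forms = [f"form-{n}" for n in range(250)]
print([len(b) for b in batches(forms)])       # [120, 120, 10]
print(flag_field(Field("age", "3O", 0.62)))   # ['low confidence', 'age out of range']
print(flag_field(Field("age", "34", 0.97)))   # []
```

The second example shows why automatic checks complement manual review: a handwritten "0" misread as the letter "O" fails both the confidence test and the numeric rule, so it is routed to an operator instead of silently entering the tables.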

Unlike OCR, the OMR technology used in 1998 could not recognise hand-printed or machine-printed characters. It relied on automated data input from customised paper-based forms.

A common example of OMR usage is in examinations for answering questions with multiple answer choices. Those taking the exam are required to mark their answers on specially printed sheets using either a pencil or a special marker. The data from the sheets is read using the OMR scanner.
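The core of OMR reading can be sketched in a few lines of Python. This assumes the scanner has already measured how dark each bubble is (as a fraction of dark pixels); the 0.5 threshold, the `read_question` function and the sample values are hypothetical and chosen for illustration only.

```python
FILL_THRESHOLD = 0.5  # hypothetical: fraction of dark pixels that counts as "marked"


def read_question(darkness: dict[str, float], threshold: float = FILL_THRESHOLD) -> str:
    """Return the marked choice, or 'BLANK' / 'MULTIPLE' for exceptions."""
    marked = [choice for choice, level in darkness.items() if level >= threshold]
    if not marked:
        return "BLANK"
    if len(marked) > 1:
        return "MULTIPLE"
    return marked[0]


sheet = [
    {"A": 0.05, "B": 0.82, "C": 0.04, "D": 0.06},  # clean mark on B
    {"A": 0.03, "B": 0.07, "C": 0.02, "D": 0.04},  # left blank
    {"A": 0.71, "B": 0.05, "C": 0.66, "D": 0.03},  # two bubbles filled
]
print([read_question(q) for q in sheet])  # ['B', 'BLANK', 'MULTIPLE']
```

Because the reader only checks whether fixed positions on the sheet are dark, it never has to interpret the shape of a mark. That is what makes OMR dependable on specially printed forms, and also why it cannot read handwritten characters.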

Another suggestion floated during the planning phase was to use a tablet-based application for data collection and tabulation, says an official privy to the planning process.

The official says proponents had argued that tablets could not only easily count citizens holding Computerised National Identity Cards (CNICs) but also collect data on those not yet registered with the National Database and Registration Authority (NADRA).

“Enumerators could have been linked to the NADRA system. The Punjab Information Technology Board (PITB) was willing to provide the technological expertise in this regard,” he says.

However, the suggestion was dropped as no consensus could be reached on it. It was argued that the procurement of these tablets would be expensive and time-consuming.

There were also concerns about transparency and credibility of the software used with tablets. “There was not enough time to procure these devices and programme them to suit the needs of the census,” says another official familiar with the matter.

PBS officials overseeing the census say that enumerators are collecting data on two forms.

Form 1 is being used to count houses and Form 2 to count households. The bureau expects to complete the count and release a provisional analysis of the data in two months.
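Tabulation of the two forms can be sketched as an aggregation by area. The Python example below assumes simplified records: the field names (`district`, `house_id`, `members`), the placeholder districts and the counts are hypothetical and do not reflect the actual layout of the PBS forms. It counts houses from Form 1 and households and persons from Form 2.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
form1 = [  # one record per house
    {"district": "District A", "house_id": "A-001"},
    {"district": "District A", "house_id": "A-002"},
    {"district": "District B", "house_id": "B-001"},
]
form2 = [  # one record per household, with its number of members
    {"district": "District A", "house_id": "A-001", "members": 7},
    {"district": "District A", "house_id": "A-001", "members": 4},
    {"district": "District A", "house_id": "A-002", "members": 6},
    {"district": "District B", "house_id": "B-001", "members": 9},
]


def tabulate(houses: list[dict], households: list[dict]) -> dict[str, dict[str, int]]:
    """Aggregate house, household and person counts per district."""
    table = defaultdict(lambda: {"houses": 0, "households": 0, "persons": 0})
    for house in houses:
        table[house["district"]]["houses"] += 1
    for hh in households:
        row = table[hh["district"]]
        row["households"] += 1
        row["persons"] += hh["members"]
    return dict(table)


for district, row in sorted(tabulate(form1, form2).items()):
    print(district, row)
# District A {'houses': 2, 'households': 3, 'persons': 17}
# District B {'houses': 1, 'households': 1, 'persons': 9}
```

Keeping houses and households on separate forms matters because one house can contain several households (house A-001 above holds two), so the two counts answer different questions.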

This information will provide a clear picture of the country’s demographics and end reliance on projections and estimates for a range of purposes, including the delimitation of constituencies and the distribution of parliamentary seats, development funds and tax revenues. It will also lead to more informed policymaking.

This article originally appeared in MIT Tech Review Pakistan and has been reproduced with permission.
