How is data collected and tabulated for Pakistan's 2017 census?

Published April 7, 2017
An official from the Pakistan Bureau of Statistics collects information from a resident during a census as security personnel guard them in Peshawar on March 15.— AFP

A population and household census is underway in the country after a delay of nine years. In the nearly two decades since the last census in 1998, the ways in which census data is collected and analysed have been transformed across the globe.

Historically, censuses have been conducted manually by teams of enumerators and statisticians who gather, compile, and analyse data using paper-based forms. However, this classical method is no longer the only way governments collect and analyse information about their populations.

Census information is now increasingly being gathered through online questionnaires, toll-free telephone numbers, and pre-paid envelopes.

None of these methods is being used in the 2017 census because the Pakistan Bureau of Statistics (PBS), the federal authority responsible for the task, feels there is no guarantee that such questionnaires would be filled out and returned. “Literacy matters,” says a PBS official.

Though the PBS is collecting census data manually, it is using Optical Character Recognition (OCR) technology to convert this data into machine-readable format and transfer it onto computers.

The OCR system provides full alphanumeric recognition of printed or handwritten characters at electronic speed.

The version available with the Bureau has been updated with an Intelligent Character Recognition (ICR) feature allowing recognition of image data, in particular alphanumeric text. It turns images of handwritten or printed characters into ASCII data (machine-readable format).

The OCR technology being used by the Bureau has also been updated to accept data written in Urdu.

The OCR technology is not only effective in converting handwritten or typed characters into machine-readable format for tabulation and compilation; it also helps cut costs.

The United Nations Statistics Division calculates that the use of OCR imaging saves up to two percent of the total cost of a census and requires fewer staff for data analysis.

However, the OCR is not as accurate as the Optical Mark Recognition (OMR) technology used for data collection in the 1998 census. Data-entry operators at the Bureau are, thus, required to check all forms manually before converting them into machine-readable format. The operators work in batches of 120.

The OCR machines also have a built-in automatic error-detection system.
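How the Bureau's error detection works is not described. As a purely illustrative sketch of how recognition output is commonly screened, the Python below flags a field for manual review when the recogniser's confidence is low or when the recognised text fails a simple plausibility check. The `Field` class, the field names, the 0.9 threshold and the age range are all hypothetical and are not taken from the PBS system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    """One recognised field from a scanned form (hypothetical structure)."""
    name: str
    text: str
    confidence: float  # recogniser's confidence, from 0.0 to 1.0


def flag_for_review(fields: List[Field], min_confidence: float = 0.9) -> List[str]:
    """Return the names of fields a human operator should re-check."""
    flagged = []
    for field in fields:
        if field.confidence < min_confidence:
            # The recogniser itself was unsure about this field.
            flagged.append(field.name)
        elif field.name == "age":
            # Plausibility check: age must be a whole number from 0 to 120.
            is_number = field.text.isascii() and field.text.isdigit()
            if not (is_number and 0 <= int(field.text) <= 120):
                flagged.append(field.name)
    return flagged


# Made-up example values, for illustration only.
form = [
    Field("name", "Ayesha", 0.97),
    Field("age", "3S", 0.95),   # "5" misread as "S": fails the number check
    Field("sex", "F", 0.62),    # low confidence
]
print(flag_for_review(form))  # ['age', 'sex']
```

A check of this kind only narrows down where mistakes are likely; it does not replace the manual verification by data-entry operators described above.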

Unlike the OCR, the OMR technology used in 1998 could not recognise hand-printed or machine-printed characters. It featured automated data input using customised paper-based forms.

A common use of OMR is in multiple-choice examinations. Candidates mark their answers on specially printed sheets using either a pencil or a special marker, and an OMR scanner then reads the data from the sheets.
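The principle behind reading such a sheet can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the 1998 census: it assumes a scanner has already measured how dark each bubble is on a scale from 0 (blank) to 1 (fully filled), and the `read_marks` function, the 0.5 threshold and the sample values are hypothetical.

```python
from typing import List, Optional


def read_marks(darkness: List[List[float]], threshold: float = 0.5) -> List[Optional[int]]:
    """For each question row, return the index of the single filled bubble.

    Returns None for a row that is blank or has more than one bubble filled.
    """
    answers: List[Optional[int]] = []
    for row in darkness:
        filled = [i for i, value in enumerate(row) if value >= threshold]
        answers.append(filled[0] if len(filled) == 1 else None)
    return answers


# Made-up darkness readings: one row per question, one value per answer choice.
sheet = [
    [0.05, 0.91, 0.08, 0.03],  # second bubble filled
    [0.88, 0.04, 0.07, 0.02],  # first bubble filled
    [0.06, 0.05, 0.04, 0.03],  # left blank
    [0.10, 0.85, 0.90, 0.05],  # two bubbles filled: ambiguous
]
print(read_marks(sheet))  # [1, 0, None, None]
```

Because the scanner only decides whether a fixed position on the page is marked, it never has to interpret the shape of a character, which is why OMR cannot read hand-printed or machine-printed text.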

Another suggestion floated during the planning phase was to use a tablet-based application for data collection and tabulation, says an official privy to the planning process.

The official says the proponents had argued that the tablet could not only easily count citizens bearing Computerised National Identity Cards (CNICs) but also collect data of those not yet registered by the National Database Registration Authority (NADRA).

“Enumerators could have been linked to the NADRA system. The Punjab Information Technology Board (PITB) was willing to provide the technological expertise in this regard,” he says.

However, the suggestion was dropped as no consensus could be reached on it. It was argued that the procurement of these tablets would be expensive and time-consuming.

There were also concerns about transparency and credibility of the software used with tablets. “There was not enough time to procure these devices and programme them to suit the needs of the census,” says another official familiar with the matter.

PBS officials overseeing the census say that enumerators are collecting data on two forms.

Form 1 is being used to count houses and Form 2 to count households. The bureau expects to complete the count and release a provisional analysis of the data in two months.

This information will provide a clear picture of the country’s demographics and end reliance on projections and estimates for a range of activities, including the delimitation of constituencies and the distribution of parliamentary seats, development funds and tax revenues. It should also lead to better-informed policies.

This article originally appeared in MIT Tech Review Pakistan and has been reproduced with permission.
