The Bank of Urdu and language's survival

E-Paper | June 04, 2026

You may or may not agree with me, but I feel that Urdu's popularity in the 21st century depends largely on its ability to adapt to the force that is going to decide the future of languages: technology.

Urdu may survive in the 21st century as a spoken language, but without machine readability and adaptability to the Internet and the computer, it won't be able to make it beyond a circle that will keep shrinking as far as the traditional ways of reading and writing are concerned. The so-called paperless society may still be a distant dream, but more and more people now like to read their newspapers online. Major newspapers in the West are recording a sharp drop in the circulation of their printed editions and are increasingly depending on their Internet editions for both readers and advertising revenues. Recent surveys confirm that the prophets of doom had it right when they predicted, a few years ago, the death of printed newspapers, and experts now feel that it is a matter of only 20 years, or even less, before we see the Internet replacing printed newspapers.

As far as Urdu is concerned, the situation may not be that grave, but persisting with the ways we read and write might reduce Urdu's radius of influence, if not send it to the grave altogether. All over the world, scientific research on languages is on the rise, and the computer has opened up new vistas for the most complex research in language engineering and linguistics. Computers have also made possible the scientific study of the structures of a language, which requires enormous amounts of data. With the help of computers, this unbelievably huge body of data is available online. Known as a corpus, this data is revolutionising the way we see a language and the way we compile a dictionary.

The Bank of English is one such repository. Established at Birmingham University, the corpus contains about 650 million words of written and spoken material in machine-readable form and is available online to researchers around the world. Though established basically as a repository to provide maximum and reliable information to researchers and lexicographers, the corpus is not a wordlist. It is a collection of material that shows the current and functional use of the English language and gives its inflected orthographic forms. The Collins Cobuild English Dictionary is virtually based on the corpus, and its fifth edition, which appeared in 2006, incorporates the changes in the use of the language that the corpus recorded over the years. The Corpus of Contemporary American English (COCA) contains some 385 million words. And these figures, mind you, may be outdated in, say, a month, as about a million new texts are added to it every month.
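The difference between a wordlist and a corpus can be illustrated with a minimal Python sketch. The three sample sentences and the function names below are invented for illustration and are not taken from The Bank of English or its software. A wordlist only counts forms, while a corpus keeps every occurrence of a word, including its inflected forms, in its context.

```python
import re
from collections import Counter

# Hypothetical sample sentences, assumed for illustration only;
# a real corpus holds hundreds of millions of words.
SAMPLE_TEXTS = [
    "She banks with a local branch.",
    "The river bank was flooded last week.",
    "Two banks merged, and the new bank opened today.",
]

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def wordlist(texts):
    """A plain wordlist: each distinct form with its frequency, no context."""
    return Counter(tok for text in texts for tok in tokenize(text))

def concordance(texts, forms, width=2):
    """Keyword-in-context lines: each occurrence of a form with its neighbours."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok in forms:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    return lines

if __name__ == "__main__":
    # The wordlist says only how often each form occurs.
    print(wordlist(SAMPLE_TEXTS)["bank"], wordlist(SAMPLE_TEXTS)["banks"])
    # The concordance shows how the forms are actually used.
    for left, key, right in concordance(SAMPLE_TEXTS, {"bank", "banks"}):
        print(f"{left:>12} [{key}] {right}")
```

A lexicographer reading the concordance lines can tell the verb "banks" from the noun "bank", which the bare counts alone cannot show.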

Now what haunts an 'Urduwala' like me is where Urdu stands in this world of cyber linguistics and computational grammar that is expanding at breakneck speed. You may be tempted to think that, as in other areas, we are lacking in this field, too. But all is not lost. There are some computer geeks who are also lovers of Urdu, though this is an extremely rare, or rather unlikely, combination to come by. Urdu and the computer are going hand in hand, aiming to catapult the language into the future. Efforts have been afoot to establish The Bank of Urdu, along the lines of the English corpuses.

The Centre of Excellence for Urdu Informatics, established at the National Language Authority (NLA), or Muqtadira Qaumi Zaban, under Dr Atash Durrani, has been working for the establishment of an Urdu Data Bank. At a workshop held at NLA in 2008, it was suggested that the name be changed, since the Urdu Dictionary Board (UDB) had the same initials. It was then decided that the Urdu corpus would henceforth be called The Bank of Urdu (TBU) and that Urdu Misaal Ghar would be its Urdu equivalent. This database, working along the lines of The Bank of English and the COCA, would store Urdu texts in machine-readable form so as to facilitate researchers working on the patterns of Urdu and the changes in its usage. This, in turn, can be an invaluable source for the lexicographers of Urdu. Though the Urdu Dictionary Board's 22-volume Urdu dictionary serves as an Urdu corpus, there are two hitches: firstly, it is not machine-readable and, secondly, this dictionary usually takes into account literary usage, while a corpus is supposed to draw examples from different and current sources, including TV, radio, interviews and informal talks.

Dr Sarmad Hussain and Dr Madeeha Aijaz wrote a very informative and useful article, 'Corpus Based Urdu Lexicon Development', in CLT07, the journal published by Peshawar University's Computer Science department. Dr Sarmad Hussain has done commendable work on Urdu software development and machine-readable Urdu. Another person who has relentlessly been advocating the case for an Urdu corpus is Dr Hafiz Safwan Mohammad Chohan. He has not only written many an article on the issue, but has also worked for his PhD under Prof John McHardy Sinclair (1933-2007), the professor of Modern English Language at Birmingham University and the moving spirit behind the British National Corpus (BNC) and the Collins Cobuild dictionaries.

Hafiz Sahib's latest article on the issue of The Bank of Urdu has appeared in Bahauddin Zakaria University's Journal of Research. Let me say a few words about the journal here. Published by the department of Urdu at BZU, Multan, the journal is among those that carry valuable research papers, and every issue has something to offer. The just-published current issue (2008, volume 14) carries, among others, an absorbing research paper on 19th century Urdu chronograms. Written jointly by Dr Anwaar Ahmed and Abrar Abdus Salam, the paper gives some very interesting chronograms and their background.

Hafiz Safwan's article in the journal emphasises the need for establishing an Urdu corpus along the lines of the COCA and The Bank of English which, according to him, “are serving as the backbone of English language engineering, discourse analysis, corpus and lexicon development”.

This proposed Urdu corpus, The Bank of Urdu (TBU), he says, “will be a repository of Urdu texts of both written and spoken language gathered in platform-independent and machine-readable Indo-Perso-Arabic script.... Aimed at discourse analysis, language engineering and natural language processing in Urdu and, of course, providing a vital base for contemporary Urdu lexicon development, this proposed portal will not only separate the Urdu language from Urdu literature, but will also cast regional Pakistani languages in stationing their scholarly resources in their own scripts for such researches”.

drraufparekh@yahoo.com
