The Bank of Urdu and language's survival

May 05, 2009


Though you may or may not agree with me, I feel that Urdu's popularity in the 21st century depends largely on its ability to adapt to the force that is going to decide the future of languages: technology.

Urdu may survive in the 21st century as a spoken language, but without machine readability and adaptability to the Internet and the computer, it won't be able to make it beyond a circle that will keep shrinking as far as the traditional ways of reading and writing are concerned. The so-called paperless society may still be a distant dream, but more and more people now like to read their newspapers online. Major newspapers in the West are recording a sharp drop in the circulation of their printed editions and are increasingly depending on their Internet editions for both readers and advertising revenues. Recent surveys confirm that the prophets of doom had it right when they predicted, a few years ago, the death of printed newspapers, and the experts now feel that it is a matter of only 20 years, or even less, before we see the Internet replacing them.

As far as Urdu is concerned, the situation may not be that grave, but persisting with the way we read and write might reduce Urdu's radius of influence, if not send it to the grave altogether. All over the world, scientific research on languages is on the rise, and the computer has opened up new vistas for the most complex research in language engineering and linguistics. What computers have made possible includes the scientific study of the structures of a language, which requires enormous amounts of data. With the help of computers, this unbelievably huge body of data is available online. Known as a corpus, it is revolutionising the way we see a language and the way we compile a dictionary.

The Bank of English is one such repository. Established at Birmingham University, the corpus contains about 650 million words of written and spoken material in machine-readable form and is available online to researchers around the world. Though established basically as a repository to provide maximum and reliable information to researchers and lexicographers, the corpus is not a wordlist. It is a collection of material that shows the current and functional use of the English language and gives its inflected orthographic forms. The Collins Cobuild English Dictionary is virtually based on the corpus, and its fifth edition appeared in 2006, incorporating the changes in the use of the language that the corpus had recorded over the years. The Corpus of Contemporary American English (COCA) contains some 385 million words. And these figures, mind you, may be outdated in, say, a month, as about a million new texts are added to it every month.

Now what haunts an 'Urduwala' like me is where Urdu stands in this world of cyber linguistics and computational grammar that is expanding at breakneck speed. You may be tempted to think that, as in other areas, we are lacking in this field too. But all is not lost. There are some computer geeks who are also lovers of Urdu, though this is an extremely rare, or rather unlikely, combination to come by. Urdu and the computer are going hand in hand, aiming to catapult the language into the future. Efforts are afoot to establish The Bank of Urdu, along the lines of the English corpora.

The Centre of Excellence for Urdu Informatics, established at the National Language Authority (NLA), or Muqtadira Qaumi Zaban, under Dr Atash Durrani, has been working for the establishment of an Urdu Data Bank. At a workshop held at the NLA in 2008, it was suggested that the name be changed, as the Urdu Dictionary Board shared the initials UDB. It was then decided that the Urdu corpus would henceforth be called The Bank of Urdu (TBU), with Urdu Misaal Ghar as its Urdu equivalent. This database, working along the lines of The Bank of English and the COCA, would store Urdu texts in machine-readable form so as to help researchers working on the patterns of Urdu and the changes in its usage. This, in turn, can be an invaluable source for the lexicographers of Urdu. Though the Urdu Dictionary Board's 22-volume Urdu dictionary serves as an Urdu corpus, there are two hitches: firstly, it is not machine-readable and, secondly, it usually takes into account literary usage, while a corpus is supposed to draw examples from different and current sources, including TV, radio, interviews and informal talks.

Dr Sarmad Hussain and Dr Madeeha Aijaz wrote a very informative and useful article, 'Corpus Based Urdu Lexicon Development', in CLT07, the journal published by Peshawar University's Computer Science department. Dr Sarmad Hussain has done commendable work for Urdu software development and machine-readable Urdu. Another person who has relentlessly been advocating the case for an Urdu corpus is Dr Hafiz Safwan Mohammad Chohan. He has not only written many an article on the issue, but also worked for his PhD under Prof John McHardy Sinclair (1933-2007), the professor of Modern English Language at Birmingham University and the moving spirit behind the British National Corpus (BNC) and the Collins Cobuild dictionaries.

Hafiz Sahib's latest article on the issue of The Bank of Urdu has appeared in Bahauddin Zakaria University's Journal of Research. Let me say a few words about the journal here. Published by the department of Urdu at BZU, Multan, the journal is among those that carry valuable research papers, and every issue has something to offer. The just-published current issue (2008, volume 14) carries, among others, an absorbing research paper on 19th century Urdu chronograms. Written jointly by Dr Anwaar Ahmed and Abrar Abdus Salam, the paper gives some very interesting chronograms and their background.

Hafiz Safwan's article in the journal emphasises the need for establishing an Urdu corpus along the lines of the COCA and The Bank of English which, according to him, “are serving as the backbone of English language engineering, discourse analysis, corpus and lexicon development”.

This proposed Urdu corpus, The Bank of Urdu (TBU), he says, “will be a repository of Urdu texts of both written and spoken language gathered in platform-independent and machine-readable Indo-Perso-Arabic script.... Aimed at discourse analysis, language engineering and natural language processing in Urdu and, of course, providing a vital base for contemporary Urdu lexicon development, this proposed portal will not only separate the Urdu language from Urdu literature, but will also cast regional Pakistani languages in stationing their scholarly resources in their own scripts for such researches”.