Pashto making inroads into modern computational linguistics

Published July 24, 2019
Prof Omar Usman Marwat. — Dawn

PESHAWAR: Languages around the world were being integrated into technology to serve communities better and Pashto was fast heading towards achieving that integration, said experts.

They said that it would open up new avenues and possibilities for scholars and linguists worldwide, and that the work would also help expand analysis of Pashto and its dialects, develop new software tools and pave the way for standardised scripts.

The most recent achievements in this regard were the activities carried out under an MoU between the Pashto Academy, University of Peshawar, and the FAST-NUCES Centre for Computational Linguistics (CoCL), signed earlier this year.

Prof Omar Usman Marwat, an expert on computational linguistics, told this scribe that the collaboration had already resulted in the preparation of an online dictionary and thesaurus of Pashto, Pashto part-of-speech taggers, and a Pashto Treebank for grammar checking. He said that the work would eventually contribute to the fields of Pashto dialectology, machine translation, lexicology, morphology and phonology, and would be available online under open licences.

Expert says 20 per cent vocabulary of online dictionary already uploaded on Pashto Academy’s website

Prof Omar said that the ‘Pashto to Pashto’ dictionary contained a total of 98,000 words and terminologies, prepared painstakingly over many decades by the Pashto Academy, while efforts were underway to make that wealth of information available online within the current year.

He said that about 20 per cent of the vocabulary had already been uploaded and could be searched on the Pashto Academy’s website.

The expert stated that field visits for the preparation of an online speech corpus had also been carried out, and that hundreds of speech samples of the Afridi, Shinwari, Malagori, Khattak, Yousafzai, Peshawar, Charsadda, Swabi, Mardan, Swat, Shangla, Wazir, Mehsud, Marwat, Betani, Kakar, Banusi and Dawar dialects had been acquired and were being transcribed according to the International Phonetic Association (IPA) standard.

“Dialect analysis by linguists and their findings had already been presented at a dialect conference in Baragali earlier this month. The speech corpus will be made online and will pave way for Pashto dialect analysis and development of speech recognition software,” said Prof Omar.

Experts said that the provision of those basic datasets was important for the shift towards Natural Language Processing (NLP). They said that NLP was an interdisciplinary field that used artificial intelligence techniques to help computers read, decipher and understand human languages in a manner that was valuable and beneficial to the community.

Prof Omar said that the provision of basic tools and datasets would elevate the status of Pashto from a resource-limited to a resource-rich language, because those developments could then open opportunities for the research community to create software that could convert text to speech, speech to text, images to text, and electro-medical signals to various actions.

Published in Dawn, July 24th, 2019
