Pashto making inroads into modern computational linguistics

Published July 24, 2019
Prof Omar Usman Marwat. — Dawn

PESHAWAR: Languages around the world were being integrated into technology to serve communities better and Pashto was fast heading towards achieving that integration, said experts.

They said that it would open up new avenues for scholars and linguists worldwide, and that the work would help to expand analysis of Pashto and its dialects, develop new software tools and pave the way for standardised scripts.

The most recent achievements in this regard were the activities carried out under a memorandum of understanding (MoU) between the Pashto Academy, University of Peshawar, and the FAST-NUCES Centre for Computational Linguistics (CoCL), signed earlier this year.

Prof Omar Usman Marwat, an expert on computational linguistics, told this scribe that the collaboration had already resulted in the preparation of an online dictionary and thesaurus of Pashto; Pashto part-of-speech taggers; and a Pashto Treebank for grammar checking. He said that the work would eventually contribute to the fields of Pashto dialectology, machine translation, lexicology, morphology and phonology, and would be available online under open licences.

Expert says 20 per cent vocabulary of online dictionary already uploaded on Pashto Academy’s website

Prof Omar said that the ‘Pashto to Pashto’ dictionary contained a total of 98,000 words and terminologies, prepared painstakingly over many decades by the Pashto Academy, while efforts were underway to make that wealth of information available online within the current year.

He said that about 20 per cent of the vocabulary already uploaded online could be searched on the Pashto Academy’s website.

The expert stated that field visits for the preparation of an online speech corpus had also been carried out, and that hundreds of speech samples of the Afridi, Shinwari, Malagori, Khattak, Yousafzai, Peshawar, Charsadda, Swabi, Mardan, Swat, Shangla, Wazir, Mehsud, Marwat, Betani, Kakar, Banusi and Dawar dialects had been acquired and were being transcribed according to the International Phonetic Association (IPA) standard.

“Dialect analysis by linguists and their findings have already been presented at a dialect conference in Baragali earlier this month. The speech corpus will be made available online and will pave the way for Pashto dialect analysis and development of speech recognition software,” said Prof Omar.

Experts said that provision of those basic datasets was important for the shift towards Natural Language Processing (NLP). They said that NLP was an interdisciplinary field, which made use of artificial intelligence techniques to help computers read, decipher and understand human languages in a manner that was valuable and beneficial to the community.

Prof Omar said that provision of basic tools and datasets would elevate the status of Pashto from a resource-limited to a resource-rich language, because those developments could then open opportunities for the research community to create software that could convert text to speech, speech to text, images to text, and electro-medical signals to various actions.

Published in Dawn, July 24th, 2019
