Monday, 16 June 2008

Meet Mr. Vijay K. Malhotra

Semantics – The Art of Language Computing
Vijay K. Malhotra, a veteran of several years’ service with Indian Railways, now pursues his passion for Indian languages with the BhashaIndia team. This interview highlights Mr. Malhotra’s work in the field of semantics and his association with the development of Office XP in Hindi.
When did you develop an interest in languages?
V: My interest in languages dates back to my early education at Gurukul Kangri University, Hardwar, with its special emphasis on Sanskrit and Hindi, but it matured at the University of York, UK, where I taught Hindi to students from various ethnic groups across the globe, such as the British, Indians and Africans. There, for the first time, I realized the importance of linguistics and language technology for teaching a foreign language to non-native speakers.
Which feature of languages and their technology makes you passionate about them?
V: When I look at the core of languages, I find two distinct kinds of features in any language of the world: universal features and language-specific features. Universal features are those common to all languages, even ones as different from each other as Hindi, Tamil, English, Chinese and Arabic. For example, 'khaayaa' (ate) is a transitive verb, which requires an object that is eaten and a subject who eats. It also requires its subject to be animate. The tree structure of this verb and its universal features, such as transitivity, are common to all languages of the world, but the use of 'ne' with the subject of this verb is a language-specific feature of Hindi. Certain features are also specific to a language group. For example, the use of 'ko' (or its equivalent) with the subject in a specific sentence pattern is common to all languages in India: 'Raam Ko Bukhaar Hai' in Hindi, 'Malaa Taap Aahe' in Marathi, 'Ramakku Jwaram' in Tamil, 'Raamannu Paniyaa Nu' in Malayalam, 'Ramanige Jvar Ide' in Kannada and 'Ramer Taap Aachhe' in Bengali. In English, however, this is translated as 'Ram has a fever', where you will notice the conspicuous absence of anything corresponding to 'ko'. This shows that India is one single linguistic zone. If we use technology to analyze these features, we can come up with sophisticated language tools such as a Language Tutor, Auto Correct, a Grammar Checker and even machine translation systems for Indian languages. This is how and why I am passionate about languages and their technology.
Did your family background play any role in your interest in linguistics?
V: Although I belong to a publishing family, it was my Gurukul background of studying Hindi and Sanskrit for over 14 years that played a crucial role in developing my interest in languages. My interest in linguistics, however, developed while teaching Hindi to non-native speakers in the UK and grew over time as I met the various challenges of implementing Hindi in Government offices and PSUs.
What made you retire voluntarily from Indian Railways and go back to your passion?
V: Indian Railways is one of the biggest organizations in the world, with a workforce of over 16 lakh employees, and it has direct contact with the common man. As Director (Official Languages), I was responsible for introducing Hindi into the day-to-day working of Indian Railways. This gave me an opportunity to find ways and means of implementing Hindi at various levels across the country. I was also aware that the atmosphere in the southern states was not very congenial to implementing Hindi, and hence I decided to use both culture and technology to promote it. We used theatre to promote Hindi in non-Hindi-speaking areas, we encouraged the use of regional languages at all points of contact with the public, and finally technology helped us inculcate the habit of using computers for various language applications in the Railways, including reservation charts. In spite of this initial success, I found that language tools such as a Language Tutor, Grammar Checker and Machine Translation System were still not available in the market, and I felt that, given a chance, I would devote the rest of my life to narrowing the digital divide between the language (Hindi) and technology. Microsoft offered me this opportunity, and now I am back to my initial passion.
What initiatives did you take to spread Hindi computing at the grassroots level?
V: The employees of Indian Railways at the grassroots level in Region A (Hindi-speaking regions) and Region B (Maharashtra, Punjab and Gujarat) are well versed in Hindi and wanted English to be replaced with Hindi for day-to-day working as soon as possible, so we achieved initial success. With the introduction of computerization, however, all applications such as reservation charts, salary bills, correspondence, notifications, tenders and contracts were once again being prepared in English, even in Hindi-speaking areas, because the computers did not have proper facilities for working in Hindi. So there was a new challenge: to develop computer applications in Hindi. In consultation with the General Managers of the zonal railways, we decided to prioritize the applications meant for the common man, such as reservation charts, railway tickets, forms, salary bills, circulars, gazette notifications and identity cards for Class 3 and Class 4 employees. Since computer applications at that time were confined to word processing, we had a tough time preparing reservation charts and computerized railway tickets in Hindi on proprietary systems. Certain applications, such as seniority lists and sorting, were still not possible in Hindi because of the lack of language tools for Hindi. The incompatibility of the various systems used in Indian Railways was another reason for the limited use of applications in Hindi. The advent of Unicode and XML for Indian languages raised hopes of developing computer applications in Hindi, especially at the grassroots level.
How have people, especially in rural areas, responded to these initiatives?
V: The initiative of Reservation Charts in Hindi was welcomed by the common man in rural as well as semi-urban areas. Since this facility was made available all over India, the response of passengers across the country was unprecedented. In non-Hindi-speaking areas, the secret of our success was to use the respective regional languages along with Hindi and English. This was possible only through the extensive use of computer technology.
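The sorting problem mentioned above becomes tractable once text is stored in Unicode. As a minimal sketch (the names below are hypothetical sample data, not taken from any railway system), Python's built-in `sorted` compares strings by code point, and for simple Devanagari words this roughly follows the traditional alphabetical order. It is only an approximation; a real seniority list would need proper language-aware collation.

```python
# Hypothetical sample names for a seniority list (illustrative data only).
names = ["राम", "अमित", "कमल", "गीता"]

def sort_by_code_point(items):
    """Sort strings by Unicode code point (Python's default string order)."""
    return sorted(items)

sorted_names = sort_by_code_point(names)
print(sorted_names)
```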
Do you think that the spread of Hindi computing has been successfully replicated for other Indian languages?
V: Since most Indian scripts originated from Brahmi, there is a lot of commonality among them in terms of their phonetic structure. The scientists working on Indian languages realized this, and hence INSCRIPT was evolved as a common keyboard for all Indian languages. ISCII was a common standard for Indian languages, and now Unicode has incorporated ISCII to cover all these languages. Unity in Diversity also holds true for Indian scripts and languages. Hence I am convinced that the spread of Hindi computing can be successfully replicated for other Indian languages.
You have been involved with projects like developing a Hindi Parser with the University of Pennsylvania and a Hindi Thesaurus with the University of York. Could you tell us something about these two projects?
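The commonality described here is visible in Unicode itself: the blocks for the major Indian scripts follow a parallel, ISCII-derived layout, so a letter generally sits at the same offset in each block. The following is a rough sketch under that assumption; the helper name `shift_script` is hypothetical, and the mapping is naive rather than a true transliteration, since not every slot is assigned in every script (Tamil, for instance, has no aspirated letters).

```python
import unicodedata

# Start of each script's block in Unicode. The Indic blocks follow a
# parallel, ISCII-derived layout, so a letter keeps its offset across blocks.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gujarati": 0x0A80,
    "tamil": 0x0B80,
    "kannada": 0x0C80,
}

def shift_script(text, target, source="devanagari"):
    """Map each character to the same offset in the target script's block."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        offset = ord(ch) - src
        if 0 <= offset < 0x80:
            candidate = chr(dst + offset)
            # Keep the original character if the target slot is unassigned.
            ch = candidate if unicodedata.name(candidate, "") else ch
        out.append(ch)
    return "".join(out)

print(shift_script("राम", "bengali"))
print(shift_script("राम", "tamil"))
```

Characters outside the source block, such as Latin letters or spaces, pass through unchanged.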
V: In 1984, I was offered the Nuffield Fellowship to teach Hindi at the University of York, UK, for one semester. Along with teaching, I was asked to take up a project to develop a prototype of a Hindi Thesaurus. Although it was a computational linguistics project, I was not aware of the computational aspects of developing it. Nor did I have formal training in linguistics, so I decided to prepare the Thesaurus manually. The university authorities, however, arranged initial computer training for me and asked me to use the Oxford Concordance Manual for preparing the concordance and indexing (sorting). The ideal way would have been to pick the basic words from corpora, but in view of the limited time and my lack of computational knowledge and skill, I decided to pick only 2000 basic words and find synonyms and antonyms for them. It was also decided to group them under 14 semantic fields in the form of a concordance. This was how we developed a small prototype of a Hindi Thesaurus. In 1996, the Computer Science Department of the University of Pennsylvania, USA, invited me to assist their NLP group in developing a Hindi Parser. I considered this a great honour, as the invitation came from the university that was responsible for developing ENIAC, the first general-purpose electronic computer. TAG (Tree Adjoining Grammar) was my favourite formalism, and it was developed by Prof. Aravind Joshi, Head of the Computer Science Department of this University. I have always found it quite suitable for handling multiple languages with different syntactic structures, such as Hindi and English: English is a positional language, whereas Hindi has relatively free word order.
For example, if you change the word order of the sentence "Ram (Subject) killed (verb) Ravan (Object)" by swapping the subject and the object, giving "Ravan (Subject) killed (verb) Ram (Object)", the meaning changes completely. In Hindi, however, the meaning remains the same even after the word order is changed: "राम (Subject) ने रावण (Object) को मारा (verb)" and "रावण (Object) को राम (Subject) ने मारा (verb)". TAG handles both languages on the basis of the verb and captures the basic order as SVO in English and SOV in Hindi. The domain selected for the project was Officialese (the language used in administration). The features of Officialese are similar across languages. For example, the use of past participles is quite common in Officialese: "Mr. Verma has been transferred from Delhi to Mumbai with effect from March 1, 2005 and posted as Director (Operations)". But a typical feature of Hindi is its honorific usage, as with Shri Verma. In English, it is enough to use Mr. before the name of the person to show respect, and the verb remains singular. In Hindi, however, even the verb changes to the plural: श्री वर्मा निदेशक हो गए (plural). With these examples, I want to make it clear that unless language-specific features are addressed while developing the parser, language tools such as a machine translation system cannot be developed successfully. This parser was found quite useful for analyzing language-specific features in Hindi.
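The word-order point can be illustrated with a toy sketch. This is not TAG and not the parser described in the interview; it is a hypothetical helper (`find_roles`) that simply reads the subject and object off the case markers 'ne' and 'ko', assuming a simple verb-final sentence like the two examples above.

```python
def find_roles(sentence):
    """Toy role labelling: read subject and object off the case markers."""
    tokens = sentence.split()
    roles = {"verb": tokens[-1]}  # assumes a verb-final sentence
    for prev, tok in zip(tokens, tokens[1:]):
        if tok == "ने":      # 'ne' follows the subject in this pattern
            roles["subject"] = prev
        elif tok == "को":    # 'ko' follows the object in this pattern
            roles["object"] = prev
    return roles

a = find_roles("राम ने रावण को मारा")
b = find_roles("रावण को राम ने मारा")
print(a == b)
```

Both word orders yield the same roles because the markers, not the positions, carry the information. A real parser would need far more, since 'ko' also appears in other patterns.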
You have translated Windows XP into Hindi. How was the multilingual support of Windows XP helpful during the translation process, especially while translating the hardcore technical terms?
V: My job was to coordinate and moderate the translations undertaken by various vendors. Before taking up the translation, I drafted some principles and got them approved by Redmond to maintain uniformity throughout the system. Accordingly, the hardcore technical terms were divided into two groups. Terms that denote a concept normally need to be used with multiple derivatives, so they were translated into Hindi. For example, the term "Format > स्वरूप" has derivatives such as "Formatted > स्वरूपित", and "Print > मुद्रण" has derivatives such as "Printed > मुद्रित". English terms already popular in Hindi, such as File, Computer, Window, Office, Bullet and Font, were not translated but transliterated into Hindi. Certain acronyms were also transliterated because of their popular usage, but proper care had to be taken in writing them in Hindi. For example, ROM has been transliterated as रॉम and not as राम. In Hindi, most people normally avoid the "ardhchandra", writing, for example, डाक्टर. But we decided to use it to avoid ambiguity. With the inclusion of Hindi in the multilingual support of Windows XP, the translation process for other Indian languages became easier because of the commonality of these languages, but for us it was not of any help. We had to start everything afresh.
What led you to develop language tools like Auto Correct for Hindi?
V: While reviewing various features of Office XP in Hindi, I found that there was a lot of scope for modifying the existing Auto Correct feature for Hindi. At the outset, I clarified that Auto Correct is not a script-specific feature but a language-specific one. For example, the Devanagari script is used for both Hindi and Marathi, but their spelling conventions are quite different. Even words derived from Sanskrit and common to both languages are spelt differently. Most words ending with short "i" in Hindi are spelt with long "ii" in Marathi. For example, कवि is spelt with short "i" in Hindi, but in Marathi it is spelt कवी, with long "ii". How can there be a common Auto Correct for both Hindi and Marathi? Besides, I collected samples from various Hindi-speaking as well as non-Hindi-speaking regions to understand the patterns of errors made by speakers of different languages. The errors made by Marathi speakers are quite different from those made by Punjabi speakers. Since Hindi is used in most parts of India, there is a lot of variation in the pattern of errors. While Marathi and Gujarati speakers confuse long "ii" and short "i", speakers of South Indian languages make mistakes with aspirated sounds: they write भाषा as बाषा and खाना as काना. This is because of mother-tongue interference. Finally, it may be noted that the modified Auto Correct developed by me is yet to be uplinked.
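The ROM example comes down to a single vowel sign. As a small sketch (the helper `differing_signs` is hypothetical), Python's standard `unicodedata` module can show which character separates the two spellings: the "ardhchandra" form uses the sign that Unicode names CANDRA O, while the name राम uses the ordinary AA sign.

```python
import unicodedata

ROM_HINDI = "\u0930\u0949\u092e"  # रॉम, the transliteration of ROM
RAM_NAME = "\u0930\u093e\u092e"   # राम, the spelling to be avoided for ROM

def differing_signs(a, b):
    """Return the Unicode names of characters that differ position by position."""
    return [
        (unicodedata.name(x), unicodedata.name(y))
        for x, y in zip(a, b)
        if x != y
    ]

print(differing_signs(ROM_HINDI, RAM_NAME))
```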
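The argument that Auto Correct must be keyed to the language rather than the script can be sketched as follows. This is an illustrative assumption about how such a table might look, not the tool described in the interview; the table holds only word pairs mentioned in the text, and `autocorrect` is a hypothetical helper. The खाना/काना pair is left out, since a word-for-word table cannot judge cases where the misspelling may itself be a valid word.

```python
# Hypothetical correction tables built only from examples in the text.
# Same script (Devanagari), different language, opposite corrections.
AUTOCORRECT = {
    "hi": {"कवी": "कवि", "बाषा": "भाषा"},  # Hindi: short "i"; aspirated "bh"
    "mr": {"कवि": "कवी"},                  # Marathi: long "ii"
}

def autocorrect(text, lang):
    """Replace whole words using the table for the given language code."""
    table = AUTOCORRECT.get(lang, {})
    return " ".join(table.get(word, word) for word in text.split())

print(autocorrect("कवी", "hi"))
print(autocorrect("कवि", "mr"))
```

The same input word is corrected in opposite directions depending on the language, which a single script-level table could not do.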
What are your future plans for Hindi on the Bhasha India portal?
V: In its present form, the Hindi section of the Bhasha India portal is just a Hindi translation of a portal originally conceived in English, with no direct interaction with the real users of Hindi computing. Our future plan is to make it a forum for those users who think and write in Hindi and use computers to achieve their purpose. I also do not assume that they know English. At present, most Hindi users are still using non-standard fonts, and because of incompatibility across systems and fonts, they do not attempt to use multiple applications such as e-mail, chat, templates, auto text, thesaurus and spell checkers. Very few users attempt data applications such as Excel and Access in Hindi. PowerPoint is also not commonly used in Hindi. This is because there was no common standard for Hindi across systems. ISCII was a good beginning in this direction, but in a globalized world we need a global standard under which all the languages of the world can co-exist, irrespective of platforms, fonts and systems. Unicode is the answer to all these problems. Hence our endeavour will be to make users aware of the advantages of Hindi computing in Unicode. We will also discuss all their problems on our forum. To encourage those using new applications, we will launch various incentive schemes. I am sure the Hindi section of bhashaindia.com will become the forum of the real users of Hindi computing.
