Language Technology: The Present Scenario

"Information Technology deals with the acquisition, organisation, storage, processing, transmission and delivery of information."

Language Technology refers to a field of computational linguistics aimed at enabling computers to understand and process human (natural) language input. At present, human-machine interaction is heavily biased in favour of the machine. The mouse and the keyboard are the primary input devices, and the visual display unit is the primary output device. Using such interfaces requires special skills and mental aptitude. This machine-centric communication needs to evolve into human-centric interfaces if the technology is to be available to all. Sight is the most effective faculty for capturing information, while speech is the most preferred and convenient means of communication.

Information Technology is centred mainly on the use and application of the English language. Yet, over the past decade or two, the need to use vernacular languages in computer applications has gained importance, and substantial progress has been made in this field. Using local languages in computational applications has several benefits: it helps the masses reap the benefits of technology with maximum efficiency, and it opens up a vast market for gadgets and other technology-based applications. For example, India is, presently and potentially, a huge market for mobile phones, but a large percentage of Indian mobile phone users are not proficient in English and are therefore unable to use all the facilities their devices provide, despite paying for them.

We can sum up the above as follows:

  • IT is one of the most important and unavoidable elements of modern society.
  • Because IT is chiefly based on English, many people are unable to benefit from the technology for lack of proficiency in English.

Taking the above into consideration, initiatives to use local languages in computing have been under way since the late 1980s. It is interesting to note that attempts to communicate with machines through conventional modes, such as verbal and visual ones, date back to the 1960s, with the development of systems like ELIZA and SHRDLU. However, those early efforts bore little fruit because the processing power of the machines was very limited. In the meantime, computer technology maintained steady growth in both hardware and software, and Language Technology took a back seat, owing to the lack of the intelligent and efficient technology it required. During the 1980s, as machines gained sufficient processing power and user-friendliness, Language Technology started to gain momentum once again.

Several attempts were made, with moderate or little success, in different languages: first to write and display local languages on the PC, and then to build spell checkers, machine translators, OCR systems and so on. With the advent of the Internet, it became necessary for coding systems to be universal, and Unicode took centre stage over the other encoding standards.

INDIC LANGUAGES:

India, being a place of diverse cultures and numerous languages and dialects, is an ideal setting for testing the effectiveness of Language Technology. There are over a thousand languages with well-defined grammars from language families such as Indo-Aryan, Dravidian, Austro-Asiatic, Tibeto-Burman and Andamanese, as well as innumerable dialects. Hindi is the Official Language (OL), and English, though not an Indian language, is the Associate Official Language (AOL). The Indian Constitution also gives everyone an equal opportunity to use and preserve the various Indian languages in whatever ways possible, and treats this as one of the fundamental rights of every citizen. The relevant clause, Article 29(1), says,

"Any section of the citizens residing in the territory of India or any part thereof having a distinct language, script or culture of its own, shall have the right to conserve the same."

Considering the need to develop Language Technology against the backdrop of a complex linguistic scenario, the Government of India formed the Technology Development for Indian Languages (TDIL) programme under the Ministry of Information Technology (MIT) and UNDP, for the purpose of encouraging and funding IT initiatives in Indian languages and knowledge-based systems. The TDIL website says,

"India is a multilingual country, with 22 official languages and 12 scripts. In India only about 5% people know English and rest are deprived of benefits of information technology development. The benefits of information technology can reach to the common man only when software tools and human machine interface systems are available in people’s own languages. To enable wide proliferation of ICT in Indian languages, tools, products and resources should be freely available to the general public."

Under the TDIL programme, the Indian Institutes of Technology (IITs), the Indian Institutes of Information Technology (IIITs), the Centre for Development of Advanced Computing (C-DAC), the Indian Institute of Science (IISc), the Indian Statistical Institute (ISI) and Jawaharlal Nehru University (JNU), among others, have made significant contributions to the field. Other organisations, such as the Tata Institute of Fundamental Research (TIFR) and the private-sector Tata Consultancy Services (TCS), have also participated in Indian language technology R&D. India has developed the Indian Standard Code for Information Interchange (ISCII), a coding scheme for representing the various writing systems of India, and the INSCRIPT Keyboard Layout to facilitate the input and processing of Indic scripts.

Apart from this, India is a voting member of the Unicode Consortium, in which a few Indian states have also participated individually. However, compared to the rest of the world, Language Technology in India lags behind in several fields, with substantial differences in the level of adoption among Indian languages. Hindi, written in the Devanagari script, leads the race, being the Official Language of India and having the largest number of speakers, followed by languages like Telugu, Kannada and Bengali. Bengali has the distinct advantage of being the sole national language of Bangladesh and thereby receiving the undivided attention of government resources. Moreover, several disputes remain unresolved because of the lack of standardization and because of ethnic, geographic, social and political concerns, and these are hampering the development of technology for Indic languages.

THE LOCAL LANGUAGES OF ASSAM AND THE NE REGION OF INDIA:

The NE region of India, and Assam in particular, reflects the problems of the Indian Language Technology scenario in their most acute form. The situation of the NE languages with respect to language technology can be summarized as follows:

  • No North-East Indian language has sufficient tools suitable for modern technology.
  • Assamese Unicode and other typing systems are still largely based on tools for Bengali, as the two scripts are very similar.
  • Web content in North-East Indian languages is minimal. For example, Assamese content on the World Wide Web is embarrassingly scarce, considering that it is the mother tongue of 1.3 crore (13 million) people (as per the 2001 census).
  • Awareness of the whole issue among the people is nominal.
  • There is no organized effort to co-ordinate the Government and the voluntary bodies and individuals working in the field of language technology.
  • Almost none of the major players in IT have had any Assamese-specific releases; popular websites such as Google and Facebook are not available in Assamese, unlike in many of its counterpart languages.
  • Standardization is lacking in the basic linguistic conventions themselves.
  • And so on.

In such a scenario, one can conclude that Assamese and the other NE languages are not yet ready to take up the challenge of Language Technology, and in some cases the need for it is not even being actively considered. Human civilization is now in a state of transition in the field of knowledge sharing, moving from print to electronic media. Because of the predominance of English, and of other commercially lucrative languages, many local languages and dialects are slowly moving towards extinction, causing anguish and discontent among linguistic groups. The TDIL website describes the scenario as follows:

"Some scholars claim that there has been traditionally no organized effort for language policies. The government has been certainly vacillating in its approach to language issues in the country after the linguistic reorganization of the states, and efforts to implement the Three Language Formula have not been successful. The government also faces continuous pressure from linguistic communities for including their languages in the Language Schedule of the Constitution of India. The government efforts to promote Hindi as the OL have been partially successful, though Hindi has received a lot of popular support from the film and electronic media. The race for evolving and promoting an OL after independence has also led to a neglect of many languages, spoken by comparatively smaller population groups. Upward socio-economic mobility causes language death in India, when younger generations stop learning and using their native languages. Barriers among major Indian languages have also caused conflicts and fueled political hot-spots"

This is where the organization SLTD, Assam is trying to place itself. With the aim of developing Language Technology for the languages and dialects of Assam and the NE region, SLTD has been formed by a few like-minded people from different fields of activity: some work in linguistics, some in IT, some in graphic design, and some are simply users of the available language technology. With firm belief and commitment, SLTD, Assam has decided to continue its mission of using Language Technology to develop local languages and make them suitable for modern technological advancements. SLTD believes in the dignity of our indigenous languages, and also that only effective language technology can inspire the younger generation to love, use, respect and develop the vernacular languages, which is of the utmost importance for their survival.