MICROSERVICES AT YOUR SERVICE: a project to help disseminate NLP tools in non-English languages

Gradiant participates in a European project to promote ELG and improve the dissemination of NLP tools in Peninsular and Nordic languages

This project aims to raise the international visibility of the languages of the Iberian Peninsula in this field

 

Gradiant is working on the European project “Microservices at your service: bridging the gap between NLP research and industry”, which aims to make Natural Language Processing (NLP) tools better known and accessible to a wider audience by publishing them in the European Language Grid (ELG) repository. This cloud platform provides access to hundreds of commercial and non-commercial language technologies for all European languages, including tools and running services as well as datasets and resources, so that it can act as the yellow pages of European language technology.

Joaquín Lago, engineer-researcher in the Intelligent Systems area of Gradiant, explains: “We hope that this project will enrich the European Language Grid with a wide set of NLP tools that will facilitate its use both for research and for the development of new services”.

The aim of the project is therefore to improve the dissemination of NLP tools in languages such as Spanish, Portuguese, Icelandic, Norwegian and Swedish, among others. Because the main models are built for English, finding resources for other languages is harder.

At the same time, the focus is on languages of our region, such as Spanish, Portuguese and other languages of the Iberian Peninsula. The goal is to make it easier for researchers and software developers to use these tools in their studies and in the services they build, and to help artificial intelligence systems behave in a conversation as closely as possible to what a person would expect from another person.

Launched in March 2021, this initiative involves several international institutions such as the Galician Telecommunications Technology Center (Gradiant), the Finnish company Lingsoft and the universities of Tartu (Estonia) and Reykjavik (Iceland). In addition, the project is funded by the Connecting Europe Facility (CEF) of the European Union in the field of Telecommunications.

 

NLP tools in peninsular languages and language technologies

At the Galician technology center Gradiant, our activities focus on contacting research institutions and collecting NLP tools that work with the languages of the Iberian Peninsula.

We will collect different types of language resources and technologies (LRTs): tools, corpora, models and computational grammars, mainly oriented to the following uses:

  • Information extraction (IE): services that take text and annotate specific segments of it with metadata. For example, named entity recognition (NER): the task of extracting people, locations and organizations from a given text.
  • Text classification (TC): services that take text and return a classification for the given text from a finite set of classes. For example, text categorization, which is the task of classifying text into organized categories.
  • Machine Translation (MT): services that take text in one language and translate it into text in another language.
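
To make the three service types above more concrete, the following minimal Python sketch shows the kind of input and output each one deals with. It is only an illustration: the function names, the `Annotation` class and the dictionary-based logic are assumptions made for this example, and they do not represent the ELG API or any of the tools collected in the project.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """A labelled segment of the input text (IE-style output)."""
    start: int
    end: int
    label: str


def extract_entities(text: str, gazetteer: dict[str, str]) -> list[Annotation]:
    """Toy information extraction: annotate every gazetteer entry found in the text."""
    found = []
    for name, label in gazetteer.items():
        pos = text.find(name)
        while pos != -1:
            found.append(Annotation(pos, pos + len(name), label))
            pos = text.find(name, pos + len(name))
    return sorted(found, key=lambda a: a.start)


def classify(text: str, keywords: dict[str, set[str]]) -> str:
    """Toy text classification: return the class with the most keyword hits."""
    words = text.lower().split()
    scores = {label: sum(w in vocab for w in words) for label, vocab in keywords.items()}
    return max(scores, key=scores.get)


def translate(text: str, lexicon: dict[str, str]) -> str:
    """Toy machine translation: word-by-word lookup, unknown words pass through."""
    return " ".join(lexicon.get(w, w) for w in text.split())


if __name__ == "__main__":
    # Hypothetical sample data, invented for this sketch.
    print(extract_entities("Gradiant and Lingsoft visit Tartu",
                           {"Gradiant": "ORG", "Lingsoft": "ORG", "Tartu": "LOC"}))
    print(classify("the match ended with a late goal",
                   {"sports": {"match", "goal"}, "politics": {"vote", "law"}}))
    print(translate("hola mundo", {"hola": "hello", "mundo": "world"}))
```

In all three cases the service receives plain text. What differs is the result: a list of annotated segments for IE, a single label from a fixed set for TC, and new text in another language for MT. Real services replace the dictionaries used here with trained models.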

The goal is to give institutions in these regions the tools they need to offer better services in their official languages.