Projects

Current projects

CONTENT: Cognitive and Neural Bases for Terminology-Enhanced Translation (2015-2017)

The main objective of CONTENT is to fully exploit the contents and components of EcoLexicon for purposes of translation and natural language processing. Accordingly, this project will create and implement a prototype for the terminology-enhanced translation of specialized environmental texts. This means expanding the architecture of the relational database where EcoLexicon is stored, as well as enriching the following modules: (i) the linguistic module (inclusion of relations between terms); (ii) the conceptual module (specification of non-hierarchical relations based on paradigms encoded in the phraseological module); (iii) the phraseological module (expansion of syntagmatic relations enhanced with different types of collocational information). This prototype will make EcoLexicon's data available online to users in contexts that provide a selection of semantic, syntactic, and pragmatic information specifically related to the terms in the source text. The enhancement of the modules in EcoLexicon as well as the design of the prototype require the use of more effective terminographic methods and the extensive semi-automatic processing of the corpus based on the extraction of knowledge patterns.

Parallel to the design and implementation of the prototype, a further goal is to facilitate the interoperability of EcoLexicon by linking it to other resources by means of Linked Data, a technology that publishes structured data and links them to information in other resources in compliance with Semantic Web standards. The data in the formal ontology version of EcoLexicon will thus be accessible by means of SPARQL queries. Furthermore, EcoLexicon data will be linked to the information in GEMET and AGROVOC or DBpedia with a view to offering an open resource integrated in the Semantic Web or, more concretely, in Linguistic Open Data. The linking process will be semi-automatically performed by means of properties from RDF Schema (e.g. rdfs:seeAlso), OWL (e.g. owl:sameAs), and SKOS (e.g. skos:broader). In this way, the conceptual and linguistic information in EcoLexicon can be transformed into a disambiguation resource. At the same time, the content of the linked resources will be exploited in the terminology-enhanced prototype for assisted specialized translation.

Finally, the inventory of semantic relations in EcoLexicon and its underlying conceptual structure will be validated by an fMRI experimental study, based on the successful results of a previous pilot study (Faber et al. 2014). The objectives will focus on the representation, storage, and processing of specialized concepts as well as their semantic relations. Following Muelhaus et al. (2014), the subjects (experts and non-experts) will be exposed to different stimuli (images, terminological designations, and terms associated with different types of relation) in order to analyze which type of semantic relation most facilitates the comprehension of a concept and whether vertical and horizontal semantic relations have different brain activation patterns.
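As a rough illustration of the linking step described above, the following Python sketch serializes three links for one concept as N-Triples lines. The property IRIs are the standard RDF Schema, OWL, and SKOS ones; the concept IRIs, the `example.org` namespaces, and the helper names `triple` and `link_concept` are hypothetical and do not reflect EcoLexicon's actual data model or identifiers.

```python
# Standard namespace IRIs for the three linking vocabularies named in the text.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical namespaces, used here for illustration only.
ECO = "http://example.org/ecolexicon/concept/"
EXT = "http://example.org/external-thesaurus/concept/"


def triple(subject, predicate, obj):
    """Serialize one statement whose three terms are IRIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def link_concept(local_id, same_as, broader, see_also):
    """Return N-Triples lines linking a local concept to external resources."""
    subject = ECO + local_id
    return [
        triple(subject, OWL + "sameAs", same_as),     # identical concept elsewhere
        triple(subject, SKOS + "broader", broader),   # more general concept
        triple(subject, RDFS + "seeAlso", see_also),  # related further information
    ]


# Example with made-up identifiers.
for line in link_concept("erosion", EXT + "erosion", EXT + "process", EXT + "sediment"):
    print(line)
```

Statements of this kind could then be loaded into a triple store and retrieved with SPARQL queries, although the store and query setup would depend on the project's own infrastructure.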

Funded by the Spanish Ministry of Economy and Competitiveness. Code: FFI2014-52740-P

Project leader: Dr Pamela Faber

COMBIMED: Léxico combinatorio en medicina: cognición, texto y contexto [Combinatory Lexis in Medicine: Cognition, Text, and Context] (2015-2017)

The project Léxico combinatorio en medicina: cognición, texto y contexto (CombiMed) studies lexical combinations in medicine, together with their representation and visualization, from a pragmatic and cognitive perspective. The aim is to understand the psycholinguistic, communicative, and stylistic processes of lexical selection carried out by providers and users of healthcare systems as well as by writers and translators. Combining corpus analysis techniques with experimental tests offers a more efficient approach than purely descriptive methods.

Through corpus study, we focus on novel aspects such as the analysis of connotations, semantic prosody, and interference in combinations of lexical units in original and translated medical texts. The corpus study will also allow us to examine geographic variation in greater depth, as well as the different degrees of terminological standardization associated with certain text genres. In the electronic corpora, we will identify metaphors and propose annotation techniques for metaphorical lexical units.

By means of experimental tests of lexical comprehension and production, we will analyze the process and the product of the cognitive operations involved in the lexicalization of concepts, with emphasis on conceptual and multimodal metaphor and on their underlying image schemas. The aim is to connect the denominative dimensions of concepts with multimodal representations of knowledge (images, animations) so as to identify the linguistic and cognitive relation between lexicalizations and knowledge visualization. The study of metaphor and of techniques for visualizing specialized knowledge will contribute to the comprehension and acquisition of medical multiword units, help patients verbalize their experience of illness during the care process, and ultimately improve communication between healthcare providers and patients.

In this way, the distributional corpus study, which characterizes the particularities of geographic variants and text genres through the analysis of combinatory lexis, is connected with the experimental tests, which shed light on the cognitive motivation of lexical selection. We will be able to describe reliably whether one denominative dimension is prioritized over another in the different text genres and by the different types of users of medical terminology. CombiMed will provide a bilingual English-Spanish multimodal resource that brings together the cognitive, textual, and contextual perspectives on medical terminology and on the images associated with concepts.

The VariMed resource will be the point of reference for selecting experimental stimuli. In addition, improving and expanding the data in the resource will enable multiple query modes suited to the general public, to users of the specialized lexicon, and to researchers in medical terminology. Building on the latest theoretical premises of usage-based cognitive linguistics, we seek to delve deeper into lexical variation as a sociolinguistic and cognitive phenomenon.

Funded by the Spanish Ministry of Economy and Competitiveness. Code: FFI2014-51899-R

Project leaders: Dr Maribel Tercedor Sánchez and Dr Clara Inés López Rodríguez

Past projects

RECORD: Knowledge Representation in Dynamic Networks (2012-2014)

The main objective of the research project RECORD is to increase knowledge of the Environment by developing new forms of knowledge representation in the EcoLexicon knowledge base. Such representations are based on theories of cognition that facilitate technological application and transfer as well as knowledge acquisition by the user. For this purpose, the information in the knowledge base will be contextualized, following the analysis and definition of different user profiles as well as the validation of its conceptual design by a neurolinguistic pilot study. This contextualization will be reflected in the definition and application of constraints to the following modules of EcoLexicon: (i) conceptual networks; (ii) graphical and linguistic resources; (iii) terminological definitions. New languages will also be added, and the multilingual corpus of specialized texts will be semantically tagged as well as expanded and classified according to pragmatic parameters. The tagging of the corpus will be carried out with a view to developing semi-automatic knowledge extraction methods. EcoLexicon will also be made available to a larger range of potential users through the creation of API web services that facilitate integrated data access. Finally, Linked Data technology will be applied in order to facilitate the transfer of environmental knowledge to the scientific community.

Funded by the Spanish Ministry of Science and Innovation. Code: FFI2011-22397

Project leader: Dr Pamela Faber

VARIMED: Denominative variation in medicine: Multilingual multimodal tool for research and knowledge dissemination (2012-2014)

Denominative variation is a key element in medical communication both at the intralinguistic level (amigdalitis / sore throat) and at the interlinguistic level (keyhole surgery / laparoscopia). A close study of the phenomenon reveals cultural and cognitive patterns typical of a certain group of speakers in a given language community. The objectives of VARIMED are the following: (i) to create a corpus of medical texts in English and Spanish multimodal communication contexts; (ii) to register and classify terminological variants in English and Spanish and study their semantic and pragmatic features from the perspective of situated cognition; (iii) to carry out a series of experimental studies that will provide insights into the phenomenon of variation in relation to the cognitive processes of lexical production and comprehension; (iv) to generate a multifunctional and reusable lexical resource in the field of health care with image support for linguistic research, translation, and technical writing for knowledge dissemination.

More information: VARIMED website

Funded by the Spanish Ministry of Science and Innovation. Code: FFI2011-23120

Project leader: Dr Maribel Tercedor

ECOSISTEMA: Single Information Space for Frame-based Environmental Data and Thesaurus (2008-2011)

The main objective of the research project Single Information Space for Frame-based Environmental Data and Thesaurus is the conceptual representation of the specialized domain of the Environment in the form of a visual thesaurus of specialized concepts organized in a constellation of interrelated dynamic knowledge frames. This multilingual (Spanish-English-German-French) resource, which will include various access modes, will generate a series of multimedia products that provide information about the Environment for different user profiles (e.g. experts of different nationalities, writers of scientific books and articles, university students, tourism operators, translators, and interpreters). These resources will increase knowledge of the environment, a specialized domain of great interest because of its impact on the lives of people and society in general. The terminological tools generated will facilitate knowledge transfer and application with a view to formulating, implementing, and evaluating new environmental policies. They will also facilitate the exchange of specialized knowledge within Spain as well as other European countries.

This research project was part of a larger project developed in coordination with researchers from the University of Málaga (ECOSISTEMAUMA) and the University of Vigo.

Funded by the Spanish Ministry of Science and Innovation. Code: FFI2008-06080.C03-01/FILO.

MarcoCosta: Multilingual Specialized Knowledge Frames for Integrated Coastal Management (2006-2009)

The main objective of the research project Multilingual Specialized Knowledge Frames for Integrated Coastal Management was the conceptual representation of the specialized domain of integrated coastal management in the form of a visual thesaurus of specialized concepts organized in a constellation of interrelated dynamic knowledge frames. This multilingual (Spanish-English-German) resource, which included various access modes, generated a series of multimedia products that provide information about the Andalusian coastline for different user profiles (e.g. experts in environmental studies, writers of scientific books and articles, university students, translators, and even tourists). These resources increase knowledge of the coastal areas and their ecosystems in terms of risk, elasticity, thresholds, and recovery and evolution. The terminological tools generated facilitate knowledge transfer and application with a view to formulating, implementing, and evaluating new coastal management policies. They also facilitate the exchange of specialized knowledge within Spain as well as other European countries.

Funded by the Andalusian Regional Government. Code: PO6-HUM-01489.

PuertoTerm: Coastal engineering: Knowledge representation and the generation of terminological resources (2003-2006)

Coastal engineering: Knowledge representation and the generation of terminological resources was a research project whose principal objective was the generation of terminological resources for the representation of specialized concepts. The principal products envisaged were specialized trilingual glossaries, dictionaries, and multimedia knowledge bases related to the domain of coastal engineering. The data categories used for the representation of these concepts were compatible with ISO 12620, and the languages used were Spanish, English, and German. This project facilitated knowledge exchange between European countries and contributed to the standardization of the designations of concepts in different languages, as well as in geographic varieties of the same language (peninsular Spanish and South American Spanish). Three important areas in the elaboration of terminological resources were the following: (1) the conceptual organization underlying any knowledge resource; (2) the multidimensional nature of conceptual representations; (3) knowledge extraction through the use of multilingual corpora related to the domain of coastal engineering.

Funded by the Spanish Ministry of Education and Science. Code: BFF2003-04720.