More on the LCC Programme
Current LCC research projects
Global WordNet Grid
Prof. Dr. Vossen is Founder and President of the Global WordNet Association (GWA). He founded GWA (with Christiane Fellbaum of Princeton University) in 2000 as a public and non-commercial organization that provides a platform for discussing, sharing and connecting wordnets for all languages in the world. Since 2002, GWA has organized a biennial international conference on wordnets. For more information, see the Global Wordnet Association. In 2006 Vossen launched the Global Wordnet Grid: the building of a complete, free, worldwide wordnet grid. This grid will be built around a shared set of concepts, such as the Common Base Concepts used in many wordnet projects. These concepts will be expressed in terms of WordNet synsets and SUMO definitions. People from all language communities are invited to upload synsets from their language to the Grid. Gradually, all languages will come to be represented in the Grid. The Grid will be available to everybody and will be distributed completely free.
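The idea of a grid built around a shared set of concepts, to which each language community uploads its own synsets, can be illustrated with a minimal sketch. Everything named here (the `WordnetGrid` class, its methods, and the concept identifiers) is a hypothetical illustration and not the actual Global Wordnet Grid software or data format.

```python
from collections import defaultdict


class WordnetGrid:
    """Toy model: synsets from many languages attached to shared concepts."""

    def __init__(self, shared_concepts):
        # The shared concept inventory (hypothetical identifiers).
        self.shared_concepts = set(shared_concepts)
        # concept id -> language code -> list of synsets (tuples of lemmas)
        self._synsets = defaultdict(lambda: defaultdict(list))

    def upload(self, language, concept_id, lemmas):
        """Attach a synset from one language to a shared concept."""
        if concept_id not in self.shared_concepts:
            raise KeyError(f"unknown shared concept: {concept_id}")
        self._synsets[concept_id][language].append(tuple(lemmas))

    def lookup(self, concept_id, language):
        """Return the synsets a language has uploaded for a concept."""
        return list(self._synsets.get(concept_id, {}).get(language, []))

    def coverage(self, language):
        """Fraction of shared concepts for which the language has a synset."""
        if not self.shared_concepts:
            return 0.0
        covered = sum(
            1
            for concept in self.shared_concepts
            if self._synsets.get(concept, {}).get(language)
        )
        return covered / len(self.shared_concepts)


grid = WordnetGrid(["concept:dog", "concept:book"])
grid.upload("nl", "concept:dog", ["hond"])
grid.upload("en", "concept:dog", ["dog", "domestic dog"])
print(grid.lookup("concept:dog", "nl"))
print(grid.coverage("nl"))
```

Keying everything on the shared concept is what lets a synset in one language be related to synsets in any other language without pairwise mappings.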
In search for the referent: anaphor resolution in linear text and hypertext
This project focuses on the resolution of anaphoric expressions (i.e. anaphoric noun phrases such as the book, the purchase). Anaphoric expressions can only be fully interpreted if they are linked to entities introduced elsewhere in the discourse.
New CAMeRA project: "Semantics of History"
The research project Semantics of History develops a historical ontology and a lexicon that are used in a new type of information system that can handle the time-based dynamics and varying perspectives in historical archives.
Piek Vossen receives NWO grant for "DUTCHSEMCOR"
The goal of DutchSemCor is to deliver a one-million-word Dutch corpus that is fully sense-tagged with senses and domain tags from the Cornetto database (STEVIN project STE05039). Of this corpus, 250K words will be manually tagged. The remainder will be automatically tagged using three different word-sense disambiguation (WSD) systems and will be validated by human annotators. The corpus data will be based on existing corpus material collected in the projects CGN, D-CoI and SoNaR. These corpora have already been automatically annotated with morpho-syntactic tags and structures. The corpora will be extended where necessary to find sufficient examples for word meanings that are less frequent and do not appear in the above corpora. The resulting corpus, for which we aim to offer the same balance in types of text as these basic resources, will be extremely rich in lexical semantic information. Its availability will enable many new lines of research and technology development for the Dutch language. In particular, it will enable research into the relation between language form and language interpretation, and as such it will be applicable in the fields of cognitive science, (psycho-)linguistics, language learning and language teaching, semantic web applications, information retrieval, machine translation, text mining, and document interpretation (summarization, topic segmentation). We foresee that the corpus will create new directions of research and technology development on a par with current developments for English.
Project manager: Prof. Dr. Piek Vossen. Partners: UvA Amsterdam and Universiteit van Tilburg. Kickoff: September 2009.
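The DutchSemCor description states that 250K of the one million words are tagged manually and the remainder automatically by three WSD systems, with human validation. The sketch below works out that split and shows one possible way to combine three systems' outputs. The majority-vote rule, the review flag, and the sense labels are assumptions made for illustration; the project description does not say how the systems' outputs are combined.

```python
from collections import Counter

TOTAL_WORDS = 1_000_000   # target corpus size stated in the description
MANUAL_WORDS = 250_000    # portion tagged manually
AUTOMATIC_WORDS = TOTAL_WORDS - MANUAL_WORDS  # portion tagged by WSD systems


def combine_senses(predictions):
    """Combine one sense label per WSD system by majority vote (assumed rule).

    Returns (sense, needs_review). The sense is None when no label has a
    strict majority; needs_review is True unless all systems agree.
    """
    counts = Counter(predictions)
    sense, votes = counts.most_common(1)[0]
    if votes * 2 > len(predictions):
        return sense, votes < len(predictions)
    return None, True


print(AUTOMATIC_WORDS)
# Hypothetical sense labels for the Dutch word "bank".
print(combine_senses(["bank_1", "bank_1", "bank_2"]))
print(combine_senses(["bank_1", "bank_1", "bank_1"]))
```

Under this assumed rule, tokens on which the systems disagree are the ones routed to human annotators first.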
‘Applied Natural Language Processing’, with Irion Technologies; Prof. P. Vossen; automated word-sense disambiguation in news texts in English and Spanish (‘SemEval’), text mining in scientific, company and news web pages in 6 European languages, dialogue systems in local authority websites in Dutch and English, and text complexity measurement in government and business texts in Dutch and English (Bureau Taal, ‘Texamen’); objective testing and commercial success. Spoken and web-based dialogue system on open unstructured data sets, GemeenteConnect. Sentiment analysis in textual sources and human-computer interaction.
‘Communication advice and conversational practice in institutional settings’, Drs. K.Y.Sliedrecht, Dr. M.L. Komter, Dr. F. van der Houwen, Drs. M.C.G. Schasfoort, Prof. W. Spooren (supervisor). Relation between communication advice and conversational practice in three institutional settings: police interrogations, job interviews, and journalistic interviews.
‘Conversationalization in public communication’, VU-Ster (programme ‘Communication, Text, and Culture: Rhetorical Devices in Public Communication’, with FPP and FSW); Prof. W. Spooren and Prof. G. Steen (supervisors), Drs. T. Pasma and Drs. K. Vis. Linguistic characterization of various genres in written and spoken media: metaphor and subjectification in Dutch news texts and conversations.
‘Cornetto’, ‘A semantic database for Dutch’ (Taalunie, Stevin); Prof. P. Vossen, Drs. I. Maks, R. Segers, Dr. H. Van der Vliet.
"A Combinatoric and Relational Network for Language Technology": the extension of the Dutch wordnet with combinatoric and referential relations based on usage of words within domains. Cornetto is an initiative of Vossen and the Free University of Amsterdam to combine the Dutch wordnet and the Referentie Bestand Nederlands (a Dutch database with combinatoric information on Dutch word meanings) into a unique resource for Dutch. Cornetto will build a lexical semantic database for Dutch with rich vertical and horizontal semantic relations and combinatorial lexical constraints such as multiword expressions, idioms and collocations on the one hand, and lexical functions and frames on the other. The concepts will be aligned with the English WordNet so that ontologies and domain labels can be imported. The semantic layer will be validated with a formal ontology, to make it usable in Semantic Web environments. Cornetto will cover 40K entries, including the most generic and central part of the language. The database goes beyond the structure and content of WordNet and FrameNet.
Established collaboration (Prof. P. Vossen) with the research group of Prof. F. van Harmelen of the AI group of the department of Computer Science on ontologization, and with the NWO project CHOICE (contact Prof. G. Schreiber), which focuses on semi-automatic semantic annotation and the use of context information.
‘Expertise center for terminology’ (Taalunie), Dr. H. van der Vliet, 1 Research Assistant; construction of terminology system, with special reference to government language, in Dutch; system plus website, with strong connections to the AI department as well.
‘Genre across languages’. Dr. M. van den Haak, Prof. M. Hannay, Drs. L. Tavecchio, Dr. D. Torck. Linguistic characterization of various genres in written media: evaluation of user friendliness of local authority web texts, in Dutch and English; information packaging and rhetorical patterns in news texts, academic texts, and fiction texts, in Dutch and English; information packaging and rhetorical patterns in news texts in Dutch, German and French.
‘Intertextuality in judicial settings’, NWO-programme, Drs. T.C. van Charldorp, MPhil, Dr. F. Van der Houwen, Dr. M. Komter (PI), Dr. P. Sneijder, Prof. W. Spooren (supervisor). Linguistic characterization of various genres in written and spoken media: the interrelations between talk and written documents in Dutch police interrogations and criminal trials.
‘Kyoto’, 7th Framework Project in the area of Digital Libraries, Prof. P. Vossen. Conceptual modeling of knowledge related to expression in language, which involves, among others, the characterization of various genres in written and spoken media, across languages: cross-lingual and cross-cultural knowledge. The ultimate goal is to develop a knowledge and information transition system that is applied to the domain of the environment, in Dutch, English, Italian, Spanish, Chinese, Japanese; this Special Project constitutes a highly advanced contribution to the development of the Media Center of CAMeRA, linking text to meaning across cultures and genres.
‘Metaphor in discourse’, NWO-Vici programme, Prof. G. Steen (PI), A.G. Dorst, MA, J. Herrmann, MA, Drs. A. Kaal, T. Krennmayr, MA. Linguistic characterization of various genres in written and spoken media: metaphor in English news texts, academic texts, fiction texts, and conversations.