17 Language Resources

 2006 CoNLL Shared Task - Ten Languages    
  • Bulgarian
  • Danish
  • Dutch; Flemish
  • German
  • Japanese
  • Portuguese
  • Slovenian
  • Spanish; Castilian
  • Swedish
  • Turkish

ID: ELRA-W0086

ISLRN: 578-227-532-044-0

2006 CoNLL Shared Task - Ten Languages consists of dependency treebanks in ten languages used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing. The languages covered in this release are: Bulgarian, Danish, Dutch, German, Japanese, Portuguese, Slovene, Spanish, Swedish and...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €
commercial: 0.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €
commercial: 0.00 €
 CAREGIVER Corpus    
  • Dutch; Flemish
  • English
  • Finnish

ID: ELRA-S0410

ISLRN: 072-357-063-759-1

CAREGIVER is a multilingual speech corpus for modeling language acquisition, designed and recorded within the framework of the EU-funded Acquisition of Communication and Recognition Skills (ACORNS) project. The motivation behind the corpus and its design relies on current knowle...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €
commercial: 0.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €
commercial: 0.00 €
 CLEF AdHoc-News Test Suites (2004-2008) – Evaluation Package    
  • Bulgarian
  • Czech
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Hungarian
  • Italian
  • Persian
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish

ID: ELRA-E0036

ISLRN: 378-279-085-589-0

The Cross-Language Evaluation Forum (CLEF) promotes R&D in multilingual information access (MLIA) by (i) developing an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts, and (ii) c...

MEMBER
Licence: Evaluation Use - ELRA EVALUATION
academic: 150.00 €
commercial: 500.00 €
NON MEMBER
Licence: Evaluation Use - ELRA EVALUATION
academic: 300.00 €
commercial: 1000.00 €

Special offers are also available. Check here for details.

 CLEF Question Answering Test Suites (2003-2008) – Evaluation Package    
  • Bulgarian
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Italian
  • Portuguese
  • Romanian; Moldavian; Moldovan
  • Spanish; Castilian

ID: ELRA-E0038

ISLRN: 394-993-527-034-7

The Cross-Language Evaluation Forum (CLEF) promotes R&D in multilingual information access (MLIA) by (i) developing an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts, and (ii) c...

MEMBER
Licence: Evaluation Use - ELRA EVALUATION
academic: 150.00 €
commercial: 500.00 €
NON MEMBER
Licence: Evaluation Use - ELRA EVALUATION
academic: 300.00 €
commercial: 1000.00 €

Special offers are also available. Check here for details.

 Collins Multilingual database (MLD) – PhraseBank with audio files    
  • Arabic
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Hindi
  • Italian
  • Japanese
  • Korean
  • Modern Greek (1453-)
  • Norwegian
  • Persian
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish
  • Vietnamese

ID: ELRA-S0383

ISLRN: 398-655-047-044-5

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the audio files corresponding t...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
3360.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
4480.00 €
 Collins Multilingual database (MLD) – WordBank with audio files    
  • Arabic
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Modern Greek (1453-)
  • Norwegian
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish
  • Vietnamese

ID: ELRA-S0382

ISLRN: 309-438-781-042-2

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the corresponding audio files c...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
3640.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
5200.00 €
 Dutch PAROLE Distributable Corpus    
  • Dutch; Flemish

ID: ELRA-W0019

ISLRN: 440-290-917-102-7

The Dutch PAROLE Distributable Corpus is a 3-million-word selection from the 20-million-word Dutch PAROLE Reference corpus. The Dutch corpus was annotated and checked according to the common core PAROLE tagset. The Dutch data were also checked for type. The Dutch PAROLE Distributable...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 270.00 €
commercial: 800.00 €
Licence: Commercial Use - ELRA VAR
academic: 1600.00 €
commercial: 1600.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 300.00 €
commercial: 1300.00 €
Licence: Commercial Use - ELRA VAR
academic: 2500.00 €
commercial: 2500.00 €

Special offers are also available. Check here for details.

 Dutch Polyphone Database    
  • Dutch; Flemish

ID: ELRA-S0010

ISLRN: 117-997-161-308-7

The Dutch Polyphone corpus contains telephone speech from 5050 speakers. The corpus comprises 222,075 speech files (based on 44 or, in a few cases, 43 items per speaker), all of which have been orthographically transcribed. The data were collected in 8-bit A-law digital form, directly off an ISDN tel...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 12000.00 €
commercial: 25000.00 €
Licence: Commercial Use - ELRA VAR
academic: 25000.00 €
commercial: 25000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 20000.00 €
commercial: 35000.00 €
Licence: Commercial Use - ELRA VAR
academic: 35000.00 €
commercial: 35000.00 €
 ECI/MCI (European Corpus Initiative/Multilingual Corpus I)    
  • Albanian
  • Bulgarian
  • Chinese
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Estonian
  • French
  • German
  • Italian
  • Japanese
  • Latin
  • Lithuanian
  • Malay (macrolanguage)
  • Modern Greek (1453-)
  • Norwegian
  • Portuguese
  • Russian
  • Scottish Gaelic; Gaelic
  • Serbian
  • Spanish; Castilian
  • Swedish
  • Turkish
  • Uzbek

ID: ELRA-W0004

ISLRN: 511-168-567-582-5

The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and supports existing and projected national and international efforts to carefully design, collect and publish large-scale multilingual written and spoken corpora. ECI has ...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 50.00 €
commercial: 50.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 50.00 €
commercial: 50.00 €
 GRONINGEN    
  • Dutch; Flemish

ID: ELRA-S0020

ISLRN: 819-542-178-821-7

The 4 CD-ROMs contain over 20 hours of speech. It is a corpus of read speech material in Dutch, recorded on PCM tape under fairly good conditions. These 4 CD-ROMs contain speech from 238 speakers who read: · 2 short texts (the famous North wind text, and a longer text, "de Koning" by Godfried Bo...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 400.00 €
commercial: 1600.00 €
Licence: Commercial Use - ELRA VAR
academic: 1600.00 €
commercial: 1600.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 800.00 €
commercial: 3200.00 €
Licence: Commercial Use - ELRA VAR
academic: 3200.00 €
commercial: 3200.00 €
 Letter of rights for persons arrested on the basis of a European Arrest Warrant (Processed)    
  • Bulgarian
  • Dutch; Flemish
  • English
  • French
  • German
  • Italian
  • Latvian
  • Modern Greek (1453-)
  • Polish
  • Romanian; Moldavian; Moldovan

ID: ELRA-W0301

ISLRN: 175-028-844-014-3

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Letter of rights for persons arrested on the basis of a ...

MEMBER
Licence: Attribution - CC-BY-4.0
academic: 0.00 €
commercial: 0.00 €
NON MEMBER
Licence: Attribution - CC-BY-4.0
academic: 0.00 €
commercial: 0.00 €
 MIST Multi-lingual Interoperability in Speech Technology database    
  • Dutch; Flemish
  • English
  • French
  • German

ID: ELRA-S0238

ISLRN: 189-835-264-931-4

In 1996, some 75 Dutch people participated in recording a multi-purpose continuous speech database. Most of them were recruited from the TNO Human Factors Research Institute, where the recordings were made. The main part of the database consisted of Dutch sentences. However, most speakers partici...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €
commercial: 400.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €
commercial: 500.00 €
 MLCC Multilingual and Parallel Corpora    
  • Danish
  • Dutch; Flemish
  • English
  • French
  • German
  • Italian
  • Modern Greek (1453-)
  • Portuguese
  • Spanish; Castilian

ID: ELRA-W0023

ISLRN: 963-635-729-341-8

The MLCC text corpus has two main components: one set to allow comparable studies to be carried out in different languages, and one set as the basis for translation studies. The first set is referred to as the Polylingual Document Collection, a collection of newspaper articles from financial new...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €
commercial: 1600.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €
commercial: 3600.00 €
 Parallel Corpora & Domains (bilingual and multilingual)    
  • Arabic
  • Chinese
  • Danish
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Hebrew
  • Italian
  • Japanese
  • Korean
  • Modern Greek (1453-)
  • Northern Sami
  • Norwegian
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish
  • Turkish

ID: ELRA-W0336

ISLRN: 471-919-856-164-1

Parallel corpora for nearly 400 language pairs and numerous multilingual combinations, including 10 million bilingual segments and 90 million tokens in 20 languages: Arabic, Chinese (Simplified), Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Italian, Japanese, Korean, North Sami...

MEMBER
Licence: Commercial Use - ELRA VAR
academic: 0.10 €
commercial: 0.10 €
NON MEMBER
Licence: Commercial Use - ELRA VAR
academic: 0.11 €
commercial: 0.11 €

Special offers are also available. Check here for details.

 Speaking atlas of the regional languages of France    
  • Basque
  • Breton
  • Catalan; Valencian
  • Corsican
  • Dutch; Flemish
  • French
  • Romany

ID: ELRA-S0402

ISLRN: 112-393-061-014-3

Directed by LIMSI (CNRS) with the support of the DGLFLF (Ministry of Culture and Communication), the Speaking atlas of the regional languages of France offers the same Aesop’s fable read in French and in a number of varieties of languages of France. This work, which has a scientific and heritage ...

MEMBER
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA
academic: 0.00 €
commercial: 0.00 €
NON MEMBER
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA
academic: 0.00 €
commercial: 0.00 €
 The CLEF Test Suite for the CLEF 2000-2003 Campaigns – Evaluation Package    
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Italian
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish

ID: ELRA-E0008

ISLRN: 317-005-302-361-6

The CLEF Test Suite contains the data used for the main tracks of the CLEF campaigns carried out from 2000 to 2003: Multilingual text retrieval, Bilingual text retrieval, Monolingual text retrieval, and Domain-specific text retrieval. The CLEF Test Suite is composed of: • The multilingual docum...

MEMBER
Licence: Evaluation Use - ELRA EVALUATION
academic: 150.00 €
commercial: 500.00 €
NON MEMBER
Licence: Evaluation Use - ELRA EVALUATION
academic: 300.00 €
commercial: 1000.00 €

Special offers are also available. Check here for details.

 The FAME! Speech Corpus    
  • Dutch; Flemish
  • Western Frisian

ID: ELRA-S0391

ISLRN: 340-994-352-616-4

The components of the Frisian data collection are speech and language resources gathered for building a large vocabulary ASR system for the Frisian language. Firstly, a new broadcast database is created by collecting recordings from the archives of the regional broadcaster Omrop Fryslân, and ...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €
commercial: 1500.00 €
Licence: Commercial Use - ELRA VAR
academic: 1500.00 €
commercial: 1500.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €
commercial: 3500.00 €
Licence: Commercial Use - ELRA VAR
academic: 3500.00 €
commercial: 3500.00 €