Search and Browse – ELRA Catalogue

2006 CoNLL Shared Task – Arabic & Czech
Media type: text

Arabic
Czech

ID: ELRA-W0087

2006 CoNLL Shared Task – Arabic & Czech consists of dependency treebanks used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing. The Conference on Computational Natural Language Learning (CoNLL) is accompanied every year by a shared task intended to promote natural lan...

MEMBER	academic	commercial
Licence: Non Commercial Use - Non Standard Licence Terms

NON MEMBER	academic	commercial
Licence: Non Commercial Use - Non Standard Licence Terms

2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish
Media type: text

Basque
Catalan; Valencian
Czech
Turkish

ID: ELRA-W0121

ISLRN: 769-620-932-723-2

2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish consists of dependency treebanks in four languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are: Basque, Catalan, Czech and Turkish. The ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

CLEF AdHoc-News Test Suites (2004-2008) – Evaluation Package
Media type: text

Bulgarian
Czech
Dutch; Flemish
English
Finnish
French
German
Hungarian
Italian
Persian
Portuguese
Russian
Spanish; Castilian
Swedish

ID: ELRA-E0036

ISLRN: 378-279-085-589-0

The Cross-Language Evaluation Forum (CLEF) promotes R&D in multilingual information access (MLIA) by (i) developing an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts, and (ii) c...

MEMBER	academic	commercial
Licence: Evaluation Use - ELRA EVALUATION	150.00 €	500.00 €

NON MEMBER	academic	commercial
Licence: Evaluation Use - ELRA EVALUATION	300.00 €	1000.00 €

Special offers are also available.

Collins Multilingual database (MLD) – PhraseBank with audio files
Media type: audio

Arabic
Chinese
Croatian
Czech
Danish
Dutch; Flemish
English
Finnish
French
German
Hindi
Italian
Japanese
Korean
Modern Greek (1453-)
Norwegian
Persian
Polish
Portuguese
Russian
Spanish; Castilian
Swedish
Thai
Turkish
Vietnamese

ID: ELRA-S0383

ISLRN: 398-655-047-044-5

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the audio files corresponding t...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3360.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4480.00 €

Collins Multilingual database (MLD) – WordBank with audio files
Media type: audio

Arabic
Chinese
Croatian
Czech
Danish
Dutch; Flemish
English
Finnish
French
German
Italian
Japanese
Korean
Modern Greek (1453-)
Norwegian
Polish
Portuguese
Russian
Spanish; Castilian
Swedish
Thai
Turkish
Vietnamese

ID: ELRA-S0382

ISLRN: 309-438-781-042-2

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the corresponding audio files c...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3640.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5200.00 €

Czech SpeechDat(E) Database
Media type: audio

Czech

ID: ELRA-S0094

ISLRN: 891-889-899-078-7

The Czech SpeechDat(E) Database (Eastern European Speech Databases for Creation of Voice Driven Teleservices) comprises 1052 Czech speakers (526 males, 526 females) recorded over the Czech fixed telephone network. This database is partitioned into 6 CDs. The speech databases made within the Speec...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	10000.00 €	16000.00 €
Licence: Commercial Use - ELRA VAR	16000.00 €	16000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	10000.00 €	16000.00 €
Licence: Commercial Use - ELRA VAR	16000.00 €	16000.00 €

Special offers are also available.

Czech Speecon database
Media type: audio

Czech

ID: ELRA-S0298

ISLRN: 897-416-018-798-6

The Czech Speecon database is divided into 2 sets: 1) The first set comprises the recordings of 550 adult Czech speakers (275 males, 275 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place). 2) The second set comprises the record...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER
Licence: Commercial Use - ELRA VAR

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER
Licence: Commercial Use - ELRA VAR

ECI/MCI (European Corpus Initiative/Multilingual Corpus I)
Media type: text

Albanian
Bulgarian
Chinese
Czech
Danish
Dutch; Flemish
English
Estonian
French
German
Italian
Japanese
Latin
Lithuanian
Malay (macrolanguage)
Modern Greek (1453-)
Norwegian
Portuguese
Russian
Scottish Gaelic; Gaelic
Serbian
Spanish; Castilian
Swedish
Turkish
Uzbek

ID: ELRA-W0004

ISLRN: 511-168-567-582-5

The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and supports existing and projected national and international efforts to carefully design, collect and publish large-scale multilingual written and spoken corpora. ECI has ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	50.00 €	50.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	50.00 €	50.00 €

GlobalPhone 2000 Speaker Package
Media type: audio

Arabic
Bulgarian
Chinese
Croatian
Czech
French
German
Hausa
Japanese
Korean
Polish
Portuguese
Russian
Spanish; Castilian
Swahili (macrolanguage)
Swedish
Tamil
Thai
Turkish
Ukrainian
Vietnamese

ID: ELRA-S0400

ISLRN: 331-592-378-424-7

The GlobalPhone 2000 Speaker Package contains transcribed read speech spoken by 2000 native speakers in 22 languages. The data are sampled from the GlobalPhone Speech and Text Data available in the ELRA Catalogue, i.e.: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Mandarin (ELRA-S0193), C...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1200.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1400.00 €	7200.00 €
Licence: Commercial Use - ELRA VAR	7200.00 €	7200.00 €

GlobalPhone Czech
Media type: audio

Czech

ID: ELRA-S0196

ISLRN: 852-715-156-961-1

The GlobalPhone corpus developed in collaboration with the Karlsruhe Institute of Technology (KIT) was designed to provide read speech data for the development and evaluation of large continuous speech recognition systems in the most widespread languages of the world, and to provide a uniform, mu...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	600.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	700.00 €	3600.00 €
Licence: Commercial Use - ELRA VAR	3600.00 €	3600.00 €

Special offers are also available.

GlobalPhone Multilingual Model Package
Media type: audio

Arabic
Bulgarian
Chinese
Croatian
Czech
French
German
Hausa
Japanese
Korean
Polish
Portuguese
Russian
Spanish; Castilian
Swahili (macrolanguage)
Swedish
Tamil
Thai
Turkish
Ukrainian
Vietnamese

ID: ELRA-S0399

ISLRN: 204-945-263-927-6

The GlobalPhone Multilingual Model Package contains about 22 hours of transcribed read speech spoken by native speakers in 22 languages. The data are sampled from the GlobalPhone Speech and Text Data available in the ELRA Catalogue, i.e.: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Manda...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1200.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1400.00 €	7200.00 €
Licence: Commercial Use - ELRA VAR	7200.00 €	7200.00 €

Parallel texts from Swedish Work environment Authority (Processed)
Media type: text

Bulgarian
Czech
English
Estonian
Finnish
French
German
Hungarian
Italian
Latvian
Lithuanian
Modern Greek (1453-)
Polish
Romanian; Moldavian; Moldovan
Spanish; Castilian
Swedish

ID: ELRA-W0304

ISLRN: 448-438-055-941-1

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts from the Swedish Work Environment authori...

MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

12 Language Resources