Resource Type:
Corpus: | |
Lexical/Conceptual: | |
Tool/Service: | |
Language Description: |
Media Type:
Text: | |
Audio: | |
Image: | |
Video: | |
Text Numerical: | |
Text N-Gram: |
7 Language Resources
Order by:
- Arabic
- Bengali
- Chinese
- Croatian
- Czech
- Danish
- Dutch; Flemish
- English
- Finnish
- French
- German
- Hindi
- Italian
- Japanese
- Korean
- Malayalam
- Modern Greek (1453-)
- Norwegian
- Polish
- Portuguese
- Romanian; Moldavian; Moldovan
- Russian
- Spanish; Castilian
- Swedish
- Tamil
- Thai
- Turkish
- Ukrainian
- Vietnamese
ID: ELRA-T0376
ISLRN: 990-814-402-335-7The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank) and a multilingual set of sentences in 28 languages (the PhraseBank, distributed separately under reference ELRA-T0377). The WordBank contains 10,000 words...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
2400.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
3600.00 €
|
- Arabic
- Bulgarian
- Chinese
- Croatian
- Czech
- French
- German
- Hausa
- Japanese
- Korean
- Polish
- Portuguese
- Russian
- Spanish; Castilian
- Swahili (macrolanguage)
- Swedish
- Tamil
- Thai
- Turkish
- Ukrainian
- Vietnamese
ID: ELRA-S0400
ISLRN: 331-592-378-424-7The GlobalPhone 2000 Speaker Package contains transcribed read speech spoken by 2000 native speakers in 22 languages. The data are sampled from the GlobalPhone Speech and Text Data available in the ELRA Catalogue, i.e.: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Mandarin (ELRA-S0193), C...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1200.00 €
|
6000.00 €
|
Licence: Commercial Use - ELRA VAR |
6000.00 €
|
6000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1400.00 €
|
7200.00 €
|
Licence: Commercial Use - ELRA VAR |
7200.00 €
|
7200.00 €
|
- Arabic
- Bulgarian
- Chinese
- Croatian
- Czech
- French
- German
- Hausa
- Japanese
- Korean
- Polish
- Portuguese
- Russian
- Spanish; Castilian
- Swahili (macrolanguage)
- Swedish
- Tamil
- Thai
- Turkish
- Ukrainian
- Vietnamese
ID: ELRA-S0399
ISLRN: 204-945-263-927-6The GlobalPhone Multilingual Model Package contains about 22 hours of transcribed read speech spoken by native speakers in 22 languages. The data are sampled from the GlobalPhone Speech and Text Data available in the ELRA Catalogue, i.e.: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Manda...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1200.00 €
|
6000.00 €
|
Licence: Commercial Use - ELRA VAR |
6000.00 €
|
6000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1400.00 €
|
7200.00 €
|
Licence: Commercial Use - ELRA VAR |
7200.00 €
|
7200.00 €
|
- Tamil
ID: ELRA-S0205
ISLRN: 269-930-371-035-1The GlobalPhone corpus developed in collaboration with the Karlsruhe Institute of Technology (KIT) was designed to provide read speech data for the development and evaluation of large continuous speech recognition systems in the most widespread languages of the world, and to provide a uniform, mu...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
100.00 €
|
500.00 €
|
Licence: Commercial Use - ELRA VAR |
500.00 €
|
500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
125.00 €
|
600.00 €
|
Licence: Commercial Use - ELRA VAR |
600.00 €
|
600.00 €
|
Special offers are also available. Check here for details.
- Bengali
- English
- Hindi
- Malayalam
- Tamil
- Telugu
- Urdu
ID: ELRA-W0320
ISLRN: 657-350-757-058-6The Parallel Corpora for 6 Indian Languages contains data sets for Bengali (540,000 words – 20,000 parallel sentences), Hindi (1,200,000 words – 37 000 parallel sentences), Malayalam (660,000 words – 29,000 parallel sentences), Tamil (747,000 words – 35,000 parallel sentences), Telugu (951,000 wo...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA-3.0 |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA-3.0 |
0.00 €
|
0.00 €
|
- Assamese
- Bengali
- English
- Gujarati
- Hindi
- Kannada
- Kashmiri
- Malayalam
- Marathi
- Oriya (macrolanguage)
- Panjabi; Punjabi
- Sinhala; Sinhalese
- Tamil
- Telugu
- Urdu
ID: ELRA-W0037
ISLRN: 039-846-040-604-0The EMILLE/CIIL Corpus consists of three components: monolingual, parallel and annotated corpora. There are fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayala...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
- Bengali
- English
- Gujarati
- Hindi
- Panjabi; Punjabi
- Sinhala; Sinhalese
- Tamil
- Urdu
ID: ELRA-W0038
ISLRN: 438-045-014-925-0The EMILLE Lancaster Corpus consists of three components: monolingual, parallel and annotated corpora. There are monolingual corpora for seven South Asian languages: Bengali, Gujarati, Hindi, Punjabi, Sinhala, Tamil, Urdu. The EMILLE monolingual corpora contain approximately 58,880,000 words (i...
MEMBER | academic | commercial |
---|---|---|
Licence: Commercial Use - ELRA VAR |
7500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Commercial Use - ELRA VAR |
12000.00 €
|