Resource Type:
Corpus: | |
Lexical/Conceptual: | |
Tool/Service: | |
Language Description: |
Media Type:
Text: | |
Audio: | |
Image: | |
Video: | |
Text Numerical: | |
Text N-Gram: |
42 Language Resources (Page 1 of 3)
« Previous | Next »Order by:
- Amharic
- English
ID: ELRA-W0074
ISLRN: 590-255-335-719-0The Amharic-English bilingual corpus contains parallel text from legal and news domains in Amharic script, in transliterated form and in English. The size of the corpus is of 232,653 words in Amharic and 291,701 in English. This parallel corpus contains documents from two domains, namely legal...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
2000.00 €
|
Licence: Commercial Use - ELRA VAR |
2000.00 €
|
2000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
4000.00 €
|
Licence: Commercial Use - ELRA VAR |
4000.00 €
|
4000.00 €
|
- English
- Spanish; Castilian
ID: ELRA-S0335
ISLRN: 277-380-359-561-3This database contains Bilingual (English and Spanish) Festival HTS models. Models were trained with 9h of speech from 2 female bilingual speakers and 2 male bilingual speakers. Each speaker recorded 2h 15 min per language. The speech data can be found in the TC-STAR Bilingual Voice-Conversion S...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
- Chinese
- Vietnamese
ID: ELRA-W0312
ISLRN: 128-772-037-486-0The Chinese-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
200.00 €
|
400.00 €
|
Licence: Commercial Use - ELRA VAR |
1400.00 €
|
1400.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
300.00 €
|
600.00 €
|
Licence: Commercial Use - ELRA VAR |
2100.00 €
|
2100.00 €
|
- Chinese
- Vietnamese
ID: ELRA-S0485
ISLRN: 428-557-564-826-7Chinese-Vietnamese - PhraseBank with audio files of daily conversations spoken by native speakers containing 4002 sentence pairs. Scripts with Pinyin, Topic, Cat, Vietnamese translation with corresponding audio in Chinese and Vietnamese. Corpus in XML and WAV formats.
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
400.00 €
|
500.00 €
|
Licence: Commercial Use - ELRA VAR |
900.00 €
|
900.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
600.00 €
|
750.00 €
|
Licence: Commercial Use - ELRA VAR |
1350.00 €
|
1350.00 €
|
- English
- Persian
ID: ELRA-W0118
ISLRN: 074-825-114-781-7The English-Persian parallel corpus contains more than 200,000 aligned sentences across a variety of text types from the domains of art, law, culture, science, religion, literature, medicine, idioms, politics and others. It is an extension of the English-Persian parallel corpus already distribute...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1000.00 €
|
5000.00 €
|
Licence: Commercial Use - ELRA VAR |
5000.00 €
|
5000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1200.00 €
|
6000.00 €
|
Licence: Commercial Use - ELRA VAR |
6000.00 €
|
6000.00 €
|
- English
- Persian
ID: ELRA-W0051
ISLRN: 671-618-321-687-7Please refer to ELRA-W0118 for the latest version of this corpus. This version consists of about 3,500,000 English and Persian (Farsi) words aligned at sentence level (about 100,000 sentences, distributed over 50,021 entries). The format of the files is Unicode. It has been originally created wi...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
500.00 €
|
2500.00 €
|
Licence: Commercial Use - ELRA VAR |
2500.00 €
|
2500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
600.00 €
|
3000.00 €
|
Licence: Commercial Use - ELRA VAR |
3000.00 €
|
3000.00 €
|
- English
- Panjabi; Punjabi
ID: ELRA-W0319
ISLRN: 695-759-706-170-8The English-Punjabi Code-Mixed Social Media Content corpus is composed is composed of 893,615 parallel sentences of English-Punjabi distributed over the following domains: - 82,341 parallel sentences of English-Punjabi code-mixed Agriculture Domain Data, - 59,158 parallel sentences of English-P...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
- English
- Vietnamese
ID: ELRA-W0124
ISLRN: 838-483-738-912-8This is a corpus of 500,000 English-Vietnamese sentence pairs, built to develop SMT (Statistical Machine Translation) systems. The parallel corpus contains English documents translated by professional translators into Vietnamese. The source texts include books, dictionaries, newspapers, online ne...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
600.00 €
|
1200.00 €
|
Licence: Commercial Use - ELRA VAR |
6000.00 €
|
6000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1000.00 €
|
2000.00 €
|
Licence: Commercial Use - ELRA VAR |
8000.00 €
|
8000.00 €
|
- English
- Vietnamese
ID: ELRA-W0311
ISLRN: 893-470-491-825-6The English-Vietnamese Parallel Corpus consists of 1,000,000 sentence pairs, with an average length of 20 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
900.00 €
|
1800.00 €
|
Licence: Commercial Use - ELRA VAR |
9000.00 €
|
9000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1500.00 €
|
3000.00 €
|
Licence: Commercial Use - ELRA VAR |
12000.00 €
|
12000.00 €
|
- English
- Portuguese
ID: ELRA-W0090
ISLRN: 435-502-922-727-2The EUROPARL Corpus (Portuguese-English subpart of the parallel corpora), was extracted from the proceedings of the European Parliament. It contains transcriptions of sessions dating back from 1996 to 2011, with a total of approximately 58,324,562 tokens of European Portuguese (L1) and 49,216,896...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
- Korean
- Vietnamese
ID: ELRA-W0313
ISLRN: 365-128-449-700-7The Korean-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
200.00 €
|
400.00 €
|
Licence: Commercial Use - ELRA VAR |
1400.00 €
|
1400.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
300.00 €
|
600.00 €
|
Licence: Commercial Use - ELRA VAR |
2100.00 €
|
2100.00 €
|
- Bantu languages
- French
ID: ELRA-S0396
ISLRN: 747-055-093-447-8The Mbochi speech corpus was developed in the framework of ANR-DFG BULB project. This project aims to provide field linguists (eg working on morphology) with tools for less or not written languages. The provided corpus is a subset from the corpus developed in this framework. The provided corpu...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
- Chinese
- English
ID: ELRA-S0457
ISLRN: 451-966-049-653-3The data is recorded by 3972 Chinese native speakers with accents covering seven major dialect areas. The recorded text is a mixture of Chinese and English sentences, covering general scenes and human-computer interaction scenes. It is rich in content and accurate in transcription. It can be used...
MEMBER | academic | commercial |
---|---|---|
Licence: Commercial Use - ELRA VAR |
145825.00 €
|
145825.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Commercial Use - ELRA VAR |
145825.00 €
|
145825.00 €
|
Special offers are also available. Check here for details.
- English
- German
ID: ELRA-S0137
ISLRN: 734-992-877-270-4TAXI was produced by BAS, in collaboration with the German research centre for artificial intelligence, DFKI. This speech database contains recordings which consist of dialogues, 94 on the whole (spontaneous speech), between a German speaking cab dispatcher and his clients, who always answered in...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
127.82 €
|
383.03 €
|
Licence: Commercial Use - ELRA VAR |
383.03 €
|
383.03 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
255.65 €
|
511.29 €
|
Licence: Commercial Use - ELRA VAR |
511.29 €
|
511.29 €
|
- English
- Spanish; Castilian
ID: ELRA-S0313
ISLRN: 088-656-828-489-38 hours of speech as spoken by 2 female speakers and 2 male speakers for each language (English and Spanish).
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
1500.00 €
|
Licence: Commercial Use - ELRA VAR |
1500.00 €
|
1500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
600.00 €
|
2000.00 €
|
Licence: Commercial Use - ELRA VAR |
2000.00 €
|
2000.00 €
|
- Dutch; Flemish
- Western Frisian
ID: ELRA-S0391
ISLRN: 340-994-352-616-4The components of the Frisian data collection are speech and language resources gathered for building a large vocabulary ASR system for the Frisian language. Firstly, a new broadcast database is created by collecting recordings from the archives of the regional broadcaster Omrop Fryslân, and ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
1500.00 €
|
Licence: Commercial Use - ELRA VAR |
1500.00 €
|
1500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
3500.00 €
|
Licence: Commercial Use - ELRA VAR |
3500.00 €
|
3500.00 €
|
- Arabic
- English
ID: ELRA-W0108
ISLRN: 213-044-240-074-6This is a parallel corpus of 10,000 words in Arabic and a reference translation in English. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. The collected emails are dated from 2004 to 2007. The translation has been conducted follow...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
150.00 €
|
500.00 €
|
Licence: Commercial Use - ELRA VAR |
500.00 €
|
500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
300.00 €
|
1000.00 €
|
Licence: Commercial Use - ELRA VAR |
1000.00 €
|
1000.00 €
|
- Arabic
- English
ID: ELRA-W0106
ISLRN: 858-529-510-480-2This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. The collected emails are dated from 2010 to 2012. The translation has been conducted by tw...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
150.00 €
|
500.00 €
|
Licence: Commercial Use - ELRA VAR |
500.00 €
|
500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
300.00 €
|
1000.00 €
|
Licence: Commercial Use - ELRA VAR |
1000.00 €
|
1000.00 €
|
- Arabic
- English
ID: ELRA-W0099
ISLRN: 764-187-795-074-0This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are articles collected in 2012 from the Arabic version of Le Monde Diplomatique. The translation has been conducted by two different translation teams following a strict protocol aimed at...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
150.00 €
|
500.00 €
|
Licence: Commercial Use - ELRA VAR |
500.00 €
|
500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
300.00 €
|
1000.00 €
|
Licence: Commercial Use - ELRA VAR |
1000.00 €
|
1000.00 €
|
« Previous | Next »