Search and Browse – ELRA Catalogue

Catalan; Valencian

ID: ELRA-W0047

The Catalan Corpus of News Articles comprises articles in Catalan from 1 January 1999 to 31 March 2007. These articles are grouped by trimester, with no chronological ordering within each group. The DVD contains one folder per year. Each folder is divided into subfolders containing the archives per tri...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2975.00 €	14855.00 €
Licence: Commercial Use - ELRA VAR	14855.00 €	14855.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3930.00 €	19315.00 €
Licence: Commercial Use - ELRA VAR	19315.00 €	19315.00 €

Catalan-Spanish Parallel Corpus text

Catalan; Valencian
Spanish; Castilian

ID: ELRA-W0053

ISLRN: 124-613-721-890-1

This corpus contains more than 100 million words drawn from 10 years of bilingual articles from “El Periódico de Catalunya”. The data in the two languages are quite close, as the Catalan text is a translation of the Spanish one, partly produced by machine translation and then post-edited. The...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2000.00 €	20000.00 €
Licence: Commercial Use - ELRA VAR	20000.00 €	20000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3000.00 €	24000.00 €
Licence: Commercial Use - ELRA VAR	24000.00 €	24000.00 €

Chinese-Vietnamese Parallel Corpus text

Chinese
Vietnamese

ID: ELRA-W0312

ISLRN: 128-772-037-486-0

The Chinese-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	200.00 €	400.00 €
Licence: Commercial Use - ELRA VAR	1400.00 €	1400.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	600.00 €
Licence: Commercial Use - ELRA VAR	2100.00 €	2100.00 €

CINTIL-DeepBank text

Portuguese

ID: ELRA-W0062

ISLRN: 368-672-631-502-0

The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences annotated with their full-fledged deep grammatical representations, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

CINTIL-DependencyBank text

Portuguese

ID: ELRA-W0061

ISLRN: 133-035-138-613-6

The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency graphs and grammatical function tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

CINTIL-PropBank text

Portuguese

ID: ELRA-W0056

ISLRN: 723-486-478-286-6

The CINTIL-PropBank is a corpus of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082 tokens). In addition,...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

CINTIL-TreeBank text

Portuguese

ID: ELRA-W0055

ISLRN: 411-691-515-701-9

The CINTIL-TreeBank is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 t...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

Corpus for fine-grained analysis and automatic detection of irony on Twitter text

English

ID: ELRA-W0337

ISLRN: 478-366-550-085-8

The Corpus for fine-grained analysis and automatic detection of irony on Twitter was carefully annotated by trained annotators (Master’s students in Linguistics) using a detailed annotation scheme for irony categorization, which describes four labels: ‘ironic by means of a polarity contrast’, ‘si...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	100.00 €
Licence: Commercial Use - ELRA VAR	100.00 €	100.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	200.00 €
Licence: Commercial Use - ELRA VAR	200.00 €	200.00 €

Corpus of Contemporaneous Spanish Novels text

Spanish; Castilian

ID: ELRA-W0041

ISLRN: 837-873-214-287-0

This corpus consists of 11 novels written in Castilian Spanish by Inmaculada Ferrer-Vidal Turull, a contemporaneous author. The list of novels consists of:
- La búsqueda: 113,639 words
- Tristeza: 41,125 words
- Cuarto menguante: 42,419 words
- Recuerdos: 55,694 words
- Sucedió en Abril: 46,040 w...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	400.00 €	800.00 €
Licence: Commercial Use - ELRA VAR	800.00 €	800.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

Corpus of Interactions between Seniors and an Empathic Virtual Coach in Spanish, French and Norwegian text

English
French
Norwegian
Spanish; Castilian

ID: ELRA-S0414

ISLRN: 631-345-309-445-9

The Corpus of Interactions between Seniors and an Empathic Virtual Coach in Spanish, French and Norwegian was built within the EMPATHIC project (Empathic, Expressive, Advanced Virtual Coach to Improve Independent Healthy-Life-Years of the Elderly), funded within the European Union’s Horizon 2020 ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	25000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	25000.00 €

Special offers are also available. Check here for details.

CRATER 2 Corpus text

English
French
Spanish; Castilian

ID: ELRA-W0033

ISLRN: 052-466-219-226-4

The CRATER corpus was built upon the foundations of an earlier project, ET10/63, which was funded in the final phase of the Eurotra programme. The Corpus Resources and Terminology Extraction project (MLAP-93 20) extended the bilingual annotated English-French International Telecommunications Unio...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	25.00 €
Licence: Commercial Use - ELRA VAR	25.00 €	25.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	125.00 €
Licence: Commercial Use - ELRA VAR	125.00 €	125.00 €

CRATER corpus text

English
French
Spanish; Castilian

ID: ELRA-W0003

ISLRN: 645-721-607-031-5

The Corpus Resources and Terminology Extraction project (MLAP-93 20) has extended the bilingual annotated English-French International Telecommunications Union corpus to include Spanish, and has also debugged the existing corpus. The offer consists of a multi-lingual aligned corpus of 1,000,000 t...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	20.00 €
Licence: Commercial Use - ELRA VAR	20.00 €	20.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	100.00 €
Licence: Commercial Use - ELRA VAR	100.00 €	100.00 €

Danish Propbank text

Danish

ID: ELRA-W0117

ISLRN: 213-212-351-142-5

The Danish Propbank (DPB) is a multi-layer treebank, annotated not only with morphosyntactic, but also with semantic information, in particular propositions/frames with VerbNet classes and semantic roles for both arguments and satellites. In addition, the corpus has been annotated with 20 Named E...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	800.00 €	7000.00 €
Licence: Commercial Use - ELRA VAR	7000.00 €	7000.00 €

deL1L2IM corpus text

German

ID: ELRA-W0083

ISLRN: 339-799-085-669-8

The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, was collected within the framework of a PhD project on the development of a learning method involving conversations with an artificial companion. This PhD work is presented as a qualitative investigation...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

Dutch PAROLE Distributable Corpus text

Dutch; Flemish

ID: ELRA-W0019

ISLRN: 440-290-917-102-7

The Dutch PAROLE Distributable Corpus is a 3-million-word selection from the 20-million-word Dutch PAROLE Reference corpus. The Dutch corpus was annotated and checked according to the common core PAROLE tagset. The Dutch data were also checked for type. The Dutch PAROLE Distributable...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	270.00 €	800.00 €
Licence: Commercial Use - ELRA VAR	1600.00 €	1600.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	1300.00 €
Licence: Commercial Use - ELRA VAR	2500.00 €	2500.00 €

Special offers are also available. Check here for details.

ECI-ELSNET Italian & German tagged sub-corpus text

German
Italian

ID: ELRA-W0005

ISLRN: 869-857-775-378-7

The objective is to provide a small but fine-grained morphosyntactically tagged corpus of 50,000 running words for each of the two languages (Italian and German), to be used in research on tagging methods and models. The text for German comes from the Frankfurter Rundschau extracted from the EC...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	20.00 €	20.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	45.00 €	45.00 €

ECI/MCI (European Corpus Initiative/Multilingual Corpus I) text

Albanian
Bulgarian
Chinese
Czech
Danish
Dutch; Flemish
English
Estonian
French
German
Italian
Japanese
Latin
Lithuanian
Malay (macrolanguage)
Modern Greek (1453-)
Norwegian
Portuguese
Russian
Scottish Gaelic; Gaelic
Serbian
Spanish; Castilian
Swedish
Turkish
Uzbek

ID: ELRA-W0004

ISLRN: 511-168-567-582-5

The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and supports existing and projected national and international efforts to carefully design, collect and publish large-scale multilingual written and spoken corpora. ECI has ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	50.00 €	50.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	50.00 €	50.00 €

English-Chinese-Vietnamese Trilingual Parallel Corpus text

Chinese
English
Vietnamese

ID: ELRA-W0314

ISLRN: 637-630-726-817-9

The English-Chinese-Vietnamese Trilingual Parallel Corpus consists of 20,046 trilingual sets of sentence pairs. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	225.00 €	750.00 €
Licence: Commercial Use - ELRA VAR	1500.00 €	1500.00 €

English-Nepali Parallel Corpus text

English
Nepali (macrolanguage)

ID: ELRA-W0077

ISLRN: 853-487-663-161-6

The Nepali Monolingual written corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali National Corpus was produced in 2006 in the framework of the project Bhasha Sanchar (“language communication”), also known as Nelralec, for Nepali Language Resources and Localiza...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

English-Persian parallel Corpus text

English
Persian

ID: ELRA-W0051

ISLRN: 671-618-321-687-7

Please refer to ELRA-W0118 for the latest version of this corpus. This version consists of about 3,500,000 English and Persian (Farsi) words aligned at sentence level (about 100,000 sentences, distributed over 50,021 entries). The format of the files is Unicode. It was originally created wi...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	2500.00 €
Licence: Commercial Use - ELRA VAR	2500.00 €	2500.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	600.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

151 Language Resources (Page 2 of 8)