Text (151)
Video (2)
Audio (1)
True (2)
TEI (10)

Resource Type:

Corpus:
Lexical/Conceptual:
Tool/Service:
Language Description:

Media Type:

Text:
Audio:
Image:
Video:
Text Numerical:
Text N-Gram:

151 Language Resources (Page 2 of 8)

« Previous | Next »Order by:

 Catalan Corpus of News Articles    
  • Catalan; Valencian

ID: ELRA-W0047

ISLRN: 000-089-517-382-8

The Catalan Corpus of News Articles comprises articles in Catalan from 1 January 1999 to 31 March 2007. These articles are grouped per trimester without chronological order inside. The DVD contains one folder per year. Each folder has been divided into subfolders, containing the archives per tri...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
2975.00 € submit
14855.00 € submit
Licence: Commercial Use - ELRA VAR
14855.00 € submit
14855.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
3930.00 € submit
19315.00 € submit
Licence: Commercial Use - ELRA VAR
19315.00 € submit
19315.00 € submit
 Catalan-Spanish Parallel Corpus    
  • Catalan; Valencian
  • Spanish; Castilian

ID: ELRA-W0053

ISLRN: 124-613-721-890-1

This corpus contains more than 100 million words and it contains 10 years of bilingual articles from “El Periódico de Catalunya”. Both language data are rather close as the Catalan text is a translation of the Spanish one, partly achieved by means of Machine translation and then post-edited. The...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
2000.00 € submit
20000.00 € submit
Licence: Commercial Use - ELRA VAR
20000.00 € submit
20000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
3000.00 € submit
24000.00 € submit
Licence: Commercial Use - ELRA VAR
24000.00 € submit
24000.00 € submit
 Chinese-Vietnamese Parallel Corpus    
  • Chinese
  • Vietnamese

ID: ELRA-W0312

ISLRN: 128-772-037-486-0

The Chinese-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
200.00 € submit
400.00 € submit
Licence: Commercial Use - ELRA VAR
1400.00 € submit
1400.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
300.00 € submit
600.00 € submit
Licence: Commercial Use - ELRA VAR
2100.00 € submit
2100.00 € submit
 CINTIL-DeepBank    
  • Portuguese

ID: ELRA-W0062

ISLRN: 368-672-631-502-0

The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences annotated with their full-fledged deep grammatical representations, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
 CINTIL-DependencyBank    
  • Portuguese

ID: ELRA-W0061

ISLRN: 133-035-138-613-6

The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency graphs and grammatical function tags composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 ...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
 CINTIL-PropBank    
  • Portuguese

ID: ELRA-W0056

ISLRN: 723-486-478-286-6

The CINTIL-PropBank is a corpus of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082 tokens). In addition,...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
 CINTIL-TreeBank    
  • Portuguese

ID: ELRA-W0055

ISLRN: 411-691-515-701-9

The CINTIL-TreeBank is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 t...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
 Corpus for fine-grained analysis and automatic detection of irony on Twitter    
  • English

ID: ELRA-W0337

ISLRN: 478-366-550-085-8

The Corpus for fine-grained analysis and automatic detection of irony on Twitter was carefully annotated by trained annotators (Master’s students in Linguistics) using a detailed annotation scheme for irony categorization, which describes four labels: ‘ironic by means of a polarity contrast’, ‘si...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
100.00 € submit
Licence: Commercial Use - ELRA VAR
100.00 € submit
100.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
200.00 € submit
Licence: Commercial Use - ELRA VAR
200.00 € submit
200.00 € submit
 Corpus of Contemporaneous Spanish Novels    
  • Spanish; Castilian

ID: ELRA-W0041

ISLRN: 837-873-214-287-0

This corpus consists of 11 novels written in Castilian Spanish by Inmaculada Ferrer-Vidal Turull, a contemporaneous author. The list of novels consists of: - La búsqueda: 113,639 words - Tristeza: 41,125 words - Cuarto menguante: 42,419 words - Recuerdos: 55,694 words - Sucedió en Abril: 46,040 w...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
400.00 € submit
800.00 € submit
Licence: Commercial Use - ELRA VAR
800.00 € submit
800.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
500.00 € submit
1000.00 € submit
Licence: Commercial Use - ELRA VAR
1000.00 € submit
1000.00 € submit
 Corpus of Interactions between Seniors and an Empathic Virtual Coach in Spanish, French and Norwegian      
  • English
  • French
  • Norwegian
  • Spanish; Castilian

ID: ELRA-S0414

ISLRN: 631-345-309-445-9

The Corpus of Interactions between Seniors and an Empathic Virtual Coach in Spanish, French and Norwegian was built within the EMPATHIC project (Empathic, Expressive, Advanced Virtual Coach to Improve Independent Healthy-Life-Years of the Elderly), funded within the European Union’s Horizon 2020 ...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
500.00 € submit
25000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
500.00 € submit
25000.00 € submit

Special offers are also available. Check here for details.

 CRATER 2 Corpus    
  • English
  • French
  • Spanish; Castilian

ID: ELRA-W0033

ISLRN: 052-466-219-226-4

The CRATER corpus was built upon the foundations of an earlier project, ET10/63, which was funded in the final phase of the Eurotra programme. The Corpus Resources and Terminology Extraction project (MLAP-93 20) extended the bilingual annotated English-French International Telecommunications Unio...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
25.00 € submit
Licence: Commercial Use - ELRA VAR
25.00 € submit
25.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
125.00 € submit
Licence: Commercial Use - ELRA VAR
125.00 € submit
125.00 € submit
 CRATER corpus    
  • English
  • French
  • Spanish; Castilian

ID: ELRA-W0003

ISLRN: 645-721-607-031-5

The Corpus Resources and Terminology Extraction project (MLAP-93 20) has extended the bilingual annotated English-French International Telecommunications Union corpus to include Spanish, and has also debugged the existing corpus. The offer consists of a multi-lingual aligned corpus of 1,000,000 t...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
20.00 € submit
Licence: Commercial Use - ELRA VAR
20.00 € submit
20.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
100.00 € submit
Licence: Commercial Use - ELRA VAR
100.00 € submit
100.00 € submit
 Danish Propbank    
  • Danish

ID: ELRA-W0117

ISLRN: 213-212-351-142-5

The Danish Propbank (DPB) is a multi-layer treebank, annotated not only with morphosyntactic, but also with semantic information, in particular propositions/frames with VerbNet classes and semantic roles for both arguments and satellites. In addition, the corpus has been annotated with 20 Named E...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
150.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
5000.00 € submit
5000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
800.00 € submit
7000.00 € submit
Licence: Commercial Use - ELRA VAR
7000.00 € submit
7000.00 € submit
 deL1L2IM corpus    
  • German

ID: ELRA-W0083

ISLRN: 339-799-085-669-8

The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, has been collected within the framework of a PhD project on the development of a learning method implying conversations with an artificial companion. This PhD work is presented as a qualitative investigation...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
0.00 € submit
Licence: Commercial Use - ELRA VAR
0.00 € submit
0.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
0.00 € submit
Licence: Commercial Use - ELRA VAR
0.00 € submit
0.00 € submit
 Dutch PAROLE Distributable Corpus    
  • Dutch; Flemish

ID: ELRA-W0019

ISLRN: 440-290-917-102-7

The Dutch PAROLE Distributable Corpus is a 3 million words selection from the 20 million words Dutch PAROLE Reference corpus. The Dutch corpus annotation and checking was made accordingly to the common core PAROLE tagset. The Dutch data were also checked for type. The Dutch PAROLE Distributable...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
270.00 € submit
800.00 € submit
Licence: Commercial Use - ELRA VAR
1600.00 € submit
1600.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
300.00 € submit
1300.00 € submit
Licence: Commercial Use - ELRA VAR
2500.00 € submit
2500.00 € submit

Special offers are also available. Check here for details.

 ECI-ELSNET Italian & German tagged sub-corpus    
  • German
  • Italian

ID: ELRA-W0005

ISLRN: 869-857-775-378-7

The objective is to provide a small but fine grained morphosyntactically tagged corpus, 50.000 running words for each of the two languages (Italian and German) to be used in research work on tagging methods and models. The text for German comes from the Frankfurter Rundschau extracted from the EC...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
20.00 € submit
20.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
45.00 € submit
45.00 € submit
 ECI/MCI (European Corpus Initiative/Multilingual Corpus I)    
  • Albanian
  • Bulgarian
  • Chinese
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Estonian
  • French
  • German
  • Italian
  • Japanese
  • Latin
  • Lithuanian
  • Malay (macrolanguage)
  • Modern Greek (1453-)
  • Norwegian
  • Portuguese
  • Russian
  • Scottish Gaelic; Gaelic
  • Serbian
  • Spanish; Castilian
  • Swedish
  • Turkish
  • Uzbek

ID: ELRA-W0004

ISLRN: 511-168-567-582-5

The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and supports existing and projected national and international efforts to carefully design, collect and publish large-scale multilingual written and spoken corpora. ECI has ...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
50.00 € submit
50.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
50.00 € submit
50.00 € submit
 English-Chinese-Vietnamese Trilingual Parallel Corpus    
  • Chinese
  • English
  • Vietnamese

ID: ELRA-W0314

ISLRN: 637-630-726-817-9

The English-Chinese-Vietnamese Trilingual Parallel Corpus consists of 20,046 trilingual sets of sentence pairs. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
150.00 € submit
500.00 € submit
Licence: Commercial Use - ELRA VAR
1000.00 € submit
1000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
225.00 € submit
750.00 € submit
Licence: Commercial Use - ELRA VAR
1500.00 € submit
1500.00 € submit
 English-Nepali Parallel Corpus    
  • English
  • Nepali (macrolanguage)

ID: ELRA-W0077

ISLRN: 853-487-663-161-6

The Nepali Monolingual written corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali National Corpus was produced in 2006 in the framework of the project Bhasha Sanchar (“language communication”), also known as Nelralec, for Nepali Language Resources and Localiza...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
0.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
0.00 € submit
 English-Persian parallel Corpus    
  • English
  • Persian

ID: ELRA-W0051

ISLRN: 671-618-321-687-7

Please refer to ELRA-W0118 for the latest version of this corpus. This version consists of about 3,500,000 English and Persian (Farsi) words aligned at sentence level (about 100,000 sentences, distributed over 50,021 entries). The format of the files is Unicode. It has been originally created wi...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
500.00 € submit
2500.00 € submit
Licence: Commercial Use - ELRA VAR
2500.00 € submit
2500.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
600.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit

« Previous | Next »