8 Language Resources

 Bulgarian Event Corpus    
  • Bulgarian

ID: ELRA-W0329

ISLRN: 832-960-876-604-2

The Bulgarian Event Corpus is composed of 324,905 tokens appropriate for training Named Entity Recognition (NER), Named Entity Linking (NEL) and Event Recognition models for Bulgarian in a multidomain context within the Humanities. The texts are domain related. They include documents from the area of So...

MEMBER (academic / commercial)
Licence: Attribution, Share Alike - CC-BY-SA-3.0
0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Attribution, Share Alike - CC-BY-SA-3.0
0.00 €
Licence: Attribution, Share Alike - CC-BY-SA-3.0
0.00 €
 Bulgarian Treebank Corpus    
  • Bulgarian

ID: ELRA-W0328

ISLRN: 761-430-854-533-2

The Bulgarian Treebank Corpus is composed of 156,149 tokens (11,138 sentences) drawn from three main sources: Grammar Notebooks (1,391 sentences), News (6,698 sentences) and Other (3,049 sentences). It is available with syntactic and morphological annotation on a sentence basis in...

MEMBER (academic / commercial)
Licence: Attribution, Share Alike - CC-BY-SA-3.0
0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Attribution, Share Alike - CC-BY-SA-3.0
0.00 € / 0.00 €
 Bulgarian Valency Frame Lexicon    
  • Bulgarian

ID: ELRA-L0132

ISLRN: 188-702-981-369-5

The Bulgarian Valency Frame Lexicon is composed of 9547 lexical entries organized by frames with 960 mappings to Princeton WordNet available in XML format. It is a treebank-driven resource of extracted valency frames from BulTreeBank. The frames were manually curated. The frames followed the surf...

MEMBER (academic / commercial)
Licence: Attribution, Share Alike - CC-BY-SA-3.0
0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Attribution, Share Alike - CC-BY-SA-3.0
0.00 € / 0.00 €
 German Political Speeches Corpus    
  • German

ID: ELRA-W0330

ISLRN: 381-445-879-769-5

This corpus consists of a collection of political speeches in German crawled from the online archive of the German presidency (Bundespräsident) and the Chancellery (Bundesregierung). For the German Presidency the speeches are available from July 1, 1984 to February 17, 2012 and the corpus con...

MEMBER (academic / commercial)
Licence: Attribution, Share Alike - CC-BY-SA
0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Attribution, Share Alike - CC-BY-SA
0.00 € / 0.00 €
 Glissando-ca    
  • Catalan; Valencian

ID: ELRA-S0407

ISLRN: 780-617-066-913-1

Glissando-ca includes more than 12 hours of speech in Catalan, recorded under optimal acoustic conditions, orthographically transcribed, phonetically aligned and annotated with prosodic information (location of the stressed syllables and prosodic phrasing). The corpus was recorded by 8 profession...

MEMBER (academic / commercial)
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA
0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA
0.00 € / 0.00 €
 Glissando-sp    
  • Spanish; Castilian

ID: ELRA-S0406

ISLRN: 024-286-962-247-6

Glissando-sp includes more than 12 hours of speech in Spanish, recorded under optimal acoustic conditions, orthographically transcribed, phonetically aligned and annotated with prosodic information (location of the stressed syllables and prosodic phrasing). The corpus was recorded by 8 profession...

MEMBER (academic / commercial)
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA
0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA
0.00 € / 0.00 €
 JV_TDM Corpus    
  • French

ID: ELRA-S0379

ISLRN: 371-240-320-910-4

The JV_TDM corpus provides a phonetic annotation of 37 chapters of the original French version of “Around the World in 80 Days” by Jules Verne read by a single speaker. Each chapter has been annotated in a separate .TextGrid file. The audio files are not included in this release. They are availab...

MEMBER (academic / commercial)
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA
0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA
0.00 € / 0.00 €
 Persian Speech Corpus    
  • Persian

ID: ELRA-S0393

ISLRN: 068-845-898-304-0

This single-speaker speech corpus of about 2.5 hours was developed using the same methodology as the PhD work carried out by Nawar Halabi at the University of Southampton. The corpus was recorded in Persian (Tehrani accent) by one male speaker in a professional studio, through a "Blubb...

MEMBER (academic / commercial)
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA
0.00 € / 0.00 €
Licence: Commercial Use - ELRA VAR
4000.00 € / 4000.00 €
NON MEMBER (academic / commercial)
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA
0.00 € / 0.00 €
Licence: Commercial Use - ELRA VAR
5000.00 € / 5000.00 €