Bulgarian Treebank Corpus


761-430-854-533-2

ID: ELRA-W0328

The Bulgarian Treebank Corpus comprises 156,149 tokens (11,138 sentences) drawn from three main sources: grammar notebooks (1,391 sentences), news (6,698 sentences), and other texts (3,049 sentences). Each sentence carries syntactic and morphological annotation in Universal Dependencies format. This subset of BulTreeBank excludes ellipses and some rare phenomena. The conversion of BulTreeBank into Universal Dependencies format was supported by the EU project QTLeap (http://qtleap.eu/).
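As a rough illustration of how the sentence and token figures could be checked, the sketch below counts sentences and tokens in CoNLL-U text. It assumes the corpus files use the CoNLL-U layout commonly used for Universal Dependencies data, and that multiword-token ranges and empty nodes are not counted as tokens; how the published token count treats these is not stated here. The function name `count_conllu` and the two-sentence sample are made up for the example and are not taken from the corpus.

```python
from typing import Iterable, Tuple


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentences, tokens) for lines in CoNLL-U layout."""
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment (e.g. sent_id, text).
            continue
        token_id = line.split("\t", 1)[0]
        # Assumption: skip multiword ranges ("1-2") and empty nodes ("1.1").
        if "-" in token_id or "." in token_id:
            continue
        tokens += 1
        in_sentence = True
    if in_sentence:
        # The last sentence may not be followed by a blank line.
        sentences += 1
    return sentences, tokens


# Invented two-sentence sample, not taken from the corpus.
SAMPLE = """# sent_id = example-1
1\tТой\tтой\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tчете\tчета\tVERB\t_\t_\t0\troot\t_\t_
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

# sent_id = example-2
1\tДа\tда\tINTJ\t_\t_\t0\troot\t_\t_
2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_
"""

sentence_count, token_count = count_conllu(SAMPLE.splitlines())
print(sentence_count, token_count)
```

To run the same count on a real file, pass an open file handle (opened with `encoding="utf-8"`) to `count_conllu` instead of the sample lines.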


Member
Licence: Attribution, Share Alike - CC-BY-SA-3.0
Academic: 0.00 €
Commercial: 0.00 €

Non-member
Licence: Attribution, Share Alike - CC-BY-SA-3.0
Academic: 0.00 €
Commercial: 0.00 €
03/10/2022