This page collects corpora of spoken and written varieties of Kurdish maintained by the Department of General Linguistics at the University of Bamberg.

The Laki variety of Harsin

Sara Belelli

This corpus contains sound files and transcriptions (as PDF) of the Laki variety of Harsin as documented by Sara Belelli in her dissertation (Belelli XXXX). The published book version of the dissertation will be made available here shortly.

All data in this corpus are freely accessible under a Creative Commons (CC-BY 4.0) licence.

Last updated 29 September 2021.

Corpus files

Citation for this corpus

Belelli, Sara. 2021. The Laki variety of Harsin: Corpus and sound files. (date accessed)

The Corpus of Contemporary Written Kurdish (CCWK)

Abdullah Incekan, Geoffrey Haig

The CCWK comprises a selection of contemporary written texts, primarily literary, in Northern Kurdish (Kurmanjî). The corpus was compiled by Abdullah Incekan as part of his PhD project (Incekan 2018) under the supervision of Geoffrey Haig.

The corpus consists of more than 900 000 words of Kurmanjî Kurdish text, predominantly fiction (~77%) with some non-fiction (~23%). The texts stem from a variety of contemporary sources, from the early 1990s to the present. They are intended to be approximately representative of contemporary Kurdish prose written in the largely standardized Roman-based Kurmanjî alphabet. The corpus is not tagged or translated.

Please note that due to copyright constraints, the corpus data are available only on request.

Last updated 27 September 2021.

Corpus files

Citation for this corpus

Incekan, Abdullah & Haig, Geoffrey. 2021. The Corpus of Contemporary Written Kurdish (CCWK). (date accessed)

The Corpus of Contemporary Kurdish Newspaper Texts (CCKNT)

Geoffrey Haig

The CCKNT comprises written Northern Kurdish (Kurmanjî) journalistic texts, compiled from online newspaper texts in 1999. The corpus consists of 483 texts, totalling around 214 000 words. It contains texts from two Kurdish publications: Azadiya Welat, a weekly Kurdish newspaper, and CTV, a company that broadcasts news items in Kurdish on the internet. The texts are not tagged or translated.

The corpus was compiled as part of a project on modern Kurdish syntax, conducted from 1999 to 2001 at the Seminar für Allgemeine und Vergleichende Sprachwissenschaft at the University of Kiel.

Last updated 28 February 2001.

Corpus files

Citation for this corpus

Haig, Geoffrey. 2001. The Corpus of Contemporary Kurdish Newspaper Texts (CCKNT). (date accessed)


Belelli, Sara. XXXX. ((...)). PhD dissertation, University of ((...)).

Incekan, Abdullah. 2018. Die Produktivität und Akzeptabilität von Neologismen im Verblexikon des Kurdischen: Eine korpusbasierte Untersuchung [The productivity and acceptability of neologisms in the Kurdish verbal lexicon: A corpus-based investigation]. PhD dissertation, University of Bamberg. (doi: 10.20378/irb-47265)


For inquiries, please contact Geoffrey Haig. Please direct questions concerning this website to Nils Schiborr.

The resources presented here as well as this page are hosted on the servers of the computing centre of the University of Bamberg. Relevant legal information can be found here.