Kolmogorov complexity of morphs and constructions in English

Katharina Ehret


This chapter demonstrates how compression algorithms can be used to address morphological and syntactic complexity in detail by analysing the contribution of specific linguistic features to English texts. The point of departure is the ongoing complexity debate and quest for complexity metrics. After decades of adhering to the equal complexity axiom, recent research seeks to define and measure linguistic complexity (Dahl 2004; Kortmann and Szmrecsanyi 2012; Miestamo et al. 2008).

Against this backdrop, I present a new flavour of the Juola-style compression technique (Juola 1998), targeted manipulation. Essentially, compression algorithms are used to measure linguistic complexity via the relative informativeness in text samples. Thus, I assess the contribution of morphs such as –ing or –ed, and functional constructions such as progressive (be + verb-ing) or perfect (have + verb past participle) to the syntactic and morphological complexity in a mixed-genre corpus of Alice’s Adventures in Wonderland, the Gospel of Mark and newspaper texts. I find that a higher number of marker types leads to higher amounts of morphological complexity in the corpus. Syntactic complexity is reduced because the presence of morphological markers enhances the algorithmic prediction of linguistic patterns. To conclude, I show that information-theoretic methods yield linguistically meaningful results and can be used to measure the complexity of specific linguistic features in naturalistic copora. 


linguistic complexity; computational linguistics,; morphology; Kolmogorov complexity

Full Text:



  • There are currently no refbacks.