Abstract:
This methodologically oriented, corpus-driven study focuses on distinctive patterns of language use in a specialized text type, namely Russian patient information leaflets. The study's main goal is to identify the keywords and recurrent word sequences that account for the leaflets' formulaicity; its secondary goal is to describe their discoursal functions. The keywords were identified using three metrics (G2, Hedges' g and Neozeta), and the overlap between the three metrics was explored. The overlapping keywords were then analyzed qualitatively in terms of their discoursal functions. As for distinctive multi-word patterns, we focused on the recurrent n-grams with the largest coverage in the corpus. These were identified using the Formulex method (Forsyth, 2015b), which provides data complementary to those of more conservative n-gram and lexical bundle approaches. The results revealed that Hedges' g identified the most distinctive keywords, that the largest overlap occurred between the G2 and Neozeta metrics, and that the frequent use and discoursal functions of the identified lexical patterns correspond to the situational contexts and communicative purposes of patient information leaflets. It is hoped that this study will prompt methodological reflection and inspire further corpus-driven research on distinctive recurrent lexical patterns (e.g., keywords, n-grams, lexical bundles) or, more generally, on formulaic language in texts originally written in Russian.
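Two of the three keyness metrics named above have widely used textbook formulations. The sketch below computes them for a single word. It is a minimal illustration under stated assumptions, not the study's own implementation. G2 is the standard log-likelihood statistic over a 2x2 contingency table (word frequency and corpus size in the target and reference corpora). Hedges' g is assumed here to be computed over per-document relative frequencies of the word in each corpus, using the pooled standard deviation and the usual small-sample correction. The function names `log_likelihood_g2` and `hedges_g` are hypothetical. Neozeta and Formulex are not sketched.

```python
import math
from statistics import mean, variance


def log_likelihood_g2(freq_target, freq_ref, size_target, size_ref):
    """Standard log-likelihood (G2) keyness score for one word.

    freq_*: occurrences of the word in each corpus; size_*: corpus sizes in tokens.
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the word were equally likely in both corpora
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_target, expected_target), (freq_ref, expected_ref)):
        if observed > 0:  # by convention, 0 * ln(0) contributes 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def hedges_g(target_rates, ref_rates):
    """Hedges' g effect size between per-document relative frequencies.

    Assumes each list holds at least two documents (needed for the sample variance).
    """
    n1, n2 = len(target_rates), len(ref_rates)
    # Pooled standard deviation of the two samples
    pooled_var = ((n1 - 1) * variance(target_rates) + (n2 - 1) * variance(ref_rates)) / (n1 + n2 - 2)
    cohens_d = (mean(target_rates) - mean(ref_rates)) / math.sqrt(pooled_var)
    # Small-sample bias correction factor J
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d * correction
```

A word that occurs at the same relative rate in both corpora gets a G2 of 0. G2 grows with total frequency, whereas Hedges' g reflects how consistently the word's rate differs across individual documents. That difference is one plausible reason why the metrics can rank keywords differently.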