Empirical Verification of the Polish Formula of Text Difficulty

The aim of the study was to verify the accuracy of the formula for assessing the difficulty of texts written in Polish (Pisarek 1969). The study involved 1,309 persons aged between 15 and 84. Fifteen texts were used, each approximately 300 words long, representing different subjects and varied levels of difficulty. Text comprehension was checked with multiple choice tests, the cloze procedure and open-ended questions. Significant correlations between the difficulty of a text and its comprehension were found (rmc(15) = -0.529, p = 0.043; ropen = -0.519, p = 0.047; rcloze = -0.656, p = 0.008). The results confirmed the relative accuracy and usefulness of Pisarek's readability formula. The discussion includes proposed ways of improving the current form of the formula.


INTRODUCTION.
It is a truism that the author of a text wants to be properly understood by its recipients. Thus, the ability to write clearly and understandably is one of the basic writing skills.
To increase the chances that the recipients will understand the text in conformity with the author's intention, the author needs to pay attention to text readability.
In the literature on the subject, various definitions and understandings of readability can be found (Klare 1963, Samson 1993). First, some definitions point to the legibility of the printed material resulting from its layout or typographic features. Readability understood this way is affected by the layout of the text, the font, the presence of graphics etc. Second, other definitions connect readability with ease of reading thanks to interesting content or the aesthetic value of the writing style. Third, a definition of readability may simply focus on the ease of text comprehension due to the style of writing (Klare 1963). Readability described this way is connected with the comprehensibility of the text: the higher the readability of the text, the more comprehensible it is (Samson 1993:58). This third way of understanding readability has been adopted in this text.
One objective of studies on readability is to develop tools to measure it (Klare 1963). Apart from classic methods (such as dichotomous questions, open-ended questions, single/multiple choice tests etc.), the cloze procedure developed by Taylor (1953) is very popular. This method is based on an assumption drawn from Gestalt psychology: the human tendency to perceive a whole even if only a part of it is visible, that is, to strive for mental closure of the figure.
According to Taylor (1956), the cognitive process involved in doing a cloze test is similar: people try to complete a text which has a 'defect' using predictions based on the available contextual hints. The 'defect' means that every nth (usually every 5th) word is removed from the text and the reader is asked to complete each gap with the one word that he or she thinks was used by the author.
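The deletion scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by Taylor or in this study: the names `make_cloze` and `cloze_score` are hypothetical, words are split on whitespace, punctuation attached to a removed word is kept in place, and answers are scored by exact, case-insensitive match with the author's word.

```python
import re

GAP = "_____"  # placeholder shown to the reader (an arbitrary choice)


def make_cloze(text, n=5):
    """Blank every n-th word of text; return (gapped_text, answer_key)."""
    gapped, answers, count = [], [], 0
    for token in text.split():
        # Separate leading/trailing punctuation from the word itself.
        match = re.match(r"^(\W*)(\w.*?)(\W*)$", token)
        if match is None:
            # Pure punctuation: keep it and do not count it as a word.
            gapped.append(token)
            continue
        count += 1
        if count % n == 0:
            lead, word, trail = match.groups()
            answers.append(word)
            gapped.append(lead + GAP + trail)
        else:
            gapped.append(token)
    return " ".join(gapped), answers


def cloze_score(responses, answers):
    """Percentage of gaps filled with exactly the removed word."""
    if len(responses) != len(answers) or not answers:
        raise ValueError("responses and answers must be equal-length and non-empty")
    correct = sum(
        r.strip().lower() == a.lower() for r, a in zip(responses, answers)
    )
    return 100.0 * correct / len(answers)
```

For example, `make_cloze("one two three four five six seven eight nine ten.")` blanks "five" and "ten", and a reader who restores only the first of them scores 50 percent.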
In order to determine text readability with the cloze procedure, a readability index is used, which is the sum or percentage of correctly filled gaps. The validity of this method, estimated on the basis of its correlation with other readability measures, has been confirmed (Rankin 1959; Bormuth 1966). Its great advantage is that it takes into account not only the factors affecting text readability that are already known but also those that have not yet been discovered or measured; furthermore, it does not disregard non-linguistic factors such as the reader's knowledge, linguistic skills or even motivation and interest in the text (Pisarek 2007:249). Despite its many advantages, the cloze method also has some limitations. It is time-consuming and laborious, as it involves finding participants, engaging them in the study, and then calculating the results. One way of estimating text readability that avoids these limitations is to use analytical methods, i.e. readability formulas (Flesch 1948, Gunning 1952, Chall & Dale 1995). Most of the formulas take into account two main factors: lexical or semantic features (such as the length of words, their similarity and their frequency of occurrence, i.e. popularity) and sentence or syntactic features (average sentence length; Chall & Dale 1995). Despite justified criticism of readability formulas (Bailin & Grafstein 2001, Klare 1974), they are still appreciated as a method of readability measurement thanks to their practical value.
The vast majority of readability formulas were developed for the English language, which is a considerable limitation. It is not obvious whether they can be effectively applied to other languages and, if so, whether they should be used in a form identical to that for English or should be changed substantially. Pisarek (1969) attempted to develop a readability formula for the Polish language. In creating his formula, he adopted one factor, average sentence length, directly from English.
In the case of the other factor, he chose another way, taking into consideration the specificity of the Polish language. On the basis of linguistic analyses, he decided that the best indicator of lexical features was the percentage of difficult words, i.e. words having 4 or more syllables in the dictionary form. The text difficulty formula proposed by Pisarek (1969) is:

$$T = \frac{1}{2}\sqrt{T_s^2 + T_w^2}$$

where T is text difficulty; T_s is the syntax difficulty index, meaning the average sentence length (in words); and T_w is the vocabulary difficulty index, meaning the percentage of "difficult" (four-syllable and longer) words. In addition, Pisarek (2007:258) presented ranges describing text difficulty: 4-7, very easy texts; 7.1-10, easy texts; 10.1-13, average texts; 13.1-16, difficult texts; 16.1-20, very difficult texts.
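As an illustration, the index and the difficulty ranges can be computed as follows. The sketch assumes the nonlinear form commonly cited for Pisarek's formula, T = 0.5 * sqrt(Ts^2 + Tw^2); the function names are hypothetical, and the two indices are taken as inputs because counting syllables in dictionary forms requires lemmatization, which is outside the scope of this example. Treating the published ranges as upper bounds (so that a value such as 7.05 counts as "easy") is also an assumption.

```python
import math


def pisarek_difficulty(avg_sentence_length, pct_difficult_words):
    """T = 0.5 * sqrt(Ts^2 + Tw^2), the assumed form of Pisarek's formula.

    avg_sentence_length  -- Ts, mean number of words per sentence
    pct_difficult_words  -- Tw, percentage of words with 4+ syllables
    """
    return 0.5 * math.sqrt(avg_sentence_length ** 2 + pct_difficult_words ** 2)


# Upper bounds of the ranges given by Pisarek (2007:258).
BANDS = [
    (7.0, "very easy"),
    (10.0, "easy"),
    (13.0, "average"),
    (16.0, "difficult"),
    (20.0, "very difficult"),
]


def difficulty_band(t):
    """Map a T value to its verbal label, or flag it as out of range."""
    if t < 4.0 or t > 20.0:
        return "outside published ranges"
    for upper, label in BANDS:
        if t <= upper:
            return label
    return "outside published ranges"  # not reached; kept for safety
```

For instance, a text with an average sentence length of 6 words and 8% difficult words gives T = 5.0, which falls in the "very easy" range.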
In spite of its potential practical value, Pisarek's formula has one important flaw: it has never been empirically verified. So far nobody has attempted to answer the question of whether it can accurately determine the difficulty of a text and thus serve as a predictor of comprehensibility for the potential recipient group.
Considering the above, the aim of this study was to check empirically the predictive validity of Pisarek's formula of text difficulty.

METHOD
2.1 PARTICIPANTS. The study involved 1,321 individuals whose native language was Polish. Twelve persons returned tests without completing them, so data from 1,309 individuals were ultimately used (844 women and 464 men; 1 person did not provide information on sex). The sociodemographic characteristics of the studied sample are presented in Table 1. The participants' mean age was 34.87 years, SD = 16.33, min. = 15, max. = 83. Approximately 2/3 of the sample were people with secondary or higher education. Most of them (75.7%) lived in towns with up to 500 thousand residents. More than half of the participants were professionally active, whereas nearly 1/3 were school or university students.
2.2 MEASURES. Fifteen non-literary texts in Polish, each about 300 words long, were used in the study. The texts were selected so as to ensure a variety of types, topics and levels of difficulty, the latter initially determined with the Gunning fog index (Gunning 1952). The texts included: 2 texts from school handbooks, 2 academic texts, 2 acts, 2 official letters, 2 manuals, 2 legal brochures and 3 newspaper/magazine articles (see Table 2).
All participants received a study set including a demographics section and 3 texts. The sets were created randomly: one text was drawn from each group of tests and they were joined together. If the same text was drawn, the drawing was repeated. In order to control the impact the order of text types might have had on the dependent variable, the order of tests in a set was regularly changed.
The answers to multiple choice questions and the words inserted in the cloze test were checked against the answer key by the persons who collected the data. The answers to open-ended questions were checked by a team of 2 linguists, other than the team creating the tests.

PROCEDURE.
The paper-and-pencil questionnaires were distributed with the snowball method: they were handed to 50 persons coming from different environments and having different education levels, who were asked to reach groups of people as varied as possible.
Before the respondents completed the tests, the supervisor informed them of the objective of the study, as well as of its anonymous and voluntary nature. Completing the tests took 30-40 minutes on average.

Table 2 presents the linguistic features of the texts calculated with the Jasnopis application (Broda et al. 2014) and the results of comprehension tests. The difficulty of the texts measured with Pisarek's formula ranged from T = 7.23 for text 4 (Drug administration instruction) to T = 15.09 for text 14 (Accounting Act).

<INSERT TABLE 2 ABOUT HERE>
The correlations between the variables were calculated with the use of Pearson's correlation coefficient. No significant relationship was found between the two components of the formula: average sentence length and percentage of difficult words (r(15) = 0.043; p = 0.880). The analysis of correlation between linguistic variables and text comprehension gave the following results: the average sentence length correlated negatively with comprehension assessed with the cloze test (r(15) = -0.607; p = 0.016). A negative relationship was also noted between the average sentence length and the results of the multiple choice tests and open-ended questions, although the correlation coefficients for these variables did not prove to be statistically significant (rmc(15) = -0.322; p = 0.243; ropen(15) = -0.401; p = 0.139). The percentage of difficult words correlated negatively with comprehension in the multiple choice test (r(15) = -0.607; p = 0.016) and, at the level of a statistical tendency, in the open-ended questions test (r(15) = -0.477; p = 0.072). In the case of the cloze test, no significant relationship was noted (r(15) = -0.276; p = 0.320).
The most important analyses concerned the correlations between the difficulty of texts calculated on the basis of Pisarek's formula and their comprehension. The relationships proved to be significant for all the types of test. Correlation coefficients were, respectively: -0.519 for the open-ended questions test (p = 0.047), -0.529 for the multiple choice test (p = 0.043) and -0.656 for the cloze test (p = 0.008).
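The reported p-values can be checked against the coefficients with a short, standard-library-only sketch. It assumes that the 15 in r(15) is the number of texts (so the test has 13 degrees of freedom) and that the tests were two-sided; the function names are hypothetical, and the p-value is obtained by numerically integrating the Student t density rather than by calling a statistics package.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length >= 3")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the t density over [0, |t|].

    Uses Simpson's rule; steps must be even.
    """
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)
```

Under these assumptions, `two_sided_p(t_statistic(-0.656, 15), 13)` falls below 0.01 and `two_sided_p(t_statistic(-0.529, 15), 13)` falls below 0.05, in line with the significance levels reported above.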

DISCUSSION OF RESULTS.
To sum up the results of the study, Pisarek's formula in its present form turned out to be a relatively good predictor of the difficulty of Polish non-literary texts. It can be used, e.g., in editors' work to estimate the difficulty of a text, and thus to preliminarily evaluate the match between the style in which the text was written and its potential readers.
The obtained results support the opinion of authors of readability formulas for the English language that text readability is determined by two main components: lexical/semantic features and sentence/syntactic features (Chall & Dale 1995). Hence, these factors may be good predictors of text readability regardless of the language. Differences between languages may only occur at the level of details concerning the selection of indicators for these features. However, the hypothetical universal character of readability components must be further investigated.
Although the relationships between the difficulty of texts measured with Pisarek's formula and their comprehension were confirmed, it should be noted that these relationships, although significant, were not strong. In the case of the two components of the difficulty formula, i.e. the average sentence length and the percentage of difficult words, significant correlations were only found for some comprehension tests. The obtained results may have been influenced by the small number of texts used in the study. Because the size of the sample affects the significance of a correlation coefficient (Rubin 2013:214), considering a greater number of texts might change the number of significant correlations between the variables.
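The dependence of significance on the number of texts can be illustrated with a small sketch. It is hypothetical in two respects: it asks what would happen if the same coefficient were observed with more texts, which the study did not test, and it hard-codes standard two-sided 5% critical values of Student's t for the three sample sizes shown rather than computing them.

```python
import math

# Two-sided 5% critical values of Student's t from standard tables,
# keyed by degrees of freedom (n - 2).
T_CRIT_05 = {13: 2.160, 28: 2.048, 58: 2.002}


def t_from_r(r, n):
    """Test statistic for a Pearson correlation r observed on n items."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def significant_at_05(r, n):
    """True if |t| exceeds the tabulated critical value for n - 2 df."""
    return abs(t_from_r(r, n)) > T_CRIT_05[n - 2]


# The same coefficient, here -0.401, evaluated at three sample sizes.
for n in (15, 30, 60):
    print(n, round(t_from_r(-0.401, n), 2), significant_at_05(-0.401, n))
```

With 15 texts a coefficient of -0.401 does not reach the 5% threshold, whereas the same value would do so with 30 or 60 texts, because the test statistic grows with the square root of n - 2.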
Another important issue is whether and how it would be possible to improve the accuracy of Pisarek's formula. At least three potential ways of improving it can be proposed:

For each text, 3 kinds of reading comprehension tests were prepared: multiple choice, cloze and open-ended questions. The multiple choice test was made up of 4 single choice questions, each having 4 alternative answers. The cloze test, consisting of 50 gaps, was prepared by removing every 5th word from the text, beginning with the second sentence. The open-ended questions test included 5 questions.

TABLE 2. Characteristics of texts: linguistic features and comprehension level