Low-resource languages in the digital reality – Report from the International Conference in Vilnius

On April 16–17, 2026, the international scientific conference entitled “Linguistic Variation in the Contemporary Sociocultural Context” took place in Vilnius. The event served as a platform for the exchange of ideas among linguists and sociologists; however, from the perspective of contemporary technological challenges, two presentations by our researchers gained particular significance. These were the only papers during the entire event that directly addressed the issue of low-resource languages.

Technological challenges: Protecting against “linguistic homogenization”

A research team consisting of Prof. Roman Roszko (ISS PAS), Dr hab. Danuta Roszko (UV), and Dr Piotr Szatkowski (ISS PAS) presented the results of their work on constructing specialized corpora for the Masurian ethnolect and the Lithuanian Puńsk dialect in Poland.

In the era of the rapid expansion of Large Language Models (LLMs), the researchers highlighted the phenomenon of “linguistic homogenisation”. The dominance of high-resource languages in AI training sets causes the specific structures of smaller varieties to be displaced by calques and simplifications.

Key aspects of the project include:

  • Resource normalisation challenges and the creation of proper processing pipelines: Due to the lack of standardized orthography in dialectal texts, it was necessary to develop advanced processing pipelines. These include tasks such as cleaning “orthographic noise” and performing full substantive correction.
  • CLARIN-PL and CLARIN-PL-BIZ-Bis infrastructure: The work is being carried out within the extended CLARIN-PL infrastructure, which allows for data preparation in interoperable standards (TMX, TSV, JSON), ready for integration with systems such as “KonText”.
  • Benchmarking: The project aims to create closed test sets that will allow for an objective assessment of how contemporary AI models perform in understanding and generating texts in these specific linguistic varieties.
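
As an illustration of the pipeline and export steps listed above, here is a minimal Python sketch. It is not the project's actual code: the variant table, the function names, and the sample sentences are all hypothetical placeholders, and a real pipeline would rely on spelling rules curated by dialect experts. It only shows the general idea of mapping spelling variants to one canonical form and exporting the result as TSV and JSON, two of the formats named in the report.

```python
import csv
import io
import json
import unicodedata

# Hypothetical variant-to-canonical table. These entries are English
# placeholders, not real Masurian or Lithuanian dialect forms.
VARIANTS = {"exampel": "example", "exmple": "example"}


def clean_token(token: str) -> str:
    """Unify the Unicode form and map known spelling variants to one form."""
    token = unicodedata.normalize("NFC", token).strip()
    # Unknown tokens are returned unchanged; known variants are replaced
    # by the (lower-case) canonical spelling from the table.
    return VARIANTS.get(token.lower(), token)


def normalise(line: str) -> str:
    """Collapse irregular whitespace and clean every token in the line."""
    return " ".join(clean_token(t) for t in line.split())


def to_tsv(pairs) -> str:
    """Serialise (original, normalised) pairs as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["original", "normalised"])
    writer.writerows(pairs)
    return buf.getvalue()


def to_json(pairs) -> str:
    """Serialise the same pairs as a JSON array, keeping non-ASCII letters."""
    records = [{"original": o, "normalised": n} for o, n in pairs]
    return json.dumps(records, ensure_ascii=False, indent=2)


raw_lines = ["An  exampel   sentence", "Another exmple"]
pairs = [(line, normalise(line)) for line in raw_lines]
print(to_tsv(pairs))
print(to_json(pairs))
```

Keeping the original text alongside the normalised form in every record means that each correction remains traceable, which matters when the source material has no standard orthography.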

A Sociolinguistic Perspective: Can School Save a Language?

Complementing the technological view of multilingualism was an analysis by MA Andrzej Żak (ISS PAS) regarding the status of the Kashubian language. The researcher employed the term “collateral language” – a variety whose linguistic status has been historically contested and which, despite legal recognition, currently struggles with revitalization challenges.

The main findings of the study are:

  • The Educational Paradox: Despite 30 years of teaching Kashubian in schools and its status as the only regional language in Poland, statistics indicate a decline in the number of active users.
  • Extra-systemic Barriers: An analysis of sociolinguistic interviews revealed that key obstacles are psychological and ideological factors, such as low social prestige of the language and a deeply rooted sense of shame among older generations.
  • Future Strategy: The study demonstrates that institutionalization alone (schools, government offices) is insufficient. For a language to survive, a change in social attitudes and the construction of a new, positive linguistic identity are essential.

Andrzej Żak’s participation was funded by the National Science Centre (NCN) SONATA BIS grant awarded to Prof. Nicole Dołowy: “Linguistic diversity in Poland: collateral languages, language-oriented activities and conceptualization of collective identity” (2020/38/E/HS2/00006).

Summary: The role of CLARIN-PL and CLARIN-PL-BIZ-Bis projects in heritage protection

These presentations clearly demonstrated that protecting smaller linguistic varieties in the 21st century must follow a dual track. On one hand, advanced linguistic engineering – implemented through projects such as CLARIN-PL-BIZ-Bis – is essential to bring these languages into the digital sphere. On the other hand, sociolinguistic reflection is necessary to understand the human context of their use.

The fact that the topic of low-resource languages in Vilnius was raised almost exclusively by our representatives underscores the leading role of the Institute of Slavic Studies of the Polish Academy of Sciences (ISS PAS) and the CLARIN-PL and CLARIN-PL-BIZ-Bis consortia in defining the directions of modern Digital Humanities. Without the active creation of data resources, smaller ethnolects are at risk of digital exclusion and fading into non-existence in a world governed by algorithms.

The project “CLARIN – Common Language Resources and Technology Infrastructure” is funded under the Second Priority of the European Funds for a Modern Economy 2021–2027 (FENG) program. Consortium members: Wrocław University of Science and Technology (leader), Institute of Computer Science of the Polish Academy of Sciences, Institute of Slavic Studies of the Polish Academy of Sciences, University of Lodz, University of Wrocław. 

Prof. Roman Roszko delivering his presentation. Photo: private archive.
Andrzej Żak delivering his presentation. Photo: private archive.

From Corpora to Artificial Intelligence. Keynote Address by Prof. Roman Roszko at the ‘Corpus Linguistics in Science and Education’ Conference in Kyiv

During the V International Scientific and Practical Conference ‘Corpus Linguistics in Science and Education’, organised by the National Pedagogical Dragomanov University in Kyiv, the scientific community’s attention was focused on the plenary session delivered by Prof. Roman Roszko from the Institute of Slavic Studies of the Polish Academy of Sciences (ISS PAS). The presentation served not only as the substantive foundation of the event but also as a clear signal that Polish computational linguistics is entering a new digital era.

The presentation, titled ‘Evolution of the Team of Semantics and Computational Linguistics of ISS PAS: From Corpus Resources to Polish Language Models’, was met with immense interest from over 150 participants attending the session.

Prof. Roszko provided an exceptionally clear and profound outline of the journey taken by the current Semantics and Corpus Linguistics Team. This history evolved from building fundamental monolingual, bilingual, and multilingual corpora towards advanced solutions in the field of Artificial Intelligence (AI).

Key Pillars of the Presentation

The core of the lecture focused on the Institute’s involvement in projects critical to Polish technological sovereignty, including PLLuM (Polish Large Language Model), HIVE AI, CLARIN-PL-BIZ-Bis, and DARIAH-HUB.

Prof. Roszko highlighted the unique role of ISS PAS in creating high-quality datasets, specifically:

  • Textual data for training, evaluation, and testing.
  • Organic instruction data, which enables models to better understand user intent.
  • Specialised programming data, essential for developing tools that support coding.

Another vital topic was the Institute’s role in verifying effective Text and Data Mining (TDM) reservations. In an era of mass AI model training, protecting resources and precisely managing data access has become a priority, and ISS PAS serves as a leading expert in this field.

Recognition from the Academic Community

The organisers from the Faculty of Foreign Philology at the Ukrainian State Dragomanov University expressed high praise for the contribution. In an official statement, they emphasised that Prof. Roszko’s speech was the ‘brightest highlight of the programme’, noting that the depth of the analytical and developmental work described, combined with the speaker’s high level of professionalism, set the tone for all subsequent scientific discussions.

‘This presentation caused a significant stir in the scientific community. We are impressed by the research depth and look forward to further fruitful cooperation within future scientific projects’, wrote the conference Organising Committee.

Prof. Roman Roszko’s appearance at the Kyiv conference confirmed that ISS PAS does more than just archive linguistic heritage; it actively shapes the future of computational linguistics, building bridges between traditional science and the technologies of tomorrow.

Presentation title slide. Photo: private archive.
Participants during the conference. Photo: private archive.
Prof. Roman Roszko during the presentation. Photo: private archive.
Prof. Roman Roszko presenting the PLLuM model. Photo: private archive.

Invitation to Participate in the Conference “Linguistic Variation in the Contemporary Sociocultural Context”

We invite submissions of proposals for participation in the international conference “Linguistic Variation in the Contemporary Sociocultural Context,” organized by the Institute of the Lithuanian Language in cooperation with the Institute of Slavic Studies of the Polish Academy of Sciences. The conference will take place on 16–17 April 2026 in Vilnius.

The conference languages are English and Lithuanian.

Proposed thematic areas:

  • Regional Dialectology: Distribution, Coexistence, and Change of Local Language Variants.
  • Language Attitudes and the Influence on Language Change and Development Processes.
  • The Intersection of Dialectology and Ethnolinguistics in Contemporary Linguistic Research.
  • Influence of Social and Cultural Environment on Language: Diagnostics and Prognostics.
  • Digital Linguistic Resources and Analytical Methods.

The application form together with the abstract should be submitted by 2 March 2026: https://docs.google.com/forms/d/e/1FAIpQLScnEMs9WBCq_Kdy2yFAYekkOWWqCUzWp9mOHUZ0X7TSbexhzA/viewform

The COST Action PLURILINGMEDIA Conference in Warsaw is behind us

 

On 3–5 December 2025, COST Action CA23105 PLURILINGMEDIA was hosted at the Institute of Slavic Studies of the Polish Academy of Sciences.

On 3 December, the COST Action Management Committee meeting took place at the Staszic Palace. Around 40 people representing all European countries and many associated countries came to Warsaw for the annual meeting of the PLURILINGMEDIA network’s Management Committee. Other participants joined the meeting online.

The meeting was opened by Prof. Nicole Dołowy, leader of Working Group 3 Language Vitality and a member of the network’s Core Group, and Dr Craig Willis, Chair of COST Action PLURILINGMEDIA. The meeting focused on the actions undertaken within the network so far, plans for the upcoming year, and the longer-term outlook.

On 4–5 December, the 1st PLURILINGMEDIA General Conference: Media and Language Vitality, organised by the Institute of Slavic Studies PAS, was held. This two-day gathering of researchers and media practitioners working with minority languages was dedicated to the role of media in preserving and promoting these languages. More than 80 participants from Europe and associated countries took part.

Nine thematic sessions were organised on topics such as: translation and accessibility; reception and minority-language media; AI, accessibility and the digital divide; language vitality and print media; media and the preservation of identity and languages; language ideologies, stigmatization and media discourse; digital spaces and online language use; the role of journalists; social media and languages; and language learning and media. In addition to paper sessions, three panels were held on: multilingual families and media; virtual spaces; and new media in the context of the European Charter for Regional or Minority Languages.

The conference was opened by Prof. Elin Haf Gruffydd Jones, President of the European Language Equality Network (ELEN) from the University of Wales Trinity Saint David. Her lecture, “Resilience and Rights, Revitalisation and Reach: What Media Tells Us about the Future of Linguistic Diversity”, provided an overview of existing research on minority-language media and pointed to possible future research directions.

A plenary discussion featuring minority-language media practitioners also took place. The following speakers presented their work, its reception, and its possible impact on audiences:
– Anna Nikitiuk, who discussed her blog “Anna Nikitiuk – po swojomu” and the role of social media in preserving the local speech of Podlasie;
– Dr Piotr Szatkowski, who spoke about the presence of Kashubian on social media and his experience creating media content in the Masurian language;
– Paola Valenta from the Associazione dei Giovani CNI – the association of young people belonging to the Italian minority in Slovenia.

Three representatives of the Institute of Slavic Studies PAS took part in the conference.
Prof. Nicole Dołowy, the main organiser of the conference, together with Prof. Sanita Martena from RTU Rēzekne Academy, delivered a paper titled “Digital Media and Their Role in Strengthening Literacy in Collateral Languages: Case Studies of Latgalian, Kashubian and Podlachian”, devoted to grassroots literacy practices in three collateral languages. Dr Piotr Szatkowski, a member of the COST Action PLURILINGMEDIA network and of the organising committee, presented the paper “AI-Generated Kashubian Voices: Community Perspectives on Concept and Implementation”, discussing the results of his survey on language attitudes and ideologies regarding the role of AI in producing content in both the dominant language (Polish) and the minority language (Kashubian). Dr Olha Tkachenko, in her paper “Dialects and Regional Languages on Ukrainian YouTube: Representation and Interaction”, examined the ways regional languages are represented in Ukrainian social media.

The presentations and lectures were accompanied by lively discussions. The idea of COST Action is to foster networking among researchers as well as participants from outside the strictly academic sphere. The conference organised in Warsaw by the Institute of Slavic Studies PAS fulfilled this mission: it became a space for meetings, exchange of experiences, methods, and possible research applications.

More information about COST Action CA23105 PLURILINGMEDIA can be found at: https://plurilingmedia.eu/

Prof. Nicole Dołowy and Prof. Craig Willis during the opening of the conference. Photo: private archive.
Participants of the conference. Photo: private archive.
Prof. Craig Willis, Dr Piotr Szatkowski, Paola Valenta and Dr Anna Nikitiuk during the discussion. Photo: private archive.
Prof. Nicole Dołowy and Prof. Sanita Martena delivering their paper. Photo: Kinga Capik.
Dr Olha Tkachenko, Prof. Nicole Dołowy and Dr Piotr Szatkowski at the COST Action PLURILINGMEDIA conference. Photo: private archive.
Dr Olha Tkachenko delivering her paper. Photo: Kinga Capik.
The first COST Action PLURILINGMEDIA Conference: Media and Language Vitality in Warsaw. Photo: private archive.

Dr. Orest Semotiuk at the ISHS Conference in Kraków

From July 7 to 11, 2025, Dr. Orest Semotiuk participated in the 35th International Society for Humor Studies (ISHS) Conference.

During the “Online Humor” panel, the researcher delivered a lecture titled “Grim Reaper Meme in Armed Conflicts: Origin and Evolution”.

More information about the conference is available at: https://ishs2025.pl/.

Dr. Orest Semotiuk during the lecture. Photo: private archive.

Dr Olha Tkachenko at the Warsaw East European Conference

From June 30 to July 2, 2025, Dr Olha Tkachenko participated in the 21st Warsaw East European Conference (WEEC), organized by the Centre for East European Studies at the University of Warsaw.

This year’s conference was titled “Time of Global Turbulence: Challenges for Central and Eastern Europe”. Dr Tkachenko presented a paper entitled “Revealing and Counteracting Russian Narratives in Ukrainian Media. Case of YouTube” during the panel “Responding to Hybrid Threats: Disinformation and Democratic Resilience”.

Dr Tkachenko’s research is conducted within the framework of the project “Decolonization processes in Ukraine’s YouTube segment after February 24, 2022”, project no. 2024/08/X/HS2/00066, funded by the National Science Centre of Poland (Miniatura programme).

Dr Olha Tkachenko during a lecture. Photo: Iryna Polets-Gerus.
Dr Olha Tkachenko during a lecture. Photo: Iryna Polets-Gerus.

PLURILINGMEDIA General Conference: Call for Applications

   

We are pleased to announce that the first PLURILINGMEDIA conference will take place on 4–5 December 2025 in Warsaw. Attached is the Call for Applications, with a deadline of 1 June.

The application form is available here: https://docs.google.com/forms/d/1usS_76iiD2qCGKUUlLGCdSlo-O3QMFmVpoKuZTj12DQ/viewform

The main theme of the conference is Media and Language Vitality, reflecting Working Group 3 as the lead organisers. The conference will be hosted by the Institute of Slavic Studies, Polish Academy of Sciences, under the leadership of PLURILINGMEDIA WG3 Leader, Prof. Nicole Dołowy-Rybińska, and Vice-Leader, Prof. Sanita Martena.

There are no registration fees for the event and presenting participants will have their costs reimbursed, as per COST Association – European Cooperation in Science and Technology rules.

Dr. Anton Dinerstein at the ASEEES conference in Boston

Conference badge. Photo: private archive.

On November 21–24, 2024, Dr Anton Dinerstein participated in the 56th Annual Convention of the Association for Slavic, East European, and Eurasian Studies (ASEEES).

His paper entitled “Power as Identity Category: Discursive Construction of Politics via Russian-Language Political Discourse in Belarus” was presented as part of the roundtable “Belarusian Culture I: Language(s)”.

Dr. Anton Dinerstein is implementing the project “Power and Identity in Russian-Language Political Discourse: the Case of Belarus” at ISS PAS (reg. no. 2022/45/P/HS2/02636), which is co-financed by the National Science Centre and the European Union Framework Programme for Research and Innovation Horizon 2020 under the Marie Skłodowska-Curie grant agreement no. 945339.

Dr. Orest Semotiuk in Boston (MA, USA)

On November 21–24, 2024, Dr. Orest Semotiuk participated in the 56th Annual Convention of the Association for Slavic, East European, and Eurasian Studies (ASEEES). His presentation “Ideology vs. Quasi-Ideology: Ruscism and «Ukrofascism» in World, Ukrainian, and Russian Political Cartoons and Memes” was a part of the panel “Make Laughter, Not War: Caricature, Emotion, and Politics in the Post-soviet Era”.

Dr. Orest Semotiuk’s research is carried out within the framework of project no. 2022/45/P/HS2/02536 co-financed by the National Science Centre and the European Union’s Framework Programme for Research and Innovation Horizon 2020 under contract no. 945339 within the framework of the Marie Skłodowska-Curie actions.

 

Dr. Orest Semotiuk during the lecture. Photo: private archive.
Conference materials. Photo: private archive.

Call for Papers: “(In)Visible Russian (Anti-)War Migration” – international conference, Warsaw, 13–15 March 2024

We would like to invite you to participate in the international academic conference “(In)Visible Russian (Anti-)War Migration”, which will take place on March 13–15, 2024, in Warsaw, Poland.

Organizers: Institute of Slavic Studies PAS, Faculty of Sociology UW, Institute of Archaeology and Ethnology PAS, Institute of Ethnology and Cultural Anthropology UW

The conference aims to explore all dimensions of Russian (anti-)war migration and examine the local responses of host countries at micro, meso, and macro levels. The idea for the conference arises from the project “Crossing Borders, Building Walls: Towards an Ethnography of Russian War Mobilization” (NAWA BPN/GIN/2022/1/00082/DEC/1, 2023-2024), conducted at our Institute by Dr. Katarzyna Roman-Rawska.

Full CFP and additional info: CfP (In)Visible Russian (Anti-)War Migration 13-15.03.2024

Abstract submission closes: December 1, 2023

FB event: https://fb.me/e/2GoITXfOU

The conference is co-funded by the state budget, granted by the Ministry of Education and Science, Republic of Poland, under the programme “Excellent Science II – Support for scientific conferences”.

Institute of Slavic Studies, Polish Academy of Sciences
