Method of web-extraction (web scraping) of Russian verb paradigms from electronic dictionaries and databases. Matrix organization of lacunae, their codification and classification (on the material of the verbs of sound)

Alternative Title

Метод веб-извлечения парадигм русских глаголов из электронных словарей и баз данных. Матричная организация лакун, их кодификация и классификация (на материале глаголов звучания). Рабочие тезисы. Февраль

Ивлиева, И.В.
Куб, Перри


Linguists now have access to digital collections of dictionaries and of language samples known as “corpora”. Mining these sources makes it possible to verify statistically how words (word forms) are codified and to determine their frequency of use in grammatical and colloquial contexts. Data collected with the web scraping methodology used in this research may be used on their own or combined with data from other big data sources. The project is especially significant for languages with rich morphology. Different strategies may be applied to gather, visualize, or analyze data from online Russian dictionaries, corpora, and other big data sources using digital technologies (e.g., web portals and computer-assisted text collections).
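
As a rough illustration of the idea named in the title (extracting paradigms and arranging them in a matrix whose empty cells are lacunae), the following minimal sketch parses a small inline HTML table with the Python standard library. It is not the authors' implementation: the table layout, the slot labels (1sg, 2sg, ...), and the names ParadigmTableParser, build_matrix, and find_lacunae are all assumptions made for the example, and no real dictionary site is queried. The verb победить is not a verb of sound; it is included only because its missing standard first-person singular form is a well-known paradigm gap.

```python
from html.parser import HTMLParser

# Hypothetical markup: one table row per verb, one cell per paradigm slot.
# Real dictionary pages will differ; this sample is inline so nothing is fetched.
SAMPLE_HTML = """
<table class="paradigm">
  <tr><th>verb</th><th>1sg</th><th>2sg</th><th>3sg</th><th>1pl</th><th>2pl</th><th>3pl</th></tr>
  <tr><td>звенеть</td><td>звеню</td><td>звенишь</td><td>звенит</td><td>звеним</td><td>звените</td><td>звенят</td></tr>
  <tr><td>победить</td><td></td><td>победишь</td><td>победит</td><td>победим</td><td>победите</td><td>победят</td></tr>
</table>
"""


class ParadigmTableParser(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def build_matrix(html):
    """Return (slots, matrix), where matrix[verb][slot] is a form or None."""
    parser = ParadigmTableParser()
    parser.feed(html)
    header, *body = parser.rows
    slots = header[1:]
    matrix = {}
    for row in body:
        verb, forms = row[0], row[1:]
        # An empty cell is stored as None, i.e. a lacuna in the paradigm.
        matrix[verb] = {slot: (form or None) for slot, form in zip(slots, forms)}
    return slots, matrix


def find_lacunae(matrix):
    """List every (verb, slot) pair whose cell is empty."""
    return [
        (verb, slot)
        for verb, cells in matrix.items()
        for slot, form in cells.items()
        if form is None
    ]


slots, matrix = build_matrix(SAMPLE_HTML)
lacunae = find_lacunae(matrix)
print(lacunae)  # [('победить', '1sg')]
```

Storing each gap as None rather than as an empty string keeps "no form attested in the source" distinct from any real string value, which may make later coding and classification of the gaps simpler.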

Start Date

01 Nov 2020

End Date

28 Feb 2021

Contact Information

Dr. Irina V. Ivliyeva,
Professor of Russian, Arts, Languages, and Philosophy Department
Missouri University of Science and Technology

Perry B. Koob,
Database Administrator/System Administrator
Academic Technology Support Team
Missouri S&T Information Technology


Arts, Languages, and Philosophy

Document Version

Final Version

Publication Date

28 Feb 2021