Method of web-extraction (web scraping) of Russian verb paradigms from electronic dictionaries and databases. Matrix organization of lacunae, their codification and classification (on the material of the verbs of sound)
Alternative Title
Метод веб-извлечения парадигм русских глаголов из электронных словарей и баз данных. Матричная организация лакун, их кодификация и классификация (на материале глаголов звучания). Рабочие тезисы. Февраль
Abstract
Linguists now have access to digital collections of dictionaries and to collections of language samples known as "corpora". Data mining these sources allows researchers to verify statistically how words (word forms) are codified and to determine their frequency of usage in grammatical and colloquial contexts. The data collected in this research with the web scraping methodology may be used on their own or in combination with data collected from other big data sources. This project is especially significant for languages with rich morphology. Different strategies may be applied to gather, visualize, or analyze data from various online Russian dictionaries, corpora, or other big data sources using digital technologies (e.g., web portals and computer-assisted text collections).
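The idea named in the title, extracting a verb paradigm from a dictionary page and recording its lacunae in a matrix, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The HTML layout, the slot labels (`1sg` ... `3pl`), and the names `ParadigmTableParser`, `extract_paradigm`, and `lacuna_matrix` are all assumptions made for illustration, and the empty 1sg cell in the sample is an artificial gap, not a claim about the verb.

```python
from html.parser import HTMLParser

# Hypothetical markup: one <tr> per paradigm slot, two <td> cells (slot, form).
# Real dictionary pages will differ. The empty 1sg cell is an artificial gap
# added for illustration; the actual form is "шумлю".
SAMPLE_HTML = """
<table>
  <tr><td>1sg</td><td></td></tr>
  <tr><td>2sg</td><td>шумишь</td></tr>
  <tr><td>3sg</td><td>шумит</td></tr>
  <tr><td>1pl</td><td>шумим</td></tr>
  <tr><td>2pl</td><td>шумите</td></tr>
  <tr><td>3pl</td><td>шумят</td></tr>
</table>
"""

# Assumed slot inventory (present tense only) that defines the matrix columns.
SLOTS = ["1sg", "2sg", "3sg", "1pl", "2pl", "3pl"]


class ParadigmTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def extract_paradigm(html):
    """Return {slot: form}, keeping only slots that have a non-empty form."""
    parser = ParadigmTableParser()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2 and row[1]}


def lacuna_matrix(paradigms, slots=SLOTS):
    """Map {verb: {slot: form}} to {verb: [0 or 1 per slot]}; 1 marks a lacuna."""
    return {
        verb: [0 if forms.get(slot) else 1 for slot in slots]
        for verb, forms in paradigms.items()
    }
```

Calling `lacuna_matrix({"шуметь": extract_paradigm(SAMPLE_HTML)})` yields one matrix row per verb, with a 1 wherever the source offered no form. Rows built this way from several dictionaries could then be compared slot by slot.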
Start Date
01 Nov 2020
End Date
28 Feb 2021
Recommended Citation
Ivliyeva, Irina V. and Koob, Perry, "Method of web-extraction (web scraping) of Russian verb paradigms from electronic dictionaries and databases. Matrix organization of lacunae, their codification and classification (on the material of the verbs of sound)" (2021). Research Data. 7.
https://scholarsmine.mst.edu/research_data/7
Contact Information
Dr. Irina V. Ivliyeva, ivliyeva@mst.edu
Professor of Russian, Arts, Languages, and Philosophy Department
Missouri University of Science and Technology
Perry B. Koob, koobp@mst.edu
Database Administrator/System Administrator
Academic Technology Support Team
Missouri S&T Information Technology
Department(s)
Arts, Languages, and Philosophy
Document Type
Data
Document Version
Final Version
File Format
text
Language(s)
Russian
Language 2
English
Publication Date
28 Feb 2021