Method of web-extraction (web scraping) of Russian verb paradigms from electronic dictionaries and databases. Matrix organization of lacunae, their codification and classification (on the material of the verbs of sound)

Alternative Title

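Метод веб-извлечения парадигм русских глаголов из электронных словарей и баз данных. Матричная организация лакун, их кодификация и классификация (на материале глаголов звучания). Рабочие тезисы. Февраль
(English: Method of web extraction of Russian verb paradigms from electronic dictionaries and databases. Matrix organization of lacunae, their codification and classification (based on verbs of sound). Working theses. February)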

Ивлиева, И.В.
Куб, Перри

Abstract

Linguists now have access to digital collections of dictionaries and language samples known as "corpora". Mining these sources makes it possible to verify statistically the codification of words (word forms) and to determine their frequency of use in grammatical and colloquial contexts. The data collected with the web-scraping methodology in this research may be used on their own or in combination with data from other big data sources. The project is especially significant for languages with rich morphology. Various strategies can be applied, using digital technologies, to gather, visualize, or analyze data from online Russian dictionaries, corpora, and other big data sources (e.g., web portals and computer-assisted text collections).

Start Date

01 Nov 2020

End Date

28 Feb 2021

Contact Information

Dr. Irina V. Ivliyeva, ivliyeva@mst.edu
Professor of Russian, Arts, Languages, and Philosophy Department
Missouri University of Science and Technology

Perry B. Koob, koobp@mst.edu
Database Administrator/System Administrator
Academic Technology Support Team
Missouri S&T Information Technology

Department(s)

Arts, Languages, and Philosophy

Document Type

Data

Document Version

Final Version

File Format

text

Language(s)

Russian

Language 2

English

Publication Date

28 Feb 2021
