Abstract
This project aims to improve web-extraction techniques for a specific lexical-semantic group of Russian verbs, the verbs of sound, which undergo semantic modification at the word-formation level (affixation). It also seeks to organize search results in a form suited to linguistic research, using Multi-Dimensional Scaling (MDS) techniques and novel visualization strategies.
The primary objective in this phase of the research was to gather, consolidate, analyze, and present a comprehensive index of all forms of verbs of sound sourced from the A.A. Zalizniak Grammatical Dictionary of the Russian language, and to hyperlink each verbal form in this index to the Russian National Corpus (RNC), a closed digital repository of more than two billion entries with over five million word forms. The output not only covers all documented modifications of verbs of sound but also identifies gaps and duplicates and suggests new potential units.
The research methodology can support the development of diverse applications for searching, collecting, and visualizing linguistic data of varying volume and complexity. Combinatorial optimization techniques for processing open and closed linguistic databases can be particularly important when extracting information from diverse digital lexicographic sources in one or more languages, national linguistic corpora, and digital text collections. Our cross-disciplinary research highlights the transformative role of digital tools in advancing linguistic inquiry and points to directions for future exploration at the intersection of natural language, technology, and culture.
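As a rough illustration of the MDS step named in this abstract, the sketch below implements classical (Torgerson) MDS in plain Python. It is not the authors' implementation: the abstract does not say which MDS variant, which dissimilarity measure, or which software was used. The function name `classical_mds`, its parameters, and the toy distance matrix are assumptions made for illustration only.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Embed n items in `dims` dimensions from a symmetric distance matrix.

    Illustrative classical (Torgerson) MDS; hypothetical helper, not taken
    from the presentation described in this record.
    """
    n = len(dist)
    # Square the distances, then double-center: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the dominant eigenvector of the current B.
        vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        degenerate = False
        for _ in range(iters):
            nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # nothing left to extract
                degenerate = True
                break
            vec = [x / norm for x in nxt]
        if degenerate:
            break
        # Rayleigh quotient gives the (signed) eigenvalue estimate.
        bv = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        val = sum(vec[i] * bv[i] for i in range(n))
        if val <= 1e-12:  # non-positive eigenvalue: no real coordinate
            break
        scale = math.sqrt(val)
        for i in range(n):
            coords[i][d] = scale * vec[i]
        # Deflate B so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= val * vec[i] * vec[j]
    return coords


if __name__ == "__main__":
    # Toy dissimilarities between three items (invented numbers, not data
    # from the Zalizniak dictionary or the RNC).
    toy = [[0.0, 1.0, 3.0],
           [1.0, 0.0, 2.0],
           [3.0, 2.0, 0.0]]
    for point in classical_mds(toy, dims=2):
        print(point)
```

In practice the dissimilarity matrix would come from whatever measure the researchers chose for comparing verb forms; the embedding step itself does not depend on that choice.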
Recommended Citation
Simmons, John and Ivliyeva, Irina V., "Russian Verbs of Sound’s Web-Scraping Results from the A.A. Zalizniak Grammatical Dictionary and the Russian National Corpus. Multi-Dimensional Scaling Techniques and Visualization Strategies" (2024). Graduate Student Research & Creative Works. 4.
https://scholarsmine.mst.edu/gradstudent_works/4
Meeting Name
ISC Graduate Research Symposium
Department(s)
Arts, Languages, and Philosophy
Second Department
Computer Science
Research Center/Lab(s)
Intelligent Systems Center
Sponsor(s)
Intelligent Systems Center
Keywords and Phrases
Web scraping techniques, visualization of linguistic data, corpus linguistics, Russian word-formation synthesis, Multi-Dimensional Scaling (MDS), digital lexicographic sources
Document Type
Presentation
Document Version
Final Version
File Type
text
Language(s)
English
Rights
© 2024 John Simmons, All Rights Reserved
Publication Date
April 29, 2024
Comments
Irina Ivliyeva, Faculty Advisor