
This project aims to improve web extraction techniques for a specific lexical-semantic group, Russian verbs of sound, which undergo semantic modification at the word-formation level (affixation). It also seeks to organize search results in a form conducive to linguistic research, using Multi-Dimensional Scaling (MDS) techniques and novel visualization strategies.

The primary objective in this phase of the research was to gather, consolidate, analyze, and present a comprehensive index of all forms of verbs of sound sourced from the A.A. Zalizniak Grammatical Dictionary of the Russian Language, and to hyperlink each verbal form in this index to the Russian National Corpus (RNC), a closed digital repository of more than two billion entries with over five million word forms. The output not only encompasses all documented modifications of verbs of sound but also identifies gaps and duplicates and suggests new potential units.

The research methodology is useful for developing diverse applications for searching, collecting, and visualizing linguistic data of varying volume and complexity. Combinatorial optimization techniques for processing open and closed linguistic databases can be particularly important when extracting information from diverse digital lexicographic sources across single or multiple languages, national linguistic corpora, and digital text collections. Our cross-disciplinary research highlights the transformative role of digital tools in advancing linguistic inquiry and points the way toward future exploration at the intersection of natural language, technology, and culture.

Meeting Name

ISC Graduate Research Symposium


Arts, Languages, and Philosophy

Second Department

Computer Science

Research Center/Lab(s)

Intelligent Systems Center


Intelligent Systems Center


Irina Ivliyeva, Faculty Advisor

Keywords and Phrases

Web scraping techniques, visualization of linguistic data, corpus linguistics, Russian word-formation synthesis, Multi-Dimensional Scaling (MDS), digital lexicographic sources

Document Version

Final Version

© 2024 John Simmons, All Rights Reserved

Publication Date

April 29, 2024
