Location
Havener Center, Carver/Turner Room, 1:30pm-3:30pm
Start Date
4-2-2026 2:00 PM
End Date
4-2-2026 2:30 PM
Presentation Date
April 2, 2026; 2:00pm-2:30pm
Description
Modern next-generation sequencing (NGS) projects routinely generate terabytes of data that researchers download from public repositories such as SRA and ENA. Existing download tools typically employ static concurrency settings, leading to inefficient bandwidth utilization and prolonged download times under dynamic network conditions. We introduce FastBioDL, a parallel downloader for large biological datasets with an adaptive concurrency controller. FastBioDL frames downloading as an online optimization problem, using a utility function and gradient descent to adjust the number of concurrent socket streams at runtime. This approach maximizes throughput while minimizing resource overhead. Evaluations on public genomic datasets show that FastBioDL achieves up to 4x speedup over state-of-the-art tools, and in high-speed network experiments, it is up to 2.1x faster than existing approaches. By optimizing standard HTTP and FTP downloads entirely on the client side, FastBioDL provides an efficient solution for large-scale genomic data acquisition without requiring specialized commercial software or protocols.
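The abstract names the ingredients of the controller (a utility function, gradient descent, and the number of concurrent streams as the tuned variable) but not their exact form. The sketch below illustrates that idea under stated assumptions and is not FastBioDL's implementation. Everything in it is hypothetical: `simulated_throughput` is a toy saturating 1 Gbps link standing in for real per-interval throughput measurements, `utility` uses an assumed form T(n) / k**n that discounts each extra stream, and `tune_concurrency`, `k=1.02`, and `lr=0.02` are illustrative choices. The controller runs gradient descent on the loss -U(n), estimating the gradient by finite differences at neighbouring integer concurrency levels.

```python
import math
from typing import Callable


def simulated_throughput(n: int) -> float:
    """HYPOTHETICAL link model (Mbps): throughput saturates near 1000 Mbps
    as the number of concurrent streams n grows. Stands in for a real
    throughput measurement taken over one sampling interval."""
    return 1000.0 * (1.0 - math.exp(-n / 4.0))


def utility(n: int, throughput: Callable[[int], float], k: float = 1.02) -> float:
    """HYPOTHETICAL utility: throughput discounted by k**n, so each extra
    stream must 'pay for itself' with enough additional throughput."""
    return throughput(n) / (k ** n)


def tune_concurrency(
    throughput: Callable[[int], float],
    x0: float = 1.0,
    lr: float = 0.02,
    iters: int = 100,
    n_min: int = 1,
    n_max: int = 64,
) -> list[int]:
    """Gradient descent on the loss -U(n) (equivalently, ascent on U(n)).

    x is a continuous internal estimate; the concurrency applied in each
    interval is round(x), clamped to [n_min, n_max]. The gradient of U is
    a finite-difference estimate from neighbouring integer levels.
    Returns the concurrency applied in each interval.
    """
    x = float(x0)
    history: list[int] = []
    for _ in range(iters):
        n = min(max(round(x), n_min), n_max)
        history.append(n)
        lo, hi = max(n - 1, n_min), min(n + 1, n_max)
        if hi == lo:  # degenerate range: nothing to tune
            break
        grad_u = (utility(hi, throughput) - utility(lo, throughput)) / (hi - lo)
        # d(-U)/dn = -grad_u, so the step x - lr * d(-U)/dn adds lr * grad_u.
        x = min(max(x + lr * grad_u, n_min), n_max)
    return history


if __name__ == "__main__":
    trace = tune_concurrency(simulated_throughput)
    print("first intervals:", trace[:8])
    print("final concurrency:", trace[-1])
```

Under this toy model the controller climbs quickly from one stream while the utility gradient is large, then settles around ten streams, where adding another stream no longer raises throughput enough to offset the k**n penalty. A real controller would measure throughput over live sampling intervals rather than query a function, so probing neighbouring concurrency levels costs measurement time; that trade-off is omitted here for brevity.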
Biography
Rasman Mubtasim Swargo is an M.S. student in Computer Science at Missouri University of Science and Technology, where he works as a Graduate Research Assistant under Dr. Md Arifuzzaman. His research focuses on ML-for-systems, with an emphasis on reinforcement learning-based controllers for optimizing large-scale, high-speed data movement. He developed AutoMDT, a PPO-based system for adaptive data transfer pipelines, published at the INDIS workshop at SC25, and FastBioDL, an adaptive genomic downloader that achieves up to 4× speedup over existing tools. Prior to graduate school, he earned his B.Sc. in Computer Science and Engineering from Bangladesh University of Engineering and Technology and worked as a Software Engineer at Chaldal, a Y Combinator-backed company. He is a recipient of the IEEE CS Richard E. Merwin Scholarship and the IEEE TCHPC travel grant.
Meeting Name
2026 - Miners Solving for Tomorrow Research Conference
Department(s)
Computer Science
Document Type
Presentation
Document Version
Final Version
File Type
text
Language(s)
English
Rights
© 2026 The Authors, All rights reserved
Adaptive Parallel Downloader for Large Genomic Datasets

Comments
Advisor: Md Arifuzzaman, marifuzzaman@mst.edu