Location

Havener Center, Carver/Turner Room, 1:30pm-3:30pm

Start Date

4-2-2026 2:00 PM

End Date

4-2-2026 2:30 PM

Presentation Date

April 2, 2026; 2:00pm-2:30pm

Description

Modern next-generation sequencing (NGS) projects routinely generate terabytes of data that researchers download from public repositories such as the Sequence Read Archive (SRA) and the European Nucleotide Archive (ENA). Existing download tools typically use static concurrency settings, which leads to inefficient bandwidth utilization and prolonged download times under dynamic network conditions. We introduce FastBioDL, a parallel downloader for large biological datasets with an adaptive concurrency controller. FastBioDL frames downloading as an online optimization problem, using a utility function and gradient descent to adjust the number of concurrent socket streams at runtime. This approach maximizes throughput while minimizing resource overhead. Evaluations on public genomic datasets show that FastBioDL achieves up to 4× speedup over state-of-the-art tools, and in high-speed network experiments it is up to 2.1× faster than existing approaches. Because it optimizes standard HTTP and FTP downloads entirely on the client side, FastBioDL provides an efficient solution for large-scale genomic data acquisition without requiring specialized commercial software or protocols.
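To make the idea of "a utility function plus gradient descent over the number of streams" concrete, here is a minimal sketch under stated assumptions. It is not FastBioDL's implementation. The utility form (throughput divided by K^n), the constant K = 1.02, the step rules, the toy link model, and every function name (`utility`, `toy_link`, `tune_concurrency`) are hypothetical and chosen only for illustration. Because the goal is to maximize utility, the loop takes gradient-ascent steps on the utility, which is equivalent to gradient descent on its negative. Gradients are estimated by finite differences over the integer stream count.

```python
from typing import Callable, List, Tuple

# Hypothetical penalty base (assumption): an extra stream is only worth keeping
# if it raises throughput by more than roughly 2%.
K = 1.02


def utility(throughput: float, n: int, k: float = K) -> float:
    """Assumed utility: reward throughput, discount it exponentially in stream count n."""
    return throughput / (k ** n)


def toy_link(n: int, per_stream: float = 50.0, capacity: float = 400.0) -> float:
    """Toy network model in MB/s (assumption): each stream adds per_stream until the link saturates."""
    return min(n * per_stream, capacity)


def tune_concurrency(
    measure: Callable[[int], float],
    n: int = 1,
    n_min: int = 1,
    n_max: int = 64,
    lr: float = 0.05,
    max_step: int = 4,
    rounds: int = 50,
) -> Tuple[int, List[int]]:
    """Hypothetical controller: discrete gradient ascent on utility over the stream count."""

    def u(m: int) -> float:
        # In a live downloader, measure(m) would mean running m streams for a
        # short probe interval; here it simply queries the supplied model.
        return utility(measure(m), m)

    history = [n]
    for _ in range(rounds):
        up, down = min(n + 1, n_max), max(n - 1, n_min)
        g_up = u(up) - u(n)      # forward-difference gradient estimate
        g_down = u(n) - u(down)  # backward-difference gradient estimate
        if g_up > 0:
            # Step size proportional to the gradient, at least 1 and at most max_step.
            step = min(max_step, max(1, round(lr * g_up)))
            n = min(n + step, n_max)
        elif g_down < 0:
            step = min(max_step, max(1, round(-lr * g_down)))
            n = max(n - step, n_min)
        else:
            break  # neither neighbour improves utility: a local optimum
        history.append(n)
    return n, history


if __name__ == "__main__":
    best, path = tune_concurrency(toy_link, n=1)
    print(best, path)  # 8 [1, 3, 5, 7, 9, 8]
```

On the toy link, which saturates at 8 streams, the controller takes large steps while the utility is still climbing, overshoots to 9, and then backs off to 8, where adding a stream no longer pays for its penalty. The exponential penalty is what keeps the controller from opening streams that add overhead without adding throughput.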

Biography

Rasman Mubtasim Swargo is an M.S. student in Computer Science at Missouri University of Science and Technology, where he works as a Graduate Research Assistant under Dr. Md Arifuzzaman. His research focuses on ML-for-systems, with an emphasis on reinforcement-learning-based controllers for optimizing large-scale, high-speed data movement. He developed AutoMDT, a PPO-based system for adaptive data transfer pipelines, published at the INDIS workshop at SC25, and FastBioDL, an adaptive genomic downloader that achieves up to 4× speedup over existing tools. Before graduate school, he earned his B.Sc. in Computer Science and Engineering from Bangladesh University of Engineering and Technology and worked as a Software Engineer at Chaldal, a Y Combinator-backed company. He is a recipient of the IEEE CS Richard E. Merwin Scholarship and the IEEE TCHPC travel grant.

Meeting Name

2026 - Miners Solving for Tomorrow Research Conference

Department(s)

Computer Science

Comments

Advisor: Md Arifuzzaman, marifuzzaman@mst.edu

Document Type

Presentation

Document Version

Final Version

File Type

text

Language(s)

English

Rights

© 2026 The Authors, All rights reserved

Swargo_Slides.pdf (1353 kB)

Adaptive Parallel Downloader for Large Genomic Datasets
