Illyoung Choi

Ph.D. Student
Advisor: Dr. John H. Hartman
Office: Gould-Simpson 733E
E-mail: iychoi@email.arizona.edu
Github: https://github.com/iychoi

Biography

I am a Ph.D. candidate in Computer Science at the University of Arizona. I received an M.S. in Computer Science from the University of Arizona in 2014 (advisor: John H. Hartman) and a B.E. in Computer Science and Engineering from Hankuk University of Foreign Studies in 2009 (advisor: Sangyoung Cho).

Research Interests

My research interests are in distributed computing, cloud storage systems, and bioinformatics.

Research Projects

Stargate: A file system for scientific computing that enables remote data access between Hadoop clusters. Stargate provides efficient remote data access by co-locating computation, caching, and data transport. It transfers data in parallel to make full use of cluster bandwidth, and it provides multi-tiered data caching to improve data reuse and minimize data transfer over the wide-area network (WAN). It integrates with Hadoop so that existing scientific computing applications can use it.

Syndicate: A general-purpose data delivery platform that harnesses a collection of existing storage components to provide a global, scalable, and secure storage service. These include public and private cloud storage (for data durability), network caches and content distribution networks (for scalable read bandwidth), and local disks (for local read/write performance). Syndicate's goal is to allow applications to access data independent of where it is stored, and do so in a way that both minimizes the operational overhead imposed on users and maximizes the use of commodity infrastructure.

SDM (Syndicate Dataset Manager): A scientific dataset delivery service built on Syndicate. SDM uses a selected subset of Syndicate's features: data acquisition from external storage, user interfaces, and multi-tiered caching. The service provides a simple command-line interface for searching, mounting, and unmounting scientific datasets. SDM can mount datasets into a Linux directory hierarchy or a Hadoop directory hierarchy.

Libra: A comparative metagenomic sequence analysis tool using Hadoop (MapReduce). Given the ever-growing volume of metagenomic sequence data, Libra is designed to process inputs of any size. Libra provides several metrics for metagenomic sequence comparison, such as cosine similarity, Bray-Curtis, and Jensen-Shannon. To calculate these metrics efficiently on distributed computing resources, Libra uses several techniques: single-pass distance computation with a scan-line algorithm, total-order partitioning, and histogram-based partitioning. Libra is currently integrated into the iMicrobe portal (Hurwitz Lab) for community access.
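
As an illustration of the three comparison metrics named above, here is a minimal single-machine sketch in Python. It is not Libra's code: the k-mer count representation, the helper names (kmer_counts, cosine_similarity, bray_curtis, jensen_shannon), and the toy sequences are assumptions made for this example, and it omits the Hadoop-based techniques Libra uses at scale.

```python
import math
from collections import Counter


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (1.0 = same direction)."""
    dot = sum(a.get(x, 0) * b.get(x, 0) for x in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity on raw counts (0.0 = identical, 1.0 = disjoint)."""
    diff = sum(abs(a.get(x, 0) - b.get(x, 0)) for x in set(a) | set(b))
    total = sum(a.values()) + sum(b.values())
    return diff / total if total else 0.0


def jensen_shannon(a, b):
    """Jensen-Shannon divergence (base 2) between normalized count vectors."""
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    divergence = 0.0
    for x in set(a) | set(b):
        p = a.get(x, 0) / total_a
        q = b.get(x, 0) / total_b
        m = (p + q) / 2
        # Terms with zero probability contribute nothing to the sum.
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


if __name__ == "__main__":
    sample_a = kmer_counts("ACGTACGTAC")
    sample_b = kmer_counts("ACGTTTGTAC")
    print("cosine similarity:", cosine_similarity(sample_a, sample_b))
    print("Bray-Curtis:", bray_curtis(sample_a, sample_b))
    print("Jensen-Shannon:", jensen_shannon(sample_a, sample_b))
```

With base-2 logarithms the Jensen-Shannon divergence lies between 0 and 1, so all three functions return values on a comparable scale.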

iRODS-FUSE: iRODS is an open-source grid file system used by several science clouds, such as TACC (Texas Advanced Computing Center) and CyVerse. iRODS-FUSE is a client application that allows users to mount an iRODS volume into a directory hierarchy. This work re-implemented the client with a new design to fix bugs in the old versions and to improve I/O performance. The new design implements buffered I/O, in-memory data and metadata caches, parallel data transfer for reads, and prefetching. The new client resolved data corruption issues and improved read performance by 68% and write performance by 91% compared to the old implementation (version 3.1). The new implementation was accepted by the iRODS Consortium and has been officially included since release 4.1.4.

Last updated: April 19, 2020