Operating Systems, Distributed Computing, Cloud Systems and Bioinformatics.
H-Synthesizer: Analyzing Large-scale Sequence Data In The Cloud (MS Thesis) PDF version
Stargate: Inter-cluster Storage Integration System. A system for integrating remote data storage systems and running Hadoop computations against them. To transfer data efficiently between remote clusters, Stargate used recipe-based data transfer and several caches (a web cache and a cluster cache). It provided a Hadoop filesystem interface and scheduled the transfer order so that Hadoop computations ran efficiently against data at remote clusters. Used Java, Hadoop. Project page
H-Syndicate: Hadoop FileSystem implementation of Syndicate. Software for integrating the Syndicate cloud filesystem with Hadoop. It enabled Hadoop computations to access a Syndicate Volume and provided efficient data access by steering Hadoop task assignment according to the presence and location of data in Syndicate local caches. Used Java, C, Hadoop. Project page
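The cache-aware task assignment idea can be illustrated with a minimal sketch. This is not H-Syndicate's code (which was written in Java and C against Hadoop's scheduler); the function name `assign_tasks`, the input shapes, and the least-loaded tie-breaking rule are all assumptions made for illustration.

```python
def assign_tasks(split_cache_hosts, workers):
    """Assign each input split to a worker, preferring workers that cache it.

    split_cache_hosts: dict mapping split id -> set of hosts holding the
                       split's data in a local cache (hypothetical input).
    workers:           list of available worker host names.
    Returns a dict mapping split id -> chosen worker.
    """
    load = {w: 0 for w in workers}  # number of tasks given to each worker
    assignment = {}
    for split, cached_on in split_cache_hosts.items():
        # Prefer workers with the data cached; fall back to all workers.
        candidates = [w for w in workers if w in cached_on] or list(workers)
        # Among the candidates, pick the least-loaded worker.
        chosen = min(candidates, key=lambda w: load[w])
        assignment[split] = chosen
        load[chosen] += 1
    return assignment
```

In this sketch a split with no cached copy simply goes to the least-loaded worker, so cache locality acts as a preference rather than a hard constraint.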
BioSpectra: Metagenomic Read Classification System. A k-mer-based classification system for metagenomic reads. BioSpectra supported k-mer searches against an index of k-mers built from known reference sequences, with scoring and classification performed using the Lucene search engine. Used Java, Lucene. Project page
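As a rough illustration of k-mer based read classification, the sketch below builds an in-memory k-mer index and scores a read by counting shared k-mers. It deliberately swaps in a plain dictionary for the Lucene index that BioSpectra used, and the hit-count scoring, function names, and sample sequences are assumptions, not the project's actual method.

```python
from collections import Counter, defaultdict

def kmers(seq, k):
    """Yield every overlapping k-mer of seq (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index

def classify(read, index, k):
    """Return (best_reference, hit_count), or (None, 0) if no k-mer matches."""
    hits = Counter()
    for km in kmers(read, k):
        # .get avoids inserting empty entries into the defaultdict.
        for name in index.get(km, ()):
            hits[name] += 1
    if not hits:
        return None, 0
    return hits.most_common(1)[0]

# Hypothetical reference sequences, for illustration only.
references = {"refA": "ACGTACGTGA", "refB": "TTTTGGGGCC"}
index = build_index(references, k=4)
print(classify("CGTACG", index, k=4))
```

A real system would typically normalize scores by read length and handle reverse complements; both are omitted here to keep the sketch short.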
iPlant Border Message Server: iPlant Data Store Event Notification System. An event notification server for the iPlant Data Store. All filesystem events generated by the iPlant Data Store are transferred to this system, then classified and routed to users. Used C, RabbitMQ. Project page
iRODS-FUSE (4.1, 4.2): A FUSE Client of iRODS FileSystem. The iRODS-FUSE client allowed users to mount iRODS storage onto a local directory hierarchy. It was redesigned from scratch because the existing iRODS-FUSE (3.2) had problems with inconsistent content and cache management. The new design solved these problems and improved read and write performance (read time reduced by 26% and write time by 88% compared to 3.2). The new implementation was accepted by the iRODS Consortium and has been included since the iRODS 4.1.4 release. A further read and write performance enhancement (read time reduced by 68% and write time by 91% compared to 3.2) was also accepted for the iRODS 4.2 release. Used C. Project page (iRODS FUSE Development), Project page (iRODS)
H-iRODS: Hadoop FileSystem implementation of iRODS. Software for integrating iRODS with Hadoop. It enabled Hadoop computations to access iRODS and was designed to work when the iRODS and Hadoop clusters are remote from each other and connected through a WAN. To handle the large intermediate outputs of MapReduce computations efficiently, it used HDFS as temporary storage and transferred only the final dataset to the iRODS system over the WAN. Used Java, Hadoop. Project page