I am fascinated by the computational bottlenecks that prevent scientists from learning from their largest datasets. My research is dedicated to breaking down these barriers by creating the next generation of scientific algorithms.
As a Computational Scientist, I combine principles from machine learning, statistics, and software engineering to build robust tools for drug discovery and biomedicine. I have engineered high-performance software for analyzing molecular libraries at scale and developed machine learning frameworks that remain reliable on noisy, real-world data. I thrive on transforming complex scientific challenges into elegant computational solutions.
Extreme class imbalance in billion-compound DNA-Encoded Libraries is a critical bottleneck on which traditional ML models failed, so I developed the DELight framework. I designed and benchmarked novel, targeted undersampling strategies to selectively curate training data. The result was a robust solution that significantly improved model generalizability and the reliability of hit identification, transforming massive, skewed datasets into actionable discovery tools.
 Technologies used: Python
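 To make the idea of undersampling concrete, here is a minimal sketch of the simplest baseline: randomly discarding majority-class (inactive) compounds until a chosen ratio to the minority class (hits) is reached. This is not the DELight implementation; the function name `undersample_majority`, the `ratio` parameter, and the toy data are assumptions for illustration, and a targeted strategy would replace the random draw with a more selective rule.

```python
import numpy as np

def undersample_majority(X, y, ratio=1.0, seed=0):
    """Keep every minority row (label 1) and a random subset of majority rows (label 0).

    At most ratio * n_minority majority rows are kept.
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)
    n_keep = min(len(majority_idx), int(round(ratio * len(minority_idx))))
    # Random draw without replacement; a targeted strategy would rank
    # or filter majority samples here instead.
    kept_majority = rng.choice(majority_idx, size=n_keep, replace=False)
    keep = np.sort(np.concatenate([minority_idx, kept_majority]))
    return X[keep], y[keep]

# Toy data: 2 hits among 20 compounds (made-up feature values).
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([1, 1] + [0] * 18)
X_bal, y_bal = undersample_majority(X, y, ratio=2.0)
print(X_bal.shape, int(y_bal.sum()))
```

 With `ratio=2.0` the toy set shrinks from 20 rows to 6: both hits plus four randomly chosen inactives.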
 Analyzing massive molecular datasets and ultra-long simulations with traditional clustering methods was prohibitively slow. My solution was to pioneer a novel, linear-time clustering algorithm and to architect the MDANCE framework around it as a modular, scalable toolkit. The result was a 25x speedup, enabling efficient similarity search and pattern recognition in million-scale molecular libraries and making large-scale analysis feasible for the first time.
 Technologies used: Python
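 As an illustration of how a set-level dissimilarity can be obtained in linear time, the sketch below computes the mean pairwise squared Euclidean distance of N points from column sums alone, using the identity sum over pairs of ||x_i - x_j||^2 = N * sum_i ||x_i||^2 - ||sum_i x_i||^2. This is a generic identity, not the MDANCE API; the function names are hypothetical, and the quadratic version is included only as a cross-check.

```python
import numpy as np

def mean_pairwise_sq_dist(X):
    """Mean squared Euclidean distance over all pairs, computed in O(N * D)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two points")
    col_sum = X.sum(axis=0)          # sum_i x_i, one pass over the data
    sq_sum = np.sum(X * X)           # sum_i ||x_i||^2, one pass over the data
    total = n * sq_sum - np.dot(col_sum, col_sum)  # sum over pairs i < j
    return 2.0 * total / (n * (n - 1))

def mean_pairwise_sq_dist_naive(X):
    """Same quantity by explicit O(N^2) enumeration, for verification."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    dists = [np.sum((X[i] - X[j]) ** 2) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 8))   # made-up data for the check
fast = mean_pairwise_sq_dist(points)
slow = mean_pairwise_sq_dist_naive(points)
print(np.isclose(fast, slow))
```

 Because the fast version only needs running sums, it can also be updated incrementally as points are added to or removed from a cluster.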
 Accurately identifying a protein's true native structure from thousands of simulated conformations is a major bottleneck in structural biology, directly impacting our ability to understand function and design drugs. To solve this, I developed PRIME (Protein Retrieval via Integrative Ensembles), a novel tool that uses extended continuous similarity to pinpoint the most representative structure in a complex ensemble. The result was a breakthrough in accuracy and efficiency: PRIME correctly mapped all structural motifs across diverse, challenging systems, and its linear scaling makes high-fidelity structural refinement both reliable and scalable.
 Technologies used: Python
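 One way to picture "picking the most representative structure with linear scaling" is the sketch below. It scores every frame by its summed squared distance to all other frames without forming the pairwise matrix, then returns the frame with the lowest score. This is not PRIME's algorithm; it assumes frames are already aligned and flattened into coordinate vectors, and `most_representative_frame` is a hypothetical name.

```python
import numpy as np

def most_representative_frame(frames):
    """Return (index, scores), where scores[i] = sum_j ||x_i - x_j||^2.

    Uses sum_j ||x_i - x_j||^2 = N * ||x_i||^2 - 2 * x_i . S + Q,
    with S = sum_j x_j and Q = sum_j ||x_j||^2, so the cost is O(N * D).
    """
    X = np.asarray(frames, dtype=float)
    n = X.shape[0]
    col_sum = X.sum(axis=0)                 # S
    row_sq = np.einsum("ij,ij->i", X, X)    # ||x_i||^2 for every frame
    total_sq = row_sq.sum()                 # Q
    scores = n * row_sq - 2.0 * (X @ col_sum) + total_sq
    return int(np.argmin(scores)), scores

# Toy one-dimensional "ensemble" (made-up values) with one outlier.
toy_frames = [[0.0], [1.0], [2.0], [10.0]]
best_idx, scores = most_representative_frame(toy_frames)
print(best_idx)
```

 In the toy ensemble the frame at index 2 wins, since it sits closest to the bulk of the other frames while the outlier at 10.0 scores worst.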
 The specific ion effect, nature's preference for one ion over another in electrolyte solutions, has puzzled scientists for more than a century, and the origin of this selectivity is still poorly understood. I tackled this challenge by developing a computational framework that moves from qualitative observation to quantitative prediction. Using conceptual Density Functional Theory (DFT), I created models that capture the essential electronic properties of ions to accurately forecast their behavior. This work provides a principled, physics-based method to decipher the rules behind nature's selective preferences.
 Technologies used: Python
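 For readers unfamiliar with conceptual DFT, the sketch below computes three standard finite-difference reactivity descriptors from an ionization energy I and an electron affinity A: the chemical potential mu = -(I + A) / 2, the hardness eta = I - A, and the electrophilicity index omega = mu^2 / (2 * eta). It is not the model described above; the hardness convention (some authors use (I - A) / 2), the function name, and the input values are assumptions chosen for illustration.

```python
def cdft_descriptors(ionization_energy, electron_affinity):
    """Finite-difference conceptual DFT descriptors (same energy unit as the inputs).

    Assumes the convention eta = I - A; hypothetical helper for illustration.
    """
    mu = -(ionization_energy + electron_affinity) / 2.0   # chemical potential
    eta = ionization_energy - electron_affinity           # chemical hardness
    if eta <= 0:
        raise ValueError("hardness must be positive (expected I > A)")
    omega = mu ** 2 / (2.0 * eta)                         # electrophilicity index
    return {"mu": mu, "eta": eta, "omega": omega}

# Placeholder inputs in eV, not measured values for any real ion.
desc = cdft_descriptors(10.0, 2.0)
print(desc)
```

 Descriptors like these turn qualitative notions such as "hard" and "soft" ions into numbers that can be compared across species.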