About

Welcome!

My path into computational science began with a fascination with the elegant machinery of proteins like ATP synthase, nature's own rotary engine. This sparked a central question: how can we use computation to understand and simulate this incredible complexity? I am now dedicated to breaking down the computational bottlenecks that prevent scientists from learning from their largest and noisiest datasets.

As a Computational Scientist, I combine machine learning and high-performance software engineering to build robust tools for drug discovery. I have engineered scalable analysis pipelines for molecular libraries and developed ML frameworks that perform consistently on real-world data, transforming complex research challenges into deployable solutions that accelerate progress.

Projects

DELight

Billion-compound DNA-Encoded Libraries (DELs) suffer from extreme class imbalance, a critical bottleneck on which conventional ML models failed. To address it, I developed the DELight framework, designing and benchmarking novel, targeted undersampling strategies that selectively curate the training data. The result is a robust solution that significantly improves model generalizability and the reliability of hit identification, turning massive, skewed datasets into actionable discovery tools.

Technologies used: Python
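
DELight's specific sampling strategies are not described on this page, so the following is only a minimal sketch of the general idea, assuming binary hit labels (1 = hit, 0 = inactive) in a NumPy array. The function name `undersample`, its `ratio` parameter, and the score-based "targeted" rule are hypothetical illustrations, not DELight's API: without scores it performs plain random undersampling, and with scores it keeps the highest-scoring negatives.

```python
import numpy as np


def undersample(y, ratio, scores=None, seed=None):
    """Hypothetical helper (not DELight's API).

    Return sorted row indices that keep every positive (label 1) and at most
    round(ratio * n_positives) negatives (label 0).
    scores=None  -> negatives are drawn uniformly at random (baseline).
    scores given -> the highest-scoring negatives are kept (an assumed
                    "targeted" rule, e.g. similarity to known hits).
    """
    y = np.asarray(y)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_keep = min(len(neg), int(round(ratio * len(pos))))

    if scores is None:
        rng = np.random.default_rng(seed)
        kept_neg = rng.choice(neg, size=n_keep, replace=False)
    else:
        neg_scores = np.asarray(scores, dtype=float)[neg]
        # Stable sort on the negated score: highest scores first, ties by index.
        kept_neg = neg[np.argsort(-neg_scores, kind="stable")[:n_keep]]

    return np.sort(np.concatenate([pos, kept_neg]))


# Toy labels only: 95 inactives and 5 hits (no real DEL data).
y = np.array([0] * 95 + [1] * 5)
idx = undersample(y, ratio=3, seed=0)
print(len(idx), int(y[idx].sum()))  # 20 5
```

Keeping every positive and capping negatives at a fixed multiple of them bounds the class ratio the model sees; which negatives to keep (random, hardest, most diverse) is the design choice a targeted strategy has to benchmark.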

MDANCE

Traditional clustering methods were prohibitively slow at analyzing massive molecular datasets and ultra-long simulations, creating a major computational bottleneck. My solution was to develop a novel, linear-time clustering algorithm and to architect the MDANCE framework around it, modular and scalable by design. The result was a 25x speedup, enabling efficient similarity search and pattern recognition in million-scale molecular libraries and making large-scale analysis feasible for the first time.

Technologies used: Python
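
Linear-time clustering depends on similarity calculations that avoid the O(N²) all-pairs loop. One well-known way to do this, assuming squared Euclidean distance as the metric, is the identity "mean over all pairs of ||x_i - x_j||^2 = 2 * (mean of ||x_i||^2 - ||mean of x||^2)", which needs only column sums. The sketch below uses it to compute, for every point at once, the spread of the dataset with that point left out (a "complementary" score). The function names are hypothetical and this is not MDANCE's code.

```python
import numpy as np


def mean_sq_dist(X):
    """Hypothetical helper (not MDANCE's API).

    Mean squared Euclidean distance over all N*N ordered pairs (i == j included),
    computed in O(N*D) from column sums instead of an O(N^2 * D) pairwise loop.
    """
    X = np.asarray(X, dtype=float)
    centroid = X.mean(axis=0)
    return 2.0 * ((X ** 2).sum(axis=1).mean() - (centroid ** 2).sum())


def complementary_msd(X):
    """Hypothetical helper: for every point i, mean_sq_dist of the set without i.

    All N values come out in O(N*D) by subtracting row i from running sums.
    Requires at least two points.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0] - 1                   # size of each complementary set
    rest_sum = X.sum(axis=0) - X         # column sums without row i, shape (N, D)
    row_sq = (X ** 2).sum(axis=1)
    rest_sq = row_sq.sum() - row_sq      # total squared norm without row i
    return 2.0 * (rest_sq / m - ((rest_sum / m) ** 2).sum(axis=1))


X = np.array([[0.0], [1.0], [2.0], [10.0]])
cm = complementary_msd(X)
print(int(np.argmin(cm)))  # 3: dropping the outlier leaves the tightest set
```

Removing a central point leaves a more spread-out set, so the `argmax` of the complementary scores is a medoid-like candidate and the `argmin` flags the most outlying point, all without ever forming the N x N distance matrix.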

PRIME

Accurately identifying a protein's native structure among thousands of simulated conformations is a major bottleneck in structural biology, directly limiting our ability to understand function and design drugs. To solve this, I developed PRIME (Protein Retrieval via Integrative Ensembles), a tool that uses extended continuous similarity to pinpoint the native-like structure within complex ensembles. PRIME improved both accuracy and efficiency: it correctly mapped all structural motifs across a diverse set of challenging systems while scaling linearly, making high-fidelity structural refinement both reliable and scalable.

Technologies used: Python
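
The exact extended continuous similarity index used by PRIME is not given here, so this sketch substitutes a simpler stand-in (an assumption): min-max normalize each feature to [0, 1], score every frame by its mean squared distance to all frames (again computed from column sums in linear time), and pick the lowest-scoring frame as the most representative. The function names and toy frames are hypothetical; real inputs would be, for example, flattened coordinates of aligned frames.

```python
import numpy as np


def normalize_frames(X):
    """Hypothetical helper: min-max scale every feature column to [0, 1].
    Constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


def representative_frame(X):
    """Hypothetical stand-in for a similarity-based selector (not PRIME's index).

    Returns (index, scores), where scores[i] is the mean squared distance from
    frame i to all frames, computed in O(N*D) via
    mean_j ||z_i - z_j||^2 = ||z_i||^2 - 2 z_i . mean(z) + mean_j ||z_j||^2.
    """
    Z = normalize_frames(X)
    sq = (Z ** 2).sum(axis=1)
    scores = sq - 2.0 * Z @ Z.mean(axis=0) + sq.mean()
    return int(np.argmin(scores)), scores


# Four toy "frames" with two features each; frame 3 is an outlier.
frames = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0]])
idx, _ = representative_frame(frames)
print(idx)  # 2
```

Because the score equals the squared distance to the ensemble mean plus a constant, the selected frame is the one closest to the ensemble average; a true extended-similarity index would weight features differently, but the linear-time bookkeeping is the same.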

Quantifying the Specific Ion Effect

Ion selectivity in electrolyte solutions has puzzled scientists for over a century. Why nature prefers one ion over another, a phenomenon known as the specific ion effect, is still poorly understood. I tackled this challenge by developing a computational framework that moves from qualitative observation to quantitative prediction. Using conceptual Density Functional Theory (DFT), I built models that capture the essential electronic properties of ions to accurately forecast their behavior. This work provides a principled, physics-based method for deciphering the rules behind nature's selective preferences.

Technologies used: Python
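
Conceptual DFT describes reactivity through descriptors derived from how the energy changes as electrons are added or removed. The sketch below computes three standard ones with the finite-difference approximation: chemical potential mu = -(I + A)/2, hardness eta = I - A, and electrophilicity omega = mu^2 / (2 eta), where I = E(N-1) - E(N) and A = E(N) - E(N+1). Which descriptors the project's models actually use is not stated here, so treat this as a generic illustration with hypothetical function names; note also that many papers define eta = (I - A)/2, which doubles omega.

```python
from dataclasses import dataclass


@dataclass
class CDFTDescriptors:
    mu: float     # chemical potential, mu = -(I + A) / 2
    eta: float    # chemical hardness, eta = I - A (convention without the 1/2)
    omega: float  # electrophilicity index, omega = mu^2 / (2 * eta)


def cdft_descriptors(e_n_minus_1, e_n, e_n_plus_1):
    """Hypothetical helper: finite-difference conceptual DFT descriptors from the
    total energies of the N-1, N and N+1 electron states of the same species
    (all in the same units)."""
    ionization = e_n_minus_1 - e_n   # I = E(N-1) - E(N)
    affinity = e_n - e_n_plus_1      # A = E(N) - E(N+1)
    eta = ionization - affinity
    if eta <= 0:
        raise ValueError("hardness must be positive; check the input energies")
    mu = -(ionization + affinity) / 2.0
    return CDFTDescriptors(mu=mu, eta=eta, omega=mu ** 2 / (2.0 * eta))


# Made-up energies (eV) chosen so that I = 10 and A = 2, purely to show the arithmetic.
print(cdft_descriptors(10.0, 0.0, -2.0))
# CDFTDescriptors(mu=-6.0, eta=8.0, omega=2.25)
```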

MD Starters

md-starters provides sample code for preparing and analyzing molecular dynamics simulations, mainly for membrane proteins.

Technologies used: Python
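
A typical analysis step in such workflows is tracking structural drift as the RMSD of each frame to a reference after optimal superposition. The NumPy sketch below implements the Kabsch algorithm for that; it is not taken from md-starters, and for production work MDAnalysis (`MDAnalysis.analysis.rms`) and MDTraj (`mdtraj.rmsd`) provide optimized implementations. Inputs are assumed to be coordinate arrays for the same atoms in the same order, and the function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """Hypothetical helper: RMSD between two (n_atoms, 3) coordinate sets after
    optimal superposition (Kabsch): remove translation, then apply the best
    proper rotation of P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                                   # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc                            # rotated P minus centered Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def trajectory_rmsd(frames, reference):
    """RMSD of every frame in an (n_frames, n_atoms, 3) array to a reference."""
    return np.array([kabsch_rmsd(f, reference) for f in frames])


# Sanity check: a rotated and translated copy superposes perfectly.
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 3))
theta = np.pi / 6
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = ref @ Rz.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(moved, ref) < 1e-8)  # True
```

For membrane proteins the superposition is usually restricted to a stable subset (for example transmembrane backbone atoms) so that flexible loops do not dominate the fit.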

Technical Skills

  • Architected novel, linear-time clustering algorithms (MDANCE) achieving >25x speedup for million-molecule libraries and MD trajectories.
  • Engineered billion-scale data pipelines (BitBirch, DELight) in Python for chemical space navigation and robust ML model training on imbalanced data.
  • Expert in molecular simulations (AMBER, NAMD) and structure-based drug design (Glide) to investigate protein-ligand interactions and conformational ensembles.
  • Proficient in the full research software lifecycle, from development and version control (Git) to creating high-fidelity scientific visualizations for publication.
  • Programming languages (proficiency): Python 80%, R 40%, C++ 20%, Bash Script 60%, HTML 40%.
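
The skills above mention BitBirch for clustering molecular libraries. BitBirch adapts the tree-based BIRCH clustering to binary fingerprints; the sketch below is a much simpler flat, single-pass "leader" clustering that only illustrates two ingredients shared with BIRCH-style methods: Tanimoto similarity between bit vectors, and clusters summarized by a count plus a per-bit linear sum instead of their full member lists. It is not BitBirch's algorithm or API; the function names and the 0.65 default threshold are hypothetical.

```python
import numpy as np


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints.
    Two all-zero fingerprints are treated as identical (1.0)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.count_nonzero(a | b)
    return 1.0 if union == 0 else np.count_nonzero(a & b) / union


def leader_cluster(fps, threshold=0.65):
    """Hypothetical single-pass clustering (not BitBirch).

    Each cluster stores only a member count and the per-bit linear sum
    (a BIRCH-style cluster feature); its centroid is the majority bit vector.
    A fingerprint joins the first cluster whose centroid it matches with
    Tanimoto >= threshold; otherwise it starts a new cluster.
    """
    counts, linear_sums, labels = [], [], []
    for fp in np.asarray(fps, dtype=bool):
        for k, (n, ls) in enumerate(zip(counts, linear_sums)):
            if tanimoto(fp, ls / n >= 0.5) >= threshold:
                counts[k] += 1
                linear_sums[k] = ls + fp
                labels.append(k)
                break
        else:
            counts.append(1)
            linear_sums.append(fp.astype(int))
            labels.append(len(counts) - 1)
    return labels


fps = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0]]
print(leader_cluster(fps, threshold=0.6))  # [0, 0, 1, 0]
```

Because each comparison is against a cluster summary rather than every member, the cost grows with the number of fingerprints times the number of clusters; BIRCH-style trees reduce it further by searching only a few branches.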

Art

While my research builds robust computational tools for drug discovery, my artistic practice explores the beauty and complexity of biological systems through a different lens.