About

Machine Learning for Drug Discovery

My path into computational science began with a fascination with the elegant machinery of proteins like ATP synthase, nature's own rotary engine. This sparked a central question: how can we use computation to understand and simulate this incredible complexity? I am now dedicated to breaking down the computational bottlenecks that prevent scientists from learning from their largest and noisiest datasets.

As a Computational Scientist, I combine machine learning and high-performance software engineering to build robust tools for drug discovery. I have engineered scalable analysis pipelines for molecular libraries and developed ML frameworks that perform consistently on real-world data, transforming complex research challenges into deployable solutions that accelerate progress.

Projects

DELight: ML for DEL Screening


Problem: DNA-encoded libraries (DELs) are extremely imbalanced, with true binders making up only a tiny fraction of screened compounds, so standard machine learning models tend to miss them.
Built: Targeted undersampling framework to rebalance training data and improve signal detection.
Impact: Improved generalization and hit identification by 5-10% on million-scale compound libraries.
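The Built line names targeted undersampling without giving DELight's selection rule, so the sketch below shows only the generic baseline such a framework starts from: keep every binder and randomly subsample non-binders to a fixed ratio. The function `undersample` and its `neg_per_pos` parameter are hypothetical names chosen for illustration, and plain random sampling stands in for whatever targeting criterion DELight actually applies.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def undersample(
    features: Sequence[T],
    labels: Sequence[int],
    neg_per_pos: float = 10.0,
    seed: int = 0,
) -> Tuple[List[T], List[int]]:
    """Keep every positive (label 1) and a random subset of negatives (label 0).

    neg_per_pos (hypothetical parameter) sets how many negatives are kept per
    positive. Negatives are drawn uniformly at random; this is a plain random
    baseline, not DELight's targeted selection rule.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never request more negatives than actually exist.
    n_keep = min(len(neg), round(neg_per_pos * len(pos)))
    rng = random.Random(seed)
    kept = pos + rng.sample(neg, n_keep)
    rng.shuffle(kept)  # mix positives and negatives in the returned order
    return [features[i] for i in kept], [labels[i] for i in kept]
```

For a library with 10 binders among 10,000 compounds, `undersample(X, y, neg_per_pos=10.0)` returns 110 examples, 10 of them positive. A targeted variant would replace `rng.sample` with a rule that decides which negatives to keep.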

 

MDANCE: Scalable Clustering for Molecular Simulations


Problem: Clustering million-frame molecular dynamics trajectories is prohibitively slow with standard algorithms.
Built: Linear-time clustering algorithm and optimized implementation.
Impact: 25x speedup on 1.5 million frames, enabling practical large-scale analysis.
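A clustering method can run in linear time when it scores how compact a group of frames is without comparing every pair. One standard identity that allows this: the mean squared distance over all ordered pairs of N vectors equals twice the difference between their mean squared norm and the squared norm of their mean, so only running sums are needed. The sketch below, assuming frames are already aligned and flattened into coordinate vectors, implements that identity and checks it against the O(N²) double loop. It illustrates the building block only, not MDANCE's clustering algorithm, and both function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]


def mean_pairwise_sq_dev(frames: Sequence[Frame]) -> float:
    """Mean of ||x_i - x_j||^2 over all N^2 ordered pairs, in O(N*D) time.

    Identity used:
        (1/N^2) sum_i sum_j ||x_i - x_j||^2 = 2 * ((1/N) sum_i ||x_i||^2 - ||mean||^2)
    so only per-coordinate sums and the total squared norm are accumulated.
    """
    n = len(frames)
    col_sum: List[float] = [0.0] * len(frames[0])
    sq_sum = 0.0
    for x in frames:
        for k, v in enumerate(x):
            col_sum[k] += v
            sq_sum += v * v
    mean_sq_norm = sum((s / n) ** 2 for s in col_sum)
    return 2.0 * (sq_sum / n - mean_sq_norm)


def mean_pairwise_sq_dev_brute(frames: Sequence[Frame]) -> float:
    """Reference O(N^2 * D) version that compares every ordered pair."""
    n = len(frames)
    total = 0.0
    for a in frames:
        for b in frames:
            total += sum((p - q) ** 2 for p, q in zip(a, b))
    return total / (n * n)
```

For the frames (0, 0), (2, 0) and (0, 2), both functions give 32/9 ≈ 3.56, but only the first avoids the quadratic pair loop.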

 

PRIME: Native Structure Determination


Problem: Accurately retrieving native protein structures is limited by data scale and model efficiency.
Built: An algorithm for representative structure selection to identify key structures from molecular dynamics ensembles.
Impact: Achieved perfect recall of critical conformational states with high computational efficiency, enabling rapid analysis for docking and virtual screening pipelines.
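PRIME's exact selection criterion is not described here. As a hedged illustration of how a representative structure can be chosen from a large ensemble without building an N×N distance matrix, the sketch below finds the squared-Euclidean medoid (the frame with the smallest total squared distance to all others) in O(N·D) time by expanding that sum into terms that depend only on global running sums. It assumes pre-aligned, flattened coordinates; `medoid_index` is a hypothetical name, and this is a standard medoid computation, not PRIME's published method.

```python
from typing import Sequence


def medoid_index(frames: Sequence[Sequence[float]]) -> int:
    """Index of the frame minimizing sum_j ||x_i - x_j||^2, in O(N*D) time.

    Expanding the sum gives  N*||x_i||^2 - 2*(x_i . S) + Q,  where S is the
    coordinate-wise sum of all frames and Q is the sum of their squared norms,
    so each frame's score needs only its own norm and one dot product with S.
    """
    n = len(frames)
    s = [0.0] * len(frames[0])
    q = 0.0
    for x in frames:  # first pass: global running sums
        for k, v in enumerate(x):
            s[k] += v
            q += v * v
    best_i, best_score = 0, float("inf")
    for i, x in enumerate(frames):  # second pass: score every frame
        norm2 = sum(v * v for v in x)
        dot = sum(v * sk for v, sk in zip(x, s))
        score = n * norm2 - 2.0 * dot + q
        if score < best_score:  # strict "<" keeps the first index on ties
            best_i, best_score = i, score
    return best_i
```

For the 1D ensemble `[[0.0], [1.0], [2.0], [10.0]]`, `medoid_index` returns `2`: the frame at 2.0 has the smallest total squared distance to the others (69, versus 105, 83 and 245).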

 

Technical Skills

  • Architected novel, linear-time clustering algorithms (MDANCE) achieving >25x speedup for million-molecule libraries and MD trajectories.
  • Engineered billion-scale data pipelines (BitBirch, DELight) in Python for chemical space navigation and robust ML model training on imbalanced data.
  • Expert in molecular simulations (AMBER, NAMD) and structure-based drug design (Glide) to investigate protein-ligand interactions and conformational ensembles.
  • Proficient in the full research software lifecycle, from development and version control (Git) to creating high-fidelity scientific visualizations for publication.
  • Python: 80%
  • R: 40%
  • C++: 20%
  • Bash Script: 60%
  • HTML: 40%

Art

While my research builds robust computational tools for drug discovery, my artistic practice explores the beauty and complexity of biological systems through a different lens.