Projects

My Portfolio—both professional and personal

Visit my LinkedIn profile

Personal Projects

Patchwork

Summer 2019


I created a website inspired by my AP Studio Art concentration portfolio (some pieces are on my photography page), which explored the geometry and symmetry of nature. You can create works of art by shuffling around areas of a photo, which you can import from your own device or select from one of my photos! This project was my first introduction to front-end development, and through it I learned JavaScript, HTML, and CSS. I hope you give it a try!


Experience

STEP intern

Summer 2020, Google

Filtering Cost-Effective Ad Requests Using Early Delivery Stack Features

Summer 2019, OpenX

The goal of this project was to assign a monetization probability to each ad request using its gateway features, so that requests unlikely to monetize would not need to be processed. Using TensorFlow 2.0, I implemented and tuned machine learning models, including logistic regression, neural networks, and boosted trees, aiming to match and even outperform Google Cloud Platform's AutoML Tables. The prototype provided a proof of concept that over 6% of all ad requests could be removed while preserving 99% of the revenue, which would save the company over $100,000 per year. The project was handed off to the rest of the data science team and is in the process of being put into production.
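The "keep 99% of revenue, drop the rest" trade-off can be illustrated with a small threshold-selection sketch. It assumes a held-out set where each request has a model score (its predicted monetization probability) and the revenue it actually earned; `choose_threshold` is a hypothetical name, and this is not the production pipeline, just one straightforward way to turn scores into a filtering cutoff.

```python
from typing import Sequence, Tuple


def choose_threshold(
    scores: Sequence[float],
    revenue: Sequence[float],
    revenue_to_keep: float = 0.99,
) -> Tuple[float, float]:
    """Pick the highest score cutoff that still keeps `revenue_to_keep` of the
    historical revenue, and report the fraction of requests it would drop.

    scores  -- predicted monetization probability for each request (hypothetical input)
    revenue -- revenue each request actually earned (0 for unmonetized requests)
    """
    if not scores or len(scores) != len(revenue):
        raise ValueError("scores and revenue must be non-empty and the same length")
    target = revenue_to_keep * sum(revenue)
    kept_revenue = 0.0
    # Walk requests from most to least likely to monetize, accumulating revenue.
    for i in sorted(range(len(scores)), key=lambda j: scores[j], reverse=True):
        kept_revenue += revenue[i]
        if kept_revenue >= target:
            threshold = scores[i]
            # Requests tied with the cutoff are kept too, so kept revenue stays >= target.
            kept = sum(1 for s in scores if s >= threshold)
            return threshold, 1 - kept / len(scores)
    # Only reachable through floating-point round-off: keep every request.
    return min(scores), 0.0


threshold, dropped = choose_threshold([0.9, 0.8, 0.1, 0.05], [5.0, 4.95, 0.05, 0.0])
print(threshold, dropped)
```

The scores can come from any classifier (logistic regression, a neural network, boosted trees); the cutoff only depends on their ranking, which is why comparing models by how much traffic they can drop at a fixed revenue target is a natural yardstick.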

Adjusting Protein Levels for Genotype to Improve the Accuracy of Multiple Sclerosis Diagnostic Tests

Summers 2017 and 2018, National Institutes of Health

Abstract

In the summer of 2017, I was an intern in the Neuroimmunological Diseases Unit at the National Institute of Neurological Disorders and Stroke under the supervision of Dr. Bielekova. My project used R (a statistical programming language) and PLINK (a genome data analysis toolset) to improve a multiple sclerosis molecular diagnostic test by adjusting protein levels for genotype. I curated genotype data related to eQTLs (expression quantitative trait loci) for over 400 individuals and ran simulations to determine how standard deviation, sample size, and minor allele frequency affect the statistical power of the approach. These simulations showed that removing the noise in the protein data caused by allelic differences improves the model's accuracy by up to 30%. In August, I presented at the NIH Summer Poster Day and won an Honorable Mention.
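The exact adjustment isn't spelled out above; one common way to adjust a protein level for genotype is to regress it on the number of minor-allele copies (0, 1, or 2) at its associated variant and keep the residuals. The sketch below assumes that approach with a single variant per protein; `adjust_for_genotype` is a hypothetical name, and the original work was done in R and PLINK rather than Python.

```python
from statistics import mean
from typing import List, Sequence


def adjust_for_genotype(protein: Sequence[float], dosage: Sequence[int]) -> List[float]:
    """Remove the linear effect of allele dosage (0, 1, or 2 copies of the
    minor allele) from protein levels while keeping the overall mean."""
    if len(protein) != len(dosage) or len(protein) < 2:
        raise ValueError("need at least two paired measurements")
    x_bar, y_bar = mean(dosage), mean(protein)
    sxx = sum((x - x_bar) ** 2 for x in dosage)
    if sxx == 0:
        # Everyone has the same genotype, so there is nothing to adjust for.
        return list(protein)
    # Ordinary least-squares slope of protein level on allele dosage.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosage, protein)) / sxx
    # Subtract each person's genotype-predicted deviation from the mean.
    return [y - slope * (x - x_bar) for x, y in zip(dosage, protein)]


print(adjust_for_genotype([10.0, 12.0, 14.0, 10.0, 12.0, 14.0], [0, 1, 2, 0, 1, 2]))
```

Because least-squares residuals never vary more than the raw values, the adjusted levels are less noisy whenever genotype explains part of the variation, which is the effect the simulations quantified.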

Throughout my senior year, I volunteered for the lab to expand my research, applying my methods from the previous summer to optimize a random forest model that predicted the rate of multiple sclerosis progression. In the summer of 2018, after my senior year, I returned to the NIH and implemented my own machine learning models. Both the simulations and the real patient data showed an improvement in the model, which better anticipated the progression of the disease. In August, I again presented at the Summer Poster Day and won another Honorable Mention.

Blood-Based Cancer Diagnosis Using Methylation Data from Circulating Cell-Free DNA

Summer 2015-2017, University of California, Los Angeles

In 2015, I began a two-year project on blood-based cancer diagnosis using methylation data from circulating cell-free DNA (cfDNA). I designed a computational method to estimate the fraction of tumor-derived cfDNA, modeling the cfDNA in a blood sample as a mixture of normal and cancer DNA with beta distributions and fitting the mixture by maximum likelihood. Through over 200 simulations, I investigated the effects of sequencing depth and of the tumor-derived cfDNA fraction, and I predicted liver cancer status. I was named an international finalist in the Dongrun-Yau Science Award and presented my findings at the Ventura County Science Fair, the regional JSHS (Junior Science and Humanities Symposium), the International Chinese Statistical Association (ICSA) Applied Statistics Symposium, and the Joint Statistical Meetings. I published my findings as first author in New Advances in Statistics.
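A minimal sketch of a beta-mixture maximum-likelihood estimator, under assumptions not stated above: each sequencing read has a methylation level in (0, 1), reads from normal DNA follow one Beta distribution and reads from tumor DNA another, and both sets of shape parameters are already known (for example, fitted from reference samples). The tumor fraction f is then the mixing weight, found here by a simple grid search. `estimate_tumor_fraction` and `simulate_reads` are hypothetical names, and this is not the published method.

```python
import math
import random
from typing import List, Sequence, Tuple

Beta = Tuple[float, float]  # (alpha, beta) shape parameters


def beta_pdf(x: float, shape: Beta) -> float:
    """Density of a Beta(alpha, beta) distribution at x."""
    a, b = shape
    x = min(max(x, 1e-12), 1 - 1e-12)  # keep the logs finite at 0 and 1
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def estimate_tumor_fraction(
    reads: Sequence[float], normal: Beta, tumor: Beta, steps: int = 1000
) -> float:
    """Grid-search maximum-likelihood estimate of the tumor fraction f, where
    each read's methylation level has density (1 - f) * normal + f * tumor."""
    # Each read's density under the two components does not depend on f.
    dens = [(beta_pdf(x, normal), beta_pdf(x, tumor)) for x in reads]

    def log_lik(f: float) -> float:
        return sum(math.log((1 - f) * pn + f * pt) for pn, pt in dens)

    return max((i / steps for i in range(steps + 1)), key=log_lik)


def simulate_reads(n: int, f: float, normal: Beta, tumor: Beta, seed: int = 0) -> List[float]:
    """Draw n read-level methylation levels, each tumor-derived with probability f."""
    rng = random.Random(seed)
    return [rng.betavariate(*(tumor if rng.random() < f else normal)) for _ in range(n)]


reads = simulate_reads(2000, 0.2, (2.0, 8.0), (8.0, 2.0), seed=1)
print(estimate_tumor_fraction(reads, (2.0, 8.0), (8.0, 2.0)))
```

The log-likelihood is concave in f, so a grid search (or any one-dimensional optimizer) finds the maximum reliably. Rerunning the simulation with fewer reads, a rough stand-in for lower sequencing depth, is one way to see how the spread of the estimate grows.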