Data science enthusiast who enjoys exploring data and drawing meaningful insights from it. Actively seeking full-time data science opportunities starting January 2021. Areas of interest: machine learning, natural language processing, and data visualization.
MS student at Columbia University with an undergraduate degree in Computer Science. I want to apply my data science skills to create meaningful impact, drive decision-making, and benefit society.
-
Experience
Hindsight Technology Solutions New Jersey, US
Data Science Intern May 2020 - Aug 2020
Project 1: Proposed an entity-ranking algorithm to maximize ad clicks and digital marketing revenue
• Queried entities recognized by Named Entity Recognition (NER) and extracted features from news feeds and blog posts stored in AWS.
• Recommended insights by combining the gradient with a modified moving average convergence divergence (MACD) indicator to rank entities by their trends.
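A hypothetical sketch of trend-based entity ranking, assuming each entity has a daily mention-count series. Because the modification used in the project is not described, it uses the standard MACD (12/26/9-period EMAs); the `trend_score` rule (latest MACD histogram plus the recent average gradient) and all names and data are illustrative assumptions.

```python
from typing import Dict, List


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average with smoothing factor 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd_histogram(values: List[float], fast: int = 12, slow: int = 26, signal: int = 9) -> List[float]:
    """Standard MACD histogram: (fast EMA - slow EMA) minus the signal-line EMA of that difference."""
    macd = [f - s for f, s in zip(ema(values, fast), ema(values, slow))]
    signal_line = ema(macd, signal)
    return [m - s for m, s in zip(macd, signal_line)]


def trend_score(values: List[float], window: int = 5) -> float:
    """Hypothetical score: latest MACD histogram value plus the average gradient over the last `window` points."""
    if window < 2 or len(values) < window:
        raise ValueError("need at least `window` (>= 2) observations")
    recent = values[-window:]
    gradient = (recent[-1] - recent[0]) / (window - 1)
    return macd_histogram(values)[-1] + gradient


def rank_entities(series: Dict[str, List[float]]) -> List[str]:
    """Sort entity names from strongest to weakest upward trend."""
    return sorted(series, key=lambda name: trend_score(series[name]), reverse=True)


# Illustrative daily mention counts for three hypothetical entities.
mentions = {
    "rising": [float(i) for i in range(40)],
    "flat": [10.0] * 40,
    "falling": [float(40 - i) for i in range(40)],
}
print(rank_entities(mentions))  # ['rising', 'flat', 'falling']
```

Adding the gradient term lets a steady climb outrank a flat series even when the MACD histogram alone is small; a production version would likely weight the two terms.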
Project 2: Implemented hierarchical classification of articles up to 4 levels deep
• Applied a logistic regression classifier at each parent node, achieving an overall accuracy of 83.7%.
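A minimal sketch of the "one classifier per parent node" scheme described above, assuming a hypothetical two-level toy taxonomy and 2-D feature vectors in place of real article features. To stay dependency-free it uses a small hand-written multinomial logistic regression trained by batch gradient descent rather than a library such as scikit-learn; the class names, label tree, and data are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


class SoftmaxRegression:
    """Minimal multinomial logistic regression trained with batch gradient descent."""

    def __init__(self, lr: float = 0.1, epochs: int = 500) -> None:
        self.lr = lr
        self.epochs = epochs
        self.classes: List[str] = []
        self.weights: List[List[float]] = []  # one row per class; last entry is the bias

    def _probs(self, x: Vector) -> List[float]:
        xb = list(x) + [1.0]
        z = [sum(w * v for w, v in zip(row, xb)) for row in self.weights]
        m = max(z)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in z]
        total = sum(exps)
        return [e / total for e in exps]

    def fit(self, X: List[Vector], y: List[str]) -> "SoftmaxRegression":
        self.classes = sorted(set(y))
        dim = len(X[0]) + 1
        self.weights = [[0.0] * dim for _ in self.classes]
        for _ in range(self.epochs):
            grads = [[0.0] * dim for _ in self.classes]
            for x, label in zip(X, y):
                xb = list(x) + [1.0]
                for k, p in enumerate(self._probs(x)):
                    err = p - (1.0 if self.classes[k] == label else 0.0)
                    for j in range(dim):
                        grads[k][j] += err * xb[j]
            for k in range(len(self.classes)):
                for j in range(dim):
                    self.weights[k][j] -= self.lr * grads[k][j] / len(X)
        return self

    def predict(self, x: Vector) -> str:
        probs = self._probs(x)
        return self.classes[probs.index(max(probs))]


class HierarchicalClassifier:
    """Local classifier per parent node: each internal node chooses one of its children."""

    def __init__(self, children: Dict[str, List[str]], root: str = "root") -> None:
        self.children = children
        self.root = root
        self.models: Dict[str, SoftmaxRegression] = {}

    def fit(self, X: List[Vector], paths: List[List[str]]) -> "HierarchicalClassifier":
        # paths[i] is the label path below the root, e.g. ["sports", "cricket"].
        for parent in self.children:
            Xs, ys = [], []
            for x, path in zip(X, paths):
                full = [self.root] + path
                if parent in full[:-1]:  # train each node only on examples routed through it
                    Xs.append(x)
                    ys.append(full[full.index(parent) + 1])
            self.models[parent] = SoftmaxRegression().fit(Xs, ys)
        return self

    def predict(self, x: Vector) -> List[str]:
        node, path = self.root, []
        while node in self.models:  # descend until reaching a leaf
            node = self.models[node].predict(x)
            path.append(node)
        return path


# Hypothetical taxonomy and toy features (x separates the top level, y the second).
tree = {"root": ["sports", "tech"], "sports": ["cricket", "football"], "tech": ["ai", "hardware"]}
X = [(-2, 2), (-3, 1), (-2, -2), (-3, -1), (2, 2), (3, 1), (2, -2), (3, -1)]
paths = ([["sports", "cricket"]] * 2 + [["sports", "football"]] * 2
         + [["tech", "ai"]] * 2 + [["tech", "hardware"]] * 2)
clf = HierarchicalClassifier(tree).fit(X, paths)
print(clf.predict((-2.5, 1.5)))  # ['sports', 'cricket']
```

Training each node only on the examples routed through it keeps every classifier small, at the cost that an error near the root propagates down the path.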
Columbia University New York, NY
Graduate Research Assistant Jun 2020 - Aug 2020
• Wrangled a dataset of around 120 million rows using an HPC cluster.
• Applied Latent Dirichlet Allocation (LDA) topic modeling to identify company reviews containing discriminatory keywords.
• Ranked companies by level of gender and racial discrimination using word embeddings (word2vec, fastText), POS tagging, BERT-based sentiment analysis, and association rule mining.
SportsMechanics India Pvt Ltd Tamil Nadu, IN
Data Science Intern Jan 2019 - May 2019
• Deployed a Random Forest model to production using Flask.
• Identified patterns in cricket data using frequent itemset mining and association rules. (GitHub)
• Saved 200 man-hours by automating data collection through web scraping of scorecards with Beautiful Soup.
-
Projects
Classification of malaria cells using Convolutional Neural Networks and Keras
• Built a CNN to classify images of malaria-infected and uninfected cells, achieving 96% accuracy.
Yelp dataset recommender system
• Built a recommender system on the Yelp dataset to predict each user's last rating, implementing and comparing a deep learning model and matrix factorization.
Predict the price of used vehicles on Craigslist using boosting models
• Performed exploratory data analysis (EDA), data preprocessing, feature engineering, and model selection with hyperparameter tuning.
• Compared results of XGBoost, LightGBM and CatBoost.
Predict wine quality using NLP techniques and a regression model
• Engineered features by combining spaCy word embeddings with bag-of-words, word n-grams, character n-grams, and TF-IDF rescaling.
• Predicted wine quality with Ridge regression, achieving 77% accuracy.
Analysis of drug overdose deaths: exploratory data analysis and visualization project
• Formulated 5 actionable insights and built an interactive R Shiny dashboard on drug abuse data.