I am a proactive, dedicated professional with 2+ years of software engineering experience and strong programming and mathematical skills. I draw on my educational background in Computer Engineering to improve the efficiency of cloud-based and interactive consumer products using data science and machine learning techniques. I am seeking a data scientist position with a team-focused company that solves real-life problems, where I can continue to grow in the field while improving customer experience and satisfaction.
-
Experience
• Used Agile methodology (daily scrum meetings, product backlog, sprint backlog and retrospectives) to streamline time management, reduce task redundancy and increase team collaboration.
o Improved team throughput by 40%, reduced dependencies and identified risks by 60% through sprint planning.
• Developed an automation framework that incorporated coding best practices, maintaining uniform code standards across the team and enabling code reuse.
o Improved team effectiveness by avoiding hard-coded values, which eased scaling and maintenance.
o Increased team productivity potential by ~12 hours per week.
o Increased test coverage from 56% to 97%.
• Tuned the joins, wildcard searches and views in various SQL queries, which reduced the response times of search screens and the load times of display screens by 40% to 45%.
• Managed cost, utilization and security by setting up alarms with tools such as AWS CloudWatch to trigger preventive action when metric values exceeded thresholds, decreasing response time and increasing resolution rates.
• Applied proof-of-concept (POC) techniques to identify and initiate application improvement objectives.
• Developed Gradle scripts for automated deployment of applications and worked with Jenkins for continuous integration and build automation, which increased deployment speed by 70%.
-
Projects
1. Project Title: NapIT Sep. 2017 - Dec. 2017
Skills Used: Machine Learning, Android, Software Engineering, Feature Engineering, Data Pre-processing, Content-based Recommender System, Predictive Modeling, Data Visualization and Web Scraping
Project Details: NapIT is a health recommender application that lets a user track the status of their health and provides suggestions on how to improve it. The software also adds value by predicting whether the user is likely to develop conditions such as obesity, high cholesterol or heart disease. The data for this project was collected using:
1. Mobile Phone’s Accelerometer: Sleep tracking was implemented using the user’s mobile phone or wearable device, which is kept on the bed during sleep and measures movement. From the movement tracked, conclusions are drawn about the nature of sleep (i.e., if the user moves more, it is inferred that the user’s sleep is very light).
2. User Interface: The mobile app has a User Interface (UI) where the user can input data such as age, height and weight.
3. Step Counter: A step counter was implemented as part of the application to calculate the number of steps taken, miles walked and calories burnt, and to record the duration of activity.
Originally, three algorithms were applied separately, with low accuracy. To improve on this, we implemented a voting classifier with SVM, Logistic Regression and Decision Tree as base classifiers, which yielded an accuracy of 97.8%.
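The voting step can be sketched as follows. This is a minimal illustration only, assuming hard (majority) voting; whether NapIT used hard or soft voting is not stated here. The three rule functions are hypothetical stand-ins for the trained SVM, Logistic Regression and Decision Tree models, and the feature layout (BMI, daily steps) and all thresholds are assumptions made for the example, not values from the project.

```python
from collections import Counter
from typing import Callable, Sequence

# A base classifier is modelled as any function from a feature vector to a label.
Classifier = Callable[[Sequence[float]], int]


def majority_vote(classifiers: Sequence[Classifier], sample: Sequence[float]) -> int:
    """Return the label predicted by most base classifiers (hard voting)."""
    votes = [clf(sample) for clf in classifiers]
    # With three classifiers and two labels a tie cannot occur.
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for the trained base models.
# Assumed feature layout: sample[0] = BMI, sample[1] = average daily steps.
def svm_stand_in(sample: Sequence[float]) -> int:
    return 1 if sample[0] > 30.0 else 0


def logreg_stand_in(sample: Sequence[float]) -> int:
    return 1 if sample[0] > 25.0 and sample[1] < 5000 else 0


def tree_stand_in(sample: Sequence[float]) -> int:
    return 1 if sample[1] < 3000 else 0


BASE_CLASSIFIERS = [svm_stand_in, logreg_stand_in, tree_stand_in]

if __name__ == "__main__":
    # Label 1 = "at risk", label 0 = "not at risk" (assumed encoding).
    print(majority_vote(BASE_CLASSIFIERS, (27.0, 2000)))
```

The point of combining the models this way is that a single base classifier's error is outvoted when the other two agree, which is one plausible reason the ensemble outperformed the individual algorithms.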
2. Project Title: Foodborne Illness Analysis Jan. 2017 - May 2017
Skills Used: Machine Learning, Data Visualization, Data Wrangling (standardization, normalization), Dimensionality Reduction and Hypothesis Testing
Project Details: The goal of Foodborne Illness Analysis is to analyze: a) food sources with a high potential to cause illness, b) species that spread illness, c) regions and states where the spread of illness is prevalent, d) locations such as hospitals, prisons and offices with high risk potential or outbreak concentrations and e) seasons corresponding to significant increases in species growth and the spread of illness.
Principal Component Analysis was used to identify the significant features to use in the model and improve its performance on subsequent data samples. The Logistic Regression and Random Forest models achieved accuracies of 90.2% and 94.6% respectively.