An experienced problem solver dedicated to analyzing raw data sets and building optimized predictive models, using statistics and machine learning algorithms to draw insights and deliver business solutions.
My education and work experience have made me proficient in the following areas:
- Importing and cleaning data from various sources.
- Data Visualization and Exploratory Data Analysis.
- Performing Statistical Analysis such as Hypothesis Testing to get a better understanding of data.
- Applying Machine Learning Algorithms to create models that predict outcomes.
- Performing Hyperparameter Tuning to improve performance.
- Working with databases and big data tools such as SQL, MySQL, Hadoop, and Hive.
Curious, inquisitive, and analytical. Always willing to learn and grow.
Check out my projects at https://github.com/shreyas1262?tab=repositories
• Increased sales by 23% by segmenting products by monthly sales using the K-means algorithm
• Developed performance management models to calculate KPI measures using SQL to improve revenue by 12%
• Created and tested Python code to automate saving bulk data to a SQL database, reducing the time required by 27%
• Improved demand forecasting using Python, SQL and Excel, reducing returns to retail partners by 17%
• Improved customer retention by 18% by building Tableau dashboards to conduct trend analysis on retail metrics
Predict Hotel Bookings using User Search Parameters [Python] July 2019
• Achieved accuracy of 71% by building a Logistic Regression model to predict if a user will book a hotel room
• Performed feature selection and engineering using Python and SQL to preprocess data, improving accuracy by 15%
• Improved accuracy of classification model by 20% by applying hyperparameter tuning techniques
Predict Goals Scored by Strikers and Wingers in Soccer [Python] March 2019
• Achieved an R-squared of 0.74 by building a Random Forest regressor to predict the number of goals scored by a player
• Performed statistical tests such as ANOVA to determine whether the number of goals differs by preferred foot
• Reduced mean absolute error by 30% by applying hyperparameter tuning techniques to the model
Telecom Customer Churn Prediction [Python] January 2019
• Applied PCA and used classification techniques such as Random Forest to predict churn with a precision of 76%
• Engineered features using one-hot encoding, improving model accuracy by more than 30%
• Created visualizations using Seaborn and Matplotlib to spot trends and built a churn report in Tableau
NBA Player of the Week Analysis [Hadoop, Spark] November 2018
• Imported data into Hadoop and performed feature selection and engineering to improve accuracy by 30%
• Built a K-means clustering model using Spark MLlib to segment players by their attributes, achieving 74% accuracy
• Visualized the attributes using Tableau to understand the parameters of players receiving the award
Lending Club Customer Data Analysis [R, Tableau] March 2018
• Pre-processed the Lending Club dataset in Excel by treating missing values before analysis
• Performed K-means clustering using Tableau-R integration to segment customers based on repayment