Having a diversified skill set and expertise in IT, software, business analytics, data analytics, data visualization, big data, machine learning, engineering, and statistics, I intend to join a dynamic organization where I can gain valuable experience, contribute to organizational goals, and continue enhancing my skills.
Technical Skills:
Programming languages: Python, R, SQL, C++, MATLAB, Octave
Analysis tools: R Studio, TensorFlow, Tableau, Advanced Excel
Statistical Techniques: Probability Distributions, Inference Methods, Linear Models, Nonparametric Statistics, Optimization and Simulation Experiments, Regression, Correlation, Hypothesis Testing, Analysis of Variance (ANOVA), A/B Testing, Design of Experiments (DOE), Response Surface Methodology (RSM), Resampling, GLM, Time Series (ARIMA, LSTM), Transfer Functions
Hadoop: Apache Hadoop architecture & HDFS (Flume, Sqoop, Hive), MapReduce programming, YARN, Spark programming & DataFrames, Spark SQL
Machine Learning Algorithms: Linear & Logistic Regression, Rule-based Decision Trees and Random Forests, Model Fitting and Model Selection, Bayesian Regression, Classification, Clustering, Naive Bayes and Discriminant Analysis, k-Means, EM, SVM, Hierarchical Clustering, Neural Networks, k-fold Cross-Validation, Deep Learning (TensorFlow, Keras), NLP, Computer Vision, ASR, KNN, Data Mining, ID3 and C5 Decision Trees for Classification and Prediction, Association Analysis, Dimension Reduction Techniques, CNN, RNN, GAN, and several machine learning/deep learning libraries.
Data Modeling & Data Warehousing: SSIS
Operating Systems: Windows, Linux, macOS
Version Control: Git, GitHub
Thank you for viewing my profile. Feel free to contact me at:
shahrukh.big.data.24@gmail.com
Github: https://github.com/shahrukh-ak
-
Experience
• Implemented and tested a statistical data analysis software module using R, statistical algorithms, and Google Cloud Platform.
• Used Sampling, Resampling, Hypothesis Testing, Confidence Intervals, p-values, Critical Values, Confusion Matrices, Z-tests, t-tests, Analysis of Variance, Correlation, Feature Engineering/Feature Selection techniques, A/B testing, etc. in R to improve data quality.
• Conducted one-sample t-tests, Welch two-sample t-tests, ANOVA, paired t-tests, 2-sample tests for equality of proportions with continuity correction, one-sample chi-squared tests for variance, Shapiro-Wilk normality tests, and one-sample sign tests, and made decisions based on the p-values obtained from these tests.
• Utilized correlation measures such as Pearson's product-moment correlation, Spearman's rank correlation rho, and Kendall's rank correlation tau, conducted the corresponding significance tests, and used the resulting p-values to decide whether the null hypothesis should be rejected.
• Collaborated with and communicated results to decision-makers, presenting actionable insights by visualizing the data with scatterplots, box-and-whisker plots, normal Q-Q plots, histograms, and bar plots.
• Used Cook's D bar plots and charts to obtain influence diagnostics for each variable, check the difference in fits, and detect and display influential observations.
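The hypothesis-testing and correlation workflow above was done in R; a minimal Python/SciPy equivalent, using made-up sample data purely for illustration, looks like this:

```python
import numpy as np
from scipy import stats

# Hypothetical samples standing in for two groups under comparison
rng = np.random.default_rng(42)
a = rng.normal(loc=5.0, scale=1.0, size=100)
b = rng.normal(loc=5.4, scale=1.2, size=100)

# Welch two-sample t-test (does not assume equal variances)
t_stat, p_val = stats.ttest_ind(a, b, equal_var=False)

# Pearson's product-moment correlation with its significance p-value
r, p_corr = stats.pearsonr(a, b)

# Decide on the null hypothesis at the 5% significance level
alpha = 0.05
print("Reject H0" if p_val < alpha else "Fail to reject H0")
```

The same decision rule applies to the Spearman (`stats.spearmanr`) and Kendall (`stats.kendalltau`) variants mentioned above.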
-
Projects
1) Predicting Sales Prices of houses (Linear Regression):
Examined a dataset of house sale prices using linear and polynomial regression, predicted prices, and achieved 93% accuracy.
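A minimal sketch of this kind of model in Python with scikit-learn; the dataset here is synthetic (square footage predicting price), since the original data is not included:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the house-price data
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, size=300).reshape(-1, 1)
price = 50_000 + 120 * sqft.ravel() + rng.normal(0, 20_000, size=300)

X_train, X_test, y_train, y_test = train_test_split(sqft, price, random_state=0)

# Plain linear regression vs. a degree-2 polynomial fit
linear = LinearRegression().fit(X_train, y_train)
poly = make_pipeline(PolynomialFeatures(degree=2),
                     LinearRegression()).fit(X_train, y_train)

# Held-out R^2 is the usual "accuracy" measure for regression
print(f"linear R^2: {linear.score(X_test, y_test):.2f}")
print(f"poly   R^2: {poly.score(X_test, y_test):.2f}")
```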
2) Predicting flight delays by creating a machine learning model in Python:
Imported airline arrival data into a Jupyter notebook and used Pandas to clean it, then built a machine learning model with Scikit-learn and used Matplotlib to visualize the output.
• Used Pandas to clean and prepare data
• Used Scikit-learn to build a machine-learning model
• Used Matplotlib to visualize the output
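The clean-then-model pipeline above can be sketched as follows; the column names and the classifier choice are assumptions, and the data is synthetic since the airline dataset is not included:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the airline arrival data (column names invented)
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "DEP_DELAY": rng.normal(10, 30, n),   # departure delay in minutes
    "DISTANCE": rng.uniform(100, 2500, n) # route distance in miles
})
df.loc[rng.choice(n, 50, replace=False), "DEP_DELAY"] = np.nan  # simulate gaps
df["ARR_DEL15"] = (df["DEP_DELAY"].fillna(0)
                   + rng.normal(0, 10, n) > 15).astype(int)  # late-arrival label

# Step 1: clean with Pandas (drop rows with missing values)
df = df.dropna()

# Step 2: train and evaluate a scikit-learn classifier
X, y = df[["DEP_DELAY", "DISTANCE"]], df["ARR_DEL15"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```

From here, Matplotlib (e.g. a bar chart of predicted delay probabilities) handles the visualization step.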
3) Predicting Taxi Fares with Random Forests:
Utilized the concepts of regression trees and random forests to predict the value of fares and tips, based on location, date, and time.
• Extracted data on 49,999 taxi trips in Indiana using R.
• Performed data cleaning.
• Predicted taxi fares using a regression tree.
• Plotted predicted fares against actual fares.
• Identified the locations where people spend the most.
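The original project used R; a minimal Python equivalent of the random-forest fare model, on synthetic trip data with invented features (hour, longitude, latitude), would look like:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the taxi-trip data (features are invented)
rng = np.random.default_rng(7)
n = 2000
hour = rng.uniform(0, 24, n)            # pickup hour
lon = rng.uniform(-87.6, -84.8, n)      # pickup longitude
lat = rng.uniform(37.8, 41.8, n)        # pickup latitude
fare = 3.0 + 0.4 * hour + rng.normal(0, 1.0, n)  # fares drift with time of day

X = np.column_stack([hour, lon, lat])
X_train, X_test, y_train, y_test = train_test_split(X, fare, random_state=0)

# Random forest over location/date/time features, as in the project
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print(f"held-out R^2: {model.score(X_test, y_test):.2f}")
```

Aggregating predicted fares by location then answers the "where do people spend the most" question.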