Self-motivated individual with a solid computer engineering background, strong programming skills in Python and R, and the ability to communicate complex technical information clearly.
-
Experience
Assistant System Engineer at Tata Consultancy Services, Mumbai (July 2017 - May 2019)
Technology stack: JavaScript, HTML, CSS, C#, SQL
• Developed over 100 web forms using JavaScript, C#, and HTML to collect and analyze data for 50,000+ users of the HRMS portal of Aditya Birla Group (client).
• Managed the client’s database of over 1,000 tables and configured SQL scripts to automatically update and insert the data generated by over 100 web forms.
• Collaborated with a team of developers, business analysts, and technical support staff to determine optimal specifications and solutions for the business requirements.
-
Projects
1. Sentiment Analysis of Hotel Reviews using Recurrent Neural Networks
- Technologies used: Python (torch, torch.nn, torch.optim, torchtext, argparse, numpy, pandas)
- Implemented and tuned three recurrent neural network variants (vanilla RNN, GRU, and LSTM) in PyTorch for sentiment analysis of 100,000 reviews of over 1,400 hotels across Europe. The models use 100-dimensional GloVe word embeddings and handle variable-length reviews by padding the shorter reviews in each batch. The vanilla RNN, GRU, and LSTM achieved test accuracies of 91%, 92%, and 95%, respectively.
2. CIFAR-10 Image Classification with Pretrained ResNet-18
- Technologies used: Python (torch, torch.nn, torch.optim, torchvision, argparse, numpy)
- Implemented four classifiers in PyTorch (a simple softmax layer, a 2-layer neural network, a 1-convolution network, and a 4-convolution network) to classify images into 10 classes. Each classifier is attached to a pretrained ResNet-18 whose last fully connected layer has been removed. After extensive hyperparameter tuning, the four models achieved accuracies of 76%, 84%, 84%, and 87%, respectively.
3. Target Marketing for Paralyzed Veterans of America (PVA)
- Technologies used: R (tidyverse, ggplot2, recipes, caret, ROSE, ranger, pROC, glmnet, gbm)
- Developed two models, one to predict the likelihood that a user responds and one to estimate that user’s donation amount, from a highly imbalanced dataset with 487 variables and over 95,000 observations. Trained Random Forest, LASSO regression, and Ridge regression models on a training set undersampled to a 50/50 class balance, and calibrated their scores to account for the difference between the undersampled and original baseline rates of the minority class. Also used Principal Component Analysis to reduce the number of variables to 80.
4. NYC Temperature Forecasting
- Technologies used: Python (statsmodels, matplotlib, pandas, numpy, scikit-learn)
- Implemented time series forecasting models based on Simple Moving Average, Simple Exponential Smoothing, Holt’s linear trend method, and the Holt-Winters method on a dataset of about 45,000 observations. Compared model performance using the RMSE of the predictions; Holt-Winters achieved the lowest RMSE, 9.85 °F.
5. Human Activity Recognition
- Technologies used: Python (pandas, numpy, scikit-learn)
- Used scikit-learn’s Principal Component Analysis to reduce the dataset from 561 attributes to 17 for classifying user activities into 6 categories. Compared Decision Tree, SVM, and k-NN classifiers; SVM achieved the highest accuracy, 91.97%.
More projects at: https://github.com/ankit19sinha