I am a self-starter, quick learner, and passionate, highly motivated individual who believes in competing with myself to complete goal-oriented tasks successfully. I have excellent analytical and critical-thinking skills and bring a problem-solving approach to dynamic work environments.
As a graduate student at Northeastern University, I strengthened my skills in Algorithms, Big-Data technologies, Data Science and Cloud Computing to broaden my horizons. Versatility across these areas helps me adopt new technologies quickly and make sound decisions when solving real-world problems.
As a Data Scientist co-op at Wolters Kluwer, I worked on Natural Language Processing and Deep Learning with ensemble techniques to provide AI-powered solutions in the healthcare domain. I gained knowledge of many paradigms of Data Engineering, Machine/Deep Learning and the BERT text encoding technique, and built and deployed models using Python, Flask, Microsoft Azure and Docker.
With almost 3 years of experience as a Software Engineer, I was involved in the full software development life cycle, with expertise in automated application integration and deployment in an Agile environment. I was primarily responsible for system integration using SOAP and RESTful services in Java and SQL, and for maintaining synchronous and asynchronous communication via SOAP over HTTP/JMS or TCP/IP. I also built an anomaly detection model in Python by extracting big data from various sources, and performed data analysis and visualization to report KPIs and customer engagement metrics for TIBCO products.
SKILL-SET:
Languages/Technologies: Python, SQL, Bash, Java, HTML, CSS, REST/SOAP Web Services
Machine Learning Skills: Regression, Classification, Ensemble Learning, Time-Series Analysis, NLP, Deep Learning
Statistical Skills: Hypothesis Testing, Confidence Intervals, Bias-Variance Trade-off, Resampling, Subset Selection
Libraries/Frameworks: Keras, TensorFlow, Scikit-learn, Apache Spark, Apache Hadoop, Hive, Pig, Spring Boot
Database: MySQL, SQL Server, Oracle 11g, PostgreSQL, MongoDB, HBase
Cloud Technologies: AWS (CloudFormation, IAM, S3, EC2, Glue, Athena, RDS, SageMaker), Microsoft Azure
Other Tools: Tableau, Advanced Excel, Docker, Flask, Jenkins, CircleCI, Jira, Git, SVN, Confluence
-
Experience
Data Scientist Co-op, Wolters Kluwer, Waltham, MA, USA May 2019 – Dec 2019
• Researched and implemented Bert-as-a-Service to convert text to BERT vectors as part of feature extraction and engineering
• Computed correlation coefficients and statistical data distributions in Python to analyze and visualize hidden relationships
• Compared TF-IDF and BERT techniques for a data mapping project on SNOMED International codes and implemented a neural network architecture to classify records into map types, yielding a significant performance increase with 76% precision and 64% recall
• Introduced Bayesian hyperparameter tuning with Hyperopt for use across multiple projects
• Built SVM, CNN and LSTM models plus a Random Forest ensemble of all these models, and deployed the best model to production using Docker
• Optimized the entire data pipeline and modeling process by leveraging GPU computation power and data lakes in Microsoft Azure
Graduate Teaching Assistant, Northeastern University, Boston, MA, USA Jan 2019 – Apr 2019, Sep 2019 – Dec 2019
• Conducted workshops and sessions on using SQL and Python to extract data from multiple sources and store it in a relational database
• Assisted the professor in preparing assignments and exams for a Data Management and Database Design class of 80 students
Associate Consultant, TIBCO Software Inc., Mumbai, MH, India Jan 2018 – Jun 2018
• Devised a statistical anomaly detection model in Python using STL decomposition to forecast customer complaints w.r.t. KPIs
• Applied Loess regression to eliminate non-linear relationships and built reports and dashboards in TIBCO Spotfire for visualization
• Performed time-series and big-data analysis using Python, Hive and Pig to provide statistical insights about the customer data
Software Engineer, Tech Mahindra Ltd., Pune, MH, India Sep 2016 – Jan 2018
• Enhanced a payment processing web service in Java for an Order Management System by adding features using SOA
• Implemented a Jeopardy Management System to manage requests for multiple orders with optimized SQL configuration scripts in Oracle
• Built a microservice for the Fraud Management service to send automated emails to users using AWS Lambda, SNS, SES, CodeDeploy
• Implemented and managed a CI/CD pipeline for the project using GitHub and Jenkins, with shell scripts to deploy EAR packages
Associate Software Engineer, Tech Mahindra Ltd., Pune, MH, India Aug 2015 – Aug 2016
• Designed business logic to translate requirements into process design to develop, integrate and test services using TIBCO EMS
• Created a web service for the payment processing module using Java, SQL & TIBCO BW, wrote test cases and deployed via Jenkins
-
Projects
Distributed Cloud Computing in AWS:
• Hosted a fault-tolerant web application on AWS and implemented a CI/CD pipeline via GitHub, CircleCI & AWS CodeDeploy
• Configured the network architecture on AWS with VPC, subnets, Internet gateway, NAT and route tables to ensure network security
• Wrote IaC to automate creation of resources and infrastructure via scripts and CloudFormation templates in YAML
• Created Auto Scaling groups to manage EC2 instances by monitoring CPU utilization, logs, metrics and alarms in CloudWatch
• Designed routing policies to map HTTPS traffic on Elastic Load Balancer to HTTP on EC2 and tested performance using JMeter
Bill Tracking Web Application:
• Developed a bill tracking RESTful web application that lets each user view, edit and delete bills to keep track of their expenses
• Integrated a module to add, view and delete file attachments for each bill and incorporated logging and custom exception handling
• Leveraged JPA repository & Hibernate ORM for data persistence in PostgreSQL and tested the Maven application via JUnit, Postman
• Integrated Spring Security with basic authentication and used the BCrypt scheme for hashing user passwords
Parallel and Distributed Big-Data Analytics:
• Extracted Amazon review data for the Camera category from an Amazon S3 bucket using AWS CLI and stored raw data and results in HDFS
• Analyzed the data by implementing Hadoop MapReduce with secondary sorting, MapReduce chaining, joins and summarization patterns
• Optimized Hadoop jobs using a combiner, custom partitioner, map-side joins and the binning pattern to improve performance by 55%
• Compared query latency on big data for Hadoop MapReduce, Hive and Pig and visualized the results in a Tableau dashboard
Predictive Analytics for Boston Housing:
• Conducted a comparative study of various modelling techniques for house price prediction using regression and classification
• Performed grid search hyperparameter tuning to improve AUC/MAPE, where XGBoost outperformed the other models with the best MAPE score for regression (87.43%) and the highest AUC score for classification (91%)
Language Modelling for Word Predictor:
• Built a language model that predicts the probability of the next word in a sentence based on the words observed in the corpus
• Pre-processed text using NLTK, then designed & trained LSTM language models with a learned word embedding to predict words
• Compared performance of a statistical language model using the Stupid Backoff algorithm and a neural language model using LSTM