Recent Business Analytics graduate of The University of Texas at Dallas. I have 2 years of overall experience in the analytics industry, spanning DevOps, big data, databases, and machine learning. I am looking to pursue a career in analytics-driven roles.
Skills
• Ability to develop reliable, maintainable, efficient code in SQL, Java, and Python
• Python, Java, HTML, Spring, SQL Server
• Version control and build tools (Git and Maven)
• Big Data, Spark, PySpark, Databricks, Hadoop
• Strong knowledge of various data warehousing methodologies and data modeling concepts
• Strong in statistical analysis, advanced Excel, and supervised and unsupervised learning algorithms
• Proficient in Tableau Desktop, with hands-on practice
-
Experience
Tata Consultancy Services
Data Engineer Intern - July’19-Aug’19
• Worked closely on an integrated data hub to produce daily, weekly, and monthly extracts for business owners
• Interacted with the business owners to define and implement new requirements for more scalable data extraction
• Delivered feature enhancements by writing complex Hive queries to fetch data according to business needs
• Identified and reported data discrepancies, conducted detailed weekly meetings with business owners, and participated proactively in team meetings with the manager
• Strong understanding of UNIX/Linux and the Hadoop framework
Fractal Analytics, Mumbai, MH
Consultant/Analyst - Sep’16-Jul’18
• Developed web crawlers (also known as bots) using a Java framework to scrape Philips product details from top online retail websites such as Amazon, Walmart, and Target
• Designed REST APIs for a client software application that helped optimize incident resolution, increasing the operational efficiency of ticket resolution by 43%
• Performed data wrangling, managed large Oracle databases, and wrote complex SQL queries for seamless data extraction
• Implemented Agile stories and methodologies using Rally to address web crawler development backlogs, and followed up with international clients on deployment of the crawlers
• Advocated for and conducted monthly training programs on the client tools across 9+ countries, involving regional managers and many new hires
-
Projects
Deep Learning – DeepArtist Jan’20 – Mar’20
• Developed an algorithm that identifies the artist of a painting from its image
• Trained a convolutional neural network initialized with pre-trained ImageNet weights, using image data augmentation to capture the slight variations observed across art forms
• Developed a model using ResNet50 (neural network) that identifies artists from paintings with approximately 85% accuracy
Tableau Visualization – Suicides in India Jan’20 – Mar’20
• Tested, cleaned, and standardized data using Execute SQL Task, Conditional Split, and Data Conversion
• Used a geographic map to identify the states most affected by suicides
• Created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau
• Created a butterfly chart to compare the means of suicide across genders
• Created bar charts to understand the driving factors for suicides, such as social, health, and family reasons
Big data – Amazon Fine Food Reviews Sentiment analysis (UT Dallas) Mar’19 – Apr’19
• Performed text mining on 0.5 million Amazon Fine Food reviews to clean the text and prepare it for analysis
• Used term frequency-inverse document frequency (TF-IDF) and count vectorizer methods to retain significant information from the data
• Performed exploratory data analysis on the user profiles with the highest and lowest review counts
• Categorized reviews into good and bad and showcased their impact using word clouds
• Performed sentiment analysis using PySpark on the Databricks platform, achieving 83% prediction accuracy with a support vector machine
Machine Learning – Mushroom data classification (UT Dallas) Feb’19-Mar’19
• Implemented data manipulation and pre-processing (scaling and transformation), followed by principal component analysis to reduce dimensionality, retaining fewer components that capture sufficient variation in the original data
• Assessed the confusion matrix to identify type II errors as the model evaluation strategy, with the aim of reducing misclassification
• Studied model behavior on training and test data using the train-test accuracy curve
• Applied supervised algorithms such as support vector machines, decision trees, and boosting, achieving an overall accuracy of 92% with a decision tree