Programming Skills: SSIS, SQL, Java, Python, R, Scala, Apache Spark; Machine Learning, Deep Learning, Neural Networks, NLP, Statistical Modeling
Tools: MS Excel, Microsoft SQL Server, PostgreSQL, MySQL, Jupyter, SQL Workbench, Amazon S3, Tableau, Power BI, Pentaho, SharePoint, GitHub, Snowflake Cloud Data Platform, Jira, Confluence, MicroStrategy
-
Experience
(SSIS, SQL, Jira, Confluence, MicroStrategy, Snowflake Cloud Data Platform, AWS)
• Assist analysts and application administrators with BI and analytics projects and initiatives, using SSIS and SQL on data feeds for vehicle sales and aftersales.
• Analyze and create SSIS packages and resolve job failures effectively with minimal supervision.
• Assist on the MARS project, which involves migrating the current database environment to the Snowflake Cloud Data Platform.
• Document existing applications and packages, and conduct monthly sprint planning and daily scrums using Jira and Confluence.
• Work with the PCNA IT DBA team to complete standard reporting in MicroStrategy.
-
Projects
Image Recognition using Keras and TensorFlow in Python
(Python, CNN, Keras, TensorFlow)
Aug 2020 – Sep 2020
• Created a CNN model using Keras and TensorFlow in Python, achieving a baseline accuracy of 71.5%.
• Used data augmentation to increase the validation accuracy to nearly 80%.
• Applied transfer learning with VGG16, a pretrained image-recognition architecture, improving model accuracy to 96.1% (see the sketch below).
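
A minimal sketch of the transfer-learning step, assuming 224x224 RGB inputs and a binary classification task (the dataset and class count are not specified above):

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Load VGG16 pretrained on ImageNet, without its classification head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional layers for transfer learning

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # binary output (assumption)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Data augmentation of the kind used to lift validation accuracy.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255, rotation_range=20,
    width_shift_range=0.1, height_shift_range=0.1, horizontal_flip=True,
)
# Hypothetical usage; the directory layout is an assumption:
# train_gen = datagen.flow_from_directory("data/train", target_size=(224, 224),
#                                         batch_size=32, class_mode="binary")
# model.fit(train_gen, epochs=10)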
Fake Job Posting Prediction
(R, KNN, Logistic Regression, SVM, NLP)
Feb 2020 – Apr 2020
• Predicted fraudulent job postings using a dataset of 17,880 records.
• Performed statistical modeling in R using supervised machine-learning algorithms including KNN, logistic regression, and SVM; the KNN model achieved the highest accuracy at 96.41%.
• Used feature-extraction functions and NLP in R to identify key traits and frequently occurring words in fraudulent job posts (see the sketch below).
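
The original analysis was done in R; the following is an analogous Python sketch of the KNN approach using scikit-learn. The file and column names (fake_job_postings.csv, description, fraudulent) are assumptions, not taken from the resume:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("fake_job_postings.csv")  # hypothetical file name

# Turn the free-text job descriptions into TF-IDF features.
X = TfidfVectorizer(max_features=5000, stop_words="english").fit_transform(
    df["description"].fillna(""))
y = df["fraudulent"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Fit the KNN classifier and report held-out accuracy.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, knn.predict(X_test)))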
COVID-19 Data Analysis and Visualization
(SQL Workbench, ETL, Pentaho, Power BI)
Feb 2020 – Apr 2020
• Collected data from multiple sources, performed data preprocessing, and created ETL jobs and transformations using the Pentaho tool.
• Ran complex SQL queries against the aggregated dataset and created a data model in SQL Workbench (an analogous aggregation is sketched after this list).
• Created Power BI dashboards showing recovered counts, death counts, and positive/negative test results by population and location.
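
The aggregation behind these dashboards was done in SQL; the pandas sketch below is an analogous Python illustration. The file name and column names (location, population, recovered, deaths, positive, negative) are assumptions based on the dashboard metrics listed above:

import pandas as pd

# Hypothetical output of the Pentaho ETL jobs.
covid = pd.read_csv("covid_aggregated.csv")

# Aggregate the dashboard metrics per location.
summary = covid.groupby("location").agg(
    recovered=("recovered", "sum"),
    deaths=("deaths", "sum"),
    positive=("positive", "sum"),
    negative=("negative", "sum"),
    population=("population", "max"),
)

# Normalize counts by population for the per-capita dashboard views.
summary["positive_per_100k"] = summary["positive"] / summary["population"] * 1e5
print(summary.head())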
Cluster Read-Write and Analysis using Apache Spark
(Apache Spark, GitHub)
Feb 2020 – Apr 2020
• The objective was to increase the level of parallelism and decrease runtime through Spark.
• Read and filtered data, then wrote the dataset to CSV and Parquet files using Apache Spark.
• Performed cluster analysis on the extracted data using PySpark, applying partition and repartition concepts (see the sketch below).
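
A minimal PySpark sketch of the read-filter-write flow with repartitioning; the file names, column name, filter predicate, and partition count are illustrative assumptions:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cluster-read-write").getOrCreate()

# Read the source data with schema inference.
df = spark.read.csv("input.csv", header=True, inferSchema=True)

# Extract the subset of interest (hypothetical predicate).
filtered = df.filter(F.col("value") > 0)

# Repartition to raise the level of parallelism before the expensive writes.
filtered = filtered.repartition(8)
print("Partitions:", filtered.rdd.getNumPartitions())

# Write the same dataset in both formats, as in the project.
filtered.write.mode("overwrite").parquet("output_parquet")
filtered.write.mode("overwrite").option("header", True).csv("output_csv")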