Information Technology professional with 2 years of experience in data analytics, data warehousing, data modeling, ETL, business intelligence, and data science concepts. Solid experience in data mining, data integration, and data cleansing, with proficiency in delivering end-to-end business intelligence solutions: configuring metadata and building analytical repositories and dashboards that give business users accurate results.
-
Experience
Prospect Medical Holdings, Orange, CA, USA July 2019-Present
BI Developer | Data Analytics
PMH provides healthcare services that emphasize coordination of preventive care and population health management.
● Analyze, design, and develop scalable ETL packages that load high-volume, high-dimensionality data from various health plans into databases and build aggregates over all response files in time for downstream processing.
● Support data management functions such as data extraction, transformation, loading, and integration for enterprise data infrastructure: data warehouse, operational data stores, and master data management.
● Create data pipelines for automated dashboards that monitor key indicators for senior management, enabling accurate measurement of encounter effectiveness and improved business processes for timeliness and completeness. Perform deep-dive analysis, segmentation, and slicing and dicing of results to distill actionable insights from large-scale data.
● Transform and load raw data into MySQL with custom-built ETL applications to prepare unruly data for reporting and analytics.
● Create Power BI dashboards for end-to-end reconciliation of encounters data and for tracking data drop-off points, viewed by PMS.
Dehaze Inc, Seattle, WA, USA - Data Scientist Intern May 2018 – Dec 2018
Dehaze is an AI-powered job-hunting tool designed to match applicants to the relevant hiring manager through full automation. Built unsupervised and supervised models, including classifiers based on Decision Trees and Random Forest.
● Performed extensive data validation by writing complex SQL queries, took part in back-end testing, and worked on data quality issues.
Smart Bridge Ltd., Hyderabad, India Nov 2016 – April 2017
Data Analyst Intern
● Built machine learning models using the Python-based framework scikit-learn. Performed data analysis and data profiling using SQL on various source systems, including SQL Server.
-
Projects
NLP Classification Textual Analytics Scoring and Validation for Principal
● Principal uses an integrated text analytics service to score snippets and filter the usefulness of information extracted from documents. Performed feature engineering on the dataset using natural language methods such as Bag of Words (TF-IDF) and topic modelling. Built Logistic Regression, Random Forest, and XGBoost (gradient boosting) models.
● Performed machine learning tests and built visualizations of the results using Power BI.
https://app.powerbi.com/view?r=eyJrIjoiMDQ5ZTY0MjgtNGY3NS00NzA2LTg1Y2YtMjkyOTM1OTI2YzY5IiwidCI6IjY4MDkzN2ZjLTI4YTEtNDE2Ni04YTY2LTdiYzE3NDBjN2EwMiIsImMiOjJ9
Malware Data Collection and Machine Learning Analysis
● Collected various classes of malware and classified them into separate categories as the source for data extraction. Ran the collected malware in Linux containers and captured hardware performance counters using the perf tool.
● Automated the whole process using Python. Performed data cleaning and feature engineering on the collected data. Used the dataset to run various machine learning models and evaluate their performance. Chose logistic regression as the final model to predict the likelihood of a program being malware.
Impact of Immigration Policy in the 2016 Elections
● Performed extensive multivariate data analysis on 2016 Congressional Election survey data. Built a Logistic Regression model to estimate the extent to which immigration as an issue mattered to individuals who had voted for Obama in the 2012 elections but switched their vote to Trump in the 2016 elections.
Image Classification (Neural Networks, Machine Learning Classifiers, Gradient Descent, NumPy)
● Built KNN, AdaBoost, and neural network models (using no libraries other than NumPy) to predict the correct image orientation on a real test dataset of 1000 images from Flickr.
● Achieved 75% accuracy by training simple neural nets with an optimal learning rate using the gradient descent algorithm.
Analysis of Global Commodity Statistics using Azure and Python (Azure, cloud, Seaborn, matplotlib, Altair, Pandas)
● Performed detailed exploratory analysis of the trends in imports and exports for each country between 1980 and 2016.
● Derived insights from a very large data set stored on the Azure cloud.