Data Science graduate student with 1.7 years of full-time experience in the field of data services at Infosys Limited. Served as student coordinator at Akuva Infotech. Skilled in Python, Data Science, Machine Learning, and Cloud Computing, with certifications from Coursera, Udemy, and Udacity and an IBM digital badge. Interested in solving business problems using a data-driven approach.
Learning is my passion, and I believe the learning curve never ends. My key skill areas are Data Science, Machine Learning, and Computational Intelligence, and I love programming in Python.
I believe data science is about telling stories and bringing out hidden insights. I have written a series of Medium blogs sharing what I have learned - https://medium.com/@rnagara1
As a data scientist, I frame the problem statement, dig deep into the data, and surface key insights to assist decision makers. I prefer Tableau for making visualizations simple and easy to understand. Please visit my Tableau profile for my latest visualizations: https://public.tableau.com/profile/rajath.nag.nagaraj#!/
I love contributing packages to the open-source Python community. Here is the link to my recent PyPI package: https://pypi.org/manage/project/Multiple-dummies/release/2.0/
I recently added one more skill to my toolkit: I learned CSS and JavaScript and launched a basic dashboard on Heroku, a Salesforce platform - https://census5.herokuapp.com/.
To learn more about me, please visit my webpage: https://rajat1995.app/
-
Experience
Infosys Jan 2018 - July 2019.
Project: American Family Insurance - Cloud Data Lake on Amazon web services (AWS).
Developed Python-based tools for data quality checks on data migrated from on-premise Hadoop servers to AWS, reducing validation time to one-third of the usual time. The tool validated 1 million records in a record time of 4 minutes.
Worked closely with the onsite team to resolve data type and timestamp-related issues. Created Spark DataFrames to analyze differences in the data and perform data profiling.
Worked with the AWS EMR service to access large datasets in the cloud and created tables from S3 buckets to support the data analytics team. Also worked with DynamoDB, extracting complex JSON and parsing it to CSV, and, as part of the project, created tables and loaded data into Hive.
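The JSON-to-CSV step above can be sketched with pandas; the records and field names below are purely illustrative (the real data came from DynamoDB, which would be read with a client such as boto3 before this flattening step):

```python
import pandas as pd

# Hypothetical nested records, similar in shape to items exported
# from DynamoDB (field names are made up for illustration).
items = [
    {"policy_id": "P-100", "holder": {"name": "A. Smith", "state": "WI"},
     "claims": 2},
    {"policy_id": "P-101", "holder": {"name": "B. Jones", "state": "MN"},
     "claims": 0},
]

# json_normalize flattens nested dicts into dotted column names,
# e.g. holder.name and holder.state.
flat = pd.json_normalize(items)

# Serialize the flattened table as CSV text (the real tool wrote files).
csv_text = flat.to_csv(index=False)
print(sorted(flat.columns))
```

Deeply nested structures flatten the same way, with one dotted column per leaf field.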
Developed a tool to review data migrated to the Snowflake cloud; the tool reduced investigation time and could compare 1 million records in under 4 minutes.
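A minimal sketch of such a source-versus-target reconciliation check, using made-up sample data in place of the real on-premise and Snowflake tables:

```python
import pandas as pd

# Illustrative stand-ins for the migrated tables (values are made up).
source = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
target = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 25.0, 30.0]})

# An outer merge with an indicator flags rows missing on either side;
# a column-wise comparison then flags value mismatches.
merged = source.merge(target, on="id", how="outer",
                      suffixes=("_src", "_tgt"), indicator=True)
mismatches = merged[(merged["_merge"] != "both") |
                    (merged["amount_src"] != merged["amount_tgt"])]
print(mismatches[["id", "amount_src", "amount_tgt"]])
```

Only row id 2 is reported here, since its amounts differ between source and target.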
Created day-to-day project reports using SAS and Microsoft Excel and presented them to the onsite lead.
-
Projects
Disaster Message Response - ETL and ML pipeline with Web Application.
May 2020 – Jun 2020
Project description: The message dataset contains pre-labeled messages from real-life disaster events. This project builds a Natural Language Processing (NLP) model to categorize the messages.
Link to code: https://github.com/Rajath1995/Diseaster-Response-Pipline-
The project is completed in several steps:
Step 1: Data Processing
This step builds an ETL pipeline to process and clean the data and save it into a SQLite database.
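The ETL step can be sketched as follows; the rows and category strings below are made up, standing in for the real CSV files read by data/process_data.py:

```python
import sqlite3
import pandas as pd

# Illustrative stand-in for data/process_data.py (values are made up;
# the real script reads the two CSV files and writes DisasterResponse.db).
messages = pd.DataFrame({"id": [1, 2],
                         "message": ["need water", "roads blocked"]})
categories = pd.DataFrame({"id": [1, 2],
                           "categories": ["water-1;food-0", "water-0;food-0"]})

# Merge, then expand each "name-0/1" pair into one binary column per label.
df = messages.merge(categories, on="id")
expanded = df["categories"].str.split(";", expand=True)
for col in expanded.columns:
    name = expanded[col].str.split("-").str[0].iloc[0]
    df[name] = expanded[col].str.split("-").str[1].astype(int)
df = df.drop(columns="categories").drop_duplicates()

# Persist the cleaned table (in-memory here; the real script targets a file).
conn = sqlite3.connect(":memory:")
df.to_sql("messages", conn, index=False, if_exists="replace")
stored = pd.read_sql("SELECT * FROM messages", conn)
print(stored.shape)
```

The cleaned table ends up with one row per message and one 0/1 column per category label.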
Step 2: ML Model
This step builds a machine learning model utilizing natural language processing and a multi-output classifier.
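A minimal sketch of that model, using a tiny made-up corpus and LogisticRegression as an illustrative base estimator (the real pipeline in models/train_classifier.py trains on the full dataset):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Tiny illustrative corpus with two binary targets: water, food.
X = ["we need water urgently", "send food please",
     "water supply is gone", "no food for days"]
y = [[1, 0], [0, 1], [1, 0], [0, 1]]

# TF-IDF features feed a MultiOutputClassifier, which fits one
# classifier per category label.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LogisticRegression())),
])
model.fit(X, y)
pred = model.predict(["need water now"])
print(pred.shape)
```

MultiOutputClassifier is what lets a single pipeline assign a message to many disaster categories at once.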
Step 3: Building the web app
This step uses Flask to host the application on the web.
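The Flask layer can be sketched as below; the route name and response fields are illustrative only (the real run.py loads the trained classifier and renders visualizations):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/classify")
def classify():
    # Hypothetical endpoint: the real app runs the trained model on the
    # query message; a fixed response stands in for that here.
    return jsonify({"message": "need water", "water": 1, "food": 0})

# To serve locally as the project does: app.run(host="0.0.0.0", port=3001)
```

Flask's test client can exercise the route without starting a server.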
The necessary Python packages to run this application:
sqlite3
SQLAlchemy
pandas
NumPy
NLTK (plus the punkt and wordnet corpora)
scikit-learn
itertools
Executing the project:
Step 1: Run the ETL script to process the data and save it into the database:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
Step 2: Run the machine learning script to build the model and save the classifier as a pickle file:
python models/train_classifier.py data/disaster_response_db.db models/classifier.pkl
Step 3: Bring up the web application:
python run.py
Then open the link below to view the web application:
http://localhost:3001/
Author: Rajath Nagaraj, Master's in Data Science.
Acknowledgements
Figure Eight, for providing the relevant dataset to train the model.