Actively looking for full-time Software/Data/Cloud Engineering roles starting May 2021 | Ex-Cloud Technology Intern (Summer '20) at HBO WarnerMedia | CS Grad Student at NYU
Technologies I am skilled in:
Languages : Python, SQL, Java, C++, HTML, CSS, JavaScript, Perl
Cloud Platforms : AWS (Lambda, EC2, S3, RDS, CloudWatch, API Gateway), Azure (DevOps, Data Factory, Databricks)
Databases : Microsoft SQL Server, PostgreSQL, MySQL, Transact-SQL, MongoDB, Snowflake
Tools : Visual Studio, SSIS, SSMS, Power BI, JIRA, Spark, Hadoop, Git, GitLab, Jupyter Notebook,
REST, Sandbox, Postman, MuleSoft, Anypoint, Jenkins, Grafana, Airflow, Slack
-
Experience
Cloud Technology Intern, HBO WarnerMedia, New York City, NY, USA (Jun 2020 – Aug 2020)
Technology: Python, SQL, AWS (RDS, S3, CloudWatch), Grafana, Airflow, Slack
• Designed a custom Grafana dashboard to monitor daily job runs and provide insights into job failures.
• Developed a rule-based alert system in Python to notify the respective teams and clients about 200+ critical data feeds.
• Integrated Slack notifications to report task and job status for Airflow DAG executions.
• Analyzed CloudWatch logs to monitor Step Functions and Glue jobs for the Data Science and Data Engineering teams.
Graduate Software Developer, NYU IT, New York City, NY, USA (Mar 2020 – Present)
Technology: Python, Java, AWS (Lambda, EC2, S3, API Gateway), Snowflake, MuleSoft
• Created a custom AWS Lambda runtime to retrieve data from API endpoints using a Perl script.
• Developed a comparison method to calculate delta changes against the data in upstream sources.
• Constructed REST API endpoints using MuleSoft to provide access to databases as part of NYU’s COVID Response Team.
• Deployed a cron job to migrate data from an S3 bucket to the Snowflake data warehouse using Lambda and CloudWatch.
Software Engineering Intern, MAQ Software LLC, Mumbai, India (Jan 2019 – May 2019)
Technology: Python, SQL, Azure (DevOps, Data Factory, Databricks), Power BI
• Developed data pipelines with a natural language processing model that tags text data using a bag-of-words representation.
• Deployed incremental pulls for 5 million+ records and reduced ETL time by 10% using data partitioning.
• Designed metrics and KPIs (Key Performance Indicators) in Power BI to visualize data and surface insights.
-
Projects
Semantic Code Search, CodeSearchNet Challenge (Spring 2020)
Technology: Python, Scikit-learn, Wandb
• Developed sequence-to-sequence LSTM encoders to map code documents and search strings into the same vector space.
• Implemented a TF-IDF search model to identify relevant documents for the given evaluation queries.
• Compared accuracy against the proposed research work using Normalized Discounted Cumulative Gain (NDCG).
New York City Safe Travel Recommender (Fall 2019)
Technology: Python, PySpark, SparkML
• Combined 100K+ Airbnb listings with crime records using PySpark to generate a crime-over-distance ratio.
• Improved ratings for new housing with zero reviews using the K-means algorithm over a 2-mile neighborhood.
• Developed a nearby-restaurants feature by combining a New York City restaurant dataset with each Airbnb listing’s location.
Stock Market Trend Prediction (Spring 2019)
Technology: Python, Scikit-learn, Keras
• Implemented a prediction model with input features such as EMA and MACD to predict stock price trends.
• Increased accuracy by 8% compared to previous research using feature correlation and weight optimization.
• Developed a new model trained on 10 years of NSE (National Stock Exchange) data.