I have 5+ years of experience as a software developer in designing, developing, and building ETL pipelines and in deploying and supporting large-scale distributed systems, including extensive experience as a Hadoop Developer and Big Data Analyst. I am currently seeking a full-time role in Data Science, Data Engineering, Data Analysis, ML/AI, or Hadoop.
Skills:
Programming : SQL, Python, PySpark, SAP ABAP on HANA
Hadoop Stack : HDFS, MapReduce, PySpark, Spark SQL, RDDs, Hive, Pig, Sqoop, HCatalog, Oozie
Scripting Languages : Shell scripting and Python scripting
DBMS / RDBMS : Oracle 11g, MySQL
Version Control : Git
-
Experience
Extensive experience as a Hadoop Developer and Big Data Analyst, with primary technical skills in HDFS, YARN, Pig, Hive, Sqoop, Spark, and Oozie.
• Experience in handling data from different sources such as mainframe (EBCDIC and ASCII), flat files, sequence files, and various relational database systems.
• Knowledge of Pig and Hive analytical functions, with experience extending Hive and Pig core functionality by writing custom UDFs.
• Experience in importing and exporting terabytes of data between HDFS and relational database systems (Teradata, SQL Server, Oracle) using Sqoop.
• Experience in creating Oozie workflows.
• Experience in working with different file formats like Text, Avro and Parquet.
• Familiar with AWS, including EMR and S3 storage.
• Hands-on experience in developing applications using Spark.
• Experience in creating RDDs and DataFrames and working with Spark SQL (a minimal sketch follows this list).
• Expertise in preparing test cases, documenting, and performing unit and integration testing.
• Strong knowledge of Software Development Life Cycle and expertise in detailed design documentation.
• Fast learner with strong interpersonal, analytical, and communication skills.
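Below is a minimal PySpark sketch of the kind of DataFrame and Spark SQL work described above; the paths, column names, and view name are hypothetical placeholders rather than details from an actual engagement.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("customer_etl_example").getOrCreate()

# Read a Parquet extract from HDFS (the path below is a placeholder).
customers = spark.read.parquet("/data/landing/customers/")

# Drop down to the underlying RDD when row-level Python logic is needed.
sample_rows = customers.rdd.take(5)

# Register the DataFrame as a temporary view and query it with Spark SQL.
customers.createOrReplaceTempView("customers")
txn_counts = spark.sql("""
    SELECT customer_id, region, COUNT(*) AS txn_count
    FROM customers
    GROUP BY customer_id, region
""")

# Write the aggregate back to HDFS as Parquet for downstream Hive access.
txn_counts.write.mode("overwrite").parquet("/data/curated/customer_txn_counts/")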
-
Projects
Hadoop Developer, Accenture:
• Involved in sourcing data from different sources (mainframe, Teradata, flat files, etc.) into HDFS; data was processed and loaded monthly, and Hive tables were created to access it.
• Involved in importing and exporting data between HDFS and databases such as Oracle, Teradata, and SQL Server using Sqoop.
• Involved in creating common components in Spark (Python) to automate the unique key generation process for millions of records, along with an update component (see the sketch after this list).
• Created a wrapper script to automatically trigger any Oozie workflow and scheduled it through AutoSys.
• Involved in creating Hive tables using different file formats.
• Experience in creating AutoSys jobs for scheduling workloads.
• Migrated code for over 200 production Oozie jobs to the new platform.
• Responsible for replicating over 500 tables/views on the new platform.
• Responsible for migrating data for over 700 tables to target locations in the new cluster.
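The unique-key-generation bullet above references the sketch below: a hedged illustration of how such a component might be written in PySpark. The table, columns, paths, and offset logic are assumptions for illustration, not the actual project implementation.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("surrogate_key_example").getOrCreate()

# New records that still need a surrogate key (hypothetical staging path).
new_records = spark.read.parquet("/data/staging/new_records/")

# Highest key already assigned in the target table (assumed Hive table/column).
max_key = (
    spark.table("warehouse.customer_dim")
    .agg(F.max("customer_sk").alias("max_sk"))
    .collect()[0]["max_sk"]
) or 0

# Assign gap-free keys above the current maximum with a row_number window.
w = Window.orderBy("source_id")
keyed = new_records.withColumn("customer_sk", F.row_number().over(w) + F.lit(max_key))

# Append the newly keyed rows to the target location on HDFS.
keyed.write.mode("append").parquet("/data/warehouse/customer_dim/")

For very large batches, the global window above funnels all rows through a single partition; monotonically_increasing_id(), or zipWithIndex() on the underlying RDD, is a common alternative when strictly contiguous keys are not required.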