VENKATESH
Profile Summary:
3.9 years of data engineering experience handling large datasets.
- Hands-on development and implementation experience in Big Data Management Platform (BMP) using Hadoop, Apache Spark, Hive, Sqoop, Scala, ETL.
- Basic knowledge of cloud technologies such as Amazon Web Services (AWS) S3, EMR, Glue, Athena, Lambda, and Redshift, as well as Snowflake.
- Basic knowledge of Python and PySpark.
- Good knowledge of importing and exporting data with Sqoop between relational database management systems (RDBMS) and HDFS.
- Experience handling various file formats such as CSV, JSON, Avro, and Parquet.
- Developed data queries in HQL, optimized Hive queries, and worked with both simple and complex SQL queries.
- Contributed to the development of robust data pipelines using Hadoop, Hive, Spark, and Scala, building efficient and scalable data processing workflows.
- Practical experience designing and building Hive external tables backed by a shared metastore stored in MySQL.
- Hands-on experience with GitHub repositories, including cloning, committing, pulling, and pushing changes.
- Advanced proficiency in scheduling jobs with Control-M and monitoring them thoroughly.
- Worked in cross-functional Scrum teams following Agile methodology.
- Actively participated in Scrum ceremonies, including daily stand-ups, sprint planning, and retrospectives, ensuring effective collaboration and timely project delivery.
Academic Qualification:
- Attained a distinguished Master of Science (M.Sc.) degree from Acharya Nagarjuna University.
Professional Experience:
Employed as a Big Data Engineer at Confidential from September 2019 to date.
Technical Skills:
- Big Data Tools : Hadoop, Hive (2.1), Spark (2.4), Sqoop
- Hadoop distribution : Cloudera CDH (6.3)
- Databases : MySQL
- Cloud Technologies : AWS (EMR, S3, Glue, Athena, Lambda), Snowflake
- Programming : SQL, Scala (2.11)
- Environment : Windows, Linux
- SDLC Model : Agile Model
Project 2:
Name: Pharmaceutical Production Audit Platform (PPAP)
Role: Data Engineer
Project Description:
This project focuses on developing an effective production management system to address challenges related to excess production of non-patent, non-exclusive drugs, increasing competition from generic pharmaceuticals, the prevalence of counterfeit drugs, and the sudden emergence of infectious diseases. The client receives raw data from a multitude of internal and external sources, primarily derived from production units. These sources inc