BALA
Data Engineer
EXPERIENCE SUMMARY
- 9 years of experience implementing complete Big Data solutions, covering data acquisition, storage, transformation, and analytics using technologies that include Hadoop, Hive, Spark, Python, Sqoop, PL/SQL, and Informatica.
- Built complete data ingestion (ETL) pipelines from traditional databases and file systems into Hadoop using Hive, Spark, Python, PySpark, Sqoop, and SFTP/SCP (see the ingestion sketch after this summary).
- Experienced with different relational databases such as Oracle and SQL Server. Extensive SQL experience in querying, data extraction, and data transformation.
- Involved in Spark query tuning and performance optimization.
- Experienced in creating UNIX shell scripts for batch jobs.
- Experienced in developing business reports by writing complex SQL queries using views, volatile tables, and global temporary tables.
- Identified long-running queries, scripts, spool space issues, etc., and implemented appropriate tuning methods.
- Reported errors captured in error tables to the client, rectified known errors, and re-ran the scripts.
- Followed the given standard approaches for restarts and error handling.
- Worked with the EXPLAIN command to identify join strategies, issues, and bottlenecks.
- Wrote unit test cases and submitted unit test results as per the quality process.
- Strong problem-solving and communication skills, with the ability to handle multiple projects and work in teams or individually.
- Cloud exposure to AWS, Azure, and GCP.
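The ingestion work summarized above followed a common RDBMS-to-Hive pattern. Below is a minimal PySpark sketch of that pattern; the JDBC URL, table names, and credentials are hypothetical placeholders, not details of any specific engagement.

```python
from pyspark.sql import SparkSession

# Hypothetical connection details -- real values came from the source systems (Oracle / SQL Server).
JDBC_URL = "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1"

spark = (
    SparkSession.builder
    .appName("rdbms_to_hive_ingestion")
    .enableHiveSupport()          # allow writing results as Hive tables
    .getOrCreate()
)

# Pull a source table over JDBC into a Spark DataFrame.
orders = (
    spark.read.format("jdbc")
    .option("url", JDBC_URL)
    .option("dbtable", "SALES.ORDERS")            # hypothetical schema.table
    .option("user", "etl_user")                   # credentials would come from a secrets store
    .option("password", "***")
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)

# Land the data as a managed Hive table for downstream Hive / Spark SQL consumers.
orders.write.mode("overwrite").saveAsTable("staging.orders")
```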
Professional Experience:
- Working as an Associate Technical Lead at Confidential, Noida, from Sep 2021 till date
- Worked as a Data Engineer at Fragma Data Systems, Bangalore, from Jul 2020 to Aug 2021
- Worked as a Sr. Software Developer at HCL, Bangalore, from Mar 2019 to Jul 2020
- Worked as a Sr. Software Developer at Accenture, Bangalore, from Nov 2013 to Mar 2019
Technical skills:
- Hadoop, Hive, HDFS, Sqoop, PySpark, Spark SQL
- Spark, PL/SQL, Informatica, Talend, UNIX shell scripting
- Python, Redshift, Oracle PL/SQL and SQL Server
- Windows, UNIX and Linux
- Agile/Scrum, JIRA
- Basic experience and understanding of AI/MLOps
PROJECT #1:
Project : StitcherX
Client : Stitcher
Environment : Spark SQL, Python, PySpark, Talend, ELK, Airflow
Role : Data Engineer
Duration : (Sep 2021 till date)
Brief description of the project:
The StitcherX project mainly ingests data from BigQuery into Redshift using Talend jobs. Once the data is available in Redshift, it is curated to meet the business requirements. All the services run on AWS, and Airflow is used to schedule the jobs.
Responsible for:
- Created Talend jobs to ingest BigQuery table data into Redshift.
- Involved in creating Spark jobs for data transformations and aggregations using Python in AWS Glue.
- Worked on Airflow DAGs to schedule the jobs (a minimal scheduling sketch follows this list).
- Worked on creating DataFrames using Spark.
- Understood the specifications and analyzed data according to client requirements.
- Involved in unit testing and preparing test cases.
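As a rough illustration of how Glue transformation jobs can be scheduled from Airflow, below is a minimal DAG sketch that triggers an AWS Glue job via boto3. The DAG id, Glue job name, and schedule are hypothetical placeholders rather than the actual project configuration.

```python
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def trigger_glue_job(**_):
    """Start a (hypothetical) Glue job that curates Redshift-bound data."""
    glue = boto3.client("glue")
    glue.start_job_run(JobName="curate_stitcherx_tables")  # hypothetical job name


with DAG(
    dag_id="stitcherx_daily_curation",   # hypothetical DAG id
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="run_glue_curation",
        python_callable=trigger_glue_job,
    )
```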
PROJECT #2:
Project : Nitro advanced analytics
Client : Mashreq Bank
Environment : Spark SQL, Python, Hive, Pyspark, Oracle and SQL server
Role : Big Data Developer
Subject Area : Banking & Finance
Duration : (Jul-2020 to Aug-2021)
Brief description of the project:
The nitro project is to inje