SWAPNIL
DATA ENGINEER
TECHNICAL EXPERTISE:
Professional with 7+ years of experience in Big Data ecosystems (HDFS, YARN, Hive, Impala, Spark) and cloud technologies (Azure Databricks, Azure Data Factory, Azure Data Lake Storage, Azure Synapse), as well as Python, Informatica, SQL, and Power BI.
- Installation, Configuration, and Administration of Hadoop distributions like Cloudera (CDH)
- Experience in cluster deployment, performance tuning, administration, and monitoring of the Hadoop ecosystem.
- Knowledge of the Hadoop ecosystem: HDFS, YARN, MapReduce, Hive, Hue, Sentry, Impala, ZooKeeper, Spark.
- Hands-on experience with cloud technologies such as Azure Databricks, Azure Data Factory, Azure Synapse Analytics, Azure Blob Storage, and Azure Data Lake Storage.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, and DataFrames.
- Hands-on experience in various Big Data application phases, such as data ingestion and data analytics.
- Expertise in using Spark SQL with various data sources such as JSON, Parquet, and Hive (see the first sketch after this list).
- Experience with the Cloudera Hadoop distribution and with Azure Databricks.
- Experience in transferring data from RDBMS to HDFS and Hive tables using Azure Data Factory.
- Experience in creating tables, partitioning, bucketing, loading, and aggregating data using Hive/Impala.
- Uploaded and processed terabytes of data from various structured and semi-structured sources into HDFS.
- Worked with Informatica Designer components; created tasks, sessions, and workflows using Workflow Manager, and monitored them with Workflow Monitor.
- Extensive experience in extraction, transformation, and loading of data directly from heterogeneous source systems such as flat files, Oracle, and Netezza.
- Good knowledge of Star and Snowflake schemas to fit reporting-query and business-analysis requirements.
- Developed and supported Informatica mappings with transformations such as Filter, Router, Expression, Joiner, Aggregator, Lookup, Union, and Sequence Generator.
- Experience implementing Slowly Changing Dimensions, SCD Type 1 and SCD Type 2 (see the SCD2 sketch after this list).
- Involved in technical and business meetings with internal teams and senior management.
- Hands-on experience in the Analysis, Design, Coding, and Testing phases of the Software Development Life Cycle (SDLC).
- Good knowledge of SQL.
- Worked extensively in team-oriented environments, with strong analytical, interpersonal, and communication skills.
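
A minimal PySpark sketch of the Spark SQL usage described in the list above; the file paths, view names, and the target table analytics.order_events are hypothetical:

    from pyspark.sql import SparkSession

    # enableHiveSupport lets Spark SQL read and write Hive tables directly.
    spark = (SparkSession.builder
             .appName("spark-sql-sources")
             .enableHiveSupport()
             .getOrCreate())

    # Read a semi-structured JSON source and a columnar Parquet source.
    events = spark.read.json("/data/raw/events")         # hypothetical path
    orders = spark.read.parquet("/data/curated/orders")  # hypothetical path

    # Expose both as temp views and join them through Spark SQL.
    events.createOrReplaceTempView("events")
    orders.createOrReplaceTempView("orders")
    result = spark.sql("""
        SELECT o.order_id, e.event_type
        FROM orders o
        JOIN events e ON o.order_id = e.order_id
    """)

    # Persist the result as a managed Hive table for downstream consumers.
    result.write.mode("overwrite").saveAsTable("analytics.order_events")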
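
A compact SCD Type 2 sketch along the same lines; the dimension and staging tables, the tracked address column, and the validity columns (start_date, end_date, is_current) are assumptions for illustration:

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("scd2-sketch")
             .enableHiveSupport()
             .getOrCreate())

    dim = spark.table("dw.dim_customer")       # current dimension rows (hypothetical)
    stg = spark.table("stg.customer_updates")  # incoming changes (hypothetical)

    # Current rows whose tracked attribute changed in the staging feed.
    changed = (dim.filter("is_current = 1")
                  .join(stg, "customer_id")
                  .filter(dim["address"] != stg["address"]))

    # SCD2 step 1: close out the old version of each changed row.
    expired = (changed.select(dim["*"])
                      .withColumn("end_date", F.current_date())
                      .withColumn("is_current", F.lit(0)))

    # SCD2 step 2: append the new version with an open validity window.
    fresh = (stg.join(changed.select("customer_id"), "customer_id")
                .withColumn("start_date", F.current_date())
                .withColumn("end_date", F.lit(None).cast("date"))
                .withColumn("is_current", F.lit(1)))

    # A union of unchanged rows, expired, and fresh would then overwrite
    # dw.dim_customer; the final write is omitted for brevity.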
MAR 2020 – PRESENT
Project: Barclays Collect
Roles and Responsibilities:
- Worked on requirement gathering, analysis, and translating business requirements into technical designs within the Hadoop ecosystem.
- Extensively used Hive and Spark optimization techniques such as partitioning, bucketing, map joins, parallel execution, broadcast joins, and repartitioning (see the join-tuning sketch after this list).
- Created partitioned, bucketed Hive tables and loaded data into the respective partitions at runtime for quick downstream access (see the dynamic-partition sketch after this list).
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective, efficient joins and transformations during the ingestion process itself.
- Analyzed the asset matrix (mapping document) and used it to develop PySpark projects in PyCharm.
- Built Python modules in PyCharm containing the PySpark logic that implements the business requirements.
- Used Tivoli to schedule Spark jobs and to trigger them in both client and cluster deploy modes in lower environments.
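
A minimal sketch of the broadcast-join and repartitioning pattern referenced in the list above; the table names and partition count are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = (SparkSession.builder
             .appName("join-tuning")
             .enableHiveSupport()
             .getOrCreate())

    facts = spark.table("db.transactions")  # large fact table (hypothetical)
    dims = spark.table("db.product_dim")    # small dimension table (hypothetical)

    # Broadcasting the small side ships it to every executor, so the large
    # fact table is joined without a full shuffle.
    joined = facts.join(broadcast(dims), "product_id")

    # Repartition on the downstream grouping key to balance executor work,
    # then cache the hot dataset for repeated actions.
    shaped = joined.repartition(200, "product_id").cache()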
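
And a sketch of a runtime (dynamic) partition load into a Hive table, as in the bullet above; the database, table, and column names are hypothetical, and bucketing is omitted here since writing to Hive-bucketed tables from Spark is restricted:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("dynamic-partitions")
             .enableHiveSupport()
             .getOrCreate())

    # Allow partitions to be resolved at runtime from the data itself.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    spark.sql("""
        CREATE TABLE IF NOT EXISTS dw.sales (
            order_id BIGINT,
            amount DOUBLE,
            customer_id STRING
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
    """)

    # The partition column comes last in the SELECT; each distinct
    # load_date value lands in its own partition at runtime.
    spark.sql("""
        INSERT OVERWRITE TABLE dw.sales PARTITION (load_date)
        SELECT order_id, amount, customer_id, load_date
        FROM stg.sales_raw
    """)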
Project: SOLAR
Roles and Responsibilities:
- Worked on multiple Hadoop clusters with 150 nodes on Cloudera distributions 5.x and 6.x.
- Currently working as a Hadoop administrator, responsible for clusters totaling 100+ nodes, ranging from non-PROD to PROD.
- Worked on Kerberized Hadoop clusters.
- Involved in upgrading CM and CDH.
- Worked on commissioning and decommissioning of DataNodes in the cluster.
- Communicated with the Cloudera team on cluster tuning.
- Monitored the Hadoop clusters using Cloudera Manager, ensuring that all services were up and running.
- Resolved day-to-day incidents and implemented change-related activities, following the proper ITIL process.
- Set up data authorization roles for Hive and Impala using Sentry (see the sketch after this list).
- Performed user management: creating users, granting permissions on tables and databases, and assigning group permissions.
- Managed and reviewed Hadoop log files.
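
A minimal sketch of the Sentry role setup described above, issued through HiveServer2 with PyHive; the host, role, group, and database names are hypothetical:

    from pyhive import hive

    # Connect to HiveServer2 on a Kerberized cluster (hypothetical host).
    conn = hive.connect(host="hs2.example.com", port=10000,
                        auth="KERBEROS", kerberos_service_name="hive")
    cur = conn.cursor()

    # Sentry maps SQL privileges onto roles, and roles onto OS/LDAP groups.
    cur.execute("CREATE ROLE analyst_role")
    cur.execute("GRANT SELECT ON DATABASE sales_db TO ROLE analyst_role")
    cur.execute("GRANT ROLE analyst_role TO GROUP analysts")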