C H A N D R A S H E K H A R
D A T A E N G I N E E R
I wish to join a growing, dynamic, and professional organization where I can provide solutions and help the organization grow, and to continue my achievements by leveraging my skills.
Programming languages: Python, PySpark
Hadoop Ecosystem: HDFS, Sqoop, Hive, Spark-SQL
Data warehouse: Snowflake
Monitoring tool: AWS CloudWatch
IDE: PyCharm
Cloud technologies: AWS (S3, Glue, Lambda, SageMaker, Step Functions)
Relational databases: MySQL, Oracle
Visualization tools: Power BI
Pravara Rural Engineering College, Loni (Pune University)
2016 - 2019, CGPA 8.20
Overall 3+ years of experience across different domains in the IT industry, including 2 years mainly in Hadoop ecosystem development, PySpark, and cloud services (Amazon Web Services).
Good experience with data ingestion using Spark transformations and Spark SQL.
Hands-on experience with Amazon Web Services, mainly S3, AWS Lambda, AWS Glue, AWS CloudWatch, and AWS Step Functions.
Experience in all stages of a project, including requirements gathering, design and documentation, development, performance optimization, data extraction, cleaning, and reporting.
Very good understanding of partitioning and bucketing concepts in Hive; designed both internal and external tables in Hive to optimize performance (a short sketch follows this list).
Experience processing large amounts of structured and unstructured data, including integrating data from multiple sources.
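The Hive table design mentioned above can be illustrated with a minimal sketch through Spark SQL; the table name, columns, bucket count, and S3 location below are hypothetical examples, not taken from the projects.

from pyspark.sql import SparkSession

# Hive-enabled Spark session (assumes a configured Hive metastore).
spark = (SparkSession.builder
         .appName("hive-table-design")
         .enableHiveSupport()
         .getOrCreate())

# External table: dropping it leaves the underlying S3 data intact.
# Partitioning by load date enables partition pruning; bucketing by
# customer_id helps bucketed joins. All names here are illustrative.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        order_id    STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 8 BUCKETS
    STORED AS PARQUET
    LOCATION 's3://example-bucket/warehouse/sales_ext/'
""")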
Confidential
MAY 2020 to Present
P E R S O N A L D E T A I L S
PROJECT NAME: Data Migration
Marital status: Unmarried
Gender: Male
L A N G U A G E
English, Hindi, Marathi
The objective of this project is to migrate the client's data into the required formats and file types, perform ETL and data processing on it, and push the processed data to Snowflake. This data is then fed to different applications and teams.
Attending a daily call with team members to discuss the tasks for a particular sprint.
Completing the tasks assigned in each sprint as per the timeline.
Applying technical knowledge of PySpark to solve day-to-day tasks.
Creating DataFrames from different file formats.
Importing data from the S3 raw bucket, cleaning and masking it in Glue, and moving it to Snowflake using PySpark (see the sketch after the environment line below).
Using different transformations in PySpark and Spark SQL to process the data.
Installing required libraries in the cluster based on the task.
Optimizing code performance with various optimizations as required.
Writing the processed data to the required destination (Snowflake) in various file formats.
ENVIRONMENT: PySpark, Spark, AWS S3, AWS Glue, Snowflake
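A minimal sketch of the S3-to-Snowflake flow referenced above, assuming the Snowflake Spark connector (net.snowflake.spark.snowflake) is available on the cluster; the bucket path, column names, masking rule, and connection options are all illustrative placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-to-snowflake").getOrCreate()

# Read raw files from the S3 landing bucket (path is hypothetical).
raw_df = spark.read.option("header", "true").csv("s3://example-raw-bucket/customers/")

# Basic cleaning and masking: drop rows missing the key, trim strings,
# and mask the email column before the data leaves the raw zone.
clean_df = (raw_df
            .dropna(subset=["customer_id"])
            .withColumn("name", F.trim(F.col("name")))
            .withColumn("email",
                        F.regexp_replace(F.col("email"), r"(^.).*(@.*$)", r"$1***$2")))

# Write to Snowflake via the Snowflake Spark connector; every
# connection option below is a placeholder.
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfUser": "ETL_USER",
    "sfPassword": "***",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

(clean_df.write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "CUSTOMERS_CLEAN")
    .mode("overwrite")
    .save())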
PROJECT NAME: Sentiment Analysis Application
Sentiment analysis refers to identifying and classifying the sentiments expressed in text. Email data often yields a vast amount of sentiment data upon analysis, which is useful in understanding people's opinions about a variety of topics.
Designed and implemented a data pipeline that collects and processes data from an S3 bucket and performs data cleaning and transformation using PySpark in SageMaker.
Performing daily data cleaning and monitoring of the data pipeline.
Installing required libraries by using a Dockerfile to build a custom Docker image that includes those libraries.
Implemented an AWS Step Functions workflow that receives the Docker image path and is triggered by a Lambda function to automate and orchestrate Amazon SageMaker tasks such as publishing data to S3 (a sketch of this trigger follows the environment line below).
Enhancing the already developed application and providing support for any issues or job failures.
ENVIRONMENT: AWS S3, AWS Lambda, IAM, PySpark, CloudWatch, Docker, AWS SageMaker, AWS Step Functions
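A minimal sketch of the Lambda trigger described above, using boto3 to start the Step Functions execution that orchestrates the SageMaker tasks; the state machine ARN, image URI, and S3 paths are hypothetical placeholders.

import json
import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical ARN of the state machine that runs the SageMaker
# processing steps and publishes results to S3.
STATE_MACHINE_ARN = (
    "arn:aws:states:us-east-1:123456789012:stateMachine:sentiment-pipeline"
)

def lambda_handler(event, context):
    # Pass the custom Docker image (built from the project's Dockerfile
    # with the required libraries) into the execution input so the
    # SageMaker step can pull it.
    execution_input = {
        "image_uri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/sentiment:latest",
        "input_s3": event.get("input_s3", "s3://example-bucket/emails/raw/"),
    }
    response = sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps(execution_input),
    )
    return {"executionArn": response["executionArn"]}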