Aug 2021 – Till Date

Description

A big data project on a cloud platform. It ingests data from multiple sources into a data lake, which is then used to generate reports and dashboards as per customer requirements. The project also standardises data ingestion by integrating the ingestion methodologies of all LOBs.

Working Environment

PySpark, Azure Databricks, Azure Data Factory, Azure Log Workspace, Azure Monitor, Azure Key Vault, ADLS, Azure DevOps, SonarQube, FOSSA, Great Expectations, Azure SQL Server, Azure Event Hub

Responsibilities:

- Integrate different modules and design an end-to-end data flow.
- Create AWS resources with Terraform and deploy them with CI/CD pipelines.
- Orchestrate jobs using Step Functions with the help of SNS, SQS, S3, DynamoDB and Lambda.
- Develop integration logic in Python to standardise the pipeline flows.
- Handle client interactions and change decisions as part of agile development.
- Implement architectural changes across the different modules of the product.

Project Name

Confidential (Government)

Role

Data Engineer

Duration

Aug 2019 – Jul 2021

Description

A big data project on a cloud platform. It ingests data from multiple sources into a data lake, which is then used to generate reports and dashboards as per customer requirements.

Working Environment

Data Science, AWS, Informatica, Hadoop, Tableau, Python, Scala, Spark, Redshift

Responsibilities:

- Develop a pipeline to read data from various sources and load it to the cloud.
- Orchestrate jobs using Airflow.
- Code transformation logic in PySpark to load the transformed data.
- Handle client interactions and change decisions as part of agile development.
- Develop a data model.

Project Name

TUI Future Markets: Content Matching & Price Comparison (GMP)

Role

Data Engineer

Duration

Jan 2019 – Jul 2019

Description

A big data project on a cloud platform. TUI wanted its systems upgraded, enhanced and moved to a cloud platform. We designed a new pipeline for new platforms such as the Global Marketing Platform (GMP) to process their data using cloud-based tools.
Data Flow 1: Source (HDFS) → S3 → Matillion → Snowflake → IBM UNICA
Data Flow 2: Source (HDFS) → S3 → Spark (Qubole) → Hive (Qubole) → IBM UNICA

Working Environment

Data Science, AWS, Informatica, Hadoop, Tableau, Python, Scala, Spark

Sagar (RID : 4kv2lpjns7lc)
