Shubham (RID : 4kv2lq3feb5z)

  • Data Engineer
  • Ahmedabad, India

Rate

₹ 193,000 (Monthly)

Experience

8 Years

Availability

Immediate

Work From

Offsite

Skills

Python, MongoDB, Hadoop Admin, Apache, AWS, TensorFlow, OpenCV, Jupyter, Tableau, Snowflake, Redshift, SQL, S3 Buckets, Databricks, Power BI

Description

Projects:

  • Data Pipeline and ETL (December 2015 – May 2018)
  • Designation: Data Engineer
  • Client: Technology Company, Portland, ME
  • Role and Accomplishments:
    • Designed and implemented a data pipeline on GCP services such as Google Cloud Storage and BigQuery to process 100 million semi-structured raw records from 14 data sources.
    • Integrated data with GCP's Pub/Sub and Dataflow for real-time processing, increasing the paid conversion rate by 6%.
    • Led the migration of data storage and processing from Oracle to Google BigQuery, resulting in a 14% performance increase and significant cost savings.
    • Developed data pipeline architectures using GCP's Compute Engine and Data Fusion, enabling rapid scaling to handle increased user traffic.

 

  • Data Modeling and ETL Pipeline Creation (June 2018 – August 2019)
  • Designation: Data Engineer
  • Client: Health Care Company, NY
  • Role and Accomplishments:
    • Enhanced a web-based EHR system by integrating data using GCP tools such as Cloud SQL and Cloud Functions.
    • Employed PySpark within the GCP environment, leveraging Dataproc for parallel data processing.
    • Used GCP’s Cloud Composer for workflow orchestration, streamlining deployment for visualization and analytics.

  • ETL Process (August 2019 – February 2020)
  • Designation: Data Engineer
  • Client: Payment Processing Company, CA
  • Role and Accomplishments:
    • Managed the ingestion of streaming and transactional data using GCP services such as Dataflow, Pub/Sub, and BigQuery.
    • Created a custom Python library to parse and format data, integrating it with GCP’s Cloud Functions for efficient data handling.
    • Automated ETL processes with GCP's Data Fusion, significantly reducing manual effort and enhancing data pipeline reliability.