Skills
Python, MongoDB, Hadoop Admin, Apache, AWS, TensorFlow, OpenCV, Jupyter, Tableau, Snowflake, Redshift, SQL, S3 Buckets, Databricks, Power BI
Projects:
- Data Pipeline and ETL (December 2015 – May 2018)
- Designation: Data Engineer
- Client: Technology Company, Portland, ME
- Role and Accomplishments:
- Designed and implemented a data pipeline using GCP services such as Google Cloud Storage and BigQuery, processing semi-structured data from 100 million raw records across 14 data sources.
- Integrated data with GCP's Pub/Sub and Dataflow for real-time processing, enhancing the paid conversion rate by 6%.
- Led the migration of data storage and processing from Oracle to Google BigQuery, resulting in a 14% performance increase and significant cost savings.
- Developed data pipeline architectures using GCP's Compute Engine and Data Fusion, enabling rapid scaling to handle increased user traffic.
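The multi-source ingestion described above can be illustrated with a minimal sketch. This is not code from the project itself: the field names, alias sets, and source labels are hypothetical, and it shows only the general idea of mapping differently-labeled raw records from many sources onto one canonical schema before loading them into a warehouse such as BigQuery.

```python
"""Illustrative sketch (hypothetical names): normalizing semi-structured
records from multiple sources onto one canonical schema before warehouse load."""

import json

# Each source may label the same field differently; map aliases to one schema.
FIELD_ALIASES = {
    "user_id": {"user_id", "uid", "userId"},
    "event": {"event", "event_type", "action"},
    "ts": {"ts", "timestamp", "event_time"},
}

def normalize(raw: dict, source: str) -> dict:
    """Map a raw record from any source onto the canonical schema."""
    out = {"source": source}
    for canonical, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in raw:
                out[canonical] = raw[alias]
                break
        else:
            out[canonical] = None  # field missing from this record; kept explicit
    return out

if __name__ == "__main__":
    record = {"uid": 42, "action": "signup", "event_time": "2018-01-15T09:30:00Z"}
    print(json.dumps(normalize(record, "source_03")))
```

A real pipeline would attach this kind of normalization step between the storage layer (e.g. Cloud Storage) and the warehouse load, so that all 14 sources arrive under one schema.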
- Data Modeling and ETL Pipeline Creation (June 2018 – August 2019)
- Designation: Data Engineer
- Client: Health Care Company, NY
- Role and Accomplishments:
- Enhanced a web-based EHR system by integrating data using GCP tools such as Cloud SQL and Cloud Functions.
- Employed PySpark within the GCP environment, leveraging Dataproc for parallel data processing.
- Used GCP’s Cloud Composer for workflow orchestration, streamlining deployment for visualization and analytics purposes.
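The orchestration idea behind Cloud Composer (managed Airflow) can be sketched without the Airflow dependency: tasks declare their upstream dependencies as a DAG and execute in topological order. The task names below are invented for illustration and do not come from the project.

```python
"""Hedged illustration of DAG-based workflow orchestration, the model used by
Cloud Composer / Airflow. Task names are hypothetical."""

from graphlib import TopologicalSorter

# DAG: each task maps to the set of tasks it depends on (its upstreams).
dag = {
    "extract_ehr": set(),
    "clean_pyspark": {"extract_ehr"},
    "load_bigquery": {"clean_pyspark"},
    "refresh_dashboards": {"load_bigquery"},
}

def run_order(dag: dict) -> list:
    """Return one valid execution order for the DAG (raises on cycles)."""
    return list(TopologicalSorter(dag).static_order())

if __name__ == "__main__":
    print(run_order(dag))  # upstream tasks always precede their dependents
```

In Airflow proper, each key would be an operator (e.g. a Dataproc or BigQuery task) and the scheduler, rather than a single `static_order` call, decides what runs when, but the dependency model is the same.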
- ETL Process (August 2019 – February 2020)
- Designation: Data Engineer
- Client: Payment Processing Company, CA
- Role and Accomplishments:
- Managed the ingestion of streaming and transactional data using GCP services such as Dataflow, Pub/Sub, and BigQuery.
- Created a custom Python library to parse and format data, integrating it with GCP’s Cloud Functions for efficient data handling.
- Automated ETL processes with GCP's Data Fusion, significantly reducing manual effort and enhancing data pipeline reliability.
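A minimal sketch of the kind of parse-and-format helper a custom Python library like the one above might expose. The delimited record layout and field names are assumptions for illustration only, not the project's actual format.

```python
"""Sketch (assumed record layout): parsing delimited transaction strings into
typed records ready for downstream loading."""

from decimal import Decimal

def parse_transaction(line: str) -> dict:
    """Parse a 'txn_id|amount|currency|status' line into a typed dict."""
    txn_id, amount, currency, status = line.strip().split("|")
    return {
        "txn_id": txn_id,
        "amount": Decimal(amount),    # Decimal avoids float rounding on money
        "currency": currency.upper(),
        "status": status.lower(),
    }

if __name__ == "__main__":
    print(parse_transaction("TX1001|19.99|usd|SETTLED"))
```

Packaged as a library, a function like this could be called from a Cloud Function triggered on each incoming message, keeping parsing logic in one tested place rather than duplicated across pipelines.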