Skills
Python, MongoDB, Hadoop Admin, Apache, AWS, TensorFlow, OpenCV, Jupyter, Tableau, Snowflake, Redshift, SQL, S3 Buckets, Databricks, Power BI
Projects:
- Data Pipeline and ETL (December 2015 – May 2018)
- Designation: Data Engineer
- Client: Technology Company, Portland, ME
- Role and Accomplishments:
- Designed and implemented a data pipeline using GCP services such as Google Cloud Storage and BigQuery, processing semi-structured data from 100 million raw records across 14 data sources.
- Integrated data with GCP's Pub/Sub and Dataflow for real-time processing, enhancing the paid conversion rate by 6%.
- Led the migration of data storage and processing from Oracle to Google BigQuery, resulting in a 14% performance increase and significant cost savings.
- Developed data pipeline architectures using GCP's Compute Engine and Data Fusion, enabling rapid scaling to handle increased user traffic.
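The multi-source ingestion described above can be illustrated with a minimal sketch. This is not code from the project itself: the field names, alias sets, and source labels are hypothetical, and it shows only the general idea of mapping differently-labeled raw records from many sources onto one canonical schema before loading them into a warehouse such as BigQuery.

```python
"""Illustrative sketch (hypothetical names): normalizing semi-structured
records from multiple sources onto one canonical schema before warehouse load."""

import json

# Each source may label the same field differently; map aliases to one schema.
FIELD_ALIASES = {
    "user_id": {"user_id", "uid", "userId"},
    "event": {"event", "event_type", "action"},
    "ts": {"ts", "timestamp", "event_time"},
}

def normalize(raw: dict, source: str) -> dict:
    """Map a raw record from any source onto the canonical schema."""
    out = {"source": source}
    for canonical, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in raw:
                out[canonical] = raw[alias]
                break
        else:
            out[canonical] = None  # field missing from this record; kept explicit
    return out

if __name__ == "__main__":
    record = {"uid": 42, "action": "signup", "event_time": "2018-01-15T09:30:00Z"}
    print(json.dumps(normalize(record, "source_03")))
```

A real pipeline would attach this kind of normalization step between the storage layer (e.g. Cloud Storage) and the warehouse load, so that all 14 sources arrive under one schema.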
- Data Modeling and ETL Pipeline Creation (June 2018 – August 2019)
- Designation: Data Engineer
- Client: Health Care Company, NY
- Role and Accomplishments:
- Enhanced a web-based EHR system by integrating data using GCP tools such as Cloud SQL and Cloud Functions.
- Employed PySpark within the GCP environment, leveraging Dataproc for parallel data processing.
- Used GCP’s Cloud Composer for workflow orchestration, streamlining deployment for visualization and analytics purposes.
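The orchestration idea behind Cloud Composer (managed Airflow) can be sketched without the Airflow dependency: tasks declare their upstream dependencies as a DAG and execute in topological order. The task names below are invented for illustration and do not come from the project.

```python
"""Hedged illustration of DAG-based workflow orchestration, the model used by
Cloud Composer / Airflow. Task names are hypothetical."""

from graphlib import TopologicalSorter

# DAG: each task maps to the set of tasks it depends on (its upstreams).
dag = {
    "extract_ehr": set(),
    "clean_pyspark": {"extract_ehr"},
    "load_bigquery": {"clean_pyspark"},
    "refresh_dashboards": {"load_bigquery"},
}

def run_order(dag: dict) -> list:
    """Return one valid execution order for the DAG (raises on cycles)."""
    return list(TopologicalSorter(dag).static_order())

if __name__ == "__main__":
    print(run_order(dag))  # upstream tasks always precede their dependents
```

In Airflow proper, each key would be an operator (e.g. a Dataproc or BigQuery task) and the scheduler, rather than a single `static_order` call, decides what runs when, but the dependency model is the same.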
- ETL Process (August 2019 – February 2020)
- Designation: Data Engineer
- Client: Payment Processing Company, CA
- Role and Accomplishments:
- Managed the ingestion of streaming and transactional data using GCP services such as Dataflow, Pub/Sub, and BigQuery.
- Created a custom Python library to parse and format data, integrating it with GCP’s Cloud Functions for efficient data handling.
- Automated ETL processes with GCP's Data Fusion, significantly reducing manual effort and enhancing data pipeline reliability.
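A minimal sketch of the kind of parse-and-format helper a custom Python library like the one above might expose. The delimited record layout and field names are assumptions for illustration only, not the project's actual format.

```python
"""Sketch (assumed record layout): parsing delimited transaction strings into
typed records ready for downstream loading."""

from decimal import Decimal

def parse_transaction(line: str) -> dict:
    """Parse a 'txn_id|amount|currency|status' line into a typed dict."""
    txn_id, amount, currency, status = line.strip().split("|")
    return {
        "txn_id": txn_id,
        "amount": Decimal(amount),    # Decimal avoids float rounding on money
        "currency": currency.upper(),
        "status": status.lower(),
    }

if __name__ == "__main__":
    print(parse_transaction("TX1001|19.99|usd|SETTLED"))
```

Packaged as a library, a function like this could be called from a Cloud Function triggered on each incoming message, keeping parsing logic in one tested place rather than duplicated across pipelines.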