Skills
Python, PySpark, Hadoop, Pandas, NumPy, SAP, AWS Lambda, AWS Data Pipeline, MySQL, PostgreSQL, AWS Redshift, CSS3, Bootstrap, RDS Postgres

Description
- Over 5 years of extensive hands-on IT industry experience with Python, PySpark, MySQL, SQL Server, AWS services, Redshift, ETL pipelines, machine learning algorithms, deployment, Kafka, and Docker.
- Designed and developed data pipelines using technologies such as Python, PySpark, Redis, PostgreSQL, MySQL, ETL frameworks, Redshift, AWS Lambda, AWS Glue (including crawlers), Docker, and Kubernetes.
- Familiarity with cloud technologies such as AWS EMR, EKS, Lambda, Glue, Kinesis, Fargate, DMS, EC2, S3, Elastic Beanstalk, ECS, CloudFormation, Athena, and Step Functions.
- Expertise in creating data pipelines from S3 to Redshift using AWS Data Pipeline for LinkedIn social media data.
- Expertise in PySpark SQL, applying transformations to split huge files into smaller ones and processing them in warehouse databases.
- Experience working with Sequence files and ORC, Avro, Parquet, CSV, fixed-width, and XML formats.
- Expertise in optimizing long-running queries across different databases to achieve better performance.
- Ensured scalability, reliability, and security of back-end systems using Redis, Celery, Docker, and Kubernetes.
- Hands-on experience designing and building data models and data pipelines with a focus on data warehouses and data lakes.
- Experience working with Redshift to run data pipelines that handle huge data volumes.
- Good experience with SAP and CRM systems.
- Good experience creating real-time data streaming solutions using Spark Streaming, Kafka, and Hadoop.
- Good experience using Python for data manipulation, data wrangling, and data analysis with libraries such as Pandas and NumPy.
- Knowledge of databases including MySQL, PostgreSQL, Oracle, and AWS Redshift.