Skills
Python, PySpark, Pandas, NumPy, Django, AWS services, MySQL, PostgreSQL, RDS, Oracle, MongoDB, AWS Data Pipeline, Kinesis
Description
- Over 3.5 years of hands-on experience in the IT industry with Python, MySQL, Oracle, AWS services, Redshift, ETL pipelines, machine learning algorithms, deployment, Kafka, and Docker.
- Familiarity with cloud technologies such as AWS Lambda, Glue, Athena, Step Functions, EC2, S3, Kinesis, ECS, CloudFormation, Fargate, DMS, and Elastic Beanstalk.
- Designed and developed data and ETL pipelines using Python, Redis, Postgres, MySQL, BigQuery, Snowflake, Redshift, AWS services, Docker, and Kubernetes.
- Ensured scalability, reliability, and security of the backend system using tools such as Redis, Celery, Docker, and Kubernetes.
- Expertise in query optimization for Redshift and PostgreSQL.
- Hands-on experience designing and building data models and data pipelines for data warehouses and data lakes.
- Experience working with Redshift to run data pipelines that process large data volumes.
- Good experience creating real-time data streaming solutions using Spark Streaming and Kafka.
- Experience with MVC frameworks such as Django and Flask.
- Experience working with SequenceFile, ORC, Avro, Parquet, CSV, fixed-width, and XML formats.
- Good experience using Python for data manipulation, data wrangling, and data analysis with libraries such as Pandas, NumPy, scikit-learn, and Matplotlib.
- Knowledge of databases such as MySQL, PostgreSQL, Oracle, and AWS Redshift.
- Expertise in creating data pipelines from S3 to Redshift using AWS Data Pipeline for LinkedIn social media data.
- Expertise in using PySpark SQL transformations to split huge files into smaller files and process them in warehouse databases.
- Expertise in optimizing long-running queries across different databases to achieve better performance.