Skills
Hadoop, Spark, PySpark, Python, Scala, Unix Shell Scripting, Cloudera, Hive, HBase

Description
- Around 2.7 years of technical expertise across domains including finance and logistics, with hands-on experience in Big Data Analytics design and development
- Around 2.7 years of relevant experience in Big Data Analytics and data manipulation using Hadoop ecosystem tools such as MapReduce, HDFS, YARN, Hive, Impala, HBase, Kudu, Spark, PySpark, Spark Streaming, Kafka, Sqoop, Oozie, Parquet, ORC, Kerberos, Autosys, ITRS, Kibana, and AWS services including Glue, Athena, Lambda, and S3
- Rich experience in designing and developing applications in Apache Spark, Scala, Python, Kafka, and Hive within the Hadoop ecosystem
- Strong experience with Hadoop distributions such as Cloudera and Hortonworks
- Hands-on experience with the Spark RDD, DataFrame, and Dataset APIs for processing structured and unstructured data (illustrated in the first sketch after this list)
- Proficient in writing real-time streaming and Spark Core jobs using Spark Streaming with Kafka as the data pipeline (see the streaming sketch below)
- Well-versed in writing jobs in Spark, Spark Streaming, and Hive for data extraction, transformation, and aggregation across multiple file formats, including Parquet, JSON, CSV, and ORC, with compression codecs such as GZIP and Snappy (see the format-conversion sketch below)
- Experienced in building dashboards with Kibana
- Experienced in writing shell scripts for automation and notifications
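Below, a minimal sketch of the RDD, DataFrame, and Dataset APIs side by side; the Trade case class, its fields, and the sample records are invented for illustration, and local mode is assumed.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type for the sketch; field names are assumptions.
case class Trade(id: Long, symbol: String, amount: Double)

object SparkApiSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-api-sketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Low-level RDD API: parse raw, unstructured lines.
    val rdd = spark.sparkContext
      .parallelize(Seq("1,AAPL,120.5", "2,MSFT,98.0"))
      .map(_.split(","))
      .map(a => Trade(a(0).toLong, a(1), a(2).toDouble))

    // Untyped DataFrame and typed Dataset views of the same data.
    val df = rdd.toDF()
    val ds = df.as[Trade]

    df.groupBy("symbol").sum("amount").show() // DataFrame: SQL-style aggregation
    ds.filter(_.amount > 100.0).show()        // Dataset: typed filter

    spark.stop()
  }
}
```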
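The streaming sketch: Structured Streaming consuming from Kafka, assuming the spark-sql-kafka connector is on the classpath; the broker address and topic name are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-stream-sketch")
      .getOrCreate()
    import spark.implicits._

    // Kafka source; broker and topic are placeholder values.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers key/value as binary; cast the payload to text
    // and count records in one-minute event-time windows.
    val counts = stream
      .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
      .groupBy(window($"timestamp", "1 minute"))
      .count()

    counts.writeStream
      .outputMode("update")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```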
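The format-conversion sketch: the same DataFrame reader/writer API handles CSV, JSON, Parquet, and ORC, with the codec chosen per writer. All paths are placeholders; GZIP is shown on the Parquet writer and ZLIB on the ORC writer, since ORC does not accept a GZIP codec.

```scala
import org.apache.spark.sql.SparkSession

object FormatConversionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("format-conversion-sketch")
      .getOrCreate()

    // CSV in, Snappy-compressed Parquet out; paths are placeholders.
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/in/trades.csv")
      .write
      .option("compression", "snappy")
      .parquet("/data/out/trades_parquet")

    // JSON in; GZIP-compressed Parquet and ZLIB-compressed ORC out.
    val events = spark.read.json("/data/in/events.json")
    events.write.option("compression", "gzip").parquet("/data/out/events_gzip")
    events.write.option("compression", "zlib").orc("/data/out/events_orc")

    spark.stop()
  }
}
```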