About the Role
We are looking for a Spark developer who can fully exploit the potential of our Spark cluster. You will use Spark to clean, transform, and analyze vast amounts of raw data from various systems, delivering ready-to-use data to our feature developers and business analysts. This involves both ad-hoc requests and data pipelines embedded in our production environment.
Roles and Responsibilities
Responsible for systems analysis, design, coding, unit testing, and other SDLC activities
Gather and understand requirements, analyze and convert functional requirements into concrete technical tasks, and provide reasonable effort estimates
Create Scala/Spark jobs for data transformation and aggregation (see the sketch after this list)
Produce unit tests for Spark transformations and helper methods
Design data processing pipelines
Work proactively, independently, and with global teams to meet project requirements, articulating issues and challenges with enough lead time to mitigate project delivery risks
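To give a feel for the day-to-day work, below is a minimal sketch of the kind of Scala/Spark job described above, with the pure transformation factored out so it can be unit tested (e.g. with ScalaTest). The job name, column names, and inline data are illustrative assumptions, not our actual schema.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object DailySpendJob {

  // Pure transformation kept separate from I/O so it can be covered by
  // unit tests against a small in-memory DataFrame.
  // Column names userId/amount are illustrative assumptions.
  def totalSpendPerUser(transactions: DataFrame): DataFrame =
    transactions
      .filter(col("amount") > 0)      // drop refunds and invalid rows
      .groupBy("userId")
      .agg(sum("amount").as("totalSpend"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-spend")         // hypothetical job name
      .getOrCreate()
    import spark.implicits._

    // Tiny inline dataset stands in for a real source (e.g. Parquet on ADLS).
    val transactions = Seq(
      ("u1", 10.0), ("u1", 5.0), ("u2", -3.0), ("u2", 7.5)
    ).toDF("userId", "amount")

    totalSpendPerUser(transactions).show()
    spark.stop()
  }
}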
Requirements
10-12 years of hands-on experience
Experience with the Apache Spark streaming and batch frameworks
Scala (with a focus on the functional programming paradigm)
Experience with the Azure cloud platform and Databricks
Experience with PySpark
ScalaTest, JUnit, Mockito
Spark query tuning and performance optimization
Experience with the MongoDB database
Experience with Kafka, Storm, and ZooKeeper
Deep understanding of distributed systems (e.g. CAP theorem, partitioning, replication, consistency, and consensus)
Consistently demonstrates clear and concise written and verbal communication
Ability to work in a fast-paced environment both as an individual contributor and a tech lead
Experience with Git