Mid to Senior Java/Scala Developer – Open Lakehouse & Apache Spark
Are you a skilled Java/Scala Developer with a passion for distributed systems and large-scale data processing? We’re working with an innovative company at the forefront of open-source data technologies, looking for a Mid to Senior Java/Scala Developer to join their team. This role focuses on building and optimizing distributed applications using Apache Spark and modern Lakehouse architectures.
While mid to senior-level experience is preferred, we welcome applications from high-achieving Master’s graduates in relevant fields.
Key Responsibilities:
- Software Development: Design, develop, and optimize distributed applications using Java and Scala for large-scale data processing.
- Apache Spark Expertise: Utilize deep knowledge of Spark internals to build efficient, scalable data pipelines.
- Table Formats: Work with open-source table formats such as Apache Iceberg, Delta Lake, Apache Hudi, and Apache XTable.
- Lakehouse Architecture: Implement open Lakehouse solutions, including cataloging systems like Unity Catalog and Polaris Catalog, and manage ML workflows with MLflow.
- Computer Science Fundamentals: Apply expertise in data structures, caching, networking, and databases to build high-performance systems.
- Collaboration: Work closely with data engineers, machine learning engineers, and business stakeholders to develop cutting-edge solutions.
Key Requirements:
- Experience: 3+ years in Java or Scala development (or a Master’s degree in Computer Science or related field).
- Strong CS Knowledge: Solid understanding of data structures, caching strategies, networking, and database management.
- Apache Spark: Deep understanding of Spark execution, query optimization, and distributed data processing.
- Open Source Table Formats: Hands-on experience with Apache Iceberg, Delta Lake, Apache Hudi, or Apache XTable.
- Open Lakehouse & MLflow: Familiarity with Unity Catalog, Polaris Catalog, and MLflow for data and ML integration.
- Problem-Solving: Ability to troubleshoot and optimize large-scale distributed systems.
Preferred Qualifications:
- Experience with cloud platforms (AWS, GCP, Azure).
- Familiarity with CI/CD pipelines, Git, and GitHub.
- Understanding of machine learning workflows in data pipelines.
If you’re passionate about large-scale data systems and open-source technologies, this role offers the chance to work with an innovative, forward-thinking team.
Interested? Apply now or reach out to learn more!