Extensive experience with Apache Kafka, Apache Flink, and other relevant streaming technologies.
Deep understanding of lakehouse architecture and its implementation in large-scale environments.
Strong knowledge of data semantics, discovery processes, and data governance.
Proven ability to design and implement automated data pipelines and materialized views.
Experience with data subsumption techniques and tools.
Proficiency in Java, Scala, Python, or similar languages.
5+ years of directly applicable experience.
What you'll be doing:
Architectural Leadership: Design and enhance our data architecture to support both real-time and batch processing, with a focus on scalability and fault tolerance, using Kafka, Flink, and lakehouse principles.
Data Semantics and Discovery: Implement systems and procedures for effective data semantics management, ensuring data is accurately categorized and easily discoverable.
Pipeline Automation: Develop and maintain automated data pipelines that ensure efficient data flow and processing from multiple sources to our lakehouse architecture.
Data Consumption Optimization: Create strategies and systems for optimal generation of materialized views and data subsumption to ensure that our data architecture remains cutting-edge, minimizes redundancy, and achieves the required level of performance.
Cross-functional Collaboration: Work closely with data scientists, business analysts, and other engineering teams to define and refine requirements that drive data solutions.
Innovation and Research: Stay abreast of the latest industry developments in data engineering and propose the adoption of new technologies or methodologies to keep our data infrastructure ahead of the curve.