Data Engineer (AWS, Kubernetes, Python, Lake Formation)
We are looking for a Data Engineer to support a data-driven approach, as we explore data from multiple disparate sources to discover patterns and hidden insights.
We’re looking for candidates who thrive in professional, strategic, innovative, and user-focused environments, and who are committed to delivering quality products and services. The candidate must be able to think independently and to analyze, troubleshoot, and resolve complex business and technical problems.
You will work with very large, complex, cloud-based scientific datasets and help identify and answer questions that will shape scientific studies. The work that you do will contribute to improving the health of people across the nation.
You will be responsible for developing, optimizing, and maintaining code and data pipelines in support of that goal.
- Design, code, test, debug, document, and implement changes to new and existing data pipelines
- Ensure code quality through automated tests, production monitoring, and alerting, and fix issues quickly
- Ensure code is self-tested through a CI pipeline, can be deployed using Kubernetes, and can be managed by the development team (through logs and alerts)
- Manage production releases; understand and track operating costs of production code and take corrective action
- Handle critical production issues
- Guide, assist, and train remote developers and contractors on the project
- Evaluate cloud-based and open-source solutions for complexity and cost
- Develop working PoCs alongside architects
- Collaborate with the Data Analysts, Application Developers, Report Developers, Architects, Product Manager and other stakeholders to provide visibility into the data layer and related pipelines
- Identify opportunities to leverage data and create innovative solutions
- Keep up-to-date with latest technology trends
- Work as a team member in a highly creative and collaborative environment
- Manage your own time and work well both independently and as part of a team
- Bachelor’s Degree with 8+ years of experience or equivalent
- Extensive, advanced knowledge of Python 3 and Java
- Experience using AWS-based solutions (e.g., EMR, Glue) for data management
- Experience with Kubernetes and Kubernetes-based tools such as Helm
- Experience deploying data lake solutions such as Hadoop, Hive, Spark, or Databricks
- Experience with Test Driven Development and continuous integration
- Expertise with SQL
- Proven track record of working independently to deliver high quality code
- Strong analytical skills and strong verbal and written communication skills
- Must be a critical thinker and self-starter
- Ability to work at the fast pace of a high-growth organization