About the Role
We are looking for a Machine Learning Operations Engineer who will develop, deploy, and maintain systems for advanced machine learning that include data ingestion and transformation, labeling, experimentation, distributed training, deployment, monitoring, and management. You will help build out, modify, upgrade, and maintain our end-to-end machine learning platforms, and enforce isolation and security between customer environments. You’ll work closely with service partners and the wider team to ensure that models move in a maintainable, repeatable way from research to production.
Implement training and inference pipelines in collaboration with a wide range of stakeholders and cross-functional teams.
Productionize deep learning models while meeting business SLAs, including security requirements
Build tooling and pipelining abstractions to allow other team members to focus on experimentation while empowering self-service workflows to deploy and serve models reliably and consistently.
Build, deploy, modify, and upgrade end-to-end MLOps platforms that cover all aspects of advanced ingestion, labeling, training, deployment and management of models
Assist the data team with interfaces between the data platform and MLOps platforms
Provide infrastructure and tooling to make building and training models faster, easier, and more repeatable
3+ years of experience in DevOps, IT, and/or MLOps
Strong programming knowledge in Python and/or Go
Strong experience with GPUs for machine learning or other high-performance computing
Hands-on experience with ML frameworks, tools and libraries
Well-versed in data structures, data modeling, and database management systems as well as object and file storage systems.
Experience with defining infrastructure as code
Experience with model validation, model training, and other aspects of evaluating an ML system
Experience with continuous integration and continuous deployment tooling
Experience writing traditional code tests, such as unit and integration tests
Experience with Git, containers, networking, deployment, and automation