
Machine Learning Ops

Worldwide

Full time

Remote

About the Role

We are looking for a Machine Learning Operations Engineer to develop, deploy and maintain advanced machine learning systems that cover data ingestion and transformation, labeling, experimentation, distributed training, deployment, monitoring and management. You will help build out, modify, upgrade and maintain our end-to-end machine learning platforms, and enforce isolation and security between customer environments. You’ll work closely with service partners and the wider team to ensure that models move from research to production in a maintainable, repeatable way.

Responsibilities:  

  • Implement training and inference pipelines in collaboration with a wide range of stakeholders and cross-functional teams. 

  • Productionize deep learning models while ensuring that business SLAs, including security requirements, are met

  • Build tooling and pipeline abstractions that let other team members focus on experimentation, and enable self-service workflows for deploying and serving models reliably and consistently.

  • Build, deploy, modify, and upgrade end-to-end MLOps platforms that cover all aspects of advanced ingestion, labeling, training, deployment and management of models

  • Assist the data team with interfaces between the data platform and MLOps platforms

  • Provide infrastructure and tooling to make building and training models faster, easier, and more repeatable

 

Qualifications: 

  • 3+ years of experience in DevOps, IT and/or MLOps

  • Strong programming knowledge in Python and/or Go

  • Strong experience with GPUs for machine learning or other high performance compute

  • Hands-on experience with ML frameworks, tools and libraries

  • Well-versed in data structures, data modeling, and database management systems as well as object and file storage systems.

  • Experience with defining infrastructure as code

  • Experience with model validation, model training, and other aspects of evaluating an ML system 

  • Experience with continuous integration and continuous deployment tooling

  • Experience building traditional code tests, such as unit and integration tests

  • Experience with Git, containers, networking, deployment and automation

