Sr. Machine Learning Engineer
About the Role
As part of Uber's AI/ML Platform Team (Michelangelo), the Machine Learning Training team's mission is to make it easy to train, tune, and build high-quality models at Uber.
We build our own ML training software stack and solve problems at all layers of the stack, including iteration speed, compute efficiency, observability, fault tolerance, and correctness. On top of the core training stack, we build services, libraries, and frameworks, such as automatic hyperparameter/architecture optimization, to accelerate the model development process. Check out [1, 2] for more information.
Our team moves at a fast pace and provides individuals with a high degree of autonomy and agency to effect change. We welcome kind and brilliant people to our team, from wherever they come.
What You'll Do
- Build elastic, scalable, and fault-tolerant distributed machine learning libraries and systems used to power machine learning development productivity across Uber.
- Work closely with engineers in the broader Uber ML/AI Platform Team (Michelangelo) to improve the broader ML Platform ecosystem for our users.
- Work closely with Uber's ML community (with ML Engineers, Data Scientists, and Researchers) to scope and build new abstractions for scalable machine learning.
If you love training frameworks, model development tools, and collaborating on production ML models, this role is for you!
What You'll Need
- Master's or equivalent in Computer Science, Engineering, Mathematics or related field
- At least 5 years of software engineering experience, including at least 2 years working on Machine Learning systems/platforms/applications
- Experience owning problems end-to-end, with a willingness to pick up whatever knowledge is missing to get the job done.
- Knowledge of Python, including its scientific libraries (NumPy, pandas, PyTorch, etc.)
- Knowledge of machine learning concepts, techniques and workflows.
Bonus points if you have:
- Experience with ML platforms like SageMaker, Vertex AI, DataRobot, Dataiku, etc.
- Knowledge of Java/Scala data stack
- Experience building scalable and fault-tolerant distributed systems (e.g. using Spark, Kubernetes, or Ray)
- Experience building and delivering ML models into production
- Contributions to AI frameworks such as PyTorch, TensorFlow, JAX, or XGBoost
- Strong communication and problem-solving skills working with multidisciplinary teams
We welcome people from all backgrounds who seek the opportunity to help build a future where everyone and everything can move independently. If you have the curiosity, passion, and collaborative spirit, work with us, and let’s move the world forward, together.
Offices continue to be central to collaboration and Uber’s cultural identity. Unless formally approved to work fully remotely, Uber expects employees to spend at least half of their work time in their assigned office. For certain roles, such as those based at green-light hubs, employees are expected to be in-office for 100% of their time. Please speak with your recruiter to better understand in-office expectations for this role.