Senior/Staff ML Infrastructure Engineer
Gatik
Who we are:
Gatik, the leader in autonomous middle-mile logistics, delivers goods safely and efficiently using its fleet of light- and medium-duty trucks. The company focuses on short-haul, B2B logistics for Fortune 500 customers including Kroger, Walmart, Tyson Foods, Loblaw, Pitney Bowes, Georgia-Pacific, and KBX. Gatik enables these customers to optimize their hub-and-spoke supply chain operations and improve service levels and product flow across multiple locations, while reducing labor costs and meeting an unprecedented demand for faster deliveries. Gatik’s Class 3-7 autonomous box trucks are commercially deployed in multiple markets, including Texas, Arkansas, and Ontario, Canada.
About the role:
We're looking for high-energy, creative, and collaborative candidates who want to work in a fast-paced, execution-oriented team. You will play an essential role in accelerating the development and deployment of our AV software stack. The ideal candidate has a strong technical background, hands-on software engineering experience, and a knack for solving hard problems.
This role is onsite at our Mountain View, CA office.
What you'll do:
- Own development of ML models end-to-end, from data strategy and initial development through optimization, production platform validation, and fine-tuning based on metrics and on-road performance
- Lead efficient neural network development including quantization, pruning, sparsification, compression, and novel differentiable compute primitives
- Build the foundation models for on-vehicle and offline applications; develop metrics and tools to analyze errors and understand improvements in our systems
- Train and evaluate DNNs to benchmark neural network optimization algorithms, optimizing for latency and power consumption
- Design and implement a horizontally scalable, high-throughput cloud inference pipeline for evaluation and KPI calculation
- Streamline workflows to allow creation of verified, deployable artifacts from annotated data
- Support data preparation for training by building a horizontally scalable data preparation pipeline that is simple to use and doesn't delay training
- Support development of introspection and visualization tools that show what is going well and what can be improved in our work
What we're looking for:
- Bachelor's Degree in Computer Science, Machine Learning or relevant field
- Master's Degree with a focus on Machine Learning, Statistics, Optimization, or a related field (preferred), or equivalent work experience
- 7+ years of experience working with large ML projects and/or building production ML systems
- Excellent C++, Python, and/or CUDA programming skills
- Familiarity with modern machine learning frameworks such as PyTorch
- Expertise in optimization techniques, from high-level ML algorithms to low-level hardware utilization
- Experience in software architecture, system performance, latency, and data flow
- Expertise in machine learning workflows: data sampling and curation, pre-processing, model training, ablation studies, evaluation, deployment, and inference optimization
- Strong analytical skills, especially for performance troubleshooting (e.g., profiling, roofline model analysis)
- Industry experience in building large-scale ML pipelines
- Experience with cloud ML training pipelines in Azure (preferred)
- High Performance Computing experience (preferred)