Job Title: ML Platform Engineer – AI & Data Platforms
ML Platform Engineering & MLOps (Azure-Focused)
- Build and manage end-to-end ML/LLM pipelines on Azure ML using Azure DevOps for CI/CD, testing, and release automation.
- Operationalize LLMs and generative AI solutions (e.g., GPT, LLaMA, Claude) with a focus on automation, security, and scalability.
- Develop and manage infrastructure as code using Terraform, including provisioning compute clusters (e.g., Azure Kubernetes Service, Azure Machine Learning compute), storage, and networking.
- Implement robust model lifecycle management (versioning, monitoring, drift detection) with Azure-native MLOps components.
Infrastructure & Cloud Architecture
- Design highly available and performant serving environments for LLM inference using Azure Kubernetes Service (AKS) together with Azure Functions or Azure App Service.
- Build and manage RAG pipelines using vector stores such as Azure AI Search (formerly Azure Cognitive Search), Redis, or FAISS, and orchestrate them with tools like LangChain or Semantic Kernel.
- Ensure security, logging, role-based access control (RBAC), and audit trails are implemented consistently across environments.
Automation & CI/CD Pipelines
- Build reusable Azure DevOps pipelines for deploying ML assets (data pre-processing, model training, evaluation, and inference services).
- Use Terraform to automate provisioning of Azure resources, ensuring consistent and compliant environments for data science and engineering teams.
- Integrate automated testing, linting, monitoring, and rollback mechanisms into the ML deployment pipeline.
Collaboration & Enablement
- Work closely with Data Scientists, Cloud Engineers, and Product Teams to deliver production-ready AI features.
- Contribute to solution architecture for real-time and batch AI use cases, including conversational AI, enterprise search, and summarization tools powered by LLMs.
- Provide technical guidance on cost optimization, scalability patterns, and high-availability ML deployments.
Qualifications & Skills
Required Experience
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in ML engineering, MLOps, or platform engineering roles.
- Strong experience deploying machine learning models on Azure using Azure ML and Azure DevOps.
- Proven experience managing infrastructure as code with Terraform in production environments.
Technical Proficiency
- Proficiency in Python (PyTorch, Hugging Face Transformers, LangChain) and Terraform, with scripting experience in Bash or PowerShell.
- Experience with Docker and Kubernetes, especially within Azure (AKS).
- Familiarity with CI/CD principles, model registries, and ML artifact management using Azure ML and Azure DevOps pipelines.
- Working knowledge of vector databases, caching strategies, and scalable inference architectures.
Soft Skills & Mindset
- Systems thinker who can design, implement, and improve robust, automated ML systems.
- Excellent communication and documentation skills, with the ability to bridge platform and data science teams.
- Strong problem-solving mindset with a focus on delivery, reliability, and business impact.
Preferred Qualifications
- Experience with LLMOps, prompt orchestration frameworks (LangChain, Semantic Kernel), and open-weight model deployment.
- Exposure to smart buildings, IoT, or edge-AI deployments.
- Understanding of governance, privacy, and compliance concerns in enterprise GenAI use cases.
- Relevant Azure or Terraform certifications (e.g., Azure Solutions Architect, Azure AI Engineer, HashiCorp Terraform Associate) are a plus.