Overview
We’re looking for an AI Engineer with hands-on experience building LLM-powered tools, agents, or infrastructure. You’ll work at the heart of Puch’s multilingual AI assistant — a system built to serve millions of users across India, in 11 languages, through interfaces like WhatsApp, voice, and web.
You’ll own and ship core pieces of our AI stack, from model fine-tuning and evaluation to retrieval systems and tool-calling workflows. If you've built LLM workflows, autonomous agents, or real-world NLP tools and want to solve meaningful challenges in language, scale, and reliability, this is your playground.
Responsibilities
- Build and deploy end-to-end LLM pipelines — including data cleaning, training, fine-tuning, and fast inference.
- Architect multilingual and multimodal systems that operate across text, voice, and images.
- Implement and optimize retrieval-augmented generation (RAG), embeddings, and hybrid search pipelines using vector databases.
- Extend LLMs with custom tools, APIs, and real-time tool-calling systems.
- Conduct offline and online experiments to improve accuracy, latency, and user satisfaction.
- Contribute to internal tools for evaluation, safety checks, logging, and model observability.
Requirements
- Experience with modern AI/ML libraries (e.g., PyTorch, Hugging Face Transformers, LangChain, OpenAI APIs).
- Deep understanding of LLMs — from pretraining and fine-tuning to inference patterns and prompt engineering.
- Proven ability to ship production-grade AI systems and workflows end-to-end.
- Familiarity with vector stores and orchestration patterns in agent-based systems.
- Bachelor’s degree in Computer Science, AI/ML, or related fields — or equivalent real-world experience.
Bonus
- Experience with multilingual NLP or Indian language models.
- Contributions to open-source AI projects, research papers, or demos showcasing LLM capabilities.
- Knowledge of latency-sensitive AI systems or streaming inference optimization.