AI Evaluation Engineer job opportunity at Weekday AI.

_{2026-01-23T07:12:19.631Z} bot

Weekday AI: AI Evaluation Engineer

Experience: General

Pattern: Full-time

Location: Pune, India

This role is for one of Weekday's clients.

We are seeking an AI Evaluation Engineer to evaluate, validate, and ensure the quality of AI/ML systems working with complex, real-world data. This role focuses on assessing component mapping, retrieval-augmented generation (RAG) based Q&A systems, and feature extraction from structured and unstructured sources such as repair records, catalogs, free-text inputs, and technical documentation.

This is a hands-on engineering role centered on designing custom evaluation frameworks, datasets, and automated pipelines (including LLM-as-a-judge approaches) to measure quality, detect regressions, and support release readiness. While domain training will be provided, strong ownership in building evaluation intuition and maintaining high-quality test datasets is essential.

Key Responsibilities

AI Evaluation & Quality Assurance
Evaluate ML and LLM outputs using defined metrics, benchmarks, and acceptance criteria.
Design and maintain automated evaluation pipelines to assess model accuracy, consistency, and reliability.
Develop and own high-quality evaluation datasets, golden test cases, and benchmarks.

Testing & Release Validation
Execute evaluation-driven smoke tests and regression tests prior to releases.
Track quality metrics and provide clear go/no-go signals for production deployments.
Detect regressions and unexpected model behavior across releases and data changes.

Analysis & Insights
Analyze evaluation results to identify trends, inconsistencies, and failure patterns.
Provide actionable insights to improve model performance and system behavior.

System & API Validation
Validate AI services at the API level for correctness, robustness, and stability.
Monitor system performance, latency, and error rates under production-like workloads.

Cross-Functional Collaboration
Work closely with ML, backend, and product teams to define expected AI behavior.
Ensure evaluation coverage aligns with real-world use cases and business requirements.

Skills & Experience

Core Skills
Strong proficiency in Python for evaluation scripting and automation.
Solid understanding of Machine Learning and AI systems, including LLM-based workflows.
Experience with data analysis to interpret evaluation metrics and model outputs.

Nice to Have
Experience with LLM evaluation frameworks or LLM-as-a-judge techniques.
Familiarity with RAG pipelines, NLP systems, or large-scale data processing.
Experience building CI/CD-style evaluation or testing pipelines for AI systems.

Skills: Python · Machine Learning · Artificial Intelligence · Data Analytics
