Description

Key Accountabilities

Definition and execution of testing and quality assurance strategies for AI‑enabled workflows
Continuous evaluation and monitoring of system behavior in production environments
Contribution to auditability, risk management, and continuous quality improvement

Principal Responsibilities

Define quality criteria and testing strategies for agent workflows, covering accuracy, latency, safety, compliance, and operational risk
Build automated evaluation harnesses to assess agent performance, including hallucination rates, tool misuse, policy violations, and task success
Implement continuous production monitoring to detect anomalies, quality degradation, and emerging safety concerns
Develop and maintain automated test suites using Playwright for UI testing and custom scripts for API and workflow validation
Apply LLM evaluation frameworks to assess output quality, regression, and system drift over time
Produce and maintain dashboards and reports that communicate quality metrics and trends to engineering and stakeholders
Develop and maintain runbooks for common failure modes and contribute to incident response activities
Collaborate closely with developers to improve prompts, tool definitions, and workflow designs based on test results
Ensure testing, logging, and monitoring practices align with data privacy, audit, and regulatory requirements

Qualifications

Knowledge, Skills & Experience

Essential

Minimum 3 years’ experience in QA, test automation, or DevOps roles (or 2 years with direct experience testing AI or ML‑enabled systems)
Strong Python skills for test automation, evaluation harnesses, and basic data analysis
High attention to detail, with a focus on issues that materially impact reliability and user trust
Comfort working with evolving tools, frameworks, and testing practices
Collaborative mindset, using evidence‑based insights to influence product and engineering decisions

Technical Skills (Required)

Programming: Python (test automation, evaluation harnesses, data analysis)
UI Automation: Playwright (end‑to‑end workflow testing)
AI Evaluation: DeepEval, RAGAS, Evidently.AI (LLM quality, drift, and regression analysis)
Workflow Testing: API and agent workflow validation using custom scripts
Monitoring: Production quality monitoring and anomaly detection

Desirable

Pytest or equivalent testing frameworks
SQL for querying logs, metrics, or evaluation datasets
Prometheus, Grafana, or similar monitoring tools
Familiarity with hallucination detection and AI safety patterns
CI/CD pipelines and Git‑based workflows

WTW is an Equal Opportunity Employer

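Unsolicited Contact

Any unsolicited resumes or candidate profiles submitted through our website or to the personal email accounts of Willis Towers Watson employees are considered the property of Willis Towers Watson and are not subject to payment of agency fees. To be an authorized recruitment agency or search firm for Willis Towers Watson, an agency must hold a formal written agreement signed by an authorized Willis Towers Watson recruiter and maintain an active working relationship with the company. Resumes must be submitted according to our candidate submission process, which includes being actively engaged on the specific search. Likewise, our authorized recruitment agencies and search firms will not be paid a fee if the candidate submission process is not followed. Willis Towers Watson is an equal opportunity employer. If you would like us to keep your contact information on file for future consideration, please send an email to: Agency.inquiries@willistowerswatson.com.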

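Our Offices

Our colleagues serve more than 140 countries and markets worldwide. This brings a global perspective to everything we do and creates many excellent opportunities for collaboration and growth.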

AI Automation QA
