AI Evaluation Specialist
7 days ago
Binance is a leading global blockchain ecosystem behind the world's largest cryptocurrency exchange by trading volume and registered users. We are trusted by over 280 million people in 100+ countries for our industry-leading security, user fund transparency, trading engine speed, deep liquidity, and an unmatched portfolio of digital-asset products. Binance's offerings range from trading and finance to education, research, payments, institutional services, Web3 features, and more. We leverage the power of digital assets and blockchain to build an inclusive financial ecosystem to advance the freedom of money and improve financial access for people around the world.
As an A.I. Agent Evaluation and Optimisation Specialist, you will play a critical role in ensuring both the outstanding performance and the continuous improvement of autonomous agents driven by large language models (LLMs). Your responsibilities span from designing and implementing robust evaluation frameworks to proactively identifying and executing optimisation strategies that enhance reliability, adaptability, and compliance across the agent lifecycle.
Responsibilities:
- Design, Develop & Optimise Evaluation Plans:
  - Create structured, risk-aware, and adaptive evaluation and optimisation plans. Align these with user goals, governance requirements, and system architectures. Translate objectives into measurable criteria, scenarios, and optimisation targets.
- Test Suite Development & Performance Tuning:
  - Develop and curate tests covering standard, edge, and emergent agent behaviours. Collaborate on generating synthetic data, incorporate domain expertise, and apply hands-on optimisation techniques to improve agent robustness.
- Multi-Stage Evaluation & Optimisation:
  - Execute controlled (offline) and real-world (online) evaluations, assessing not just outputs but also reasoning steps, tool usage, and workflow execution. Identify and resolve performance bottlenecks, drive fine-tuning, and recommend systemic improvements.
- Analyse, Diagnose & Optimise:
  - Conduct deep analysis of evaluation results to find performance gaps, failure modes, and optimisation opportunities at both the model and system level. Provide clear, actionable recommendations to directly improve agent efficiency, accuracy, and reliability.
- Drive Continuous Improvement:
  - Collaborate closely with development teams to translate evaluation and optimisation findings into runtime adaptations, code performance enhancements, architectural upgrades, prompt engineering, and targeted model retraining, including reinforcement learning from human feedback (RLHF) methodologies.
- Implement Feedback Loops:
  - Establish feedback mechanisms that combine human and machine evaluator input for ongoing monitoring, anomaly detection, and dynamic agent behaviour adjustment, integrating optimisation insights into deployment pipelines.
- Ensure Compliance and Safety:
  - Maintain up-to-date governance documentation and safety cases, overseeing regulatory, ethical, and operational compliance throughout evaluation and optimisation cycles.
- Cross-Functional Collaboration:
  - Work with A.I. researchers, engineers, and domain experts to align evaluation and optimisation strategies with product objectives and user needs.
Requirements:
- Bachelor's or Master's degree in Computer Science, Artificial Intelligence, Data Science, or a related field.
- Demonstrated hands-on A.I. agent development experience, with a track record of identifying and implementing agent performance improvements.
- In-depth understanding of large language models (LLMs), their optimisation, and agent system architectures.
- Experience in both A.I. evaluation methodologies (e.g. benchmarking and online/offline analysis) and direct agent optimisation, such as model fine-tuning or prompt design.
- Familiarity with software engineering best practices (e.g. TDD, BDD) and deep exposure to AI-specific frameworks, observability, and lifecycle analytics.
- Proven ability to perform data-driven diagnostics and root cause analysis, with direct contributions to measurable improvement in A.I. agent performance.
- Strong communication skills, especially for documenting evaluation plans, optimisation strategies, result rationales, and technical recommendations.
- Experience working effectively in teams and running cross-functional feedback processes that bridge evaluation, development, and operations.
- Programming skills in Python plus experience with major A.I./ML libraries and APIs, including hands-on development of LLM agents.
Why Binance
- Shape the future with the world's leading blockchain ecosystem
- Collaborate with world-class talent in a user-centric global organization with a flat structure
- Tackle unique, fast-paced projects with autonomy in an innovative environment
- Thrive in a results-driven workplace with opportunities for career growth and continuous learning
- Competitive salary and company benefits
- Work-from-home arrangement (the arrangement may vary depending on the nature of the business team's work)
Binance is committed to being an equal opportunity employer. We believe that having a diverse workforce is fundamental to our success.
By submitting a job application, you confirm that you have read and agree to our Candidate Privacy Notice.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
-
AI Integration Specialist
3 days ago
Taipei, Taiwan | KKCompany Technologies | Full time | NT$720,000 - NT$1,440,000 per year
Team Segment: Corporate. KKCompany Technologies, Asia's leading AI multimedia technology group, is dedicated to creating value for customers through its core businesses of multimedia technologies, digital cloud, and AI applications. At KKCompany, we believe in Innovation Made Simple, and that technology is the answer to the struggles faced by every industry. Since...
-
AI Business Solutions Specialist
3 days ago
Taipei, Taiwan | Microsoft | Full time | NT$900,000 - NT$1,200,000 per year
As an AI Business Solutions Specialist, you will lead strategic sales engagements that empower Taiwan Enterprise customers to digitally transform their operations using Microsoft's Business Applications portfolio. This includes CRM Sales & Service, Core ERP (Finance & Operations), Power Apps for Low Code/No Code development, and Co-pilot Studio for...
-
AI Business Solutions Specialist
3 days ago
Taipei, Taipei City, Taiwan | Microsoft | Full time | NT$1,200,000 - NT$3,600,000 per year
As an AI Business Solutions Specialist, you will lead strategic sales engagements that empower Taiwan Enterprise customers to digitally transform their operations using Microsoft's Business Applications portfolio. This includes CRM Sales & Service, Core ERP (Finance & Operations), Power Apps for Low Code/No Code development, and Co-pilot Studio for building and...
-
Azure Specialist
1 week ago
Taipei, Taiwan | Microsoft | Full time | NT$1,200,000 - NT$2,400,000 per year
As an Azure Specialist, you are a solution sales leader with deep business expertise and technical knowledge within our enterprise sales organization, working with our most important customers. You will lead a virtual team of sales, technical, and services resources to help customers realize digital transformation through cloud computing. A core...
-
App Marketing Specialist
3 days ago
Taipei, Taipei City, Taiwan | AI Peak | Full time | NT$600,000 - NT$1,200,000 per year
We are a tech startup preparing to launch an AI-powered nutrition and recipe app. With just a photo of your meal, the app generates the full recipe, provides calorie and nutrition breakdowns, and helps users live healthier lives. Our main target markets are the U.S., Canada, and the U.K. We are a team from Taiwan, Japan, India, Ireland, and...
-
AI Investment Partner
1 week ago
Taipei, Taipei City, Taiwan | Appier | Full time | NT$1,200,000 - NT$3,600,000 per year
Appier is a technology company which aims to provide artificial intelligence platforms to help enterprises solve their most challenging business problems. Appier was established in 2012 by a passionate team of computer scientists and engineers with expertise in AI, data analysis, distributed systems, and marketing.
About the Role: We are looking for a highly...
-
Program Manager, VBAT Programs
3 days ago
Taipei, Taipei City, Taiwan | Shield AI | Full time | NT$1,200,000 - NT$3,600,000 per year
Founded in 2015, Shield AI is a venture-backed deep-tech company with the mission of protecting service members and civilians with intelligent systems. Its products include the V-BAT aircraft, Hivemind Enterprise, and the Hivemind Vision product lines. With nine offices and facilities across the U.S., Europe, the Middle East, and the Asia-Pacific, Shield...
-
Digital Marketing Specialist
1 week ago
Taipei, Taipei City, Taiwan | PowerArena | Full time
We are seeking a full-time Digital Marketing Specialist with a strong focus on SEO content writing to join the team. This role will support both PowerArena and our sister company MotherApp. You will be responsible for creating and optimizing content that strengthens our online presence, generates inbound leads, and improves our SEO performance in both...
-
AI Engineer
1 week ago
Taipei, Taiwan | Trend Micro | Full time | $90,000 - $120,000 per year
Join Trend ‧ Join New Generation. Trend Micro: global leader in cloud cybersecurity / largest software company in Asia / business footprint spanning five continents / Trend's global R&D base is in Taiwan.
[Overview] Join our team and play a pivotal role in developing advanced AI agents. From data automation to AI applications,...
-
Workflow Automation Specialist
1 day ago
Taipei, Taiwan | PicCollage | Full time | NT$240,000 - NT$720,000 per year
About Us: We are a profitable and growing company, originating in Silicon Valley and now headquartered in Taiwan. We combine intuitive design with Creative AI tech to create inspiring products for millions of people worldwide. We offer a fun, creative, and international workplace with competitive compensation, stock options, flexible hybrid work, free...