AI agents look magical in demos but often fail in production, eroding trust: Air Canada's chatbot gave a passenger incorrect refund information the airline was held liable for, and a factual error in Google Bard's launch demo wiped billions off Alphabet's market value. This talk presents a practical playbook for evaluating agents, covering frameworks such as RAGAS and TruLens as well as emerging practices like Evaluation-Driven Development. The goal: close the trust gap and advance the AI Quality Movement, so that agents are not just impressive but genuinely reliable.
Abdullah Mansoor
https://www.linkedin.com/in/abdullahmansoor/