A hands-on workshop on operationalizing GenAI apps with LangSmith: instrument tracing, run quick evals (faithfulness & answer quality), compare prompts and agents, and add basic monitoring before shipping. Ideal for builders working with LangChain/LangGraph who want production confidence without heavy MLOps overhead.
Workshop: GenAI Ops with LangSmith (40 mins)
Who this is for
Data scientists, ML/AI engineers, and builders shipping LLM/RAG or agent workflows.
Comfortable with Python; LangChain/LangGraph basics are nice to have, not mandatory.
What you’ll learn (outcomes)
Wire tracing into your LangChain/LangGraph app and read runs like a pro.
Build a golden dataset and run evaluations (faithfulness, relevance, and LLM-as-judge).
Compare prompts/agents and track experiment results to pick the best variant.
Add lightweight monitoring (latency, token usage, error rates) and feedback loops.
Ship a simple RAG pipeline with guardrails you can trust.
Tools & setup (minimal)
Python 3.10+, LangChain/LangGraph, LangSmith account & API key.
Any LLM provider key (OpenAI/Anthropic/etc.).
A small set of PDFs/markdown files for the RAG demo (we’ll provide sample data).
Agenda (40 minutes, fast-paced)
Why GenAI Ops (3 min)
Risks in LLM apps: regressions, hallucinations, brittle prompts. How LangSmith closes the loop.
Instrumentation & Tracing (7 min)
Add the LangSmith tracing callback to a LangChain/LangGraph app (minimal setup sketch below).
Live view: spans, inputs/outputs, token & latency metrics, error drill-down.
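For reference, a minimal tracing setup, assuming the langsmith and langchain-openai Python packages; the project name and model are illustrative:

    import os

    os.environ["LANGSMITH_TRACING"] = "true"        # LANGCHAIN_TRACING_V2 on older SDK versions
    os.environ["LANGSMITH_API_KEY"] = "..."         # your LangSmith API key
    os.environ["LANGSMITH_PROJECT"] = "docs-qa-workshop"

    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI

    # Any LangChain runnable invoked while tracing is enabled shows up as a run
    # in LangSmith, with inputs/outputs, latency, and token usage per span.
    chain = ChatPromptTemplate.from_template("Answer briefly: {question}") | ChatOpenAI(model="gpt-4o-mini")
    chain.invoke({"question": "What does LangSmith record for each run?"})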
RAG Mini-Build (8 min)
Quick ingestion → retrieve → generate.
Tag runs with metadata (dataset, prompt version, retriever params) — see the sketch below.
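A sketch of that flow, assuming langchain-openai and an in-memory vector store; the file path, chunk sizes, and the prompt_v1 / retriever_k labels are placeholders:

    from langchain_community.document_loaders import TextLoader
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough
    from langchain_core.vectorstores import InMemoryVectorStore
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Ingest: load the sample docs, chunk them, and index the chunks.
    docs = TextLoader("sample_docs/langsmith_notes.md").load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100).split_documents(docs)
    retriever = InMemoryVectorStore.from_documents(chunks, OpenAIEmbeddings()).as_retriever(search_kwargs={"k": 4})

    def format_docs(retrieved):
        return "\n\n".join(d.page_content for d in retrieved)

    # Retrieve → generate.
    prompt = ChatPromptTemplate.from_template(
        "Answer using only this context:\n{context}\n\nQuestion: {question}"
    )
    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | ChatOpenAI(model="gpt-4o-mini")
        | StrOutputParser()
    )

    # Tags and metadata make it easy to filter and compare runs later.
    rag_chain.invoke(
        "How do I enable tracing?",
        config={"tags": ["rag", "prompt_v1"], "metadata": {"retriever_k": 4, "dataset": "docs-qa"}},
    )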
Evaluations & Golden Sets (12 min)
Create a dataset (questions + references).
Run evals: faithfulness (groundedness), answer quality, optional toxicity.
Compare variants (prompt v1 vs v2, retriever k=4 vs k=8).
Pick a winner using experiment tables & charts (dataset + eval sketch below).
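A sketch of the dataset and eval step, assuming the langsmith SDK and the rag_chain from the RAG sketch above; the dataset name, the two seed examples, and the toy keyword-overlap evaluator are illustrative (in the workshop we also use LLM-as-judge evaluators for faithfulness):

    from langsmith import Client
    from langsmith.evaluation import evaluate

    client = Client()
    dataset = client.create_dataset("docs-qa-golden")
    client.create_examples(
        inputs=[{"question": "How do I enable tracing?"},
                {"question": "What does a run contain?"}],
        outputs=[{"answer": "Set LANGSMITH_TRACING=true and provide an API key."},
                 {"answer": "Inputs, outputs, latency, token usage, and child spans."}],
        dataset_id=dataset.id,
    )

    def keyword_overlap(run, example) -> dict:
        # Toy reference-based check: how many reference words appear in the prediction.
        prediction = (run.outputs or {}).get("output", "")
        reference = example.outputs["answer"]
        overlap = len(set(prediction.lower().split()) & set(reference.lower().split()))
        return {"key": "keyword_overlap", "score": min(overlap / 5, 1.0)}

    def target_v1(inputs: dict) -> dict:
        return {"output": rag_chain.invoke(inputs["question"])}

    evaluate(target_v1, data="docs-qa-golden", evaluators=[keyword_overlap],
             experiment_prefix="prompt-v1")
    # Run again with a second target (prompt v2, or retriever k=8) and compare the
    # two experiments side by side in the LangSmith experiments view.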
Monitoring & Guardrails (5 min)
Set alerts on error rate/latency.
Capture user feedback and feed it back into your golden datasets (feedback sketch below).
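For the feedback part, a sketch assuming the langsmith SDK and the rag_chain from the RAG sketch; the "user_score" key is an arbitrary name:

    from langchain_core.tracers.context import collect_runs
    from langsmith import Client

    client = Client()

    # Capture the run id of the traced call so the user's rating can be attached to it.
    with collect_runs() as cb:
        rag_chain.invoke("How do I enable tracing?")
        run_id = cb.traced_runs[0].id

    client.create_feedback(
        run_id=run_id,
        key="user_score",                  # arbitrary feedback key
        score=1,                           # e.g. 1 = thumbs up, 0 = thumbs down
        comment="Answer cited the right doc.",
    )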
Wrap-up & Next Steps (5 min)
CI idea: run evals on every PR (see the gate sketch below).
How to productionize with LangServe, and what to watch in week 1.
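One way to wire the CI idea, as a sketch: a small script that replays the golden dataset, scores answers, and fails the check below a threshold. The dataset name, scoring rule, and 0.8 threshold are illustrative, and rag_chain comes from the RAG sketch above.

    import sys
    from langsmith import Client

    client = Client()
    examples = list(client.list_examples(dataset_name="docs-qa-golden"))

    scores = []
    for ex in examples:
        prediction = rag_chain.invoke(ex.inputs["question"])
        reference = ex.outputs["answer"]
        overlap = len(set(prediction.lower().split()) & set(reference.lower().split()))
        scores.append(min(overlap / 5, 1.0))

    average = sum(scores) / len(scores)
    print(f"average score: {average:.2f}")
    if average < 0.8:      # illustrative quality gate
        sys.exit(1)        # fails the PR check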
Hands-on demo you’ll follow
Project: “Docs Q&A” RAG.
You’ll do:
Add LangSmith tracing to the chain/graph.
Build a golden dataset of 20–30 examples.
Run eval experiments and read scorecards.
Swap prompts/retriever params; choose the best config with evidence.
Enable basic dashboards & alerts.
Takeaways
A working template (app + eval script).
A repeatable method to test, compare, and monitor GenAI systems.
Checklists for pre-ship sanity: datasets, evals, tracing, thresholds, alerts.
Optional extensions (if time permits / resources provided)
Regression suite for agent tools (tool-call correctness; evaluator sketch after this list).
Prompt/version governance with tags & experiment lineage.
Cost profiling and latency budgets per route.
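For the tool-call regression idea, a sketch of a custom evaluator, assuming each golden example stores the tool the agent should call and the eval target returns the tools it actually called:

    def tool_call_correctness(run, example) -> dict:
        # Compare the expected tool name against the tools the agent actually invoked.
        expected = example.outputs.get("expected_tool")            # e.g. "search_docs"
        called = (run.outputs or {}).get("tool_calls", [])         # e.g. ["search_docs"]
        return {"key": "tool_call_correct", "score": float(expected in called)}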