The following sections help you create datasets, run evaluations, and analyze results:

Evaluation concepts

Review core terminology and concepts to understand how evaluations work in LangSmith.

Manage datasets

Create and manage datasets for evaluation through the UI or SDK.
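As a hedged illustration of the SDK path, the sketch below shows the shape of dataset examples — each pairs inputs with reference outputs. The field names (`question`, `answer`) and the dataset name are placeholders, and the upload calls are left as comments because they require a LangSmith API key.

```python
# Each dataset example pairs inputs with reference outputs.
# Field names ("question", "answer") are illustrative placeholders.
examples = [
    {
        "inputs": {"question": "What does an evaluator return?"},
        "outputs": {"answer": "A key and a score."},
    },
    {
        "inputs": {"question": "Where do experiment results appear?"},
        "outputs": {"answer": "In the LangSmith UI."},
    },
]

# Uploading requires a LangSmith API key (assumed environment setup), e.g.:
# from langsmith import Client
# client = Client()
# dataset = client.create_dataset(dataset_name="demo-dataset")
# client.create_examples(dataset_id=dataset.id, examples=examples)
```

The same examples can also be entered by hand in the UI; the SDK route is more convenient once a dataset grows beyond a handful of rows.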

Run evaluations

Evaluate your applications with different evaluators and techniques to measure quality.
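One common evaluator style in the LangSmith SDK is a plain function over `outputs` and `reference_outputs` dicts that returns a named score. The sketch below assumes that dict-based signature; the `answer` field and the dataset name in the comment are placeholders.

```python
def exact_match(outputs: dict, reference_outputs: dict) -> dict:
    """Return a named score: 1.0 when the answer matches the reference exactly."""
    matched = outputs.get("answer") == reference_outputs.get("answer")
    return {"key": "exact_match", "score": float(matched)}

# Such a function would be passed to the SDK's evaluate entry point, e.g.:
# evaluate(target, data="demo-dataset", evaluators=[exact_match])
# (that call needs an API key and an existing dataset, so it is shown
# only as a comment here).
```

Exact match is the simplest technique; the docs in this section also cover heuristic, LLM-as-judge, and pairwise evaluators built on the same pattern.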

Analyze results

View and analyze evaluation results, compare experiments, filter data, and export findings.

Collect feedback

Gather human feedback through annotation queues and inline annotation on outputs.

Follow tutorials

Learn by following step-by-step tutorials, from simple chatbots to complex agent evaluations.
