The practice of measuring a product's usability, performance, or user satisfaction against defined standards, competitor products, or past versions of itself. Benchmarks make improvement measurable over time and help teams determine whether design changes actually improve the outcomes being measured.
Common contexts
- Establishing SUS scores and task completion rates before a navigation redesign to measure impact afterward
- Comparing checkout flow completion times against two direct competitors using standardized tasks
- Running a quarterly benchmark study to detect usability regressions introduced by new feature additions
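
The baseline metrics named in the first context can be computed with very little code. The sketch below applies the standard SUS scoring rule (odd-numbered items contribute the answer minus 1, even-numbered items contribute 5 minus the answer, and the sum is multiplied by 2.5) and a simple task completion rate. The function names and the sample responses are hypothetical, chosen only for illustration.

```python
from statistics import mean


def sus_score(responses):
    """Score one participant's ten SUS answers (each 1-5) on the 0-100 scale."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs ten answers, each between 1 and 5")
    # Items 1, 3, 5, ... (even indexes here) are positively worded: answer - 1.
    # Items 2, 4, 6, ... (odd indexes here) are negatively worded: 5 - answer.
    total = sum((r - 1) if i % 2 == 0 else (5 - r) for i, r in enumerate(responses))
    return total * 2.5


def completion_rate(outcomes):
    """Share of task attempts that succeeded, given a list of booleans."""
    return sum(outcomes) / len(outcomes)


# Hypothetical baseline data collected before a redesign.
baseline_sus = [
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
]
baseline_tasks = [True, True, False, True]

baseline = {
    "sus_mean": mean(sus_score(r) for r in baseline_sus),
    "completion_rate": completion_rate(baseline_tasks),
}
print(baseline)
```

Storing the result alongside the task list and the study date gives later rounds something concrete to compare against.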
Use when
Set benchmarks before any significant redesign or feature launch: you need a documented baseline to show that the new version actually improved things. Without it, any post-launch improvement claim is anecdotal, and you lose the ability to hold the team accountable to user outcomes.
Avoid when
Don't benchmark when you don't have the resources or commitment to repeat the measurement: a single data point is just a number, not a benchmark. Benchmarking also misleads when the tasks or metrics change between studies, because any before-and-after difference may then reflect the changed measurement rather than the product.
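
One way to guard against the comparability problem described above is to refuse to compare two rounds unless they share the same tasks and metrics. The sketch below does that, then reports the change in completion rate with a simple 95% Wald (normal-approximation) interval. The dictionary layout, function names, and numbers are hypothetical, and small samples may call for an adjusted interval rather than this one.

```python
import math


def same_protocol(baseline, followup):
    """True when both rounds used the same tasks and the same metrics."""
    return (baseline["tasks"] == followup["tasks"]
            and baseline["metrics"] == followup["metrics"])


def completion_change(baseline, followup, z=1.96):
    """Change in completion rate (followup - baseline) with a Wald interval."""
    if not same_protocol(baseline, followup):
        raise ValueError("tasks or metrics differ; rounds are not comparable")
    n1, n2 = baseline["attempts"], followup["attempts"]
    p1 = baseline["completed"] / n1
    p2 = followup["completed"] / n2
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    return diff, (diff - z * se, diff + z * se)


# Hypothetical rounds run before and after a redesign.
before = {"tasks": ["find order", "check out"], "metrics": ["completion"],
          "completed": 30, "attempts": 50}
after = {"tasks": ["find order", "check out"], "metrics": ["completion"],
         "completed": 40, "attempts": 50}

diff, (low, high) = completion_change(before, after)
print(f"change: {diff:+.2f}, 95% interval: ({low:+.2f}, {high:+.2f})")
```

An interval that excludes zero suggests the change is unlikely to be sampling noise. An interval that spans zero means the study cannot distinguish improvement from no change.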
Benchmarks are most politically powerful not when you show improvement, but when you use them to detect regressions that would otherwise be invisible in a sea of new-feature enthusiasm.
Real-world examples
- Google measures Core Web Vitals (LCP, INP, CLS; INP replaced FID in March 2024) as performance benchmarks and uses them as a search ranking signal, prompting teams to hit defined thresholds.
- Nielsen Norman Group's annual UX benchmarking studies measure task success rates and satisfaction scores for major e-commerce and banking sites over time.
- Airbnb's design team benchmarks the booking conversion funnel quarterly, tracking completion rate per step against prior quarters to identify regressions.
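
The Core Web Vitals example above shows what "defined thresholds" look like in practice. The sketch below rates a page against Google's published boundaries for LCP, INP (the successor to FID), and CLS, which Google assesses at the 75th percentile of page loads. The `rate` function and the sample values are hypothetical.

```python
# (upper bound for "good", upper bound for "needs improvement") per metric.
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "INP": (200, 500),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless layout-shift score
}


def rate(metric, p75):
    """Classify a 75th-percentile value as good, needs improvement, or poor."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if p75 <= good_max:
        return "good"
    if p75 <= needs_improvement_max:
        return "needs improvement"
    return "poor"


# Hypothetical 75th-percentile measurements for one page.
page = {"LCP": 2.1, "INP": 350, "CLS": 0.3}
for metric, value in page.items():
    print(metric, value, rate(metric, value))
```

Fixed thresholds like these turn a benchmark into a pass/fail check that can run on every release.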