Usability testing.

Find the usability problems before users hit them in production.

Plan and run tests that surface usability problems on existing designs or prototypes before they reach production. Five phases carry it: define what to test, choose moderated or unmoderated, recruit, run the test, and synthesize and report. The work tests specific tasks, not the whole product.

This skill tests working or near-working designs. Broader discovery research and live conversion optimization are separate jobs with their own skills.

Audience: designers, researchers, and PMs validating a flow before launch, diagnosing a drop-off that analytics cannot explain, or comparing two design directions.

The framework

Five phases from task to report.

Each phase narrows toward severity-scored findings the team can act on.

01 Define what to test: specific tasks, not the whole product. Each task names a real user goal, has a clear start and end, and is among the most common, most strategic, or most problematic.
02 Choose moderated or unmoderated: moderated for early prototypes and complex tasks, unmoderated for stable designs and patterns at scale.
03 Recruit: match the target audience, include a mix of experience levels and device types, and exclude friends, family, and employees.
04 Run the test: brief the participant, encourage think-aloud, help only when they are truly stuck, and note hesitation, backtracking, and language mismatches.
05 Synthesize and report: inventory issues, cluster by root cause, score each by severity, recommend fixes, and prioritize by severity and effort.
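
The last step of phase 05, prioritizing by severity and effort, can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own method: the `Finding` class, the 1-to-5 effort scale, the tie-break rule (cheapest fix first within a severity level), and the sample findings are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Severity levels named in the text, ordered most to least urgent.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "cosmetic": 3}


@dataclass
class Finding:
    title: str
    severity: str  # one of the SEVERITY_RANK keys
    effort: int    # hypothetical estimate: 1 (small fix) to 5 (large fix)


def prioritize(findings):
    """Most severe first; within a severity level, cheapest fix first."""
    return sorted(findings, key=lambda f: (SEVERITY_RANK[f.severity], f.effort))


# Invented sample findings, for illustration only.
findings = [
    Finding("Promo banner misaligned", "cosmetic", 1),
    Finding("Checkout button hidden below the fold", "critical", 3),
    Finding("Filter labels use internal jargon", "major", 2),
    Finding("Date picker rejects typed dates", "major", 1),
]

ordered = [f.title for f in prioritize(findings)]
for title in ordered:
    print(title)
```

Sorting on severity before effort keeps a cheap cosmetic fix from jumping ahead of a critical blocker; a team that weighs effort more heavily would change the sort key.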

How to run it well

Test tasks, score severity, re-test fixes.

Test specific tasks rather than the whole product. A good task names a real user goal with a clear start and end, achievable in two to ten minutes, framed as the goal rather than the system action, without revealing the path or using product terminology. 'Find a place to stay' is a task; 'click the search button' gives the answer away.

Choose moderated or unmoderated by stage. Moderated sessions with 5 to 8 participants let the researcher probe in real time, which suits early prototypes and complex tasks; unmoderated tests with 15 to 30 participants catch patterns at scale on stable designs. In a moderated session, the moderator talks about a fifth of the time, helps only after a long pause, never leads ('are you looking for the menu?' is a bad prompt), and never defends the design when a participant struggles.

Patterns across participants are signal; a single strong opinion is not a finding. Score every issue by severity (critical, major, minor, cosmetic) so the team can prioritize, treat a participant's suggestion as a problem to understand rather than a feature to ship, and re-test after fixes, because a fix that introduces a new problem otherwise goes undetected.
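
One way to separate patterns from single opinions during synthesis is to count how many distinct participants hit each issue. The sketch below is an illustration under stated assumptions: the function name `split_signal`, the threshold of two participants, and the sample observations are hypothetical and do not come from the skill.

```python
from collections import defaultdict


def split_signal(observations, min_participants=2):
    """Split issues into findings (seen across participants) and leads.

    observations: iterable of (participant_id, issue) pairs.
    min_participants: assumed threshold for calling something a pattern.
    """
    seen = defaultdict(set)
    for participant, issue in observations:
        seen[issue].add(participant)  # a set counts each participant once

    findings = {
        issue: len(people)
        for issue, people in seen.items()
        if len(people) >= min_participants
    }
    # Single-participant issues are worth investigating, not reporting as findings.
    leads = sorted(
        issue for issue, people in seen.items() if len(people) < min_participants
    )
    return findings, leads


# Invented session notes, for illustration only.
observations = [
    ("P1", "could not find search"),
    ("P2", "could not find search"),
    ("P2", "could not find search"),  # same participant again: counted once
    ("P3", "could not find search"),
    ("P4", "dislikes the colour scheme"),
]

findings, leads = split_signal(observations)
print(findings)
print(leads)
```

Counting participants rather than mentions keeps one vocal participant from turning a single opinion into an apparent pattern.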

Reference files

The reference that goes alongside the SKILL.md.

references/task-script-patterns.md
Task framing patterns by common product type, with good and bad examples.

Bridges to other skills

Where testing sits among its neighbors.

Usability testing checks a design. These cover the discovery before it, the experience around it, and the production measures beside it.

Discovery first
ux-research
Generative research answers what is true before there is a design to test. Run it upstream; usability testing checks the design that research helped shape.

The wider experience
journey-mapping
Mapping the broader end-to-end experience is its job. Usability testing zooms into a single flow or touchpoint within that map.

Live conversion
cro-optimization
Optimizing conversion on a live page with real traffic happens there. Usability testing finds the why behind a drop-off before or instead of an A/B test.

The numbers
analytics-strategy
Quantitative measurement is a different instrument. Analytics shows where users drop; usability testing shows why they do.

Open source under MIT

Read the SKILL.md on GitHub.

The skill source lives in the rampstackco/claude-skills repository alongside dozens of other skills covering the full lifecycle of brand and product work. This page is a structured overview; the SKILL.md is the source. MIT licensed.

Frequently asked questions.

Should I test the whole product or specific tasks?

Specific tasks. Testing the whole product produces vague results. A good task represents a real user goal, has a clear start and end, is achievable in two to ten minutes, and is one of the most common, most strategic, or most problematic. Frame the user goal rather than the system action, provide context for why they are doing it, do not reveal the path, and do not use product terminology in the framing, or you hand the participant the answer.

Moderated or unmoderated?

Moderated tests (5 to 8 participants, with a researcher probing in real time) suit early-stage prototypes, complex tasks, and novel concepts; they catch surprises and go deeper. Unmoderated tests (15 to 30 participants, completed alone via a tool) suit stable designs and simple tasks, catching patterns at scale with less depth per session. Most teams use moderated for early and critical decisions and unmoderated for ongoing validation. For the most common segment, around 5 users surface roughly 85 percent of usability issues.

How do I moderate without leading the participant?

Encourage think-aloud ('what is going through your mind?'), and do not help unless the participant is truly stuck, and even then only after a long pause. Do not lead ('are you looking for the menu?' is a bad prompt). Note where they hesitate, scroll, or backtrack, and note the gap between their language and the product's. The researcher should talk about 20 percent of the time, never defend the design when someone struggles, and never ask participants to predict their future behavior.

How are findings scored?

By severity. Critical issues block task completion and most users hit them; major issues significantly slow the task and many users hit them; minor issues are friction with a workaround that some users hit; cosmetic issues are polish that does not affect the task. Patterns across participants are signal, while a single strong opinion is a weak data point worth investigating but not a finding on its own. Skipping severity scoring leaves the team unable to prioritize, because every finding looks equal.

How is this different from ux-research?

Usability testing tests an existing design or prototype for task completion; ux-research is generative discovery into what is true, before there is a design. Use ux-research upstream to understand users and shape the design, then usability testing to check that the design works. Live conversion optimization belongs to cro-optimization, and mapping the broader experience belongs to journey-mapping.
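
The figure of around 5 users surfacing roughly 85 percent of issues is usually traced to a simple discovery model in which each participant independently reveals a fixed share of the problems. The sketch below assumes that model and a per-participant rate of 0.31, the value commonly quoted with it; neither the formula nor the rate is stated on this page, and real products can differ considerably.

```python
def share_found(n_participants, per_user_rate=0.31):
    """Expected share of usability problems found by n participants.

    Assumes every participant independently uncovers the same fraction
    (per_user_rate) of the problems: share = 1 - (1 - rate) ** n.
    """
    return 1 - (1 - per_user_rate) ** n_participants


# Diminishing returns as participants are added within one segment.
for n in (1, 3, 5, 8, 15):
    print(n, round(share_found(n), 2))
```

Under these assumptions five participants come out at about 0.84, which matches the rule of thumb, and each additional session adds less than the one before it. That is one reason moderated rounds stay small while unmoderated tests use larger samples to confirm patterns.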