Usability testing is a technique used in user-centered interaction design to evaluate a product by testing it on users. This can be seen as an irreplaceable usability practice, since it gives direct input on how real users use the system. It is more concerned with the design intuitiveness of the product and tested with users who have no prior exposure to it. Such testing is paramount to the success of an end product as a fully functioning application that creates confusion amongst its users will not last for long. This is in contrast with usability inspection methods where experts use different methods to evaluate a user interface without involving users. Usability testing focuses on measuring a human-made product's capacity to meet its intended purposes. Examples of products that commonly benefit from usability testing are

food Food is any substance consumed by an organism for nutritional support. Food is usually of plant, animal, or fungal origin, and contains essential nutrients, such as carbohydrates, fats, proteins, vitamins, or minerals. The substance is in ...

, consumer products,

website A website (also written as a web site) is a collection of web pages and related content that is identified by a common domain name and published on at least one web server. Examples of notable websites are Google, Facebook, Amazon, and W ...

s or web applications, computer interfaces, documents, and devices. Usability testing measures the usability, or ease of use, of a specific object or set of objects, whereas general human–computer interaction studies attempt to formulate universal principles.

What it is not

Simply gathering opinions on an object or a document is

market research Market research is an organized effort to gather information about target markets and customers: know about them, starting with who they are. It is an important component of business strategy and a major factor in maintaining competitiveness. Ma ...

qualitative research Qualitative research is a type of research that aims to gather and analyse non-numerical (descriptive) data in order to gain an understanding of individuals' social reality, including understanding their attitudes, beliefs, and motivation. This ...

rather than usability testing. Usability testing usually involves systematic observation under controlled conditions to determine how well people can use the product. However, often both qualitative research and usability testing are used in combination, to better understand users' motivations/perceptions, in addition to their actions. Rather than showing users a rough draft and asking, "Do you understand this?", usability testing involves watching people trying to ''use'' something for its intended purpose. For example, when testing instructions for assembling a toy, the test subjects should be given the instructions and a box of parts and, rather than being asked to comment on the parts and materials, they should be asked to put the toy together. Instruction phrasing, illustration quality, and the toy's design all affect the assembly process.

Methods

Setting up a usability test involves carefully creating a scenario, or a realistic situation, wherein the person performs a list of tasks using the product being tested while observers watch and take notes ( dynamic verification). Several other

test Test(s), testing, or TEST may refer to: * Test (assessment), an educational assessment intended to measure the respondents' knowledge or other abilities Arts and entertainment * ''Test'' (2013 film), an American film * ''Test'' (2014 film), ...

instruments such as scripted instructions, paper prototypes, and pre- and post-test questionnaires are also used to gather feedback on the product being tested ( static verification). For example, to test the attachment function of an e-mail program, a scenario would describe a situation where a person needs to send an e-mail attachment, and asking them to undertake this task. The aim is to observe how people function in a realistic manner, so that developers can identify the problem areas and fix them. Techniques popularly used to gather data during a usability test include think aloud protocol, co-discovery learning and eye tracking.

Hallway testing

Hallway testing, also known as guerrilla usability, is a quick and cheap method of usability testing in which people — such as those passing by in the hallway — are asked to try using the product or service. This can help designers identify "brick walls," problems so serious that users simply cannot advance, in the early stages of a new design. Anyone but project designers and engineers can be used (they tend to act as "expert reviewers" because they are too close to the project). This type of testing is an example of

convenience sampling Convenience sampling (also known as grab sampling, accidental sampling, or opportunity sampling) is a type of non-probability sampling that involves the sample being drawn from that part of the population that is close to hand. This type of sampli ...

and thus the results are potentially biased.

Remote usability testing

In a scenario where usability evaluators, developers and prospective users are located in different countries and time zones, conducting a traditional lab usability evaluation creates challenges both from the cost and logistical perspectives. These concerns led to research on remote usability evaluation, with the user and the evaluators separated over space and time. Remote testing, which facilitates evaluations being done in the context of the user's other tasks and technology, can be either synchronous or asynchronous. The former involves real time one-on-one communication between the evaluator and the user, while the latter involves the evaluator and user working separately. Numerous tools are available to address the needs of both these approaches. Synchronous usability testing methodologies involve video conferencing or employ remote application sharing tools such as WebEx. WebEx and GoToMeeting are the most commonly used technologies to conduct a synchronous remote usability test. However, synchronous remote testing may lack the immediacy and sense of "presence" desired to support a collaborative testing process. Moreover, managing interpersonal dynamics across cultural and linguistic barriers may require approaches sensitive to the cultures involved. Other disadvantages include having reduced control over the testing environment and the distractions and interruptions experienced by the participants in their native environment. One of the newer methods developed for conducting a synchronous remote usability test is by using virtual worlds. Asynchronous methodologies include automatic collection of user's click streams, user logs of critical incidents that occur while interacting with the application and subjective feedback on the interface by users. Similar to an in-lab study, an asynchronous remote usability test is task-based and the platform allows researchers to capture clicks and task times. Hence, for many large companies, this allows researchers to better understand visitors' intents when visiting a website or mobile site. Additionally, this style of user testing also provides an opportunity to segment feedback by demographic, attitudinal and behavioral type. The tests are carried out in the user's own environment (rather than labs) helping further simulate real-life scenario testing. This approach also provides a vehicle to easily solicit feedback from users in remote areas quickly and with lower organizational overheads. In recent years, conducting usability testing asynchronously has also become prevalent and allows testers to provide feedback in their free time and from the comfort of their own home.

Expert review

Expert review is another general method of usability testing. As the name suggests, this method relies on bringing in experts with experience in the field (possibly from companies that specialize in usability testing) to evaluate the usability of a product. A heuristic evaluation or usability audit is an evaluation of an interface by one or more human factors experts. Evaluators measure the usability, efficiency, and effectiveness of the interface based on usability principles, such as the 10 usability heuristics originally defined by

Jakob Nielsen Jacob or Jakob Nielsen may refer to: * Jacob Nielsen, Count of Halland (died c. 1309), great grandson of Valdemar II of Denmark * , Norway (1768-1822) * Jakob Nielsen (mathematician) (1890–1959), Danish mathematician known for work on automorphis ...

in 1994. Nielsen's usability heuristics, which have continued to evolve in response to user research and new devices, include: * Visibility of system status * Match between system and the real world * User control and freedom * Consistency and standards * Error prevention * Recognition rather than recall * Flexibility and efficiency of use * Aesthetic and minimalist design * Help users recognize, diagnose, and recover from errors * Help and documentation

Automated expert review

Similar to expert reviews, automated expert reviews provide usability testing but through the use of programs given rules for good design and heuristics. Though an automated review might not provide as much detail and insight as reviews from people, they can be finished more quickly and consistently. The idea of creating surrogate users for usability testing is an ambitious direction for the artificial intelligence community.

A/B testing

In web development and marketing, A/B testing or split testing is an experimental approach to web design (especially user experience design), which aims to identify changes to web pages that increase or maximize an outcome of interest (e.g., click-through rate for a banner advertisement). As the name implies, two versions (A and B) are compared, which are identical except for one variation that might impact a user's behavior. Version A might be the one currently used, while version B is modified in some respect. For instance, on an e-commerce website the purchase funnel is typically a good candidate for A/B testing, as even marginal improvements in drop-off rates can represent a significant gain in sales. Significant improvements can be seen through testing elements like copy text, layouts, images and colors. Multivariate testing or bucket testing is similar to A/B testing but tests more than two versions at the same time.

Number of participants

In the early 1990s,

, at that time a researcher at

Sun Microsystems Sun Microsystems, Inc. (Sun for short) was an American technology company that sold computers, computer components, software, and information technology services and created the Java programming language, the Solaris operating system, ZFS, t ...

, popularized the concept of using numerous small usability tests—typically with only five participants each—at various stages of the development process. His argument is that, once it is found that two or three people are totally confused by the home page, little is gained by watching more people suffer through the same flawed design. "Elaborate usability tests are a waste of resources. The best results come from testing no more than five users and running as many small tests as you can afford."; references The claim of "Five users is enough" was later described by a mathematical model which states for the proportion of uncovered problems U

U = 1-(1-p)^n

where p is the probability of one subject identifying a specific problem and n the number of subjects (or test sessions). This model shows up as an asymptotic graph towards the number of real existing problems (see figure below). In later research Nielsen's claim has been questioned using both

empirical Empirical evidence for a proposition is evidence, i.e. what supports or counters this proposition, that is constituted by or accessible to sense experience or experimental procedure. Empirical evidence is of central importance to the sciences and ...

evidence and more advanced

mathematical model A mathematical model is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modeling. Mathematical models are used in the natural sciences (such as physics, ...

s. Two key challenges to this assertion are: # Since usability is related to the specific set of users, such a small sample size is unlikely to be representative of the total population so the data from such a small sample is more likely to reflect the sample group than the population they may represent #Not every usability problem is equally easy-to-detect. Intractable problems happen to decelerate the overall process. Under these circumstances, the progress of the process is much shallower than predicted by the Nielsen/Landauer formula. It is worth noting that Nielsen does not advocate stopping after a single test with five users; his point is that testing with five users, fixing the problems they uncover, and then testing the revised site with five different users is a better use of limited resources than running a single usability test with 10 users. In practice, the tests are run once or twice per week during the entire development cycle, using three to five test subjects per round, and with the results delivered within 24 hours to the designers. The number of users actually tested over the course of the project can thus easily reach 50 to 100 people. Research shows that user testing conducted by organisations most commonly involves the recruitment of 5-10 participants. In the early stage, when users are most likely to immediately encounter problems that stop them in their tracks, almost anyone of normal intelligence can be used as a test subject. In stage two, testers will recruit test subjects across a broad spectrum of abilities. For example, in one study, experienced users showed no problem using any design, from the first to the last, while naive users and self-identified power users both failed repeatedly. Later on, as the design smooths out, users should be recruited from the target population. When the method is applied to a sufficient number of people over the course of a project, the objections raised above become addressed: The sample size ceases to be small and usability problems that arise with only occasional users are found. The value of the method lies in the fact that specific design problems, once encountered, are never seen again because they are immediately eliminated, while the parts that appear successful are tested over and over. While it's true that the initial problems in the design may be tested by only five users, when the method is properly applied, the parts of the design that worked in that initial test will go on to be tested by 50 to 100 people.

Example

A 1982

Apple Computer Apple Inc. is an American multinational technology company headquartered in Cupertino, California, United States. Apple is the largest technology company by revenue (totaling in 2021) and, as of June 2022, is the world's biggest company ...

manual for developers advised on usability testing: # "Select the target audience. Begin your human interface design by identifying your target audience. Are you writing for businesspeople or children?" # Determine how much target users know about Apple computers, and the subject matter of the software. # Steps 1 and 2 permit designing the user interface to suit the target audience's needs. Tax-preparation software written for accountants might assume that its users know nothing about computers but are experts on the tax code, while such software written for consumers might assume that its users know nothing about taxes but are familiar with the basics of Apple computers. Apple advised developers, "You should begin testing as soon as possible, using drafted friends, relatives, and new employees": Designers must watch people use the program in person, because

Education

Usability testing has been a formal subject of academic instruction in different disciplines. Usability testing is important to composition studies and online writing instruction (OWI). Scholar Collin Bjork argues that usability testing is "necessary but insufficient for developing effective OWI, unless it is also coupled with the theories of

digital rhetoric Digital rhetoric can be generally defined as communication that exists in the digital sphere. As such, digital rhetoric can be expressed in many different forms —including but not limited to text, images, videos, and software. Due to the in ...

References

External links

Usability.gov
{{DEFAULTSORT:Usability Testing Usability Software testing Educational technology Product testing