In the late 1970s, Pepsi was running behind Coca-Cola in the competition to be the leading cola. But then Pepsi discovered that in blind taste tests, people actually preferred the sweeter taste of Pepsi. To spread the word, Pepsi ran a famous advertising campaign, called the Pepsi Challenge, which showed people tasting the two brands of cola while not knowing which was which. In the ads, the tasters consistently chose Pepsi.
As Pepsi steadily gained market share in the early 1980s, Coca-Cola ran the same test and found the same result—people simply preferred Pepsi when tasting the two side by side. So, after conducting extensive market research, Coca-Cola’s solution was to create a sweeter version of its famous cola—New Coke. In taste tests, people preferred the new formula of Coke to both the regular Coke formula and to Pepsi.
Despite this success in tests, when the company brought New Coke to market, customers revolted. New Coke turned out to be one of the biggest blunders in marketing history. Within months, Coke returned its original formula—branded as “Coca-Cola Classic”—to the shelves.
In the end, sales showed that people preferred Coca-Cola Classic. But Coca-Cola’s research predicted just the opposite. So what went wrong?
The tests had people drink one or two sips of each cola in isolation and then decide which they preferred on that basis alone. The problem is, that’s not how people drink cola in real life. We might have a can with a meal. And we almost never drink just one or two sips. User research is just as much about the way the research is conducted as it is about the product being researched.
For the purposes of designing and researching digital services and websites, the point is that people can behave differently in user research than they do in real life. We need to be conscious of the way we design and run user research sessions and the way we interpret the results to take real-life behavior into account—and avoid interpretations that lead to a lot of unnecessary work and a negative impact on the user experience.
To show how this applies to web design, I’d like to share three examples taken from a project I worked on. The project was for a government digital service that civil servants use to book and manage appointments. The service would replace a third-party booking system called BookingBug. We were concerned with three user needs:
- booking an appointment;
- viewing the day’s appointments;
- and canceling an appointment.
Booking an appointment
We needed to give users a way to book an appointment, which consisted of selecting a location, an appointment type, and a person to see. The order of these fields matters: not all appointment types can be conducted at every location, and not all personnel are trained to conduct every appointment type.
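To make the dependency between the three fields concrete, here is a minimal sketch of how the valid options narrow at each step. The location names, appointment types, people, and function names are all hypothetical and chosen purely for illustration; the real service's data model is not described in this article.

```typescript
// Hypothetical data: which appointment types each location offers,
// and which appointment types each person is trained to conduct.
const typesByLocation: Record<string, string[]> = {
  Leeds: ["Interview", "Review"],
  York: ["Interview"],
};

const trainedTypesByPerson: Record<string, string[]> = {
  Asha: ["Interview", "Review"],
  Ben: ["Review"],
};

// The appointment types on offer depend on the chosen location.
function appointmentTypesFor(location: string): string[] {
  return typesByLocation[location] ?? [];
}

// The people on offer depend on the chosen appointment type, which
// must itself be valid for the chosen location.
function peopleFor(location: string, appointmentType: string): string[] {
  const offeredHere = appointmentTypesFor(location).some(
    (type) => type === appointmentType
  );
  if (!offeredHere) {
    return [];
  }
  return Object.keys(trainedTypesByPerson).filter((person) =>
    (trainedTypesByPerson[person] ?? []).some((type) => type === appointmentType)
  );
}
```

Under these assumptions, a person can only be offered once a location and a compatible appointment type have been chosen, which is why the fields cannot be treated as independent.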
Our initial design had three select boxes on one page. Selecting an option in the first select box would cause the values in the subsequent boxes to be updated, but because it was just a prototype, we didn’t build this into the test. Users selected an option from each of the select boxes easily and quickly. But afterwards, we realized that the test didn’t really reflect how the interface would actually work.
In reality, the select boxes would need to be updated dynamically with AJAX, which would slow things down drastically and affect the overall experience. We would also need a way to indicate that something was loading, such as a loading spinner. This feedback would also need to be perceivable to visually impaired users relying on a screen reader.
As mentioned earlier, the order in which users select options matters, because completing each step causes the subsequent steps to be updated. For production, if the user selected options in the wrong order, things could break. However, the prototype didn’t reflect this at all—users could select anything, in any order, and proceed regardless.
Users loved the prototype, but it wasn’t something we could actually give them in the end. To test this fairly and realistically, we would need to do a lot of extra work. What looked innocently like a simple prototype gave us misleading results.
Our next iteration followed the One Thing Per Page pattern; we split out each form field into a separate screen. There was no need for AJAX, and each page had a single submit button. This also stopped users from answering questions in the wrong order. As there was no longer a need for AJAX, the related accessibility considerations went away too.
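As a rough sketch of why splitting the form into pages prevents out-of-order answers, the routing logic can simply send the user to the first unanswered question in a fixed order. The `Answers` shape, `stepOrder`, and `nextStep` below are hypothetical names for illustration, not the project's actual code.

```typescript
// Hypothetical shape of the answers collected so far, one per page.
type Answers = {
  location?: string;
  appointmentType?: string;
  person?: string;
};

type Step = keyof Answers | "done";

// The fixed order in which the pages are shown.
const stepOrder: Array<keyof Answers> = ["location", "appointmentType", "person"];

// Return the first page that still needs an answer. A later answer
// cannot be used to skip an earlier, unanswered page.
function nextStep(answers: Answers): Step {
  for (const step of stepOrder) {
    if (!answers[step]) {
      return step;
    }
  }
  return "done";
}
```

Because each page is only reached after the previous answer has been submitted, the server can work out the valid options for that page before rendering it, with no client-side updating required.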
This tested really well. The difference was that we knew the prototype was realistic, meaning users would get a similar experience when the feature went into production.
Viewing the day’s appointments
We needed to give users a way to view their schedule. We laid out the appointments in a table, where each row represented an appointment. Any available time was demarcated by the word “Available.” Appointments were linked, but available times were not.
In the first round of research, we asked users to look at the screen and give feedback. They told us what they liked, what they didn’t, and what they would change. Some participants told us they wanted their availability to stand out more. Others said they wanted color-coded appointment types. One participant even said the screen looked boring.
During the debrief, we realized they wanted color-coded appointments because BookingBug (to which they had become accustomed) had them. However, the reason BookingBug used color for appointments was that the system’s layout squeezed so much information into the screen that it was hard to glean anything useful from it otherwise.
We weren’t convinced that the feedback was valuable. Accommodating these changes would have meant breaking existing patterns, which was something we didn’t want to do without being sure.
We also weren’t happy about making availability more prominent, as this would make the appointments visually weaker. That is, fixing this problem could inadvertently end up creating another, equally weighted problem. We wanted to let the content do the work instead.
The real problem, we thought, was asking users their opinion first, instead of giving them tasks to complete. People can be resistant to change, and the questions we asked were about their opinion, not about how to accomplish what they need to do. Ask anyone their opinion and they’ll have one. Like the Coca-Cola and Pepsi taste tests, what people feel and say in user research can be quite different from how they behave in real life.
So we tested the same design again. But this time, we started each session by asking users questions that the schedule page should be able to answer. For example, we asked “Can you tell me when you’re next available?” and “What appointment do you have at 4 p.m.?”
Users looked at the screen and answered each question instantly. Only afterward did we ask users how they felt about it. Naturally, they were happy—and they made no comments that would require major changes. Somewhat amusingly, this time one participant said they wanted their availability to be less prominent because they didn’t want their manager seeing they had free time.
If we hadn’t changed our approach to research, we might have spent a lot of time designing something new that would have had no value for users.
Canceling an appointment
The last feature involved giving users a way to cancel an appointment. As we were transitioning away from BookingBug, there was one situation where an appointment could have been booked in both BookingBug and the application; the details don’t really matter here. What is important is that we asked users to tick a checkbox to confirm they understood what they needed to do.
The first research session had five participants. One of those participants read the prompt but missed the checkbox and proceeded to submit the form. At that point, the user was taken to the next screen.
We might have been tempted to explore ways to make the checkbox more prominent, which in theory would reduce the chance of users missing it. But then again, the checkbox pattern was used across the service and had gone through many rounds of usability and accessibility testing—we knew that the visual design of the checkbox wasn’t at fault.
The problem was that the prototype didn’t have form validation. In production, users would see an error message, which would stop them from proceeding. We could have spent time adding form validation, but there is a balancing act between the speed at which you want to create a throwaway prototype and having that prototype give you accurate and useful results.
Coca-Cola wanted its world-famous cola to test better than Pepsi. As soon as tests showed that people preferred its new formula, Coca-Cola ran with it. But as with the design of the schedule page, it wasn’t the product that was wrong; it was the research.
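For a sense of scale, the missing check is small. The sketch below shows roughly what validating the confirmation checkbox could look like; the `CancelForm` type, the `validateCancelForm` function, and the error wording are assumptions made for illustration, not what the production service uses.

```typescript
// Hypothetical shape of the submitted cancellation form.
type CancelForm = { confirmed: boolean };

type ValidationResult =
  | { ok: true }
  | { ok: false; error: string };

// Block submission unless the confirmation checkbox has been ticked.
function validateCancelForm(form: CancelForm): ValidationResult {
  if (!form.confirmed) {
    return {
      ok: false,
      error: "Confirm that you understand what you need to do",
    };
  }
  return { ok: true };
}
```

With a check like this in place, the participant who missed the checkbox would have been shown the error and kept on the page rather than taken to the next screen.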
Although we weren’t in danger of making the marketing misstep of the century, the design of our tests could have influenced our interpretation of the results in such a way that it would have created a lot more work for a negative return. That’s a lot of wasted time and a lot of wasted money.
Time with users is precious: we should put as much effort and thought into the way we run research sessions as we put into designing the experience. That way users get the best experience and we avoid doing unnecessary work.