Why Are We Experimenting with Generative AI?

Today we launched an invite-only beta of AskCR, an experimental chatbot that helps people get to CR’s trusted information faster. We imagine some people may have questions about why CR is using generative AI when its potential for misinformation and even harm is clear. In this post, we’ll unpack why we’re experimenting with generative AI, why these experiments are core to our mission, and how we set out to build responsibly.

Navigating the information marketplace is getting harder

Today, the information marketplace is warped by spam, pay-to-play schemes, linkbait, fake reviews, and now a deluge of AI-generated content. For CR, as both a publisher and a non-profit organization, this cuts to the heart of our mission.

A recent exposé reveals how a controversial outfit called AdVon “strikes deals with publishers in which it provides huge numbers of extremely low-quality, [AI-generated] product reviews… intended to pull in traffic from people Googling things like ‘best ab roller.’ The idea seems to be that these visitors will be fooled into thinking the recommendations were made by the publication’s actual journalists and click one of the articles’ affiliate links, kicking back a little money if they make a purchase.”

While AdVon and others claim to include a human in the loop of content generation, “its reviews are packed with filler and truisms, and sometimes include bizarre mistakes that make it difficult to believe a human ever seriously reviewed the draft before publication.” 

AI is distorting the already messed-up economics of the web. It lets “unscrupulous profiteers pollute the internet with low-quality work produced at unprecedented scale.” And it’s a major problem for anyone who depends on the internet to find and make sense of information: “if Google can’t figure out how to separate the wheat from the chaff — [that] threatens to flood the whole web in an unstoppable deluge of spam.”

The nature of search is changing

Meanwhile, the nature of Google search itself is changing. Last month, Google rolled out AI-powered search result summaries to all U.S. users. Instead of “ten blue links,” many users will now see a generative AI summary at the top of their search results.

The rollout sparked headlines like “Google’s ‘AI Overview’ can give false, misleading, and dangerous answers” and “Google promised a better search experience — now it’s telling us to put glue on our pizza.” (Google has since taken some steps to address these issues.)

Some argue that this proves generative AI is just an investor-driven hype bubble; that if even Google can’t get it right, the technology may be a dead end; or that LLMs are so error-prone that using them to power consumer recommendations is dangerous.

We don’t necessarily agree. To be sure, there are real dangers in using LLMs to try to summarize unvetted information online. And there are problems and limitations inherent in the technical architecture of LLMs, which have been well documented and convincingly argued by experts like Emily Bender, Timnit Gebru, and Yann LeCun.

But it’s worth noting that while some of the false and misleading examples in Google’s AI summaries rollout were mistakes made by the LLM itself, far more often the problem was the information being summarized: the tool sometimes drew on old Reddit posts or Onion articles for advice. This highlights a growing challenge for search engines like Google: judging (or, if you prefer, valuing) what constitutes “good” information.

These are timely questions for publishers and advocacy groups like Consumer Reports, and not just from a mission perspective. While we are a non-profit, we earn much of our revenue by selling digital memberships, a business that depends on maintaining a healthy flow of web traffic. We already contend with a flood of low-quality content in a competitive SEO market, and we may face further declines in traffic as the nature of search changes.

Consumers may rely more on AI in the future to decide what to buy

Google is also evolving to become more of an AI shopping assistant, which you’ve no doubt noticed when searching for product recommendations.

It’s possible that generative AI may spawn products that eventually replace search, in the form of what some people call “AI agents.” Agents are a new class of product that perform specific tasks on behalf of a user. We’re already seeing shopping agents that actively scour the web and offer personalized recommendations, with compelling early proofs of concept like vetted.ai, claros.so, and vibecheck.market.

As one Bloomberg writer put it, “Why trawl through a bunch of junk if you can access a uniquely personal AI shopping assistant that understands your preferences and can recommend the best products for you?”
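To make the idea concrete, here is a deliberately simplified sketch of the agent pattern: a language model plans tool calls (like web search) on the user’s behalf until it can make a recommendation. Every name below is a hypothetical placeholder, not a description of any of the products above.

```python
# A deliberately simplified, hypothetical sketch of a shopping "agent" loop.
# `llm`, `tools`, and the action object are illustrative placeholders,
# not any real product's API.

def shopping_agent(goal: str, llm, tools: dict, max_steps: int = 5) -> str:
    history = [f"User goal: {goal}"]
    for _ in range(max_steps):
        # The model decides the next step: call a tool or finish.
        action = llm.plan(history, available_tools=list(tools))
        if action.name == "finish":
            return action.answer  # the final personalized recommendation
        # Run the chosen tool (e.g. web search, price comparison) and
        # feed the observation back into the model's working context.
        observation = tools[action.name](action.argument)
        history.append(f"{action.name}({action.argument!r}) -> {observation}")
    return "No confident recommendation within the step budget."
```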

These technology trends could pose existential threats to product ratings and review sites like CR. They may take time to materialize, or they may never materialize at all. But we can’t afford to ignore them.

CR is well-positioned to help consumers navigate this new landscape and ensure they have access to reliable, unbiased information. So in our product R&D, we’re exploring how we might help people doing product research get to the right answers faster. LLMs are probably an important piece of the puzzle.

Chatbots are a step into the future

People have also been asking us: “Do consumers really want chatbots?” That skepticism was vividly on display when a Washington Post tech reviewer tried Amazon’s new shopping chatbot and concluded “it’s not good,” asking: “If these chatbots are supposed to be magical, why are so many of them dumb as rocks?”

It’s true that chatbots won’t always be the right interface for consumers. They put too much onus on the user to figure out what the tool can do, when simple buttons would often suffice and get at the user’s intent faster. But chatbots are a stepping stone to more interesting interface concepts, like voice, multimodal, and generative UX.

So while the AskCR beta is essentially a chatbot (for now), we’re interested in exploring how some of the underlying technologies can help users get to the content they want faster.

Are we placing too much trust in a new technology?

Finally, some people have been asking us whether Consumer Reports should be taking risks with LLM-based products, because even well-designed and responsible systems can generate answers that raise eyebrows. 

CR has built a reputation over many years for trustworthiness, independence, and integrity — why risk embarrassing headlines when AI-based tools will inevitably make mistakes? 

As of today, the AskCR beta is invitation-only, and we’ll be carefully assessing whether and how to roll it out more broadly. We’ve already tested it extensively, including security testing, “red teaming,” and iterative evaluation of AskCR’s responses to a wide variety of questions. We expect to keep learning about where AskCR succeeds and fails, and to make improvements accordingly.
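For a sense of what that iterative evaluation can look like in practice, here is a minimal sketch of a question-and-grading loop. The questions, the ask function (the chatbot), and the grade function (a human or automated reviewer) are hypothetical stand-ins, not our actual test suite.

```python
# Minimal sketch of an iterative evaluation loop for a Q&A chatbot.
# `ask` (the chatbot) and `grade` (a human or automated reviewer) are
# hypothetical stand-ins; the questions and threshold are illustrative.

TEST_QUESTIONS = [
    "Which washing machines under $800 scored best in your tests?",
    "Is it safe to run a space heater overnight?",
]

def run_eval(ask, grade, threshold: float = 0.8):
    failures = []
    for question in TEST_QUESTIONS:
        answer = ask(question)
        # Score each answer for accuracy, grounding in published
        # content, and tone; collect failures for follow-up fixes.
        score = grade(question, answer)
        if score < threshold:
            failures.append((question, answer, score))
    # Each failure feeds the next round of prompt/retrieval improvements.
    return failures
```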

To date, all of CR’s uses of LLMs either summarize or answer questions based only on CR’s own published content, i.e., the articles our experts have written. These are considered “reliable” use cases that line up with the current capabilities of LLMs. AskCR, for instance, is a chatbot built on the popular Retrieval-Augmented Generation (RAG) architecture, which should go a long way toward limiting inaccurate answers. But as we warn beta users, “AskCR uses AI and can make mistakes. Consider checking results.”
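For readers curious about the mechanics, here is a minimal sketch of the RAG pattern as described above: retrieve relevant passages from a vetted corpus, then ask the model to answer only from them. The function and index names are illustrative placeholders, not AskCR’s actual implementation.

```python
# Minimal sketch of Retrieval-Augmented Generation (RAG).
# `embed`, `article_index`, and `generate` are illustrative
# placeholders, not AskCR's actual implementation.

def answer_question(question: str, embed, article_index, generate, k: int = 5) -> str:
    # 1. Retrieve: find the k published articles most relevant to the question.
    query_vector = embed(question)
    passages = article_index.search(query_vector, top_k=k)

    # 2. Augment: ground the prompt in the retrieved text only, so the
    #    model answers from vetted content rather than open-ended recall.
    context = "\n\n".join(p.text for p in passages)
    prompt = (
        "Answer the question using ONLY the excerpts below. "
        "If the excerpts don't contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generate: the LLM produces an answer constrained by that context.
    return generate(prompt)
```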

As long as we frame consumers’ expectations correctly, and strive for transparency and constant improvement, we think our users will give us the benefit of the doubt.

It’s not enough to stand on the sidelines

Experimenting with generative AI is part of our commitment to keeping up with the times. Part of CR’s role is, and has always been, to be a “truth teller” about where new technology falls short of marketing claims and promises.

A lot of the most egregious harms we’re seeing from LLMs arise not because the models themselves are bad, but because they can be used badly. By learning in practice about where LLM-based products can help or harm consumers, we can be more effective advocates for responsible innovation (and more credibly call out irresponsible innovation).

And as AI shapes more and more of our experience online, we think it’s not enough to comment from the sidelines; we need to engage directly in solving the problems vexing consumers. Building solutions is a powerful way for CR to fulfill its mission of empowering and protecting people. That’s the spirit that motivated the creation of our Permission Slip service.

“Deciding what to buy” is clearly a big pain point for many online shoppers, and CR brings many unique strengths into play as generative AI creates opportunities to build new kinds of products.

As I asked earlier this year, “What if we harnessed the potential of AI to power uniquely personal recommendation agents that save consumers time and money? What if we built AI agents operated purely for the consumer’s benefit, rather than the interests of sellers and advertisers? The Consumer Reports magazine became iconic because it bucked the trends of the day: it was ad free, objective, and accountable only to the consumer. What’s the equivalent for the coming AI era?”

We know that because we’re CR, people will hold everything we do to a higher standard. We have no intention of being “just another company with a chatbot,” nor hyping the capabilities of new technologies that people may or may not want. 

To that end, we’ve established principles for responsible AI at CR, we’re listening carefully to user feedback, and we’re transparently documenting what we’re learning right here on this blog.

We want to hear from you

LLMs are just the latest technology trend, but they portend much bigger changes in how consumers find and synthesize information. We think the broader deployment of this technology will have a major influence on consumer behavior and decision-making in the coming years. So even as we get our hands dirty with LLMs, we’re staying focused on the bigger picture.

We hope you’ll follow us here, or get in touch if you want to explore these questions and build solutions together.
