AI Engineer World’s Fair 2025: Takeaways from the Team Building AskCR

This summer, members of Consumer Reports (CR)’s Innovation team traveled to San Francisco to attend the AI Engineer World’s Fair (AIEWF). AIEWF brings together more than 3,000 engineers, founders, and researchers building products that leverage AI. We were on the hunt for information and inspiration as we continue to build AskCR into a world-class product research advisor.

AskCR is in a unique position within the AI landscape. We’re a small team building for millions of CR members, but we have something most AI companies don’t: decades of high-quality, unbiased consumer research data and a clearly defined vertical. While the big players continue to innovate on general-purpose AI assistants, we’re laser-focused on building for consumers looking to make informed purchasing decisions.

One challenge we’ve encountered is getting our search coverage and citation quality to reach parity with industry leaders; AIEWF’s RecSys and Search+Retrieval track tackled this topic head on. We also know we need to better integrate CR’s product expert knowledge into AskCR’s responses, and the GraphRAG track at AIEWF sparked new ideas in that vein. Using AI tools like Cursor has allowed our team to speed up production cycles, but as codebases expand, the productivity gains slow down. The AI Architects track showed us how to better leverage these tools within the team. Lastly, the MCP track provided some much-needed signal on how the emerging AI agent ecosystem is shaping up.

Takeaways from the RecSys and Search+Retrieval Tracks

One of the most valuable aspects of AIEWF was hearing from teams building AI products in unique verticals. Given how fast the field is moving, there is a lot to be learned from teams building RAG (retrieval-augmented generation) applications in domains outside the consumer space we play in. Here are some of the standouts.

Harvey.ai’s approach to legal AI is a great case study in how to build for a specific domain. Their use of custom embeddings and re-rankers to better encode domain-specific knowledge is impressive and something we may consider experimenting with in AskCR. Despite all of the complexities, they still see a ton of value in and rely heavily on human evals. They emphasized that the core API that they maintain is purposefully simple, to allow for more rapid iteration. While our team had been reaching similar conclusions with AskCR, it was validating to see this point of view reinforced by a market-leading startup.
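To make the re-ranking step concrete, here is a minimal sketch using an off-the-shelf cross-encoder from the sentence-transformers library. The model, the example chunks, and the assumption that a domain-specific re-ranker would instead be fine-tuned on in-domain relevance pairs are ours for illustration; this is not Harvey’s setup or AskCR’s pipeline.

```python
# Sketch: re-ranking first-stage retrieval results with a cross-encoder.
# The model and candidate chunks are illustrative; a domain-tuned re-ranker
# would be fine-tuned on in-domain (query, passage, relevance) pairs.
from sentence_transformers import CrossEncoder

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    """Score each (query, candidate) pair and return the top_k candidates."""
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = model.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]

# Candidates would normally come from a vector or hybrid retriever.
chunks = [
    "CR's test results for cordless stick vacuums...",
    "How we test refrigerators...",
    "Best vacuums for pet hair under $300...",
]
print(rerank("best cordless vacuum for pet hair", chunks, top_k=2))
```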

Instacart’s presentation, specifically their discussion on leveraging Large Language Models (LLMs) for search and discovery, offered new ideas for maximizing information extraction from user queries. Their approach to query rewrites, where a single user query is expanded (“fanned out”) to identify potential substitutes and complementary products, is especially relevant to our current workflows within AskCR.

Currently, AskCR refines user queries by incorporating context from the user’s chat history and expands queries to improve semantic matching with our existing content. However, Instacart’s methodology presents an opportunity for further enhancement. By adopting a similar “fanning out” strategy, we could issue several related retrieval queries for each user request. This would be helpful in scenarios where a specific product requested by a user has not been directly reviewed by CR; in such cases, Instacart’s approach would allow us to identify and present similar, reviewed products.
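Below is a minimal sketch of that fan-out idea. It assumes the OpenAI Python client as the LLM and a placeholder retrieve() helper standing in for an existing retriever; the model name, prompt, and structure are illustrative, not how Instacart or AskCR actually implement query rewrites.

```python
# Sketch: fanning a single user query out into substitute / complementary
# product queries, then retrieving against each rewrite. The model name,
# prompt, and retrieve() helper are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def fan_out(query: str, n: int = 3) -> list[str]:
    """Ask an LLM for related product queries (substitutes and complements)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"List {n} search queries for products that are substitutes for, "
                f"or complementary to: '{query}'. "
                "Return a JSON array of strings only."
            ),
        }],
    )
    # A production version would validate and repair the model output.
    return json.loads(response.choices[0].message.content)

def retrieve(q: str) -> list[str]:
    """Placeholder for an existing retriever (vector / hybrid search)."""
    return []

user_query = "GE Profile french door refrigerator"
results = {q: retrieve(q) for q in [user_query, *fan_out(user_query)]}
```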

MongoDB’s overview of the state of RAG in 2025 underscored the continued need for RAG within domain-specific applications. Even as LLM context windows expand, the cost and latency trade-offs in production apps that serve millions of users keep RAG relevant. At the same time, there’s still lots of room for refinement and enhancement within existing RAG workflows.

In the development of AskCR, we have already implemented several techniques to optimize our search capabilities. These include query refinement, document enrichment, semantic chunking, and hybrid search. Nevertheless, the landscape of RAG is constantly evolving, and there are emerging techniques, such as GraphRAG, that we could potentially integrate. Leveraging methods like GraphRAG would allow us to further embed intricate domain-specific knowledge into our approach to search, leading to even more accurate and contextually rich results for our users.
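As a concrete illustration of one technique we already rely on, here is a small hybrid-search sketch that fuses BM25 and dense-embedding rankings with reciprocal rank fusion. It uses the rank_bm25 and sentence-transformers libraries for brevity; the corpus, model, and fusion constant are illustrative, and our production pipeline differs.

```python
# Sketch: hybrid search combining BM25 and dense rankings via
# reciprocal rank fusion (RRF). Corpus, model, and k=60 are illustrative.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "CR's top-rated cordless stick vacuums for 2025",
    "How to choose a quiet dishwasher",
    "Best budget vacuums for pet hair",
]
query = "quiet vacuum for pet hair"

# Lexical ranking with BM25 over whitespace-tokenized text.
bm25 = BM25Okapi([d.lower().split() for d in docs])
bm25_order = np.argsort(-bm25.get_scores(query.lower().split()))

# Dense ranking with cosine similarity over sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)
dense_order = np.argsort(-(doc_vecs @ query_vec))

# Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank).
k = 60
scores = {i: 0.0 for i in range(len(docs))}
for order in (bm25_order, dense_order):
    for rank, doc_idx in enumerate(order):
        scores[doc_idx] += 1.0 / (k + rank + 1)

for doc_idx in sorted(scores, key=scores.get, reverse=True):
    print(round(scores[doc_idx], 4), docs[doc_idx])
```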

Takeaways from the GraphRAG Track

One of the challenges in AskCR is getting the system to search for relevant content like our experts would. We’ve been experimenting with curating knowledge graphs of CR expert advice, matching user questions to relevant advice and then using that advice to improve the logic of our retrievers. Zach Blumenfeld’s GraphRAG workshop (code) was a great walkthrough of curating knowledge graphs from unstructured data and applying them in agent workflows.
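For a sense of how a graph of expert advice can feed retrieval, here is a toy sketch using networkx. The schema, node contents, and naive lookup are simplified assumptions of ours; the workshop covered extracting entities and relations from unstructured text with LLMs and storing them in a graph database, which is far richer than this.

```python
# Sketch: a toy "expert advice" knowledge graph and a naive lookup.
# The schema (category -> attribute -> advice) and contents are hypothetical;
# real graphs would be built via LLM-based entity/relation extraction and
# matched with embeddings rather than substring checks.
import networkx as nx

g = nx.DiGraph()
g.add_edge("washing machines", "capacity", relation="HAS_ATTRIBUTE")
g.add_edge("capacity", "Front-loaders usually fit larger loads per cycle.",
           relation="HAS_ADVICE")
g.add_edge("washing machines", "reliability", relation="HAS_ATTRIBUTE")
g.add_edge("reliability", "Check brand repair rates before buying.",
           relation="HAS_ADVICE")

def advice_for(question: str) -> list[str]:
    """Return advice reachable from any category mentioned in the question."""
    hits = []
    for category in ("washing machines",):  # real matching would use embeddings
        if category in question.lower():
            for _, attribute in g.out_edges(category):
                hits.extend(target for _, target in g.out_edges(attribute))
    return hits

print(advice_for("Which washing machines hold the biggest loads?"))
```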

Alison Cossette’s workshop (code) went deeper on using knowledge graphs as analytical tools to discover issues and opportunities in data sources. Particularly interesting were the ideas to visualize user interactions in the context of the graph, and to use graphs to analyze CR’s articles and structured databases to find and collapse redundant chunks, which could keep our semantic and hybrid retrievers from returning multiple chunks with overlapping insights. Utilizing graph techniques could also improve the breadth of our retrievers; a sketch of the chunk-collapsing idea follows below.
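Here is a minimal sketch of that chunk-collapsing idea: build a similarity graph over chunk embeddings and keep one representative per connected component. The threshold, model, and example chunks are assumptions for illustration, not the workshop’s code or our pipeline.

```python
# Sketch: collapsing near-duplicate chunks by building a similarity graph
# and keeping one representative per connected component. The threshold,
# model, and chunks are illustrative assumptions.
import networkx as nx
from sentence_transformers import SentenceTransformer

chunks = [
    "Gas ranges heat up quickly but need good ventilation.",
    "A gas range heats fast; make sure the kitchen is well ventilated.",
    "Induction cooktops are efficient and stay cool to the touch.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(chunks, normalize_embeddings=True)
sims = vecs @ vecs.T  # cosine similarity, since embeddings are normalized

g = nx.Graph()
g.add_nodes_from(range(len(chunks)))
threshold = 0.85
for i in range(len(chunks)):
    for j in range(i + 1, len(chunks)):
        if sims[i, j] >= threshold:
            g.add_edge(i, j)

# Keep the first chunk from each component as its representative.
representatives = [min(component) for component in nx.connected_components(g)]
for idx in sorted(representatives):
    print(chunks[idx])
```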

Takeaways from the AI Architects Track

Yegor Denisov-Blanch’s talk analyzing real-world productivity data from AI coding tools, drawn from nearly 100,000 developers and hundreds of companies, was full of great insights into these tools and into agent performance more broadly, beyond coding.

While productivity gains are generally around 20%, developers spend significantly more time reworking code generated by AI coding tools. The gains are much higher for greenfield projects (30-35%) compared to large brownfield codebases (5-20%), with even lower performance on codebases using less popular languages.

Most relevant to our work at AskCR was the finding that despite state-of-the-art models having context windows approaching or beyond 1 million tokens, coding tools show performance declines as they approach just 32k tokens. 

Since AI coding assistants are agents operating in a limited arena with extensive training data, this study also serves as a useful lens on the current state of agent performance more generally. 

Takeaways from the MCP Track

Model Context Protocol (MCP) is emerging as the de facto protocol for AI agents to access servers, much like HTTP for web browsers. This protocol expands the usefulness of AI agents by giving them access to external tools, services, and APIs, enabling them to interact with the real world. 

A fully-functional MCP server can show emergent capabilities. Dynamic tool discovery allows servers to change available tools on the fly (based on context). MCP servers can also expose resources (data like static files or live system data) and support prompts (manuals for agents on server usage). Another key feature is sampling, which is an intermediary step that enables servers to transform data using an LLM from the client for tasks like summarizing or formatting. An example involved a “research agent” tool using sampling to have the client’s LLM generate SQL queries for a BigQuery PyPI dataset, effectively giving the MCP server access to an LLM. Future additions to the spec include auth, elicitation, streamable HTTP, and a community registry of MCPs.
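To ground some of that vocabulary, here is a minimal server sketch using FastMCP from the official Python MCP SDK, exposing one tool, one resource, and one prompt. The names and contents are hypothetical, not a real AskCR server, and sampling, auth, and the other upcoming spec features are omitted.

```python
# Sketch: a minimal MCP server exposing one tool, one resource, and one
# prompt via the Python MCP SDK's FastMCP class. Names and contents are
# hypothetical; a real server would wire these to live systems.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("product-advice-demo")

@mcp.tool()
def search_ratings(product_query: str) -> str:
    """Search product ratings for the given query (stubbed for the sketch)."""
    return f"No live data in this sketch; would search ratings for '{product_query}'."

@mcp.resource("guide://buying/refrigerators")
def refrigerator_guide() -> str:
    """A static resource the client can read into context."""
    return "Hypothetical buying-guide text would go here."

@mcp.prompt()
def compare_products(product_a: str, product_b: str) -> str:
    """A reusable prompt template the client can surface to users."""
    return f"Compare {product_a} and {product_b} on price, reliability, and performance."

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```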

To wrap up the track, Apify, a leading API marketplace, presented their plan for an open marketplace of agentic tools, where an AI agent needs only one Apify API key to access them all (today, an agent would need a separate API key for each service it uses). This solves both the problem of setting up accounts for each service and the payments problem, since payments can all be metered through one API key and disbursed to the developers who publish the actors. We will be keeping Apify’s model in mind as we continue prototyping AI agent advocates that work on behalf of consumers.

Needless to say, the AI Engineer World’s Fair was a very valuable conference and helped plant many new ideas for improving AskCR and better serving consumers’ interests in the age of AI. If you’re interested in any of the perspectives shared in this post and keen to discuss how they might be applied towards expert-backed product advice, drop us a line at innovationlab@cr.consumer.org.
