The Art of Evals: How Figma Put People at the Center of Its AI Product
Building, testing, and validating with human-centered design.
AI tools are reshaping product development, enabling people at all skill levels to turn ideas into tangible, interactive experiences. As Apple engineering leader Michael Lopp notes, democratization is positive because it opens up these capabilities, but it also creates a crowded marketplace.
For builders aiming to stand out, what matters most—taste, speed, or something else?
Figma’s latest tool, Figma Make, emphasizes human craft and creativity in the product-building process. Its new prompt-to-functional-app experience, launched at Config alongside Sites, Buzz, and Draw, reduces the technical barriers to bringing a product to life.
David Kossnick, Figma’s Head of Product for AI, made humans the focus not just of the product experience but also of the evaluation process.
Unlike traditional software, AI products exist in a “foggy middle ground” where real testing is essential to validate capabilities.
At The Review, we’ve explored how Figma integrates human feedback into its product decisions. FigJam grew from community use of Figma for brainstorming during the pandemic, according to CPO Yuhki Yamashita. Figma Slides emerged organically through internal viral projects, says founding PM Mihika Kapoor.
This interview dives into the evaluation process Kossnick and team used for Figma Make, keeping humans central at every step—from defining success metrics to gathering qualitative feedback and analyzing results.
Developing shared AI infrastructure
Figma Sites required a company-wide infrastructure effort, bridging design tools and web publishing. It translated designs into functional web code using deterministic code-gen rather than AI.
As AI models advanced, a designer proposed making static website components functional with AI. This led to a hackathon project that became the precursor to Figma Make.
Code Layers in Figma Sites introduced three key capabilities:
- Making code a primitive in Figma’s canvas.
- Converting designs into React code, not just HTML/CSS.
- Adding a chat interface for AI-driven interactions.
These features addressed the hardest parts of design-to-code conversion.
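As a rough illustration of the second capability, converting a design node into a React component rather than flat HTML/CSS could look something like the sketch below. The `DesignNode` shape and `toReact` function are invented for this example; Figma's actual node tree and code-gen pipeline are far richer and are not public.

```typescript
// Invented, simplified design-node shape; Figma's real node tree is richer.
interface DesignNode {
  name: string; // assumed to be a valid component identifier for this sketch
  style: Record<string, string>;
  text?: string;
  children?: DesignNode[];
}

// Emit a React component string from a design node. Producing components
// (not just static markup) is what lets an AI later wire in state and behavior.
function toReact(node: DesignNode): string {
  const render = (n: DesignNode): string => {
    const style = JSON.stringify(n.style);
    const body = n.text ?? (n.children ?? []).map(render).join("\n");
    return `<div style={${style}}>${body}</div>`;
  };
  return `export default function ${node.name}() {\n  return (\n    ${render(node)}\n  );\n}`;
}
```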
Code Layers and the birth of Figma Make
Another hackathon created a standalone prototype of Code Layers where users could prompt AI to build full sites or apps. “It worked surprisingly well, a surprising percentage of the time,” says Kossnick.
The question then became: is this viable as a product?
Figma’s AI product viability framework
Kossnick identifies four possible paths when assessing whether an AI project is worth pursuing:
- Technology isn’t ready: Prototypes validate feasibility before heavy investment.
- Almost possible: Requires custom models or fine-tuning; scalability is a concern.
- Possible with product adjustments: Narrow scope or adjust features to make AI integration feasible.
- It works: Technology and product align effectively.
Once the path is identified, rapid prototyping and validation are crucial.
Structuring your AI product team
Key takeaways from Figma Make:
- Role blending keeps teams small: Designers can code, PMs can prototype.
- Everyone touches code: Builds shared understanding of product functionality.
- Centralized teams: Shared infrastructure accelerates iteration.
- Include target personas in evals: Designers and PMs provide the taste check.
Fluidity across roles increases pace and efficiency in AI product development.
Figma’s three-step, human-centric evaluation process
1. Define meaningful success metrics
Metrics must reflect the target persona’s expectations, scenarios, and desired deliverables. For Figma Make, that meant two rubric scores (see the first sketch after this list):
- Design score (1–4): Does the output visually match the mock or prompt?
- Functionality score (1–4): Does the output behave as expected?
2. Gather qualitative human feedback at scale
Feedback expanded in four layers: the internal AI team, target PM and design personas, the whole company, and alpha customer testers. Early stages were scrappy, with feedback collected in Slack; later, FigJam boards enabled collaboration and larger-scale insights, surfacing real use cases and unexpected applications.
3. Assess the data effectively
The team leaned on four evaluation types (combined in the second sketch after this list):
- Deterministic: Pass/fail checks.
- Taste and judgment: Human-rated qualitative assessment.
- AI as judge: AI evaluates output based on human guidance.
- Usage analytics: A/B testing and production data.
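To make step one concrete, here is a minimal sketch of what a rubric-graded eval record could look like. The names (`EvalCase`, `RubricScore`) and fields are assumptions for illustration, not Figma's actual schema.

```typescript
// Illustrative only: a possible shape for one graded eval case.
type RubricScore = 1 | 2 | 3 | 4;

interface EvalCase {
  prompt: string; // what the tester asked Figma Make to build
  persona: "designer" | "pm" | "engineer";
  designScore: RubricScore;        // visual match to the mock or prompt
  functionalityScore: RubricScore; // does the output behave as expected?
  notes?: string;                  // free-form qualitative feedback
}

// A simple aggregate: average each score across a batch of graded cases.
function summarize(cases: EvalCase[]) {
  const avg = (f: (c: EvalCase) => number) =>
    cases.reduce((sum, c) => sum + f(c), 0) / cases.length;
  return {
    design: avg((c) => c.designScore),
    functionality: avg((c) => c.functionalityScore),
  };
}
```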
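And a hedged sketch of how the four evaluation types from step three might sit together in one harness. Everything here is hypothetical: `judgeWithModel` is a stub standing in for a real model API call, and the deterministic check is just one plausible example of a pass/fail gate.

```typescript
// 1. Deterministic: hard pass/fail checks, e.g. "the output is non-empty
//    and exports a component". The exact check is an assumed example.
function deterministicCheck(output: string): boolean {
  return output.trim().length > 0 && output.includes("export default");
}

// 2. Taste and judgment: a human grader supplies the 1–4 rubric scores
//    (see EvalCase above), so there is nothing to automate here.

// 3. AI as judge: ask a model to grade the output against human-written
//    guidance. Stubbed here; a real version would call a model API.
async function judgeWithModel(output: string, guidance: string): Promise<number> {
  void output;
  void guidance;
  return 3; // placeholder score on the 1–4 scale
}

// 4. Usage analytics: A/B tests and production metrics live outside the
//    harness; an eval run just records its experiment arm for later joins.
interface EvalRun {
  caseId: string;
  passedDeterministic: boolean;
  aiJudgeScore: number;
  experimentArm?: "control" | "variant";
}
```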
Human-centered approach
Figma Make’s development shows that human input is crucial in AI product evaluation. Designers and PMs were intentionally included to ensure outputs matched real user expectations.
“One of the worst things you can do is optimize for the wrong thing,” says Kossnick. “If your users’ prompts differ from yours, you’ve been optimizing in isolation and need to start over.”