How I Built Quizco's AI Quiz Agent

I did not want Quizco’s AI agent to be a one-shot prompt that blindly spits out trivia.

I wanted it to act more like a product teammate:

look at what people are already playing
decide whether new quizzes are even needed
generate drafts in a strict structure
verify and de-duplicate them
hand the final decision to a human reviewer

That led me to an architecture that is much closer to an agent loop than a simple “generate quiz” button.

I also wanted the post to have a single systems-view image, so I made this architecture diagram for the agent loop:

The whole agent runs inside a single Node.js backend service; the planner, writer, and verifier are internal modules of that same backend.

Architecture diagram for the Quizco AI agent showing triggers, planner context, generation, quality gates, review, publishing, and memory feedback.

Workflow

From platform signals to a published quiz

Every agent run starts by reading product signals and ends with a human approval decision.

01
Read platform demand and queue pressure
Before generating anything, the backend inspects active quiz tags, attempt counts, recent agent history, pending quiz load, and durable memory from earlier runs.
This lets the agent react to real demand instead of inventing random topics in isolation.
02
Choose whether to generate at all
The planner uses the OpenAI Responses API, can call internal tools, and can use web search when freshness matters. It is allowed to stand down if demand is weak or the queue is already full.
A run can return generate_quizzes, stand_down, or observe instead of forcing content every time.
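As a rough illustration of that three-way outcome, here is a minimal TypeScript sketch. Only the three action names come from the post; the `PlannerDecision` shape, the `maxPending` cap, and the idea of enforcing queue pressure in code on top of the planner's own judgment are assumptions made for the example.

```typescript
// Hypothetical shapes; only the three action names come from the post.
type PlannerDecision =
  | { action: "generate_quizzes"; topics: string[]; reason: string }
  | { action: "stand_down"; reason: string }
  | { action: "observe"; reason: string };

interface QueueState {
  pendingCount: number; // quizzes currently waiting for review
  maxPending: number;   // assumed cap on the review queue
}

// Apply a hard queue-pressure guard on top of whatever the planner proposed.
function applyQueueGuard(decision: PlannerDecision, queue: QueueState): PlannerDecision {
  if (decision.action === "generate_quizzes" && queue.pendingCount >= queue.maxPending) {
    return { action: "stand_down", reason: "review queue is full" };
  }
  return decision;
}

// Only a generate_quizzes decision yields topics for the writer.
function topicsToWrite(decision: PlannerDecision): string[] {
  return decision.action === "generate_quizzes" ? decision.topics : [];
}
```

Modeling the decision as a discriminated union means the writer cannot be handed topics unless the planner explicitly chose to generate.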
03
Generate strict JSON quiz drafts
The writer model produces exactly 5 questions, 4 answer options per question, a confidence score, a format, a topic, and a trend summary. The writer can be OpenAI or Anthropic depending on config.
The prompt also tells the model to avoid reusing recent titles, angles, or question concepts.
04
Validate safety, structure, and factual quality
Generated drafts are checked for unsafe topics, weak structure, duplicate questions, and missing citations for web-backed ideas. A verifier that can search the web question by question then checks them for factual issues.
If a draft fails, the agent revises it and tries again, up to the configured revision limit, instead of sending low-quality output on to review.
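The revise-and-retry behavior can be sketched as a bounded loop. This is a minimal sketch, not the actual Quizco code: `reviseUntilValid`, `GateResult`, and the `validate` and `revise` callbacks are hypothetical names standing in for the real quality gates and the writer model.

```typescript
// Hypothetical result of running all quality gates on one draft.
interface GateResult {
  passed: boolean;
  issues: string[];
}

// Validate a draft; on failure, ask the writer to revise it, up to maxRevisions times.
// Returns null when the draft never passes, so nothing low-quality is stored.
async function reviseUntilValid<D>(
  draft: D,
  validate: (d: D) => Promise<GateResult>,
  revise: (d: D, issues: string[]) => Promise<D>,
  maxRevisions: number,
): Promise<{ draft: D; revisions: number } | null> {
  let current = draft;
  for (let revisions = 0; revisions <= maxRevisions; revisions++) {
    const result = await validate(current);
    if (result.passed) return { draft: current, revisions };
    if (revisions === maxRevisions) break; // out of attempts
    current = await revise(current, result.issues);
  }
  return null;
}
```

Returning the revision count alongside the draft is one way to get the revision metadata that later travels with the pending quiz.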
05
Block near-duplicate quizzes with embeddings
Once a draft passes validation, I create an embedding and compare it against existing quizzes with MongoDB vector search so the agent does not keep publishing the same idea in a new wrapper.
The duplicate threshold is explicit, so repetition is treated like a product bug rather than a model quirk.
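A minimal sketch of what that check could look like. `$vectorSearch` is the Atlas Vector Search aggregation stage and `vectorSearchScore` is its score metadata, where a higher score means a closer match; the index name, the `embedding` field path, the `numCandidates` value, and any threshold passed in are placeholders, since the post does not give them.

```typescript
// Build an aggregation pipeline that returns the single nearest existing quiz.
// Index name and field path are hypothetical placeholders.
function nearestQuizPipeline(queryVector: number[]) {
  return [
    {
      $vectorSearch: {
        index: "quiz_embedding_index", // assumed index name
        path: "embedding",             // assumed field holding the quiz vector
        queryVector,
        numCandidates: 100,            // illustrative value
        limit: 1,
      },
    },
    { $project: { title: 1, score: { $meta: "vectorSearchScore" } } },
  ];
}

// Treat a draft as a near-duplicate when its best match meets the explicit threshold.
// An undefined score means there was no existing quiz to compare against.
function isNearDuplicate(bestScore: number | undefined, threshold: number): boolean {
  return bestScore !== undefined && bestScore >= threshold;
}
```

The pipeline would be passed to an `aggregate` call on the quiz collection, and the top result's `score` fed into `isNearDuplicate` together with the configured threshold.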
06
Store the result in a review buffer
Passing drafts are written to QuizPending with planner notes, source citations, verification reports, confidence, and revision metadata attached.
I wanted every pending quiz to carry enough context for a reviewer to understand why it exists.
07
Approve to publish, reject to teach the system
The React admin dashboard lets me approve or reject pending quizzes. Approval creates a real Quiz plus its Question documents. Rejection saves a reason that becomes future memory for the planner.
Nothing is auto-published. Human review is the final gate.
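The approve and reject paths can be sketched as one pure function. Every type and field name here (`PendingQuiz`, `correctIndex`, `sourcePendingId`, and so on) is an assumption for illustration, not Quizco's actual schema; the sketch only shows the two outcomes the post describes.

```typescript
// Hypothetical shapes; field names are assumptions, not the real schema.
interface PendingQuestion { prompt: string; options: string[]; correctIndex: number }
interface PendingQuiz {
  id: string;
  title: string;
  questions: PendingQuestion[];
  status: "pending" | "approved" | "rejected";
  rejectionReason?: string;
}
interface PublishedQuiz { title: string; sourcePendingId: string }
interface PublishedQuestion extends PendingQuestion { quizTitle: string }

type ReviewAction = { kind: "approve" } | { kind: "reject"; reason: string };

interface ReviewResult {
  pending: PendingQuiz;
  quiz?: PublishedQuiz;
  questions: PublishedQuestion[];
}

// Apply a human decision. Only approval produces publishable documents;
// rejection keeps the reason so it can feed planner memory later.
function applyReview(pending: PendingQuiz, action: ReviewAction): ReviewResult {
  if (pending.status !== "pending") {
    throw new Error(`quiz ${pending.id} was already reviewed`);
  }
  if (action.kind === "reject") {
    return {
      pending: { ...pending, status: "rejected", rejectionReason: action.reason },
      questions: [],
    };
  }
  return {
    pending: { ...pending, status: "approved" },
    quiz: { title: pending.title, sourcePendingId: pending.id },
    questions: pending.questions.map((q) => ({ ...q, quizTitle: pending.title })),
  };
}
```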

Planning before generation

The most important design decision was making planning a first-class step.

Instead of asking a model to directly invent quizzes, I built a planner that can inspect the state of the platform first. That planner gets a small tool belt and can decide that the correct answer is to do nothing.

get_platform_state
get_recent_agent_history
get_agent_memory
web_search

That mattered for two reasons.

First, the agent needs to respect product reality. If there are already too many pending quizzes, or if a topic was just rejected, generating more of the same thing is noise.

Second, freshness is uneven. Some topics can be chosen from internal demand alone, while others benefit from current web context. The planner only reaches for web search when that freshness actually improves the decision.
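The internal part of that tool belt could be wired up roughly as follows. The three tool names come from the list above and the definitions use the flat function-tool shape of the OpenAI Responses API; the handlers, their payloads, and the `runPlannerTool` dispatcher are made up for the example. `web_search` is left out on the assumption that it is the API's built-in hosted tool, so the backend does not execute it itself.

```typescript
// The three internal tool names come from the post; everything else is illustrative.
const internalToolNames = [
  "get_platform_state",
  "get_recent_agent_history",
  "get_agent_memory",
] as const;
type InternalToolName = (typeof internalToolNames)[number];

// Function-tool definitions with empty parameter schemas.
const plannerTools = internalToolNames.map((name) => ({
  type: "function" as const,
  name,
  description: `Internal Quizco tool: ${name}`,
  parameters: { type: "object", properties: {}, required: [], additionalProperties: false },
  strict: true,
}));

// Hypothetical handlers returning canned data; real ones would query the database.
const toolHandlers: Record<InternalToolName, () => Promise<unknown>> = {
  get_platform_state: async () => ({ pendingCount: 3, topTags: ["space"] }),
  get_recent_agent_history: async () => ({ lastAction: "stand_down" }),
  get_agent_memory: async () => ({ rejectedTopics: ["celebrity gossip"] }),
};

// Execute one function call requested by the planner and return a JSON string,
// the form in which a tool result is sent back to the model.
async function runPlannerTool(name: string): Promise<string> {
  if (!Object.prototype.hasOwnProperty.call(toolHandlers, name)) {
    return JSON.stringify({ error: `unknown tool: ${name}` });
  }
  return JSON.stringify(await toolHandlers[name as InternalToolName]());
}
```

Unknown tool names produce an error payload instead of an exception, so a malformed call from the model cannot crash the run.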

I also added a deterministic fallback ranker. If the planner call fails, Quizco can still choose topics based on attempts, active quiz counts, and penalties for recently pending, rejected, or already-approved topics. That keeps the system resilient instead of brittle.
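A deterministic ranker of that kind might look like the sketch below. The inputs (attempts, active quiz counts, and the three penalty flags) follow the description above, but the formula and the penalty weights are purely illustrative assumptions, as are the names `TopicSignal`, `scoreTopic`, and `rankTopics`.

```typescript
interface TopicSignal {
  topic: string;
  attempts: number;          // how often quizzes on this topic were played
  activeQuizCount: number;   // how many live quizzes already cover it
  recentlyPending: boolean;
  recentlyRejected: boolean;
  recentlyApproved: boolean;
}

// Score = demand per existing quiz, scaled down by penalties.
// The weights below are assumptions, not values from the post.
function scoreTopic(s: TopicSignal): number {
  const demandPerQuiz = s.attempts / (s.activeQuizCount + 1);
  let penalty = 0;
  if (s.recentlyPending) penalty += 0.5;
  if (s.recentlyRejected) penalty += 0.7;
  if (s.recentlyApproved) penalty += 0.3;
  return demandPerQuiz * Math.max(0, 1 - penalty);
}

// Highest score first; ties are broken alphabetically so the output is stable.
function rankTopics(signals: TopicSignal[], limit: number): string[] {
  return [...signals]
    .sort((a, b) => scoreTopic(b) - scoreTopic(a) || a.topic.localeCompare(b.topic))
    .slice(0, limit)
    .map((s) => s.topic);
}
```

Because there is no model call and no randomness, the same platform state always yields the same topics, which is the property a fallback path needs.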

Generation is only one stage of the pipeline

Once the planner selects topics, the writer prompt asks for a strict quiz JSON object, not free-form prose.

The draft must include:

a title and description
a topic and tags
a format such as standard, speed_round, deep_dive, or streak
an agentConfidence score
exactly 5 questions
exactly 4 options per question
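Those requirements translate naturally into a type plus a structural check. In this sketch the format names, `agentConfidence`, and the 5-question and 4-option counts come from the list above; the remaining field names (including `correctIndex`) and the `validateStructure` helper are assumptions.

```typescript
type QuizFormat = "standard" | "speed_round" | "deep_dive" | "streak";

interface DraftQuestion {
  prompt: string;
  options: string[];
  correctIndex: number; // assumed way of marking the right answer
}

interface QuizDraft {
  title: string;
  description: string;
  topic: string;
  tags: string[];
  format: QuizFormat;
  agentConfidence: number;
  questions: DraftQuestion[];
}

// Return a list of structural problems; an empty list means the draft is well-formed.
function validateStructure(draft: QuizDraft): string[] {
  const issues: string[] = [];
  if (draft.questions.length !== 5) {
    issues.push(`expected 5 questions, got ${draft.questions.length}`);
  }
  const seenPrompts = new Set<string>();
  draft.questions.forEach((q, i) => {
    const key = q.prompt.trim().toLowerCase();
    if (seenPrompts.has(key)) issues.push(`question ${i + 1} duplicates an earlier prompt`);
    seenPrompts.add(key);
    if (q.options.length !== 4) {
      issues.push(`question ${i + 1} has ${q.options.length} options, expected 4`);
    }
    if (new Set(q.options).size !== q.options.length) {
      issues.push(`question ${i + 1} has repeated options`);
    }
    if (!Number.isInteger(q.correctIndex) || q.correctIndex < 0 || q.correctIndex >= q.options.length) {
      issues.push(`question ${i + 1} has an invalid correctIndex`);
    }
  });
  return issues;
}
```

Returning the issues as strings, instead of a bare boolean, gives the writer something concrete to act on when a revision is requested.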

After that, the draft goes through multiple filters:

safety validation blocks harmful or tragedy-heavy topics
structure validation catches broken question counts, duplicate prompts, or invalid options
citation validation ensures web-influenced drafts still carry sources
the verifier uses web search to fact-check questions that contain factual claims
the embedding check blocks drafts that are too close to existing quizzes
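One way to chain those filters is an ordered list of gates that stops at the first failure, so the cheap checks run before the ones that need web search or embeddings. The `Gate` interface and `runGates` function are hypothetical, and running the gates in exactly this order with short-circuiting is an assumption, not something the post states.

```typescript
// A gate inspects a draft and returns a list of issues (empty means it passed).
interface Gate<D> {
  name: string;
  check: (draft: D) => Promise<string[]>;
}

interface GateOutcome {
  passed: boolean;
  failedGate?: string;
  issues: string[];
}

// Run gates in order and stop at the first one that reports issues.
async function runGates<D>(draft: D, gates: Gate<D>[]): Promise<GateOutcome> {
  for (const gate of gates) {
    const issues = await gate.check(draft);
    if (issues.length > 0) {
      return { passed: false, failedGate: gate.name, issues };
    }
  }
  return { passed: true, issues: [] };
}
```

Recording which gate failed also gives the reviewer-facing verification report a natural place to say why a draft was revised or dropped.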

This was the difference between “AI content” and “reviewable product content.” I wanted the agent to earn its way into the inbox.

Human review was non-negotiable

I built a dedicated React admin surface at /agent instead of hiding the workflow behind logs.

The dashboard is split into a few focused views:

Briefing: summarizes the latest run and current agent state
Quiz Inbox: shows pending quizzes with confidence, topic context, and approve/reject actions
Skipped: records why the agent chose not to generate certain candidates
Run History: exposes recent runs for auditability
Platform Health: shows preflight status, last run info, and run controls

Quizco AI Agent dashboard showing the Quiz Inbox with pending quizzes and approve/reject controls.

That dashboard turns the agent from a black box into an inspectable system. If a quiz is rejected, the rejection reason is saved. If a quiz is approved, it is converted into a real Quiz and its Question records.

Workflow

The learning loop after each run

The agent gets better because reviews and quiz performance are written back into memory, not because the model magically remembers.

01
Approval and rejection outcomes are captured explicitly
Each pending quiz ends up as approved, rejected, or still pending, and rejected quizzes carry a human-written reason.
This gives the system concrete product feedback instead of vague intuition.
02
Agent memory is rebuilt from recent runs and review outcomes
After each run and after each review action, I refresh a singleton memory document that summarizes topic performance, review insights, and recent run summaries.
The memory stores both what the agent tried and how humans responded to it.
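A rebuild step like this can be written as a pure fold over review outcomes and run summaries. The sketch below is an assumption about the shape of that document: `ReviewOutcome`, `TopicStats`, `AgentMemorySnapshot`, and the `keepRuns` window are all hypothetical names and values.

```typescript
interface ReviewOutcome {
  topic: string;
  status: "approved" | "rejected" | "pending";
  reason?: string; // human-written, present on rejections
}

interface TopicStats {
  approved: number;
  rejected: number;
  pending: number;
  rejectionReasons: string[];
}

interface AgentMemorySnapshot {
  topicPerformance: Record<string, TopicStats>;
  recentRuns: string[];
}

// Rebuild the whole snapshot from scratch each time instead of patching it,
// so the memory can never drift from the underlying review records.
function rebuildMemory(
  outcomes: ReviewOutcome[],
  runSummaries: string[],
  keepRuns = 5, // assumed window size
): AgentMemorySnapshot {
  const topicPerformance: Record<string, TopicStats> = {};
  for (const o of outcomes) {
    const stats = (topicPerformance[o.topic] ??= {
      approved: 0,
      rejected: 0,
      pending: 0,
      rejectionReasons: [],
    });
    stats[o.status] += 1;
    if (o.status === "rejected" && o.reason) stats.rejectionReasons.push(o.reason);
  }
  return { topicPerformance, recentRuns: runSummaries.slice(-keepRuns) };
}
```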
03
Published agent quizzes create new attempt signals
Once an approved quiz is live, players interact with it. Those attempts become the next round of topic demand and performance data.
This closes the loop between generated content and real user behavior.
04
The planner reads that memory before choosing again
On the next cycle, the planner sees recent approvals, rejections, pending load, and topic performance before selecting new angles.
That is what makes the system iterative instead of stateless.

Why the data model mattered

I split the system into a few purpose-built collections so the pipeline stays auditable.

Collection	Why it exists	What it stores
`AgentRun`	Run-level observability	trigger, duration, planner action, topics selected, citations, skips, errors
`AgentMemory`	Durable learning between runs	topic performance, review insights, recent runs
`QuizPending`	Human review buffer	generated draft, citations, verification report, confidence, review status
`Quiz` + `Question`	Published learning content	approved quizzes and their questions
`Recommendation`	Recommendation surface for players	recommendation metadata and expiry windows
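To make the `AgentRun` row concrete, here is one possible TypeScript shape for a run record. The field names are guesses derived from the "What it stores" column, the example trigger values are invented, and `finishRun` is a hypothetical helper; the real schema may differ.

```typescript
// Hypothetical field names derived from the table above.
interface AgentRunRecord {
  trigger: string; // what started the run, e.g. "cron" or "manual" (assumed values)
  durationMs: number;
  plannerAction: "generate_quizzes" | "stand_down" | "observe";
  topicsSelected: string[];
  citations: string[];
  skips: { topic: string; reason: string }[];
  errors: string[];
}

// Close out a run by computing its duration from start and end timestamps.
function finishRun(
  startedAtMs: number,
  endedAtMs: number,
  details: Omit<AgentRunRecord, "durationMs">,
): AgentRunRecord {
  return { ...details, durationMs: Math.max(0, endedAtMs - startedAtMs) };
}
```

Keeping skips and errors on the same record as the planner action is what lets a single run be audited without joining across collections.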

That last piece is worth calling out honestly: I already laid down the Recommendation model and the toast-based recommendation UI in the frontend, but the current runAgentCycle is still focused on generating and reviewing quizzes rather than automatically dispatching recommendations. The surface is there; the publishing loop came first.

What I would improve next

If I keep pushing Quizco’s agent further, the next upgrades are pretty clear.

automatic recommendation dispatch so approved agent content can reach the right users without manual glue
richer health analytics so the dashboard matches the ambition of the backend pipeline
stronger reviewer tooling around why a draft passed validation and which sources were used to verify it

The biggest takeaway from building this was simple: the useful part of an AI agent is not the model call. It is the loop around the model call.

Planning, guardrails, validation, memory, and review are what made Quizco’s agent feel dependable enough to ship.
