How I Used AI to Generate, Moderate, and Scale an AI-Powered Quiz App
- Meltem Seyhan
- May 26
- 9 min read
As a solo founder building Quiz Inn, an AI-powered quiz app for curious minds and competitive spirits, I’ve relied heavily on artificial intelligence—not for gimmicks, but for core functionality that scales, personalizes, and enhances the user experience. From generating over 65,000 distinct questions to moderating more than 126,000 artworks for safety, AI enables me to build what would otherwise require a full team.
Here’s a behind-the-scenes look at how AI powers Quiz Inn.
🎯 1. AI-Powered Question Generation with RAG
Since Quiz Inn is an AI-powered quiz app, every quiz in it starts with AI-generated content, tailored by category-specific prompts and enriched with Retrieval-Augmented Generation (RAG). I currently support two main categories:
Art History, which draws from 120,000+ artworks, 71 distinct styles, and over 3,000 artists
Football, featuring weekly-updated data from 9 major leagues and cup competitions, covering clubs, matches, players, and transfers
The covered competitions are:
Süper Lig (TR1)
La Liga (ES1)
Premier League (GB1)
Bundesliga (L1)
Serie A (IT1)
Ligue 1 (FR1)
UEFA Champions League (CL)
UEFA Europa League (EL)
UEFA Europa Conference League (UCOL)
My Journey: From Naïve Prompts to Structured Generation
When I first started building Quiz Inn, I began with the Art History category. At that time, I naively assumed that AI alone—without any supporting data—could generate rich, diverse quiz questions. I quickly realized that wasn’t the case.
No matter how I phrased the prompt, the AI kept returning the same kind of questions—always about Starry Night or Mona Lisa. It felt like I was stuck in a loop of the most famous artworks, with little variety or depth.
That’s when I realized I needed more control.
I introduced RAG to inject specific knowledge into the process. Now, I control which artwork, style, or artist is used during question generation. This not only ensures factual accuracy but also allows for thousands of unique, lesser-known but equally important questions to be created.
This shift fundamentally changed the quality of the app. Today, users can explore everything from Romanesque architecture to Abstract Expressionism, not just the usual suspects.
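The RAG step can be sketched roughly as follows: the app, not the model, picks the artwork record, and the prompt forces the question onto that record. The record fields and the `build_prompt` helper are illustrative assumptions, not the actual Quiz Inn schema.

```python
def build_prompt(artwork: dict, difficulty: str) -> str:
    """Assemble a generation prompt grounded in a retrieved artwork record.

    Because the record is selected by the app rather than the model, the
    question is steered toward a specific, possibly lesser-known work
    instead of the usual famous defaults.
    """
    facts = "\n".join(f"- {key}: {value}" for key, value in artwork.items())
    return (
        f"Using ONLY the facts below, write one {difficulty} multiple-choice "
        "Art History question with four options and one correct answer.\n"
        f"Facts:\n{facts}"
    )

# Example: a retrieved record for a Gothic-era work (sample data).
record = {"title": "Maestà", "artist": "Duccio", "style": "Gothic", "year": 1311}
prompt = build_prompt(record, "medium")
```

The same pattern generalizes to the Football category by swapping the artwork record for a structured match or transfer record.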
🧠 2. Duplicate Detection via Embedding Similarity
To maintain a diverse quiz experience—especially in the Art History category, where many artworks share stylistic or thematic similarities—I use OpenAI’s embedding model to represent each question as a high-dimensional vector. Before accepting a new question into the system, I calculate its cosine similarity to all previously stored questions.
If the similarity is above a certain threshold, the question is discarded and regenerated. This ensures that the 4,500+ Art History questions remain meaningfully distinct, avoiding repetitive content.
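The core of the check is a cosine-similarity comparison against the stored embeddings. A minimal sketch with NumPy, where the 0.92 threshold is an illustrative placeholder rather than the tuned production value:

```python
import numpy as np

def is_duplicate(candidate: np.ndarray, stored: np.ndarray,
                 threshold: float = 0.92) -> bool:
    """Return True if the candidate embedding is too close to any stored one.

    `stored` is an (n, d) matrix of existing question embeddings. The
    threshold here is illustrative; the real cutoff is tuned per category.
    """
    # Normalize both sides so the dot product equals cosine similarity.
    c = candidate / np.linalg.norm(candidate)
    s = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    return bool((s @ c).max() >= threshold)
```

A duplicate triggers regeneration; otherwise the embedding is appended to the store. At scale this brute-force scan is exactly what becomes too slow, which is where the vector database described below comes in.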
My Journey: From Slow Queries to Scalable Vector Search
Initially, I stored the embeddings directly in a column of a relational database (RDBMS) and performed distance calculations in Python. While this approach worked when I had a few hundred questions, it quickly became unsustainable as the dataset grew into the thousands.
Similarity checks became painfully slow, and the system couldn’t keep up with the growing demand.
That’s when I turned to Vertex AI’s Vector Search on Google Cloud Platform. By moving the embeddings into a dedicated vector database, I achieved both scalability and speed. Now I can handle tens of thousands of similarity comparisons in real time, without compromising performance.
This upgrade was a turning point—it unlocked the ability to generate questions at scale while keeping the content fresh, unique, and relevant.
📊 3. Difficulty Calibration After Generation
Even when I request a specific difficulty level (e.g., easy, medium, hard) during question generation, AI doesn’t always interpret difficulty the way a human player would. To address this, I added a second layer of validation:
I provide AI with labeled example questions across each difficulty level
The model then re-evaluates newly generated questions and assigns a corrected difficulty label
This two-pass strategy helps maintain a balanced difficulty curve, ensuring players are neither overwhelmed nor bored as they progress through levels.
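The second pass boils down to a few-shot relabeling prompt: labeled examples per level, then the new question to re-grade. The structure below is a simplified assumption of how such a prompt can be assembled, not the exact production prompt.

```python
def relabel_prompt(question: str, examples: dict) -> str:
    """Build the second-pass calibration prompt.

    `examples` maps a difficulty label (e.g. "easy") to a list of
    human-vetted example questions at that level. The model sees the
    anchors first, then assigns a corrected label to the new question.
    """
    blocks = []
    for level, questions in examples.items():
        shots = "\n".join(f"  - {q}" for q in questions)
        blocks.append(f"{level.upper()} examples:\n{shots}")
    return (
        "\n\n".join(blocks)
        + "\n\nAssign one label (easy, medium, hard) to this question:\n"
        + question
    )
```

The key idea is that "easy" is defined by concrete anchor questions rather than left to the model's own interpretation.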
My Journey: Difficulty Isn’t Always What It Seems
When I first built the generation prompts, I assumed a simple instruction like “generate an easy Art History question” would be enough. I trusted the AI to understand what “easy” means for a casual quiz player.
It didn’t.
What I labeled as “easy” ended up being far too hard for most users. Questions that required deep knowledge of obscure artworks were appearing right at the start of the game—frustrating players and driving them away.
That’s when I realized that difficulty needed to be calibrated explicitly, not just assumed. By adding a second validation step and training the model with real examples from each difficulty level, I was finally able to tune the challenge to match user expectations.
This change significantly improved user retention and made the learning experience much more enjoyable.
🚫 4. Image Moderation for 126,900 Artifacts
Since many questions in the Art History category feature zoomable high-resolution images, safety is a top concern—especially when publishing on platforms like the Play Store or App Store. To ensure compliance, I use an AI-based moderation service that flags images containing:
Violence / Graphic content
Harassment or Threats
Hate speech
Sexual content / minors
Self-harm or suicide-related content
Any artifact flagged in these categories is automatically excluded from quiz generation. This allows Quiz Inn to remain a safe and appropriate app for all ages—even with 126,900 artworks under review.
My Journey: From Rejections to Resilience
When I submitted the first version of Quiz Inn to the Play Store, it was approved without issues. At that time, I had filled out the content questionnaire indicating that the app was suitable for all audiences—including children.
But things changed.
In later versions, I began receiving rejections. Some of the questions featured classical paintings—especially from the Baroque era, such as Caravaggio’s works—that the review team flagged for nudity or violence.
Caught off guard, I spent an entire weekend manually reviewing and removing thousands of questions. I even had to pause question generation entirely to meet the platform’s content standards and regain approval.
It was clear that I couldn’t continue this way.
That’s when I integrated OpenAI’s Moderation API. Instead of manually reviewing 126,900 artworks, I used AI to automatically scan and label each one. Now, only the pre-approved, safe images are used in question generation—and I can confidently scale the database without fear of future rejections.
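The decision logic applied to each moderation result is simple: if any blocked category is flagged, the artwork is excluded. The sketch below uses a response shape modeled loosely on the Moderation API's per-category boolean flags; the exact category set and schema used in Quiz Inn are assumptions here.

```python
# Categories that exclude an artwork from quiz generation. The names mirror
# moderation category keys; the exact production set may differ.
BLOCKED = {
    "violence/graphic", "harassment/threatening", "hate",
    "sexual", "sexual/minors", "self-harm",
}

def is_safe(category_flags: dict) -> bool:
    """True if no blocked category was flagged for this image."""
    return not any(category_flags.get(cat, False) for cat in BLOCKED)

# Example: a simplified moderation result for one artwork.
result = {"sexual": False, "violence/graphic": True, "hate": False}
# is_safe(result) is False, so this artwork is excluded from generation.
```

Persisting these flags per artwork is what makes the moderation map reusable, for example for the age-group filtering described below.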
Having a complete moderation map for all artworks also opens up new possibilities. In the future, I’ll be able to dynamically adjust content visibility by age group, showing broader content to adults while keeping a curated, kid-friendly experience for younger users. Moderation became not just a safety tool, but a foundation for smarter, more inclusive content delivery.
🌍 5. AI-Powered Multilingual Support
After questions are generated in English, I translate them into Spanish, German, Turkish, Italian, and French using OpenAI’s Batch API. Each translation uses custom prompts tailored to the quiz category, ensuring football-specific terminology and art-historical references are translated appropriately for each language.
This allows players across six languages to receive equally engaging and accurate questions, whether they’re playing on iOS, Android, or Web.
My Journey: From Costly Requests to Scalable Translations
In the early days, I handled translations using OpenAI’s standard chat completion API. It worked well—but it was too expensive and too slow for scaling to tens of thousands of questions.
Everything changed when OpenAI introduced its batch API.
After some experimentation, I migrated the translation pipeline to batch processing. Just last week, I added Italian and French as two new supported languages. After a few small UI updates, I translated all 66,525 existing questions using OpenAI batch requests—completing the full job within 24 hours.
Today, every newly generated question is automatically translated as part of its creation flow.
But there was one critical catch I discovered during this process: the batch API only supports up to 50,000 requests per job. So, I had to update my backend logic to automatically chunk large datasets into multiple batch files. It was a small but important adjustment that made the system much more reliable and future-proof.
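The chunking logic itself is small. A minimal sketch, with the 50,000-per-job cap matching the limit mentioned above; the helper name and shape are illustrative, not the actual backend code:

```python
def chunk_requests(requests: list, max_per_batch: int = 50_000) -> list:
    """Split a flat list of translation requests into batch-sized jobs.

    Each sublist becomes one batch job, so a dataset of any size maps to
    however many jobs the per-job request limit requires.
    """
    return [requests[i:i + max_per_batch]
            for i in range(0, len(requests), max_per_batch)]

# 66,525 questions split into two jobs: 50,000 + 16,525 requests.
jobs = chunk_requests(list(range(66_525)))
```

A small adjustment, but it means the pipeline keeps working no matter how large the question bank grows.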
This upgrade was a milestone—both in terms of localization and infrastructure maturity. Now, Quiz Inn can truly serve a global audience, at scale, without compromise.
🎨 6. Personalizing UI with Club Colors
Football fans love representing their favorite clubs—and in Quiz Inn, I wanted to bring that passion into the experience. To do that, I used AI to extract and store the official RGB colors of each club from visual materials like logos and banners.
These colors are then used to personalize the app UI, including the AppBar and key highlights, depending on the user’s selected team. Whether you’re a Galatasaray, Real Madrid, or Bayern Munich fan, the app visually aligns with your loyalty.
It’s a subtle but powerful way to enhance immersion and boost user retention—users feel more connected to the app when it reflects their identity.
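Conceptually, the extraction step reduces to finding the most frequent color in sampled logo pixels. The toy version below is a deliberately simple stand-in for what the AI does against real logos and banners; the function and sample data are illustrative.

```python
from collections import Counter

def dominant_rgb(pixels: list) -> tuple:
    """Return the most frequent RGB triple in a list of sampled pixels.

    In practice the pixels would be sampled from a club's logo or banner;
    the stored result is the RGB value used to theme the AppBar.
    """
    return Counter(pixels).most_common(1)[0][0]

# A toy logo sample dominated by a red tone (sample data).
sample = [(230, 30, 40)] * 5 + [(255, 255, 255)] * 2
# dominant_rgb(sample) picks the red triple.
```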
My Journey: The AI Was Easy—Flutter, Not So Much
Surprisingly, the AI part of this feature was the easiest. Extracting dominant club colors and storing them as RGB values worked as expected.
The real challenge? Making it actually work in Flutter.
Updating the UI in real time to reflect the selected team’s color, especially after login changes or the transition from anonymous to authenticated users, proved trickier than I anticipated. Refreshing the AppBar color at just the right moment, without glitches, was the hard part.
But that’s a whole other story—one for a Flutter-focused post.
From an AI perspective, though, this was a clean win: a simple idea that adds a delightful touch for football lovers.
🏷️ 7. AI Labeling for Smart Metadata Tagging
Every question in Quiz Inn is automatically tagged with contextual metadata either during or after generation. For example, in the football category, these tags include:
Season (e.g., 2023–24)
Competition (e.g., Premier League, Champions League)
Match / Game
Player(s) and Club(s) involved
This metadata enables highly personalized gameplay. For instance, you can restrict a multiplayer quiz to only Champions League matches from the current season—or instantly generate a quiz centered around Messi’s career.
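With tags attached at creation time, the filtering itself is a straightforward match on metadata fields. A minimal sketch, where the field names (`season`, `competition`) are illustrative rather than the actual schema:

```python
def filter_questions(questions: list, **tags) -> list:
    """Select questions whose metadata matches every given tag."""
    return [q for q in questions
            if all(q.get(key) == value for key, value in tags.items())]

# Sample pool with illustrative metadata fields.
pool = [
    {"text": "Which club won the final?", "season": "2023-24",
     "competition": "Champions League"},
    {"text": "Who topped the scoring chart?", "season": "2022-23",
     "competition": "Premier League"},
]

# Restrict a quiz to current-season Champions League questions.
current_cl = filter_questions(pool, season="2023-24",
                              competition="Champions League")
```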
My Journey: From Blind Trust to Data-Driven Tagging
In the early days, I was only working with Art History, and things were simpler. Users weren’t asking for anything more than a difficulty setting—no filtering by period, style, or artist.
But as the app grew, so did user expectations.
I wanted users to select an art period or style (say, Impressionism or Baroque) and get questions directly tied to their choice. At first, I relied too much on the AI to self-categorize the questions after generation. I expected it to infer and tag each question with the correct period, style, and artist.
It didn’t work well. Misclassifications were common, and it was clear I needed a better strategy.
Things changed when I switched to RAG (Retrieval-Augmented Generation). Now, I control the context up front, feeding in the exact artifact or artist the question should be based on. As a result, metadata tagging happens automatically and accurately, with far less risk of mislabeling.
When I later introduced the Football category, I applied this learning from the start. Since football data is structured and fully under my control, I simply feed it into the AI and ask it to convert the information into natural-language questions. The tagging? Already embedded in the source.
So now, every question—whether it’s about Van Gogh or Vinícius Jr.—is born with rich, reliable metadata, making advanced filtering and custom quizzes fast and accurate.
✅ 8. Quality Validation: From AI Checks to User Feedback
Generating quiz questions with AI is powerful—but ensuring they’re high-quality, accurate, and well-formed is just as important. That’s why Quiz Inn includes a validation pipeline after each question is generated.
Initially, this process had two layers:
Rule-Based Validation: A lightweight, programmatic check to ensure that the question has:
Exactly four answer options
One correct answer, clearly marked
An associated image (if applicable)
Proper formatting and non-empty fields
AI-Based Validation: For deeper verification, I used a second call to a more advanced AI model, asking it to assess whether the question was valid, clear, and aligned with the intended difficulty and category.
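The rule-based layer can be sketched as a handful of structural checks; field names below are assumptions about the question schema, not the actual production code:

```python
def passes_rule_checks(q: dict) -> bool:
    """Lightweight structural validation of a generated question.

    Mirrors the checks listed above: four options, a correct answer that
    appears among them, and no empty fields.
    """
    options = q.get("options", [])
    return (
        len(options) == 4                        # exactly four answer options
        and q.get("answer") in options           # correct answer clearly marked
        and bool(q.get("text", "").strip())      # non-empty question text
        and all(o.strip() for o in options)      # no blank options
    )
```

Questions failing these checks are discarded cheaply before any further (and more expensive) AI-based review.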
My Journey: Why I Pulled Back from Over-Validation
While this two-step approach initially gave me confidence in the content quality, I soon noticed a problem: too many false positives.
The second (AI-based) validation step often flagged good, valid questions as problematic—especially in more nuanced categories like Art History. As a result, I was discarding high-quality content unnecessarily, reducing the diversity and richness of the question pool.
After analyzing the feedback and performance impact, I decided to disable the second validation step for now. Instead, I’ve leaned into a more community-driven approach.
🧩 Final Thoughts
Building a rich, scalable quiz app like Quiz Inn would typically require a full team of writers, translators, designers, moderators, and QA testers. Thanks to AI, I’ve been able to streamline all of this—automating question generation, moderation, translation, difficulty calibration, and metadata tagging—largely as a solo founder.
But this journey wasn’t just about applying the latest AI tools. It was about learning their limits, adapting workflows, and engineering practical solutions when things didn’t work as expected.
I assumed early on that AI could “just handle it”—from generating perfectly tagged questions to validating their quality. That assumption didn’t hold. Along the way, I had to:
Abandon expensive validation layers when false positives hurt more than they helped
Switch from naive prompting to RAG-based generation for reliability
Complement AI moderation with real-world App Store compliance experience
Build user feedback mechanisms to close the loop where AI still falls short
By combining factual datasets with AI’s generative power—and constantly iterating through real user feedback—I’ve built a platform where people can truly learn while playing, whether they’re zooming in on a surrealist masterpiece or challenging themselves with up-to-date football trivia.
And I’m just getting started.
🎮 Want to try it? Play instantly on Web or download Quiz Inn on iOS and Android—and start your journey, one question at a time.
📬 Have questions or want to hear more details? Feel free to reach out: meltem.seyhan@mlteam.ai