It’s well known that most startups fail within a few years of founding. In startup post-mortems, the #1 reason cited for failure is “ran out of cash,” followed by “no market need.” However, since “no market need” (or the ongoing struggle to find one) is usually why businesses run out of cash in the first place, it is not much of a leap to identify lack of market need as the real top reason for failure.
Personally, this is something that has never made sense to me. In a world in which so much information is floating around online, and people explicitly state their problems, why is it so hard to build something that people actually want? While this problem is multi-faceted and has many potential answers (e.g. people don’t know what they want, people may want something but be unwilling to pay, etc.), in this article, I walk through a solution I developed using Apify and ChatGPT to analyze user reviews and social media comments to identify gaps in the market.
This article is focused on B2C products (specifically, consumer packaged goods), but I intend to write a similar article focused on B2B. Anyone not interested in reading the full article can find a summary here.
Step 1: Identify the data to track
In building my solution, I focused on a unique niche: protein pancakes. My fiancé has created an Instagram account devoted to creating the perfect pancake recipe, and I wanted to find insights that would help with this.

Based on the niche (B2C, consumer packaged goods), I decided three data sources would be especially useful for insights: YouTube comments, Amazon product reviews, and Walmart product reviews. Some honorable mentions, which I might add for future iterations of this product, include Reddit comments, TikTok comments, and Facebook Page reviews.
Step 2: Find the right tool for the job
I have tested a plethora of web scrapers, including Phantombuster, Browse AI, and Claygent (Clay’s AI web scraper). For this specific project, I went with Apify since it has the largest ecosystem of web scrapers, AI agents, and automation tools available. Filtering for scrapers that worked with ‘Reviews’ left me with hundreds of options.
Apify’s scrapers are known as “Actors,” and the specific Actors I chose for my job were the YouTube Comments Scraper, Amazon Reviews Scraper, and Walmart Reviews Scraper.
Step 3: Identify the content to scrape
Each of the Actors requires inputs to get started. For the specific Actors I chose, the inputs were as follows:
YouTube Comments Scraper: The URLs of the YouTube videos to scrape.
Amazon Reviews Scraper: The ASINs of the products to scrape.
Walmart Reviews Scraper: The URLs of the product pages to scrape.
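As a sketch, these inputs can be assembled programmatically before handing them to Apify’s API or client library. Note that the field names below (`videoUrls`, `asins`, `productUrls`) and the ASIN values are illustrative placeholders, not the Actors’ real input schemas; each Actor documents its own schema on its Apify page.

```python
# Sketch: assemble run inputs for each Apify Actor.
# NOTE: field names and ASINs are hypothetical examples; check each
# Actor's documented input schema on its Apify page.

youtube_input = {
    "videoUrls": [
        "https://www.youtube.com/watch?v=VIDEO_ID_1",
        "https://www.youtube.com/watch?v=VIDEO_ID_2",
    ],
}

amazon_input = {
    "asins": ["B01ARERFGS", "B00DJZ9X3C"],  # hypothetical ASINs
}

walmart_input = {
    "productUrls": [
        "https://www.walmart.com/ip/protein-pancake-mix/123456",
    ],
}

# Each dict would then be passed as the Actor's run input, e.g. via
# Apify's Python client: client.actor("<actor-id>").call(run_input=...)
```

Building the inputs as plain dictionaries like this also makes it easy to swap in new videos or products for later runs without touching the scraping logic.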
For YouTube, I did a few searches on the platform, including “protein pancake reviews,” “best protein pancakes,” and “Kodiak protein pancake alternatives,” and copied the URLs. I reasoned that videos specifically focused on reviews and comparisons would attract a lot of comments with user preferences and critical feedback.
For Amazon, I searched for “protein pancakes,” identified some good products, and then found the ASIN (i.e. product ID) near the bottom of each page. For Walmart, the process was similar, except I identified relevant product URLs instead of ASINs.
Step 4: Scrape the content
Scraping the content was easy once I completed steps 1–3. Unsurprisingly, Amazon product reviews had the most data of the three sources, with some products having thousands of reviews. Given the sheer volume of data, I filtered Amazon product reviews to 1–3 stars, since I assumed these would be the most critical and the most likely to point out gaps in the market. My video below should clear things up for anyone confused about what Apify does or what automated web scraping entails.
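The same 1–3 star filter can also be applied after the fact to a scraper’s CSV export. A minimal sketch with inline sample data; the column names (`rating`, `text`) are assumptions, and the actual export may use different field names:

```python
import csv
import io

# Sketch: keep only critical (1-3 star) reviews from a CSV export.
# Column names ("rating", "text") are assumptions about the export.
sample_csv = """rating,text
5,Best pancakes ever
2,Way too dry and chalky
1,Package arrived torn open
4,Pretty good with some vanilla
3,Bland without add-ins
"""

reader = csv.DictReader(io.StringIO(sample_csv))
critical = [row for row in reader if 1 <= int(row["rating"]) <= 3]

for row in critical:
    print(row["rating"], row["text"])
```

For a real file, `io.StringIO(sample_csv)` would be replaced with `open("reviews.csv")`.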
Step 5: Feed the data into ChatGPT
Most people are well-acquainted with ChatGPT but find the topic of AI agents abstract and confusing. As I have said previously, most use cases don’t require AI agents, and simply feeding the data into ChatGPT for processing is sufficient.
One scenario where manually feeding data into ChatGPT breaks down is when the ChatGPT context window is exceeded and it begins to forget earlier parts of the conversation. However, since early 2024, ChatGPT has had a context window of 128,000 tokens, which is substantial. For non-technical readers, inputs into ChatGPT are broken down into tokens for processing, and one token is roughly three-quarters of a word of English text. This means a ChatGPT thread can process around 96,000 words, or 192 pages of text (assuming 500 words per page), before it starts forgetting earlier parts of the conversation. Unless you are building an enterprise-grade application with enormous amounts of data, you are unlikely to bump into limitations with the context window.
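The arithmetic above is simple enough to sanity-check in a few lines. The 0.75 words-per-token ratio is a rough heuristic for English text, not an exact figure:

```python
# Back-of-the-envelope context-window math. The 0.75 words-per-token
# ratio is a rough English-text heuristic, not an exact figure.
CONTEXT_TOKENS = 128_000
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

max_words = int(CONTEXT_TOKENS * WORDS_PER_TOKEN)
max_pages = max_words // WORDS_PER_PAGE

print(max_words, max_pages)  # 96000 192
```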
But when I attempted to feed the data into ChatGPT by copying and pasting directly from my CSV file, I ran into repeated issues and was told:
“It looks like the OCR extraction didn’t yield any text. The image might be too low in quality or contain text that isn’t easily readable by the OCR tool.”
My solution was to copy and paste the text from my CSV file into a Google Doc, download the doc as a PDF, and then import the PDF into ChatGPT. This worked flawlessly. My prompt to ChatGPT was as follows (I also used a follow-up prompt, “Are there any other insights you can share?”):
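A scripted alternative to the Google Doc detour is to flatten the CSV into plain text yourself and upload that as a .txt file, sidestepping any OCR handling entirely. A sketch with inline sample data; the column names are, again, assumptions about the scraper’s export:

```python
import csv
import io

# Sketch: flatten a reviews CSV into plain text so it can be pasted or
# uploaded as a .txt file instead of relying on OCR. Column names are
# assumptions about the scraper's export.
sample_csv = """rating,text
2,Way too dry and chalky
1,Package arrived torn open
"""

lines = []
for row in csv.DictReader(io.StringIO(sample_csv)):
    lines.append(f"[{row['rating']} stars] {row['text']}")

plain_text = "\n".join(lines)
print(plain_text)
```

The resulting text can then be written to a file with `open("reviews.txt", "w").write(plain_text)` and attached to the ChatGPT thread.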
I input the scraped data from each source into its own thread to keep it separate. The three I had were “Nail It and Scale It - Amazon - Product Reviews,” “Nail It and Scale It - Walmart - Product Reviews,” and “Nail It and Scale It - YouTube - Comment Analysis.”
Step 6: Get a final spreadsheet and report
After prompting ChatGPT, I gave it a few minutes to process the data and generate its output.
By and large, the output was insightful, and my final step was to combine the data into a comprehensive report. Ultimately, I decided that it would be helpful to create a Google Sheet that logged the most relevant comments and a Google Doc that read more like a research report. I made each as follows:
Google Sheet
I used the following prompt, combined the output of each prompt into a single sheet, and added a column for which platform the comments came from. My end output is here.
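The merge step itself, combining each platform’s output and tagging rows with their source, can be sketched in a few lines of stdlib Python. The field names and sample comments here are illustrative:

```python
import csv
import io

# Sketch: merge per-platform comment lists into one sheet, tagging each
# row with its source platform. Field names and comments are illustrative.
per_platform = {
    "Amazon": [{"comment": "Too dry and chalky"}],
    "Walmart": [{"comment": "Burns quickly on the pan"}],
    "YouTube": [{"comment": "Needs a vegan version"}],
}

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["platform", "comment"])
writer.writeheader()
for platform, rows in per_platform.items():
    for row in rows:
        writer.writerow({"platform": platform, **row})

combined_csv = out.getvalue()
print(combined_csv)
```

The combined CSV can then be imported directly into Google Sheets.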
Google Doc
I copied and pasted the outputs from the other threads into a single thread and then used the following prompt to analyze them. Finally, I transferred the response to a Google Doc here.
Some valuable pieces of feedback I received on issues with current protein pancake products include:
- Taste is too bland (adding cinnamon or vanilla was suggested).
- Artificial taste and smell.
- Texture is too dry, dense, or chalky; there’s demand for a lighter, fluffier texture.
- Pancakes burn too quickly, stick to the pan, or cook inconsistently.
- Packaging breaks open during shipping.
- Digestive issues, including bloating and gas, after eating.
- Requests for plant-based or vegan alternatives.
- Desire for lower-carb, higher-protein options.
- Desire for recommended tweaks and customizations.
Conclusion
Using AI to uncover market gaps can be a game changer for product development, especially when traditional methods fall short. By tapping into real consumer feedback from platforms like YouTube, Amazon, and Walmart and processing it with tools like ChatGPT, you can efficiently identify pain points and areas of opportunity. While there’s no substitute for user interviews with prospective customers, leveraging web scraping and AI analysis can dramatically speed up the discovery phase and give you actionable insights that might otherwise go unnoticed.
If you liked this content, please click the <3 button on Substack so I know which content to double down on.
TLDR Summary
This article explores how to leverage AI, specifically ChatGPT, to analyze user reviews and social media comments for uncovering market gaps. By scraping data using Apify from platforms like YouTube, Amazon, and Walmart, and processing it with AI tools, you can identify unmet consumer needs and pain points, offering valuable insights for product development. The process helps speed up the discovery phase, providing actionable insights that may otherwise be missed.
Key Steps and Insights
Identify the Data to Track
Focused on the niche of protein pancakes, using YouTube comments, Amazon reviews, and Walmart reviews as data sources.
Potential future sources include Reddit, TikTok, and Facebook.
Choose the Right Tool for the Job
Used Apify for web scraping, selecting relevant “Actors” for YouTube comments, Amazon product reviews, and Walmart reviews.
Apify provided the most comprehensive tools for this task.
Identify the Content to Scrape
Gathered YouTube URLs focused on protein pancake reviews, along with Amazon and Walmart product URLs for scraping.
Prioritized products with detailed user feedback.
Scrape the Content
Scraping was straightforward, with Amazon reviews yielding the most data, especially from critical 1-3 star reviews.
Focused on extracting negative feedback for identifying potential market gaps.
Feed the Data into ChatGPT
Used ChatGPT to process the scraped data, overcoming limitations by using PDFs to avoid context window issues.
Managed separate threads for each data source to maintain clarity.
Generate a Final Spreadsheet and Report
Compiled insights into a Google Sheet and a Google Doc, offering both raw data and a comprehensive report.
Identified key pain points such as taste, texture, packaging issues, and a demand for plant-based options.
Conclusion
Using AI to analyze user feedback provides a powerful tool for uncovering market gaps that might be difficult to identify through traditional methods. By scraping data from platforms like YouTube, Amazon, and Walmart, and processing it with AI tools like ChatGPT, product developers can quickly gain valuable insights to inform their next steps in creating products that meet real consumer needs.