We've written before about analyzing collected YouTube Comments. To learn how influencers can respond to crises using comment data, see our October 2024 Insight Brief. If you're interested in learning how to conduct audience analyses from collected YouTube Comments, check out our webinar on an influencer analysis of the 2024 US presidential election. The Starscape dashboard we discussed in this brief is available here.
Watch a short video describing the process here.
At Infegy, we're known for our social listening dataset, which spans more than 17 years and contains billions of posts. Our collection methodology is deliberately broad: we collect as much data from as many authors as possible (think billions of posts, not millions). This suits our clients well, since they value being able to search wide-ranging topics without limits or speed hiccups.
However, there is a lot of data we don't collect by default, mostly for good reason (think customer service call transcripts or private social media posts). Many clients still find insights from this type of data highly valuable.
We know our pre-curated social dataset might not always cover every unique need—that’s just the nature of how it’s built. So, for those cases, we use automated targeted data collection. Here’s how it works: you define your search term, and we use social platforms’ Search APIs to pull exactly the posts and datasets you’re after. Then, using the same collection engine behind our Social Dataset, we gather those posts, analyze them, and upload everything as a custom dataset through the Infegy API. That way, you can combine this tailored data with our larger dataset to get exactly what you need.
Figure 1: Infegy's General Query Structure vs. Targeted Collection
Sometimes, pursuing another approach, automated targeted data collection, is just as valuable. With this method, you specify your query term ahead of time. Then, we fetch, upload, and process the data matching your search so you can view the analytical results.
Let's walk through a specific example of how this works in practice. Say you have a client interested in analyzing what consumers say about Ikea's Kivik couch in comments left on YouTube videos. The client hasn't specified which videos they want comments from; they just want a general flavor of the conversation.
search_query = "Kivik Ikea"
search_limit = 15
Figure 2: Defining Search Query and Search Limit Parameters
First, we'll define our parameters. The search_query variable sets exactly what we send to YouTube, while search_limit sets how many of the videos returned by the YouTube Search API we collect comments from. By default, our script collects every comment from every video on that list. The search_limit parameter presents a tradeoff: a higher value means more comments, but collection could take longer.
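To make the two parameters concrete, here is a minimal sketch of how they could be turned into a search request. It assumes the public YouTube Data API v3 `search.list` endpoint, which accepts at most 50 results per request; the `build_search_url` helper and the `YOUR_API_KEY` placeholder are hypothetical names for illustration and are not taken from Infegy's own script.

```python
from urllib.parse import urlencode

# Public YouTube Data API v3 search endpoint (assumed here; the article
# does not show which client or endpoint the original script uses).
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(search_query: str, search_limit: int, api_key: str) -> str:
    """Build a search.list URL asking for up to `search_limit` videos."""
    if not 1 <= search_limit <= 50:
        # search.list returns at most 50 results per page, so a larger
        # limit would need paging, which this sketch leaves out.
        raise ValueError("search_limit must be between 1 and 50")
    params = {
        "part": "snippet",
        "q": search_query,           # the text sent to YouTube's search
        "type": "video",             # skip channels and playlists
        "maxResults": search_limit,  # how many videos to return
        "key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


search_query = "Kivik Ikea"
search_limit = 15
url = build_search_url(search_query, search_limit, api_key="YOUR_API_KEY")
print(url)
```

Raising search_limit widens the set of videos, and therefore the number of comments to download later, which is the tradeoff described above.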
Figure 3: Video results from our "Kivik Ikea" Search
We'll download each comment for every video and collect the fields associated with each comment.
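As a rough illustration of this step, the sketch below pages through a video's comment threads and flattens each top-level comment into a plain record. The response shape (`items`, `nextPageToken`, `snippet.topLevelComment.snippet`) follows the YouTube Data API v3 `commentThreads.list` format; the choice of fields and the `iter_comments` and `fake_fetch_page` names are assumptions for illustration, and the fake fetcher stands in for a real HTTP call so the example runs offline.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_comments(
    video_id: str,
    fetch_page: Callable[[str, Optional[str]], dict],
) -> Iterator[Dict[str, object]]:
    """Yield one flat record per top-level comment, following page tokens."""
    page_token = None
    while True:
        page = fetch_page(video_id, page_token)
        for item in page.get("items", []):
            top = item["snippet"]["topLevelComment"]
            snippet = top["snippet"]
            yield {
                "video_id": video_id,
                "comment_id": top["id"],
                "author": snippet.get("authorDisplayName"),
                "text": snippet.get("textDisplay"),
                "published_at": snippet.get("publishedAt"),
                "like_count": snippet.get("likeCount", 0),
            }
        page_token = page.get("nextPageToken")
        if not page_token:
            break  # no further pages for this video


def _thread(comment_id: str, author: str, text: str) -> dict:
    """Build a minimal commentThread-shaped dict for the offline demo."""
    return {"snippet": {"topLevelComment": {"id": comment_id, "snippet": {
        "authorDisplayName": author, "textDisplay": text,
        "publishedAt": "2024-01-01T00:00:00Z", "likeCount": 0}}}}


def fake_fetch_page(video_id: str, page_token: Optional[str]) -> dict:
    """Stand-in for a real commentThreads.list request (two canned pages)."""
    pages = {
        None: {"items": [_thread("c1", "Ann", "Easy to assemble")],
               "nextPageToken": "p2"},
        "p2": {"items": [_thread("c2", "Bob", "Very comfortable")]},
    }
    return pages[page_token]


comments = list(iter_comments("video123", fake_fetch_page))
```

Passing the fetcher in as an argument keeps the paging logic separate from the network call, so the same loop can be exercised with canned data or a real client.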
We'll define these mappings in our script and then push them to Infegy's servers using our API.
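The article does not list the actual mappings or the upload call, so the following is only a sketch of what a field mapping might look like. Every name in it (`FIELD_MAPPING`, the target column names, `apply_mapping`, `to_json_lines`) is hypothetical, and it stops at producing a serialized payload rather than guessing at the Infegy API's endpoints.

```python
import json
from typing import Dict, Iterable

# Hypothetical mapping from collected comment fields (left) to custom
# dataset columns (right). The real column names are not given in the
# article and will differ.
FIELD_MAPPING: Dict[str, str] = {
    "text": "body",
    "author": "author_name",
    "published_at": "published",
    "like_count": "likes",
}


def apply_mapping(comment: dict, mapping: Dict[str, str] = FIELD_MAPPING) -> dict:
    """Rename mapped fields and drop anything the mapping does not cover."""
    return {
        target: comment[source]
        for source, target in mapping.items()
        if source in comment
    }


def to_json_lines(comments: Iterable[dict],
                  mapping: Dict[str, str] = FIELD_MAPPING) -> str:
    """Serialize mapped records as one JSON object per line."""
    return "\n".join(
        json.dumps(apply_mapping(c, mapping), sort_keys=True) for c in comments
    )


sample = [{"text": "Great couch", "author": "Ann", "like_count": 3, "extra": 1}]
payload = to_json_lines(sample)
print(payload)
```

Keeping the mapping in a single dictionary means a new platform or field only requires editing that table, not the code that applies it.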
Now that we've uploaded our data to our servers, we'll automatically transfer it to Infegy Starscape as a custom dataset.
Figure 4: Infegy's Custom Dataset Availability
We then built a dashboard around this custom dataset. Note that most of the analytics you're used to from our social data, like sentiment, post volume, theme detection, and AI summaries, work just as well for custom data.
Figure 5: Word Cloud Showing Topics From Kivik Couch Videos (December 2015 through January 2024); Infegy Custom Dataset.
Our favorite tool for analyzing custom datasets is our AI Summarization widget available in the Infegy Starscape platform. Within seconds, this tool highlights key ideas from collections of thousands, millions, or even billions of underlying reviews.
Figure 6: AI Summarization Tool Outlining What Users Thought About The Construction Process (December 2015 through January 2024); Infegy Custom Dataset.
Figure 7: AI Summarization Tool Outlining What Users Thought About The Kivik's Comfort-Level and Quality (December 2015 through January 2024); Infegy Custom Dataset.
The quality of these summaries depends heavily on the quality of data you inject into Infegy AI. Because we're using a highly curated dataset, we get insights that would be highly valuable to someone on Ikea's marketing or product design team.
Like all data analysis, this custom data collection approach has tradeoffs. Let's briefly discuss the advantages and drawbacks of collecting only the data you want to analyze.
This process relies on YouTube's internal search API. YouTube and other social platforms need highly accurate search results to preserve engagement metrics (time on site, video watch times, etc.). We use this to our advantage by collecting only the results returned by those search algorithms, meaning you avoid much of the noise that usually comes with social data.
Social platforms design their internal search tools to surface the most relevant results without requiring exhaustive queries. They also return relevant results without needing exact keyword matches. For example, our search returned content about Kivik couches even where the result did not contain our exact search terms.
While you can get relevant results, this customized collection relies on the social platforms' internal search algorithms. As a result, you can get misleading, biased, or amplified results. (This happens all the time. Last month, viewers accused YouTube of deprioritizing Joe Rogan's interview with Donald Trump. For another example, Instagram was caught on January 21, 2025, hiding the word “democrat” from search results.)
When relying on customized data collection, you must first collect the data you need, which takes effort and time. If you need insights quickly and don't want to wait for your dataset to be populated, we suggest defaulting to Infegy's social dataset.
Targeted customized data collection is a powerful approach for extracting specific, high-value insights, particularly when analyzing niche topics or specialized datasets, such as YouTube comments about a particular product. This method allows users to bypass the noise often present in broad social datasets, delivering more precise and relevant results. However, the tradeoffs include reliance on platform-specific algorithms, which can introduce bias or amplify certain perspectives, and the potential for slower data collection due to technical constraints or platform limitations. Despite these challenges, the ability to define search parameters and build custom datasets empowers brands and researchers to gain actionable insights tailored to their unique needs, making it an invaluable tool for targeted analysis when timing allows.