Insights by Infegy

Using YouTube Search For Custom Data Collection

We've written before about analyzing collected YouTube Comments. To learn how influencers can respond to crises using comment data, see our October 2024 Insight Brief. If you're interested in learning how to conduct audience analyses from collected YouTube Comments, check out our webinar on an influencer analysis of the 2024 US presidential election. The Starscape dashboard we discussed in this brief is available here.

Watch a short video describing the process here.

Targeted vs. Broad Data Collection: When to Use Each for Maximum Insights!

A Look At Collecting Targeted YouTube Comments

At Infegy, we're known for our social listening dataset, which spans more than 17 years and contains billions of posts. Our collection methodology is deliberately broad: we collect as much data from as many authors as possible (think billions of posts, not millions). This suits our clients well, because they value being able to search wide-ranging topics without limits or speed hiccups.

However, there is plenty of data we don't collect by default, mostly for good reason (think customer service call transcripts or private social media posts). Many clients still find insights from this type of data highly valuable.

Targeted Data Collection

We know our pre-curated social dataset might not always cover every unique need—that’s just the nature of how it’s built. So, for those cases, we use automated targeted data collection. Here’s how it works: you define your search term, and we use social platforms’ Search APIs to pull exactly the posts and datasets you’re after. Then, using the same collection engine behind our Social Dataset, we gather those posts, analyze them, and upload everything as a custom dataset through the Infegy API. That way, you can combine this tailored data with our larger dataset to get exactly what you need.

image4

Figure 1: Infegy's General Query Structure vs. Targeted Collection

Sometimes another approach, automated targeted data collection, is just as valuable as broad collection. With this method, you specify your query term ahead of time. We then fetch, process, and upload the matching data so you can view the analytical results.

The Data Pipeline

Let's walk through a specific example of how this works in practice. Say you have a client interested in analyzing what consumers say about Ikea's Kivik couch in YouTube comments. However, the client hasn't specified which videos they want comments from; they just want a general flavor of the conversation.

search_query = "Kivik Ikea"

search_limit = 15

Figure 2: Defining Search Query and Search Limit Parameters

First, we'll define our parameters. The search_query variable tells us precisely what we're sending to YouTube, while the search_limit tells us how many videos we want to collect comments from via the YouTube Search API. By default, our script collects every comment from every YouTube video on our list. The search_limit query parameter presents a tradeoff - a higher value means more comments, but it could take longer to collect.
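To make these two parameters concrete, here is a minimal Python sketch of how they might feed the search and collection steps. It is not our production script. It assumes the public YouTube Data API v3 search endpoint, and the names build_search_url, iter_comment_threads, and fetch_page are hypothetical helpers introduced only for illustration. The fetch_page callable stands in for whatever HTTP client actually requests a page of comments.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

# Public YouTube Data API v3 search endpoint (assumed; our script may differ).
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(search_query: str, search_limit: int, api_key: str) -> str:
    """Build a search URL that asks for up to `search_limit` videos."""
    # A single search request returns at most 50 results.
    if not 1 <= search_limit <= 50:
        raise ValueError("search_limit must be between 1 and 50 per request")
    params = {
        "part": "snippet",
        "q": search_query,
        "type": "video",  # skip channels and playlists
        "maxResults": search_limit,
        "key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def iter_comment_threads(
    fetch_page: Callable[[str, Optional[str]], dict], video_id: str
) -> Iterator[dict]:
    """Yield every comment thread for one video by following nextPageToken.

    `fetch_page(video_id, page_token)` is a hypothetical callable that
    returns one parsed page of comment threads as a dict.
    """
    page_token = None
    while True:
        page = fetch_page(video_id, page_token)
        yield from page.get("items", [])
        page_token = page.get("nextPageToken")
        if not page_token:
            break
```

Because every page of every video is followed to the end, the collection time grows with search_limit, which is the tradeoff described above.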

image1

Figure 3: Video results from our "Kivik Ikea" Search

We'll download each comment for every video and collect the fields associated with each comment.

  • Video_Title - Title the uploader gave the video
  • Video_Date - Date the uploader published the video to YouTube
  • Video_URL - Unique URL linking to the video
  • Comment_Timestamp - Date the author left the comment
  • Comment_Author - Handle of the comment author
  • Comment_Text - Text of the comment

We'll define these mappings in our script and then push them to Infegy's servers using our API.
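The field mapping can be sketched roughly as follows. This assumes the record shapes returned by the YouTube Data API v3 (a search result item and a comment thread item). The function name to_row and the sample values are hypothetical placeholders, not real collected data. The upload call to Infegy's API is left out because its details are not covered here.

```python
def to_row(video: dict, thread: dict) -> dict:
    """Flatten one search result and one comment thread into our six fields."""
    # The top-level comment carries the author, timestamp, and text.
    comment = thread["snippet"]["topLevelComment"]["snippet"]
    video_id = video["id"]["videoId"]
    return {
        "Video_Title": video["snippet"]["title"],
        "Video_Date": video["snippet"]["publishedAt"],
        "Video_URL": f"https://www.youtube.com/watch?v={video_id}",
        "Comment_Timestamp": comment["publishedAt"],
        "Comment_Author": comment["authorDisplayName"],
        "Comment_Text": comment["textDisplay"],
    }


# Made-up placeholder records, shaped like API responses, for illustration only.
sample_video = {
    "id": {"videoId": "abc123"},
    "snippet": {"title": "Kivik review", "publishedAt": "2020-01-01T00:00:00Z"},
}
sample_thread = {
    "snippet": {
        "topLevelComment": {
            "snippet": {
                "publishedAt": "2020-01-02T12:00:00Z",
                "authorDisplayName": "@example_user",
                "textDisplay": "Placeholder comment text.",
            }
        }
    }
}

row = to_row(sample_video, sample_thread)
print(row["Video_URL"])
```

Keeping each comment as one flat row with the video fields repeated makes the data easy to upload as a table, at the cost of some duplication.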

Data-Driven Insights

Now that we've uploaded our data to our servers, we'll automatically transfer it to Infegy Starscape as a custom dataset.

image6

Figure 4: Infegy's Custom Dataset Availability

We then built a dashboard around this custom dataset. Note that most of the analytics you're used to from our social data, like sentiment, post volume, theme detection, and AI summaries, work just as well for custom data.

image2

Figure 5: Word Cloud Showing Topics From Kivik Couch Videos (December 2015 through January 2024); Infegy Custom Dataset.

Our favorite tool for analyzing custom datasets is the AI Summarization widget, available in the Infegy Starscape platform. Within seconds, this tool highlights key ideas from collections of thousands, millions, or even billions of underlying posts.

image5

Figure 6: AI Summarization Tool Outlining What Users Thought About The Construction Process (December 2015 through January 2024); Infegy Custom Dataset.

image3

Figure 7: AI Summarization Tool Outlining What Users Thought About The Kivik's Comfort-Level and Quality (December 2015 through January 2024); Infegy Custom Dataset.

The quality of these summaries depends heavily on the quality of the data you feed into Infegy AI. Because this dataset is highly curated, the resulting insights would be valuable to someone on Ikea's marketing or product design team.

Tradeoffs Associated With This Approach

Like all data analysis, this custom data collection approach has tradeoffs. Let's briefly discuss the advantages and disadvantages of collecting only the data you want to analyze.

Advantages Of Targeted Collection

Highly Targeted Collection

This process relies on YouTube's internal search API. YouTube and other social platforms need highly accurate search results to preserve engagement metrics (time on site, video watch times, etc.). We use this to our advantage by collecting only the results returned by those search algorithms, meaning you avoid much of the noise that usually comes with social data.

No Complex Query Construction Required

Social platforms design their internal search tools to surface the most relevant results without requiring exhaustive queries. They also return relevant results without exact keyword matching. For example, our search returned content about Kivik couches even when that content did not mention the search term explicitly.

Disadvantages Of Targeted Collection

Reliant on Social Platform's Search Algorithm

While you can get relevant results, this customized collection relies on the social platforms' internal search algorithms. As a result, you can get misleading, biased, or amplified results. (This happens regularly. Last month, viewers accused YouTube of deprioritizing Joe Rogan's interview with Donald Trump. In another example, Instagram was caught on January 21, 2025, hiding the word “democrat” from search results.)

Data Collection Slows Insights

When relying on customized data collection, you first have to collect the data you need, which takes effort and time. If speed matters and you can't wait for a dataset to populate, we suggest defaulting to Infegy's social dataset.

Wrapping Up

Targeted customized data collection is a powerful approach for extracting specific, high-value insights, particularly when analyzing niche topics or specialized datasets, such as YouTube comments about a particular product. This method allows users to bypass the noise often present in broad social datasets, delivering more precise and relevant results. However, the tradeoffs include reliance on platform-specific algorithms, which can introduce bias or amplify certain perspectives, and the potential for slower data collection due to technical constraints or platform limitations. Despite these challenges, the ability to define search parameters and build custom datasets empowers brands and researchers to gain actionable insights tailored to their unique needs, making it an invaluable tool for targeted analysis when timing allows.

Key Takeaways

  • Precision in Insights: Targeted customized data collection cuts much of the noise and delivers highly relevant insights by leveraging platform-specific search algorithms, making it ideal for niche or specific topics like product-focused discussions.
  • Tradeoffs in Speed and Bias: While this method improves relevance, it depends on social platforms' search algorithms, potentially introducing bias or amplification, and may require additional time due to technical constraints during data collection.
  • Actionable and Tailored Analysis: Custom datasets allow for deep, specific analysis using tools like AI Summarization, providing brands and researchers with valuable, actionable insights that cater directly to their unique questions and objectives.