YouTube’s recommender AI still a horrorshow, finds major crowdsourced study – TechCrunch


For years YouTube’s video-recommending algorithm has stood accused of fuelling a grab-bag of societal ills by feeding users an AI-amplified diet of hate speech, political extremism, and conspiracy junk/disinformation for the profiteering motive of trying to keep billions of eyeballs stuck to its ad inventory.

And while YouTube’s tech giant parent Google has, sporadically, responded to negative publicity flaring up around the algorithm’s antisocial recommendations — announcing a few policy tweaks or limiting/purging the odd hateful account — it’s not clear how far the platform’s penchant for promoting unhealthy clickbait has been rebooted.

The suspicion remains nowhere near far enough. New research published today by Mozilla backs that notion up, suggesting YouTube’s A.I. continues to puff up piles of ‘bottom-feeding’/low-grade/divisive/disinforming content — stuff that tries to grab eyeballs by triggering people’s sense of outrage, sowing division/polarization, or spreading baseless/harmful disinformation — which in turn implies that YouTube’s problem with recommending terrible stuff is indeed systemic; a side-effect of the platform’s rapacious appetite to harvest views to serve ads.

That YouTube’s A.I. is still — per Mozilla’s study — misbehaving also suggests Google has been pretty successful at fuzzing criticism with superficial claims of reform. The mainstay of its deflective success here is likely the primary protection mechanism of keeping the recommender engine’s algorithmic workings (and associated data) hidden from public view and external oversight — via the convenient shield of ‘commercial secrecy’.


But regulation that could help crack open proprietary A.I. black boxes is now on the cards — at least in Europe. To fix YouTube’s algorithm, Mozilla is calling for “common-sense transparency laws, better oversight, and consumer pressure” — suggesting a combination of laws that mandate transparency into A.I. systems, protect independent researchers so they can interrogate algorithmic impacts, and empower platform users with robust controls (such as the ability to opt out of “personalized” recommendations) is what’s needed to rein in the worst excesses of the YouTube A.I.

Regrets, YouTube users have had a few…

To gather data on specific recommendations to YouTube users — information that Google does not routinely make available to external researchers — Mozilla took a crowdsourced approach via a browser extension (RegretsReporter) that lets users self-report YouTube videos they “regret” watching. The tool can generate a report that includes details of the videos the user had been recommended and earlier video views to help build a picture of how YouTube’s recommender system was functioning. (Or, well, ‘dysfunctioning’ as the case may be.)

The crowdsourced volunteers whose data fed Mozilla’s research reported a wide variety of ‘regrets’, including videos spreading COVID-19 fear-mongering, political misinformation, and “wildly inappropriate” children’s cartoons, per the report — with the most frequently reported content categories being misinformation, violent/graphic content, hate speech, and spam/scams.

A substantial majority (71%) of the regret reports came from videos recommended by YouTube’s algorithm, underscoring the A.I.’s starring role in pushing junk into people’s eyeballs. The research also found that recommended videos were 40% more likely to be reported by the volunteers than videos they’d searched for themselves.

Mozilla even found “several” instances when the recommender algorithm put content in front of users that violated YouTube’s community guidelines and was unrelated to the previous video watched. Such a clear failure.

A very notable finding was that regrettable content appears to be a more significant problem for YouTube users in non-English speaking countries: Mozilla found YouTube regrets were 60% higher in countries without English as a primary language — with Brazil, Germany, and France generating what the report said were “particularly high” levels of regretful YouTubing. (None of the three can be classed as minor international markets.)

According to the report, pandemic-related regrets were also significantly prevalent in non-English speaking countries — a worrying detail to read during an ongoing global health crisis.

The crowdsourced study — which Mozilla bills as the largest-ever into YouTube’s recommender algorithm — drew on data from more than 37,000 YouTube users who installed the extension, although it was a subset of 1,162 volunteers — from 91 countries — who submitted reports that flagged 3,362 regrettable videos which the report draws on directly.

These reports were generated between July 2020 and May 2021.

What exactly does Mozilla mean by a YouTube “regret”? It says this is a crowdsourced concept based on users’ self-reporting bad experiences on YouTube, so it’s subjective. But Mozilla argues that taking this “people-powered” approach centers the lived experiences of Internet users and is therefore helpful in foregrounding the experiences of marginalized and vulnerable people and communities (vs., for example, applying only a narrower, legal definition of ‘harm’).

“We wanted to interrogate and explore further [people’s experiences of falling down the YouTube ‘rabbit hole’] and frankly confirm some of these stories — but then also understand further what are some of the trends that emerged in that,” explained Brandi Geurkink, Mozilla’s senior manager of advocacy and the lead researcher for the project, discussing the aims of the research.

“My main feeling in doing this work was being — I guess — shocked that some of what we had expected to be the case was confirmed… It’s still a limited study in terms of the number of people involved and the methodology we used, but — even with that — it was pretty simple; the data just showed that some of what we thought was confirmed.

“Things like the algorithm recommending content essentially accidentally, that it later is like ‘oops, this violates our policies; we shouldn’t have actively suggested that to people’… And things like the non-English-speaking user base having worse experiences are things you hear discussed anecdotally, and activists have raised these issues. But I was just like — oh wow, it’s coming out clearly in our data.”

Mozilla says the crowdsourced research uncovered “numerous examples” of reported content that would likely breach YouTube’s community guidelines — such as hate speech or debunked political and scientific misinformation.

But it also says the reports flagged much of what YouTube “may” consider ‘borderline content’. Aka, stuff that’s harder to categorize — junk/low-quality videos that perhaps toe the acceptability line and may therefore be trickier for the platform’s algorithmic moderation systems to respond to (and thus content that may also survive the risk of a takedown for longer).

However, a related issue the report flags is that YouTube doesn’t define borderline content — despite discussing the category in its guidelines — which, says Mozilla, makes it impossible to verify the researchers’ assumption that much of what the volunteers were reporting as ‘regretful’ would likely fall into YouTube’s own ‘borderline content’ category.

The challenge of independently studying the societal effects of Google’s tech and processes is a running theme underlying the research. But Mozilla’s report also accuses the tech giant of meeting YouTube criticism with “inertia and opacity”.

It’s not alone there, either. Critics have long accused YouTube’s ad giant parent of profiting off of engagement generated by hateful outrage and harmful disinformation — allowing “AI-generated bubbles of hate” to surface ever more baleful (and thus stickily engaging) stuff, exposing unsuspecting YouTube users to increasingly unpleasant and extremist views, even as Google gets to shield its low-grade content business under a user-generated content umbrella.

Indeed, ‘falling down the YouTube rabbit hole’ has become a well-trodden metaphor for discussing the process of unsuspecting Internet users being dragged into the darkest and nastiest corners of the web. This user reprogramming occurs in broad daylight via AI-generated suggestions that call people to follow the conspiracy breadcrumb trail right from inside a mainstream web platform.

Back in 2017 — when concern was riding high about online terrorism and the proliferation of ISIS content on social media — politicians in Europe accused YouTube’s algorithm of precisely this: Automating radicalization.

However, getting hard data to back up anecdotal reports of individual YouTube users being ‘radicalized’ after viewing hours of extremist content or conspiracy theory junk on Google’s platform has remained difficult.

Ex-YouTube insider Guillaume Chaslot is one notable critic who’s sought, via his transparency project, to pull back the curtain that shields the proprietary tech from deeper scrutiny. Mozilla’s crowdsourced research adds to those efforts by sketching a broad — and broadly problematic — picture of YouTube A.I. by collating reports of bad experiences from users themselves.

Of course, externally sampling platform-level data that only Google holds in full (at its proper depth and dimension) can’t be the whole picture — and self-reporting, in particular, may introduce its own biases into Mozilla’s data set. But the problem of effectively studying big tech’s black boxes is a crucial point accompanying the research, as Mozilla advocates for proper oversight of platform power.

In a series of recommendations, the report calls for “robust transparency, scrutiny, and giving people control of recommendation algorithms,” — arguing that without proper oversight of the platform, YouTube will continue to be harmful by mindlessly exposing people to damaging and brain-dead content.

The problematic lack of transparency around how YouTube functions can be picked up from other details in the report. For example, Mozilla found that around 9% of recommended regrets (or almost 200 videos) had since been taken down — for various unclear reasons (sometimes, presumably, after the content was reported and judged by YouTube to have violated its guidelines).

Collectively, just this subset of videos had 160M views before being removed for whatever reason. In other findings, the research found that regretted videos tend to perform well on the platform. A particularly stark metric is that reported regrets acquired a full 70% more views per day than other videos watched by the volunteers on the platform — lending weight to the argument that YouTube’s engagement-optimizing algorithms disproportionately select for triggering/misinforming content more often than quality (thoughtful/informing) stuff simply because it brings in the clicks.

While that might be great for Google’s ad business, it’s a net negative for democratic societies which value truthful information over nonsense, genuine public debate over artificial/amplified binaries, and constructive civic cohesion over divisive tribalism.

But without legally-enforced transparency requirements on ad platforms — and, most likely, regulatory oversight and enforcement that features audit powers — these tech giants will continue to be incentivized to turn a blind eye and cash in at society’s expense.

Mozilla’s report also underlines instances where YouTube’s algorithms are driven by a logic that’s unrelated to the content itself — with a finding that in 43.6% of the cases where the researchers had data about the videos a participant had watched before a reported regret, the recommendation was utterly unrelated to the previous video.

The report gives examples of some of these logic-defying A.I. content pivots/leaps/pitfalls — such as a person watching videos about the U.S. military and then being recommended a misogynistic video entitled ‘Man humiliates feminist in a viral video.’

In another instance, a person watched a video about software rights and was then recommended a video about gun rights. So two rights make yet another wrong YouTube recommendation right there.

In a third example, a person watched an Art Garfunkel music video and was then recommended a political video entitled ‘Trump Debate Moderator EXPOSED as having Deep Democrat Ties, Media Bias Reaches BREAKING Point.’

To which the only sane response is, umm, what???

YouTube’s output in such instances seems — at best — some sort of ‘A.I. brain fart’. A generous interpretation might be that the algorithm got stupidly confused. In a number of the examples cited in the report, the confusion is leading YouTube users toward content with a right-leaning political bias, which seems, well, curious.

Asked what she views as the most concerning findings, Mozilla’s Geurkink told TechCrunch: “One is how misinformation emerged as a dominant problem on the platform. I think that’s something, based on our work talking to Mozilla supporters and people from all around the world, that is an undeniable thing that people are concerned about online. So to see that that is emerging as the biggest problem with the YouTube algorithm is concerning to me.”

She also highlighted the problem of the worse recommendations for non-English-speaking users as another significant concern, suggesting that global inequalities in users’ experiences of platform impacts “don’t get enough attention” — even when such issues get discussed.

Responding to Mozilla’s report, a Google spokesperson sent us this statement:

“The goal of our recommendation system is to connect viewers with the content they love, and on any given day, more than 200 million videos are recommended on the homepage alone. Over 80 billion pieces of information are used to help inform our systems, including survey responses from viewers on what they want to watch. We constantly work to improve the experience on YouTube, and over the past year alone, we’ve launched over 30 different changes to reduce recommendations of harmful content. Thanks to this change, consumption of marginal content from our recommendations is now significantly below 1%.”

Google also claimed it welcomes research into YouTube — and suggested it is exploring options to bring in external researchers to study the platform, without offering anything concrete. At the same time, its response queried how Mozilla’s study defined ‘regrettable’ content and claimed that its user surveys generally show users are satisfied with the content that YouTube recommends.

In further non-quotable remarks, Google noted that earlier this year, it started disclosing a ‘violative view rate’ (VVR) metric for YouTube — disclosing for the first time the percentage of views on YouTube that come from content that violates its policies.

The most recent VVR stands at 0.16-0.18%, which Google says means that out of every 10,000 views on YouTube, 16-18 come from violative content. It said that figure is down by more than 70% compared to the same quarter of 2017 — crediting its investments in machine learning as mainly responsible for the drop.
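For readers who want to check the conversion between the percentage and the per-10,000 figure, here is a minimal sketch. The function name `views_per_10k` is purely illustrative and not anything Google or Mozilla publishes; the only inputs are the two percentages quoted above.

```python
from decimal import Decimal


def views_per_10k(rate_percent: str) -> Decimal:
    """Convert a percentage (passed as a string, e.g. "0.16") to a count per 10,000 views."""
    # Percent to fraction divides by 100; fraction to per-10,000 multiplies
    # by 10,000. The net effect is a multiplication by 100.
    # Decimal avoids binary floating-point rounding noise in the result.
    return Decimal(rate_percent) * 100


low = views_per_10k("0.16")
high = views_per_10k("0.18")
print(f"{low:.0f} to {high:.0f} violative views per 10,000")
```

Note that this only restates the headline rate in different units. It says nothing about how many of those views arrived via recommendations.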

However, as Geurkink noted, the VVR is of limited use without Google releasing more data to contextualize and quantify how far its A.I. was involved in accelerating views of content its rules state shouldn’t be viewed on its platform. Without that critical data, the suspicion must be that the VVR is a nice bit of misdirection.

“What would be going further than [VVR] — and what would be helpful — is understanding the role of the recommendation algorithm,” Geurkink told us, adding: “That’s what is a complete black box still. Without greater transparency, [Google’s] claims of progress have to be taken with a grain of salt.”

Google also flagged a 2019 change it made to how YouTube’s recommender algorithm handles ‘borderline content’ — aka, content that doesn’t violate policies but falls into a problematic grey area — saying that that tweak had also resulted in a 70% drop in watch time for this type of content.

However, the company confirmed this borderline category is a moveable feast — saying it factors in changing trends and context and works with experts to determine what gets classed as borderline — which makes the aforementioned percentage drop pretty meaningless since there’s no fixed baseline to measure against.

Notably, Google’s response to Mozilla’s report does not mention the poor experience reported by survey participants in non-English-speaking markets. And Geurkink suggested that, in general, many of the claimed mitigating measures YouTube applies are geographically limited — i.e., to English-speaking markets like the U.S. and U.K. (Or at least arrive in those markets first, before a slower rollout to other places.) 

A January 2019 tweak to reduce the amplification of conspiracy theory content in the U.S. was only expanded to the U.K. market months later — in August. “YouTube, for the past few years, have only been reporting on their progress of harmful or borderline content recommendations in the U.S. and English-speaking markets,” she said. “And very few people are questioning that — what about the rest of the world? To me, that is something that deserves more attention and more scrutiny.”

We asked Google to confirm whether it had since applied the 2019 conspiracy theory-related changes globally — and a spokeswoman told us it had. But the much higher rate of reports made to Mozilla of ‘regrettable’ content — a broader measure — in non-English-speaking markets remains notable.

And while there could be other factors at play, which might explain some of the disproportionately higher reporting, the finding may also suggest that, where YouTube’s negative impacts are concerned, Google directs the greatest resources at markets and languages where its reputational risk and the capacity of its machine learning tech to automate content categorization are greatest.

Yet any such unequal response to A.I. risk leaves some users at greater risk of harm than others — adding another harmful dimension and layer of unfairness to a multi-faceted, many-headed-hydra of a problem.

It’s another reason why leaving it up to powerful platforms to rate their own A.I.s, mark their own homework, and counter genuine concerns with self-serving P.R. is for the birds.

(In additional filler background remarks it sent us, Google described itself as the first company in the industry to incorporate “authoritativeness” into its search and discovery algorithms — without explaining when exactly it claims to have done that or how it imagined it would be able to deliver on its stated mission of ‘organizing the world’s information and making it universally accessible and useful’ without considering the relative value of information sources… So color us baffled at that claim. Most likely, it’s a clumsy attempt to throw disinformation shade at rivals.)

Returning to the regulation point, an E.U. proposal — the Digital Services Act — is set to introduce transparency requirements on large digital platforms as part of a more comprehensive package of accountability measures. When asked about this, Geurkink described the DSA as “a promising avenue for greater transparency”. But she suggested the legislation needs to go further to tackle recommender systems like the YouTube A.I.

“I think that transparency around recommender systems specifically and also people having control over the input of their data and then the output of recommendations is significant — and is a place where the DSA is currently a bit sparse, so I think that’s where we need to dig in,” she told us.

One idea she voiced support for is having a “data access framework” baked into the law — to enable vetted researchers to get more of the information they need to study powerful A.I. technologies — i.e., rather than the law trying to come up with “a laundry list of all of the different pieces of transparency and information that should be applicable”, as she put it.

The E.U. also now has a draft A.I. regulation on the table. The legislative plan takes a risk-based approach to regulating specific applications of artificial intelligence. However, it’s unclear whether YouTube’s recommender system would fall under one of the more closely regulated categories — or, as seems more likely (at least with the initial Commission proposal), fall entirely outside the scope of the planned law.

“An earlier draft of the proposal talked about systems that manipulate human behavior, essentially what recommender systems are. And one could also argue that’s the goal of advertising at large, in some sense. So it was difficult to understand exactly where recommender systems would fall into that,” noted Geurkink.

“There might be a nice harmony between some of the robust data access provisions in the DSA and the new A.I. regulation,” she added. “I think transparency is what it comes down to, so anything that can provide greater transparency is good.

“YouTube could also just provide a lot of this… We’ve been working on this for years now, and we haven’t seen them take any meaningful action on this front, but it’s also, I think, something that we want to keep in mind — legislation can take years. So even if a few of our recommendations were taken up [by Google], that would be a huge step in the right direction.”