A Business Lens on Precision and Recall | by Matt Sosna | Dec, 2023


Social media spam as a case study

Matt Sosna
Towards Data Science
Photo by Nong on Unsplash

Disclaimer: the examples in this post are for illustrative purposes and are not commentary on any specific content policy at any specific company. All views expressed in this article are mine and do not reflect my employer.

Why is there any spam on social media? No one aside from the spammers themselves enjoys clickbait scams or phishing attempts. We have decades of training data to feed machine learning classifiers. So why does spam on every major tech platform feel inevitable? After all these years, why do bot farms still exist?

Image by author

The answer, in short, is that fighting spam at scale is really hard, and doing so without harming genuine users and advertisers is far harder still. In this post, we’ll use precision and recall as a framework for understanding the spam problem. We’ll see that eradicating 100% of spam is impractical, and that there is some “equilibrium” spam prevalence shaped by finances, regulations, and user sentiment.
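Since precision and recall carry the rest of the argument, here is a minimal sketch of how the two are computed for a spam classifier. The function name `precision_recall` and all of the counts are hypothetical, chosen only for illustration. Precision asks "of everything we flagged as spam, how much really was spam?", while recall asks "of all the spam out there, how much did we catch?"

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts.

    tp: spam correctly flagged as spam (true positives)
    fp: genuine content wrongly flagged as spam (false positives)
    fn: spam that was not flagged (false negatives)
    """
    # Guard against division by zero when nothing was flagged
    # or when there was no spam at all.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical day: 1,000 uploads, 100 of which are spam.
# The classifier flags 90 uploads: 80 are truly spam, 10 are genuine.
total_spam = 100
tp, fp = 80, 10
fn = total_spam - tp  # 20 spam uploads slipped through

precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# prints "precision=0.89, recall=0.80"
```

In these made-up numbers, the 10 false positives are genuine users whose videos were wrongly blocked, and the 20 false negatives are spam that reached the platform. Those are the two costs being traded off against each other.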

Photo by Joseph Barrientos on Unsplash

Imagine we’re launching a competitor to TikTok and Instagram. (Forget that they have 1.1 billion and 2 billion monthly active users, respectively; we’re feeling ambitious!) Our key differentiator in this tight market is a guarantee that users will see only the highest-quality videos: absolutely no “get rich quick” schemes, blatant reposts of existing content, URLs that infect your computer with malware, etc.

Attempt 1: Human Review

To achieve this quality guarantee, we’ve hired a staggering 1,000 reviewers to audit every upload before it’s allowed on the platform. Some things just need a human touch, we argue: video spam is too complex and context-dependent to rely on automated logic. A video that urges users to click on a URL could be a malicious phishing attempt or a benign fundraiser for Alzheimer’s research, for example — the stakes are too high to…




This post originally appeared on TechToday.