How OpenAI’s bot crushed this seven-person company’s website ‘like a DDoS attack’


Triplegangers Faces OpenAI Bot’s Scraping Attack

When Triplegangers CEO Oleksandr Tomchuk found his company’s website knocked out by what looked like a distributed denial-of-service (DDoS) attack, the culprit turned out to be OpenAI’s bot. Its relentless scraping brought the e-commerce platform’s services to a halt.

Triplegangers’ Digital Asset Database – A Bounty for AI Crawlers

Triplegangers’ catalog of more than 65,000 products, made up of 3D scans of human models along with photos and detailed descriptions, was a tempting target for AI crawlers. The database, built for customers in 3D art and gaming, became a liability when OpenAI’s bot scraped it intensively from 600 IP addresses.

Controlling OpenAI Bots – A Challenging Task

Although Triplegangers’ terms of service forbid bots from scraping its content, keeping OpenAI’s GPTBot out required a properly configured robots.txt file. OpenAI’s bots honor this Robots Exclusion Protocol, but the system isn’t foolproof: compliance is voluntary, so nothing technically stops a crawler that chooses to ignore it.
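To show where the protocol’s check actually happens, here is a minimal sketch using Python’s standard-library `urllib.robotparser`, the kind of lookup a well-behaved crawler performs before fetching a page. The robots.txt contents and the example URL are assumptions for illustration, not Triplegangers’ real file. Note that the decision is made on the crawler’s side; a bot that never runs this check is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (not Triplegangers' actual file):
# block OpenAI's GPTBot everywhere, leave every other agent unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory rules, no network
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/products/scan-123"  # placeholder URL
    print(is_allowed("GPTBot", url))       # False
    print(is_allowed("Mozilla/5.0", url))  # True
```

`robotparser` matches on the product token before the first `/`, which is why the crawler is identified here as plain `GPTBot` rather than a full browser-style User-Agent string.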

Safeguarding the Website from AI Attacks

An updated robots.txt file and a Cloudflare account successfully blocked GPTBot as well as other bots such as Barkrowler (an SEO crawler) and Bytespider (ByteDance’s crawler). Still, Triplegangers doesn’t know how much of its content OpenAI took.
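Triplegangers relied on Cloudflare, whose configuration isn’t covered here. As a swapped-in, self-hosted illustration of the same idea, the sketch below is a hypothetical WSGI middleware that refuses requests whose User-Agent header names one of the crawlers mentioned above. The names `BlockCrawlers`, `BLOCKED_AGENTS`, `product_app`, and `fetch_status` are invented for this example. It assumes crawlers identify themselves honestly; a bot that spoofs its User-Agent passes straight through, which is why header checks alone are a weak defense.

```python
# Case-insensitive User-Agent substrings to refuse. Illustrative list based on
# the crawlers named above; not Triplegangers' real configuration.
BLOCKED_AGENTS = ("gptbot", "barkrowler", "bytespider")


def is_blocked(user_agent: str) -> bool:
    """True if the User-Agent header contains any blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENTS)


class BlockCrawlers:
    """Hypothetical WSGI middleware: 403 for listed crawlers, else pass through."""

    def __init__(self, app):
        self.app = app  # the wrapped WSGI application

    def __call__(self, environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Automated crawling is not permitted.\n"]
        return self.app(environ, start_response)


def product_app(environ, start_response):
    """Stand-in for the real site: always serves a product page."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Product page\n"]


def fetch_status(app, user_agent: str) -> str:
    """Call a WSGI app with a minimal fake GET request; return the status line."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/", "HTTP_USER_AGENT": user_agent}
    list(app(environ, start_response))  # consume the response body
    return captured["status"]


if __name__ == "__main__":
    app = BlockCrawlers(product_app)
    print(fetch_status(app, "Mozilla/5.0 (compatible; GPTBot/1.2)"))  # 403 Forbidden
    print(fetch_status(app, "Mozilla/5.0 (Windows NT 10.0)"))         # 200 OK
```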

Warning for Other Small Online Businesses

The episode shows why site owners need to actively monitor for AI scrapers, since neglecting them can have serious consequences. As Triplegangers learned, most websites don’t notice scraper bots until invalid traffic starts climbing, and that traffic is growing: research from DoubleVerify found an 86% increase in invalid traffic from AI crawlers and scrapers in 2024.
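Active monitoring can start with the access logs a site already keeps. The following is a minimal sketch, assuming logs in the common “combined” format used by Apache and Nginx: it tallies requests and distinct client IPs per User-Agent, so a crawler fanning out across many addresses, as OpenAI’s bot did with 600 IPs, stands out. The regex, the `summarize` function, and the sample log lines are invented for illustration; the IPs come from reserved documentation ranges.

```python
import re
from collections import Counter, defaultdict

# Assumed combined log format: client IP first, User-Agent in the last quotes.
# A simplification; production logs may need a sturdier parser.
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

# Invented sample lines for demonstration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Jan/2025:10:00:00 +0000] "GET /products/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.6 - - [10/Jan/2025:10:00:01 +0000] "GET /products/2 HTTP/1.1" 200 4980 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.7 - - [10/Jan/2025:10:00:02 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]


def summarize(lines):
    """Return (user_agent, request_count, distinct_ip_count) tuples, busiest first."""
    requests = Counter()
    ips = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that don't fit the assumed format
        ip, agent = match.groups()
        requests[agent] += 1
        ips[agent].add(ip)
    return [(agent, count, len(ips[agent])) for agent, count in requests.most_common()]


if __name__ == "__main__":
    for agent, count, n_ips in summarize(SAMPLE_LOG):
        print(f"{count:>6} requests from {n_ips:>4} IPs  {agent}")
```

In practice one would run this over a day’s logs and alert when a single agent’s request or IP count jumps well above its usual level; the threshold is a judgment call for each site.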

Finally, the OpenAI bot episode underscores the need for AI companies to seek permission before collecting data, and for businesses to adopt adequate tools to protect their assets.

Original source: Read the full story on TechCrunch