When Triplegangers’ website suddenly crashed one Saturday, CEO Oleksandr Tomchuk traced the cause to an onslaught from OpenAI’s bot. The crawler was relentlessly scraping the large volume of data hosted on the firm’s e-commerce platform.
The site, which hosts more than 65,000 products, each with its own page, images, and descriptions, was flooded with a barrage of server requests from the OpenAI bot. According to Tomchuk, the bot used around 600 IP addresses to collect the data. The volume of requests resembled a DDoS attack and crippled the website’s operations.
The site is central to Triplegangers’ business: it sells 3D image files and photos, derived from real human models, to digital artists, video game developers, and others. Although the site’s terms of use forbid scraping without permission, OpenAI’s bot was able to crawl it because Triplegangers’ robots.txt file was not configured to block it.
OpenAI, like other AI companies, treats a site as open to scraping unless its robots.txt file explicitly tells the company’s bot not to crawl it. As a result, Triplegangers not only suffered downtime during US business hours but also expected a hefty AWS bill from the bot’s activity.
Triplegangers eventually updated its robots.txt file to block the crawlers of well-known AI companies. Even after this reactive measure, Tomchuk still does not know what data OpenAI managed to collect or how it may be used.
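For illustration only, a robots.txt file that blocks OpenAI’s crawler might look like the one below. This is a minimal sketch, not Triplegangers’ actual file: it assumes the crawler identifies itself with the user-agent token GPTBot, which OpenAI publishes for its web crawler, and the product URL on example.com is a hypothetical placeholder. Python’s standard urllib.robotparser module is used to show how a compliant crawler would read the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: deny OpenAI's GPTBot crawler the whole site,
# while leaving every other user agent free to crawl.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules from memory instead of fetching them

# Placeholder product page; example.com stands in for the real domain.
url = "https://example.com/products/12345"

print(parser.can_fetch("GPTBot", url))        # False: GPTBot matches the Disallow: / rule
print(parser.can_fetch("SomeOtherBot", url))  # True: falls through to the catch-all entry
```

Keep in mind that robots.txt is advisory: it restricts only crawlers that read and honor it, and it does nothing to throttle traffic from bots that ignore it.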
With its detailed, tagged photo catalog, Triplegangers’ site is a prime target for AI crawlers. Because the business deals in images of real people, unauthorized use of its data also carries legal risk under laws such as the GDPR.
Tomchuk warns other online businesses that AI bots may be scraping their data without explicit permission, and that the burden of blocking these bots falls on the site owners. That such scraping often goes unnoticed, and can only be dealt with after the fact, points to a troubling practice among AI companies: take the data first, ask permission later.
Original source: Read the full article on TechCrunch