Websites accuse AI startup Anthropic of bypassing anti-scraping rules and protocols

Freelancer has accused Anthropic, the AI startup behind the Claude large language models, of ignoring its robots.txt "do not crawl" protocol to scrape the site's data. Meanwhile, iFixit CEO Kyle Wiens said Anthropic ignored the website's policy prohibiting the use of its content to train AI models. Matt Barrie, Freelancer's chief executive, told The Information that Anthropic's ClaudeBot is "the most aggressive scraper to date." His website was reportedly visited 3.5 million times by the company's crawler within a four-hour span, about five times the volume of the next most active AI crawler. Similarly, Wiens posted on X/Twitter that Anthropic's bot hit iFixit's servers a million times in 24 hours. "You're not only taking our content without paying, you're tying up our devops resources," he wrote.

Back in June, Wired accused another AI company, Perplexity, of crawling its website despite the presence of the Robots Exclusion Protocol, or robots.txt. A robots.txt file typically contains instructions telling web crawlers which pages they can and cannot access. While compliance is voluntary, malicious bots often simply ignore it. After Wired's piece came out, a startup called TollBit, which connects AI firms with content publishers, reported that it's not just Perplexity that's bypassing robots.txt signals. While it didn't name names, Business Insider said it learned that OpenAI and Anthropic were also ignoring the protocol.
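The voluntary nature of the protocol is easy to see in practice: a well-behaved crawler parses a site's robots.txt and checks each URL against it before fetching, while a bot that ignores the file faces no technical barrier at all. The sketch below uses Python's standard `urllib.robotparser`; the rule set, user-agent names, and paths are illustrative assumptions, not any real site's actual rules.

```python
# Minimal sketch: how a compliant crawler consults robots.txt before fetching.
# The rules below are made up for illustration.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# ClaudeBot is disallowed everywhere; other agents only under /private/.
print(parser.can_fetch("ClaudeBot", "https://example.com/articles/1"))    # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/articles/1")) # True
print(parser.can_fetch("SomeOtherBot", "https://example.com/private/x"))  # False
```

Nothing in the protocol enforces these answers; honoring the `can_fetch` result is entirely up to the crawler's operator, which is why compliance disputes like these end up being fought in public rather than in code.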

Barrie said Freelancer tried to refuse the bot's access requests at first, but ultimately had to block Anthropic's crawler entirely. "This is egregious scraping [which] makes the site slower for everyone operating on it and ultimately affects our revenue," he added. As for iFixit, Wiens said the website set off high-traffic alarms, and his people were woken up at 3AM by Anthropic's activities. The company's crawler stopped scraping iFixit after it added a line to its robots.txt file that disallowed Anthropic's bot specifically.
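The article doesn't quote the exact line iFixit added, but a rule targeting a single crawler by its user-agent takes this general form (the user-agent string here is an assumption based on the bot name reported above):

```
User-agent: ClaudeBot
Disallow: /
```

A `Disallow: /` under a specific `User-agent` blocks that one crawler from the entire site while leaving all other crawlers governed by the file's remaining rules.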

The AI startup told The Information that it respects robots.txt and that its crawler "respected that signal when iFixit implemented it." It also said it aims to be "less disruptive by considering how fast [it crawls] the same domains," and that it is now investigating the case.

AI firms use web crawlers to gather content from websites that they can use to train their generative AI technologies. They have been the target of many lawsuits as a result, with publishers accusing them of copyright infringement. To prevent more lawsuits from being filed, companies like OpenAI have been striking deals with publishers and websites. OpenAI's content partners so far include News Corp, Vox Media, the Financial Times and Reddit. iFixit's Wiens seems open to the idea of signing a deal for the repair website's articles, too, telling Anthropic in a tweet that it's willing to have a conversation about licensing content for commercial use.


