Items tagged with: aitraining


"On Saturday, Triplegangers CEO Oleksandr Tomchuk was alerted that his company’s e-commerce site was down. It looked to be some kind of distributed denial-of-service attack.

He soon discovered the culprit was a bot from OpenAI that was relentlessly attempting to scrape his entire, enormous site.

“We have over 65,000 products, each product has a page,” Tomchuk told TechCrunch. “Each page has at least three photos.”

OpenAI was sending “tens of thousands” of server requests trying to download all of it, hundreds of thousands of photos, along with their detailed descriptions.

“OpenAI used 600 IPs to scrape data, and we are still analyzing logs from last week, perhaps it’s way more,” he said of the IP addresses the bot used to attempt to consume his site.

“Their crawlers were crushing our site,” he said. “It was basically a DDoS attack.”

Triplegangers’ website is its business. The seven-employee company has spent over a decade assembling what it calls the largest database of “human digital doubles” on the web, meaning 3D image files scanned from actual human models.

It sells the 3D object files, as well as photos — everything from hands to hair, skin, and full bodies — to 3D artists, video game makers, anyone who needs to digitally recreate authentic human characteristics."

techcrunch.com/2025/01/10/how-…

#CyberSecurity #AI #GenerativeAI #OpenAI #WebScraping #DDoS #AITraining


"In newly unredacted documents filed with the U.S. District Court for the Northern District of California late Wednesday, plaintiffs in Kadrey v. Meta, who include bestselling authors Sarah Silverman and Ta-Nehisi Coates, recount Meta’s testimony from late last year, during which it was revealed that Zuckerberg approved Meta’s use of a dataset called LibGen for Llama-related training.

LibGen, which describes itself as a “links aggregator,” provides access to copyrighted works from publishers including Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. LibGen has been sued a number of times, ordered to shut down, and fined tens of millions of dollars for copyright infringement.

According to Meta’s testimony, as relayed by plaintiffs’ counsel, Zuckerberg cleared the use of LibGen to train at least one of Meta’s Llama models despite concerns within Meta’s AI exec team and others at the company. The filing quotes Meta employees as referring to LibGen as a “data set we know to be pirated,” and flagging that its use “may undermine [Meta’s] negotiating position with regulators.”"

techcrunch.com/2025/01/09/mark…

#AI #GenerativeAI #Meta #AITraining #LibGen #Copyright #IP #Piracy