Wikipedia is giving AI developers its data to fend off bot scrapers
Wikimedia says its new dataset, hosted by Kaggle, has been “designed with machine learning workflows in mind,” making it easier for AI developers to access machine-readable article data for modeling, fine-tuning, benchmarking, alignment, and analysis. The content within the dataset is openly licensed and, as of April 15th, includes research summaries, short descriptions, image links, infobox data, and article sections, minus references and non-written elements such as audio files.
“As the place the machine learning community comes for tools and tests, Kaggle is extremely excited to be the host for the Wikimedia Foundation’s data,” said Kaggle partnerships lead Brenda Flynn. “Kaggle is excited to play a role in keeping this data accessible, available, and useful.”