Wikipedia is giving AI developers its data to fend off bot scrapers

April 17, 2025

Wikimedia says the dataset hosted by Kaggle has been “designed with machine learning workflows in mind,” making it easier for AI developers to access machine-readable article data for modeling, fine-tuning, benchmarking, alignment, and analysis. The content within the dataset is openly licensed, and as of April 15th, includes research summaries, short descriptions, image links, infobox data, and article sections — minus references or non-written elements like audio files.

“As the place the machine learning community comes for tools and tests, Kaggle is extremely excited to be the host for the Wikimedia Foundation’s data,” said Kaggle partnerships lead Brenda Flynn. “Kaggle is excited to play a role in keeping this data accessible, available, and useful.”

