Wikipedia is giving AI developers its data to fend off bot scrapers

April 17, 2025

Wikimedia says the new dataset, hosted by Kaggle, has been "designed with machine learning workflows in mind," making it easier for AI developers to access machine-readable article data for modeling, fine-tuning, benchmarking, alignment, and analysis. The content within the dataset is openly licensed and, as of April 15th, includes research summaries, short descriptions, image links, infobox data, and article sections. It excludes references and non-written elements such as audio files.

“As the place the machine learning community comes for tools and tests, Kaggle is extremely excited to be the host for the Wikimedia Foundation’s data,” said Kaggle partnerships lead Brenda Flynn. “Kaggle is excited to play a role in keeping this data accessible, available, and useful.”
