arXiv is owned and operated by Cornell University, a private not-for-profit educational institution / Public domain

On August 4th, the data science community Kaggle announced a free, open pipeline to the machine-readable dataset of arXiv, the open-access preprint repository.

“Having the entire arXiv corpus on Kaggle grows the potential of arXiv articles immensely,” said Eleonora Presani, arXiv Executive Director, in Kaggle’s Medium article. “By offering the dataset on Kaggle we go beyond what humans can learn by reading all these articles and we make the data and information behind arXiv available to the public in a machine-readable format.”

Kaggle said its hope was to “empower new use cases that can lead to the exploration of richer machine learning techniques that combine multi-modal features towards applications like trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction, and semantic search interfaces.”

The dataset is now available on Kaggle and will be updated weekly.

Read the full article on Medium or on the arXiv blog.
