HALvesting: Harvests open scientific papers from HAL

HALvesting is a Python project designed to harvest research papers from Hyper Articles en Ligne (HAL) and turn them into a language modeling dataset.

The latest dump of the dataset is available on Hugging Face.
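
As a rough illustration of what the harvesting step involves, the sketch below builds a request URL for HAL's public search API. It is not taken from the HALvesting code base: the function name `build_hal_query`, the default field list, and the paging parameters are assumptions chosen for illustration, and the project may query HAL differently.

```python
from urllib.parse import urlencode

# Assumption: HAL's public search endpoint. HALvesting itself may use
# a different endpoint or a different set of parameters.
HAL_SEARCH_URL = "https://api.archives-ouvertes.fr/search/"


def build_hal_query(
    query="*:*",
    fields=("halId_s", "title_s", "language_s", "fileMain_s"),
    rows=100,
    start=0,
):
    """Return a HAL search URL (hypothetical helper, not part of HALvesting).

    query  -- Solr-style query string; "*:*" matches every record
    fields -- metadata fields to return for each record
    rows   -- number of records per page
    start  -- offset of the first record, used for paging
    """
    params = {
        "q": query,
        "fl": ",".join(fields),
        "rows": rows,
        "start": start,
        "wt": "json",  # ask for a JSON response
    }
    return f"{HAL_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL for the second page of 50 records; no request is sent.
    print(build_hal_query(rows=50, start=50))
```

The helper only constructs the URL, so it can be inspected or tested offline; sending the request and downloading the linked files is left to whichever HTTP client you prefer.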

Quickstart

Code

About
