HALvesting: Harvests open scientific papers from HAL
HALvesting is a Python project designed to harvest research papers from Hyper Articles en Ligne (HAL) and turn them into a language modeling dataset.
The latest dump of the dataset is available on Hugging Face.