HTMLCorpus Scraper is a tool for scraping text from web content for use in topic modelling. It can scrape an unlimited number of URLs to a maximum depth of 7.

The tool is helpful for producing a corpus of texts for machine learning purposes. It produces either a CSV file or a corpus of text files, which you can use in your machine learning program for topic modelling.
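
As a rough illustration of how the CSV output might be consumed, the sketch below reads a small CSV and builds one bag-of-words count per article, a common starting point for topic modelling. The column names (url, text), the sample rows, and the helper names load_documents and tokenize are assumptions made for this example; check the real export for its actual layout.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample of an exported CSV; the real column names may differ.
SAMPLE_CSV = """url,text
https://example.com/a,"Topic models find themes in text."
https://example.com/b,"Scraped text feeds topic models."
"""


def load_documents(csv_file, text_column="text"):
    """Return the article text of every non-empty row as a list of strings."""
    reader = csv.DictReader(csv_file)
    return [row[text_column] for row in reader if row.get(text_column)]


def tokenize(document):
    """Lowercase a document and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", document.lower())


# With a real export, replace io.StringIO(...) with
# open(path, newline="", encoding="utf-8").
documents = load_documents(io.StringIO(SAMPLE_CSV))

# One word-count dictionary per article.
bags = [Counter(tokenize(doc)) for doc in documents]
```

The resulting list of counters can then be passed to whichever topic modelling library you prefer, after converting it to that library's expected input format.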

  • Extract article text from an unlimited number of URLs.
  • Extract articles as .txt files or .csv files.
  • Superfast scraping with real-time data updates.
  • Extracted data is also saved to a non-structured database for advanced users interested in querying the data.
  • Many more cool features. Check out our demo!
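
To make the depth limit concrete, here is a minimal sketch of a depth-limited, breadth-first crawl. It is not the tool's actual implementation: the function crawl, the get_links callback, and the in-memory LINKS graph are hypothetical, and it assumes that the starting URLs count as depth 0.

```python
from collections import deque

MAX_DEPTH = 7  # the maximum depth stated in the description


def crawl(start_urls, get_links, max_depth=MAX_DEPTH):
    """Breadth-first traversal; returns {url: depth} for pages within max_depth."""
    depths = {url: 0 for url in start_urls}  # assumption: seeds are depth 0
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if depths[url] >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            if link not in depths:  # visit each URL only once
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths


# Hypothetical link graph standing in for real pages.
LINKS = {"seed": ["a"], "a": ["b"], "b": ["c"], "c": []}

reached = crawl(["seed"], lambda url: LINKS.get(url, []), max_depth=2)
# reached == {"seed": 0, "a": 1, "b": 2}; "c" lies at depth 3 and is skipped
```

In a real crawl, get_links would fetch a page and return the URLs it links to; the traversal logic stays the same.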