Data Extraction Tool

The Data Extraction Tool uses the BeautifulSoup and pandas libraries to extract content from a local HTML file and save it to a CSV file. You can extract paragraphs, headings, links, or image URLs on their own, all text (paragraphs and headings, no links), all links, or all content at once.

Firstly, input the name of the CSV file you want to create.

Input the name of the CSV file you want to create.

=>

Secondly, select the index of the HTML file to extract data from. The HTML files must be placed in the data folder, which is created the first time the program runs.

Select the index of the HTML file to extract data from.

  1. projects-web-scraper.html
  2. root-page.html

=>
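
The file menu above could be built along the following lines. This is a minimal sketch using only the standard library, not the tool's actual code: the function names `list_html_files`, `format_menu`, and `pick_file` are hypothetical, and the folder name `data` is assumed from the description.

```python
from pathlib import Path


def list_html_files(data_dir="data"):
    """Return the HTML files in data_dir, sorted by name.

    The folder is created if it is missing, so a first run leaves an
    empty folder behind (the default name "data" is an assumption).
    """
    folder = Path(data_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".html"
    )


def format_menu(files):
    """Build the numbered menu text, one file per line, starting at 1."""
    return "\n".join(f"  {i}. {p.name}" for i, p in enumerate(files, start=1))


def pick_file(files, choice):
    """Translate the 1-based index typed by the user into a file path."""
    index = int(choice)
    if not 1 <= index <= len(files):
        raise ValueError(f"Choose a number between 1 and {len(files)}")
    return files[index - 1]
```

With the two files shown in the menu, `pick_file(files, "2")` would return the path to root-page.html. Input that is not a number, or that falls outside the menu range, raises a `ValueError` that the caller can catch in order to prompt again.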

Lastly, select the index of the extraction option you would like to use.

Select the index of the tag type you want to extract data from.

  1. Paragraphs
  2. Headings
  3. Links
  4. Img urls
  5. All text (no links)
  6. All links
  7. All content

=>
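
One way the seven options could map onto BeautifulSoup lookups is sketched below. It is an illustration under stated assumptions rather than the tool's real implementation: the names `EXTRACTORS`, `OPTIONS`, and `extract`, the output columns `type`, `tag`, and `content`, and the reading of "All links" as anchor `href` values plus image `src` values are all guesses.

```python
import pandas as pd
from bs4 import BeautifulSoup

HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]

# Kind of content -> (tag names to search, attribute to read).
# An attribute of None means "take the element's text".
EXTRACTORS = {
    "paragraph": (["p"], None),
    "heading": (HEADING_TAGS, None),
    "link": (["a"], "href"),
    "image": (["img"], "src"),
}

# Menu index -> kinds of content to collect (assumed mapping).
OPTIONS = {
    1: ["paragraph"],
    2: ["heading"],
    3: ["link"],
    4: ["image"],
    5: ["paragraph", "heading"],                   # All text (no links)
    6: ["link", "image"],                          # All links
    7: ["paragraph", "heading", "link", "image"],  # All content
}


def extract(html, option):
    """Return a DataFrame with one row per extracted item."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for kind in OPTIONS[option]:
        tags, attr = EXTRACTORS[kind]
        for element in soup.find_all(tags):
            if attr:
                value = element.get(attr)  # None when the attribute is absent
            else:
                value = element.get_text(" ", strip=True)
            if value:  # skip empty text and missing attributes
                rows.append({"type": kind, "tag": element.name, "content": value})
    return pd.DataFrame(rows, columns=["type", "tag", "content"])
```

The result can then be written out with `extract(html, 5).to_csv("output.csv", index=False)`, where `output.csv` stands in for the CSV name entered at the first prompt. Keeping the menu as a plain dictionary means a new option only needs a new entry, not a new branch in the extraction loop.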