Need a way to get the content from an HTML page that you have stored locally? Select from 7 different extraction options and get the content you need saved to a CSV file with just 3 console inputs!
Figure 1.1. Data extraction tool demo
The Data Extraction Tool uses the BeautifulSoup and pandas libraries to extract the content the user requires from a local HTML file. The extraction options are: paragraphs, headings, links, image links, only links, only paragraphs and headings, or all options combined.
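A minimal sketch of how BeautifulSoup and pandas can be combined for this kind of extraction is shown below. It is not the tool's actual code: the function names `extract_content` and `html_to_csv`, the use of the built-in `html.parser`, treating `h1` to `h6` as headings, and the two-column `type`/`content` CSV layout are all assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup

# Tags treated as headings in this sketch (an assumption).
HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def extract_content(html: str) -> pd.DataFrame:
    """Return one row per heading, paragraph, link and image found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Headings and paragraphs, kept in document order.
    for tag in soup.find_all(HEADING_TAGS + ["p"]):
        kind = "paragraph" if tag.name == "p" else "heading"
        rows.append({"type": kind, "content": tag.get_text(strip=True)})
    # Link targets: only <a> tags that actually carry an href attribute.
    for anchor in soup.find_all("a", href=True):
        rows.append({"type": "link", "content": anchor["href"]})
    # Image sources: only <img> tags that carry a src attribute.
    for img in soup.find_all("img", src=True):
        rows.append({"type": "image", "content": img["src"]})
    return pd.DataFrame(rows, columns=["type", "content"])


def html_to_csv(html_path, csv_path, types=None) -> pd.DataFrame:
    """Extract content from a local HTML file and write it to a CSV file.

    `types` optionally restricts the output to some row types,
    e.g. ["link", "image"]; None keeps everything.
    """
    html = Path(html_path).read_text(encoding="utf-8")
    df = extract_content(html)
    if types is not None:
        df = df[df["type"].isin(types)]
    df.to_csv(csv_path, index=False)  # no pandas index column in the file
    return df
```

For example, `html_to_csv("HTML/page.html", "output.csv", types=["link", "image"])` (hypothetical file names) would write only the link and image rows. Keeping every row in one long table with a `type` column makes each extraction option a simple filter rather than a separate parser.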
First, the user enters the name of the CSV file to create. Next, the user selects the HTML file to extract data from; this file must already be placed in the HTML folder. Finally, the user enters the index of the extraction option to use. The full list is outlined below:
- Img urls
- All text (no links)
- All links
- All content
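The three console inputs described above can be sketched as follows. This is a hypothetical illustration: the `HTML` folder name comes from the description, but the function names, prompt wording, and the 0-based index given to each of the four listed options are assumptions, not the tool's actual menu.

```python
from pathlib import Path

# Labels copied from the option list above. The 0-based index assigned to each
# one here is an assumption; the real tool offers 7 options in its own order.
OPTIONS = ["Img urls", "All text (no links)", "All links", "All content"]


def list_html_files(folder: Path) -> list[Path]:
    """Return the .html files in `folder`, sorted by name."""
    return sorted(folder.glob("*.html"))


def resolve_choices(csv_name: str, html_name: str, option_index: int, folder: Path):
    """Validate the three console inputs and return (csv_path, html_path, option)."""
    if not csv_name.lower().endswith(".csv"):
        csv_name += ".csv"  # add the extension if the user left it off
    html_path = folder / html_name
    if not html_path.is_file():
        raise FileNotFoundError(f"{html_name} was not found in {folder}")
    if not 0 <= option_index < len(OPTIONS):
        raise ValueError(f"option index must be between 0 and {len(OPTIONS) - 1}")
    return Path(csv_name), html_path, OPTIONS[option_index]


def prompt_user(folder: Path = Path("HTML")):
    """Ask for the three inputs on the console (hypothetical prompts)."""
    csv_name = input("Name of the CSV file to create: ").strip()
    for path in list_html_files(folder):
        print(" ", path.name)
    html_name = input("HTML file to extract from: ").strip()
    for index, label in enumerate(OPTIONS):
        print(f"  {index}: {label}")
    option_index = int(input("Extraction option index: "))
    return resolve_choices(csv_name, html_name, option_index, folder)
```

Calling `prompt_user()` from the directory that contains the `HTML` folder returns the validated choices, which could then be handed to the extraction step. Keeping validation in `resolve_choices`, separate from `input()`, lets it be checked without typing anything at the console.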
Figure 1.2. Example CSV file output