This project is the solution to the automation challenge proposed by Thoughtful AI. The goal is to create an RPA (Robotic Process Automation) bot using Python and Selenium to extract data from a news website, process it, and store it in an Excel file.
The bot performs the following tasks:
- Open the news website: The bot accesses the specified URL.
- Search for phrases: Enters a search phrase into the search field.
- Filter results: Filters the results based on the news category and the specified time period.
- Data extraction: Extracts the title, date, description, and image of the latest news that meets the criteria.
- Content analysis: Counts the number of occurrences of the search phrase in the title and description, and checks for the mention of monetary values.
- Data storage: Saves the extracted data in an Excel file, including the image file name and content analysis results.
- Image download: Downloads the images associated with the news and saves them in the output folder.
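The content analysis step above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function names `count_phrase` and `mentions_money` and the money formats covered by `MONEY_PATTERN` (such as $11.1, $111,111.11, 11 dollars, 11 USD) are assumptions made for the example.

```python
import re

# Hypothetical pattern (an assumption, not taken from the project):
# matches amounts such as $11.1, $111,111.11, 11 dollars, 11 USD.
MONEY_PATTERN = re.compile(
    r"\$\s?\d[\d,]*(?:\.\d+)?"                       # $11.1 or $111,111.11
    r"|\b\d[\d,]*(?:\.\d+)?\s?(?:dollars?|USD)\b",   # 11 dollars or 11 USD
    re.IGNORECASE,
)


def count_phrase(phrase: str, title: str, description: str) -> int:
    """Count case-insensitive, non-overlapping occurrences of the phrase
    in the title and the description combined."""
    if not phrase:
        return 0
    # re.escape keeps characters like "." or "?" in the phrase literal.
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return len(pattern.findall(title)) + len(pattern.findall(description))


def mentions_money(title: str, description: str) -> bool:
    """Return True if the title or the description contains a money amount."""
    return bool(MONEY_PATTERN.search(f"{title} {description}"))
```

Both results can then be written as extra columns next to the title, date, and description of each news item.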
- Python 3.8+
- Selenium
- Pandas
- Robocorp Control Room Setup:
  - Ensure the parameters (search phrase, news category, and number of months) are correctly configured in Robocorp Control Room.
- Run the Bot:
  - The bot can be executed via Robocorp Control Room or directly through the Python code.
- Output:
  - The results will be stored in the /output folder, including the Excel file with the data and the downloaded images.
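The number-of-months parameter has to become a concrete date cutoff before results can be filtered by time period. The sketch below shows one way to do that. It assumes that 0 and 1 both mean "current month only" and that N means the current month plus the N - 1 months before it; the function name `months_to_start_date` is hypothetical, and the bot's real rule may differ.

```python
from datetime import date


def months_to_start_date(months: int, today: date) -> date:
    """Return the first day of the earliest month to include.

    Assumption: 0 and 1 both mean "current month only"; 2 means the
    current and the previous month, and so on.
    """
    months_back = max(months, 1) - 1
    # Use a zero-based month index so crossing a year boundary needs no special case.
    index = today.year * 12 + (today.month - 1) - months_back
    return date(index // 12, index % 12 + 1, 1)
```

A news item would then be kept when its publication date is on or after the returned date, for example `article_date >= months_to_start_date(2, date.today())`.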
- src/: Contains the main source code.
- output/: Directory where the output files are stored.
- main.py: Main Python file.
- README.md: This file.