How to add Selenium WebDriver and save data to Excel on an Ubuntu Droplet?

Posted on February 8, 2021

Hello,

I’m very new to DigitalOcean and to creating servers in general, so apologies for the beginner questions.

I’ve created a Python script that scrapes websites and exports the data to an Excel spreadsheet. This works well on my own Windows PC, but I would like to run this process on an Ubuntu Droplet.

I followed this guide: http://jonathansoma.com/lede/algorithms-2017/servers/setting-up/

But it didn’t work. Ideally, I need the script to run autonomously for five hours a day. Would this be possible? If so, I would be willing to pay for guidance.

Any advice would be much appreciated.

Many thanks,

James




Heya,

Just came across this question and, even though it’s an old one, decided to write some general guidelines for anyone who finds it in the future.

No need to apologize for beginner questions, we all start from somewhere!

Running a Python script on a DigitalOcean Droplet, even autonomously for a set number of hours per day, is certainly possible. You would typically use a task scheduler like cron to execute your script at specified times. For working with Excel files on Ubuntu, you might need to adjust your script to use a library like pandas, which can read and write Excel files.
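As a minimal sketch of the pandas approach (the file name and columns here are just examples, not part of your script):

```python
import pandas as pd

# Example scraped data; in practice this would come from your scraper
rows = [{"title": "Example page", "price": 19.99},
        {"title": "Another page", "price": 4.50}]

df = pd.DataFrame(rows)
# pandas uses openpyxl under the hood to write .xlsx files
df.to_excel("scraped_data.xlsx", index=False)
```

The same `to_excel` call works identically on Windows and Ubuntu, so this part of your script should not need platform-specific changes.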

Please note, due to the range of possible issues when trying to set up a script like this, it’s difficult for me to provide a specific solution without more information about the problem you’ve encountered. The error message you received when trying to run the script would be really helpful.

Regarding the payment for guidance, there’s no need for that. As a DigitalOcean Support bot, I’m here to help you at no additional cost.

For additional help, I recommend going through the DigitalOcean community tutorials and documentation.

Hope that this helps!

No worries about being new to this; it’s great that you’re learning! Setting up a Python script on a DigitalOcean Droplet to run autonomously is definitely possible and a common use case. Here’s a step-by-step guide to help you get your web scraping script running on an Ubuntu Droplet:

Step 1: Set Up Your Droplet

First, ensure you have a Droplet running Ubuntu. If you’ve already created one, you can skip this step. If not, here’s a quick rundown:

  1. Create a Droplet: Log into your DigitalOcean account, click “Create Droplet,” choose Ubuntu as your image, select a plan, choose a data center, and finally, set up your SSH keys for secure access.

Step 2: Access Your Droplet

You can access your Droplet via SSH (Secure Shell). On Windows, you can use a tool like PuTTY or Windows Subsystem for Linux (WSL). To connect to your Droplet, you need its IP address and your SSH private key:

ssh root@your_droplet_ip

Step 3: Install Python and Required Libraries

Once you’re connected to the Droplet, update the package manager and install Python along with pip, Python’s package installer:

sudo apt update
sudo apt install python3-pip

Since you’re scraping websites and exporting to Excel, you’ll likely need libraries such as beautifulsoup4, requests, and pandas or openpyxl:

pip3 install beautifulsoup4 requests pandas openpyxl
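Since your question mentions Selenium, note that a Droplet has no display, so the browser must run headless. One possible setup is sketched below; package names vary between Ubuntu releases (on newer releases these are distributed as snaps), so treat this as a starting point rather than exact commands:

```shell
# Install a browser and a matching driver
# (package names may differ depending on your Ubuntu release)
sudo apt install -y chromium-browser chromium-chromedriver

# Install the Selenium bindings for Python
pip3 install selenium
```

In your script, you would then pass a headless flag (e.g. `--headless`) to the browser through Selenium’s `ChromeOptions` before creating the driver, since there is no graphical session on the server.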

Step 4: Transfer Your Script to the Droplet

You can transfer your script from your local machine to your Droplet using SCP (Secure Copy). On your local machine, run:

scp /path/to/your/script.py root@your_droplet_ip:/path/where/to/store/it

Step 5: Schedule the Script

To run your script autonomously for five hours a day, you can use cron, the standard job scheduler on Linux. You’ll need to decide when, and how often within those five hours, the script should run. To edit the schedule, open your crontab file:

crontab -e

And add a line like this to run it every hour for five hours starting at 1 PM:

0 13,14,15,16,17 * * * /usr/bin/python3 /path/to/your/script.py
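The same schedule can also be written with an hour range, and it’s worth redirecting the script’s output so that cron failures aren’t silent. A sketch (the log path is just an example):

```shell
# Runs at the top of each hour from 13:00 to 17:00,
# appending stdout and stderr to a log file
0 13-17 * * * /usr/bin/python3 /path/to/your/script.py >> /home/youruser/scraper.log 2>&1
```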

Step 6: Monitoring and Logging

It’s essential to add some logging to your script to understand what happens when it runs:

import logging

# Include timestamps so each cron run can be told apart in the log
logging.basicConfig(filename='/path/to/your/logfile.log',
                    level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')
logging.info('Started running the script')

Step 7: Handling Excel Files

Since your script exports data to Excel, you’ll need to decide how to handle those files. You can store them on the Droplet or keep them in DigitalOcean Spaces (object storage). If they need to be accessed from your local machine, consider automating a download (for example with scp) or using a cloud storage solution.
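If the script runs several times a day, writing each export to a timestamped file avoids overwriting earlier results. A minimal sketch, where the naming scheme and example data are just illustrations:

```python
from datetime import datetime

import pandas as pd

# Example data standing in for scraped results
df = pd.DataFrame({"url": ["https://example.com"], "status": ["ok"]})

# Produces e.g. export_2021-02-08_13-00.xlsx, one file per run
filename = datetime.now().strftime("export_%Y-%m-%d_%H-%M.xlsx")
df.to_excel(filename, index=False)
```

Timestamped files also make it easy to download only the newest export to your local machine.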

Troubleshooting

  • If something doesn’t work as expected, check the logs.
  • Ensure all paths in your scripts and crontab entries are absolute and correctly specified.
  • Test running your script manually on the Droplet before scheduling it.

By following these steps, you should be able to run your web scraping script autonomously on a DigitalOcean Droplet. If you encounter specific issues, especially with parts of the guide you tried to follow, let me know, and I can help troubleshoot further!
