By aussiejim79
Hello,
I’m very new to DigitalOcean and to creating servers in general, so apologies for the beginner questions.
I’ve created a Python script that scrapes websites and exports the data to an Excel spreadsheet. This works well on my own Windows PC, but I would like to have this process run on an Ubuntu Droplet.
I followed this guide: http://jonathansoma.com/lede/algorithms-2017/servers/setting-up/
But it didn’t work. Ideally I would require the script to run autonomously for five hours a day. Would this be possible? If so, I would be willing to pay for guidance.
Any advice would be much appreciated.
Many thanks,
James
Heya,
Just came across this question and, despite its age, decided to write some general guidelines for anyone who comes across it in the future.
No need to apologize for beginner questions, we all start from somewhere!
Running a Python script on a DigitalOcean Droplet, even autonomously for a set number of hours per day, is certainly possible. You would typically use a task scheduler like cron to execute your script at specified times. For handling Excel files on Ubuntu, you might need to adjust your script to use a library such as pandas (together with openpyxl), which can read and write Excel files.
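To illustrate the Excel side, here is a minimal sketch of writing scraped rows to a spreadsheet with pandas. The column names, values, and file path are placeholders for this example, not taken from the original script:

```python
import pandas as pd

# Hypothetical scraped rows; replace with whatever your scraper collects.
rows = [
    {"title": "Example item", "price": 9.99},
    {"title": "Another item", "price": 4.50},
]

df = pd.DataFrame(rows)
# For .xlsx files, to_excel uses openpyxl under the hood.
df.to_excel("scraped_data.xlsx", index=False)
```

The same DataFrame can later be reloaded with `pd.read_excel("scraped_data.xlsx")` if the script needs to append to an existing export.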
Please note, due to the range of possible issues when trying to set up a script like this, it’s difficult for me to provide a specific solution without more information about the problem you’ve encountered. The error message you received when trying to run the script would be really helpful.
Regarding the payment for guidance, there’s no need for that. As a DigitalOcean Support bot, I’m here to help you at no additional cost.
For additional help, I recommend going through DigitalOcean’s tutorials and product documentation.
Hope that this helps!
No worries about being new to this; it’s great that you’re learning! Setting up a Python script on a DigitalOcean Droplet to run autonomously is definitely possible and a common use case. Here’s a step-by-step guide to help you get your web scraping script running on an Ubuntu Droplet:
First, ensure you have a Droplet running Ubuntu. If you’ve already created one, you can skip this step; if not, you can create one from the DigitalOcean control panel (a basic plan with an Ubuntu image is enough for a scraper).
You can access your Droplet via SSH (Secure Shell). On Windows, you can use a tool like PuTTY or the Windows Subsystem for Linux (WSL). To connect to your Droplet, you need its IP address and your SSH private key (or the root password):
ssh root@your_droplet_ip
Once you’re connected to the Droplet, update the package manager and install Python along with pip, Python’s package installer:
sudo apt update
sudo apt install python3-pip
Since you’re scraping websites and exporting to Excel, you’ll likely need libraries such as beautifulsoup4, requests, and pandas or openpyxl:
pip3 install beautifulsoup4 requests pandas openpyxl
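As a quick sanity check that the libraries installed correctly, here is a small sketch parsing a static HTML snippet with beautifulsoup4. The markup and class names are made up for the example; no network access is needed:

```python
from bs4 import BeautifulSoup

# A hypothetical fragment standing in for a real page.
html = """
<ul>
  <li class="item">First result</li>
  <li class="item">Second result</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# CSS selectors work the same way against a live page fetched with requests.
items = [li.get_text(strip=True) for li in soup.select("li.item")]
print(items)  # ['First result', 'Second result']
```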
You can transfer your script from your local machine to your Droplet using SCP (Secure Copy). On your local machine, run:
scp /path/to/your/script.py root@your_droplet_ip:/path/where/to/store/it
To run your script autonomously for five hours a day, you can use cron, the standard time-based job scheduler on Linux. You’ll need to decide when, and how often within those five hours, the script should run. For example, to schedule the script daily starting at 1 PM, open the crontab file:
crontab -e
And add a line like this to run it once at the top of every hour from 1 PM through 5 PM (five runs per day):
0 13,14,15,16,17 * * * /usr/bin/python3 /path/to/your/script.py
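If the script is instead meant to run continuously for the whole five-hour window rather than in hourly batches, an alternative (assuming GNU coreutils’ timeout, which ships with Ubuntu) is a single entry that starts it at 1 PM and stops it after five hours, redirecting output to a log file:

```
0 13 * * * timeout 5h /usr/bin/python3 /path/to/your/script.py >> /path/to/your/cron.log 2>&1
```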
It’s essential to add some logging to your script, with timestamps, so you can tell from the log file when each run started and what happened:
import logging
logging.basicConfig(filename='/path/to/your/logfile.log', level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')
logging.info('Started running the script')
Since your script exports data to Excel, you’ll need to decide how to handle these files. You can store them on the Droplet or use DigitalOcean Spaces (object storage) to keep them. If they need to be accessed from your local machine, consider automating a download (e.g. with scp) or using a cloud storage solution.
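If you keep the files on the Droplet, a small standard-library sketch like this can organise them into per-day folders so successive runs don’t overwrite each other. The directory names are illustrative, not part of the original setup:

```python
import shutil
from datetime import date
from pathlib import Path

def archive_export(xlsx_path, archive_root="exports"):
    """Move a finished spreadsheet into a per-day folder, e.g. exports/<YYYY-MM-DD>/."""
    dest_dir = Path(archive_root) / date.today().isoformat()
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(xlsx_path).name
    shutil.move(str(xlsx_path), str(dest))
    return dest
```

Calling `archive_export("scraped_data.xlsx")` at the end of the script would file each day’s export under its own dated directory.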
By following these steps, you should be able to run your web scraping script autonomously on a DigitalOcean Droplet. If you encounter specific issues, especially with parts of the guide you tried to follow, let me know, and I can help troubleshoot further!