Question

Deploy & Run Scrapy Spiders On AWS Server Using ScrapeOps

I have deployed and run Scrapy spiders on an AWS server using ScrapeOps. The issue is that the scraped data is not being saved on the AWS server at the specified path. When I checked the logs in ScrapeOps, they showed the following error:

ERROR: Disabled feed storage scheme: s3. Reason: missing botocore library

I have run the following commands in the AWS console:

pip install boto3 botocore
pip install botocore

But the issue is still the same. Please clarify: do I need to pip install botocore in the AWS console, or do I have to install it somewhere else?



Bobby Iliev
Site Moderator
March 10, 2024

Hey!

The error message “ERROR: Disabled feed storage scheme: s3. Reason: missing botocore library” suggests that your Scrapy project cannot access the AWS S3 storage because the required ‘botocore’ library is either missing or not installed correctly.

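You need to make sure botocore and boto3 are installed in the same Python environment that runs your Scrapy spider. Installing them into a different Python installation or virtualenv on the same machine will not help. Run the following command in a shell on your AWS server, using the environment that ScrapeOps uses to launch the spider: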

pip install boto3 botocore

Test the installation by running a simple Python script to access S3:

import boto3
s3 = boto3.client('s3')
response = s3.list_buckets()
print(response)

If this script works, you’ve installed the libraries successfully.
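Keep in mind that a successful run only proves that the interpreter which executed the script can see the libraries. As a rough sketch (the helper name `describe_module` is made up for this example), you could run something like the following with the same Python that launches your spider, to see which interpreter is in use and where each package is found:

```python
import importlib.util
import sys


def describe_module(name):
    """Return (interpreter path, module location or None) for `name`."""
    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return sys.executable, location


if __name__ == "__main__":
    for name in ("scrapy", "botocore", "boto3"):
        interpreter, location = describe_module(name)
        print(f"{name}: {location or 'NOT FOUND'} (interpreter: {interpreter})")
```

If scrapy is found but botocore is reported as NOT FOUND, pip most likely installed botocore into a different environment than the one Scrapy runs from.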

After that, modify your Scrapy project settings to enable S3 storage. Here’s an example configuration for your settings.py:

FEED_URI = 's3://your-bucket-name/path/to/store/data'
FEED_FORMAT = 'json'
FEED_EXPORTERS = {
    'json': 'scrapy.exporters.JsonItemExporter'
}

Replace your-bucket-name with the actual name of your S3 bucket.
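As a side note, Scrapy 2.1 and later prefer a single FEEDS setting over FEED_URI and FEED_FORMAT. Here is a minimal sketch of the equivalent configuration. The bucket name and path are the same placeholders as above, and reading the credentials from environment variables is an assumption about your setup, so adjust it to however your server provides AWS credentials:

```python
# settings.py (sketch): S3 feed export using the FEEDS setting.
import os

# Placeholder bucket and path; %(name)s and %(time)s are filled in by Scrapy
# with the spider name and a timestamp, so each run writes a separate file.
FEEDS = {
    "s3://your-bucket-name/path/to/store/data/%(name)s_%(time)s.json": {
        "format": "json",
        "encoding": "utf8",
    },
}

# Assumption: credentials are exported as environment variables on the server.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY")
```

The JSON exporter is registered by default, so FEED_EXPORTERS is normally only needed when you want a custom exporter.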

If you were already attempting to use S3 within your spider code, double-check the import statements and the way you’re interacting with S3. When the libraries are installed correctly, the problem sometimes turns out to be in how they are imported or called.

If this still does not work, make sure to check the ScrapeOps logs as they provide valuable insights into errors during execution. Carefully examine the logs for additional clues about missing dependencies or configuration errors specific to your project.

On another note, make sure botocore, boto3, and Scrapy have compatible versions. Refer to their documentation for specific compatibility details. Also consider using a virtual environment (e.g., virtualenv) to isolate your project’s dependencies from other Python installations on your AWS server.
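To compare versions, a small sketch like this prints what is installed in the current environment (the helper name `installed_version` is just for illustration, and it needs Python 3.8 or newer for `importlib.metadata`):

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for package in ("Scrapy", "botocore", "boto3"):
        print(f"{package}: {installed_version(package) or 'not installed'}")
```

Run it inside the virtual environment your spider uses, otherwise it reports on a different set of packages.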

Let me know if you encounter any issues along the way. Provide more details about your Scrapy project setup, relevant code snippets, and complete error messages if further troubleshooting is necessary!

Best,

Bobby
