Question

I'm not getting correct output witch scrapy

i was following the tutorial: ‘How To Crawl A Web Page with Scrapy and Python 3’ and when I got to the part where we actually gather the data from the website , I am getting the same thing as when the for loop just had pass in it. Here is my code:

import scrapy 


class BrickSetSpider(scrapy.Spider): 
    name = "brickset_spider"    #<< name of the spider     
    start_urls = ['http://brickset.com/sets/year-2016']    

    
    def parse(self, response):
        SET_SELECTOR = ".set" 
        for brickset in response.css(SET_SELECTOR): 
            
            NAME_SELECTOR = 'h1 ::text'
            PIECES_SELECTOR = './/d1[dt/text() = "Pieces"]/dd/a/text()'
            MINIFIGS_SELECTOR = './/d1[dt/text() = "Minifigs"]/dd[2]/a/text()'
            IMAGE_SELECTOR = 'img ::attr(src)'
            yield {
                'name': brickset.css(NAME_SELECTOR).extract_first(), 
                'pieces': brickset.xpath(PIECES_SELECTOR).extract_first(),
                'minifigs': brickset.xpath(MINIFIGS_SELECTOR).extract_first(),
                'image': brickset.css(IMAGE_SELECTOR).extract_first(),
            }

But here is my output:

2021-03-05 02:09:45 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: scrapybot)
2021-03-05 02:09:45 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.2.0, Python 3.8.8 (tags/v3.8.8:024d805, Feb 19 2021, 13:18:16) [MSC v.1928 64 bit (AMD64)], pyOpenSSL 20.0.1 (OpenSSL 1.1.1j  16 Feb 2021), cryptography 3.4.6, Platform Windows-10-10.0.18362-SP0  
2021-03-05 02:09:45 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-03-05 02:09:45 [scrapy.crawler] INFO: Overridden settings:
{'SPIDER_LOADER_WARN_ONLY': True}
2021-03-05 02:09:45 [scrapy.extensions.telnet] INFO: Telnet Password: 2dbd3797bff9e0a2
2021-03-05 02:09:45 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2021-03-05 02:09:46 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-03-05 02:09:46 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-03-05 02:09:46 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2021-03-05 02:09:46 [scrapy.core.engine] INFO: Spider opened
2021-03-05 02:09:46 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-03-05 02:09:46 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-03-05 02:09:46 [scrapy.core.engine] DEBUG: Crawled (403) <GET http://brickset.com/sets/year-2016> (referer: None)
2021-03-05 02:09:46 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 http://brickset.com/sets/year-2016>: HTTP status code is not handled or not allowed
2021-03-05 02:09:46 [scrapy.core.engine] INFO: Closing spider (finished)
2021-03-05 02:09:46 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 226,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 2227,
 'downloader/response_count': 1,
 'downloader/response_status_count/403': 1,
 'elapsed_time_seconds': 0.307499,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2021, 3, 5, 7, 9, 46, 875575),
 'httperror/response_ignored_count': 1,
 'httperror/response_ignored_status_count/403': 1,
 'log_count/DEBUG': 1,
 'log_count/INFO': 11,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2021, 3, 5, 7, 9, 46, 568076)}
2021-03-05 02:09:46 [scrapy.core.engine] INFO: Spider closed (finished)

Please help ;-;

Show comments

Submit an answer

This textbox defaults to using Markdown to format your answer.

You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!

Sign In or Sign Up to Answer