Problems executing on a different website.

Posted on May 16, 2021

Connected Tutorial (this question is a follow-up to this tutorial):

How To Scrape a Website Using Node.js and Puppeteer

Hello

I am having a hard time getting this to work with the following website: https://massart.edu/news-category/massart-news

Here is the code for my pagescraper.js:

const scraperObject = {
    //url: 'http://books.toscrape.com',
    url: 'https://massart.edu/news-category/massart-news',
    async scraper(browser){
        let page = await browser.newPage();
        console.log(`Navigating to ${this.url}...`);
        await page.goto(this.url, {waitUntil: 'domcontentloaded'});
        // Wait for the required DOM to be rendered
        //await page.waitForSelector('.l_main'); //need to figure .page_inner
        await page.waitForSelector('.layout-region.content-main');
        // Get the link to all the required books
        //let urls = await page.$$eval('section ol > li', links => {
        //let urls = await page.$$eval('section main > article >div', links => {
        let urls = await page.$$eval('article div > h2', links => {
            // Make sure the book to be scraped is in stock
            links = links.filter(link => link.querySelector('.field__items > i'))
            // Extract the links from the data
            links = links.map(el => el.querySelector('h2 > a'))
            return links;
        });
        console.log(urls);
    }
}

module.exports = scraperObject;



Heya,

You need to adjust the selectors to match the markup of the target website. Based on the page you linked, it looks like you are trying to collect the links inside the <h2> elements of each article.

You can change the selector so that it targets the <a> tags inside the <h2> elements of each <article>. That should match the links to the articles on the page.
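
As a rough sketch of what that could look like in pagescraper.js: the `article h2` selector and the `extractLinks` helper name are assumptions on my side, so check the real markup in your browser's dev tools first. Two things to watch for in your current code: the `.field__items > i` filter was carried over from the tutorial's in-stock check and will probably filter out every heading on this site, and `$$eval` can only return serializable values, so map each link to its `href` string instead of returning the element itself.

```javascript
// Runs inside the page: receives the matched <h2> elements and returns
// plain strings, because $$eval can only hand back serializable values.
function extractLinks(headings) {
    return headings
        .map(h2 => h2.querySelector('a'))   // first link inside each heading
        .filter(a => a !== null)            // skip headings without a link
        .map(a => a.href);                  // keep only the URL string
}

const scraperObject = {
    url: 'https://massart.edu/news-category/massart-news',
    async scraper(browser) {
        const page = await browser.newPage();
        console.log(`Navigating to ${this.url}...`);
        await page.goto(this.url, { waitUntil: 'domcontentloaded' });
        // Assumed markup: each news item is an <article> with an <h2> title link
        await page.waitForSelector('article h2 a');
        const urls = await page.$$eval('article h2', extractLinks);
        console.log(urls);
        return urls;
    }
};

module.exports = scraperObject;
```

If `urls` still comes back empty, the selector most likely does not match this page's structure and needs adjusting.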

Make sure you have Puppeteer installed and properly set up in your project, and that you are handling errors and other aspects of web scraping as needed for your specific use case.
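
For the error handling part, one possible approach is a small wrapper around the scraper call. `scrapeSafely` is a hypothetical helper name, not something from the tutorial or from Puppeteer. It only relies on `browser.close()` and assumes your scraper object exposes an async `scraper(browser)` method, as in your question.

```javascript
// Hypothetical helper: runs a scraper object, logs any failure,
// and always closes the browser afterwards.
async function scrapeSafely(browser, scraperObject) {
    try {
        return await scraperObject.scraper(browser);
    } catch (err) {
        // e.g. a waitForSelector timeout when the selector does not match
        console.error(`Scrape failed: ${err.message}`);
        return [];
    } finally {
        await browser.close();
    }
}
```

A timeout from `waitForSelector` is often the first sign that a selector does not match the page, so logging the message makes that case easier to spot.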

Hope that this helps!
