Problems executing on a different website.

Posted on May 16, 2021

Data Analysis

Node.js

JavaScript

By sonicgrass

Connected Tutorial (this question is a follow-up to this tutorial):

How To Scrape a Website Using Node.js and Puppeteer

Hello

I am having a hard time getting this to work with the following website: https://massart.edu/news-category/massart-news

Here is the code for my pagescraper.js:

const scraperObject = {
    //url: 'http://books.toscrape.com',
    url: 'https://massart.edu/news-category/massart-news',
    async scraper(browser){
        let page = await browser.newPage();
        console.log(`Navigating to ${this.url}...`);
        await page.goto(this.url, {waitUntil: 'domcontentloaded'});
        // Wait for the required DOM to be rendered
        //await page.waitForSelector('.l_main'); //need to figure .page_inner
        await page.waitForSelector('.layout-region.content-main');
        // Get the link to all the required books
        //let urls = await page.$$eval('section ol > li', links => {
        //let urls = await page.$$eval('section main > article >div', links => {
        let urls = await page.$$eval('article div > h2', links => {
            // Make sure the book to be scraped is in stock
            links = links.filter(link => link.querySelector('.field__items > i'))
            // Extract the links from the data
            links = links.map(el => el.querySelector('h2 > a'))
            return links;
        });
        console.log(urls);
    }
}

module.exports = scraperObject;


alexdo

October 24, 2023

Heya,

You need to adjust the selectors to match the elements on the target website. Based on the page you provided, it appears you are trying to scrape the links inside the <h2> elements of the articles.

You can change the selector to target the <a> tags inside the <article> elements. This should match the links to the articles on the page.
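To make that concrete, here is a minimal sketch of how pagescraper.js might look. It assumes each news item is an <article> whose title is an <a> inside an <h2>. The selector `article h2 a` is a guess based on the code in the question, so check it against the live markup in your browser's DevTools. It differs from the question's version in two ways. The in-stock filter carried over from the books tutorial is dropped, since it likely matches nothing on this site and would leave an empty array. The callback also returns `href` strings instead of elements, because `$$eval` can only pass serializable values back to Node and DOM elements do not survive that trip.

```javascript
// pagescraper.js (sketch; the 'article h2 a' selector is an assumption)
const scraperObject = {
    url: 'https://massart.edu/news-category/massart-news',
    async scraper(browser) {
        const page = await browser.newPage();
        console.log(`Navigating to ${this.url}...`);
        await page.goto(this.url, { waitUntil: 'domcontentloaded' });
        // Wait until at least one article title link has been rendered
        await page.waitForSelector('article h2 a');
        // The callback runs inside the page, so return plain strings,
        // not DOM nodes, and skip anchors that have no href
        const urls = await page.$$eval('article h2 a', links =>
            links.map(a => a.href).filter(Boolean)
        );
        console.log(urls);
        return urls;
    }
};

module.exports = scraperObject;
```

If the array still comes back empty, the selector probably does not match the page. Running `document.querySelectorAll('article h2 a')` in the browser console is a quick way to check.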

Make sure you have Puppeteer installed and properly set up in your project, and that you are handling errors and other aspects of web scraping as needed for your specific use case.
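As a rough illustration of the error handling mentioned above, a small wrapper could catch failures and always close the browser. `scrapeSafely` is a hypothetical helper name, not part of Puppeteer or the tutorial. It assumes `browser` is a Puppeteer browser instance and `scraperObject` has the same `url` and `scraper(browser)` shape as in the question.

```javascript
// Hypothetical helper: runs a scraper and guarantees the browser is closed
async function scrapeSafely(browser, scraperObject) {
    try {
        return await scraperObject.scraper(browser);
    } catch (err) {
        // For example, waitForSelector times out when the selector never matches
        console.error(`Scrape of ${scraperObject.url} failed: ${err.message}`);
        return [];
    } finally {
        // Runs on success and on failure, so no browser process is left behind
        await browser.close();
    }
}
```

Returning an empty array on failure is only one option. Rethrowing the error after logging may suit a pipeline that should stop on the first failed page.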

Hope that this helps!
