By sonicgrass
Connected Tutorial (this question is a follow-up to this tutorial):
How To Scrape a Website Using Node.js and Puppeteer
Hello,
I am having a hard time getting this to work with the following website: https://massart.edu/news-category/massart-news
Here is the code for my pagescraper.js
const scraperObject = {
    //url: 'http://books.toscrape.com',
    url: 'https://massart.edu/news-category/massart-news',
    async scraper(browser){
        let page = await browser.newPage();
        console.log(`Navigating to ${this.url}...`);
        await page.goto(this.url, {waitUntil: 'domcontentloaded'});
        // Wait for the required DOM to be rendered
        //await page.waitForSelector('.l_main'); //need to figure .page_inner
        await page.waitForSelector('.layout-region.content-main');
        // Get the link to all the required books
        //let urls = await page.$$eval('section ol > li', links => {
        //let urls = await page.$$eval('section main > article >div', links => {
        let urls = await page.$$eval('article div > h2', links => {
            // Make sure the book to be scraped is in stock
            links = links.filter(link => link.querySelector('.field__items > i'))
            // Extract the links from the data
            links = links.map(el => el.querySelector('h2 > a'))
            return links;
        });
        console.log(urls);
    }
}
module.exports = scraperObject;
Heya,
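You need to adjust the selectors to match the elements on the target website. Based on the page you provided, it looks like you are trying to scrape the links inside the <h2> elements of the articles.
You can change the selector so it targets the <a> tags inside the <article> elements, which should match the links to the articles on the page. The filter on '.field__items > i' is a leftover from the "in stock" check in the tutorial and will drop any heading that has no such element, so you can most likely remove it. Also return plain values such as each link's href instead of the elements themselves, because $$eval can only pass serializable data back to Node.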
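As a rough sketch of that change, the scraper could look something like the code below. The selector `article h2 a` is an assumption based on the selectors in your code and has not been checked against the live page, and `extractLinks` is a hypothetical helper name. It returns plain objects rather than DOM elements, since elements generally do not survive the trip from the page back to Node.

```javascript
// Runs inside the page via page.$$eval, so it must be self-contained:
// Puppeteer serializes the function and it cannot see outer variables.
// (extractLinks is a hypothetical helper name, not part of Puppeteer.)
function extractLinks(anchors) {
  return anchors
    // Turn each <a> element into a plain, serializable object.
    .map(a => ({ title: a.textContent.trim(), url: a.href }))
    // Skip anchors that have no href.
    .filter(link => link.url);
}

const scraperObject = {
  url: 'https://massart.edu/news-category/massart-news',
  async scraper(browser) {
    const page = await browser.newPage();
    console.log(`Navigating to ${this.url}...`);
    await page.goto(this.url, { waitUntil: 'domcontentloaded' });

    // Assumed selector: the headline link inside each article's <h2>.
    // Adjust it after inspecting the page in your browser's dev tools.
    const selector = 'article h2 a';
    await page.waitForSelector(selector);

    const links = await page.$$eval(selector, extractLinks);
    console.log(links);

    await page.close();
    return links;
  }
};
```

Keep the `module.exports = scraperObject;` line from your file at the end so the rest of the tutorial's code can still require it. If `waitForSelector` times out, the selector does not match anything on the page and needs to be adjusted.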
Make sure you have Puppeteer installed and properly set up in your project, and that you are handling errors and other aspects of web scraping as needed for your specific use case.
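For the error-handling part, one possible shape is a small wrapper around the scraper, shown below as a minimal sketch. `scrapeAll` is a hypothetical name, and it assumes a scraper object with a `scraper(browser)` method like the one in your pagescraper.js. Puppeteer itself is installed with `npm install puppeteer`.

```javascript
// Hypothetical wrapper: runs a scraper and always closes the browser.
// browserPromise is whatever puppeteer.launch() returned.
async function scrapeAll(browserPromise, scraper) {
  let browser;
  try {
    browser = await browserPromise;
    return await scraper.scraper(browser);
  } catch (err) {
    // Launch failures, navigation errors and selector timeouts land here.
    console.error('Could not scrape the page:', err);
    return [];
  } finally {
    // Close the browser even when scraping failed, so no process is left behind.
    if (browser) await browser.close();
  }
}
```

You would call it with something like `scrapeAll(puppeteer.launch(), scraperObject)`, after requiring both `puppeteer` and your pagescraper.js.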
Hope that this helps!