How to do client-side web scraping with JavaScript and jQuery?

The web is moving toward front-end frameworks such as Next.js, React, and Angular, where much of a page's content is rendered in the browser, so scraping these sites calls for code that runs inside the page itself. This article shows you how to write your own client-side web scraping scripts with jQuery or plain JavaScript and run them with Agenty.

I’ll show you how to collect web data with just a few lines of JavaScript or jQuery in Agenty, so you can automate your business with data gathered from websites.

Scraping data with jQuery

  • Create a new scraping agent or edit an existing one in your account.
  • Go to the configuration tab and click Field and collection in the sidebar.
  • Click the jQuery button at the top right of the fields section.

Here you can write any JavaScript or jQuery code to scrape data from the given URL. The code is injected into the current browser page and evaluated in the browsing session.

To demonstrate, let’s build a scraper that extracts news from the Hacker News website using the short function below.

Code

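function extract() {
    const result = [];
    // Each story row in the list table carries an id attribute.
    $('.itemlist tr[id]').each(function (index, tr) {
        // Look up the title link once and reuse it for both fields.
        const link = $(tr).find('.titlelink');
        const item = {
            rank: index, // zero-based position of the row on the page
            url: link.attr('href'),
            title: link.text(),
        };
        result.push(item);
    });
    return result;
}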

This jQuery code uses each to iterate over every table row that has an id attribute, and find to locate the elements matching a given selector within each row.

It then extracts the data with text() for an element's text content or attr() for an attribute value.

I also added the index to keep track of each story's rank, so you can see which story appeared at which position. Note that the index passed by each is zero-based, so the top story gets rank 0; use index + 1 if you want ranks that start at 1.
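The rank and link handling described above can be tightened up after extraction. The sketch below is not part of the original script: normalizeItem is a hypothetical helper, and the base URL is an assumption for the Hacker News example. It turns the index into a rank that starts at 1, trims the title, and resolves relative links (an entry may point to a path on the same site rather than a full URL) with the standard URL constructor.

```javascript
// Hypothetical post-processing helper; not part of Agenty or jQuery.
function normalizeItem(raw, index, baseUrl) {
    return {
        rank: index + 1, // convert the zero-based index to a 1-based rank
        // Resolve relative hrefs against the page URL; keep null when missing.
        url: raw.url ? new URL(raw.url, baseUrl).href : null,
        title: (raw.title || '').trim(),
    };
}

// Hand-written sample data (not real scraped output).
const sample = [
    { url: 'https://example.com/post', title: ' First story ' },
    { url: 'item?id=1', title: 'Second story' },
];
const cleaned = sample.map((raw, i) =>
    normalizeItem(raw, i, 'https://news.ycombinator.com/'));
console.log(cleaned);
```

Inside extract(), one way to use it would be result.push(normalizeItem(item, index, location.href)), assuming the script runs in the page being scraped.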

Result

Run your agent by clicking the Run button. The tabular output appears on the Result tab and can be downloaded in CSV, JSON, or TSV format.

Scraping data with JavaScript

Scraping with JavaScript lets you bring data into your system from any website that does not offer it through an API or another channel. This section shows how to do the same scraping in plain JavaScript, without the jQuery dependency.

Code

function extract() {
    const result = [];
    // Each story row in the list table carries an id attribute.
    const elements = document.querySelectorAll('.itemlist tr[id]');
    let index = 0;
    for (const element of elements) {
        // Look up the title link once and reuse it for both fields.
        const link = element.querySelector('.titlelink');
        const item = {
            rank: index++, // zero-based position of the row on the page
            url: link.getAttribute('href'),
            title: link.innerText,
        };
        result.push(item);
    }
    return result;
}

The plain JavaScript version replaces each with a for...of loop, uses querySelectorAll and querySelector to select the elements, and reads the text through innerText and the attribute through getAttribute, all from the native DOM API.
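If some rows might lack a title link, querySelector returns null and the property access throws. A minimal defensive sketch follows, assuming the same selectors as above; extractFrom and its root parameter are hypothetical names introduced here so the logic can also be exercised outside a browser with a stand-in object.

```javascript
// Sketch: same extraction, but tolerant of rows without a title link.
// `root` is anything exposing querySelectorAll, e.g. `document` in the browser.
function extractFrom(root) {
    const result = [];
    const rows = Array.from(root.querySelectorAll('.itemlist tr[id]'));
    rows.forEach((row, index) => {
        const link = row.querySelector('.titlelink');
        if (!link) {
            return; // skip rows that do not contain a title link
        }
        result.push({
            rank: index, // row position, so skipped rows leave gaps in the ranks
            url: link.getAttribute('href'),
            title: (link.innerText || '').trim(),
        });
    });
    return result;
}

// In the page, the entry point could simply delegate to it:
// function extract() { return extractFrom(document); }
```

Passing the root in, rather than reading the global document, is only a convenience for testing; the selectors and the returned fields are unchanged.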

Optionally, you can test the code in Chrome developer mode first, which makes development and debugging easier. Follow these steps:

  1. Open developer mode
  2. Go to Sources tab
  3. Go to Snippets, create new snippet
  4. Paste the code
  5. Add one more line at the bottom to run the extract() function and print the result with console.table:
     console.table(extract());
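While testing in a snippet, it can help to check the extracted rows before trusting them. The following sketch assumes the item shape returned by extract() above; findIncomplete is a hypothetical helper name, and the sample array stands in for real output.

```javascript
// Hypothetical debugging helper: returns the items missing a url or title.
function findIncomplete(items) {
    return items.filter((item) => !item.url || !item.title);
}

// Hand-written sample standing in for the output of extract().
const items = [
    { rank: 0, url: 'https://example.com/a', title: 'Story A' },
    { rank: 1, url: undefined, title: 'Story B' },
];

const incomplete = findIncomplete(items);
// The optional second argument limits which columns console.table shows.
console.table(items, ['rank', 'title']);
console.warn(`${incomplete.length} of ${items.length} rows are missing a field`);
```

In the snippet itself, the sample would be replaced with const items = extract(); so the check runs against the live page.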


Once tested, remove the console.table(extract()); line and enter the code in Agenty to run your web scraping agent against a batch of input URLs.
