RSS is a great way to watch for interesting content on the Internet. I skim through a whole range of website feeds in Feedly, then any articles I actually want to read I save to Pocket for later. Sometimes, though, websites don’t have an RSS feed, or the RSS feeds they have are too broad or too narrow.

In these cases I use Apify to regularly crawl the website and create my own RSS feed. It’s awesome! You have complete flexibility over what content to include in the feed. Apify has a free Developer tier that gives you plenty of capacity to scrape a few websites every day. They actually have an existing blog post on this topic, but below I’ve added some extra things you might need to get it working.

1 – Create the crawler on Apify

Sign up for an account on Apify and click Tasks > “Create a new task” > “Legacy PhantomJS Crawler”. This task type is the only one I’ve found that lets you access crawler results via RSS.

The crawler configuration page on Apify

Set “Start URLs” to the page you want to scrape, and leave “Clickable elements” blank if you only want to scrape that one page. You’d only fill in clickable elements if you wanted the crawler to jump from the start page to other pages linked from it.

2 – Create the page scraping function

Apify lets you write a simple JavaScript function (the page function) that looks for elements on each crawled page and returns structured data about it. In this case, we’re going to use jQuery to find the right elements, and we’ll return a list of JSON objects, each representing one RSS feed item.
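Here’s a rough sketch of what such a page function might look like. It assumes jQuery has been injected into the page and is exposed as context.jQuery, and it uses a made-up selector, “h2.post-title a”, that you’d swap for whatever your target site actually uses. The item fields (title, link, guid) are borrowed from RSS 2.0 item elements; check how Apify maps result fields into its RSS output before relying on them.

```javascript
// Sketch of a Legacy PhantomJS Crawler page function.
// Assumption: jQuery is injected and available as context.jQuery.
// Hypothetical selector: each headline is a link inside <h2 class="post-title">.
function pageFunction(context) {
  var $ = context.jQuery;
  var items = [];

  $("h2.post-title a").each(function () {
    var title = $(this).text().trim();
    // The DOM "href" property (unlike the raw attribute) is already absolute.
    var link = this.href;
    if (!title || !link) {
      return; // skip empty or malformed entries and move to the next one
    }
    items.push({
      title: title,
      link: link,
      // Reuse the URL as a stable unique ID so feed readers can tell new items apart.
      guid: link
    });
  });

  // One object per RSS feed item.
  return items;
}
```

I’ve stuck to var and plain function expressions because PhantomJS only understands older (ES5) JavaScript, so arrow functions and let/const are best avoided here.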

Use your browser’s web inspector to look for specific HTML elements and CSS classes that will let you home in on the headlines you want.
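Before wiring a selector into the page function, it can help to try it out in the DevTools console on the target page. This small helper lists the text and link of every element a selector matches; the name previewHeadlines and the selector “h2.post-title a” are both just for illustration.

```javascript
// Returns { title, href } for every element under `root` matching `selector`.
// Hypothetical helper for experimenting in the browser console.
function previewHeadlines(root, selector) {
  return Array.from(root.querySelectorAll(selector)).map(function (el) {
    return { title: el.textContent.trim(), href: el.href };
  });
}

// In the DevTools console on the target page:
// console.table(previewHeadlines(document, "h2.post-title a"));
```

If the table shows exactly the headlines you want and nothing else, the selector is ready to drop into the page function.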

Inspecting HTML elements to scrape in Chrome Dev Tools

[…]
