Recipe Scraper
A simple scraper built for scraping recipes, though not limited to them.
Idea
Use Puppeteer to crawl with an actual browser, and emit events that report the status of the crawl.
Config
The way to configure a crawler.
crawl
    startUrl: where the crawl should start
    linkExtractors: array
        linkExtraction: CSS selector for links to follow
        shouldExtract: function that takes the page URL and decides whether to use this extractor
    detailExtractor
        details: key-value map; each key is the key of the extracted data, and each value is a CSS selector for the text to extract under that key
        shouldExtract: function that takes the page URL and decides whether to extract data
crawler
    launchConfig
        headless: boolean, whether to show the browser or not
    politeness: number, milliseconds to wait between pages; defaults to 1000 (one second)
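As a rough illustration of the options above, a configuration might look like the following sketch. The interface names (`ScraperConfig`, `LinkExtractor`, `DetailExtractor`), the nesting of `detailExtractor` under `crawl`, the `resolvePoliteness` helper, the example URL, and the selectors are all assumptions made for this example; they are not taken from the library's published API.

```typescript
// Hypothetical types: field names follow the options listed above,
// but the exact shapes and nesting are assumptions.
interface LinkExtractor {
  linkExtraction: string; // CSS selector for links to follow
  shouldExtract: (url: string) => boolean; // use this extractor on this page?
}

interface DetailExtractor {
  details: Record<string, string>; // data key -> CSS selector for its text
  shouldExtract: (url: string) => boolean; // extract data from this page?
}

interface ScraperConfig {
  crawl: {
    startUrl: string;
    linkExtractors: LinkExtractor[];
    detailExtractor: DetailExtractor;
  };
  crawler: {
    launchConfig: { headless: boolean };
    politeness?: number; // milliseconds to wait between pages
  };
}

// Default delay named in the option list: 1000 ms.
const DEFAULT_POLITENESS_MS = 1000;

// Hypothetical helper: falls back to the default when politeness is unset.
function resolvePoliteness(config: ScraperConfig): number {
  return config.crawler.politeness ?? DEFAULT_POLITENESS_MS;
}

// Example configuration; the URL and selectors are placeholders.
const config: ScraperConfig = {
  crawl: {
    startUrl: "https://example.com/recipes",
    linkExtractors: [
      {
        linkExtraction: "a.recipe-link",
        shouldExtract: (url) => url.includes("/recipes"),
      },
    ],
    detailExtractor: {
      details: { title: "h1", ingredients: ".ingredients" },
      shouldExtract: (url) => url.includes("/recipes/"),
    },
  },
  crawler: {
    launchConfig: { headless: true },
  },
};
```

Here `politeness` is left out on purpose, so `resolvePoliteness(config)` falls back to the one-second default.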
Events
data: fires with a record of the data extracted
start: fires when the crawl starts, with the start URL and the date
crawled: fires with the URL of the crawled page
info: fires with a tag and a message; just miscellaneous info
error: fires when something unexpected happens
finish: fires at the end, with all crawled URLs and the date finished
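To make the event flow concrete, here is a minimal, self-contained sketch of a typed emitter that fires the events described here in the order a crawl might produce them. The `ScraperEmitter` class, the payload shapes in `ScraperEvents`, and the example URLs are assumptions made for illustration; the library's real emitter and payloads may differ.

```typescript
// Hypothetical payload shapes, inferred from the event descriptions.
type ScraperEvents = {
  data: Record<string, string>;
  start: { url: string; date: Date };
  crawled: string;
  info: { tag: string; message: string };
  error: unknown;
  finish: { urls: string[]; date: Date };
};

type Listener<T> = (payload: T) => void;

// Minimal stand-in emitter; not the library's own class.
class ScraperEmitter {
  private listeners = new Map<string, Array<(payload: unknown) => void>>();

  // Register a listener for one event name.
  on<K extends keyof ScraperEvents>(event: K, listener: Listener<ScraperEvents[K]>): this {
    const list = this.listeners.get(event) ?? [];
    list.push(listener as (payload: unknown) => void);
    this.listeners.set(event, list);
    return this;
  }

  // Call every listener registered for the event, in registration order.
  emit<K extends keyof ScraperEvents>(event: K, payload: ScraperEvents[K]): void {
    for (const listener of this.listeners.get(event) ?? []) {
      listener(payload);
    }
  }
}

// Record what the listeners observe.
const seen: string[] = [];
const emitter = new ScraperEmitter();

emitter
  .on("start", ({ url }) => { seen.push(`start ${url}`); })
  .on("crawled", (url) => { seen.push(`crawled ${url}`); })
  .on("data", (record) => { seen.push(`data ${record.title}`); })
  .on("finish", ({ urls }) => { seen.push(`finish ${urls.length}`); });

// Simulate the sequence of events for a one-page crawl.
emitter.emit("start", { url: "https://example.com/recipes", date: new Date() });
emitter.emit("crawled", "https://example.com/recipes/soup");
emitter.emit("data", { title: "Soup" });
emitter.emit("finish", { urls: ["https://example.com/recipes/soup"], date: new Date() });
```

Typing the payload per event name lets the compiler catch a listener that expects, say, a URL string on `finish` instead of the list of crawled URLs.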