Scraply

Scraply is a simple DOM scraper that fetches information from any HTML-based website using a jQuery-like syntax and turns that information into JSON APIs.

How does it work?

It works by defining some macros/endpoints in HCL format, then letting the magic begin. Here is an example:

# /scraply
macro scraply {
    // the url to scrape
    // we will scrape the scraply github page and get information from it
    url = "https://github.com/alash3al/scraply"

    // cache [time to live] in seconds
    // set it to any value < 1 to disable it.
    ttl = 120

    // code to be executed
    //
    // this is javascript code;
    // you must set your return value in the exports variable
    exec = <<JS
        exports = {
            // fetching the title
            // similar to jQuery, right?
            title: $("title").Text(),
            description: $('meta[name=description]').AttrOr('content', '')
        }
    JS

    // schedule this macro to run at the specified cron style spec
    // it extends the standard cron spec with an additional leading
    // field to support seconds.
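    // e.g. "*/30 * * * * *" would run this macro every 30 seconds
    // (an illustrative spec, assuming standard six-field cron semantics)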
    schedule = "* * * * * *"

    // notify an endpoint with the result
    // the payload is a json object just like: {"error": "an error if any", "result": "the result will be here"}
    webhook = "http://some.endpoint.com"

    // set this to true if you don't want to expose this macro via the api
    private = true

    // our $(..).Method() is just like jQuery's $(..).method()
    // our $(..).Method() is an alias for document.Find(..).Method()
    // 
    //  here is a table showing jQuery methods and their Scraply equivalents:
    //
    //  jQuery              :   Scraply
    //  -------------           ---------------
    //  $(..).first()       :   $(..).First()
    //  $(..).html()        :   $(..).Html()
    //  $(..).text()        :   $(..).Text()
    //  $(..).last()        :   $(..).Last()
    //  $(..).find()        :   $(..).Find()
    //  $(..).attr()        :   $(..).Attr() | $(..).AttrOr(needle, defaultValue)
    //  $(..).children()    :   $(..).Children()
    //  $(..).prev()        :   $(..).Prev()
    //  $(..).next()        :   $(..).Next()
    //  $(..).has()         :   $(..).Has()

    // you also have the following functions in the js context:
    // println()/console.log() print to stdout
    // time() returns the current timestamp
    // sleep(ms) pauses execution for the given number of milliseconds
    // macro(macro_name) executes the specified macro and returns its result
}
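
# (an illustrative extra macro, not part of the original example:
#  it sketches how the helper functions listed above can be combined)
macro helpers_demo {
    url = "https://github.com/alash3al/scraply"
    ttl = 60

    // keep this demo off the public api
    private = true

    exec = <<JS
        // log progress to stdout
        println("started at: " + time())

        // pause for half a second before reading the dom
        sleep(500)

        exports = {
            fetchedAt: time(),
            title: $("title").Text()
        }
    JS
}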

# /sqler
macro sqler {
    url = "https://github.com/alash3al/sqler"
    ttl = 120
    exec = <<JS
        exports = {
            title: $('title').Text(),
            description: $('meta[name="description"]').AttrOr('content', '')
        }
    JS
}

# /redix
macro redix {
    url = "https://github.com/alash3al/redix"
    ttl = 120
    exec = <<JS
        exports = {
            title: $('title').Text(),
            description: $('meta[name="description"]').AttrOr('content', '')
        }
    JS
}

# aggregate ?
macro all {
    exec = <<JS
        exports = {
            redis: macro("redix"),
            sqler: macro("sqler")
        }
    JS
}
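
When a macro sets the webhook option, the result is POSTed to that endpoint as JSON in the shape noted in the comments above. For the scraply macro the payload would look roughly like this (a sketch with placeholder values; whether "error" is empty or omitted on success is an assumption):

    {
        "error": "",
        "result": {
            "title": "<page title>",
            "description": "<meta description content>"
        }
    }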

Why?

I wanted a tool that fetches the required information from web pages in a simple way. I'm using it for the following cases:
  • Scraping data from currency rates websites
  • Scraping product pricing data from e-commerce sites
  • Scraping news from news websites
  • Scraping search data
  • and more use cases ...

Features

  • Tiny & Portable Engine.
  • You can scale & distribute it easily.
  • Private/Public Macros.
  • Cron-like scheduler (with seconds precision).
  • Webhook Support.
  • jQuery-like API.
  • Customize everything in JavaScript.

How?

  • Download the binary that fits your OS from the project's GitHub releases page
  • Create a configuration file, e.g. scraply.hcl
  • Run it: ./path/to/downloaded/scraply --config=./scraply.hcl --listen=:9080
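
Once it's running, every public macro is exposed as a JSON endpoint named after the macro, as the "# /scraply"-style comments in the config above suggest. A quick sanity check might look like this (a sketch; the exact response envelope may differ):

    curl http://localhost:9080/sqler
    # => {"title": "...", "description": "..."}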
