Scrapinghub in GSoC 2015

Scrapinghub is applying!

Scrapinghub is a company focused on retrieving information and processing it afterwards. We are deeply involved in developing and contributing to open source projects for web crawling and data processing.

This year we are applying with three of our best-known projects: Scrapy, Portia and Splash. You can learn more about them in their respective repositories: https://github.com/scrapy/scrapy, https://github.com/scrapinghub/portia and https://github.com/scrapinghub/splash


Scrapy is a very popular web crawling and scraping framework for Python (10th among GitHub's most trending Python projects), used to write spiders that crawl websites and extract data from them.

Check Scrapy ideas


Portia is a tool for scraping websites visually, with no programming knowledge required. Users annotate web pages to identify the data they wish to extract, and Portia uses these annotations to work out how to scrape data from similar pages.

Check Portia ideas


Splash is a lightweight web browser with an HTTP API. It is used to render web pages that use JavaScript, interact with them, get detailed information and take screenshots of the crawled websites as they are seen in a browser.
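
To give a feel for the HTTP API, here is a minimal sketch in Python of how a request to Splash's `render.html` endpoint could be built. It assumes a Splash instance running locally on its default port 8050. The helper name `render_html_url` and the default `wait` value are illustrative choices, not part of Splash itself.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance is listening locally on its default port.
SPLASH_BASE = "http://localhost:8050"


def render_html_url(page_url, wait=0.5, base=SPLASH_BASE):
    """Build the URL of a GET request to Splash's render.html endpoint.

    `page_url` is the page to render; `wait` is the number of seconds
    Splash should wait after the page loads before returning the HTML.
    This helper is hypothetical and only assembles the request URL.
    """
    query = urlencode({"url": page_url, "wait": wait})
    return f"{base}/render.html?{query}"


if __name__ == "__main__":
    # Print the request URL; fetching it with any HTTP client would
    # return the HTML of the page after JavaScript has run.
    print(render_html_url("http://example.com"))
```

Fetching the resulting URL with any HTTP client should return the page's HTML as rendered by the browser, which can then be passed on to a scraper.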

Check Splash ideas