Scrapinghub is a company focused on retrieving information and processing it afterwards. We are deeply involved in developing and contributing to Open Source projects for web crawling and data processing.
This year we are applying with three of our most renowned projects: Scrapy, Portia and Splash.
Scrapy is one of the most popular web crawling and scraping frameworks for Python (10th among GitHub's top trending Python projects). It is used to write spiders that crawl websites and extract data from them.
Portia is a tool built on top of Scrapy that lets you scrape websites visually, with no programming knowledge required. Users annotate web pages to identify the data they wish to extract, and Portia uses these annotations to work out how to scrape data from similar pages.
New and Simplified BSD licenses