Scrapinghub Profile

Organization Name

Scrapinghub

Description

Scrapinghub is a company focused on information retrieval and the subsequent processing of the retrieved data. It is deeply involved in developing and contributing to open source projects for web crawling and data processing.

This year we are applying with three of our best-known projects: Scrapy, Portia and Splash.

Scrapy is one of the most popular web crawling and scraping frameworks for Python (10th among GitHub's trending Python projects). It is used to write spiders that crawl websites and extract data from them.

Portia is a tool, built on top of Scrapy, that lets you scrape websites visually without any programming knowledge. Users annotate web pages to identify the data they wish to extract, and Portia uses these annotations to work out how to scrape data from similar pages.

Splash is a JavaScript rendering service: a lightweight browser with an HTTP API, implemented in Python using Twisted and Qt. It is fast and stateless, which makes it easy to distribute.
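
As a rough illustration of what using Splash over HTTP can look like, the sketch below builds a request URL for the render.html endpoint, which returns the HTML of a page after its JavaScript has run. The helper name build_render_url, the address http://localhost:8050 and the example page URL are assumptions made for this sketch, not part of the profile; adjust them to match a real deployment.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_render_url(splash_base, page_url, wait=0.5):
    """Return a Splash render.html URL for page_url.

    splash_base is the address of a running Splash instance (assumed here,
    e.g. "http://localhost:8050"). wait is the number of seconds Splash
    should wait after the page loads before returning the rendered HTML.
    """
    # urlencode percent-encodes the target URL so it survives as a query value.
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_base.rstrip('/')}/render.html?{query}"


# Hypothetical usage: the result can be fetched with any HTTP client.
request_url = build_render_url("http://localhost:8050", "http://example.com/")
print(request_url)
```

Because each request carries everything Splash needs to render the page, any instance behind a load balancer can serve it, which is what makes a stateless service of this kind easy to distribute.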

You can learn more about these projects in their respective repositories: https://github.com/scrapy/scrapy, https://github.com/scrapinghub/portia and https://github.com/scrapinghub/splash

Tags

python, javascript, lua, git, qt, twisted, http, api, web, crawl, scraping, scrapy, portia, splash

Main License

New and Simplified BSD licenses

Ideas list

http://gsoc2015.scrapinghub.com/

Mailing list

Blank

Organization Website

http://scrapinghub.com/

IRC Channel

Blank

Feed URL

Blank

Google+ URL

https://plus.google.com/+Scrapinghub

Twitter URL

https://twitter.com/ScrapingHub

Blog Page

http://blog.scrapinghub.com/

Facebook URL

Blank

Veteran Organization

Yes