Welcome to the crawley project web site!
Crawley is a Pythonic crawling and scraping framework intended to change the way you think about extracting data from the internet.
Features:
- High-speed web crawler built on Eventlet.
- Store your data in relational databases like PostgreSQL, MySQL, Oracle and SQLite.
- Export your data to JSON or XML. (New)
- Supports NoSQL databases like MongoDB and CouchDB. (New)
- Command line tools.
- Extract data using your favourite tool: XPath or PyQuery (a jQuery-like library for Python).
- Cookie handlers for scraping login-protected pages.
- Very easy to use (see the examples).
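To give a rough idea of what XPath-based extraction looks like, here is a minimal sketch that uses only the Python standard library. It does not use crawley's own API: the sample page, the `package` class name and the `extract_packages` function are all hypothetical, and the standard library supports only a limited subset of XPath on well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; a real crawler would download this.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="package"><a href="/p/crawley">crawley</a></div>
    <div class="package"><a href="/p/eventlet">eventlet</a></div>
    <div class="other"><a href="/about">about</a></div>
  </body>
</html>
"""


def extract_packages(markup):
    """Return (name, link) pairs for every anchor inside a 'package' div."""
    root = ET.fromstring(markup.strip())
    # ElementTree understands a limited XPath subset, including
    # attribute predicates such as [@class='package'].
    anchors = root.findall(".//div[@class='package']/a")
    return [(a.text, a.get("href")) for a in anchors]


if __name__ == "__main__":
    for name, link in extract_packages(SAMPLE_PAGE):
        print(name, link)
```

Real-world HTML is rarely well-formed XML, which is one reason a scraping framework usually relies on a more forgiving parser than the one in this sketch.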
Downloading the latest version
We are proud to announce release 0.2.4 of the crawley framework.
First, install the dependencies. On Ubuntu, just run:
~$ apt-get install python-dev libxml2 libxslt1-dev
Then you can install the latest version with pip:
~$ pip install crawley
Or clone the repository from GitHub:
~$ git clone git://github.com/jmg/crawley.git
Where to start?
Start by checking out the documentation and learning crawley with the examples! You can also join the crawley-users group, where the community is there to help you.
Do you wanna contribute?
Join the crawley-developers group.
Or make a financial donation via PayPal.