I followed the official guide and got this error message:
The following packages have unmet dependencies:
 scrapy : Depends: python-support (>= 0.90.0) but it is not installable
          Recommends: python-setuptools but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
I tried sudo apt-get install python-support, but found that Ubuntu 16.04 has removed python-support.
Lastly, I tried to install python-setuptools, but it seems to install Python 2 instead:
The following additional packages will be installed:
  libpython-stdlib libpython2.7-minimal libpython2.7-stdlib python python-minimal
  python-pkg-resources python2.7 python2.7-minimal
Suggested packages:
  python-doc python-tk python-setuptools-doc python2.7-doc binutils binfmt-support
The following NEW packages will be installed:
  libpython-stdlib libpython2.7-minimal libpython2.7-stdlib python python-minimal
  python-pkg-resources python-setuptools python2.7 python2.7-minimal
What should I use to get Scrapy working in a Python 3 environment on Ubuntu 16.04? Thanks.
You should start with:
apt-get install -y \
    python3 \
    python-dev \
    python3-dev

# for cryptography
apt-get install -y \
    build-essential \
    libssl-dev \
    libffi-dev

# for lxml
apt-get install -y \
    libxml2-dev \
    libxslt-dev

# install pip
apt-get install -y python-pip
This example Dockerfile tests installing Scrapy on Python 3, on Ubuntu 16.04/Xenial:
$ cat Dockerfile
FROM ubuntu:xenial

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update

# install python3 and dev headers
RUN apt-get install -y \
    python3 \
    python-dev \
    python3-dev

# install cryptography dependencies
RUN apt-get install -y \
    build-essential \
    libssl-dev \
    libffi-dev

# install lxml dependencies
RUN apt-get install -y \
    libxml2-dev \
    libxslt-dev

# install pip
RUN apt-get install -y python-pip

RUN useradd --create-home --shell /bin/bash scrapyuser
USER scrapyuser
WORKDIR /home/scrapyuser
Then, after building the Docker image and running a container with:
$ sudo docker build -t redapple/scrapy-ubuntu-xenial .
$ sudo docker run -t -i redapple/scrapy-ubuntu-xenial
you can run pip install scrapy.
Below I'm using virtualenvwrapper to create a Python 3 virtualenv:
scrapyuser@88cc645ac499:~$ pip install --user virtualenvwrapper
Collecting virtualenvwrapper
  Downloading virtualenvwrapper-4.7.1-py2.py3-none-any.whl
Collecting virtualenv-clone (from virtualenvwrapper)
  Downloading virtualenv-clone-0.2.6.tar.gz
Collecting stevedore (from virtualenvwrapper)
  Downloading stevedore-1.14.0-py2.py3-none-any.whl
Collecting virtualenv (from virtualenvwrapper)
  Downloading virtualenv-15.0.2-py2.py3-none-any.whl (1.8MB)
    100% |################################| 1.8MB 320kB/s
Collecting pbr>=1.6 (from stevedore->virtualenvwrapper)
  Downloading pbr-1.10.0-py2.py3-none-any.whl (96kB)
    100% |################################| 102kB 1.5MB/s
Collecting six>=1.9.0 (from stevedore->virtualenvwrapper)
  Downloading six-1.10.0-py2.py3-none-any.whl
Building wheels for collected packages: virtualenv-clone
  Running setup.py bdist_wheel for virtualenv-clone ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/24/51/ef/93120d304d240b4b6c2066454250a1626e04f73d34417b956d
Successfully built virtualenv-clone
Installing collected packages: virtualenv-clone, pbr, six, stevedore, virtualenv, virtualenvwrapper
Successfully installed pbr-1.10.0 six-1.10.0 stevedore-1.14.0 virtualenv-15.0.2 virtualenv-clone-0.2.6 virtualenvwrapper-4.7.1
You are using pip version 8.1.1, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
scrapyuser@88cc645ac499:~$ source ~/.local/bin/virtualenvwrapper.sh
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/premkproject
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postmkproject
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/initialize
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/premkvirtualenv
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postmkvirtualenv
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/prermvirtualenv
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postrmvirtualenv
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/predeactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postdeactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/preactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/get_env_details
scrapyuser@88cc645ac499:~$ export PATH=$PATH:/home/scrapyuser/.local/bin
scrapyuser@88cc645ac499:~$ mkvirtualenv --python=/usr/bin/python3 scrapy11.py3
Running virtualenv with interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/python3
Also creating executable in /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/python
Installing setuptools, pip, wheel...done.
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/predeactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/postdeactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/preactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/postactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/get_env_details
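Before installing anything into it, you can double-check that the new virtualenv really runs Python 3 (a quick sanity check of my own, not part of the transcript above; the file name is made up):

# check_env.py -- hypothetical helper; run inside the scrapy11.py3 virtualenv with: python check_env.py
import sys

# mkvirtualenv was called with --python=/usr/bin/python3, so this should report a 3.x interpreter
print(sys.executable)
print(sys.version)
assert sys.version_info[0] == 3, "expected a Python 3 virtualenv"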
And installing Scrapy 1.1 is then just a matter of pip install scrapy:
(scrapy11.py3) scrapyuser@88cc645ac499:~$ pip install scrapy
Collecting scrapy
  Downloading Scrapy-1.1.0-py2.py3-none-any.whl (294kB)
    100% |################################| 296kB 1.0MB/s
Collecting PyDispatcher>=2.0.5 (from scrapy)
  Downloading PyDispatcher-2.0.5.tar.gz
Collecting pyOpenSSL (from scrapy)
  Downloading pyOpenSSL-16.0.0-py2.py3-none-any.whl (45kB)
    100% |################################| 51kB 1.8MB/s
Collecting lxml (from scrapy)
  Downloading lxml-3.6.0.tar.gz (3.7MB)
    100% |################################| 3.7MB 312kB/s
Collecting parsel>=0.9.3 (from scrapy)
  Downloading parsel-1.0.2-py2.py3-none-any.whl
Collecting six>=1.5.2 (from scrapy)
  Using cached six-1.10.0-py2.py3-none-any.whl
Collecting Twisted>=10.0.0 (from scrapy)
  Downloading Twisted-16.2.0.tar.bz2 (2.9MB)
    100% |################################| 2.9MB 307kB/s
Collecting queuelib (from scrapy)
  Downloading queuelib-1.4.2-py2.py3-none-any.whl
Collecting cssselect>=0.9 (from scrapy)
  Downloading cssselect-0.9.1.tar.gz
Collecting w3lib>=1.14.2 (from scrapy)
  Downloading w3lib-1.14.2-py2.py3-none-any.whl
Collecting service-identity (from scrapy)
  Downloading service_identity-16.0.0-py2.py3-none-any.whl
Collecting cryptography>=1.3 (from pyOpenSSL->scrapy)
  Downloading cryptography-1.4.tar.gz (399kB)
    100% |################################| 409kB 1.1MB/s
Collecting zope.interface>=4.0.2 (from Twisted>=10.0.0->scrapy)
  Downloading zope.interface-4.1.3.tar.gz (141kB)
    100% |################################| 143kB 1.3MB/s
Collecting attrs (from service-identity->scrapy)
  Downloading attrs-16.0.0-py2.py3-none-any.whl
Collecting pyasn1 (from service-identity->scrapy)
  Downloading pyasn1-0.1.9-py2.py3-none-any.whl
Collecting pyasn1-modules (from service-identity->scrapy)
  Downloading pyasn1_modules-0.0.8-py2.py3-none-any.whl
Collecting idna>=2.0 (from cryptography>=1.3->pyOpenSSL->scrapy)
  Downloading idna-2.1-py2.py3-none-any.whl (54kB)
    100% |################################| 61kB 2.0MB/s
Requirement already satisfied (use --upgrade to upgrade): setuptools>=11.3 in ./.virtualenvs/scrapy11.py3/lib/python3.5/site-packages (from cryptography>=1.3->pyOpenSSL->scrapy)
Collecting cffi>=1.4.1 (from cryptography>=1.3->pyOpenSSL->scrapy)
  Downloading cffi-1.6.0.tar.gz (397kB)
    100% |################################| 399kB 1.1MB/s
Collecting pycparser (from cffi>=1.4.1->cryptography>=1.3->pyOpenSSL->scrapy)
  Downloading pycparser-2.14.tar.gz (223kB)
    100% |################################| 225kB 1.2MB/s
Building wheels for collected packages: PyDispatcher, lxml, Twisted, cssselect, cryptography, zope.interface, cffi, pycparser
  Running setup.py bdist_wheel for PyDispatcher ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/86/02/a1/5857c77600a28813aaf0f66d4e4568f50c9f133277a4122411
  Running setup.py bdist_wheel for lxml ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/6c/eb/a1/e4ff54c99630e3cc6ec659287c4fd88345cd78199923544412
  Running setup.py bdist_wheel for Twisted ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/fe/9d/3f/9f7b1c768889796c01929abb7cdfa2a9cdd32bae64eb7aa239
  Running setup.py bdist_wheel for cssselect ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/1b/41/70/480fa9516ccc4853a474faf7a9fb3638338fc99a9255456dd0
  Running setup.py bdist_wheel for cryptography ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/f6/6c/21/11ec069285a52d7fa8c735be5fc2edfb8b24012c0f78f93d20
  Running setup.py bdist_wheel for zope.interface ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/52/04/ad/12c971c57ca6ee5e6d77019c7a1b93105b1460d8c2db6e4ef1
  Running setup.py bdist_wheel for cffi ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/8f/00/29/553c1b1db38bbeec3fec428ae4e400cd8349ecd99fe86edea1
  Running setup.py bdist_wheel for pycparser ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/9b/f4/2e/d03e949a551719a1ffcb659f2c63d8444f4df12e994ce52112
Successfully built PyDispatcher lxml Twisted cssselect cryptography zope.interface cffi pycparser
Installing collected packages: PyDispatcher, idna, pyasn1, six, pycparser, cffi, cryptography, pyOpenSSL, lxml, w3lib, cssselect, parsel, zope.interface, Twisted, queuelib, attrs, pyasn1-modules, service-identity, scrapy
Successfully installed PyDispatcher-2.0.5 Twisted-16.2.0 attrs-16.0.0 cffi-1.6.0 cryptography-1.4 cssselect-0.9.1 idna-2.1 lxml-3.6.0 parsel-1.0.2 pyOpenSSL-16.0.0 pyasn1-0.1.9 pyasn1-modules-0.0.8 pycparser-2.14 queuelib-1.4.2 scrapy-1.1.0 service-identity-16.0.0 six-1.10.0 w3lib-1.14.2 zope.interface-4.1.3
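If you want to confirm the install before creating a project, a minimal import check works too (my own addition, assuming the scrapy11.py3 virtualenv is still active; the file name is made up):

# check_scrapy.py -- hypothetical helper; run with: python check_scrapy.py
import sys
import scrapy

# should print the virtualenv's Python 3.5 interpreter version and Scrapy 1.1.0
print(sys.version.split()[0])
print(scrapy.__version__)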
Finally, testing it with an example project:
(scrapy11.py3) scrapyuser@88cc645ac499:~$ scrapy startproject tutorial
New Scrapy project 'tutorial', using template directory '/home/scrapyuser/.virtualenvs/scrapy11.py3/lib/python3.5/site-packages/scrapy/templates/project', created in:
    /home/scrapyuser/tutorial

You can start your first spider with:
    cd tutorial
    scrapy genspider example example.com
(scrapy11.py3) scrapyuser@88cc645ac499:~$ cd tutorial
(scrapy11.py3) scrapyuser@88cc645ac499:~/tutorial$ scrapy genspider example example.com
Created spider 'example' using template 'basic' in module:
  tutorial.spiders.example
(scrapy11.py3) scrapyuser@88cc645ac499:~/tutorial$ cat tutorial/spiders/example.py
# -*- coding: utf-8 -*-
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = (
        'http://www.example.com/',
    )

    def parse(self, response):
        pass

(scrapy11.py3) scrapyuser@88cc645ac499:~/tutorial$ scrapy crawl example
2016-06-07 11:08:27 [scrapy] INFO: Scrapy 1.1.0 started (bot: tutorial)
2016-06-07 11:08:27 [scrapy] INFO: Overridden settings: {'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial', 'ROBOTSTXT_OBEY': True, 'NEWSPIDER_MODULE': 'tutorial.spiders'}
2016-06-07 11:08:27 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.corestats.CoreStats']
2016-06-07 11:08:27 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-06-07 11:08:27 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-06-07 11:08:27 [scrapy] INFO: Enabled item pipelines:
[]
2016-06-07 11:08:27 [scrapy] INFO: Spider opened
2016-06-07 11:08:28 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-06-07 11:08:28 [scrapy] DEBUG: Crawled (404) <GET http://www.example.com/robots.txt> (referer: None)
2016-06-07 11:08:28 [scrapy] DEBUG: Crawled (200) <GET http://www.example.com/> (referer: None)
2016-06-07 11:08:28 [scrapy] INFO: Closing spider (finished)
2016-06-07 11:08:28 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 436,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 1921,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/404': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 6, 7, 11, 8, 28, 614605),
 'log_count/DEBUG': 2,
 'log_count/INFO': 7,
 'response_received_count': 2,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2016, 6, 7, 11, 8, 28, 24624)}
2016-06-07 11:08:28 [scrapy] INFO: Spider closed (finished)
(scrapy11.py3) scrapyuser@88cc645ac499:~/tutorial$
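The generated spider's parse() does nothing yet; as a next step you could make it extract something, for example the page title (a minimal sketch of my own, not part of the tutorial output above):

# tutorial/spiders/example.py -- a sketch extending the generated spider
# -*- coding: utf-8 -*-
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = (
        'http://www.example.com/',
    )

    def parse(self, response):
        # extract the <title> text and yield it as a plain dict item
        yield {'title': response.css('title::text').extract_first()}

Running scrapy crawl example -o titles.json again would then write the extracted title to titles.json.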