Unable to follow links with Scrapy

I tried the following code:

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from scrapy.http.request import Request


    class ScrapyOrgSpider(BaseSpider):
        name = "example.com"
        allowed_domains = ["example.com"]
        start_urls = ["http://www.example.com/abcd"]

        def parse(self, response):
            # Log the first response, then schedule a second request
            # whose response is handled by a_1.
            self.log('@@ Original response: %s' % response)
            req = Request("http://www.example.com/follow", callback=self.a_1)
            self.log('@@ Next request: %s' % req)
            return req

        def a_1(self, response):
            # Extract every <a> whose class attribute is exactly "channel-link".
            hxs = HtmlXPathSelector(response)
            self.log('@@ extraction: %s' %
                     hxs.select("//a[@class='channel-link']").extract())

Log output:

2012-11-22 12:20:06-0600 [scrapy] INFO: Scrapy 0.17.0 started (bot: oneoff)

2012-11-22 12:20:06-0600 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState

2012-11-22 12:20:06-0600 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats

2012-11-22 12:20:06-0600 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware

2012-11-22 12:20:06-0600 [scrapy] DEBUG: Enabled item pipelines:

2012-11-22 12:20:06-0600 [example.com] INFO: Spider opened

2012-11-22 12:20:06-0600 [example.com] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

2012-11-22 12:20:06-0600 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023

2012-11-22 12:20:06-0600 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080

2012-11-22 12:20:07-0600 [example.com] DEBUG: Redirecting (302) to from

2012-11-22 12:20:07-0600 [example.com] DEBUG: Crawled (200) (referer: None)

2012-11-22 12:20:07-0600 [example.com] DEBUG: @@ Original response: <200 http://www.iana.org/domains/example/>

2012-11-22 12:20:07-0600 [example.com] DEBUG: @@ Next request:

2012-11-22 12:20:07-0600 [example.com] DEBUG: Redirecting (302) to from

2012-11-22 12:20:08-0600 [example.com] DEBUG: Crawled (200) (referer: http://www.iana.org/domains/example/)

2012-11-22 12:20:08-0600 [example.com] DEBUG: @@ extraction: []

2012-11-22 12:20:08-0600 [example.com] INFO: Closing spider (finished)
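
One way to read the log above, offered as an assumption rather than something the post confirms: the second request was followed (it was crawled with a 200 after a 302 redirect, and the referer is set), and the empty `@@ extraction: []` only means the XPath matched nothing in the page that was finally returned. The sketch below uses only the Python standard library, not Scrapy, to check a saved page body offline for the same condition as `//a[@class='channel-link']`. The names `ClassLinkCollector` and `find_links_with_class` and both sample HTML strings are hypothetical and made up for illustration.

```python
from html.parser import HTMLParser


class ClassLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose class attribute equals css_class.

    Hypothetical helper; it mirrors the exact-match test in the XPath
    //a[@class='channel-link'] rather than matching one class among several.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        if attr_map.get("class") == self.css_class:
            self.hrefs.append(attr_map.get("href"))


def find_links_with_class(html, css_class):
    """Return the hrefs of all matching anchors in an HTML string."""
    collector = ClassLinkCollector(css_class)
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Made-up sample bodies, not taken from the crawl shown in the log.
page_with_links = '<a class="channel-link" href="/c/1">One</a><a href="/x">X</a>'
page_without_links = '<a href="/domains/example/">More information</a>'

print(find_links_with_class(page_with_links, "channel-link"))
print(find_links_with_class(page_without_links, "channel-link"))
```

If a check like this returns an empty list for the body of the redirected page, the selector or the target URL is the more likely thing to look at, rather than the request-following logic.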
