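I am trying to scrape a website that requires a login, but the login form is wrapped in a table and itself contains nested tables. I have included the page's HTML and my code below.
Any help would be greatly appreciated. Thanks.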
<table align="center" width="100%" border="0" cellpadding="0" style="background:#FDFDFD;">
<tbody><tr>
<td colspan="2" align="center">
Loyola Students Log In
</td>
</tr>
<tr>
<td valign="top" colspan="2" align="center">
<table cellspacing="1" cellpadding="6" width="50%" border="0">
<tbody>
<tr>
<td align="center">
<form id="login_form" name="login_form" action="" method="post" autocomplete="off">
<table cellspacing="0" cellpadding="2" border="0">
<tbody>
<tr>
<td align="center">
<input type="hidden" id="txtSK" name="txtSK">
<input type="hidden" id="txtAN" name="txtAN">
<input type="hidden" value="1" name="_tries">
<input type="hidden" name="_md5">
<input type="hidden" name="txtPageAction" id="txtPageAction" value="0">
<table cellspacing="2" cellpadding="2" border="0" width="60%">
<tbody>
<tr>
<td nowrap="" align="right">Your ID (Dept.No.): </td>
<td align="left"><input size="17" id="login" name="login"></td>
</tr>
<tr>
<td nowrap="" align="right">Password:</td>
<td align="left"><input type="password" maxlength="32" size="17" value="" id="passwd" name="passwd"></td>
</tr>
<tr>
<td></td>
<td align="right"><a href="../../students/loginManager/forgotPassword.jsp">Forgot password?</a></td>
</tr>
<tr>
<td align="center" colspan="2">
<input type="submit" value="Log In" name="_save" id="_save" onclick="funSave()">
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<!-- <tr>
<td colspan="1" style="cursor:pointer" align=center valign=center onclick="javascript:window.open('../../students/loginManager/forgotPassword.jsp')"><u>Forgot Password</u></td>
</tr>-->
<tr>
<td align="left">
<p>If you are a Student and this is your first login, you must use your Dept. number as your User ID, with your date of birth [Format ddmmyyyy E.g. 03121990] as your password.</p>
</td>
</tr>
<tr><td height="28px">
<div id="divProcess" name="divProcess" style="top : 0; left : 0; height:20px; width:100%;display:none ">
<table border="0" width="100%" align="center">
<tbody><tr><td align="center" valign="middle">Please Wait...<img src="../../resources/Image/wait.gif"></td></tr>
</tbody></table>
</div>
</td></tr>
</tbody>
</table>
</form>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody></table>
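The HTML above was copied from the browser's Inspect Element view: an outer table contains the form, and the form in turn contains nested tables that hold the input elements. I want to log in and then scrape data, and I have written some basic code for this.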
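To check which fields the form actually submits despite the nested tables, here is a minimal sketch that uses only Python's standard-library `html.parser`. The `LOGIN_HTML` string is a trimmed copy of the markup above, and `FormFieldCollector` and `list_form_fields` are hypothetical helper names introduced only for this illustration. It suggests that the table nesting does not matter: only the `name` attributes of the `<input>` elements inside `login_form` do.

```python
from html.parser import HTMLParser

# Trimmed copy of the login form from the page above (most layout attributes
# dropped); the input names and values are taken from that HTML.
LOGIN_HTML = """
<table><tbody><tr><td>
<form id="login_form" name="login_form" action="" method="post">
<table><tbody><tr><td>
<input type="hidden" id="txtSK" name="txtSK">
<input type="hidden" id="txtAN" name="txtAN">
<input type="hidden" value="1" name="_tries">
<input type="hidden" name="_md5">
<input type="hidden" name="txtPageAction" id="txtPageAction" value="0">
<table><tbody>
<tr><td><input size="17" id="login" name="login"></td></tr>
<tr><td><input type="password" value="" id="passwd" name="passwd"></td></tr>
<tr><td><input type="submit" value="Log In" name="_save" id="_save"></td></tr>
</tbody></table>
</td></tr></tbody></table>
</form>
</td></tr></tbody></table>
"""


class FormFieldCollector(HTMLParser):
    """Collect (name, type, value) for every named <input> inside one form."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form and "name" in attrs:
            # Inputs without a type default to "text"; a missing value is "".
            self.fields.append(
                (attrs["name"], attrs.get("type", "text"), attrs.get("value") or "")
            )

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def list_form_fields(html, form_id="login_form"):
    """Return the named inputs of the form with the given id, in page order."""
    parser = FormFieldCollector(form_id)
    parser.feed(html)
    return parser.fields


if __name__ == "__main__":
    for name, kind, value in list_form_fields(LOGIN_HTML):
        print(f"{name:15} {kind:10} {value!r}")
```

The listing shows that the password input is named `passwd`, not `password`, and that the form carries several hidden fields (`txtSK`, `txtAN`, `_tries`, `_md5`, `txtPageAction`) alongside the credentials.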
My spider code is as follows:
# -*- coding: utf-8 -*-
from scrapy import Spider
from scrapy.loader import ItemLoader
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser

from quotes_spider.items import QuotesSpiderItem


class QuotesSpider(Spider):
    name = 'quotes'
    start_urls = (
        'http://erp.loyolacollege.edu/loyolaonline/students/loginManager/youLogin.jsp',
    )

    def parse(self, response):
        # Fill in the login form found on the page and submit it.
        # Keys must match the inputs' name attributes ("login", "passwd").
        return FormRequest.from_response(
            response,
            formdata={'login': 'abc', 'passwd': 'xyz'},
            callback=self.scrape_home_page)

    def scrape_home_page(self, response):
        # Open the post-login response in a browser to check the result.
        open_in_browser(response)
        l = ItemLoader(item=QuotesSpiderItem(), response=response)
        h1_tag = response.xpath('//h1/a/text()').extract_first()
        tags = response.xpath('//*[@class="tag-item"]/a/text()').extract()
        l.add_value('h1_tag', h1_tag)
        l.add_value('tags', tags)
        return l.load_item()
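The keys passed as `formdata` have to match the inputs' `name` attributes (`login` and `passwd` in the HTML above). The sketch below is a simplified, standard-library model of how form defaults and `formdata` overrides are combined into the POST body; it is not Scrapy's actual implementation. `FORM_DEFAULTS` and `build_payload` are hypothetical names used only for this illustration, and the submit button is left out for brevity.

```python
from urllib.parse import urlencode

# Input names and default values copied from the form HTML in the question.
FORM_DEFAULTS = [
    ("txtSK", ""),
    ("txtAN", ""),
    ("_tries", "1"),
    ("_md5", ""),
    ("txtPageAction", "0"),
    ("login", ""),
    ("passwd", ""),
]


def build_payload(defaults, formdata):
    """Simplified model: keep form defaults, let formdata override by name."""
    # Defaults whose names are not overridden are kept unchanged.
    payload = [(name, value) for name, value in defaults if name not in formdata]
    # Every formdata entry is sent, even if no such input exists in the form.
    payload.extend(formdata.items())
    return payload


# A key that does not exist in the form is sent as an extra field,
# while the real password field keeps its empty default.
wrong = dict(build_payload(FORM_DEFAULTS, {"login": "abc", "password": "xyz"}))

# Using the form's own field name fills in the password input.
right = dict(build_payload(FORM_DEFAULTS, {"login": "abc", "passwd": "xyz"}))

if __name__ == "__main__":
    print("passwd with wrong key:", repr(wrong["passwd"]))
    print("passwd with right key:", repr(right["passwd"]))
    print(urlencode(build_payload(FORM_DEFAULTS, {"login": "abc", "passwd": "xyz"})))
```

Note that the submit button calls `funSave()` on click, and the hidden inputs `txtSK`, `txtAN` and `_md5` have no values in the HTML. It is possible, though not confirmed by anything in this post, that the page's JavaScript fills them in before submitting. If the login still fails with the correct field names, comparing the POST body sent by the browser (in the developer tools' network tab) with the one sent by the spider would be a reasonable next step.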