
I'm trying to crawl a website that requires a login, but the login form contains a table. I've added its HTML below, along with my code.

Any help would be appreciated. Thanks.

<table align="center" width="100%" border="0" cellpadding="0" style="background:#FDFDFD;">
                       <tbody><tr>
                           <td colspan="2" align="center">
                               Loyola Students Log In
                           </td>
                       </tr>
                        <tr>
                            <td valign="top" colspan="2" align="center">

                                <table cellspacing="1" cellpadding="6" width="50%" border="0">
                                    <tbody>
                                        <tr>
                                            <td align="center">

                                    <form id="login_form" name="login_form" action="" method="post" autocomplete="off">
                                        <table cellspacing="0" cellpadding="2" border="0">
                                            <tbody>
                                                <tr>
                                                    <td align="center">
                                                        <input type="hidden" id="txtSK" name="txtSK">
                                                        <input type="hidden" id="txtAN" name="txtAN">
                                                        <input type="hidden" value="1" name="_tries">
                                                        <input type="hidden" name="_md5">
                                                        <input type="hidden" name="txtPageAction" id="txtPageAction" value="0">
                                                        <table cellspacing="2" cellpadding="2" border="0" width="60%">
                                                            <tbody>
                                                                <tr>
                                                                    <td nowrap="" align="right">Your ID (Dept.No.): </td>
                                                                    <td align="left"><input size="17" id="login" name="login"></td>
                                                                </tr>
                                                                <tr>
                                                                    <td nowrap="" align="right">Password:</td>
                                                                    <td align="left"><input type="password" maxlength="32" size="17" value="" id="passwd" name="passwd"></td>
                                                                </tr>
                                                                <tr>
                                                                    <td></td>
                                                                    <td align="right"><a href="../../students/loginManager/forgotPassword.jsp">Forgot password?</a></td>
                                                                </tr>

                                                                <tr>
                                                                    <td align="center" colspan="2">
                                                                        <input type="submit" value="Log In" name="_save" id="_save" onclick="funSave()">
                                                                    </td>
                                                                </tr>
                                                            </tbody>
                                                        </table>
                                                    </td>
                                                </tr>
<!--                                                <tr>                                                                    
                                                    <td colspan="1" style="cursor:pointer"  align=center valign=center onclick="javascript:window.open('../../students/loginManager/forgotPassword.jsp')"><u>Forgot Password</u></td>                                                                                                                                                            
                                                </tr>-->
                                                <tr>
                                                    <td align="left">
                                                        <p>If you are a Student and this is your first login, you must use your Dept. number as your User ID, with your date of birth [Format ddmmyyyy E.g. 03121990] as your password.</p>
                                                    </td>
                                                </tr>
                                                <tr><td height="28px">
                                                        <div id="divProcess" name="divProcess" style="top : 0; left : 0; height:20px; width:100%;display:none ">
                                                            <table border="0" width="100%" align="center">
                                                                <tbody><tr><td align="center" valign="middle">Please Wait...<img src="../../resources/Image/wait.gif"></td></tr>
                                                                </tbody></table>
                                                        </div>
                                                    </td></tr>
                                            </tbody>
                                        </table>
                                    </form>
                            </td>
                        </tr>
                    </tbody>
                </table>
            </td>
           </tr>

           </tbody></table> 

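So the above is the markup taken from Inspect Element: it has a table, that table contains another table, and inside that are the form elements. I want to log in and scrape data, and I've written some basic code.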
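Because the nested tables make it hard to see which inputs actually belong to the form, one quick check is to list them. The following is a minimal sketch that uses only Python's standard-library html.parser (not Scrapy). It runs on a trimmed copy of the form HTML above, and FormFieldCollector is a hypothetical helper written just for illustration.

```python
from html.parser import HTMLParser

# Trimmed copy of the login form from the question; the nested <table>
# wrappers are kept to show that they do not affect field collection.
LOGIN_HTML = """
<table><tr><td>
<form id="login_form" name="login_form" action="" method="post">
  <table><tr><td>
    <input type="hidden" id="txtSK" name="txtSK">
    <input type="hidden" id="txtAN" name="txtAN">
    <input type="hidden" value="1" name="_tries">
    <input type="hidden" name="_md5">
    <input type="hidden" name="txtPageAction" id="txtPageAction" value="0">
    <table><tr>
      <td><input size="17" id="login" name="login"></td>
      <td><input type="password" maxlength="32" size="17" value="" id="passwd" name="passwd"></td>
    </tr><tr>
      <td><input type="submit" value="Log In" name="_save" id="_save" onclick="funSave()"></td>
    </tr></table>
  </td></tr></table>
</form>
</td></tr></table>
"""


class FormFieldCollector(HTMLParser):
    """Collect (name, type, value) for every named <input> inside one <form id=...>."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # An <input> without a type attribute defaults to type="text".
            self.fields.append(
                (attrs["name"], attrs.get("type") or "text", attrs.get("value") or "")
            )

    def handle_endtag(self, tag):
        # HTML forms cannot be nested, so the first </form> ends the one we track.
        if tag == "form":
            self.in_form = False


collector = FormFieldCollector("login_form")
collector.feed(LOGIN_HTML)
for name, input_type, value in collector.fields:
    print(f"{name:15} {input_type:9} {value!r}")
```

The nesting depth of the tables plays no part, because the parser only tracks whether it is between the opening login_form tag and its closing tag. The listing shows that the password input is named passwd (not password), so passwd is the key the form data has to use.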

My code is as follows:

# -*- coding: utf-8 -*-
from scrapy import Spider
from scrapy.loader import ItemLoader
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser

from quotes_spider.items import QuotesSpiderItem


class QuotesSpider(Spider):
    name = 'quotes'
    start_urls = (
        'http://erp.loyolacollege.edu/loyolaonline/students/loginManager/youLogin.jsp',
    )

    def parse(self, response):

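        # Select the login form explicitly by its id, and use the field names
        # from the HTML: the password input is named 'passwd', not 'password'.
        return FormRequest.from_response(response,
                                         formid='login_form',
                                         formdata={'login': 'abc', 'passwd': 'xyz'},
                                         callback=self.scrape_home_page)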

    def scrape_home_page(self, response):
        open_in_browser(response)
        l = ItemLoader(item=QuotesSpiderItem(), response=response)

        h1_tag = response.xpath('//h1/a/text()').extract_first()
        tags = response.xpath('//*[@class="tag-item"]/a/text()').extract()

        l.add_value('h1_tag', h1_tag)
        l.add_value('tags', tags)

        return l.load_item()
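It can help to see roughly what body the login POST carries. FormRequest.from_response pre-fills the form's existing fields (including hidden ones and, by default, the first submit button) and then overrides them with formdata. Below is a minimal standard-library sketch of that merge, assuming the hidden fields keep the values served in the HTML. The question does not show what funSave() does, so whether the server also expects _md5, txtSK or txtAN to be filled in by that JavaScript is unknown; build_login_body is a hypothetical helper for illustration.

```python
from urllib.parse import urlencode

# Hidden/default fields as served in the form HTML from the question.
# Assumption: txtSK, txtAN and _md5 stay empty. The page's funSave() JavaScript
# may fill them before submitting, which a non-JS client like Scrapy will not do.
page_defaults = {
    "txtSK": "",
    "txtAN": "",
    "_tries": "1",
    "_md5": "",
    "txtPageAction": "0",
    "_save": "Log In",  # the clicked submit button is sent as name=value
}


def build_login_body(defaults, username, password):
    """Merge page defaults with credentials; the credentials take precedence."""
    fields = dict(defaults)
    # Keys must match the input names in the HTML: 'login' and 'passwd'.
    fields.update({"login": username, "passwd": password})
    return urlencode(fields)


body = build_login_body(page_defaults, "abc", "xyz")
print(body)
```

If a login built this way still bounces back to the login page, the empty _md5/txtSK/txtAN values are the first thing to compare against what a real browser submits (for example, in the browser's network tab).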
