P2P Platform Data Scraping and Analysis

lushan / 1563 reads

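About the data source

This project was written in early July 2017. It uses Python to scrape data from Wangdaizhijia (网贷之家) and Renrendai (人人贷) and then analyzes that data.
Wangdaizhijia is the largest P2P data platform in China, and Renrendai is one of the top twenty P2P platforms in China.
Source code address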

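Data scraping

Packet capture analysis

The main capture tool is the Network tab of Chrome's developer tools. All of Wangdaizhijia's data is returned as JSON through AJAX, while Renrendai serves some data through AJAX and renders the rest directly into HTML pages.

Request example

From the captured request you can see the request method (GET or POST), the request headers, and the request parameters.

From the captured response you can see the format of the returned data (JSON in this example), its structure, and the actual values.
Note: the capture shows Wangdaizhijia's current backend API. The data interface in use when the scraper was written differs from today's, so the Wangdaizhijia part of the scraper no longer works.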
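
To make the response structure concrete, here is a minimal sketch of pulling the platform detail object out of a JSON body. The `data` and `platOuterVo` keys are the ones the scraper code in this article reads; the sample payload and the `extract_plat_detail` helper are made up for illustration, and the real interface may return something different.

```python
import json


def extract_plat_detail(body):
    """Return the "platOuterVo" object from a JSON response body, or None."""
    payload = json.loads(body)
    # Fall back to an empty dict so a missing or null "data" key does not raise
    data = payload.get("data") or {}
    return data.get("platOuterVo")


# Hypothetical response body (not captured from the real site)
sample = '{"data": {"platOuterVo": {"platName": "demo", "platStatus": 1}}}'
detail = extract_plat_detail(sample)
print(detail["platName"])

# A response without the expected keys yields None instead of an exception
missing = extract_plat_detail('{"data": null}')
print(missing)
```

Returning None for an unexpected shape lets the caller decide whether to skip the record or log it.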

Constructing requests

Requests are constructed from the results of the capture analysis. This project uses Python's requests library to simulate HTTP requests.
The code:

import requests


class SessionUtil:
    """Thin wrapper around requests.Session that sends browser-like headers."""

    def __init__(self, headers=None, cookie=None):
        self.session = requests.Session()
        if headers is None:
            headers = {
                "Accept": "application/json, text/javascript, */*; q=0.01",
                "X-Requested-With": "XMLHttpRequest",
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36",
                "Accept-Encoding": "gzip, deflate, sdch, br",
                "Accept-Language": "zh-CN,zh;q=0.8",
            }
        self.headers = headers
        self.cookie = cookie

    # Send a GET request and return the response body as text
    def getReq(self, url):
        return self.session.get(url, headers=self.headers).text

    # Add a cookie string to the headers sent with later requests
    def addCookie(self, cookie):
        self.headers["cookie"] = cookie

    # Send a POST request (form data in param) and return the response body as text
    def postReq(self, url, param):
        return self.session.post(url, data=param, headers=self.headers).text

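Of the request headers, the only key field that was set is "User-Agent". Neither Wangdaizhijia nor Renrendai had any anti-scraping measures; it was not even necessary to set the "Referer" field to avoid cross-origin errors.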

Scraper example

Below is one of the scrapers.

import json
import traceback

from databaseUtil import DatabaseUtil
from sessionUtil import SessionUtil
from dictUtil import DictUtil
from logUtil import LogUtil

# Columns of problemPlatDetail that are copied from the "platOuterVo" object
FIELDS = [
    "actualCapital", "aliasName", "association", "associationDetail",
    "autoBid", "autoBidCode", "bankCapital", "bankFunds", "bidSecurity",
    "bindingFlag", "businessType", "companyName", "credit", "creditLevel",
    "delayScore", "delayScoreDetail", "displayFlg", "drawScore",
    "drawScoreDetail", "equityVoList", "experienceScore",
    "experienceScoreDetail", "fundCapital", "gjlhhFlag", "gjlhhTime",
    "gruarantee", "inspection", "juridicalPerson", "locationArea",
    "locationAreaName", "locationCity", "locationCityName", "manageExpense",
    "manageExpenseDetail", "newTrustCreditor", "newTrustCreditorCode",
    "officeAddress", "onlineDate", "payment", "paymode", "platBackground",
    "platBackgroundDetail", "platBackgroundDetailExpand",
    "platBackgroundExpand", "platEarnings", "platEarningsCode", "platName",
    "platStatus", "platUrl", "problem", "problemTime", "recordId",
    "recordLicId", "registeredCapital", "riskCapital", "riskFunds",
    "riskReserve", "riskcontrol", "securityModel", "securityModelCode",
    "securityModelOther", "serviceScore", "serviceScoreDetail",
    "startInvestmentAmout", "term", "termCodes", "termWeight",
    "transferExpense", "transferExpenseDetail", "trustCapital",
    "trustCreditor", "trustCreditorMonth", "trustFunds", "tzjPj",
    "vipExpense", "withTzj", "withdrawExpense",
]


def handleData(returnStr):
    # Parse the JSON response and return the platform detail object
    jsonData = json.loads(returnStr)
    return jsonData.get("data").get("platOuterVo")


def storeData(jsonOne, conn, cur, platId):
    # Build a parameterized INSERT so the driver escapes the values.
    # "%s" is the placeholder style of MySQL drivers such as pymysql.
    columns = ",".join(FIELDS + ["platId"])
    placeholders = ",".join(["%s"] * (len(FIELDS) + 1))
    sql = "insert into problemPlatDetail (" + columns + ") values (" + placeholders + ")"
    values = [jsonOne.get(name) for name in FIELDS] + [platId]
    cur.execute(sql, values)
    conn.commit()


conn, cur = DatabaseUtil().getConn()
session = SessionUtil()
logUtil = LogUtil("problemPlatDetail.log")
cur.execute("select platId from problemPlat")
data = cur.fetchall()
print(data)
# Collect the platform ids as strings
mylist = [str(row.get("platId")) for row in data]
print(mylist)
for i in mylist:
    url = "" + i  # the URL prefix is blank in the source
    try:
        data = session.getReq(url)
        platData = handleData(data)
        dictObject = DictUtil(platData)
        storeData(dictObject, conn, cur, i)
    except Exception:
        traceback.print_exc()
cur.close()
conn.close()

Throughout the process we construct requests and then parse the response to each one. JSON responses are parsed with the json library, and HTML pages are parsed with BeautifulSoup (for HTML pages with a complex structure, lxml is recommended). The parsed results are stored in a MySQL database.
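
The parse-then-store step can be sketched end to end with a parameterized insert. This sketch uses the standard library's sqlite3 with an in-memory database as a stand-in for MySQL, so the `?` placeholder, the three-column `plat_detail` table, and the `store_detail` helper are assumptions for illustration, not the project's actual schema.

```python
import json
import sqlite3

FIELDS = ["platName", "platStatus"]  # hypothetical subset of the real columns


def store_detail(conn, plat_id, detail):
    """Insert one parsed platform record using driver-side parameter binding."""
    columns = ",".join(FIELDS + ["platId"])
    placeholders = ",".join("?" * (len(FIELDS) + 1))
    sql = "insert into plat_detail (%s) values (%s)" % (columns, placeholders)
    values = [detail.get(name) for name in FIELDS] + [plat_id]
    conn.execute(sql, values)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table plat_detail (platName text, platStatus integer, platId text)"
)

# Invented response body; the quote in the name would break a concatenated SQL string
body = json.dumps({"data": {"platOuterVo": {"platName": "it's a demo", "platStatus": 1}}})
detail = json.loads(body)["data"]["platOuterVo"]
store_detail(conn, "42", detail)

rows = conn.execute("select platName, platStatus, platId from plat_detail").fetchall()
print(rows)
conn.close()
```

Because the value containing a quote is bound as a parameter instead of being concatenated into the SQL text, it is stored verbatim.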

Scraper code

Scraper code address (note: the scraper code runs under both Python 2 and Python 3; I deployed it on an Alibaba Cloud server and run it with Python 2)

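Data analysis

The analysis mainly uses Python's numpy, pandas, and matplotlib, supplemented by Haizhi BDP (海致BDP).

Time series analysis

Reading the data

The usual approach is to read the data into a pandas DataFrame and analyze it there.
Below is an example that reads the problem-platform data.

import pandas as pd

# parse_dates=True asks pandas to parse the index as dates
problemPlat=pd.read_csv("problemPlat.csv",parse_dates=True)  # problem platforms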

Data structure

Time series analysis

Example: number of problem platforms over time

# Count problem platforms per month from 2012 through 2017 and plot the trend
problemPlat["id"]["2012":"2017"].resample("M").count().plot(title="P2P platforms with problems")

Plot
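
As a self-contained illustration of the monthly count, the sketch below builds a tiny frame with a DatetimeIndex and counts rows per month. The dates and ids are invented, and it uses the `"MS"` (month start) frequency alias with the method-style `resample(...).count()` call, which is an assumption about the pandas version in use.

```python
import pandas as pd

# Invented sample: one row per problem platform, indexed by the date it failed
dates = pd.to_datetime(["2016-01-05", "2016-01-20", "2016-02-11", "2016-04-02"])
sample = pd.DataFrame({"id": [1, 2, 3, 4]}, index=dates)

# Count rows in each calendar month; months with no rows get a count of 0
monthly = sample["id"].resample("MS").count()
print(monthly)
```

Resampling only works here because the index holds timestamps; a frame read without a date index would need one set first.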

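Regional analysis

Done with Haizhi BDP (drawing map distributions in Python is fairly involved, and I had not learned it at the time).

Number of problem platforms by province

Platform transaction volume by province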

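Size distribution analysis

Example: distribution of platform transaction volume nationwide in June
Code

# Normalized histogram plus kernel density estimate of June transaction volume
juneData["amount"].hist(density=True)
juneData["amount"].plot(kind="kde",style="k--")

Kernel density plot

Kernel density of the log of transaction volume

# Same plots after taking the base-10 logarithm
np.log10(juneData["amount"]).hist(density=True)
np.log10(juneData["amount"]).plot(kind="kde",style="k--")

Plot

After taking the base-10 logarithm, the distribution is much closer to the usual pyramid shape.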
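
One way to check that observation numerically is to compare skewness before and after the transform. The sketch below uses invented amounts spanning several orders of magnitude and a hand-written `skewness` helper (population skewness, not a pandas call); a value near 0 indicates a roughly symmetric distribution.

```python
import math


def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Invented amounts, each ten times the previous one
amounts = [1, 10, 100, 1000, 10000]
log_amounts = [math.log10(a) for a in amounts]

raw_skew = skewness(amounts)      # strongly right-skewed
log_skew = skewness(log_amounts)  # symmetric after the log transform
print(raw_skew, log_skew)
```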

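Correlation analysis

Example: how the correlation between Lufax (陆金所) transaction volume and the volume of all platforms changes over time

lujinData=platVolume[platVolume["wdzjPlatId"]==59]
# 50-day rolling correlation; pd.rolling_corr was replaced by Series.rolling(...).corr(...)
corr=lujinData["amount"].rolling(50,min_periods=50).corr(allPlatDayData["amount"]).plot(title="Rolling correlation between Lufax volume and all-platform volume")

Plot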
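
To show what a rolling correlation computes, here is a plain-Python sketch of a windowed Pearson correlation. The `pearson` and `rolling_corr` helpers and the sample series are written for illustration only and are not how pandas implements it; this version returns None until a full window is available, similar in spirit to setting `min_periods` equal to the window size.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists, or None if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant window has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def rolling_corr(xs, ys, window):
    """Correlation over each trailing window; None where the window is incomplete."""
    result = []
    for end in range(1, len(xs) + 1):
        if end < window:
            result.append(None)
        else:
            result.append(pearson(xs[end - window:end], ys[end - window:end]))
    return result


# Invented series: the second tracks the first linearly until the last point
one_platform = [1.0, 2.0, 4.0, 3.0, 5.0, 9.0]
all_platforms = [3.0, 5.0, 9.0, 7.0, 11.0, 4.0]
corr = rolling_corr(one_platform, all_platforms, 3)
print(corr)
```

The final window includes the point where the two series diverge, so its correlation drops from about 1 to a negative value, which is the kind of shift the rolling plot makes visible.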

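Category comparison

Transaction volume of auto-loan platforms compared with all platforms

import matplotlib.pyplot as plt

# Daily totals for auto-loan platforms, plotted next to the all-platform totals
carFinanceDayData=carFinanceData.resample("D").sum()["amount"]
fig,axes=plt.subplots(nrows=1,ncols=2,sharey=True,figsize=(14,7))
carFinanceDayData.plot(ax=axes[0],title="Auto-loan platform volume")
allPlatDayData["amount"].plot(ax=axes[1],title="All P2P platform volume")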

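Trend forecasting

Example: forecasting the trend of Lufax transaction volume (done with the Facebook Prophet library)

from prophet import Prophet  # the package was named fbprophet before version 1.0

# Copy the slice so adding columns does not modify a view of platVolume
lujinAmount=platVolume[platVolume["wdzjPlatId"]==59].copy()
# Prophet expects the value column to be named "y" and the date column "ds"
lujinAmount["y"]=lujinAmount["amount"]
lujinAmount["ds"]=lujinAmount["date"]
m=Prophet(yearly_seasonality=True)
m.fit(lujinAmount)
future=m.make_future_dataframe(periods=365)  # extend 365 days past the data
forecast=m.predict(future)
m.plot(forecast)

Trend forecast plot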

Data analysis code

Data analysis code address (note: the analysis code only runs under Python 3)
Sample output from running the code (the code and its plots can be viewed without installing Python)

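Afterword

This is the first project I wrote on my own after moving from Java web development to data work, and it is also my first Python project. I did not run into many pitfalls along the way; overall, the barrier to entry for scraping, data analysis, and Python itself is very low.
If you want to get started with Python scraping, I recommend "Web Scraping with Python" (《Python网络数据采集》).

If you want to get started with Python data analysis, I recommend "Python for Data Analysis" (《利用Python进行数据分析》).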

Copyright of this article belongs to the author. Please do not repost it without permission.

When reposting, please cite this article's address: http://m.hztianpu.com/yun/41378.html
