Classification Models: Variable Selection

CloudDeveloper, published 2019-07-31 11:03 / 1599 views

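Summary: The importance coefficient reflects how much influence each feature has. The larger the value, the greater that feature's role in the classification.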

import numpy as np  
import scipy as sp  
import pandas as pd
import matplotlib.pyplot as plt

Split train and test

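from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20

# The last column is the target; every other column is a feature (.ix was removed from pandas, so use .iloc).
# Both calls assign to the same names, so run only the one for the dataset in use.
x_train, x_test, y_train, y_test = train_test_split(customer.iloc[:, :-1], customer.iloc[:, -1], test_size=0.2)
x_train, x_test, y_train, y_test = train_test_split(order.iloc[:, :-1], order.iloc[:, -1], test_size=0.2)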
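The split above can be tried on a small stand-in table. This is a minimal sketch: the `demo` frame, its column names, and the `random_state` and `stratify` arguments are assumptions added for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the customer/order tables:
# three feature columns plus a binary target in the last column.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
demo["tar"] = np.repeat([0, 1], 50)

# Features are every column except the last; the target is the last column.
X, y = demo.iloc[:, :-1], demo.iloc[:, -1]

# random_state makes the split repeatable; stratify keeps the class ratio
# the same in the training and test parts.
x_tr, x_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
print(x_tr.shape, x_te.shape)
```

Stratifying is optional, but for a classification target it avoids a test set that happens to contain very few samples of one class.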

Pearson Correlation for Order

from scipy.stats import pearsonr

# Correlate each feature column with the target (the last column).
prr = []
for i in range(order.columns.size - 1):
    frame = pearsonr(order.iloc[:, i], order.iloc[:, order.columns.size - 1])  # (coefficient, p-value)
    prr.append(frame)

# Pair the feature names only (the target column is excluded) with their results.
result = pd.concat([pd.DataFrame(order.columns[:-1].tolist()), pd.DataFrame(prr)], axis=1)
result.columns = ["Features", "Pearson", "Pvalue"]
result
result.to_csv("result.csv", index=True, header=True)
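Since the same loop is applied to both the order and the customer table, it could be wrapped in one function. The sketch below assumes, as the loop above does, that the target is the last column; the helper name `pearson_table` and the `demo` data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def pearson_table(df):
    """Correlate every column except the last with the last (target) column."""
    target = df.iloc[:, -1]
    rows = []
    for name in df.columns[:-1]:
        r, p = pearsonr(df[name], target)  # coefficient and two-sided p-value
        rows.append((name, r, p))
    return pd.DataFrame(rows, columns=["Features", "Pearson", "Pvalue"])


# Hypothetical data: "signal" is linearly tied to the target, "noise" is not.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({"signal": x, "noise": rng.normal(size=200), "tar": 2 * x + 1})

table = pearson_table(demo)
print(table)
```

Calling `pearson_table(order)` or `pearson_table(customer)` would then produce the same three-column table as the loops in this article.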

Pearson Correlation for Customer

from scipy.stats import pearsonr

# Correlate each feature column with the target (the last column).
prr = []
for i in range(customer.columns.size - 1):
    frame = pearsonr(customer.iloc[:, i], customer.iloc[:, customer.columns.size - 1])  # (coefficient, p-value)
    prr.append(frame)

# Pair the feature names only (the target column is excluded) with their results.
result = pd.concat([pd.DataFrame(customer.columns[:-1].tolist()), pd.DataFrame(prr)], axis=1)
result.columns = ["Features", "Pearson", "Pvalue"]
result
result.to_csv("result.csv", index=True, header=True)

Random forest

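# A regressor treats the target as a continuous value.
from sklearn.ensemble import RandomForestRegressor
clf = RandomForestRegressor()
clf.fit(x_train, y_train)

# For a classification target, use the classifier; this replaces the regressor bound to clf above.
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_jobs=-1)  # -1 uses all available CPU cores
clf.fit(x_train, y_train)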
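To check that a fitted forest is usable before reading its importances, it helps to score it on the held-out part. The following is a rough sketch on synthetic data from `make_classification`; the data set, the variable names, and the hyperparameters are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the real tables.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
model.fit(X_tr, y_tr)

# score() returns mean accuracy on the given data for classifiers.
accuracy = model.score(X_te, y_te)
predicted = model.predict(X_te)
print(f"held-out accuracy: {accuracy:.3f}")
```

If held-out accuracy is close to chance, the importances of that model say little about which variables matter.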

MIC

from minepy import MINE

m = MINE()
mic = []
for i in range(customer.columns.size - 1):
    # compute_score() stores its result on the MINE object and returns None; read the value with mic().
    m.compute_score(customer.iloc[:, i], customer.iloc[:, customer.columns.size - 1])  # target is the last column
    mic.append(m.mic())

result = pd.concat([pd.DataFrame(customer.columns[:-1].tolist()), pd.DataFrame(mic)], axis=1)
result.columns = ["Features", "MIC"]
result.to_csv("result.csv", index=True, header=True)
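The reason to look at a measure such as MIC next to Pearson is that Pearson only captures linear association. The sketch below illustrates that point with a different, swapped-in measure: scikit-learn's `mutual_info_classif`, which is not MIC and is used here only because it ships with scikit-learn. The data and variable names are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=500)      # feature that determines the label
noise = rng.uniform(-1, 1, size=500)  # feature unrelated to the label

# The label depends on x only through x**2, a symmetric, non-linear relationship.
y = (x ** 2 > 0.25).astype(int)

# Pearson sees almost nothing because positive and negative x cancel out.
r, _ = pearsonr(x, y)

# Estimated mutual information (in nats) between each feature and the label.
mi = mutual_info_classif(np.column_stack([x, noise]), y, random_state=0)

print(f"Pearson r for x: {r:.3f}")
print(f"mutual information: x={mi[0]:.3f}, noise={mi[1]:.3f}")
```

A feature that Pearson would discard can therefore still be informative, which is why several measures are computed in this article.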

Feature Correlation

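# Pairwise correlation matrix of all columns.
corr = customer.corr()
corr.to_csv("result.csv", index=True, header=True)

# Correlation of every column with the target column "tar", computed two equivalent ways.
tar_corr = lambda col: col.corr(cus_call["tar"])
cus_call.apply(tar_corr)
cus_call.corrwith(cus_call.tar)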
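The three routes to "correlation with the target" give the same numbers, which can be checked on a small frame. This is a sketch with made-up data; `demo` and its columns are assumptions, with `tar` standing in for the target column used above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
demo["tar"] = demo["a"] * 2 + rng.normal(scale=0.1, size=50)

# 1) Column by column with apply.
via_apply = demo.apply(lambda col: col.corr(demo["tar"]))

# 2) In one call with corrwith.
via_corrwith = demo.corrwith(demo["tar"])

# 3) As one column of the full correlation matrix.
from_matrix = demo.corr()["tar"]

print(pd.DataFrame({"apply": via_apply, "corrwith": via_corrwith, "matrix": from_matrix}))
```

`corrwith` is the shortest form when only the target column matters; the full matrix is useful when correlated features need to be spotted as well.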

Feature Importance

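The importance coefficient reflects how much influence each feature has. The larger the value, the greater that feature's role in the classification.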

# Sort by importance, not by feature name, so the most influential features come first.
importances = pd.DataFrame(sorted(zip(x_train.columns, map(lambda x: round(x, 4), clf.feature_importances_)), key=lambda pair: pair[1], reverse=True))
importances.columns = ["Features", "Importance"]
importances.to_csv("result.csv", index=True, header=True)
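As a sanity check on how to read these values, a forest can be fitted on data where the answer is known in advance. The sketch below is an illustration under stated assumptions: the feature names and the rule generating the label are invented, and only the `signal` column actually determines the class.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(400, 4)),
    columns=["signal", "noise_a", "noise_b", "noise_c"],
)
# Hypothetical rule: the class depends on "signal" alone.
y = (X["signal"] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# feature_importances_ is aligned with the column order and sums to 1.
ranking = (
    pd.Series(forest.feature_importances_, index=X.columns)
    .sort_values(ascending=False)
    .round(4)
)
print(ranking)
```

The importances are relative shares rather than effect sizes, so they are best used to rank features within one model, not to compare across models.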


