2020-08-04 Views: 746
Implementing a Random Forest Classifier

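1. Use np.random to generate 10 random integers between 1 and 100, with no repeats, and store them in a list.
2. Loop over this list with a for loop. In each iteration, instantiate an untrained decision tree, pass the current list element in as its random seed, and collect the trees in another list.
3. Loop over the list of decision trees and train each one on the data.
4. Finally, report the random forest's feature_importance.
One last requirement: steps 2 and 3 must be done in a single line of code. Hint: use a list comprehension.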
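The four steps above can be sketched as follows. This is a minimal sketch that follows the exercise statement literally (10 distinct seeds drawn from 1 to 100). The wine dataset, the variable names, and the fixed `default_rng(0)` seed are assumptions made for illustration. Every tree here is trained on the full data, so the trees can differ at most through their `random_state`.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Step 1: 10 distinct integers from 1..100 (replace=False forbids repeats).
rng = np.random.default_rng(0)  # fixed only to make the sketch reproducible
seeds = rng.choice(np.arange(1, 101), size=10, replace=False).tolist()

# Steps 2 and 3 in one line: build each tree with its seed and fit it at once.
# fit() returns the fitted estimator, so the list holds trained trees.
trees = [DecisionTreeClassifier(random_state=s).fit(X, y) for s in seeds]

# Step 4: average the per-tree importances to get a forest-level importance.
importances = np.mean([t.feature_importances_ for t in trees], axis=0)
print(importances.round(3))
```

Chaining `.fit(...)` onto the constructor is what lets steps 2 and 3 collapse into one list comprehension.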


# Load the wine dataset (178 samples, 13 features, 3 classes).
from sklearn.datasets import load_wine

d = load_wine()

X = d['data']

Y = d['target']


# Hold out 30% of the samples for testing.
from sklearn.model_selection import train_test_split

Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.3)


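# Number of trees in the forest.
n_estimators = 100


import numpy as np

import pandas as pd

# Step 1: one distinct seed per tree. replace=False guarantees no repeats,
# whereas np.random.randint could return the same seed twice.
seed_list = np.random.choice(1000000, size=n_estimators, replace=False)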


from sklearn.tree import DecisionTreeClassifier

# Step 2: one untrained tree per seed.
estimators_ = [DecisionTreeClassifier(random_state=i) for i in seed_list]


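# Put features and labels in one DataFrame so rows can be resampled together.
data = pd.DataFrame(Xtrain)

data['label'] = Ytrain

# One bootstrap sample per tree (rows drawn with replacement). drop_duplicates()
# then keeps each drawn row once, so a tree sees only part of the training rows.
data_list = [data.sample(frac=1, replace=True, random_state=i).drop_duplicates() for i in seed_list]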
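To see what `sample(frac=1, replace=True)` followed by `drop_duplicates()` leaves behind, the sketch below draws one bootstrap sample of row indices and counts the distinct rows. It uses plain NumPy on a hypothetical range of 10,000 row indices rather than the wine data, and it assumes no two distinct rows hold identical values. For large n the expected share of distinct rows is 1 - 1/e, about 63.2%.

```python
import numpy as np

n = 10_000  # hypothetical number of training rows
rng = np.random.default_rng(0)

# Bootstrap: draw n row indices with replacement from 0..n-1.
idx = rng.integers(0, n, size=n)

# Share of rows that appear at least once in the sample.
unique_fraction = np.unique(idx).size / n

# Theoretical expectation: 1 - (1 - 1/n)**n, which tends to 1 - 1/e.
expected = 1 - (1 - 1 / n) ** n
print(f"observed {unique_fraction:.3f}, expected {expected:.3f}")
```

A standard bootstrap would keep the repeated rows, which weights them more heavily; dropping them, as the line above does, trains each tree on the distinct rows only.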

# Step 3: train each tree on its own sample; fit() returns the fitted tree.
estimators_ = [i.fit(j.iloc[:, :-1], j.iloc[:, -1]) for i, j in zip(estimators_, data_list)]

# Step 4: forest-level importance as the mean of the per-tree importances.
pd.DataFrame([i.feature_importances_ for i in estimators_], columns=d['feature_names']).mean(axis=0)


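# For comparison: scikit-learn's built-in random forest, fitted on the same training split.
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(n_estimators=100).fit(Xtrain, Ytrain)

rfc.feature_importances_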
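The comparison with `rfc.feature_importances_` makes sense only if the built-in forest aggregates importances the same way as the hand-rolled version. The sketch below assumes the forest-level value is the normalized average of the per-tree values and checks that assumption numerically instead of taking it for granted. The variable names and `random_state=0` are illustrative choices, not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Average the importances of the individual fitted trees ...
manual = np.mean([t.feature_importances_ for t in forest.estimators_], axis=0)
# ... and renormalize so they sum to 1.
manual = manual / manual.sum()

# True if the manual average matches the forest's own attribute.
matches = np.allclose(manual, forest.feature_importances_)
print(matches)
```

The hand-built forest and `RandomForestClassifier` will still give different numbers, because the built-in version also restricts the features considered at each split by default and draws its own bootstrap samples.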

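# Soft voting: average the trees' class probabilities, then take the most probable class.
Y_test_pred = np.array([i.predict_proba(Xtest) for i in estimators_]).mean(axis=0).argmax(axis=1)


from sklearn.metrics import accuracy_score

# Share of test samples classified correctly.
accuracy_score(Ytest, Y_test_pred)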
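The prediction one-liner packs three array operations together. The sketch below replays them on a tiny hand-made array so the shapes are visible; the probabilities are hypothetical values for 3 trees, 2 samples, and 3 classes, not output from the wine model.

```python
import numpy as np

# Hypothetical predict_proba outputs, shape (n_trees, n_samples, n_classes).
probas = np.array([
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # tree 1
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],  # tree 2
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # tree 3
])

# Average over the tree axis: shape (n_samples, n_classes).
mean_proba = probas.mean(axis=0)

# For each sample, take the column with the highest averaged probability.
pred = mean_proba.argmax(axis=1)
print(pred)  # [0 1]
```

The column index doubles as the class label here only because the wine labels are 0, 1, 2 and every tree is assumed to have seen all three classes. With other label sets, the index would need to be mapped through the estimator's `classes_` attribute.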

