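1. Using np.random, generate 10 random integers between 1 and 100, with no repeats, and put them in a list.
2. Loop over this list with a for loop. In each iteration, instantiate an empty decision tree, pass the list element in as its random seed, and store the tree in another list.
3. Loop over the list of decision trees and train each one on the data.
4. Finally, report the random forest's feature importances.
One last requirement: do steps 2 and 3 in a single line of code. Hint: use a list comprehension.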
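A minimal sketch of the four steps exactly as stated, assuming the wine dataset from the code that follows and an arbitrary fixed seed for reproducibility (both are assumptions made here, not part of the task). Steps 2 and 3 collapse into one list comprehension because `fit` returns the fitted estimator.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
X, y = wine.data, wine.target

# Step 1: 10 distinct integers from 1..100 (replace=False rules out repeats).
rng = np.random.default_rng(0)  # seed 0 is an arbitrary choice for reproducibility
seeds = rng.choice(np.arange(1, 101), size=10, replace=False).tolist()

# Steps 2 and 3 in one line: build each tree with its seed and fit it.
# fit() returns the estimator itself, so the list holds fitted trees.
trees = [DecisionTreeClassifier(random_state=s).fit(X, y) for s in seeds]

# Step 4: average the per-tree importances to get a forest-level importance.
importances = np.mean([t.feature_importances_ for t in trees], axis=0)
for name, value in zip(wine.feature_names, importances):
    print(f"{name}: {value:.3f}")
```

In this sketch every tree sees the same data, so the trees differ only through their seeds; the fuller version that follows also draws a separate bootstrap sample per tree.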
from sklearn.datasets import load_wine
d=load_wine()
X=d['data']
Y=d['target']
from sklearn.model_selection import train_test_split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X,Y,test_size=0.3)
n_estimators = 100
import numpy as np
import pandas as pd
# One distinct seed per tree; replace=False guarantees no repeated seeds
seed_list = np.random.choice(1000000, n_estimators, replace=False)
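from sklearn.tree import DecisionTreeClassifier
# Unfitted trees, each with its own random seed
estimators_ = [DecisionTreeClassifier(random_state=i) for i in seed_list]
data = pd.DataFrame(Xtrain)
data['label'] = Ytrain
# One bootstrap sample per tree; drop_duplicates() keeps only the distinct rows that were drawn
data_list = [data.sample(frac=1,replace=True,random_state=i).drop_duplicates().copy() for i in seed_list]
# fit() returns the estimator itself, so this list holds the fitted trees
estimators_ = [i.fit(j.iloc[:,:-1],j.iloc[:,-1]) for i,j in zip(estimators_,data_list)]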
pd.DataFrame([i.feature_importances_ for i in estimators_],columns=d['feature_names']).mean(axis=0)
from sklearn.ensemble import RandomForestClassifier
# Fit on the training split so the comparison uses the same data as the hand-built trees
rfc = RandomForestClassifier(n_estimators=100).fit(Xtrain,Ytrain)
rfc.feature_importances_
# Soft voting: average the class probabilities over all trees, then pick the most probable class
Y_test_pred = np.array([i.predict_proba(Xtest) for i in estimators_]).mean(axis=0).argmax(axis=1)
from sklearn.metrics import accuracy_score
accuracy_score(Ytest,Y_test_pred)
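As a sanity check on the importance comparison above, the sketch below tests whether averaging the per-tree importances of a fitted RandomForestClassifier reproduces its own feature_importances_. It assumes every tree in the forest has at least one split (expected for this dataset), and the fixed random_state is an arbitrary choice made here.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

wine = load_wine()
rfc = RandomForestClassifier(n_estimators=100, random_state=0).fit(wine.data, wine.target)

# Mean of the individual trees' importances, mirroring the hand-built average.
manual = np.mean([t.feature_importances_ for t in rfc.estimators_], axis=0)

# Compare the manual average with the forest's own attribute.
print(np.allclose(manual, rfc.feature_importances_))
```

If the two agree, any remaining gap between the hand-built importances and rfc.feature_importances_ comes from how the trees were grown (seeds, resampling, per-split feature subsampling), not from how the importances are aggregated.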