A Kaggle problem-solving powerhouse and an advanced take on gradient boosting (GBM).
Because it is used so often and works so well, it gets a post of its own.
Overview:
● Additive model (similar to GBM)
● Feature sampling (similar to Random Forest)
● Adds regularization to the objective function
● Uses 1st and 2nd derivatives to help training
How does XGBoost differ from GBM (Gradient Boosting Machine)?
- Adds regularization to the objective function, which helps avoid overfitting
- Uses the 1st and 2nd derivatives of the loss to build the next tree (see the sketch after this list)
- Feature / data sampling: as in RF, each tree is grown with different data and features
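For concreteness, here is the regularized objective from the XGBoost paper (Chen & Guestrin, 2016) that the first two points describe; T is the number of leaves of a tree and w its leaf weights:

\mathcal{L} = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2

% At step t the loss is expanded to second order, so only the 1st
% derivative g_i and 2nd derivative h_i of l are needed:
\mathcal{L}^{(t)} \approx \sum_i \big[\, g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \,\big] + \Omega(f_t)

The gamma and lambda here are exactly the gamma and lambda parameters listed in the implementation section below.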
Implementation:
The XGBoost package needs to be installed first:
pip install xgboost
from xgboost import XGBClassifier
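A minimal end-to-end sketch; the breast-cancer dataset here is just a stand-in to make the snippet runnable:

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

data = load_breast_cancer()                    # toy binary-classification data
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

model = XGBClassifier()                        # parameters explained below
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))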
Parameter descriptions:
n_estimators [100]: number of trees
learning_rate [0.1]: shrinkage
max_depth [3]: depth of each tree; too large → overfitting
gamma [0]: minimum loss reduction required to make a split; too small → overfitting
lambda [1]: L2 regularization on leaf weights; too small → overfitting
alpha [0]: L1 regularization on leaf weights; too small → overfitting
scale_pos_weight [1]: used for imbalanced data
*Values in square brackets [ ] are the parameter defaults
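Writing the parameters above out in the constructor; note that the scikit-learn wrapper spells lambda and alpha as reg_lambda and reg_alpha:

model = XGBClassifier(
    n_estimators=100,        # number of trees
    learning_rate=0.1,       # shrinkage
    max_depth=3,             # depth of each tree
    gamma=0,                 # min loss reduction to make a split
    reg_lambda=1,            # lambda: L2 regularization
    reg_alpha=0,             # alpha: L1 regularization
    scale_pos_weight=1,      # raise above 1 for an imbalanced positive class
)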
early_stopping_rounds = 10:
Pass hold-out (validation) data via eval_set; if the validation score does not improve for 10 consecutive rounds, training stops early.
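A sketch of early stopping; X_val / y_val stand for a hold-out split and are assumptions of this example. In recent xgboost versions early_stopping_rounds is passed to the constructor (older versions took it in fit()):

model = XGBClassifier(n_estimators=1000, early_stopping_rounds=10)
# training stops once the validation score has not improved
# for 10 consecutive boosting rounds
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
print(model.best_iteration)    # round with the best validation score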
Finding the important features:
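Continuing from the fitted model above, the scikit-learn wrapper exposes one importance score per feature (data.feature_names comes from load_breast_cancer):

import numpy as np

importances = model.feature_importances_      # one score per input feature
order = np.argsort(importances)[::-1]         # indices, most important first
for i in order[:10]:
    print(data.feature_names[i], importances[i])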
Visualization:
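xgboost also ships a plot_importance helper (matplotlib required); a minimal sketch:

import matplotlib.pyplot as plt
from xgboost import plot_importance

plot_importance(model, max_num_features=10)   # bar chart of the top features
plt.show()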
Ref:
台灣人工智慧學校 (Taiwan AI Academy)