LightGBM is a tree-based learning algorithm that grows its trees leaf-wise (vertically), whereas most other tree-based models grow them level-wise (horizontally).
- High accuracy: each step splits the leaf with the highest loss, so it can reduce more loss than a level-wise model with the same number of leaves.
- Fast: trains quickly and uses less memory
- Runs on GPU
- Prone to overfitting: not recommended for small datasets (fewer than roughly 10,000 rows)
- Many hyperparameters to tune (>100) …
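The difference between the two growth strategies can be sketched with a toy model. This is not LightGBM's actual gain computation: the `split` rule (children keep 30% and 50% of the parent's loss) and the starting loss of 100 are assumptions made purely for illustration. Both trees end up with 8 leaves, but the leaf-wise tree always spends its next split on the leaf with the highest loss.

```python
import heapq


def split(loss):
    # Toy rule (an assumption, not LightGBM's gain formula):
    # the two children keep 30% and 50% of the parent's loss.
    return 0.3 * loss, 0.5 * loss


def grow_leaf_wise(root_loss, n_splits):
    # Max-heap via negated losses: always split the leaf with the highest loss.
    heap = [-root_loss]
    for _ in range(n_splits):
        worst = -heapq.heappop(heap)
        for child in split(worst):
            heapq.heappush(heap, -child)
    return sorted(-x for x in heap)


def grow_level_wise(root_loss, depth):
    # Split every leaf on the current level before going one level deeper.
    leaves = [root_loss]
    for _ in range(depth):
        leaves = [child for leaf in leaves for child in split(leaf)]
    return sorted(leaves)


# 7 splits and depth 3 both produce 8 leaves, so the comparison is fair.
leaf_wise = grow_leaf_wise(100.0, n_splits=7)
level_wise = grow_level_wise(100.0, depth=3)
print("leaf-wise :", len(leaf_wise), "leaves, total loss", round(sum(leaf_wise), 2))
print("level-wise:", len(level_wise), "leaves, total loss", round(sum(level_wise), 2))
```

In this toy setup the leaf-wise tree reaches a lower total loss with the same number of leaves, but it is also deeper and less balanced, which is one way to see why leaf-wise growth overfits more easily on small data.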
Parameters
- objective: regression, binary, multiclass
- metric: mae (mean absolute error), mse (mean squared error), binary_logloss (binary classification), multi_logloss (multiclass classification)
- boosting: gbdt (traditional gradient boosting decision tree), rf (random forest), dart (dropouts meet multiple additive regression trees), goss (gradient-based one-side sampling)
- num_boost_round: number of boosting iterations (usually 100+). A larger value can increase accuracy but slows down training
- learning_rate: determines the contribution of each tree to the final prediction. A low learning rate needs many iterations to converge (slow), while a high learning rate may converge quickly but with lower accuracy (usually 0.1, 0.01, etc.)
- max_depth: maximum depth of a tree. Set it to a smaller value to prevent overfitting
- min_data_in_leaf: minimum number of records in each leaf. A higher value helps prevent overfitting but can also cause underfitting. Use lower values for class-imbalanced data so that minority-class records can fall within the same leaf (usually hundreds to thousands for a large dataset)
- num_leaves: maximum number of leaves in a tree (usually set below 2^max_depth)
- feature_fraction: fraction of the features randomly selected for growing each tree (usually 0.8). Reduces overfitting and the effect of multicollinearity, and speeds up training
- bagging_fraction: fraction of the data randomly sampled for each iteration. Prevents overfitting and speeds up training; it only takes effect when bagging_freq is greater than 0
- max_bin: maximum number of discrete bins that continuous feature values are bucketed into. A smaller value speeds up training and helps prevent overfitting; a larger value can be more accurate.
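As a rough sketch of how the parameters above fit together, the snippet below builds a parameter dictionary for binary classification and trains on synthetic data. The specific values are illustrative assumptions rather than tuned recommendations, and `leaves_fit_depth` and `train_demo` are hypothetical helper names introduced only for this example. `bagging_freq` and `verbose` are not in the list above; `bagging_freq` is included because bagging is only applied when it is positive.

```python
# Illustrative values only; tune them for your own data.
params = {
    "objective": "binary",
    "metric": "binary_logloss",
    "boosting": "gbdt",
    "learning_rate": 0.1,
    "max_depth": 6,
    "num_leaves": 31,         # kept below 2**max_depth = 64
    "min_data_in_leaf": 100,
    "feature_fraction": 0.8,
    "bagging_fraction": 0.8,
    "bagging_freq": 1,        # resample the data at every iteration
    "max_bin": 255,
    "verbose": -1,            # silence training logs
}


def leaves_fit_depth(p):
    """Check the rule of thumb num_leaves < 2**max_depth."""
    return p["num_leaves"] < 2 ** p["max_depth"]


def train_demo(num_boost_round=100, seed=0):
    """Train on synthetic data and return validation accuracy.

    Requires the numpy and lightgbm packages; they are imported here so
    that the parameter dictionary above can be used without them.
    """
    import numpy as np
    import lightgbm as lgb

    # Synthetic binary problem: the label depends on the first two features.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(5000, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    X_train, X_valid = X[:4000], X[4000:]
    y_train, y_valid = y[:4000], y[4000:]

    train_set = lgb.Dataset(X_train, label=y_train)
    valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)

    booster = lgb.train(
        params,
        train_set,
        num_boost_round=num_boost_round,
        valid_sets=[valid_set],
    )
    preds = booster.predict(X_valid)  # predicted probability of class 1
    return float(np.mean((preds > 0.5) == y_valid))


print("num_leaves consistent with max_depth:", leaves_fit_depth(params))
```

Calling `train_demo()` runs the training and returns the validation accuracy on the synthetic data. `num_boost_round` is passed to `lgb.train` directly rather than through the dictionary, which keeps the number of iterations easy to change between runs.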