
Decision Tree

Hello everyone,

Today I went through the Decision Tree topic, and I'm going to share some notes on it.

Decision trees are frequently applied to classification problems. The goal is to build a tree-like structure by recursively splitting the data into subsets. These splits are chosen using statistical criteria that seek to decrease impurity, or equivalently, to maximize homogeneity within each subset.

Entropy and Gini impurity are two popular impurity metrics. After weighing several candidate splits, the decision tree algorithm selects the one that maximizes the information gain (or overall purity). The procedure keeps going until a predetermined stopping condition is reached, such as a particular tree depth, or until additional splits don't materially improve the classification.
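As a quick sketch of how this looks in practice, here is a small example using scikit-learn's DecisionTreeClassifier; the iris dataset, the max_depth value, and the criterion choice are just illustrative assumptions, not anything specific to these notes.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion picks the impurity measure used to evaluate splits:
# "gini" for Gini impurity, "entropy" for information gain.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

print(tree.predict(X[:5]))   # predicted classes for the first five samples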

Gini Impurity:

In the context of decision trees, Gini impurity quantifies the level of disorder, or impurity, in the collection of data points at a node. The Gini impurity G(t) for a given node t is computed as

G(t) = 1 - \sum_{i=1}^{c} p_i^2

where p_i is the proportion of data points in class i at node t, and c is the number of classes.
Gini impurity ranges from 0, which represents perfect purity (all data points belong to one class), up to a maximum of 1 − 1/c, reached when the data points are spread uniformly across all c classes; this maximum approaches 1 as the number of classes grows.
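Here is a minimal sketch of this computation in Python; the function name gini_impurity is just a hypothetical helper, and NumPy is assumed to be available.

import numpy as np

def gini_impurity(labels):
    # G(t) = 1 - sum_i p_i^2, where p_i is the proportion of class i at the node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 0, 0]))   # 0.0  (pure node)
print(gini_impurity([0, 0, 1, 1]))   # 0.5  (evenly split binary node)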

Entropy:

Another impurity metric used in decision trees is entropy. The entropy H(t) for a given node t can be calculated as follows:

H(t) = -\sum_{i=1}^{c} p_i \log_2(p_i)

where p_i is the fraction of data points in class i at node t, and c is the number of classes.
Entropy is used to assess a node’s purity in a manner similar to Gini impurity, with lower values denoting greater purity.
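Again as a rough sketch, the same kind of helper can compute entropy; the function name entropy is hypothetical and NumPy is assumed.

import numpy as np

def entropy(labels):
    # H(t) = -sum_i p_i * log2(p_i), summed over the classes present at the node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy([0, 0, 0, 0]))   # zero entropy (pure node)
print(entropy([0, 0, 1, 1]))   # 1.0: evenly split binary node, one bit of entropy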

 
