A Decision Tree Algorithm is a supervised machine learning algorithm used for classification and regression tasks. It works by recursively splitting the dataset into smaller subsets based on feature values, forming a tree-like structure where each node represents a decision rule.
Common splitting criteria include:
- Entropy / Information Gain (used in classification)
- Gini Impurity (used in classification, e.g., by CART)
- Mean Squared Error (MSE) (used in regression)
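As a quick illustration (a minimal sketch assuming scikit-learn is installed; the tiny arrays below are made-up toy data), these criteria can be selected directly when building a tree:

```python
# Minimal sketch: choosing the splitting criterion in scikit-learn (assumed installed).
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: split on entropy / information gain ("gini" is the default alternative).
X_cls = [[0, 0], [1, 0], [0, 1], [1, 1]]   # toy feature rows
y_cls = ["No", "No", "Yes", "Yes"]          # toy class labels
clf = DecisionTreeClassifier(criterion="entropy").fit(X_cls, y_cls)

# Regression: split on mean squared error (named "squared_error" in recent scikit-learn versions).
X_reg = [[1.0], [2.0], [3.0], [4.0]]
y_reg = [1.2, 1.9, 3.1, 3.9]
reg = DecisionTreeRegressor(criterion="squared_error").fit(X_reg, y_reg)

print(clf.predict([[1, 1]]), reg.predict([[2.5]]))
```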
Advantages:
✅ Easy to interpret and visualize.
✅ Handles both numerical and categorical data.
✅ Requires little data preprocessing (e.g., no need for feature scaling).
Disadvantages:
❌ Prone to overfitting (if not pruned properly).
❌ Sensitive to noisy data.
❌ Can become biased if one class dominates.
Common Decision Tree Algorithms:
- ID3 (Iterative Dichotomiser 3) – Uses entropy/information gain.
- C4.5 – An improvement over ID3, supports missing values.
- CART (Classification and Regression Trees) – Uses Gini impurity and MSE.
How the ID3 Algorithm Works (Step by Step):
1️⃣ Compute the entropy of the entire dataset (the distribution of the target class).
2️⃣ Compute Entropy(A) for each attribute.
3️⃣ Compute Information Gain (IG) for each attribute.
4️⃣ Select the attribute with highest IG and split the dataset.
5️⃣ Recursively repeat Steps 2-4 for each subset.
6️⃣ Stop when all instances belong to one class or no attributes remain.
7️⃣ The resulting decision tree classifies new data points.
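For readers who like to see the steps as code, here is a minimal, self-contained Python sketch of this procedure (an illustrative toy ID3, assuming categorical attributes stored as dictionaries; the helper names `entropy`, `info_gain`, and `id3` are ours, not from a library):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum(p_i * log2(p_i))."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of splitting (rows, labels) on a categorical attribute."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

def id3(rows, labels, attributes):
    """Recursively build a decision tree represented as nested dicts."""
    if len(set(labels)) == 1:                      # stop: all instances in one class
        return labels[0]
    if not attributes:                             # stop: no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))  # highest IG
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in {row[best] for row in rows}:      # split on each value and recurse
        sub_rows = [r for r in rows if r[best] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == value]
        tree[best][value] = id3(sub_rows, sub_labels, remaining)
    return tree

# Tiny illustrative usage:
rows = [{"Outlook": "Sunny", "Windy": "FALSE"}, {"Outlook": "Sunny", "Windy": "TRUE"},
        {"Outlook": "Overcast", "Windy": "FALSE"}, {"Outlook": "Overcast", "Windy": "TRUE"}]
labels = ["Yes", "No", "Yes", "Yes"]
print(id3(rows, labels, ["Outlook", "Windy"]))
```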
Let's go step by step to calculate Entropy and Information Gain for the Play Golf dataset.
Sample Dataset: We use the classic Play Golf dataset: 14 instances described by four attributes (Outlook, Temperature, Humidity, Windy) and the target Play Golf, with 9 Yes and 5 No instances.
Step 1: Compute the Entropy of the Entire Dataset
Entropy is a measure of uncertainty or disorder in the dataset. A higher entropy value means more uncertainty.
Formula for Entropy:

$$Entropy(S) = -\sum_{i} p_i \log_2(p_i)$$

where:
- $p_i$ is the probability of each class (Yes/No in our case),
- $S$ is the dataset.
For our dataset (9 Yes and 5 No out of 14 instances):

$$Entropy(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.94$$

So, the entropy of the dataset is 0.94.
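To make this concrete, here is a short Python check (a sketch; the helper `entropy_from_counts` is just an illustrative name):

```python
import math

def entropy_from_counts(counts):
    """Entropy from class counts, e.g. [9, 5] for 9 Yes and 5 No instances."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy_from_counts([9, 5]), 2))  # 0.94
```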
Step 2: Compute Entropy for Each Attribute
Now, we calculate the entropy of each attribute's subsets to check how well they split the data. Taking Outlook as an example (Sunny: 3 Yes / 2 No, Overcast: 4 Yes / 0 No, Rainy: 3 Yes / 2 No):

$$Entropy(Sunny) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} \approx 0.971$$

$$Entropy(Overcast) = 0 \ \text{(pure subset)}, \qquad Entropy(Rainy) \approx 0.971$$
Step 3: Compute the Weighted Entropy of the Attribute
To compute the overall entropy of Outlook, we take the weighted sum of the entropy values of its subsets:

$$Entropy(Outlook) = \frac{5}{14}(0.971) + \frac{4}{14}(0) + \frac{5}{14}(0.971) \approx 0.694$$
What is Weighted Entropy?
When we split the dataset based on an attribute (e.g., "Outlook"), we create subsets of the data. Each subset has its own entropy (level of uncertainty).
Formula for Weighted Entropy:

$$Entropy(S, A) = \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)$$

where $S_v$ is the subset of $S$ for which attribute $A$ takes value $v$.
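A short sketch of this weighted sum in Python, using the (Yes, No) counts for Outlook from this example (Sunny 3/2, Overcast 4/0, Rainy 3/2); `weighted_entropy` is an illustrative helper, not a library function:

```python
import math

def entropy(counts):
    """Entropy from class counts, e.g. (3, 2) for 3 Yes and 2 No."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def weighted_entropy(subset_counts):
    """Sum over subsets of (|S_v| / |S|) * Entropy(S_v)."""
    total = sum(sum(c) for c in subset_counts)
    return sum(sum(c) / total * entropy(c) for c in subset_counts)

outlook = [(3, 2), (4, 0), (3, 2)]            # (Yes, No) counts for Sunny, Overcast, Rainy
print(round(weighted_entropy(outlook), 3))    # ~0.694
```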
Step 4: Compute Information Gain for the Attribute
Information Gain tells us how much entropy is reduced after splitting.
Formula for Information Gain (IG):

$$IG(S, A) = Entropy(S) - Entropy(S, A)$$

For Outlook:

$$IG(S, Outlook) = 0.94 - 0.694 \approx 0.25$$
We repeat this calculation for all attributes (Humidity, Windy, Temperature).
Step 5: Select the Attribute with the Highest Information Gain
We compare the Information Gain (IG) of all attributes and choose the one with the highest IG as the splitting criterion.
Step 1: Recall the Formula for Information Gain (IG)
The Information Gain (IG) of an attribute is calculated as:

$$IG(Attribute) = Entropy(Parent) - Entropy(Attribute)$$
- Entropy(Parent): Entropy of the entire dataset before splitting.
- Entropy(Attribute): Weighted entropy after splitting by that attribute.
Step 2: Entropy of the Parent (Before Splitting)
Before splitting, we calculate the overall entropy of the dataset. Suppose we have 14 instances in the dataset:
Play Golf | Count |
---|---|
Yes | 9 |
No | 5 |
The entropy of the entire dataset is:

$$Entropy(Parent) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.94$$
So, Entropy(Parent) = 0.94.
Step 3: Compute Entropy for Each Attribute
(1) Entropy of "Windy"
"Windy" has two possible values: TRUE and FALSE. We split the dataset accordingly.
Windy | Play Golf (Yes) | Play Golf (No) | Total Instances |
---|---|---|---|
TRUE | 3 | 3 | 6 |
FALSE | 6 | 2 | 8 |
Calculate Entropy for Windy = TRUE (3 Yes, 3 No):

$$Entropy(TRUE) = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6} = 1.0$$

Calculate Entropy for Windy = FALSE (6 Yes, 2 No):

$$Entropy(FALSE) = -\frac{6}{8}\log_2\frac{6}{8} - \frac{2}{8}\log_2\frac{2}{8} \approx 0.811$$

Weighted Entropy of Windy:

$$Entropy(Windy) = \frac{6}{14}(1.0) + \frac{8}{14}(0.811) \approx 0.892$$

Information Gain for Windy:

$$IG(Windy) = 0.94 - 0.892 \approx 0.048$$
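A quick Python check of these Windy numbers (a sketch using the counts from the table above):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

e_true = entropy([3, 3])                  # Windy = TRUE: 3 Yes, 3 No  -> 1.0
e_false = entropy([6, 2])                 # Windy = FALSE: 6 Yes, 2 No -> ~0.811
weighted = (6 / 14) * e_true + (8 / 14) * e_false
ig_windy = entropy([9, 5]) - weighted     # parent entropy minus weighted entropy
print(round(weighted, 3), round(ig_windy, 3))  # 0.892 0.048
```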
(2) Entropy of "Humidity"
"Humidity" has two values: High and Normal.
Humidity | Play Golf (Yes) | Play Golf (No) | Total Instances |
---|---|---|---|
High | 3 | 4 | 7 |
Normal | 6 | 1 | 7 |
Calculate Entropy for Humidity = High (3 Yes, 4 No):

$$Entropy(High) = -\frac{3}{7}\log_2\frac{3}{7} - \frac{4}{7}\log_2\frac{4}{7} \approx 0.985$$

Calculate Entropy for Humidity = Normal (6 Yes, 1 No):

$$Entropy(Normal) = -\frac{6}{7}\log_2\frac{6}{7} - \frac{1}{7}\log_2\frac{1}{7} \approx 0.592$$

Weighted Entropy of Humidity:

$$Entropy(Humidity) = \frac{7}{14}(0.985) + \frac{7}{14}(0.592) \approx 0.789$$

Information Gain for Humidity:

$$IG(Humidity) = 0.94 - 0.789 \approx 0.15$$
(3) Entropy of "Temperature"
"Temperature" has three values: Hot, Mild, Cool.
Temperature | Play Golf (Yes) | Play Golf (No) | Total Instances |
---|---|---|---|
Hot | 2 | 2 | 4 |
Mild | 4 | 2 | 6 |
Cool | 3 | 1 | 4 |
Using the same process as above, we calculate:

$$Entropy(Hot) = 1.0, \quad Entropy(Mild) \approx 0.918, \quad Entropy(Cool) \approx 0.811$$

$$Entropy(Temperature) = \frac{4}{14}(1.0) + \frac{6}{14}(0.918) + \frac{4}{14}(0.811) \approx 0.911$$

$$IG(Temperature) = 0.94 - 0.911 \approx 0.03$$
Step 4: Choose the Attribute with Highest Information Gain
Attribute | Information Gain |
---|---|
Outlook | 0.25 |
Humidity | 0.15 |
Windy | 0.05 |
Temperature | 0.03 |
Since Outlook has the highest IG (0.25), it becomes the root node in our decision tree.
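The whole comparison can be reproduced in a few lines of Python (a sketch; the (Yes, No) counts come from the tables above, and `info_gain` is an illustrative helper):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, subset_counts):
    total = sum(parent_counts)
    weighted = sum(sum(c) / total * entropy(c) for c in subset_counts)
    return entropy(parent_counts) - weighted

parent = [9, 5]                                      # 9 Yes, 5 No
splits = {
    "Outlook":     [(3, 2), (4, 0), (3, 2)],         # Sunny, Overcast, Rainy
    "Humidity":    [(3, 4), (6, 1)],                 # High, Normal
    "Windy":       [(3, 3), (6, 2)],                 # TRUE, FALSE
    "Temperature": [(2, 2), (4, 2), (3, 1)],         # Hot, Mild, Cool
}

for attr, counts in sorted(splits.items(), key=lambda kv: -info_gain(parent, kv[1])):
    print(f"{attr:12s} IG = {info_gain(parent, counts):.3f}")
# Outlook      IG = 0.247
# Humidity     IG = 0.152
# Windy        IG = 0.048
# Temperature  IG = 0.029
```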
Step 6: Repeat the Process for Each Subset
Now, we repeat Steps 2-5 for the sub-groups (Sunny, Overcast, Rainy).
- For Overcast, since entropy is 0, it's a leaf node (all "Yes").
- For Sunny and Rainy, we continue splitting using other attributes (Humidity, Windy, etc.).
We repeat this process recursively until:
- All subsets are pure (Entropy = 0).
- No more attributes are left to split.
At this stage, we have selected Outlook as the root node because it has the highest Information Gain (IG = 0.25).
This means the dataset is now divided into three subsets based on the values of Outlook:
- Overcast Subset
- Sunny Subset
- Rainy Subset
Step 6.1: Overcast Subset
Outlook | Temperature | Humidity | Windy | Play Golf |
---|---|---|---|---|
Overcast | Hot | High | FALSE | Yes |
Overcast | Cool | Normal | TRUE | Yes |
Overcast | Mild | High | FALSE | Yes |
Overcast | Hot | Normal | FALSE | Yes |
- In this subset, all instances have Play Golf = Yes.
- Since there is no variation in the target variable, the entropy is 0.
- Conclusion: Overcast becomes a leaf node labeled "Yes" (pure subset).
✅ No further splitting is needed for Overcast.
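As a tiny sanity check in Python (counts taken from the Overcast table above):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([4, 0]) == 0)  # True: 4 Yes / 0 No is a pure subset, so Overcast becomes a leaf
```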
Step 6.2: Sunny Subset
Outlook | Temperature | Humidity | Windy | Play Golf |
---|---|---|---|---|
Sunny | Mild | High | FALSE | Yes |
Sunny | Cool | Normal | TRUE | No |
Sunny | Cool | Normal | FALSE | Yes |
Sunny | Mild | Normal | FALSE | Yes |
Sunny | Mild | High | TRUE | No |
Since this subset contains both Yes and No values, it is not pure. We need to continue splitting.
Step 6.2.1: Calculate Entropy for the Sunny Subset
Play Golf | Count |
---|---|
Yes | 3 |
No | 2 |
$$Entropy(Sunny) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} \approx 0.971$$

Since entropy > 0, we must split further. We now compute Information Gain for the remaining attributes in this subset (Humidity, Windy, Temperature).
Step 6.2.2: Choose Best Split for Sunny
Computing the Information Gain of each attribute within the Sunny subset (see the Python check after this subsection):
- IG(Humidity) ≈ 0.02
- IG(Windy) ≈ 0.97 (highest IG)
- IG(Temperature) ≈ 0.02
Since Windy has the highest IG, we split the Sunny subset based on Windy.
✅ Windy is the next decision node for Sunny. Now, we split Sunny into two branches:
- Sunny & Windy = TRUE
- Sunny & Windy = FALSE
Subsets:
- Sunny & Windy = TRUE → all No → (Entropy = 0) → Leaf Node "No"
- Sunny & Windy = FALSE → all Yes → (Entropy = 0) → Leaf Node "Yes"
Since these subsets are pure, we stop here.
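Here is a short Python check of this split, using the five Sunny rows listed above (an illustrative sketch, not library code):

```python
import math
from collections import Counter

# The five Sunny rows from the table above: (Temperature, Humidity, Windy, PlayGolf)
sunny = [("Mild", "High", "FALSE", "Yes"),
         ("Cool", "Normal", "TRUE", "No"),
         ("Cool", "Normal", "FALSE", "Yes"),
         ("Mild", "Normal", "FALSE", "Yes"),
         ("Mild", "High", "TRUE", "No")]

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(rows, attr_index):
    labels = [r[-1] for r in rows]
    subsets = {}
    for r in rows:
        subsets.setdefault(r[attr_index], []).append(r[-1])
    weighted = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

for name, idx in [("Temperature", 0), ("Humidity", 1), ("Windy", 2)]:
    print(name, round(info_gain(sunny, idx), 3))
# Temperature 0.02
# Humidity 0.02
# Windy 0.971  -> splitting on Windy yields two pure branches
```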
Step 6.3: Rainy Subset
Outlook | Temperature | Humidity | Windy | Play Golf |
---|---|---|---|---|
Rainy | Mild | High | FALSE | Yes |
Rainy | Cool | Normal | FALSE | Yes |
Rainy | Cool | Normal | TRUE | No |
Rainy | Mild | Normal | FALSE | Yes |
Rainy | Mild | High | TRUE | No |
Since this subset contains both Yes and No, we need to continue splitting.
Step 6.3.1: Calculate Entropy for Rainy
Play Golf | Count |
---|---|
Yes | 3 |
No | 2 |
$$Entropy(Rainy) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} \approx 0.971$$

Since entropy > 0, we check Humidity, Windy, and Temperature for the highest Information Gain.
Step 6.3.2: Choose Best Split for Rainy
Computing the Information Gain of each attribute within the Rainy subset:
- IG(Humidity) ≈ 0.02
- IG(Windy) ≈ 0.97 (highest IG)
- IG(Temperature) ≈ 0.02
Since Windy has the highest IG, we split Rainy on Windy.
✅ Windy is the next decision node for Rainy. Now, we split Rainy into two branches:
- Rainy & Windy = TRUE
- Rainy & Windy = FALSE
Subsets:
- Rainy & Windy = TRUE → all No → (Entropy = 0) → Leaf Node "No"
- Rainy & Windy = FALSE → all Yes → (Entropy = 0) → Leaf Node "Yes"
Since these subsets are pure, we stop here.
Step 7: Stop When a Stopping Condition is Met
A stopping condition could be:
- All instances belong to the same class (Yes or No).
- No more attributes left.
- Tree reaches a maximum depth.
Once we stop, the final Decision Tree is formed.
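In practice, libraries expose these stopping conditions as parameters. Here is a hedged scikit-learn sketch (scikit-learn is assumed to be installed; the integer-encoded rows are illustrative toy data, not the full Play Golf table):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: columns [Outlook, Temperature, Humidity, Windy] encoded as small integers.
X = [[0, 0, 0, 0], [0, 1, 0, 1], [1, 0, 1, 0], [2, 2, 1, 1], [2, 1, 0, 0], [1, 2, 1, 1]]
y = ["No", "No", "Yes", "Yes", "Yes", "No"]

tree = DecisionTreeClassifier(
    criterion="entropy",     # information-gain style splits
    max_depth=3,             # stop: maximum tree depth reached
    min_samples_split=2,     # stop: too few samples to split further
    min_samples_leaf=1,      # stop: a leaf may not receive fewer samples than this
).fit(X, y)

print(export_text(tree, feature_names=["Outlook", "Temperature", "Humidity", "Windy"]))
```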