
Decision Tree

A Decision Tree is a supervised machine learning algorithm used for classification and regression tasks. It works by recursively splitting the dataset into smaller subsets based on feature values, forming a tree-like structure in which each internal node represents a decision rule.

Common splitting criteria include (a small code sketch follows this list):

  • Entropy / Information Gain (used in classification)

  • Mean Squared Error (MSE) (used in regression)
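
A minimal Python sketch of these two criteria, assuming a plain list of class labels for classification and a list of numeric targets for regression; the helper names entropy and mse are illustrative, not from the original post.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (classification criterion)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def mse(values):
    """Mean squared error around the subset mean (regression criterion)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(round(entropy(["Yes", "Yes", "No"]), 3))  # 0.918 for a 2-vs-1 split
print(mse([10.0, 12.0, 14.0, 16.0]))            # 5.0
```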

Advantages:

✅ Easy to interpret and visualize.
✅ Handles both numerical and categorical data.
✅ Requires little data preprocessing (e.g., no need for feature scaling).

Disadvantages:

❌ Prone to overfitting (if not pruned properly).
❌ Sensitive to noisy data.
❌ Can become biased if one class dominates.

Common Decision Tree Algorithms:

  • ID3 (Iterative Dichotomiser 3) – Uses entropy/information gain.
  • C4.5 – An improvement over ID3, supports missing values.
  • CART (Classification and Regression Trees) – Uses Gini impurity and MSE (see the scikit-learn sketch below).
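
In practice, most libraries implement the CART family. Below is a minimal, hedged scikit-learn sketch; the tiny XOR-style dataset is illustrative only, and the criterion can be switched between "gini" and "entropy".

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny illustrative dataset: two binary features, binary target (an XOR-style pattern).
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# criterion="entropy" uses information gain; criterion="gini" uses Gini impurity.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)
print(export_text(clf, feature_names=["f0", "f1"]))
```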

1️⃣ Compute Entropy(S) for the entire dataset.
2️⃣ Compute Entropy(A) for each attribute.
3️⃣ Compute Information Gain (IG) for each attribute.
4️⃣ Select the attribute with highest IG and split the dataset.
5️⃣ Recursively repeat Steps 2-4 for each subset.
6️⃣ Stop when all instances belong to one class or no attributes remain.
7️⃣ The resulting decision tree classifies new data points (see the sketch after this list).
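
The following is a compact Python sketch of that loop, assuming the data is a list of dicts and the target column is named "Play Golf"; function names such as id3 and information_gain are illustrative, not from the post.

```python
import math
from collections import Counter

def entropy(rows, target):
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def information_gain(rows, attr, target):
    weighted = 0.0
    for value in set(row[attr] for row in rows):
        subset = [row for row in rows if row[attr] == value]
        weighted += len(subset) / len(rows) * entropy(subset, target)
    return entropy(rows, target) - weighted

def id3(rows, attributes, target):
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:                      # all instances belong to one class
        return labels[0]
    if not attributes:                             # no attributes remain: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {best: {value: id3([r for r in rows if r[best] == value], remaining, target)
                   for value in set(row[best] for row in rows)}}
```

Called with the 14 Play Golf rows as a list of dicts and the attributes ["Outlook", "Temperature", "Humidity", "Windy"], this returns a nested dict with Outlook at the root, matching the walkthrough below.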

Let's go step by step to calculate Entropy and Information Gain for the Play Golf dataset.

Sample Dataset

The running example is the 14-row Play Golf dataset, with four attributes (Outlook, Temperature, Humidity, Windy) and a Yes/No target (Play Golf).



  • Total Instances = 14
  • Play Golf (Yes) = 9, Play Golf (No) = 5

    Step 1: Compute the Entropy of the Entire Dataset

    Entropy is a measure of uncertainty or disorder in the dataset. A higher entropy value means more uncertainty.

    Formula for Entropy:

    Entropy(S) = - \sum p(x) \log_2 p(x)

    where:

    • p(x) is the probability of each class (Yes/No in our case),
    • S is the dataset.

    Plugging in the class counts from above (9 Yes and 5 No out of 14):

    Entropy(S) = - \left(\frac{9}{14} \log_2 \frac{9}{14}\right) - \left(\frac{5}{14} \log_2 \frac{5}{14}\right) = 0.94

    So, the entropy of the dataset is 0.94 (a quick numeric check follows below).
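
    A quick numeric check of that value, using only the class counts 9 and 5 from above:

```python
import math

# Entropy of the full dataset: 9 "Yes" and 5 "No" out of 14 instances.
p_yes, p_no = 9 / 14, 5 / 14
print(round(-(p_yes * math.log2(p_yes) + p_no * math.log2(p_no)), 2))  # 0.94
```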


    Step 2: Compute Entropy for Each Attribute

    Now, we calculate the entropy of each attribute's values to check how well the attribute splits the data. For Outlook, the per-value entropies are Entropy(Sunny) = 0.97, Entropy(Overcast) = 0 (all instances are Yes), and Entropy(Rainy) = 0.97.




    Step 3: Compute the Weighted Entropy of the Attribute

    To compute the overall entropy of Outlook, we take the weighted sum of the entropy values:

    What is Weighted Entropy?

    When we split the dataset based on an attribute (e.g., "Outlook"), we create subsets of the data. Each subset has its own entropy (level of uncertainty).

    Formula for Weighted Entropy

    Entropy(Attribute) = \sum \left(\frac{|S_i|}{|S|} \times Entropy(S_i)\right)


    Entropy(Outlook) = \left(\frac{5}{14} \times 0.97\right) + \left(\frac{4}{14} \times 0\right) + \left(\frac{5}{14} \times 0.97\right)

    = 0.346 + 0 + 0.346 = 0.69
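
    The same weighted sum as a small Python sketch; the subset sizes and per-value entropies are taken from the Outlook split above, and the helper name weighted_entropy is mine:

```python
def weighted_entropy(subsets):
    """subsets: (size, entropy) pairs, one per attribute value."""
    total = sum(size for size, _ in subsets)
    return sum(size / total * e for size, e in subsets)

# Outlook split: Sunny (5 rows, 0.97), Overcast (4 rows, 0.0), Rainy (5 rows, 0.97)
print(round(weighted_entropy([(5, 0.97), (4, 0.0), (5, 0.97)]), 2))  # 0.69
```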

    Step 4: Compute Information Gain for the Attribute

    Information Gain tells us how much entropy is reduced after splitting.

    Formula for Information Gain (IG):

    IG(A) = Entropy(S) - Entropy(A)

    For Outlook:

    IG(Outlook) = 0.94 - 0.69 = 0.25

    We repeat this calculation for all attributes (Humidity, Windy, Temperature).


    Step 5: Select the Attribute with the Highest Information Gain

    We compare the Information Gain (IG) of all attributes and choose the one with the highest IG as the splitting criterion.


    Let's work through the calculation for each attribute in detail.

    Step 1: Recall the Formula for Information Gain (IG)

    The Information Gain (IG) of an attribute is calculated as:

    IG(Attribute) = Entropy(Parent) - Entropy(Attribute)
    • Entropy(Parent): Entropy of the entire dataset before splitting.
    • Entropy(Attribute): Weighted entropy after splitting by that attribute.

    Step 2: Entropy of the Parent (Before Splitting)

    Before splitting, we calculate the overall entropy of the dataset. Suppose we have 14 instances in the dataset:

    Play Golf | Count
    Yes | 9
    No | 5

    The entropy of the entire dataset is:

    Entropy(Play Golf) = - \left(\frac{9}{14} \log_2 \frac{9}{14}\right) - \left(\frac{5}{14} \log_2 \frac{5}{14}\right)

    = - (0.642 \times \log_2 0.642) - (0.357 \times \log_2 0.357)

    = - (0.642 \times -0.64) - (0.357 \times -1.48)

    = 0.94

    So, Entropy(Parent) = 0.94.


    Step 3: Compute Entropy for Each Attribute

    (1) Entropy of "Windy"

    "Windy" has two possible values: TRUE and FALSE. We split the dataset accordingly.

    Windy | Play Golf (Yes) | Play Golf (No) | Total Instances
    TRUE | 3 | 3 | 6
    FALSE | 6 | 2 | 8

    Calculate Entropy for Windy=TRUE

    Entropy(Windy=TRUE) = - \left(\frac{3}{6} \log_2 \frac{3}{6}\right) - \left(\frac{3}{6} \log_2 \frac{3}{6}\right)

    = - (0.5 \times \log_2 0.5) - (0.5 \times \log_2 0.5) = - (0.5 \times -1) - (0.5 \times -1) = 1.0

    Calculate Entropy for Windy=FALSE

    Entropy(Windy=FALSE) = - \left(\frac{6}{8} \log_2 \frac{6}{8}\right) - \left(\frac{2}{8} \log_2 \frac{2}{8}\right)

    = - (0.75 \times \log_2 0.75) - (0.25 \times \log_2 0.25) = - (0.75 \times -0.415) - (0.25 \times -2) = 0.81

    Weighted Entropy of Windy

    Entropy(Windy) = \left(\frac{6}{14} \times 1.0\right) + \left(\frac{8}{14} \times 0.81\right) = 0.428 + 0.463 = 0.89

    Information Gain for Windy

    IG(Windy) = 0.94 - 0.89 = 0.05

    (2) Entropy of "Humidity"

    "Humidity" has two values: High and Normal.

    Humidity | Play Golf (Yes) | Play Golf (No) | Total Instances
    High | 3 | 4 | 7
    Normal | 6 | 1 | 7

    Calculate Entropy for Humidity=High

    Entropy(High) = - \left(\frac{3}{7} \log_2 \frac{3}{7}\right) - \left(\frac{4}{7} \log_2 \frac{4}{7}\right)

    = - (0.428 \times \log_2 0.428) - (0.571 \times \log_2 0.571) = - (0.428 \times -1.23) - (0.571 \times -0.80) = 0.99

    Calculate Entropy for Humidity=Normal

    Entropy(Normal) = - \left(\frac{6}{7} \log_2 \frac{6}{7}\right) - \left(\frac{1}{7} \log_2 \frac{1}{7}\right)

    = - (0.857 \times \log_2 0.857) - (0.142 \times \log_2 0.142)

    = - (0.857 \times -0.22) - (0.142 \times -2.81)

    = 0.59

    Weighted Entropy of Humidity

    Entropy(Humidity) = \left(\frac{7}{14} \times 0.99\right) + \left(\frac{7}{14} \times 0.59\right) = 0.495 + 0.295 = 0.79

    Information Gain for Humidity

    IG(Humidity) = 0.94 - 0.79 = 0.15

    (3) Entropy of "Temperature"

    "Temperature" has three values: Hot, Mild, Cool.

    Temperature | Play Golf (Yes) | Play Golf (No) | Total Instances
    Hot | 2 | 2 | 4
    Mild | 4 | 2 | 6
    Cool | 3 | 1 | 4

    Using the same process as above, we calculate:

    The per-value entropies are Entropy(Hot) = 1.0, Entropy(Mild) = 0.92, and Entropy(Cool) = 0.81, so:

    Entropy(Temperature) = \left(\frac{4}{14} \times 1.0\right) + \left(\frac{6}{14} \times 0.92\right) + \left(\frac{4}{14} \times 0.81\right) = 0.91

    IG(Temperature) = 0.94 - 0.91 = 0.03

    Step 4: Choose the Attribute with Highest Information Gain

    Attribute | Information Gain
    Outlook | 0.25
    Humidity | 0.15
    Windy | 0.05
    Temperature | 0.03

    Since Outlook has the highest IG (0.25), it becomes the root node in our decision tree.
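
    A short, self-contained Python sketch that reproduces this table from the per-value class counts listed above (the Outlook counts of 3 Yes / 2 No for Sunny and Rainy, and 4 / 0 for Overcast, are read off the subset tables further below):

```python
import math

def entropy(counts):
    """Entropy from a list of class counts, e.g. [9, 5]."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, value_counts):
    """IG = Entropy(parent) minus the weighted entropy over the attribute's values."""
    total = sum(parent_counts)
    weighted = sum(sum(v) / total * entropy(v) for v in value_counts)
    return entropy(parent_counts) - weighted

parent = [9, 5]  # Play Golf: Yes, No
attributes = {
    "Outlook":     [[3, 2], [4, 0], [3, 2]],   # Sunny, Overcast, Rainy
    "Humidity":    [[3, 4], [6, 1]],           # High, Normal
    "Windy":       [[3, 3], [6, 2]],           # TRUE, FALSE
    "Temperature": [[2, 2], [4, 2], [3, 1]],   # Hot, Mild, Cool
}
for name, counts in attributes.items():
    print(name, round(information_gain(parent, counts), 2))
# Outlook 0.25, Humidity 0.15, Windy 0.05, Temperature 0.03
```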


    Step 6: Repeat the Process for Each Subset

    Now, we repeat Steps 2-5 for the sub-groups (Sunny, Overcast, Rainy).

    1. For Overcast, since entropy is 0, it's a leaf node (all "Yes").
    2. For Sunny and Rainy, we continue splitting using other attributes (Humidity, Windy, etc.).

    We repeat this process recursively until:

    • All subsets are pure (Entropy = 0).
    • No more attributes are left to split.

    At this stage, we have selected Outlook as the root node because it has the highest Information Gain (IG = 0.25).



    This means the dataset is now divided into three subsets based on the values of Outlook:

    1. Overcast Subset
    2. Sunny Subset
    3. Rainy Subset

    Step 6.1: Overcast Subset

    Outlook | Temperature | Humidity | Windy | Play Golf
    Overcast | Hot | High | FALSE | Yes
    Overcast | Cool | Normal | TRUE | Yes
    Overcast | Mild | High | FALSE | Yes
    Overcast | Hot | Normal | FALSE | Yes

    • In this subset, all instances have Play Golf = Yes.
    • Since there is no variation in the target variable, the entropy is 0.
    • Conclusion: Overcast becomes a leaf node labeled "Yes" (pure subset).

    No further splitting is needed for Overcast.


    Step 6.2: Sunny Subset

    Outlook | Temperature | Humidity | Windy | Play Golf
    Sunny | Mild | High | FALSE | Yes
    Sunny | Cool | Normal | TRUE | No
    Sunny | Cool | Normal | FALSE | Yes
    Sunny | Mild | Normal | FALSE | Yes
    Sunny | Mild | High | TRUE | No

    Since this subset contains both Yes and No values, it is not pure. We need to continue splitting.

    Step 6.2.1: Calculate Entropy for the Sunny Subset

    Play Golf | Count
    Yes | 3
    No | 2

    Entropy(Sunny) = - \left(\frac{3}{5} \log_2 \frac{3}{5}\right) - \left(\frac{2}{5} \log_2 \frac{2}{5}\right) = - (0.6 \times -0.737) - (0.4 \times -1.322) = 0.97

    Since entropy > 0, we must split further. We now compute Information Gain for attributes in this subset (Humidity, Windy, Temperature).
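
    A small sketch that recomputes these gains directly from the five Sunny rows shown above; the row encoding and helper names are mine:

```python
import math
from collections import Counter

# The five Sunny rows from the table above: (Temperature, Humidity, Windy, Play Golf)
rows = [
    ("Mild", "High",   "FALSE", "Yes"),
    ("Cool", "Normal", "TRUE",  "No"),
    ("Cool", "Normal", "FALSE", "Yes"),
    ("Mild", "Normal", "FALSE", "Yes"),
    ("Mild", "High",   "TRUE",  "No"),
]

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    parent = entropy([r[-1] for r in rows])
    weighted = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r[-1] for r in rows if r[attr_index] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return parent - weighted

for name, idx in [("Temperature", 0), ("Humidity", 1), ("Windy", 2)]:
    print(name, round(information_gain(rows, idx), 2))
# Temperature 0.02, Humidity 0.02, Windy 0.97
```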

    Step 6.2.2: Choose Best Split for Sunny

    From the subset table above, Windy separates the Sunny instances perfectly (Windy = TRUE → both No, Windy = FALSE → all Yes), which gives:

    • IG(Humidity) = 0.02
    • IG(Windy) = 0.97 (highest IG)
    • IG(Temperature) = 0.02

    Since Windy has the highest IG (0.97), we split the Sunny subset based on Windy.

    Windy is the next decision node for Sunny. Now, we split Sunny into two branches:

    • Sunny & Windy = TRUE
    • Sunny & Windy = FALSE

    Subsets:

    1. Sunny & Windy = TRUE → (Entropy = 0) → Leaf Node "No"
    2. Sunny & Windy = FALSE → (Entropy = 0) → Leaf Node "Yes"

    Since these subsets are pure, we stop here.


    Step 6.3: Rainy Subset

    Outlook | Temperature | Humidity | Windy | Play Golf
    Rainy | Mild | High | FALSE | Yes
    Rainy | Cool | Normal | FALSE | Yes
    Rainy | Cool | Normal | TRUE | No
    Rainy | Mild | Normal | FALSE | Yes
    Rainy | Mild | High | TRUE | No

    Since this subset contains both Yes and No, we need to continue splitting.

    Step 6.3.1: Calculate Entropy for Rainy

    Play Golf | Count
    Yes | 3
    No | 2

    Entropy(Rainy) = - \left(\frac{3}{5} \log_2 \frac{3}{5}\right) - \left(\frac{2}{5} \log_2 \frac{2}{5}\right) = 0.97

    Since entropy > 0, we check Humidity, Windy, and Temperature for the highest Information Gain.

    Step 6.3.2: Choose Best Split for Rainy

    As with the Sunny branch, Windy separates the Rainy instances perfectly (Windy = TRUE → both No, Windy = FALSE → all Yes), which gives:

    • IG(Humidity) = 0.02
    • IG(Windy) = 0.97 (highest IG)
    • IG(Temperature) = 0.02

    Since Windy has the highest IG (0.97), we split Rainy on Windy.

    Windy is the next decision node for Rainy. Now, we split Rainy into two branches:

    • Rainy & Windy = TRUE
    • Rainy & Windy = FALSE

    Subsets:

    1. Rainy & Windy = TRUE → (Entropy = 0) → Leaf Node "No"
    2. Rainy & Windy = FALSE → (Entropy = 0) → Leaf Node "Yes"

    Since these subsets are pure, we stop here.


    Step 7: Stop When a Stopping Condition is Met

    A stopping condition could be:

    • All instances belong to the same class (Yes or No).
    • No more attributes left.
    • Tree reaches a maximum depth.

    Once we stop, the final Decision Tree is formed.
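
    In practice, libraries expose these stopping conditions as pre-pruning hyperparameters. A minimal scikit-learn sketch, with illustrative parameter values and the built-in Iris data standing in for Play Golf:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning via stopping conditions: a depth limit and a minimum leaf size.
clf = DecisionTreeClassifier(
    criterion="entropy",   # information-gain-style splitting
    max_depth=3,           # stop once the tree reaches a maximum depth
    min_samples_leaf=5,    # stop before creating leaves with too few samples
    random_state=0,
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```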


    Final Decision Tree
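
    Written out as a simple prediction function (leaf labels read off the subset tables above; the function name is illustrative), the tree derived in this walkthrough is:

```python
def predict_play_golf(outlook, windy):
    """Decision tree from this walkthrough: Outlook at the root, then Windy."""
    if outlook == "Overcast":
        return "Yes"                     # pure leaf
    if outlook in ("Sunny", "Rainy"):
        return "No" if windy else "Yes"  # Windy = TRUE -> No, FALSE -> Yes
    raise ValueError(f"unknown outlook: {outlook!r}")

print(predict_play_golf("Sunny", windy=False))  # Yes
```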



