Decision Tree in Machine Learning
A decision tree is a supervised machine learning model that makes predictions by repeatedly splitting data according to certain parameters. Decision trees denote and classify data using two kinds of entities: decision nodes, where the data is split, and leaves, which represent the final decision or outcome. For instance, to determine whether a person is physically fit given information about their physical activity, age, eating habits, and so on, the decision nodes would ask questions like “does he work out?”, “what’s his age?”, and “does he eat a lot of apples?”, while the leaves would hold binary answers such as ‘fit’ or ‘unfit’ and ‘yes’ or ‘no’. This is an example of a binary tree.
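The fitness example above can be sketched as a tiny hand-built tree. The feature names, the age threshold, and the branching order here are illustrative assumptions, not part of any trained model:

```python
def classify(person):
    """Walk a small hand-built decision tree for the fitness example.

    Decision nodes ask questions about the person's attributes;
    leaves return the final 'fit'/'unfit' answer.
    """
    if person["age"] < 30:          # decision node: "what's his age?"
        if person["works_out"]:     # decision node: "does he work out?"
            return "fit"            # leaf
        return "unfit"              # leaf
    if person["eats_apples"]:       # decision node: "does he eat a lot of apples?"
        return "fit"                # leaf
    return "unfit"                  # leaf

print(classify({"age": 25, "works_out": True, "eats_apples": False}))  # fit
```

Each internal `if` plays the role of a decision node, and each `return` is a leaf; a learned tree would choose these questions and thresholds automatically from data.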
There are two main types of decision trees:
Classification trees: In classification trees, the outcome is a categorical (discrete) variable. For example, outcomes like ‘yes’ or ‘no’.
Regression trees: In regression trees, the outcome or decision is a continuous variable. For example, a predicted price or temperature.
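The practical difference between the two types shows up at the leaves: a classification leaf typically predicts the majority class of the samples that reach it, while a regression leaf predicts their mean. A minimal sketch with made-up leaf data:

```python
from collections import Counter
from statistics import mean

# Samples that ended up in one leaf of each tree type (illustrative data).
classification_leaf = ["yes", "yes", "no", "yes"]   # categorical outcomes
regression_leaf = [3.1, 2.9, 3.4, 3.0]              # continuous outcomes

# Classification leaf: predict the most common class among its samples.
class_prediction = Counter(classification_leaf).most_common(1)[0][0]

# Regression leaf: predict the mean of its samples.
value_prediction = mean(regression_leaf)

print(class_prediction)  # yes
print(value_prediction)  # 3.1
```

This is why classification trees naturally emit labels like ‘yes’/‘no’ while regression trees emit numbers.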
Being one of the most powerful base machine learning models, the decision tree is foundational to one of the most popular ensemble learning models, Random Forest. What makes the decision tree so favourable and popular among machine learning practitioners is its readability, interpretability, and accuracy in predicting both continuous and categorical outcomes. Its breaking down of data sets into increasingly homogeneous subsets, until the outcome is reached at the leaf nodes, also makes the whole process easy to follow. Splits are chosen using an impurity criterion; the two common criteria are Gini impurity and entropy. Each split aims to reduce impurity, which speeds up outcome determination, and Gini is the faster of the two to calculate. However, because splitting data is an extensive and voluminous process, it is important to constrain the decision tree with limiting or stopping criteria (such as a maximum depth or a minimum number of samples per node) to avoid overfitting and inaccuracies.
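The two impurity criteria above have simple closed forms: Gini impurity is 1 minus the sum of squared class probabilities, and entropy is the negative sum of p·log₂(p). A minimal sketch (the class-probability inputs are illustrative):

```python
from math import log2

def gini(probs):
    """Gini impurity: 1 - sum(p^2). No logarithm, so cheaper to compute."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Shannon entropy: -sum(p * log2(p)), skipping zero-probability classes."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A pure node (all samples in one class) has zero impurity under both
# criteria; a 50/50 binary split is maximally impure.
print(gini([1.0]), entropy([1.0]))            # 0.0 for both
print(gini([0.5, 0.5]), entropy([0.5, 0.5]))  # 0.5 and 1.0
```

Both measures hit zero exactly when a node is homogeneous, which is why splits that reduce them push the tree toward pure leaves; Gini's lack of a logarithm is the reason it is faster to calculate.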