The entropy function:

H(p_1) = -p_1 \log_2(p_1) - p_0 \log_2(p_0)
Since p_0 = 1 - p_1, this is equivalent to:

H(p_1) = -p_1 \log_2(p_1) - (1 - p_1) \log_2(1 - p_1)
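For example, a node that is half edible and half poisonous has H(0.5) = 1 bit, p_1 = 0.8 gives H(0.8) = -0.8 \log_2(0.8) - 0.2 \log_2(0.2) \approx 0.72 bits, and a pure node (p_1 = 0 or p_1 = 1) has zero entropy.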
import numpy as np

def compute_entropy(y):
    """
    Computes the entropy for the examples at a node

    Args:
        y (ndarray): Numpy array indicating whether each example at a node is
                     edible (`1`) or poisonous (`0`)

    Returns:
        entropy (float): Entropy at that node
    """
    # You need to return the following variables correctly
    entropy = 0.

    ### START CODE HERE ###
    m = len(y)
    if m != 0:
        # Fraction of edible (label 1) examples at this node
        p1 = len(y[y == 1]) / m
        if p1 != 0 and p1 != 1:
            entropy = -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)
        else:
            # A pure node (all edible or all poisonous) has zero entropy
            entropy = 0.
    ### END CODE HERE ###

    return entropy
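A quick sanity check with a made-up label array (this assumes numpy is imported as np, as in the function above):

# Hypothetical labels: 1 = edible, 0 = poisonous
y_example = np.array([1, 1, 0, 1])          # p1 = 0.75
print(compute_entropy(y_example))           # ~0.8113
print(compute_entropy(np.array([1, 1, 1]))) # 0.0 for a pure node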
\text{Information Gain} = H(p_1^\text{node}) - \left(w^\text{left} H(p_1^\text{left}) + w^\text{right} H(p_1^\text{right})\right)
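For instance, if a node with 5 edible and 5 poisonous examples (H = 1) is split so that the left branch gets 5 examples with p_1^\text{left} = 4/5 and the right branch gets 5 examples with p_1^\text{right} = 1/5, then w^\text{left} = w^\text{right} = 0.5 and the information gain is 1 - (0.5 \cdot 0.72 + 0.5 \cdot 0.72) \approx 0.28.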
Compute the information gain for every feature and pick the feature with the highest information gain.
def split_dataset(X, node_indices, feature):
    """
    Splits the data at the given node into
    left and right branches

    Args:
        X (ndarray): Data matrix of shape (n_samples, n_features)
        node_indices (list): List containing the active indices, i.e. the samples being considered at this step.
        feature (int): Index of feature to split on

    Returns:
        left_indices (list): Indices with feature value == 1
        right_indices (list): Indices with feature value == 0
    """
    # You need to return the following variables correctly
    left_indices = []
    right_indices = []

    ### START CODE HERE ###
    for i in node_indices:
        if X[i][feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)
    ### END CODE HERE ###

    return left_indices, right_indices
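A quick illustration with a made-up one-hot feature matrix (X_demo and root_indices are hypothetical, not part of the assignment data):

# Illustrative only: 4 examples, 3 binary features
X_demo = np.array([[1, 1, 0],
                   [0, 1, 1],
                   [1, 0, 1],
                   [0, 0, 0]])
root_indices = [0, 1, 2, 3]
left, right = split_dataset(X_demo, root_indices, feature=0)
print(left, right)  # [0, 2] [1, 3]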
def compute_information_gain(X, y, node_indices, feature):
    """
    Compute the information gain of splitting the node on a given feature

    Args:
        X (ndarray): Data matrix of shape (n_samples, n_features)
        y (array like): list or ndarray with n_samples containing the target variable
        node_indices (ndarray): List containing the active indices, i.e. the samples being considered in this step.
        feature (int): Index of feature to split on

    Returns:
        information_gain (float): Information gain of the split
    """
    # Split dataset
    left_indices, right_indices = split_dataset(X, node_indices, feature)

    # Some useful variables
    X_node, y_node = X[node_indices], y[node_indices]
    X_left, y_left = X[left_indices], y[left_indices]
    X_right, y_right = X[right_indices], y[right_indices]

    # You need to return the following variables correctly
    information_gain = 0

    ### START CODE HERE ###
    # Weights: fraction of the node's examples sent to each branch
    w_left = len(X_left) / len(X_node)
    w_right = len(X_right) / len(X_node)

    # Entropy at the node and at each branch
    node_entropy = compute_entropy(y_node)
    left_entropy = compute_entropy(y_left)
    right_entropy = compute_entropy(y_right)

    # Information gain = node entropy minus the weighted entropy of the branches
    information_gain = node_entropy - (w_left * left_entropy + w_right * right_entropy)
    ### END CODE HERE ###

    return information_gain
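Continuing the toy example above with made-up labels (y_demo is hypothetical), feature 0 separates the classes perfectly while feature 1 is uninformative:

y_demo = np.array([1, 0, 1, 0])
print(compute_information_gain(X_demo, y_demo, root_indices, feature=0))  # 1.0
print(compute_information_gain(X_demo, y_demo, root_indices, feature=1))  # 0.0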
def get_best_split(X, y, node_indices):
    """
    Returns the optimal feature to split the node data

    Args:
        X (ndarray): Data matrix of shape (n_samples, n_features)
        y (array like): list or ndarray with n_samples containing the target variable
        node_indices (ndarray): List containing the active indices, i.e. the samples being considered in this step.

    Returns:
        best_feature (int): The index of the best feature to split on, or -1 if no split improves the entropy
    """
    # Some useful variables
    num_features = X.shape[1]

    # You need to return the following variables correctly
    best_feature = -1

    ### START CODE HERE ###
    # Try every feature and keep the one with the largest information gain
    max_info_gain = 0
    for i in range(num_features):
        info_gain = compute_information_gain(X, y, node_indices, i)
        if info_gain > max_info_gain:
            max_info_gain = info_gain
            best_feature = i
    ### END CODE HERE ###

    return best_feature
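On the same toy data, this should pick feature 0:

best = get_best_split(X_demo, y_demo, root_indices)
print("Best feature to split on:", best)  # 0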
The steps: starting from the root, find the best feature with get_best_split, split the node with split_dataset, and recurse on the left and right branches until the maximum depth is reached.
# Not graded
tree = []

def build_tree_recursive(X, y, node_indices, branch_name, max_depth, current_depth):
    """
    Build a tree using the recursive algorithm that splits the dataset into 2 subgroups at each node.
    This function just prints the tree.

    Args:
        X (ndarray): Data matrix of shape (n_samples, n_features)
        y (array like): list or ndarray with n_samples containing the target variable
        node_indices (ndarray): List containing the active indices, i.e. the samples being considered in this step.
        branch_name (string): Name of the branch. ['Root', 'Left', 'Right']
        max_depth (int): Max depth of the resulting tree.
        current_depth (int): Current depth. Parameter used during recursive call.
    """
    # Maximum depth reached - stop splitting
    if current_depth == max_depth:
        formatting = " " * current_depth + "-" * current_depth
        print(formatting, "%s leaf node with indices" % branch_name, node_indices)
        return

    # Otherwise, get the best feature to split on at this node
    best_feature = get_best_split(X, y, node_indices)
    tree.append((current_depth, branch_name, best_feature, node_indices))

    formatting = "-" * current_depth
    print("%s Depth %d, %s: Split on feature: %d" % (formatting, current_depth, branch_name, best_feature))

    # Split the dataset at the best feature
    left_indices, right_indices = split_dataset(X, node_indices, best_feature)

    # Continue splitting the left and the right child. Increment current depth
    build_tree_recursive(X, y, left_indices, "Left", max_depth, current_depth + 1)
    build_tree_recursive(X, y, right_indices, "Right", max_depth, current_depth + 1)
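A typical call on the toy data above (max_depth = 1 is chosen arbitrarily for illustration):

# Illustrative only: build and print a depth-1 tree
build_tree_recursive(X_demo, y_demo, root_indices, "Root", max_depth=1, current_depth=0)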
For a continuous-valued feature, choose the 9 mid-points between the 10 sorted example values as possible thresholds, and find the split that gives the highest information gain.
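A minimal sketch of this idea for one continuous feature (best_split_on_continuous is a made-up helper, and the weights and labels below are invented for illustration; it reuses compute_entropy from above):

def best_split_on_continuous(values, y):
    """Return (best_threshold, best_gain) over the mid-points of the sorted values."""
    order = np.argsort(values)
    values, y = values[order], y[order]
    node_entropy = compute_entropy(y)
    best_threshold, best_gain = None, 0.0
    for i in range(len(values) - 1):
        threshold = (values[i] + values[i + 1]) / 2          # mid-point candidate
        y_left, y_right = y[values <= threshold], y[values > threshold]
        w_left = len(y_left) / len(y)
        gain = node_entropy - (w_left * compute_entropy(y_left)
                               + (1 - w_left) * compute_entropy(y_right))
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain

# Made-up continuous feature (weight) and edible/poisonous labels for 10 examples
weights = np.array([5.0, 6.1, 6.5, 7.2, 7.9, 8.3, 8.8, 9.1, 9.5, 10.0])
labels  = np.array([1,   1,   1,   1,   0,   0,   0,   0,   1,   0])
print(best_split_on_continuous(weights, labels))  # ~ (7.55, 0.61) on this made-up data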
Use some features to predict the weight, again choosing splits by information gain.
Trees are highly sensitive to small changes in the data.
Create a new training set by sampling with replacement from the original training set.
(Deliberate practice) When building the next decision tree, focus more attention on the subset of examples that the previously trained trees are not yet doing well on, and use that to train the new decision tree.
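A minimal sketch of the "create a new training set" step (illustrative only; sample_with_replacement is a made-up helper, and the toy X_demo/y_demo arrays from above are reused):

def sample_with_replacement(X, y, seed=None):
    """Draw a new training set of the same size by sampling indices with replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))  # duplicates are allowed
    return X[idx], y[idx]

X_new, y_new = sample_with_replacement(X_demo, y_demo, seed=0)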