Decision Tree Regressor
Unit 2: Guide to Machine Learning Algorithms

A Decision Tree Regressor is a powerful and intuitive non-linear algorithm that works by making a series of if-then-else decisions to split the data into smaller and smaller segments. Imagine it as a flowchart where each internal node represents a "question" about a feature, and each branch represents the answer to that question.

The goal is to partition the data in a way that the final segments, called "leaf nodes," are as homogeneous (pure) as possible. For regression, the prediction for any new data point is simply the average of all the training data points that ended up in the same leaf node.
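
For example (with made-up numbers), if the three training points that land in a particular leaf have target values 2.0, 4.0, and 6.0, that leaf predicts their average, 4.0, for every new point routed to it:

import numpy as np

# Made-up target values of the training points that ended up in one leaf
leaf_targets = np.array([2.0, 4.0, 6.0])
print(leaf_targets.mean())  # prints 4.0 -- the leaf's prediction for any new point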

How it Works:

1.   Recursive Splitting: The algorithm starts with the entire dataset at the root node. It searches for the best possible split (a feature and a threshold value) that minimizes the error (typically the Mean Squared Error, MSE) in the resulting two child nodes.

2.   Creating a Tree: This process is repeated recursively for each new node. The tree continues to grow by asking more questions and creating more branches.

3.   Stopping Criteria: The splitting stops when a certain condition is met, such as the tree reaching a maximum depth (max_depth), a node having too few samples to split, or a split not significantly reducing the error. In scikit-learn these criteria map to constructor arguments, illustrated in the code example below.

4.   Prediction: To make a prediction for a new data point, it's dropped down the tree. It follows the path of "yes" or "no" answers based on its feature values until it reaches a leaf node. The prediction is the average value of the target variable in that leaf.

The result is a step-wise function. It doesn't produce a smooth curve but rather a series of horizontal lines that can approximate very complex, non-linear relationships.
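
To make steps 1 and 4 concrete, here is a minimal from-scratch sketch of a single split search on one feature. It is not scikit-learn's implementation; the function name best_split and the data are made up for illustration. It tries every candidate threshold, scores each candidate by the weighted MSE of the two child nodes, and uses each child's mean as its prediction.

import numpy as np

def best_split(feature, targets):
    """Search one feature for the threshold that minimizes the weighted MSE
    of the two child nodes. Illustrative sketch, not scikit-learn's code."""
    best_threshold, best_error = None, np.inf
    for threshold in np.unique(feature):
        left = targets[feature <= threshold]
        right = targets[feature > threshold]
        if len(left) == 0 or len(right) == 0:
            continue  # a split must leave samples on both sides
        # Each child predicts its mean, so its MSE is the variance of its targets.
        error = (len(left) * left.var() + len(right) * right.var()) / len(targets)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error

# Tiny made-up dataset for illustration
feature = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
targets = np.array([1.1, 0.9, 1.0, 3.0, 3.2, 2.9])

threshold, error = best_split(feature, targets)
left_mean = targets[feature <= threshold].mean()    # prediction of the left leaf
right_mean = targets[feature > threshold].mean()    # prediction of the right leaf
print(f"Best split: feature <= {threshold}")
print(f"Left leaf predicts {left_mean:.2f}, right leaf predicts {right_mean:.2f} (weighted MSE {error:.4f})")

A full tree simply applies this same search recursively inside each child node until one of the stopping criteria from step 3 is met.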

Code Example in Python

The following Python example breaks the entire process down step by step.

# --- 1. Import Necessary Libraries ---
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

 

# --- 2. Generate Non-Linear Sample Data ---
# We'll create data that follows a sine wave pattern, which is clearly non-linear.
np.random.seed(42)
X = np.sort(10 * np.random.rand(100, 1), axis=0)
y = np.sin(X).ravel() + np.random.randn(100) * 0.2

 

# Split the data for training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

 

# --- 3. Create and Train the Decision Tree Regressor Model ---
# The most important hyperparameter is `max_depth`, which controls the complexity of the tree.
# A deeper tree can capture more complex patterns but is also more prone to overfitting.
model = DecisionTreeRegressor(max_depth=5, random_state=42)
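
# (Illustrative addition, not part of the original example.)
# The other stopping criteria from step 3 also map to constructor arguments;
# the values below are arbitrary examples, not tuned recommendations.
alternative_model = DecisionTreeRegressor(
    max_depth=5,                # stop once the tree is 5 levels deep
    min_samples_split=10,       # a node needs at least 10 samples to be considered for a split
    min_samples_leaf=5,         # every leaf must keep at least 5 samples
    min_impurity_decrease=0.0,  # a split must reduce the weighted error by at least this amount
    random_state=42,
)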

 

# Train the model using the training data.
# The .fit() method builds the tree by finding the optimal splits.
model.fit(X_train, y_train)
print("--- Model Training Complete ---")

 

# --- 4. Make Predictions and Evaluate the Model ---
# Use the trained model to make predictions on the test data.
y_pred = model.predict(X_test)

# Evaluate the model using R-squared.
r2 = r2_score(y_test, y_pred)
print(f"Model R-squared (R²) score: {r2:.4f}")
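
# (Illustrative addition, not part of the original example.)
# Other common regression metrics give a complementary view of the fit.
from sklearn.metrics import mean_absolute_error, mean_squared_error
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Absolute Error (MAE): {mae:.4f}")
print(f"Mean Squared Error (MSE): {mse:.4f}")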

 

# --- 5. Visualize the Results ---
# To plot the characteristic step-function of the decision tree,
# we'll make predictions on a sorted range of X values.
X_plot = np.sort(X, axis=0)
y_plot_pred = model.predict(X_plot)

 

plt.figure(figsize=(10, 6))
# Plot the original data points
plt.scatter(X, y, color='darkorange', s=20, label="Actual Data")
# Plot the Decision Tree regression line
plt.plot(X_plot, y_plot_pred, color="yellowgreen", linewidth=3, label="Decision Tree Fit (max_depth=5)")

 

# Add titles and labels
plt.title('Decision Tree Regression Fit', fontsize=16)
plt.xlabel('Feature (X)', fontsize=12)
plt.ylabel('Target (y)', fontsize=12)
plt.legend()
plt.grid(True)
plt.show()

 

# --- Predict a new value ---
new_value = [[5.0]]  # Predict the output for an input of 5.0
predicted_value = model.predict(new_value)
print(f"\nPredicted value for X={new_value[0][0]}: {predicted_value[0]:.4f}")
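
# --- (Optional) Inspect the Learned Tree ---
# Illustrative addition, not part of the original example.
# export_text prints the tree's if/then split rules; get_depth() and
# get_n_leaves() report how large the fitted tree actually grew.
from sklearn.tree import export_text
print(export_text(model, feature_names=["X"]))
print(f"Tree depth: {model.get_depth()}, leaves: {model.get_n_leaves()}")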

 
