Support Vector Regression (SVR)

Support Vector Regression (SVR) is a powerful and versatile regression algorithm that extends the well-known Support Vector Machine (SVM) classifier to regression problems. Unlike traditional regression models, which try to minimize the error between predicted and actual values for every data point, SVR works on a different principle.

The main idea of SVR is to find a function that fits the data well while deliberately tolerating a certain amount of error: deviations below a chosen threshold are ignored entirely. It fits a line (or curve) that keeps as many points as possible within this margin, often called the "epsilon-insensitive tube."

How it Works:

1.   The Epsilon Tube: SVR defines a margin of tolerance, epsilon (ε). It does not penalize errors for data points that fall within this tube. The goal is to fit a function that includes as many data points as possible inside this tube (a small sketch of this loss appears after this list).

2.   Support Vectors: The data points that lie on the edge of or outside this tube are called support vectors. These are the critical points that "support" or define the position and orientation of the regression curve.

3.   The Kernel Trick for Non-Linearity: For non-linear data, SVR uses the "kernel trick." A kernel function, like the Radial Basis Function (RBF) kernel, transforms the data into a higher-dimensional space where a linear relationship can be found. When this linear relationship is projected back to the original space, it becomes a complex, non-linear curve that can fit intricate patterns in the data.
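To make points 1 and 3 concrete, here is a minimal NumPy sketch of the epsilon-insensitive loss and the RBF kernel. The function names and sample values are illustrative only; scikit-learn computes all of this internally when you fit an SVR model.

import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    # Errors smaller than epsilon cost nothing; larger errors are
    # penalized only by the amount they exceed epsilon.
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

def rbf_kernel(x1, x2, gamma=0.1):
    # Similarity between two points: exp(-gamma * squared distance).
    # Larger gamma makes a point's influence decay faster with distance.
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

# A residual of 0.05 falls inside the epsilon tube, so it costs nothing;
# a residual of 0.30 is penalized only for the 0.20 that sticks out.
print(epsilon_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.30])))  # [0.  0.2]
print(rbf_kernel(np.array([1.0]), np.array([2.0])))  # ~0.9048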


Code Example in Python

This example walks through a complete SVR workflow, including feature scaling, a preprocessing step this algorithm is especially sensitive to.


# --- 1. Import Necessary Libraries ---
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

 

# --- 2. Generate and Visualize Non-Linear Sample Data ---
# We'll create data that follows a sine wave, a classic non-linear pattern.
np.random.seed(42)
X = np.sort(10 * np.random.rand(100, 1), axis=0)
y = np.sin(X).ravel() + np.random.randn(100) * 0.2
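# Quick look at the raw data before modeling (an optional preview;
# the fitted curve is plotted in step 6 below).
plt.scatter(X, y, color='darkorange', s=20)
plt.title('Raw Non-Linear Sample Data')
plt.xlabel('Feature (X)')
plt.ylabel('Target (y)')
plt.show()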

 

# --- 3. Data Preprocessing: Feature Scaling ---
# SVR is not scale-invariant: its performance depends heavily on the scale of the features.
# It's standard practice to scale the data before training an SVR model.
# We will scale both X and y.
scaler_X = StandardScaler()
scaler_y = StandardScaler()

# Fit the scalers to the data and transform it.
X_scaled = scaler_X.fit_transform(X)
# We reshape y to be a 2D array for the scaler, then ravel it back to 1D.
y_scaled = scaler_y.fit_transform(y.reshape(-1, 1)).ravel()

# Split the SCALED data for training and testing.
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_scaled, test_size=0.2, random_state=42)
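# Sanity check (optional): after StandardScaler, the features should have
# mean ~0 and standard deviation ~1.
print(f"Scaled X mean: {X_scaled.mean():.4f}, std: {X_scaled.std():.4f}")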

 

# --- 4. Create and Train the SVR Model ---
# We will use the RBF (Radial Basis Function) kernel to handle the non-linear data.
# Key Hyperparameters:
# - kernel='rbf': Specifies the kernel type. RBF is a good default for non-linear data.
# - C: The regularization parameter. It controls the trade-off between fitting the
#   training data closely and keeping the model simple enough to generalize.
#   Higher C means less regularization.
# - gamma: The kernel coefficient. It defines how much influence a single training
#   example has; larger gamma means a shorter reach.
# - epsilon: The half-width of the epsilon-insensitive tube. Errors smaller than
#   epsilon are not penalized. (0.1 is also scikit-learn's default.)
model = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)

# Train the model using the scaled training data.
model.fit(X_train, y_train)
print("--- Model Training Complete ---")

 

# --- 5. Make Predictions and Evaluate the Model ---
# Make predictions on the scaled test data.
y_pred_scaled = model.predict(X_test)

# To interpret the results, we need to inverse transform the predictions back to the original scale.
y_pred = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1))
y_test_original = scaler_y.inverse_transform(y_test.reshape(-1, 1))

# Evaluate the model using R-squared on the original scale.
r2 = r2_score(y_test_original, y_pred)
print(f"Model R-squared (R²) score: {r2:.4f}")

 

# --- 6. Visualize the Results ---
# To plot a smooth curve, we'll make predictions on the entire sorted range of X values.
X_plot_scaled = scaler_X.transform(X)
y_plot_pred_scaled = model.predict(X_plot_scaled)
y_plot_pred = scaler_y.inverse_transform(y_plot_pred_scaled.reshape(-1, 1))

plt.figure(figsize=(10, 6))
# Plot the original data points.
plt.scatter(X, y, color='darkorange', s=20, label="Actual Data")
# Plot the SVR regression curve.
plt.plot(X, y_plot_pred, color='navy', linewidth=3, label="SVR (RBF Kernel) Fit")

# Add titles and labels.
plt.title('Support Vector Regression (SVR) Fit', fontsize=16)
plt.xlabel('Feature (X)', fontsize=12)
plt.ylabel('Target (y)', fontsize=12)
plt.legend()
plt.grid(True)
plt.show()

 

# --- 7. Predict a New Value ---
new_value = [[5.0]]  # Predict the output for an input of 5.0
# First, scale the new value using the same scaler fitted on the training data.
new_value_scaled = scaler_X.transform(new_value)
# Then, predict using the trained model.
predicted_value_scaled = model.predict(new_value_scaled)
# Finally, inverse transform the prediction back to the original scale.
predicted_value = scaler_y.inverse_transform(predicted_value_scaled.reshape(-1, 1))
print(f"\nPredicted value for X={new_value[0][0]}: {predicted_value[0][0]:.4f}")
