A Support Vector Machine (SVM) is a powerful and versatile supervised machine learning algorithm that can handle both classification and regression, though it is most commonly applied to classification. The objective of the SVM algorithm is to find the optimal hyperplane that best separates the data points of different classes in the feature space.
The Core Concepts: Hyperplane, Margin, and Support Vectors
- Hyperplane: This is the decision boundary that separates the classes. In a 2D space with two features, the hyperplane is a simple line. In a 3D space, it's a plane, and in higher dimensions, it's a hyperplane.
- Margin: The SVM algorithm doesn't just find any line that separates the classes; it finds the one with the maximum margin. The margin is the distance between the hyperplane and the nearest data points from each class. A larger margin is generally better because the model tends to generalize better to unseen data: new points have to drift further before they cross the decision boundary.
- Support Vectors: The data points that lie closest to the hyperplane, on the edge of the margin, are called support vectors. These are the critical points that "support" (define) the position and orientation of the hyperplane: moving or removing one of them changes the solution, while points well outside the margin can be removed with no effect at all. The learned model depends only on these points, which is formalized in the short sketch after this list.
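For the linearly separable (hard-margin) case, this can be written compactly; the notation below is the standard textbook formulation, not anything specific to this article. With weight vector $w$ and bias $b$, the hyperplane is the set of points where

$$w^\top x + b = 0,$$

the margin width is $2 / \lVert w \rVert$, and training amounts to solving

$$\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 \;\; \text{for all } i,$$

so maximizing the margin is the same as minimizing $\lVert w \rVert$. The support vectors are exactly the points for which the constraint holds with equality.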
Handling Non-Linear Data: The Kernel Trick
What if the data can't be separated by a straight line? This is where SVMs become truly powerful. They use a technique called the kernel trick.
A kernel function computes the similarity between two points as if they had been mapped into a higher-dimensional space, without ever performing that mapping explicitly (this is the "trick"). In that higher-dimensional space the classes can be separated by a linear hyperplane, and when this hyperplane is projected back down to the original space, it becomes a complex, non-linear decision boundary.
- Radial Basis Function (RBF) Kernel: This is one of the most popular and powerful kernels. It can create complex, curved decision boundaries, making it suitable for a wide range of problems; a small numerical sketch follows.
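Concretely, the RBF kernel scores the similarity of two points as K(x, x') = exp(-gamma * ||x - x'||^2). The snippet below is a minimal illustration (the sample points and the gamma value are arbitrary choices), checking a hand-rolled computation against scikit-learn's rbf_kernel:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two arbitrary sample points and an arbitrary gamma (illustrative values).
x1 = np.array([[0.0, 1.0]])
x2 = np.array([[1.0, 2.0]])
gamma = 0.5

# RBF kernel by hand: exp(-gamma * squared Euclidean distance).
manual = np.exp(-gamma * np.sum((x1 - x2) ** 2))

# The same quantity via scikit-learn.
library = rbf_kernel(x1, x2, gamma=gamma)[0, 0]

print(manual, library)  # both print ~0.3679 (= exp(-1)) for these points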
Key Hyperparameters
- C (Regularization Parameter): This parameter controls the trade-off between achieving a smooth decision boundary and correctly classifying all training points.
  - A low C value makes the decision boundary smooth, allowing for some misclassifications (a "soft margin").
  - A high C value tries to classify every training point correctly, which can lead to a more complex boundary and potential overfitting.
- gamma: This parameter is used by the RBF kernel (and the other non-linear kernels). It defines how far the influence of a single training example reaches. A high gamma means each example has a short reach, producing a tighter, more complex decision boundary; a low gamma means a long reach, producing a smoother one. The sketch below shows both hyperparameters in action.
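One quick way to see these two knobs interact is to train several models and count their support vectors, since more support vectors usually signal a softer margin or a more contorted boundary. The following is an illustrative stand-alone experiment (the grid values are arbitrary), using the same moons dataset as the walkthrough below:

from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)
X = StandardScaler().fit_transform(X)

# Train one model per (C, gamma) pair and report its support vector count.
for C in (0.1, 1.0, 100.0):
    for gamma in (0.1, 1.0, 10.0):
        svc = SVC(kernel='rbf', C=C, gamma=gamma).fit(X, y)
        print(f"C={C:>5}, gamma={gamma:>4} -> {len(svc.support_vectors_)} support vectors")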
Importance of Feature Scaling
SVM is a distance-based algorithm, so its performance is highly sensitive to the scale of the input features. If one feature has a much larger range of values than another, it will dominate the distance calculations. Therefore, it is crucial to scale your data (e.g., using StandardScaler) before training an SVM model.
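To see this sensitivity in isolation (separately from the walkthrough that starts below), here is a small illustrative experiment: one feature of the moons data is inflated by a factor of 1000, and the unscaled model typically scores noticeably worse than the scaled one. The inflation factor and dataset are arbitrary choices for demonstration:

import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)
X[:, 0] *= 1000  # feature 0 now dominates every distance calculation

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Without scaling, the RBF kernel effectively sees only feature 0.
unscaled_acc = SVC(kernel='rbf').fit(X_tr, y_tr).score(X_te, y_te)

# With scaling, both features contribute equally again.
scaler = StandardScaler().fit(X_tr)
scaled_acc = SVC(kernel='rbf').fit(scaler.transform(X_tr), y_tr).score(
    scaler.transform(X_te), y_te)

print(f"Unscaled accuracy: {unscaled_acc:.3f}, scaled accuracy: {scaled_acc:.3f}")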
# --- 1. Import Necessary Libraries ---
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# --- 2. Prepare Sample Data ---
# Create a non-linearly separable dataset to demonstrate the power of the RBF kernel.
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)
# Split the data for training and testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# --- 3. Feature Scaling ---
# This step is crucial for SVMs to perform correctly.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# --- 4. Create and Train the SVM Model ---
# Initialize the classifier.
# `kernel='rbf'`: Use the Radial Basis Function kernel for non-linear data.
# `C=1.0`: The standard default for the regularization parameter.
# `gamma='scale'`: The scikit-learn default for the kernel coefficient; it
#   adapts to the number of features and the variance of the data.
# Note: `random_state` is omitted because SVC training is deterministic for
#   these settings (it only matters when `probability=True`).
model = SVC(kernel='rbf', C=1.0, gamma='scale')
# Train the model on the scaled training data.
model.fit(X_train_scaled, y_train)
print("--- Model Training Complete ---")
# --- 5. Make Predictions and Evaluate the Model ---
# Use the trained model to make predictions on the unseen test data.
y_pred = model.predict(X_test_scaled)
# Calculate performance metrics.
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred, target_names=['Class 0', 'Class 1'])
print(f"\nModel Accuracy: {accuracy * 100:.2f}%")
print("\n--- Confusion Matrix ---")
print(conf_matrix)
print("\n--- Classification Report ---")
print(class_report)
# --- 6. Visualize the Decision Boundary ---
# This helps us see the complex, non-linear boundary the SVM has learned.
plt.figure(figsize=(10, 6))
# Create a meshgrid to plot the decision boundary
x_min, x_max = X_train_scaled[:, 0].min() - 1, X_train_scaled[:, 0].max() + 1
y_min, y_max = X_train_scaled[:, 1].min() - 1, X_train_scaled[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
# Make predictions on the meshgrid
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# Plot the decision boundary
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
# Plot the training data points
plt.scatter(X_train_scaled[:, 0], X_train_scaled[:, 1], c=y_train, edgecolors='k', cmap=plt.cm.coolwarm)
plt.title("SVM (RBF Kernel) Decision Boundary")
plt.xlabel("Feature 1 (Scaled)")
plt.ylabel("Feature 2 (Scaled)")
plt.show()
# --- 7. Predict on New, Unseen Data ---
# Create new data points to predict.
new_data = np.array([[0.5, 0.5], [-1.0, -1.0]])
# Scale the new data using the same scaler.
new_data_scaled = scaler.transform(new_data)
# Make predictions.
new_predictions = model.predict(new_data_scaled)
print("\n--- Predictions for New Data Points ---")
for i, point in enumerate(new_data):
    print(f"Point {point} ==> Predicted Class: {new_predictions[i]}")