A Neural Network, used here in the form of a Multi-layer Perceptron (MLP), is a machine learning model inspired by the structure of the human brain. It's designed to recognize complex patterns in data, making it extremely powerful for non-linear regression tasks where the relationship between inputs and outputs cannot be captured by a straight line or a simple curve.
How It Works: Layers of Neurons
As the code's comments explain, a neural network consists of interconnected nodes called neurons, organized into layers:
1. Input Layer: This layer receives the initial data. In your code, there is one input neuron because you have one feature (X).
2. Hidden Layers: These are the intermediate layers where the real "learning" happens. Your code defines two hidden layers: hidden_layer_sizes=(100, 50).
- The first hidden layer has 100 neurons.
- The second hidden layer has 50 neurons.
Each neuron in these layers takes inputs from the previous layer, computes a weighted sum of those inputs plus a bias, and then applies an activation function before passing the result to the next layer (a minimal sketch of this computation follows this list).
3. Output Layer: This layer produces the final prediction. For regression, it's typically a single neuron that outputs a continuous numerical value (the predicted y value).
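To make this concrete, here is a minimal NumPy sketch of the computation a single hidden neuron performs (the numbers are made up for illustration; scikit-learn handles all of this internally):

import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias, followed by the activation.
    z = np.dot(inputs, weights) + bias
    return max(0.0, z)  # ReLU activation, discussed in the next section

# A neuron with 3 inputs; its output would feed the next layer.
print(neuron(np.array([0.5, -1.2, 2.0]), np.array([0.4, 0.1, 0.3]), bias=0.2))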
The "Learning" Process: How it Captures Non-Linearity
The magic of a neural network comes from how these layers work together to learn.
- Weights and Biases: Each connection between neurons has a "weight" that determines how strongly one neuron's output influences the next, and each neuron has a "bias" that shifts its output. The network "learns" by finding the optimal values for these weights and biases.
- Activation Function (activation='relu'): This is the key to learning non-linear patterns. An activation function is a simple mathematical function applied by each neuron in the hidden layers. The ReLU (Rectified Linear Unit) function, used in your code, is a popular choice. It's very simple: if the input is positive, it passes it through unchanged; if it's negative, it outputs zero (in other words, ReLU(x) = max(0, x); see the short sketch after this list). By combining many of these simple non-linear functions across multiple layers, the network can approximate extremely complex, curved relationships.
- Backpropagation and Optimization (solver='adam'): The network learns through a feedback loop of prediction and correction.
1. It makes a prediction for a data point.
2. It compares its prediction to the actual value and calculates the error (loss).
3. It then works backward through the network (a process called backpropagation) to figure out how much each weight contributed to the error.
4. Finally, it uses an optimization algorithm, like Adam (solver='adam'), to slightly adjust all the weights in a direction that reduces the error. This loop is repeated many times (up to 1000 iterations here, set by max_iter=1000) until the model's predictions are as accurate as possible; a bare-bones version of it is sketched after this list.
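ReLU itself is tiny; here is a sketch of it in NumPy (for illustration only; MLPRegressor applies it internally):

import numpy as np

def relu(z):
    # Element-wise: negative values become 0, positive values pass through.
    return np.maximum(0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]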
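And here is the predict, measure error, adjust loop in its simplest form: plain gradient descent fitting a single weight. This is a toy sketch with made-up data and learning rate; Adam is a more sophisticated variant of this same update.

import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=100)
y = 3.0 * X + rng.normal(scale=0.5, size=100)  # underlying weight is 3.0

w = 0.0      # start from an uninformed weight
lr = 0.01    # learning rate: how large each adjustment is

for _ in range(1000):               # analogous to max_iter
    y_pred = w * X                  # 1. make predictions
    error = y_pred - y              # 2. compare to the actual values
    grad = 2 * np.mean(error * X)   # 3. how the error changes as w changes
    w -= lr * grad                  # 4. nudge w to reduce the error

print(f"Learned weight: {w:.3f}")   # converges close to 3.0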
Why Data Scaling is Crucial (StandardScaler)
As mentioned in the code's comments, neural networks are very sensitive to the scale of the input data. If one feature ranged from 0 to 1 and another from 0 to 100,000, the larger feature would dominate the learning process, and the model would struggle to learn from the smaller one.
The StandardScaler in your code prevents this by transforming all features to have a mean of 0 and a standard deviation of 1. This puts all features on a level playing field and helps the optimization algorithm (Adam) find the best weights much more efficiently.
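Conceptually, StandardScaler just applies z = (x - mean) / std to each feature, using the mean and standard deviation it computed when it was fit. A quick sketch with made-up numbers:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [100.0]])
scaler = StandardScaler().fit(X)           # learns the mean and std of the column
print(scaler.transform(X).ravel())         # centered at 0 with unit variance
print((X.ravel() - X.mean()) / X.std())    # the same result computed by hand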
# --- 1. Import Necessary Libraries ---
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score
# --- Detailed Explanation of Neural Network Regression ---
#
# **What is a Neural Network?**
# A Neural Network is a machine learning model inspired by the structure of the human brain.
# It consists of interconnected nodes, called "neurons," organized in layers:
# - An **Input Layer**: Receives the initial data (your features).
# - One or more **Hidden Layers**: These are the intermediate layers where most of the computation happens.
# The neurons in these layers apply a mathematical function (an "activation function") to the data they receive.
# - An **Output Layer**: Produces the final prediction. For regression, this is typically a single neuron
# that outputs a continuous numerical value.
#
# **How does it learn non-linear patterns?**
# The power of a neural network comes from its hidden layers and activation functions.
# Activation functions (like 'relu' - Rectified Linear Unit) introduce non-linearity into the model.
# By combining many of these simple non-linear functions across multiple layers, the network can
# learn to approximate extremely complex, curved relationships between the inputs and outputs.
# The network "learns" by adjusting the weights (connections) between neurons through a process
# called backpropagation and gradient descent, minimizing the error between its predictions and the actual values.
#
# **Why Scale the Data?**
# Neural Networks are very sensitive to the scale of the input data. If one feature has a much larger
# range of values than another, it can dominate the learning process. We use `StandardScaler` to
# ensure all features have a mean of 0 and a standard deviation of 1, which helps the model
# learn more effectively.
# --- 2. Generate and Prepare Sample Data ---
# We'll use the same sine wave data for a clear comparison.
np.random.seed(42)
X = np.sort(10 * np.random.rand(100, 1), axis=0)
y = np.sin(X).ravel() + np.random.randn(100) * 0.2
# --- 3. Data Preprocessing: Feature Scaling ---
# This step is crucial for Neural Networks.
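# Note: for simplicity, the scaler here is fit on all of X before splitting.
# In practice, you would fit it on the training split only and reuse it to
# transform the test split, to avoid leaking test-set statistics into
# preprocessing.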
scaler_X = StandardScaler()
X_scaled = scaler_X.fit_transform(X)
# Split the SCALED data for training and testing
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# --- 4. Create and Train the Neural Network Regressor Model ---
# We use Scikit-learn's MLPRegressor (Multi-layer Perceptron).
# Key Hyperparameters:
# - hidden_layer_sizes=(100, 50): Defines the network architecture. This means two hidden layers,
# the first with 100 neurons and the second with 50.
# - activation='relu': The activation function for the hidden layers. 'relu' is a common and effective choice.
# - solver='adam': The algorithm used to optimize the network's weights. 'adam' is a robust default.
# - max_iter=1000: The maximum number of training iterations.
model = MLPRegressor(hidden_layer_sizes=(100, 50), activation='relu', solver='adam', max_iter=1000, random_state=42)
# Train the model using the scaled training data.
model.fit(X_train, y_train)
print("--- Model Training Complete ---")
# --- 5. Make Predictions and Evaluate the Model ---
# Make predictions on the scaled test data.
y_pred = model.predict(X_test)
# Evaluate the model using R-squared.
r2 = r2_score(y_test, y_pred)
print(f"Model R-squared (R²) score: {r2:.4f}")
# --- 6. Visualize the Results ---
# To plot the smooth curve learned by the neural network,
# we'll make predictions on the entire sorted (and scaled) range of X values.
X_plot_scaled = scaler_X.transform(X)
y_plot_pred = model.predict(X_plot_scaled)
plt.figure(figsize=(10, 6))
# Plot the original data points
plt.scatter(X, y, color='darkorange', s=20, label="Actual Data")
# Plot the Neural Network regression curve
plt.plot(X, y_plot_pred, color="darkviolet", linewidth=3, label="Neural Network Fit")
# Add titles and labels
plt.title('Neural Network Regression Fit', fontsize=16)
plt.xlabel('Feature (X)', fontsize=12)
plt.ylabel('Target (y)', fontsize=12)
plt.legend()
plt.grid(True)
plt.show()
# --- Predict a new value ---
new_value = [[5.0]] # Predict the output for an input of 5.0
# First, scale the new value using the same scaler
new_value_scaled = scaler_X.transform(new_value)
# Then, predict using the trained model
predicted_value = model.predict(new_value_scaled)
print(f"\nPredicted value for X={new_value[0][0]}: {predicted_value[0]:.4f}")