Regression algorithms are a family of supervised machine learning methods used to predict a continuous numerical value, such as the price of a house, tomorrow's temperature, or a company's stock price. The goal is to find a mathematical function that best maps the input features to the continuous output variable.
Python's Scikit-learn (sklearn) library provides easy-to-use implementations of all of these models.
1. Linear Regression
This is the simplest and most common regression algorithm. It assumes a linear relationship between the input features (X) and the output variable (y). The model's goal is to find the best-fitting straight line (or hyperplane in higher dimensions) that describes the data.
- Use Case: Predicting a value when you believe the relationship between variables is straightforward and linear (e.g., predicting a student's exam score based on the number of hours they studied).
- Code Example:
import numpy as np
from sklearn.linear_model import LinearRegression
# Features (e.g., size of house in sq. ft.)
X = np.array([[1400], [1600], [1700], [1875], [2100]])
# Labels (e.g., price in thousands of INR)
y = np.array([2450, 3120, 2790, 3080, 4000])
# Create and train the model
model = LinearRegression()
model.fit(X, y)
# Predict the price of a new 2000 sq. ft. house
new_house_size = [[2000]]
predicted_price = model.predict(new_house_size)
print(f"Predicted price for a {new_house_size[0][0]} sq. ft. house: ₹{predicted_price[0]:,.2f}k")
2. Ridge Regression
Ridge Regression is a variation of Linear Regression that adds L2 regularization. Regularization is a technique for preventing overfitting (when a model learns the training data too well and performs poorly on new data). Ridge does this by adding a penalty term to the cost function, proportional to the sum of the squared coefficients, which discourages the model from having overly large coefficients.
- Use Case: Useful when you have a large number of features, especially when some of them are correlated (multicollinearity).
- Code Example:
import numpy as np
from sklearn.linear_model import Ridge
# (Using the same data as Linear Regression)
X = np.array([[1400], [1600], [1700], [1875], [2100]])
y = np.array([2450, 3120, 2790, 3080, 4000])
# Create and train the model
# The 'alpha' parameter controls the strength of the regularization
model = Ridge(alpha=1.0)
model.fit(X, y)
# Predict the price
predicted_price = model.predict([[2000]])
print(f"Predicted price (Ridge): ₹{predicted_price[0]:,.2f}k")
3. Lasso Regression
Lasso (Least Absolute Shrinkage and Selection Operator) is another variation of Linear Regression that uses L1 regularization, penalizing the sum of the absolute values of the coefficients. A key feature of Lasso is that it can shrink the coefficients of less important features to exactly zero, effectively performing automatic feature selection.
- Use Case: Excellent when you suspect that many of your input features are irrelevant or redundant.
- Code Example:
import numpy as np
from sklearn.linear_model import Lasso
# (Using the same data as Linear Regression)
X = np.array([[1400], [1600], [1700], [1875], [2100]])
y = np.array([2450, 3120, 2790, 3080, 4000])
# Create and train the model
model = Lasso(alpha=1.0)
model.fit(X, y)
# Predict the price
predicted_price = model.predict([[2000]])
print(f"Predicted price (Lasso): ₹{predicted_price[0]:,.2f}k")
4. Decision Tree Regressor
A Decision Tree builds a model in the form of a tree structure. It splits the dataset into smaller and smaller subsets through a series of if-then-else decisions on the input features. For regression, the prediction is the average of the training target values in the terminal "leaf" node a sample falls into.
- Use Case: Good for capturing non-linear relationships in the data. It's easy to interpret and visualize.
- Code Example:
import numpy as np
from sklearn.tree import DecisionTreeRegressor
# (Using the same data as Linear Regression)
X = np.array([[1400], [1600], [1700], [1875], [2100]])
y = np.array([2450, 3120, 2790, 3080, 4000])
# Create and train the model
model = DecisionTreeRegressor(random_state=0)
model.fit(X, y)
# Predict the price
predicted_price = model.predict([[2000]])
print(f"Predicted price (Decision Tree): ₹{predicted_price[0]:,.2f}k")
5. Random Forest Regressor
A Random Forest is an ensemble method that builds many Decision Trees on random subsets of the data and averages their predictions to get a more accurate and stable result. It is one of the most popular and powerful machine learning algorithms because it corrects for a single Decision Tree's habit of overfitting.
- Use Case: Excellent for complex regression problems where high accuracy is needed and you want to avoid overfitting.
- Code Example:
import numpy as np
from sklearn.ensemble import RandomForestRegressor
# (Using the same data as Linear Regression)
X = np.array([[1400], [1600], [1700], [1875], [2100]])
y = np.array([2450, 3120, 2790, 3080, 4000])
# Create and train the model
# n_estimators is the number of trees in the forest
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
# Predict the price
predicted_price = model.predict([[2000]])
print(f"Predicted price (Random Forest): ₹{predicted_price[0]:,.2f}k")
6. Support Vector Regressor (SVR)
Support Vector Machines can also be used for regression. The goal of SVR is to find a function that deviates from the target values by a value no greater than a specified margin (epsilon), while being as "flat" as possible. It's effective in high-dimensional spaces and when the number of features is greater than the number of samples.
- Use Case: Good for high-dimensional data and problems where you are not expecting a simple linear fit.
- Code Example:
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
# (Using the same data as Linear Regression)
# SVR is sensitive to feature scaling, so we scale the data first
X = np.array([[1400], [1600], [1700], [1875], [2100]])
y = np.array([2450, 3120, 2790, 3080, 4000])
scaler_X = StandardScaler()
scaler_y = StandardScaler()
X_scaled = scaler_X.fit_transform(X)
y_scaled = scaler_y.fit_transform(y.reshape(-1, 1)).ravel()
# Create and train the model
model = SVR(kernel='linear')
model.fit(X_scaled, y_scaled)
# Predict the price (must scale the input and inverse_transform the output)
new_house_scaled = scaler_X.transform([[2000]])
predicted_price_scaled = model.predict(new_house_scaled)
predicted_price = scaler_y.inverse_transform(predicted_price_scaled.reshape(-1, 1))
print(f"Predicted price (SVR): ₹{predicted_price[0][0]:,.2f}k")