Unit 2: Guide to Machine Learning Algorithms

Supervised Learning Algorithms

These algorithms learn from labeled data to make predictions.


Regression Algorithms (Predicting Continuous Values)

  • Linear Regression: Fits a straight line to the data.

  • Ridge, Lasso, and ElasticNet: Variations of Linear Regression with regularization to prevent overfitting.

  • Polynomial Regression: Fits a curved line to the data.

  • Decision Tree Regressor: A tree-based model for regression tasks.

  • Random Forest Regressor: An ensemble of many decision trees for more robust predictions.

  • Gradient Boosting Regressor: Builds trees one after another, where each new tree corrects the errors of the previous one.

  • Support Vector Regressor (SVR): A version of Support Vector Machines for regression.

  • K-Nearest Neighbors (KNN) Regressor: Predicts by averaging the values of the 'k' closest data points.
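For a concrete feel, here is a minimal scikit-learn sketch comparing three of the regressors above on the same data (the synthetic dataset and hyperparameters are illustrative assumptions, not part of this guide):

```python
# Hypothetical comparison of a few regressors on synthetic data.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LinearRegression(),
              Ridge(alpha=1.0),                        # L2-regularized linear model
              RandomForestRegressor(random_state=0)):  # ensemble of decision trees
    model.fit(X_train, y_train)                        # learn from labeled data
    r2 = r2_score(y_test, model.predict(X_test))       # evaluate on held-out data
    print(f"{type(model).__name__}: R^2 = {r2:.3f}")
```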


Classification Algorithms (Predicting Categories)

  • Logistic Regression: Predicts the probability of a binary outcome (e.g., Yes/No).

  • Decision Tree Classifier: A tree-based model that makes decisions based on a series of "if-then-else" questions.

  • Random Forest Classifier: An ensemble of decision trees for classification.

  • Gradient Boosting Classifier: An ensemble boosting method for classification.

  • Support Vector Machine (SVM / SVC): Finds the best boundary (hyperplane) to separate different classes.

  • K-Nearest Neighbors (KNN) Classifier: Classifies a data point based on the majority class of its 'k' nearest neighbors.

  • Naive Bayes: A probabilistic classifier based on Bayes' theorem with a strong assumption of independence between features.

  • Stochastic Gradient Descent (SGD) Classifier: An efficient approach to fitting linear models, useful for very large datasets.
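A similar sketch for classification, again with an assumed synthetic dataset:

```python
# Illustrative sketch: two classifiers on a synthetic binary problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("LogisticRegression accuracy:", logreg.score(X_test, y_test))
print("probabilities for one sample:", logreg.predict_proba(X_test[:1]))

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)  # majority vote of 5 neighbors
print("KNN accuracy:", knn.score(X_test, y_test))
```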


Unsupervised Learning Algorithms

These algorithms find hidden patterns or structures in unlabeled data.


Clustering Algorithms (Grouping Data)

  • K-Means: Partitions data into 'k' distinct, non-overlapping clusters based on distance to the cluster's center.

  • DBSCAN: Groups together points that are closely packed, marking as outliers points that lie alone in low-density regions.

  • Hierarchical Clustering (Agglomerative): Builds a hierarchy of clusters, either from the bottom up or top down.

  • Mean Shift: An algorithm that tries to find dense areas of data points.

  • Spectral Clustering: Uses the eigenvectors of a similarity matrix to embed the data in a low-dimensional space before clustering.
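A minimal sketch of the first two algorithms above; the blob data and the `eps` value are arbitrary illustrative choices:

```python
# Sketch: K-Means and DBSCAN on unlabeled synthetic blobs.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # true labels discarded

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("K-Means centers:\n", kmeans.cluster_centers_)

db = DBSCAN(eps=0.8, min_samples=5).fit(X)
print("DBSCAN labels found (-1 marks outliers):", sorted(set(db.labels_)))
```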


Dimensionality Reduction Algorithms (Reducing Features)

  • Principal Component Analysis (PCA): Transforms data into a new coordinate system to reduce the number of features while preserving the most variance.

  • Truncated SVD (Singular Value Decomposition): A technique similar to PCA that works well with sparse data.

  • t-SNE (t-Distributed Stochastic Neighbor Embedding): An algorithm particularly well-suited for visualizing high-dimensional datasets in 2D or 3D.
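A short sketch reducing scikit-learn's built-in digits data from 64 dimensions to 2; the choice of dataset is only for illustration:

```python
# Sketch: compressing 64-dimensional digit images to 2D.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)  # 1797 samples, 64 features each

pca = PCA(n_components=2).fit(X)
X_pca = pca.transform(X)             # linear projection to 2D
print("variance preserved:", pca.explained_variance_ratio_.sum())

X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)  # non-linear embedding
print("t-SNE output shape:", X_tsne.shape)  # (1797, 2), ready to plot
```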


Deep Learning (Primarily via TensorFlow/PyTorch)

While Scikit-learn focuses on traditional ML, deep learning models are typically built with more specialized libraries. The core algorithm is the Artificial Neural Network (ANN), with several key architectures (a minimal training sketch follows the list below):

  • Convolutional Neural Networks (CNNs): The standard for image and video processing.

  • Recurrent Neural Networks (RNNs): Designed for sequential data like text and time series. Includes variants like LSTM and GRU.

  • Transformers: The state-of-the-art architecture for most NLP tasks, forming the basis of models like GPT.
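A minimal PyTorch sketch of a plain feed-forward ANN; the layer sizes, optimizer settings, and random data are placeholder assumptions:

```python
# Minimal feed-forward ANN in PyTorch; layer sizes and data are dummy values.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),  # 20 input features -> 64 hidden units
    nn.ReLU(),
    nn.Linear(64, 2),   # 2 output classes
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(100, 20)         # dummy batch of 100 samples
y = torch.randint(0, 2, (100,))  # dummy class labels

for _ in range(10):              # a few gradient-descent steps
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
print("final training loss:", loss.item())
```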

 

Supervised Learning Algorithms in Detail

The families above are revisited here in more detail, grouped by model type.


Linear Models

This is a broad family of models that make predictions based on a linear function of the input features.

  • Ordinary Least Squares (Linear Regression): The classic regression model that fits a straight line to minimize the sum of squared errors.

  • Ridge Regression: A version of Linear Regression with L2 regularization, which penalizes large coefficients to prevent overfitting, especially when features are correlated.

  • Lasso Regression: Uses L1 regularization, which can shrink some coefficients to exactly zero, effectively performing automatic feature selection (demonstrated in the sketch after this list).

  • ElasticNet: A combination of Ridge and Lasso regularization, offering a balance between the two.

  • Logistic Regression: A linear model used for classification that predicts the probability of a binary outcome.

  • Stochastic Gradient Descent (SGD) Classifier/Regressor: An efficient approach to fitting linear models. Instead of using the whole dataset for each step, it updates the model using one sample at a time. It's excellent for very large datasets.

  • Perceptron: A simple algorithm for binary classification, and a historical precursor to modern neural networks.

  • Passive Aggressive Classifier/Regressor: An online learning algorithm that remains passive when a prediction is correct but updates aggressively when a misclassification (or large error) occurs.
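The sketch below illustrates the Lasso behavior noted above; the dataset and `alpha` values are illustrative assumptions:

```python
# Sketch: Lasso's L1 penalty zeroes out uninformative coefficients;
# Ridge's L2 penalty only shrinks them.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Only 3 of the 20 features actually drive the target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("non-zero Lasso coefficients:", int(np.sum(lasso.coef_ != 0)))  # close to 3
print("non-zero Ridge coefficients:", int(np.sum(ridge.coef_ != 0)))  # all 20
```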


Discriminant Analysis

  • Linear Discriminant Analysis (LDA): A classifier that finds a linear combination of features that best separates two or more classes. It can also be used for dimensionality reduction.

  • Quadratic Discriminant Analysis (QDA): A variant of LDA that allows for curved (quadratic) decision boundaries.
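A small sketch on scikit-learn's iris data showing both uses of LDA (classifier and dimensionality reducer), with QDA for comparison:

```python
# Sketch: LDA as a classifier and as supervised dimensionality reduction.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
print("LDA accuracy:", lda.score(X, y))
print("reduced shape:", lda.transform(X).shape)  # (150, 2)

qda = QuadraticDiscriminantAnalysis().fit(X, y)  # allows curved boundaries
print("QDA accuracy:", qda.score(X, y))
```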


Support Vector Machines (SVM)

  • SVC (Support Vector Classifier): Finds the optimal hyperplane that separates classes with the maximum margin.

  • SVR (Support Vector Regressor): The regression counterpart to SVC.

  • Nu-SVC / Nu-SVR: A variation that allows for more control over the number of support vectors.

  • LinearSVC / LinearSVR: A faster implementation of SVM for the case of a linear kernel, often used for text classification.
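A brief sketch contrasting `SVC` with an RBF kernel against `LinearSVC`; the data and hyperparameters are illustrative:

```python
# Sketch: kernel SVC vs. the faster linear-only LinearSVC.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rbf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)          # non-linear boundary
lin = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_train)  # linear kernel only
print("SVC (RBF) accuracy:", rbf.score(X_test, y_test))
print("LinearSVC accuracy:", lin.score(X_test, y_test))
print("support vectors used:", rbf.support_vectors_.shape[0])
```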


Bayesian Methods

  • Gaussian Naive Bayes: A variant of Naive Bayes for continuous features that assumes a normal (Gaussian) distribution.

  • Multinomial Naive Bayes: Commonly used for text classification where features are word counts.

  • Bernoulli Naive Bayes: Used for binary features (e.g., a word is present or not).

  • Bayesian Ridge Regression: A probabilistic approach to regression that includes a prior on the model's weights.
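A toy text-classification sketch with Multinomial Naive Bayes; the four-document corpus and its labels are invented for illustration:

```python
# Sketch: Multinomial Naive Bayes over word counts on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["free prize money now", "meeting at noon",
        "win free cash today", "project meeting agenda"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (toy labels)

vec = CountVectorizer()
X = vec.fit_transform(docs)  # features are word counts
nb = MultinomialNB().fit(X, labels)
print(nb.predict(vec.transform(["win a free prize"])))  # expected: [1]
```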


Gaussian Processes

  • Gaussian Process Regressor/Classifier: A non-parametric, probabilistic model that provides not just a prediction but also a measure of uncertainty in that prediction.
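A minimal sketch of that uncertainty estimate; the sine-wave data is an arbitrary example:

```python
# Sketch: a GP returns both a prediction and its uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0, 5, 20).reshape(-1, 1)
y = np.sin(X).ravel()

gpr = GaussianProcessRegressor(kernel=RBF(), random_state=0).fit(X, y)
mean, std = gpr.predict(np.array([[2.5], [10.0]]), return_std=True)
print("predictions:", mean)
print("std. deviations:", std)  # much larger at x=10, far from the training data
```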


Unsupervised Learning Algorithms in Detail


Clustering Algorithms

  • K-Means: The most common clustering algorithm; it iteratively assigns points to the nearest of 'k' centroids and recomputes those centroids until they stabilize.

  • DBSCAN: A density-based algorithm that is excellent at finding arbitrarily shaped clusters and identifying noise points (outliers); see the sketch after this list.

  • Agglomerative Hierarchical Clustering: A bottom-up approach that starts with each data point as its own cluster and merges them based on similarity.

  • Mean Shift: A clustering algorithm that tries to find the densest areas in the data.

  • Spectral Clustering: A technique that uses the graph of the data to find clusters, effective for non-flat geometries.

  • Affinity Propagation: A clustering method based on "message passing" between data points.

  • BIRCH: Incrementally builds a compact tree summary (a CF tree) of the data, making it well suited to very large datasets.
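The sketch below illustrates the DBSCAN point above on scikit-learn's two-moons data; the `eps` value is hand-tuned for illustration:

```python
# Sketch: DBSCAN recovers non-convex "moon" clusters that K-Means splits badly.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db = DBSCAN(eps=0.3).fit_predict(X)
print("K-Means agreement with truth:", round(adjusted_rand_score(y, km), 2))  # poor
print("DBSCAN agreement with truth:", round(adjusted_rand_score(y, db), 2))   # ~1.0
```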


Dimensionality Reduction & Manifold Learning

Manifold learning is a sub-field focused on finding a low-dimensional representation of high-dimensional data that preserves its underlying non-linear structure.

  • Principal Component Analysis (PCA): The most common linear dimensionality reduction technique.

  • Kernel PCA: An extension of PCA that can find non-linear relationships (see the sketch after this list).

  • Isomap: A manifold learning technique that preserves the geodesic distances between points.

  • Locally Linear Embedding (LLE): A manifold learning technique that reconstructs each data point from its nearest neighbors.

  • t-SNE (t-Distributed Stochastic Neighbor Embedding): An algorithm primarily used for visualizing high-dimensional data in 2D or 3D.

  • Factor Analysis: A linear statistical model used to describe variability among observed variables in terms of a lower number of unobserved variables called factors.
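A sketch of the Kernel PCA point above on concentric-circle data; `gamma=10` is an illustrative choice:

```python
# Sketch: Kernel PCA separating concentric circles that linear PCA cannot.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

X_lin = PCA(n_components=2).fit_transform(X)  # still two concentric rings
X_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)
# After the RBF mapping, the inner and outer rings separate along the
# first kernel component; a linear classifier on X_rbf would now work.
print("linear PCA shape:", X_lin.shape, "| kernel PCA shape:", X_rbf.shape)
```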


Ensemble Methods

Ensemble methods combine the predictions of several base models to improve robustness and accuracy.


Bagging Methods

These methods build multiple models on random subsets of the data and average their predictions to reduce variance.

  • Random Forest: An ensemble of many Decision Trees.

  • ExtraTrees (Extremely Randomized Trees): A variation of Random Forest where the split points in the trees are chosen more randomly, which can further reduce variance.

  • Bagging Classifier/Regressor: A meta-estimator that can be used with any base model (not just decision trees).
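A minimal sketch of that meta-estimator idea, wrapping KNN (an assumed choice of base model) in bagging:

```python
# Sketch: BaggingClassifier wrapping an arbitrary base model (KNN here).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)

bag = BaggingClassifier(
    estimator=KNeighborsClassifier(),  # named "base_estimator" in scikit-learn < 1.2
    n_estimators=20,                   # 20 KNN models ...
    max_samples=0.8,                   # ... each trained on a random 80% subset
    random_state=0,
).fit(X, y)
print("bagged KNN accuracy:", bag.score(X, y))
```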


Boosting Methods

These methods build models sequentially, where each new model attempts to correct the errors of the previous one. They are typically very high-performing.

  • AdaBoost (Adaptive Boosting): One of the first successful boosting algorithms.

  • Gradient Boosting: A powerful algorithm that builds trees sequentially to minimize a loss function.

  • XGBoost, LightGBM, CatBoost: Highly optimized, third-party implementations of gradient boosting that are often the winning algorithms in machine learning competitions. They are not part of Scikit-learn but are standard tools in the Python ecosystem.
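A short gradient-boosting sketch; the hyperparameters shown are common illustrative values, not recommendations:

```python
# Sketch: scikit-learn gradient boosting; XGBoost/LightGBM/CatBoost expose
# a similar fit/predict interface.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gb = GradientBoostingClassifier(
    n_estimators=200,   # trees are added one after another
    learning_rate=0.1,  # shrinks each new tree's contribution
    max_depth=3,
    random_state=0,
).fit(X_train, y_train)
print("test accuracy:", gb.score(X_test, y_test))
```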


Stacking and Voting

  • Voting Classifier/Regressor: Combines predictions from multiple different models. "Hard voting" uses a majority vote for classification, while "soft voting" averages the predicted probabilities.

  • Stacking: A method where the predictions from several base models are used as input features to train a final "meta-model."
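A compact sketch of both ideas with an assumed trio of base models:

```python
# Sketch: soft voting and stacking over three different base models.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, random_state=0)
base = [("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("nb", GaussianNB())]

vote = VotingClassifier(estimators=base, voting="soft").fit(X, y)  # average probabilities
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression()).fit(X, y)  # meta-model
print("voting:", vote.score(X, y), "| stacking:", stack.score(X, y))
```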

 
