Machine Learning: Major Algorithms (Part 1)
Machine learning is programming computers to optimize a performance criterion using example data or past experience.
— Introduction to Machine Learning, Ethem Alpaydın
Brief: This article summarises the basics of machine learning, mentions a few of its industrial uses, and briefly introduces algorithms such as regression, decision trees, and random forests.
Machine learning is based on the ability of a machine to learn from a given input data set. Machine learning algorithms can perform both predictive and descriptive analytics on large or skewed data sets.
Applications span many industries: in finance — fraud prediction, credit scoring, predicting low/high-risk loans, customer segmentation; in retail — suitable product recommendations, reward policies, supply chain optimization, demand forecasting; in marketing — personalized advertisements, customer segmentation, churn prediction.
Here is a list of some of the most commonly used classical machine learning algorithms:
- Linear regression
- Logistic regression
- Decision trees
- Random forests
- Gradient boosting
- Support vector machines (SVM)
- K-nearest neighbors (KNN)
- Naive Bayes
- Principal Component Analysis (PCA)
- K-means clustering
- Hierarchical clustering
- Association rule mining (Apriori algorithm)
- Singular Value Decomposition (SVD)
- Collaborative filtering
- Neural networks (including deep learning)
Regression: Linear regression assumes a linear relationship between input and output variables; the input variables are usually independent of one another. A simple analogy is the equation y = mx + c, where the output y changes based on the input x. Given some combinations of x and y, one can estimate the values of m and c, and then predict y for any new x. Linear regression is used for continuous target variables, while logistic regression is used for discrete targets (classification).
https://www.analyticsvidhya.com/blog/2021/10/everything-you-need-to-know-about-linear-regression/
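The y = mx + c idea above can be sketched directly in code. A minimal example, assuming a small noiseless toy data set generated from m = 2 and c = 1, that recovers the slope and intercept by ordinary least squares:

```python
import numpy as np

# Toy data generated from y = 2x + 1 (hypothetical values for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit m and c by ordinary least squares: find the slope and intercept
# that minimize the squared error between predictions and observed y.
A = np.column_stack([x, np.ones_like(x)])
(m, c), *_ = np.linalg.lstsq(A, y, rcond=None)

print(round(m, 2), round(c, 2))  # recovers m = 2.0, c = 1.0
```

With real, noisy data the fitted m and c will only approximate the true relationship; least squares gives the best linear fit in the squared-error sense.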
Decision trees: One of the most powerful tools for supervised learning, decision trees are used for both regression and classification. The model can be explained as a tree-like flow chart where each internal node tests an attribute, each branch denotes an outcome of that test, and each leaf node holds a class label. The idea is to reduce disorder (impurity) at each node and thereby maximize the information gain of each split.
https://www.analyticsvidhya.com/blog/2021/08/decision-tree-algorithm/
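A minimal sketch of the splitting idea using scikit-learn, with a made-up four-point data set in which the label depends only on the first feature. Using entropy as the impurity measure, the tree's first split is the one with the highest information gain:

```python
from sklearn.tree import DecisionTreeClassifier

# Tiny illustrative dataset: [feature_1, feature_2] -> class label
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]  # the label depends only on the first feature

# criterion="entropy" makes each split maximize information gain
# (the reduction in entropy from parent node to child nodes).
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

print(clf.predict([[1, 1], [0, 1]]))  # → [1 0]
```

Because the first feature perfectly separates the classes, the learned tree is a single split on that attribute; on real data the tree keeps splitting until the leaves are (nearly) pure.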
Random Forest: Random forest is based on the ensemble machine learning technique, where we combine the results of multiple decision trees to increase predictive accuracy and reduce overfitting. The main idea behind a random forest is to reduce the variance of the decision trees by introducing randomness into the training process. Each tree in the random forest is trained on a random subset of the training data, which reduces the correlation between the trees and makes the ensemble more robust to overfitting.
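The ensemble idea can be sketched with scikit-learn's RandomForestClassifier on synthetic data (the data-set parameters below are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data (sizes chosen only for illustration)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each of the 100 trees is fit on a bootstrap sample of the training set,
# and each split considers only a random subset of the features —
# the two sources of randomness that decorrelate the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)

print(forest.score(X_te, y_te))  # held-out accuracy of the ensemble
```

The final prediction is a majority vote over the trees, so individual trees can overfit their bootstrap samples while the averaged ensemble stays stable.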