Machine Learning: Major Algorithms (Part 1)

Jaishri Rai
May 11, 2023

Machine learning is programming computers to optimize a performance criterion using example data or past experience.

— Introduction to Machine Learning, Ethem Alpaydın

Brief: This article summarises the basics of machine learning, lists a few industrial applications, and briefly introduces algorithms such as regression, decision trees, and random forests.

Machine Learning is based on the ability of a machine to learn from a given input data set. Both predictive and descriptive analytics on large or skewed data sets can be performed with machine learning algorithms.

Applications are everywhere: in finance (fraud detection, credit scoring, predicting low/high-risk loans, customer segmentation); in retail (product recommendations, reward policies, supply chain optimization, demand forecasting); in marketing (personalized advertising, customer segmentation, churn prediction).

Here is a list of some of the most commonly used classical machine learning algorithms:

  1. Linear regression
  2. Logistic regression
  3. Decision trees
  4. Random forests
  5. Gradient boosting
  6. Support vector machines (SVM)
  7. K-nearest neighbors (KNN)
  8. Naive Bayes
  9. Principal Component Analysis (PCA)
  10. K-means clustering
  11. Hierarchical clustering
  12. Association rule mining (Apriori algorithm)
  13. Singular Value Decomposition (SVD)
  14. Collaborative filtering
  15. Neural networks (including deep learning)

Regression: Linear regression assumes a linear relationship between the input and output variables; the input variables are usually assumed to be independent of one another. A simple analogy is the equation y = mx + c, where y (the output) changes based on x (the input). Given some known combinations of x and y, one can estimate the values of m and c, and then predict y for a new x. Linear regression is used for continuous target variables, while logistic regression is used for discrete targets (classification).
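As a minimal sketch, the slope m and intercept c above can be estimated with ordinary least squares in plain Python (the data points here are hypothetical, chosen to lie roughly on y = 2x + 1):

```python
import statistics

# Hypothetical toy data: y is roughly 2*x + 1 with a little noise.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 9.0, 11.1]

x_mean = statistics.mean(xs)
y_mean = statistics.mean(ys)

# Ordinary least squares for y = m*x + c:
#   m = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2),  c = ȳ - m*x̄
m = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
    sum((x - x_mean) ** 2 for x in xs)
c = y_mean - m * x_mean

def predict(x):
    """Predict y for a new input x using the fitted line."""
    return m * x + c

print(round(m, 2), round(c, 2))   # fitted slope and intercept: 2.01 1.03
print(round(predict(6), 2))       # prediction for an unseen x: 13.09
```

The same fit is usually done with a library such as scikit-learn in practice; the closed-form version above just makes the "figure out m and c" step explicit.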

https://www.analyticsvidhya.com/blog/2021/10/everything-you-need-to-know-about-linear-regression/

Decision trees: One of the most powerful tools in supervised learning, used for both regression and classification. The logic can be explained through a tree-like flow chart in which each internal node represents an attribute test, each branch denotes an outcome of that test, and each leaf node depicts a class label. The idea is to reduce disorderliness (entropy) at each node, i.e., to maximize the information gain.

https://www.analyticsvidhya.com/blog/2021/08/decision-tree-algorithm/

https://www.ibm.com/topics/decision-trees
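The "reduce disorderliness, maximize information gain" idea can be sketched in a few lines of Python. The toy labels and splits below are hypothetical, just to show that a split producing purer children scores a higher gain:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (the 'disorderliness')."""
    n = len(labels)
    return -sum((cnt / n) * math.log2(cnt / n)
                for cnt in Counter(labels).values())

def information_gain(parent_labels, child_splits):
    """Entropy reduction achieved by splitting the parent into child subsets."""
    n = len(parent_labels)
    weighted = sum(len(child) / n * entropy(child) for child in child_splits)
    return entropy(parent_labels) - weighted

# Hypothetical toy labels at a node, and two candidate splits.
parent = ["yes", "yes", "yes", "no", "no", "no"]
good_split = [["yes", "yes", "yes"], ["no", "no", "no"]]  # pure children
bad_split = [["yes", "no", "yes"], ["no", "yes", "no"]]   # mixed children

print(information_gain(parent, good_split))  # 1.0, the maximum for 2 classes
print(information_gain(parent, bad_split))   # ~0.08, barely any gain
```

A tree-growing algorithm such as ID3 or C4.5 evaluates candidate attribute tests this way at every internal node and picks the one with the highest gain.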

Random Forest: Random forest is based on the ensemble machine learning technique, where we combine the results of multiple decision trees to increase predictive accuracy and reduce overfitting. The main idea behind a random forest is to reduce the variance of the decision trees by introducing randomness into the training process. Each tree in the random forest is trained on a random subset of the training data, which reduces the correlation between the trees and makes the ensemble more robust to overfitting.
