Differences
This shows you the differences between two versions of the page.
| machine_learning_roadmap [2023/10/23 16:55] – demiurge | machine_learning_roadmap [2023/10/23 17:02] (current) – [Learn Advanced Python] demiurge | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | # Machine Learning Roadmap | + | ====== |
| - | So you want to *learn* | + | So you want to *learn* Machine Learning? It will be a long journey, one that requires a solid grasp of the fundamentals. Try not to skip any of the stages, and move on to the next only once you fully understand the current one. Good luck! |
| - | --- | + | ===== Mathematics and Calculus ===== |
| - | [TOC3] | + | |
| - | --- | + | |
| - | ### Mathematics and Calculus | + | ==== Linear Algebra ==== |
| - | #### 1. Linear Algebra | + | This is what essentially provides the mathematical framework for understanding and manipulating vectors and matrices, which are the building blocks of almost any ML algorithm. A full grasp of these concepts is **essential**. As always, [Khan Academy]([[https:// |
| - | This is what essentially provides the mathematical framework for understanding and manipulating vectors and matrices, which are the building blocks of almost any ML algorithm. A full grasp of these concepts is **essential**. As always, [Khan Academy](https:// | + | |
| - | 1. [Vectors and Spaces](https:// | + | [Vectors and Spaces]([[https:// |
| - | 2. [Matrices](https:// | + | |
| - | #### 2. Calculus | + | [Matrices]([[https://www.khanacademy.org/math/precalculus/ |
| - | Calculus, and particularly derivatives and gradients, plays a key role in optimization algorithms used in ML. You will rely on Calculus for optimization techniques such as [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent), | + |
| - | 1. [Integrals](https:// | + | [Matrix Transformations]([[https:// |
| - | 2. [Differential Equations](https:// | + | |
| - | 3. [Application of Integrals](https:// | + | ==== Calculus ==== |
| - | 4. [Parametric equations, polar coordinates, | + | |
| - | 5. [Series](https:// | + | Calculus, and particularly derivatives and gradients, plays a key role in optimization algorithms used in ML. You will rely on Calculus for optimization techniques such as [gradient descent]([[https:// |
| - | 6. [Gradients](https:// | + | |
| + | [Integrals]([[https:// | ||
| + | |||
| + | [Differential Equations]([[https:// | ||
| + | |||
| + | [Application of Integrals]([[https:// | ||
| + | |||
| + | [Parametric equations, polar coordinates, | ||
| + | |||
| + | [Series]([[https:// | ||
| + | |||
| + | [Gradients]([[https:// | ||
| + | |||
| + | ==== Probability and Statistics ==== | ||
| - | #### 3. Probability and Statistics | ||
| Another essential building block. Probability theory provides a mathematical framework for quantifying *uncertainty*. In ML, models often need to make predictions or decisions based on incomplete or noisy data. With probability, | Another essential building block. Probability theory provides a mathematical framework for quantifying *uncertainty*. In ML, models often need to make predictions or decisions based on incomplete or noisy data. With probability, | ||
| - | 1. [The Entire Khan Academy Statistics and Probability course](https:// | + | 1. [The Entire Khan Academy Statistics and Probability course]([[https:// |
| - | >You can take only the lessons you think might be important and then take the Course Challenge. | + | 2. Discrete and continuous probability distributions: |
| - | 2. Discrete and continuous probability distributions: | + | |
| - | 3. [Bayesian Statistics](https:// | + | |
| That should probably be enough for Math. I might' | That should probably be enough for Math. I might' | ||
| - | ### | + | ===== Programming |
| The current programming language dominating the ML community is **Python**. Not surprising, since the ease of use allows you to focus on writing efficient code without needing to spend too much time learning the intricacies of the language' | The current programming language dominating the ML community is **Python**. Not surprising, since the ease of use allows you to focus on writing efficient code without needing to spend too much time learning the intricacies of the language' | ||
| - | ##### Learn Python Basics | ||
| - | The `roadmap.sh` [Python Developer roadmap](https:// | ||
| - | 1. Learn the Basic Syntax and Data Types | + | ==== Learn Python |
| - | You'll need to familiarize yourself with Python's syntax, variables, data types (integers, floats, strings, lists, dicts), and basic operations (arithmetic, | + | |
| - | 2. Control Flow | + | |
| - | Understand conditional statements (`if`, `elif`, `else`), loops (`for`, `while`), and logical operators (`and`, `or`, `not`). Very important for implementing decision-making and repetition in your code. | + | |
| - | 3. Functions and modules | + | |
| - | Learn how to define and use functions to encapsulate reusable blocks of code. Also, you'll need to understand how to import and utilize modules (libs). | + | |
| - | 4. Data Structures and Manipulation | + | |
| - | Get yourself acquainted with fundamental data structures like lists, tuples, sets, and dictionaries. Learn how to manipulate and transform data. | + | |
| - | 5. NumPy | + | |
| - | A fundamental library for scientific computing in Python. You will need to gain proficiency in using NumPy arrays for efficient numerical computations. | + | |
| - | 6. Pandas | + | |
| - | You will often need Pandas DataFrames to clean, transform, filter, aggregate, and analyze your datasets. | + | |
| - | 7. Plotting and Data Visualization | + | |
| - | Become familiar with libraries such as [Matplotlib](https:// | + | |
| - | ##### Learn Advanced Python | + | `roadmap.sh` [Python Developer roadmap]([[https:// |
| + | |||
| + | 1. Learn the Basic Syntax and Data Types You'll need to familiarize yourself with Python' | ||
| + | |||
| + | |||
| + | ==== Learn Advanced Python | ||
| At this stage, you'll be sufficiently familiar with Python and ready to tackle the ML aspects of Python. Very exciting. | At this stage, you'll be sufficiently familiar with Python and ready to tackle the ML aspects of Python. Very exciting. | ||
| - | 1. Machine Learning Libraries | + | 1. Machine Learning Libraries Explore the popular ML libraries, such as [PyTorch]([[https:// |
| - | Explore the popular ML libraries, such as [PyTorch](https:// | + | |
| - | 2. Object-Oriented Programming (OOP) | + | 2. Object-Oriented Programming (OOP) Get yourself comfortable with the principles of OOP, including classes, objects, inheritance, |
| - | Get yourself comfortable with the principles of OOP, including classes, objects, inheritance, | + | |
| ### Machine Learning Concepts | ### Machine Learning Concepts | ||
| - | At this point, you can follow whatever ML course you're comfortable with. A popular recommendation is [fastai](https:// | + | At this point, you can follow whatever ML course you're comfortable with. A popular recommendation is [fastai]([[https:// |
| + | |||
| + | 1. **Supervised Learning** This [Coursera]([[https:// | ||
| + | |||
| + | - Classification: | ||
| + | - Regression: Predicting continuous values. | ||
| + | |||
| + | 2. **Unsupervised Learning** | ||
| + | |||
| + | - Clustering: Grouping similar data points together. | ||
| + | - Dimensionality Reduction: Reducing the number of input features while preserving important information. | ||
| + | - Anomaly Detection: Identifying rare or abnormal instances in the data. | ||
| + | |||
| + | 3. **Reinforcement Learning** | ||
| + | |||
| + | 4. **Linear Regression** | ||
| + | |||
| + | - Understanding linear regression models and assumptions. | ||
| + | - Cost functions, including mean squared error. | ||
| + | - Gradient descent for parameter optimization. | ||
| + | - Evaluation metrics for regression models. | ||
| + | |||
| + | 5. **Logistic Regression** | ||
| + | |||
| + | - Modeling binary classification problems with logistic regression. | ||
| + | - Sigmoid function and interpretation of probabilities. | ||
| + | - Maximum likelihood estimation and logistic loss. | ||
| + | - Regularization techniques for logistic regression. | ||
| + | |||
| + | 6. **Decision Trees and Random Forests** | ||
| + | |||
| + | - Basics of decision tree learning. | ||
| + | - Splitting criteria and handling categorical variables. | ||
| + | - Ensemble learning with random forests. | ||
| + | - Feature importance and tree visualization. | ||
| + | |||
| + | 7. **Support Vector Machines (SVM)** | ||
| + | |||
| + | - Formulation of SVMs for binary classification. | ||
| + | - Kernel trick and non-linear decision boundaries. | ||
| + | - Soft margin and regularization in SVMs. | ||
| + | - SVMs for multi-class classification. | ||
| + | |||
| + | 8. **Clustering** | ||
| - | 1. **Supervised Learning** | + | - Overview of unsupervised learning and clustering. |
| - | This [Coursera](https:// | + | - K-means clustering algorithm and initialization methods. |
| - | - Classification: | + | - Hierarchical clustering and density-based clustering. |
| - | - Regression: Predicting continuous values. | + | - Evaluating clustering performance. |
| - | 2. **Unsupervised | + | 9. **Neural Networks and Deep Learning** |
| - | This [course](https:// | + | |
| - | - Clustering: Grouping similar data points together. | + | |
| - | - Dimensionality Reduction: Reducing | + | |
| - | - Anomaly Detection: Identifying rare or abnormal instances in the data. | + |
| - | 3. **Reinforcement | + | - [Deep Learning |
| - | Coursera provides this [course](https://www.coursera.org/specializations/reinforcement-learning) on Reinforcement Learning, which should be a good starting point. | + | |
| + | | ||
| + | - [Three Mechanisms of Weight Decay Regularization]([[https:// | ||
| + | - [Layer Normalization]([[https:// | ||
| + | - [Attention Is All You Need]([[https:// | ||
| - | 4. **Linear Regression** | + | 10. **Evaluation and Validation** Read the following papers: |
| - | This [resource](https:// | + | |
| - | - Understanding linear regression models and assumptions. | + | |
| - | - Cost functions, including mean squared error. | + | |
| - | - Gradient descent for parameter optimization. | + | |
| - | - Evaluation metrics for regression models. | + | |
| - | 5. **Logistic Regression** | + | - [Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models]([[https://arxiv.org/abs/1806.07139|https://arxiv.org/ |
| - | Read through | + | - [Leave-One-Out Cross-Validation |
| - | - Modeling binary classification problems with logistic regression. | + | |
| - | - Sigmoid function and interpretation of probabilities. | + | |
| - | - Maximum likelihood estimation and logistic loss. | + | |
| - | - Regularization techniques | + | |
| - | 6. **Decision Trees and Random Forests** | + | And [this HuggingFace guide]([[https://huggingface.co/docs/evaluate/ |
| - | This incredible resource by [Jake VanderPlas](https://jakevdp.github.io/PythonDataScienceHandbook/05.08-random-forests.html) should be extremely useful. The main takeaways are: | + |
| - | - Basics of decision tree learning. | + | |
| - | - Splitting criteria and handling categorical variables. | + | |
| - | - Ensemble learning with random forests. | + | |
| - | - Feature importance and tree visualization. | + | |
| - | 7. **Support Vector Machines (SVM)** | + | 11. **Feature Engineering and Dimensionality Reduction** Take a look at [this article]([[https:// |
| - | Read through | + | |
| - | - Formulation of SVMs for binary classification. | + | |
| - | - Kernel trick and non-linear decision boundaries. | + | |
| - | - Soft margin and regularization in SVMs. | + | |
| - | - SVMs for multi-class classification. | + | |
| - | 8. **Clustering** | + | - [Beyond One-hot Encoding: lower dimensional target embedding]([[https://arxiv.org/abs/1806.10805|https://arxiv.org/ |
| - | Read through this excellent | + | - [A Tutorial on Principal Component Analysis]([[https:// |
| - | - Overview of unsupervised learning and clustering. | + | |
| - | - K-means clustering algorithm and initialization methods. | + | |
| - | - Hierarchical clustering and density-based clustering. | + | |
| - | - Evaluating clustering performance. | + | |
| - | 9. **Neural Networks | + | 12. **Model Selection |
| - | The heart of the matter. Read through the papers for each: | + | |
| - | - [Deep Learning | + | |
| - | - [An Introduction | + | |
| - | - [Recurrent Neural Networks (RNNs): A gentle Introduction and Overview](https:// | + | |
| - | - [Three Mechanisms of Weight Decay Regularization](https:// | + | |
| - | - [Layer Normalization](https:// | + | |
| - | - [Attention Is All You Need](https:// | + | |
| - | 10. **Evaluation and Validation** | + | - Grid search, random search, and Bayesian optimization for hyperparameter tuning. |
| - | Read the following papers: | + | - Model selection techniques, including nested cross-validation. |
| - | - [Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models](https:// | + | - Overfitting, |
| - | - [Leave-One-Out Cross-Validation for Bayesian Model Comparison in Large Data](https:// | + | - Performance comparison of different |
| - | And [this HuggingFace guide](https:// | + | |
| - | 11. **Feature Engineering and Dimensionality Reduction** | + | ### Train your own model! You're now ready to pre-train your own model, or fine-tune an existing one! For this, you should |
| - | Take a look at [this article](https://towardsdatascience.com/feature-selection-and-dimensionality-reduction-f488d1a035de) for a general overview. | + |
| - | Also read these papers: | + | |
| - | - [Beyond One-hot Encoding: lower dimensional target embedding](https://arxiv.org/abs/1806.10805) | + | |
| - | - [A Tutorial on Principal Component Analysis](https://arxiv.org/abs/1404.1100) | + | |
| - | 12. **Model Selection and Hyperparameter Tuning** | + | The [HuggingFace Transformers docs]([[https:// |
| - | This is where you're finally dabbling in model training. Good job! You will need to learn: | + | |
| - | - Grid search, random search, and Bayesian optimization for hyperparameter tuning. | + | |
| - | - Model selection techniques, including nested cross-validation. | + | |
| - | - Overfitting, | + | |
| - | - Performance comparison of different models. | + | |
| - | ### Train your own model! | + | ### Stay Updated and Engage in ML community At this point, you know all the essentials. ML is an ever-advancing field, with new innovations emerging every day. |
| - | You're now ready to pre-train your own model, or fine-tune an existing one! For this, you should look into [Transformers](https://github.com/ | + | |
| - | The [HuggingFace Transformers docs](https:// | ||
| - | ### Stay Updated and Engage in ML community | ||
| - | At this point, you know all the essentials. ML is an ever-advancing field, with new innovations emerging every day. You'll need to stay abreast of the latest developments, | ||
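
The Linear Regression item in the roadmap lists mean squared error as a cost function and gradient descent for parameter optimization. As a rough illustration of how those two pieces fit together, here is a minimal sketch in plain Python. It is not taken from the page or from any of the linked courses; the function names, the toy data, the learning rate, and the step count are all assumptions chosen for the example.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error.

    The name, learning_rate, and steps are illustrative choices,
    not values taken from the roadmap.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error of the current line on every data point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mean_squared_error(xs, ys, w, b):
    """Average squared difference between predictions and targets."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data generated from y = 2x + 1 (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_linear_regression(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}, MSE = {mean_squared_error(xs, ys, w, b):.2e}")
```

On this noiseless toy data the fitted slope and intercept should approach 2 and 1. With real data you would typically scale the features first and reach for NumPy or a library such as scikit-learn or PyTorch rather than hand-written loops.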