Linear Algebra 101 for AI/ML – Part 1

Intro

You don't need to be an expert in linear algebra to get started in AI, but you do need to know the basics. This is part 1 of my Linear Algebra 101 for AI/ML series, which is my attempt to compress the 6+ months I spent learning linear algebra before I started my career in AI. With the benefit of hindsight, I know now that you don't need to spend 6+ months or even 6 weeks brushing up on linear algebra to dive into AI. Instead, you can quickly ramp up on the basics and get started coding in AI much faster. As you make progress in AI/ML, you can continue your math studies.

In this article, you will learn:

🔢 the basics of vector and matrix math
🧮 vector and matrix operations
💻 the basics of PyTorch, an open source ML framework

As you read this guide, keep an eye out for interactive question and quiz modules to check your understanding of the material!

Without further ado, here are the topics of the article:

Basic Definitions
Element-wise Operations with PyTorch
Quiz

Basic Definitions

Scalar – A scalar is a single numerical value that represents a magnitude without direction. In programming terms, you can think of scalars as simple variables holding a single number, like an integer or float. Examples of scalars include temperature, age, and weight.

Vector – A vector is an ordered list of scalars. Why do we say it's ordered? Because the position of the scalar in the vector matters. Below is an example of a vector. Pretend $\color{cyan}{\vec{y}}$ is a vector representing the movie "Avengers: Endgame". The vector contains five numbers stacked on top of one another in a single column, each of which describes a specific attribute of the movie.

{\color{cyan}{\vec{y}}} \quad = \left. \begin{bmatrix} 0.99 \\ 0.52 \\ 0.45 \\ 0.10 \\ 0.26 \\ \end{bmatrix} \quad \begin{array}{l} \text{action} \\ \text{comedy} \\ \text{drama} \\ \text{horror} \\ \text{romance} \end{array} \right\}\text{5 rows}

We see that the movie has a value of 0.99 for action and 0.10 for horror. This suggests the movie is more of an action movie than a horror movie. If we were to swap the value for action with the value for horror, the vector would no longer accurately represent "Avengers: Endgame", which is not a horror movie. This is why order matters.

\begin{bmatrix} {\color{cyan}{0.99}} \\ 0.52 \\ 0.45 \\ {\color{orange}{0.10}} \\ 0.26 \\ \end{bmatrix} \neq \begin{bmatrix} {\color{cyan}{0.10}} \\ 0.52 \\ 0.45 \\ {\color{orange}{0.99}} \\ 0.26 \\ \end{bmatrix} \quad \begin{array}{l} {\color{cyan}{\text{action}}} \\ \text{comedy} \\ \text{drama} \\ {\color{orange}{\text{horror}}} \\ \text{romance} \end{array}
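The same idea can be sketched in code. This is a minimal illustration that stores the movie vector as a PyTorch tensor (PyTorch is introduced later in this article); the names `movie` and `swapped` are made up for this example.

```python
import torch

# Attribute order: action, comedy, drama, horror, romance
movie = torch.tensor([0.99, 0.52, 0.45, 0.10, 0.26])

# Swap the action (index 0) and horror (index 3) values.
swapped = movie.clone()
swapped[0], swapped[3] = movie[3], movie[0]

# Both tensors hold the same five numbers...
same_values = torch.equal(movie.sort().values, swapped.sort().values)
# ...but in a different order, so they are not the same vector.
same_vector = torch.equal(movie, swapped)

print(same_values, same_vector)  # True False
```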

Are vectors always arranged in column form? No, not necessarily. Below are vectors of different lengths, some in row form and some in column form.

\color{orange}{ \overbrace{ \begin{bmatrix} 18 & 21 & 24 & 27 \end{bmatrix} }^{\text{4 columns} } }

\color{cyan}{ \overbrace{ \begin{bmatrix} 18 & 21 \end{bmatrix} }^{\text{2 columns}} }

\color{magenta}{ \left. \begin{bmatrix} -1.5 \\ 0.89 \\ 0.41 \\ \end{bmatrix} \right\}\text{3 rows} }

Notice a vector either has one row or one column. What if you want a mathematical object that has multiple rows and multiple columns? That's where a matrix comes into play.

Matrix – If a scalar is a single number, and a vector is a one-dimensional ordered list of scalars, then a matrix is a two-dimensional array of scalars. Below, $\color{cyan}{X}$ is an example matrix. You can see it has four rows and two columns.

{\color{cyan}{X}} \quad = \begin{bmatrix} {\color{magenta}{3}} & {\color{orange}{3}} \\ {\color{magenta}{4}} & {\color{orange}{3}} \\ {\color{magenta}{5}} & {\color{orange}{3}} \\ {\color{magenta}{5}} & {\color{orange}{4}} \\ \end{bmatrix} \quad \begin{array}{l} \text{123 Maple Grove Lane} \\ \text{888 Ocean View Terrace} \\ \text{100 Birch Street} \\ \text{987 Sunflower Court} \\ \end{array}

Each row corresponds to the address of a single home. The first column represents the number of bedrooms in the home, and the second column represents the number of bathrooms.

Concept Check

How many bathrooms are in the home located at 100 Birch Street?

In the matrix,

{\color{cyan}{X}} \quad = \begin{bmatrix} 3 & 3 \\ 4 & 3 \\ {\color{magenta}{5}} & {\color{orange}{3}} \\ 5 & 4 \\ \end{bmatrix} \quad \begin{array}{l} \text{123 Maple Grove Lane} \\ \text{888 Ocean View Terrace} \\ {\color{yellow}{\text{100 Birch Street}}} \\ \text{987 Sunflower Court} \\ \end{array}

the row vector $\begin{bmatrix} {\color{magenta}{5}} & {\color{orange}{3}} \end{bmatrix}$ corresponds to 100 Birch Street. Since the second column represents the number of bathrooms, this home has three bathrooms.
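If you'd like to check this lookup in code, here is a small sketch using PyTorch (introduced later in this article). The variable names are illustrative, and note that code counts rows and columns starting from 0.

```python
import torch

# Rows are homes; columns are [bedrooms, bathrooms].
X = torch.tensor([[3, 3],   # 123 Maple Grove Lane
                  [4, 3],   # 888 Ocean View Terrace
                  [5, 3],   # 100 Birch Street
                  [5, 4]])  # 987 Sunflower Court

birch_street = X[2]          # third row (index 2)
bathrooms = X[2, 1].item()   # third row, second column (index 1)

print(birch_street)  # tensor([5, 3])
print(bathrooms)     # 3
```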

Any mathematician might find these definitions too simplistic and overly reductionist, but they are good enough to get us started. We'll see later how vectors and matrices can hold data to be processed by machine learning models.

Mathematical Notation

Symbols like $\in$ or $\color{cyan}{\mathbb{R}}$ can be daunting when reading math equations, so let's define and build up familiarity with them. $\in$ means "in", and $\color{cyan}{\mathbb{R}}$ means "the set of real numbers." Let's break down what this means. The "set of real numbers $\color{cyan}{\mathbb{R}}$ " is the mathematician's way of saying all numbers you use in everyday life: all whole numbers, negative numbers, fractions, decimal numbers, and irrational numbers on an infinite number line. Below is a visualization of a portion of $\color{cyan}{\mathbb{R}}$ .

Therefore, $x \in \color{cyan}{\mathbb{R}}$ means $x$ is one of the infinitely many real numbers. Next, let's see how we use this notation to indicate a vector's number of rows and/or columns, aka its dimensions.

{\color{orange}{m}} \text{ rows} \left\{ \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{{\color{orange}{m}}-1} \\ \end{bmatrix} \right. \in {\color{cyan}{\mathbb{R}}}^{{\color{orange}{m}}}

Above, we see a vector with $\color{orange}{m}$ rows and 1 column. In machine learning, these $\color{orange}{m}$ numbers are typically real numbers, which is why we use ${\color{cyan}{\mathbb{R}}}$. Since the vector contains $\color{orange}{m}$ real numbers, we say it belongs to (aka $\in$) the set of all vectors with $\color{orange}{m}$ real-valued entries, or in mathematical notation, $\in {\color{cyan}{\mathbb{R}}}^{\color{orange}{m}}$.

Finally, ${\color{orange}{X}} \in \mathbb{R}^{{\color{orange}{3}} \times {\color{orange}{5}}}$ means " ${\color{orange}{X}}$ is a matrix with ${\color{orange}{3}}$ rows and ${\color{orange}{5}}$ columns of values, each belonging to the set of real numbers."

This notation matters because it is concise. Imagine if I had to write out "a matrix ${\color{orange}{X}}$ with ${\color{orange}{3}}$ rows and ${\color{orange}{5}}$ columns" or "a matrix ${\color{cyan}{Y}}$ with ${\color{cyan}{100}}$ rows and ${\color{cyan}{27}}$ columns" every time. It's quite verbose. Instead, I can just write ${\color{orange}{X}} \in \mathbb{R}^{{\color{orange}{3}} \times {\color{orange}{5}}}$ or ${\color{cyan}{Y}} \in \mathbb{R}^{{\color{cyan}{100}} \times {\color{cyan}{27}} }$. Isn't that more concise? With enough exposure, you'll become very comfortable with math notation.

Now let's take a look at a matrix:

\overbrace{ \begin{bmatrix} x_{0,0} & x_{0,1} & \cdots & x_{0,{\color{violet}{n}}-1} \\ x_{1,0} & x_{1,1} & \cdots & x_{1,{\color{violet}{n}}-1} \\ \vdots & \vdots & \vdots & \vdots \\ x_{{\color{orange}{m}}-1,0} & x_{{\color{orange}{m}}-1,1} & \cdots & x_{{\color{orange}{m}}-1,{\color{violet}{n}}-1} \\ \end{bmatrix} }^{{\color{violet}{n}} \text{ columns}} \in {\color{cyan}{\mathbb{R}}}^{ {\color{orange}{m}} \times {\color{violet}{n}}}

The matrix above has ${\color{orange}{m}}$ rows and ${\color{violet}{n}}$ columns. If you count them all, there are a total of ${\color{orange}{m}} \cdot {\color{violet}{n}}$ real numbers. Thus, this matrix belongs to the set of all real-valued matrices with ${\color{orange}{m}}$ rows and ${\color{violet}{n}}$ columns, ${\color{cyan}{\mathbb{R}}}^{ {\color{orange}{m}} \times {\color{violet}{n}}}$.
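This notation maps closely onto what code calls a shape. As a rough sketch using PyTorch (introduced in the next section), the example below builds a vector with 5 entries and a matrix in $\mathbb{R}^{3 \times 5}$ filled with zeros; the sizes and names are chosen only for illustration.

```python
import torch

x = torch.zeros(5)      # a vector with 5 real-valued entries
X = torch.zeros(3, 5)   # a matrix with 3 rows and 5 columns

print(x.shape)    # torch.Size([5])
print(X.shape)    # torch.Size([3, 5])
print(X.numel())  # 15 entries in total, i.e. m * n = 3 * 5
```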

Element-wise Operations with PyTorch

Code Environment Setup

Now that we've established the definitions of vectors and matrices and their mathematical notation, let's play around with them in code to gain some intuition and familiarity. To do this, we're going to use an open source machine learning framework called PyTorch. PyTorch is widely used throughout academia and industry for cutting edge AI research and production grade software at institutions and companies such as OpenAI, Amazon, Meta, Salesforce, Stanford University, and thousands of startups, so it'll be practical to build up experience with the framework. Visit the official PyTorch installation instructions page to get started.

After you install PyTorch, open up your Python REPL and enter the code shown below the following vector:

a = \begin{bmatrix} 3 \\ 4 \\ 5 \\ 5 \\ \end{bmatrix} \in \mathbb{R}^{4 \times 1}

Python

import torch

a = torch.tensor([[3], [4], [5], [5]])

Above, we first see a vector with four elements written in math notation, followed by its equivalent in code.
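One detail worth knowing: the nested brackets are what make this a column with shape 4 × 1. A flat list produces a one-dimensional tensor instead, which is the form used in the rest of this article. The sketch below compares the two; the names `column` and `flat` are illustrative.

```python
import torch

column = torch.tensor([[3], [4], [5], [5]])  # 4 rows, 1 column
flat = torch.tensor([3, 4, 5, 5])            # 4 elements, one dimension

print(column.shape)  # torch.Size([4, 1])
print(flat.shape)    # torch.Size([4])

# reshape changes the layout without changing the values.
as_column = flat.reshape(4, 1)
print(torch.equal(as_column, column))  # True
```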

Concept Check

Now that we know how to create vectors, can you guess how you create the following matrix in PyTorch?

m = \begin{bmatrix} 3 & 4 \\ 5 & 6 \\ \end{bmatrix}

Python

m = torch.tensor([[3, 4], [5, 6]])

Set up your REPL with the following before continuing.

Python

>>> import torch
>>> a = torch.tensor([1.0, 2.0, 4.0, 8.0])
>>> b = torch.tensor([1.0, 0.5, 0.25, 0.125])

We're going to look at a class of operations performed on vectors and matrices called element-wise operations. Element-wise operations are operations that are applied independently to each element of a vector or matrix, resulting in a new vector or matrix of the same shape. These operations include addition, subtraction, multiplication, division, and many more.

Element-wise addition

\begin{bmatrix} {\color{cyan}{1}} \\ {\color{orange}{2}} \\ {\color{yellow}{4}} \\ {\color{magenta}{8}} \\ \end{bmatrix} + \begin{bmatrix} {\color{cyan}{1}} \\ {\color{orange}{0.5}} \\ {\color{yellow}{0.25}} \\ {\color{magenta}{0.125}} \\ \end{bmatrix} = \begin{bmatrix} {\color{cyan}{1}} + {\color{cyan}{1}} \\ {\color{orange}{2}} + {\color{orange}{0.5}} \\ {\color{yellow}{4}} + {\color{yellow}{0.25}} \\ {\color{magenta}{8}} + {\color{magenta}{0.125}} \\ \end{bmatrix}

Python

>>> a + b # element-wise addition
tensor([2.0000, 2.5000, 4.2500, 8.1250])

Element-wise subtraction

\begin{bmatrix} {\color{cyan}{1}} \\ {\color{orange}{2}} \\ {\color{yellow}{4}} \\ {\color{magenta}{8}} \\ \end{bmatrix} - \begin{bmatrix} {\color{cyan}{1}} \\ {\color{orange}{0.5}} \\ {\color{yellow}{0.25}} \\ {\color{magenta}{0.125}} \\ \end{bmatrix} = \begin{bmatrix} {\color{cyan}{1}} - {\color{cyan}{1}} \\ {\color{orange}{2}} - {\color{orange}{0.5}} \\ {\color{yellow}{4}} - {\color{yellow}{0.25}} \\ {\color{magenta}{8}} - {\color{magenta}{0.125}} \\ \end{bmatrix}

Python

>>> a - b # element-wise subtraction
tensor([0.0000, 1.5000, 3.7500, 7.8750])

Element-wise multiplication

\begin{bmatrix} {\color{cyan}{1}} \\ {\color{orange}{2}} \\ {\color{yellow}{4}} \\ {\color{magenta}{8}} \\ \end{bmatrix} \odot \begin{bmatrix} {\color{cyan}{1}} \\ {\color{orange}{0.5}} \\ {\color{yellow}{0.25}} \\ {\color{magenta}{0.125}} \\ \end{bmatrix} = \begin{bmatrix} {\color{cyan}{1}} \cdot {\color{cyan}{1}} \\ {\color{orange}{2}} \cdot {\color{orange}{0.5}} \\ {\color{yellow}{4}} \cdot {\color{yellow}{0.25}} \\ {\color{magenta}{8}} \cdot {\color{magenta}{0.125}} \\ \end{bmatrix}

Python

>>> a * b # element-wise multiplication
tensor([1., 1., 1., 1.])

Element-wise division

\begin{bmatrix} {\color{cyan}{1}} \\ {\color{orange}{2}} \\ {\color{yellow}{4}} \\ {\color{magenta}{8}} \\ \end{bmatrix} \oslash \begin{bmatrix} {\color{cyan}{1}} \\ {\color{orange}{0.5}} \\ {\color{yellow}{0.25}} \\ {\color{magenta}{0.125}} \\ \end{bmatrix} = \begin{bmatrix} {\color{cyan}{1}} / {\color{cyan}{1}} \\ {\color{orange}{2}} / {\color{orange}{0.5}} \\ {\color{yellow}{4}} / {\color{yellow}{0.25}} \\ {\color{magenta}{8}} / {\color{magenta}{0.125}} \\ \end{bmatrix}

Python

>>> a / b # element-wise division
tensor([ 1.,  4., 16., 64.])
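The examples above use vectors, but the same operators work on matrices of matching shape. Here is a short sketch with two made-up 2 × 2 matrices, `P` and `Q`. Note that `*` multiplies entry by entry, which is different from matrix multiplication.

```python
import torch

P = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
Q = torch.tensor([[10.0, 20.0],
                  [30.0, 40.0]])

total = P + Q     # each entry of P is added to the matching entry of Q
product = P * Q   # entry-by-entry product, not matrix multiplication

print(total)      # tensor([[11., 22.], [33., 44.]])
print(product)    # tensor([[ 10.,  40.], [ 90., 160.]])
print(total.shape == P.shape)  # True: the result keeps the same shape
```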


There are also element-wise operations that act on a single vector or matrix. Below are two commonly used operations in machine learning.

Sigmoid

\sigma \left( \begin{bmatrix} {\color{cyan}{1}} \\ {\color{orange}{2}} \\ {\color{yellow}{4}} \\ {\color{magenta}{8}} \\ \end{bmatrix} \right) = \begin{bmatrix} \sigma({\color{cyan}{1}}) \\ \sigma({\color{orange}{2}}) \\ \sigma({\color{yellow}{4}}) \\ \sigma({\color{magenta}{8}}) \\ \end{bmatrix} \\ \\[8pt] \text{where } \sigma({\color{yellow}{x}}) = \frac{1}{1+e^{-{\color{yellow}{x}}}}

Python

>>> torch.sigmoid(a)
tensor([0.7311, 0.8808, 0.9820, 0.9997])
>>> torch.sigmoid(torch.tensor(239.0))
tensor(1.)
>>> torch.sigmoid(torch.tensor(0.0))
tensor(0.5000)
>>> torch.sigmoid(torch.tensor(-0.34))
tensor(0.4158)

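The sigmoid function takes any value of $x$ and squashes it into the range $(0, 1)$. The endpoints are never reached for finite inputs: $\sigma(x)$ approaches 0 only as $x \to -\infty$ and approaches 1 only as $x \to +\infty$. (The REPL prints tensor(1.) for an input of 239 because the true value is too close to 1 for floating-point numbers to tell apart.) This is useful when you have arbitrarily large values and you want to condense them into the range of values between 0 and 1. It's sometimes useful to interpret the output of sigmoid as a probability.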
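To connect the formula to the code, the sketch below computes $\frac{1}{1+e^{-x}}$ by hand with element-wise operations and compares it against `torch.sigmoid`. The names `manual` and `builtin` are just for this example.

```python
import torch

a = torch.tensor([1.0, 2.0, 4.0, 8.0])

# Apply the formula 1 / (1 + e^(-x)) to every element at once.
manual = 1 / (1 + torch.exp(-a))
builtin = torch.sigmoid(a)

print(manual)
print(torch.allclose(manual, builtin))  # True: both give the same values
```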

ReLU (Rectified Linear Unit)

\text{ReLU} \left( \begin{bmatrix} {\color{cyan}{1}} \\ {\color{orange}{2}} \\ {\color{yellow}{4}} \\ {\color{magenta}{8}} \\ \end{bmatrix} \right) = \begin{bmatrix} f({\color{cyan}{1}}) \\ f({\color{orange}{2}}) \\ f({\color{yellow}{4}}) \\ f({\color{magenta}{8}}) \\ \end{bmatrix} \\ \\[8pt] \text{where } f({\color{yellow}{x}}) = \text{max}({\color{yellow}{x}}, 0)

Python

>>> c = torch.tensor([4, -4, 0, 2])
>>> torch.relu(c)
tensor([4, 0, 0, 2])

The ReLU function acts as a filter. Any positive input goes through it unchanged, but any negative input becomes zero. It might seem strange that such a simple function is useful, but it helps neural networks learn to recognize objects in images and is used in ChatGPT and other sophisticated chatbots.¹
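As with sigmoid, you can reproduce ReLU from its definition $\text{max}(x, 0)$. The sketch below uses `torch.clamp` to raise every value below zero up to zero, which is one of several ways to write it, and checks the result against `torch.relu`.

```python
import torch

c = torch.tensor([4.0, -4.0, 0.0, 2.0])

# max(x, 0) for every element: anything below 0 is raised to 0.
manual = torch.clamp(c, min=0)
builtin = torch.relu(c)

print(manual)                        # tensor([4., 0., 0., 2.])
print(torch.equal(manual, builtin))  # True
```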

Tensors

Did you notice that we create vectors and matrices with the PyTorch function torch.tensor(...)? Why is it not called torch.vector(...) or torch.matrix(...)? Because PyTorch tensors are more general. A vector has $\color{cyan}{1}$ dimension, a matrix has $\color{orange}{2}$ dimensions, so what is a general term that covers 3 or more dimensions? Answer: a tensor. Actually, vectors and matrices are also tensors because a tensor is any $N$-dimensional array of numbers. A tensor is a fundamental unit in PyTorch. You can learn more about them by visiting this official tutorial from the PyTorch Foundation.

Python

>>> a = torch.rand((3, 4, 2))  # create a three-dimensional tensor with random values
>>> a
tensor([[[0.8856, 0.9232],
         [0.0250, 0.2977],
         [0.4745, 0.2243],
         [0.3107, 0.9159]],

        [[0.3654, 0.3746],
         [0.4026, 0.4557],
         [0.9426, 0.0865],
         [0.3805, 0.5034]],

        [[0.3843, 0.9903],
         [0.6279, 0.2222],
         [0.0693, 0.0140],
         [0.6222, 0.3590]]])
>>> a.shape  # the tensor's dimensions
torch.Size([3, 4, 2])
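To see how scalars, vectors, matrices, and higher-dimensional arrays all fit under the tensor umbrella, here is a small sketch that prints the number of dimensions of each using the `ndim` attribute. The variable names are illustrative. In PyTorch, a single number wrapped in a tensor has zero dimensions.

```python
import torch

scalar = torch.tensor(3.14)               # 0 dimensions
vector = torch.tensor([1.0, 2.0, 4.0])    # 1 dimension
matrix = torch.tensor([[3, 4], [5, 6]])   # 2 dimensions
cube = torch.rand((3, 4, 2))              # 3 dimensions

for t in (scalar, vector, matrix, cube):
    print(t.ndim, tuple(t.shape))
```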

In addition to element-wise operations, there are other operations that operate on the entire tensor. We'll cover those operations and apply them to neural networks and other machine learning concepts in the next part of this Linear Algebra 101 series. Stay tuned!

Quiz

Take the quiz below to see if you've mastered the concepts above. Don't worry if you can't answer them right away. Each question contains multiple concepts, so review the article if you're stuck.

Question 3

Suppose we have an email. We process it with a machine learning model, and the model outputs a score. The higher the score, the more likely the email is spam. You see that the score is $-0.74$ , which seems nonsensical. You want the score to be more interpretable. What element-wise operation would you choose to perform on this score?

Answer: sigmoid

Remember, the sigmoid function squashes all input values into the range $(0, 1)$, and the output can sometimes be interpreted as a probability.

>>> torch.sigmoid(torch.tensor(-0.74))
tensor(0.3230)

In this case, it seems the score of $-0.74$ can be mapped to $32.30\%$ , which can be interpreted as the probability of the email being spam.
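If you want to verify this number without PyTorch, the arithmetic can be sketched with Python's standard `math` module by plugging the score into the sigmoid formula from earlier. The variable names here are illustrative.

```python
import math

score = -0.74

# Sigmoid formula: 1 / (1 + e^(-x))
probability = 1 / (1 + math.exp(-score))

print(f"{probability:.4f}")  # 0.3230
print(f"{probability:.2%}")  # 32.30%
```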

Footnotes

¹ ReLU was popularized in 2012 by a famous neural network called AlexNet.