
Open this tutorial in a Jupyter notebook.

Tensor

Definition

A tensor is a multi-dimensional array, similar to a NumPy array. Its particularity is its ability to store previous gradients and operations. Its architecture was heavily inspired by the PyTorch documentation.

In PyTorch, you create tensors with the built-in tensor function, or with the Tensor class directly (although instantiating a tensor this way is not recommended). NETS does not (yet) have such built-in factory functions, so you will need to use the Tensor class to create a tensor.
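
For comparison, here is a minimal sketch of both creation styles, assuming PyTorch is installed alongside NETS (the nets.Tensor call mirrors the examples later in this tutorial):

import torch
import nets

# PyTorch: the recommended factory function
t_torch = torch.tensor([[1, 3], [5, 7]])

# NETS: the Tensor class is the only entry point for now
t_nets = nets.Tensor([[1, 3], [5, 7]])

print(t_torch)
print(t_nets)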

Tensors support basic mathematical operations. All these operations are defined in the nets/ops.py module. Here is a list of some operations currently supported:

  • add, which adds two tensors (broadcasting is supported),

  • subtract, which subtracts two tensors (broadcasting is supported),

  • multiply, which multiplies two tensors of the same shape element-wise,

  • dot, which computes the dot product @ of two matrices,

  • exp, which computes the element-wise exponential of a tensor,

  • sum, which sums all elements of a tensor,

  • transpose, which transposes a tensor.

import nets

t1 = nets.Tensor([[1, 3],
                  [5, 7]])
t2 = nets.Tensor([[2, 4],
                  [6, 8]])

print(f"t1 =\n{t1}")
print(f"t2 =\n{t2}")

print("\nSome basic operations")
print(f"t1 + t2: \n{t1 + t2}")
print(f"t1 - t2: \n{t1 - t2}")
print(f"t1 * t2: \n{t1 *t2}")
print(f"t1 @ t2: \n{t1 @ t2}")
print(f"t1 ** 2: \n{t1 ** 2}")
print(f"t1 / t2: \n{t1 / t2}")
print(f"t1 / 10: \n{t1 / 10}")

Out:

t1 =
Tensor([[1, 3],
        [5, 7]])
t2 =
Tensor([[2, 4],
        [6, 8]])

Some basic operations
t1 + t2:
Tensor([[ 3,  7],
        [11, 15]])
t1 - t2:
Tensor([[-1, -1],
        [-1, -1]])
t1 * t2:
Tensor([[ 2, 12],
        [30, 56]])
t1 @ t2:
Tensor([[20, 28],
        [52, 76]])
t1 ** 2:
Tensor([[ 1,  9],
        [25, 49]])
t1 / t2:
Tensor([[0.5000, 0.7500],
        [0.8333, 0.8750]])
t1 / 10:
Tensor([[0.1, 0.3],
        [0.5, 0.7]])
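
The remaining operations from the list (exp, sum, transpose) can be used in the same way. The sketch below assumes they are re-exported at the package level, like nets.tanh used later in this tutorial; if not, import them from the nets/ops.py module instead:

# Assumption: exp, sum and transpose are reachable as nets.* functions
print(f"exp(t1): \n{nets.exp(t1)}")
print(f"sum(t1): \n{nets.sum(t1)}")
print(f"transpose(t1): \n{nets.transpose(t1)}")

# sum is also available as a Tensor method (see the autograd example below)
print(f"t1.sum(): \n{t1.sum()}")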

Gradients

NETS uses a custom autograd system, built with NumPy. Some vanilla architectures, like DNN networks, do not depend on this functionality, however. All of this information will be detailed in the models section.

As you may have seen, there is a requires_grad attribute, set to False by default, when you create a tensor. It works similarly to PyTorch's attribute of the same name. If set to True, previous gradients are registered and saved in the tensor's _hooks attribute, which is basically a list containing all previous gradients. That is, when the backward method is called on this tensor with an upstream gradient, the gradient propagates through all previous operations.

Let’s see some basic examples:

t1 = nets.Tensor([1, 3], requires_grad=True)
t2 = nets.Tensor([2, 4], requires_grad=True)

# Some operations
t3 = t1 + t2 + 4
t4 = t3 * t2

print(f"t1 =\n{t1}")
print(f"t2 =\n{t2}")

print("\nOperation:")
print("t3 = t1 + t2 + 4")
print("t4 = t3 * t2")

print(f"\nt3 =\n{t3}")
print(f"t4 =\n{t4}")

print("\nBefore backpropagation")
print(f"t1 gradient: {t1.grad}")
print(f"t2 gradient: {t2.grad}")
print(f"t3 gradient: {t3.grad}")
print(f"t4 gradient: {t4.grad}")

# Upstream gradient
grad = nets.Tensor([-1, 2])

# Back-propagation
t4.backward(grad)

print("\nAfter backpropagation")
print(f"t1 gradient: {t1.grad}")
print(f"t2 gradient: {t2.grad}")
print(f"t3 gradient: {t3.grad}")
print(f"t4 gradient: {t4.grad}")

Out:

t1 =
Tensor([1, 3], requires_grad=True)
t2 =
Tensor([2, 4], requires_grad=True)

Operation:
t3 = t1 + t2 + 4
t4 = t3 * t2

t3 =
Tensor([ 7, 11], requires_grad=True)
t4 =
Tensor([14, 44], requires_grad=True)

Before backpropagation
t1 gradient: Tensor([0., 0.])
t2 gradient: Tensor([0., 0.])
t3 gradient: Tensor([0., 0.])
t4 gradient: Tensor([0., 0.])

After backpropagation
t1 gradient: Tensor([-2.,  8.])
t2 gradient: Tensor([-9., 30.])
t3 gradient: Tensor([-2.,  8.])
t4 gradient: Tensor([-1.,  2.])
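
These values follow directly from the chain rule. Writing the upstream gradient as \(g = (-1, 2)\) and using \(t_4 = t_3 \cdot t_2\) with \(t_3 = t_1 + t_2 + 4\) (all operations element-wise):

\[t_4.grad = g, \qquad t_3.grad = g \cdot t_2 = (-2, 8), \qquad t_1.grad = t_3.grad = (-2, 8)\]

Since \(t_2\) appears both in the product and inside \(t_3\), its gradient sums the two contributions:

\[t_2.grad = g \cdot t_3 + t_3.grad = (-7, 22) + (-2, 8) = (-9, 30)\]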

Autograd

The autograd system dynamically builds a computational graph, which is then used to update deep learning models.

The backward pass computes the gradients by going back through all previous operations of the computational graph. This backward pass can also be called through the backward method of a Module. In that case, you will need to write the computation yourself, as the gradient equations depend on the model. This vanilla back-propagation is implemented for standard models like DNN, CNN and RNN. However, you won't be able to mix them with this technique unless you change some parts of the backward method. A hypothetical sketch of what "writing the gradients yourself" means is given below.
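
The sketch below shows hand-written gradient equations for a single linear layer, in plain NumPy for illustration only; the names are hypothetical and are not the NETS Module API:

import numpy as np

# Hypothetical hand-written backward pass for a linear layer y = x @ W + b.
def linear_forward(x, W, b):
    return x @ W + b

def linear_backward(x, W, grad_y):
    # Gradient equations derived by hand for this specific model
    grad_W = x.T @ grad_y        # dL/dW
    grad_b = grad_y.sum(axis=0)  # dL/db
    grad_x = grad_y @ W.T        # dL/dx, passed to the previous layer
    return grad_x, grad_W, grad_b

x = np.random.randn(4, 3)
W = np.random.randn(3, 2)
b = np.zeros(2)
grad_y = np.ones((4, 2))         # upstream gradient coming from the loss
grad_x, grad_W, grad_b = linear_backward(x, W, grad_y)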

Alternatively, you can use the autograd system. As tensors keep track of the computational graph in their hooks, the back-propagation is easier to compute. It is decomposed into elementary operations (+, -, /, *, exp), and the gradient provided by a loss function is then propagated through the computational graph.

Example:

t1 = nets.Tensor([[1., 3.],
                   [5., 7.]], requires_grad=True)
t2 = nets.Tensor([[2., 4.],
                   [6., 8.]], requires_grad=True)

# Some operations
t3 = nets.tanh(t1 * t2)

print(f"t1 =\n{t1}")
print(f"t2 =\n{t2}")

print("\nBefore backpropagation")
print(f"t1 gradient:\n{t1.grad}")
print(f"t2 gradient:\n{t2.grad}")
print('t3 = tanh(t1 * t2)')
print(f"t3 gradient:\n{t3.grad}")

# Upstream gradient
grad = nets.Tensor([[-1., 2.],
                    [-3., 4.]])

# Back-propagation
t3.sum(axis=1).backward(grad[0])

print("\nAfter backpropagation")
print(f"t1 gradient:\n{t1.grad}")
print(f"t2 gradient:\n{t2.grad}")
print('t3 = tanh(t1 * t2)')
print(f"t3 gradient:\n{t3.grad}")

Out:

t1 =
Tensor([[1., 3.],
        [5., 7.]], requires_grad=True)
t2 =
Tensor([[2., 4.],
        [6., 8.]], requires_grad=True)

Before backpropagation
t1 gradient:
Tensor([[0., 0.],
        [0., 0.]])
t2 gradient:
Tensor([[0., 0.],
        [0., 0.]])
t3 = tanh(t1 * t2)
t3 gradient:
Tensor([[0., 0.],
        [0., 0.]])

After backpropagation
t1 gradient:
Tensor([[-1.4130e-01, -6.0402e-10],
        [ 0.0000e+00,  0.0000e+00]])
t2 gradient:
Tensor([[-7.0651e-02, -4.5302e-10],
        [ 0.0000e+00,  0.0000e+00]])
t3 = tanh(t1 * t2)
t3 gradient:
Tensor([[-1., -1.],
        [ 2.,  2.]])
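
The first row of these gradients can be checked by hand. For \(t_3 = \tanh(t_1 \cdot t_2)\), the chain rule gives

\[\frac{\partial t_3}{\partial t_1} = \left(1 - \tanh^2(t_1 \cdot t_2)\right) \cdot t_2, \qquad \frac{\partial t_3}{\partial t_2} = \left(1 - \tanh^2(t_1 \cdot t_2)\right) \cdot t_1\]

For the top-left element, \(1 - \tanh^2(2) \approx 0.0707\), so with the upstream gradient \(-1\) we get \(t_1.grad \approx -0.1413\) and \(t_2.grad \approx -0.0707\), matching the output above. The bottom row is numerically zero because \(\tanh\) saturates for large products.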

Example

With the backward pass, you can minimize any basic function (a function that can be decomposed into elementary operations). As the autograd system records the gradients, you can determine the slope at a given point and adjust your inputs by a coefficient called the learning rate \(l_r\).
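
Concretely, each update is the plain gradient-descent step (which is what the SGD optimizer below performs with zero momentum):

\[x_{n+1} = x_n - l_r \, \frac{\partial f}{\partial x}(x_n, y_n), \qquad y_{n+1} = y_n - l_r \, \frac{\partial f}{\partial y}(x_n, y_n)\]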

This may be a lot of information and a lot of text, so let's try to visualize how we can use gradients to minimize a function instead.

The function we will try to minimize is the so-called three-hump camel function:

\[\forall x_1, x_2 \in \mathbb{R}, \quad f(x_1, x_2) = 2 x_1^2 - 1.05 x_1^4 + \frac{x_1^6}{6} + x_1 x_2 + x_2^2\]
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# Make data
camel = lambda x, y: (2*x**2 - 1.05*x**4 + ((x**6)/6) + x*y + y**2)
X = np.linspace(-1, 1, 100)
Y = np.linspace(-1, 1, 100)
X, Y = np.meshgrid(X, Y)
Z = camel(X, Y)

# Plot the surface
ax.view_init(30, 105)
ax.plot_surface(X, Y, Z, cmap='coolwarm', linewidth=0)
[Figure: 3D surface plot of the three-hump camel function (camelfunc.png)]

Let's minimize this function with NETS:

import random as rd
from nets.optim import SGD

# Random starting point for x
data = rd.uniform(-1, 1)
xn = nets.Parameter(data)
# Random starting point for y
data = rd.uniform(-1, 1)
yn = nets.Parameter(data)

# Learning rate
lr = 0.1
optimizer = SGD([xn, yn], lr=lr, momentum=0)
# Keep track of the loss
nets_history = []
nets_points_history = []

# Run the simulation 50 times
for i in range(50):
    # The gradients accumulate, so we need to clear them at each epoch
    optimizer.zero_grad()
    # outputs, can be seen as "predictions"
    zn = camel(xn, yn)
    # Compute the loss (sum of all values)
    # As the minimum is at (0, 0), the lower the loss, the closer we are to this minimum
    loss = zn.sum()  # is a 0-tensor
    # Get the gradients
    loss.backward()
    # Update the points
    optimizer.step()
    # Add the loss to the history and display the current loss in the console
    nets_history.append(loss.item())
    nets_points_history.append((xn.item(), yn.item(), zn.item()))
    print(f"\repoch: {i:4d} | loss: {loss.item():1.2E}", end="")

Out:

epoch:   49 | loss: 3.61E-09
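
Since the global minimum of the three-hump camel function is at (0, 0) and the final loss is close to zero, the optimized parameters should end up near the origin. A quick check, reusing the variables from the loop above:

# Final position after 50 steps of gradient descent
print(f"\nx = {xn.item():.4f}, y = {yn.item():.4f}")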
