In Fortran, MATLAB, and R, arrays are stored in column-major order, meaning that
entries which differ only in the left-most index are stored contiguously in
memory. This is often described as the left-most index varying quickest. This
differs from many other popular scientific programming languages, such as C,
C++, and Python, which use row-major ordering. The implication for writing
loops is that they should be nested so that the quickest-varying index belongs
to the innermost loop. That way, the loop nest traverses contiguous memory,
which the hardware can do efficiently.
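As a quick illustration of what column-major storage means in practice, the following sketch (program name and values are purely illustrative) uses `reshape`, which fills an array in array-element order, i.e. column by column:

```fortran
program layout_demo
  implicit none
  integer :: m(2,3)

  ! reshape fills in array-element (column-major) order:
  ! the source values run down each column before moving to the next
  m = reshape([1, 2, 3, 4, 5, 6], [2, 3])

  print *, m(1,1), m(2,1)   ! first column: 1 and 2, adjacent in memory
  print *, m(1,2), m(2,2)   ! second column: 3 and 4
  print *, m(1,3), m(2,3)   ! third column: 5 and 6
end program layout_demo
```

Note that `m(1,1)` and `m(2,1)` (same column, consecutive row index) sit next to each other in memory, whereas `m(1,1)` and `m(1,2)` are a whole column apart.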
Example Code
Consider the following Fortran example involving a loop over a 3D array, which
uses the recommended column-major ordering:
```fortran
integer, parameter :: n = 500
real, dimension(n, n, n) :: a, b, c
integer :: i, j, k

! Initialisation of a and b omitted

! Hand-coded c = a + b using column-major ordering
do k = 1, n
  do j = 1, n
    do i = 1, n
      c(i,j,k) = a(i,j,k) + b(i,j,k)
    end do
  end do
end do
```
Without compiler optimisations, this will likely run several times faster than
the (not recommended) equivalent with row-major loop ordering:
```fortran
integer, parameter :: n = 500
real, dimension(n, n, n) :: a, b, c
integer :: i, j, k

! Initialisation of a and b omitted

! Hand-coded c = a + b using row-major ordering
do i = 1, n
  do j = 1, n
    do k = 1, n
      c(i,j,k) = a(i,j,k) + b(i,j,k)
    end do
  end do
end do
```
With compiler optimisations enabled, however, simple loops like these, whose
results are never used, would likely be optimised away entirely by the
compiler. In practice your code would not normally contain a redundant loop, so
this should not be a concern.
A further comment on the loops above is that the same result can be achieved
with an array operation, which has a more compact notation:
```fortran
integer, parameter :: n = 500
real, dimension(n, n, n) :: a, b, c

! Initialisation of a and b omitted

! Compute c = a + b with an array operation
c(:,:,:) = a + b
```
Benchmark
Below, these examples have been extended into a full benchmark to demonstrate the impact. A checksum over the result is printed to ensure that the code being benchmarked is not optimised away by the compiler.
Benchmark Code
```fortran
program benchmark_loops
  implicit none

  integer, parameter :: n = 1000
  real, allocatable :: a(:,:,:), b(:,:,:), c(:,:,:)
  real :: t1, t2
  real :: checksum

  allocate(a(n,n,n), b(n,n,n), c(n,n,n))

  ! Initialise data
  a = 1.0
  b = 2.0
  c = 0.0

  ! Warm-up runs (optional but recommended)
  call col_major_add(a, b, c, n)
  call row_major_add(a, b, c, n)

  ! Benchmark column-major loop order
  call cpu_time(t1)
  call col_major_add(a, b, c, n)
  call cpu_time(t2)
  print *, "Column-major time (s): ", t2 - t1

  checksum = sum(c)
  print *, " checksum: ", checksum

  ! Reset result array to avoid reuse artifacts
  c = 0.0

  ! Benchmark row-major loop order
  call cpu_time(t1)
  call row_major_add(a, b, c, n)
  call cpu_time(t2)
  print *, "Row-major time (s): ", t2 - t1

  checksum = sum(c)
  print *, " checksum: ", checksum

contains

  subroutine col_major_add(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in) :: a(n,n,n), b(n,n,n)
    real, intent(out) :: c(n,n,n)
    integer :: i, j, k

    do k = 1, n
      do j = 1, n
        do i = 1, n
          c(i,j,k) = a(i,j,k) + b(i,j,k)
        end do
      end do
    end do
  end subroutine col_major_add

  subroutine row_major_add(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in) :: a(n,n,n), b(n,n,n)
    real, intent(out) :: c(n,n,n)
    integer :: i, j, k

    do i = 1, n
      do j = 1, n
        do k = 1, n
          c(i,j,k) = a(i,j,k) + b(i,j,k)
        end do
      end do
    end do
  end subroutine row_major_add

end program benchmark_loops
```
Compiled with GFortran with optimisations enabled, the column-major version ran 2.4x faster on an Ubuntu machine:
```
$ gfortran -O3 benchmark.f90
$ ./a.out
Column-major time (s): 3.98375320
 checksum: 67108864.0
Row-major time (s): 9.72134018
 checksum: 67108864.0
```
Under WSL on different hardware, a larger 7.1x speedup was seen. Results will vary between systems, but column-major ordering should remain the faster of the two.
```
$ gfortran -O3 benchmark.f90
$ ./a.out
Column-major time (s): 0.440809250
 checksum: 67108864.0
Row-major time (s): 3.11280060
 checksum: 67108864.0
```
Technical Details
The underlying reason that loop order has such a big impact is how computer memory operates. When a processor loads a variable from RAM into its caches, it does not load that variable alone (likely 4 bytes); it loads a full cache line (typically 64 bytes).
Therefore, variables stored consecutively in memory are loaded together when they fall within the same cache line. Subsequent accesses to those variables are served from cache, so the processor does not need to go back to RAM, which has orders of magnitude higher latency. Because of how Fortran lays out memory, column-major loop ordering achieves this.
In contrast, if you iterate as though memory were row-major, consecutively accessed variables are stored far apart in memory, so each individual access must load a full cache line from RAM. Due to this high turnover, by the time a second variable from any given cache line would be accessed, that line has likely already been evicted to make room for a fresher load.
If you're working with 4-byte types you could be performing 16x the RAM accesses; with 8-byte types it's still 8x!
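The arithmetic above can be checked with a short sketch. The 64-byte cache line size is an assumption typical of current x86 hardware (it is not exposed by the language), and `storage_size` is the Fortran 2008 intrinsic reporting a type's size in bits:

```fortran
program cache_line_demo
  implicit none
  integer, parameter :: line_bytes = 64   ! assumed cache line size
  integer, parameter :: n = 500
  integer :: real4_bytes, real8_bytes

  real4_bytes = storage_size(1.0) / 8     ! default real, typically 4 bytes
  real8_bytes = storage_size(1.0d0) / 8   ! double precision, typically 8 bytes

  ! How many elements one cache line holds
  print *, "4-byte reals per cache line:", line_bytes / real4_bytes   ! typically 16
  print *, "8-byte reals per cache line:", line_bytes / real8_bytes   ! typically 8

  ! Distance in memory between c(i,j,k) and c(i,j,k+1) for an n x n x n array:
  ! a full n x n plane, so a k-innermost loop jumps this far on every iteration
  print *, "inner k-loop stride (bytes):", n * n * real4_bytes
end program cache_line_demo
```

With `n = 500`, consecutive iterations of a k-innermost loop touch elements a megabyte apart, so every single access lands on a different cache line.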
Furthermore, for the compiler to perform vectorisation, it needs to apply vector instructions to contiguous runs of elements. If you are not operating on consecutive variables in memory, it can be much harder for the compiler to determine that vectorisation is appropriate.