NumPy Fundamentals

NumPy is a Python library for scientific computing that provides support for multi-dimensional arrays and matrices, as well as a large collection of mathematical functions to operate on these arrays. NumPy is widely used in data science, machine learning, and other scientific computing applications. In this tutorial, we will cover the basics of NumPy and its various functionalities.

Installing NumPy

NumPy can be installed using pip, the Python package installer. Open your command prompt or terminal and enter the following command to download and install the latest version of NumPy:

pip install numpy

Creating NumPy Arrays

NumPy arrays can be created in several ways. Here are some of the most common methods:

  • array(): Creates NumPy arrays from Python lists
  • zeros(): Creates NumPy arrays with all elements initialized to 0
  • ones(): Creates NumPy arrays with all elements initialized to 1
  • full(): Creates an array with all elements initialized to the same user-specified value
  • arange(): Creates NumPy arrays with a sequence of values (similar to Python's range() function; the stop value is excluded)
  • linspace(): Creates NumPy arrays with a given number of evenly spaced values between two endpoints (both included by default)
  • diag(): Creates a NumPy array with the provided list of numbers as the diagonal elements and zeros elsewhere
  • eye(): Creates a NumPy array with ones on a diagonal (the main diagonal by default) and zeros elsewhere
  • identity(): Creates a square identity matrix
  • random.rand(): Creates a NumPy array of random numbers sampled from a uniform distribution over [0, 1)
import numpy as np

print('1D Array from list: ', np.array([1, 2, 3]))
print('2D Array from list:\n', np.array([[1, 2], [3, 4]]))
print('1D Array of Zeros:', np.zeros(5))
print('2D Array of Zeros:\n', np.zeros((2, 5)))
print('Array of Ones:\n', np.ones((2, 5)))
print('Array of User-Specified Value:\n', np.full((2, 5), -1))
print('Array of Sequence:', np.arange(2, 15, 2))
print('Array of Evenly Spaced Elements:', np.linspace(-2, 3, num=5)) # 5 values from -2 to 3, both endpoints included
print('Array with diagonal Elements:\n', np.diag([1, 2, 3], k=0)) # k: diagonal offset (0 = main diagonal)
print('Array with Ones as Diagonal Elements:\n', np.eye(3, k=-1)) # k=-1: ones on the diagonal just below the main one
print('Identity Matrix:\n', np.identity(2))
print('Array with Random Numbers:\n', np.random.rand(2, 3))

Indexing and Slicing NumPy Arrays

NumPy arrays can be indexed and sliced in several ways depending on the number of dimensions.

Indexing:

Indexing follows a simple rule that we may have encountered before. Let’s give it a try.

  • 1D Array: array_name[idx] (Similar to Python list)
  • 2D Array: array_name[row_idx, col_idx] (Similar to Matrix)
  • ND Array: array_name[dim1_idx, dim2_idx, …, dimN_idx]

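OR, by chaining one index per dimension (this builds an intermediate array at each step, so the comma form above is usually preferred):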

  • 2D Array: array_name[row_idx][col_idx]
  • ND Array: array_name[dim1_idx][dim2_idx] … [dimN_idx]

Slicing:

Slicing follows the same rule as indexing. The only difference is that for each dimension we provide a range of indices and, optionally, a step size (similar to the arange() function discussed above). As with arange(), the start index is included and the stop index is excluded.

  • 1D Array: array_name[idx1:idx2:step]
  • 2D Array: array_name[row_idx1:row_idx2:step1, col_idx1:col_idx2:step2]
  • ND Array: array_name[dim1_idx1:dim1_idx2:step1, …, dimN_idx1:dimN_idx2:stepN]
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([[1, 2, 3, 4, 5], [9, 3, 7, 5, 6], [3, 4, 6, 2, 4]])

print('Indexing 1D array:', a[0])
print('Slicing 1D array:', a[0:4:2])
print('Indexing 2D array Approach I:', b[1, 1])
print('Indexing 2D array Approach II:', b[1][1])
print('Slicing 2D array:\n', b[0:3:2, 0:5])

Boolean or Mask Indexing:

Boolean or mask indexing is a powerful feature in NumPy that allows you to use boolean arrays (or masks) to select elements from another array. The basic idea is to create a boolean array that matches the shape of the array you want to select from (or just its leading dimensions, for example one entry per row), where each element of the boolean array indicates whether or not to select the corresponding element (or row) of the other array.

import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
mask = np.array([True, False, True])

b = a[mask]

print(b)

In this example, we have an array a with dimensions 3x2 and a 1D boolean mask of length 3, with one entry per row of a. We use the boolean mask to select the first and third rows of a. The resulting array b has dimensions 2x2.

You can also use boolean expressions to create more complex masks.

import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
mask = (a > 2)

b = a[mask]

print(b)

In this example, we create a boolean mask based on the condition (a > 2), which evaluates to a boolean array of the same shape as a with True values where the corresponding element of a is greater than 2 and False values otherwise. We use this mask to select all the elements of a that satisfy this condition. Because the selected elements need not form a rectangle, the result is a 1D array, here [3 4 5 6].

Boolean indexing is a powerful tool for selecting and manipulating elements of NumPy arrays based on arbitrary conditions. It is especially useful when working with large arrays where explicit iteration is inefficient.
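
To make the "selecting and manipulating" part concrete, here is a minimal sketch that combines two conditions and assigns through a mask. The variable names (evens_above_two, capped) are only illustrative and do not come from the examples above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and), | (or), ~ (not).
# Each condition needs its own parentheses.
mask = (a > 2) & (a % 2 == 0)
evens_above_two = a[mask]      # 1D array holding the selected elements

# Assigning through a mask changes only the selected elements.
capped = a.copy()              # work on a copy so a stays untouched
capped[capped > 4] = 4

print(evens_above_two)
print(capped)
```

Working on a copy is a choice made for this sketch; assigning to a[a > 4] directly would modify a in place.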

NumPy Array Properties and Methods

NumPy arrays have several useful properties and methods that can be used to manipulate them.

  • shape: Property that returns the dimensions of the array
  • ndim: Property that returns number of array dimensions
  • dtype: Property that returns data-type of the array’s elements
  • reshape(): Method that changes the dimensions of an array. The total number of elements must stay the same, so the product of the old dimensions must equal the product of the new ones
  • ravel(): Method that collapses all values into a single dimension and returns a view of the original array whenever possible (otherwise a copy)
  • flatten(): Method that collapses all values into a single dimension and always returns an independent copy of the original array
  • np.concatenate(): Function that joins two or more arrays along an existing axis
  • sum(): Method that returns the sum of all elements in an array, or the sums along a given axis
  • cumsum(): Method that returns the cumulative sum along a given axis
  • prod(): Method that computes the product along a given axis
  • max(): Method that returns the maximum along a given axis
  • argmax(): Method that returns the indices of the maximum values along a given axis
  • clip(): Method that limits values to a given range; values outside the range are set to the nearest bound
  • np.split(): Function that splits an array into multiple sub-arrays
  • np.vstack(): Function that stacks arrays vertically (row-wise)
  • np.hstack(): Function that stacks arrays horizontally (column-wise)
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print('Shape of Array:', a.shape)
print('Data type of Array:', a.dtype)
print('Array Dimension:', a.ndim)
print('Flattened Array:', a.flatten())
print('Sum of Array Elements:', a.sum())
print('Maximum values along given axis:', a.max(axis=0))
print('Indices of maximum values along given axis:', a.argmax(axis=0))
print('Reshaping Array...')
x = a.reshape((3, 2)) # Remember the products of new and old dimensions should match
print('New Shape of Array:', x.shape)
y = a.reshape(6)
print('Another possible Shape:', y.shape)
print()

m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])
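print('Concatenate Two (or more) Arrays Together:\n', np.concatenate((m, n), axis=None)) # axis=None flattens the inputs before joining
print('Split Array into 3 Sections:', np.split(y, 3))
print('Split Array using Indices:\n', np.split(y, [2, 5, 7])) # 7 is past the end of y, so the last sub-array is empty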
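
Several of the methods listed above (ravel(), cumsum(), prod(), clip(), np.vstack(), np.hstack()) do not appear in the example. The following sketch exercises them on small arrays; the names flat_copy and flat_view are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = a.flatten()   # always an independent copy
flat_view = a.ravel()     # a view here, because a is stored contiguously
flat_copy[0] = 100        # does not affect a
print('flatten shares memory with a:', np.shares_memory(a, flat_copy))
print('ravel shares memory with a:', np.shares_memory(a, flat_view))

print('Cumulative sum along each row:\n', a.cumsum(axis=1))
print('Product down each column:', a.prod(axis=0))
print('Values clipped to the range [2, 5]:\n', a.clip(2, 5))

m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])
print('Stacked vertically:\n', np.vstack((m, n)))     # shape (4, 2)
print('Stacked horizontally:\n', np.hstack((m, n)))   # shape (2, 4)
```

Because ravel() may return a view, writing to its result can change the original array; use flatten() when an independent copy is needed.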

Types and Type Conversion

NumPy supports a wide range of data types for arrays, including numeric, boolean, and string types. Each data type is identified by a unique character code, such as i for integers and f for floating-point numbers, and can have a specific size in bytes.

Here are some of the most common data types in NumPy:

  • bool: Boolean (True or False) stored as a byte.
  • int8, int16, int32, int64: Integer with a specific number of bits (8, 16, 32, or 64).
  • uint8, uint16, uint32, uint64: Unsigned integer with a specific number of bits (8, 16, 32, or 64).
  • float16, float32, float64: Floating-point number with a specific precision (half, single, or double precision).
  • complex64, complex128: Complex number with a specific precision (single or double precision).

You can create arrays of a specific data type using the dtype argument when you create the array.

import numpy as np

a = np.array([1, 2, 3], dtype=np.int32) # Create an array of integers
b = np.array([1.0, 2.0, 3.0], dtype=np.float64) # Create an array of floating-point numbers
c = np.array([True, False, True], dtype=np.bool_) # Create an array of booleans

print(a.dtype)
print(b.dtype)
print(c.dtype)

You can also convert the data type of an existing array using the astype() method, which returns a new array and leaves the original unchanged.

import numpy as np

# Convert an array of integers to floating-point numbers
a = np.array([1, 2, 3], dtype=np.int32)
b = a.astype(np.float64)

print(a.dtype)
print(b.dtype)
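
Two details about types are easy to miss: each dtype has a fixed size in bytes, and converting floats to integers with astype() drops the fractional part rather than rounding. A small sketch, with illustrative variable names:

```python
import numpy as np

x = np.array([1.9, -1.9, 2.5])
as_int = x.astype(np.int64)    # fractional part is dropped (truncation toward zero)
print(as_int)

small = np.array([1, 2, 3], dtype=np.int8)
print(small.itemsize, small.nbytes)   # bytes per element, total bytes
print(x.itemsize)                     # float64 uses 8 bytes per element
```

If rounding is what you want, apply a rounding function such as np.rint() before calling astype().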

NumPy Array Operations

NumPy provides a wide range of mathematical operations that can be performed on arrays. Let’s explore a few of them.

1. Element-wise operations:

Elementwise operations are operations performed independently on each element of an array. NumPy provides various functions for performing them.

To perform elementwise operations with NumPy arrays, you can use the basic arithmetic operators (+, -, *, /) or the equivalent functions (e.g., np.add(), np.subtract(), np.multiply(), np.divide()). Let’s look at a few functions that support elementwise operations in NumPy:

Arithmetic Operations:

  • np.add() or + operator: Addition
  • np.subtract() or - operator: Subtraction
  • np.multiply() or * operator: Multiplication
  • np.divide() or / operator: Division
  • np.floor_divide() or // operator: Floor Division
  • np.remainder() or % operator: Remainder (np.divmod() returns both the floor-division quotient and the remainder)
  • np.absolute(): Compute absolute value
  • np.sqrt(): Compute square root
  • np.square(): Compute square
  • np.power(): Compute nth power
  • np.sign(): Apply sign function (returns -1, 0, or 1)

Trigonometric Functions:

  • np.sin(): Sinusoidal Function
  • np.cos(): Cosine Function
  • np.tan(): Tangent Function

Hyperbolic Functions:

  • np.sinh(): Hyperbolic Sine Function
  • np.cosh(): Hyperbolic Cosine Function
  • np.tanh(): Hyperbolic Tangent Function

Rounding Operations:

  • np.around(): Round to the given number of decimals; values exactly halfway between two candidates are rounded to the nearest even value
  • np.round(): Equivalent to np.around()
  • np.rint(): Round to the nearest integer
  • np.fix(): Round to the nearest integer towards zero
  • np.floor(): Round down to the nearest integer (the floor of the number)
  • np.ceil(): Round up to the nearest integer (the ceiling of the number)

Exponents and Logarithms:

  • np.exp(): Compute Element-wise Exponential
  • np.expm1(): Compute ‘exp(x) - 1’ with greater precision for small values
  • np.exp2(): Compute ‘2**p’ for all elements p in the array
  • np.log(): Compute Element-wise Natural Logarithm
  • np.log10(): Compute Element-wise base 10 Logarithm
  • np.log2(): Compute Element-wise base 2 Logarithm

Here’s an example of how to perform elementwise operations with different operators and NumPy functions:

import numpy as np

a = np.array([4, 5, 6])
b = np.array([1, 2, 3])

print('Element-wise Operations using Basic Arithmetic Operators:')
c = a + b
d = a - b
e = a * b
f = a / b
g = a % b
h = a // b

print('Addition:', c)
print('Subtraction:', d)
print('Multiplication:', e)
print('Division:', f)
print('Modulo:', g)
print('Floor Division:', h)

print('\n')
print('Element-wise Operations using Numpy Functions:')
c = np.add(a, b)
d = np.subtract(a, b)
e = np.multiply(a, b)
f = np.divide(a, b)
g = np.remainder(a, b) # np.divmod(a, b) would return a (quotient, remainder) pair instead

print('Addition:', c)
print('Subtraction:', d)
print('Multiplication:', e)
print('Division:', f)
print('Modulo:', g)
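
The rounding, exponential, and logarithm functions listed earlier are not covered by the example. The sketch below shows how the rounding functions differ on the same inputs and checks that np.log2() undoes np.exp2(); the input values are chosen only for illustration.

```python
import numpy as np

x = np.array([-2.5, -1.5, 0.5, 1.5, 2.7])

print('around:', np.around(x))   # halfway cases go to the nearest even integer
print('fix:   ', np.fix(x))      # always toward zero
print('floor: ', np.floor(x))    # always down
print('ceil:  ', np.ceil(x))     # always up

p = np.array([0.0, 1.0, 3.0])
print('exp2:', np.exp2(p))                    # 2**p for each element
print('log2 of exp2:', np.log2(np.exp2(p)))   # recovers p

# For tiny inputs, expm1 avoids the precision loss of computing exp(x) - 1 directly
print('expm1 vs exp - 1:', np.expm1(1e-10), np.exp(1e-10) - 1)
```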

2. Matrix operations:

NumPy provides a large collection of matrix operations, such as dot product, transpose, and inverse. Some of the most common ones are listed below, followed by an example that applies each of them:

  • np.dot(): Calculates the dot product of two arrays (for 2D arrays, this is matrix multiplication)
  • np.transpose() or T Property: Returns the transpose of a matrix
  • np.trace(): Calculates the trace (sum along diagonal) of a matrix
  • np.linalg.det(): Calculates the determinant of a matrix
  • np.linalg.inv(): Calculates the inverse of a matrix
  • np.linalg.eig(): Calculates the eigenvalues and eigenvectors of a matrix
  • np.linalg.svd(): Calculates the singular value decomposition of a matrix
  • np.linalg.solve(): Solves a system of linear equations
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

c = np.dot(a, b)
d = np.transpose(a)
e = np.trace(a)
f = np.linalg.det(a)
g = np.linalg.inv(a)
h, i = np.linalg.eig(a) # h: eigenvalues, i: eigenvectors (one per column)
j = np.linalg.svd(a) # returns U, the singular values, and the transposed V

print('Dot Product of two matrices:\n', c)
print('Transpose of matrix a:\n', d)
print('Trace of matrix a:\n', e)
print('Determinant of matrix a:', f)
print('Inverse of matrix a:\n', g)
print('Eigenvalues of matrix a:\n', h)
print('Eigenvectors of matrix a (as columns):\n', i)
print('Singular Value Decomposition of matrix a:', j)

# Solve Equations: x + 2*y = 1 and 3*x + 5*y = 2
a = [[1, 2], [3, 5]]
b = [1, 2]
print('Solution to the Equations is:', np.linalg.solve(a, b))
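
A quick way to gain confidence in these results is to substitute them back. The sketch below re-solves the same system and checks the solution, the inverse, and the SVD factors with np.allclose(). It uses the @ operator, which performs the same matrix product as np.dot() for 2D arrays; the names A and rhs are illustrative.

```python
import numpy as np

# Same system as above: x + 2*y = 1 and 3*x + 5*y = 2
A = np.array([[1.0, 2.0], [3.0, 5.0]])
rhs = np.array([1.0, 2.0])

x = np.linalg.solve(A, rhs)
print('Solution:', x)
print('A @ x reproduces rhs:', np.allclose(A @ x, rhs))

# A matrix times its inverse should give the identity (up to rounding error)
print('A @ inv(A) is the identity:', np.allclose(A @ np.linalg.inv(A), np.eye(2)))

# svd returns U, the singular values S, and Vh (V transposed)
U, S, Vh = np.linalg.svd(A)
print('U @ diag(S) @ Vh reproduces A:', np.allclose(U @ np.diag(S) @ Vh, A))
```

Comparing with np.allclose() rather than == allows for the small rounding errors that floating-point linear algebra introduces.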

3. Broadcasting:

Broadcasting is a powerful feature in NumPy that allows arithmetic operations to be performed between arrays with different shapes and sizes. The basic idea behind broadcasting is to automatically align the dimensions of two arrays in such a way that arithmetic operations can be performed element-wise.

The rules of broadcasting are as follows:

  1. If the arrays have different numbers of dimensions, the shape of the array with fewer dimensions is padded with ones on its left until both shapes have the same length.
  2. If the sizes differ along a dimension and one of them is 1, the array with size 1 along that dimension is stretched to match the other array.
  3. If the sizes differ along a dimension and neither of them is 1, an error is raised.
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([10, 20])

c = a + b

print(c)

In this example, we have an array a with dimensions 2x2 and a 1D array b with two elements, i.e. shape (2,). Without broadcasting, we wouldn’t be able to perform element-wise addition between these two arrays because they have different shapes. By rule 1, b is treated as having shape 1x2, and by rule 2 its single row is stretched to match a. As a result, b is added to each row of a as if it were a 2x2 array with its row duplicated, i.e. b behaves like [[10, 20], [10, 20]].
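
The rules can also stretch both operands at once, and rule 3 can be seen by choosing incompatible shapes on purpose. A minimal sketch, with illustrative array names:

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,), treated as (1, 4)

table = col + row                   # both operands are stretched to shape (3, 4)
print(table.shape)
print(table)

# Shapes (2, 3) and (2,): the trailing sizes 3 and 2 differ and neither is 1
try:
    np.ones((2, 3)) + np.array([1, 2])
except ValueError as err:
    print('Broadcasting failed:', err)
```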

Structured Arrays

Structured arrays in NumPy are arrays whose elements are records made up of named fields, where each field can have its own data type. This allows you to create arrays that behave like structured data types or database tables.

A structured array is defined using a data type descriptor that specifies the layout of each element.

import numpy as np

# Define a data type with two fields: a string field and a float field
dt = np.dtype([('name', np.str_, 16), ('value', np.float64)])

# Create a structured array with three elements
a = np.array([('foo', 1.23), ('bar', 4.56), ('baz', 7.89)], dtype=dt)

# Access individual fields using the field name
print(a['name'])
print(a['value'])

In this example, we define a data type with two fields: a string field called ‘name’ with a maximum length of 16 characters, and a float field called ‘value’. We then create a structured array with three elements, where each element consists of a string and a float.

You can access individual fields of a structured array using the field name as a key. You can also access individual elements of the array using standard indexing syntax.
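
As a rough illustration of both access styles, the sketch below rebuilds the same array, reads one record by position, filters records using a field, and updates a field in place. The names first and big are illustrative.

```python
import numpy as np

dt = np.dtype([('name', np.str_, 16), ('value', np.float64)])
a = np.array([('foo', 1.23), ('bar', 4.56), ('baz', 7.89)], dtype=dt)

first = a[0]                      # a single record
print(first['name'], first['value'])

big = a[a['value'] > 2.0]         # boolean mask built from one field
print(big['name'])

a['value'] *= 10                  # field access gives a view, so this updates a
print(a['value'])
```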

Structured arrays are a powerful tool for working with heterogeneous data in NumPy. They allow you to store and manipulate data with different data types in a convenient and efficient way.

In general, the decision to use structured arrays should be based on the specific requirements of your project and the characteristics of your data. If you have data with multiple fields or columns, and you need to perform database-like operations or store metadata, then a structured array may be the best choice. However, if your data has a simple structure or you need to perform complex calculations, then a regular array or another data structure may be more appropriate.

Why NumPy over Python for Loops?

NumPy is faster than Python for loops because it performs operations on entire arrays in compiled code rather than interpreting one element at a time. This allows it to take advantage of lower-level optimizations such as vectorization and cache-friendly access to contiguous memory. Here’s an example that illustrates the performance difference between NumPy and Python for loops:

import numpy as np
import time

# Using NumPy for element-wise multiplication
a = np.random.rand(1000000)
b = np.random.rand(1000000)

start = time.perf_counter() # perf_counter() is a high-resolution clock intended for timing
c = a * b
end = time.perf_counter()
numpy_execution_time = end - start
print("Time taken by NumPy: {:.6f} seconds".format(numpy_execution_time))

# Using Python for loop for element-wise multiplication
a = list(a)
b = list(b)
c = []

start = time.perf_counter()
for i in range(len(a)):
    c.append(a[i] * b[i])
end = time.perf_counter()
for_loop_execution_time = end - start
print("Time taken by Python for loop: {:.6f} seconds".format(for_loop_execution_time))

print(f'NumPy is {int(for_loop_execution_time/numpy_execution_time)} times faster than Python for loop!')

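The gap is most visible on large arrays; for very small arrays, NumPy’s per-call overhead can outweigh the benefit. The exact ratio depends on your hardware and the array size, so treat a single run as a rough indication rather than a benchmark.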
