NumPy Fundamentals
NumPy is a Python library for scientific computing that provides support for multi-dimensional arrays and matrices, as well as a large collection of mathematical functions to operate on these arrays. NumPy is widely used in data science, machine learning, and other scientific computing applications. In this tutorial, we will cover the basics of NumPy and its various functionalities.
Installing NumPy
NumPy can be installed using pip, the Python package installer. Open your command prompt or terminal and enter the following command to download and install the latest version of NumPy:
pip install numpy
Creating NumPy Arrays
NumPy arrays can be created in several ways. Here are some of the most common methods:
- array(): Creates NumPy arrays from Python lists
- zeros(): Creates NumPy arrays with all elements initialized to 0
- ones(): Creates NumPy arrays with all elements initialized to 1
- full(): Creates an array with all elements initialized to the same user-specified value
- arange(): Creates NumPy arrays with a sequence of values (similar to Python's range() function)
- linspace(): Creates NumPy arrays with a sequence of evenly spaced values
- diag(): Creates a NumPy array with the provided list of numbers as the diagonal elements and zeros elsewhere
- eye(): Creates a NumPy array with ones on the diagonal and zeros elsewhere
- identity(): Creates an identity matrix
- random.rand(): Creates a NumPy array of random numbers sampled from a uniform distribution over [0, 1)
import numpy as np
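The creation functions listed above can be sketched as follows. This is a minimal illustration, not a complete listing; the shapes, values, and variable names are chosen here for demonstration.

```python
import numpy as np

# From a Python list
a = np.array([1, 2, 3])

# Constant-filled arrays; the shape is passed as a tuple
z = np.zeros((2, 3))          # 2x3 array of 0.0
o = np.ones((2, 3))           # 2x3 array of 1.0
f = np.full((2, 2), 7)        # 2x2 array filled with 7

# Sequences
r = np.arange(0, 10, 2)       # start, stop (exclusive), step: [0 2 4 6 8]
lin = np.linspace(0, 1, 5)    # 5 evenly spaced values, both endpoints included

# Diagonal and identity matrices
d = np.diag([1, 2, 3])        # 3x3 with 1, 2, 3 on the diagonal
e = np.eye(2, 3)              # 2x3 with ones on the main diagonal
ident = np.identity(3)        # 3x3 identity matrix

# Random values drawn uniformly from [0, 1)
u = np.random.rand(2, 3)
```

Note that eye() accepts a non-square shape, while identity() always returns a square matrix.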
Indexing and Slicing NumPy Arrays
NumPy arrays can be indexed and sliced in several ways depending on the number of dimensions.
Indexing:
Indexing follows a simple rule that we may have encountered before. Let’s give it a try.
- 1D Array: array_name[idx] (Similar to Python list)
- 2D Array: array_name[row_idx, col_idx] (Similar to Matrix)
- ND Array: array_name[dim1_idx, dim2_idx, …, dimN_idx]
OR (chained indexing, which selects the same element one dimension at a time):
- 2D Array: array_name[row_idx][col_idx]
- ND Array: array_name[dim1_idx][dim2_idx] … [dimN_idx]
Slicing:
Slicing follows the same rule as indexing. The only difference is that for each dimension we provide a range of indices and, optionally, a step size (similar to the arange() function discussed above).
- 1D Array: array_name[idx1:idx2:step]
- 2D Array: array_name[row_idx1:row_idx2:step1, col_idx1:col_idx2:step2]
- ND Array: array_name[dim1_idx1:dim1_idx2:step1, …, dimN_idx1:dimN_idx2:stepN]
import numpy as np
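The indexing and slicing forms above can be sketched as follows. The arrays v and m are illustrative values chosen here, not taken from the original listing.

```python
import numpy as np

v = np.arange(10)                  # [0 1 2 ... 9]
m = np.arange(12).reshape(3, 4)    # rows: [0 1 2 3], [4 5 6 7], [8 9 10 11]

# Indexing
first = v[0]           # 0
last = v[-1]           # 9 (negative indices count from the end)
elem = m[1, 2]         # 6 (row 1, column 2)
same = m[1][2]         # 6 (chained form)

# Slicing: start:stop:step, with stop excluded
evens = v[0:10:2]      # [0 2 4 6 8]
block = m[0:2, 1:3]    # [[1 2], [5 6]]
col = m[:, 0]          # first column: [0 4 8]
rev = v[::-1]          # reversed copy of the order: [9 8 ... 0]
```

Basic slices such as block are views onto the original data, so assigning to block would also change m.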
Boolean or Mask Indexing:
Boolean or mask indexing is a powerful feature in NumPy that allows you to use boolean arrays (or masks) to select elements from another array. The basic idea behind boolean indexing is to create a boolean array with the same shape as the array you want to select from, where each element of the boolean array corresponds to whether or not you want to select the corresponding element from the other array.
import numpy as np
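A minimal sketch of row selection with a boolean mask is shown here. The values in a are chosen for illustration, and the mask is assumed to be one-dimensional with one flag per row.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4],
              [5, 6]])                     # shape (3, 2)

mask = np.array([True, False, True])       # one flag per row of a

b = a[mask]                                # keeps rows 0 and 2
print(b)
# [[1 2]
#  [5 6]]
```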
In this example, we have an array a with dimensions 3x2 and a one-dimensional boolean mask with 3 elements (one per row). We use the boolean mask to select the first and third rows of a. The resulting array b has dimensions 2x2.
You can also use boolean expressions to create more complex masks.
import numpy as np
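A minimal sketch of a condition-based mask follows. The array values are illustrative, and the combined condition at the end is an extra example added here.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4],
              [5, 6]])

mask = a > 2            # boolean array with the same shape as a
selected = a[mask]      # 1-D array of the matching elements: [3 4 5 6]

# Conditions combine with & (and), | (or), ~ (not); the parentheses are required
even_and_big = a[(a > 2) & (a % 2 == 0)]   # [4 6]
```

Selecting with a mask of the same shape always returns a one-dimensional array, because the matching elements need not form a rectangle.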
In this example, we create a boolean mask based on the condition (a > 2), which evaluates to a boolean array of the same shape as a with True values where the corresponding element of a is greater than 2 and False values otherwise. We use this mask to select all the elements of a that satisfy this condition.
Boolean indexing is a powerful tool for selecting and manipulating elements of NumPy arrays based on arbitrary conditions. It is especially useful when working with large arrays where explicit iteration is inefficient.
NumPy Array Properties and Methods
NumPy arrays have several useful properties and methods that can be used to manipulate them.
- shape: Property that returns the dimensions of the array
- ndim: Property that returns the number of array dimensions
- dtype: Property that returns the data type of the array’s elements
- reshape(): Method to change the dimensions of an array. The product of the old dimensions must match the product of the new dimensions
- ravel(): Method that collapses all values into a single axis or dimension and returns a view of the original array whenever possible
- flatten(): Method that collapses all values into a single axis or dimension and returns an independent copy of the original array
- np.concatenate(): Function to concatenate two or more arrays
- sum(): Method that returns the sum of all elements in an array, or the sum along a given axis
- cumsum(): Method that returns the cumulative sum over a given axis
- prod(): Method that computes the product over a given axis
- max(): Method that returns the maximum along a given axis
- argmax(): Method that returns the indices of the maximum values along a given axis
- clip(): Method that limits values outside a range to the range’s boundary values
- np.split(): Function to split an array into multiple sub-arrays
- np.vstack(): Function to stack arrays vertically
- np.hstack(): Function to stack arrays horizontally
import numpy as np
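The properties and methods above can be sketched on a small array as follows. The values and variable names are illustrative.

```python
import numpy as np

a = np.arange(6)                 # [0 1 2 3 4 5]
m = a.reshape(2, 3)              # [[0 1 2], [3 4 5]]

print(m.shape, m.ndim)           # (2, 3) 2
print(m.dtype)                   # an integer dtype; the exact width depends on the platform

flat_view = m.ravel()            # a view here, since m is contiguous
flat_copy = m.flatten()          # always an independent copy
flat_copy[0] = 99                # does not affect m

total = m.sum()                  # 15
col_sums = m.sum(axis=0)         # [3 5 7]
running = a.cumsum()             # [0 1 3 6 10 15]
row_products = m.prod(axis=1)    # [0 60]
row_max = m.max(axis=1)          # [2 5]
row_argmax = m.argmax(axis=1)    # [2 2]
clipped = a.clip(1, 4)           # [1 1 2 3 4 4]

joined = np.concatenate([a, a])  # length 12
parts = np.split(a, 3)           # three arrays of length 2
v = np.vstack([a, a])            # shape (2, 6)
h = np.hstack([a, a])            # shape (12,)
```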
Types and Type Conversion
NumPy supports a wide range of data types for arrays, including numeric, boolean, and string types. Each data type is identified by a unique character code, such as i for integers and f for floating-point numbers, and can have a specific size in bytes.
Here are some of the most common data types in NumPy:
- bool: Boolean (True or False) stored as a byte.
- int8, int16, int32, int64: Integer with a specific number of bits (8, 16, 32, or 64).
- uint8, uint16, uint32, uint64: Unsigned integer with a specific number of bits (8, 16, 32, or 64).
- float16, float32, float64: Floating-point number with a specific precision (half, single, or double precision).
- complex64, complex128: Complex number with a specific precision (single or double precision).
You can create arrays of a specific data type using the dtype argument when you create the array.
import numpy as np
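A minimal sketch of the dtype argument follows. The values and variable names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int32)
floats = np.array([1, 2, 3], dtype=np.float64)
flags = np.array([0, 1, 2], dtype=bool)     # nonzero values become True
small = np.zeros(4, dtype="uint8")          # the dtype can also be given as a string name

print(ints.dtype, ints.itemsize)       # int32 4
print(floats.dtype, floats.itemsize)   # float64 8
```

The itemsize property reports the number of bytes used per element, which is how the bit widths in the list above translate into memory use.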
You can also convert the data type of an existing array using the astype() method:
# Convert an array of integers to floating-point numbers
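A minimal sketch of astype() follows, using illustrative values. The float-to-integer conversion is an extra case added here to show that the fractional part is dropped rather than rounded.

```python
import numpy as np

# Convert an array of integers to floating-point numbers
ints = np.array([1, 2, 3])
as_float = ints.astype(np.float64)     # [1. 2. 3.]

# Converting floats to integers truncates toward zero
floats = np.array([1.7, -1.7, 2.2])
as_int = floats.astype(np.int64)       # [1 -1 2]

# astype() returns a new array; the original keeps its dtype
print(ints.dtype.kind, as_float.dtype)   # i float64
```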
NumPy Array Operations
NumPy provides a wide range of mathematical operations that can be performed on arrays. Let’s explore a few of them.
1. Element-wise operations:
Elementwise operations are operations performed independently on each element of an array. To perform elementwise operations with NumPy arrays, you can use the basic arithmetic operators (+, -, *, /) or the equivalent functions (e.g., np.add(), np.subtract(), np.multiply(), np.divide()). Let’s look into a few functions that support elementwise operations with NumPy:
Arithmetic Operations:
- np.add() or + operator: Addition
- np.subtract() or - operator: Subtraction
- np.multiply() or * operator: Multiplication
- np.divide() or / operator: Division
- np.floor_divide() or // operator: Floor division
- np.remainder() or % operator: Remainder (np.divmod() returns the floor-division quotient and the remainder together)
- np.absolute(): Get the absolute value
- np.sqrt(): Compute the square root
- np.square(): Compute the square
- np.power(): Compute the nth power
- np.sign(): Apply the sign function
Trigonometric Functions:
- np.sin(): Sine function
- np.cos(): Cosine function
- np.tan(): Tangent function
Hyperbolic Functions:
- np.sinh(): Hyperbolic sine function
- np.cosh(): Hyperbolic cosine function
- np.tanh(): Hyperbolic tangent function
Rounding Operations:
- np.around(): Round to the given number of decimals (an alias of np.round(); values exactly halfway between two candidates are rounded to the nearest even value)
- np.round(): Round to the given number of decimals
- np.rint(): Round to the nearest integer
- np.fix(): Round to the nearest integer towards zero
- np.floor(): Round down to the nearest integer
- np.ceil(): Round up to the nearest integer
Exponents and Logarithms:
- np.exp(): Compute the element-wise exponential
- np.expm1(): Compute ‘exp(x) - 1’ with greater precision for small values
- np.exp2(): Compute ‘2**p’ for all elements p in the array
- np.log(): Compute the element-wise natural logarithm
- np.log10(): Compute the element-wise base 10 logarithm
- np.log2(): Compute the element-wise base 2 logarithm
Here’s an example of how to perform elementwise operations with different operators and numpy functions:
import numpy as np
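A minimal sketch of a few of these elementwise operations follows. The input values are chosen here so that the results are easy to check by hand.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([2.0, 2.0, 2.0])

added = x + y                  # same as np.add(x, y): [3. 6. 11.]
divided = x / y                # [0.5 2. 4.5]
floored = x // y               # [0. 2. 4.]
rem = x % y                    # [1. 0. 1.]
roots = np.sqrt(x)             # [1. 2. 3.]
powers = np.power(x, 2)        # [1. 16. 81.]

angles = np.array([0.0, np.pi / 2])
sines = np.sin(angles)         # approximately [0. 1.]

rounded = np.round(np.array([1.234, 5.678]), 2)   # [1.23 5.68]
toward_zero = np.fix(np.array([-1.7, 1.7]))       # [-1. 1.]
logs = np.log(np.exp(x))       # recovers x up to rounding error
```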
2. Matrix operations:
NumPy provides a large collection of matrix operations, such as dot product, transpose, and inverse. Some of the most commonly used ones are:
- np.dot(): Calculates the dot product of two matrices
- np.transpose() or the T property: Calculates the transpose of a matrix
- np.trace(): Calculates the trace (sum along the diagonal) of a matrix
- np.linalg.det(): Calculates the determinant of a matrix
- np.linalg.inv(): Calculates the inverse of a matrix
- np.linalg.eig(): Calculates the eigenvalues and eigenvectors of a matrix
- np.linalg.svd(): Calculates the singular value decomposition of a matrix
- np.linalg.solve(): Solves a system of linear equations
import numpy as np
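The matrix operations above can be sketched on two small matrices as follows. The matrices A and B and the right-hand side rhs are illustrative values chosen here.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])

product = np.dot(A, B)          # matrix product; A @ B is equivalent for 2-D arrays
transposed = A.T
trace = np.trace(A)             # 5.0
det = np.linalg.det(A)          # 5.0 up to rounding error
inverse = np.linalg.inv(A)
eigenvalues, eigenvectors = np.linalg.eig(A)
U, S, Vt = np.linalg.svd(A)     # A equals U @ diag(S) @ Vt up to rounding error

# Solve the linear system A x = rhs
rhs = np.array([3.0, 5.0])
x = np.linalg.solve(A, rhs)     # [0.8 1.4]
```

For solving linear systems, np.linalg.solve() is generally preferable to multiplying by the explicit inverse, since it avoids forming the inverse at all.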
3. Broadcasting:
Broadcasting is a powerful feature in NumPy that allows arithmetic operations to be performed between arrays with different shapes and sizes. The basic idea behind broadcasting is to automatically align the dimensions of two arrays in such a way that arithmetic operations can be performed element-wise.
The rules of broadcasting are as follows:
- If the arrays have different numbers of dimensions, the shape of the array with fewer dimensions is padded with ones on its left until both shapes have the same length.
- If the sizes of the two arrays differ along a dimension and one of them has size 1 along that dimension, that array is stretched to match the size of the other.
- If the sizes differ along any dimension and neither of them is 1, an error is raised.
import numpy as np
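A minimal sketch of broadcasting a 1x2 array against a 2x2 array follows. The values in a are chosen for illustration, and the incompatible-shape case at the end is an extra example added here to show the third rule.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])        # shape (2, 2)
b = np.array([[10, 20]])      # shape (1, 2)

c = a + b                     # b is stretched along the first axis
print(c)
# [[11 22]
#  [13 24]]

# Shapes (2, 3) and (2,) are incompatible: the trailing sizes are 3 and 2
try:
    np.ones((2, 3)) + np.ones(2)
    raised = False
except ValueError:
    raised = True
```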
In this example, we have an array a with dimensions 2x2 and an array b with dimensions 1x2. Without broadcasting, we wouldn’t be able to perform element-wise addition between these two arrays because they have different shapes. However, NumPy’s broadcasting feature allows us to add b to each row of a as if b were a 2x2 array with its rows duplicated, i.e. b behaves like [[10, 20], [10, 20]].
Structured Arrays
Structured arrays in NumPy are arrays whose elements are records made up of named fields, where each field can have its own data type. This allows you to create arrays that behave like structured data types or database tables.
A structured array is defined using a data type descriptor that specifies the layout of each element.
import numpy as np
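A minimal sketch of defining a structured array with a string field and a float field follows. The Unicode type code U16, the float code f8, and the three records are assumptions made here for illustration.

```python
import numpy as np

# Two fields: a Unicode string of up to 16 characters and a 64-bit float
dt = np.dtype([("name", "U16"), ("value", "f8")])

# Each element is given as a tuple with one entry per field
data = np.array([("alpha", 1.5), ("beta", 2.5), ("gamma", 3.5)], dtype=dt)

print(data.dtype.names)   # ('name', 'value')
print(data.shape)         # (3,)
```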
In this example, we define a data type with two fields: a string field called ‘name’ with a maximum length of 16 characters, and a float field called ‘value’. We then create a structured array with three elements, where each element consists of a string and a float.
You can access individual fields of a structured array using the field name as a key. You can also access individual elements of the array using standard indexing syntax.
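Field access and element access can be sketched as follows. The data type and records are illustrative assumptions, and the filter at the end is an extra example added here.

```python
import numpy as np

dt = np.dtype([("name", "U16"), ("value", "f8")])
data = np.array([("alpha", 1.5), ("beta", 2.5), ("gamma", 3.5)], dtype=dt)

names = data["name"]              # all values of one field, as a regular array
values = data["value"]            # [1.5 2.5 3.5]

second = data[1]                  # one record
second_value = second["value"]    # 2.5

# Fields work with boolean masks like any other array
large = data[data["value"] > 2.0]
print(large["name"])              # ['beta' 'gamma']
```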
Structured arrays are a powerful tool for working with heterogeneous data in NumPy. They allow you to store and manipulate data with different data types in a convenient and efficient way.
In general, the decision to use structured arrays should be based on the specific requirements of your project and the characteristics of your data. If you have data with multiple fields or columns, and you need to perform database-like operations or store metadata, then a structured array may be the best choice. However, if your data has a simple structure or you need to perform complex calculations, then a regular array or another data structure may be more appropriate.
Why NumPy over Python for Loop?
NumPy is faster than Python for loops because it is designed to perform operations on entire arrays rather than individual elements. The per-element work runs in compiled code over contiguous memory, which allows it to take advantage of lower-level optimizations such as vectorization and caching. Here’s an example that illustrates the performance difference between NumPy and Python for loops:
import numpy as np
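A minimal timing sketch follows, assuming a simple doubling of one million integers as the workload. The array size and variable names are chosen here for illustration, and the measured times will vary by machine.

```python
import time
import numpy as np

n = 1_000_000
py_values = list(range(n))     # plain Python list
np_values = np.arange(n)       # equivalent NumPy array

# Python loop (written as a list comprehension)
start = time.perf_counter()
loop_result = [v * 2 for v in py_values]
loop_seconds = time.perf_counter() - start

# Vectorized NumPy operation
start = time.perf_counter()
vec_result = np_values * 2
vec_seconds = time.perf_counter() - start

print(f"Python loop: {loop_seconds:.4f} s")
print(f"NumPy:       {vec_seconds:.4f} s")
```

Both approaches produce the same values; only the time taken differs. For more stable measurements, the timeit module from the standard library repeats each statement several times.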
This performance difference becomes even more significant as the size of the arrays increases.