Creation and basic analysis of arrays - Crash course in Data Science with Python

import numpy as np

NumPy documentation¶

Note: NumPy has very good and extensive documentation, which you can find at https://numpy.org/doc/stable/. If you need further details about NumPy arrays, you can always refer to it.

Multiple ways of creating arrays¶

We have seen that we can turn regular lists into arrays with the array() function. However, this quickly becomes impractical for larger arrays. NumPy offers several functions to create particular arrays.

Common simple arrays¶

For example an array full of zeros or ones:

one_array = np.ones(shape=(2,3))
one_array

zero_array = np.zeros(shape=(2,3))
zero_array

We see here that the two functions take a shape argument that describes the shape of the output array. Indeed, arrays are not limited to flat lists: a 2D array behaves like a list of lists!

Given an existing array, we can use its shape attribute to find out its shape:

zero_array.shape

Let’s check the dtype:

one_array.dtype

By default, NumPy creates float arrays. As before, we can adjust this with the astype method if needed.
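As a minimal sketch of the dtype adjustment mentioned above: astype returns a converted copy, and np.ones and np.zeros also accept an optional dtype argument at creation time. The variable names one_int and zero_uint8 are only illustrative.

```python
import numpy as np

one_array = np.ones(shape=(2, 3))
print(one_array.dtype)            # float64

# astype returns a new array; the original is left unchanged
one_int = one_array.astype(np.int32)
print(one_int.dtype)              # int32
print(one_array.dtype)            # float64

# alternatively, request the dtype directly at creation
zero_uint8 = np.zeros(shape=(2, 3), dtype=np.uint8)
print(zero_uint8.dtype)           # uint8
```

Passing dtype at creation avoids building a float array first, which can matter for large arrays.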

Complex arrays¶

We are not limited to creating arrays containing ones or zeros. Very common operations involve e.g. the creation of arrays containing regularly arranged numbers. For example a “from-to-by-step” list:

arranged_array = np.arange(0, 10, 2)
arranged_array

Here also, we can find out what the shape of the array is as we didn’t specify it explicitly. Since it’s a 1D array we only get one value out:

arranged_array.shape

Or a certain number of equidistant values between boundaries:

np.linspace(0, 1, 10)
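The two functions differ in how they treat the upper boundary, which is easy to overlook. The following sketch compares them on small inputs; the variable names are illustrative.

```python
import numpy as np

# arange: start, stop (excluded), step
by_step = np.arange(0, 10, 2)
print(by_step)                    # [0 2 4 6 8]

# linspace: start, stop (included by default), number of points
by_count = np.linspace(0, 1, 5)
print(by_count)                   # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False excludes the stop value, like arange does
print(np.linspace(0, 1, 5, endpoint=False))   # [0.  0.2 0.4 0.6 0.8]
```

In short, arange is convenient when the step size is known, and linspace when the number of points is known.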

Statistical arrays (optional)¶

NumPy also offers many options to create arrays filled with numbers drawn from a given distribution. Most of these functions are located in a sub-module called np.random. For example, we can draw numbers from a Poisson distribution $P(x = k, \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$:

poisson = np.random.poisson(lam=5, size=20)
poisson
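Because the numbers are random, the output above changes on every run. One way to get reproducible draws, sketched here as an alternative to the call above, is a seeded generator created with np.random.default_rng. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# two generators with the same seed produce the same sequence
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

draws_a = rng_a.poisson(lam=5, size=20)
draws_b = rng_b.poisson(lam=5, size=20)
print(np.array_equal(draws_a, draws_b))   # True

# for a Poisson distribution, mean and variance are both close to lam
many = rng_a.poisson(lam=5, size=100_000)
print(many.mean(), many.var())
```

With a large sample, both printed values should lie close to 5, the value of lam.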

Loading images as arrays and displaying them¶

In the previous chapters, we have learned about NumPy arrays in general. These structures can hold any type of data, but they are particularly well suited to storing image data. Indeed, in many fields such as biomedical imaging or satellite imagery, multi-dimensional data are acquired that can be easily processed as NumPy arrays.

Importing data¶

There are many libraries to open image data. Some are more general and some dedicated to specific fields with specific image formats. However, most of them have in common that they import those various image formats as Numpy arrays.

Here we use the io module of scikit-image, a general-purpose image processing library for Python. We import an image directly from the internet, but any local image can also be opened by using its path.

import skimage.io

Here we have a fluorescence microscopy image of Saccharomyces cerevisiae with a signal accumulating in the vacuoles.

image = skimage.io.imread('https://cildata.crbs.ucsd.edu/media/images/13901/13901.tif')

Printing the image will only show the values on the edges of the image, since there would be too many values to display. However, we can already see that the image is a 2D array of dtype uint16.

image

Let’s confirm that it was imported as a Numpy array:

type(image)

We see above that the tif file was indeed imported as a NumPy array. In addition, we see that the pixels are stored as unsigned (i.e., non-negative) 16-bit integers, a common format for images.

We can now check how many pixels and dimensions we have:

image.shape
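Besides shape, arrays expose a few related attributes. The sketch below uses a synthetic stand-in array (all zeros) with the shape (1024, 1360) and dtype uint16 of the image described in this chapter, so it runs without downloading anything; the name stand_in is illustrative.

```python
import numpy as np

# synthetic stand-in with the same shape and dtype as the image in the text
stand_in = np.zeros(shape=(1024, 1360), dtype=np.uint16)

print(stand_in.ndim)     # 2        number of dimensions
print(stand_in.shape)    # (1024, 1360)
print(stand_in.size)     # 1392640  total number of pixels
print(stand_in.nbytes)   # 2785280  memory used: 2 bytes per uint16 pixel
```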

Displaying arrays as images¶

The shape tells us that our array has 1024 rows and 1360 columns but no other dimensions. So we have a plain 2D grayscale image. We can use the imshow function from the pyplot module of matplotlib to display 2D arrays as images:

from matplotlib import pyplot as plt

plt.imshow(image, cmap='gray')

In addition to the image input we also used an optional parameter called cmap. It allows us to set a certain colormap, here a gray one. You can find more here: https://matplotlib.org/stable/tutorials/colors/colormaps.html#sequential

Analysing arrays/images (aggregating functions)¶

There are different types of images when it comes to pixel values. Common ranges are 0-255 for 8-bit images, 0-65535 for 16-bit images and 0-1 for float images. In our case we have a 16-bit image, so the pixel values can range from 0 to 65535, although the values actually present may cover only part of that range.

We can check this using the aggregating functions min(), max() and mean(), as well as dtype:

print("Mean:", np.mean(image))
print("Range:", np.min(image), "-", np.max(image))
print("type:", image.dtype)
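The nominal range of an integer dtype, as opposed to the values actually present in an array, can be looked up with np.iinfo. A small sketch for the two unsigned integer types mentioned above:

```python
import numpy as np

# np.iinfo reports the smallest and largest value an integer dtype can hold
for dtype in (np.uint8, np.uint16):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.min, info.max)
# uint8 0 255
# uint16 0 65535
```

Comparing these limits with the measured minimum and maximum shows how much of the available range an image actually uses.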

We can also get an impression of the pixel values by plotting a histogram of the pixel values. This is a very common operation in image analysis. Note that we use the flatten() method to turn the 2D array into a 1D array, so that we can plot the histogram of all pixel values at once:

plt.hist(image.flatten(), bins=100)
plt.show()
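To inspect what flatten() does and to get the histogram counts as numbers rather than as a plot, np.histogram can be used. The sketch below works on a small synthetic array instead of the real image; fake_image and the seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# synthetic stand-in for an image: small 2D array of 16-bit values
fake_image = rng.integers(low=0, high=1000, size=(4, 5), dtype=np.uint16)

# flatten turns the 2D array into a 1D array with the same number of values
flat = fake_image.flatten()
print(fake_image.shape, "->", flat.shape)   # (4, 5) -> (20,)

# np.histogram returns the counts per bin and the bin edges
counts, bin_edges = np.histogram(flat, bins=10)
print(counts.sum())        # 20, every pixel falls into exactly one bin
print(len(bin_edges))      # 11, one more edge than there are bins
```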

As the image is an array, we can perform operations on all of its values at once. For example, we can easily convert the image to a float array and normalize the pixel values to a range between 0 and 1:

image_float = image - image.min() # make the values start at 0
image_float = image_float / image_float.max() # normalize to 1
image_float = image_float.astype(np.float32) # convert to float32

print("Mean:", np.mean(image_float))
print("Range:", np.min(image_float), "-", np.max(image_float))

Let’s check the pixel values:

image_float

As we can see in the histogram below, the pixel values are now between 0 and 1, but the relative distribution has stayed the same.

plt.hist(image_float.flatten(), bins=100)
plt.show()
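The normalization steps above can be wrapped in a small reusable function. The helper normalize_01 below is hypothetical (it is not part of NumPy), converts to float before subtracting rather than after, and adds a guard for constant images, which the original steps do not handle. It is a sketch on a tiny synthetic array, not a definitive implementation.

```python
import numpy as np

def normalize_01(array):
    """Rescale an array linearly so its values span 0 to 1 (hypothetical helper)."""
    # convert to float first, then shift so the minimum becomes 0
    shifted = array.astype(np.float32) - array.min()
    value_range = shifted.max()
    if value_range == 0:
        # constant input: nothing to scale, return all zeros
        return np.zeros_like(shifted)
    return shifted / value_range

small = np.array([[100, 200], [300, 500]], dtype=np.uint16)
normalized = normalize_01(small)
print(normalized)
print(normalized.min(), normalized.max())   # 0.0 1.0
```

Since the rescaling is linear, the ordering and relative spacing of the values are preserved, which is why the histogram keeps its form.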

The dot notation¶

As we have already seen with astype(), for example, NumPy arrays come with many functions attached to them (methods). In particular, many of the statistics functions can be called that way. For example, for the mean we have two options that yield the same result:

print(np.mean(image))
print(image.mean())
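The same equivalence holds for several other aggregating functions. A short sketch on a small array whose results are easy to verify by hand (the name values is illustrative):

```python
import numpy as np

values = np.array([[1, 2, 3], [4, 5, 6]])

# function form on the left, method form on the right
print(np.mean(values), values.mean())   # 3.5 3.5
print(np.min(values), values.min())     # 1 1
print(np.max(values), values.max())     # 6 6
print(np.sum(values), values.sum())     # 21 21
```

Not every NumPy function has a method counterpart; np.median, for instance, exists only as a function.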

Exercises¶

1. Find out how to generate a list of 10 numbers drawn from a normal distribution with a mean of 10 and standard deviation of 2.
2. Create a list of 10 numbers evenly spaced between 0 and 90.
3. Assuming that the values you obtained in (2) are angles in degrees, find a function that converts degrees to radians and apply it to the array.
4. Calculate the standard deviation of the array obtained in (3). Use both a numpy function (np.myfun) and a method attached to the array.


### YOUR CODE HERE

5. Create a 2D array of shape (20, 30) filled with random numbers drawn from a uniform distribution between 0 and 1. Use the np.random.uniform() function.
6. Display the array you created in (5) as an image using plt.imshow() and a colormap of your choice (though we recommend a sequential colormap like gray or viridis).
7. Find out the mean and the range of the pixel values in the image you created in (5). Use both a numpy function (np.myfun) and a method attached to the array.
8. Invert the pixel values of the image you created in (5). The range should still be between 0 and 1, but the mean should change. Check the mean before and after the inversion.
Hint: You can invert the pixel values by subtracting them from 1.


### YOUR CODE HERE

9. (Optional) Have a close look at the individual pixel values and the mean of the images before and after the inversion. What do you observe? Can you explain it? You can also plot the histograms of the pixel values before and after the inversion to visualize the change.