Today you will learn how to handle numeric data in so-called NumPy arrays.
Virtually all data science applications represent data as arrays, i.e. multi-dimensional lists of numbers meant to be processed efficiently. The most prevalent library for numerical computing in Python is NumPy, but other libraries provide almost identical data structures for specialized applications, e.g. PyTorch for deep learning. The second day of the course focuses on handling these data structures and performing operations on them.
The topics covered include:
Introduction to arrays: purpose, dimensions, types etc.
Creating arrays and importing data as arrays
Extracting parts of arrays: slicing, indexing, logical indexing
Playing with dimensions: reshaping, assembling, broadcasting etc.
Applying functions to arrays: element-wise operations, reductions, aggregations
From NumPy to PyTorch: introduction to PyTorch tensors
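To give a first impression of several of these topics, here is a minimal sketch using NumPy only. The array contents and variable names are made up for illustration and are not taken from the course notebooks.

```python
import numpy as np

# Creating an array: the integers 0..11 arranged as 3 rows and 4 columns.
a = np.arange(12).reshape(3, 4)

# Slicing: the first two rows and the last two columns.
block = a[:2, 2:]

# Logical (boolean) indexing: keep only the even entries, as a 1-D array.
evens = a[a % 2 == 0]

# Broadcasting: the column means (shape (4,)) are subtracted from every row.
centered = a - a.mean(axis=0)

# Reduction: sum along each row, giving one value per row.
row_sums = a.sum(axis=1)

print(a.shape, a.dtype)
print(block)
print(evens)
print(centered)
print(row_sums)
```

Each line maps onto one of the topics listed above: array creation and reshaping, slicing, logical indexing, broadcasting, and reductions.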
Preparations
Please download the four Jupyter notebooks so you can work through them locally. You can also download the slides as a PDF.
Also make sure you have a working Python installation, ideally in the form of a conda environment such as the one prepared on day one of the course.
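One possible way to check your setup before starting is the short script below. It is only a sketch: the helper name `check_environment` is hypothetical, and the package list (`numpy`, `torch`) is an assumption based on the topics of the day rather than an official requirements list.

```python
import importlib.util
import sys


def check_environment(packages=("numpy", "torch")):
    """Return a dict mapping each package name to True if it can be imported."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }


# Show which Python interpreter is active and which packages were found.
print(sys.version)
print(check_environment())
```

If a package is reported as `False`, it is most likely missing from the currently active environment.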