In this session we will finally work with data!

Please make sure that the silc.csv data file is in the same folder as the Jupyter Notebook you will use for this session.

Misc. Before we start

Some libraries have rather long names! You can import them and name them with an alias. For instance:

import numpy as np
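Once a library is imported under an alias, the alias replaces the full name everywhere in the notebook. A minimal sketch (the call to `np.sqrt` is only an illustration and is not part of the session materials):

```python
import numpy as np  # "np" is the conventional alias for NumPy

# Without the alias we would have to write numpy.sqrt(...) every time.
root = np.sqrt(16)
print(root)  # 4.0
```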

<aside> 💡 Coding practice tip: Conventionally, you load all the libraries you will need at the beginning of your Jupyter Notebook, rather than importing them whenever you need them. Recall that you only need to import a library once within a given notebook.

</aside>

Describing Data

Summarizing functions

The names of these functions are quite intuitive:

Mean: mean()
Variance: var()
Median: median()
Standard Deviation: std()
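As a quick illustration, here is how these four functions could be applied to a small array. This sketch assumes the NumPy versions (`np.mean`, `np.var`, `np.median`, `np.std`); the sample values are made up so that the results are easy to check by hand.

```python
import numpy as np

# A small made-up sample, chosen so the results are easy to verify.
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = np.mean(values)      # 5.0
variance = np.var(values)   # 4.0 (divides by n, since ddof=0 by default)
median = np.median(values)  # 4.5
std_dev = np.std(values)    # 2.0 (square root of the variance above)

print(mean, variance, median, std_dev)
```

Note that NumPy divides by n by default when computing the variance and standard deviation. Passing `ddof=1` divides by n - 1 instead, which is also the default behaviour of the pandas methods with the same names.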

Basic Statistics in Python

First, we need a dataset! With Python, it is very easy to create one with NumPy's random module, np.random. As its name suggests, this module helps us draw random values from different distributions. For instance, we can draw a number from a Normal distribution using np.random.normal(m, s), where m is the mean and s is the standard deviation of the distribution. For the standard Normal distribution $N(0,1)$, we set m = 0 and s = 1.
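A minimal sketch of a single draw, using the same m and s as in the text. The uniform draw at the end is an added illustration of another distribution from the same module, not something taken from the session.

```python
import numpy as np

m, s = 0, 1  # mean and standard deviation of the standard Normal

# One random number from N(0, 1); the value changes on every run.
draw = np.random.normal(m, s)
print(draw)

# Other distributions live in the same module, for example
# a uniform draw on the interval [0, 1).
u = np.random.uniform(0, 1)
print(u)
```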

Code

Let’s get us some data then:

Code
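One way to build such a dataset is to draw many values at once with the `size` argument of `np.random.normal`. The sketch below assumes a sample size of 1,000 and a seed of 42; both are illustrative choices and do not come from the session materials.

```python
import numpy as np

np.random.seed(42)  # illustrative seed, so the same numbers appear on every run
n = 1000            # illustrative sample size

# n independent draws from the standard Normal distribution N(0, 1)
data = np.random.normal(0, 1, size=n)

print(data.shape)  # (1000,)
print(data[:5])    # first five observations
```

With a sample this large, the sample mean and standard deviation of `data` should land close to 0 and 1, which is a handy sanity check before moving on.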