This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 1: Introduction

Introduction

Josiah Wang

In this lesson, we will look at pandas, a Python library for data manipulation and analysis.

Pandas apparently derives from the term panel data, although it is also a word play on “Python data analysis”. In any case, it unfortunately has nothing to do with the cute creature below 🐼

Panda

Image by Sharon Ang from Pixabay

Pandas is built on top of NumPy.
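
To make the NumPy connection concrete, here is a minimal sketch. The prices Series and its fruit labels are made up purely for illustration; the point is only that the values of a pandas object can be viewed as a NumPy array, and that NumPy functions accept pandas objects.

```python
import numpy as np
import pandas as pd

# A hypothetical Series: one-dimensional values plus an index of labels.
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "banana", "cherry"])

# to_numpy() exposes the underlying values as a NumPy array.
values = prices.to_numpy()
print(type(values))
print(values.dtype)

# NumPy functions can be applied to a Series directly.
total = np.sum(prices)
print(total)
```

Do not worry about what a Series is for now. Just notice that NumPy is never far below the surface.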

Pandas offers powerful and flexible data structures. It is used to clean, transform, manipulate and analyse data.
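
As a small taste of what cleaning, transforming and analysing can look like, here is a rough sketch. The table, its column names and its values are all invented for illustration, and this is only one of many ways to do it.

```python
import pandas as pd

# A hypothetical table with one missing height.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cat"],
    "height_cm": [160.0, None, 175.0],
})

# Clean: drop rows that contain missing values.
clean = df.dropna()

# Transform: add a new column derived from an existing one.
clean = clean.assign(height_m=clean["height_cm"] / 100)

# Analyse: compute a summary statistic over a column.
mean_height = clean["height_m"].mean()
print(clean)
print(mean_height)
```

Each step returns a new object rather than changing df in place, so the original table is left untouched.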

This will only be a high-level introduction, just enough to get you started. We will not cover all the features of pandas.

I will focus on helping you understand how pandas is structured, so that you will be able to explore it in more depth on your own by reading the official documentation and tutorials. Remember our mantra: learn how to fish and cook your own fish!

Pandas is pretty much object-oriented, so you will need a solid grasp of OOP to really understand it. I will assume that your OOP knowledge is top-notch by now!

To import pandas:

import pandas as pd

I will assume this import and use pd throughout the module.

As usual, please actively try things out yourself. This lesson will have fewer practical exercises than previous lessons (and no quizzes! 😲), so do try running the code yourself too. Do not just read through the materials passively!

You can use Jupyter Notebook for this lesson if you prefer. It displays your results as nicer-looking tables. A plain terminal works fine as well!