This is an archived version of the course and is no longer updated. Please find the latest version of the course on the main webpage.

Exercises

Ok, that is it for our guided tutorials.

Now, it’s over to you to try some exercises for using Pandas!

Many thanks to the following people for contributing to these exercises: Joe Stacey, Oana Cocarascu

Part A: Data cleaning and selection

Ex1a: Create the following Pandas DataFrame:

Name Age Twitter.followers HoursPW.on.twitter
Annette 23 23 2
Boris 52 102 4
Christine 65 57 0
Daniel 45 453 NA
Eve 25 753 3
Frances 35 Unknown 1
Gabi 21 Unknown 2

Ex1b: For the DataFrame created above, perform the following operations to see who has the most followers:

  • If anyone has ‘Unknown’ number of followers, set this to be 0
  • Remove any rows that contain NA
  • Subtract 5 from each person’s Twitter.followers (to account for bots)
  • Filter the table to only include people who have more than 50 followers
  • Sort the table by the ascending number of twitter followers

Ex1c: For the updated dataframe, try also performing the following operations:

  • Find the mean of each column
  • Return the value from the 2nd row and the 3rd column
  • Return the 3rd row from the column named ‘Name’
  • Drop all columns except for ‘Name’ and ‘Twitter.followers’.
  • How else could you only select these two columns?

Part B: Finding aggregate information from a Pandas DataFrame

Ex2a: Use the following commands to create a new Pandas DataFrame:

  • input_data = [['Europe','UK', 'London', 'Rain', 15, 5],['Europe','France', 'Paris', 'Cloudy', 22, 3], ['Europe','Spain', 'Madrid', 'Sunny', 25, 4],['Asia','Afghanistan', 'Kabul', 'Sunny', 37, 2], ['Asia','Pakistan', 'Islamabad', 'Sunny', 32, 4],['Europe','Germany', 'Berlin', 'Cloudy', 20, 5] ]
  • data = pd.DataFrame(input_data, columns = ['Continent', 'Country', 'Capital.city', 'Weather', 'Temperature', 'Visitor.rating'])

This should look like the table below:

Continent Country Capital.city Weather Temperature Visitor.rating
Europe UK London Rain 15 5
Europe France Paris Cloudy 22 3
Europe Spain Madrid Sunny 25 4
Asia Afghanistan Kabul Sunny 37 2
Asia Pakistan Islamabad Sunny 32 4
Europe Germany Berlin Cloudy 20 5

Ex2b: Update the DataFrame based on the following:

  • Create another row for a country of your choice, and merge
  • Create another column with a 0 or a 1 based on if you’ve been to each capital city

Ex3: In Pandas we can easily group our data by one or more features, and then create a DataFrame with summary information about each group. We can do this using .groupby(col_name) followed by .agg() or .transform(). Use these commands to answer the following questions:

  • Create a DataFrame with the average temperature for each Continent
  • Create a DataFrame with the minimum, median and maximum visitor rating for each Continent
  • Produce a copy of the original DataFrame, and without losing any rows add a column with the average temperature for the continent. E.g. for the first three rows this figure will be the same as all three rows relate to Europe.

Bonus question

Ex4: Pokemon data analysis

  • Download the Pokemon dataset at PokemonData

  • Load the csv file into a dataframe and set the Name to be the index of the dataframe
  • Print the first 20 pokemons
  • Print the name of the columns
  • Print Type 1 and Generation for all pokemons
  • Get all the data for Pikachu
  • Get the number of nulls in each column
  • Use the unique function to display the unique values of column Type 1
  • For each unique value of Type 1 , print Name , Type 1 , Generation , and Legendary of the first 5 pokemons
  • Print all pokemons that have Water as Type 1 and Dragon as Type 2
  • Print all pokemons that are either Type 2 Electric or Type 2 Ice