OSGeoLive-Notebooks/Pandas/06 - Reshaping data.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p><font size=\"6\"><b>Reshaping data</b></font></p>\n",
    "\n",
    "> *© 2016, Joris Van den Bossche and Stijn Van Hoey  (<mailto:jorisvandenbossche@gmail.com>, <mailto:stijnvanhoey@gmail.com>). Licensed under [CC BY 4.0 Creative Commons](http://creativecommons.org/licenses/by/4.0/)*\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    },
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pivoting data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cfr. excel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "People who know Excel, probably know the **Pivot** functionality:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](img/pivot_excel.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data of the table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "excelample = pd.DataFrame({'Month': [\"January\", \"January\", \"January\", \"January\", \n",
    "                                  \"February\", \"February\", \"February\", \"February\", \n",
    "                                  \"March\", \"March\", \"March\", \"March\"],\n",
    "                   'Category': [\"Transportation\", \"Grocery\", \"Household\", \"Entertainment\",\n",
    "                                \"Transportation\", \"Grocery\", \"Household\", \"Entertainment\",\n",
    "                                \"Transportation\", \"Grocery\", \"Household\", \"Entertainment\"],\n",
    "                   'Amount': [74., 235., 175., 100., 115., 240., 225., 125., 90., 260., 200., 120.]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "excelample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "excelample_pivot = excelample.pivot(index=\"Category\", columns=\"Month\", values=\"Amount\")\n",
    "excelample_pivot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Interested in *Grand totals*?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sum columns\n",
    "excelample_pivot.sum(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sum rows\n",
    "excelample_pivot.sum(axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pivot is just reordering your data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Small subsample of the titanic dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df = pd.DataFrame({'Fare': [7.25, 71.2833, 51.8625, 30.0708, 7.8542, 13.0],\n",
    "                   'Pclass': [3, 1, 1, 2, 3, 2],\n",
    "                   'Sex': ['male', 'female', 'male', 'female', 'female', 'male'],\n",
    "                   'Survived': [0, 1, 0, 1, 0, 1]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df.pivot(index='Pclass', columns='Sex', values='Fare')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df.pivot(index='Pclass', columns='Sex', values='Survived')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So far, so good..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's now use the full titanic dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"data/titanic.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And try the same pivot (*no worries about the try-except, this is here just used to catch a loooong error*):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    df.pivot(index='Sex', columns='Pclass', values='Fare')\n",
    "except Exception as e:\n",
    "    print(\"Exception!\", e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This does not work, because we would end up with multiple values for one cell of the resulting frame, as the error says: `duplicated` values for the columns in the selection. As an example, consider the following rows of our three columns of interest:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.loc[[1, 3], [\"Sex\", 'Pclass', 'Fare']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since `pivot` is just restructuring data, where would both values of `Fare` for the same combination of `Sex` and `Pclass` need to go?\n",
    "\n",
    "Well, they need to be combined, according to an `aggregation` functionality, which is supported by the function`pivot_table`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-danger\">\n",
    "\n",
    "<b>NOTE</b>:\n",
    "\n",
    " <ul>\n",
    "  <li>**Pivot** is purely restructuring: a single value for each index/column combination is required.</li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pivot tables - aggregating while pivoting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"data/titanic.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "df.pivot_table(index='Sex', columns='Pclass', values='Fare')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "\n",
    "<b>REMEMBER</b>:\n",
    "\n",
    " <ul>\n",
    "  <li>By default, `pivot_table` takes the **mean** of all values that would end up into one cell. However, you can also specify other aggregation functions using the `aggfunc` keyword.</li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df.pivot_table(index='Sex', columns='Pclass', \n",
    "               values='Fare', aggfunc='max')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df.pivot_table(index='Sex', columns='Pclass', \n",
    "               values='Fare', aggfunc='count')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "\n",
    "<b>REMEMBER</b>:\n",
    "\n",
    " <ul>\n",
    "  <li>There is a shortcut function for a `pivot_table` with a `aggfunc=count` as aggregation: `crosstab`</li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.crosstab(index=df['Sex'], columns=df['Pclass'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "clear_cell": false
   },
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "<b>EXERCISE</b>:\n",
    "\n",
    " <ul>\n",
    "  <li>Make a pivot table with the survival rates (= number of persons survived / total number of persons) for Pclass vs Sex.</li>\n",
    "  <li>Plot the result as a bar plot.</li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data20.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data21.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "clear_cell": false
   },
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "<b>EXERCISE</b>:\n",
    "\n",
    " <ul>\n",
    "  <li>Make a table of the median Fare payed by aged/underaged vs Sex.</li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data22.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data23.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Melt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `melt` function performs the inverse operation of a `pivot`. This can be used to make your frame longer, i.e. to make a *tidy* version of your data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "pivoted = df.pivot_table(index='Sex', columns='Pclass', values='Fare').reset_index()\n",
    "pivoted.columns.name = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pivoted"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assume we have a DataFrame like the above. The observations (the average Fare people payed) are spread over different columns. In a tidy dataset, each observation is stored in one row. To obtain this, we can use the `melt` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.melt(pivoted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see above, the `melt` function puts all column labels in one column, and all values in a second column.\n",
    "\n",
    "In this case, this is not fully what we want. We would like to keep the 'Sex' column separately:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.melt(pivoted, id_vars=['Sex']) #, var_name='Pclass', value_name='Fare')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reshaping with `stack` and `unstack`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The docs say:\n",
    "\n",
    "> Pivot a level of the (possibly hierarchical) column labels, returning a\n",
    "DataFrame (or Series in the case of an object with a single level of\n",
    "column labels) having a hierarchical index with a new inner-most level\n",
    "of row labels.\n",
    "\n",
    "Indeed... \n",
    "<img src=\"img/schema-stack.svg\" width=50%>\n",
    "\n",
    "Before we speak about `hierarchical index`, first check it in practice on the following dummy example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "df = pd.DataFrame({'A':['one', 'one', 'two', 'two'], \n",
    "                   'B':['a', 'b', 'a', 'b'], \n",
    "                   'C':range(4)})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To use `stack`/`unstack`, we need the values we want to shift from rows to columns or the other way around as the index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df = df.set_index(['A', 'B']) # Indeed, you can combine two indices\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "result = df['C'].unstack()\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df = result.stack().reset_index(name='C')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "source": [
    "<div class=\"alert alert-info\">\n",
    "\n",
    "<b>REMEMBER</b>:\n",
    "\n",
    " <ul>\n",
    "  <li>**stack**: make your data *longer* and *smaller* </li>\n",
    "  <li>**unstack**: make your data *shorter* and *wider* </li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Mimick pivot table "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To better understand and reason about pivot tables, we can express this method as a combination of more basic steps. In short, the pivot is a convenient way of expressing the combination of a `groupby` and `stack/unstack`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"data/titanic.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "df.pivot_table(index='Pclass', columns='Sex', \n",
    "               values='Survived', aggfunc='mean')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "<b>EXERCISE</b>:\n",
    "\n",
    " <ul>\n",
    "  <li>Get the same result as above based on a combination of `groupby` and `unstack`</li>\n",
    "  <li>First use `groupby` to calculate the survival ratio for all groups</li>\n",
    "  <li>Then, use `unstack` to reshape the output of the groupby operation</li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data37.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Mimick melt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Like the pivot table above, we can now also obtain the result of `melt` with stack/unstack.\n",
    "\n",
    "Let's use the same `pivoted` frame as above, and look at the final melt result:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pivoted = df.pivot_table(index='Sex', columns='Pclass', values='Fare').reset_index()\n",
    "pivoted.columns.name = None\n",
    "pivoted"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.melt(pivoted, id_vars=['Sex'], var_name='Pclass', value_name='Fare')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "<b>EXERCISE</b>:\n",
    "\n",
    " <ul>\n",
    "  <li>Get the same result as above using `stack`/`unstack` (combined with `set_index` / `reset_index`)</li>\n",
    "  <li>Tip: set those columns as the index that you do not want to stack</li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data40.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data41.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data42.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data43.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Exercises: use the reshaping methods with the movie data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These exercises are based on the [PyCon tutorial of Brandon Rhodes](https://github.com/brandon-rhodes/pycon-pandas-tutorial/) (so credit to him!) and the datasets he prepared for that. You can download these data from here: [`titles.csv`](https://drive.google.com/open?id=0B3G70MlBnCgKajNMa1pfSzN6Q3M) and [`cast.csv`](https://drive.google.com/open?id=0B3G70MlBnCgKal9UYTJSR2ZhSW8) and put them in the `/data` folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "cast = pd.read_csv('data/cast.csv')\n",
    "cast.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "titles = pd.read_csv('data/titles.csv')\n",
    "titles.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "<b>EXERCISE</b>:\n",
    "\n",
    " <ul>\n",
    "  <li>Plot the number of actor roles each year and the number of actress roles each year over the whole period of available movie data.</li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data46.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data47.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data48.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "<b>EXERCISE</b>:\n",
    "\n",
    " <ul>\n",
    "  <li>Plot the number of actor roles each year and the number of actress roles each year. Use kind='area' as plot type</li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data49.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "<b>EXERCISE</b>:\n",
    "\n",
    " <ul>\n",
    "  <li>Plot the fraction of roles that have been 'actor' roles each year over the whole period of available movie data.</li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data50.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "<b>EXERCISE</b>:\n",
    "\n",
    " <ul>\n",
    "  <li>Define a year as a \"Superman year\" when films of that year feature more Superman characters than Batman characters. How many years in film history have been Superman years?</li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data51.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "# %load snippets/06 - Reshaping data52.py"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Nbtutor - export exercises",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  },
  "nav_menu": {},
  "toc": {
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": true,
   "threshold": 6,
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}