OSGeoLive-Notebooks/Pandas/solved - 04 - Groupby opera...

698 lines
25 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Groupby operations"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Some imports:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"try:\n",
" import seaborn\n",
"except ImportError:\n",
" pass\n",
"\n",
"pd.options.display.max_rows = 10"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Some 'theory': the groupby operation (split-apply-combine)\n",
"\n",
"The \"group by\" concept: we want to **apply the same function on subsets of your dataframe, based on some key to split the dataframe in subsets**\n",
"\n",
"This operation is also referred to as the \"split-apply-combine\" operation, involving the following steps:\n",
"\n",
"* **Splitting** the data into groups based on some criteria\n",
"* **Applying** a function to each group independently\n",
"* **Combining** the results into a data structure\n",
"\n",
"<img src=\"img/splitApplyCombine.png\">\n",
"\n",
"Similar to SQL `GROUP BY`"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The example of the image in pandas syntax:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>data</th>\n",
" <th>key</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>A</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5</td>\n",
" <td>B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>10</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>5</td>\n",
" <td>A</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>10</td>\n",
" <td>B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>15</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>10</td>\n",
" <td>A</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>15</td>\n",
" <td>B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>20</td>\n",
" <td>C</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" data key\n",
"0 0 A\n",
"1 5 B\n",
"2 10 C\n",
"3 5 A\n",
"4 10 B\n",
"5 15 C\n",
"6 10 A\n",
"7 15 B\n",
"8 20 C"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame({'key':['A','B','C','A','B','C','A','B','C'],\n",
" 'data': [0, 5, 10, 5, 10, 15, 10, 15, 20]})\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the filtering and reductions operations we have seen in the previous notebooks, we could do something like:\n",
"\n",
"\n",
" df[df['key'] == \"A\"].sum()\n",
" df[df['key'] == \"B\"].sum()\n",
" ...\n",
"\n",
"But pandas provides the `groupby` method to do this:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>data</th>\n",
" </tr>\n",
" <tr>\n",
" <th>key</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>A</th>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>B</th>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>C</th>\n",
" <td>45</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" data\n",
"key \n",
"A 15\n",
"B 30\n",
"C 45"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('key').aggregate(np.sum) # 'sum'"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>data</th>\n",
" </tr>\n",
" <tr>\n",
" <th>key</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>A</th>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>B</th>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>C</th>\n",
" <td>45</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" data\n",
"key \n",
"A 15\n",
"B 30\n",
"C 45"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('key').sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And many more methods are available. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## And now applying this on some real data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We go back to the titanic survival data:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df = pd.read_csv(\"data/titanic.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-success\">\n",
" <b>EXERCISE</b>: Using groupby(), calculate the average age for each sex.\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"clear_cell": true,
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Sex\n",
"female 27.915709\n",
"male 30.726645\n",
"Name: Age, dtype: float64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Sex')['Age'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-success\">\n",
" <b>EXERCISE</b>: Calculate the average survival ratio for all passengers.\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"clear_cell": true,
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.38383838383838381"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Survived'].sum() / len(df['Survived'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-success\">\n",
" <b>EXERCISE</b>: Calculate this survival ratio for all passengers younger that 25 (remember: filtering/boolean indexing).\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"clear_cell": true,
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.41196013289036543"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df25 = df[df['Age'] <= 25]\n",
"df25['Survived'].sum() / len(df25['Survived'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-success\">\n",
" <b>EXERCISE</b>: Is there a difference in this survival ratio between the sexes? (tip: write the above calculation of the survival ratio as a function)\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"clear_cell": true,
"collapsed": true
},
"outputs": [],
"source": [
"def survival_ratio(survived):\n",
" return survived.sum() / len(survived)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"clear_cell": true,
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Sex\n",
"female 0.742038\n",
"male 0.188908\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Sex')['Survived'].aggregate(survival_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-success\">\n",
" <b>EXERCISE</b>: Make a bar plot of the survival ratio for the different classes ('Pclass' column).\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"clear_cell": true,
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fa3df924518>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAeEAAAFhCAYAAABH+hLWAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF+dJREFUeJzt3X90U/X9x/FXTIESGiGpSTkt2wFzXBm156yOzcOqsLEc\nYHNzHplrcQcOO1Xnr3M46xBZ3Qp/tCsHHI7jj+1wGCpOyHDSM7ejp+wX7hwoWrdjke4PNzxUNoU0\nJBATENeS7x+e5WuHkBQS3k36fPzV23tD3m0/5zx7c8OtI5VKpQQAAC67K6wHAABgrCLCAAAYIcIA\nABghwgAAGCHCAAAYIcIAABgpyeagjo4O9fb2yuFwqKWlRbW1tZKkY8eOaeXKlXI4HEqlUvrXv/6l\nlStX6qabbsrr0AAAFIOMEe7p6VF/f79CoZAOHTqkhx56SKFQSJJUUVGhZ555RpI0NDSkZcuWaf78\n+fmdGACAIpHx5eju7m4Fg0FJUiAQUDweVzKZPOe4Xbt2acGCBZo4cWLupwQAoAhljHAkEpHX601v\nezweRSKRc4779a9/rW9+85u5nQ4AgCI24jdmfdxdLl9//XVdffXVmjRpUsbHDw4OjfQpAQAoShmv\nCfv9/mFnvuFwWD6fb9gxf/7zn/WFL3whqyeMxU6NcMSxy+dza2DgPesxUCRYT8g11lT2fD73x34+\n45lwfX29urq6JEl9fX2qqKiQy+UadszBgwc1c+bMHIwJAMDYkfFMuK6uTjU1NWpsbJTT6VRra6s6\nOzvldrvTb9gaGBhQeXl53ocFAKCYOC73nzLkpYvs8VIPcon1hFxjTWXvol+OBgAA+UGEAQAwQoQB\nADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAw\nQoQBADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAwQoQBADBChAEAMFJi\nPYC1oaEhHT78lvUYHysWK1M0mrAeY5jp06+W0+m0HgMAisKYj/Dhw29pxYYX5Jrstx5l1Dt1MqxN\nD9ysQOAa61EAoCiM+QhLkmuyX2WeKusxAABjDNeEAQAwQoQBADBChAEAMEKEAQAwQoQBADBChAEA\nMEKEAQAwktX/E+7o6FBvb68cDodaWlpUW1ub3nf06FE1NzdrcHBQs2bN0tq1a/M1KwAARSXjmXBP\nT4/6+/sVCoXU1tam9vb2YfvXrVunpqYm7dy5U06nU0ePHs3bsAAAFJOMEe7u7lYwGJQkBQIBxeNx\nJZNJSVIqldJf//pXzZ8/X5L0ox/9SFOnTs3juAAAFI+MEY5EIvJ6veltj8ejSCQiSYpGo3K5XGpv\nb9ftt9+ujRs35m9SAACKzIjvHZ1KpYZ9HA6HtXz5clVWVuquu+7Syy+/rHnz5p338R6PSyUlo+ev\n8MRiZdYjFBSvt0w+n9t6DFwkfnbINdbUpckYYb/fnz7zlaRwOCyfzyfpw7PiqqoqTZs2TZI0Z84c\n/fOf/7xghGOxU5c6c06Ntj8VONpFowkNDLxnPQYugs/n5meHnGJNZe98v6xkfDm6vr5eXV1dkqS+\nvj5VVFTI5XJJkpxOp6ZNm6a33347vX/GjBm5mhkAgKKW8Uy4rq5ONTU1amxslNPpVGtrqzo7O+V2\nuxUMBtXS0qLVq1crlUrpU5/6VPpNWgAA4MKyuibc3Nw8bLu6ujr98Sc/+Ult3749t1MBADAGcMcs\nAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAA\njBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQ\nYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEA\nAIyUZHNQR0eHent75XA41NLSotra2vS++fPnq7KyUg6HQw6HQw8//LD8fn/eBgYAoFhkjHBPT4/6\n+/sVCoV06NAhPfTQQwqFQun9DodDW7ZsUWlpaV4HBQCg2GR8Obq7u1vBYFCSFAgEFI/HlUwm0/tT\nqZRSqVT+JgQAoEhljHAkEpHX601vezweRSKRYcesWbNGt99+uzZu3Jj7CQEAKFJZXRP+qP89612x\nYoVuvPFGTZkyRffee692796tBQsWnPfxHo9LJSXOkU+aJ7FYmfUIBcXrLZPP57YeAxeJnx1yjTV1\naTJG2O/3DzvzDYfD8vl86e1vfOMb6Y/nzp2rN99884IRjsVOXeyseRGNJqxHKCjRaEIDA+9Zj4GL\n4PO5+dkhp1hT2TvfLysZX46ur69XV1eXJKmvr08VFRVyuVySpEQioaamJv3nP/+R9OGbuK655ppc\nzQwAQFHLeCZcV1enmpoaNTY2yul0qrW1VZ2dnXK73QoGg/riF7+ohoYGlZaWatasWVq4cOHlmBsA\ngIKX1TXh5ubmYdvV1dXpj5cuXaqlS5fmdioAAMYA7pgFAIARIgwAgBEiDACAESIMAIARIgwAgBEi\nDACAkRHfthLAhQ0NDenw4besxzhHLFY2Ku8QN3361XI6R8+tbIHLiQgDOXb48FtaseEFuSbzd7Uz\nOXUyrE0P3KxAgDvtYWwiwkAeuCb7Veapsh4DwCjHNWEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBh\nAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAA\njBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjGQV4Y6ODjU2NmrJ\nkiV64403PvaYn/zkJ1q6dGlOhwMAoJhljHBPT4/6+/sVCoXU1tam9vb2c445dOiQXnvtNTkcjrwM\nCQBAMcoY4e7ubgWDQUlSIBBQPB5XMpkcdsy6devU3NycnwkBAChSGSMciUTk9XrT2x6PR5FIJL3d\n2dmp66+/XpWVlfmZEACAIlUy0gekUqn0xydPntSuXbv01FNP6d133x2273w8HpdKSpwjfdq8icXK\nrEcoKF5vmXw+t/UYoxpramRYU4WNn92lyRhhv98/7Mw3HA7L5/NJkvbv369YLKZvf/vbOnPmjI4c\nOaJ169Zp9erV5/33YrFTORg7d6LRhPUIBSUaTWhg4D3rMUY11tTIsKYKl8/n5meXpfP9spLx5ej6\n+np1dXVJkvr6+lRRUSGXyyVJWrhwoX73u98pFArpscce06xZsy4YYAAA8P8yngnX1dWppqZGjY2N\ncjqdam1tVWdnp9xud/oNWwAAYOSyuib8v+98rq6uPueYqqoqbdu2LTdTAQAwBnDHLAAAjBBhAACM\nEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBh\nAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAA\njBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMEGEAAIwQYQAAjBBhAACMlGRzUEdH\nh3p7e+VwONTS0qLa2tr0vp07d+r555+X0+nUzJkz1dramrdhAQAoJhnPhHt6etTf369QKKS2tja1\nt7en973//vt66aWXtGPHDm3fvl2HDh3S66+/nteBAQAoFhkj3N3drWAwKEkKBAKKx+NKJpOSpNLS\nUj355JO64oordPr0aSUSCV111VX5nRgAgCKRMcKRSERerze97fF4FIlEhh2zefNmLViwQF/5ylc0\nbdq03E8JAEARyuqa8EelUqlzPnfXXXdp+fLluuOOO/TZz35WdXV15328x+NSSYlzpE+bN7FYmfUI\nBcXrLZPP57YeY1RjTY0Ma6qw8bO7NBkj7Pf7h535hsNh+Xw+SdLJkyf1j3/8Q7Nnz9b48eM1d+5c\n/e1vf7tghGOxUzkYO3ei0YT1CAUlGk1oYOA96zFGNdbUyLCmCpfP5+Znl6Xz/bKS8eXo+vp6dXV1\nSZL6+vpUUVEhl8slSRocHNTq1at1+vRpSdKBAwc0Y8aMXM0MAEBRy3gmXFdXp5qaGjU2NsrpdKq1\ntVWdnZ1yu90KBoO6//77tXTpUpWUlGjmzJmaP3/+5ZgbAICCl9U14ebm5mHb1dXV6Y9vueUW3XLL\nLbmdCgCAMYA7ZgEAYIQIAwBghAgDAGCECAMAYIQIAwBghAgDAGCECAMAYIQIAwBghAgDAGCECAMA\nYIQIAwBghAgDAGCECAMAYIQIAwBghAgDAGCECAMAYIQIAwBghAgDAGCECAMAYKTEegAAwIUNDQ3p\n8OG3rMc4RyxWpmg0YT3GMNOnXy2n02k9RtaIMACMcocPv6UVG16Qa7LfepRR7dTJsDY9cLMCgWus\nR8kaEQaAAuCa7FeZp8p6DOQY14QBADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAwQoQBADBChAEA\nMEKEAQAwQoQBADBChAEAMEKEAQAwktUfcOjo6FBvb68cDodaWlpUW1ub3rd//3498sgjcjqdmjFj\nhtrb2/M2LAAAxSTjmXBPT4/6+/sVCoXU1tZ2TmTXrFmjRx99VNu3b1cikdBf/vKXvA0LAEAxyRjh\n7u5uBYNBSVIgEFA8HlcymUzv37Vrl/z+D//Gpdfr1YkTJ/I0KgAAxSVjhCORiLxeb3rb4/EoEomk\ntydNmiRJCofD2rdvn+bNm5eHMQEAKD5ZXRP+qFQqdc7njh8/rnvuuUdr167V5MmTL/h4j8elkhLn\nSJ82b2KxMusRCorXWyafz209xqjGmhoZ1lRmrKnsFdp6yhhhv98/7Mw3HA7L5/OltxOJhO688059\n//vf15w5czI+YSx26iJHzY9oNGE9QkGJRhMaGHjPeoxRjTU1MqypzFhT2Rut6+l8vxhkfDm6vr5e\nXV1dkqS+vj5VVFTI5XKl969bt07f+c53VF9fn6NRAQAYGzKeCdfV1ammpkaNjY1yOp1qbW1VZ2en\n3G63brjhBr3wwgt6++23tXPnTjkcDn3961/XbbfddjlmBwCgoGV1Tbi5uXnYdnV1dfrjAwcO5HYi\nAADGCO6YBQCAESIMAIARIgwAgBEiDACAESIMAIARIgwAgBEiDACAESIMAIARIgwAgBEiDACAESIM\nAIARIgwAgBEiDACAESIMAIARIgwAgBEiDACAESIMAIARIgwAgBEiDACAESIMAIARIgwAgBEiDACA\nESIMAIARIgwAgBEiDACAESIMAIARIgwAgBEiDACAESIMAIARIgwAgBEiDACAESIMAIARIgwAgBEi\nDACAESIMAICRrCLc0dGhxsZGLVmyRG+88cawfR988IFWr16txYsX52VAAACKVcYI9/T0qL+/X6FQ\nSG1tbWpvbx+2f/369fr0pz8th8ORtyEBAChGGSPc3d2tYDAoSQoEAorH40omk+n9zc3N6f0AACB7\nGSMciUTk9XrT2x6PR5FIJL3tcrnyMxkAAEWuZKQPSKVSl/SEHo9LJSXOS/o3cikWK7MeoaB4vWXy\n+dzWY4xqrKmRYU1lxprKXqGtp4wR9vv9w858w+GwfD7fRT9hLHbqoh+bD9FownqEghKNJjQw8J71\nGKMaa2pkWFOZsaayN1rX0/l+Mcj4cnR9fb26urokSX19faqoqDjnJehUKnXJZ8gAAIw1Gc+E6+rq\nVFNTo8bGRjmdTrW2tqqzs1Nut1vBYFArVqzQ0aNHdfjwYS1btkwNDQ266aabLsfsAAAUtKyuCTc3\nNw/brq6uTn+8adOm3E4EAMAYwR2zAAAwQoQBADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAwQoQB\nADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAw\nQoQBADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAwQoQBADBChAEAMEKE\nAQAwQoQBADBChAEAMEKEAQAwQoQBADBSks1BHR0d6u3tlcPhUEtLi2pra9P79u3bp0ceeUROp1Nz\n587Vvffem7dhAQAoJhnPhHt6etTf369QKKS2tja1t7cP29/e3q7HHntMO3bs0N69e3Xo0KG8DQsA\nQDHJGOHu7m4Fg0FJUiAQUDweVzKZlCQdOXJEU6ZMUUVFhRwOh+bNm6f9+/fnd2IAAIpExpejI5GI\nrr322vS2x+NRJBLRpEmTFIlE5PV60/u8Xq+OHDmSn0nz6NTJsPUIBYHvU/b4XmWH71P2+F5lVojf\no6yuCX9UKpW6qH3/5fO5R/qUeeXzXadXnr/OegwUEdYUco01Vbwyvhzt9/sViUTS2+FwWD6fL71v\nYGAgve/YsWPy+/15GBMAgOKTMcL19fXq6uqSJPX19amiokIul0uSVFVVpWQyqXfeeUeDg4Pas2eP\nbrjhhvxODABAkXCksngNeePGjXr11VfldDrV2tqqv//973K73QoGg3rttdf08MMPS5IWLVqk5cuX\n53tmAACKQlYRBgAAuccdswAAMEKEAQAwQoQBADBChAEAMEKER7l4PG49AgrYx73v8ujRowaToBhF\no1HrEQoeER7l7r//fusRUIB+//vf60tf+pLmzJmjBx98UIlEIr1v1apVhpOhUO3Zs0cLFy7U8uXL\n9eabb+rmm2/W0qVLNX/+fL388svW4xWsEd+2Ern37LPPnnffsWPHLuMkKBabN29WZ2enrrzySj33\n3HNqamrSli1b5Ha7s7q9LPC/fvazn+nJJ5/UO++8o7vvvltPPPGEZs6cqUgkorvvvlvz5s2zHrEg\nEeFR4KmnntKcOXM+9pafg4ODBhOh0DmdTk2ZMkWS1NDQoPLycjU1NennP/+5HA6H8XQoROPHj1dl\nZaUqKyvl9/s1c+ZMSdJVV12lCRMmGE9XuIjwKPD444+rra1NP/zhDzV+/Phh+1555RWjqVDIrrvu\nOn33u9/Vpk2bVFpaqmAwqAkTJmj58uU6ceKE9XgoQOXl5frFL36hpqYmhUIhSR++v2Dr1q2aOnWq\n8XSFiztmjRKnT5/WhAkTdMUVwy/T9/X1qaamxmgqFLJXXnlFn//854ed+SYSCb344ov61re+ZTgZ\nCtH777+vP/3pT/rqV7+a/lxfX596enq0ZMkSzoYvEhEGAMAI744GAMAIEQYAwAgRBgDACO+OBgrI\nv//9by1atEh1dXVKpVIaHBxUVVWV1q5dq7KysnOO7+zs1L59+7RhwwaDaQFkwpkwUGDKy8u1bds2\nPfPMM9qxY4f8fr8ef/zx8x7P/wsGRi/OhIEC97nPfU6/+tWvdODAAf34xz/WuHHjNGXKFK1bt27Y\ncX/4wx+0ZcsWTZgwQUNDQ1q/fr0qKyv19NNP67e//a0mTpyoiRMnasOGDTpz5oxWrlwpSTpz5owa\nGhp06623Wnx5QFEjwkABGxoa0u7duzV79mw98MADeuKJJxQIBLRt27Zz7ucbj8f105/+VFOnTtXm\nzZv1y1/+UqtWrdKjjz6q3bt3y+v1au/evQqHw9q7d68CgYDWrFmjDz74QM8995zRVwgUNyIMFJjj\nx49r2bJl6XtAz549W7feequ2bt2qQCAgSVq2bJmkD68J/1d5eblWrVqlVCqlSCSiz3zmM5Kk2267\nTU1NTVq4cKEWLVqk6dOny+l06p577tEPfvADzZs3Tw0NDZf5qwTGBiIMFJj/XhP+qBMnTujs2bPn\nfczg4KC+973v6Te/+Y0+8YlP6Nlnn9XBgwclSQ8++KDeffdd7dmzR/fdd59Wr16tG2+8US+++KJe\nffVVvfTSS3r66ae1Y8eOvH5dwFhEhIEC83E3uZsyZYo8Ho8OHjyoa6+9Vlu3btXEiRNVWloqSUom\nk3I6naqsrNSZM2f0xz/+UR6PR/F4XNu2bdN9992nJUuW6OzZszpw4IBOnjypqqoqzZkzR9dff72+\n/OUv6+zZs+fcVhXApSHCQIE537ud169fr7a2No0bN05XXnml1q9fr927d0uSJk+erK997WtavHix\nqqqqdMcdd2jVqlXq7u5WMpnU4sWLNXnyZI0bN07t7e06fvy41qxZk/6DInfeeScBBvKAe0cDAGCE\nX20BADBChAEAMEKEAQAwQoQBADBChAEAMEKEAQAwQoQBADDyfyjr35FrEFuKAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x7fa4103b5898>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.groupby('Pclass')['Survived'].aggregate(survival_ratio).plot(kind='bar')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are ready, more groupby exercises can be found in the \"Advanded groupby operations\" notebook."
]
}
],
"metadata": {
"celltoolbar": "Nbtutor - export exercises",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}