OSGeoLive-Notebooks/Pandas/03b - Some more advanced in...

246 lines
4.6 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Advanced indexing"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"try:\n",
" import seaborn\n",
"except ImportError:\n",
" pass\n",
"\n",
"pd.options.display.max_rows = 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This dataset is borrowed from the [PyCon tutorial of Brandon Rhodes](https://github.com/brandon-rhodes/pycon-pandas-tutorial/) (so all credit to him!). You can download these data from here: [`titles.csv`](https://drive.google.com/file/d/0B3G70MlBnCgKa0U4WFdWdGdVOFU/view?usp=sharing) and [`cast.csv`](https://drive.google.com/file/d/0B3G70MlBnCgKRzRmTWdQTUdjNnM/view?usp=sharing) and put them in the `/data` folder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"cast = pd.read_csv('data/cast.csv')\n",
"cast.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"titles = pd.read_csv('data/titles.csv')\n",
"titles.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting columns as the index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why is it useful to have an index?\n",
"\n",
"- Giving meaningful labels to your data -> easier to remember which data are where\n",
"- Unleash some powerful methods, eg with a DatetimeIndex for time series\n",
"- Easier and faster selection of data\n",
"\n",
"It is this last one we are going to explore here!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Setting the `title` column as the index:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"c = cast.set_index('title')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"c.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of doing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%time\n",
"cast[cast['title'] == 'Hamlet']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"we can now do:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%time\n",
"c.loc['Hamlet']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But you can also have multiple columns as the index, leading to a **multi-index or hierarchical index**:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"c = cast.set_index(['title', 'year'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"c.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%time\n",
"c.loc[('Hamlet', 2000),:]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"c2 = c.sort_index()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%time\n",
"c2.loc[('Hamlet', 2000),:]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"celltoolbar": "Nbtutor - export exercises",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}