OSGeoLive-Notebooks/Pandas/03b - Some more advanced in...

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Advanced indexing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "try:\n",
    "    import seaborn\n",
    "except ImportError:\n",
    "    pass\n",
    "\n",
    "pd.options.display.max_rows = 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This dataset is borrowed from the [PyCon tutorial of Brandon Rhodes](https://github.com/brandon-rhodes/pycon-pandas-tutorial/) (so all credit to him!). You can download these data from here: [`titles.csv`](https://drive.google.com/file/d/0B3G70MlBnCgKa0U4WFdWdGdVOFU/view?usp=sharing) and [`cast.csv`](https://drive.google.com/file/d/0B3G70MlBnCgKRzRmTWdQTUdjNnM/view?usp=sharing) and put them in the `/data` folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "cast = pd.read_csv('data/cast.csv')\n",
    "cast.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "titles = pd.read_csv('data/titles.csv')\n",
    "titles.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setting columns as the index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why is it useful to have an index?\n",
    "\n",
    "- Giving meaningful labels to your data -> easier to remember which data are where\n",
    "- Unleash some powerful methods, eg with a DatetimeIndex for time series\n",
    "- Easier and faster selection of data\n",
    "\n",
    "It is this last one we are going to explore here!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Setting the `title` column as the index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "c = cast.set_index('title')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "c.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of doing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "cast[cast['title'] == 'Hamlet']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "we can now do:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "c.loc['Hamlet']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But you can also have multiple columns as the index, leading to a **multi-index or hierarchical index**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "c = cast.set_index(['title', 'year'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "c.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "c.loc[('Hamlet', 2000),:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "c2 = c.sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "c2.loc[('Hamlet', 2000),:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Nbtutor - export exercises",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}