## Jupyter Snippet CB2nd 01_pandas

# 7.1. Exploring a dataset with pandas and matplotlib

``````from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
``````
``````player = 'Roger Federer'
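# Load the match data; the repository URL prefix is assumed.
df = pd.read_csv('https://github.com/ipython-books/'
                 'cookbook-2nd-data/blob/master/'
                 'federer.csv?raw=true',
                 parse_dates=['start date'],
                 dayfirst=True)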
``````
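
To see what `parse_dates` and `dayfirst` do without downloading anything, here is a minimal sketch on a small inline CSV. The two rows and the player names are made up for illustration; only the `start date` and `winner` column names come from the notebook.

```python
import io
import pandas as pd

# Hypothetical two-row sample; the values are invented for illustration.
csv_text = (
    "start date,winner\n"
    "02/07/1998,Player A\n"
    "13/07/1998,Player B\n"
)

demo = pd.read_csv(io.StringIO(csv_text),
                   parse_dates=['start date'],
                   dayfirst=True)

# With dayfirst=True, "02/07/1998" is read as 2 July, not 7 February.
first = demo['start date'].iloc[0]
print(first.day, first.month, first.year)
```

Without `dayfirst=True`, an ambiguous date such as `02/07/1998` would be interpreted month-first.
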
``````df.head(3)
``````

``````df['win'] = df['winner'] == player
df['win'].tail()
``````
``````1174    False
1175     True
1176     True
1177     True
1178    False
Name: win, dtype: bool
``````
``````won = 100 * df['win'].mean()
print(f"{player} has won {won:.0f}% of his matches.")
``````
``````Roger Federer has won 82% of his matches.
``````
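
The win percentage relies on `True` counting as 1 and `False` as 0, so the mean of a boolean column is the fraction of `True` values. A minimal sketch on made-up data (the player labels and results are hypothetical):

```python
import pandas as pd

# Hypothetical winners of four matches.
winners = pd.Series(['A', 'B', 'A', 'A'])
win = winners == 'A'          # boolean Series: True, False, True, True

# True counts as 1 and False as 0, so the mean is the win fraction.
rate = win.mean()
print(f"{100 * rate:.0f}%")   # prints "75%"
```
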
``````date = df['start date']
``````
``````df['dblfaults'] = (df['player1 double faults'] /
                   df['player1 total points total'])
``````
``````df['dblfaults'].tail()
``````
``````1174    0.018116
1175    0.000000
1176    0.000000
1177    0.011561
1178         NaN
Name: dblfaults, dtype: float64
``````
``````df['dblfaults'].describe()
``````
``````count    1027.000000
mean        0.012129
std         0.010797
min         0.000000
25%         0.004444
50%         0.010000
75%         0.018108
max         0.060606
Name: dblfaults, dtype: float64
``````
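
The count of 1027 is lower than the number of rows (the index runs to 1178) because matches with missing statistics yield `NaN`, as in the last row of the `tail()` output, and pandas skips `NaN` in summary statistics by default. A small sketch with invented numbers and hypothetical column names:

```python
import numpy as np
import pandas as pd

# Hypothetical per-match statistics; the third match has no data.
stats = pd.DataFrame({
    'double faults': [2, 0, np.nan],
    'total points': [100, 80, np.nan],
})
ratio = stats['double faults'] / stats['total points']

# The missing match propagates as NaN and is excluded from the summary.
print(ratio.isna().sum())   # number of NaN ratios
print(ratio.count())        # number of non-NaN ratios
print(ratio.mean())         # mean over the non-NaN ratios only
```
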
``````df.groupby('surface')['win'].mean()
``````
``````surface
Indoor: Carpet    0.736842
Indoor: Clay      0.833333
Indoor: Hard      0.836283
Outdoor: Clay     0.779116
Outdoor: Grass    0.871429
Outdoor: Hard     0.842324
Name: win, dtype: float64
``````
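
A group mean says nothing about how many matches it is based on, so it can help to look at group sizes alongside the win rates. A sketch using `agg` on made-up data (the surface labels are taken from the output above, the results are invented):

```python
import pandas as pd

# Hypothetical results; only the surface labels come from the dataset.
demo = pd.DataFrame({
    'surface': ['Outdoor: Clay', 'Outdoor: Clay', 'Outdoor: Grass',
                'Outdoor: Grass', 'Outdoor: Grass', 'Indoor: Hard'],
    'win': [True, False, True, True, False, True],
})

# Win rate and number of matches per surface.
summary = demo.groupby('surface')['win'].agg(['mean', 'count'])
print(summary)
```

On the real data, the equivalent call would be `df.groupby('surface')['win'].agg(['mean', 'count'])`.
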
``````gb = df.groupby('year')
``````
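
The grouping above uses a `year` column, which is presumably present in the CSV since no earlier cell creates it. If such a column were not available, it could be derived from the parsed dates with the `.dt` accessor. A sketch on invented data:

```python
import pandas as pd

# Hypothetical matches; dates and ratios are invented.
demo = pd.DataFrame({
    'start date': pd.to_datetime(['2003-06-23', '2003-08-25', '2004-01-19']),
    'dblfaults': [0.01, 0.03, 0.02],
})

# Derive the year from the datetime column, then average per year.
demo['year'] = demo['start date'].dt.year
yearly = demo.groupby('year')['dblfaults'].mean()
print(yearly)
```
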
``````fig, ax = plt.subplots(1, 1)
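# Individual matches as translucent markers (pandas datetimes plot directly).
ax.plot(date, df['dblfaults'], 'o', alpha=.25, lw=0)
# Yearly mean, drawn at the date of the last match of each year.
ax.plot(gb['start date'].max(), gb['dblfaults'].mean(),
        '-', lw=3)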
ax.set_xlabel('Year')
ax.set_ylabel('Double faults / total points')
ax.set_ylim(0)
``````