Histograms with pandas - (Mostly) Programming Notes

pandas (Python Data Analysis Library)
http://pandas.pydata.org/

This post uses the Iris data set published at http://archive.ics.uci.edu/ml/datasets/Iris .
I downloaded iris.data and iris.names, then put together the header row by hand and saved the result as iris.csv.
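
The header preparation can also be scripted. The following is only a sketch of one way to do it, not the procedure actually used for this post: the function name `add_header` is made up for illustration, and the column names are copied from the header line of iris.csv shown in this post. It skips blank lines in case the raw file contains any.

```python
import csv
import io

# Column names copied from the header line of iris.csv shown in this post.
COLUMNS = ["sepal length", "sepal width", "petal length", "petal width", "class"]


def add_header(raw_text):
    """Return CSV text: a quoted header row followed by the non-empty data lines.

    add_header is a hypothetical helper name, not part of pandas or the data set.
    """
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(COLUMNS)
    header = out.getvalue()
    # Drop blank lines so they do not turn into empty rows later.
    body = [line for line in raw_text.splitlines() if line.strip()]
    return header + "\n".join(body) + "\n"


# Two data rows stand in for the contents of iris.data.
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n\n"
print(add_header(sample))
```

For the real file, the same function could be applied to the text read from iris.data and the result written to iris.csv.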

$ head iris.csv
"sepal length","sepal width","petal length","petal width","class"
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa

Reading the data looks like this:

>>> import pandas as pd
>>> data = pd.read_csv('iris.csv')
>>> data.head(10)
   sepal length  sepal width  petal length  petal width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
5           5.4          3.9           1.7          0.4  Iris-setosa
6           4.6          3.4           1.4          0.3  Iris-setosa
7           5.0          3.4           1.5          0.2  Iris-setosa
8           4.4          2.9           1.4          0.2  Iris-setosa
9           4.9          3.1           1.5          0.1  Iris-setosa

read_csv split off the header row as the column names and also numbered each row.
But what should be done when the file has no header row?
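
One possible answer to the question above: `read_csv` accepts `header=None` to say that the first line is data, and `names=` to supply the column labels. The sketch below assumes a headerless input made of the first two data rows from this post, read from an in-memory string rather than a file; the column names are the ones used in iris.csv.

```python
import io

import pandas as pd

# Headerless input for illustration: the first two data rows shown in this post.
raw = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
)

names = ["sepal length", "sepal width", "petal length", "petal width", "class"]

# header=None: treat the first line as data, not as column names.
# names=...:   use these labels for the columns instead.
data = pd.read_csv(raw, header=None, names=names)
print(data)
```

Without `header=None`, the first data row would be consumed as column names, so one sample would silently go missing.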

Next, draw a histogram using matplotlib.pyplot.

>>> import matplotlib.pyplot as plt
>>> data['sepal width'].hist()
<matplotlib.axes.AxesSubplot object at 0xa0017ec>

>>> plt.show()

[Figure: histogram of sepal width]
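
As a variation on the histogram call above, the number of bins can be set explicitly and the figure written to a file instead of being shown in a window. This is a sketch under a few assumptions: the five rows from the `head` listing stand in for the full iris.csv, the Agg backend is chosen so that no display is needed, and the output filename `sepal_width_hist.png` is arbitrary.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to a file, no window

import matplotlib.pyplot as plt
import pandas as pd

# Five rows from the listing in this post stand in for the full iris.csv.
csv_text = '''"sepal length","sepal width","petal length","petal width","class"
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
'''
data = pd.read_csv(io.StringIO(csv_text))

fig, ax = plt.subplots()
# bins= sets the number of bars; ax= draws onto the axes created above.
data["sepal width"].hist(bins=5, ax=ax)
ax.set_xlabel("sepal width")
ax.set_ylabel("count")

# The filename is arbitrary; any path writable by the script would do.
fig.savefig("sepal_width_hist.png")
```

With the full data set, replacing the in-memory string with `pd.read_csv('iris.csv')` should be the only change needed.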