Parsing HTML with Python (using BeautifulSoup) - のんびりしているエンジニアの日記

Hello, everyone.
How are you? I'm doing well.

Today I'd like to parse some HTML with BeautifulSoup.

Installation

sudo pip install beautifulsoup

Parsing with BeautifulSoup

# coding: utf-8
import BeautifulSoup
import urllib2

# Download the page and hand the raw HTML to BeautifulSoup
url = "http://stocks.finance.yahoo.co.jp/stocks/history/?code=998407.O"
soup = BeautifulSoup.BeautifulSoup(urllib2.urlopen(url).read())

# Search by tag name: every <li> tag
for li in soup.findAll("li"):
    print li

# Search by attribute: every tag whose class is "ymuiArrow1"
for li in soup.findAll(attrs={"class": "ymuiArrow1"}):
    print li

# Search by text: text nodes equal to u"チャート" ("Chart")
for text in soup.findAll(text=[u"チャート"]):
    print text.parent  # the tag that contains the text
    print text         # the text itself

As for basic usage, you first load the HTML with
BeautifulSoup.BeautifulSoup(urllib2.urlopen(url).read())
which parses the page into a soup object.

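You then call find or findAll on that object to pull out tags: find returns the first match, findAll returns every match.
If you only want the text inside a tag, use .string.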
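
To make find, findAll and .string a bit more concrete, here is a minimal sketch. Note the assumptions: it runs on Python 3 with the newer bs4 package (pip install beautifulsoup4) rather than the BeautifulSoup 3 module used in this post, findAll is written find_all there (the old name is kept as an alias), and the HTML string is made up for illustration instead of being downloaded.

```python
# Assumption: Python 3 + bs4 (beautifulsoup4), not the BeautifulSoup 3 module above.
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
html = """
<ul>
  <li>First</li>
  <li>Second</li>
  <li><b>Third</b> item</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

first = soup.find("li")        # only the first matching tag
items = soup.find_all("li")    # a list of every matching tag

first_text = first.string                  # text of a tag with a single text child
all_texts = [li.string for li in items]    # None where a tag has several children

print(first_text)
print(all_texts)
```

The last li holds both a b tag and a bare text node, so its .string is None. In bs4, get_text() is the usual way to collect all the text in a case like that.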

attrs: searches by attribute.
text: searches by text and returns the matching text itself. If you want the tag that contains it, use .parent.
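
The two search styles can be sketched on a small document as follows. The same assumptions apply: this is Python 3 with bs4 rather than BeautifulSoup 3, so the text search uses the string= keyword (bs4's current name for text=), and the HTML, including the "Chart" link standing in for チャート, is invented for the example.

```python
# Assumption: Python 3 + bs4 (beautifulsoup4), not the BeautifulSoup 3 module above.
from bs4 import BeautifulSoup

# Hypothetical HTML; "Chart" stands in for the チャート link on the real page.
html = """
<ul>
  <li class="ymuiArrow1"><a href="/chart">Chart</a></li>
  <li class="other"><a href="/news">News</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Attribute search: returns the matching tags themselves.
by_class = soup.find_all(attrs={"class": "ymuiArrow1"})

# Text search: returns the matching text nodes, not tags.
texts = soup.find_all(string="Chart")

# .parent climbs from each text node to the tag that contains it.
parents = [t.parent for t in texts]

for tag in by_class:
    print(tag)
for t, p in zip(texts, parents):
    print(t, p.name, p["href"])
```

The point to remember is the return type: an attribute search gives you tags, while a text search gives you strings, which is why .parent is needed to get back to the a tag and its href.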

Put this to use and I'm sure good things will follow...

References

BeautifulSoupでスクレイピングのまとめ | taichino.com