近期在研究py的网络编程,编写爬虫也是顺利成章的,开始在纠结与用正则表达式来匹配,到后来发现了Beautifulsoup,用他可以非常完美的帮我完成了这些任务:
Beautiful Soup 是用Python写的一个HTML/XML的解析器,它可以很好的处理不规范标记并生成剖析树(parse tree)。 它提供简单又常用的导航(navigating),搜索以及修改剖析树的操作。它可以大大节省你的编程时间。
简单使用说明:
python" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background-image:none;border:0px;line-height:1.1em;vertical-align:baseline;">
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
|
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>>>
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">from
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">bs4
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">import
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">BeautifulSoup
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>>> html_doc
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">"""
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">... <html><head><title>The Dormouse's story</title></head>
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">...
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">... <p class="title"><b>The Dormouse's story</b></p>
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">...
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">... <p class="story">Once upon a time there were three little sisters; and their names were
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">... <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">... <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">... <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">... and they lived at the bottom of a well.</p>
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">...
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">... <p class="story">...</p>
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">... """
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>>> soup
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">BeautifulSoup(html_doc)
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>>> soup.head()
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">[<title>The Dormouse's story<
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">title>]
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>>> soup.title
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;"><title>The Dormouse's story<
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">title>
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>>> soup.title.string
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">u
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"The Dormouse's story"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>>> soup.body.b
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;"><b>The Dormouse's story<
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">b>
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>>> soup.body.a
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;"><a
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">class
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"sister"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">href
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"http://example.com/elsie"
python functions" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(255,20,147);">id
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"link1"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>Elsie<
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">a>
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>>> soup.get_text()
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">u
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"... The Dormouse's story\n... \n... The Dormouse's story\n... \n... Once upon a time there were three little sisters; and their names were\n... Elsie,\n... Lacie and\n... Tillie;\n... and they lived at the bottom of a well.\n... \n... ...\n... "
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>>> soup.find_all(
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">'a'
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">)
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">[<a
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">class
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"sister"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">href
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"http://example.com/elsie"
python functions" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(255,20,147);">id
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"link1"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>Elsie<
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">a>, <a
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">class
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"sister"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">href
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"http://example.com/lacie"
python functions" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(255,20,147);">id
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"link2"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>Lacie<
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">a>, <a
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">class
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"sister"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">href
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"http://example.com/tillie"
python functions" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(255,20,147);">id
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"link3"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>Tillie<
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">a>]
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">>>>
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">for
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">key
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">in
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">soup.find_all(
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">'a'
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">):
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">...
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">print
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">key.get(
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">'class'
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">),key.get(
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">'href'
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">)
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">...
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">[
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">'sister'
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">] http:
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">example.com
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">elsie
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">[
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">'sister'
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">] http:
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">example.com
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">lacie
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">[
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">'sister'
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">] http:
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">example.com
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">/
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">tillie
|
###通过里面的方法,可以很快调出里面的元素和结果:
简单说明:
soup.body:表示显示body标签下面的内容,也可以用.来叠加标签:
soup.title.string:表示现在titile的文本内容
soup.get_text():表示显示所有文本内容:
soup.find_all():方式可以随意组合,也可以通过任意标签,包括class,id 等方式:
举例说明:以我常常看的直播表新闻为例;
1、首先看看我们要获得的内容:
我要获取的是上面那一栏热点新闻:如世预赛国足不敌卡塔而
2、源代码查看:
1
2
|
<
div
class
=
"fb_bbs"
><
a
href
=
"http://news.zhibo8.cc/zuqiu/"
style
=
"padding: 0 5px 0 0;"
target
=
"_blank"
title
=
"足球新闻"
><
img
src
=
"/css/images/football.png"
/></
a
><
span
><
a
href
=
"http://news.zhibo8.cc/zuqiu/"
target
=
"_blank"
><
font
color
=
"red"
> 世预赛:国足0-1不敌卡塔
尔</
font
></
a
>|<
a
href
=
"http://news.zhibo8.cc/zuqiu/2015-10-09/5616a910d74ac.htm"
target
=
"_blank"
>国足“刷卡”耻辱:11年不胜</
a
>|<
a
hf
=
"http://news.zhibo8.cc/zuqiu/2015-10-09/5616b22cbd134.htm"
target
=
"_blank"
>切尔西签下阿梅利亚</
a
>|<
a
href
=
"http://news.zhibo8.cc/zuqiu/2015-10-09/5616daa45ee48.htm"
target
=
"_blank"
>惊人!莱万5场14球</
a
>|<
a
href
=
"http://tu.zhibo8.cc/zuqiu/"
target
=
"_blank"
>图-FIFA16中国球员</
a
></
span
></
div
>
|
###从源码看到,这个是一个div 标签包裹的一个class=“fb_bbs”的版块,当然我们要确保这个是唯一的。
3、用BeautifulSoup来分析出结果代码如下:
python" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background-image:none;border:0px;line-height:1.1em;vertical-align:baseline;">
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
|
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);">#coding=utf-8
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">import
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">urllib,urllib2
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">from
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">bs4
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">import
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">BeautifulSoup
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">try
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">:
python spaces" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;">
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">html
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">urllib2.urlopen(
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"http://www.zhibo8.cc"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">)
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">except
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">urllib2.HTTPError as err:
python spaces" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;">
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">print
python functions" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(255,20,147);">str
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">(err)
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">soup
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">BeautifulSoup(html)
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">for
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">i
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">in
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">soup.find_all(
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"div"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">,attrs
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">{
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"class"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">:
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"fb_bbs"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">}):
python spaces" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;">
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">result
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">=
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">i.get_text().split(
python string" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#0000FF;">"|"
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">)
python spaces" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;">
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">for
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">term
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">in
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">result:
python spaces" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;">
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">print
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">term
python spaces" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;">
python value" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,153,0);">4
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">、执行效果:
python spaces" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;">
python spaces" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;">
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">[root@master network]
python comments" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,130,0);"># python url.py
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">世预赛:国足
python value" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,153,0);">0
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">-
python value" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,153,0);">1
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">不敌卡塔尔
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">国足“刷卡”耻辱:
python value" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,153,0);">11
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">年不胜
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">切尔西签下阿梅利亚
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">惊人!莱万
python value" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,153,0);">5
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">场
python value" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,153,0);">14
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">球
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">图
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">-
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">FIFA16中国球员
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">利物浦官方宣布克洛普上任
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">档案:克洛普的安菲尔德之旅
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">欧预赛
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">-
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">德国爆冷
python value" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,153,0);">0
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">-
python value" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,153,0);">1
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">爱尔兰
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">葡萄牙
python value" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,153,0);">1
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">-
python value" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:rgb(0,153,0);">0
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">胜丹麦
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">图
python keyword" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;font-weight:bold;color:rgb(0,102,153);">-
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">穆帅难罢手
python plain" style="font-family:Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace;font-size:1em;background:none;border:0px;line-height:1.1em;vertical-align:baseline;color:#000000;">到此任务差不多完成,代码量比re模块少了很多,而且简洁唯美,用py做爬虫确实是个利器;
|
本文转自 小罗ge11 51CTO博客,原文链接:http://blog.51cto.com/xiaoluoge/1701086,如需转载请自行联系原作者