OS: openSUSE Leap 42.1 (x86_64)
# sudo pip install beautifulsoup4
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html.read())
print(bsObj.h1)
Running the program above produces the following warning:
/usr/lib/python3.4/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 5 of the file myScrap.py. To get rid of this warning, change code that looks like this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "html.parser")
markup_type=markup_type))
Following the advice in the warning, I changed the program to pass the parser explicitly, which gives the version below.
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html, 'html.parser')
print(bsObj.h1)
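The program above converts the html content into a BeautifulSoup object and then prints the h1 tag found in that html. bsObj.h1 can also be written in any of the following ways:
bsObj.html.body.h1
bsObj.html.h1
bsObj.body.h1
All of them give the same result.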
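To check that these paths really do point at the same tag without going over the network, here is a minimal sketch. The sample_html string is a made-up stand-in for the downloaded page, not its actual content, and the names paths and all_same are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for the downloaded page
sample_html = """
<html>
<head><title>A Useful Page</title></head>
<body>
<h1>An Interesting Title</h1>
<div>Some body text.</div>
</body>
</html>
"""

# Pass the parser explicitly so no UserWarning is raised
bsObj = BeautifulSoup(sample_html, "html.parser")

# The four ways of reaching the first h1 tag
paths = [bsObj.h1, bsObj.html.body.h1, bsObj.html.h1, bsObj.body.h1]

# Each path should return the very same Tag object from the parse tree
all_same = all(tag is paths[0] for tag in paths)

print(paths[0])   # <h1>An Interesting Title</h1>
print(all_same)   # True
```

Each attribute access such as .h1 returns the first matching tag below the current one, so with a single h1 in the page every path ends up at the same element.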