BeautifulSoup - Python finding index of tag in string
html
<div class="productdescriptionwrapper">
  <p>a worm worth getting hands dirty over. on 6 feet of crawl space, playhut’s wiggly worm brightly colored , friendly play structure. </p>
  <ul>
    <li>6ft of crawl through fun</li>
    <li>18” diameter easy crawl through</li>
    <li>bright colorful design</li>
    <li>product measures: 18""diam x 60""l</li>
    <li>recommended ages: 3 years & up<br /> </li>
  </ul>
  <p><strong>intended indoor use</strong></p>
code
def getbullets(self, soup):
    bulletlist = []
    bullets = str(soup.findAll('div', {'class': 'productdescriptionwrapper'}))
    bullets_re = re.compile('<li>(.*)</li>')
    bullets_pat = str(re.findall(bullets_re, bullets))
    index = bullets_pat.findall('</li>')
    print index
How can I extract the p tags and the li tags? Thanks!
Notice the following:
>>> from BeautifulSoup import BeautifulSoup
>>> html = """ <what you have above> """
>>> soup = BeautifulSoup(html)
>>> bullets = soup.findAll('div', {'class': 'productdescriptionwrapper'})
>>> ptags = bullets[0].findAll('p')
>>> print ptags
[<p>a worm worth getting hands dirty over. on 6 feet of crawl space, playhut’s wiggly worm brightly colored , friendly play structure. </p>, <p><strong>intended indoor use</strong></p>]
>>> print ptags[0].text
a worm worth getting hands dirty over. on 6 feet of crawl space, playhut’s wiggly worm brightly colored , friendly play structure.
You can get at the contents of the li tags in a similar manner.
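For the li tags, a minimal sketch might look like the following. It assumes the newer `bs4` package (BeautifulSoup 4) with Python's built-in `html.parser`, rather than the older `BeautifulSoup` import used in the session above, and it uses a shortened copy of the question's markup. The names `wrapper`, `li_texts`, and `p_texts` are only illustrative.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# Shortened copy of the markup from the question, with the div closed.
html = """
<div class="productdescriptionwrapper">
<p>a worm worth getting hands dirty over.</p>
<ul>
<li>6ft of crawl through fun</li>
<li>bright colorful design</li>
<li>recommended ages: 3 years &amp; up<br /> </li>
</ul>
<p><strong>intended indoor use</strong></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the wrapper div first, then search only inside it.
wrapper = soup.find("div", {"class": "productdescriptionwrapper"})

# get_text(strip=True) drops surrounding whitespace and nested tags such as <br />.
li_texts = [li.get_text(strip=True) for li in wrapper.find_all("li")]
p_texts = [p.get_text(strip=True) for p in wrapper.find_all("p")]

for text in li_texts:
    print(text)
```

Working on the parsed tags this way avoids converting the soup back to a string and searching it with regular expressions, which is where the code in the question runs into trouble.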
python beautifulsoup