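Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. Beautiful Soup is not part of the Python standard library, so it has to be installed separately (for example with pip install beautifulsoup4). Beautiful Soup 3 was the official release of Beautiful Soup from May 2006 to March 2012; it has since been replaced by Beautiful Soup 4, which is imported as bs4.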
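As a minimal sketch of what "pulling data out of HTML" looks like, the snippet below parses a small made-up document with Python's built-in "html.parser" and extracts the title and the link URLs. It assumes Beautiful Soup 4 is installed; the markup and the example.com URLs are placeholders for illustration only.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML document used only for illustration.
html_doc = """
<html>
  <head><title>Basic engineer</title></head>
  <body>
    <a href="https://example.com/one">First link</a>
    <a href="https://example.com/two">Second link</a>
  </body>
</html>
"""

# "html.parser" is Python's built-in parser, so no extra parser package is needed.
soup = BeautifulSoup(html_doc, "html.parser")

# Pull the page title and every link target out of the document.
title = soup.title.string
links = [a["href"] for a in soup.find_all("a")]

print(title)  # Basic engineer
print(links)  # ['https://example.com/one', 'https://example.com/two']
```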
Kinds of objects in Beautiful Soup
- Tag
- NavigableString
- BeautifulSoup
- Comments
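To see all four kinds side by side, the sketch below parses one short, made-up fragment that contains a tag, some text, and a comment, then prints the class name of each object. It assumes the built-in "html.parser"; other parsers may wrap the fragment in extra html and body tags but produce the same four kinds of objects.

```python
from bs4 import BeautifulSoup

# Made-up markup containing a tag, some text, and a comment.
soup = BeautifulSoup("<p>Basic engineer<!-- hello student --></p>", "html.parser")

p_tag = soup.p                 # the <p> element
text = soup.p.contents[0]      # the text inside <p>
comment = soup.p.contents[1]   # the HTML comment inside <p>

print(type(soup).__name__)     # BeautifulSoup
print(type(p_tag).__name__)    # Tag
print(type(text).__name__)     # NavigableString
print(type(comment).__name__)  # Comment
```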
HTML Tree Structure
Tag Object
An HTML tag is used to define various types of content. A Tag object in Beautiful Soup corresponds to an HTML or XML tag in the original page or document.
Tag objects have many attributes and methods, and the two most important features of a tag are:
- attributes
- name
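A short sketch of getting hold of a Tag object: accessing a tag name as an attribute of the soup (here soup.b) returns the first matching tag, and that object exposes both features listed above. The markup is a made-up example and the parser is assumed to be the built-in "html.parser".

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Basic engineer</b>', "html.parser")

# Accessing a tag name as an attribute returns the first matching Tag object.
tag = soup.b

print(type(tag))   # <class 'bs4.element.Tag'>
print(tag.name)    # b
print(tag.attrs)   # {'class': ['boldest']}
```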
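Name
Every tag has a name, accessible as .name. You can also change a tag's name by assigning to .name; the change is reflected in any markup Beautiful Soup generates afterwards.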
Syntax
tag.name #'html'
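Example of name
from bs4 import BeautifulSoup
soup = BeautifulSoup('<html><body><b class="boldest">Basic engineer</b></body></html>', 'html.parser')
tag = soup.html
tag.name = "Strong"
tag
# <Strong><body><b class="boldest">Basic engineer</b></body></Strong>
tag.name
# 'Strong'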
Attributes
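A tag can have any number of attributes. The tag <b class="boldest"> has an attribute 'class' whose value is "boldest". You can read, add, change, and delete a tag's attributes by treating the tag like a dictionary (for example tag['class']), and you can get all of them at once as a dictionary through .attrs.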
Syntax
tag.attrs # {'class': ['boldest']}
Example of Attributes
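from bs4 import BeautifulSoup
soup = BeautifulSoup('<div></div><b class="boldest">hello student</b>', 'html.parser')
what = soup.div
why = soup.b
what['class'] = 'Basic engineer'
what['style'] = '2022'
what
# <div class="Basic engineer" style="2022"></div>
del what['style']
what
# <div class="Basic engineer"></div>
why['SecondAttribute'] = '2'
del why['class']
why
# <b SecondAttribute="2">hello student</b>
del why['SecondAttribute']
why
# <b>hello student</b>
what['class']
# 'Basic engineer'
what['style']
# KeyError: 'style'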
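Indexing a missing attribute raises KeyError. A common alternative, sketched below on a made-up div, is the tag's .get() method, which returns None (or a default you supply) instead of raising; checking membership in .attrs is another option.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div class="Basic engineer"></div>', "html.parser")
what = soup.div

# what['style'] would raise KeyError; .get() returns None (or a default) instead.
style = what.get("style")
style_or_default = what.get("style", "no style")

# "in" on the attrs dictionary checks for the attribute without raising.
has_style = "style" in what.attrs

print(style)             # None
print(style_or_default)  # no style
print(has_style)         # False
```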
Multi-Valued Attributes
Some HTML attributes can have multiple values. The most common one is class, because a tag can have more than one CSS class. Others include 'rel', 'rev', 'headers', 'accesskey' and 'accept-charset'. Beautiful Soup represents the value of a multi-valued attribute as a list.
Example of Multi-Valued Attributes
from bs4 import BeautifulSoup
css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
css_soup.p['class']
# ['body']
css_soup = BeautifulSoup('<p class="body bold"></p>', 'html.parser')
css_soup.p['class']
# ['body', 'bold']
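Two related behaviours are worth seeing in code. An attribute that is not multi-valued in HTML, such as id, keeps its value as a plain string even if it contains spaces, and when a list is assigned to a multi-valued attribute, Beautiful Soup joins it with spaces when writing the markup back out. The sketch below uses made-up markup and the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# "id" is not a multi-valued attribute, so its value is not split on whitespace.
id_soup = BeautifulSoup('<p id="my id"></p>', "html.parser")
print(id_soup.p["id"])  # my id

# Assigning a list to a multi-valued attribute joins it with spaces on output.
rel_soup = BeautifulSoup('<p>Back to the <a rel="index">homepage</a></p>', "html.parser")
rel_soup.a["rel"] = ["index", "contents"]
print(rel_soup.p)  # <p>Back to the <a rel="index contents">homepage</a></p>
```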
NavigableString
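A NavigableString is just like a Python Unicode string (it is a subclass of str), with some extra features for navigating the parse tree. Beautiful Soup uses the NavigableString class to represent the text inside a tag, and you access that text through the tag's .string attribute.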
Example of NavigableString
from bs4 import BeautifulSoup
soup = BeautifulSoup("<h2 id='message'>Basic engineer</h2>", 'html.parser')
soup.h2.string
# 'Basic engineer'
type(soup.h2.string)
# <class 'bs4.element.NavigableString'>
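Because a NavigableString is a str subclass, ordinary string methods work on it, and str() turns it into a plain Python string. A string inside the tree cannot be edited in place, but it can be swapped for a new one with replace_with(). The sketch below reuses the heading from the example; the replacement text "Advanced engineer" is made up.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<h2 id='message'>Basic engineer</h2>", "html.parser")
text = soup.h2.string

# A NavigableString is a str subclass, so ordinary string methods work on it.
print(isinstance(text, str))  # True
print(text.upper())           # BASIC ENGINEER

# str() gives a plain Python string with no link back to the parse tree.
plain = str(text)
print(type(plain).__name__)   # str

# The string cannot be edited in place, but it can be replaced in the tree.
text.replace_with("Advanced engineer")
print(soup.h2)  # <h2 id="message">Advanced engineer</h2>
```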
BeautifulSoup
The BeautifulSoup object represents the parsed document as a whole. Most of the time you can treat it like a Tag object: it supports most of the methods described in Navigating the tree and Searching the tree.
Syntax
soup.name # '[document]'
Example of BeautifulSoup Object
from bs4 import BeautifulSoup
soup = BeautifulSoup("<h2 id='message'>Basic engineer</h2>", 'html.parser')
type(soup)
# <class 'bs4.BeautifulSoup'>
soup.name
# '[document]'
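The claim that the BeautifulSoup object can be treated like a Tag is easy to check: the sketch below calls the usual search methods find() and find_all() directly on the soup and iterates over its children. The two headings are made-up markup, parsed with the built-in "html.parser".

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<h2 id='message'>Basic engineer</h2><h2 id='note'>hello student</h2>",
    "html.parser",
)

# The BeautifulSoup object supports the same search methods as a Tag.
first = soup.find("h2")
ids = [h2["id"] for h2 in soup.find_all("h2")]

print(first.string)  # Basic engineer
print(ids)           # ['message', 'note']

# It also has children, like any other tag in the tree.
print(len(list(soup.children)))  # 2
```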
Comments
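Tag, NavigableString and BeautifulSoup cover almost everything you will see in an HTML or XML file, but there are a few leftover bits. The one you are most likely to meet is the comment part of the web document. A Comment object is a special type of NavigableString that represents an HTML comment.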
Example of Comments
from bs4 import BeautifulSoup
soup = BeautifulSoup('<p><!-- Comments hello student --></p>', 'html.parser')
comment = soup.p.string
type(comment)
# <class 'bs4.element.Comment'>
print(soup.p.prettify())
# <p>
#  <!-- Comments hello student -->
# </p>
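A common task with comments is finding or stripping them. Since every comment is an instance of Comment, a sketch of one way to do it passes a function to find_all(string=...) that keeps only Comment objects, then calls extract() to remove each one from the tree. The markup is made up and the parser is assumed to be "html.parser".

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

soup = BeautifulSoup(
    "<p>Basic engineer<!-- Comments hello student --></p>", "html.parser"
)

# find_all(string=...) accepts a function; keep only strings that are Comments.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
print([str(c) for c in comments])  # [' Comments hello student ']

# extract() removes each comment from the tree.
for comment in comments:
    comment.extract()

print(soup.p)  # <p>Basic engineer</p>
```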
If you have any questions about this article, or if I have missed something on this topic, please feel free to leave a comment below for other readers. See you in the next article.
To learn more about the Beautiful Soup library, see its Wikipedia article.
Stay connected, stay safe. Thank you!