Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. Beautiful Soup is not part of the Python standard library, so it has to be installed separately (for example with pip install beautifulsoup4). Beautiful Soup 3 was the official release of Beautiful Soup from May 2006 to March 2012.
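
As a quick illustration of "pulling data out of HTML", here is a minimal sketch. The page fragment, the variable names and the choice of the built-in "html.parser" are assumptions made for this example only.

```python
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

# Hypothetical page fragment used only for this sketch.
html = """
<html><head><title>Demo page</title></head>
<body><a href="/one">First</a> <a href="/two">Second</a></body></html>
"""

# Parse the markup with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Pull out the page title and the target of every link.
title = soup.title.string
links = [a["href"] for a in soup.find_all("a")]

print(title)   # Demo page
print(links)   # ['/one', '/two']
```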

Kinds of objects in Beautiful Soup

  • Tag
  • NavigableString
  • BeautifulSoup
  • Comments
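
The four kinds of objects listed above can all be seen in one small document. This is only a sketch: the markup is made up for illustration, and "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical markup chosen so that every object type appears once.
html = '<p class="intro">Hello<!-- a note --></p>'
soup = BeautifulSoup(html, "html.parser")

p = soup.p            # the <p> element is a Tag
text = p.contents[0]  # the text "Hello" is a NavigableString
note = p.contents[1]  # the HTML comment is a Comment

kinds = [type(obj).__name__ for obj in (soup, p, text, note)]
print(kinds)  # ['BeautifulSoup', 'Tag', 'NavigableString', 'Comment']
```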

HTML Tree Structure

[Figure: HTML tree structure]

Tag Object

An HTML tag is used to define various types of content. A Tag object in Beautiful Soup corresponds to an HTML or XML tag in the original page or document.

Tags have many attributes and methods. The two most important features of a tag are:

  • attributes
  • name

Name tag

Every tag has a name, accessible as .name.

Syntax

tag.name
#'html'

Example of name tag

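# Setup assumed for this example (not shown in the original article).
from bs4 import BeautifulSoup
soup = BeautifulSoup('<html><body><b class="boldest">Basic engineer</b></body></html>', 'html.parser')
tag = soup.html
# Renaming a tag changes the markup that Beautiful Soup generates.
tag.name = "Strong"
tag
# <Strong><body><b class="boldest">Basic engineer</b></body></Strong>
tag.name
# 'Strong'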
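
To see .name across a whole document, the following sketch lists every tag name and then renames one tag. The markup and variable names are hypothetical, and "html.parser" is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
soup = BeautifulSoup("<div><p>One</p><span>Two</span></div>", "html.parser")

# find_all(True) matches every tag; read each tag's name in document order.
names = [tag.name for tag in soup.find_all(True)]
print(names)  # ['div', 'p', 'span']

# Assigning to .name renames the tag in the generated markup.
soup.span.name = "em"
renamed = str(soup)
print(renamed)  # <div><p>One</p><em>Two</em></div>
```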

Attributes

A tag object can have any number of attributes. The tag <b class="boldest"> has an attribute 'class' whose value is "boldest". You can read, add, change and delete attributes by treating the tag like a dictionary.

Syntax

tag.attrs
# {'id': 'boldest'}

Example of Attributes

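# Setup assumed for this example (not shown in the original article).
from bs4 import BeautifulSoup
what = BeautifulSoup('<div></div>', 'html.parser').div
why = BeautifulSoup('<b class="boldest">hello student</b>', 'html.parser').b
why['SecondAttribute'] = '2'

# Add attributes, then delete one of them.
what['class'] = 'Basic engineer'
what['style'] = '2022'
what
# <div class="Basic engineer" style="2022"></div>
del what['style']
what
# <div class="Basic engineer"></div>

del why['class']
why
# <b SecondAttribute="2">hello student</b>
del why['SecondAttribute']
why
# <b>hello student</b>

# Reading a deleted attribute raises KeyError.
what['class']
# 'Basic engineer'
what['style']
# KeyError: 'style'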
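
Because indexing a missing attribute raises KeyError, it is often safer to look attributes up with .get() or to check with .has_attr() first. A minimal sketch, assuming a made-up <b> tag and the "html.parser" parser:

```python
from bs4 import BeautifulSoup

# Hypothetical tag used for illustration.
tag = BeautifulSoup('<b id="boldest">bold</b>', "html.parser").b

attrs = dict(tag.attrs)          # copy of the attribute dictionary
present = tag.get("id")          # value of an existing attribute
missing = tag.get("style")       # None instead of KeyError
has_id = tag.has_attr("id")      # True

# Adding an attribute works like assigning to a dictionary key.
tag["lang"] = "en"
markup = str(tag)

print(attrs)    # {'id': 'boldest'}
print(missing)  # None
print(markup)   # <b id="boldest" lang="en">bold</b>
```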

Multiple Value attributes

Some HTML5 attributes can have multiple values. The most common one is class, since a tag can have more than one CSS class. Others include 'rel', 'rev', 'headers', 'accesskey' and 'accept-charset'. Beautiful Soup presents the value of a multi-valued attribute as a list.

Example of Multiple Value attributes

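from bs4 import BeautifulSoup
# Passing the parser explicitly avoids the "no parser was explicitly specified" warning.
css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
css_soup.p['class']
# ['body']
css_soup = BeautifulSoup('<p class="body bold"></p>', 'html.parser')
css_soup.p['class']
# ['body', 'bold']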
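
Attributes that are not multi-valued, such as id, are left as a single string even when the value contains a space. The sketch below contrasts the two cases and shows that a list assigned to class is joined back into one attribute value. The markup is hypothetical, and "html.parser" is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "class" is multi-valued, "id" is not.
soup = BeautifulSoup('<p class="body bold" id="my id"></p>', "html.parser")
p = soup.p

classes = p["class"]                    # split into a list
single = p["id"]                        # kept as one string
as_list = p.get_attribute_list("id")    # always a list, whatever the attribute

# A list assigned to a multi-valued attribute is joined with spaces on output.
p["class"] = ["body", "strikeout"]
markup = str(p)

print(classes)  # ['body', 'bold']
print(single)   # my id
print(as_list)  # ['my id']
print(markup)   # <p class="body strikeout" id="my id"></p>
```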

NavigableString

A NavigableString is just like a Python Unicode string. Beautiful Soup uses the NavigableString object to represent the text contents of a tag. To access the contents, use .string on the tag.

Example of NavigableString

from bs4 import BeautifulSoup
soup = BeautifulSoup("<h2 id='message'>Basic engineer</h2>", 'html.parser')
soup.h2.string
# 'Basic engineer'
type(soup.h2.string)
# <class 'bs4.element.NavigableString'>
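
A NavigableString behaves like a normal string, can be converted to a plain str, and can be swapped for other text with replace_with(). A minimal sketch, reusing the <h2> markup from the example and assuming "html.parser":

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<h2 id='message'>Basic engineer</h2>", "html.parser")
text = soup.h2.string

# NavigableString is a subclass of str, so string methods work on it.
is_str = isinstance(text, str)
upper = text.upper()

# str() gives a plain Python string that is detached from the parse tree.
plain = str(text)

# The text of a tag is changed by replacing the NavigableString.
text.replace_with("Beautiful Soup")
markup = str(soup.h2)

print(is_str)  # True
print(upper)   # BASIC ENGINEER
print(markup)  # <h2 id="message">Beautiful Soup</h2>
```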

BeautifulSoup

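The BeautifulSoup object represents the parsed document as a whole. Most of the time you can treat it as a Tag object: it supports most of the methods described in Navigating the tree and Searching the tree. Because it does not correspond to a real HTML or XML tag, it is given the special name '[document]'.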

Syntax

soup.name
# '[document]'

Example of BeautifulSoup Object.

from bs4 import BeautifulSoup
soup = BeautifulSoup("<h2 id='message'>Basic engineer</h2>", 'html.parser')
type(soup)
# <class 'bs4.BeautifulSoup'>
soup.name
# '[document]'
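
Since the BeautifulSoup object can be treated like a tag, the usual search and text methods can be called on it directly. A small sketch with made-up markup, assuming "html.parser":

```python
from bs4 import BeautifulSoup

# Hypothetical two-element document for illustration.
soup = BeautifulSoup(
    "<h2 id='message'>Basic engineer</h2><p>More text</p>", "html.parser"
)

# Tag-style methods called on the document object itself.
names = [tag.name for tag in soup.find_all(True)]
heading = soup.find("h2")["id"]
all_text = soup.get_text()

print(soup.name)  # [document]
print(names)      # ['h2', 'p']
print(heading)    # message
print(all_text)   # Basic engineerMore text
```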

Comments

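Tag, NavigableString and BeautifulSoup cover almost everything you will see in an HTML or XML file. The Comment object represents the comment part of a web document. It is a special type of NavigableString.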

Example of Comments

from bs4 import BeautifulSoup
soup = BeautifulSoup('<p><!-- Comments hello student --></p>', 'html.parser')
comment = soup.p.string
type(comment)
# <class 'bs4.element.Comment'>
print(soup.p.prettify())
# <p>
#  <!-- Comments hello student -->
# </p>
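
A common task is collecting, or stripping, every comment in a document. The sketch below is one way to do it, assuming hypothetical markup and "html.parser": it filters strings by type with find_all(string=...) and removes each match with extract().

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical markup containing two comments.
html = "<div><!-- first --><p>Text<!-- second --></p></div>"
soup = BeautifulSoup(html, "html.parser")

# Keep only the strings that are Comment objects.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
texts = [c.strip() for c in comments]

# extract() removes each comment from the tree.
for c in comments:
    c.extract()
cleaned = str(soup)

print(texts)    # ['first', 'second']
print(cleaned)  # <div><p>Text</p></div>
```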

If you have any questions about this article, or if I have missed something on this topic, please feel free to add a comment below. See you in another article.


Stay Connected Stay Safe, Thank you


Basic Engineer

Hey Readers! We have more than fifteen years of experience in Software Development, IoT, Telecom, Banking, Finance and Embedded domain. Currently we are actively working on Data Science, ML and AI with multiple market leaders worldwide. Happy Reading. Cheers!
