Web scraping with PyScript

This is my first day looking at PyScript and I have a question that will determine if I investigate further. At first glance, it looks very promising.

I have a Beautiful Soup application. Can I run it (with suitable modification) in PyScript?

I know I can’t use Selenium since it depends on WebDriver. I assume I can trigger click events on button and option DOM elements.

Depending on the answer to this, I may have more specific questions later.

Thanks, Tom

You can certainly use Beautiful Soup in PyScript, probably with little or no modification to your bs4 code. For example, here’s a (slapdash) snippet that you can run in a <py-script> tag, once you’ve added beautifulsoup4 to the page’s <py-env> tag. It prints the tree of tags from the current page:

from bs4 import BeautifulSoup

from js import document
from pyodide.http import open_url

# Use the following to get another page's content synchronously
# (open_url returns a file-like StringIO, so read() gives the HTML string);
# otherwise we will use the current page
# page_html = open_url('hello_world.html').read()

page_html = document.documentElement.innerHTML
soup = BeautifulSoup(page_html, 'html.parser')

def print_self_and_children(tag, indent=0):
    # Print this tag's name, then recurse into its child tags
    print("_" * indent + str(tag.name))
    if hasattr(tag, 'children'):
        for child in tag.children:
            # Text nodes have no tag name; recurse only into real tags
            if hasattr(child, 'name') and child.name is not None:
                print_self_and_children(child, indent=indent + 2)

print_self_and_children(soup)

As you say, Selenium won’t work inside a browser environment, but you can use DOM selectors and the elements’ own interaction methods and properties (click(), option.selected, etc.) to simulate interaction if need be.
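To make that a bit more concrete, here is a rough sketch of what clicking a button and picking an option might look like from Python. The helper names (click_element, select_option) and the selectors ("#load-more", "option[value='csv']") are made up for illustration and don’t come from any real page; only querySelector, click() and the selected property are standard DOM API. The helpers take the document as an argument so they can also be exercised outside the browser with a stand-in object.

```python
# Sketch only: the selectors used at the bottom are hypothetical examples.
try:
    from js import document  # exists only inside PyScript/Pyodide
except ImportError:
    document = None  # outside the browser; pass your own document-like object


def click_element(doc, selector):
    """Click the first element matching a CSS selector; return True if found."""
    element = doc.querySelector(selector)
    if element is None:  # no element matched the selector
        return False
    element.click()
    return True


def select_option(doc, selector):
    """Mark the first matching <option> as selected; return True if found."""
    option = doc.querySelector(selector)
    if option is None:
        return False
    option.selected = True
    return True


if document is not None:
    click_element(document, "#load-more")           # hypothetical button id
    select_option(document, "option[value='csv']")  # hypothetical option
```

One caveat: setting selected from code doesn’t fire a change event by itself, so if the page reacts to one you would probably need to dispatch it on the <select> yourself.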