Web scraping with PyScript

This is my first day looking at PyScript and I have a question that will determine if I investigate further. At first glance, it looks very promising.

I have a Beautiful Soup application. Can I run it (with suitable modification) in PyScript?

I know I can’t use Selenium since it depends on WebDriver. I assume I can trigger click events on button and option DOM elements.

Depending on the answer to this I may have more specific questions later.

Thanks, Tom


You surely can use Beautiful Soup in PyScript, probably without much (if any) modification of your bs4 code. For example, here’s a (slapdash) snippet of code that you can run in a py-script tag to print the tree of tags from the current page. (You’ll need to add beautifulsoup4 to the page’s <py-env> tag.)

from bs4 import BeautifulSoup

from js import document
from pyodide.http import open_url

# Uncomment the following to fetch another page's content synchronously;
# otherwise we use the current page.
# page_html = open_url('hello_world.html')

page_html = document.documentElement.innerHTML
soup = BeautifulSoup(page_html, 'html.parser')

def print_self_and_children(tag, indent=0):
    # Print this tag's name, then recurse into its element children,
    # skipping text nodes (whose .name is None).
    print("_" * indent + str(tag.name))
    if hasattr(tag, 'children'):
        for child in tag.children:
            if getattr(child, 'name', None) is not None:
                print_self_and_children(child, indent=indent + 2)

print_self_and_children(soup)
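Outside the browser, you can exercise the same walk against a small hardcoded page. A minimal sketch (the sample markup below is invented for illustration, standing in for document.documentElement.innerHTML):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the live page's HTML.
sample_html = """
<html>
  <head><title>Demo</title></head>
  <body><div><p>Hello</p></div></body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

def tag_tree(tag, indent=0):
    # Collect this tag's name, then recurse into its element children,
    # skipping text nodes (whose .name is None).
    lines = ["_" * indent + str(tag.name)]
    for child in getattr(tag, "children", []):
        if getattr(child, "name", None) is not None:
            lines.extend(tag_tree(child, indent + 2))
    return lines

print("\n".join(tag_tree(soup)))
```

The root prints as `[document]` (the name bs4 gives the soup object itself), with each nested element indented two underscores deeper.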


As you say, Selenium won’t work inside a browser environment, but you can use DOM selectors and the usual interaction methods (click(), option.selected, etc.) to simulate interaction if need be.
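A hedged sketch of that kind of interaction (the #submit id and the select markup are invented; the js module only exists inside a Pyodide/PyScript page, so this is a no-op anywhere else):

```python
# The `js` module is only importable inside Pyodide/PyScript,
# so guard the import to keep this a no-op outside the browser.
try:
    from js import document
except ImportError:
    document = None

if document is not None:
    # Click a (hypothetical) button with id="submit".
    button = document.querySelector("#submit")
    if button is not None:
        button.click()

    # Programmatically select a (hypothetical) <option>.
    option = document.querySelector("select option[value='b']")
    if option is not None:
        option.selected = True
```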

I am trying to scrape via PyScript as well, primarily because I want the scraper to reuse the browser’s existing logins/authentications, so that I don’t have to add manual steps where the user re-enters their credentials on a page loaded via Selenium.

However, Selenium does seem like the only alternative, since I would need to automate the download of a chromedriver via something like this:

Automatic download of appropriate chromedriver for Selenium in Python - Stack Overflow

Once this happens, Selenium can do its thing.

I believe this will require the following packages:

  • requests
  • wget
  • zipfile
  • os
  • selenium

Is this possible with PyScript?

Thank you for the help!

Sadly, neither requests nor selenium will work within a browser window - that is to say, under PyScript. Requests relies heavily on the ssl package, and raw sockets are not available within a browser. And Selenium relies on instantiating an instance of the browser itself (headlessly or in a window), which a browser environment won’t permit you to do.
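If you need network access from PyScript, the browser’s fetch API is the sanctioned route instead of requests. A sketch using pyodide.http.pyfetch (guarded so it is a no-op outside Pyodide; the URL is a placeholder):

```python
import asyncio

# pyodide.http.pyfetch wraps the browser's fetch() API and is the usual
# replacement for `requests` under PyScript. It only exists under Pyodide,
# so guard the import to keep this runnable anywhere.
try:
    from pyodide.http import pyfetch
except ImportError:
    pyfetch = None

async def fetch_text(url):
    # Fetch a page via the browser; subject to the page's CORS rules.
    response = await pyfetch(url)
    return await response.string()

if pyfetch is not None:
    # Inside PyScript you would schedule this on the page's event loop.
    asyncio.ensure_future(fetch_text("hello_world.html"))
```

Note that fetch() is bound by the same-origin/CORS rules of the hosting page, which is a further constraint that requests never had.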


Thanks for the suggestions. I have determined that I can do what I want with PyScript.

I’m now looking for guidance on using PyScript in browser extensions, to access the DOM of the current page.

Since this topic is not restricted to web scraping, perhaps I should start a new thread?