Breaking Into The Industry: My Journey to Becoming a Software Engineer #10 — Web Scraping

M E · 3 min read · Jan 22, 2022


Have you ever been given a task that requires repetitive data entry? How often did you make a mistake entering the data? Data entry is tedious and prone to errors. But worry no more: Beautiful Soup and Selenium to the rescue (put that ‘S’ on my chest).

First, let’s talk about Beautiful Soup (soup sounds good right about now). Beautiful Soup is basically a Python library that parses a webpage’s HTML so you can find the relevant elements and extract their data.
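As a minimal sketch of what that looks like, here is Beautiful Soup parsing a tiny hand-written snippet (the class names and values here are made up for illustration; a real page’s markup will differ):

```python
from bs4 import BeautifulSoup

# A tiny stand-in for a real page, so the example is self-contained.
html = """
<div class="listing">
  <span class="price">$1,500/mo</span>
  <a class="address" href="/homedetails/123">123 Main St, Springfield</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
price = soup.find("span", class_="price").get_text()    # "$1,500/mo"
address = soup.find("a", class_="address").get_text()   # the street address
link = soup.find("a", class_="address")["href"]         # the relative link
```

The same `find`/`find_all` calls work on any HTML you download, once you know which tags and classes hold the data you want.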

Now, what about Selenium? Doesn’t Selenium sound like a rock band? I think I would name my band Selenium, if I ever had one. Anyway, what is Selenium? It is basically one of the most popular browser automation and testing tools for web developers. How does Selenium differ from Beautiful Soup? Well, the major difference is that Beautiful Soup cannot type any information into the browser, nor can it click any buttons. That is where Selenium comes in: a Selenium WebDriver can basically do what a human can do in a browser, such as click buttons, type text, scroll up and down the page, and so forth.
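To give a feel for the API, here is a small sketch of those human-like actions (the driver setup is left to the caller, and the locators are hypothetical; you would inspect a real page to find its element names, and you need a browser driver such as chromedriver installed):

```python
# Sketch of the human-like actions a Selenium WebDriver can perform.
# The locators below (name "q", the submit button selector) are
# hypothetical examples, not taken from any particular site.

def demo_actions(driver, query="selenium"):
    """Type into a search box, click a submit button, and scroll the page."""
    from selenium.webdriver.common.by import By  # imported lazily for the sketch

    driver.find_element(By.NAME, "q").send_keys(query)                   # type text
    driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()  # click
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")  # scroll
```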

“Knowledge is Power” — Francis Bacon

I love that quote above so much that I want to use my web scraping knowledge to help with a little data entry task. This project basically uses Beautiful Soup to scrape Zillow for the preferred choice(s) of a potential home buyer, or of somebody who is just looking to rent. Browsing for houses on the web can be a daunting task, not just for home buyers but also for real estate agents. But before scraping the Zillow website, we first need information such as the preferred city, price, and number of bedrooms.

Enter your own preferences: city, price, rent or sale, number of bedrooms, and so on.

Once you have that information, copy the resulting URL, paste it into your code, and observe how the page is structured. Once you have an idea of which HTML elements need to be parsed in order to gather the information, add them to your code.
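A sketch of that step, using the standard library’s urllib (the URL below is a placeholder for the one you copied; sending a browser-like User-Agent header is an assumption that often matters, since sites like Zillow may serve a blocked or empty page to the default one):

```python
from urllib.request import Request, urlopen

# Replace with the URL you copied from your filtered search (placeholder here).
url = "https://www.zillow.com/homes/for_rent/"

# Browser-like headers; without them the request may be rejected.
req = Request(url, headers={
    "User-Agent": "Mozilla/5.0",
    "Accept-Language": "en-US,en;q=0.9",
})

# The actual network call, commented out in this sketch:
# html = urlopen(req).read().decode("utf-8")
# soup = BeautifulSoup(html, "html.parser")
```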

Now, once we get a hold of the HTML elements, make lists of the prices, addresses, and links of the properties for rent or for sale (whichever you prefer).

Different methods that return lists of the prices, addresses, and links of the rental or for-sale properties.
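One way those lists can be built is with CSS selectors and list comprehensions. This is a sketch against stand-in markup; Zillow’s real class names differ and change often, so you would substitute the selectors you found by inspecting the page:

```python
from bs4 import BeautifulSoup

# Stand-in markup with two listings, so the example is self-contained.
html = """
<ul>
  <li class="card"><span class="price">$1,200/mo</span>
      <a class="link" href="/homedetails/1">1 Oak Ave</a></li>
  <li class="card"><span class="price">$1,850/mo</span>
      <a class="link" href="/homedetails/2">2 Elm St</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# One list per field, in matching order.
prices = [tag.get_text() for tag in soup.select(".card .price")]
addresses = [a.get_text() for a in soup.select(".card a.link")]
links = ["https://www.zillow.com" + a["href"] for a in soup.select(".card a.link")]
```

Relative `href` values are joined onto the site’s domain so each link in the list is clickable on its own.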

Before we move on, we must use Google Forms to create a form that will collect the results of our web scraping.

Now, this is where we use Selenium to fill in the form automatically with the results that Beautiful Soup gathered. And once all the Beautiful Soup data has been submitted, we can convert the Google Form responses into a spreadsheet.
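A hedged sketch of that step, assuming a hypothetical `submit_listings` helper; the form URL and the selectors are assumptions, since Google Forms generates its own markup (inspect your form to confirm how its text inputs and Submit button are rendered):

```python
# Sketch: submit each scraped listing as one Google Form response.
# The selectors below are assumptions about Google Forms' generated
# markup, which can change; verify them against your own form.

def submit_listings(driver, form_url, prices, addresses, links):
    """Fill the form's text fields once per listing, then press Submit."""
    from selenium.webdriver.common.by import By  # imported lazily for the sketch

    for price, address, link in zip(prices, addresses, links):
        driver.get(form_url)
        # Short-answer questions typically render as text inputs.
        fields = driver.find_elements(By.CSS_SELECTOR, "input[type='text'], textarea")
        for field, value in zip(fields, (address, price, link)):
            field.send_keys(value)
        driver.find_element(By.XPATH, "//span[text()='Submit']").click()
```

In the Google Forms UI, the collected responses can then be exported to a Google Sheet from the Responses tab.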

Voila!!! The rest of the code can be found on my GitHub account. :)
