5. Web Scraping Using BeautifulSoup

5.2. How Websites Prevent You From Scraping

This discussion follows the excellent overview by a Stack Overflow and GitHub contributor with the username JonasCz (I wish I knew this user’s real name!) on how to prevent web scraping.

To understand the restrictions and challenges you will encounter when scraping data, put yourself in the position of a website’s owner:

If you own and maintain a website, there are many reasons why you might want to prevent web scraping bots from accessing its data. Maybe the bots will flood your site with traffic and keep it from working as you intend. Maybe you run a business through the website, and letting others copy the data in bulk would undercut that business. Whatever the reason, you now face a challenge: how do you prevent automated scraping of the data on your webpage while still allowing individual customers to view your website?

Web scraping requires issuing HTTP requests to a particular web address with a tool like requests, sometimes many times in a short period. The server that receives each HTTP request typically logs it, and these logs contain the IP address of the entity making the request. If the same IP address makes too many requests, the server can block that IP address. The logic to automatically identify and block overactive IP addresses is simple, so many websites include these security measures. Some blocks are temporary rate limits that slow scrapers down, and some reroute scrapers through a CAPTCHA (which stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”) to keep robots like a scraper out of the website. JonasCz recommends that these security measures look at other factors as well: the speed of actions on the website, the amount of data requested, and other signals that can identify a user when the IP address is masked.
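To make the IP-based blocking described above concrete, here is a toy sketch of a sliding-window rate limiter written from the site owner’s side. It is an illustration under assumptions: the class name IPRateLimiter, the thresholds, and the in-memory dictionary are all hypothetical, and a production site would more likely rely on its web server or firewall for this.

```python
import time
from collections import defaultdict, deque


class IPRateLimiter:
    """Flag IP addresses that send more than `max_requests` within `window` seconds.

    A toy, in-memory sketch; the thresholds are illustrative assumptions.
    """

    def __init__(self, max_requests=10, window=60.0):
        self.max_requests = max_requests
        self.window = window
        self.history = defaultdict(deque)  # IP address -> timestamps of recent accepted requests

    def allow(self, ip, now=None):
        """Record a request from `ip`; return False if it would exceed the limit."""
        now = time.monotonic() if now is None else now
        times = self.history[ip]
        # Drop timestamps that have fallen out of the sliding window
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_requests:
            return False  # over the limit: block, slow down, or show a CAPTCHA
        times.append(now)
        return True


limiter = IPRateLimiter(max_requests=3, window=10.0)
decisions = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)]
print(decisions)                              # [True, True, True, False]
print(limiter.allow("203.0.113.7", now=11))   # True: the earliest requests have aged out
```

Rejected requests are not recorded in this sketch, so a blocked client regains access as soon as its older accepted requests leave the window; a stricter site might count every attempt instead.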

Stronger gates, such as requiring users to register a username and password with email confirmation, are effective against scraping bots. But they also turn away individuals who don’t want to jump through those hoops. Saving all text as images on your server makes it much harder for bots to read the text, but it also makes the website harder to use and can violate regulations that protect people with disabilities.

Instead, JonasCz recommends building your website in a way that never reveals the entirety of the data you own, and never reveals the private API endpoints you use to display the data. Also, web scrapers are fragile: they are built to pull data from the specific HTML structure of a particular website. Changing the HTML code frequently or using different versions of the code based on geographic location will break the scrapers that are built for that code. JonasCz also suggests adding “honeypot” links to the HTML code that will not be displayed to legitimate users but will be followed by scrapers that recursively follow links, and taking action against the agents that follow these links: block their IP addresses, require a CAPTCHA, or deliver fake data.
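A minimal sketch of the honeypot idea from the owner’s side, assuming a simplified access log of (IP address, path) pairs. The trap path /trap/archive-all and the function name find_honeypot_visitors are hypothetical. Any IP address that requests the hidden link gets flagged, and the site could then block it, require a CAPTCHA, or serve fake data.

```python
# Hypothetical honeypot URL that the site links to but hides from human visitors,
# e.g. with <a href="/trap/archive-all" style="display:none">
HONEYPOT_PATHS = {"/trap/archive-all"}


def find_honeypot_visitors(access_log, honeypot_paths=HONEYPOT_PATHS):
    """Return the set of IP addresses that requested any honeypot path.

    `access_log` is an iterable of (ip_address, requested_path) pairs, a
    simplified stand-in for a real web server log.
    """
    return {ip for ip, path in access_log if path in honeypot_paths}


log = [
    ("198.51.100.4", "/"),
    ("198.51.100.4", "/playlist"),
    ("203.0.113.9", "/"),
    ("203.0.113.9", "/trap/archive-all"),  # a link-following bot took the bait
]
print(find_honeypot_visitors(log))  # {'203.0.113.9'}
```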

One important piece of information in a request is the user agent header (which we discuss in more detail below). JonasCz recommends looking at this information and blocking requests when the user agent is blank or matches information from agents that have previously been identified as malicious bots.
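The user agent check JonasCz describes might look something like the sketch below. The blocklist entries and the function name should_block are hypothetical; a real site would build its list from bots it has actually seen. Note that a scraper using requests without setting a header does not send a blank agent: the library sends a default of the form python-requests/<version>, which some sites may also choose to block.

```python
# Hypothetical blocklist of user agent substrings; a real site would maintain
# its own list based on bots it has seen misbehave.
BLOCKED_AGENT_SUBSTRINGS = ("badbot", "evilscraper")


def should_block(user_agent):
    """Return True if a request's User-Agent header is missing, blank, or blocklisted."""
    if user_agent is None or not user_agent.strip():
        return True  # absent or blank user agent
    agent = user_agent.lower()
    return any(bad in agent for bad in BLOCKED_AGENT_SUBSTRINGS)


print(should_block(None))                                           # True
print(should_block("   "))                                          # True
print(should_block("EvilScraper/2.1"))                              # True
print(should_block("Kropko class example (jkropko@virginia.edu)"))  # False
```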

Once you understand the steps you would take to protect your data from bots if you owned a website, you have greater insight into why a web scraping endeavor may fail. Your web scraper might not be malicious, but it might still violate the rules that the website owner set up to guard against bots. These rules are usually listed explicitly in a file called robots.txt at the root of the website’s domain. Some tips for reading and understanding a robots.txt file are here: https://www.promptcloud.com/blog/how-to-read-and-respect-robots-file/

For example, in this document we will be scraping data on the playlist of a radio station from https://spinitron.com/. This website has a robots.txt file here: https://spinitron.com/robots.txt, which reads:

User-agent: *
Crawl-delay: 10
Request-rate: 1/10

The User-agent: * line tells us that the next two lines apply to all user agent strings. Crawl-delay: 10 limits how often our scraper can make requests to this website: successive requests must be at least 10 seconds apart. Request-rate: 1/10 tells us that our scraper may access at most one page every 10 seconds, which also rules out requesting several pages at the same time.
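Python’s standard library can read these rules for you. The sketch below feeds the spinitron.com rules quoted above into urllib.robotparser.RobotFileParser without touching the network, then asks what they allow. Crawl-delay and Request-rate are non-standard extensions that not every crawler honors, but RobotFileParser parses both. The agent string is the example one used in this chapter.

```python
from urllib import robotparser

# The robots.txt rules quoted above for https://spinitron.com/robots.txt
rules = """\
User-agent: *
Crawl-delay: 10
Request-rate: 1/10
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())
# To fetch the live file instead:
#   rp.set_url("https://spinitron.com/robots.txt"); rp.read()

agent = "Kropko class example (jkropko@virginia.edu)"
print(rp.can_fetch(agent, "https://spinitron.com/WNRN"))  # True: nothing is disallowed
print(rp.crawl_delay(agent))                              # 10
rate = rp.request_rate(agent)
print(rate.requests, rate.seconds)                        # 1 10
```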

5.3. Using requests with a User Agent Header

As the articles by James Densmore and JonasCz describe, requests are much more likely to get blocked by websites if they do not include a header that specifies a user agent. An HTTP header is a field sent along with an HTTP request that carries metadata about the request. The user agent header identifies the software making the request, and for a scraper it should also include your contact information. If there is any issue with your web scraper, you want to give the website owner a chance to contact you directly about that problem. If you do not feel comfortable being contacted by the website’s owner, you should reconsider whether you should be scraping that website.

Fortunately, it is straightforward to include headers in a GET request using requests: just use the headers argument. First, we import the relevant libraries:

import numpy as np
import pandas as pd
import requests

In module 4 we issued GET requests to the Wikipedia API as an example.

r = requests.get("https://en.wikipedia.org/w/api.php")
r
<Response [200]>

To add a user agent string, I use the following code:

headers = {'user-agent': 'Kropko class example (jkropko@virginia.edu)'}
r = requests.get("https://en.wikipedia.org/w/api.php", headers = headers)
r
<Response [200]>

What information needs to go into a user agent header? Different sources give different advice. According to Amazon Web Services, a user agent should identify your application, its version number, and programming language. So a user agent might look like this:

headers = {'user-agent': 'Kropko class example version 1.0 (jkropko@virginia.edu) (Language=Python 3.8.2; Platform=Mac OSX 10.15.5)'}
r = requests.get("https://en.wikipedia.org/w/api.php", headers = headers)
r
<Response [200]>
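Rather than typing the language and platform details by hand, you could fill them in from Python’s platform module. This is a sketch: build_user_agent is a hypothetical helper, and the application name, version, and email are placeholders to replace with your own. On a Mac, platform.system() reports "Darwin" and platform.release() reports the kernel version rather than the macOS version number (platform.mac_ver()[0] returns the latter).

```python
import platform

# Placeholder application details; substitute your own.
APP_NAME = "Kropko class example"
APP_VERSION = "1.0"
CONTACT = "jkropko@virginia.edu"


def build_user_agent(app=APP_NAME, version=APP_VERSION, contact=CONTACT):
    """Assemble a descriptive User-Agent string from app details and the running platform."""
    language = f"Python {platform.python_version()}"
    system = f"{platform.system()} {platform.release()}"
    return f"{app} version {version} ({contact}) (Language={language}; Platform={system})"


headers = {"user-agent": build_user_agent()}
print(headers["user-agent"])
```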

Including a user agent is not hard, and it goes a long way towards alleviating the anxieties that website owners have about dealing with your web scraping code. It is a good habit to cultivate.
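When a scraper makes many requests to the same site, a requests.Session lets you set the user agent once and reuse it, along with the underlying connection, for every request. The sketch below prepares a request without sending it, so you can check which User-Agent header would go out; the variable names are illustrative.

```python
import requests

headers = {"user-agent": "Kropko class example (jkropko@virginia.edu)"}

# Headers set on a Session are merged into every request it makes.
session = requests.Session()
session.headers.update(headers)  # header names are case-insensitive, so this replaces the default

# Build the request without sending it, to confirm which User-Agent would be sent
prepared = session.prepare_request(requests.Request("GET", "https://en.wikipedia.org/w/api.php"))
print(prepared.headers["User-Agent"])  # Kropko class example (jkropko@virginia.edu)

# Sending it for real would be: r = session.get("https://en.wikipedia.org/w/api.php")
```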

5.4. Using BeautifulSoup() (Example: WNRN, Charlottesville’s Legendary Radio Station)

WNRN is a legendary radio station, and it’s based right here in Charlottesville at 91.9 FM (and streaming online at www.wnrn.org). It’s commercial-free, with only a few interruptions for local nonprofits to tell you about cool things happening in town. They play a mix of new and classic alternative rock and R&B. They emphasize music by bands coming to play at local venues. And they play the Grateful Dead on Saturday mornings. You should be listening to WNRN!

The playlist of the songs that WNRN has played in the last few hours is here: https://spinitron.com/WNRN/. I want to scrape the data off this website. I also want to scrape the data off of the additional playlists that this website links to, to collect as much data as possible. Our goal in this example is to create a dataframe of each song WNRN has played, the artist, the album, and the time each song was played.
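Scraping several playlists means making several requests, so the 10-second Crawl-delay from spinitron.com’s robots.txt applies. One way to honor it is a loop that sleeps between requests. This is a minimal sketch: polite_fetch is a hypothetical helper, and its fetch and sleep arguments are injectable so the function can be tried without network access.

```python
import time


def polite_fetch(urls, fetch, delay=10.0, sleep=time.sleep):
    """Fetch each URL in order, pausing `delay` seconds between consecutive requests.

    `fetch` is any callable that takes a URL and returns a response, such as the
    .get method of a requests Session. The 10-second default mirrors the
    Crawl-delay in spinitron.com's robots.txt.
    """
    responses = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # wait before every request except the first
        responses.append(fetch(url))
    return responses


# Dry run with stand-ins: str.upper plays the role of fetch, and the pauses
# are recorded instead of actually sleeping.
pauses = []
results = polite_fetch(["page-a", "page-b", "page-c"], fetch=str.upper, sleep=pauses.append)
print(results)  # ['PAGE-A', 'PAGE-B', 'PAGE-C']
print(pauses)   # [10.0, 10.0]
```

With a live session, a call would look like polite_fetch(playlist_urls, session.get), where playlist_urls is a hypothetical list of playlist links gathered from the main page.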

The process involves four steps:

  1. Download the raw text of the HTML code for the website we want to scrape using the requests library.

  2. Use the BeautifulSoup() function from the bs4 library to parse the raw text so that Python can understand, search through, and operate on the HTML tags in the string.

  3. Use methods associated with BeautifulSoup() to extract the data we need from the HTML code.

  4. Place the data into a pandas data frame.
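As a preview, the four steps might fit together as in the sketch below. To stay self-contained, it parses a small hand-written HTML snippet instead of downloading the page; the snippet is an assumption modeled on the <span class="artist">, <span class="song">, and <span class="release"> tags that appear later in this section, not a copy of the real page. The time each song was played is left out because its markup is not shown here. Passing class_= to .find_all() filters tags by their class attribute (the trailing underscore avoids a clash with Python’s class keyword).

```python
import pandas as pd
from bs4 import BeautifulSoup

# Step 1 would be: html = requests.get(url, headers=headers).text
# A small hand-written snippet stands in for the downloaded page here.
html = """
<span class="artist">R.E.M.</span>
<span class="song">South Central Rain</span>
<div class="info"><span class="release">Reckoning</span></div>
<span class="artist">Hinds</span>
<span class="song">Bats</span>
<div class="info"><span class="release">(Single)</span></div>
"""

# Step 2: parse the raw text into a navigable tree
soup = BeautifulSoup(html, "html.parser")

# Step 3: extract the text of each kind of <span>
artists = [tag.get_text(strip=True) for tag in soup.find_all("span", class_="artist")]
songs = [tag.get_text(strip=True) for tag in soup.find_all("span", class_="song")]
albums = [tag.get_text(strip=True) for tag in soup.find_all("span", class_="release")]

# Step 4: assemble a data frame, one row per song
playlist = pd.DataFrame({"artist": artists, "song": songs, "album": albums})
print(playlist)
```

Building three separate lists assumes every song has all three <span> tags; if one is ever missing, the columns can fall out of step, or pandas raises an error because the lists differ in length.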

5.4.1. Downloading and Understanding Raw HTML

For this example, I first download the HTML that exists on https://spinitron.com/WNRN using the requests.get() function. To be ethical and to help this website’s owners know that I am not a malicious actor, I also specify a user agent string.

url = "https://spinitron.com/WNRN"
headers = {'user-agent': 'Kropko class example (jkropko@virginia.edu)'}
r = requests.get(url, headers=headers)
r
<Response [200]>

The raw HTML code contains a series of text fragments that look like this,

<tag attribute="value"> Navigable string </tag>

where tag, attribute, "value", and Navigable string are replaced by specific parameters and data that control the content and presentation of the webpage that gets displayed in a web browser. For example, here are the first 1000 characters of the raw text from WNRN’s playlist:

print(r.text[0:1000])
<!doctype html><html lang="en">
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1,maximum-scale=1">
    <title>WNRN – Independent Music Radio</title>
    <meta name="description" content="A member-supported, independent music radio station broadcasting from the Blue Ridge to the Bay across Virginia—Richmond, Hampton Roads, Roanoke, Charlottesville, Lynchburg, Nelson County, Williamsburg, and The Shenandoah Valley.">

                <meta name="csrf-param" content="_csrf">
<meta name="csrf-token" content="xe2EMhC914W386uzJixiQ_82yB6sj2UI-vGeqZsXYTuylOVBJomQ2va2yPhPSQsVk3__J9XgUT3NlKmd-Vk0Aw==">

    <meta property="og:url" content="/WNRN/">
<meta property="og:title" content="WNRN – Independent Music Radio">
<meta property="og:description" content="A member-supported, independent music radio station broadcasting from the Blue Ridge to the Bay across Virginia—Richmond, Hampto

Tags specify how the data contained within the page are organized and how the visual elements on this page should look. Tags are delimited by angle brackets, < and >. This page’s HTML code uses tags such as

  • <html>, the root element that contains the rest of the document,

  • <meta>, which defines metadata about the document that helps govern how the output should be displayed in the browser,

  • <title>, which sets the title of the document, and

  • <link>, which connects the document to external resources such as stylesheets and icons.

To see what other HTML tags do, look at the list on https://www.w3schools.com/TAGs/.

In some cases the tag operates on the text that immediately follows, and a closing tag </tag> marks the end of the text the tag operates on. The text between the opening and closing tags is called the navigable string. For example, the tag <title>WNRN – Independent Music Radio</title> specifies that “WNRN – Independent Music Radio”, and only this string, is the title.

Some tags have attributes, which are arguments listed inside an opening tag to modify the behavior of that tag or to attach relevant data to the tag. The <html> tag at the top of the page has an attribute lang with the value "en", which declares that the page’s content is in English.

5.4.2. Parsing Raw HTML Using BeautifulSoup()

The requests.get() function only downloads the raw text of the HTML code; it does not understand the logic and organization of that code. Converting raw text into a structure that Python can navigate, according to the rules of a particular format, is called parsing. We’ve parsed data in Python before with JSON: we used requests.get() to download the JSON-formatted data, but we needed json.loads() to parse the data before we could navigate the branches of the JSON tree.

There are two widely used Python tools for extracting data from HTML: the bs4 library, which contains the BeautifulSoup() function, and selenium. BeautifulSoup() works with raw text but cannot access websites itself (we use requests.get() for that). For BeautifulSoup() to reach the data on a website, the data must be visible in the raw HTML that requests.get() returns. If a website hides that data, for example by using JavaScript that runs in the browser to fill in data fields after the page loads, or by saving data as image files, then we won’t be able to access the data with an HTML parser alone. selenium controls a real web browser, so it can extract more complicated data and circumvent some anti-scraping measures; for example, it can take a screenshot of the webpage, which can then be passed to optical character recognition (OCR) software to pull data directly from the image. However, because selenium loads every page in a web browser, it can be quite a bit slower than BeautifulSoup(). If you are interested in learning how to use selenium, see this guide: https://selenium-python.readthedocs.io/. Here we will be using BeautifulSoup().

First I import the BeautifulSoup() function:

from bs4 import BeautifulSoup

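To use it, we pass the raw HTML from https://spinitron.com/WNRN, stored in the .text attribute of the response r that requests.get() returned above, to BeautifulSoup(). This function can parse either HTML or XML code, so the second argument specifies HTML. Passing 'html' lets bs4 pick the best HTML parser installed on your system; you can also name one explicitly, such as 'html.parser' (built into Python) or 'lxml':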

wnrn = BeautifulSoup(r.text, 'html')

Now that the https://spinitron.com/WNRN source code is registered as HTML code in Python, we can begin executing commands to navigate the organizational structure of the code and extract data.

5.4.3. Searching for HTML Tags and Extracting Data

HTML is a markup language that does not force coders to follow very strict templates. There’s a lot of flexibility and creativity possible for HTML authors, and as such, there is no one universal method for extracting data from HTML. The best approach is to open a browser window, navigate to the webpage you want to scrape, and “view page source”. (Different web browsers have different ways to do that. On Mozilla Firefox, right-click somewhere on the page other than an active link, and “view page source” should be an option.) The source will display the raw HTML code that generates the page. Search through this code to find examples of the data points you intend to collect, possibly using control+F to search for specific values. Once you find the data you need, make note of the tags that surround the data and use the tools we will describe next to extract the data.

The parsed object that BeautifulSoup() returns, wnrn, has important methods and attributes that we will use to extract the data we want. First, we can use the name of a tag as an attribute to extract the first occurrence of that tag. Here we extract the first <meta> tag:

metatag = wnrn.meta
metatag
<meta charset="utf-8"/>

A tag stores its attributes as a dictionary, so we can extract the value of an attribute by indexing the tag with that attribute’s name:

metatag['charset']
'utf-8'
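Because a tag’s attributes live in an ordinary Python dictionary (its .attrs attribute), dictionary tools work on them too. In this sketch, which parses a small made-up snippet with Python’s built-in 'html.parser', .get() returns None (or a default you choose) for an attribute the tag lacks, where square brackets would raise a KeyError. One exception to note: multi-valued attributes such as class come back as a list of strings.

```python
from bs4 import BeautifulSoup

snippet = '<html lang="en"><head><meta charset="utf-8"><title>Demo</title></head></html>'
soup = BeautifulSoup(snippet, "html.parser")

meta = soup.meta
print(meta.attrs)                  # {'charset': 'utf-8'}
print(meta["charset"])             # utf-8
print(meta.get("content"))         # None: .get() avoids a KeyError for a missing attribute
print(meta.get("content", "n/a"))  # n/a
print(soup.html["lang"])           # en

# class is multi-valued, so bs4 returns it as a list
span = BeautifulSoup('<span class="artist highlight">R.E.M.</span>', "html.parser").span
print(span["class"])               # ['artist', 'highlight']
```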

If a tag has a navigable string, we can extract that with the .string attribute of a particular tag. For example, to extract the title, we start with the <title> tag:

titletag = wnrn.title
titletag
<title>WNRN – Independent Music Radio</title>

Then we extract the title as follows:

titletag.string
'WNRN – Independent Music Radio'
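The .string attribute only works when a tag holds a single piece of text. The sketch below, again using a made-up snippet, shows that .string looks through a single child tag but returns None when a tag has several children; .get_text() concatenates all the text inside a tag instead.

```python
from bs4 import BeautifulSoup

html = (
    '<div class="info"><span class="release">Live</span></div>'
    '<p>Now playing: <b>Winter Song</b></p>'
)
soup = BeautifulSoup(html, "html.parser")

# A tag whose only child is a string (or a single tag holding a string) has a .string
print(soup.span.string)   # Live
print(soup.div.string)    # Live: .string looks through the div's single <span> child

# A tag with several children has no single .string, but .get_text() joins them
print(soup.p.string)      # None
print(soup.p.get_text())  # Now playing: Winter Song
```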

Our goal in this example is to extract the artist, song, album, and time played for every song played on WNRN. I look in the raw HTML source code for the first instance of an artist. These data are contained in the <span> tags:

spantag = wnrn.span
spantag
<span class="artist">Leslie Odom Jr.</span>

Calling one tag is not especially useful, because we generally want to extract all of the relevant data on a page. For that, we can use the .find_next() and .find_all() methods. Called with no arguments, .find_next() returns the very next tag in the HTML code, whatever its type. Here that tag happens to be the <span> containing the song associated with the artist:

spantag.find_next()
<span class="song">Winter Song</span>

Calling .find_next() again returns the next tag, which this time is not a <span> but the <div> that wraps the album name (the <span> with class "release"):

spantag.find_next().find_next()
<div class="info"><span class="release">Live</span></div>
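The distinction matters when you chain calls. In this sketch, built from a trimmed-down copy of the three tags shown above (an assumption about how they sit in the full page), an argument-free .find_next() lands on the <div>, while .find_next("span"), or .find_next() with a class_ filter, skips straight to the <span> holding the album.

```python
from bs4 import BeautifulSoup

# A trimmed-down copy of the tags shown above
html = (
    '<span class="artist">Leslie Odom Jr.</span>'
    '<span class="song">Winter Song</span>'
    '<div class="info"><span class="release">Live</span></div>'
)
soup = BeautifulSoup(html, "html.parser")
artist = soup.span

print(artist.find_next().name)                            # span: the song tag happens to be next
print(artist.find_next().find_next().name)                # div: with no argument, any tag counts
print(artist.find_next("span").find_next("span").string)  # Live: only <span> tags count
print(artist.find_next("span", class_="release").string)  # Live: match on the class directly
```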

To find all occurrences of the <span> tag, organized in a list, use .find_all() and provide the tag as the argument:

spanlist = wnrn.find_all("span")
spanlist
[<span class="artist">Leslie Odom Jr.</span>,
 <span class="song">Winter Song</span>,
 <span class="release">Live</span>,
 <span class="artist">R.E.M.</span>,
 <span class="song">South Central Rain</span>,
 <span class="release">Reckoning</span>,
 <span class="artist">Hinds</span>,
 <span class="song">Bats</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Billy Strings</span>,
 <span class="song">Gild the Lily</span>,
 <span class="release">Highway Prayers</span>,
 <span class="artist">Sonic Youth</span>,
 <span class="song">Sunday</span>,
 <span class="release">A Thousand Leaves</span>,
 <span class="artist">Oracle Sisters</span>,
 <span class="song">Alouette</span>,
 <span class="release">Divinations</span>,
 <span class="artist">Little Feat</span>,
 <span class="song">Dixie Chicken</span>,
 <span class="release">Dixie Chicken</span>,
 <span class="artist">Deep Sea Diver</span>,
 <span class="song">Billboard Heart</span>,
 <span class="release">Billboard Heart</span>,
 <span class="artist">Lady Blackbird</span>,
 <span class="song">Like A Woman</span>,
 <span class="release">Slang Spirituals</span>,
 <span class="artist">Dave Matthews Band</span>,
 <span class="song">Where Are You Going</span>,
 <span class="release">Busted Stuff</span>,
 <span class="artist">The Heavy Heavy</span>,
 <span class="song">Feel</span>,
 <span class="release">One of a Kind</span>,
 <span class="artist">Palmyra</span>,
 <span class="song">Fried</span>,
 <span class="release">Surprise No. 1 EP</span>,
 <span class="artist">Natalie Merchant</span>,
 <span class="song">Carnival</span>,
 <span class="release">Tigerlily</span>,
 <span class="artist">Hozier</span>,
 <span class="song">Nobody's Soldier</span>,
 <span class="release">Unaired</span>,
 <span class="artist">Chuck Prophet</span>,
 <span class="song">First Came The Thunder</span>,
 <span class="release">Wake the Dead</span>,
 <span class="artist">Olivia Wolf</span>,
 <span class="song">Cosmic Appalachian Radio</span>,
 <span class="release">Silver Rounds</span>,
 <span class="artist">Bob Marley &amp; The Wailers</span>,
 <span class="song">I Shot the Sheriff</span>,
 <span class="release">Burnin'</span>,
 <span class="artist">Alabama Shakes</span>,
 <span class="song">Hang Loose</span>,
 <span class="release">Boys And Girls</span>,
 <span class="artist">Royel Otis</span>,
 <span class="song">If Our Love Is Dead</span>,
 <span class="release">Pratts &amp; Pain</span>,
 <span class="artist">flipturn</span>,
 <span class="song">Rodeo Clown</span>,
 <span class="release">Burnout Days</span>,
 <span class="artist">Chamomile &amp; Whiskey</span>,
 <span class="song">Gone</span>,
 <span class="release">Sweet Afton</span>,
 <span class="artist">Franz Ferdinand</span>,
 <span class="song">Night Or Day</span>,
 <span class="release">The Human Fear</span>,
 <span class="artist">Jim Lauderdale</span>,
 <span class="song">Don't You Treat 'Em That Way</span>,
 <span class="release">My Favorite Place</span>,
 <span class="artist">The National</span>,
 <span class="song">Fake Empire</span>,
 <span class="release">The Boxer</span>,
 <span class="artist">MRCY</span>,
 <span class="song">R.L.M.</span>,
 <span class="release">VOLUME 1</span>,
 <span class="artist">The Cure</span>,
 <span class="song">A Fragile Thing</span>,
 <span class="release">Songs Of A Lost World</span>,
 <span class="artist">Devon Gilfillian</span>,
 <span class="song">The Good Life</span>,
 <span class="release">Black Hole Rainbow</span>,
 <span class="artist">Beach House</span>,
 <span class="song">Space Song</span>,
 <span class="release">Depression Cherry</span>,
 <span class="artist">Oh He Dead f/ The Honeynut Horns</span>,
 <span class="song">Tell Me</span>,
 <span class="release">Ugly</span>,
 <span class="artist">Peach Pit</span>,
 <span class="song">Magpie</span>,
 <span class="release">Magpie</span>,
 <span class="artist">Andrew Bird</span>,
 <span class="song">Make a Picture</span>,
 <span class="release">Inside Problems</span>,
 <span class="artist">Maggie Rogers</span>,
 <span class="song">In The Living Room</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Joy Oladokun</span>,
 <span class="song">DUST/DIVINITY</span>,
 <span class="release">Observations From A Crowded Room</span>,
 <span class="artist">James Blake</span>,
 <span class="song">Retrograde</span>,
 <span class="release">Overgrown</span>,
 <span class="artist">Daughter of Swords</span>,
 <span class="song">Alone Together</span>,
 <span class="release">Cardinals At The Window</span>,
 <span class="artist">The Head &amp; The Heart</span>,
 <span class="song">Arrow</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Nick Cave &amp; The Bad Seeds</span>,
 <span class="song">Red Right Hand</span>,
 <span class="release">Let Love In</span>,
 <span class="artist">JD McPherson</span>,
 <span class="song">I Can't Go Anywhere With You</span>,
 <span class="release">Nite Owls</span>,
 <span class="artist">Camille Yarbrough</span>,
 <span class="song">Take Yo' Praise</span>,
 <span class="release">The Iron Pot Cooker</span>,
 <span class="artist">Fruition</span>,
 <span class="song">Labor of Love</span>,
 <span class="release">Labor of Love</span>,
 <span class="artist">twen</span>,
 <span class="song">Infinite Sky</span>,
 <span class="release">Infinite Sky EP</span>,
 <span class="artist">Tom Petty &amp; The Heartbreakers</span>,
 <span class="song">Never Be You</span>,
 <span class="release">Long After Dark (Deluxe Edition)</span>,
 <span class="artist">Peter, Bjorn, and John</span>,
 <span class="song">Young Folks</span>,
 <span class="release">Writer's Block</span>,
 <span class="artist">The Smile</span>,
 <span class="song">No Words</span>,
 <span class="release">Cutouts</span>,
 <span class="artist">Sharon Van Etten &amp; The Attachment Theory</span>,
 <span class="song">Afterlife</span>,
 <span class="release">Sharon Van Etten &amp; The Attachment Theory</span>,
 <span class="artist">Sleepwalkers</span>,
 <span class="song">Until the Night Is Gone</span>,
 <span class="release">(Single)</span>,
 <span class="artist">The Dare</span>,
 <span class="song">All Night</span>,
 <span class="release">What's Wrong With New York?</span>,
 <span class="artist">The Devil Makes Three</span>,
 <span class="song">Spirits</span>,
 <span class="release">Spirits</span>,
 <span class="artist">Edie Brickell &amp; The New Bohemians</span>,
 <span class="song">What I Am</span>,
 <span class="release">Shooting Rubberbands at the Stars</span>,
 <span class="artist">Mt. Joy</span>,
 <span class="song">She Wants To Go Dancing</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Tyler Meacham</span>,
 <span class="song">Dream House</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Gary Clark Jr.</span>,
 <span class="song">Bright Lights</span>,
 <span class="release">Black &amp; Blu</span>,
 <span class="artist">Humbird</span>,
 <span class="song">Blueberry Bog</span>,
 <span class="release">Right On</span>,
 <span class="artist">Jenny Owen Youngs</span>,
 <span class="song">Someone's Ex</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Lyle Lovett</span>,
 <span class="song">If I Had a Boat</span>,
 <span class="release">Pontiac</span>,
 <span class="artist">St. Vincent</span>,
 <span class="song">Cheerleader</span>,
 <span class="release">Strange Mercy</span>,
 <span class="artist">Angie McMahon</span>,
 <span class="song">Untangling</span>,
 <span class="release">Light Sides EP</span>,
 <span class="artist">Johnny Blue Skies</span>,
 <span class="song">Mint Tea</span>,
 <span class="release">Passage Du Desir</span>,
 <span class="artist">Sarah Shook &amp; The Disarmers</span>,
 <span class="song">Revelations</span>,
 <span class="release">REVELATIONS</span>,
 <span class="artist">Ray LaMontagne</span>,
 <span class="song">And They Call Her California</span>,
 <span class="release">Long Way Home</span>,
 <span class="artist">Jontavious Willis</span>,
 <span class="song">Keep Your Worries On the Dance Floor</span>,
 <span class="release">West Georgia Blues</span>,
 <span class="artist">Eels</span>,
 <span class="song">Novocaine for the Soul</span>,
 <span class="release">Beautiful Freak</span>,
 <span class="artist">Wye Oak</span>,
 <span class="song">No Good Reason</span>,
 <span class="release">Cardinals At The Window</span>,
 <span class="artist">Birdtalker</span>,
 <span class="song">Season Of Charade</span>,
 <span class="release">All Means, No End</span>,
 <span class="artist">Yeah Yeah Yeahs f/ Perfume Genius</span>,
 <span class="song">Spitting Off The Edge of the World</span>,
 <span class="release">Cool It Down</span>,
 <span class="artist">Mavis Staples</span>,
 <span class="song">Worthy</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Coldplay</span>,
 <span class="song">Every Teardrop is a Waterfall</span>,
 <span class="release">Mylo Xyloto</span>,
 <span class="artist">Wallice</span>,
 <span class="song">I Want You Yesterday</span>,
 <span class="release">The Jester</span>,
 <span class="artist">Julien Baker &amp; TORRES</span>,
 <span class="song">Sugar In The Tank</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Beck</span>,
 <span class="song">Blue Moon</span>,
 <span class="release">Morning Phase</span>,
 <span class="artist">Michael Kiwanuka</span>,
 <span class="song">The Rest Of Me</span>,
 <span class="release">Small Changes</span>,
 <span class="artist">Waxahatchee</span>,
 <span class="song">Much Ado About Nothing</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Rob Cheatham &amp; Co.</span>,
 <span class="song">It's Hard On Your Mind</span>,
 <span class="release">Painting Self Portraits</span>,
 <span class="artist">Momma</span>,
 <span class="song">Ohio All The Time</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Illiterate Light</span>,
 <span class="song">Payphone</span>,
 <span class="release">Arches</span>,
 <span class="artist">Kacey Musgraves</span>,
 <span class="song">Follow Your Arrow</span>,
 <span class="release">Same Trailer Different Park</span>,
 <span class="artist">Oh He Dead</span>,
 <span class="song">Strange Love</span>,
 <span class="release">Ugly</span>,
 <span class="artist">Father John Misty</span>,
 <span class="song">She Cleans Up</span>,
 <span class="release">Mahashmashana</span>,
 <span class="artist">Laura Marling</span>,
 <span class="song">Where Can I Go</span>,
 <span class="release">Once I Was An Eagle</span>,
 <span class="artist">Gigi Perez</span>,
 <span class="song">Sailor Song</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Nick Lowe &amp; Los Straitjackets</span>,
 <span class="song">Jet Pac Boomerang</span>,
 <span class="release">Indoor Safari</span>,
 <span class="artist">Baby Rose f/ BADBADNOTGOOD</span>,
 <span class="song">Weekness</span>,
 <span class="release">Slow Burn EP</span>,
 <span class="artist">U2</span>,
 <span class="song">Picture Of You (X+W)</span>,
 <span class="release">How To Re-Assemble An Atomic Bomb</span>,
 <span class="artist">Fontaines D.C.</span>,
 <span class="song">Favourite</span>,
 <span class="release">Romance</span>,
 <span class="artist">Bonnie Raitt</span>,
 <span class="song">Made Up Mind</span>,
 <span class="release">Just like That</span>,
 <span class="artist">The Vices</span>,
 <span class="song">Before It Might Be Gone</span>,
 <span class="release">Before It Might Be Gone</span>,
 <span class="artist">Oracle Sisters</span>,
 <span class="song">Alouette</span>,
 <span class="release">Divinations</span>,
 <span class="artist">Caamp</span>,
 <span class="song">Peach Fuzz</span>,
 <span class="release">By and By</span>,
 <span class="artist">Bon Iver</span>,
 <span class="song">S P E Y S I D E</span>,
 <span class="release">SABLE EP</span>,
 <span class="artist">The Weather Station</span>,
 <span class="song">Neon Signs</span>,
 <span class="release">Humanhood</span>,
 <span class="artist">The Rolling Stones</span>,
 <span class="song">Wild Horses</span>,
 <span class="release">Sticky Fingers</span>,
 <span class="artist">Blondshell</span>,
 <span class="song">What's Fair</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Freedy Johnston</span>,
 <span class="song">Bad Reputation</span>,
 <span class="release">This Perfect World</span>,
 <span class="artist">Dwight Yoakam</span>,
 <span class="song">Wide Open Heart</span>,
 <span class="release">Brighter Days</span>,
 <span class="artist">SubT</span>,
 <span class="song">Unearthly</span>,
 <span class="release">Spring Skin EP</span>,
 <span class="artist">Paul Thorn</span>,
 <span class="song">Tough Times Don't Last</span>,
 <span class="release">Life Is Just A Vapor</span>,
 <span class="artist">Fancy Gap</span>,
 <span class="song">Starlight Motel</span>,
 <span class="release">Cardinals At The Window</span>,
 <span class="artist">Lady Blackbird</span>,
 <span class="song">Like A Woman</span>,
 <span class="release">Slang Spirituals</span>,
 <span class="artist">Adam's Plastic Pond</span>,
 <span class="song">Mission Report</span>,
 <span class="release">Confident Melancholy</span>,
 <span class="artist">Cautious Clay</span>,
 <span class="song">Puffer</span>,
 <span class="release">Thin Ice on the Cake</span>,
 <span class="artist">Trampled By Turtles f/ LeAnn Rimes</span>,
 <span class="song">Out Of Time</span>,
 <span class="release">Always Here/Always Now</span>,
 <span class="artist">Tunde Adebimpe</span>,
 <span class="song">Magnetic</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Talking Heads</span>,
 <span class="song">Burning Down the House</span>,
 <span class="release">Speaking in Tongues</span>,
 <span class="artist">Ringo Starr</span>,
 <span class="song">Thankful</span>,
 <span class="release">Look Up</span>,
 <span class="artist">Wonder Women of Country</span>,
 <span class="song">Another Broken Heart</span>,
 <span class="release">Willis, Carper, Leigh</span>,
 <span class="artist">Kasey Chambers</span>,
 <span class="song">Broken Cup</span>,
 <span class="release">Backbone</span>,
 <span class="artist">Joan Baez</span>,
 <span class="song">Diamonds and Rust</span>,
 <span class="release">Diamonds and Rust</span>,
 <span class="artist">Sean McConnell</span>,
 <span class="song">Here We Go</span>,
 <span class="release">Secondhand Smoke</span>,
 <span class="artist">The Cure</span>,
 <span class="song">A Fragile Thing</span>,
 <span class="release">Songs Of A Lost World</span>,
 <span class="artist">Paul Kelly</span>,
 <span class="song">All Those Smiling Faces</span>,
 <span class="release">Fever Longing Still</span>,
 <span class="artist">MJ Lenderman</span>,
 <span class="song">She's Leaving You</span>,
 <span class="release">Manning Fireworks</span>,
 <span class="artist">Maribou State f/ Holly Walker</span>,
 <span class="song">Otherside</span>,
 <span class="release">Hallucinating Love</span>,
 <span class="artist">Maggie Rogers</span>,
 <span class="song">In The Living Room</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Courtney Marie Andrews</span>,
 <span class="song">Kindness of Strangers</span>,
 <span class="release">May Your Kindness Remain</span>,
 <span class="artist">David Gray f/ Talia Rae</span>,
 <span class="song">Plus &amp; Minus</span>,
 <span class="release">Dear Life</span>,
 <span class="artist">Johnny Blue Skies</span>,
 <span class="song">If The Sun Never Rises Again</span>,
 <span class="release">Passage Du Desir</span>,
 <span class="artist">Beth Orton</span>,
 <span class="song">Stolen Car</span>,
 <span class="release">Central Reservation</span>,
 <span class="artist">Gold Connections</span>,
 <span class="song">Fool's Gold</span>,
 <span class="release">Fortune</span>,
 <span class="artist">Joan Baez</span>,
 <span class="song">Donna, Donna</span>,
 <span class="release">Joan Baez</span>,
 <span class="artist">Mipso</span>,
 <span class="song">Cornfields</span>,
 <span class="release">Cardinals At The Window</span>,
 <span class="artist">Reckless Kelly</span>,
 <span class="song">Romantic Disaster</span>,
 <span class="release">The Last Frontier</span>,
 <span class="artist">Vampire Weekend</span>,
 <span class="song">Harmony Hall</span>,
 <span class="release">Father of the Bride</span>,
 <span class="artist">Billy Strings</span>,
 <span class="song">Gild the Lily</span>,
 <span class="release">Highway Prayers</span>,
 <span class="artist">The Head &amp; The Heart</span>,
 <span class="song">Arrow</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Neko Case</span>,
 <span class="song">Night Still Comes</span>,
 <span class="release">The Worse Things Get, the Harder I Fight, the Harder I Fight..</span>,
 <span class="artist">49 Winchester</span>,
 <span class="song">Miles to Go</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Deep Sea Diver</span>,
 <span class="song">Billboard Heart</span>,
 <span class="release">Billboard Heart</span>,
 <span class="artist">Beabadoobee</span>,
 <span class="song">Talk</span>,
 <span class="release">Beatopia</span>,
 <span class="artist">Willie Nelson</span>,
 <span class="song">Keep Me In Your Heart</span>,
 <span class="release">Last Leaf On The Tree</span>,
 <span class="artist">Sharon Van Etten &amp; The Attachment Theory</span>,
 <span class="song">Afterlife</span>,
 <span class="release">Sharon Van Etten &amp; The Attachment Theory</span>,
 <span class="artist">Sam Burchfield &amp; The Scoundrels</span>,
 <span class="song">The Ridge</span>,
 <span class="release">Me and My Religion</span>,
 <span class="artist">Jason Scott &amp; the High Heat</span>,
 <span class="song">If We Make It Till the Mornin'</span>,
 <span class="release">High Country Heat</span>,
 <span class="artist">Sharon Jones &amp; the Dap-Kings</span>,
 <span class="song">I Learned The Hard Way</span>,
 <span class="release">I Learned The Hard Way</span>,
 <span class="artist">Saya Gray</span>,
 <span class="song">Shell (Of A Man)</span>,
 <span class="release">SAYA</span>,
 <span class="artist">The Heavy Heavy</span>,
 <span class="song">Feel</span>,
 <span class="release">One of a Kind</span>,
 <span class="artist">Norah Jones</span>,
 <span class="song">Running</span>,
 <span class="release">Visions</span>,
 <span class="artist">Dave Alvin &amp; Jimmie Dale Gilmore</span>,
 <span class="song">Why I'm Walking</span>,
 <span class="release">Texicali</span>,
 <span class="artist">Mt. Joy</span>,
 <span class="song">She Wants To Go Dancing</span>,
 <span class="release">(Single)</span>,
 <span class="artist">Big Thief</span>,
 <span class="song">Shark Smile</span>,
 <span class="release">Capacity</span>,
 <span class="artist">Chuck Prophet</span>,
 <span class="song">First Came The Thunder</span>,
 <span class="release">Wake the Dead</span>,
 <span class="artist">Angie McMahon</span>,
 <span class="song">Untangling</span>,
 <span class="release">Light Sides EP</span>,
 <span class="artist">Anderson East</span>,
 <span class="song">Satisfy Me</span>,
 <span class="release">Delilah</span>,
 <span class="artist">Royel Otis</span>,
 <span class="song">If Our Love Is Dead</span>,
 <span class="release">Pratts &amp; Pain</span>,
 <span class="artist">Maggie Rose</span>,
 <span class="song">Under The Sun</span>,
 <span class="release">No One Gets Out Alive</span>,
 <span class="artist">Peter Case</span>,
 <span class="song">Crooked Mile</span>,
 <span class="release">Full Service No Waiting</span>]

Notice that the HTML source code distinguishes among the three types of datapoints by giving each a different class value. To limit this list to just the artists, we can pass the "artist" class as the second argument of .find_all(). When that argument is a string, BeautifulSoup treats it as a CSS class to match:

artistlist = wnrn.find_all("span", "artist")
artistlist
[<span class="artist">Leslie Odom Jr.</span>,
 <span class="artist">R.E.M.</span>,
 <span class="artist">Hinds</span>,
 <span class="artist">Billy Strings</span>,
 <span class="artist">Sonic Youth</span>,
 <span class="artist">Oracle Sisters</span>,
 <span class="artist">Little Feat</span>,
 <span class="artist">Deep Sea Diver</span>,
 <span class="artist">Lady Blackbird</span>,
 <span class="artist">Dave Matthews Band</span>,
 <span class="artist">The Heavy Heavy</span>,
 <span class="artist">Palmyra</span>,
 <span class="artist">Natalie Merchant</span>,
 <span class="artist">Hozier</span>,
 <span class="artist">Chuck Prophet</span>,
 <span class="artist">Olivia Wolf</span>,
 <span class="artist">Bob Marley &amp; The Wailers</span>,
 <span class="artist">Alabama Shakes</span>,
 <span class="artist">Royel Otis</span>,
 <span class="artist">flipturn</span>,
 <span class="artist">Chamomile &amp; Whiskey</span>,
 <span class="artist">Franz Ferdinand</span>,
 <span class="artist">Jim Lauderdale</span>,
 <span class="artist">The National</span>,
 <span class="artist">MRCY</span>,
 <span class="artist">The Cure</span>,
 <span class="artist">Devon Gilfillian</span>,
 <span class="artist">Beach House</span>,
 <span class="artist">Oh He Dead f/ The Honeynut Horns</span>,
 <span class="artist">Peach Pit</span>,
 <span class="artist">Andrew Bird</span>,
 <span class="artist">Maggie Rogers</span>,
 <span class="artist">Joy Oladokun</span>,
 <span class="artist">James Blake</span>,
 <span class="artist">Daughter of Swords</span>,
 <span class="artist">The Head &amp; The Heart</span>,
 <span class="artist">Nick Cave &amp; The Bad Seeds</span>,
 <span class="artist">JD McPherson</span>,
 <span class="artist">Camille Yarbrough</span>,
 <span class="artist">Fruition</span>,
 <span class="artist">twen</span>,
 <span class="artist">Tom Petty &amp; The Heartbreakers</span>,
 <span class="artist">Peter, Bjorn, and John</span>,
 <span class="artist">The Smile</span>,
 <span class="artist">Sharon Van Etten &amp; The Attachment Theory</span>,
 <span class="artist">Sleepwalkers</span>,
 <span class="artist">The Dare</span>,
 <span class="artist">The Devil Makes Three</span>,
 <span class="artist">Edie Brickell &amp; The New Bohemians</span>,
 <span class="artist">Mt. Joy</span>,
 <span class="artist">Tyler Meacham</span>,
 <span class="artist">Gary Clark Jr.</span>,
 <span class="artist">Humbird</span>,
 <span class="artist">Jenny Owen Youngs</span>,
 <span class="artist">Lyle Lovett</span>,
 <span class="artist">St. Vincent</span>,
 <span class="artist">Angie McMahon</span>,
 <span class="artist">Johnny Blue Skies</span>,
 <span class="artist">Sarah Shook &amp; The Disarmers</span>,
 <span class="artist">Ray LaMontagne</span>,
 <span class="artist">Jontavious Willis</span>,
 <span class="artist">Eels</span>,
 <span class="artist">Wye Oak</span>,
 <span class="artist">Birdtalker</span>,
 <span class="artist">Yeah Yeah Yeahs f/ Perfume Genius</span>,
 <span class="artist">Mavis Staples</span>,
 <span class="artist">Coldplay</span>,
 <span class="artist">Wallice</span>,
 <span class="artist">Julien Baker &amp; TORRES</span>,
 <span class="artist">Beck</span>,
 <span class="artist">Michael Kiwanuka</span>,
 <span class="artist">Waxahatchee</span>,
 <span class="artist">Rob Cheatham &amp; Co.</span>,
 <span class="artist">Momma</span>,
 <span class="artist">Illiterate Light</span>,
 <span class="artist">Kacey Musgraves</span>,
 <span class="artist">Oh He Dead</span>,
 <span class="artist">Father John Misty</span>,
 <span class="artist">Laura Marling</span>,
 <span class="artist">Gigi Perez</span>,
 <span class="artist">Nick Lowe &amp; Los Straitjackets</span>,
 <span class="artist">Baby Rose f/ BADBADNOTGOOD</span>,
 <span class="artist">U2</span>,
 <span class="artist">Fontaines D.C.</span>,
 <span class="artist">Bonnie Raitt</span>,
 <span class="artist">The Vices</span>,
 <span class="artist">Oracle Sisters</span>,
 <span class="artist">Caamp</span>,
 <span class="artist">Bon Iver</span>,
 <span class="artist">The Weather Station</span>,
 <span class="artist">The Rolling Stones</span>,
 <span class="artist">Blondshell</span>,
 <span class="artist">Freedy Johnston</span>,
 <span class="artist">Dwight Yoakam</span>,
 <span class="artist">SubT</span>,
 <span class="artist">Paul Thorn</span>,
 <span class="artist">Fancy Gap</span>,
 <span class="artist">Lady Blackbird</span>,
 <span class="artist">Adam's Plastic Pond</span>,
 <span class="artist">Cautious Clay</span>,
 <span class="artist">Trampled By Turtles f/ LeAnn Rimes</span>,
 <span class="artist">Tunde Adebimpe</span>,
 <span class="artist">Talking Heads</span>,
 <span class="artist">Ringo Starr</span>,
 <span class="artist">Wonder Women of Country</span>,
 <span class="artist">Kasey Chambers</span>,
 <span class="artist">Joan Baez</span>,
 <span class="artist">Sean McConnell</span>,
 <span class="artist">The Cure</span>,
 <span class="artist">Paul Kelly</span>,
 <span class="artist">MJ Lenderman</span>,
 <span class="artist">Maribou State f/ Holly Walker</span>,
 <span class="artist">Maggie Rogers</span>,
 <span class="artist">Courtney Marie Andrews</span>,
 <span class="artist">David Gray f/ Talia Rae</span>,
 <span class="artist">Johnny Blue Skies</span>,
 <span class="artist">Beth Orton</span>,
 <span class="artist">Gold Connections</span>,
 <span class="artist">Joan Baez</span>,
 <span class="artist">Mipso</span>,
 <span class="artist">Reckless Kelly</span>,
 <span class="artist">Vampire Weekend</span>,
 <span class="artist">Billy Strings</span>,
 <span class="artist">The Head &amp; The Heart</span>,
 <span class="artist">Neko Case</span>,
 <span class="artist">49 Winchester</span>,
 <span class="artist">Deep Sea Diver</span>,
 <span class="artist">Beabadoobee</span>,
 <span class="artist">Willie Nelson</span>,
 <span class="artist">Sharon Van Etten &amp; The Attachment Theory</span>,
 <span class="artist">Sam Burchfield &amp; The Scoundrels</span>,
 <span class="artist">Jason Scott &amp; the High Heat</span>,
 <span class="artist">Sharon Jones &amp; the Dap-Kings</span>,
 <span class="artist">Saya Gray</span>,
 <span class="artist">The Heavy Heavy</span>,
 <span class="artist">Norah Jones</span>,
 <span class="artist">Dave Alvin &amp; Jimmie Dale Gilmore</span>,
 <span class="artist">Mt. Joy</span>,
 <span class="artist">Big Thief</span>,
 <span class="artist">Chuck Prophet</span>,
 <span class="artist">Angie McMahon</span>,
 <span class="artist">Anderson East</span>,
 <span class="artist">Royel Otis</span>,
 <span class="artist">Maggie Rose</span>,
 <span class="artist">Peter Case</span>]
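A string in the second position of `.find_all()` is shorthand for a class filter, so the call above could also be written with the explicit `class_` keyword. Each element of the result is still a whole `<span>` tag, and `.get_text()` returns just the text inside it, with entities such as `&amp;` decoded to `&`. The sketch below shows both points on a small, hypothetical HTML snippet modeled on the playlist markup. The surrounding `<table>`, `<tr>`, and `<td>` tags are an assumption, not copied from the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet that mimics the playlist's span markup
html = """
<table>
  <tr><td><span class="artist">Talking Heads</span>
          <span class="song">Burning Down the House</span>
          <span class="release">Speaking in Tongues</span></td></tr>
  <tr><td><span class="artist">The Head &amp; The Heart</span>
          <span class="song">Arrow</span>
          <span class="release">(Single)</span></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# A string as the second positional argument filters on the CSS class...
by_position = soup.find_all("span", "artist")
# ...which is equivalent to the explicit class_ keyword argument
by_keyword = soup.find_all("span", class_="artist")

# .get_text() drops the tags and decodes entities such as &amp;
artists = [tag.get_text() for tag in by_position]
print(artists)
```

The same list comprehension works on the real `artistlist` to turn the tags into plain strings.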

Likewise, we can create a list of the songs:

songlist = wnrn.find_all("span", "song")
songlist
[<span class="song">Winter Song</span>,
 <span class="song">South Central Rain</span>,
 <span class="song">Bats</span>,
 <span class="song">Gild the Lily</span>,
 <span class="song">Sunday</span>,
 <span class="song">Alouette</span>,
 <span class="song">Dixie Chicken</span>,
 <span class="song">Billboard Heart</span>,
 <span class="song">Like A Woman</span>,
 <span class="song">Where Are You Going</span>,
 <span class="song">Feel</span>,
 <span class="song">Fried</span>,
 <span class="song">Carnival</span>,
 <span class="song">Nobody's Soldier</span>,
 <span class="song">First Came The Thunder</span>,
 <span class="song">Cosmic Appalachian Radio</span>,
 <span class="song">I Shot the Sheriff</span>,
 <span class="song">Hang Loose</span>,
 <span class="song">If Our Love Is Dead</span>,
 <span class="song">Rodeo Clown</span>,
 <span class="song">Gone</span>,
 <span class="song">Night Or Day</span>,
 <span class="song">Don't You Treat 'Em That Way</span>,
 <span class="song">Fake Empire</span>,
 <span class="song">R.L.M.</span>,
 <span class="song">A Fragile Thing</span>,
 <span class="song">The Good Life</span>,
 <span class="song">Space Song</span>,
 <span class="song">Tell Me</span>,
 <span class="song">Magpie</span>,
 <span class="song">Make a Picture</span>,
 <span class="song">In The Living Room</span>,
 <span class="song">DUST/DIVINITY</span>,
 <span class="song">Retrograde</span>,
 <span class="song">Alone Together</span>,
 <span class="song">Arrow</span>,
 <span class="song">Red Right Hand</span>,
 <span class="song">I Can't Go Anywhere With You</span>,
 <span class="song">Take Yo' Praise</span>,
 <span class="song">Labor of Love</span>,
 <span class="song">Infinite Sky</span>,
 <span class="song">Never Be You</span>,
 <span class="song">Young Folks</span>,
 <span class="song">No Words</span>,
 <span class="song">Afterlife</span>,
 <span class="song">Until the Night Is Gone</span>,
 <span class="song">All Night</span>,
 <span class="song">Spirits</span>,
 <span class="song">What I Am</span>,
 <span class="song">She Wants To Go Dancing</span>,
 <span class="song">Dream House</span>,
 <span class="song">Bright Lights</span>,
 <span class="song">Blueberry Bog</span>,
 <span class="song">Someone's Ex</span>,
 <span class="song">If I Had a Boat</span>,
 <span class="song">Cheerleader</span>,
 <span class="song">Untangling</span>,
 <span class="song">Mint Tea</span>,
 <span class="song">Revelations</span>,
 <span class="song">And They Call Her California</span>,
 <span class="song">Keep Your Worries On the Dance Floor</span>,
 <span class="song">Novocaine for the Soul</span>,
 <span class="song">No Good Reason</span>,
 <span class="song">Season Of Charade</span>,
 <span class="song">Spitting Off The Edge of the World</span>,
 <span class="song">Worthy</span>,
 <span class="song">Every Teardrop is a Waterfall</span>,
 <span class="song">I Want You Yesterday</span>,
 <span class="song">Sugar In The Tank</span>,
 <span class="song">Blue Moon</span>,
 <span class="song">The Rest Of Me</span>,
 <span class="song">Much Ado About Nothing</span>,
 <span class="song">It's Hard On Your Mind</span>,
 <span class="song">Ohio All The Time</span>,
 <span class="song">Payphone</span>,
 <span class="song">Follow Your Arrow</span>,
 <span class="song">Strange Love</span>,
 <span class="song">She Cleans Up</span>,
 <span class="song">Where Can I Go</span>,
 <span class="song">Sailor Song</span>,
 <span class="song">Jet Pac Boomerang</span>,
 <span class="song">Weekness</span>,
 <span class="song">Picture Of You (X+W)</span>,
 <span class="song">Favourite</span>,
 <span class="song">Made Up Mind</span>,
 <span class="song">Before It Might Be Gone</span>,
 <span class="song">Alouette</span>,
 <span class="song">Peach Fuzz</span>,
 <span class="song">S P E Y S I D E</span>,
 <span class="song">Neon Signs</span>,
 <span class="song">Wild Horses</span>,
 <span class="song">What's Fair</span>,
 <span class="song">Bad Reputation</span>,
 <span class="song">Wide Open Heart</span>,
 <span class="song">Unearthly</span>,
 <span class="song">Tough Times Don't Last</span>,
 <span class="song">Starlight Motel</span>,
 <span class="song">Like A Woman</span>,
 <span class="song">Mission Report</span>,
 <span class="song">Puffer</span>,
 <span class="song">Out Of Time</span>,
 <span class="song">Magnetic</span>,
 <span class="song">Burning Down the House</span>,
 <span class="song">Thankful</span>,
 <span class="song">Another Broken Heart</span>,
 <span class="song">Broken Cup</span>,
 <span class="song">Diamonds and Rust</span>,
 <span class="song">Here We Go</span>,
 <span class="song">A Fragile Thing</span>,
 <span class="song">All Those Smiling Faces</span>,
 <span class="song">She's Leaving You</span>,
 <span class="song">Otherside</span>,
 <span class="song">In The Living Room</span>,
 <span class="song">Kindness of Strangers</span>,
 <span class="song">Plus &amp; Minus</span>,
 <span class="song">If The Sun Never Rises Again</span>,
 <span class="song">Stolen Car</span>,
 <span class="song">Fool's Gold</span>,
 <span class="song">Donna, Donna</span>,
 <span class="song">Cornfields</span>,
 <span class="song">Romantic Disaster</span>,
 <span class="song">Harmony Hall</span>,
 <span class="song">Gild the Lily</span>,
 <span class="song">Arrow</span>,
 <span class="song">Night Still Comes</span>,
 <span class="song">Miles to Go</span>,
 <span class="song">Billboard Heart</span>,
 <span class="song">Talk</span>,
 <span class="song">Keep Me In Your Heart</span>,
 <span class="song">Afterlife</span>,
 <span class="song">The Ridge</span>,
 <span class="song">If We Make It Till the Mornin'</span>,
 <span class="song">I Learned The Hard Way</span>,
 <span class="song">Shell (Of A Man)</span>,
 <span class="song">Feel</span>,
 <span class="song">Running</span>,
 <span class="song">Why I'm Walking</span>,
 <span class="song">She Wants To Go Dancing</span>,
 <span class="song">Shark Smile</span>,
 <span class="song">First Came The Thunder</span>,
 <span class="song">Untangling</span>,
 <span class="song">Satisfy Me</span>,
 <span class="song">If Our Love Is Dead</span>,
 <span class="song">Under The Sun</span>,
 <span class="song">Crooked Mile</span>]
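Once the tags are converted to plain strings, ordinary Python tools apply. Scanning the output above, some titles (for example, "Gild the Lily" and "Arrow") were played more than once. Here is a minimal sketch of finding such repeats with `collections.Counter`, using a short hypothetical list in place of `[tag.get_text() for tag in songlist]`:

```python
from collections import Counter

# Hypothetical stand-in for: songs = [tag.get_text() for tag in songlist]
songs = ["Gild the Lily", "Arrow", "Feel", "Arrow", "Gild the Lily", "Talk"]

counts = Counter(songs)
# Keep only the songs that were played more than once
repeats = {song: n for song, n in counts.items() if n > 1}
print(repeats)  # {'Gild the Lily': 2, 'Arrow': 2}
```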

And a list of the albums, which the page marks with the "release" class:

albumlist = wnrn.find_all("span", "release")
albumlist
[<span class="release">Live</span>,
 <span class="release">Reckoning</span>,
 <span class="release">(Single)</span>,
 <span class="release">Highway Prayers</span>,
 <span class="release">A Thousand Leaves</span>,
 <span class="release">Divinations</span>,
 <span class="release">Dixie Chicken</span>,
 <span class="release">Billboard Heart</span>,
 <span class="release">Slang Spirituals</span>,
 <span class="release">Busted Stuff</span>,
 <span class="release">One of a Kind</span>,
 <span class="release">Surprise No. 1 EP</span>,
 <span class="release">Tigerlily</span>,
 <span class="release">Unaired</span>,
 <span class="release">Wake the Dead</span>,
 <span class="release">Silver Rounds</span>,
 <span class="release">Burnin'</span>,
 <span class="release">Boys And Girls</span>,
 <span class="release">Pratts &amp; Pain</span>,
 <span class="release">Burnout Days</span>,
 <span class="release">Sweet Afton</span>,
 <span class="release">The Human Fear</span>,
 <span class="release">My Favorite Place</span>,
 <span class="release">The Boxer</span>,
 <span class="release">VOLUME 1</span>,
 <span class="release">Songs Of A Lost World</span>,
 <span class="release">Black Hole Rainbow</span>,
 <span class="release">Depression Cherry</span>,
 <span class="release">Ugly</span>,
 <span class="release">Magpie</span>,
 <span class="release">Inside Problems</span>,
 <span class="release">(Single)</span>,
 <span class="release">Observations From A Crowded Room</span>,
 <span class="release">Overgrown</span>,
 <span class="release">Cardinals At The Window</span>,
 <span class="release">(Single)</span>,
 <span class="release">Let Love In</span>,
 <span class="release">Nite Owls</span>,
 <span class="release">The Iron Pot Cooker</span>,
 <span class="release">Labor of Love</span>,
 <span class="release">Infinite Sky EP</span>,
 <span class="release">Long After Dark (Deluxe Edition)</span>,
 <span class="release">Writer's Block</span>,
 <span class="release">Cutouts</span>,
 <span class="release">Sharon Van Etten &amp; The Attachment Theory</span>,
 <span class="release">(Single)</span>,
 <span class="release">What's Wrong With New York?</span>,
 <span class="release">Spirits</span>,
 <span class="release">Shooting Rubberbands at the Stars</span>,
 <span class="release">(Single)</span>,
 <span class="release">(Single)</span>,
 <span class="release">Black &amp; Blu</span>,
 <span class="release">Right On</span>,
 <span class="release">(Single)</span>,
 <span class="release">Pontiac</span>,
 <span class="release">Strange Mercy</span>,
 <span class="release">Light Sides EP</span>,
 <span class="release">Passage Du Desir</span>,
 <span class="release">REVELATIONS</span>,
 <span class="release">Long Way Home</span>,
 <span class="release">West Georgia Blues</span>,
 <span class="release">Beautiful Freak</span>,
 <span class="release">Cardinals At The Window</span>,
 <span class="release">All Means, No End</span>,
 <span class="release">Cool It Down</span>,
 <span class="release">(Single)</span>,
 <span class="release">Mylo Xyloto</span>,
 <span class="release">The Jester</span>,
 <span class="release">(Single)</span>,
 <span class="release">Morning Phase</span>,
 <span class="release">Small Changes</span>,
 <span class="release">(Single)</span>,
 <span class="release">Painting Self Portraits</span>,
 <span class="release">(Single)</span>,
 <span class="release">Arches</span>,
 <span class="release">Same Trailer Different Park</span>,
 <span class="release">Ugly</span>,
 <span class="release">Mahashmashana</span>,
 <span class="release">Once I Was An Eagle</span>,
 <span class="release">(Single)</span>,
 <span class="release">Indoor Safari</span>,
 <span class="release">Slow Burn EP</span>,
 <span class="release">How To Re-Assemble An Atomic Bomb</span>,
 <span class="release">Romance</span>,
 <span class="release">Just like That</span>,
 <span class="release">Before It Might Be Gone</span>,
 <span class="release">Divinations</span>,
 <span class="release">By and By</span>,
 <span class="release">SABLE EP</span>,
 <span class="release">Humanhood</span>,
 <span class="release">Sticky Fingers</span>,
 <span class="release">(Single)</span>,
 <span class="release">This Perfect World</span>,
 <span class="release">Brighter Days</span>,
 <span class="release">Spring Skin EP</span>,
 <span class="release">Life Is Just A Vapor</span>,
 <span class="release">Cardinals At The Window</span>,
 <span class="release">Slang Spirituals</span>,
 <span class="release">Confident Melancholy</span>,
 <span class="release">Thin Ice on the Cake</span>,
 <span class="release">Always Here/Always Now</span>,
 <span class="release">(Single)</span>,
 <span class="release">Speaking in Tongues</span>,
 <span class="release">Look Up</span>,
 <span class="release">Willis, Carper, Leigh</span>,
 <span class="release">Backbone</span>,
 <span class="release">Diamonds and Rust</span>,
 <span class="release">Secondhand Smoke</span>,
 <span class="release">Songs Of A Lost World</span>,
 <span class="release">Fever Longing Still</span>,
 <span class="release">Manning Fireworks</span>,
 <span class="release">Hallucinating Love</span>,
 <span class="release">(Single)</span>,
 <span class="release">May Your Kindness Remain</span>,
 <span class="release">Dear Life</span>,
 <span class="release">Passage Du Desir</span>,
 <span class="release">Central Reservation</span>,
 <span class="release">Fortune</span>,
 <span class="release">Joan Baez</span>,
 <span class="release">Cardinals At The Window</span>,
 <span class="release">The Last Frontier</span>,
 <span class="release">Father of the Bride</span>,
 <span class="release">Highway Prayers</span>,
 <span class="release">(Single)</span>,
 <span class="release">The Worse Things Get, the Harder I Fight, the Harder I Fight..</span>,
 <span class="release">(Single)</span>,
 <span class="release">Billboard Heart</span>,
 <span class="release">Beatopia</span>,
 <span class="release">Last Leaf On The Tree</span>,
 <span class="release">Sharon Van Etten &amp; The Attachment Theory</span>,
 <span class="release">Me and My Religion</span>,
 <span class="release">High Country Heat</span>,
 <span class="release">I Learned The Hard Way</span>,
 <span class="release">SAYA</span>,
 <span class="release">One of a Kind</span>,
 <span class="release">Visions</span>,
 <span class="release">Texicali</span>,
 <span class="release">(Single)</span>,
 <span class="release">Capacity</span>,
 <span class="release">Wake the Dead</span>,
 <span class="release">Light Sides EP</span>,
 <span class="release">Delilah</span>,
 <span class="release">Pratts &amp; Pain</span>,
 <span class="release">No One Gets Out Alive</span>,
 <span class="release">Full Service No Waiting</span>]
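Each playlist entry appears to contain exactly one artist, one song, and one release span, so the three lists should line up by position and can be zipped into one record per play. This is an assumption worth checking. If any entry were missing a span, the lists would drift out of alignment, and a safer approach would be to search within each entry's row instead. Here is a minimal sketch, with short hypothetical lists standing in for the text of `artistlist`, `songlist`, and `albumlist`:

```python
# Hypothetical stand-ins for the text of artistlist, songlist, and albumlist
artists = ["Talking Heads", "Ringo Starr", "Maggie Rogers"]
songs = ["Burning Down the House", "Thankful", "In The Living Room"]
albums = ["Speaking in Tongues", "Look Up", "(Single)"]

# zip() silently stops at the shortest list, so check the lengths first
if not (len(artists) == len(songs) == len(albums)):
    raise ValueError("artist, song, and album lists have different lengths")

# One dictionary per play, pairing items that share a position
rows = [
    {"artist": artist, "song": song, "album": album}
    for artist, song, album in zip(artists, songs, albums)
]
print(rows[2])
```

The length check is a cheap guard, not proof of alignment. Two lists of equal length could still be misaligned if one entry lacked an artist and a different entry lacked an album.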

Finally, we also want to extract the time at which each song was played. Looking through the HTML code for an example of a play time, I find that these times are stored in <td> tags with class="spin-time", each of which wraps the time in a link (<a>) tag. I create a list of these times:
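Unlike the `<span>` tags, each time cell wraps its text in a nested `<a>` link. It is therefore convenient to call `.get_text(strip=True)` on the `<td>` rather than navigating to the link by hand. The sketch below uses a hypothetical two-cell snippet that copies the structure of these cells, then parses the strings with `datetime.strptime`. Only a clock time is shown, so it keeps just the `.time()` part; attaching a date would require information from elsewhere on the page.

```python
from datetime import datetime

from bs4 import BeautifulSoup

# Hypothetical snippet copying the structure of two spin-time cells
html = """
<table><tr>
<td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402017283">5:00 PM</a></td>
</tr><tr>
<td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402016938">4:55 PM</a></td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

cells = soup.find_all("td", "spin-time")
# get_text(strip=True) reaches through the nested <a> tag to its text
times = [cell.get_text(strip=True) for cell in cells]

# Parse "5:00 PM" into a time object (no date is available, so none is attached)
parsed = [datetime.strptime(t, "%I:%M %p").time() for t in times]
print(times, parsed[0])
```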

timelist = wnrn.find_all("td", "spin-time")
timelist
[<td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402017283">5:00 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402016938">4:55 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402016658">4:52 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402016191">4:45 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402016000">4:42 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402015779">4:38 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402015148">4:28 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402014932">4:25 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402014566">4:19 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402014344">4:15 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402014129">4:11 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402013719">4:05 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402013474">4:01 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402013128">3:57 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402012629">3:48 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402012351">3:44 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402012053">3:39 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402011874">3:36 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402011688">3:33 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402011346">3:28 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402011104">3:24 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402010911">3:20 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402010644">3:16 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402010410">3:12 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402010232">3:09 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402009892">3:03 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402009599">3:00 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402009277">2:54 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402009066">2:51 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402008742">2:45 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402008546">2:42 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402008244">2:37 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402007894">2:31 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402007670">2:27 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402007516">2:24 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402007229">2:19 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402006901">2:13 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402006724">2:10 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402006384">2:04 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402005841">1:55 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402005628">1:52 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402005252">1:45 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402004953">1:40 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402004700">1:36 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402004235">1:29 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402004027">1:26 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402003794">1:22 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402003452">1:16 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402003100">1:11 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402002928">1:08 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402002511">1:01 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402002104">12:55 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402001957">12:53 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402001595">12:47 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402001397">12:44 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402001127">12:40 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402000872">12:36 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402000485">12:30 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402000290">12:27 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=402000046">12:23 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401999733">12:19 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401999539">12:16 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401999274">12:12 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401998850">12:05 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401998606">12:01 PM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401998382">11:57 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401998134">11:53 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401997977">11:50 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401997549">11:44 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401997306">11:40 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401997140">11:37 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401996795">11:31 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401996558">11:27 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401996285">11:24 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401995836">11:18 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401995645">11:15 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401995477">11:12 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401995110">11:05 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401994565">10:57 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401994332">10:54 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401993991">10:48 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401993765">10:44 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401993495">10:39 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401993132">10:33 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401992923">10:29 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401992737">10:26 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401992439">10:20 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401992205">10:16 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401992025">10:13 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401991694">10:07 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401991387">10:01 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401991089">9:57 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401990843">9:53 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401990610">9:49 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401990363">9:44 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401990123">9:41 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401989904">9:36 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401989578">9:30 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401989335">9:26 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401989213">9:23 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401989044">9:20 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401988825">9:16 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401988599">9:11 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401988424">9:08 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401987915">8:59 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401987706">8:56 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401987450">8:51 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401986907">8:42 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401986694">8:38 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401986406">8:32 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401986141">8:27 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401985959">8:23 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401985619">8:17 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401985416">8:13 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401985232">8:10 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401984920">8:04 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401984717">8:00 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401984341">7:55 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401984156">7:52 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401983749">7:45 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401983608">7:42 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401983334">7:38 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401983054">7:33 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401982717">7:27 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401982506">7:23 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401982272">7:19 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401981962">7:14 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401981810">7:11 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401981611">7:07 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401981283">7:02 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401981054">6:59 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401980806">6:55 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401980599">6:51 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401980372">6:48 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401980066">6:43 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401979881">6:39 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401979668">6:36 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401979368">6:30 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401979159">6:27 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401978714">6:20 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401978403">6:15 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401978269">6:12 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401978059">6:09 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401977728">6:03 AM</a></td>,
 <td class="spin-time"><a href="/WNRN/pl/20052768/WNRN?sp=401977446">6:00 AM</a></td>]

Sometimes the information we need exists in a particular tag, but only when a specific attribute is present. For example, in the WNRN playlist HTML there are many <a> tags, but only some of those tags include a title attribute. To extract all of the <a> tags with a title attribute, specify title=True in the call to .find_all():

atags_title = wnrn.find_all("a", title=True)
print(atags_title[0:5]) # just show the first 5 elements
[<a class="buy-link" data-vendor="apple" href="#" target="_blank" title='View "Leslie Odom Jr. - Winter Song" on Apple'><div alt='View "Leslie Odom Jr. - Winter Song" on Apple' class="buy-icon buy-icon-apple"></div></a>, <a class="buy-link" data-vendor="amazon" href="#" target="_blank" title='View "Leslie Odom Jr. - Winter Song" on Amazon'><div alt='View "Leslie Odom Jr. - Winter Song" on Amazon' class="buy-icon buy-icon-amazon"></div></a>, <a class="buy-link" data-vendor="spotify" href="#" target="_blank" title='View "Leslie Odom Jr. - Winter Song" on Spotify'><div alt='View "Leslie Odom Jr. - Winter Song" on Spotify' class="buy-icon buy-icon-spotify"></div></a>, <a class="buy-link" data-vendor="apple" href="#" target="_blank" title='View "R.E.M. - South Central Rain" on Apple'><div alt='View "R.E.M. - South Central Rain" on Apple' class="buy-icon buy-icon-apple"></div></a>, <a class="buy-link" data-vendor="amazon" href="#" target="_blank" title='View "R.E.M. - South Central Rain" on Amazon'><div alt='View "R.E.M. - South Central Rain" on Amazon' class="buy-icon buy-icon-amazon"></div></a>]
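To see what title=True matches without downloading anything, here is a minimal sketch on a small, made-up HTML snippet. The tags and attribute values are hypothetical and only loosely modeled on the buy links above. It also shows attrs=, a BeautifulSoup option for matching a specific attribute value. This is useful when the attribute name contains a hyphen, like data-vendor, and so cannot be written as a keyword argument.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML snippet standing in for the WNRN page (hypothetical data)
html = """
<div>
  <a href="#" title="View on Apple" data-vendor="apple">Apple</a>
  <a href="/WNRN/pl/1/WNRN">Playlist</a>
  <a href="#" title="View on Spotify" data-vendor="spotify">Spotify</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# title=True keeps only <a> tags that have a title attribute (with any value)
with_title = soup.find_all("a", title=True)
print([a["title"] for a in with_title])

# attrs= matches a specific attribute value, which works for hyphenated names
spotify = soup.find_all("a", attrs={"data-vendor": "spotify"})
print([a.string for a in spotify])
```

The untitled playlist link is skipped by the first search, and only the Spotify link survives the second.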

5.4.4. Constructing a Data Frame from HTML Data

Next we need to place these data into a clean data frame, keeping the text while dropping the HTML tags. We stored the tags for the artists, songs, albums, and times in separate lists. Each name is stored as a navigable string inside its HTML tag, so to extract these names we need to loop across the elements of each list. The most concise way to write this loop is a list comprehension, which has the following syntax:

newlist = [ expression for item in oldlist if condition ]

In this syntax, we create a new list by performing the same operation on each element of an existing list (oldlist). item is a placeholder name that represents one element of the existing list. expression is the same Python code we would use on a single element of the existing list, except that we refer to the element by the name given in item. Finally, condition is an optional filter: only the elements of the old list for which condition is True are transformed and placed into the new list (there's an example of a condition in a comprehension in the section on spiders).
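Before applying this syntax to HTML tags, here is the same pattern on a plain list of strings, next to an explicit for loop that does the same work. The strings are stand-ins chosen for illustration, and the empty string plays the role of an element that the condition filters out.

```python
# Hypothetical stand-in data: plain strings instead of BeautifulSoup tags
oldlist = ["R.E.M.", "Hinds", "", "Sonic Youth"]

# Comprehension: expression is item.upper(), condition keeps non-empty strings
newlist = [item.upper() for item in oldlist if item]

# The equivalent explicit loop, step by step
newlist_loop = []
for item in oldlist:
    if item:
        newlist_loop.append(item.upper())

print(newlist)  # ['R.E.M.', 'HINDS', 'SONIC YOUTH']
```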

For example, to extract the navigable string from every element of artistlist, we can set item to a, expression to a.string, and oldlist to artistlist:

artists = [a.string for a in artistlist]
artists
['Leslie Odom Jr.',
 'R.E.M.',
 'Hinds',
 'Billy Strings',
 'Sonic Youth',
 'Oracle Sisters',
 'Little Feat',
 'Deep Sea Diver',
 'Lady Blackbird',
 'Dave Matthews Band',
 'The Heavy Heavy',
 'Palmyra',
 'Natalie Merchant',
 'Hozier',
 'Chuck Prophet',
 'Olivia Wolf',
 'Bob Marley & The Wailers',
 'Alabama Shakes',
 'Royel Otis',
 'flipturn',
 'Chamomile & Whiskey',
 'Franz Ferdinand',
 'Jim Lauderdale',
 'The National',
 'MRCY',
 'The Cure',
 'Devon Gilfillian',
 'Beach House',
 'Oh He Dead f/ The Honeynut Horns',
 'Peach Pit',
 'Andrew Bird',
 'Maggie Rogers',
 'Joy Oladokun',
 'James Blake',
 'Daughter of Swords',
 'The Head & The Heart',
 'Nick Cave & The Bad Seeds',
 'JD McPherson',
 'Camille Yarbrough',
 'Fruition',
 'twen',
 'Tom Petty & The Heartbreakers',
 'Peter, Bjorn, and John',
 'The Smile',
 'Sharon Van Etten & The Attachment Theory',
 'Sleepwalkers',
 'The Dare',
 'The Devil Makes Three',
 'Edie Brickell & The New Bohemians',
 'Mt. Joy',
 'Tyler Meacham',
 'Gary Clark Jr.',
 'Humbird',
 'Jenny Owen Youngs',
 'Lyle Lovett',
 'St. Vincent',
 'Angie McMahon',
 'Johnny Blue Skies',
 'Sarah Shook & The Disarmers',
 'Ray LaMontagne',
 'Jontavious Willis',
 'Eels',
 'Wye Oak',
 'Birdtalker',
 'Yeah Yeah Yeahs f/ Perfume Genius',
 'Mavis Staples',
 'Coldplay',
 'Wallice',
 'Julien Baker & TORRES',
 'Beck',
 'Michael Kiwanuka',
 'Waxahatchee',
 'Rob Cheatham & Co.',
 'Momma',
 'Illiterate Light',
 'Kacey Musgraves',
 'Oh He Dead',
 'Father John Misty',
 'Laura Marling',
 'Gigi Perez',
 'Nick Lowe & Los Straitjackets',
 'Baby Rose f/ BADBADNOTGOOD',
 'U2',
 'Fontaines D.C.',
 'Bonnie Raitt',
 'The Vices',
 'Oracle Sisters',
 'Caamp',
 'Bon Iver',
 'The Weather Station',
 'The Rolling Stones',
 'Blondshell',
 'Freedy Johnston',
 'Dwight Yoakam',
 'SubT',
 'Paul Thorn',
 'Fancy Gap',
 'Lady Blackbird',
 "Adam's Plastic Pond",
 'Cautious Clay',
 'Trampled By Turtles f/ LeAnn Rimes',
 'Tunde Adebimpe',
 'Talking Heads',
 'Ringo Starr',
 'Wonder Women of Country',
 'Kasey Chambers',
 'Joan Baez',
 'Sean McConnell',
 'The Cure',
 'Paul Kelly',
 'MJ Lenderman',
 'Maribou State f/ Holly Walker',
 'Maggie Rogers',
 'Courtney Marie Andrews',
 'David Gray f/ Talia Rae',
 'Johnny Blue Skies',
 'Beth Orton',
 'Gold Connections',
 'Joan Baez',
 'Mipso',
 'Reckless Kelly',
 'Vampire Weekend',
 'Billy Strings',
 'The Head & The Heart',
 'Neko Case',
 '49 Winchester',
 'Deep Sea Diver',
 'Beabadoobee',
 'Willie Nelson',
 'Sharon Van Etten & The Attachment Theory',
 'Sam Burchfield & The Scoundrels',
 'Jason Scott & the High Heat',
 'Sharon Jones & the Dap-Kings',
 'Saya Gray',
 'The Heavy Heavy',
 'Norah Jones',
 'Dave Alvin & Jimmie Dale Gilmore',
 'Mt. Joy',
 'Big Thief',
 'Chuck Prophet',
 'Angie McMahon',
 'Anderson East',
 'Royel Otis',
 'Maggie Rose',
 'Peter Case']

Likewise, we extract the navigable strings for the songs, albums, and times:

songs = [a.string for a in songlist]
albums = [a.string for a in albumlist]
times = [a.string for a in timelist]
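One caveat is that .string returns None when a tag contains more than one child, for example some text plus a nested tag. If the real page ever formats a name that way, the corresponding entry in the list becomes None. The .get_text() method joins all of the text inside the tag instead. A small sketch on made-up HTML (the markup is hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second artist name is split across a nested <b> tag
html = '<span class="artist">Hinds</span><span class="artist">Bob <b>Marley</b></span>'
soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span", "artist")

# .string only works when the tag has a single child
print([s.string for s in spans])  # ['Hinds', None]

# .get_text() gathers every piece of text inside the tag;
# separator " " with strip=True trims each piece and joins them with one space
print([s.get_text(" ", strip=True) for s in spans])  # ['Hinds', 'Bob Marley']
```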

Finally, to construct a clean data frame, we create a dictionary that combines these lists and pass this dictionary to the pd.DataFrame() function. Each list becomes one column, so the lists must all have the same length:

mydict = {'time':times,
          'artist':artists,
          'song':songs,
          'album':albums}
wnrn_df = pd.DataFrame(mydict)
wnrn_df
time artist song album
0 5:00 PM Leslie Odom Jr. Winter Song Live
1 4:55 PM R.E.M. South Central Rain Reckoning
2 4:52 PM Hinds Bats (Single)
3 4:45 PM Billy Strings Gild the Lily Highway Prayers
4 4:42 PM Sonic Youth Sunday A Thousand Leaves
... ... ... ... ...
140 6:15 AM Angie McMahon Untangling Light Sides EP
141 6:12 AM Anderson East Satisfy Me Delilah
142 6:09 AM Royel Otis If Our Love Is Dead Pratts & Pain
143 6:03 AM Maggie Rose Under The Sun No One Gets Out Alive
144 6:00 AM Peter Case Crooked Mile Full Service No Waiting

145 rows × 4 columns
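Because pd.DataFrame() builds one column per list, a single missing entry on the page (say, a spin with no album listed) would make the lists different lengths and the call would fail. A minimal sketch of that failure and a simple length check, using a few values from the table above and a deliberately shortened albums list (the missing album is hypothetical):

```python
import pandas as pd

times = ["5:00 PM", "4:55 PM"]
artists = ["Leslie Odom Jr.", "R.E.M."]
songs = ["Winter Song", "South Central Rain"]
albums = ["Live"]  # hypothetical: the second spin has no album tag

mydict = {"time": times, "artist": artists, "song": songs, "album": albums}

# Check the column lengths before building the data frame
lengths = {name: len(values) for name, values in mydict.items()}
print(lengths)  # {'time': 2, 'artist': 2, 'song': 2, 'album': 1}

built = None
try:
    built = pd.DataFrame(mydict)
except ValueError as err:
    # pandas refuses to line up lists of different lengths
    print("ValueError:", err)
```

Printing the lengths first makes it easy to see which field is short before deciding how to handle it.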

5.5. Building a Spider

At the bottom of the WNRN playlist on https://spinitron.com/WNRN/ there are links to older playlists. Let's extend our example by building a spider to capture the data on these linked pages as well. A spider is a web scraper that automatically follows the links on a page and scrapes the linked pages too.

Looking at the page source, I find that these links are contained in a <div class="recent-playlists"> tag, so I start by finding this tag. As there's only one occurrence, I can use .find(), which returns the first match, instead of .find_all(), which returns a list:

recent = wnrn.find("div", "recent-playlists")
recent
<div class="recent-playlists">
<h4>Recent</h4>
<div class="grid-view" id="w2"><div class="summary"></div>
<table class="table table-bordered table-narrow"><tbody>
<tr data-key="0"><td class="show-time">5:00 AM</td><td></td><td><strong><a href="/WNRN/pl/20052657/WNRN-1-9-25-5-01-AM">WNRN 1/9/25, 5:01 AM</a></strong> with <a href="/WNRN/dj/104061/WNRN">WNRN</a></td></tr>
<tr data-key="1"><td class="show-time">4:00 AM</td><td></td><td><strong><a href="/WNRN/pl/20052550/WNRN-1-9-25-4-03-AM">WNRN 1/9/25, 4:03 AM</a></strong> with <a href="/WNRN/dj/104061/WNRN">WNRN</a></td></tr>
<tr data-key="2"><td class="show-time">8:00 PM</td><td></td><td><strong><a href="/WNRN/pl/20051019/WNRN">WNRN</a></strong> (Music)</td></tr>
<tr data-key="3"><td class="show-time">6:00 PM</td><td></td><td><strong><a href="/WNRN/pl/20050560/World-Caf%C3%A9">World Café</a></strong> (Music) with <a href="/WNRN/dj/179987/Raina-Douris-and-Stephen-Kallao">Raina Douris and Stephen Kallao</a></td></tr>
<tr data-key="4"><td class="show-time">6:00 AM</td><td></td><td><strong><a href="/WNRN/pl/20048193/WNRN">WNRN</a></strong> (Music)</td></tr>
</tbody></table>
</div></div>
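The difference between the two methods matters when writing a spider: .find() gives back a single tag (or None if nothing matches), while .find_all() always gives back a list. A small sketch on made-up HTML (the hrefs and class names are hypothetical). If a site redesign removed the "recent-playlists" div, a check such as `if recent is None:` would let the code fail with a clear message rather than an AttributeError later on.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the "Recent" box
html = ('<div class="recent-playlists">'
        '<a href="/WNRN/pl/1/x">One</a><a href="/WNRN/pl/2/y">Two</a></div>')
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a")                        # a single Tag: the first match only
every = soup.find_all("a")                    # a list of every matching Tag
missing = soup.find("div", "no-such-class")   # None when nothing matches

print(first["href"])    # /WNRN/pl/1/x
print(len(every))       # 2
print(missing is None)  # True
```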

Notice that all of the addresses we need are contained in <a> tags. We can extract these <a> tags with .find_all():

recent_atags = recent.find_all("a")
recent_atags
[<a href="/WNRN/pl/20052657/WNRN-1-9-25-5-01-AM">WNRN 1/9/25, 5:01 AM</a>,
 <a href="/WNRN/dj/104061/WNRN">WNRN</a>,
 <a href="/WNRN/pl/20052550/WNRN-1-9-25-4-03-AM">WNRN 1/9/25, 4:03 AM</a>,
 <a href="/WNRN/dj/104061/WNRN">WNRN</a>,
 <a href="/WNRN/pl/20051019/WNRN">WNRN</a>,
 <a href="/WNRN/pl/20050560/World-Caf%C3%A9">World Café</a>,
 <a href="/WNRN/dj/179987/Raina-Douris-and-Stephen-Kallao">Raina Douris and Stephen Kallao</a>,
 <a href="/WNRN/pl/20048193/WNRN">WNRN</a>]

The resulting list contains the web endpoints we need along with some we don't: we want the URLs that contain the string /pl/, as these are playlists, and we want to exclude the URLs that contain the string /dj/, as these pages refer to a particular DJ. We need a comprehension loop that loops across these elements and extracts the href attribute of every entry that includes /pl/; entries that include /dj/ fail this condition and are skipped. We again use this syntax:

newlist = [ expression for item in oldlist if condition ]

In this case:

  • newlist is a list containing the URLs we want to direct our spider to. I call it wnrn_url.

  • item is one element of recent_atags, which I will call pl.

  • expression is code that extracts the web address from the href attribute of the <a> tag, so here the code would be pl['href'].

  • Finally, condition is a logical statement that should be True if the web address contains /pl/ and False otherwise, including when it contains /dj/. Here, the conditional statement is if "/pl/" in pl['href']. This code looks for the string "/pl/" inside the string returned by pl['href'] and returns True or False depending on whether it is found.

Putting all this syntax together gives us our list of playlist URLs:

wnrn_url = [pl['href'] for pl in recent_atags if "/pl/" in pl['href']]
wnrn_url
['/WNRN/pl/20052657/WNRN-1-9-25-5-01-AM',
 '/WNRN/pl/20052550/WNRN-1-9-25-4-03-AM',
 '/WNRN/pl/20051019/WNRN',
 '/WNRN/pl/20050560/World-Caf%C3%A9',
 '/WNRN/pl/20048193/WNRN']
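These hrefs are relative paths that begin with a slash, so they need a domain in front before requests can fetch them. A sketch using the standard library's urllib.parse.urljoin(), which resolves a relative path against a base URL and avoids the doubled slash that plain concatenation with "https://spinitron.com/" produces. Using https://spinitron.com/WNRN/ as the base is an assumption based on the page we scraped.

```python
from urllib.parse import urljoin

# A few href values copied from the recent_atags output above
hrefs = [
    "/WNRN/pl/20052657/WNRN-1-9-25-5-01-AM",
    "/WNRN/dj/104061/WNRN",
    "/WNRN/pl/20051019/WNRN",
]

# Naive concatenation doubles the slash when both pieces contain one
print("https://spinitron.com/" + hrefs[0])

# urljoin() replaces the base path with the absolute path from the href
base = "https://spinitron.com/WNRN/"
playlist_urls = [urljoin(base, h) for h in hrefs if "/pl/" in h]
print(playlist_urls)
```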

Next, we collect all of the code we wrote above to extract the artist, song, album, and play times from the HTML into a single function. The function takes one argument, the URL, and returns a clean data frame. I name the function wnrn_spider():

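import requests
import pandas as pd
from bs4 import BeautifulSoup

def wnrn_spider(url):
    """Perform web scraping for any WNRN playlist given the available link"""
    
    # Identify ourselves to the server and download the page
    headers = {'user-agent': 'Kropko class example (jkropko@virginia.edu)'}
    r = requests.get(url, headers=headers)
    r.raise_for_status()  # raise an error if the server returned a 4xx or 5xx status
    wnrn = BeautifulSoup(r.text, 'html')
    
    # Collect the tags that hold each field
    artistlist = wnrn.find_all("span", "artist")
    songlist = wnrn.find_all("span", "song")
    albumlist = wnrn.find_all("span", "release")
    timelist = wnrn.find_all("td", "spin-time")
    
    # Keep only the navigable string inside each tag
    artists = [a.string for a in artistlist]
    songs = [a.string for a in songlist]
    albums = [a.string for a in albumlist]
    times = [a.string for a in timelist]
    
    # Combine the lists into a data frame, one column per list
    mydict = {'time':times, 'artist':artists, 'song':songs, 'album':albums}
    wnrn_df = pd.DataFrame(mydict)
    
    return wnrn_df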
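Because wnrn_spider() both downloads and parses, every test of it contacts the server. One design alternative is to split out the parsing step so it can be checked on a saved or made-up page. The helper below, parse_playlist(), is hypothetical and not part of the original code. It searches for the same tag and class names, uses .get_text(strip=True) instead of .string, and is run on invented HTML that mimics one row of the playlist.

```python
import pandas as pd
from bs4 import BeautifulSoup

def parse_playlist(html):
    """Hypothetical helper: parse playlist HTML that is already in memory."""
    soup = BeautifulSoup(html, "html.parser")
    # Same tag/class pairs that wnrn_spider() searches for
    return pd.DataFrame({
        "time": [td.get_text(strip=True) for td in soup.find_all("td", "spin-time")],
        "artist": [s.get_text(strip=True) for s in soup.find_all("span", "artist")],
        "song": [s.get_text(strip=True) for s in soup.find_all("span", "song")],
        "album": [s.get_text(strip=True) for s in soup.find_all("span", "release")],
    })

# Made-up HTML that mimics one row of the playlist markup (hypothetical)
sample = """
<table><tr>
  <td class="spin-time"><a href="#">5:01 AM</a></td>
  <td><span class="artist">Sondre Lerche</span>
      <span class="song">I Can't See Myself Without You</span>
      <span class="release">Patience</span></td>
</tr></table>
"""
df = parse_playlist(sample)
print(df)
```

With this split, a spider would call requests.get() and then pass r.text to the parser.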

We can pass any of the URLs we collected to our function to get the other playlists. Because these URLs are relative paths, we have to add the domain “https://spinitron.com” to the beginning of each one:

wnrn2 = wnrn_spider('https://spinitron.com' + wnrn_url[0])
wnrn2
time artist song album
0 5:01 AM Sondre Lerche I Can't See Myself Without You Patience
1 5:04 AM Waxahatchee Much Ado About Nothing (Single)
2 5:08 AM Paul Cauthen Angels & Heathens Black on Black
3 5:11 AM Modest Mouse Missed the Boat We Were Dead Before The Ship Even Sank
4 5:16 AM The Wood Brothers Alabaster Kingdom in My Mind
5 5:19 AM Daughter of Swords Alone Together Cardinals At The Window
6 5:22 AM Olivia Ellen Lloyd Every Good Man Do It Myself
7 5:26 AM Fleet Foxes Can I Believe You Shore
8 5:30 AM John Prine The Sins of Memphisto The Missing Years
9 5:34 AM Jontavious Willis Keep Your Worries On the Dance Floor West Georgia Blues
10 5:37 AM Future Islands A Dream of You and Me Singles
11 5:42 AM Peach Pit Magpie Magpie
12 5:47 AM Amanda Anne Platt & the Honeycutters The Lesson The Ones That Stay
13 5:50 AM Johnny Delaware Running Para Llevar
14 5:53 AM Jim Lauderdale Don't You Treat 'Em That Way My Favorite Place
15 5:56 AM Grizzly Bear Two Weeks Veckatimest

Our goal here is to loop across all the URLs we collected, extract the data from each one into a clean data frame, and stack these data frames together to construct a longer playlist. To do that, we will use a for loop, which has the following syntax:

for index in list:
    expressions

This syntax is similar to the syntax we used to build a comprehension loop. list is an existing list, and index stands in for one element of this list. For each element of the list, we execute the code contained in expressions, which can use the index.

For our spider, we will use the following steps:

  1. We take the data we already scraped from https://spinitron.com/WNRN (saved as wnrn_df) and copy it, using the .copy() method, into a new variable named wnrn_total_playlist. It is important that we make a copy and do not overwrite wnrn_df: we will be repeatedly saving over wnrn_total_playlist within the loop, and leaving wnrn_df untouched gives us a stable data frame to return to as a starting point if we need to rerun this loop.

  2. We use a for loop to loop across all the web addresses inside wnrn_url.

  3. In the for loop, we use the wnrn_spider() function to extract the playlist data from each of the URLs inside wnrn_url.

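  4. In the for loop, we use the pd.concat() function to attach the new data to the bottom of the existing data, matching corresponding columns. (Older code often uses the data frame .append() method for this step, but that method was removed in pandas 2.0.)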
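Step 1 hinges on the difference between a second name and a real copy. In Python, plain assignment such as `b = a` only attaches another name to the same data frame, so in-place changes made through either name show up in both; .copy() creates independent data. Operations that return a new data frame, such as pd.concat(), leave the original alone either way, but the copy makes the intent explicit. A small sketch with made-up data:

```python
import pandas as pd

wnrn_df = pd.DataFrame({"artist": ["R.E.M.", "Hinds"]})

alias = wnrn_df          # a second name for the SAME object, not a copy
clone = wnrn_df.copy()   # an independent copy of the data

# Change a value in place through the alias
alias.loc[0, "artist"] = "changed"

print(wnrn_df.loc[0, "artist"])  # changed  (the original was modified too)
print(clone.loc[0, "artist"])    # R.E.M.   (the copy is unaffected)
```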

The code is as follows:

wnrn_total_playlist = wnrn_df.copy()  # work on a copy so wnrn_df stays intact
for w in wnrn_url:
    moredata = wnrn_spider('https://spinitron.com' + w)  # scrape one linked playlist
    # stack the new rows under the existing rows, matching columns by name
    wnrn_total_playlist = pd.concat([wnrn_total_playlist, moredata])

We now have a data frame that combines all of the playlists on https://spinitron.com/WNRN and on the playlists linked to under “Recent”:

wnrn_total_playlist
time artist song album
0 8:31 PM Nick Lowe Lay It On Me Baby Lay It on Me
1 8:27 PM Okkervil River Lost Coastlines The Stand Ins
2 8:24 PM Perfume Genius On the Floor Set My Heart on Fire Immediately
3 8:20 PM Stray Fossa It's Nothing (Single)
4 8:13 PM Lianne La Havas Can't Fight Lianne La Havas
... ... ... ... ...
12 4:44 AM Chicano Batman Color My Life Invisible People
13 4:48 AM Laura Marling Held Down Song for Our Daughter
14 4:52 AM J. Roddy Walston & The Business Sweat Shock Essential Tremors
15 4:55 AM Becca Mancari Hunter The Greatest Part
16 4:58 AM Cordovas This Town’s a Drag That Santa Fe Channel

209 rows × 4 columns
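The loop above rebuilds the combined data frame on every pass. A common alternative is to collect each playlist's data frame in a list and call pd.concat() once at the end. Passing ignore_index=True makes the row numbers run continuously instead of restarting for each playlist, as they do in the output above. The sketch below also pauses between requests, in the spirit of the rate limits discussed in section 5.2. To keep it runnable offline, fake_spider() is a hypothetical stand-in for wnrn_spider() that makes no network request, and its extra source column is also an illustrative addition. The one-second default pause is an arbitrary choice, not a documented Spinitron limit.

```python
import time
import pandas as pd

def fake_spider(url):
    """Hypothetical stand-in for wnrn_spider(): returns one row, no network call."""
    return pd.DataFrame({"time": ["5:01 AM"], "artist": ["Artist"],
                         "song": ["Song"], "album": ["Album"], "source": [url]})

def crawl(urls, spider, pause=1.0):
    """Scrape each URL with `spider`, waiting `pause` seconds between requests."""
    frames = []
    for url in urls:
        frames.append(spider(url))
        time.sleep(pause)  # space out requests so the server is not flooded
    # one concatenation at the end, with a fresh 0..n-1 row index
    return pd.concat(frames, ignore_index=True)

urls = ["https://spinitron.com/WNRN/pl/1/a", "https://spinitron.com/WNRN/pl/2/b"]
combined = crawl(urls, fake_spider, pause=0)
print(combined.shape)  # (2, 5)
```

With the real function you would pass wnrn_spider and the full playlist URLs, and possibly start the list with wnrn_df.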