I am using Python. I have been trying to webscrape the names, team images, and colleges of NBA draft prospects. However, when I scrape the colleges I get both the link to the college page and the college name. How do I get only the college names? I have tried adding .string and .text to the end of anchor (for example, anchor.string).
import urllib2
from BeautifulSoup import BeautifulSoup
# or if you're using BeautifulSoup4:
# from bs4 import BeautifulSoup
list = []
soup = BeautifulSoup(urllib2.urlopen(
    'http://www.cbssports.com/nba/draft/mock-draft'
    ).read()
)
rows = soup.findAll("table",
    attrs = {'class':'data borderTop'})[0].tbody.findAll("tr")[2:]
for row in rows:
    fields = row.findAll("td")
    if len(fields) >= 3:
        anchor = row.findAll("td")[2].findAll("a")[1:]
        if anchor:
            print anchor
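
My guess is that the problem is that `anchor` is a list of tags (findAll returns a list, and slicing it with [1:] gives another list), so .string and .text would have to be applied to each tag in it rather than to the list itself. Here is a minimal sketch of that idea. It assumes BeautifulSoup4 on Python 3, and it runs on a made-up HTML snippet instead of the live page, so `SAMPLE_HTML`, `college_names`, and the markup itself are hypothetical and the real table may look different:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the mock-draft table;
# the real page's structure may differ.
SAMPLE_HTML = """
<table class="data borderTop">
  <tbody>
    <tr><td colspan="3">Round 1</td></tr>
    <tr>
      <td>1</td>
      <td>Team A</td>
      <td>
        <a href="/players/1">Player One</a>,
        <a href="/college/teams/page/EXAMPLE">Example State</a>
      </td>
    </tr>
  </tbody>
</table>
"""

def college_names(html):
    """Return the text of every anchor after the first in each row's third cell."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for row in soup.find_all("tr"):
        fields = row.find_all("td")
        if len(fields) >= 3:
            # find_all returns a list of tags, so read the text tag by tag
            for anchor in fields[2].find_all("a")[1:]:
                names.append(anchor.get_text(strip=True))
    return names

print(college_names(SAMPLE_HTML))
```

In my original Python 2 loop the equivalent change would be to iterate with `for a in anchor:` and print `a.string` for each element, instead of printing the whole list.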