Scrape Bing Search Engine Results Page
Building on my last post about scraping the Google SERP, I made a small change to the script so that it scrapes the organic search results from Bing instead.
I wasn’t able to find a way to display 100 results per page in the Bing results, so this script only returns the top 10. It could be extended to loop through the result pages, but I have left that out of this code.
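The paging loop mentioned above could start from a list of per-page URLs. The sketch below is not part of the original script: `bing_page_urls` is a hypothetical helper, it is written for Python 3 (the script below targets Python 2), and it assumes that Bing's `first` query parameter is the 1-based index of the first result on a page, which should be checked against the live site.

```python
from urllib.parse import urlencode


def bing_page_urls(query, pages=3, per_page=10):
    """Build one search URL per results page.

    Assumption: Bing's `first` parameter is the 1-based index of the
    first result shown on the page (1, 11, 21, ...).
    """
    urls = []
    for page in range(pages):
        params = {"q": query, "first": page * per_page + 1}
        urls.append("http://www.bing.com/search?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in bing_page_urls("halotis", pages=2):
        print(url)
```

Each URL could then be fetched and parsed the same way the script handles the first page, with the link lists concatenated.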
Example Usage:
$ python BingScrape.py
http://twitter.com/halotis
http://www.halotis.com/
http://www.halotis.com/progress/
http://doi.acm.org/10.1145/367072.367328
http://runtoloseweight.com/privacy.php
http://twitter.com/halotis/statuses/2391293559
http://friendfeed.com/mfwarren
http://www.date-conference.com/archive/conference/proceedings/PAPERS/2001/DATE01/PDFFILES/07a_2.pdf
http://twitterrespond.com/
http://heatherbreen.com
Here’s the Python Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/

import urllib,urllib2

from BeautifulSoup import BeautifulSoup

def bing_grab(query):
    # Build the search URL and send a browser-like User-Agent header
    address = "http://www.bing.com/search?q=%s" % (urllib.quote_plus(query))
    request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )
    urlfile = urllib2.urlopen(request)
    page = urlfile.read(200000)
    urlfile.close()

    # Each organic result is an <h3> inside <div id="results">; take its first link
    soup = BeautifulSoup(page)
    links = [x.find('a')['href'] for x in soup.find('div', id='results').findAll('h3')]

    return links

if __name__=='__main__':
    # Example: search for 'halotis' and print one result URL per line
    links = bing_grab('halotis')
    print '\n'.join(links)


Hi Matt,
Great share. This piece of code is short and effective!
I am actually interested in the “related searches” in Bing and was trying to use BeautifulSoup to crawl them for personal data collection.
I was using the code below to identify them; however, it returned other data that I am not interested in.
results = soup.findAll('div', attrs={'class': 'sw_menu'})
After going through the HTML, I realised there are actually two divs that share this class name but have different ids. The module picked the latter one, which is not the one I am interested in.
I am wondering if you know how to use BeautifulSoup more effectively?
Thanks.
Best regards
Marcin
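
One way to separate two divs that share a class is to filter on the id as well. This is a minimal sketch, not a confirmed recipe for Bing's markup: it assumes the newer `bs4` package under Python 3 rather than the BeautifulSoup 3 import used in the post, and the ids `related_searches` and `other_menu` in the sample HTML are invented for illustration.

```python
from bs4 import BeautifulSoup

# Stand-in markup: the ids below are made up for this example and are
# not copied from a real Bing results page.
SAMPLE_HTML = """
<div class="sw_menu" id="related_searches">
  <ul>
    <li><a href="/search?q=halotis+blog">halotis blog</a></li>
    <li><a href="/search?q=halotis+twitter">halotis twitter</a></li>
  </ul>
</div>
<div class="sw_menu" id="other_menu">
  <ul><li><a href="/settings">settings</a></li></ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Matching on class alone returns both divs, in document order.
menus = soup.find_all("div", attrs={"class": "sw_menu"})

# Adding the id to the filter narrows the match to a single element.
related = soup.find("div", attrs={"class": "sw_menu", "id": "related_searches"})
related_terms = [a.get_text() for a in related.find_all("a")]

# If the id is not known or not stable, picking by position is a fallback.
first_menu_terms = [a.get_text() for a in menus[0].find_all("a")]

if __name__ == "__main__":
    print(related_terms)
```

The same idea should carry over to the `findAll` call in the comment above by adding an `'id'` key to the `attrs` dictionary, once the real id has been read from the page source.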