Random User Agent String Python Script

This is more of a helpful snippit than a useful program but it can sometimes be useful to have some user agent strings handy for web scraping.

Some websites check the user agent string and will filter the results of a request. It’s a very simple way to prevent automated scraping. But it is very easy to get around. The user agent can also be checked by spam filters to help detect automated posting.

A great resource for finding and understanding what user agent strings mean is UserAgentString.com.

This simple snippit uses a file containing the list of user agent strings that you want to use. It can very simply source that file and return a random one from the list.

Here’s my source file UserAgents.txt:

Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.3) Gecko/20090913 Firefox/3.5.3
Mozilla/5.0 (Windows; U; Windows NT 6.1; en; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.1) Gecko/20090718 Firefox/3.5.1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/532.1 (KHTML, like Gecko) Chrome/4.0.219.6 Safari/532.1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.2; Win64; x64; Trident/4.0)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1; .NET CLR 2.0.50727; InfoPath.2)Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)
Mozilla/4.0 (compatible; MSIE 6.1; Windows XP)

And here is the python code that makes getting a random agent very simple:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import random
 
SOURCE_FILE='UserAgents.txt'
 
def get():
    f = open(SOURCE_FILE)
    agents = f.readlines()
 
    return random.choice(agents).strip()
 
def getAll():
    f = open(SOURCE_FILE)
    agents = f.readlines()
    return [a.strip() for a in agents]
 
if __name__=='__main__':
    agents = getAll()
    for agent in agents:
        print agent

You can grab the source code for this along with my other scripts from the bitbucket repository.

Bookmark and Share

Technorati Tags: , , ,

Related posts:

  1. Python Web Crawler Script
  2. Wordpress Blog Posting Robot
  3. Getting Ezine Article Content Automatically with Python
  4. Translating Text Using Google Translate and Python
  5. Google Page Rank Python Script
Stumble it!


RSS feed | Trackback URI

1 Comment »

Comment by halotis Subscribed to comments via email
2009-10-06 09:02:03

Random User Agent String Python Script – This is more of a helpful snippit than a useful program but it can sometim.. http://bit.ly/2u1z4K

This comment was originally posted on Twitter

 
Name (required)
E-mail (required - never shown publicly)
URI
Subscribe to comments via email
Your Comment (smaller size | larger size)
You may use <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped=""> in your comment.



Additional comments powered by BackType