Reading and Writing Adwords Editor CSV files in Python

I have been hard at work testing out different approaches to Adwords.  One of the keys is that I’m scripting up a lot of the management of campaigns, ad groups, keywords, and ads.  The Adwords API could be used but there’s an expense to using it which would be a significant expense for the size of my campaigns.  So I have been using the Adwords Editor to help manage everything.  What makes it excellent is that the tool has import and export to/from csv files.  This makes it pretty simple to play with the data.

To get a file that this script will work with just go to the File menu in Google Adwords Editor and select “Export to CSV”  You can then select “Export Selected Campaigns”.  it will write out a csv file.

This Python script will read those output csv files into a Python data structure which you can then manipulate and write back out to a file.

With the file modified you can then use the Adwords Editor’s “Import CSV” facility to get your changes back into the Editor and then uploaded to Adwords.

Having this ability to pull this data into Python, modify it, and then get it back into Adwords means that I can do a lot of really neat things:

  • create massive campaigns with a large number of targeted ads
  • Invent bidding strategies that act individually at the keyword level
  • automate some of the management
  • pull in statistics from CPA networks to calculate ROIs
  • convert text ads into image ads

Here’s the script:

#!/usr/bin/env python
# coding=utf-8
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
"""
read and write exported campaigns from Adwords Editor
 
"""
 
import codecs
import csv
 
FIELDS = ['Campaign', 'Campaign Daily Budget', 'Languages', 'Geo Targeting', 'Ad Group', 'Max CPC', 'Max Content CPC', 'Placement Max CPC', 'Max CPM', 'Max CPA', 'Keyword', 'Keyword Type', 'First Page CPC', 'Quality Score', 'Headline', 'Description Line 1', 'Description Line 2', 'Display URL', 'Destination URL', 'Campaign Status', 'AdGroup Status', 'Creative Status', 'Keyword Status', 'Suggested Changes', 'Comment', 'Impressions', 'Clicks', 'CTR', 'Avg CPC', 'Avg CPM', 'Cost', 'Avg Position', 'Conversions (1-per-click)', 'Conversion Rate (1-per-click)', 'Cost/Conversion (1-per-click)', 'Conversions (many-per-click)', 'Conversion Rate (many-per-click)', 'Cost/Conversion (many-per-click)']
 
def readAdwordsExport(filename):
 
    campaigns = {}
 
    f = codecs.open(filename, 'r', 'utf-16')
    reader = csv.DictReader(f, delimiter='\t')
 
    for row in reader:
        #remove empty values from dict
        row = dict((i, j) for i, j in row.items() if j!='' and j != None)
        if row.has_key('Campaign Daily Budget'):  # campain level settings
            campaigns[row['Campaign']] = {}
            for k,v in row.items():
                campaigns[row['Campaign']][k] = v
        if row.has_key('Max Content CPC'):  # AdGroup level settings
            if not campaigns[row['Campaign']].has_key('Ad Groups'):
                campaigns[row['Campaign']]['Ad Groups'] = {}
            campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']] = row
        if row.has_key('Keyword'):  # keyword level settings
            if not campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']].has_key('keywords'):
                campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']]['keywords'] = []
            campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']]['keywords'].append(row)
        if row.has_key('Headline'):  # ad level settings
            if not campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']].has_key('ads'):
                campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']]['ads'] = []
            campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']]['ads'].append(row)
    return campaigns
 
def writeAdwordsExport(data, filename):
    f = codecs.open(filename, 'w', 'utf-16')
    writer = csv.DictWriter(f, FIELDS, delimiter='\t')
    writer.writerow(dict(zip(FIELDS, FIELDS)))
    for campaign, d in data.items():
        writer.writerow(dict((i,j) for i, j in d.items() if i != 'Ad Groups'))
        for adgroup, ag in d['Ad Groups'].items():
            writer.writerow(dict((i,j) for i, j in ag.items() if i != 'keywords' and i != 'ads'))
            for keyword in ag['keywords']:
                writer.writerow(keyword)            
            for ad in ag['ads']:
                writer.writerow(ad)
    f.close()
 
if __name__=='__main__':
    data = readAdwordsExport('export.csv')
    print 'Campaigns:'
    print data.keys()
    writeAdwordsExport(data, 'output.csv')

This code is available in my public repository: http://bitbucket.org/halotis/halotis-collection/

Bookmark and Share

Technorati Tags: , , , ,

A Week of Adwords Testing – Recap

Google Adwords LogoLast week I started testing some new concepts on Adwords. A week has passed and I wanted to recap what has happened and some things that I have noticed and learned so far.

First off, the strategy that I’m testing is to use the content network exclusively. As a result some of the standard SEM practices don’t really apply. Click through rates are dramatically lower than on search and it takes some time to get used to 0.1% CTRs. It takes a lot of impressions to get traffic at those levels.

Luckily the inventory of ad space with people using Adsense is monstrous and as a result there is plenty opportunities for placements. So for my limited week of testing I have had about 150,000 impressions on my ads resulting in 80 clicks.

The other thing to note is that there is comparatively nobody running ads on the content network. So the competition is almost non-existent. That makes the price per click very low. The total ad spend for the first week of testing was less than $10.

I have run into a number of problems in my testing that I never expected.

  • It’s not possible to use the Adwords API to build flash ads with the Display Ad Builder :(
  • There seems to be a bug with the Adwords Editor when trying to upload a lot of image ads.
  • It takes a long time for image ads to be approved and start running (none of my image ads have been approved yet)
  • Paying to use the Adwords API works out to be very expensive for the scale I want to use it at.
  • optimizing the price is time consuming since it can take days to see enough results.

With all those problems I’m still optimistic that I can find a way to scale things up more radically.  So far in the past week I have written a number of scripts that have helped me build out the campaigns, ad groups and ads.  It has gotten to the point where I can now upload over 1000 text ads to new campaigns, ad groups and keywords in one evening.

Since so far the testing has been inconclusive I’m going to hold off sharing the scripts I have for this strategy.  If it works out you can count on me recording some videos of the technique and the scripts to help.

Bookmark and Share

Technorati Tags: , , , , , ,

Google Adwords – Another Try

Google Adwords LogoLast night I had one of those sleepless nights. I’m sure you have had one of these before – After hearing a great idea your mind starts spinning with the possibilities and there’s no way you’ll be able to sleep. I got excited last night about a new approach to Google Adwords that has just might have a lot of potential.

Google Adwords has never really proven to be a profitable way to drive traffic for me (though Microsoft Adcenter has). However several times a year for the past 4 years I have heard a little tip or seen someone use it successfully and have become intrigued enough to dive back in and test the waters again. Each time my testing has been plagued with failure and I have lost thousands of dollars trying to find a system that works.

Yesterday I got a tip, something new that I haven’t yet tried but that sounded promising. And so over the next few weeks I’m going to be running some tests. The problem with the approach I’m testing is that it requires creating a MASSIVE number of keyword targeted ads – a total of over 100,000 ads per niche.

It took me 2.5 hours last night to manually create 400 of the 100,000 ads I need (for the one niche I’m going to test first). There’s no feasible way to create all those ads manually and I’m not interested in spending yet more money on ebooks or software that claims to make money or magically do the work for me. So I am going to program some scripts myself to test the techniques. If it works or doesn’t work I will let you know, and share the code right here on this blog.

The testing started last night. Check back next week for the preliminary results (and maybe a hint about what I’m doing).

Bookmark and Share

Technorati Tags: , ,

Random User Agent String Python Script

This is more of a helpful snippit than a useful program but it can sometimes be useful to have some user agent strings handy for web scraping.

Some websites check the user agent string and will filter the results of a request. It’s a very simple way to prevent automated scraping. But it is very easy to get around. The user agent can also be checked by spam filters to help detect automated posting.

A great resource for finding and understanding what user agent strings mean is UserAgentString.com.

This simple snippit uses a file containing the list of user agent strings that you want to use. It can very simply source that file and return a random one from the list.

Here’s my source file UserAgents.txt:

Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.3) Gecko/20090913 Firefox/3.5.3
Mozilla/5.0 (Windows; U; Windows NT 6.1; en; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.1) Gecko/20090718 Firefox/3.5.1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/532.1 (KHTML, like Gecko) Chrome/4.0.219.6 Safari/532.1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.2; Win64; x64; Trident/4.0)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1; .NET CLR 2.0.50727; InfoPath.2)Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)
Mozilla/4.0 (compatible; MSIE 6.1; Windows XP)

And here is the python code that makes getting a random agent very simple:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import random
 
SOURCE_FILE='UserAgents.txt'
 
def get():
    f = open(SOURCE_FILE)
    agents = f.readlines()
 
    return random.choice(agents).strip()
 
def getAll():
    f = open(SOURCE_FILE)
    agents = f.readlines()
    return [a.strip() for a in agents]
 
if __name__=='__main__':
    agents = getAll()
    for agent in agents:
        print agent

You can grab the source code for this along with my other scripts from the bitbucket repository.

Bookmark and Share

Technorati Tags: , , ,

Scrape Digg Search Results Python Script

Digg is by far the most popular social news site on the internet. With it’s simple “thumbs up” system the users of the site promote the most interesting and high quality stores and the best of those make it to the front page. What you end up with is a filtered view of the most interesting stuff.

It’s a great site and one that I visit every day.

I wanted to write a script that makes use of the search feature on Digg so that I could scrape out and re-purpose the best stuff to use elsewhere. The first step in writing that larger (top secret) program was to start with a scraper for Digg search.

The short python script I came up with will return the search results from Digg in a standard python data structure so it’s simple to use. It parses out the title, destination, comment count, digg link, digg count, and summary for the top 100 search results.

You can perform advanced searches on digg by using a number of different flags:

  • +b Add to see buried stories
  • +p Add to see only promoted stories
  • +np Add to see only unpromoted stories
  • +u Add to see only upcoming stories
  • Put terms in “quotes” for an exact search
  • -d Remove the domain from the search
  • Add -term to exclude a term from your query (e.g. apple -iphone)
  • Begin your query with site: to only display stories from that URL.

This script also allows the search results to be sorted:

from DiggSearch import digg_search
digg_search('twitter', sort='newest')  #sort by newest first
digg_search('twitter', sort='digg')  # sort by number of diggs
digg_search('twitter -d')  # sort by best match

Here’s the Python code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import urllib,urllib2
import re
 
from BeautifulSoup import BeautifulSoup
 
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'
 
def remove_extra_spaces(data):
    p = re.compile(r'\s+')
    return p.sub(' ', data)
 
def digg_search(query, sort=None, pages=10):
    """Returns a list of the information I need from a digg query
    sort can be one of [None, 'digg', 'newest']
    """
 
    digg_results = []
    for page in range (1,pages):
 
        #create the URL
        address = "http://digg.com/search?s=%s" % (urllib.quote_plus(query))
        if sort:
            address = address + '&sort=' + sort
        if page > 1:
            address = address + '&page=' + str(page)
 
        #GET the page
        request = urllib2.Request(address, None, {'User-Agent':USER_AGENT} )
        urlfile = urllib2.urlopen(request)
        page = urlfile.read(200000)
        urlfile.close()
 
        #scrape it
        soup = BeautifulSoup(page)
        links = soup.findAll('h3', id=re.compile("title\d"))
        comments = soup.findAll('a', attrs={'class':'tool comments'})
        diggs = soup.findAll('strong', id=re.compile("diggs-strong-\d"))
        body = soup.findAll('a', attrs={'class':'body'})
        for i in range(0,len(links)):
            item = {'title':remove_extra_spaces(' '.join(links[i].findAll(text=True))).strip(), 
                    'destination':links[i].find('a')['href'],
                    'comment_count':int(comments[i].string.split()[0]),
                    'digg_link':comments[i]['href'],
                    'digg_count':diggs[i].string,
                    'summary':body[i].find(text=True)
                    }
            digg_results.append(item)
 
        #last page early exit
        if len(links) < 10:
            break
 
    return digg_results
 
if __name__=='__main__':
    #for testing
    results = digg_search('twitter -d', 'digg', 2)
    for r in results:
        print r

You can grab the source code from the bitbucket repository.

Bookmark and Share

Technorati Tags: , , , , , ,

SEOCheck: Track Your Google Position Over Time

Have you ever wanted to track and assess your SEO efforts by seeing how they change your position in Google’s organic SERP? With this script you can now track and chart your position for any number of search queries and find the position of the site/page you are trying to rank.

This will allow you to visually identify any target keyword phrases that are doing well, and which ones may need some more SEO work.

This python script has a number of different components.

  • SEOCheckConfig.py script is used to add new target search queries to the database.
  • SEOCheck.py searches Google and saves the best position (in the top 100 results)
  • SEOCheckCharting.py graph all the results

The charts produced look like this:

seocheck

The main part of the script is SEOCheck.py. This script should be scheduled to run regularly (I have mine running 3 times per day on my webfaction hosting account).

For a small SEO consultancy business this type of application generates the feedback and reports that you should be using to communicate with your clients. It identifies where the efforts should go and how successful you have been.

To use this set of script you first will need to edit and run the SEOCheckConfig.py file. Add your own queries and domains that you’d like to check to the SETTINGS variable then run the script to load those into the database.

Then schedule SEOCheck.py to run periodically. On Windows you can do that using Scheduled Tasks:
Scheduled Task Dialog

On either Mac OSX or Linux you can use crontab to schedule it.

To generate the Chart simply run the SEOCheckCharting.py script. It will plot all the results on one graph.

You can find and download all the source code for this in the HalOtis-Collection on bitbucket. It requires BeautifulSoup, matplotlib, and sqlalchemy libraries to be installed.

Bookmark and Share

Technorati Tags: , , , , , , ,

Python Twitter API Library Reviews and Samples

There are many Twitter API libraries available for Python. I wanted to find out which one was the best and what the strengths and weaknesses of each are. However there are too many out there to find and review all of them. Instead here’s a bunch of the most popular Python Twitter API wrappers with a small review of each one, and some sample code showing off the syntax.

Python Twitter

This is the library that I personally use in most of my Twitter scripts. It’s very simple to use, but is not up to date with the latest developments at Twitter. It was written by DeWitt Clinton and available on Google Code. If you just want the basic API functionality this does a pretty decent job.

import twitter
api = twitter.Api('username', 'password')
statuses = api.GetPublicTimeline()
print [s.user.name for s in statuses]
users = api.GetFriends()
print [u.name for u in users]
statuses = api.GetUserTimeline(user)
print [s.text for s in statuses]
api.PostUpdate(username, password, 'I love python-twitter!')

twyt

Twyt is a pretty comprehensive library that seems to be pretty solid and well organized. In some cases there is added complexity to parse the return json objects from the Python Twitter API. It is written and maintained by Andrew Price.

from twyt import twitter, data
t = twitter.Twitter()
t.set_auth("username", "password")
print t.status_friends_timeline()
print t.user_friends()
return_val = t.status_update("Testing 123")
s = data.Status()
s.load_json(return_val)
print s
t.status_destroy(s.id)

Twython

An up to date, Python wrapper for the Twitter API. Supports Twitter’s main API, Twitter’s search API, and (soon) using OAuth with Twitter/Streaming API. It is based on the Python Twitter library and is actively maintained by Ryan Mcgrath.

import twython
twitter = twython.setup(authtype="Basic", username="example", password="example")
twitter.updateStatus("See how easy this was?")
friends_timeline = twitter.getFriendsTimeline(count="150", page="3")
print [tweet["text"] for tweet in friends_timeline]

Tweepy

Tweepy is a pretty compelling Python Twitter API library. It’s up to date with the latest features of Twitter and actively being developed by Joshua Roesslein. It features OAuth support, Python 3 support, streaming API support and it’s own cache system. Retweet streaming was recently added. If you want to use the most up to date features of the Twitter API on Python, or use Python 3, then you should definitely check out Tweepy.

import tweepy
api = tweepy.API.new('basic', 'username', 'password')
public_timeline = api.public_timeline()
print [tweet.text for tweet in public_timeline]
friends_timeline = api.friends_timeline()
print [tweet.text for tweet in friends_timeline]
u = tweepy.api.get_user('username')
friends = u.friends()
print [tweet.screen_name for f in friends]
api.update_status('tweeting with tweepy')

It’s pretty clear from the syntax samples that there’s not much difference between any of these Python Twitter API libraries when just getting the basics done. The difference only start to show up when you get into the latest features. OAuth, Streaming, and the retweet functions really differentiate these libraries. I hope this overview helps you find and choose the library that’s right for your project.

Bookmark and Share

Technorati Tags: , , , , ,

Source Code Available on Bitbucket

bitbucket logoSomeone asked me to make the source code available in a repository somewhere. One place to grab all the scripts I’ve shared on this blog. Well I’m starting to make them available on bitbucket.org. So far thirteen scripts have been published to the repository and are available using Mercurial (hg) or through the web interface.

Check out the halotis-collection repository

As time permits I’ll be updating and adding more scripts to the repository. Going forward all the new scripts will be added to the repository, and I’ll link to the source code from the post which will make it easier to get the files and play with them yourself.

Bookmark and Share

Technorati Tags: , , , , ,

List the Links in Your Twitter Timeline Python Script

This is a simple Twitter Python script that checks your friends time-line and prints out any links that have been posted. In addition it visits each of the URLs and finds the actual title of the destination page and prints that along side. This simple script demonstrates an easy way to gather some of the hottest trends on the internet the moment they happen.

If you set up a Twitter account within a niche and find a few of the players in that niche to follow then you can simply find any links posted, check them to see if they are on topic (using some keyword/heuristics) and then either notify yourself of the interesting content, or automatically scrape it for use on one of your related websites. That gives you perhaps the most up to date content possible before it hits Google Trends. It also gives you a chance to promote it before the social news sites find it (or be the first to submit it to them).

With a bit more work you could parse out some of the meta tag keywords/description, crawl the website, or find and cut out the content from the page. If it’s a blog you could post a comment.

Example Usage:

$ python TwitterLinks.py
http://bit.ly/s8rQX - Twitter Status - Tweets from users you follow may be missing from your timeline
http://bit.ly/26hiT - Why Link Exchanges Are a Terrible, No-Good Idea - Food Blog Alliance
http://FrankAndTrey.com - Frank and Trey
http://bit.ly/yPRHp - Gallery: Cute animals in the news this week
...

And here’s the python code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
try:
   import json
except:
   import simplejson as json # http://undefined.org/python/#simplejson
import twitter     #http://code.google.com/p/python-twitter/
 
from urllib2 import urlopen
import re
 
SETTINGS = {'user':'twitter user name', 'password':'you password here'}
 
def listFriendsURLs(user, password):
    re_pattern='.*?((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s"]*))'	# HTTP URL
    rg = re.compile(re_pattern,re.IGNORECASE|re.DOTALL)
 
    api = twitter.Api(user, password)
    timeline = api.GetFriendsTimeline(user)
 
    for status in timeline:
        m = rg.search(status.text)
        if m:
            httpurl=m.group(1)
            title = getTitle(httpurl)
            print httpurl, '-', title
 
def getTitle(url):
    req = urlopen(url)
    html = req.read()
 
    re_pattern='<title>(.*?)</title>'
    rg = re.compile(re_pattern,re.IGNORECASE|re.DOTALL)
 
    m = rg.search(html)
    if m:
        title = m.group(1)
        return title.strip()
    return None
 
if __name__ == '__main__':
    listFriendsURLs(SETTINGS['user'], SETTINGS['password'])
Bookmark and Share

Technorati Tags: , , ,

Python Web Crawler Script

spider_webHere’s a simple web crawling script that will go from one url and find all the pages it links to up to a pre-defined depth. Web crawling is of course the lowest level tool used by Google to create its multi-billion dollar business. You may not be able to compete with Google’s search technology but being able to crawl your own sites, or that of your competitors can be very valuable.

You could for instance routinely check your websites to make sure that it is live and all the links are working. it could notify you of any 404 errors. By adding in a page rank check you could identify better linking strategies to boost your page rank scores. And you could identify possible leaks – paths a user could take that takes them away from where you want them to go.

Here’s the script:

# -*- coding: utf-8 -*-
from HTMLParser import HTMLParser
from urllib2 import urlopen
 
class Spider(HTMLParser):
    def __init__(self, starting_url, depth, max_span):
        HTMLParser.__init__(self)
        self.url = starting_url
        self.db = {self.url: 1}
        self.node = [self.url]
 
        self.depth = depth # recursion depth max
        self.max_span = max_span # max links obtained per url
        self.links_found = 0
 
    def handle_starttag(self, tag, attrs):
        if self.links_found < self.max_span and tag == 'a' and attrs:
            link = attrs[0][1]
            if link[:4] != "http":
                link = '/'.join(self.url.split('/')[:3])+('/'+link).replace('//','/')
 
            if link not in self.db:
                print "new link ---> %s" % link
                self.links_found += 1
                self.node.append(link)
            self.db[link] = (self.db.get(link) or 0) + 1
 
    def crawl(self):
        for depth in xrange(self.depth):
            print "*"*70+("\nScanning depth %d web\n" % (depth+1))+"*"*70
            context_node = self.node[:]
            self.node = []
            for self.url in context_node:
                self.links_found = 0
                try:
                    req = urlopen(self.url)
                    res = req.read()
                    self.feed(res)
                except:
                    self.reset()
        print "*"*40 + "\nRESULTS\n" + "*"*40
        zorted = [(v,k) for (k,v) in self.db.items()]
        zorted.sort(reverse = True)
        return zorted
 
if __name__ == "__main__":
    spidey = Spider(starting_url = 'http://www.7cerebros.com.ar', depth = 5, max_span = 10)
    result = spidey.crawl()
    for (n,link) in result:
        print "%s was found %d time%s." %(link,n, "s" if n is not 1 else "")
Bookmark and Share

Technorati Tags: , , , , ,