Google Page Rank Python Script

This isn’t my script, but I thought it would appeal to readers of this blog. It looks up the Google Page Rank for any website, using the same interface as the Google Toolbar. I’d like to thank Fred Cirera for writing it; you can check out his blog post about the script here.

I’m not exactly sure what I would use this for, but it might be useful to anyone doing advanced SEO work who wants a practical way to approach Page Rank sculpting, perhaps by finding the best websites to put links on.

The reason it involves so much math is that it needs to compute a checksum of the URL before the query will be accepted. It should be pretty reliable since it doesn’t involve any scraping.

Example usage:

$ python pagerank.py http://www.google.com/
PageRank: 10	URL: http://www.google.com/
 
$ python pagerank.py http://www.mozilla.org/
PageRank: 9	URL: http://www.mozilla.org/
 
$ python pagerank.py http://halotis.com
PageRank: 3	URL: http://www.halotis.com/

And the script:

#!/usr/bin/env python
#
#  Script for getting Google Page Rank of page
#  Google Toolbar 3.0.x/4.0.x Pagerank Checksum Algorithm
#
#  original from http://pagerank.gamesaga.net/
#  this version was adapted from http://www.djangosnippets.org/snippets/221/
#  by Corey Goldberg - 2010
#
#  Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php
 
 
 
import sys
import urllib
 
 
def get_pagerank(url):
    hsh = check_hash(hash_url(url))
    gurl = 'http://www.google.com/search?client=navclient-auto&features=Rank:&q=info:%s&ch=%s' % (urllib.quote(url), hsh)
    try:
        f = urllib.urlopen(gurl)
        rank = f.read().strip()[9:]
    except Exception:
        rank = 'N/A'
    if rank == '':
        rank = '0'
    return rank
 
 
def int_str(string, integer, factor):
    for i in range(len(string)):
        integer *= factor
        integer &= 0xFFFFFFFF
        integer += ord(string[i])
    return integer
 
 
def hash_url(string):
    c1 = int_str(string, 0x1505, 0x21)
    c2 = int_str(string, 0, 0x1003F)
 
    c1 >>= 2
    c1 = ((c1 >> 4) & 0x3FFFFC0) | (c1 & 0x3F)
    c1 = ((c1 >> 4) & 0x3FFC00) | (c1 & 0x3FF)
    c1 = ((c1 >> 4) & 0x3C000) | (c1 & 0x3FFF)
 
    t1 = (c1 & 0x3C0) << 4
    t1 |= c1 & 0x3C
    t1 = (t1 << 2) | (c2 & 0xF0F)
 
    t2 = (c1 & 0xFFFFC000) << 4
    t2 |= c1 & 0x3C00
    t2 = (t2 << 0xA) | (c2 & 0xF0F0000)
 
    return (t1 | t2)
 
 
def check_hash(hash_int):
    hash_str = '%u' % (hash_int)
    flag = 0
    check_byte = 0
 
    i = len(hash_str) - 1
    while i >= 0:
        byte = int(hash_str[i])
        if 1 == (flag % 2):
            byte *= 2
            byte = byte // 10 + byte % 10
        check_byte += byte
        flag += 1
        i -= 1
 
    check_byte %= 10
    if 0 != check_byte:
        check_byte = 10 - check_byte
        if 1 == flag % 2:
            if 1 == check_byte % 2:
                check_byte += 9
            check_byte >>= 1
 
    return '7' + str(check_byte) + hash_str
 
 
 
if __name__ == '__main__':
    if len(sys.argv) != 2:
        url = 'http://www.google.com/'
    else:
        url = sys.argv[1]
 
    print get_pagerank(url)
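
The script above is written for Python 2 (`urllib.quote`, the `print` statement). As a rough sketch only, here is how the hashing and checksum steps might look in Python 3, assuming the logic is carried over unchanged. The function names mirror the script; `build_query_url` is a hypothetical helper added for illustration, and it only builds the query string without contacting Google, since the endpoint is private and undocumented.

```python
from urllib.parse import quote


def int_str(string, integer, factor):
    # Fold each character into a running value, masking to 32 bits
    # before each character code is added (same order as the script).
    for ch in string:
        integer *= factor
        integer &= 0xFFFFFFFF
        integer += ord(ch)
    return integer


def hash_url(string):
    c1 = int_str(string, 0x1505, 0x21)
    c2 = int_str(string, 0, 0x1003F)

    c1 >>= 2
    c1 = ((c1 >> 4) & 0x3FFFFC0) | (c1 & 0x3F)
    c1 = ((c1 >> 4) & 0x3FFC00) | (c1 & 0x3FF)
    c1 = ((c1 >> 4) & 0x3C000) | (c1 & 0x3FFF)

    t1 = (c1 & 0x3C0) << 4
    t1 |= c1 & 0x3C
    t1 = (t1 << 2) | (c2 & 0xF0F)

    t2 = (c1 & 0xFFFFC000) << 4
    t2 |= c1 & 0x3C00
    t2 = (t2 << 0xA) | (c2 & 0xF0F0000)

    return t1 | t2


def check_hash(hash_int):
    hash_str = str(hash_int)
    check_byte = 0
    # Walk the digits right to left; every second digit is doubled and
    # replaced by the sum of its own digits before being added.
    for position, digit in enumerate(reversed(hash_str)):
        byte = int(digit)
        if position % 2 == 1:
            byte *= 2
            byte = byte // 10 + byte % 10
        check_byte += byte

    check_byte %= 10
    if check_byte != 0:
        check_byte = 10 - check_byte
        if len(hash_str) % 2 == 1:
            if check_byte % 2 == 1:
                check_byte += 9
            check_byte >>= 1

    return '7' + str(check_byte) + hash_str


def build_query_url(url):
    # Hypothetical helper: same query string as get_pagerank(), no request sent.
    return ('http://www.google.com/search?client=navclient-auto'
            '&features=Rank:&q=info:%s&ch=%s'
            % (quote(url), check_hash(hash_url(url))))
```

Calling `build_query_url('http://www.google.com/')` returns the URL that `get_pagerank` would request. Whether the checksum is still accepted is a separate question that this sketch does not answer.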


14 Comments »

Comment by Doug
2009-09-29 07:37:33

Seems to have stopped working.

 
Comment by Matt Warren
2009-09-29 07:42:24

Thanks for letting me know. I’ll look into it.

 
Comment by Ssnodgra
2009-11-29 12:21:34

Any luck sorting out why it stopped working?

 
Comment by Matt Warren
2009-11-29 12:56:29

Unfortunately it doesn’t seem to be fixable. :(

Previous versions of the Google toolbar had a simple way to get the data. But since they introduced SideWiki, they have changed how the toolbar works. The new API seems to return binary data (possibly encrypted). Cracking this new scheme is not going to be easy.

 
Comment by iker
2010-03-04 09:23:13

Hi, I have tested the Page Rank URL with my personal webpage (sniffed with Wireshark):

http://toolbarqueries.google.es/search?features=Rank&sourceid=navclient-ff&client=navclient-auto-ff&googleip=O;208.117.235.17;97&iqrn=8VdB&querytime=4P&orig=0X557&swwk=-1&ch=84f831859&q=info:http%3A%2F%2Fwww.ikeralbeniz.net%2F

adding these headers to the HTTP request:

Client.Headers.Add("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.1.8) Gecko/20100202 Firefox/3.5.8 GTBDFff GTB7.0 GoogleToolbarFF 7.0.20091216");
Client.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
Client.Headers.Add("Accept-Language", "es-es,es;q=0.8,en-us;q=0.5,en;q=0.3");
Client.Headers.Add("Accept-Encoding", "gzip,deflate");
Client.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
Client.Headers.Add("Keep-Alive", "300");

and works ok… i get Rank_1:1:0 (je je je)

what i have not found is the way to get the google hash.. in my case:

84f831859 < http://www.ikeralbeniz.net/

but if i use your function the hash is:
620570008487

have you got any idea about the way google calculates the hash?
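
For readers following along in Python rather than C#, the header setup above could be sketched roughly as follows with the standard library's `urllib.request`. This is an illustration, not a tested client: `make_toolbar_request` and `parse_rank` are hypothetical names, the header values are copied from the comment, and the only reply format assumed is the `Rank_1:1:0` shape reported there. Nothing is sent over the network.

```python
from urllib.request import Request

# Header values copied from the comment above.
TOOLBAR_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.1.8) "
                   "Gecko/20100202 Firefox/3.5.8 GTBDFff GTB7.0 "
                   "GoogleToolbarFF 7.0.20091216"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "es-es,es;q=0.8,en-us;q=0.5,en;q=0.3",
    # urllib does not decompress replies itself, so a gzip-encoded body
    # would need to be decoded by the caller if this header is kept.
    "Accept-Encoding": "gzip,deflate",
    "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
    "Keep-Alive": "300",
}


def make_toolbar_request(query_url):
    """Attach the toolbar-style headers to a request without sending it."""
    return Request(query_url, headers=TOOLBAR_HEADERS)


def parse_rank(body):
    """Return the last colon-separated field of a 'Rank_1:1:0' style reply.

    Returns None when the text does not start with 'Rank_'.
    """
    text = body.strip()
    if not text.startswith("Rank_"):
        return None
    return text.rsplit(":", 1)[-1]
```

Splitting on the last colon avoids depending on the fixed nine-character prefix that the original script slices off with `[9:]`.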

 
2010-04-05 23:52:02

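[...] The part in bold is the URL of the page to query; the response is JSON data containing the queried URL and its PR value. Update: because of request limits on the interface, this API fails frequently and is no longer public. For the code that fetches PR, see this post: Google Page Rank Python Script -EOF- Posted in General | Tags: PageRank « Nationwide ISP IP table You can leave a response, or trackback from your own site. [...]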

 
2010-07-06 01:10:23

[...] Original post: http://www.halotis.com/2009/08/02/google-page-range-python-script/ [...]

 
Comment by Reed
2011-05-23 23:49:18

Does this code work?

Comment by Matt Warren
2011-05-30 10:28:07

probably not. it was relying on a private, undocumented API from google, and they seem to like changing things fairly often.

 
 
Comment by google traffic
2011-10-10 18:34:28

I changed the url and it worked a few days ago, but now it doesn’t.

gurl = 'http://toolbarqueries.google.com/tbr?client=navclient-auto&features=Rank&ch=%s&q=info:%s' % (hsh, urllib.quote(url))

Although now that I am looking more closely, it looks like the hash isn’t returning a valid value.

 
Comment by rudraksha
2011-10-18 05:35:56

It rocks. Very good work.

 
Comment by Bobek
2011-11-14 06:52:02

Any ideas how to get this working after the Google changes?

 
Comment by dahax
2011-12-02 07:00:46

you should use another url:

gurl = "http://toolbarqueries.google.com/tbr?client=navclient-auto&ch=" + hsh + "&features=Rank&q=info:" + url + "&num=100&filter=0";

 
Comment by MZB
2012-01-02 13:37:10

Thanks – that fixed it for me.

 