<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Scrape Digg Search Results Python Script</title>
	<atom:link href="http://www.halotis.com/2009/09/30/scrape-digg-search-results-python-script/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.halotis.com/2009/09/30/scrape-digg-search-results-python-script/</link>
	<description>Entrepreneurship in the 21st Century</description>
	<lastBuildDate>Sat, 21 Jan 2012 09:19:41 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Jan Paricka</title>
		<link>http://www.halotis.com/2009/09/30/scrape-digg-search-results-python-script/comment-page-1/#comment-17234</link>
		<dc:creator>Jan Paricka</dc:creator>
		<pubDate>Mon, 07 Dec 2009 14:54:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.halotis.com/?p=780#comment-17234</guid>
		<description>Matt, we use mostly bitnami (rightscale) server images running on GoGrid.  I can bring one server instance up if you want to have a look?   I tried everything ... :-(  Thank you!  I very much appreciate your help with this.</description>
		<content:encoded><![CDATA[<p>Matt, we use mostly bitnami (rightscale) server images running on GoGrid.  I can bring one server instance up if you want to have a look?   I tried everything &#8230; :-(  Thank you!  I very much appreciate your help with this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt Warren</title>
		<link>http://www.halotis.com/2009/09/30/scrape-digg-search-results-python-script/comment-page-1/#comment-17233</link>
		<dc:creator>Matt Warren</dc:creator>
		<pubDate>Mon, 07 Dec 2009 14:48:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.halotis.com/?p=780#comment-17233</guid>
		<description>I have tried it on a number of my computers (Linux, Windows and Mac, all with Python 2.6) and it works fine.

Are you stuck behind a firewall or proxy server?</description>
		<content:encoded><![CDATA[<p>I have tried it on a number of my computers (Linux, Windows and Mac, all with Python 2.6) and it works fine.</p>
<p>Are you stuck behind a firewall or proxy server?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jan Paricka</title>
		<link>http://www.halotis.com/2009/09/30/scrape-digg-search-results-python-script/comment-page-1/#comment-17232</link>
		<dc:creator>Jan Paricka</dc:creator>
		<pubDate>Mon, 07 Dec 2009 14:22:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.halotis.com/?p=780#comment-17232</guid>
		<description>[ jparicka dev ~ ] cat delete.me
import urllib, json
address=&quot;http://services.digg.com/search/stories?query=frankincense%20tree&amp;sort=digg_count-desc&amp;appkey=http%3A%2F%2Fwww.beepl.com&amp;type=json&quot;
data = urllib.urlopen(address)
print data.readlines()

[ jparicka dev ~ ] python delete.me
[&#039;{&quot;error&quot;:{&quot;timestamp&quot;:1260195682,&quot;message&quot;:&quot;HTTP User-Agent header required&quot;,&quot;code&quot;:1029}}&#039;]

I get the same problem on all our boxes.   :-(</description>
		<content:encoded><![CDATA[<p>[ jparicka dev ~ ] cat delete.me<br />
import urllib, json<br />
address="http://services.digg.com/search/stories?query=frankincense%20tree&amp;sort=digg_count-desc&amp;appkey=http%3A%2F%2Fwww.beepl.com&amp;type=json"<br />
data = urllib.urlopen(address)<br />
print data.readlines()</p>
<p>[ jparicka dev ~ ] python delete.me<br />
['{"error":{"timestamp":1260195682,"message":"HTTP User-Agent header required","code":1029}}']</p>
<p>I get the same problem on all our boxes.   :-(</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt Warren</title>
		<link>http://www.halotis.com/2009/09/30/scrape-digg-search-results-python-script/comment-page-1/#comment-17231</link>
		<dc:creator>Matt Warren</dc:creator>
		<pubDate>Sun, 06 Dec 2009 19:41:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.halotis.com/?p=780#comment-17231</guid>
		<description>I wasn&#039;t able to reproduce the 403 error, but a couple of things to note.

Since you&#039;re using the developer API, you don&#039;t need to specify the User-Agent header, and you can actually make it much simpler:

&gt;&gt;&gt; import urllib, json
&gt;&gt;&gt; address=&quot;http://services.digg.com/search/stories?query=frankincense%20tree&amp;sort=digg_count-desc&amp;appkey=http%3A%2F%2Fwww.beepl.com&amp;type=json&quot;
&gt;&gt;&gt; data = json.load(urllib.urlopen(address))</description>
		<content:encoded><![CDATA[<p>I wasn&#8217;t able to reproduce the 403 error, but a couple of things to note.</p>
<p>Since you&#8217;re using the developer API, you don&#8217;t need to specify the User-Agent header, and you can actually make it much simpler:</p>
<p>&gt;&gt;&gt; import urllib, json<br />
&gt;&gt;&gt; address="http://services.digg.com/search/stories?query=frankincense%20tree&amp;sort=digg_count-desc&amp;appkey=http%3A%2F%2Fwww.beepl.com&amp;type=json"<br />
&gt;&gt;&gt; data = json.load(urllib.urlopen(address))</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jan Paricka</title>
		<link>http://www.halotis.com/2009/09/30/scrape-digg-search-results-python-script/comment-page-1/#comment-17230</link>
		<dc:creator>Jan Paricka</dc:creator>
		<pubDate>Sat, 05 Dec 2009 02:23:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.halotis.com/?p=780#comment-17230</guid>
		<description>OK, so I tried to simplify your code to

import urllib, urllib2

USER_AGENT = &#039;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)&#039;

address = &quot;http://services.digg.com/search/stories?query=frankincense%20tree&amp;sort=digg_count-desc&amp;appkey=http%3A%2F%2Fwww.beepl.com&amp;type=json&quot;

request = urllib2.Request(address, None, {&#039;User-Agent&#039;:USER_AGENT} )

urlfile = urllib2.urlopen(request)
    
page = urlfile.read(200000)

urlfile.close()

...........but receiving 403.

Digg failed to help me out, and I tried from about 10 different servers (all hosted on GoGrid).

I am desperate.    Any ideas?

Thank you!!

[jparicka@25358_2_42578_205369:~] python abcd.py
Traceback (most recent call last):
  File &quot;abcd.py&quot;, line 9, in ?
    urlfile = urllib2.urlopen(request)
  File &quot;/usr/lib/python2.4/urllib2.py&quot;, line 130, in urlopen
    return _opener.open(url, data)
  File &quot;/usr/lib/python2.4/urllib2.py&quot;, line 364, in open
    response = meth(req, response)
  File &quot;/usr/lib/python2.4/urllib2.py&quot;, line 471, in http_response
    response = self.parent.error(
  File &quot;/usr/lib/python2.4/urllib2.py&quot;, line 402, in error
    return self._call_chain(*args)
  File &quot;/usr/lib/python2.4/urllib2.py&quot;, line 337, in _call_chain
    result = func(*args)
  File &quot;/usr/lib/python2.4/urllib2.py&quot;, line 480, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden</description>
		<content:encoded><![CDATA[<p>OK, so I tried to simplify your code to</p>
<p>import urllib, urllib2</p>
<p>USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'</p>
<p>address = "http://services.digg.com/search/stories?query=frankincense%20tree&amp;sort=digg_count-desc&amp;appkey=http%3A%2F%2Fwww.beepl.com&amp;type=json"</p>
<p>request = urllib2.Request(address, None, {'User-Agent':USER_AGENT} )</p>
<p>urlfile = urllib2.urlopen(request)</p>
<p>page = urlfile.read(200000)</p>
<p>urlfile.close()</p>
<p>&#8230;&#8230;&#8230;..but receiving 403.</p>
<p>Digg failed to help me out, and I tried from about 10 different servers (all hosted on GoGrid).</p>
<p>I am desperate.    Any ideas?</p>
<p>Thank you!!</p>
<p>[jparicka@25358_2_42578_205369:~] python abcd.py<br />
Traceback (most recent call last):<br />
  File "abcd.py", line 9, in ?<br />
    urlfile = urllib2.urlopen(request)<br />
  File "/usr/lib/python2.4/urllib2.py", line 130, in urlopen<br />
    return _opener.open(url, data)<br />
  File "/usr/lib/python2.4/urllib2.py", line 364, in open<br />
    response = meth(req, response)<br />
  File "/usr/lib/python2.4/urllib2.py", line 471, in http_response<br />
    response = self.parent.error(<br />
  File "/usr/lib/python2.4/urllib2.py", line 402, in error<br />
    return self._call_chain(*args)<br />
  File "/usr/lib/python2.4/urllib2.py", line 337, in _call_chain<br />
    result = func(*args)<br />
  File "/usr/lib/python2.4/urllib2.py", line 480, in http_error_default<br />
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)<br />
urllib2.HTTPError: HTTP Error 403: Forbidden</p>
]]></content:encoded>
	</item>
</channel>
</rss>

