<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Field Guide to Programmers &#187; python</title>
	<atom:link href="http://www.fieldguidetoprogrammers.com/category/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.fieldguidetoprogrammers.com</link>
	<description>Code, Toys, Bits of Odd Fluff</description>
	<lastBuildDate>Fri, 19 Jun 2009 16:05:47 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>feedextractor &#8211; a quick and dirty python script to grab lots of feeds from web pages</title>
		<link>http://www.fieldguidetoprogrammers.com/python/feedextractor-a-quick-and-dirty-python-script-to-grab-lots-of-feeds-from-web-pages/</link>
		<comments>http://www.fieldguidetoprogrammers.com/python/feedextractor-a-quick-and-dirty-python-script-to-grab-lots-of-feeds-from-web-pages/#comments</comments>
		<pubDate>Tue, 05 Feb 2008 02:18:13 +0000</pubDate>
		<dc:creator>jamiegrove</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.fieldguidetoprogrammers.com/blog/python/feedextractor-a-quick-and-dirty-python-script-to-grab-lots-of-feeds-from-web-pages/</guid>
		<description><![CDATA[While looking for new feeds to add to my RSS reader (NetNewsWire), I thought it might be nice to have a utility that would let me grab a web page, spider all of the outbound links, check to see which pages had feeds, and then create an opml file of new feeds I didn&#8217;t have [...]]]></description>
			<content:encoded><![CDATA[<p>While looking for new feeds to add to my RSS reader (NetNewsWire), I thought it might be nice to have a utility that would let me grab a web page, spider all of the outbound links, check to see which pages had feeds, and then create an opml file of new feeds I didn&#8217;t have already.</p>
<p>How&#8217;s that for a run-on sentence? <img src='http://www.fieldguidetoprogrammers.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Alright, so in addition to being too lazy to click on every link, I&#8217;m also too lazy to write fancy code for this project.  What I wanted was something quick and dirty.  Something that got me 80% of the way there.</p>
<p>feedextractor.py is where I ended up.</p>
<p>This little python script uses the <a href="http://www.crummy.com/software/BeautifulSoup/">wonderful BeautifulSoup xml/html parsing library</a> from Crummy Software.  I highly recommend the soup and Lewis Carroll&#8217;s Alice in Wonderland.</p>
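<p>If you would rather not install BeautifulSoup, the feed-detection half of the job can be sketched with Python 3&#8217;s standard-library <code>html.parser</code>. This is a hypothetical rewrite, not the code in feedextractor.py. <code>FeedLinkParser</code> and <code>find_feeds</code> are illustrative names, and the sketch assumes sites advertise their feeds with the usual <code>rel="alternate"</code> link tags typed <code>application/rss+xml</code> or <code>application/atom+xml</code>.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used in feed autodiscovery link tags
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised by rel="alternate" link tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []  # (title, absolute_url) tuples, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        # rel may hold several space-separated values, e.g. "alternate nofollow"
        rels = (a.get("rel") or "").lower().split()
        ftype = (a.get("type") or "").lower()
        href = a.get("href")
        if "alternate" in rels and ftype in FEED_TYPES and href:
            # resolve relative hrefs such as "/feed/" against the page URL
            url = urljoin(self.base_url, href)
            self.feeds.append((a.get("title") or url, url))


def find_feeds(html, base_url):
    """Return (title, url) pairs for every feed the page advertises."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds
```

<p>Unlike the script, this version keeps every advertised feed and resolves relative links against the page URL; keeping only the first item would mimic the one-feed-per-site behavior described under the known problems.</p>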
<h2>Using feedextractor.py</h2>
<p>1. Export your current list of feeds in OPML format.<br />
2. Rename the export file to &#8220;mysubscriptions.opml&#8221;.<br />
3. Place the export file in the same directory as feedextractor.py.<br />
4. Change the baseurl variable in feedextractor.py to the URL of the page you would like to start from.<br />
5. Run feedextractor.py (e.g. <code>python feedextractor.py</code>).</p>
<p>This will create a file called newfeeds.opml with all of the spidered feeds that do not appear to be in your current list of feeds.</p>
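<p>The subscription check and the newfeeds.opml output can be sketched in Python 3 with <code>xml.etree.ElementTree</code>. This is an illustration under assumptions rather than the script&#8217;s own logic: <code>subscribed_feed_urls</code> and <code>write_new_feeds</code> are hypothetical names, candidates are assumed to arrive as (title, site URL, feed URL) tuples, and it compares exact feed URLs instead of host names. Folder outlines without an <code>xmlUrl</code> are skipped.</p>

```python
import xml.etree.ElementTree as ET


def subscribed_feed_urls(opml_path):
    """Return the set of xmlUrl values in an exported OPML file."""
    tree = ET.parse(opml_path)
    # iter() also walks nested folders; folder outlines have no xmlUrl, so skip them
    return {o.get("xmlUrl") for o in tree.iter("outline") if o.get("xmlUrl")}


def write_new_feeds(candidates, known_urls, out_path="newfeeds.opml"):
    """Write (title, site_url, feed_url) candidates that are not already subscribed.

    Returns the number of outlines written.
    """
    opml = ET.Element("opml", version="1.0")
    ET.SubElement(opml, "head")
    body = ET.SubElement(opml, "body")
    seen = set(known_urls)
    for title, site_url, feed_url in candidates:
        if feed_url in seen:
            continue  # already subscribed, or a duplicate found earlier in this run
        seen.add(feed_url)
        # keyword arguments become XML attributes on the outline element
        ET.SubElement(body, "outline", type="rss", text=title, title=title,
                      htmlUrl=site_url, xmlUrl=feed_url)
    ET.ElementTree(opml).write(out_path, encoding="UTF-8", xml_declaration=True)
    return len(body)
```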
<h2>Known &#8220;problems&#8221;</h2>
<p>1. The script only takes the first feed from a found site.  If there is an RSS and an Atom feed, the script will grab whichever one is listed first in the page&#8217;s HTML.  This means that multiple feeds are ignored.  You might think this is a bad thing.  If so, feel free to change it.  I set it up this way because I didn&#8217;t want to comb through the found feeds and delete what amounts to duplicates.</p>
<p>2. The script has just one exception block.  If something happens while trying to pull back a page, the script skips that site.  Could be more elegant.</p>
<p>3. The script does not bother with parameters.  Would be nice if you could just pass in a url&#8230;  I know this is simple, but again I am in a hurry.  I just wanted it to work.  I&#8217;m not making a project here.</p>
<p>4. urllib2 gets rejected by some sites.  True enough.  Some web servers will reject a request from urllib2.  If you want to go to the trouble of adding a user agent header, be my guest.</p>
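<p>Problems 3 and 4 are both small fixes. The sketch below shows one way to handle them in Python 3, where urllib2 became <code>urllib.request</code>: take the seed URL from the command line and send a browser-style User-Agent header. The <code>seed_url</code> and <code>build_request</code> helpers and the agent string are assumptions for illustration, not part of feedextractor.py.</p>

```python
import urllib.request

# Placeholder agent string (an assumption); some servers refuse urllib's
# default "Python-urllib/x.y" User-Agent
USER_AGENT = "Mozilla/5.0 (compatible; feedextractor-sketch/0.1)"
DEFAULT_SEED = "http://www.please-change-to-some-site.com/or-full/url.html"


def seed_url(argv):
    """Use the first command-line argument as the seed URL, if one was given."""
    return argv[1] if len(argv) > 1 else DEFAULT_SEED


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that sends a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

<p>In the script you would then fetch pages with something like <code>urllib.request.urlopen(build_request(seed_url(sys.argv)), timeout=15)</code>.</p>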
<h2>It works</h2>
<p>This script comes as-is.  Use it to your heart&#8217;s content.  I&#8217;m not planning updates or anything else.  Just a fun bit o&#8217; code I whipped up to suit a need.</p>
<p>But it does work, and quite efficiently too (even for some sloppy-quick hacking).</p>
<p><a href="http://www.fieldguidetoprogrammers.com/downloads/feedextractor.py">Download feedextractor.py</a></p>
<p>Raw source after the jump&#8230;<br />
<span id="more-69"></span></p>
<hr/>
<code>
<pre class="codebox" style="width:900px;">
#!/usr/bin/env python
# encoding: utf-8
"""
feedextractor.py

Created by Jamie Grove on 2008-01-30.

"""

from BeautifulSoup import BeautifulSoup
import urllib2
from xml.dom import minidom
from urlparse import urlparse
import socket

timeout = 15
socket.setdefaulttimeout(timeout)

subscribedsites = []
subscribedfeeds = []

# put your seed url here
baseurl = 'http://www.please-change-to-some-site.com/or-full/url.html'

# loadsubscriptions - imports your current opml list
def loadsubscriptions():
	global subscribedsites,subscribedfeeds
	dom = minidom.parse('mysubscriptions.opml')
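	for node in dom.getElementsByTagName('outline'):
		# folder outlines have no xmlUrl/htmlUrl; getAttribute() returns '' instead of raising KeyError
		if node.getAttribute('xmlUrl'):
			subscribedsites.append(node.getAttribute('htmlUrl'))
			subscribedfeeds.append(node.getAttribute('xmlUrl'))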

# gethtml(url) - fairly obvious, right?
def gethtml(url):
	html = urllib2.urlopen(url).read()
	return html

# extractlinks(html)  - pulls out all the anchor tags from the html, skips sites you already have
#   uses beautifulsoup to extract links
#   1) checks to see if the netloc of the anchor is in the list of subscribed sites
#   2) checks to see if the netloc of the anchor is in the list of links (keeps out the dupes)
#   3) checks to see if the netloc of the anchor is in the seed url
def extractlinks(html):
	global subscribedsites,subscribedfeeds,baseurl
	soup = BeautifulSoup(html)
	anchors = soup.findAll('a', href=True) # named anchors without an href would raise KeyError below
	links = []
	for a in anchors:
		o = urlparse(a['href'])
		if len([s for s in subscribedsites if o.netloc in s]) == 0 and len([s for s in links if o.netloc in s]) == 0 and o.netloc not in baseurl:
			links.append(a['href'])
	return links

# getfeed(html) - looks for feed URLs in the html you pass in
#   uses beautifulsoup to extract links
#   same basic logic as extract links to make sure you only get feeds you don't have
def getfeed(html):
	global subscribedsites,subscribedfeeds,baseurl
	soup = BeautifulSoup(html)
	linkedfiles = soup.findAll('link')
	feed = []
	for l in linkedfiles:
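		# only rel="alternate" links with an RSS or Atom MIME type are feeds
		# (rel="alternate" is also used for translations, print versions, etc.)
		if l.get('rel') == 'alternate' and l.get('type') in ('application/rss+xml', 'application/atom+xml') and l.get('href'):
			o = urlparse(l['href'])
			if len([s for s in subscribedfeeds if o.netloc in s]) == 0 and len([s for s in feed if o.netloc in s['href']]) == 0:
				# fall back to the feed URL when the link has no title attribute
				feed.append({'href':l['href'],'title':l.get('title', l['href'])})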
	return feed

# main - unimaginative?  yes, but it works
#   1) creates a opml stub
#   2) grabs the seed page and parses it for new links
#   3) goes out and gets feeds (if they exist)
#   4) adds feeds to the stub opml
#   5) writes the opml file out for import elsewhere
def main():
	global subscribedsites,subscribedfeeds,baseurl
	loadsubscriptions()
	html = gethtml(baseurl)
	links = extractlinks(html)
	xml = minidom.Document()
	opml = xml.createElement('opml')
	opml.appendChild(xml.createElement('head'))
	body = xml.createElement('body')
	print '%d links' % len(links)
	counter = 0
	for l in links:
		counter = counter + 1
		print 'processing link %d - %s' % (counter,l.encode('latin-1'))
		try:
			html = gethtml(l)
			feed = getfeed(html)
			if len(feed) > 0:
				for f in feed:
					outline = xml.createElement('outline')
					outline.setAttribute('title',f['title'])
					outline.setAttribute('htmlUrl',l)
					outline.setAttribute('xmlUrl',f['href'])
					body.appendChild(outline)
		except:
			print 'Could not get %s'% l.encode('latin-1')
	opml.appendChild(body)
	xml.appendChild(opml)
	fp = open("newfeeds.opml","w")
	# writexml(self, writer, indent='', addindent='', newl='', encoding=None)
	xml.writexml(fp, "    ", "", "\n", "UTF-8")
	fp.close()

if __name__ == '__main__':
	main()
</pre>
</code>
<hr/>
<p><b>P.S. Looking for a good book on Python Network Programming?</b>  I highly recommend John Goerzen&#8217;s Foundations of Python Network Programming.  John&#8217;s style is engaging and easy to read.  His examples are practical and clear (way better than the shoddy code I wrote above).</p>
<p>This book will have you spinning ideas and code so fast you&#8217;ll wonder how you got along without it.</p>
<p><iframe src="http://rcm.amazon.com/e/cm?t=authorstorecom&#038;o=1&#038;p=8&#038;l=as1&#038;asins=1590593715&#038;fc1=000000&#038;IS2=1&#038;lt1=_blank&#038;lc1=0000FF&#038;bc1=000000&#038;bg1=FFFFFF&#038;f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.fieldguidetoprogrammers.com/python/feedextractor-a-quick-and-dirty-python-script-to-grab-lots-of-feeds-from-web-pages/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>httpd anywhere &#8211; A simple python script to quickly run a web server using any directory as the root</title>
		<link>http://www.fieldguidetoprogrammers.com/python/httpd-anywhere-a-simple-python-script-to-quickly-run-a-web-server-using-any-directory-as-the-root/</link>
		<comments>http://www.fieldguidetoprogrammers.com/python/httpd-anywhere-a-simple-python-script-to-quickly-run-a-web-server-using-any-directory-as-the-root/#comments</comments>
		<pubDate>Sat, 06 Oct 2007 13:58:57 +0000</pubDate>
		<dc:creator>jamiegrove</dc:creator>
				<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.fieldguidetoprogrammers.com/blog/python/httpd-anywhere-a-simple-python-script-to-quickly-run-a-web-server-using-any-directory-as-the-root/</guid>
		<description><![CDATA[I develop a lot of little sites for fun.  Sometimes I&#8217;m trying out an idea or I just want to test some CSS or Javascript, etc.  In any case, I find it useful in testing to have a web server that can fire off from any directory on my machine.
Python just happens to [...]]]></description>
			<content:encoded><![CDATA[<p>I develop a lot of little sites for fun.  Sometimes I&#8217;m trying out an idea or I just want to test some CSS or Javascript, etc.  In any case, I find it useful in testing to have a web server that can fire off from any directory on my machine.</p>
<p><a href="http://www.python.org">Python</a> just happens to have a basic HTTP server in its standard library (see <a href="http://docs.python.org/lib/module-SimpleHTTPServer.html">SimpleHTTPServer</a>).  There is also an extension of that module that runs CGI scripts (see <a href="http://docs.python.org/lib/module-CGIHTTPServer.html">CGIHTTPServer</a>).  So, with a quick script you have an instant web server that:</p>
<ul>
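<li> runs on port 8000 by default (pass a different port number as the first argument to change it)</li>
<li> uses the current directory as the root htdocs</li>
<li> can run CGI scripts placed in a cgi-bin or htbin subdirectory</li>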
</ul>
<p><span id="more-19"></span><br />
[Note: I'm not the first to think of using this built-in python module.  I mean, that's what it's for, right?  I'm just passing this along as a tip.]</p>
<p>If you place said script on your path (as I have), you can fire up a mini web server in an instant from anywhere.</p>
<p>Here is the script:</p>
<code>
<pre class="codebox" style="width:450px;">
#!/usr/bin/env python
# If necessary, replace above with your path to python

import CGIHTTPServer

CGIHTTPServer.test()
</pre>
</code>
<p>I called my script cgiserver for ease of use.  Also, if you are new to shell scripting, be sure to make your shiny new script executable:</p>
<blockquote><p>chmod +x [script_name]</p></blockquote>
<p>Below is a clip from my terminal window where I fired up the server and browsed <a href="http://localhost:8000">http://localhost:8000</a>.  To close the server, just hit control-c to kill the script.</p>
<code>
<pre class="codebox" style="width:450px;">
jgmbpro:~/Documents/Work/FGTP/site jamiegrove$ cgiserver
Serving HTTP on 0.0.0.0 port 8000 ...
localhost - - [15/Sep/2007 09:21:02] "GET / HTTP/1.1" 200 -
localhost - - [15/Sep/2007 09:21:02] "GET /styles/stylemain.css HTTP/1.1" 200 -
localhost - - [15/Sep/2007 09:21:02] "GET /styles/scal.css HTTP/1.1" 200 -
localhost - - [15/Sep/2007 09:21:02] "GET /javascripts/prototype.js HTTP/1.1" 200 -
localhost - - [15/Sep/2007 09:21:02] "GET /javascripts/scriptaculous.js HTTP/1.1" 200 -
localhost - - [15/Sep/2007 09:21:02] "GET /javascripts/builder.js HTTP/1.1" 200 -
localhost - - [15/Sep/2007 09:21:02] "GET /javascripts/effects.js HTTP/1.1" 200 -
localhost - - [15/Sep/2007 09:21:02] "GET /javascripts/dragdrop.js HTTP/1.1" 200 -
localhost - - [15/Sep/2007 09:21:02] "GET /javascripts/controls.js HTTP/1.1" 200 -
localhost - - [15/Sep/2007 09:21:02] "GET /javascripts/slider.js HTTP/1.1" 200 -
localhost - - [15/Sep/2007 09:21:02] "GET /javascripts/sound.js HTTP/1.1" 200 -
localhost - - [15/Sep/2007 09:21:03] "GET /javascripts/scal.js HTTP/1.1" 200 -
</pre>
</code>
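<p>In Python 3, SimpleHTTPServer and CGIHTTPServer were folded into the <code>http.server</code> module, and <code>python3 -m http.server</code> serves the current directory on port 8000 with no script at all. If you still want a small script of your own, here is a minimal Python 3 sketch. <code>make_server</code> is a hypothetical helper, the <code>directory</code> argument needs Python 3.7 or later, and CGI is deliberately left out because <code>CGIHTTPRequestHandler</code> is deprecated in recent Python releases.</p>

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_server(root=".", port=8000, host=""):
    """Build (but do not start) a static file server rooted at `root`.

    host="" listens on all interfaces, like the 0.0.0.0 in the log above.
    """
    # SimpleHTTPRequestHandler accepts directory= since Python 3.7
    handler = functools.partial(SimpleHTTPRequestHandler, directory=root)
    return HTTPServer((host, port), handler)
```

<p>Something like <code>make_server('.', 8000).serve_forever()</code> then serves the current directory until you hit control-c.</p>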
]]></content:encoded>
			<wfw:commentRss>http://www.fieldguidetoprogrammers.com/python/httpd-anywhere-a-simple-python-script-to-quickly-run-a-web-server-using-any-directory-as-the-root/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using httplib.HTTPConnection.set_debuglevel() with urllib2</title>
		<link>http://www.fieldguidetoprogrammers.com/python/using-httplibhttpconnectionset_debuglevel-with-urllib2/</link>
		<comments>http://www.fieldguidetoprogrammers.com/python/using-httplibhttpconnectionset_debuglevel-with-urllib2/#comments</comments>
		<pubDate>Tue, 02 Oct 2007 02:52:01 +0000</pubDate>
		<dc:creator>jamiegrove</dc:creator>
				<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.fieldguidetoprogrammers.com/blog/python/using-httplibhttpconnectionset_debuglevel-with-urllib2/</guid>
		<description><![CDATA[I&#8217;ve been trying to get the debug level turned on in urllib2 for about an hour and now that it is working I thought I would post what I found&#8230;  
When using urllib, you can set the debuglevel directly by using something like this:

import urllib, httplib
httplib.HTTPConnection.debuglevel = 1
urllib.urlopen("http://www.somesite.com")

However, when using urllib2 you need to [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been trying to get the debug level turned on in urllib2 for about an hour, and now that it is working I thought I would post what I found&#8230; <img src='http://www.fieldguidetoprogrammers.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>When using urllib, you can set the debuglevel directly by using something like this:</p>
<blockquote><p>
import urllib, httplib</p>
<p>httplib.HTTPConnection.debuglevel = 1<br />
urllib.urlopen("http://www.somesite.com")
</p></blockquote>
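<p>However, when using urllib2 you need to create a handler with the debug level set, build an opener from it, and install that opener so urllib2.urlopen() uses it.  The sample below creates the lovely debuglevel handler and installs it.</p>
<blockquote><p>
import urllib2<br />
h=urllib2.HTTPHandler(debuglevel=1)<br />
opener = urllib2.build_opener(h)<br />
urllib2.install_opener(opener)<br />
urllib2.urlopen("http://www.somesite.com")
</p></blockquote>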
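<p>In Python 3, urllib2 and httplib became <code>urllib.request</code> and <code>http.client</code>, but the same trick carries over. The sketch below is an assumption-based example rather than anything from the original thread: <code>build_debug_opener</code> and <code>fetch_with_debug</code> are hypothetical names, and it adds an <code>HTTPSHandler</code> as well, since the HTTP handler alone does not trace https:// URLs. The debug lines (starting with <code>send:</code> and <code>reply:</code>) are printed to standard output.</p>

```python
import urllib.request


def build_debug_opener(level=1):
    """Return an opener whose HTTP and HTTPS connections print wire-level debug output."""
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=level),
        urllib.request.HTTPSHandler(debuglevel=level),
    )


def fetch_with_debug(url, timeout=15):
    """Fetch url, printing the request and the response status/headers as they are exchanged."""
    opener = build_debug_opener()
    # opener.open() keeps the debug output local to this call;
    # install_opener() would make every urllib.request.urlopen() use it instead
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

<p>Calling <code>urllib.request.install_opener(build_debug_opener())</code> instead would turn on the same output for every plain <code>urllib.request.urlopen()</code> call.</p>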
<p><a href="http://mail.python.org/pipermail/tutor/2005-November/043069.html">Here&#8217;s the original post on mail.python.org</a></p>
<p><b>P.S. Looking for a good book on Python Network Programming?</b>  I highly recommend John Goerzen&#8217;s Foundations of Python Network Programming.  John&#8217;s style is engaging and easy to read.  His examples are practical and clear.  This book will have you spinning ideas and code so fast you&#8217;ll wonder how you got along without it.</p>
<p><iframe src="http://rcm.amazon.com/e/cm?t=authorstorecom&#038;o=1&#038;p=8&#038;l=as1&#038;asins=1590593715&#038;fc1=000000&#038;IS2=1&#038;lt1=_blank&#038;lc1=0000FF&#038;bc1=000000&#038;bg1=FFFFFF&#038;f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.fieldguidetoprogrammers.com/python/using-httplibhttpconnectionset_debuglevel-with-urllib2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Basic SQLAlchemy ORM Example</title>
		<link>http://www.fieldguidetoprogrammers.com/python/basic-sqlalchemy-orm-example/</link>
		<comments>http://www.fieldguidetoprogrammers.com/python/basic-sqlalchemy-orm-example/#comments</comments>
		<pubDate>Fri, 21 Sep 2007 13:15:29 +0000</pubDate>
		<dc:creator>jamiegrove</dc:creator>
				<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.fieldguidetoprogrammers.com/blog/python/basic-sqlalchemy-orm-example/</guid>
		<description><![CDATA[If you are not into DB-API, SQLAlchemy may be for you.
SQLAlchemy is a database &#8220;toolkit&#8221; for python.  In many ways, it is like Hibernate from the Java world.  Both kits are focused on providing high-performance object relational mapping, but they also provide some nice database abstraction functions as well.  The SQLAlchemy document [...]]]></description>
			<content:encoded><![CDATA[<p>If you are not into DB-API, SQLAlchemy may be for you.</p>
<p><a href="http://www.sqlalchemy.org/">SQLAlchemy</a> is a database &#8220;toolkit&#8221; for python.  In many ways, it is like <a href="http://www.hibernate.org">Hibernate</a> from the Java world.  Both kits are focused on providing high-performance object relational mapping, but they also provide some nice database abstraction functions.  The SQLAlchemy documentation is a bit dense, but then this is serious stuff.  Still, after playing with it for a while, I would recommend it over <a href="http://www.sqlobject.org/">SQLObject</a>.</p>
<p><b>The SQLAlchemy Philosophy</b>:</p>
<blockquote><p>
SQL databases behave less and less like object collections the more size and performance start to matter; object collections behave less and less like tables and rows the more abstraction starts to matter. SQLAlchemy aims to accommodate both of these principles.</p>
<p>SQLAlchemy doesn&#8217;t view databases as just collections of tables; it sees them as relational algebra engines. Its object relational mapper enables classes to be mapped against the database in more than one way. SQL constructs don&#8217;t just select from just tables—you can also select from joins, subqueries, and unions. Thus database relationships and domain object models can be cleanly decoupled from the beginning, allowing both sides to develop to their full potential.
</p></blockquote>
<p>Like I said, it&#8217;s a bit dense, but check out <a href="http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html">Robin Munn&#8217;s article on SQLAlchemy</a>.  What appears below is a much summarized version of Munn&#8217;s sample code:</p>
<pre class="codebox">
from sqlalchemy import *
from sqlalchemy.orm import *

#accessing a database
db = create_engine('mysql://root@localhost/test')

#metadata object used for binding
metadata = BoundMetaData(db)

# creating a table
users_table = Table('users', metadata,
	Column('user_id', Integer, primary_key=True),
	Column('user_name', String(40))
)

#metadata.engine.echo = True
try:
	users_table.create()
except exceptions.SQLError:
	print 'TABLE \'users\' already exists.'

# loading definitions automatically
users_table = Table('users', metadata, autoload=True)

# printing a column
print list(users_table.columns)[0].name

#create a holding class
class User(object):
	def __repr__(self):
		return '%s(%r,%r)' % (
			self.__class__.__name__,self.user_name,self.user_id)

	def wager(self):
		return 'betting on it'

# map the holding class to the table definition
mapper(User, users_table)

#create an instance of the class
u1 = User()

#see, it automatically maps class and fields.  Slick.
print u1.user_id
print u1.wager()
</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.fieldguidetoprogrammers.com/python/basic-sqlalchemy-orm-example/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simple MySQL Python Example</title>
		<link>http://www.fieldguidetoprogrammers.com/python/simple-mysql-python-example/</link>
		<comments>http://www.fieldguidetoprogrammers.com/python/simple-mysql-python-example/#comments</comments>
		<pubDate>Fri, 21 Sep 2007 12:50:05 +0000</pubDate>
		<dc:creator>jamiegrove</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.fieldguidetoprogrammers.com/blog/python/simple-mysql-python-example/</guid>
		<description><![CDATA[In order to use MySQL with python, you need to get the MySQLdb up and running.  Once you have that installed, you use the DB-API to perform queries and such.
The DB-API PEP is good if you already know what you&#8217;re doing (you can also look at the sqlite module doc).  Below is my [...]]]></description>
			<content:encoded><![CDATA[<p>In order to use MySQL with python, you need to get the <a href="http://mysql-python.sourceforge.net/">MySQLdb</a> module up and running.  Once you have that installed, you use the <a href="http://www.python.org/dev/peps/pep-0249/">DB-API</a> to perform queries and such.</p>
<p>The DB-API PEP is good if you already know what you&#8217;re doing (you can also look at the <a href="http://docs.python.org/lib/module-sqlite3.html">sqlite module doc</a>).  Below is my <i>very</i> simple example.  Paul DuBois has an extensive article on <a href="http://www.kitebird.com/articles/pydbapi.html">using MySQL with python</a>.</p>
<pre class="codebox">
import MySQLdb

# connect to the database
conn = MySQLdb.connect(
	host='localhost',
	user='test',
	db='test'
)

table_def = """
CREATE TABLE samples (
	sample_id int PRIMARY KEY,
	sample_name varchar(20),
	sample_url varchar(250)
)
"""

cursor = conn.cursor()
try:
	cursor.execute(table_def)
	conn.commit()
except MySQLdb.Error, e:
	# usually "Table 'samples' already exists", but it could be any other database error
	print 'Could not create table:', e
finally:
	cursor.close()
	conn.close()
</pre>
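<p>Because MySQLdb and sqlite3 both follow the DB-API, the same connect/cursor/execute/commit pattern can be tried without a MySQL server. The sketch below swaps in Python 3&#8217;s standard-library <code>sqlite3</code> module (it is not MySQLdb code); <code>create_and_fill</code> and the sample row are made up for illustration. Note that the placeholder style differs between drivers: sqlite3 uses <code>?</code>, while MySQLdb uses <code>%s</code>.</p>

```python
import sqlite3

TABLE_DEF = """
CREATE TABLE IF NOT EXISTS samples (
    sample_id INTEGER PRIMARY KEY,
    sample_name VARCHAR(20),
    sample_url VARCHAR(250)
)
"""


def create_and_fill(path=":memory:"):
    """Create the samples table (if missing), insert one row, and return all rows."""
    conn = sqlite3.connect(path)
    try:
        cursor = conn.cursor()
        # IF NOT EXISTS sidesteps the "table already exists" error entirely
        cursor.execute(TABLE_DEF)
        # let the driver quote values via placeholders instead of building SQL strings
        cursor.execute(
            "INSERT INTO samples (sample_name, sample_url) VALUES (?, ?)",
            ("fgtp", "http://www.fieldguidetoprogrammers.com"),
        )
        conn.commit()
        cursor.execute("SELECT sample_id, sample_name, sample_url FROM samples")
        return cursor.fetchall()
    finally:
        conn.close()
```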
]]></content:encoded>
			<wfw:commentRss>http://www.fieldguidetoprogrammers.com/python/simple-mysql-python-example/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OPML to HTML: Parsing a list of feeds</title>
		<link>http://www.fieldguidetoprogrammers.com/python/opml-to-html-parsing-a-list-of-feeds/</link>
		<comments>http://www.fieldguidetoprogrammers.com/python/opml-to-html-parsing-a-list-of-feeds/#comments</comments>
		<pubDate>Thu, 20 Sep 2007 12:16:29 +0000</pubDate>
		<dc:creator>jamiegrove</dc:creator>
				<category><![CDATA[RSS]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.fieldguidetoprogrammers.com/blog/python/opml-to-html-parsing-a-list-of-feeds/</guid>
		<description><![CDATA[I&#8217;m a feed junkie.  If you are searching for a simple method to convert OPML to HTML, chances are that you are too&#8230;
OPML as defined by Dave Winer way back in 2000:

OPML is an XML-based format that allows exchange of outline-structured information between applications running on different operating systems and environments.

More specifically, OPML is the [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m a feed junkie.  If you are searching for a simple method to convert OPML to HTML, chances are that you are too&#8230;</p>
<p><a href="http://www.opml.org/">OPML</a> as defined by <a href="http://www.scripting.com/">Dave Winer</a> way back in 2000:</p>
<blockquote><p>
OPML is an XML-based format that allows exchange of outline-structured information between applications running on different operating systems and environments.
</p></blockquote>
<p>More specifically, OPML is the format used by most RSS aggregators to spit out lists of feed subscriptions.  That list usually contains the name of the feed, the feed&#8217;s URL, and the site&#8217;s URL.<br />
<span id="more-8"></span><br />
On one of my little sites, I wanted to keep a quick list of links showing the feeds I subscribed to on that particular subject.  As I add feeds to my aggregator at a pretty good clip, updating the site would be a tedious process.  So, rather than use the tools available in my aggregator <img src='http://www.fieldguidetoprogrammers.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  I went ahead and wrote a little python script to grab the OPML and convert it into an HTML stub.</p>
<pre class="codebox">
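# opml2html.py - sample code for converting OPML to HTML
from xml.dom.minidom import parse, parseString
import urllib2
import cgi

#dom1 = parse('mah_links.opml') # parse an XML file by name - uncomment if you want to draw from a file
dom1 = parseString(urllib2.urlopen('http://share.opml.org/opml/top100.opml').read()) #use this to parse a feed

links = dom1.getElementsByTagName('outline')

f = open('links.html','w')

for link in links:
	url = link.getAttribute('htmlUrl')
	if not url:
		continue # folder/category outlines have no htmlUrl, so skip them
	# some OPML files use "text" instead of "title" for the feed name
	title = link.getAttribute('title') or link.getAttribute('text')
	# escape special HTML characters (and quotes in the URL) so odd titles don't break the page
	linktext = '&lt;a href="' + cgi.escape(url, True) + '"&gt;'
	linktext += cgi.escape(title) + '&lt;/a&gt;&lt;br /&gt;\n'
	print linktext
	# minidom returns unicode, so encode before writing to a plain file
	f.write(linktext.encode('utf-8'))

f.close()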
</pre>
<p>Pretty simple, eh?  I included the method to parse a live feed too just for reference.  By the way, if you&#8217;re looking for lists of feeds, Winer&#8217;s <a href="http://share.opml.org">share.opml.org</a> is great.</p>
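<p>For comparison, a Python 3 sketch of the same idea might look like this. It is not a drop-in replacement: <code>opml_to_html</code> is a hypothetical name, it takes the OPML as a string (read from a file or downloaded), and it builds the links with <code>xml.etree.ElementTree</code> so that ampersands and quotes in titles or URLs are escaped automatically. The result is a single unordered list rather than lines separated by line breaks.</p>

```python
import xml.etree.ElementTree as ET


def opml_to_html(opml_text):
    """Return an HTML unordered list with one link per outline that has an htmlUrl."""
    root = ET.fromstring(opml_text)
    ul = ET.Element("ul")
    for outline in root.iter("outline"):
        url = outline.get("htmlUrl")
        if not url:
            continue  # folders/categories have no site URL
        # OPML exports use "title", "text", or both for the display name
        title = outline.get("title") or outline.get("text") or url
        li = ET.SubElement(ul, "li")
        link = ET.SubElement(li, "a", href=url)
        link.text = title  # ElementTree escapes special characters when serializing
    return ET.tostring(ul, encoding="unicode")
```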
]]></content:encoded>
			<wfw:commentRss>http://www.fieldguidetoprogrammers.com/python/opml-to-html-parsing-a-list-of-feeds/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
