Simple Python Twitter RSS feed parser

If you want to display your tweets somewhere on your own web page, the easiest way is to use the RSS feed available from your Twitter profile page (for example http://twitter.com/teebesz). Of course, if you want to turn the @mentions, #hashtags and links into proper HTML links, you need a little bit of code.
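To see what that parsing step does on its own, here is a minimal sketch that applies the same three kinds of substitution (links, @mentions, #hashtags) to a plain string, so it can be tried without fetching a feed. The helper name `linkify` is hypothetical, and its link pattern is a variant that also accepts https:// URLs and hyphens; it is not necessarily the exact regex used in the script.

```python
import re


def linkify(text):
    """Turn links, @mentions and #hashtags in a plain tweet string into
    HTML anchors (hypothetical helper, for illustration only)."""
    # links first, so the twitter.com URLs inserted by the @ and # steps
    # below are not linked a second time; the lookbehind skips a URL that
    # directly follows a quote without consuming that character
    text = re.sub(
        r"(?<![\"'])\b(https?://[\w./?=%&-]+)",
        lambda m: "<a href='%s'>%s</a>" % (m.group(1), m.group(1)),
        text)
    # @mentions link to the user's profile page
    text = re.sub(
        r"@(\w+)",
        lambda m: "<a href='http://twitter.com/%s'>%s</a>"
                  % (m.group(1), m.group()),
        text)
    # #hashtags link to a search; %23 is the URL-encoded '#'
    text = re.sub(
        r"#(\w+)",
        lambda m: "<a href='http://twitter.com/search?q=%%23%s'>%s</a>"
                  % (m.group(1), m.group()),
        text)
    return text


print(linkify("Doing it, buying The Suburbs by @arcadefire. "
              "See http://thewildernessdowntown.com/"))
```

Each `re.sub` scans the string once, so the anchors it inserts are not rescanned by the same substitution.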

Here is the Python script I use for this site's Twitter display. You'll need the feedparser library installed (how have you been living without it anyway?).

import datetime
import re

import feedparser


def get_twitter(url, limit=3):
    """Takes a Twitter RSS feed URL and returns a list of dictionaries, one
    per tweet (at most `limit` tweets). Each dictionary has two keys:
        - 'text': an HTML-ready string with the @mentions, #hashtags and
          links converted to the corresponding HTML links
        - 'posted': the posted date as a short string, e.g. "Sep 04"
          (with a two-digit year added for tweets from earlier years)"""

    twitter_entries = []
    for entry in feedparser.parse(url)['entries'][:limit]:

        # convert feedparser's time.struct_time to a datetime; only the first
        # six fields (year to second) are datetime arguments, the seventh is
        # the weekday
        posted_datetime = datetime.datetime(*entry['updated_parsed'][:6])

        # format the date a bit
        if posted_datetime.year == datetime.datetime.now().year:
            posted = posted_datetime.strftime("%b %d")
        else:
            posted = posted_datetime.strftime("%b %d %y")

        # strip the "<username>: " that precedes all Twitter feed entries
        text = re.sub(r'^\w+:\s', '', entry['title'])

        # parse links, skipping any URL that directly follows a quote; the
        # lookbehind checks the preceding character without consuming it,
        # and \b prevents a match in the middle of a word
        text = re.sub(
            r"(?<![\"'])\b(https?://[\w./?=%&-]+)",
            lambda x: "<a href='%s'>%s</a>" % (x.group(1), x.group(1)),
            text)

        # parse @tweeter
        text = re.sub(
            r'@(\w+)',
            lambda x: "<a href='http://twitter.com/%s'>%s</a>"
                      % (x.group(1), x.group()),
            text)

        # parse #hashtag (%23 is the URL-encoded '#')
        text = re.sub(
            r'#(\w+)',
            lambda x: "<a href='http://twitter.com/search?q=%%23%s'>%s</a>"
                      % (x.group(1), x.group()),
            text)

        twitter_entries.append({
            'text': text,
            'posted': posted,
        })

    return twitter_entries
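feedparser hands back dates such as `entry['updated_parsed']` as `time.struct_time` values. The sketch below isolates the date conversion and formatting step using only the standard library; `format_posted` and its `now` parameter are hypothetical names, added so the year check can be tried with a fixed date. Only the first six fields of a `struct_time` (year through second) line up with `datetime.datetime`'s positional arguments; the seventh field is the weekday, so it should not be passed on.

```python
import datetime
import time


def format_posted(parsed, now=None):
    """Format a time.struct_time as "Nov 16" for the current year, or
    "Nov 16 09" (two-digit year) otherwise. Hypothetical helper."""
    # fields 0-5 are year, month, day, hour, minute, second
    posted = datetime.datetime(*parsed[:6])
    if now is None:
        now = datetime.datetime.now()
    if posted.year == now.year:
        return posted.strftime("%b %d")
    return posted.strftime("%b %d %y")


# a struct_time built by hand, standing in for entry['updated_parsed']
parsed = time.strptime("2009-11-16 16:36:00", "%Y-%m-%d %H:%M:%S")
print(format_posted(parsed, now=datetime.datetime(2009, 12, 1)))  # Nov 16
print(format_posted(parsed, now=datetime.datetime(2010, 9, 4)))   # Nov 16 09
```

Note that `%b` produces the abbreviated month name of the current locale, which is English in Python's default C locale.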

4 comments

November 16, 2009 4:36 p.m. by alex

I'm trying to understand the parse-links step in the above code, but something seems off: that code, copy-pasted as is, gives an error in IDLE.

November 17, 2009 3:30 p.m. by Teebes

Alex,

What do you mean by 'an error in IDLE'? Can you post the error that you're getting? Feel free to shoot me an email at teebes at teebes.com if you need help.

- Teebes

November 29, 2009 3:05 a.m. by alex

When I copy-pasted the code into the Python IDLE editor, it wouldn't execute. After a few trials and experiments, I replaced the line with the following, which seems to work (although I'm not a regex expert):

text = re.sub(r"\b(http://(\w|\.|/|\?|=|%|&)+)",

December 2, 2009 7:04 a.m. by Teebes

Alex,

You were right: that regex got a little messed up when I converted the Python code to HTML. I've corrected it in the code above; it should be:

r"[^\"](http://(\w|\.|/|\?|=|%|&)+)"

Thanks a lot for pointing that out!

- Teebes
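The difference between the two link patterns discussed in these comments is easy to see by running both on a sample string (the example.com URLs are placeholders). The `[^\"]` version needs one non-quote character before `http://`, so it cannot match a link at the very start of a tweet, and that extra character becomes part of the whole match (`group()`); code that builds the anchor from the whole match therefore puts it inside the link. Alex's `\b` version matches in both places, and `group(1)` holds just the URL in either case.

```python
import re

# the pattern from the comment above, and Alex's \b variant
QUOTE_GUARD = r"[^\"](http://(\w|\.|/|\?|=|%|&)+)"
WORD_BOUNDARY = r"\b(http://(\w|\.|/|\?|=|%|&)+)"


def find_links(pattern, text):
    """Return (whole match, URL group) pairs for every match of pattern."""
    return [(m.group(), m.group(1)) for m in re.finditer(pattern, text)]


tweet = "http://example.com and see http://example.org"
print(find_links(QUOTE_GUARD, tweet))
# [(' http://example.org', 'http://example.org')]
print(find_links(WORD_BOUNDARY, tweet))
# [('http://example.com', 'http://example.com'), ('http://example.org', 'http://example.org')]
```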
