Django vs feedparser on dates
I'm having trouble storing feedparser results in a Django model.
It's all about timestamps. Feedparser returns timestamps in a standard time nine-tuple, asserting UTC. Django wants datetime objects. So, I'm trying to translate:
django_timestamp = datetime.datetime.fromtimestamp(time.mktime(feedparser_timestamp))
feedparser_timestamp = django_timestamp.utctimetuple()
This works fine for the majority of timestamps, but sometimes translating to datetime and back mutates the timestamp. In turn, that makes get-if-modified-since somewhat unreliable. Here are some examples, from my log file:
WARNING: (2004, 11, 19, 5, 13, 31, 4, 324, 0) => datetime.datetime(2004, 11, 19, 6, 13, 31) => (2004, 11, 19, 6, 13, 31, 4, 324, 0)
WARNING: (2005, 11, 2, 2, 17, 55, 2, 306, 0) => datetime.datetime(2005, 11, 2, 3, 17, 55) => (2005, 11, 2, 3, 17, 55, 2, 306, 0)
WARNING: (2006, 12, 13, 0, 21, 25, 2, 347, 0) => datetime.datetime(2006, 12, 13, 1, 21, 25) => (2006, 12, 13, 1, 21, 25, 2, 347, 0)
WARNING: (2004, 11, 14, 23, 55, 31, 6, 319, 0) => datetime.datetime(2004, 11, 15, 0, 55, 31) => (2004, 11, 15, 0, 55, 31, 0, 320, 0)
I'm off by exactly one hour, which smells like a daylight saving time problem. I just wish I knew what to do about it.
I've waved a dead chicken at this one all the ways I know how. Every change I make breaks the conversion entirely. So, I'm throwing this out to the community in the hope that someone can help me.
September 1st, 2007 at 10:21 pm
You could try something like this:
>>> dt = datetime.datetime(*fp_time_tuple[:-3])
You get a naive datetime without any timezone information, representing the UTC time tuple you get from feedparser.
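A minimal sketch of that round trip, using two of the tuples from the log above. The helper names are made up for illustration. The point is that `time.mktime` reads its argument as local time, so skipping it keeps the local timezone and its DST rules out of the conversion altogether.

```python
import datetime


def tuple_to_datetime(fp_time_tuple):
    """Build a naive datetime from the first six fields of a UTC time tuple."""
    # Fields 0-5 are year, month, day, hour, minute, second.
    return datetime.datetime(*fp_time_tuple[:6])


def datetime_to_tuple(dt):
    """Turn a naive datetime back into a plain nine-tuple."""
    # For a naive datetime, utctimetuple() copies the fields unchanged,
    # recomputes weekday and day-of-year, and sets tm_isdst to 0.
    return tuple(dt.utctimetuple())


logged = [
    (2004, 11, 19, 5, 13, 31, 4, 324, 0),
    (2004, 11, 14, 23, 55, 31, 6, 319, 0),
]
for original in logged:
    round_tripped = datetime_to_tuple(tuple_to_datetime(original))
    print(original == round_tripped)
```

Because nothing here consults the machine's timezone, the result should be the same wherever the code runs.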
September 1st, 2007 at 10:24 pm
Be careful though: sometimes feeds have no timestamp. I use something like this (data is the entry data from feedparser):
>>> import datetime
>>> from time import struct_time
>>> date_published = data.get('published_parsed', data.get('updated_parsed'))
>>> if isinstance(date_published, struct_time):
...     date_published = datetime.datetime(*date_published[:-3])
... else:
...     date_published = datetime.datetime.utcnow()
September 1st, 2007 at 11:27 pm
try this code:
django_timestamp = datetime.datetime(*feedparser_timestamp[:7])
September 1st, 2007 at 11:47 pm
This seems to work for me, although I don’t convert it back to struct_time. Maybe I’m having the same problem without knowing it.
django_timestamp = datetime.datetime( *feedparser_timestamp[:6] )
Greetings
September 2nd, 2007 at 3:41 am
I just wrote a function like so
def struct_to_datetime(struct):
    return datetime.datetime(struct[0], struct[1], struct[2], struct[3], struct[4], struct[5], tzinfo=utc)
and then used pytz to convert to the timezone I wanted. This seems to have worked fine for the project I did with it, and I didn’t notice any serious anomalies on my side of things (I did notice anomalies in people’s feeds).
September 2nd, 2007 at 3:52 am
When you make a datetime out of the 9-tuple, you need to make it non-naive; if you give it a tzinfo that makes it aware that it’s in UTC, you can then safely convert it to other timezones or emit it in whatever formats you like, and DST should be taken care of. I’m a big, big fan of dateutil for stuff like this as it has comprehensive timezone support.
I don’t have much Django experience, though, so I’m not sure what your model will make of a non-naive datetime…
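A rough sketch of the aware-datetime approach, using only the standard library's `datetime.timezone` rather than dateutil or pytz (that substitution is mine). The fixed +11:00 offset is just a stand-in for a real local zone, and the function name is hypothetical.

```python
import datetime

UTC = datetime.timezone.utc
# Illustrative fixed offset only; a real zone with DST rules would come
# from a timezone library such as dateutil or pytz.
EXAMPLE_LOCAL = datetime.timezone(datetime.timedelta(hours=11))


def tuple_to_aware_datetime(fp_time_tuple):
    """Build a datetime that knows it is in UTC."""
    return datetime.datetime(*fp_time_tuple[:6], tzinfo=UTC)


original = (2004, 11, 19, 5, 13, 31, 4, 324, 0)
aware = tuple_to_aware_datetime(original)

# Converting to another zone changes the wall-clock fields, not the instant.
local = aware.astimezone(EXAMPLE_LOCAL)

# utctimetuple() on an aware datetime converts back to UTC first,
# so the original tuple can be recovered from either object.
print(tuple(aware.utctimetuple()) == original)
print(tuple(local.utctimetuple()) == original)
```

Whether a particular Django version and database backend will store an aware datetime cleanly is a separate question that this sketch does not answer.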
September 2nd, 2007 at 4:10 am
Try setting the DST flag to ‘-1’ when creating the timestamp. It should look something like this… see if it works.
django_timestamp = datetime.datetime.fromtimestamp(
    time.mktime(
        feedparser_timestamp[0:8] + (-1,)))
feedparser_timestamp = django_timestamp.utctimetuple()
September 2nd, 2007 at 12:50 pm
I’m gathering unique timestamps, and will try all these methods. Thanks!
Anders, how is `utc` defined?
September 2nd, 2007 at 8:58 pm
garth, utc should be pytz.UTC
September 2nd, 2007 at 11:02 pm
Mine looks like this:
if hasattr(content, 'modified') and content.modified != None:
feed.lastModified = datetime.datetime.utcfromtimestamp(calendar.timegm(content.modified))
September 2nd, 2007 at 11:04 pm
Try again with formatting …
if hasattr(content, 'modified') and \
        content.modified != None:
    feed.lastModified = \
        datetime.datetime.utcfromtimestamp(
            calendar.timegm(content.modified))
September 3rd, 2007 at 1:22 am
Dealing With Dates…
Simon Willison: Django vs feedparser on dates. Some useful tips in the comments. I find Python’s timezone stuff endlessly frustrating: I know it can do what I want, but it always take…
September 3rd, 2007 at 1:26 am
datetime.datetime.utcfromtimestamp(calendar.timegm(feedparser_timestamp))
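For context, `calendar.timegm` is the UTC counterpart of `time.mktime`: it treats the tuple as UTC and returns epoch seconds, so no local DST rules are applied. Below is a sketch of the same conversion with a hypothetical helper name. It goes through `datetime.fromtimestamp(..., tz=...)` instead of `utcfromtimestamp`, which is deprecated in recent Python versions.

```python
import calendar
import datetime


def tuple_to_datetime_via_epoch(fp_time_tuple):
    """Convert a UTC time tuple to a naive UTC datetime via epoch seconds."""
    # timegm() is the inverse of time.gmtime(): the tuple is read as UTC.
    seconds = calendar.timegm(fp_time_tuple)
    aware = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
    # Drop the tzinfo to get the naive datetime the one-liner above produces.
    return aware.replace(tzinfo=None)


original = (2006, 12, 13, 0, 21, 25, 2, 347, 0)
converted = tuple_to_datetime_via_epoch(original)
print(converted)
print(tuple(converted.utctimetuple()) == original)
```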
September 3rd, 2007 at 7:38 am
I think I ended up scratching the use of the feedparser _parsed attributes altogether. The parsing code there just isn’t as robust as dateutil.parser.parse().
For what it’s worth I’ve never had problems with dateutil’s parsing code, especially for ISO8601 parsing which is usually what feeds use.
September 3rd, 2007 at 9:55 am
Testing translation methods against 195 timestamps...
_ludo: 0
_devex: 0
_kiopi: 0
_anders: 0
_gt: 0
_sam_and_zeb: 0
_garth: 4
All the methods work, except the one I got from the Django community blog. Oops.
Now I’ll have to collect raw timestamps so I can check dateutil versus feedparser…
September 3rd, 2007 at 7:18 pm
If you’re building an aggregator, you might want to have a look at feedcache. When combined with shove, it makes managing feeds for aggregators very easy.
September 16th, 2007 at 10:14 pm
Keep in mind the DST bug could well be in the feed generator. Quite a few will blithely assume UTC = Local Time +/- offset and not factor in DST.
I have often seen feeds with timestamps 1 hour into the future, and added logic to my aggregator, Temboz, to automatically subtract 1 hour if it detects this:
http://www.temboz.com/temboz/fileview?f=temboz/normalize.py
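The general idea can be sketched as follows. This is not Temboz's actual code: the function name, the `now` parameter, and the flat one-hour shift are all assumptions made for illustration.

```python
import datetime


def normalize_future_timestamp(entry_time, now=None,
                               shift=datetime.timedelta(hours=1)):
    """Pull a naive UTC timestamp back by `shift` if it lies in the future."""
    if now is None:
        # Current time as a naive UTC datetime, to match entry_time.
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    if entry_time > now:
        # Assumed cause: the feed generator ignored DST when computing UTC.
        return entry_time - shift
    return entry_time


reference = datetime.datetime(2007, 9, 16, 22, 0, 0)
print(normalize_future_timestamp(datetime.datetime(2007, 9, 16, 22, 30, 0), now=reference))
print(normalize_future_timestamp(datetime.datetime(2007, 9, 16, 21, 0, 0), now=reference))
```

A heuristic like this can only catch entries fetched within an hour of publication. Older entries with the same error would pass through unchanged.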