Django vs feedparser on dates

by garth on September 1, 2007

I'm having trouble storing feedparser results in a Django model.

It's all about timestamps. Feedparser returns timestamps in a standard time nine-tuple, asserting UTC. Django wants datetime objects. So, I'm trying to translate:

django_timestamp = datetime.datetime.fromtimestamp(time.mktime(feedparser_timestamp))

feedparser_timestamp = django_timestamp.utctimetuple()

This works fine for the majority of timestamps, but sometimes translating to datetime and back mutates the timestamp. In turn, that makes get-if-modified-since somewhat unreliable. Here are some examples, from my log file:

WARNING: (2004, 11, 19, 5, 13, 31, 4, 324, 0) => datetime.datetime(2004, 11, 19, 6, 13, 31) => (2004, 11, 19, 6, 13, 31, 4, 324, 0)

WARNING: (2005, 11, 2, 2, 17, 55, 2, 306, 0) => datetime.datetime(2005, 11, 2, 3, 17, 55) => (2005, 11, 2, 3, 17, 55, 2, 306, 0)

WARNING: (2006, 12, 13, 0, 21, 25, 2, 347, 0) => datetime.datetime(2006, 12, 13, 1, 21, 25) => (2006, 12, 13, 1, 21, 25, 2, 347, 0)

WARNING: (2004, 11, 14, 23, 55, 31, 6, 319, 0) => datetime.datetime(2004, 11, 15, 0, 55, 31) => (2004, 11, 15, 0, 55, 31, 0, 320, 0) 

I'm off by an hour. I smell a problem with daylight saving time. I just wish I knew what to do about it.

I've waved a dead chicken at this one all the ways I know how. Every change I make breaks the conversion entirely. So, I'm throwing this out to the community in the hope that someone can help me. 

{ 1 trackback }

Sam Ruby
09.03.07 at 1:22 am

{ 17 comments }

ludo 09.01.07 at 10:21 pm

You could try something like this:

>>> dt = datetime.datetime(*fp_time_tuple[:-3])

You get a naive datetime without any timezone information, representing the UTC time tuple you get from feedparser.

ludo 09.01.07 at 10:24 pm

Be careful, though: feeds sometimes have no timestamp. I use something like this (data is the entry data from feedparser):

>>> import datetime
>>> from time import struct_time
>>> date_published = data.get('published_parsed', data.get('updated_parsed'))
>>> if isinstance(date_published, struct_time):
...     date_published = datetime.datetime(*date_published[:-3])
... else:
...     date_published = datetime.datetime.utcnow()
>>>

DevEx 09.01.07 at 11:27 pm

try this code:

django_timestamp = datetime.datetime(*feedparser_timestamp[:7])

kioopi 09.01.07 at 11:47 pm

This seems to work for me, although I don’t convert it back to struct_time. Maybe I’m having the same problem without knowing it.


django_timestamp = datetime.datetime( *feedparser_timestamp[:6] )

Greetings
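The slicing suggestions above all rely on the first six fields of the nine-tuple being year, month, day, hour, minute, and second. Here is a minimal sketch of the round trip, using one of the timestamps from the log; the helper names are made up for illustration. Index 6 of the tuple is the weekday, which datetime.datetime would read as microseconds, so the sketch stops at six fields.

```python
import datetime
import time


def struct_to_naive_utc(st):
    """Build a naive datetime from the first six fields of a UTC time tuple."""
    return datetime.datetime(*st[:6])


def naive_utc_to_struct(dt):
    """Turn a naive datetime (assumed to hold UTC) back into a time tuple."""
    return dt.utctimetuple()


# One of the timestamps from the log in the post.
original = time.struct_time((2004, 11, 19, 5, 13, 31, 4, 324, 0))
as_datetime = struct_to_naive_utc(original)
round_tripped = naive_utc_to_struct(as_datetime)
```

Neither direction consults the local timezone, so the result should not depend on where the code runs.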

Anders Conbere 09.02.07 at 3:41 am

I just wrote a function like so

def struct_to_datetime(struct): return datetime.datetime(struct[0], struct[1], struct[2], struct[3], struct[4], struct[5], tzinfo=utc)

using pytz to convert to the timezone I wanted. This seems to have worked fine for the project I did with it, and I didn’t notice any serious anomalies on my side of things (I did notice anomalies in people’s feeds).

Mike Pirnat 09.02.07 at 3:52 am

When you make a datetime out of the 9-tuple, you need to make it non-naive; if you give it a tzinfo that makes it aware that it’s in UTC, you can then safely convert it to other timezones or emit it in whatever formats you like, and DST should be taken care of. I’m a big, big fan of dateutil for stuff like this as it has comprehensive timezone support.

I don’t have much Django experience, though, so I’m not sure what your model will make of a non-naive datetime…

gt 09.02.07 at 4:10 am

Try setting the DST flag to '-1' when creating the timestamp. It should look something like this… see if it works.


django_timestamp = datetime.datetime.fromtimestamp(
    time.mktime(
        feedparser_timestamp[0:8] + (-1,)))

feedparser_timestamp = django_timestamp.utctimetuple()
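Some background on why the DST flag matters: time.mktime reads the tuple as local time, and the last field tells it whether DST applies (0 for no, 1 for yes, -1 for "work it out"). The tuples in the log carry 0, so in a zone where DST is in effect on that date the wall-clock time can plausibly come back an hour later. A rough sketch of the difference follows; local_round_trip is a hypothetical helper name, and what the second call returns depends on the machine's timezone.

```python
import time


def local_round_trip(st, isdst=-1):
    """Push a time tuple through mktime/localtime with the given DST flag."""
    fields = tuple(st[:8]) + (isdst,)
    return time.localtime(time.mktime(fields))


stamp = (2004, 11, 19, 5, 13, 31, 4, 324, 0)

# With -1 the C library decides whether DST applies on that date.
guessed = local_round_trip(stamp, isdst=-1)

# With 0 the tuple is treated as standard time; in a zone observing DST on
# that date, the fields that come back may differ from the input by an hour.
forced_standard = local_round_trip(stamp, isdst=0)
```

Either way the tuple is still being treated as local time rather than UTC, so this only keeps the round trip self-consistent.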

garth 09.02.07 at 12:50 pm

I’m gathering unique timestamps, and will try all these methods. Thanks!

Anders, how is `utc` defined?

ludo 09.02.07 at 8:58 pm

garth, utc should be pytz.UTC
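A sketch of the timezone-aware approach Anders and Mike describe. It uses datetime.timezone.utc from the standard library in place of pytz.UTC (an assumption, made to keep the example self-contained), and a fixed -5 hour offset stands in for a real local timezone.

```python
import datetime

UTC = datetime.timezone.utc


def struct_to_aware(st):
    """Build a UTC-aware datetime from the first six fields of a time tuple."""
    return datetime.datetime(*st[:6], tzinfo=UTC)


stamp = (2004, 11, 14, 23, 55, 31, 6, 319, 0)
aware = struct_to_aware(stamp)

# Hypothetical local zone: a fixed offset, so no DST rules are involved here.
local_zone = datetime.timezone(datetime.timedelta(hours=-5))
local_view = aware.astimezone(local_zone)

# Converting back to UTC fields does not depend on the zone being displayed.
back = tuple(local_view.utctimetuple())
```

Whether a Django model field accepts an aware datetime is a separate question that this sketch does not address.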

Zebziggle 09.02.07 at 11:02 pm

Mine looks like this:


if hasattr(content, 'modified') and content.modified is not None:
    feed.lastModified = datetime.datetime.utcfromtimestamp(calendar.timegm(content.modified))

Zebziggle 09.02.07 at 11:04 pm

Try again with formatting …


if hasattr(content, 'modified') and \
        content.modified is not None:
    feed.lastModified = \
        datetime.datetime.utcfromtimestamp(
            calendar.timegm(content.modified))
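calendar.timegm is the UTC counterpart of time.mktime: it reads the tuple as UTC and does not consult the local timezone. A sketch of the full round trip follows. It uses datetime.fromtimestamp with an explicit UTC tzinfo rather than utcfromtimestamp, a substitution made here because newer Python versions deprecate utcfromtimestamp; struct_to_epoch is a made-up helper name.

```python
import calendar
import datetime
import time


def struct_to_epoch(st):
    """Seconds since the epoch for a UTC time tuple, independent of local TZ."""
    return calendar.timegm(st)


stamp = (2006, 12, 13, 0, 21, 25, 2, 347, 0)
epoch = struct_to_epoch(stamp)

# Aware datetime in UTC; drop tzinfo with .replace(tzinfo=None) if a naive
# value is needed for storage.
as_datetime = datetime.datetime.fromtimestamp(epoch, tz=datetime.timezone.utc)

# time.gmtime is the inverse of calendar.timegm.
back = tuple(time.gmtime(epoch))
```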

Sam Ruby 09.03.07 at 1:26 am

mikeal 09.03.07 at 7:38 am

I think I ended up scrapping the use of feedparser’s _parsed attributes altogether. The parsing code there just isn’t as robust as dateutil.parser.parse().

For what it’s worth, I’ve never had problems with dateutil’s parsing code, especially for ISO 8601 parsing, which is usually what feeds use.

garth 09.03.07 at 9:55 am


Testing translation methods against 195 timestamps...
_ludo: 0
_devex: 0
_kiopi: 0
_anders: 0
_gt: 0
_sam_and_zeb: 0
_garth: 4

All the methods work, except the one I got from the Django community blog. Oops.
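A comparison like the one above can be sketched as follows. This is not the script behind the numbers in this comment; the method table, the helper names, and the four sample timestamps (taken from the log in the post) are illustrative assumptions. Each method converts a tuple to a datetime, the harness converts it back with utctimetuple(), and it counts how many tuples come back changed.

```python
import calendar
import datetime
import time

SAMPLES = [
    (2004, 11, 19, 5, 13, 31, 4, 324, 0),
    (2005, 11, 2, 2, 17, 55, 2, 306, 0),
    (2006, 12, 13, 0, 21, 25, 2, 347, 0),
    (2004, 11, 14, 23, 55, 31, 6, 319, 0),
]


def via_slice(st):
    # Use the first six fields directly; no timezone is consulted.
    return datetime.datetime(*st[:6])


def via_timegm(st):
    # Go through epoch seconds, treating the tuple as UTC.
    aware = datetime.datetime.fromtimestamp(
        calendar.timegm(st), tz=datetime.timezone.utc)
    return aware.replace(tzinfo=None)


def via_mktime(st):
    # The conversion from the post; its result depends on the local timezone.
    return datetime.datetime.fromtimestamp(time.mktime(st))


METHODS = {"slice": via_slice, "timegm": via_timegm, "mktime": via_mktime}


def count_mismatches(convert, samples):
    """Count tuples that do not survive tuple -> datetime -> tuple."""
    return sum(
        1 for st in samples
        if tuple(convert(st).utctimetuple()) != tuple(st))


results = {name: count_mismatches(fn, SAMPLES) for name, fn in METHODS.items()}
for name, failures in results.items():
    print(f"{name}: {failures}")
```

The slice and timegm counts should be zero anywhere; the mktime count depends on the timezone of the machine running it.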

Now I’ll have to collect raw timestamps so I can check dateutil versus feedparser…

Doug Hellmann 09.03.07 at 7:18 pm

If you’re building an aggregator, you might want to have a look at feedcache. When combined with shove, it makes managing feeds for aggregators very easy.

Fazal Majid 09.16.07 at 10:14 pm

Keep in mind the DST bug could well be in the feed generator. Quite a few will blithely assume UTC = Local Time +/- offset and not factor in DST.

I have often seen feeds with timestamps 1 hour into the future, and added logic to my aggregator, Temboz, to automatically subtract 1 hour if it detects this:

http://www.temboz.com/temboz/fileview?f=temboz/normalize.py
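For readers who do not want to dig through the linked file, the idea can be sketched roughly like this. It is a hypothetical illustration and not Temboz's actual code: the function name, the one-hour window, and the choice to leave timestamps further in the future untouched are all assumptions.

```python
import datetime

ONE_HOUR = datetime.timedelta(hours=1)


def shift_back_if_future(stamp, now):
    """Subtract one hour from a timestamp that is up to an hour in the future.

    Both arguments are naive datetimes assumed to be in UTC. Timestamps in the
    past, or more than an hour ahead, are returned unchanged.
    """
    if now < stamp <= now + ONE_HOUR:
        return stamp - ONE_HOUR
    return stamp


now = datetime.datetime(2007, 9, 16, 22, 0, 0)
fixed = shift_back_if_future(datetime.datetime(2007, 9, 16, 22, 30, 0), now)
untouched = shift_back_if_future(datetime.datetime(2007, 9, 16, 21, 0, 0), now)
```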

Rafael 12.11.08 at 7:46 am

Hi! I had the same problem with a one-hour offset. You just have to apply your timezone offset:

import datetime, time

datetime.datetime(2004, 11, 19, 5, 13, 31) + datetime.timedelta(seconds=time.timezone)
