I'm having trouble storing feedparser results in a Django model.
It's all about timestamps. Feedparser returns timestamps in a standard time nine-tuple, asserting UTC. Django wants datetime objects. So, I'm trying to translate:
django_timestamp = datetime.datetime.fromtimestamp(time.mktime(feedparser_timestamp))
feedparser_timestamp = django_timestamp.utctimetuple()
This works fine for the majority of timestamps, but sometimes translating to datetime and back mutates the timestamp. In turn, that makes get-if-modified-since somewhat unreliable. Here are some examples, from my log file:
WARNING: (2004, 11, 19, 5, 13, 31, 4, 324, 0) => datetime.datetime(2004, 11, 19, 6, 13, 31) => (2004, 11, 19, 6, 13, 31, 4, 324, 0)
WARNING: (2005, 11, 2, 2, 17, 55, 2, 306, 0) => datetime.datetime(2005, 11, 2, 3, 17, 55) => (2005, 11, 2, 3, 17, 55, 2, 306, 0)
WARNING: (2006, 12, 13, 0, 21, 25, 2, 347, 0) => datetime.datetime(2006, 12, 13, 1, 21, 25) => (2006, 12, 13, 1, 21, 25, 2, 347, 0)
WARNING: (2004, 11, 14, 23, 55, 31, 6, 319, 0) => datetime.datetime(2004, 11, 15, 0, 55, 31) => (2004, 11, 15, 0, 55, 31, 0, 320, 0)
I'm off by an hour. I smell a problem with daylight saving time. I just wish I knew what to do about it.
I've waved a dead chicken at this one all the ways I know how. Every change I make breaks the conversion entirely. So, I'm throwing this out to the community in the hope that someone can help me.
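One plausible reading of the log above, sketched under assumptions: `time.mktime()` interprets its argument as local time, and the trailing `0` in the tuple tells it that DST is not in effect. If the local zone is observing DST on that date, the tuple is read as standard time and comes back one hour later. The sketch below reproduces the first log line by temporarily switching the process time zone. The helper names are made up for illustration, the TZ string is the Australian eastern example from the Python `time.tzset()` documentation (the post does not say which zone the server was in), and `time.tzset()` is Unix only.

```python
import datetime
import os
import time


def mktime_round_trip(utc_tuple):
    """The conversion from the post: mktime() reads the tuple as LOCAL time."""
    dt = datetime.datetime.fromtimestamp(time.mktime(utc_tuple))
    return dt, tuple(dt.utctimetuple())


def run_in_zone(tz_string, utc_tuple):
    """Run the round trip with the process time zone temporarily changed."""
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz_string
    time.tzset()  # Unix only
    try:
        return mktime_round_trip(utc_tuple)
    finally:
        # Restore whatever time zone setting was there before.
        if saved is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = saved
        time.tzset()


stamp = (2004, 11, 19, 5, 13, 31, 4, 324, 0)

# No DST in this zone: the tuple survives the round trip.
print(run_in_zone("UTC0", stamp))

# Assumed zone with DST in November: the hour moves from 5 to 6.
print(run_in_zone("AEST-10AEDT-11,M10.5.0,M3.5.0", stamp))
```

If this reading is right, the fix is to avoid `time.mktime()` for tuples that are already in UTC.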
You could try something like this:
>>> dt = datetime.datetime(*fp_time_tuple[:-3])
You get a naive datetime without any timezone information, representing the UTC time tuple you get from feedparser.
Be careful, though: feeds sometimes have no timestamp. I use something like this (data is the entry data from feedparser):
>>> from time import struct_time
>>> date_published = data.get('published_parsed', data.get('updated_parsed'))
>>> if isinstance(date_published, struct_time):
...     date_published = datetime.datetime(*date_published[:-3])
... else:
...     date_published = datetime.datetime.utcnow()
...
>>>
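A minimal sketch of the slicing approach, checked against the four timestamps from the post's log. The helper names `to_naive_utc` and `to_time_tuple` are invented for illustration, and the sketch assumes the nine-tuple really is UTC. For a nine-field tuple, `[:6]` selects the same fields as `[:-3]`.

```python
import datetime


def to_naive_utc(fp_time_tuple):
    """Build a naive datetime from the first six fields (year to second)."""
    return datetime.datetime(*fp_time_tuple[:6])


def to_time_tuple(dt):
    """Convert back; for a naive datetime, utctimetuple() applies no offset."""
    return tuple(dt.utctimetuple())


# The "before" tuples from the log in the post.
logged = [
    (2004, 11, 19, 5, 13, 31, 4, 324, 0),
    (2005, 11, 2, 2, 17, 55, 2, 306, 0),
    (2006, 12, 13, 0, 21, 25, 2, 347, 0),
    (2004, 11, 14, 23, 55, 31, 6, 319, 0),
]

for stamp in logged:
    # Neither direction consults the local time zone, so nothing shifts.
    assert to_time_tuple(to_naive_utc(stamp)) == stamp
```

Because no step looks at the machine's time zone, the round trip should not depend on where the code runs.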
try this code:
django_timestamp = datetime.datetime(*feedparser_timestamp[:7])
This seems to work for me, although I don’t convert it back to struct_time. Maybe I’m having the same problem without knowing it.
django_timestamp = datetime.datetime( *feedparser_timestamp[:6] )
Greetings
I just wrote a function like so
def struct_to_datetime(struct):
    return datetime.datetime(struct[0], struct[1], struct[2], struct[3], struct[4], struct[5], tzinfo=utc)
using pytz to convert to the timezone I wanted. This seems to have worked fine for the project I did with it, and I didn’t notice any serious anomalies on my side of things (I did notice anomalies in people’s feeds).
When you make a datetime out of the 9-tuple, you need to make it non-naive; if you give it a tzinfo that makes it aware that it’s in UTC, you can then safely convert it to other timezones or emit it in whatever formats you like, and DST should be taken care of. I’m a big, big fan of dateutil for stuff like this as it has comprehensive timezone support.
I don’t have much Django experience, though, so I’m not sure what your model will make of a non-naive datetime…
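A small sketch of the aware-datetime idea, using only the standard library's `datetime.timezone` rather than pytz or dateutil. The function name `to_aware_utc` is hypothetical, and the fixed +11:00 offset is a stand-in for a real zone, chosen only to keep the example self-contained; a real zone database would be needed for DST rules.

```python
import datetime

UTC = datetime.timezone.utc


def to_aware_utc(fp_time_tuple):
    """Attach UTC explicitly so later conversions know the starting zone."""
    return datetime.datetime(*fp_time_tuple[:6], tzinfo=UTC)


stamp = (2004, 11, 19, 5, 13, 31, 4, 324, 0)
aware = to_aware_utc(stamp)

# Convert to an assumed fixed +11:00 offset for display.
local = aware.astimezone(datetime.timezone(datetime.timedelta(hours=11)))
print(local.isoformat())  # 2004-11-19T16:13:31+11:00

# utctimetuple() on an aware datetime subtracts the offset again,
# so the original UTC tuple is recovered.
assert tuple(local.utctimetuple()) == stamp
```

The point of the design is that the offset travels with the value, so converting for display does not change the instant being stored.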
Try setting the DST flag to '-1' when creating the timestamp. It should look something like this… see if it works.
django_timestamp = datetime.datetime.fromtimestamp(
    time.mktime(
        feedparser_timestamp[0:8] + (-1,)))
feedparser_timestamp = django_timestamp.utctimetuple()
I’m gathering unique timestamps, and will try all these methods. Thanks!
Anders, how is `utc` defined?
garth, utc should be pytz.UTC
Mine looks like this:
if hasattr(content, 'modified') and content.modified != None:
    feed.lastModified = datetime.datetime.utcfromtimestamp(calendar.timegm(content.modified))
Try again with formatting …
if hasattr(content, 'modified') and \
        content.modified is not None:
    feed.lastModified = \
        datetime.datetime.utcfromtimestamp(
            calendar.timegm(content.modified))
datetime.datetime.utcfromtimestamp(calendar.timegm(feedparser_timestamp))
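A short sketch of why the `calendar.timegm()` route avoids the local zone: `timegm()` is the inverse of `time.gmtime()`, whereas `time.mktime()` is the inverse of `time.localtime()`. The function name `tuple_to_datetime` is made up here. The sketch uses `fromtimestamp(..., tz=...)` because `utcfromtimestamp()` is deprecated in recent Python versions, although it still gives the same naive result.

```python
import calendar
import datetime
import time


def tuple_to_datetime(utc_tuple):
    """Convert a UTC time tuple to a naive datetime that holds UTC."""
    epoch = calendar.timegm(utc_tuple)  # reads the tuple as UTC, not local
    aware = datetime.datetime.fromtimestamp(epoch, tz=datetime.timezone.utc)
    return aware.replace(tzinfo=None)   # drop tzinfo for a naive model field


stamp = (2004, 11, 19, 5, 13, 31, 4, 324, 0)
epoch = calendar.timegm(stamp)

# gmtime() undoes timegm() exactly, whatever the machine's time zone is.
assert tuple(time.gmtime(epoch)) == stamp
print(tuple_to_datetime(stamp))  # 2004-11-19 05:13:31
```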
I think I ended up scratching the use of the feedparser _parsed attributes altogether. The parsing code there just isn’t as robust as dateutil.parser.parse().
For what it’s worth I’ve never had problems with dateutil’s parsing code, especially for ISO8601 parsing which is usually what feeds use.
Testing translation methods against 195 timestamps...
_ludo: 0
_devex: 0
_kiopi: 0
_anders: 0
_gt: 0
_sam_and_zeb: 0
_garth: 4
All the methods work, except the one I got from the Django community blog. Oops.
Now I’ll have to collect raw timestamps so I can check dateutil versus feedparser…
If you’re building an aggregator, you might want to have a look at feedcache. When combined with shove, it makes managing feeds for aggregators very easy.
Keep in mind the DST bug could well be in the feed generator. Quite a few will blithely assume UTC = Local Time +/- offset and not factor in DST.
I have often seen feeds with timestamps 1 hour into the future, and added logic to my aggregator, Temboz, to automatically subtract 1 hour if it detects this:
http://www.temboz.com/temboz/fileview?f=temboz/normalize.py
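The heuristic described above might look roughly like the sketch below. This is not Temboz's actual code, which is not quoted here; the function name, the one-hour window, and the decision to leave larger discrepancies untouched are all assumptions made for illustration.

```python
import datetime

ONE_HOUR = datetime.timedelta(hours=1)


def pull_back_if_future(published, now):
    """Subtract an hour from a timestamp that is up to one hour ahead of now.

    Both arguments are assumed to be naive datetimes holding UTC.
    """
    if now < published <= now + ONE_HOUR:
        return published - ONE_HOUR
    # Past timestamps, and ones far in the future, are left unchanged.
    return published


now = datetime.datetime(2008, 1, 1, 12, 0, 0)
print(pull_back_if_future(datetime.datetime(2008, 1, 1, 12, 30, 0), now))
# 2008-01-01 11:30:00
```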
Hi! I’ve had the same one-hour-delay problem. You just have to use your timezone:
import datetime, time
datetime.datetime(2004, 11, 19, 5, 13, 31) + datetime.timedelta(seconds=time.timezone)