Recently I had to deal with dateutil's parser. Apparently it is very powerful and lots of people swear by it, but I managed to bring it to its knees with this:
>>> dparser.parse("P 16:08 May 14, 2003 UTC", fuzzy=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.6/dateutil/parser.py", line 697, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/lib/pymodules/python2.6/dateutil/parser.py", line 301, in parse
    res = self._parse(timestr, **kwargs)
  File "/usr/lib/pymodules/python2.6/dateutil/parser.py", line 557, in _parse
    res.hour += 12
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'
I am not sure what the fuck it is even doing...
Some form of AI possibly.
>>> dateutil.parser.parse("16:08 May 14, 2003 UTC", fuzzy=True)
datetime.datetime(2003, 5, 14, 16, 8, tzinfo=tzutc())
It doesn't like the first P in your string.
One of the examples in the dateutil documentation has no problem picking out the date from a long natural-language string. How is a P an edge case?
ISO 8601 uses an initial P as a duration designator, indicating that the rest of the string represents an interval rather than an absolute time. (I only know this from seeing such strings in SCORM results. Crazy obscure standards...)
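To make the P-as-duration idea concrete, here is a hedged sketch that recognises strings in the ISO 8601 duration form PnYnMnDTnHnMnS, so they could be routed away from an absolute-date parser. `parse_iso_duration` is a hypothetical helper, not part of dateutil, and it deliberately covers only integer components (no week form like P2W, no fractional values).

```python
import re

# Hypothetical helper, not part of dateutil. Integer components only;
# the week form ("P2W") and fractional values are not handled.
_ISO_DURATION = re.compile(
    r"""^P                      # duration designator
        (?:(?P<years>\d+)Y)?
        (?:(?P<months>\d+)M)?   # M before T means months
        (?:(?P<days>\d+)D)?
        (?:T(?=\d)              # time designator, must be followed by a digit
            (?:(?P<hours>\d+)H)?
            (?:(?P<minutes>\d+)M)?   # M after T means minutes
            (?:(?P<seconds>\d+)S)?
        )?$""",
    re.VERBOSE,
)


def parse_iso_duration(text):
    """Return a dict of duration components, or None if text is not a duration."""
    match = _ISO_DURATION.match(text.strip())
    if match is None:
        return None
    parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    # A bare "P" matches the pattern but carries no components: reject it.
    return parts or None


# The string from the question is not a duration, so this prints None.
print(parse_iso_duration("P 16:08 May 14, 2003 UTC"))
print(parse_iso_duration("PT1H30M"))
```

The M ambiguity (months vs minutes) is resolved purely by position relative to the T, which is how the duration format distinguishes them.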
The fact that dateutil fails to recover properly is a bug, but that at least explains why this particular string is an edge case, and might aid the debugging effort.
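The traceback also shows where the recovery breaks down. `res.hour += 12` is the PM adjustment, and dateutil's default `parserinfo` lists "p" as a short alias for "pm", so the lone P appears to be taken as a PM marker before any hour has been read, while `res.hour` is still None. The block below is a simplified, hypothetical model of that one step (the names `Result`, `apply_ampm_unguarded` and `apply_ampm` are made up, not dateutil's), showing the failure and the kind of guard that avoids it. Newer dateutil releases may already handle this case, so check against the version you actually run.

```python
# Simplified, hypothetical model of the AM/PM step; NOT dateutil's real code.
# It only mirrors the operation that fails in the traceback (res.hour += 12).
AMPM_ALIASES = {"am": 0, "a": 0, "pm": 1, "p": 1}  # 0 = AM, 1 = PM


class Result(object):
    """Stand-in for the parser's partial result; nothing parsed yet."""

    def __init__(self):
        self.hour = None


def apply_ampm_unguarded(res, token):
    """Assume an hour was already parsed, as the failing code path does."""
    if AMPM_ALIASES.get(token.lower()) == 1:
        res.hour += 12  # TypeError if res.hour is still None


def apply_ampm(res, token, fuzzy=True):
    """Guarded version: return True only if token was used as AM/PM."""
    value = AMPM_ALIASES.get(token.lower())
    if value is None:
        return False  # not an AM/PM word at all
    if res.hour is None:
        if fuzzy:
            return False  # no hour yet: treat the token as noise and skip it
        raise ValueError("AM/PM marker %r with no hour" % token)
    if value == 1 and res.hour < 12:
        res.hour += 12  # e.g. 4 PM -> 16
    elif value == 0 and res.hour == 12:
        res.hour = 0  # 12 AM -> 0
    return True


try:
    apply_ampm_unguarded(Result(), "P")  # the lone P from the question
except TypeError as exc:
    print("unguarded:", exc)
print("guarded:", apply_ampm(Result(), "P"))  # False: P is skipped as noise
```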
@Eric:
That was very helpful, thanks. I used some regex magic to clean up my data and it works better now. Sorry for the harsh words in the post; that was uncalled for.
S.P
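The actual regex isn't shown above, so this is only a minimal sketch of one possible cleanup, assuming the junk is a single standalone letter at the start of the string, like the P in the question. `clean_timestamp` is a hypothetical helper name.

```python
import re

# Hypothetical cleanup (the regex actually used in the thread isn't shown):
# remove one standalone letter at the start of the string, such as the "P"
# in "P 16:08 May 14, 2003 UTC". A letter followed by anything other than
# whitespace ("PM", "P3Y", "May") is left untouched.
_LEADING_LONE_LETTER = re.compile(r"^\s*[A-Za-z](?=\s)\s*")


def clean_timestamp(text):
    """Strip a single leading lone letter and the whitespace after it."""
    return _LEADING_LONE_LETTER.sub("", text, count=1)


print(clean_timestamp("P 16:08 May 14, 2003 UTC"))  # 16:08 May 14, 2003 UTC
print(clean_timestamp("May 14, 2003 16:08"))         # unchanged
```

With python-dateutil installed, the cleaned string is exactly the one that parsed fine earlier in the thread:

>>> dateutil.parser.parse(clean_timestamp("P 16:08 May 14, 2003 UTC"), fuzzy=True)
datetime.datetime(2003, 5, 14, 16, 8, tzinfo=tzutc())

This would also drop a leading letter you wanted to keep, so it only makes sense if your data is known to carry this kind of junk prefix.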