ERR: improved pd.Period date parsing and formatting #13931

Closed
@smontanaro

It seems that pandas.Period has a few bugs w.r.t. date parsing and formatting, or at least poorly documented behavior.

  • The documentation makes no mention of the possible range of dates/years.
  • Formatting of years in strftime using the %Y format seems inconsistent.
  • Parsing of what is commonly assumed to be ISO-8601 dates seems wrong.

Code Sample, a copy-pastable example if possible

This seems reasonable, assuming the American date convention (m/d/y):
>>> pd.Period("1/2/3")
Period('2003-01-02', 'D')

This seems wrong, since in my experience dates that use hyphens as separators are commonly read as ISO-8601 (though I will grant that omitting the required leading zeroes makes the input suspect):

>>> pd.Period("1-2-3")
Period('2003-01-02', 'D')

It is hard to see how either of the results below is correct. I'm not quite sure what to expect for the first example, but the second example clearly reads as year=3, month=1, day=2 to me. Regardless of whether leading zeroes are present, I would think that a date with hyphens as separators should be assumed to follow %Y-%m-%d. I also think that %Y, %m, and %d should zero-pad their values to the correct widths (4, 2, and 2, respectively). (The same could be said for other normally zero-padded timestamp fields.)

>>> pd.Period("01-02-0003")
Period('3-01-02', 'D')
>>> pd.Period("0003-01-02")
Period('2-03-01', 'D')
>>> pd.Period("0003-01-02").strftime("%Y")
u'2'
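
For comparison, the fixed-width output argued for above can be sketched with plain string formatting. This is only an illustration of the expected padding, not how pandas formats periods internally; the helper name format_ymd is hypothetical.

```python
def format_ymd(year, month, day):
    """Format date fields as YYYY-MM-DD with fixed, zero-padded widths (4, 2, 2)."""
    return "{:04d}-{:02d}-{:02d}".format(year, month, day)


# Small years keep their leading zeroes instead of collapsing to one digit.
print(format_ymd(3, 1, 2))     # 0003-01-02
print(format_ymd(2003, 1, 2))  # 2003-01-02
```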

Given that date formats differ so widely, perhaps the Period constructor should take an optional keyword argument that specifies a format string as understood by strptime(), which would eliminate the ambiguity. The format syntax would clearly have to be extended somewhat to accommodate the quarter notation.
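
A rough sketch of what such explicit-format parsing could look like from user code, assuming the standard library's datetime.strptime does the parsing and the resulting fields are handed to the year/month/day keywords of the Period constructor. The helper names parse_fields and period_from_format are hypothetical and not part of pandas, and quarter notation is not handled here.

```python
from datetime import datetime


def parse_fields(text, fmt):
    """Parse `text` with an explicit strptime format; return (year, month, day)."""
    parsed = datetime.strptime(text, fmt)
    return parsed.year, parsed.month, parsed.day


def period_from_format(text, fmt, freq="D"):
    """Hypothetical helper: build a Period from explicitly parsed fields."""
    import pandas as pd  # imported lazily so parse_fields works without pandas

    year, month, day = parse_fields(text, fmt)
    return pd.Period(year=year, month=month, day=day, freq=freq)


# The declared format, not a guess, decides which field is which.
print(parse_fields("0003-01-02", "%Y-%m-%d"))  # (3, 1, 2)
print(parse_fields("01-02-0003", "%m-%d-%Y"))  # (3, 1, 2)
print(parse_fields("01-02-2003", "%d-%m-%Y"))  # (2003, 2, 1)
```

With this approach the caller states the field order up front, so "01-02-2003" can be read as either January 2 or February 1 without the parser having to guess.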

output of pd.show_versions()

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.4.63-2.44-desktop
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
