BUG: Period constructor does not take into account nanonseconds #34621

Closed
3 tasks done
OlivierLuG opened this issue Jun 6, 2020 · 6 comments · Fixed by #38175
Labels
Bug, Frequency, Period
Milestone

Comments

@OlivierLuG
Contributor

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example:

import pandas as pd

print(pd.Period("2000-12-31 23:59:59.000000999").to_timestamp().nanosecond)
print(pd.Period("2000-12-31 23:59:59.000000500").to_timestamp().nanosecond)
print(pd.Period("2000-12-31 23:59:59.000000001").to_timestamp().nanosecond)
print(pd.Period("2000-12-31 23:59:59.000000001").to_timestamp(freq='ns').nanosecond)

output:

0
0
0
0

Problem description

Nanoseconds are set to 0 when using the to_timestamp function.

Expected Output

999
500
1
1

Output of pd.show_versions()

INSTALLED VERSIONS

commit : c71bfc3
python : 3.8.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-33-generic
Version : #37-Ubuntu SMP Thu May 21 12:53:59 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8

pandas : 1.1.0.dev0+1767.gc71bfc362
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.7.3
pip : 20.0.2
setuptools : 45.2.0
Cython : None
pytest : 5.4.2
hypothesis : 5.16.0
sphinx : 3.0.3
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.17
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

@OlivierLuG OlivierLuG added the Bug and Needs Triage labels Jun 6, 2020
@OlivierLuG
Contributor Author

I went through this bug. I believe it is related to dateutil:

# file 'pandas/_libs/tslibs/parsing.pyx'
from dateutil.parser import DEFAULTPARSER
res, _ = DEFAULTPARSER._parse("2000-12-31 23:59:59.123456999", dayfirst=False, yearfirst=False)
print(res)

out: _result(year=2000, month=12, day=31, hour=23, minute=59, second=59, microsecond=123456)

Problem description:

The function DEFAULTPARSER._parse does not return a nanosecond attribute.

I will try to make a PR at the dateutil repo.
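
To illustrate the gap described above, here is a minimal sketch of how the digits beyond microsecond precision could be recovered from the input string. The helper `split_fraction` is hypothetical and is not part of pandas or dateutil; it assumes the fractional seconds follow a `:SS.` pattern in the string.

```python
import re

# Hypothetical helper for illustration only; not a pandas or dateutil API.
# Assumes the fractional part directly follows a two-digit seconds field.
_FRACTION_RE = re.compile(r":\d{2}\.(\d+)")


def split_fraction(text):
    """Return (microsecond, nanosecond) from the fractional seconds in text."""
    match = _FRACTION_RE.search(text)
    if match is None:
        return 0, 0
    # Pad or truncate to exactly nine digits (nanosecond resolution).
    digits = match.group(1).ljust(9, "0")[:9]
    # First six digits are microseconds, the last three are nanoseconds.
    return int(digits[:6]), int(digits[6:])


print(split_fraction("2000-12-31 23:59:59.123456999"))  # (123456, 999)
```

The first element matches the microsecond value that DEFAULTPARSER._parse reports above; the second is the part that gets dropped.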

@matteosantama
Contributor

Do we have to use the dateutil parser? Or can we directly parse into a pandas Timestamp?

@OlivierLuG
Contributor Author

I've managed to obtain the Timestamp and its ordinal value, but the frequency is fixed to ns. Is there an easy way to deduce the frequency knowing a timestamp?
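
One possible approach to the question above, sketched under assumptions: pick the coarsest sub-second resolution that still represents the value exactly, based on which components are non-zero. The function `infer_resolution` and the returned labels are illustrative only (they mirror the aliases used in this thread) and are not how pandas is confirmed to do it.

```python
def infer_resolution(microsecond, nanosecond):
    """Return the coarsest label that still represents the value exactly.

    Hypothetical helper; the labels are illustrative, not a pandas API.
    """
    if nanosecond:
        return "ns"
    if microsecond % 1000:
        # Sub-millisecond digits are present, so microseconds are needed.
        return "us"
    if microsecond:
        return "ms"
    return "S"


print(infer_resolution(0, 999))      # ns
print(infer_resolution(123456, 0))   # us
print(infer_resolution(0, 0))        # S
```

A value-based rule like this cannot tell "23:59:59.000000000" apart from "23:59:59", so a string-based rule (counting the digits that were written) may be preferable when the input text is still available.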

@mroeschke
Member

I don't think this is a bug. In to_timestamp, the how argument specifies whether to return the start or end timestamp:

In [9]: pd.Period("2000-12-31 23:59:59.000000999").to_timestamp().nanosecond
Out[9]: 0

In [10]: pd.Period("2000-12-31 23:59:59.000000999").to_timestamp(how='S').nanosecond
Out[10]: 0

In [11]: pd.Period("2000-12-31 23:59:59.000000999").to_timestamp(how='E').nanosecond
Out[11]: 999
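
A note on reading Out[11]: the 999 is most likely not the nanoseconds from the input string. Assuming the end of a period is the last nanosecond before the next period starts, any one-second period ends with a nanosecond component of 999. The arithmetic can be sketched with plain integers (the function name is hypothetical):

```python
NS_PER_SECOND = 1_000_000_000


def end_components(start_ns, span_ns):
    """Return (microsecond, nanosecond) at the last instant of [start, start + span)."""
    # Assumption: the end is one nanosecond before the next span begins.
    end_ns = start_ns + span_ns - 1
    fraction_ns = end_ns % NS_PER_SECOND
    # Split the sub-second part into whole microseconds and leftover nanoseconds.
    return divmod(fraction_ns, 1000)


# A one-second span starting on a whole second:
print(end_components(59 * NS_PER_SECOND, NS_PER_SECOND))  # (999999, 999)
```

Under this assumption the same 999 would appear for the inputs ending in 500 or 001, so it does not show that the nanoseconds were preserved.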

@OlivierLuG
Contributor Author

Yes, this bug is not linked to the to_timestamp function. The Period constructor loses nanoseconds:

# the constructor loses the nanosecond value:
In [1]:  print(pd.Period("2000/12/31 23:59:59.000000999", freq='ns'))
Out[1]: 2000-12-31 23:59:59.000000000

# This one works correctly:
In [2]:  print(pd.Period("2000/12/31 23:59:59.000000999", freq='us'))
Out[2]: 2000-12-31 23:59:59.000000

# the constructor loses the nanosecond value and assumes the frequency is second
In [3]:  print(pd.Period("2000/12/31 23:59:59.000000999"))
Out[3]: 2000-12-31 23:59:59

The reason is that the constructor calls dateutil's DEFAULTPARSER._parse, which does not support nanoseconds.

The expected outputs are:

Out[1]: 2000-12-31 23:59:59.000000999
Out[2]: 2000-12-31 23:59:59.000000
Out[3]: 2000-12-31 23:59:59.000000999

@OlivierLuG OlivierLuG changed the title from "BUG: Nanoseconds are set to 0 when converting Period to Timestamp" to "BUG: Period constructor do not take into account nanonseconds" Jun 12, 2020
@OlivierLuG OlivierLuG changed the title from "BUG: Period constructor do not take into account nanonseconds" to "BUG: Period constructor does not take into account nanonseconds" Jun 12, 2020
@mroeschke
Member

Ah I see. Thanks for clarifying.

@mroeschke mroeschke added the Frequency and Period labels and removed the Needs Triage label Jun 12, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue Jun 12, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue Jun 12, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue Jun 15, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue Jun 15, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue Jun 16, 2020
@jreback jreback added this to the 1.2 milestone Dec 2, 2020
4 participants