
BUG: Timedelta input string with only symbols and no digits failed to raise an error #39710


Closed
3 tasks done
avinashpancham opened this issue Feb 9, 2021 · 1 comment · Fixed by #39737
Labels: Bug, Error Reporting (Incorrect or improved errors from pandas), Timedelta (Timedelta data type)


avinashpancham commented Feb 9, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


pd.Timedelta accepts string input that contains no digits: for example, pd.Timedelta('-') is parsed without error and returns Timedelta('0 days 00:00:00'). In the discussion with @jreback in #39497 it was concluded that this is unwanted and that such input should raise an error instead.

I found that this happens for strings made up of '+', '-', ',' and ' ' in various combinations:

In [1]: pd.Timedelta('+')
Out[1]: Timedelta('0 days 00:00:00')

In [2]: pd.Timedelta('-')
Out[2]: Timedelta('0 days 00:00:00')

In [3]: pd.Timedelta(' ')
Out[3]: Timedelta('0 days 00:00:00')

In [4]: pd.Timedelta(',')
Out[4]: Timedelta('0 days 00:00:00')

In [5]: pd.Timedelta('++')
Out[5]: Timedelta('0 days 00:00:00')

In [6]: pd.Timedelta('+-')
Out[6]: Timedelta('0 days 00:00:00')
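The cases above could be turned into a small regression check. The following is only a sketch: `accepted_inputs` and `SYMBOL_ONLY_INPUTS` are hypothetical names, not part of pandas, and the helper takes the parser as an argument so that it does not depend on any particular pandas version.

```python
from typing import Callable, Iterable, List

# The symbol-only inputs listed in this report (no digits in any of them).
SYMBOL_ONLY_INPUTS = ["+", "-", " ", ",", "++", "+-"]


def accepted_inputs(parse: Callable[[str], object], inputs: Iterable[str]) -> List[str]:
    """Return the inputs that `parse` accepted without raising ValueError."""
    accepted = []
    for text in inputs:
        try:
            parse(text)
        except ValueError:
            continue  # desired behaviour: the invalid input is rejected
        accepted.append(text)
    return accepted
```

With pandas imported as `pd`, calling `accepted_inputs(pd.Timedelta, SYMBOL_ONLY_INPUTS)` on the version reported here would be expected to return all six strings; once the bug is fixed it should return an empty list.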

Problem description

Invalid input should raise an error instead of silently resolving to a zero Timedelta.

Expected Output

Raise an error. I propose generalizing the ValueError("unit abbreviation w/o a number") on L497 of timedeltas.pyx to ValueError("characters w/o a number"), since this error also shows up when the input contains no unit abbreviations at all, for example:

In [2]: pd.Timedelta('foo')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-06e16138b2f6> in <module>
----> 1 pd.Timedelta('foo')

~/Documents/developer/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__new__()
   1192                 value = parse_iso_format_string(value)
   1193             else:
-> 1194                 value = parse_timedelta_string(value)
   1195             value = np.timedelta64(value)
   1196         elif PyDelta_Check(value):

~/Documents/developer/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.parse_timedelta_string()
    429             result += timedelta_as_neg(r, neg)
    430         else:
--> 431             raise ValueError("unit abbreviation w/o a number")
    432
    433     # treat as nanoseconds

ValueError: unit abbreviation w/o a number
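The intended behaviour can be illustrated in pure Python. This is a minimal sketch under stated assumptions, not the Cython code in timedeltas.pyx: `require_number` is a hypothetical pre-check that uses the generalized message proposed in this report, and it deliberately ignores any special non-numeric strings that a real parser may need to keep accepting.

```python
import string


def require_number(text: str) -> str:
    """Return `text` unchanged if it contains at least one ASCII digit.

    Hypothetical helper: raises ValueError with the message proposed in
    this report when the input has characters but no number.
    """
    if not any(ch in string.digits for ch in text):
        raise ValueError("characters w/o a number")
    return text
```

Under this check, symbol-only strings such as '+', '-', ',' and ' ' are rejected with the same error as 'foo', while a string such as '1 day' passes through to the actual parsing step.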

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 492c5e0
python : 3.8.6.final.0
python-bits : 64
OS : Darwin
OS-release : 20.2.0
Version : Darwin Kernel Version 20.2.0: Wed Dec 2 20:39:59 PST 2020; root:xnu-7195.60.75~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.2.0.dev0+1259.g492c5e00d1
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 49.6.0.post20201009
Cython : 0.29.21
pytest : 6.1.1
hypothesis : 5.37.3
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.0
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.4
fastparquet : 0.4.1
gcsfs : 0.7.1
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2

avinashpancham added the Bug and Needs Triage labels Feb 9, 2021
lithomas1 added the Error Reporting label Feb 10, 2021
lithomas1 added this to the Contributions Welcome milestone Feb 10, 2021
lithomas1 added the Timedelta label and removed the Needs Triage label Feb 10, 2021
@avinashpancham
Contributor Author

take

avinashpancham changed the title from "BUG: pd.Timedelta with non-digit string input does not raise an error" to "BUG: Timedelta input string with only symbols and no digits does not raise an error" Feb 10, 2021
avinashpancham changed the title from "BUG: Timedelta input string with only symbols and no digits does not raise an error" to "BUG: Timedelta input string with only symbols and no digits failed to raise an error" Feb 10, 2021
jreback changed the milestone from Contributions Welcome to 1.3 Feb 10, 2021