-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
CLN remove duplicated entries in valid_resos in pandas/core/indexes/datetimes.py #38121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Can you take a look at this guide and provide a smaller code snippet? The idea is to localize the bug you want to fix |
I tried to think in the developers and I trimmed my code all the maximum so you can understand it. Now it is easy for everyone to understand and I even the dataset to only 7 seconds |
It is 16 lines of code, but the problem is super easy to solve: The current behavior is that the indexes are partitioned, as they are strings, and the reso or Resolution class, which has nanosecond and minisecond resolution, then is passed to the datetime.py file on the pandas/core/indexes, which has the valid_resos (on line 520) wrong. |
Thanks for the report @boringow , but this kind of comment isn't very helpful as the content of files changes all the time. Please post what you think the error is (looking at the current content of those lines: """
Find the `freq` for self.insert(loc, item).
"""
value = self._data._validate_scalar(item)
item = self._data._box_func(value)
freq = None
if is_period_dtype(self.dtype):
freq = self.freq
elif self.freq is not None:
# freq can be preserved on edge cases it's not at all obvious to me) |
OK, I see what you're referring to now: the repetition in
(from your PR #38118) |
exactly! And if I changed locally, it seems to work. It was obviously a human error. |
Hi @boringow - no, this issue is still unfixed. If you'd like to submit another PR, then please include a test |
@MarcoGorelli !? I do not understand, right here on issue #38118 , I reported the bug and I did a simple code to test it. I will change it now to be even simpler, but you already saw the error:
|
@boringow the bug report is fine :) What I meant was that if you make a pull request, there needs to be a test inside the pull request |
@MarcoGorelli Okay! I see the BUG is fixed! sorry for disturbing, is my first time reporting a bug.
|
I don't think so, what makes you think that?
I tried running the code, but didn't get any error - could you please show the output you get, and the output you were expecting to get? I think you may be misunderstanding what we mean when we write "please add a test" - see here for an example of a pull request with a test. Look at the changes they made to the file pandas/tests/io/test_common.py . That's what a test looks like. If you want to fix this issue, then please include a test in your pull request. If not, someone else will pick this up sooner or later |
I also don't get any error from trying to run the snippet you have in #38121 (comment) I'll look into this more deeply later and check if it was throwing an error in the past |
It does not give en error because someone solved the issue, but still, some developer did not perform his job correctly. OK, I see what you're referring to now: the repetition in
Now, I checked the MASTER of pandas, and it appear to be like this:
Reference here: pandas/pandas/core/indexes/datetimes.py Line 562 in 0b16fb3
So, as you can see, some developer add the millisecond! cool :) BUT HE DID NOT REMOVE THE DOUBLED "minute","second". By the way, I want to become a pandas mantainer, although I have 1 year of experience I learn quite fast and this kind of errors won't happen to me :) How could I help? |
Awesome, so indeed it's fixed now
Yes, looks like it was fixed intentionally in #39378: $ git log -S millisecond -p -- pandas/core/indexes/datetimes.py
commit 56476d1cd4e5b956e2597ca7361f0d6567dd1da9
Author: patrick <[email protected]>
Date: Mon Jan 25 17:33:12 2021 +0100
BUG: Slicing DatetimeIndex with strings containing microseconds raising KeyError (#39378)
diff --git a/pandas/core/indexes/datetimes.py b/pandas/core/indexes/datetimes.py
index 0df954e054..abe9a9c26c 100644
--- a/pandas/core/indexes/datetimes.py
+++ b/pandas/core/indexes/datetimes.py
@@ -565,6 +565,7 @@ class DatetimeIndex(DatetimeTimedeltaMixin):
"second",
"minute",
"second",
+ "millisecond",
"microsecond",
}
if reso.attrname not in valid_resos:
True - do you want to submit a new pull request (the old one would need rebasing anyway) to remove them then? As it would just be a clean-up, no tests would be required. As an aside, gender-neutral language such as "BUT THEY DID NOT" would be appreciated
Awesome! Perhaps start by becoming a contributor - if you're interested in maintenance, check the governance page: "The Project's Core Team will consist of Project Contributors who have produced contributions that are substantial in quality and quantity, and sustained over at least one year." |
I was using the individual term for developer, as it is one persone who changed it. But yeah! surely, I want to start helping and form part of this community :). I will create a new pull request referencing this issue, so we can close it. |
@boringow The issue should remain open until the PR that closes it is merged. |
Okay! sorry about that I didn't know it. |
They've opened #39503, which is clearer than this issue, so closing in favour of that one |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Before running, it is easy! just check the lines 560-570 of the pandas/code/indexes/datetime.py and you will understand the human error some developer did :)
Inside Github.csv, we have:
Problem description
The current behavior is that the indexes are partitioned, as they are strings, and the reso or Resolution class, which has nanosecond and minisecond resolution, then is passed to the datetime.py file on the pandas/core/indexes, which has the valid_resos (on line 520) wrong.
Expected Output
Process finished with exit code 0
Output of
pd.show_versions()
INSTALLED VERSIONS
commit : 67a3d42
python : 3.9.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252
pandas : 1.1.4
numpy : 1.19.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.1.post20201107
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
[paste the output of
pd.show_versions()
here leaving a blank line after the details tag]The text was updated successfully, but these errors were encountered: