Rethinking to_flat_index on flat Index #23670
Comments
Hmm not sure I agree. Might just be my personal usage bias but why would someone opt for a 1-D MultiIndex in the first place? I ultimately foresee most people using |
I don't have in mind a killer application, but my answer to your question is "for the same reason why people would want to call |
... and explaining that |
FWIW, there are probably many more broken methods for 1-level
|
Right, thanks! But this is clearly a bug... (because at least some other
Sorry, naive question, but what is the problem with just running More in general, differences between
I think there are no differences of the second kind. If we have, and plan to keep, differences of the first kind (and do not consider them bugs), then I wonder whether we should ban 1-level (Notice that if we did that, I would be even more convinced of the current issue, where we pretend that |
Fair point, will most likely use that. Just didn't bother investigating further when I saw those were mostly broken. Only question then would be the following: Currently,
I admit I don't have the full implications in view, but would be sympathetic (+0.25) to always turn 1-level MI to regular
I may be missing something, but if that ban should be enforced, then this issue goes away, no?
|
On the contrary, I would think it is even more important. |
This sounds like an argument against abolishing 1-level |
That we do tend to "backport" (all?) |
Giving this some more thought after the PR you posted I am +/- 0 here. Part of me thinks consistency is great in that regardless of whether an My assumption (potentially wrong) is that users would still have to be aware of the labels contained within the calling object since they'd be indexing afterwards, and moving from a scalar to a tuple there feels weird, though maybe it's weird for them to ever call this in the first place then... |
I think it is close to irrelevant (I'm insisting on this only i) for consistency/robust code, ii) because the thing is brand new, so I want to solve this soon), but if I do try to think of some real-world case, it's going to be something like:

def anonymize(idx, hashfunc):
    """
    Replace each label in a (multi-)index with its hashfunc-ed version.
    """
    new_cont = [tuple([hashfunc(l) for l in t]) for t in idx.to_flat_index()]
    return pd.Index(new_cont, tupleize_cols=True)

... which currently will break on flat indexes. (Not claiming my example is particularly well coded, there might be better ones.) On the other hand, I'm pretty sure we don't lose anything by depriving the user of another alias for |
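For comparison, here is a minimal sketch of how the anonymize example could be made to work on both flat and multi-level indexes with the current behaviour. The helper name to_tuple_index is hypothetical (it is not part of pandas); it only wraps each scalar label of a flat Index in a length-1 tuple, which is roughly what this issue proposes to_flat_index should do by itself.

```python
import pandas as pd


def to_tuple_index(idx):
    """Hypothetical helper (not a pandas API): always return an Index of tuples."""
    if isinstance(idx, pd.MultiIndex):
        # A MultiIndex already flattens to an Index of tuples.
        return idx.to_flat_index()
    # Flat Index: wrap every scalar label in a length-1 tuple.
    return pd.Index([(label,) for label in idx], tupleize_cols=False)


def anonymize(idx, hashfunc):
    """Replace each label in a (multi-)index with its hashfunc-ed version."""
    new_cont = [tuple(hashfunc(label) for label in t) for t in to_tuple_index(idx)]
    return pd.Index(new_cont, tupleize_cols=True)


def shout(label):
    # Stand-in for a real hash function, used only for illustration.
    return str(label).upper()


flat_result = anonymize(pd.Index(["a", "b"]), shout)
multi_result = anonymize(pd.MultiIndex.from_tuples([("a", "x"), ("b", "y")]), shout)
```

With this workaround the callable always receives tuples, so the same code path handles both kinds of index; the cost is one extra helper that every caller has to know about.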
With reference to #22866, and to a comment I made (#22866 (comment)) which I thought was paranoid and maybe is not.

Three considerations:

1. to_flat_index is the only method (the only operation in pandas, actually) I can think of which gives a different result when called on a flat Index and when called on an equivalent 1-level MultiIndex: you obtain the Index itself (each item presumably being a scalar) in the first case and an Index of length-1 tuples in the second case.

2. [...] Index of length-one scalar, there is no simple way to obtain it. Or if you prefer: if we add some sep or analogous arguments (e.g. fmt) to to_flat_index, then it is not going to work on a flat Index (at least with the current implementation). In general, to_flat_index is a method which makes sense also because it can be combined with, for instance, .map. When doing so, it is good to know for sure that the arguments received by the callable are always tuples.

3. Index.to_flat_index() as it is now is idempotent and pretty useless: it makes sense for compatibility... but then compatibility is a priority!

These all lead me to conclude: shouldn't pd.Index.to_flat_index() return an Index of length-1 tuples?

@WillAyd
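The asymmetry described in the considerations above can be sketched as follows. This is an illustrative example, not the code sample originally attached to the issue; it assumes a pandas version that provides to_flat_index (0.24 or later), and the variable names are made up for the illustration.

```python
import pandas as pd

flat = pd.Index(["a", "b"])
one_level = pd.MultiIndex.from_arrays([["a", "b"]])
two_level = pd.MultiIndex.from_arrays([["a", "b"], [1, 2]])

# On a flat Index the labels come back unchanged, as scalars.
flat_result = list(flat.to_flat_index())

# On an equivalent 1-level MultiIndex each label is a length-1 tuple.
one_level_result = list(one_level.to_flat_index())

# Combined with .map, the callable can rely on receiving tuples
# only when the caller started from a MultiIndex.
joined = list(
    two_level.to_flat_index().map(lambda t: "_".join(str(x) for x in t))
)

print(flat_result)       # scalars
print(one_level_result)  # length-1 tuples
print(joined)            # one joined string per row
```

A callable written for the tuple case, such as the join above, cannot be reused as-is on the flat result, which is the inconsistency the issue asks to remove.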
Code Sample, a copy-pastable example if possible
Expected Output
Out[2] should be equal to Out[3]
Output of pd.show_versions()
INSTALLED VERSIONS
commit: 454ecfc
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-8-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8
pandas: 0.24.0.dev0+995.g454ecfc61
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.2.0
Cython: 0.28.4
numpy: 1.14.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.2.2.post1634.dev0+ge8120cf6d
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1
gcsfs: None