BUG: join with list does not behave like singleton #57676
Comments
take
I believe the problem is that whenever join receives a list, that list is evaluated and then it is either concatenated or merged. With the current evaluation method, [cat_df] is getting concatenated with join_df while cat_df gets merged with join_df. In my pull request I naively changed this evaluation to fix this issue, but now it fails several other join tests. I'll look into the theory about concatenation vs merging in further detail and update the pull request.
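To make the two paths described above concrete, here is a minimal sketch that uses only public pandas calls as stand-ins for what join may do internally. The frames, column names, and the flat (non-MultiIndex) indexes are assumptions made for illustration; they are not the reproducer from this issue. With unique flat indexes the two paths are expected to agree, which is why a divergence only shows up in less common index layouts.

```python
import pandas as pd

# Hypothetical data: flat, unique indexes sharing the name "key".
join_df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.Index(["a", "b", "c"], name="key"),
)
cat_df = pd.DataFrame(
    {"score": [10.0, 20.0]},
    index=pd.Index(["a", "b"], name="key"),
)

# Concatenation-style path: align on the index, then restrict to the left frame.
via_concat = pd.concat([join_df, cat_df], axis=1, join="outer").reindex(join_df.index)

# Merge-style path: an index-on-index left merge.
via_merge = join_df.merge(cat_df, how="left", left_index=True, right_index=True)

print(via_concat.equals(via_merge))
```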
Figured out that the simpler way to deal with this is that whenever a list of a single element is passed, convert it into a join with another element. The operation that evaluates the boolean "can_concat" has been there for 12 years (doubt it's wrong); however, there might have been an oversight for some specific cases of this uncommon practice (passing a list with a single element).
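On the user side, the same idea (treat a one-element list like the bare frame) can be sketched as a small wrapper. The helper name join_frames and the sample data are hypothetical and are not part of pandas or of the pull request; this is only an illustration of unwrapping a singleton list before calling DataFrame.join.

```python
import pandas as pd


def join_frames(left, others, how="left"):
    """Hypothetical helper: join `left` with one frame or an iterable of frames."""
    if isinstance(others, pd.DataFrame):
        return left.join(others, how=how)
    others = list(others)
    if not others:
        # Nothing to join; hand back a copy so callers can mutate the result safely.
        return left.copy()
    if len(others) == 1:
        # Unwrap the singleton so it takes the same code path as a bare frame.
        return left.join(others[0], how=how)
    return left.join(others, how=how)


# Hypothetical data with flat indexes; the issue itself involves other index layouts.
join_df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.Index(["a", "b", "c"], name="key"),
)
cat_df = pd.DataFrame(
    {"score": [10.0, 20.0]},
    index=pd.Index(["a", "b"], name="key"),
)

single = join_frames(join_df, cat_df)
wrapped = join_frames(join_df, [cat_df])
print(single.equals(wrapped))
```

This keeps list-building code simple: callers can pass whatever list they assembled, including one with a single frame, without special-casing it themselves.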
@Dacops I will say that it isn't my intention to pass a single item list, but when you are creating the lists from other things, it can happen.
Yeah, that makes sense. Meanwhile I've sent the pull request and it passed everything in the pipeline, so now it's just waiting for a developer review.
Thanks so much @Dacops :)
…iIndexes check uniqueness of individual indexes (pandas-dev#57676)
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
I would expect the result to be identical. I am really interested in being able to left join multiple cat_dfs that share (one of) an index with the join_df.
Expected Behavior
I would expect identical behavior. I did notice that with the on kwarg, I get an error that might indicate this is not allowed:
gives
Installed Versions
Not sure what's up with the installed version commit since my git log looks like
INSTALLED VERSIONS
commit : 52cb549
python : 3.11.6.final.0
python-bits : 64
OS : Darwin
OS-release : 22.6.0
Version : Darwin Kernel Version 22.6.0: Wed Oct 4 21:26:23 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 3.0.0.dev0+87.g52cb549f44.dirty
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : 3.0.8
pytest : 8.0.2
hypothesis : 6.98.13
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : 3.2.0
lxml.etree : 5.1.0
html5lib : 1.1
pymysql : 1.4.6
psycopg2 : 2.9.9
jinja2 : 3.1.3
IPython : 8.22.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.3.8
fastparquet : 2024.2.0
fsspec : 2024.2.0
gcsfs : 2024.2.0
matplotlib : 3.8.3
numba : 0.59.0
numexpr : 2.9.0
odfpy : None
openpyxl : 3.1.2
pyarrow : 15.0.0
pyreadstat : 1.2.6
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2024.2.0
scipy : 1.12.0
sqlalchemy : 2.0.27
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.2.0
xlrd : 2.0.1
zstandard : 0.22.0
tzdata : 2024.1
qtpy : None
pyqt5 : None