drop_duplicates
import pandas as pd

dat = pd.DataFrame([
    {'a':1,'b':2,'c':3},
    {'a':1,'b':2,'c':3}
])

dat.drop_duplicates(subset=['a','d'])
   a  b  c
0  1  2  3
When using drop_duplicates() with the subset option, a warning should probably be raised when trying to subset on a column that doesn't exist.
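Until the library itself complains, the check can be done on the caller's side. The following is only a sketch of the behaviour requested above: `missing_subset_columns` and `drop_duplicates_warn` are hypothetical helpers, not part of pandas, and falling back to all columns when none of the labels exist is an assumption of this sketch.

```python
import warnings

import pandas as pd


def missing_subset_columns(df, subset):
    """Return the labels in `subset` that are not columns of `df` (hypothetical helper)."""
    return [col for col in subset if col not in df.columns]


def drop_duplicates_warn(df, subset):
    """Warn about unknown subset labels, then de-duplicate on the known ones."""
    missing = missing_subset_columns(df, subset)
    if missing:
        warnings.warn("subset columns not found: {}".format(missing), UserWarning)
    present = [col for col in subset if col in df.columns]
    # Assumption: if no label exists, fall back to comparing all columns.
    return df.drop_duplicates(subset=present or None)


dat = pd.DataFrame([
    {'a': 1, 'b': 2, 'c': 3},
    {'a': 1, 'b': 2, 'c': 3},
])

# Emits a UserWarning about 'd' and de-duplicates on 'a' only.
result = drop_duplicates_warn(dat, ['a', 'd'])
```

A warning keeps existing scripts running; raising an exception instead is the stricter choice.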
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: 3.3.1
pip: 10.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.13.3
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.1.15
pymysql: None
psycopg2: 2.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
This raises on master. It looks like #19730 implemented this. It'll be in 0.23 (hopefully one week away).
import pandas as pd

dat = pd.DataFrame([
    {'a':1,'b':2,'c':3},
    {'a':1,'b':2,'c':3}
])

dat.drop_duplicates(subset=['a','d'])
## -- End pasted text --
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-13-2414fa1d4159> in <module>()
      6 ])
      7
----> 8 dat.drop_duplicates(subset=['a','d'])

~/sandbox/pandas/pandas/core/frame.py in drop_duplicates(self, subset, keep, inplace)
   4287         """
   4288         inplace = validate_bool_kwarg(inplace, 'inplace')
-> 4289         duplicated = self.duplicated(subset, keep=keep)
   4290
   4291         if inplace:

~/sandbox/pandas/pandas/core/frame.py in duplicated(self, subset, keep)
   4337         diff = Index(subset).difference(self.columns)
   4338         if not diff.empty:
-> 4339             raise KeyError(diff)
   4340
   4341         vals = (col.values for name, col in self.iteritems()

KeyError: Index(['d'], dtype='object')
The error message could be a little friendlier, but this is an improvement.
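As a rough illustration of what a friendlier message might look like, the same `Index(subset).difference(...)` test seen in the traceback can be run in user code before calling `drop_duplicates`. `check_subset` is a hypothetical helper and the message wording is an assumption, not what pandas emits.

```python
import pandas as pd


def check_subset(df, subset):
    """Raise a readable KeyError if `subset` names columns missing from `df`."""
    missing = pd.Index(subset).difference(df.columns)
    if not missing.empty:
        raise KeyError(
            "subset columns not found: {}; available columns: {}".format(
                list(missing), list(df.columns)
            )
        )


dat = pd.DataFrame([
    {'a': 1, 'b': 2, 'c': 3},
    {'a': 1, 'b': 2, 'c': 3},
])

try:
    check_subset(dat, ['a', 'd'])
    deduped = dat.drop_duplicates(subset=['a', 'd'])
except KeyError as err:
    # The message names the missing label and lists the valid ones.
    message = err.args[0]
```

Listing the available columns alongside the missing ones makes a typo in a column name easier to spot.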
I had a KeyError, and thanks to this issue I was able to fix it.