ENH: namedtuple's fields as columns #11416
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -538,6 +538,15 @@ def test_is_list_like():
     for f in fails:
         assert not com.is_list_like(f)

+def test_is_named_tuple():
+    passes = (collections.namedtuple('Test',list('abc'))(1,2,3),)
+    fails = ((1,2,3), 'a', Series({'pi':3.14}))
+
Review comment: remove 1 line
Reply: done
+    for p in passes:
+        assert com.is_named_tuple(p)
+
+    for f in fails:
+        assert not com.is_named_tuple(f)

 def test_is_hashable():

Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,8 +16,7 @@

 from pandas.compat import(
     map, zip, range, long, lrange, lmap, lzip,
-    OrderedDict, u, StringIO, string_types,
-    is_platform_windows
+    OrderedDict, u, StringIO, is_platform_windows
 )
 from pandas import compat

@@ -33,8 +32,7 @@
 import pandas.core.datetools as datetools
 from pandas import (DataFrame, Index, Series, Panel, notnull, isnull,
                     MultiIndex, DatetimeIndex, Timestamp, date_range,
-                    read_csv, timedelta_range, Timedelta, CategoricalIndex,
-                    option_context, period_range)
+                    read_csv, timedelta_range, Timedelta, option_context, period_range)
 from pandas.core.dtypes import DatetimeTZDtype
 import pandas as pd
 from pandas.parser import CParserError
@@ -2239,7 +2237,6 @@ class TestDataFrame(tm.TestCase, CheckIndexing,
     _multiprocess_can_split_ = True

     def setUp(self):
-        import warnings

         self.frame = _frame.copy()
         self.frame2 = _frame2.copy()
@@ -3568,6 +3565,20 @@ def test_constructor_tuples(self):
         expected = DataFrame({'A': Series([(1, 2), (3, 4)])})
         assert_frame_equal(result, expected)

+    def test_constructor_namedtuples(self):
+        # GH11181
+        from collections import namedtuple
+        named_tuple = namedtuple("Pandas", list('ab'))
+        tuples = [named_tuple(1, 3), named_tuple(2, 4)]
+        expected = DataFrame({'a': [1, 2], 'b': [3, 4]})
+        result = DataFrame(tuples)
+        assert_frame_equal(result, expected)
+
Review comment: Actually, I just realized that if you have DIFFERENT named tuples this code will break (e.g. different fields). Can you do a PR to assert that test? Pretty pathological, but possible.
Reply: Yes, will do now. To be clear, is the case you're suggesting:
Reply: I think the correct way would be to compare the columns with each of the namedtuples and if they differ then raise a
Reply: That's pretty expensive:
In [1]:
from collections import namedtuple
nt = namedtuple('NT', list('abc'))
tuples = [nt(0,1,2) for i in range(int(1e7))]
In [2]:
t=tuples[0]
correct_type=type(t)
In [3]:
%timeit all(type(tup)==correct_type for tup in tuples)
1 loops, best of 3: 1.16 s per loop
Given this is a 'best efforts' check - i.e. the alternative is 'useless' columns of
Reply: I bet the == is actually doing a lot of work.
Reply: Very odd that this is slow, though I guess it's a lot of tuples. OK, guess just go with the first namedtuple then.
Reply: What happens if the tuples have different lengths? Does this break (e.g. with the current code in master)? I think yes.
Reply: No, you get
In [2]:
pd.DataFrame([(1,2,3),(4,5)])
Out[2]:
0 1 2
0 1 2 3
1 4 5 NaN
If you pass columns, they need to be the max length:
In [4]:
pd.DataFrame([(1,2),(3,4,5)], columns=['a','b'])
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-4-e2f410a01184> in <module>()
----> 1 pd.DataFrame([(1,2),(3,4,5)], columns=['a','b'])
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
262 if len(data) > 0:
263 if is_list_like(data[0]) and getattr(data[0], 'ndim', 1) == 1:
--> 264 arrays, columns = _to_arrays(data, columns, dtype=dtype)
265 columns = _ensure_index(columns)
266
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in _to_arrays(data, columns, coerce_float, dtype)
5211 if isinstance(data[0], (list, tuple)):
5212 return _list_to_arrays(data, columns, coerce_float=coerce_float,
-> 5213 dtype=dtype)
5214 elif isinstance(data[0], collections.Mapping):
5215 return _list_of_dict_to_arrays(data, columns,
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in _list_to_arrays(data, columns, coerce_float, dtype)
5294 content = list(lib.to_object_array(data).T)
5295 return _convert_object_array(content, columns, dtype=dtype,
-> 5296 coerce_float=coerce_float)
5297
5298
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in _convert_object_array(content, columns, coerce_float, dtype)
5352 # caller's responsibility to check for this...
5353 raise AssertionError('%d columns passed, passed data had %s '
-> 5354 'columns' % (len(columns), len(content)))
5355
5356 # provide soft conversion of object dtypes
AssertionError: 2 columns passed, passed data had 3 columns
Reply: ok then
Reply: Cheers!
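The trade-off discussed in this thread (take the field names from the first namedtuple only, versus checking every record) can be sketched in plain Python as below. This is only an illustration under stated assumptions, not the code merged in this PR: `columns_from_first`, its `strict` flag, and the choice of `ValueError` are all hypothetical names and decisions introduced here.

```python
from collections import namedtuple


def columns_from_first(records, strict=False):
    """Hypothetical helper: derive column names from the first record.

    Returns a list of field names if the first record looks like a
    namedtuple, otherwise None. With strict=True every record is also
    checked against the first record's type, which costs one pass over
    the data (the expense the thread is worried about).
    """
    if not records:
        return None
    first = records[0]
    # Duck-typed namedtuple check: a tuple subclass exposing `_fields`.
    if not (isinstance(first, tuple) and hasattr(first, "_fields")):
        return None
    if strict:
        expected_type = type(first)
        for rec in records:
            if type(rec) is not expected_type:
                # ValueError is an assumption; the thread does not settle on one.
                raise ValueError("records mix different namedtuple types")
    return list(first._fields)


# Two unrelated namedtuple types with different fields.
A = namedtuple("A", ["a", "b"])
B = namedtuple("B", ["y", "z"])

same = [A(1, 3), A(2, 4)]
mixed = [A(1, 3), B(2, 4)]
```

With `strict=False` the mixed list is accepted silently and the first record's fields win, which matches the "just go with the first namedtuple" outcome of the thread; `strict=True` is the more expensive alternative that was rejected.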
+        # with columns
+        expected = DataFrame({'y': [1, 2], 'z': [3, 4]})
+        result = DataFrame(tuples, columns=['y', 'z'])
+        assert_frame_equal(result, expected)
+
     def test_constructor_orient(self):
         data_dict = self.mixed_frame.T._series
         recons = DataFrame.from_dict(data_dict, orient='index')
@@ -4418,7 +4429,7 @@ def test_timedeltas(self):

     def test_operators_timedelta64(self):

-        from datetime import datetime, timedelta
+        from datetime import timedelta
         df = DataFrame(dict(A = date_range('2012-1-1', periods=3, freq='D'),
                             B = date_range('2012-1-2', periods=3, freq='D'),
                             C = Timestamp('20120101')-timedelta(minutes=5,seconds=5)))
@@ -9645,7 +9656,6 @@ def test_replace_mixed(self):
         assert_frame_equal(result,expected)

         # test case from
-        from pandas.util.testing import makeCustomDataframe as mkdf
         df = DataFrame({'A' : Series([3,0],dtype='int64'), 'B' : Series([0,3],dtype='int64') })
         result = df.replace(3, df.mean().to_dict())
         expected = df.copy().astype('float64')
@@ -12227,7 +12237,6 @@ def test_sort_index_inplace(self):
         assert_frame_equal(df, expected)

     def test_sort_index_different_sortorder(self):
-        import random
         A = np.arange(20).repeat(5)
         B = np.tile(np.arange(5), 20)

@@ -13301,7 +13310,6 @@ def test_quantile(self):

     def test_quantile_axis_parameter(self):
         # GH 9543/9544
-        from numpy import percentile

         df = DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]}, index=[1, 2, 3])

@@ -16093,8 +16101,6 @@ def test_query_doesnt_pickup_local(self):
         n = m = 10
         df = DataFrame(np.random.randint(m, size=(n, 3)), columns=list('abc'))

-        from numpy import sin
-
         # we don't pick up the local 'sin'
         with tm.assertRaises(UndefinedVariableError):
             df.query('sin > 5', engine=engine, parser=parser)
@@ -16392,7 +16398,6 @@ def setUpClass(cls):
         cls.frame = _frame.copy()

     def test_query_builtin(self):
-        from pandas.computation.engines import NumExprClobberingError
         engine, parser = self.engine, self.parser

         n = m = 10
@@ -16413,7 +16418,6 @@ def setUpClass(cls):
         cls.frame = _frame.copy()

     def test_query_builtin(self):
-        from pandas.computation.engines import NumExprClobberingError
         engine, parser = self.engine, self.parser

         n = m = 10
Review comment: Let's import `namedtuple` at the top and just check isinstance here?
Reply: `namedtuple` is not a type, so that wouldn't work, unfortunately. FYI: it's a factory function which builds and `eval`s a string to create a class inherited from `tuple`. Example code here: https://docs.python.org/2/library/collections.html#collections.namedtuple
Reply: https://bugs.python.org/issue7796 ok!
Reply: Ah cool! Lots of discussion on those boards about a better way of doing `namedtuple`. It's not perfect at the moment, but it's very functional (we use them a lot).
Reply: Cool. We might have named tuples elsewhere that could use the is_ function; can you give it a check?
Reply: I had a look - while `namedtuple` is in half a dozen places, there's nowhere that checks its type.
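To illustrate the point made in this thread, here is a minimal sketch of why an `isinstance` check against `namedtuple` cannot work, and of a duck-typed alternative. The helper name `looks_like_namedtuple` is hypothetical and is not claimed to be the `com.is_named_tuple` implementation from this PR; it only assumes that namedtuple instances are tuples exposing a `_fields` attribute, which the standard library documents.

```python
from collections import namedtuple


def looks_like_namedtuple(obj):
    """Hypothetical duck-typed check: a tuple instance that exposes `_fields`."""
    return isinstance(obj, tuple) and hasattr(obj, "_fields")


Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)

# `namedtuple` is a factory function, not a class, so passing it as the
# second argument to isinstance raises TypeError.
try:
    isinstance(p, namedtuple)
    isinstance_works = True
except TypeError:
    isinstance_works = False

# The generated class, on the other hand, is an ordinary subclass of tuple.
is_tuple_subclass = issubclass(Point, tuple)
```

The duck-typed check accepts any tuple subclass that defines `_fields`, so it is a best-effort test rather than a strict type guarantee.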