Skip to content

more flexible type of input arguments for px functions #1768

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 69 commits into from
Oct 4, 2019
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
69 commits
Select commit Hold shift + click to select a range
152f87f
more flexible type of input arguments for px functions
emmanuelle Sep 13, 2019
fe4eda3
wrong col name case
emmanuelle Sep 13, 2019
86937ea
corner case of functions grabbing all cols
emmanuelle Sep 13, 2019
1ad0f5c
better behavior of index, more tests
emmanuelle Sep 14, 2019
7d0e985
comment code
emmanuelle Sep 14, 2019
915a5a1
black
emmanuelle Sep 15, 2019
5d5ab81
debugging
emmanuelle Sep 15, 2019
ea0fa6a
relax column ordering in tests
emmanuelle Sep 15, 2019
e5f6953
tests
emmanuelle Sep 15, 2019
29d7e18
array arguments
emmanuelle Sep 16, 2019
4a028a2
move column checks
emmanuelle Sep 16, 2019
ab35b42
case of dimensions
emmanuelle Sep 16, 2019
e4b8835
clean code + black
emmanuelle Sep 16, 2019
3b0d21a
deduplicated labels logics
emmanuelle Sep 16, 2019
2a6ff71
better handling of dimensions
emmanuelle Sep 17, 2019
72b5d1c
corner case when column was modified
emmanuelle Sep 17, 2019
19431dd
modified docs
emmanuelle Sep 17, 2019
7297a7f
black
emmanuelle Sep 17, 2019
dadd645
case when no df provided and str arguments
emmanuelle Sep 17, 2019
2cb9dba
argument=None
emmanuelle Sep 17, 2019
b09dc58
better handling of problematic case
emmanuelle Sep 17, 2019
91e6cda
simplify code
emmanuelle Sep 17, 2019
33db488
symbol and line_dash
emmanuelle Sep 17, 2019
e58c82e
stricter check on column
emmanuelle Sep 17, 2019
9d65a07
more readable error message
emmanuelle Sep 17, 2019
64010de
check length
emmanuelle Sep 17, 2019
20bd812
error when multiindex
emmanuelle Sep 17, 2019
a7241a5
accept int as column names
emmanuelle Sep 18, 2019
dc1fc0a
array and dict as dataframe
emmanuelle Sep 18, 2019
61678c2
splom case
emmanuelle Sep 18, 2019
71756a3
simplified index case
emmanuelle Sep 18, 2019
40ba5b9
black
emmanuelle Sep 18, 2019
749db6c
better error messages
emmanuelle Sep 19, 2019
cd8574a
better handling of Series (name is None)
emmanuelle Sep 19, 2019
299fdde
handle case of name conflicts
emmanuelle Sep 19, 2019
8753797
removed print statements
emmanuelle Sep 20, 2019
ed24d8b
black
emmanuelle Sep 20, 2019
2da2b21
wip
emmanuelle Sep 20, 2019
102b89f
name heuristic
emmanuelle Sep 20, 2019
0a50f6a
Py2 fix
emmanuelle Sep 20, 2019
58c9eba
more complex test
emmanuelle Sep 20, 2019
0a92537
use sets instead of lists
emmanuelle Sep 20, 2019
372bd8f
bug correction and new test
emmanuelle Sep 20, 2019
3d0aff0
comments
emmanuelle Sep 20, 2019
5a329ad
simpler way to use names
emmanuelle Sep 23, 2019
e0e0dd5
comments
emmanuelle Sep 23, 2019
6bced73
docstring
emmanuelle Sep 23, 2019
bcc41a2
more tests
emmanuelle Sep 23, 2019
382e768
integer column names
emmanuelle Sep 23, 2019
82c6592
typo
emmanuelle Sep 24, 2019
9caeee1
better error message
emmanuelle Sep 24, 2019
adc68d3
better test
emmanuelle Sep 24, 2019
5dc0ab1
order of tests
emmanuelle Sep 24, 2019
f5d056c
doc modification
emmanuelle Sep 24, 2019
bedf74f
qa
emmanuelle Sep 24, 2019
074342b
case of named index
emmanuelle Sep 24, 2019
ead8430
be more defensive with names
emmanuelle Sep 24, 2019
db10967
better error msg with index
emmanuelle Sep 24, 2019
ef29378
do not modify input arguments
emmanuelle Sep 24, 2019
5e40653
corrected bug
emmanuelle Sep 25, 2019
78564cc
name consistency
emmanuelle Sep 25, 2019
2534ca9
name consistency
emmanuelle Sep 25, 2019
c88660f
qa
emmanuelle Sep 25, 2019
b5adbcf
if args[field] is a dict it should stay a dict
emmanuelle Sep 25, 2019
b375f91
addressed Jon's comments
emmanuelle Sep 26, 2019
c59a4d5
spaces in error messages
emmanuelle Sep 28, 2019
1774ade
case of size column
emmanuelle Sep 28, 2019
16d95af
qa
emmanuelle Sep 30, 2019
5c95786
revert to is
emmanuelle Sep 30, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 18 additions & 17 deletions packages/python/plotly/plotly/express/_core.py
Original file line number Diff line number Diff line change
Expand Up @@ -795,7 +795,7 @@ def _initialize_argument_col_names(args, attrables, array_attrables):
return used_col_names


def build_dataframe(args, attrables, array_attrables):
def build_dataframe(input_args, attrables, array_attrables):
"""
Constructs a dataframe and modifies `args` in-place.

Expand All @@ -813,6 +813,12 @@ def build_dataframe(args, attrables, array_attrables):
array_attrables : list
argument names corresponding to iterables, such as `hover_data`, ...
"""
args = dict(input_args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this makes a deep copy...

>>> x= [1,2,3]
>>> y = dict(a=x)
>>> z = dict(y)
>>> z["a"] is x
True
>>> z["a"][0]=100
>>> z
{'a': [100, 2, 3]}
>>> x
[100, 2, 3]

But also think this is only an issue for array_attrables where we write into args[something][something] so if we just copy those that's enough:

>>> x = [1,2,3]
>>> y = list(x)
>>> y is x
False
>>> y[0] = 100
>>> x
[1, 2, 3]

Copy link
Contributor Author

@emmanuelle emmanuelle Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh no! I knew it does a deep copy for a list so I had wrongly assumed I could transpose this pattern to a dict.

for field in args:
if field in array_attrables and isinstance(
args[field], pd.core.indexes.base.Index
):
args[field] = list(args[field])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

following from comment above... if we just always do this, instead of in an if then I think we're safe (for now!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea! Then we don't have to do a deepcopy of the whole args, it's better not to copy the dataframe in particular.

# Cast data_frame argument to DataFrame (it could be a numpy array, dict etc.)
df_provided = args["data_frame"] is not None
if df_provided and not isinstance(args["data_frame"], pd.DataFrame):
Expand Down Expand Up @@ -864,8 +870,15 @@ def build_dataframe(args, attrables, array_attrables):
length = len(df)
if argument is None:
continue
# Case of multiindex
if isinstance(argument, pd.core.indexes.multi.MultiIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pd.core.indexes.multi.MultiIndex -> pd.MultiIndex

raise TypeError(
"Argument '%s' is a pandas MultiIndex."
"pandas MultiIndex is not supported by plotly express "
"at the moment." % field
)
## ----------------- argument is a col name ----------------------
elif isinstance(argument, str) or isinstance(
if isinstance(argument, str) or isinstance(
argument, int
): # just a column name given as str or int
if not df_provided:
Expand Down Expand Up @@ -902,19 +915,7 @@ def build_dataframe(args, attrables, array_attrables):
col_name = argument
if isinstance(argument, int):
col_name = _name_heuristic(argument, field, reserved_names)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the if here. Can we always wrap argument in _name_heuristic(...)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turns out we could write simpler code here. Thanks !

if field_name not in array_attrables:
args[field_name] = col_name
else:
args[field_name][i] = col_name
df[col_name] = args["data_frame"][argument]
continue
# Case of multiindex
elif isinstance(argument, pd.core.indexes.multi.MultiIndex):
raise TypeError(
"Argument '%s' is a pandas MultiIndex."
"pandas MultiIndex is not supported by plotly express "
"at the moment." % field
)
# ----------------- argument is a column / array / list.... -------
else:
is_index = isinstance(argument, pd.core.indexes.range.RangeIndex)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pd.core.indexes.range.RangeIndex -> pd.RangeIndex

Expand Down Expand Up @@ -980,7 +981,7 @@ def infer_config(args, constructor, trace_patch):
if group_attr in args:
all_attrables += [group_attr]

build_dataframe(args, all_attrables, array_attrables)
args = build_dataframe(args, all_attrables, array_attrables)

attrs = [k for k in attrables if k in args]
grouped_attrs = []
Expand Down Expand Up @@ -1058,7 +1059,7 @@ def infer_config(args, constructor, trace_patch):

# Create trace specs
trace_specs = make_trace_spec(args, constructor, attrs, trace_patch)
return trace_specs, grouped_mappings, sizeref, show_colorbar
return args, trace_specs, grouped_mappings, sizeref, show_colorbar


def get_orderings(args, grouper, grouped):
Expand Down Expand Up @@ -1096,7 +1097,7 @@ def get_orderings(args, grouper, grouped):
def make_figure(args, constructor, trace_patch={}, layout_patch={}):
apply_default_cascade(args)

trace_specs, grouped_mappings, sizeref, show_colorbar = infer_config(
args, trace_specs, grouped_mappings, sizeref, show_colorbar = infer_config(
args, constructor, trace_patch
)
grouper = [x.grouper or one_group for x in grouped_mappings] or [one_group]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -227,11 +227,8 @@ def test_build_df_from_lists():
def test_build_df_with_index():
tips = px.data.tips()
args = dict(data_frame=tips, x=tips.index, y="total_bill")
changed_output = dict(x="index")
out = build_dataframe(args, all_attrables, array_attrables)
assert_frame_equal(tips.reset_index()[out["data_frame"].columns], out["data_frame"])
out.pop("data_frame")
assert out == args


def test_splom_case():
Expand Down Expand Up @@ -261,3 +258,23 @@ def test_data_frame_from_dict():
fig = px.scatter({"time": [0, 1], "money": [1, 2]}, x="time", y="money")
assert fig.data[0].hovertemplate == "time=%{x}<br>money=%{y}"
assert np.all(fig.data[0].x == [0, 1])


def test_arguments_not_modified():
iris = px.data.iris()
petal_length = iris.petal_length
fig = px.scatter(iris, x=petal_length, y="petal_width")
assert iris.petal_length.equals(petal_length)


def test_pass_df_columns():
tips = px.data.tips()
fig = px.histogram(
tips,
x="total_bill",
y="tip",
color="sex",
marginal="rug",
hover_data=tips.columns,
)
assert fig.data[1].hovertemplate.count("customdata") == len(tips.columns)