DOC: clarify the behavior of the .as_matrix method #7413


Closed
clham opened this issue Jun 10, 2014 · 15 comments

Comments

@clham
Contributor

clham commented Jun 10, 2014

When using the df.as_matrix() method, rows and columns are not returned as 1xN or Nx1 matrices, but rather as 1xN arrays.

In [4]: df.ix['foo']=[5,3]

In [5]: df.ix['bar']=[2,6]

In [6]: df
Out[6]:
     A  B
foo  5  3
bar  2  6

[2 rows x 2 columns]

In [7]: df['A'].as_matrix()
Out[7]: array([ 5.,  2.])

In [8]: df.ix['foo'].as_matrix()
Out[8]: array([ 5.,  3.])

Expected :

In [9]: np.matrix('5; 2')
Out[9]:
matrix([[5],
        [2]])
@cpcloud
Member

cpcloud commented Jun 10, 2014

This is a poorly named method. It's doing what it's intended to do, and for back compat reasons we can't change it. This corresponds to and I think is the implementation of the .values attribute of all pandas objects that carry the attribute. If you want a matrix you'll have to do np.matrix(df.A.values). I could potentially see a to_matrix() method, but seeing as there's not much you can do with a matrix that you can't do with an array I'd say that that's a bit overkill when you can just use the matrix constructor.
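As a rough sketch of the workaround mentioned above (the frame below is rebuilt from the example in the report; the variable names are only illustrative), `.values` gives a plain ndarray and the `np.matrix` constructor can wrap it. One thing to watch for: `np.matrix` always produces a 2D object, and a 1D input becomes a 1xN row, so a transpose is needed to get the Nx1 column from the report.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the report (illustrative only).
df = pd.DataFrame({"A": [5, 2], "B": [3, 6]}, index=["foo", "bar"])

col = df["A"].values         # plain 1D ndarray, shape (2,)
row_mat = np.matrix(col)     # 1D input becomes a 1xN row matrix
col_mat = np.matrix(col).T   # transpose to get the Nx1 column matrix

print(type(col).__name__, col.shape)
print(row_mat.shape, col_mat.shape)
```

Recent NumPy versions may emit a deprecation-style warning for `np.matrix`, which is one more reason to stay with plain arrays where possible.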

@cpcloud cpcloud closed this as completed Jun 10, 2014
@clham
Contributor Author

clham commented Jun 10, 2014

That's a bummer... I wanted to be able to do math on them elegantly. I'll see if I can clear that up in the docs.

@cpcloud
Member

cpcloud commented Jun 10, 2014

@clham Great! I'll reopen to track the doc issue.

@cpcloud cpcloud reopened this Jun 10, 2014
@cpcloud cpcloud added this to the 0.14.1 milestone Jun 10, 2014
@cpcloud cpcloud self-assigned this Jun 10, 2014
@clham clham changed the title from "BUG: .as_matrix renders improperly" to "DOCS: update .as_matrix to reflect it renders as an array and not a matrix" Jun 10, 2014
@cpcloud cpcloud changed the title from "DOCS: update .as_matrix to reflect it renders as an array and not a matrix" to "DOC: clarify the behavior of the .as_matrix method" Jun 10, 2014
@jreback
Contributor

jreback commented Jun 10, 2014

see long discussion on np.matrix here: http://comments.gmane.org/gmane.comp.python.numeric.general/56494

@clham
Contributor Author

clham commented Jun 10, 2014

Thanks @jreback. My head exploded about halfway through that thread. Based on what I got through, however, I'm guessing the consensus here is to frame the docs to reflect that as_matrix should generally be avoided and .values should be used in its place, both for clarity of code and because of the (distant) possibility of deprecation?

@cpcloud
Member

cpcloud commented Jun 10, 2014

whew wow that thread was interesting, thanks @jreback. personally, i've never once thought: "oh, i'll use np.matrix here", usually for linalg i just use the dot method.

@jorisvandenbossche
Member

Some things that could be improved in the docstring:

  • indeed clarify it returns a numpy array and not a matrix
  • add a Parameters section (for the columns keyword)
  • point to DataFrame.values ('see also' section)
  • expand the DataFrame.values docstring (explanation about the dtype as in docstring of as_matrix is also relevant for .values I think?)

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.as_matrix.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.values.htm

@clham
Contributor Author

clham commented Jun 10, 2014

@cpcloud to the original point of the bug: df['some_column'].values still yields an array row, not an array column, while df['column1':'column2'] yields an Nx2 array. This may just be my misunderstanding of how numpy is usually used.

@jorisvandenbossche
Member

If you want a 2D Nx1 array, you can do df[['col']].values (note double square brackets). In this case it is still a dataframe, while df['some_column'] is a series, and the .values of a series is a 1D array.

So, in fact, a column does not return as a (1, N) array as you say, but as a (N, ) array, strictly speaking.
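A small sketch of the single versus double bracket difference, reusing the frame from the report (the variable names are just for illustration). The `reshape` line is an extra alternative not mentioned above, for when you already have the 1D array in hand.

```python
import pandas as pd

# Rebuild the example frame from the report (illustrative only).
df = pd.DataFrame({"A": [5, 2], "B": [3, 6]}, index=["foo", "bar"])

series_values = df["A"].values            # Series -> 1D array
frame_values = df[["A"]].values           # one-column DataFrame -> 2D array
reshaped = df["A"].values.reshape(-1, 1)  # alternative: reshape the 1D array

print(series_values.shape)  # (2,)
print(frame_values.shape)   # (2, 1)
print(reshaped.shape)       # (2, 1)
```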

@cpcloud
Member

cpcloud commented Jun 10, 2014

@clham

to the original point of the bug df['some_column'].values still yields an array row

As @jorisvandenbossche said, this yields an N-element vector. This is different from the semantics of a language like MATLAB, where there are no 1D arrays and everything is 2D: even size(1) == [1, 1] in MATLAB. The arrays in numpy are more generic, and 1D arrays work like rows in some cases (e.g., broadcasting) and columns in others (matrix multiplication). It just takes a bit of getting used to, but I think in the long run you'll appreciate the ability to write both linear algebraic operations and production-ready data analysis code with the same set of semantics.
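To make the row-versus-column behavior concrete, here is a minimal sketch with made-up values (`M` and `v` are not from the thread). The same 1D array acts like a row under broadcasting and like a column or a row under `dot`, depending on which side it appears on.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
v = np.array([10, 20])   # 1D array, shape (2,)

added = M + v      # broadcasting: v behaves like a row added to each row of M
right = M.dot(v)   # v behaves like a column vector: M @ v
left = v.dot(M)    # v behaves like a row vector: v @ M

print(added)   # [[11 22]
               #  [13 24]]
print(right)   # [ 50 110]
print(left)    # [ 70 100]
```

In both `dot` cases the result is again a 1D array of shape (2,), not a 2D row or column.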

@cpcloud
Member

cpcloud commented Jun 10, 2014

@clham Going to get to this soon? Otherwise I can put up a PR today...

@clham
Contributor Author

clham commented Jun 10, 2014

I'm working on it now. Docs are about the only thing I'm able to contribute!

@cpcloud
Member

cpcloud commented Jun 10, 2014

Nice! We'd gladly give you pointers and/or guidance if you want to tackle anything else!

@clham
Contributor Author

clham commented Jun 10, 2014

PR #7417

@jorisvandenbossche
Member

Closed by #7417
