-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
DOC clarify that pct_change() scale is actually 0-1, i.e. fractional_change() #20752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
How would you re-phrase the first sentence? Do the examples not make it clear? |
Hi TomAugspurger, Are you asking about re-phrasing the title or the first sentence of the Problem Description in my post? Which examples are you referring to? |
The first sentence & examples in the docstring:
http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.Series.pct_change.html
…On Thu, Apr 19, 2018 at 9:06 PM, sunilshah ***@***.***> wrote:
Hi TomAugspurger,
Are you asking about re-phrasing the title or the first sentence of the
Problem Description in my post?
Which examples are you referring to?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#20752 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIt3woWkn7RnlSOmmMPcZbkk89FOIks5tqUKjgaJpZM4Tcu9g>
.
|
I would change "Percent Change" to "Fractional Change" for every occurrence. The examples do imply fractional changes. When I looked at there were no examples. |
I haven't heard the term "Fractional Change" before, is it standard?
The docs I linked to are the developer docs, with a recently improved
docstring. The ones you saw were (understandably) for the currently
released version.
Are the improved docs sufficiently clear?
…On Thu, Apr 19, 2018 at 9:15 PM, sunilshah ***@***.***> wrote:
I would change "Percent Change" to "Fractional Change" for every
occurrence.
The examples do imply fractional changes. When I looked at
https://pandas.pydata.org/pandas-docs/stable/generated/
pandas.DataFrame.pct_change.html
there were no examples.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#20752 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIkHqnUs8TbQ4_uBDqpF9wwpi5cCHks5tqUS-gaJpZM4Tcu9g>
.
|
The dictionary meaning of fractional (adjective) is precisely that: pertaining to or being a fraction. http://www.dictionary.com/browse/fractional It could also be called relative change. Percent change implies 100 times the fractional or relative change. Therefore the name and the description are misleading. The improved docs are better than the original because of the examples. The user still has to go through the numbers in the examples to compute and figure out that the author really meant fractions and not percentage, even though it mentions percent eight times. IMHO it should be corrected. |
this is the first time anyone has brought this up, even though this method has existed since the beginning of pandas. this is splitting hairs about the semantic meaning. This is a won't fix. |
The only reason to consider a change (to the docs), is to help new users learn faster. That was my motivation to spend my time filing the request. |
Thanks. I think we're OK with the current docs for now. We have this issue
as a record in case someone else is confused, and we can change it then.
…On Fri, Apr 20, 2018 at 10:27 AM, sunilshah ***@***.***> wrote:
The only reason to consider a change (to the docs), is to help new users a
faster ramp in using pandas. That was my motivation to spend my time filing
the request.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#20752 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIjb3wxCRbOUbjX9YcjpyKkOYkI3Mks5tqf5pgaJpZM4Tcu9g>
.
|
For what it's worth, I was calculating percent differences in a class today and had a student use this function. They neglected to multiply by 100 because they didn't check the docs (and I was surprised myself at the behavior after reading them). I used it as a good teachable moment for why you should always check the docs, but I don't think that's a particularly good reason for keeping it that way. I volunteer to make the change if you're open to it, but I have to agree with sunilshah here that the standard definition of percent change is out of 100. |
Yup, the description should specify that it gives fractional output instead of the expected percentage change output. |
We just came across this too, when a student asked why the values weren't percents. I can't say why people have not complained, but it's very confusing to see that the method is called
This is just incorrect. It should read:
All the other instances of "percent" should be "fraction". |
I agree that this is extremely confusing. At least the docs should be changed. The current Doc says thus, which is very much misleading. Percentage is scaled to 100. `Percentage change between the current and a prior element. Computes the percentage change from the immediately previous row by default. This is useful in comparing the percentage of change in a time series of elements.` |
I also find this highly misleading as this function simply does not involve any percentage. I end up not using it as it makes my code less understandable. |
@CHARRON - of course I completely agree, but I'm afraid we didn't get anywhere when I tried bringing this up on the list - starting here. The argument against doing anything about this appears to be, that when some presumably large proportion of Pandas users see a 'pct_change' of 1.0, from the values [1, 2], they instinctively think of this as being a proportion (rather than a percent), and they regard the actual number - 1.0 - as representing 100%, just not displayed as such. I think. But you're probably best reading it yourself, I doubt I'm doing the argument justice. |
@matthew-brett thank you for linking the extended discussion in the mailing-list, I was not aware of it. Maybe this is a matter of background as one poster mentioned. I am only surprised that this issue did not get more traction, which unfortunately lends support to the argument against doing anything. |
It seems obvious to me that there's at least a significant proportion of users who find this behavior somewhere between surprising and clearly incorrect. As someone who uses and teaches Pandas a lot, has contributed some commits and the original wheel build machinery, and wants it to succeed, it's sad for me that this one doesn't have a better solution than "It's fine as it is". |
Maybe this should go into a separate issue, but I think in addition to the name of In [2]: pd.Series([-2, -1]).pct_change()
Out[2]:
0 NaN
1 -0.5
dtype: float64 I would expect a change from -2 to -1 to be a 50% increase, not a decrease. But perhaps there is a scenario where the negative value is what is desired? |
This gives a per unit change. It is not a per cent change. (cent is latin for 100) |
Yes, it should be indicated in the doc that it is a relative rate and not a percent. It is confusing. |
sure - interested in submitting a pull request to clarify the docs? opening as a doc change |
This is clearly an error in the document or function naming. Why are you still trying to argue? Please fix this bug quickly and stop misleading beginners! |
@beyondguo thanks for your interest and constructive criticism - are you interested in submitting a PR to clarify the docs? https://pandas.pydata.org/docs/dev/development/contributing_docstring.html I think this comment has some good suggestions for how to clarify the docs |
take |
It's certainly a good idea to update the docstring, but it is a little difficult to do it. It seems to me the docstring would have to start with something like:
I do care about this, and I'm happy to write the docstring - but would such a change really be acceptable? |
looks like @gustavovargas has already taken this I think all that needs adding is a note, just like there's a note here https://pandas.pydata.org/docs/reference/api/pandas.Series.searchsorted.html |
Well - that note - for those who did not follow the link - is "The Series must be monotonically sorted, otherwise wrong locations will likely be returned.". That's a perfectly reasonable caveat in usage. Here we have the problem of explaining why the method name strongly implies it is returning something other than its actual result. But - just to ask again - is something like my suggestion acceptable? |
yeah looks fine |
Code Sample, a copy-pastable example if possible
Problem description
This is fraction change and not percent change.
Dictionary definition of percent is
"One part in every hundred."
Either change the documentation to state that it is fractional change and not percent change with the formula (for backward compatibility) or rename it (to get rid of the obvious error).
Expected Output
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: 3.5.0
pip: 10.0.0
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.6.0
[
The text was updated successfully, but these errors were encountered: