Skip to content

DOC clarify that pct_change() scale is actually 0-1, i.e. fractional_change() #20752

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sunilshah opened this issue Apr 20, 2018 · 29 comments · Fixed by #53335
Closed

DOC clarify that pct_change() scale is actually 0-1, i.e. fractional_change() #20752

sunilshah opened this issue Apr 20, 2018 · 29 comments · Fixed by #53335

Comments

@sunilshah
Copy link

sunilshah commented Apr 20, 2018

Code Sample, a copy-pastable example if possible

# Your code here
>>>import pandas as pd
>>>a=pd.Series([0,1,2,3,4])
>>> a.pct_change()
0         NaN
1         inf
2    1.000000
3    0.500000
4    0.333333
dtype: float64

Problem description

This is fraction change and not percent change.

Dictionary definition of percent is
"One part in every hundred."

Either change the documentation to state that it is fractional change and not percent change with the formula (for backward compatibility) or rename it (to get rid of the obvious error).

Expected Output

a.pct_change()*100
Val
2017-12-31 NaN
2018-12-31 100.000000 %
2019-12-31 50.000000 %
2020-12-31 33.333333 %
2021-12-31 25.000000 %

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.5.0
pip: 10.0.0
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.6.0
[

@TomAugspurger
Copy link
Contributor

How would you re-phrase the first sentence? Do the examples not make it clear?

@sunilshah
Copy link
Author

Hi TomAugspurger,

Are you asking about re-phrasing the title or the first sentence of the Problem Description in my post?

Which examples are you referring to?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Apr 20, 2018 via email

@sunilshah
Copy link
Author

I would change "Percent Change" to "Fractional Change" for every occurrence.

The examples do imply fractional changes. When I looked at
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pct_change.html

there were no examples.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Apr 20, 2018 via email

@sunilshah
Copy link
Author

The dictionary meaning of fractional (adjective) is precisely that: pertaining to or being a fraction.

http://www.dictionary.com/browse/fractional
https://www.merriam-webster.com/dictionary/fractional

It could also be called relative change. Percent change implies 100 times the fractional or relative change. Therefore the name and the description are misleading.

The improved docs are better than the original because of the examples. The user still has to go through the numbers in the examples to compute and figure out that the author really meant fractions and not percentage, even though it mentions percent eight times. IMHO it should be corrected.

@jreback
Copy link
Contributor

jreback commented Apr 20, 2018

this is the first time anyone has brought this up, even though this method has existed since the beginning of pandas. this is splitting hairs about the semantic meaning. This is a won't fix.

@jreback jreback closed this as completed Apr 20, 2018
@jreback jreback added this to the won't fix milestone Apr 20, 2018
@sunilshah
Copy link
Author

sunilshah commented Apr 20, 2018

The only reason to consider a change (to the docs), is to help new users learn faster. That was my motivation to spend my time filing the request.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Apr 21, 2018 via email

@TomAugspurger TomAugspurger modified the milestones: won't fix, No action Jul 6, 2018
@mitchnegus
Copy link
Contributor

mitchnegus commented Sep 25, 2018

For what it's worth, I was calculating percent differences in a class today and had a student use this function. They neglected to multiply by 100 because they didn't check the docs (and I was surprised myself at the behavior after reading them). I used it as a good teachable moment for why you should always check the docs, but I don't think that's a particularly good reason for keeping it that way. I volunteer to make the change if you're open to it, but I have to agree with sunilshah here that the standard definition of percent change is out of 100.

@tr-ips-bahubali
Copy link

Yup, the description should specify that it gives fractional output instead of the expected percentage change output.

@matthew-brett
Copy link
Contributor

We just came across this too, when a student asked why the values weren't percents. I can't say why people have not complained, but it's very confusing to see that the method is called pct_change and there is "percentage change" through the docstring, when in fact it is fractional change. In particular you have:

Percentage change between the current and a prior element.

Computes the percentage change from the immediately previous row by
default. This is useful in comparing the percentage of change in a time
series of elements.

This is just incorrect. It should read:

Fractional change between the current and a prior element.

*Be careful* - despite the name of this method, it calculates fractional
change and not percentage change.

Computes the fractional change from the immediately previous row by
default. This is useful in comparing the fraction of change in a time
series of elements.

If you need the percentage change, multiply these values by 100.

All the other instances of "percent" should be "fraction".

@thusithaC
Copy link

I agree that this is extremely confusing. At least the docs should be changed. The current Doc says thus, which is very much misleading. Percentage is scaled to 100.

`Percentage change between the current and a prior element.

Computes the percentage change from the immediately previous row by default. This is useful in comparing the percentage of change in a time series of elements.`

@brcharron
Copy link

I also find this highly misleading as this function simply does not involve any percentage. I end up not using it as it makes my code less understandable.
Another name which may be more common and intuitive than fractional change would be relative change.

@matthew-brett
Copy link
Contributor

@CHARRON - of course I completely agree, but I'm afraid we didn't get anywhere when I tried bringing this up on the list - starting here. The argument against doing anything about this appears to be, that when some presumably large proportion of Pandas users see a 'pct_change' of 1.0, from the values [1, 2], they instinctively think of this as being a proportion (rather than a percent), and they regard the actual number - 1.0 - as representing 100%, just not displayed as such. I think. But you're probably best reading it yourself, I doubt I'm doing the argument justice.

@brcharron
Copy link

@matthew-brett thank you for linking the extended discussion in the mailing-list, I was not aware of it. Maybe this is a matter of background as one poster mentioned. I am only surprised that this issue did not get more traction, which unfortunately lends support to the argument against doing anything.

@matthew-brett
Copy link
Contributor

It seems obvious to me that there's at least a significant proportion of users who find this behavior somewhere between surprising and clearly incorrect. As someone who uses and teaches Pandas a lot, has contributed some commits and the original wheel build machinery, and wants it to succeed, it's sad for me that this one doesn't have a better solution than "It's fine as it is".

@buhrmann
Copy link

Maybe this should go into a separate issue, but I think in addition to the name of pct_change, the formula on my interpretation is not actually correct for negative values:

In [2]: pd.Series([-2, -1]).pct_change()                                                                                                                                                                                                                                        
Out[2]: 
0    NaN
1   -0.5
dtype: float64

I would expect a change from -2 to -1 to be a 50% increase, not a decrease. But perhaps there is a scenario where the negative value is what is desired?

@thorr18
Copy link

thorr18 commented Feb 7, 2021

This gives a per unit change. It is not a per cent change. (cent is latin for 100)
Even if one doesn't want to fix the badly named function, the doc string should not say "Percentage change" for it does not actually produce a per cent change. The output is actually a relative change expressed as a per unit change. The relative change is not expressed as a percentage change. The docs and function name would both be correct if they called it relative change. They would likewise be correct if they called it more specifically a per unit change. They would not be correct if they call it a percentage change.
https://en.wikipedia.org/wiki/Relative_change_and_difference#Percentage_change

@MarcoGorelli
Copy link
Member

Hi @thorr18 - I agree that the docstring could do with being clarified - are you interested in submitting a pull request? If so, here's the contributing guide

Hi @buhrmann -0.5 is correct, according to the definition ( (-1 - (-2)) / -2 = - 0.5) . If you disagree, please open a new issue

@gustavovargas
Copy link
Contributor

Yes, it should be indicated in the doc that it is a relative rate and not a percent. It is confusing.

@MarcoGorelli
Copy link
Member

sure - interested in submitting a pull request to clarify the docs?

opening as a doc change

@MarcoGorelli MarcoGorelli changed the title pct_change() scale is actually 0-1, i.e. fractional_change() DOC clarify that pct_change() scale is actually 0-1, i.e. fractional_change() Apr 26, 2023
@mroeschke mroeschke removed this from the No action milestone May 11, 2023
@beyondguo
Copy link

This is clearly an error in the document or function naming. Why are you still trying to argue? Please fix this bug quickly and stop misleading beginners!

@MarcoGorelli
Copy link
Member

MarcoGorelli commented May 18, 2023

@beyondguo thanks for your interest and constructive criticism - are you interested in submitting a PR to clarify the docs? https://pandas.pydata.org/docs/dev/development/contributing_docstring.html

I think this comment has some good suggestions for how to clarify the docs

@gustavovargas
Copy link
Contributor

take

@matthew-brett
Copy link
Contributor

matthew-brett commented May 22, 2023

It's certainly a good idea to update the docstring, but it is a little difficult to do it. It seems to me the docstring would have to start with something like:

Although the method is named `pct_change`, in fact, it returns proportional change, and not percent change.
To get percent change, multiply the returned value by 100.

I do care about this, and I'm happy to write the docstring - but would such a change really be acceptable?

@MarcoGorelli
Copy link
Member

looks like @gustavovargas has already taken this

I think all that needs adding is a note, just like there's a note here https://pandas.pydata.org/docs/reference/api/pandas.Series.searchsorted.html

@matthew-brett
Copy link
Contributor

Well - that note - for those who did not follow the link - is "The Series must be monotonically sorted, otherwise wrong locations will likely be returned.". That's a perfectly reasonable caveat in usage. Here we have the problem of explaining why the method name strongly implies it is returning something other than its actual result. But - just to ask again - is something like my suggestion acceptable?

@MarcoGorelli
Copy link
Member

yeah looks fine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.