You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I did a long analysis of what kind of Period objects can and cannot produced from a string alone. I.e. pd.Period("2022Q2") gives a valid Period representing April-June 2022. I found lots of blank spots though. Here's a description in words. In the Additional Context section is a long list of examples.
Types of Period with lots of obvious ways to create them from just a string
Types of Period that can't be created from a string alone at all
There are other ways to create these frequencies using the full suite of arguments to the Period class but they can't be created with a string alone.
Business years not going from January to December. I don't know how to specify those unambiguously though.
Quarters starting with a different month than January/April/July/October. Again I'm not aware of standardized strings to specify these.
Weeks. There would be two obvious ways to implement this:
a. ISO 8601 week strings such as "2022-W31". This only covers weeks starting on a Monday. This is supported by dateutil.parser.isoparse, so would be easy to implement. However, isoparse just returns a datetime in this case and doesn't indicate that it refers to an entire week.
b. Using the string produced by printing a week period: str(pd.Period("2017-01-25", freq="W")) => "2017-01-23/2017-01-29". This often fails silently and creates a Period with freq="min" or raises a DateParseError (see 6. & 7. for details).
c. Using strings giving the start and end day of the week but in a slightly different form than given in b such as "20170123-20170129".
Business days. No idea what their use case is and if this is a problem.
Types of Period that can't be created from their string representation alone
Hours are interpreted as minutes because their string representation is "2010-11-12 13:00". However hours can be created in different ways from a string alone e.g. from "2010-11-12 13h"
In addition the frequencies listed in 1-4 can also not be recovered from their string representation. I.e. given a original = Period(...) with one of these frequencies, Period(str(original)) != original. However, adding the frequency information usually works fine: Period(str(original), freq=original.freq) == original. There is one exception though:
Some weeks raise a DateParseError when you try to re-create them from their string representation:
Weeks in the 60s, 70s, 80s, or 90s of any century
Weeks from the 24th century onwards
This is because in "2481-12-29/2482-01-04" the year 2482 is interpreted as 24 hours and 82 minutes. One can argue that we don't need to be able to re-convert the string representation of a Period back to a Period. However, there are tests in the codebase that rely on this functionality. These happen to be written with dates in the early 21st century so they work. Nevertheless, I propose to remove some of them in #47990.
Other specific standardized string representations that are not understood:
ISO 8601 ordinal dates such as "1981-095" for "1981-04-05". Though probably not widely used, this is also supported by dateutil.parser.isoparse.
Feature Description
Process:
Decide which cases given in the Problem Description we even wish to cover.
We could expect users to do their own conversions tailored to their use cases. We can't cover all possible strings that might show up in databases anyways. After all the Period class and related classes provide plenty of other arguments besides value to control the result.
Pandas developers should be careful to not parse string representations of Period objects.
Additional Context
Full list of tested strings and other Period constructors.
code to create `DataFrame`
import pandas as pd
from pandas.io import clipboard
period_strings = [
# ISO 8601 examples from Wikipedia
"2022-08-07",
"2022-08-07T06:41:54+00:00",
"2022-08-07T06:41:54Z",
"20220807T064154Z",
"2022-W31",
"2022-W31-7",
"2022-219",
# examples from the Pandas docs
"2002Q3",
"2000",
"1/1/2011",
"2011-01",
"2012-1-1 19:00",
# couldn't find docs but these work
"2012-1-1 12:34:56.1",
"2012-1-1 12:34:56.123",
"2012-1-1 12:34:56.123456",
"2012-1-1 12:34:56.123456789",
"2010-11-12 13h",
]
def eval_period(s):
try:
return eval(s, None, {"dt": "2012-1-1 12:34:56.123456789"})
except Exception as e:
return e.__class__.__name__
def recreate_yes_no(old, use):
assert "str" in use
if not isinstance(old, pd.Period):
return "N/A"
freq = old.freq if "freq" in use else None
s = str(old)
try:
new = pd.Period(s, freq=freq)
except Exception as e:
return e.__class__.__name__
if new == old:
return "== p"
return f"failed: freq={new.freq}"
period_strings = pd.Series(period_strings)
str_constructors = period_strings.apply(lambda s: f"pd.Period( '{s}' )")
freq_strings = pd.Series(
pd._libs.tslibs.dtypes._reverse_period_code_map.values()
)
freq_constructors = freq_strings.apply(
lambda s: f"pd.Period( dt, freq='{s}' )"
)
years = pd.Series(range(2002, 2500, 30))
week_constructors = years.apply(
lambda a: f"pd.Period( freq='W', year={a} )"
)
df = pd.DataFrame()
df["constructors"] = pd.concat(
[str_constructors, freq_constructors, week_constructors], ignore_index=True
)
df["str(p)"] = df["constructors"].apply(eval_period)
df["p.freq"] = df["str(p)"].apply(lambda p: getattr(p, "freq", "N/A"))
df["pd.Period(str(p))"] = df["str(p)"].apply(recreate_yes_no, use=["str"])
df["pd.Period(str(p), freq=p.freq)"] = df["str(p)"].apply(
recreate_yes_no, use=["str", "freq"]
)
clipboard.copy(df.to_markdown())
Note: dt = "2012-1-1 12:34:56.123456789"
constructors
str(p)
p.freq
pd.Period(str(p))
pd.Period(str(p), freq=p.freq)
0
pd.Period( '2022-08-07' )
2022-08-07
Day
== p
== p
1
pd.Period( '2022-08-07T06:41:54+00:00' )
2022-08-07 06:41:54
Second
== p
== p
2
pd.Period( '2022-08-07T06:41:54Z' )
2022-08-07 06:41:54
Second
== p
== p
3
pd.Period( '20220807T064154Z' )
2022-08-07 06:41:54
Second
== p
== p
4
pd.Period( '2022-W31' )
DateParseError
N/A
N/A
N/A
5
pd.Period( '2022-W31-7' )
DateParseError
N/A
N/A
N/A
6
pd.Period( '2022-219' )
DateParseError
N/A
N/A
N/A
7
pd.Period( '2002Q3' )
2002Q3
QuarterEnd: startingMonth=12
== p
== p
8
pd.Period( '2000' )
2000
YearEnd: month=12
== p
== p
9
pd.Period( '1/1/2011' )
2011-01-01
Day
== p
== p
10
pd.Period( '2011-01' )
2011-01
MonthEnd
== p
== p
11
pd.Period( '2012-1-1 19:00' )
2012-01-01 19:00
Minute
== p
== p
12
pd.Period( '2012-1-1 12:34:56.1' )
2012-01-01 12:34:56.100
Milli
== p
== p
13
pd.Period( '2012-1-1 12:34:56.123' )
2012-01-01 12:34:56.123
Milli
== p
== p
14
pd.Period( '2012-1-1 12:34:56.123456' )
2012-01-01 12:34:56.123456
Micro
== p
== p
15
pd.Period( '2012-1-1 12:34:56.123456789' )
2012-01-01 12:34:56.123456789
Nano
== p
== p
16
pd.Period( '2010-11-12 13h' )
2010-11-12 13:00
Hour
failed: freq=Minute
== p
17
pd.Period( dt, freq='A-DEC' )
2012
YearEnd: month=12
== p
== p
18
pd.Period( dt, freq='A-JAN' )
2012
YearEnd: month=1
failed: freq=YearEnd: month=12
== p
19
pd.Period( dt, freq='A-FEB' )
2012
YearEnd: month=2
failed: freq=YearEnd: month=12
== p
20
pd.Period( dt, freq='A-MAR' )
2012
YearEnd: month=3
failed: freq=YearEnd: month=12
== p
21
pd.Period( dt, freq='A-APR' )
2012
YearEnd: month=4
failed: freq=YearEnd: month=12
== p
22
pd.Period( dt, freq='A-MAY' )
2012
YearEnd: month=5
failed: freq=YearEnd: month=12
== p
23
pd.Period( dt, freq='A-JUN' )
2012
YearEnd: month=6
failed: freq=YearEnd: month=12
== p
24
pd.Period( dt, freq='A-JUL' )
2012
YearEnd: month=7
failed: freq=YearEnd: month=12
== p
25
pd.Period( dt, freq='A-AUG' )
2012
YearEnd: month=8
failed: freq=YearEnd: month=12
== p
26
pd.Period( dt, freq='A-SEP' )
2012
YearEnd: month=9
failed: freq=YearEnd: month=12
== p
27
pd.Period( dt, freq='A-OCT' )
2012
YearEnd: month=10
failed: freq=YearEnd: month=12
== p
28
pd.Period( dt, freq='A-NOV' )
2012
YearEnd: month=11
failed: freq=YearEnd: month=12
== p
29
pd.Period( dt, freq='Q-DEC' )
2012Q1
QuarterEnd: startingMonth=12
== p
== p
30
pd.Period( dt, freq='Q-JAN' )
2012Q4
QuarterEnd: startingMonth=1
failed: freq=QuarterEnd: startingMonth=12
== p
31
pd.Period( dt, freq='Q-FEB' )
2012Q4
QuarterEnd: startingMonth=2
failed: freq=QuarterEnd: startingMonth=12
== p
32
pd.Period( dt, freq='Q-MAR' )
2012Q4
QuarterEnd: startingMonth=3
failed: freq=QuarterEnd: startingMonth=12
== p
33
pd.Period( dt, freq='Q-APR' )
2012Q3
QuarterEnd: startingMonth=4
failed: freq=QuarterEnd: startingMonth=12
== p
34
pd.Period( dt, freq='Q-MAY' )
2012Q3
QuarterEnd: startingMonth=5
failed: freq=QuarterEnd: startingMonth=12
== p
35
pd.Period( dt, freq='Q-JUN' )
2012Q3
QuarterEnd: startingMonth=6
failed: freq=QuarterEnd: startingMonth=12
== p
36
pd.Period( dt, freq='Q-JUL' )
2012Q2
QuarterEnd: startingMonth=7
failed: freq=QuarterEnd: startingMonth=12
== p
37
pd.Period( dt, freq='Q-AUG' )
2012Q2
QuarterEnd: startingMonth=8
failed: freq=QuarterEnd: startingMonth=12
== p
38
pd.Period( dt, freq='Q-SEP' )
2012Q2
QuarterEnd: startingMonth=9
failed: freq=QuarterEnd: startingMonth=12
== p
39
pd.Period( dt, freq='Q-OCT' )
2012Q1
QuarterEnd: startingMonth=10
failed: freq=QuarterEnd: startingMonth=12
== p
40
pd.Period( dt, freq='Q-NOV' )
2012Q1
QuarterEnd: startingMonth=11
failed: freq=QuarterEnd: startingMonth=12
== p
41
pd.Period( dt, freq='M' )
2012-01
MonthEnd
== p
== p
42
pd.Period( dt, freq='W-SUN' )
2011-12-26/2012-01-01
Week: weekday=6
failed: freq=Minute
== p
43
pd.Period( dt, freq='W-MON' )
2011-12-27/2012-01-02
Week: weekday=0
failed: freq=Minute
== p
44
pd.Period( dt, freq='W-TUE' )
2011-12-28/2012-01-03
Week: weekday=1
failed: freq=Minute
== p
45
pd.Period( dt, freq='W-WED' )
2011-12-29/2012-01-04
Week: weekday=2
failed: freq=Minute
== p
46
pd.Period( dt, freq='W-THU' )
2011-12-30/2012-01-05
Week: weekday=3
failed: freq=Minute
== p
47
pd.Period( dt, freq='W-FRI' )
2011-12-31/2012-01-06
Week: weekday=4
failed: freq=Minute
== p
48
pd.Period( dt, freq='W-SAT' )
2012-01-01/2012-01-07
Week: weekday=5
failed: freq=Minute
== p
49
pd.Period( dt, freq='B' )
2012-01-02
BusinessDay
failed: freq=Day
== p
50
pd.Period( dt, freq='D' )
2012-01-01
Day
== p
== p
51
pd.Period( dt, freq='H' )
2012-01-01 12:00
Hour
failed: freq=Minute
== p
52
pd.Period( dt, freq='T' )
2012-01-01 12:34
Minute
== p
== p
53
pd.Period( dt, freq='S' )
2012-01-01 12:34:56
Second
== p
== p
54
pd.Period( dt, freq='L' )
2012-01-01 12:34:56.123
Milli
== p
== p
55
pd.Period( dt, freq='U' )
2012-01-01 12:34:56.123456
Micro
== p
== p
56
pd.Period( dt, freq='N' )
2012-01-01 12:34:56.123456789
Nano
== p
== p
57
pd.Period( freq='W', year=2002 )
2001-12-31/2002-01-06
Week: weekday=6
failed: freq=Minute
== p
58
pd.Period( freq='W', year=2032 )
2031-12-29/2032-01-04
Week: weekday=6
failed: freq=Minute
== p
59
pd.Period( freq='W', year=2062 )
2061-12-26/2062-01-01
Week: weekday=6
DateParseError
DateParseError
60
pd.Period( freq='W', year=2092 )
2091-12-31/2092-01-06
Week: weekday=6
DateParseError
DateParseError
61
pd.Period( freq='W', year=2122 )
2121-12-29/2122-01-04
Week: weekday=6
failed: freq=Minute
== p
62
pd.Period( freq='W', year=2152 )
2151-12-27/2152-01-02
Week: weekday=6
failed: freq=Minute
== p
63
pd.Period( freq='W', year=2182 )
2181-12-31/2182-01-06
Week: weekday=6
DateParseError
DateParseError
64
pd.Period( freq='W', year=2212 )
2211-12-30/2212-01-05
Week: weekday=6
failed: freq=Minute
== p
65
pd.Period( freq='W', year=2242 )
2241-12-27/2242-01-02
Week: weekday=6
failed: freq=Minute
== p
66
pd.Period( freq='W', year=2272 )
2272-01-01/2272-01-07
Week: weekday=6
DateParseError
DateParseError
67
pd.Period( freq='W', year=2302 )
2301-12-30/2302-01-05
Week: weekday=6
failed: freq=Minute
== p
68
pd.Period( freq='W', year=2332 )
2331-12-28/2332-01-03
Week: weekday=6
failed: freq=Minute
== p
69
pd.Period( freq='W', year=2362 )
2362-01-01/2362-01-07
Week: weekday=6
DateParseError
DateParseError
70
pd.Period( freq='W', year=2392 )
2391-12-30/2392-01-05
Week: weekday=6
DateParseError
DateParseError
71
pd.Period( freq='W', year=2422 )
2421-12-27/2422-01-02
Week: weekday=6
DateParseError
DateParseError
72
pd.Period( freq='W', year=2452 )
2452-01-01/2452-01-07
Week: weekday=6
DateParseError
DateParseError
73
pd.Period( freq='W', year=2482 )
2481-12-29/2482-01-04
Week: weekday=6
DateParseError
DateParseError
The text was updated successfully, but these errors were encountered:
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
I did a long analysis of what kind of
Period
objects can and cannot produced from a string alone. I.e.pd.Period("2022Q2")
gives a validPeriod
representing April-June 2022. I found lots of blank spots though. Here's a description in words. In the Additional Context section is a long list of examples.Types of
Period
with lots of obvious ways to create them from just a stringYear, quarter, month, day, minute, second, millisecond, microsecond, nanosecond
Types of
Period
that can't be created from a string alone at allThere are other ways to create these frequencies using the full suite of arguments to the
Period
class but they can't be created with a string alone.a. ISO 8601 week strings such as
"2022-W31"
. This only covers weeks starting on a Monday. This is supported bydateutil.parser.isoparse
, so would be easy to implement. However,isoparse
just returns a datetime in this case and doesn't indicate that it refers to an entire week.b. Using the string produced by printing a week period:
str(pd.Period("2017-01-25", freq="W")) => "2017-01-23/2017-01-29"
. This often fails silently and creates aPeriod
withfreq="min"
or raises aDateParseError
(see 6. & 7. for details).c. Using strings giving the start and end day of the week but in a slightly different form than given in b such as
"20170123-20170129"
.Types of
Period
that can't be created from their string representation alone"2010-11-12 13:00"
. However hours can be created in different ways from a string alone e.g. from"2010-11-12 13h"
In addition the frequencies listed in 1-4 can also not be recovered from their string representation. I.e. given a
original = Period(...)
with one of these frequencies,Period(str(original)) != original
. However, adding the frequency information usually works fine:Period(str(original), freq=original.freq) == original
. There is one exception though:Some weeks raise a
DateParseError
when you try to re-create them from their string representation:This is because in
"2481-12-29/2482-01-04"
the year2482
is interpreted as 24 hours and 82 minutes. One can argue that we don't need to be able to re-convert the string representation of aPeriod
back to aPeriod
. However, there are tests in the codebase that rely on this functionality. These happen to be written with dates in the early 21st century so they work. Nevertheless, I propose to remove some of them in #47990.Other specific standardized string representations that are not understood:
"1981-095"
for"1981-04-05"
. Though probably not widely used, this is also supported bydateutil.parser.isoparse
.Feature Description
Process:
Alternative Solutions
Period
class and related classes provide plenty of other arguments besidesvalue
to control the result.Period
objects.Additional Context
Full list of tested strings and other
Period
constructors.code to create `DataFrame`
Note:
dt = "2012-1-1 12:34:56.123456789"
The text was updated successfully, but these errors were encountered: