
Commit a8c9d63

tswast, loferris, and parthea
authored
chore: add ROADMAP document describing the purpose of the package (#505)
* doc: add ROADMAP document describing the purpose of the package
* additional thoughts

Co-authored-by: Lo Ferris <[email protected]>
Co-authored-by: Anthonios Partheniou <[email protected]>
1 parent 106d29a commit a8c9d63


1 file changed (+57, -0 lines)


ROADMAP.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# pandas-gbq Roadmap

The purpose of this package is to provide a small subset of BigQuery
functionality that maps well to
[pandas.read_gbq](https://pandas.pydata.org/docs/reference/api/pandas.read_gbq.html#pandas.read_gbq)
and
[pandas.DataFrame.to_gbq](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_gbq.html#pandas.DataFrame.to_gbq).
Those methods in the pandas library are thin wrappers around the equivalent
methods in this package.

## Adding features to pandas-gbq

Considerations when adding new features to pandas-gbq:

* New method? Consider an alternative, as the core focus of this library is
  `read_gbq` and `to_gbq`.
* Breaking change to an existing parameter? Consider an alternative, as folks
  could be using an older version of `pandas` that doesn't account for the
  change when a newer version of `pandas-gbq` is installed. If you must, follow
  a deprecation timeline of at least one year.
* New parameter? Go for it! Be sure to also send a PR to `pandas` after the
  feature is released so that folks using the `pandas` wrapper can take
  advantage of it.
* New data type? OK. If there's not a good mapping to an existing `pandas`
  dtype, consider adding one to the `db-dtypes` package.
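
As a hedged illustration of the deprecation advice above, the sketch below keeps an old keyword working while steering callers to its replacement. `load_table`, `limit`, and `max_results` are hypothetical names chosen for the example, not pandas-gbq API. Because the old keyword is still accepted, an older `pandas` wrapper that forwards it keeps working during the deprecation period; it just emits a warning.

```python
import warnings

# Sentinel so we can tell "argument not passed" apart from an explicit None.
_UNSET = object()


def load_table(table_id, *, max_results=None, limit=_UNSET):
    """Hypothetical function: the `limit` keyword is being renamed to `max_results`.

    The old keyword keeps working during the deprecation period but emits a
    FutureWarning, so callers (e.g. an older pandas wrapper) are not broken.
    """
    if limit is not _UNSET:
        warnings.warn(
            "The 'limit' argument is deprecated and will be removed in a "
            "future release; use 'max_results' instead.",
            FutureWarning,
            stacklevel=2,
        )
        # Only fall back to the old keyword if the new one wasn't given.
        if max_results is None:
            max_results = limit
    # A real implementation would query BigQuery here; this sketch just
    # returns the resolved arguments.
    return table_id, max_results
```

The sketch uses `FutureWarning` rather than `DeprecationWarning` because Python's default warning filters hide `DeprecationWarning` unless it is triggered in `__main__`, so end users of a library would rarely see it.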

## Vision

The `pandas-gbq` package should do the "right thing" by default. This means
choosing dtypes carefully, both for maximum compatibility with BigQuery and to
avoid data loss. As new data types are added to BigQuery that don't yet have
good equivalents in the `pandas` ecosystem, equivalent dtypes should be added
to the `db-dtypes` package.
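
To make "avoid data loss" concrete, here is a small pandas-only sketch (it does not call pandas-gbq, and which dtype pandas-gbq picks may vary by version). It models a BigQuery INT64 column that contains a NULL: storing it as `float64` silently rounds large integers, while pandas' nullable `Int64` dtype keeps both the exact value and the missing value.

```python
import pandas as pd

# 2**53 + 1 is the smallest positive integer that float64 cannot represent exactly.
big = 2**53 + 1
values = [big, None]  # e.g. an INT64 column that contains a NULL

# float64 is the classic NumPy-backed fallback for integers with missing
# values, but it rounds the value to the nearest representable float.
as_float = pd.Series(values, dtype="float64")

# pandas' nullable Int64 extension dtype keeps the exact value and the NULL.
as_nullable = pd.Series(values, dtype="Int64")

print(int(as_float.iloc[0]) == big)   # False: precision was lost
print(as_nullable.iloc[0] == big)     # True
print(pd.isna(as_nullable.iloc[1]))   # True
```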

As BigQuery adds features that might improve performance, `pandas-gbq` should
offer easy ways to use them without sacrificing usability. For example, one
option is to expose the BigQuery Storage Write API through the `api_method`
parameter of `to_gbq`.

A note on `pandas.read_sql`: we'd like to be compatible with this too, for folks
who need better performance than the SQLAlchemy connector provides.

## Usability

Unlike the more object-oriented client libraries, it's natural in the Python
data science ecosystem to have a method with many parameters. That said, the
`configuration` argument accepts the REST representation of the job
configuration, so power users can use new features without waiting for an
explicit parameter to be added.
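
A minimal sketch of using the `configuration` argument, assuming `pandas-gbq` is installed and Google Cloud credentials are set up. The helper names `query_job_config` and `read_with_config` are hypothetical; `useQueryCache` and `maximumBytesBilled` are fields of the query job configuration in BigQuery's REST API, passed through as-is rather than via dedicated pandas-gbq parameters.

```python
def query_job_config(max_bytes_billed, use_query_cache=True):
    """Hypothetical helper: build a job configuration in BigQuery's REST (JSON) form.

    Keys mirror the REST API's query job configuration resource, so fields
    that pandas-gbq has no dedicated parameter for can still be set.
    """
    return {
        "query": {
            "useQueryCache": use_query_cache,
            # int64 fields are encoded as strings in BigQuery's REST API.
            "maximumBytesBilled": str(max_bytes_billed),
        }
    }


def read_with_config(sql, project_id, config):
    """Hypothetical wrapper that forwards the REST-style config to read_gbq."""
    # Imported lazily so the config helper above can be used (and tested)
    # without pandas-gbq or Google Cloud credentials.
    import pandas_gbq

    return pandas_gbq.read_gbq(sql, project_id=project_id, configuration=config)
```

For example, `read_with_config("SELECT 1 AS x", "my-project", query_job_config(10**9))` should make BigQuery fail the query instead of billing more than 10**9 bytes; `my-project` is a placeholder project ID.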

## Conclusion

Keep it simple.

Don't break existing users.

Do the right thing by default.
