# pandas-gbq Roadmap

The purpose of this package is to provide a small subset of BigQuery
functionality that maps well to
[pandas.read_gbq](https://pandas.pydata.org/docs/reference/api/pandas.read_gbq.html#pandas.read_gbq)
and
[pandas.DataFrame.to_gbq](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_gbq.html#pandas.DataFrame.to_gbq).
Those methods in the pandas library are thin wrappers around the equivalent
methods in this package.

## Adding features to pandas-gbq

Considerations when adding new features to pandas-gbq:

* New method? Consider an alternative, as the core focus of this library is
  `read_gbq` and `to_gbq`.
* Breaking change to an existing parameter? Consider an alternative, as folks
  could be using an older version of `pandas` that doesn't account for the
  change when a newer version of `pandas-gbq` is installed. If you must, please
  follow a deprecation timeline of at least one year.
* New parameter? Go for it! Be sure to also send a PR to `pandas` after the
  feature is released so that folks using the `pandas` wrapper can take
  advantage of it.
* New data type? OK. If there's no good mapping to an existing `pandas`
  dtype, consider adding one to the `db-dtypes` package.
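
For the breaking-change case above, a common pattern in the pandas ecosystem is to keep accepting the old parameter for the whole deprecation period while emitting a `FutureWarning`. The sketch below is hypothetical: `write_table` and its parameters `row_limit` (deprecated) and `max_rows` (its replacement) are invented for illustration and are not part of the pandas-gbq API.

```python
import warnings

# Sentinel that distinguishes "argument not passed" from an explicit None.
_NOT_PASSED = object()


def write_table(dataframe, destination_table, *, max_rows=None, row_limit=_NOT_PASSED):
    """Hypothetical function: `row_limit` is deprecated in favor of `max_rows`."""
    if row_limit is not _NOT_PASSED:
        # FutureWarning is the category Python recommends for deprecations
        # aimed at end users; DeprecationWarning is ignored by default
        # unless triggered by code in __main__.
        warnings.warn(
            "row_limit is deprecated and will be removed in a future "
            "release; use max_rows instead.",
            FutureWarning,
            stacklevel=2,
        )
        if max_rows is not None:
            raise TypeError("Pass only one of max_rows and row_limit.")
        max_rows = row_limit
    # The real upload logic would go here; this sketch only returns the
    # resolved setting so the behavior can be checked.
    return max_rows
```

Once the deprecation period has elapsed, `row_limit` and the sentinel can be deleted, and callers who migrated early see no change.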

## Vision

The `pandas-gbq` package should do the "right thing" by default. This means
carefully choosing dtypes that maximize compatibility with BigQuery and avoid
data loss. As BigQuery adds data types that don't yet have good equivalents in
the `pandas` ecosystem, equivalent dtypes should be added to the `db-dtypes`
package.
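
To make "avoid data loss" concrete: a BigQuery `INT64` column that contains a `NULL` can't be held in NumPy's `int64` dtype, so default inference falls back to `float64`, which cannot represent every 64-bit integer exactly. pandas' built-in nullable `Int64` dtype keeps the integers and tracks missing values in a separate mask. This sketch uses only pandas and NumPy with made-up values; it is not pandas-gbq's own conversion code.

```python
import numpy as np
import pandas as pd

# Illustrative values for a BigQuery INT64 column that contains a NULL.
# 2**53 + 1 is the smallest positive integer float64 can't represent exactly.
big = 2**53 + 1
column = [big, None]

# Default inference: the None forces float64 (NULL becomes NaN), and the
# large integer is silently rounded.
lossy = pd.Series(column)

# Nullable Int64: integer values plus a boolean mask marking the NULLs,
# so nothing has to pass through float64.
safe = pd.Series(
    pd.arrays.IntegerArray(
        np.array([0 if v is None else v for v in column], dtype="int64"),
        np.array([v is None for v in column]),
    )
)

print(lossy.dtype, int(lossy[0]))  # float64 9007199254740992
print(safe.dtype, safe[0])  # Int64 9007199254740993
```

For BigQuery types with no built-in pandas equivalent, such as `DATE` and `TIME`, `db-dtypes` fills the same role with its `dbdate` and `dbtime` dtypes.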

As BigQuery adds features that might improve performance, `pandas-gbq` should
offer easy ways to use them without sacrificing usability. For example, the
`api_method` parameter of `to_gbq` could be extended to support the BigQuery
Storage Write API.
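
One way to keep a faster backend easy to use is to hide it behind a single string parameter, so callers opt in without learning a new API. This sketch is hypothetical: `load_parquet` and `load_csv` are existing `api_method` values of `to_gbq`, but the `storage_write` value, the `resolve_uploader` helper, and the stub upload functions are invented for illustration and do not reflect pandas-gbq's internals.

```python
from typing import Callable, Dict


# Stub uploaders: each returns a description instead of contacting BigQuery.
def _upload_load_parquet(destination_table: str) -> str:
    return f"load job (Parquet) -> {destination_table}"


def _upload_load_csv(destination_table: str) -> str:
    return f"load job (CSV) -> {destination_table}"


def _upload_storage_write(destination_table: str) -> str:
    # Hypothetical backend for the BigQuery Storage Write API.
    return f"Storage Write API stream -> {destination_table}"


_UPLOADERS: Dict[str, Callable[[str], str]] = {
    "load_parquet": _upload_load_parquet,
    "load_csv": _upload_load_csv,
    "storage_write": _upload_storage_write,  # hypothetical value
}


def resolve_uploader(api_method: str = "default") -> Callable[[str], str]:
    """Map an api_method string to an upload function (illustrative only)."""
    if api_method == "default":
        # Simplification: treat Parquet loads as the default path.
        api_method = "load_parquet"
    try:
        return _UPLOADERS[api_method]
    except KeyError:
        raise ValueError(
            f"Unknown api_method {api_method!r}; expected one of "
            f"{sorted(_UPLOADERS)} or 'default'."
        ) from None
```

For example, `resolve_uploader("storage_write")("my_dataset.my_table")` selects the new path, while callers that never pass `api_method` keep the existing behavior because the default and existing values are untouched.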

A note on `pandas.read_sql`: we'd like to be compatible with it too, for folks
who need better performance than the SQLAlchemy connector provides.

## Usability

Unlike the more object-oriented client libraries, methods with many parameters
are natural in the Python data science ecosystem. That said, the
`configuration` argument accepts the REST representation of the job
configuration, so power users can use new BigQuery features without waiting for
an explicit parameter to be added.
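
For example, a power user could disable the query cache and cap the bytes billed by passing REST job-configuration fields through `configuration` instead of waiting for dedicated parameters. In this sketch, `build_configuration` and `run_capped_query` are hypothetical helpers and `my-project` is a placeholder project ID; the `useQueryCache` and `maximumBytesBilled` keys come from the BigQuery REST `JobConfigurationQuery` resource. Actually running the query requires `pandas-gbq` and BigQuery credentials.

```python
def build_configuration(max_bytes_billed: int) -> dict:
    """Build a REST-style job configuration (camelCase keys)."""
    return {
        "query": {
            # Always run the query instead of serving cached results.
            "useQueryCache": False,
            # Fail the job rather than bill more than this many bytes.
            # The REST API encodes int64 values as JSON strings.
            "maximumBytesBilled": str(max_bytes_billed),
        }
    }


def run_capped_query(sql: str, project_id: str = "my-project"):
    # Imported here so the helper above works without pandas-gbq installed;
    # calling this function needs credentials and network access.
    import pandas_gbq

    return pandas_gbq.read_gbq(
        sql,
        project_id=project_id,
        configuration=build_configuration(10 * 1024**3),  # 10 GiB cap
    )
```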

## Conclusion

Keep it simple.

Don't break existing users.

Do the right thing by default.