
Commit d92c680

Author: hekaisheng
Commit message: upgrade to 0.8.1
1 parent f409b26 commit d92c680

File tree

18 files changed, with 406 additions and 220 deletions

README.rst

Lines changed: 103 additions & 106 deletions
@@ -1,7 +1,10 @@
11
ODPS Python SDK and data analysis framework
22
===========================================
33

4-
|PyPI version| |Docs| |License| |Implementation|
4+
`PyPI version <https://pypi.python.org/pypi/pyodps>`__
5+
`Docs <http://pyodps.readthedocs.org/>`__
6+
`License <https://github.com/aliyun/aliyun-odps-python-sdk/blob/master/License>`__
7+
|Implementation|
58

69
Elegant way to access ODPS API.
710
`Documentation <http://pyodps.readthedocs.org/>`__
@@ -13,25 +16,25 @@ The quick way:
1316

1417
::
1518

16-
pip install 'pyodps[full]'
19+
pip install 'pyodps[full]'
1720

18-
If you don't need to use Jupyter, just type
21+
If you don't need to use Jupyter, just type
1922

2023
::
2124

22-
pip install pyodps
25+
pip install pyodps
2326

2427
The dependencies will be installed automatically.
2528

2629
Or from source code:
2730

2831
.. code:: shell
2932
30-
$ virtualenv pyodps_env
31-
$ source pyodps_env/bin/activate
32-
$ git clone <git clone URL> pyodps
33-
$ cd pyodps
34-
$ python setup.py install
33+
$ virtualenv pyodps_env
34+
$ source pyodps_env/bin/activate
35+
$ git clone <git clone URL> pyodps
36+
$ cd pyodps
37+
$ python setup.py install
3538
3639
Dependencies
3740
------------
@@ -52,116 +55,116 @@ Usage
5255

5356
.. code:: python
5457
55-
>>> from odps import ODPS
56-
>>> o = ODPS('**your-access-id**', '**your-secret-access-key**',
57-
... project='**your-project**', endpoint='**your-end-point**')
58-
>>> dual = o.get_table('dual')
59-
>>> dual.name
60-
'dual'
61-
>>> dual.schema
62-
odps.Schema {
63-
c_int_a bigint
64-
c_int_b bigint
65-
c_double_a double
66-
c_double_b double
67-
c_string_a string
68-
c_string_b string
69-
c_bool_a boolean
70-
c_bool_b boolean
71-
c_datetime_a datetime
72-
c_datetime_b datetime
73-
}
74-
>>> dual.creation_time
75-
datetime.datetime(2014, 6, 6, 13, 28, 24)
76-
>>> dual.is_virtual_view
77-
False
78-
>>> dual.size
79-
448
80-
>>> dual.schema.columns
81-
[<column c_int_a, type bigint>,
82-
<column c_int_b, type bigint>,
83-
<column c_double_a, type double>,
84-
<column c_double_b, type double>,
85-
<column c_string_a, type string>,
86-
<column c_string_b, type string>,
87-
<column c_bool_a, type boolean>,
88-
<column c_bool_b, type boolean>,
89-
<column c_datetime_a, type datetime>,
90-
<column c_datetime_b, type datetime>]
58+
>>> from odps import ODPS
59+
>>> o = ODPS('**your-access-id**', '**your-secret-access-key**',
60+
... project='**your-project**', endpoint='**your-end-point**')
61+
>>> dual = o.get_table('dual')
62+
>>> dual.name
63+
'dual'
64+
>>> dual.schema
65+
odps.Schema {
66+
c_int_a bigint
67+
c_int_b bigint
68+
c_double_a double
69+
c_double_b double
70+
c_string_a string
71+
c_string_b string
72+
c_bool_a boolean
73+
c_bool_b boolean
74+
c_datetime_a datetime
75+
c_datetime_b datetime
76+
}
77+
>>> dual.creation_time
78+
datetime.datetime(2014, 6, 6, 13, 28, 24)
79+
>>> dual.is_virtual_view
80+
False
81+
>>> dual.size
82+
448
83+
>>> dual.schema.columns
84+
[<column c_int_a, type bigint>,
85+
<column c_int_b, type bigint>,
86+
<column c_double_a, type double>,
87+
<column c_double_b, type double>,
88+
<column c_string_a, type string>,
89+
<column c_string_b, type string>,
90+
<column c_bool_a, type boolean>,
91+
<column c_bool_b, type boolean>,
92+
<column c_datetime_a, type datetime>,
93+
<column c_datetime_b, type datetime>]
9194
9295
DataFrame API
9396
-------------
9497

9598
.. code:: python
9699
97-
>>> from odps.df import DataFrame
98-
>>> df = DataFrame(o.get_table('pyodps_iris'))
99-
>>> df.dtypes
100-
odps.Schema {
101-
sepallength float64
102-
sepalwidth float64
103-
petallength float64
104-
petalwidth float64
105-
name string
106-
}
107-
>>> df.head(5)
108-
|==========================================| 1 / 1 (100.00%) 0s
109-
sepallength sepalwidth petallength petalwidth name
110-
0 5.1 3.5 1.4 0.2 Iris-setosa
111-
1 4.9 3.0 1.4 0.2 Iris-setosa
112-
2 4.7 3.2 1.3 0.2 Iris-setosa
113-
3 4.6 3.1 1.5 0.2 Iris-setosa
114-
4 5.0 3.6 1.4 0.2 Iris-setosa
115-
>>> df[df.sepalwidth > 3]['name', 'sepalwidth'].head(5)
116-
|==========================================| 1 / 1 (100.00%) 12s
117-
name sepalwidth
118-
0 Iris-setosa 3.5
119-
1 Iris-setosa 3.2
120-
2 Iris-setosa 3.1
121-
3 Iris-setosa 3.6
122-
4 Iris-setosa 3.9
100+
>>> from odps.df import DataFrame
101+
>>> df = DataFrame(o.get_table('pyodps_iris'))
102+
>>> df.dtypes
103+
odps.Schema {
104+
sepallength float64
105+
sepalwidth float64
106+
petallength float64
107+
petalwidth float64
108+
name string
109+
}
110+
>>> df.head(5)
111+
|==========================================| 1 / 1 (100.00%) 0s
112+
sepallength sepalwidth petallength petalwidth name
113+
0 5.1 3.5 1.4 0.2 Iris-setosa
114+
1 4.9 3.0 1.4 0.2 Iris-setosa
115+
2 4.7 3.2 1.3 0.2 Iris-setosa
116+
3 4.6 3.1 1.5 0.2 Iris-setosa
117+
4 5.0 3.6 1.4 0.2 Iris-setosa
118+
>>> df[df.sepalwidth > 3]['name', 'sepalwidth'].head(5)
119+
|==========================================| 1 / 1 (100.00%) 12s
120+
name sepalwidth
121+
0 Iris-setosa 3.5
122+
1 Iris-setosa 3.2
123+
2 Iris-setosa 3.1
124+
3 Iris-setosa 3.6
125+
4 Iris-setosa 3.9
123126
124127
Command-line and IPython enhancement
125128
------------------------------------
126129

127130
::
128131

129-
In [1]: %load_ext odps
132+
In [1]: %load_ext odps
130133

131-
In [2]: %enter
132-
Out[2]: <odps.inter.Room at 0x10fe0e450>
134+
In [2]: %enter
135+
Out[2]: <odps.inter.Room at 0x10fe0e450>
133136

134-
In [3]: %sql select * from pyodps_iris limit 5
135-
|==========================================| 1 / 1 (100.00%) 2s
136-
Out[3]:
137-
sepallength sepalwidth petallength petalwidth name
138-
0 5.1 3.5 1.4 0.2 Iris-setosa
139-
1 4.9 3.0 1.4 0.2 Iris-setosa
140-
2 4.7 3.2 1.3 0.2 Iris-setosa
141-
3 4.6 3.1 1.5 0.2 Iris-setosa
142-
4 5.0 3.6 1.4 0.2 Iris-setosa
137+
In [3]: %sql select * from pyodps_iris limit 5
138+
|==========================================| 1 / 1 (100.00%) 2s
139+
Out[3]:
140+
sepallength sepalwidth petallength petalwidth name
141+
0 5.1 3.5 1.4 0.2 Iris-setosa
142+
1 4.9 3.0 1.4 0.2 Iris-setosa
143+
2 4.7 3.2 1.3 0.2 Iris-setosa
144+
3 4.6 3.1 1.5 0.2 Iris-setosa
145+
4 5.0 3.6 1.4 0.2 Iris-setosa
143146

144147
Python UDF Debugging Tool
145148
-------------------------
146149

147150
.. code:: python
148151
149-
#file: plus.py
150-
from odps.udf import annotate
152+
#file: plus.py
153+
from odps.udf import annotate
151154
152-
@annotate('bigint,bigint->bigint')
153-
class Plus(object):
154-
def evaluate(self, a, b):
155-
return a + b
155+
@annotate('bigint,bigint->bigint')
156+
class Plus(object):
157+
def evaluate(self, a, b):
158+
return a + b
156159
157160
::
158161

159-
$ cat plus.input
160-
1,1
161-
3,2
162-
$ pyou plus.Plus < plus.input
163-
2
164-
5
162+
$ cat plus.input
163+
1,1
164+
3,2
165+
$ pyou plus.Plus < plus.input
166+
2
167+
5
165168

166169
Contributing
167170
------------
@@ -171,29 +174,23 @@ source:
171174

172175
::
173176

174-
git clone https://github.com/aliyun/aliyun-odps-python-sdk
175-
cd pyodps
176-
pip install -r requirements.txt -e .
177+
git clone https://github.com/aliyun/aliyun-odps-python-sdk
178+
cd pyodps
179+
pip install -r requirements.txt -e .
177180

178181
If you need to modify the frontend code, you need to install
179182
`nodejs/npm <https://www.npmjs.com/>`__. To build and install your
180183
frontend code, use
181184

182185
::
183186

184-
python setup.py build_js
185-
python setup.py install_js
187+
python setup.py build_js
188+
python setup.py install_js
186189

187190
License
188191
-------
189192

190193
Licensed under the `Apache License
191194
2.0 <https://www.apache.org/licenses/LICENSE-2.0.html>`__
192195

193-
.. |PyPI version| image:: https://img.shields.io/pypi/v/pyodps.svg?style=flat-square
194-
:target: https://pypi.python.org/pypi/pyodps
195-
.. |Docs| image:: https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat-square
196-
:target: http://pyodps.readthedocs.org/
197-
.. |License| image:: https://img.shields.io/pypi/l/pyodps.svg?style=flat-square
198-
:target: https://github.com/aliyun/aliyun-odps-python-sdk/blob/master/License
199196
.. |Implementation| image:: https://img.shields.io/pypi/implementation/pyodps.svg?style=flat-square
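
The ``Plus`` UDF and the ``pyou plus.Plus < plus.input`` run shown in the README's "Python UDF Debugging Tool" section can be mimicked locally without PyODPS. The sketch below is only an illustration of what that run computes: it assumes each input line holds comma-separated bigint values, it omits the ``@annotate`` decorator so that it has no dependencies, and the helper ``run_udf`` is a hypothetical name, not part of PyODPS or ``pyou``.

```python
import io


class Plus(object):
    """Same evaluate logic as plus.py in the README, without @annotate."""

    def evaluate(self, a, b):
        return a + b


def run_udf(udf, lines):
    """Hypothetical helper: feed comma-separated integer rows to udf.evaluate."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the input
        args = [int(field) for field in line.split(",")]
        results.append(udf.evaluate(*args))
    return results


if __name__ == "__main__":
    # Same two rows as plus.input in the README.
    sample = io.StringIO("1,1\n3,2\n")
    for value in run_udf(Plus(), sample):
        print(value)
```

Run as a script, this prints 2 and 5, matching the output shown for ``pyou`` in the README.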

docs/source/base-tables.rst

Lines changed: 3 additions & 0 deletions
@@ -279,6 +279,9 @@ A Record represents one row of a table; calling new_record on a Table object
279279
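Too many files also slow down subsequent queries. We therefore recommend that, when using the write_table method, you write multiple batches of data in one call,
280280
or pass in a generator object.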
281281

282+
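write_table appends to the existing data when writing to a table. PyODPS does not provide an option to overwrite data; to overwrite, you must manually clear
283+
the existing data first. For a non-partitioned table, call table.truncate(); for a partitioned table, drop the partition and then create it again.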
284+
282285
Deleting tables
283286
---------------
284287
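
The overwrite procedure described in the added paragraph can be sketched as a small helper. This is a minimal sketch under stated assumptions, not an official PyODPS recipe: ``overwrite_table`` is a hypothetical name, ``o`` is assumed to be an ``ODPS`` entry object, and the ``delete_partition`` / ``create_partition`` calls and their keyword arguments are assumptions about the Table API that should be checked against the installed PyODPS version. Only ``write_table`` and ``table.truncate()`` are named in the text itself.

```python
def overwrite_table(o, table_name, records, partition=None):
    """Clear existing data, then write records with write_table (which only appends)."""
    table = o.get_table(table_name)
    if partition is None:
        # Non-partitioned table: remove all existing rows first.
        table.truncate()
        o.write_table(table_name, records)
    else:
        # Partitioned table: drop the partition, create it again, then write.
        # delete_partition/create_partition are assumed Table methods.
        table.delete_partition(partition, if_exists=True)
        table.create_partition(partition)
        o.write_table(table_name, records, partition=partition)
```

With a real connection, a call such as ``overwrite_table(o, 'my_table', records, partition='pt=test')`` would replace the contents of that partition; the table and partition names here are placeholders.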

docs/source/df-basic.rst

Lines changed: 4 additions & 2 deletions
@@ -810,7 +810,9 @@ ResultFrame can also be converted to a pandas DataFrame when pandas is installed
810810
3 5.0 2.0 3.5 1.0 Iris-versicolor
811811
4 6.0 2.2 4.0 1.0 Iris-versicolor
812812
813-
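``persist`` accepts a partitions argument, which creates a table whose partitions are the fields specified by partitions.
813+
``persist`` accepts a partitions argument. When it is given, a partitioned table is created whose partition fields are the fields listed in partitions,
814+
and the values of those fields in the DataFrame determine which partition each row is written to. For example, if partitions is ['name'] and a row's name value is test,
815+
that row is written to partition ``name=test``. This is useful when the partition has to be derived by computation.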
814816

815817
.. code:: python
816818
@@ -827,7 +829,7 @@ ResultFrame can also be converted to a pandas DataFrame when pandas is installed
827829
name : string
828830
829831
830-
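To write to a partition of an existing table, pass the partition argument to ``persist`` to specify the target partition (for example, ds=******).
832+
To write to a partition of an existing table, pass the partition argument to ``persist`` to specify the target partition (for example, ds=******).
831833
Note that in this case every field of the DataFrame must exist in the table with the same type. The drop_partition and create_partition arguments only take effect in this case,
832834
and indicate, respectively, whether to drop the partition (if it exists) or create it (if it does not exist).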
833835
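
The row-to-partition mapping described for the partitions argument can be illustrated in plain Python. This is only a sketch of the rule stated above (the values of the listed fields decide the partition), not how PyODPS implements ``persist``; ``route_rows`` is a hypothetical helper and the sample rows are made up for illustration. In PyODPS itself the corresponding call would pass ``partitions=['name']`` to ``persist``.

```python
from collections import defaultdict


def route_rows(rows, partitions):
    """Hypothetical helper: group rows by the partition spec built from their field values."""
    routed = defaultdict(list)
    for row in rows:
        # Build a spec such as "name=test" from the partition fields of this row.
        spec = ",".join("{0}={1}".format(field, row[field]) for field in partitions)
        routed[spec].append(row)
    return dict(routed)


# Made-up sample rows; only the 'name' field drives the partition.
rows = [
    {"name": "test", "sepalwidth": 3.5},
    {"name": "Iris-setosa", "sepalwidth": 3.2},
    {"name": "test", "sepalwidth": 3.1},
]

if __name__ == "__main__":
    for spec, group in sorted(route_rows(rows, ["name"]).items()):
        print(spec, len(group))
```

Here the two rows whose name is test end up under ``name=test``, and the remaining row under ``name=Iris-setosa``.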
