Commit 174b570

upgrade to 0.7.20
1 parent 29f94c3 commit 174b570

55 files changed: +963 -447 lines changed

docs/source/_static/theme_override.js

Lines changed: 19 additions & 8 deletions
@@ -1,14 +1,25 @@
 $(function() {
+    var div_replacer = function(r, hn) {
+        var id_str = '', sub_class='rubric-sub';
+        if (!hn) {
+            hn = 6;
+            sub_class = 'rubric-sub rubric-default';
+        }
+        if ($(r).attr('id')) id_str = 'id="' + $(r).attr('id') + '" ';
+        $(r).replaceWith('<div ' + id_str + 'class="rubric"><h' + hn + ' class="' + sub_class + '">'
+            + $(r).html() + '</h' + hn + '></div>');
+    };
     for (var hn = 1; hn <= 6; hn++) {
-        $('.rubric-h' + hn).each(function (i, r) {
-            $(r).replaceWith('<div class="rubric"><h' + hn + ' class="rubric-sub">' + $(r).html() + '</h' + hn + '></div>');
-        });
+        $('.rubric-h' + hn).each(function (i, r) { div_replacer(r, hn); });
     }
-    $('p.rubric').each(function (i, r) {
-        $(r).replaceWith('<div class="rubric"><h6 class="rubric-sub rubric-default">' + $(r).html() + '</h6></div>');
-    });
+    $('p.rubric').each(function (i, r) { div_replacer(r); });
     $('.rubric').each(function (i, r) {
-        $(r).attr('id', 'rubric' + (i + 1));
-        $(r).find('.rubric-sub').append('<a class="headerlink" href="#rubric' + (i + 1) + '">¶</a>');
+        var rubric_id = 'rubric' + (i + 1);
+        if ($(r).attr('id')) {
+            rubric_id = $(r).attr('id');
+        } else {
+            $(r).attr('id', rubric_id);
+        }
+        $(r).find('.rubric-sub').append('<a class="headerlink" href="#' + rubric_id + '">¶</a>');
     });
 });
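
Note on the change above: the new code keeps an `id` that a rubric already carries (for example one set through `:name:` in the RST source) and only falls back to the positional `rubricN` id when none is present. The rule can be modelled in a few lines of Python. This is an illustrative sketch only; `assign_rubric_ids` is a hypothetical helper, not part of the commit, and it works on plain lists rather than DOM nodes.

```python
def assign_rubric_ids(existing_ids):
    """Model of the id rule in theme_override.js.

    existing_ids: ids already present on each rubric, in document order
    (None or '' when a rubric has no id). Returns (id, headerlink href) pairs.
    """
    result = []
    for i, existing in enumerate(existing_ids):
        # Keep an explicit id; otherwise fall back to the positional default.
        rubric_id = existing if existing else "rubric%d" % (i + 1)
        result.append((rubric_id, "#" + rubric_id))
    return result


pairs = assign_rubric_ids([None, "faq_protected", None])
```

The positional counter still advances past rubrics that have their own id, so the third rubric in the example gets `rubric3`, not `rubric2`, which matches the `i + 1` used in the JavaScript.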

docs/source/base-functions.rst

Lines changed: 5 additions & 6 deletions
@@ -16,13 +16,12 @@ ODPS用户可以编写自定义 `函数 <https://docs.aliyun.com/#/pub/odps/basi
 
 .. code-block:: python
 
+>>> # 引用当前 project 中的资源
 >>> resource = o.get_resource('my_udf.py')
->>> function = o.create_function('test_function', class_type='my_udf.Test', resources=[resource, ])
-
-
-.. note::
-
-注意,公共云由于安全原因,使用 Python UDF 需要申请。
+>>> function = o.create_function('test_function', class_type='my_udf.Test', resources=[resource])
+>>> # 引用其他 project 中的资源
+>>> resource2 = o.get_resource('my_udf.py', project='another_project')
+>>> function2 = o.create_function('test_function2', class_type='my_udf.Test', resources=[resource2])
 
 删除函数
 ---------

docs/source/base-tables.rst

Lines changed: 8 additions & 1 deletion
@@ -321,12 +321,19 @@ PyODPS提供了 :ref:`DataFrame框架 <df>` ,支持更方便地方式来查询
 >>> for partition in table.iterate_partitions(spec='pt=test'):
 >>> # 遍历二级分区
 
-判断分区是否存在:
+判断分区是否存在(该方法需要填写所有分区字段值)
 
 .. code-block:: python
 
 >>> table.exist_partition('pt=test,sub=2015')
 
+判断给定前缀的分区是否存在:
+
+.. code-block:: python
+
+>>> # 表 table 的分区字段依次为 pt, sub
+>>> table.exist_partitions('pt=test')
+
 获取分区:
 
 .. code-block:: python
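
Note on the change above: the updated text says `exist_partition` needs a value for every partition field, while the new `exist_partitions` checks whether any partition matches a given prefix. The difference can be sketched in plain Python. This is a model of the semantics under stated assumptions, not PyODPS's implementation; `parse_spec`, `has_exact_partition` and `has_partition_with_prefix` are hypothetical names, and the partition list is invented for illustration.

```python
def parse_spec(spec):
    """Turn 'pt=test,sub=2015' into an ordered list of (field, value) pairs."""
    pairs = []
    for item in spec.split(","):
        field, _, value = item.strip().partition("=")
        pairs.append((field, value))
    return pairs


def has_exact_partition(partitions, spec):
    """Exact match: the spec must name every partition field."""
    return parse_spec(spec) in [parse_spec(p) for p in partitions]


def has_partition_with_prefix(partitions, prefix):
    """Prefix match: true if any partition starts with the given fields."""
    wanted = parse_spec(prefix)
    return any(parse_spec(p)[:len(wanted)] == wanted for p in partitions)


# Hypothetical table whose partition fields are pt, sub (in that order).
partitions = ["pt=test,sub=2015", "pt=test,sub=2016", "pt=prod,sub=2015"]

exact = has_exact_partition(partitions, "pt=test,sub=2015")
prefix_hit = has_partition_with_prefix(partitions, "pt=test")
prefix_miss = has_partition_with_prefix(partitions, "pt=dev")
```

In this model an incomplete spec passed to the exact check simply finds no match. How the real `exist_partition` reacts to an incomplete spec is not shown in the diff.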

docs/source/df-agg.rst

Lines changed: 3 additions & 0 deletions
@@ -116,6 +116,9 @@ DataFrame 提供了一个\ ``value_counts``\ 操作,能返回按某列分组
 1 Iris-versicolor 50
 2 Iris-setosa 50
 
+需要注意的是,该方法返回的行数大小受到 ODPS 排序返回结果大小的限制,默认为 10000 行,可通过
+``options.df.odps.sort.limit`` 配置,详见 :ref:`配置选项 <options>` 。
+
 对于聚合后的单列操作,我们也可以直接取出列名。但此时只能使用聚合函数。
 
 .. code:: python
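
Note on the change above: the added paragraph says the number of rows returned by `value_counts` is capped by the ODPS sort limit (10000 by default, configurable through `options.df.odps.sort.limit`). The effect of such a cap can be sketched with the standard library. This is a local model only; the `value_counts` function and its `sort_limit` parameter below are hypothetical stand-ins, not the PyODPS API.

```python
from collections import Counter


def value_counts(values, sort_limit=10000):
    """Count occurrences, sort by count descending, keep at most sort_limit rows."""
    counts = Counter(values)
    # most_common(n) returns the n most frequent items, highest count first.
    return counts.most_common(sort_limit)


values = ["a"] * 3 + ["b"] * 2 + ["c"]

full = value_counts(values)                    # all three distinct values fit
capped = value_counts(values, sort_limit=2)    # the least frequent value is dropped
```

The practical consequence is that a column with more distinct values than the limit yields a truncated result, so the limit has to be raised when the full distribution is needed.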

docs/source/df-basic.rst

Lines changed: 5 additions & 0 deletions
@@ -865,6 +865,11 @@ ResultFrame 也支持在安装有 pandas 的前提下转换为 pandas DataFrame
 >>> type(iris[iris.sepalwidth < 2.5].to_pandas(wrap=True))
 odps.df.core.DataFrame
 
+.. note::
+
+``to_pandas`` 返回的 pandas DataFrame 与直接通过 pandas 创建的 DataFrame 没有任何区别,
+数据的存储和计算均在本地。如果 ``wrap=True``,生成的即便是 PyODPS DataFrame,数据依然在本地。
+如果你的数据很大,或者运行环境的内存限制较为严格,请谨慎使用 ``to_pandas``。
 
 立即运行设置运行参数
 ~~~~~~~~~~~~~~~~~~~

docs/source/df-quickstart.rst

Lines changed: 20 additions & 19 deletions
@@ -28,11 +28,11 @@
 
 >>> users.dtypes
 odps.Schema {
-user_id int64
-age int64
-sex string
-occupation string
-zip_code string
+user_id int64
+age int64
+sex string
+occupation string
+zip_code string
 }
 
 
@@ -123,11 +123,12 @@
 8 executive 32
 9 scientist 31
 
-DataFrame API提供了value\_counts这个方法来快速达到同样的目的。
+DataFrame API提供了value\_counts这个方法来快速达到同样的目的。注意该方法返回的行数受到 ``options.df.odps.sort.limit``
+的限制,详见 :ref:`配置选项 <options>` 。
 
 .. code:: python
 
->>> users.occupation.value_counts()[:10]
+>>> uses.occupation.value_counts()[:10]
 occupation count
 0 student 196
 1 other 105
@@ -178,18 +179,18 @@ DataFrame API提供了value\_counts这个方法来快速达到同样的目的。
 >>>
 >>> lens.dtypes
 odps.Schema {
-movie_id int64
-title string
-release_date string
-video_release_date string
-imdb_url string
-user_id int64
-rating int64
-unix_timestamp int64
-age int64
-sex string
-occupation string
-zip_code string
+movie_id int64
+title string
+release_date string
+video_release_date string
+imdb_url string
+user_id int64
+rating int64
+unix_timestamp int64
+age int64
+sex string
+occupation string
+zip_code string
 }
 
 现在我们把年龄分成从0到80岁,分成8个年龄段,

docs/source/faq-ext.rst

Lines changed: 10 additions & 2 deletions
@@ -8,6 +8,14 @@ Endpoint配置不对,详细配置参考
 `MaxCompute 开通 Region 和服务连接对照表 <https://help.aliyun.com/document_detail/34951.html#h2-maxcompute-region-3>`_ 。
 此外还需要注意 ODPS 入口对象参数位置是否填写正确。
 
-.. rubric:: 如何申请开通公共云 Python UDF
+.. rubric:: 如何手动指定 Tunnel Endpoint
+:name: faq_tunnel_endpoint
 
-公共云 Python UDF 目前处于公测阶段,可通过工单申请及咨询审批进度。
+可以使用下面的方法创建带有 Tunnel Endpoint 的 ODPS 入口(参数值请自行替换,不包含星号):
+
+.. code-block:: python
+
+from odps import ODPS
+
+o = ODPS('**your-access-id**', '**your-secret-access-key**', '**your-default-project**',
+endpoint='**your-end-point**', tunnel_endpoint='**your-tunnel-endpoint**')

docs/source/faq.rst

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@
 .. extinclude:: faq-ext.rst
 
 .. rubric:: 读取数据时报"project is protected"
+:name: faq_protected
 
 Project 上的安全策略禁止读取表中的数据,此时,如果想使用全部数据,有以下选项可用:
 

@@ -66,6 +67,7 @@ Project 要求对每张表设置 lifecycle,因而需要在每次执行时设
 请参考 :ref:`SQL设置运行参数 <sql_hints>` 。
 
 .. rubric:: 如何遍历 PyODPS DataFrame 中的每行数据
+:name: faq_enumerate_df
 
 PyODPS DataFrame 不支持遍历每行数据。这样设计的原因是由于 PyODPS DataFrame 面向大规模数据设计,在这种场景下,
 数据遍历是非常低效的做法。我们建议使用 DataFrame 提供的 ``apply`` 或 ``map_reduce`` 接口将原本串行的遍历操作并行化,

docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ PyODPS 的相关依赖会自动安装。
 .. rubric:: 快速开始
 :class: rubric-h2
 
-首先,我们需要阿里云帐号来初始化一个ODPS的入口
+首先,我们需要阿里云帐号来初始化一个 ODPS 的入口(参数值请自行替换,不包含星号)
 
 .. code-block:: python
 
3535

docs/source/locale/en/LC_MESSAGES/api-def.po

Lines changed: 29 additions & 16 deletions
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: PyODPS 0.7.16\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2018-06-25 15:28+0800\n"
+"POT-Creation-Date: 2018-10-10 15:55+0800\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language-Team: LANGUAGE <LL@li.org>\n"
@@ -98,12 +98,13 @@ msgstr ""
 #: odps.models.Instance.put_task_info odps.models.Instance.wait_for_completion
 #: odps.models.Instance.wait_for_success odps.models.Table.create_partition
 #: odps.models.Table.delete_partition odps.models.Table.drop
-#: odps.models.Table.exist_partition odps.models.Table.get_ddl
-#: odps.models.Table.get_partition odps.models.Table.head
-#: odps.models.Table.iterate_partitions odps.models.Table.new_record
-#: odps.models.Table.open_reader odps.models.Table.open_writer
-#: odps.models.Table.truncate odps.models.TableResource.update
-#: odps.models.Worker.get_log odps.models.ml.OnlineModel.predict
+#: odps.models.Table.exist_partition odps.models.Table.exist_partitions
+#: odps.models.Table.get_ddl odps.models.Table.get_partition
+#: odps.models.Table.head odps.models.Table.iterate_partitions
+#: odps.models.Table.new_record odps.models.Table.open_reader
+#: odps.models.Table.open_writer odps.models.Table.truncate
+#: odps.models.TableResource.update odps.models.Worker.get_log
+#: odps.models.ml.OnlineModel.predict
 #: odps.models.ml.OnlineModel.wait_for_deletion
 #: odps.models.ml.OnlineModel.wait_for_service
 #: odps.models.partition.Partition.drop
@@ -268,14 +269,14 @@ msgstr ""
 #: odps.models.Instance.open_reader odps.models.Instance.stop
 #: odps.models.Instance.wait_for_completion
 #: odps.models.Instance.wait_for_success odps.models.Table.create_partition
-#: odps.models.Table.drop odps.models.Table.get_ddl
-#: odps.models.Table.get_partition odps.models.Table.head
-#: odps.models.Table.new_record odps.models.Table.open_reader
-#: odps.models.Table.open_writer odps.models.Table.to_df
-#: odps.models.Table.truncate odps.models.TableResource.partition
-#: odps.models.TableResource.table odps.models.TableResource.update
-#: odps.models.Worker.get_log odps.models.ml.OnlineModel.predict
-#: odps.models.partition.Partition.drop
+#: odps.models.Table.drop odps.models.Table.exist_partitions
+#: odps.models.Table.get_ddl odps.models.Table.get_partition
+#: odps.models.Table.head odps.models.Table.new_record
+#: odps.models.Table.open_reader odps.models.Table.open_writer
+#: odps.models.Table.to_df odps.models.Table.truncate
+#: odps.models.TableResource.partition odps.models.TableResource.table
+#: odps.models.TableResource.update odps.models.Worker.get_log
+#: odps.models.ml.OnlineModel.predict odps.models.partition.Partition.drop
 #: odps.models.partition.Partition.open_reader
 #: odps.models.partition.Partition.to_df of
 msgid "Returns"
@@ -1715,6 +1716,18 @@ msgstr ""
 msgid "Check if a partition exists within the table."
 msgstr ""
 
+#: odps.models.Table.exist_partitions:1 of
+msgid "Check if partitions with provided conditions exist."
+msgstr ""
+
+#: odps.models.Table.exist_partitions:3 of
+msgid "prefix of partition"
+msgstr ""
+
+#: odps.models.Table.exist_partitions:4 of
+msgid "whether partitions exist"
+msgstr ""
+
 #: odps.models.Table.get_ddl:1 of
 msgid "Get DDL SQL statement for the given table."
 msgstr ""
@@ -1749,7 +1762,7 @@ msgid "the columns which is subset of the table columns"
 msgstr ""
 
 #: odps.models.Table.iterate_partitions:1 of
-msgid "Create an iiterable object to iterate over partitions."
+msgid "Create an iterable object to iterate over partitions."
 msgstr ""
 
 #: odps.models.Table.new_record:1 of

docs/source/locale/en/LC_MESSAGES/base-functions.po

Lines changed: 26 additions & 13 deletions
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: PyODPS 0.7.16\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2018-04-19 17:37+0800\n"
+"POT-Creation-Date: 2018-10-15 23:13+0800\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language-Team: LANGUAGE <LL@li.org>\n"
@@ -27,7 +27,9 @@ msgid ""
 "<https://docs.aliyun.com/#/pub/odps/basic/definition&function>`_ 用在ODPS "
 "SQL中。"
 msgstr ""
-"You can write user-defined `functions <https://www.alibabacloud.com/help/en/doc-detail/27823.htm>`_ (UDFs) to MaxCompute SQL."
+"You can write user-defined `functions "
+"<https://www.alibabacloud.com/help/en/doc-detail/27823.htm>`_ (UDFs) to "
+"MaxCompute SQL."
 
 #: ../../source/base-functions.rst:9
 msgid "基本操作"
@@ -38,44 +40,55 @@ msgid ""
 "可以调用 ODPS 入口对象的 ``list_functions`` 来获取项目空间下的所有函数,``exist_function`` "
 "能判断是否存在函数, ``get_function`` 获取函数对象。"
 msgstr ""
-"Use ``list_functions`` as the ODPS object to obtain all functions in the project. Use ``exist_function`` to check whether the specified function exists. Use ``get_function`` to obtain the object of a function."
+"Use ``list_functions`` as the ODPS object to obtain all functions in the "
+"project. Use ``exist_function`` to check whether the specified function "
+"exists. Use ``get_function`` to obtain the object of a function."
 
 #: ../../source/base-functions.rst:15
 msgid "创建函数"
 msgstr "Create functions"
 
 #: ../../source/base-functions.rst:17
 msgid ""
+">>> # 引用当前 project 中的资源\n"
 ">>> resource = o.get_resource('my_udf.py')\n"
 ">>> function = o.create_function('test_function', "
-"class_type='my_udf.Test', resources=[resource, ])"
+"class_type='my_udf.Test', resources=[resource])\n"
+">>> # 引用其他 project 中的资源\n"
+">>> resource2 = o.get_resource('my_udf.py', project='another_project')\n"
+">>> function2 = o.create_function('test_function2', "
+"class_type='my_udf.Test', resources=[resource2])"
 msgstr ""
+">>> # reference resources in the current project\n"
+">>> resource = o.get_resource('my_udf.py')\n"
+">>> function = o.create_function('test_function', "
+"class_type='my_udf.Test', resources=[resource])\n"
+">>> # reference resources in other projects\n"
+">>> resource2 = o.get_resource('my_udf.py', project='another_project')\n"
+">>> function2 = o.create_function('test_function2', "
+"class_type='my_udf.Test', resources=[resource2])"
 
-#: ../../source/base-functions.rst:25
-msgid "注意,公共云由于安全原因,使用 Python UDF 需要申请。"
-msgstr "Note: You need to request a license for using Python UDFs in public cloud to ensure security."
-
-#: ../../source/base-functions.rst:28
+#: ../../source/base-functions.rst:27
 msgid "删除函数"
 msgstr "Delete functions"
 
-#: ../../source/base-functions.rst:30
+#: ../../source/base-functions.rst:29
 msgid ""
 ">>> o.delete_function('test_function')\n"
 ">>> function.drop() # Function对象存在时直接调用drop"
 msgstr ""
 ">>> o.delete_function('test_function')\n"
 ">>> function.drop() # call drop method of a Function instance to delete"
 
-#: ../../source/base-functions.rst:36
+#: ../../source/base-functions.rst:35
 msgid "更新函数"
 msgstr "Update functions"
 
-#: ../../source/base-functions.rst:38
+#: ../../source/base-functions.rst:37
 msgid "只需对函数调用 ``update`` 方法即可。"
 msgstr "To update functions, use the ``update`` method."
 
-#: ../../source/base-functions.rst:40
+#: ../../source/base-functions.rst:39
 msgid ""
 ">>> function = o.get_function('test_function')\n"
 ">>> new_resource = o.get_resource('my_udf2.py')\n"
