From 0f90a019839e1a238fc1704b09c2c69cadc56e6c Mon Sep 17 00:00:00 2001
From: Lukasz Misiuda
Date: Thu, 7 Dec 2017 09:13:37 -0800
Subject: [PATCH 1/3] Add SageMaker banner. Add link to 'Read the Docs'. Provide more information on AWS built-in algorithms in the description.

---
 README.rst                         |  21 +++++++++++++++++++++
 branding/icon/sagemaker-banner.png | Bin 0 -> 9395 bytes
 2 files changed, 21 insertions(+)
 create mode 100644 branding/icon/sagemaker-banner.png

diff --git a/README.rst b/README.rst
index 4a76573766..164ce62638 100644
--- a/README.rst
+++ b/README.rst
@@ -1,3 +1,8 @@
+.. image:: branding/icon/sagemaker-banner.png
+    :height: 100px
+    :alt: SageMaker
+
+====================
 SageMaker Python SDK
 ====================
 
@@ -5,6 +10,8 @@ SageMaker Python SDK is an open source library for training and deploying machin
 
 With the SDK, you can train and deploy models using popular deep learning frameworks: **Apache MXNet** and **TensorFlow**. You can also train and deploy models with **Amazon algorithms**, these are scalable implementations of core machine learning algorithms that are optimized for SageMaker and GPU training. If you have **your own algorithms** built into SageMaker compatible Docker containers, you can train and host models using these as well.
 
+For detailed API reference please go to: `Read the Docs `_
+
 Table of Contents
 -----------------
 
@@ -1419,6 +1426,20 @@ The full list of algorithms is available on the AWS website: https://docs.aws.am
 
 SageMaker Python SDK includes Estimator wrappers for the AWS K-means, Principal Components Analysis, and Liner Learner algorithms.
 
+Definition and usage
+~~~~~~~~~~~~~~~~~~~~
+Estimators that wrap Amazon's built-in algorithms define algorithm's hyperparameters with defaults.
When a default is not possible you need to provide the value during construction: + +- ``KMean`` Estimator requires parameter ``k`` to define number of clusters +- ``PCA`` Estimator requires parameter ``num_components`` to define number of principal components + +Interaction is identical as any other Estimators. + +Predictions support +~~~~~~~~~~~~~~~~~~~ +Calling inference on deployed Amazon's built-in algorithms you must follow specific input format. By default this library creates a predictor that allows to use just numpy data. +Data is converted so that ``application/x-recordio-protobuf`` input format is used. Received response is deserialized from the protobuf and provided as result from the ``predict`` call. + BYO Docker Containers with SageMaker Estimators ----------------------------------------------- diff --git a/branding/icon/sagemaker-banner.png b/branding/icon/sagemaker-banner.png new file mode 100644 index 0000000000000000000000000000000000000000..0abe07a78ff776018db04d84b7cc0ceb1c9ae81f GIT binary patch literal 9395 zcmai)WmsEFwD$uo5Ue;9in|tCpm=e2cLK$o;0{Gv2<{YjFD^wxDG&+;iWj#6MS{ED zlylDeKF|GfKP1_EXYH9aGi%nKJ->gXnu;to1{nqb0Kk@)lhObHkV_HAAAzWdZ?@^f z7laapHGjSiJ(WauH znX3hb#6i<4EOjqRsbqeQBcRqg-{O0hg~XB)v8o&}06oFbhAUveW?OQmb{1VDzrAYr z6ERHf>-n?AX=4wiA_}iM8%xE$W>Zhiv(>B@pt!9iw~;$OFRL!9=lJ&f3!+;&ej^w6 zBL}1Tp`lLz-CkXhqyS3*3c#HDTR&wmYw&S+){=G;Ork3^hX2daA;qixA1^w?CFrS~{(SY!O*+De_ zG5^o5kLA&v=0Cf&;I+jes~NWTt(ovxh3EyNZEq?*Y|kw;u`i+!J(bDKYHq+dWxM9 zJ)fyv9BDh!8&Y8Q6e-+C0$A z(I6z@hXUVrS)s+kBjtr>Jx{IG{~6yk9&x45#e8h{NU|nAYN2^aut|DTwqYHue`>?( z&^g`w$)0!q#aWkJf3pk<_J&%=OZ+572 zuUXe|*`u#L)e-!6Xygx#IB^2~F)j-)5(!G2_FICdyn-LDYvBG|RVsTS_YJddn4< z1}hCtTXrvYHNc-9q^9M2N#25-0(IPUf6z*%{Tm(VLvFT0G%y1)M(Q%dwfie+ymBfu zL^l<`0mf*Hu(AGiaDeMjVot^8u6;E$z9Vs&N|ZF4vpnfn(D{#~^`;BU#6U^204PKC z%?95r&0|c16_9}wHCEatC5PPTA54Gi%;(4My)6jhnyGnczWisl zap#y5q!(grKKexSHE)9lAj_C2q+=v*^!JX)+~|}_q=74k`k#%vyv>Atw#;)kOX9lf zCs&-mmcruSXOzK^^o+aeGd%;+IHdm8AU1?*MPM*j%CBru|HIy)KQbxabaVzQIOO&> 
z;XJa~lVI#T)V%4&-Aso*t)&UlKC5=+b|SbX67Mm&kX5A&>%f{)XPaXOFT8-z!{b5?~vFEu_?HyG%&}8dv zOf)$$++#;r$R3OE-BM1x8GeZmpr(OAuCz9J=ve$l8?T!u6Z}Q)hpXUgRw#IxZfz0WlIDsyfoX(?b z+0P4^J7lnaT)~b(C*GODE0(G;==_yrsFC z7;k*Mv^ns!_LSJgt=MZ8163~TTDia+DNuHxwt$X0im~!p{X3NyWg*h>^y{gYS@3E_R4n?iUje@z2oxDS>iBsp#`#M$P;0V@QvB~t*$jE^lu$DdA8NSU%m(_wF0 zEU5JNSGm*5Yz|z|Yx7)I!P+Gi1D;FRK6?c-B4CjxLnlI~4ge@x`rl@-Ia zyCzBX+D~fMiRhGu0Lno&<#rPdhbI6SGB7S%N@1MiCT$B9%dQki8{76aPX7f7fk)>PbN^MQn4yPvh< z2o=yTf=^Ql_He{&Jg~aaNZE9;lN4amCQIgqOHU>kM%jJ0z6mfUoxtR!-wK%}HN_!a zGMjHGpaxhPHN$t2a^5RTX-UhU#|$COQ?tQ>`R{g0H{6N~$kzo|N0$>AW1Re8DHf|$AJ zc*gxTD!YHB*OjI-hy*^pgfnw5#lm`Fq&Pf2IyUVc?n%U(<_ zEGogHZEKQ{bnpE%-~yCacl(D)jHmBCOD5U8(GR^=0|Ssu+u{4TTsuCqEbJpJ3(0d` zS#s+_Z%Vqd54t$4!j_p^$ooP^Y1RBNP9IPu*}B@dT@=}+A+nyEgIH;#^~ z4H@rIHn~|eZfH)VUm_ZtMPs;{<4RnAL27}jfS?e?P{Q-+1AKPQ! ze^FM5w%3hUERvXn@4ovaBB$juL|pJADvmQeO3zsob$aQDB;Xm{$MB#Xst19?(jauB zFBrGq>ksRcdn$I4UEsx34x)6+)OVDp6>H1cSat^GTuu(e^*X}1JU59L{nA5Wpw77d8515k3gYL-E4xbd@qb~Mt z??UdRJF&VL;tWG0#uq)%cOfOUzmn#ssmF62$o3_@R=`!yn{W7yQD;eKRRX6OIKlNS zXiAzd?HkowPNvZ5XOlt$r>2@wt_)KQrAf!wp9m&w@aj@8MDHaHils!S48d?6jXwC` z4y9)Wa}HJ^6Fs+3KlJ&qa96)M@r=P6jkfj1YS4{ z#}&N3qxw~@eP!PyRYwKhS5|s2D8nQHle2P4RVS?mv+yTP*#dmzz@+B2_ z6hon2VGqVOvHooF5@Q;)g~nf}Ve7|Go}jq=a=R2t`n%gDrx)4IxvtcwiUYaB zdaa*;Cvv_r8*kuk9B9#KkN|HP`?mACNS7hupg+VTWID;pwy#cwoThhMN~ZCLyE7AE z0h-ufj2p-Xkh3=J*X)ezx$<8IYJf)%XhHk`EtS);nF%j57#9c{t7``gU1ccRn z$g`b^Iq@2VSE4UiQ{w?$A z=b_IBkUSq6F3S}X&g@un{G{<&BH3JJ!9=U0tfHS#6px2k95**NZ_l0LuszjREA<|C zU6*5jOSaid+MK+t;3&jU?bama^$+I<^iU>~GPx z{hRzy_5dAe$1;VdPA7xQ5(yNb=|4g|C9bSOR!Y@XvlX|~?XFTTeMYR0%?~EJspat1 zzB17+N+- zm{jlf7&;9GC{PnVdd2)fK{m%i$$W~G1t1}y{36}G*L)ZD&+b>XyNb^fH3TZFm9R-nt-oA%6RYs&|f4eFBY;Ba} zN$au58_>8z7TRW4KSTDjm@_xFwhpb8bHERjtPZmbt6bn=1U;(?4zO6bn~&&Kp}%A3 zU)SZmrojbssip z)P&!^zmUvG!4M=0yS*l#H->>`37d* zasjF-(_hNLT7(7T+~OWT_KxfPWQz6&Z?qGLJ-3QSid>b^G1ywo9IuQ9rHtl&#r#>S ztSSvGXr5W0;px%U#ZX)_Wm)pMQeN|+6b8>N=^C<8+=fJ3jh1PuE_S`VHPL=s!n7G& z+#-cLDH6AFl}QBdAb@V;mx!J%EkzX)&{|FgI`Ucwh4& 
za-fYw{`u2F%G^+uRvGOOyK$lXHkQ7x^+Q`~vdoCbI{G`$+yb5#+E{(I>^c)X(I6z| ze9=(nClg6p8Y->o6lNJBOiv3*tT`cA02SIG&gmNTv2p?z51lc@mJ<+f+g~Ys`-o*# zfy?hYa_D{1kD%@O=*zhDo?wd7nI4|p#P`Mr_ys4vY-3XYdb$n^@g`H+^^9DoeCd2347Yq5 z-?YZZP~+h5*2LS)Tj>0~JA1vjzW1>Jz|y4H*tb4nsr)XySpWSMWMQ=iy- zuXWcb_?f#>5|srsDu@u5totL!^bOS;IdwkIQ%&Z*PFufm7`ox6dvK!B%S*)$Qe07r z%RGh0Lzkmhvj#KPYVn$;CS7jlk}vp>rAG?K^3Qeq=X$`36!doZqT1Nk8Kj9O5numQXHP2o^;NIxXjgD z0VC8>qKeL_)e-}+n+a#ObXm#>rlfS%*Jp{cN#d$G=~(kOwF8vTD?ch@WLDQpzeL6aIWcejEpo`!KFlbbS@in0;N|S<|vOhQ4a=x6{#< zL9~5+I4lPD2l3th=@`VSKPWFwS6d@2A?{(C#bJAck-Ls-ub{lb-kdOn!c?kLV16HPYcD>FqTGnVhu@u zzBy}9>0@j*YE=pV@$j&fUaE&b)Gt2oJqFKi8Nql_@87%K#m|UP%D4{&l=Uv?4Ayg& zVYqiB*11}84t+L4^Z&)@qKXrpuT)ENF{t@5(f$lk}VQU z$#mx^@9gLN1f^1;oEB;+6V`shHi^U{{#lm@KxG45-#~IOSgw#YA*K}o=-{%`wiO8Y z?55iJ+btVcDOi>*qB$Wvfb2=OSASP#%J(<#j^b64e9B{-d_n20VFtT$haeA$<#}XA zYwOgwA{r#bQSF+gCle7(iTZt`6i<~)S=%IZx#8?RVHpw2K9qTA8{c@n+AksPrqaB& z>NYU7w=Oz`jTe6Hns!uKHrB6>Ice8F*!g+P`sEPt9s>JSZh(Fr*gweoyV+BAIX!{X zbG|*1WobBW zfjJA16}Li?UBxl{k5?6BBrKJWkaF>7odjCIn!Pb&I3#O8`{q_6xbG()ntH7Yoj*ef z8>Q0|-k5b~#Hz5Ku%5dfB%@7Q0TiipCa1O1H5h&S(6}3=R2RCA zeqOI1sDjv^?!wY~&yv*2QdxJz@1}qME;G1}5mIQ;)#7RX{{8*_6JXS&JC%tw9v&05 zS7ap_)uUt&{RU5C!**IbxCo0c^`)9x%PT^YP|E=&%TUmA>En?$0Q6lKytV=biW|q>PG`Nb#Jc z8ltN8ReThhKJsFsUYfVE-@qL%O8-ZXs|TW)&37nHL~F+QmCsbCiwhf>7jhCH6rzOu zZIB4tYzG*Zg1dI1AhiY?7mgDFLOV&NR@%4lX>x0an72OWab`PH=MWK1buLcwcv`d% zJ63m%>GlX~fPJA1?12@3t~9s8e@+fBe-dG;s&)bm5x5onuzqee>dkZ;S1FOqBDgiS zL;_z@eTgkXSZZkIrlhc1KvqEjweUSa;;vEnoi3G@q)X zPAV9Cb$MvzU%OKeL;DBKmkR z{c&(vDo5%zNf!#a&^8^GD(2AEFr=j{LxfHt!M8X=@HNf38S5b9R_Wb6CjsZ5MS6W$J~MR+2d7j%oIYc?uzPtS+zYJfE(E4I=KTDVsOQJdNBY zE9Cs$+4T1;|s5JeRC zkqV2H}CU97-UmLq2$0cAYCG&3uREQ)wc? 
zR(tO~R;hCgMGPhF!?YyjeAMn#D9*gq z$^GbvZuK!|wNN z;H5~rvAMDoESF;kezEm{di#gd_vE*-{&;Ocqr?2Y;ueSt_$wn?Gs&fyK9 zJ9)`f3*qQq6Rpkb=(L~) zmO}O+iuL}HGohN&*hv%ealt!s;>PR!I@8(c&ABLGcUk%VgLifi6QAx{&2#C$)UpmU z1PwAAi(RNk`%QVzN9DjDL)ZrWK@F;)hS&`EGC|Fu8k={nq+zdaO9yIFzqKz?7^nYC2=}Kns1G&B06pOhyNXy+gmB^f>_fEH58mJ}69SG}~T{j|QO&Q`Hh7 zLg}hTSYY!TMZyS6gtnXO$a1~^O`BH=2ZA<y2uP2<**)v_3SuX)YRCebg!HdW~0P3u>}f@YgFJHReP`L`pw!!?DO18@j46v{tTKACAFo8Tgcv_FLNr750v3X3W`fTn|~p3D6)VT z()?nUp&_76zz@&+RCMeMrLc{9I-+0Vk0O6m$^-js4St@n4d)Q_Xw4wcJ3`yS0ZN}9 z22HQ~5ZpL21r?PG=`BqFhf2`#MRy$Z02;$85XAIwA>gv>+{W3IDRZ=oKNEj3TLl9%bz zzY4;N-1VU^+P>dX>@OgbRqoX1TpVOX8XKp7M4;vmGI`U~igupKA5K{p+tBcaL7{Tu zeFMl6*Xmh0f{2yc`3KzMw)-5gzSes6k`w}yr>GHwsR4ke`hT?m5=|I+gQ?UOz}|A! zKigxm5#4dD!16jqscMpwsHKprO^nd|M}UZGkkx$Aj;Xcy`&QlUrcLgI$4}25t+4+vZN3ci~0^e!r5H$(0(0nSJkQ5yxbC#DHp$387r%c~PR&!E^J3 z^rcY=0Vp%=4Tb$~XnTP_>+U$gU;L<+W#`rmygg~!*G|Z?bQ5LV>U}HdmPaYX;xFZ_ zGwwxzVBr^|4cw2|zDOmD@)H2Tz}kAtp2;$WF`p@7dbraX&s4aL`~`mqERTNZg2umk zau+GEVt;_1JKmoG1=)Jav#(!ZGH(4?Vg090#%}m-_nz&UCxU~IHX2F=o@ZT}q*x8E zF4tUieYGA1ULY^Jbp>Q+KjMAD(mFRNN>wLYYEvA`&d)e<;)H8x6R;|X#2c)2X+t?? zR1znMW{3NjMRh@y-$L8hvh-~s|9XTmIpuZwITEIV%n2=rKAd*}^LC2Qz!q9YnXxIq z1h!0mJ3dl#QeW^60*K>fQxzYS4eCyPWs}>v)Vg%z@(}tdV*S_gc=3&&gOzA#wu1TV znmyysc|wf&20>f=^DlZJwr!33DI=^q>%GJOALdJPs8D1r5>rE}UIK94;6$Yqm_(=|Xy~)Yi{Jq(=Wzv* z$+=liHlfG8kZTvF^c6VTKUxU6@eZyCk~6jIr2em+G4~2+(tjt|xT-LW@lmz%|CA*E zDO3I#kH4J%u{ps1Lviw-kmWzS|AZ`I|0+|klK2T>7!&u zPPNyTv@tQEbT;W9nGC%?*msrn4bpEGwMfPEP0N3UH9A?y-QktBNO}MrW{r;q$<=^b hi=clrH(3v84ipNJtpSyN2o(}QURp(}TEaBse*pS#84mye literal 0 HcmV?d00001 From d45076655d90f58562c80f36c491eb5136ec8813 Mon Sep 17 00:00:00 2001 From: Lukasz Misiuda Date: Thu, 14 Dec 2017 15:38:48 -0800 Subject: [PATCH 2/3] PR comments. Additional details about RecordSet. 
---
 README.rst | 32 ++++++++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/README.rst b/README.rst
index 164ce62638..c47d4738a8 100644
--- a/README.rst
+++ b/README.rst
@@ -1424,20 +1424,44 @@ Amazon SageMaker provides several built-in machine learning algorithms that you
 
 The full list of algorithms is available on the AWS website: https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html
 
-SageMaker Python SDK includes Estimator wrappers for the AWS K-means, Principal Components Analysis, and Liner Learner algorithms.
+SageMaker Python SDK includes Estimator wrappers for the AWS K-means, Principal Components Analysis, and Linear Learner algorithms.
 
 Definition and usage
 ~~~~~~~~~~~~~~~~~~~~
 Estimators that wrap Amazon's built-in algorithms define algorithm's hyperparameters with defaults. When a default is not possible you need to provide the value during construction:
 
-- ``KMean`` Estimator requires parameter ``k`` to define number of clusters
+- ``KMeans`` Estimator requires parameter ``k`` to define number of clusters
 - ``PCA`` Estimator requires parameter ``num_components`` to define number of principal components
 
-Interaction is identical as any other Estimators.
+Interaction is identical as any other Estimators. There are additional details about how data is specified.
+
+Input data format
+^^^^^^^^^^^^^^^^^
+Please note that Amazon's built-in algorithms are working best with protobuf ``recordIO`` format.
+The data is expected to be available in S3 location and depending on algorithm it can handle data in multiple data channels.
+
+This package offers support to prepare data into required format and upload data to S3.
+Provided class ``RecordSet`` captures necessary details like S3 location, number of records, data channel and is expected as input parameter when calling ``fit()``.
+
+Function ``record_set`` is available on algorithms objects to make it simple to achieve the above.
+It takes 2D numpy array as input, uploads data to S3 and returns ``RecordSet`` objects. By default it uses ``train`` data channel and no labels but can be specified when called.
+
+Please find an example code snippet for illustration:
+
+.. code:: python
+
+    from sagemaker import PCA
+    pca_estimator=PCA(role='SageMakerRole', train_instance_count=1, train_instance_type='ml.m4.xlarge', num_components=3)
+
+    import numpy as np
+    records = pca_estimator.record_set(np.arange(10).reshape(2,5))
+
+    pca_estimator.fit(records)
+
 
 Predictions support
 ~~~~~~~~~~~~~~~~~~~
-Calling inference on deployed Amazon's built-in algorithms you must follow specific input format. By default this library creates a predictor that allows to use just numpy data.
+Calling inference on deployed Amazon's built-in algorithms requires specific input format. By default, this library creates a predictor that allows to use just numpy data.
 Data is converted so that ``application/x-recordio-protobuf`` input format is used. Received response is deserialized from the protobuf and provided as result from the ``predict`` call.

From adff1549ea829e83b526387394fd25466b4aa9d5 Mon Sep 17 00:00:00 2001
From: Lukasz Misiuda
Date: Thu, 14 Dec 2017 15:42:00 -0800
Subject: [PATCH 3/3] Add spaces around = to match format.

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index c47d4738a8..ba29e9200e 100644
--- a/README.rst
+++ b/README.rst
@@ -1451,7 +1451,7 @@ Please find an example code snippet for illustration:
 .. code:: python
 
     from sagemaker import PCA
-    pca_estimator=PCA(role='SageMakerRole', train_instance_count=1, train_instance_type='ml.m4.xlarge', num_components=3)
+    pca_estimator = PCA(role='SageMakerRole', train_instance_count=1, train_instance_type='ml.m4.xlarge', num_components=3)
 
     import numpy as np
     records = pca_estimator.record_set(np.arange(10).reshape(2,5))