NumPy API surface : plan/prioritize the coverage #87

Open
34 of 72 tasks
ev-br opened this issue Mar 16, 2023 · 7 comments

@ev-br
Collaborator

ev-br commented Mar 16, 2023

EDIT: the relevant list for an MVP is #87 (comment).
The rest is maybe-some-day-if-need-arises.

Splitting this off from gh-86: here's the difference between the API surfaces of NumPy and this wrapper. We can edit the order to reflect priorities:

>>> import inspect
>>> import numpy as np
>>> import torch_np as tnp
>>> npset = set(x for x in dir(np)
...             if not x.startswith('_')
...             and not inspect.ismodule(getattr(np, x))
...             and not x[0].isupper())
>>> tnpset = set(dir(tnp))
>>> for name in sorted(npset - tnpset):
...     print("- [ ]", name)

EDIT: now lightly edited:

Lower prio:

memmap
ndenumerate
ndindex
nditer
nested_iters
setxor1d
setdiff1d
vectorize
trapz
trim_zeros
version
sort_complex
flatiter
union1d
unpackbits
packbits

Low prio if at all:

array2string
array_repr
array_str
busday_count
busday_offset
busdaycalendar
byte_bounds
bytes_
cast
ctypeslib
deprecate
deprecate_with_doc
format_float_positional
format_float_scientific
format_parser
get_array_wrap
get_include
get_printoptions
getbufsize
geterr
geterrcall
geterrobj
obj2sctype
poly
poly1d
polyadd
polyder
polydiv
polyfit
polyint
polymul
polynomial
polysub
polyval
sctype2char
sctypeDict
seterr
seterrcall
seterrobj
set_numeric_ops
set_printoptions
set_string_function
setbufsize
shares_memory
source
tracemalloc_domain (?)
test
testing
use_hugepage
who
save
savetxt
savez
savez_compressed
show_config
show_runtime
frombuffer
fromfile
fromfunction
fromiter
frompyfunc
fromregex
fromstring
genfromtxt
base_repr
binary_repr
may_share_memory
broadcast
printoptions
issctype
issubsctype
require
lookfor
load
loadtxt
mask_indices
kernel_version
lexsort
little_endian
maximum_sctype
intersect1d

Definitely not (no pytorch equivalents):

add_docstring
add_newdoc
add_newdoc_ufunc
asmatrix
char
character
chararray
clongdouble
clongfloat
complex256
compare_chararrays
datetime64
datetime_as_string
datetime_data
flexible
float128
isnat
is_busday
longcomplex
longdouble
longfloat
matrix
rec
recarray
recfromcsv
recfromtxt
record
ushort
uint
uint16
uint32
uint64
uintc
uintp
ulonglong
unicode_
void
spacing
str_
string_
timedelta64
safe_eval
numarray
oldnumeric
object_
fastCopyAndTranspose (deprecated in numpy)
msort (deprecated in numpy)
disp
info
iterable

@ev-br
Collaborator Author

ev-br commented Mar 16, 2023

>>> for name in sorted(set(dir(np.ndarray)) - set(dir(tnp.ndarray))):
...     print("- [ ]", name)
  • __copy__
  • __deepcopy__
  • __delitem__
  • __dlpack__
  • __dlpack_device__
  • __imatmul__
  • __matmul__
  • __rdivmod__
  • __rmatmul__
  • __setstate__
  • argpartition
  • choose
  • compress
  • data
  • dump
  • dumps
  • fill
  • item
  • nbytes
  • partition
  • put
  • resize
  • take
  • view

Later if at all:

  • __contains__
  • __iter__

base
byteswap
newbyteorder
flat
getfield
setfield
setflags
itemset
tostring [deprecated since numpy 1.19]
tobytes
tofile
ctypes
__array__
__array_finalize__
__array_function__
__array_interface__
__array_prepare__
__array_priority__
__array_struct__
__array_ufunc__
__array_wrap__
__class_getitem__

@ev-br
Collaborator Author

ev-br commented Mar 16, 2023

For tnp.random:

>>> for name in sorted(dir(tnp.random)):
...     print("- [x]", name)
  • ArrayLike
  • NDArray
  • Optional
  • array_or_scalar
  • choice
  • normal
  • normalizer
  • rand
  • randint
  • randn
  • random
  • random_sample
  • sample
  • seed
  • shuffle
  • sqrt
  • torch
  • uniform
>>> for name in sorted(set(dir(np.random)) - set(dir(tnp.random))):
...     print(name)

BitGenerator
Generator
MT19937
PCG64
PCG64DXSM
Philox
RandomState
SFC64
SeedSequence
__RandomState_ctor
__path__
_bounded_integers
_common
_generator
_mt19937
_pcg64
_philox
_pickle
_sfc64
beta
binomial
bit_generator
bytes
chisquare
default_rng
dirichlet
exponential
f
gamma
geometric
get_bit_generator
get_state
gumbel
hypergeometric
laplace
logistic
lognormal
logseries
mtrand
multinomial
multivariate_normal
negative_binomial
noncentral_chisquare
noncentral_f
pareto
permutation
poisson
power
random_integers
ranf
rayleigh
set_bit_generator
set_state
standard_cauchy
standard_exponential
standard_gamma
standard_normal
standard_t
test
triangular
vonmises
wald
weibull
zipf

@ev-br
Collaborator Author

ev-br commented Mar 16, 2023

And linalg:

>>> for name in [x for x in dir(np.linalg) if not x.startswith("_")]:
...     print("- [ ]", name)
  • LinAlgError
  • cholesky
  • cond
  • det
  • eig
  • eigh
  • eigvals
  • eigvalsh
  • inv
  • lstsq
  • matrix_power
  • matrix_rank
  • multi_dot
  • norm
  • pinv
  • qr
  • slogdet
  • solve
  • svd
  • tensorinv
  • tensorsolve

@rgommers
Member

This looks quite good, thanks @ev-br. Here's where I would start:

  • All of linalg can be implemented, I believe, except for test. That seems important, and hopefully relatively straightforward.
  • From the main namespace I'd take the functions that are pretty heavily used and can be mapped to equivalent pytorch functionality: pad, take, convolve, einsum, gradient, cross, tensordot, histogram, from_dlpack, ...
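For the linalg part, the wrappers can be very thin, since torch.linalg mirrors np.linalg closely. A minimal sketch (the function names come from the list above; the dtype handling here is my simplification, not the project's actual conversion machinery):

```python
import torch

def solve(a, b):
    """Thin np.linalg.solve-style wrapper deferring to torch.linalg.solve."""
    a = torch.as_tensor(a, dtype=torch.float64)
    b = torch.as_tensor(b, dtype=torch.float64)
    return torch.linalg.solve(a, b)

def inv(a):
    """Thin np.linalg.inv-style wrapper deferring to torch.linalg.inv."""
    return torch.linalg.inv(torch.as_tensor(a, dtype=torch.float64))
```

For example, `solve([[3.0, 1.0], [1.0, 2.0]], [9.0, 8.0])` returns the solution `[2., 3.]` of the 2x2 system.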

@ev-br
Collaborator Author

ev-br commented Mar 16, 2023

For random, we can rather easily mock up RandomState and default_rng. Might make it easier for scikit-learn and others who frown on np.random.random usage.
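A mock along these lines could be quite small: each instance carries its own torch.Generator, so seeding stays local to the instance rather than global. This is a hypothetical sketch (class and method shapes follow the numpy API; only uniform is shown):

```python
import torch

class RandomState:
    """Hypothetical RandomState mock backed by a per-instance torch.Generator."""

    def __init__(self, seed=None):
        self._gen = torch.Generator()
        if seed is not None:
            self._gen.manual_seed(int(seed))

    def uniform(self, low=0.0, high=1.0, size=None):
        # Accept an int or a tuple for `size`, like numpy does.
        shape = (size,) if isinstance(size, int) else tuple(size or ())
        return low + (high - low) * torch.rand(shape, generator=self._gen)
```

Two instances seeded identically then produce identical streams, which is the property scikit-learn-style code relies on.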

@lezcano
Collaborator

lezcano commented Mar 21, 2023

I agree with the list from Ralf. Here's a slightly more comprehensive list of things that either have PyTorch equivalents or are close to trivial to implement.

These could go into a PR, as most of them should have a very simple implementation. After these, I think we can declare victory on the coverage front. We should then spend some time finishing the refactorisation and doing general cleanups across the codebase (without spending too much time on this), and move on to the testing part of the project, where we show that what we built, in fact, works.

  • append (we don't have an equiv, but should be easy to implement via advanced indexing)
  • bartlett
  • blackman
  • choose (didn't we have this one already?)
  • copyto (funnily enough this is the function we want to use to implement the out kwarg)
  • convolve
  • cross
  • e
  • einsum
  • fft
  • from_dlpack
  • gradient
  • hamming
  • histogram
  • histogram2d (dispatch to histogramdd)
  • histogramdd
  • kaiser
  • min_scalar_type (trivial to implement or just vendor)
  • nbytes (this is a property of Tensor in PyTorch)
  • pad
  • put
  • resize
  • take
  • tensordot
  • np.linalg
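Most of these really are one-liners over existing torch primitives. For append, for instance, one simple route is concatenation (the comment above suggests advanced indexing; this torch.cat version is just my illustration of how little code is needed, following np.append's flatten-when-axis-is-None semantics):

```python
import torch

def append(arr, values, axis=None):
    """Sketch of np.append on top of torch.cat."""
    arr = torch.as_tensor(arr)
    values = torch.as_tensor(values)
    if axis is None:
        # np.append flattens both inputs when no axis is given.
        return torch.cat((arr.ravel(), values.ravel()))
    return torch.cat((arr, values), dim=axis)
```

So `append([1, 2], [3, 4])` gives `tensor([1, 2, 3, 4])`, and with `axis=0` two stacked rows stay 2-D.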

@rgommers
Member

That list seems reasonable, minus infty - I'm going to deprecate that one in NumPy soon, so would prefer to leave it out here.
