
No way to construct mixed dtype DataFrame without total copy, proposed solution #9216


Closed
quicknir opened this issue Jan 9, 2015 · 61 comments · Fixed by #45455

@quicknir

quicknir commented Jan 9, 2015

After hours of tearing my hair out, I've come to the conclusion that it is impossible to create a mixed dtype DataFrame without copying all of its data in. That is, no matter what you do, if you want to create a mixed dtype DataFrame, you will inevitably create a temporary version of the data (e.g. using np.empty), and the various DataFrame constructors will always make copies of this temporary. This issue has already been brought up, a year ago: #5902.

This is especially terrible for interoperability with other programming languages. If you plan to populate the data in the DataFrame from e.g. a call to C, the easiest way to do it by far is to create the DataFrame in python, get pointers to the underlying data, which are np.arrays, and pass these np.arrays along so that they can be populated. In this situation, you simply don't care what data the DataFrame starts off with, the goal is just to allocate the memory so you know what you're copying to.

This is also just generally frustrating because it implies that in principle (depending potentially on the specific situation, and the implementation specifics, etc) it is hard to guarantee that you will not end up using twice the memory you really should.

This has an extremely simple solution that is already grounded in the quantitative python stack: have a method analogous to numpy's empty. This allocates the space, but does not actually waste any time writing or copying anything. Since empty is already taken, I would propose calling the method from_empty. It would accept an index (mandatory, most common use case would be to pass np.arange(N)), columns (mandatory, typically a list of strings), types (list of acceptable types for columns, same length as columns). The list of types should include support for all numpy numeric types (ints, floats), as well as special Pandas columns such as DatetimeIndex and Categorical.
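To make this concrete, a user-space approximation of the idea might look like the sketch below (from_empty here is a hypothetical helper, not an existing pandas method, and depending on the pandas version the constructor may still copy while consolidating blocks):

import numpy as np
import pandas as pd

def from_empty(index, columns, types):
    # Hypothetical helper: allocate one uninitialized array per column and
    # hand them to the constructor, asking it not to copy.
    data = {col: np.empty(len(index), dtype=t) for col, t in zip(columns, types)}
    return pd.DataFrame(data, index=index, copy=False)

df = from_empty(np.arange(1_000_000), ["price", "volume"], [np.float64, np.int64])
print(df.dtypes)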

As an added bonus, since the implementation is in a completely separate method, it will not interfere with the existing API at all.

@jreback
Contributor

jreback commented Jan 9, 2015

You can simply create an empty frame with an index and columns, then assign ndarrays - these won't copy if you assign all of a particular dtype at once.

you could create these with np.empty if you wish

@quicknir
Author

quicknir commented Jan 9, 2015

df = pd.DataFrame(index=range(2), columns=["dude", "wheres"])

df
Out[12]:
  dude wheres
0  NaN    NaN
1  NaN    NaN

x = np.empty(2, np.int32)

x
Out[14]: array([6, 0], dtype=int32)

df.dude = x

df
Out[16]:
   dude wheres
0     6    NaN
1     0    NaN

x[0] = 0

x
Out[18]: array([0, 0], dtype=int32)

df
Out[19]:
   dude wheres
0     6    NaN
1     0    NaN

Looks like it's copying to me. Unless the code I wrote isn't what you meant, or the copying that occurred is not the copy you thought I was trying to elide.

@jreback
Contributor

jreback commented Jan 9, 2015

You changed the dtype - that's why it copied. Try with a float.

@quicknir
Author

quicknir commented Jan 9, 2015

y = np.empty(2, np.float64)

df
Out[21]:
   dude wheres
0     6    NaN
1     0    NaN

df.wheres = y

y
Out[23]: array([  2.96439388e-323,   2.96439388e-323])

y[0] = 0

df
Out[25]:
   dude         wheres
0     6  2.964394e-323
1     0  2.964394e-323

df = pd.DataFrame(index=range(2), columns=["dude", "wheres"])

df.dtypes
Out[27]:
dude      object
wheres    object
dtype: object

The dtype is object, so it's changed regardless of whether I use a float or an int.

@jreback
Contributor

jreback commented Jan 9, 2015

In [25]: arr = np.ones((2,3))

In [26]: df = DataFrame(arr,columns=['a','b','c'])

In [27]: arr[0,1] = 5

In [28]: df
Out[28]: 
   a  b  c
0  1  5  1
1  1  1  1

Constructing w/o a copy on mixed type could be done but is quite tricky. The problem is some types require a copy (e.g. object to avoid memory contention issues). And the internal structure consolidates different types, so adding a new type will necessitate a copy. Avoiding a copy is pretty difficult in most cases.

You should just create what you need, get pointers to the data and then overwrite it. Why is that a problem?

@quicknir
Author

quicknir commented Jan 9, 2015

The problem is that in order to create what I need, I have to copy in stuff of the correct dtype, the data of which I have no intention of using. Even assuming that your suggestion of creating an empty DataFrame uses no significant RAM, this doesn't alleviate the cost of copying. If I want to create a 1 gigabyte DataFrame and populate it somewhere else, I'll have to pay the cost of copying a gigabyte of garbage around in memory, which is completely needless. Do you not see this as a problem?

Yes, I understand that the internal structure consolidates different types. I'm not sure exactly what you mean by memory contention issues, but in any case objects are not really what's of interest here.

Actually, while avoiding copies in general is a hard problem, avoiding them in the way I suggested is fairly easy because I'm supplying all the necessary information from the get-go. It's identical to constructing from data, except that instead of inferring the dtypes and the # of rows from data and copying the data, you specify the dtypes and # of rows directly, and do everything else exactly as you would have done minus the copy.

You need an "empty" constructor for every supported column type. For numpy numeric types this is obvious, it needs non-zero work for Categorical, unsure about DatetimeIndex.

@jreback
Contributor

jreback commented Jan 9, 2015

passing a dict to the constructor and copy=False should work
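A minimal sketch of that suggestion (whether the buffers actually end up shared depends on the pandas version and on block consolidation, so the check at the end is only informational):

import numpy as np
import pandas as pd

a = np.empty(5, dtype=np.float64)
b = np.empty(5, dtype=np.int32)

# Hand the arrays over as a dict and ask the constructor not to copy.
df = pd.DataFrame({"a": a, "b": b}, copy=False)

# Depending on the pandas version and consolidation, this may print False.
print(np.shares_memory(df["a"].values, a))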

@jreback
Contributor

jreback commented Jan 9, 2015

So this will work. But you have to be SURE that the arrays that you are passing are distinct dtypes. And once you do anything to this it could copy the underlying data, so YMMV. You can of course pass in np.empty instead of the ones/zeros that I am using.

In [75]: arr = np.ones((2,3))

In [76]: arr2 = np.zeros((2,2),dtype='int32')

In [77]: df = DataFrame(arr,columns=list('abc'))

In [78]: df2 = DataFrame(arr2,columns=list('de'))

In [79]: result = pd.concat([df,df2],axis=1,copy=False)

In [80]: arr2[0,1] = 20

In [81]: arr[0,1] = 10

In [82]: result
Out[82]: 
   a   b  c  d   e
0  1  10  1  0  20
1  1   1  1  0   0

In [83]: result._data
Out[83]: 
BlockManager
Items: Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')
Axis 1: Int64Index([0, 1], dtype='int64')
FloatBlock: slice(0, 3, 1), 3 x 2, dtype: float64
IntBlock: slice(3, 5, 1), 2 x 2, dtype: int32

In [84]: result._data.blocks[0].values.base
Out[84]: 
array([[  1.,  10.,   1.],
       [  1.,   1.,   1.]])

In [85]: result._data.blocks[1].values.base
Out[85]: 
array([[ 0, 20],
       [ 0,  0]], dtype=int32)

jreback added the API Design and Dtype Conversions labels on Jan 9, 2015
@bashtage
Contributor

bashtage commented Jan 9, 2015

Initial attempt deleted since it does not work: reindex forces casting, which is a strange "feature".

Have to use 'method', which makes this attempt a little less satisfactory:

arr = np.empty(1, dtype=[('x', np.float), ('y', np.int)])
df = pd.DataFrame.from_records(arr).reindex(np.arange(100))

If you are really worried about performance, I'm not sure why one wouldn't just use numpy as much as possible since it is conceptually much simpler.

@quicknir
Author

quicknir commented Jan 9, 2015

jreback, thank you for your solution. This seems to work, even for Categoricals (which surprised me). If I encounter issues I'll let you know. I'm not sure what you mean by: if you do anything to this, it could copy. What do you mean by anything? Unless there are COW semantics I would think what you see is what you get with regards to deep vs shallow copies, at construction time.

I still think a from_empty constructor should be implemented, and I don't think it would be that difficult; while this technique works, it does involve a lot of code overhead. In principle this could be done by specifying a single composite dtype and a number of rows.

bashtage, these solutions still write into the entire DataFrame. Since writing is generally slower than reading, this means at best it saves less than half the overhead in question.

Obviously if I haven't gone and used numpy, its because pandas has many awesome features and capabilities that I love, and I don't want to give those up. Were you really asking, or just implying that I should use numpy if I don't want to take this performance hit?

@quicknir
Author

quicknir commented Jan 9, 2015

Scratch this, please, user error, and my apologies. reindex_axis with copy=False worked perfectly.

@bashtage
Contributor

bashtage commented Jan 9, 2015

bashtage, these solutions still write into the entire DataFrame. Since writing is generally slower than reading, this means at best it saves less than half the overhead in question.

True, but all that you need is a new method for reindex that will not fill with anything; then you can allocate a typed array with arbitrary column types without writing/copying.

Obviously if I haven't gone and used numpy, its because pandas has many awesome features and capabilities that I love, and I don't want to give those up. Were you really asking, or just implying that I should use numpy if I don't want to take this performance hit?

It was a bit rhetorical - although also a serious suggestion from a performance point of view since numpy makes it much easier to get close to the data-as-a-blob-of-memory access that is important if you are trying to write very high performance code. You can always convert from numpy to pandas when code simplicity is more important than performance.

@quicknir
Author

quicknir commented Jan 9, 2015

I see what you are saying. I still think it should more cleanly be part of the interface rather than a workaround, but as workarounds go it is a good one and easy to implement.

Pandas still emphasizes performance as one of its main objectives. Obviously it has higher level features compared to numpy, and those have to be paid for. What we're talking about has nothing to do with those higher level features, and there's no reason why one should be paying for massive copies in places where you don't need them. Your suggestion would be appropriate if someone was making a stink about the cost of setting up the columns, index, etc, which is completely different from this discussion.

@bashtage
Contributor

bashtage commented Jan 9, 2015

I think you are overestimating the cost of writing vs. the cost of allocating memory in Python -- the expensive part is the memory allocation. The object creation is also expensive.

Both allocate 1GB of memory, one empty and one zeros.

%timeit np.empty(1, dtype=[('x', float), ('y', int), ('z', float)])
100000 loops, best of 3: 2.44 µs per loop

%timeit np.zeros(1, dtype=[('x', float), ('y', int), ('z', float)])
100000 loops, best of 3: 2.47 µs per loop

%timeit np.zeros(50000000, dtype=[('x', float), ('y', int), ('z', float)])
100000 loops, best of 3: 11.7 µs per loop

%timeit np.empty(50000000, dtype=[('x', float), ('y', int), ('z', float)])
100000 loops, best of 3: 11.4 µs per loop

3µs for zeroing 150,000,000 values.

Now compare these for a trivial DataFrame.

%timeit pd.DataFrame([[0]])
1000 loops, best of 3: 426 µs per loop

Around 200 times slower for trivial. But it is far worse for larger arrays.

%timeit pd.DataFrame(np.empty((50000000, 3)),copy=False)
1 loops, best of 3: 275 ms per loop

Now it takes 275ms -- note that this is not copying anything. The cost is in setting up the index, etc which is clearly very slow when the array is nontrivially big.

This feels like a premature optimization to me, since the other overheads in pandas are so large that the malloc + filling component is near 0 cost.

It seems that if you want to allocate anything in a tight loop, it must be a numpy array for performance reasons.

@jreback
Contributor

jreback commented Jan 9, 2015

ok, here's what I think we should do, @quicknir, if you'd like to make some improvements. 2 issues:

  • ER: compound dtypes - DataFrame constructor/astype #4464 - this is essentially allowing a compound dtype in the DataFrame constructor and then turning around and calling from_records(), which can also be called if the passed in array is a rec/structured array - this would basically make from_records the rec/structured array processing path
  • pass thru the copy= keyword to from_records
  • from_records can then use the concat soln that I show above, rather than splitting the rec-array up, sanitizing them (as series) and then putting them back together (into dtype blocks; this part is done internally).

This is slightly non-trivial but would then allow one to pass in an already created ndarray (could be empty) with mixed types pretty easily. Note that this would likely (in a first-pass implementation) handle only int/float/string, as datetime/timedelta need special sanitizing and would make this slightly more complicated.

so @bashtage is right from a perf perspective. It makes a lot of sense to simply construct the frame as you want then modify the ndarrays (but you MUST do this by grabbing the blocks, otherwise you will get copies).

What I meant above is this. Pandas groups any like-dtype columns (e.g. int64 and int32 are different) into a 'block' (2-d in a frame). Each block is a contiguous-memory ndarray (that is newly allocated, unless it is simply passed in, which currently only works for a single dtype). If you then do a setitem, e.g. df['new_columns'] = 5, and you already have an int64 block, then this new column will ultimately be concatenated to it (resulting in a new memory allocation for that dtype). If you were using a reference as a view on this, it will no longer be valid. That's why this is not a strategy you can employ w/o peering at the DataFrame internals.
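A small illustration of the view-invalidation point (hedged: exactly when consolidation happens, and whether .values returns a view at all, varies across pandas versions and with copy-on-write):

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(3, dtype=np.int64)})
view = df["a"].values                    # often a view into the int64 block

df["b"] = np.arange(3, dtype=np.int64)   # a second int64 column; the block may
                                         # later be rebuilt by consolidation
df.iloc[0, 0] = 99

print(view[0])                           # may still print 0: the old view can be
                                         # silently detached from the frame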

@jreback
Contributor

jreback commented Jan 9, 2015

@bashtage yeh the big cost is the index as you have noted. a RangeIndex (see #939) would solve this problem completely. (it is actually almost done in a side branch, just needs some dusting off).

@bashtage
Contributor

bashtage commented Jan 9, 2015

Even with an optimized RangeIndex it will still be 2 orders of magnitude slower than constructing a NumPy array, which is fair enough given the much heavier weight nature and additional capabilities of a DataFrame.

I think this can only be considered a convenience function, and not a performance issue. It could be useful to initialize a mixed type DataFrame or Panel, like:

dtype=np.dtype([('GDP', np.float64), ('Population', np.int64)])
pd.Panel(items=['AU','AT'],
         major_axis=['1972','1973'],
         minor_axis=['GDP','Population'], 
         dtype=[np.float, np.int64])

@jreback
Contributor

jreback commented Jan 9, 2015

this is only an API / convenience issue

agreed the perf is really an incidental issue (and not the driver)

@quicknir
Author

@bashtage

%timeit pd.DataFrame(np.empty((100, 1000000)))
100 loops, best of 3: 15.6 ms per loop

%timeit pd.DataFrame(np.empty((100, 1000000)), copy=True)
1 loops, best of 3: 302 ms per loop

So copying into a dataframe seems to take 20 times longer than all the other work involved in creating the DataFrame, i.e. the copy (and extra allocation) is 95% of the time. The benchmarks you did do not benchmark the correct thing. Whether the copy itself or the allocation is what's taking time doesn't really matter, the point is that if I could avoid copies for a multiple dtype DataFrame the way I can for a single dtype DataFrame I could save a huge amount of time.

Your two-orders-of-magnitude reasoning is also misleading. This is not the only operation being performed; there are other operations that take time, like disk reads. Right now, the extra copy I need to do to create the DataFrame is taking about half the time in my simple program that just reads the data off disk and into a DataFrame. If it took 1/20th as much time, then the disk read would be dominant (as it should be) and further improvements would have almost no effect.

So I want to again emphasize to both of you: this is a real performance issue.

jreback, given that the concatenation strategy does not work for Categoricals, I don't think that the improvements you suggested above will work. I think a better starting point would be reindex. The issue right now is that reindex does lots of extra stuff. But in principle, a DataFrame with zero rows has all the information necessary to allow the creation of a DataFrame with the correct number of rows, without doing any unnecessary work. Btw, this makes me really feel like pandas needs a schema object, but that's a discussion for another day.

@bashtage
Contributor

I think we will have to agree to disagree. IMO DataFrames are not extreme performance objects in the numeric ecosystem, as shown by the order of magnitude difference between a basic numpy array and a DataFrame creation.

%timeit np.empty((1000000, 100))
1000 loops, best of 3: 1.61 ms per loop

%timeit pd.DataFrame(np.empty((1000000,100)))
100 loops, best of 3: 15.3 ms per loop

Right now, the extra copy I need to do to create the DataFrame is taking about half the time in my simple program that just reads the data off disk and into a DataFrame. If it took 1/20 th as much time, then the disk read would be dominant (as it should be) and further improvements would have almost no effect.

I think this is even less reason to care about DataFrame performance -- even if you can make it 100% free, the total program time only declines by 50%.

I agree that there is scope for you to do a PR here to resolve this issue, whether you want to think of it as a performance issue or as a convenience issue. From my POV, I see it as the latter since I will always use a numpy array when I care about performance. Numpy does other things, like not using a block manager, which is relatively efficient for some things (like growing the array by adding columns) but bad from other points of view.

There could be two options. The first is an empty constructor, as in the example I gave above. This would not copy anything, but would probably null-fill to be consistent with other things in pandas. Null filling is pretty cheap and is not at the root of the problem IMO.

The other would be to have a method DataFrame.from_blocks that would take preformed blocks to pass straight to the block manager. Something like

DataFrame.from_blocks([np.empty((100,2)), 
                       np.empty((100,3), dtype=np.float32), 
                       np.empty((100,1), dtype=np.int8)],
                     columns=['f8_0','f8_1','f4_0','f4_1','f4_2','i1_0'],
                     index=np.arange(100))

A method of this type would enforce that the blocks have compatible shapes and that all blocks have unique types, as well as the usual checks on the shape of the index and columns. This type of method would do nothing to the data and would use it as-is in the BlockManager.

@jreback
Contributor

jreback commented Jan 10, 2015

@quicknir you are trying to combine pretty complicated things. Categoricals don't exist in numpy; rather, they are a compound dtype-like construct that is specific to pandas. You have to construct and assign them separately (which is actually quite cheap - these are not combined into blocks like other singular dtypes).

@bashtage's soln seems reasonable. This could provide some simple checks and simply pass thru the data (and be called by the other internal routines). Normally the user need not concern themselves with the internal repr. Since you really, really want to, you need to be cognizant of this.

All that said, I am still not sure why you don't just create a frame exactly like you want, then grab the block pointers and change the values. It costs the same memory, and as @bashtage points out, it is pretty cheap to create essentially a null frame (with all of the dtypes, index, and columns already set).
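As a rough sketch of that workflow, using the same internals shown earlier in the thread (._data.blocks is internal and was later renamed ._mgr.blocks, so this is illustrative rather than a stable API):

import numpy as np
import pandas as pd

# Build the frame with the dtypes you want; the initial values are throwaway.
df = pd.DataFrame({"a": np.zeros(5), "b": np.zeros(5, dtype=np.int64)})

# Grab each block's ndarray and overwrite it in place (e.g. from a C reader).
for blk in df._data.blocks:
    blk.values[:] = 1

print(df)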

@quicknir
Author

Not sure what you mean by the empty constructor, but if you mean constructing a dataframe with no rows and the desired schema and calling reindex, this is the same amount of time as creating with copy=True.

Your second proposal is reasonable, but only if you can figure out how to do Categoricals. On that subject, I was going through the code and I realized that Categoricals are non-consolidatable. So on a hunch, I created an integer array and two categorical Series, I then created three DataFrames, and concatenated all three. Sure enough, it did not perform a copy even though two of the DataFrames had the same dtype. I will try to see how to get this to work for Datetime Index.
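A hedged reconstruction of that experiment (whether memory is actually shared can vary with the pandas version, so the check is only informational):

import numpy as np
import pandas as pd

ints = pd.DataFrame({"i": np.arange(4)})
c1 = pd.DataFrame({"c1": pd.Categorical(["a", "b", "a", "b"])})
c2 = pd.DataFrame({"c2": pd.Categorical(["x", "x", "y", "y"])})

result = pd.concat([ints, c1, c2], axis=1, copy=False)

# Categorical columns are non-consolidatable, so their codes can remain
# shared with the originals even though both columns are categorical.
print(np.shares_memory(result["c1"].values.codes, c1["c1"].values.codes))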

@jreback I still do not follow what you mean by create the frame exactly like you want.

@jreback
Contributor

jreback commented Jan 10, 2015

@quicknir why don't you show a code/pseudo-code sample of what you are actually trying to do.

@quicknir
Author

def read_dataframe(filename, ....):
   f = my_library.open(filename)
   schema = f.schema()
   row_count = f.row_count()
   df = pd.DataFrame.from_empty(schema, row_count)
   dict_of_np_arrays = get_np_arrays_from_DataFrame(df)
   f.read(dict_of_np_arrays)
   return df

The previous code was constructing a dictionary of numpy arrays first and then constructing a DataFrame from that, which copied everything; about half the time was being spent on that. So I am trying to change it to this scheme. The thing is that constructing df as above is extremely expensive, even when you don't care about the contents.

@jreback
Contributor

jreback commented Jan 10, 2015

@quicknir dict of np arrays requires lots of copying.

You should simply do this:

# construct your biggest block type (e.g. say you have mostly floats)
df = DataFrame(np.empty((....)),index=....,columns=....)

# then add in other things you need (say strings)
df['foo'] = np.empty(.....)

# say ints
df['foo2'] = np.empty(...)

if you do this by dtype it will be cheap

then.

for dtype, block in df.as_blocks().items():
    # fill the values
    block.values[0,0] = 1

as these block values are views into numpy arrays

@quicknir
Author

The composition of types isn't known in advance in general, and in the most common use case there is a healthy mix of floats and ints. I guess I don't follow how this will be cheap, if I have 30 float columns and 10 int columns, then yes, the floats will be very cheap. But when you do the ints, unless there is some way to do them all at once that I'm missing, each time you add one more column of ints it will cause the entire int block to be reallocated.

The solution you gave me previously is close to working, but I can't seem to make it work for DatetimeIndex.

@bashtage
Contributor

Not sure what you mean by the empty constructor, but if you mean constructing a dataframe with no rows and the desired schema and calling reindex, this is the same amount of time as creating with copy=True.

An empty constructor would look like

dtype=np.dtype([('a', np.float64), ('b', np.int64), ('c', np.float32)])
df = pd.DataFrame(columns='abc',index=np.arange(100),dtype=dtype)

This would produce the same output as

dtype=np.dtype([('a', np.float64), ('b', np.int64), ('c', np.float32)])
arr = np.empty(100, dtype=dtype)
df = pd.DataFrame.from_records(arr, index=np.arange(100))

only it wouldn't copy data.

Basically, the constructor would allow a mixed dtype in the following call, which currently works but only with a single basic dtype.

df = pd.DataFrame(columns=['a','b','c'],index=np.arange(100), dtype=np.float32)

The only other feature would be to prevent it from null-filling int arrays which has the side effect of converting them to object dtype since there is no missing value for ints.

@bashtage
Contributor

Your second proposal is reasonable, but only if you can figure out how to do Categoricals. On that subject, I was going through the code and I realized that Categoricals are non-consolidatable. So on a hunch, I created an integer array and two categorical Series, I then created three DataFrames, and concatenated all three. Sure enough, it did not perform a copy even though two of the DataFrames had the same dtype. I will try to see how to get this to work for Datetime Index.

The from_blocks method would have to know the rules of consolidation, so that it would allow multiple categoricals but only one block of each other basic type.

@quicknir
Author

quicknir commented Jun 9, 2015

@jreback It's not reinventing the wheel, as the dtype spec has several major problems:

  1. As far as I can see, to_records() will perform a deep copy of the entire DataFrame. Getting the spec (I'll just use this term from now on) for the DataFrame should be cheap and easy.

  2. The output of to_records is a numpy type. One implication of this is that I don't see how this could ever be extended to properly support Categoricals.

  3. This method of internally storing the spec is not easily compatible with how the data is stored inside the DataFrame (i.e. in blocks of like dtype). Creating the blocks from such a spec involves lots of extra work that could be elided by storing the spec in a manner like I suggested, with a dict from dtype to column numbers (sketched below). When you have a DataFrame with 2000 columns, this will be expensive.

In short, the dtype of the record representation is more of a workaround for the lack of a proper spec. It lacks several key features and is much poorer performance wise.
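Purely as an illustration of point 3, the kind of lightweight spec meant there might look like this (FrameSpec and its fields are hypothetical, not an existing pandas type):

from dataclasses import dataclass, field
from typing import Dict, List

import numpy as np

@dataclass
class FrameSpec:
    # Hypothetical schema object: row count, column order, and a mapping
    # from dtype to the column positions stored in that dtype's block.
    n_rows: int
    columns: List[str]
    blocks: Dict[np.dtype, List[int]] = field(default_factory=dict)

spec = FrameSpec(
    n_rows=1_000_000,
    columns=["price", "volume", "flag"],
    blocks={
        np.dtype("float64"): [0],
        np.dtype("int64"): [1],
        np.dtype("int8"): [2],
    },
)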

jreback added a commit to jreback/pandas that referenced this issue Oct 11, 2015
- closes pandas-dev#10556, add policy argument to constructors
- closes pandas-dev#9216, all passing of dict with view directly to the API
- closes pandas-dev#5902
- closes pandas-dev#8571 by defining __copy__/__deepcopy__
@allComputableThings

There are many many threads on SO asking for this feature.

It seems to me that all these problems stem from the BlockManager consolidating separate columns into single memory chunks (the 'blocks').
Wouldn't the easiest fix be to not consolidate data into blocks when copy=False is specified?

I have a non-consolidating monkey-patched BlockManager:
https://stackoverflow.com/questions/45943160/can-memmap-pandas-series-what-about-a-dataframe
that I used to work around this problem.

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this issue Jan 14, 2018
- closes pandas-dev#10556, add policy argument to constructors
- closes pandas-dev#9216, all passing of dict with view directly to the API
- closes pandas-dev#5902
- closes pandas-dev#8571 by defining __copy__/__deepcopy__
jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this issue Jan 16, 2018
- closes pandas-dev#10556, add policy argument to constructors
- closes pandas-dev#9216, all passing of dict with view directly to the API
- closes pandas-dev#5902
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue Mar 17, 2020
- closes pandas-dev#10556, add policy argument to constructors
- closes pandas-dev#9216, all passing of dict with view directly to the API
- closes pandas-dev#5902
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue Mar 17, 2020
- closes pandas-dev#10556, add policy argument to constructors
- closes pandas-dev#9216, all passing of dict with view directly to the API
- closes pandas-dev#5902
jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this issue May 25, 2020
- closes pandas-dev#10556, add policy argument to constructors
- closes pandas-dev#9216, all passing of dict with view directly to the API
- closes pandas-dev#5902
jbrockmendel added the Constructors label on Nov 21, 2020
@jbrockmendel
Member

You can now pass a dict of Series and copy=False, and you should be OK.
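For example (a minimal sketch; the memory-sharing check is only a sanity check and its result can differ across pandas versions):

import numpy as np
import pandas as pd

a = pd.Series(np.empty(5, dtype=np.float64))
b = pd.Series(np.empty(5, dtype=np.int32))

# With copy=False the frame can reuse the Series' buffers instead of copying.
df = pd.DataFrame({"a": a, "b": b}, copy=False)

print(np.shares_memory(df["a"].values, a.values))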

jbrockmendel added the Closing Candidate label on Dec 21, 2021
@mroeschke
Member

Might be good to mention the dict of Series with copy=False solution in the DataFrame docs.

mroeschke added the DataFrame, Docs, and good first issue labels and removed the Enhancement, Dtype Conversions, and Closing Candidate labels on Jan 15, 2022
@kaipriester
Contributor

take
