-
Notifications
You must be signed in to change notification settings - Fork 936
Use approximate FieldValue size for MemoryLRU #2548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
c0a86af
e6a786f
300c9bd
6b83a8a
9f756d4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -126,6 +126,12 @@ export abstract class FieldValue { | |
abstract isEqual(other: FieldValue): boolean; | ||
abstract compareTo(other: FieldValue): number; | ||
|
||
/** | ||
* Returns an approximate (and wildly inaccurate) in-memory size for the field | ||
* value. | ||
*/ | ||
abstract byteSize(): number; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider renaming There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was torn between "byteSize()", "estimateByteSize()" and "approximateByteSize()". Looks like we have two votes for approximateByteSize(). Changed. |
||
|
||
toString(): string { | ||
const val = this.value(); | ||
return val === null ? 'null' : val.toString(); | ||
|
@@ -167,6 +173,10 @@ export class NullValue extends FieldValue { | |
return this.defaultCompareTo(other); | ||
} | ||
|
||
byteSize(): number { | ||
return 1; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not clear to me why you chose It's tempting to try to be consistent with the old code... in which case, would this use Alternatively, if this is really meant to represent the in-memory representation, I would guess (but could be wrong that at minimum, JavaScript stores values at 4-byte-aligned locations, so the minimum size of anything is probably 4 bytes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mostly chose 1 because I didn't want a document with a bunch of NULLs to be zero. 4 sounds reasonable, and basing it on the previous JSON-encoded size makes it even seem super scientific and well thought-out. If only the rest of this PR was more like this :) |
||
} | ||
|
||
static INSTANCE = new NullValue(); | ||
} | ||
|
||
|
@@ -195,6 +205,10 @@ export class BooleanValue extends FieldValue { | |
return this.defaultCompareTo(other); | ||
} | ||
|
||
byteSize(): number { | ||
return 1; | ||
} | ||
|
||
static of(value: boolean): BooleanValue { | ||
return value ? BooleanValue.TRUE : BooleanValue.FALSE; | ||
} | ||
|
@@ -221,6 +235,10 @@ export abstract class NumberValue extends FieldValue { | |
} | ||
return this.defaultCompareTo(other); | ||
} | ||
|
||
byteSize(): number { | ||
return 4; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should probably be 8? (double / long are 8) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oops. Yes. |
||
} | ||
} | ||
|
||
/** Utility function to compare doubles (using Firestore semantics for NaN). */ | ||
|
@@ -313,6 +331,13 @@ export class StringValue extends FieldValue { | |
} | ||
return this.defaultCompareTo(other); | ||
} | ||
|
||
byteSize(): number { | ||
// See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures: | ||
// "JavaScript's String type is [...] a set of elements of 16-bit unsigned | ||
// integer values" | ||
return this.internalValue.length * 2; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that if we're going for the on-disk size, this will probably (not 100% sure about IndexedDb) be UTF-8 encoded which means most characters will be 1-byte, not 2. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is mostly (at least for now) for memory size. IndexedDB size accounting also uses JSON.stringify(), but on already available Protobuf data. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Memory size isn't a super great proxy because it's not portable. Strings in C++ are encoded in UTF-8. Would it make sense to define this as UTF-8-encoded size and then just approximate by saying that on average characters encode to 1.1 bytes? I'm not sure this really matters, but it seems like a portable, single definition of what these sizes are would be useful. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Strings are the only primitive with a specified memory footprint on https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures. I am ok with "dumbing this down" and just using 1 byte, but as eluded to above, I don't think aiming for consistency among platforms is a worthy goal. |
||
} | ||
} | ||
|
||
export class TimestampValue extends FieldValue { | ||
|
@@ -347,6 +372,11 @@ export class TimestampValue extends FieldValue { | |
return this.defaultCompareTo(other); | ||
} | ||
} | ||
|
||
byteSize(): number { | ||
// Timestamps are made up of two distinct numbers (seconds + nanoseconds) | ||
return 8; | ||
} | ||
} | ||
|
||
/** | ||
|
@@ -410,6 +440,13 @@ export class ServerTimestampValue extends FieldValue { | |
toString(): string { | ||
return '<ServerTimestamp localTime=' + this.localWriteTime.toString() + '>'; | ||
} | ||
|
||
byteSize(): number { | ||
return ( | ||
/* localWriteTime */ 8 + | ||
(this.previousValue ? this.previousValue.byteSize() : 0) | ||
); | ||
} | ||
} | ||
|
||
export class BlobValue extends FieldValue { | ||
|
@@ -436,6 +473,10 @@ export class BlobValue extends FieldValue { | |
} | ||
return this.defaultCompareTo(other); | ||
} | ||
|
||
byteSize(): number { | ||
return this.internalValue.toUint8Array().byteLength; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. All of the other calculations are cheap... should we add a byteSize to Blob? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To reduce size one must first increase size. Done. |
||
} | ||
} | ||
|
||
export class RefValue extends FieldValue { | ||
|
@@ -466,6 +507,14 @@ export class RefValue extends FieldValue { | |
} | ||
return this.defaultCompareTo(other); | ||
} | ||
|
||
byteSize(): number { | ||
return ( | ||
this.databaseId.projectId.length + | ||
this.databaseId.database.length + | ||
this.key.toString().length | ||
); | ||
} | ||
} | ||
|
||
export class GeoPointValue extends FieldValue { | ||
|
@@ -492,6 +541,11 @@ export class GeoPointValue extends FieldValue { | |
} | ||
return this.defaultCompareTo(other); | ||
} | ||
|
||
byteSize(): number { | ||
// GeoPoints are made up of two distinct numbers (latitude + longitude) | ||
return 8; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should perhaps be 16 then, I think... although JavaScript may use some clever encoding of numbers that allows it to use 4-bytes for small numbers? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Made this, Timestamps and ServerTimestamps 16. |
||
} | ||
} | ||
|
||
export class ObjectValue extends FieldValue { | ||
|
@@ -634,6 +688,14 @@ export class ObjectValue extends FieldValue { | |
return FieldMask.fromSet(fields); | ||
} | ||
|
||
byteSize(): number { | ||
let size = 0; | ||
this.internalValue.inorderTraversal((key, val) => { | ||
size += key.length + val.byteSize(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For consistency with your other code, this would be 2* key.length, right? But again, if we assume UTF-8 encoding, then this is okay. FWIW it is tempting to create a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was debating all of this too... but I think we are already widely off with our size accounting and keeping it simple has some value. I kept it as is for now (including the inconsistency with StringValue, which at least has a comment). BTW, the same issue applies to ReferenceValues. We could add a Sizer class but that means more code and more cross-platform inconsistencies. |
||
}); | ||
return size; | ||
} | ||
|
||
toString(): string { | ||
return this.internalValue.toString(); | ||
} | ||
|
@@ -720,6 +782,13 @@ export class ArrayValue extends FieldValue { | |
} | ||
} | ||
|
||
byteSize(): number { | ||
return this.internalValue.reduce( | ||
(previousSize, value) => previousSize + value.byteSize(), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would previousSize => totalSize or accumulatedSize be an improvement? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good. Picked totalSize. |
||
0 | ||
); | ||
} | ||
|
||
toString(): string { | ||
const descriptions = this.internalValue.map(v => v.toString()); | ||
return `[${descriptions.join(',')}]`; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,7 @@ import { expect } from 'chai'; | |
import { GeoPoint } from '../../../src/api/geo_point'; | ||
import { Timestamp } from '../../../src/api/timestamp'; | ||
import * as fieldValue from '../../../src/model/field_value'; | ||
import { primitiveComparator } from '../../../src/util/misc'; | ||
import * as typeUtils from '../../../src/util/types'; | ||
import { | ||
blob, | ||
|
@@ -485,4 +486,57 @@ describe('FieldValue', () => { | |
} | ||
); | ||
}); | ||
|
||
it('estimates size correctly for fixed sized values', () => { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These tests are pretty clever in that they don't encode the actual sizes, but they don't directly verify the implementation. It would be more straightforward if the tests just listed out: function sizeof(value) {
return wrap(value).approximateByteSize();
}
it('estimates size correctly for fixed sized values', () => {
expect(sizeof(null)).to.be(4);
expect(sizeof('')).to.be(0);
// etc ...
}); There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added an explicit |
||
// This test verifies that each member of a group takes up the same amount | ||
// of space in memory (based on its estimated in-memory size). | ||
const equalityGroups = [ | ||
[wrap(null), wrap(false), wrap(true)], | ||
[wrap(blob(0, 1)), wrap(blob(128, 129))], | ||
[wrap(NaN), wrap(Infinity), wrap(1), wrap(1.1)], | ||
[wrap(new GeoPoint(0, 0)), wrap(new GeoPoint(0, 0))], | ||
[wrap(Timestamp.fromMillis(100)), wrap(Timestamp.now())], | ||
[ | ||
new fieldValue.ServerTimestampValue(Timestamp.fromMillis(100), null), | ||
new fieldValue.ServerTimestampValue(Timestamp.now(), null) | ||
] | ||
]; | ||
|
||
for (const group of equalityGroups) { | ||
const expectedItemSize = group[0].byteSize(); | ||
for (const element of group) { | ||
expect(element.byteSize()).to.equal(expectedItemSize); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it worth verifying/possible to verify that no size is zero? I'm a little worried that these tests don't actually verify that we're aggregating the sizes of objects correctly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should now be taken care of by explicitly comparing the size to a provided value. I also added more types to this test case. |
||
} | ||
} | ||
}); | ||
|
||
it('estimates size correctly for relatively sized values', () => { | ||
// This test verifies for each group that the estimated size increases | ||
// as the size of the underlying data grows. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Clever! |
||
const relativeGroups = [ | ||
[wrap(blob(0)), wrap(blob(0, 1))], | ||
[ | ||
new fieldValue.ServerTimestampValue(Timestamp.fromMillis(100), null), | ||
new fieldValue.ServerTimestampValue(Timestamp.now(), wrap(null)) | ||
], | ||
[ | ||
new fieldValue.RefValue(dbId('p1', 'd1'), key('c1/doc1')), | ||
new fieldValue.RefValue(dbId('p1', 'd1'), key('c1/doc1/c2/doc2')) | ||
], | ||
[wrap('foo'), wrap('foobar')], | ||
[wrap(['a', 'b']), wrap(['a', 'bc'])], | ||
[wrap(['a', 'b']), wrap(['a', 'b', 'c'])], | ||
[wrap({ a: 'a', b: 'b' }), wrap({ a: 'a', b: 'bc' })], | ||
[wrap({ a: 'a', b: 'b' }), wrap({ a: 'a', bc: 'b' })], | ||
[wrap({ a: 'a', b: 'b' }), wrap({ a: 'a', b: 'b', c: 'c' })] | ||
]; | ||
|
||
for (const group of relativeGroups) { | ||
const expectedOrder = group; | ||
const actualOrder = group | ||
.slice() | ||
.sort((l, r) => primitiveComparator(l.byteSize(), r.byteSize())); | ||
expect(expectedOrder).to.deep.equal(actualOrder); | ||
} | ||
}); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth specifying that implementations:
The first point is important because there are some elements where the JSON or proto representations that are both right and wrong for this purpose. For example:
Having some consistent frame of reference for this will make it easier to understand if the values we're using are reasonable.
The second part is important because the full memory size doesn't actually matter. We can make representation choices (e.g. boxed value or not). This matters more in languages like Java, where it's more explicit, or C++ where it's front and center, but I think we should explicitly delineate that boxing/class overhead should be ignored. This also helps make this implementation portable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as this is only used in Memory LRU, we should at least attempt to replicate the in-memory cost of storing a document, which makes the VM representation the biggest driving factor. I know that this implementation is already widely off (and probably aggressively undercounts), but I personally don't think that the implementation should be driven by platform convergence, JSON or Protobuf sizes, or any on-disk presentation. If we can, we should optimize for the user experience - if the memory LRU limit is set to 40 MBs, Firestore should try to approximate this.
With that said, I updated the comment to specify that it only accounts for user data and ignores object overhead.