Use approximate FieldValue size for MemoryLRU #2548

schmidt-sebastian · 2020-01-16T22:59:48Z

No description provided.

mikelehen

Looks reasonable. Might be worth being a little more intentional about the sizing, but I don't care too much.

mikelehen · 2020-01-16T23:15:39Z

packages/firestore/src/model/field_value.ts

+   * Returns an approximate (and wildly inaccurate) in-memory size for the field
+   * value.
+   */
+  abstract byteSize(): number;


Consider renaming to approximateByteSize()?

I was torn between "byteSize()", "estimateByteSize()" and "approximateByteSize()". Looks like we have two votes for approximateByteSize(). Changed.

mikelehen · 2020-01-16T23:21:22Z

packages/firestore/src/model/field_value.ts

@@ -167,6 +173,10 @@ export class NullValue extends FieldValue {
    return this.defaultCompareTo(other);
  }

+  byteSize(): number {
+    return 1;


It's not clear to me why you chose 1. I'm not saying it's a bad value, but it might be good for us to be deliberate about our approximations and document them, to make the intention clearer and allow some future person to improve them.

It's tempting to try to be consistent with the old code... in which case, would this be 4, since the JSON encoding is null, which is 4 bytes?

Alternatively, if this is really meant to represent the in-memory representation, I would guess (but could be wrong) that at minimum, JavaScript stores values at 4-byte-aligned locations, so the minimum size of anything is probably 4 bytes.

I mostly chose 1 because I didn't want a document with a bunch of NULLs to be zero. 4 sounds reasonable, and basing it on the previous JSON-encoded size makes it even seem super scientific and well thought-out. If only the rest of this PR was more like this :)
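To make the "consistent with the old code" option concrete, here is a minimal sketch, assuming the older accounting simply counted one byte per character of `JSON.stringify()` output. The helper `jsonEncodedSize` and the constant `NULL_VALUE_BYTE_SIZE` are hypothetical names for illustration, not SDK code.

```typescript
// Hypothetical helper: the size an older JSON-based accounting would report,
// assuming one byte per character of the JSON text.
function jsonEncodedSize(value: unknown): number {
  const json: string | undefined = JSON.stringify(value);
  // JSON.stringify returns undefined for inputs such as undefined or functions.
  return json === undefined ? 0 : json.length;
}

// Fixed estimate for null, matching the length of its JSON encoding "null".
const NULL_VALUE_BYTE_SIZE = 4;

console.log(jsonEncodedSize(null)); // 4
console.log(jsonEncodedSize(0)); // 1
console.log(jsonEncodedSize(123456.789)); // 10
```

The number cases show why a JSON-derived size does not carry over well to numbers: the textual length varies with the value, so a fixed per-type constant is both cheaper and more predictable.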

mikelehen · 2020-01-16T23:22:46Z

packages/firestore/src/model/field_value.ts

@@ -221,6 +235,10 @@ export abstract class NumberValue extends FieldValue {
    }
    return this.defaultCompareTo(other);
  }
+
+  byteSize(): number {
+    return 4;


This should probably be 8? (doubles and longs are 8 bytes)
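A quick way to sanity-check the 8-byte suggestion: every JavaScript number is an IEEE 754 double. This sketch (the constant name `NUMBER_VALUE_BYTE_SIZE` is hypothetical) reads the width from `Float64Array` and shows that even integers behave like doubles past 2^53.

```typescript
// JavaScript numbers are IEEE 754 doubles, so a fixed 8-byte estimate matches
// the width of a double (and of a 64-bit long on other platforms).
const NUMBER_VALUE_BYTE_SIZE = Float64Array.BYTES_PER_ELEMENT; // 8

// Integers are doubles too: past 2^53, neighboring integers collapse together.
const maxSafe = Number.MAX_SAFE_INTEGER; // 2^53 - 1
const collapsed = maxSafe + 1 === maxSafe + 2;

console.log(NUMBER_VALUE_BYTE_SIZE); // 8
console.log(collapsed); // true
```

Engines may still store small integers more compactly internally (V8's small-integer representation is one example), which is the caveat raised later in this thread; 8 bytes is a language-level estimate, not a measured footprint.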

mikelehen · 2020-01-16T23:37:36Z

packages/firestore/src/model/field_value.ts

+    // See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures:
+    // "JavaScript's String type is [...] a set of elements of 16-bit unsigned
+    // integer values"
+    return this.internalValue.length * 2;


Note that if we're going for the on-disk size, this will probably (not 100% sure about IndexedDB) be UTF-8 encoded, which means most characters will be 1 byte, not 2.

This is mostly (at least for now) for memory size. IndexedDB size accounting also uses JSON.stringify(), but on the Protobuf data that is already available.

Memory size isn't a super great proxy because it's not portable. Strings in C++ are encoded in UTF-8.

Would it make sense to define this as UTF-8-encoded size and then just approximate by saying that on average characters encode to 1.1 bytes?

I'm not sure this really matters, but it seems like a portable, single definition of what these sizes are would be useful.

Strings are the only primitive with a specified memory footprint on https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures. I am OK with "dumbing this down" and just using 1 byte, but as alluded to above, I don't think aiming for consistency among platforms is a worthy goal.
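The two definitions debated here can be compared side by side. This is a sketch with hypothetical helper names: `utf16ByteSize` mirrors the PR's `length * 2` estimate, and `utf8ByteSize` computes the portable UTF-8 length by hand, so it does not depend on `TextEncoder` or Node's `Buffer`.

```typescript
// In-memory estimate used in the PR: JavaScript strings are sequences of
// 16-bit code units, so each unit counts as 2 bytes.
function utf16ByteSize(s: string): number {
  return s.length * 2;
}

// Portable alternative discussed in review: the UTF-8 encoded length.
function utf8ByteSize(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1; // ASCII
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // A surrogate pair is one supplementary code point: 4 bytes in UTF-8.
        bytes += 4;
        i++;
        continue;
      }
      bytes += 3; // lone high surrogate, encoded as U+FFFD (3 bytes)
    } else {
      bytes += 3; // rest of the BMP; lone low surrogates also become U+FFFD
    }
  }
  return bytes;
}

console.log(utf16ByteSize('abc'), utf8ByteSize('abc')); // 6 3
```

For mostly-ASCII text the UTF-8 figure is about half the UTF-16 estimate, which is the 1-byte versus 2-byte gap at the center of this thread.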

mikelehen · 2020-01-16T23:39:20Z

packages/firestore/src/model/field_value.ts

@@ -436,6 +473,10 @@ export class BlobValue extends FieldValue {
    }
    return this.defaultCompareTo(other);
  }
+
+  byteSize(): number {
+    return this.internalValue.toUint8Array().byteLength;


All of the other calculations are cheap... should we add a byteSize to Blob?

To reduce size one must first increase size.

Done.

mikelehen · 2020-01-16T23:40:29Z

packages/firestore/src/model/field_value.ts

+
+  byteSize(): number {
+    // GeoPoints are made up of two distinct numbers (latitude + longitude)
+    return 8;


Should perhaps be 16 then, I think... although JavaScript may use some clever encoding of numbers that allows it to use 4-bytes for small numbers?

Changed this, Timestamps, and ServerTimestamps to 16.
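The 16-byte figure follows from storing two JavaScript numbers. As an illustration only (`packGeoPoint` is a hypothetical helper, not part of the SDK), packing a latitude/longitude pair into a buffer of two 64-bit doubles takes exactly 16 bytes:

```typescript
// Hypothetical illustration: a GeoPoint held as two JavaScript numbers,
// i.e. two 64-bit doubles.
const DOUBLE_BYTES = Float64Array.BYTES_PER_ELEMENT; // 8

function packGeoPoint(latitude: number, longitude: number): ArrayBuffer {
  const buffer = new ArrayBuffer(2 * DOUBLE_BYTES);
  const view = new DataView(buffer);
  view.setFloat64(0, latitude);
  view.setFloat64(DOUBLE_BYTES, longitude);
  return buffer;
}

const packed = packGeoPoint(37.422, -122.084);
console.log(packed.byteLength); // 16
console.log(new DataView(packed).getFloat64(DOUBLE_BYTES)); // -122.084
```

The same arithmetic applies to a Timestamp's seconds and nanoseconds when both are held as JavaScript numbers.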

mikelehen · 2020-01-16T23:42:05Z

packages/firestore/src/model/field_value.ts

+  byteSize(): number {
+    let size = 0;
+    this.internalValue.inorderTraversal((key, val) => {
+      size += key.length + val.byteSize();


For consistency with your other code, this would be 2 * key.length, right? But again, if we assume UTF-8 encoding, then this is okay.

FWIW it is tempting to create a Sizer helper class that consistently deals with data types. E.g. Sizer.string(key.length) + val.byteSize()

I was debating all of this too... but I think we are already wildly off with our size accounting and keeping it simple has some value. I kept it as is for now (including the inconsistency with StringValue, which at least has a comment).

BTW, the same issue applies to ReferenceValues. We could add a Sizer class, but that means more code and more cross-platform inconsistencies.
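For reference, the suggested `Sizer` helper might look roughly like this. It is a hypothetical sketch, not part of the Firestore SDK, and it only sizes flat maps of number fields, to show how the key-size inconsistency discussed above plays out.

```typescript
// Hypothetical Sizer helper along the lines suggested in review. Centralizing
// per-type rules keeps object keys and string values sized the same way.
class Sizer {
  static readonly NULL = 4;
  static readonly NUMBER = 8;

  // Assumes UTF-16 in memory, as StringValue does: 2 bytes per code unit.
  static string(length: number): number {
    return length * 2;
  }
}

// Keys counted as in the PR's ObjectValue: key.length + value size.
function objectSizeAsInPr(fields: Map<string, number>): number {
  let size = 0;
  fields.forEach((_value, key) => {
    size += key.length + Sizer.NUMBER;
  });
  return size;
}

// Keys counted through the helper, consistent with StringValue.
function objectSizeWithSizer(fields: Map<string, number>): number {
  let size = 0;
  fields.forEach((_value, key) => {
    size += Sizer.string(key.length) + Sizer.NUMBER;
  });
  return size;
}

const fields = new Map<string, number>([['lat', 1], ['lng', 2]]);
console.log(objectSizeAsInPr(fields)); // 22
console.log(objectSizeWithSizer(fields)); // 28
```

The trade-off matches the discussion: the helper removes the inconsistency, at the cost of another class to maintain.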

mikelehen · 2020-01-16T23:43:14Z

packages/firestore/src/model/field_value.ts

@@ -720,6 +782,13 @@ export class ArrayValue extends FieldValue {
    }
  }

+  byteSize(): number {
+    return this.internalValue.reduce(
+      (previousSize, value) => previousSize + value.byteSize(),


Would previousSize => totalSize or accumulatedSize be an improvement?

Sounds good. Picked totalSize.
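Putting the per-type estimates together, here is a simplified class hierarchy in the shape of the PR's `FieldValue` subclasses. The `*Sketch` classes are hypothetical stand-ins, not the SDK's classes; the array case uses `reduce` with the `totalSize` accumulator picked above.

```typescript
// Simplified stand-ins for the PR's FieldValue subclasses.
abstract class FieldValueSketch {
  /** Approximate size of the user data only; object overhead is ignored. */
  abstract approximateByteSize(): number;
}

class NullValueSketch extends FieldValueSketch {
  approximateByteSize(): number {
    return 4; // length of the JSON encoding "null"
  }
}

// The numeric value itself is omitted: its size does not depend on it.
class NumberValueSketch extends FieldValueSketch {
  approximateByteSize(): number {
    return 8; // one 64-bit double
  }
}

class StringValueSketch extends FieldValueSketch {
  private readonly internalValue: string;
  constructor(value: string) {
    super();
    this.internalValue = value;
  }
  approximateByteSize(): number {
    return this.internalValue.length * 2; // 16-bit code units
  }
}

class ArrayValueSketch extends FieldValueSketch {
  private readonly internalValue: FieldValueSketch[];
  constructor(values: FieldValueSketch[]) {
    super();
    this.internalValue = values;
  }
  approximateByteSize(): number {
    // Sum element sizes; nested arrays recurse through their own override.
    return this.internalValue.reduce(
      (totalSize, value) => totalSize + value.approximateByteSize(),
      0
    );
  }
}

const nested = new ArrayValueSketch([
  new NumberValueSketch(),
  new StringValueSketch('ab'),
  new NullValueSketch(),
  new ArrayValueSketch([new NumberValueSketch()]),
]);
console.log(nested.approximateByteSize()); // 24
```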

mikelehen · 2020-01-16T23:45:24Z

packages/firestore/test/unit/model/field_value.test.ts

+
+  it('estimates size correctly for relatively sized values', () => {
+    // This test verifies for each group that the estimated size increases
+    // as the size of the underlying data grows.


wilhuff · 2020-01-22T01:35:03Z

packages/firestore/src/api/blob.ts

@@ -120,6 +120,11 @@ export class Blob {
    return this._binaryString === other._binaryString;
  }

+  _approximateByteSize(): number {
+    // Assume UTF-16 encoding in memory (see StringValue.approximateByteSize())
+    return this._binaryString.length * 2;


Blobs are just bytes. They're not necessarily encoded in any character set. this._binaryString.length is sufficient and the comment can be removed.

Hm. I would agree with you if we stored the Blob as a Uint8Array. At this point, our storage is a JavaScript string, which unfortunately seems to take up double the amount of memory.
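The two positions can be illustrated with a simplified Blob that, like the one discussed, keeps its bytes in a "binary string" (one code unit per byte, each holding 0 to 255). `BlobSketch` is hypothetical; the point is that `approximateByteSize()` is O(1) and follows the UTF-16 storage assumption, while converting to a `Uint8Array` to measure raw bytes allocates a copy.

```typescript
// Simplified Blob stand-in, assuming bytes are held in a binary string.
class BlobSketch {
  private readonly binaryString: string;

  constructor(bytes: Uint8Array) {
    let s = '';
    for (let i = 0; i < bytes.length; i++) {
      s += String.fromCharCode(bytes[i]);
    }
    this.binaryString = s;
  }

  // O(1): no copy. Counts 2 bytes per stored code unit (UTF-16 assumption).
  approximateByteSize(): number {
    return this.binaryString.length * 2;
  }

  // O(n): allocates a new array, so it is a poor choice for cheap sizing.
  toUint8Array(): Uint8Array {
    const out = new Uint8Array(this.binaryString.length);
    for (let i = 0; i < this.binaryString.length; i++) {
      out[i] = this.binaryString.charCodeAt(i);
    }
    return out;
  }
}

const blob = new BlobSketch(new Uint8Array([0, 1, 2, 255]));
console.log(blob.toUint8Array().byteLength); // 4 (raw bytes)
console.log(blob.approximateByteSize()); // 8 (UTF-16 storage estimate)
```

Whether an engine actually spends two bytes per code unit for such strings is an implementation detail, so both numbers remain approximations.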

wilhuff · 2020-01-22T02:26:24Z

packages/firestore/src/model/field_value.ts

@@ -126,6 +126,12 @@ export abstract class FieldValue {
  abstract isEqual(other: FieldValue): boolean;
  abstract compareTo(other: FieldValue): number;

+  /**
+   * Returns an approximate (and wildly inaccurate) in-memory size for the field
+   * value.


I think it's worth specifying that implementations:

Should be guided by some standard assumptions (i.e. in-memory proto representation, not VM representation; ignoring variable-length encoding)

Should only account for user data, not any object overhead

The first point is important because there are some elements for which the JSON or proto representations are both right and wrong for this purpose. For example:

JSON encodes numbers in their textual form, but calculating this length is actually pretty expensive

protobuf uses varint encoding, wherein ZigZag-encoded sint64 values from -64 to 63 encode as a single byte

JavaScript declares boolean values to occupy a single byte

VM differences also shouldn't matter: e.g. in JavaScript, strings are UCS-2 or UTF-16, but in C++ (for std::string at least) they're UTF-8. Ideally we should define this in a way that's portable to all these implementations.

Having some consistent frame of reference for this will make it easier to understand if the values we're using are reasonable.
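The varint example above can be made concrete. This sketch (`zigZagVarintSize` is a hypothetical helper, assuming integer input within `Number`'s safe range) computes the encoded length of a ZigZag varint, showing how encoded size depends on magnitude while an in-memory 64-bit integer is always 8 bytes.

```typescript
// Encoded length of a ZigZag varint (protobuf's sint64 wire format).
// Assumes `n` is an integer within Number's safe range.
function zigZagVarintSize(n: number): number {
  // ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
  let z = n >= 0 ? 2 * n : -2 * n - 1;
  let bytes = 1;
  // Each varint byte carries 7 bits of payload.
  while (z >= 128) {
    z = Math.floor(z / 128);
    bytes++;
  }
  return bytes;
}

console.log(zigZagVarintSize(-64), zigZagVarintSize(63)); // 1 1
console.log(zigZagVarintSize(64), zigZagVarintSize(-65)); // 2 2
```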

The second part is important because the full memory size doesn't actually matter. We can make representation choices (e.g. boxed value or not). This matters more in languages like Java, where it's more explicit, or C++ where it's front and center, but I think we should explicitly delineate that boxing/class overhead should be ignored. This also helps make this implementation portable.

As long as this is only used in the memory LRU, we should at least attempt to replicate the in-memory cost of storing a document, which makes the VM representation the biggest driving factor. I know that this implementation is already wildly off (and probably aggressively undercounts), but I personally don't think that the implementation should be driven by platform convergence, JSON or Protobuf sizes, or any on-disk representation. If we can, we should optimize for the user experience: if the memory LRU limit is set to 40 MB, Firestore should try to approximate this.

With that said, I updated the comment to specify that it only accounts for user data and ignores object overhead.

wilhuff · 2020-01-22T15:08:42Z

packages/firestore/src/model/field_value.ts

+    // See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures:
+    // "JavaScript's String type is [...] a set of elements of 16-bit unsigned
+    // integer values"
+    return this.internalValue.length * 2;


wilhuff · 2020-01-22T15:09:56Z

packages/firestore/src/model/field_value.ts

+
+  approximateByteSize(): number {
+    // Timestamps are made up of two distinct numbers (seconds + nanoseconds)
+    return 16;


On other platforms this is an int (aka int32_t) and a long (aka int64_t). Should this be 12?

In JavaScript, both are plain numbers, which are encoded as 64-bit doubles.

wilhuff · 2020-01-22T18:21:57Z

packages/firestore/test/unit/model/field_value.test.ts

+    for (const group of equalityGroups) {
+      const expectedItemSize = group[0].byteSize();
+      for (const element of group) {
+        expect(element.byteSize()).to.equal(expectedItemSize);


Is it worth verifying/possible to verify that no size is zero?

I'm a little worried that these tests don't actually verify that we're aggregating the sizes of objects correctly.

This should now be taken care of by explicitly comparing the size to a provided value. I also added more types to this test case.

wilhuff · 2020-01-22T18:25:18Z

packages/firestore/test/unit/model/field_value.test.ts

@@ -485,4 +486,57 @@ describe('FieldValue', () => {
      }
    );
  });
+
+  it('estimates size correctly for fixed sized values', () => {


These tests are pretty clever in that they don't encode the actual sizes, but they don't directly verify the implementation. It would be more straightforward if the tests just listed out:

function sizeof(value) {
  return wrap(value).approximateByteSize();
}

it('estimates size correctly for fixed sized values', () => {
  expect(sizeof(null)).to.equal(4);
  expect(sizeof('')).to.equal(0);
  // etc ...
});

I added an explicit expectedByteSize.
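A table-driven version of the suggested test, listing each input next to its expected estimate, might look like this. `sizeof` here is a hypothetical stand-in that applies the sizes discussed in this thread to plain JavaScript values; it is not the SDK's `wrap()`/`approximateByteSize()`.

```typescript
type Plain = null | number | string | Plain[];

// Hypothetical estimator applying this thread's sizes to plain values.
function sizeof(value: Plain): number {
  if (value === null) return 4;
  if (typeof value === 'number') return 8;
  if (typeof value === 'string') return value.length * 2;
  return value.reduce<number>((totalSize, v) => totalSize + sizeof(v), 0);
}

const expectations: Array<[Plain, number]> = [
  [null, 4],
  [0, 8],
  [1.5, 8],
  ['', 0],
  ['abc', 6],
  [[1, 'a', null], 14],
];

// Returns a description of every mismatch (empty when all sizes match).
function checkSizes(): string[] {
  const failures: string[] = [];
  for (const [value, expected] of expectations) {
    const actual = sizeof(value);
    if (actual !== expected) {
      failures.push(`${JSON.stringify(value)}: expected ${expected}, got ${actual}`);
    }
  }
  return failures;
}

console.log(checkSizes()); // []
```

Listing explicit values also covers the aggregation concern raised earlier: the nested array entry fails if element sizes are not summed correctly.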

wilhuff

LGTM

Use approximate FieldValue size for MemoryLRU

c0a86af

schmidt-sebastian requested a review from mikelehen January 16, 2020 23:01

schmidt-sebastian assigned mikelehen Jan 16, 2020

schmidt-sebastian added the api: firestore label Jan 16, 2020

[AUTOMATED]: Prettier Code Styling

e6a786f

schmidt-sebastian force-pushed the mrschmidt/byte branch from 31a25c8 to e6a786f January 16, 2020 23:02

mikelehen approved these changes Jan 16, 2020

Review

300c9bd

mikelehen assigned schmidt-sebastian and unassigned mikelehen Jan 17, 2020

Merge branch 'master' into mrschmidt/byte

6b83a8a

schmidt-sebastian assigned wilhuff and unassigned schmidt-sebastian Jan 21, 2020

wilhuff reviewed Jan 22, 2020

wilhuff assigned schmidt-sebastian and unassigned wilhuff Jan 22, 2020

Feedback

9f756d4

schmidt-sebastian assigned wilhuff and unassigned schmidt-sebastian Jan 22, 2020

wilhuff approved these changes Jan 22, 2020

wilhuff assigned schmidt-sebastian and unassigned wilhuff Jan 22, 2020

schmidt-sebastian merged commit 5fb352a into master Jan 22, 2020

schmidt-sebastian deleted the mrschmidt/byte branch January 22, 2020 22:07

firebase locked and limited conversation to collaborators Feb 22, 2020

schmidt-sebastian commented Jan 16, 2020

mikelehen left a comment

wilhuff left a comment
