Skip to content

Use approximate FieldValue size for MemoryLRU #2548

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 22, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions packages/firestore/src/api/blob.ts
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,11 @@ export class Blob {
return this._binaryString === other._binaryString;
}

_approximateByteSize(): number {
// Assume UTF-16 encoding in memory (see StringValue.approximateByteSize())
return this._binaryString.length * 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blobs are just bytes. They're not necessarily encoded in any character set. this._binaryString.length is sufficient and the comment can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. I would agree with you if we stored the Blob as a Uint8Array. At this point, our storage is a JavaScript string, which unfortunately seems to take up double the amount of memory.

}

/**
* Actually private to JS consumers of our API, so this function is prefixed
* with an underscore.
Expand Down
26 changes: 6 additions & 20 deletions packages/firestore/src/local/memory_persistence.ts
Original file line number Diff line number Diff line change
Expand Up @@ -16,15 +16,13 @@
*/

import { User } from '../auth/user';
import { MaybeDocument } from '../model/document';
import { Document, MaybeDocument } from '../model/document';
import { DocumentKey } from '../model/document_key';
import { JsonProtoSerializer } from '../remote/serializer';
import { fail } from '../util/assert';
import { debug } from '../util/log';
import * as obj from '../util/obj';
import { ObjectMap } from '../util/obj_map';
import { encode } from './encoded_resource_path';
import { LocalSerializer } from './local_serializer';
import {
ActiveTargets,
LruDelegate,
Expand Down Expand Up @@ -77,11 +75,10 @@ export class MemoryPersistence implements Persistence {

static createLruPersistence(
clientId: ClientId,
serializer: JsonProtoSerializer,
params: LruParams
): MemoryPersistence {
const factory = (p: MemoryPersistence): MemoryLruDelegate =>
new MemoryLruDelegate(p, new LocalSerializer(serializer), params);
new MemoryLruDelegate(p, params);
return new MemoryPersistence(clientId, factory);
}

Expand Down Expand Up @@ -334,7 +331,6 @@ export class MemoryLruDelegate implements ReferenceDelegate, LruDelegate {

constructor(
private readonly persistence: MemoryPersistence,
private readonly serializer: LocalSerializer,
lruParams: LruParams
) {
this.garbageCollector = new LruGarbageCollector(this, lruParams);
Expand Down Expand Up @@ -471,21 +467,11 @@ export class MemoryLruDelegate implements ReferenceDelegate, LruDelegate {
}

documentSize(maybeDoc: MaybeDocument): number {
const remoteDocument = this.serializer.toDbRemoteDocument(
maybeDoc,
maybeDoc.version
);
let value: unknown;
if (remoteDocument.document) {
value = remoteDocument.document;
} else if (remoteDocument.unknownDocument) {
value = remoteDocument.unknownDocument;
} else if (remoteDocument.noDocument) {
value = remoteDocument.noDocument;
} else {
throw fail('Unknown remote document type');
let documentSize = maybeDoc.key.toString().length;
if (maybeDoc instanceof Document) {
documentSize += maybeDoc.data().approximateByteSize();
}
return JSON.stringify(value).length;
return documentSize;
}

private isPinned(
Expand Down
72 changes: 72 additions & 0 deletions packages/firestore/src/model/field_value.ts
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,15 @@ export abstract class FieldValue {
abstract isEqual(other: FieldValue): boolean;
abstract compareTo(other: FieldValue): number;

/**
* Returns an approximate (and wildly inaccurate) in-memory size for the field
* value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth specifying that implementations:

  • should be guided by some standard assumptions (i.e. in-memory proto representation, not VM representation; ignoring variable-length encoding)
  • Should only account for user data, not any object overhead

The first point is important because there are some elements where the JSON or proto representations that are both right and wrong for this purpose. For example:

  • JSON encodes numbers in their textual form, but calculating this length is actually pretty expensive
  • protobuf uses varint encoding, wherein an int64 values from -64 to 63 encode as a single byte
  • Javascript declares boolean values to occupy a single byte
  • VM differences also shouldn't matter: i.e. in JavaScript, strings are UCS-2 or UTF-16, but in C++ (for std::string at least) they're UTF-8. Ideally we should define this in a way that's portable to all these implementations.

Having some consistent frame of reference for this will make it easier to understand if the values we're using are reasonable.

The second part is important because the full memory size doesn't actually matter. We can make representation choices (e.g. boxed value or not). This matters more in languages like Java, where it's more explicit, or C++ where it's front and center, but I think we should explicitly delineate that boxing/class overhead should be ignored. This also helps make this implementation portable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as this is only used in Memory LRU, we should at least attempt to replicate the in-memory cost of storing a document, which makes the VM representation the biggest driving factor. I know that this implementation is already widely off (and probably aggressively undercounts), but I personally don't think that the implementation should be driven by platform convergence, JSON or Protobuf sizes, or any on-disk presentation. If we can, we should optimize for the user experience - if the memory LRU limit is set to 40 MBs, Firestore should try to approximate this.

With that said, I updated the comment to specify that it only accounts for user data and ignores object overhead.

*
* The memory size takes into account only the actual user data as it resides
* in memory and ignores object overhead.
*/
abstract approximateByteSize(): number;

toString(): string {
const val = this.value();
return val === null ? 'null' : val.toString();
Expand Down Expand Up @@ -167,6 +176,10 @@ export class NullValue extends FieldValue {
return this.defaultCompareTo(other);
}

approximateByteSize(): number {
return 4;
}

static INSTANCE = new NullValue();
}

Expand Down Expand Up @@ -195,6 +208,10 @@ export class BooleanValue extends FieldValue {
return this.defaultCompareTo(other);
}

approximateByteSize(): number {
return 4;
}

static of(value: boolean): BooleanValue {
return value ? BooleanValue.TRUE : BooleanValue.FALSE;
}
Expand All @@ -221,6 +238,10 @@ export abstract class NumberValue extends FieldValue {
}
return this.defaultCompareTo(other);
}

approximateByteSize(): number {
return 8;
}
}

/** Utility function to compare doubles (using Firestore semantics for NaN). */
Expand Down Expand Up @@ -313,6 +334,13 @@ export class StringValue extends FieldValue {
}
return this.defaultCompareTo(other);
}

approximateByteSize(): number {
// See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures:
// "JavaScript's String type is [...] a set of elements of 16-bit unsigned
// integer values"
return this.internalValue.length * 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if we're going for the on-disk size, this will probably (not 100% sure about IndexedDb) be UTF-8 encoded which means most characters will be 1-byte, not 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly (at least for now) for memory size. IndexedDB size accounting also uses JSON.stringify(), but on already available Protobuf data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory size isn't a super great proxy because it's not portable. Strings in C++ are encoded in UTF-8.

Would it make sense to define this as UTF-8-encoded size and then just approximate by saying that on average characters encode to 1.1 bytes?

I'm not sure this really matters, but it seems like a portable, single definition of what these sizes are would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strings are the only primitive with a specified memory footprint on https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures. I am ok with "dumbing this down" and just using 1 byte, but as eluded to above, I don't think aiming for consistency among platforms is a worthy goal.

}
}

export class TimestampValue extends FieldValue {
Expand Down Expand Up @@ -347,6 +375,11 @@ export class TimestampValue extends FieldValue {
return this.defaultCompareTo(other);
}
}

approximateByteSize(): number {
// Timestamps are made up of two distinct numbers (seconds + nanoseconds)
return 16;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On other platforms this is an int (aka int32_t) and a long (aka int64_t). Should this be 12?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are two number types in JavaScript that are both encoded as 64bit doubles.

}
}

/**
Expand Down Expand Up @@ -410,6 +443,13 @@ export class ServerTimestampValue extends FieldValue {
toString(): string {
return '<ServerTimestamp localTime=' + this.localWriteTime.toString() + '>';
}

approximateByteSize(): number {
return (
/* localWriteTime */ 16 +
(this.previousValue ? this.previousValue.approximateByteSize() : 0)
);
}
}

export class BlobValue extends FieldValue {
Expand All @@ -436,6 +476,10 @@ export class BlobValue extends FieldValue {
}
return this.defaultCompareTo(other);
}

approximateByteSize(): number {
return this.internalValue._approximateByteSize();
}
}

export class RefValue extends FieldValue {
Expand Down Expand Up @@ -466,6 +510,14 @@ export class RefValue extends FieldValue {
}
return this.defaultCompareTo(other);
}

approximateByteSize(): number {
return (
this.databaseId.projectId.length +
this.databaseId.database.length +
this.key.toString().length
);
}
}

export class GeoPointValue extends FieldValue {
Expand All @@ -492,6 +544,11 @@ export class GeoPointValue extends FieldValue {
}
return this.defaultCompareTo(other);
}

approximateByteSize(): number {
// GeoPoints are made up of two distinct numbers (latitude + longitude)
return 16;
}
}

export class ObjectValue extends FieldValue {
Expand Down Expand Up @@ -634,6 +691,14 @@ export class ObjectValue extends FieldValue {
return FieldMask.fromSet(fields);
}

approximateByteSize(): number {
let size = 0;
this.internalValue.inorderTraversal((key, val) => {
size += key.length + val.approximateByteSize();
});
return size;
}

toString(): string {
return this.internalValue.toString();
}
Expand Down Expand Up @@ -720,6 +785,13 @@ export class ArrayValue extends FieldValue {
}
}

approximateByteSize(): number {
return this.internalValue.reduce(
(totalSize, value) => totalSize + value.approximateByteSize(),
0
);
}

toString(): string {
const descriptions = this.internalValue.map(v => v.toString());
return `[${descriptions.join(',')}]`;
Expand Down
2 changes: 1 addition & 1 deletion packages/firestore/test/integration/api/database.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -814,7 +814,7 @@ apiDescribe('Database', (persistence: boolean) => {
// eslint-disable-next-line no-restricted-properties
describe.skip('Listens are rejected remotely:', () => {
//eslint-disable-next-line @typescript-eslint/no-explicit-any
const queryForRejection : firestore.Query = null as any;
const queryForRejection: firestore.Query = null as any;

it('will reject listens', () => {
const deferred = new Deferred();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -128,11 +128,7 @@ export async function testMemoryEagerPersistence(): Promise<MemoryPersistence> {
export async function testMemoryLruPersistence(
params: LruParams = LruParams.DEFAULT
): Promise<MemoryPersistence> {
return MemoryPersistence.createLruPersistence(
AutoId.newId(),
JSON_SERIALIZER,
params
);
return MemoryPersistence.createLruPersistence(AutoId.newId(), params);
}

/** Clears the persistence in tests */
Expand Down
93 changes: 93 additions & 0 deletions packages/firestore/test/unit/model/field_value.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ import { expect } from 'chai';
import { GeoPoint } from '../../../src/api/geo_point';
import { Timestamp } from '../../../src/api/timestamp';
import * as fieldValue from '../../../src/model/field_value';
import { primitiveComparator } from '../../../src/util/misc';
import * as typeUtils from '../../../src/util/types';
import {
blob,
Expand Down Expand Up @@ -485,4 +486,96 @@ describe('FieldValue', () => {
}
);
});

it('estimates size correctly for fixed sized values', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are pretty clever in that they don't encode the actual sizes, but they don't directly verify the implementation. It would be more straightforward if the tests just listed out:

function sizeof(value) {
  return wrap(value).approximateByteSize();
}

it('estimates size correctly for fixed sized values', () => {
  expect(sizeof(null)).to.be(4);
  expect(sizeof('')).to.be(0);
  // etc ...

});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an explicit expectedByteSize.

// This test verifies that each member of a group takes up the same amount
// of space in memory (based on its estimated in-memory size).
const equalityGroups = [
{ expectedByteSize: 4, elements: [wrap(null), wrap(false), wrap(true)] },
{
expectedByteSize: 4,
elements: [wrap(blob(0, 1)), wrap(blob(128, 129))]
},
{
expectedByteSize: 8,
elements: [wrap(NaN), wrap(Infinity), wrap(1), wrap(1.1)]
},
{
expectedByteSize: 16,
elements: [wrap(new GeoPoint(0, 0)), wrap(new GeoPoint(0, 0))]
},
{
expectedByteSize: 16,
elements: [wrap(Timestamp.fromMillis(100)), wrap(Timestamp.now())]
},
{
expectedByteSize: 16,
elements: [
new fieldValue.ServerTimestampValue(Timestamp.fromMillis(100), null),
new fieldValue.ServerTimestampValue(Timestamp.now(), null)
]
},
{
expectedByteSize: 20,
elements: [
new fieldValue.ServerTimestampValue(
Timestamp.fromMillis(100),
wrap(true)
),
new fieldValue.ServerTimestampValue(Timestamp.now(), wrap(false))
]
},
{
expectedByteSize: 11,
elements: [
new fieldValue.RefValue(dbId('p1', 'd1'), key('c1/doc1')),
new fieldValue.RefValue(dbId('p2', 'd2'), key('c2/doc2'))
]
},
{ expectedByteSize: 6, elements: [wrap('foo'), wrap('bar')] },
{ expectedByteSize: 4, elements: [wrap(['a', 'b']), wrap(['c', 'd'])] },
{
expectedByteSize: 6,
elements: [wrap({ a: 'a', b: 'b' }), wrap({ c: 'c', d: 'd' })]
}
];

for (const group of equalityGroups) {
for (const element of group.elements) {
expect(element.approximateByteSize()).to.equal(group.expectedByteSize);
}
}
});

it('estimates size correctly for relatively sized values', () => {
// This test verifies for each group that the estimated size increases
// as the size of the underlying data grows.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever!

const relativeGroups = [
[wrap(blob(0)), wrap(blob(0, 1))],
[
new fieldValue.ServerTimestampValue(Timestamp.fromMillis(100), null),
new fieldValue.ServerTimestampValue(Timestamp.now(), wrap(null))
],
[
new fieldValue.RefValue(dbId('p1', 'd1'), key('c1/doc1')),
new fieldValue.RefValue(dbId('p1', 'd1'), key('c1/doc1/c2/doc2'))
],
[wrap('foo'), wrap('foobar')],
[wrap(['a', 'b']), wrap(['a', 'bc'])],
[wrap(['a', 'b']), wrap(['a', 'b', 'c'])],
[wrap({ a: 'a', b: 'b' }), wrap({ a: 'a', b: 'bc' })],
[wrap({ a: 'a', b: 'b' }), wrap({ a: 'a', bc: 'b' })],
[wrap({ a: 'a', b: 'b' }), wrap({ a: 'a', b: 'b', c: 'c' })]
];

for (const group of relativeGroups) {
const expectedOrder = group;
const actualOrder = group
.slice()
.sort((l, r) =>
primitiveComparator(l.approximateByteSize(), r.approximateByteSize())
);
expect(expectedOrder).to.deep.equal(actualOrder);
}
});
});
1 change: 0 additions & 1 deletion packages/firestore/test/unit/specs/spec_test_runner.ts
Original file line number Diff line number Diff line change
Expand Up @@ -1238,7 +1238,6 @@ class MemoryTestRunner extends TestRunner {
? MemoryPersistence.createEagerPersistence(this.clientId)
: MemoryPersistence.createLruPersistence(
this.clientId,
serializer,
LruParams.DEFAULT
)
);
Expand Down