
The yaml loading is slow when reading big openapi files (3MB+). Especially while debugging. #145


Closed
Wim-De-Clercq opened this issue Feb 16, 2022 · 1 comment

Comments

@Wim-De-Clercq
Contributor

Wim-De-Clercq commented Feb 16, 2022

The main reason this library has a custom loader is to convert integer keys to strings.
However, Python's json library can handle this for you.
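
As a minimal illustration of that point, the sketch below round-trips a dictionary through `json.dumps` and `json.loads`. The sample dictionary is made up for the example (it only mimics an OpenAPI `responses` mapping); the key conversion itself is standard `json` behavior.

```python
import json

# Hypothetical sample data: an integer status code used as a mapping key,
# which is what a YAML loader produces for an unquoted `200:` key.
data = {200: {"description": "OK"}, "default": {"description": "error"}}

# json.dumps writes integer keys as strings, so loading the result back
# yields a structure whose keys are all strings.
converted = json.loads(json.dumps(data))
print(converted)
```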

Using the stock yaml loaders together with a json dump and load is in fact faster than using the custom loader.
And if LibYAML is installed (so that CSafeLoader is available), it is a factor of 9-10 faster.
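
One way to take advantage of this without requiring LibYAML is to fall back to the pure-Python loader when the C one cannot be imported. This is only a sketch, not necessarily what the library ends up doing: `FastSafeLoader` and `load_spec` are names invented for the example, and the YAML snippet is sample data.

```python
import json

import yaml

# CSafeLoader only exists when PyYAML was built against LibYAML,
# so fall back to the pure-Python SafeLoader otherwise.
try:
    from yaml import CSafeLoader as FastSafeLoader
except ImportError:
    from yaml import SafeLoader as FastSafeLoader


def load_spec(text):
    """Parse YAML text, then round-trip through json to stringify keys."""
    return json.loads(json.dumps(yaml.load(text, Loader=FastSafeLoader)))


# Hypothetical sample: the unquoted 200 is parsed as an integer key by YAML.
spec = load_spec("responses:\n  200:\n    description: OK\n")
print(spec)
```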

(note you have to edit the paths if you want to run the below)

import json

from yaml import SafeLoader
from yaml import CSafeLoader
from yaml import load

from openapi_spec_validator.loaders import ExtendedSafeLoader


def read_yaml_file(path, loader=ExtendedSafeLoader):
    """Open a file, read it and return its contents."""
    with open(path) as fh:
        return load(fh, loader)


def read_yaml_file_fast(path, loader=SafeLoader):
    """Load with the pure-Python SafeLoader, then round-trip through json to stringify keys."""
    with open(path) as fh:
        return json.loads(json.dumps(load(fh, loader)))


def read_yaml_file_faster(path, loader=CSafeLoader):
    """Load with the LibYAML-based CSafeLoader, then round-trip through json to stringify keys."""
    with open(path) as fh:
        return json.loads(json.dumps(load(fh, loader)))


if __name__ == '__main__':

    import timeit
    result = timeit.timeit(
        "read_yaml_file('EDIT/openapi-spec-validator/tests/integration/data/v3.1/petstore.yaml')",
        "from __main__ import read_yaml_file",
        number=1000
    )
    print("original:", result)
    result = timeit.timeit(
        "read_yaml_file_fast('EDIT/openapi-spec-validator/tests/integration/data/v3.1/petstore.yaml')",
        "from __main__ import read_yaml_file_fast",
        number=1000
    )
    print("+json:", result)
    result = timeit.timeit(
        "read_yaml_file_faster('EDIT/openapi-spec-validator/tests/integration/data/v3.1/petstore.yaml')",
        "from __main__ import read_yaml_file_faster",
        number=1000
    )
    print("cloader + json:", result)

Output (seconds for 1000 runs):

original: 9.896625495999615
+json: 9.384017411000968
cloader + json: 1.066654717998972
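
To relate these timings to the "factor 9-10" claim, the ratios can be computed directly from the three reported numbers. The variable names below are chosen for the example only.

```python
# Timings reported above, in seconds for 1000 loads of petstore.yaml.
original = 9.896625495999615
with_json = 9.384017411000968
cloader_json = 1.066654717998972

# Speedup of each alternative relative to the custom loader.
speedup_json = original / with_json
speedup_cloader = original / cloader_json

print(f"SafeLoader + json:  {speedup_json:.2f}x")
print(f"CSafeLoader + json: {speedup_cloader:.2f}x")
```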

My 3+ MB file takes 9.2 seconds to load with the original code, versus 1.5 seconds with the C loader.
With debugging mode on, this becomes 40+ seconds with the original versus 4 seconds with the C loader.

See PR: #146

Wim-De-Clercq added a commit to Wim-De-Clercq/openapi-spec-validator that referenced this issue Feb 16, 2022
This change allows the yaml loading to use the original
yaml loaders which are much faster.

Issue python-openapi#145
p1c2u pushed a commit to Wim-De-Clercq/openapi-spec-validator that referenced this issue Jun 21, 2022
This change allows the yaml loading to use the original
yaml loaders which are much faster.

Issue python-openapi#145
@Wim-De-Clercq
Contributor Author

Closing because the PR has been merged to master.
