
Issues with prediction time not proportional w.r.t. number of trees in RF #681

Closed
soufianekhoudmi opened this issue Mar 4, 2019 · 3 comments

soufianekhoudmi commented Mar 4, 2019


System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): SKLearn/Custom
  • Framework Version: 0.20.0
  • Python Version: 3.5
  • CPU or GPU: CPU
  • Python SDK Version: 1.18.2
  • Are you using a custom image: No

Describe the problem

My prediction time is not proportional to the number of trees in a Random Forest

Minimal repro / logs

My estimation strategy consists of using a set of Random Forest models, each one covering a
subset of the data (e.g. RF_A if feature == A). I mention this for the sake of completeness, as I don't think it affects my issue.

My deployment strategy:

  • Fit: return a pickle that contains a dictionary of fitted sklearn Random Forest models
  • Deploy: load this dictionary into memory.
  • Inference:
    -- map each observation to the correct model in the already-loaded dictionary
    -- for each observation, compute the prediction given by each individual tree, to allow an elementary confidence interval computation (see the sketch after this list):
    http://blog.datadive.net/prediction-intervals-for-random-forests/
    Note that this last operation is the most time-consuming part of inference, and its duration is proportional to the number of trees in my RF (a loop over the trees).
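For context, the per-tree loop inside prediction.predict (the custom code is not shown here) follows the approach from that blog post. A minimal sketch of what it does, assuming a fitted RandomForestRegressor called model and a 2-D array X of observations:

import numpy as np

def predict_with_interval(model, X, lower=5, upper=95):
    # One predict call per tree: the cost of this loop grows linearly
    # with n_estimators, which is why I expect roughly 3x the time for 300 trees.
    per_tree = np.stack([tree.predict(X) for tree in model.estimators_])
    point = per_tree.mean(axis=0)                 # forest prediction
    low = np.percentile(per_tree, lower, axis=0)  # lower bound of the interval
    high = np.percentile(per_tree, upper, axis=0) # upper bound of the interval
    return point, low, high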

My code (my custom code is in lib):

import argparse
import os
import sys

import pandas as pd
from sklearn.externals import joblib

# Make the custom code shipped with the job importable.
module_path = os.path.abspath('/opt/ml/code')
if module_path not in sys.path:
    sys.path.append(module_path)
from lib import training, prediction
from data.transactions import raw

if __name__ == '__main__':
    # Training entry point: SageMaker injects the output/model dirs as env vars.
    parser = argparse.ArgumentParser()
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    args = parser.parse_args()

    # Fit one Random Forest per data subset and store them all in a dictionary.
    grid_models_dict = training.train_models_in_dict(raw_training_data=raw)
    joblib.dump(grid_models_dict, os.path.join(args.model_dir, "model"))

def model_fn(model_dir):
    # Hosting: load the whole dictionary of fitted models into memory.
    grid_models_dict = joblib.load(os.path.join(model_dir, "model"))
    return grid_models_dict

def predict_fn(input_data, model):
    # Hosting: route each observation to its model and compute per-tree predictions.
    predicted = prediction.predict(input_data, model)
    return predicted
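
For completeness, this script is used as the entry point with the SageMaker Python SDK (v1.x) roughly like the sketch below; the script name, IAM role, S3 path and instance types here are placeholders, not my exact setup:

from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point='entry_point.py',  # the script above (placeholder name)
    role='arn:aws:iam::123456789012:role/SageMakerRole',  # placeholder role
    train_instance_type='ml.c5.4xlarge',
    framework_version='0.20.0',
    py_version='py3',
)
estimator.fit({'train': 's3://my-bucket/training-data'})  # placeholder S3 path
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.c5.4xlarge')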

My problem:

I have two deployment scenarios: one with 100 trees/RF and one with 300 trees/RF.
Fit is performed without issues. On S3, the compressed 100 trees/RF pickle is 261 MB and the compressed 300 trees/RF pickle is 784 MB.
Deploy is done with some issues: some workers time out with the 300 trees/RF model, as already reported in e.g. aws/amazon-sagemaker-examples#556, but it deploys in the end.
Prediction is performed:

  • with the 100 trees/RF: always in around 500 ms, with the same observation
  • with the 300 trees/RF: on paper, with the same observation, since my prediction is a for loop over the trees, I should predict in at most ~1.5 seconds
  • with the 300 trees/RF: in practice, with the same observation,
    -- sometimes (33% of cases) in 700 ms,
    -- sometimes (33% of cases) in 40 to 50 seconds,
    -- and sometimes (33% of cases) I get a timeout error (inference timeout is limited to 60 seconds)
  • This behavior remains when I deploy on a bigger/more recent machine (from ml.t2.xlarge to ml.c5.4xlarge).

My guess is that there is a memory-swapping mechanism, or that the container's memory is not fully privately allocated to me beyond some threshold.
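
In case this really is memory pressure from each worker loading the full 784 MB dictionary, one mitigation I am considering is memory-mapping the joblib file so the serving workers share the trees' arrays instead of each holding a private copy. A minimal sketch, assuming the model is dumped without compression (memory mapping only works on uncompressed joblib files):

import os
from sklearn.externals import joblib

def model_fn(model_dir):
    # mmap_mode='r' memory-maps the numpy arrays inside the forests;
    # the OS page cache can then share them across workers instead of
    # every worker keeping its own full copy in private memory.
    return joblib.load(os.path.join(model_dir, "model"), mmap_mode='r')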

Is there any solution to predict consistently with more than 100 trees/RF?

Thanks in advance.

soufianekhoudmi changed the title from "Issues with prediction time" to "Issues with prediction time not proportional w.r.t. number of trees in RF" on Mar 4, 2019
@laurenyu
Contributor

hi @soufianekhoudmi, thanks for your patience! we've reached out to the team that is responsible for SageMaker's Scikit-learn support to see if they have any insight.

@asadoughi

Hi @soufianekhoudmi, I would suggest profiling your predict function's performance outside of SageMaker (e.g. on SageMaker notebooks or vanilla EC2) to better understand its bottlenecks. If you see different performance characteristics outside of SageMaker, please report what you find along with relevant code and models to assist in reproducing the issue, if possible. Thanks.
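
For example, a self-contained timing sketch along these lines (synthetic data standing in for your actual models) would show how the per-tree loop scales with n_estimators outside of the hosting environment:

import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data just to exercise the per-tree prediction loop.
X = np.random.rand(1000, 20)
y = np.random.rand(1000)

for n_trees in (100, 300):
    model = RandomForestRegressor(n_estimators=n_trees, n_jobs=-1).fit(X, y)
    obs = X[:1]  # a single observation, as in the endpoint
    start = time.perf_counter()
    per_tree = [tree.predict(obs) for tree in model.estimators_]
    elapsed = time.perf_counter() - start
    print('%d trees: %.1f ms' % (n_trees, 1000 * elapsed))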

@laurenyu
Contributor

laurenyu commented Sep 6, 2019

closing due to inactivity

laurenyu closed this as completed on Sep 6, 2019