ModelJson

Model representation

It is sometimes useful to specify the model for a given use case outside the main code of the project, for example in a JSON-like object. This has several advantages:

  • the underlying model can be changed without changing any code (for example, switching from a RandomForestClassifier to an LGBMClassifier)
  • the same code can be used for different sub-problems while still allowing specific hyper-parameters or models for each sub-problem
  • models found automatically by an ml_machine are easier to incorporate

To do that, we need to save the description of a complex model in a simple JSON-like format.

The syntax is simple: a model is represented by a tuple containing its name and its hyper-parameters.

Example, the model:

RandomForestClassifier(n_estimators=100)

is represented by the object:

("RandomForestClassifier",{"n_estimators":100})

So klass(**kwargs) is equivalent to ("klass", kwargs).

Let’s take a more complex example using a GraphPipeline:

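from sklearn.linear_model import LogisticRegression
from aikit.pipeline import GraphPipeline
from aikit.transformers import CountVectorizerWrapper, TruncatedSVDWrapper

gpipeline = GraphPipeline(models={"vect": CountVectorizerWrapper(analyzer="char", ngram_range=(1, 4)),
                                  "svd": TruncatedSVDWrapper(n_components=400),
                                  "logit": LogisticRegression(class_weight="balanced")},
                          edges=[("vect", "svd", "logit")])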

is represented by:

json_object = ("GraphPipeline", {
    "models": {
        "vect":  ("CountVectorizerWrapper", {"analyzer": "char", "ngram_range": (1, 4)}),
        "svd":   ("TruncatedSVDWrapper", {"n_components": 400}),
        "logit": ("LogisticRegression", {"class_weight": "balanced"}),
    },
    "edges": [("vect", "svd", "logit")],
})

So a model that takes other models as parameters can be represented in the same way.
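To make the nesting rule concrete, here is a minimal sketch of how a (name, kwargs) tuple could be turned back into an object recursively. It is not aikit's implementation: the classes Scaler, Classifier and Pipeline, the REGISTRY dictionary and the functions is_model_spec and build_from_param are all hypothetical names introduced for illustration, and the rule used to recognise a model tuple (length 2, registered name, dict of kwargs) is an assumption.

```python
# Toy stand-ins so the sketch runs without aikit or scikit-learn (hypothetical classes).
class Scaler:
    def __init__(self, with_mean=True):
        self.with_mean = with_mean


class Classifier:
    def __init__(self, C=1.0):
        self.C = C


class Pipeline:
    """Stand-in for a model that takes other models as parameters."""
    def __init__(self, models, edges):
        self.models = models
        self.edges = edges


# Hypothetical name -> class lookup.
REGISTRY = {"Scaler": Scaler, "Classifier": Classifier, "Pipeline": Pipeline}


def is_model_spec(obj):
    """Assumed rule: a model is a 2-tuple (registered name, dict of kwargs)."""
    return (isinstance(obj, tuple) and len(obj) == 2
            and isinstance(obj[0], str) and obj[0] in REGISTRY
            and isinstance(obj[1], dict))


def build_from_param(obj):
    """Recursively replace every model tuple by an instance of its class."""
    if is_model_spec(obj):
        name, kwargs = obj
        built_kwargs = {key: build_from_param(value) for key, value in kwargs.items()}
        return REGISTRY[name](**built_kwargs)
    if isinstance(obj, dict):
        return {key: build_from_param(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Ordinary lists and tuples (edges, ranges) keep their type.
        return type(obj)(build_from_param(value) for value in obj)
    return obj


spec = ("Pipeline", {"models": {"scale": ("Scaler", {"with_mean": False}),
                                "clf": ("Classifier", {"C": 0.5})},
                     "edges": [("scale", "clf")]})
model = build_from_param(spec)
```

The edge ("scale", "clf") is left as a plain tuple because "scale" is not a registered class name, which is why the sketch checks the registry before treating a tuple as a model.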

Model conversion

Once the object is created, you can convert it to a real (unfitted) model using aikit.model_definition.sklearn_model_from_param():

from aikit.model_definition import sklearn_model_from_param
model = sklearn_model_from_param(json_object)

which gives a model that can be fitted.

Json saving

This representation uses only simple, JSON-serializable types (string, number, list, dictionary), so it can be saved to disk as JSON.

Remark: since JSON doesn’t support:
  • tuples (only lists are known)
  • dictionaries with non-string keys

it is best to override the JSON serializer to handle those types. The special encoder is found in aikit.tools.json_helper, and ‘save_json’ and ‘load_json’ can be used directly.
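The two limitations can be seen with Python's standard json module alone. The snippet below is only an illustration of the problem and does not use aikit; the variable names are chosen for the example.

```python
import json

# A tuple is written as a JSON array, so it comes back as a list.
model_spec = ("RandomForestClassifier", {"n_estimators": 100})
reloaded_spec = json.loads(json.dumps(model_spec))
spec_is_still_tuple = isinstance(reloaded_spec, tuple)  # False: it is now a list

# Integer keys are silently converted to strings.
int_keyed = {1: "a"}
reloaded_int_keyed = json.loads(json.dumps(int_keyed))  # {"1": "a"}

# Tuple keys cannot be serialized at all.
tuple_keyed = {("vect", "svd"): 1}
try:
    json.dumps(tuple_keyed)
    tuple_key_error = None
except TypeError as error:
    tuple_key_error = error
```

In the first two cases nothing fails, but the reloaded object no longer equals the original, which is why a plain json round trip is not enough for the model representation.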

Example, saving the ‘json_object’ above and reloading it:

from aikit.tools.json_helper import save_json, load_json
save_json(json_object, fname="model.json")

reloaded_json_object = load_json("model.json")

The special serializer works by transforming each unhandled type into a dictionary with:
  • an ‘__items__’ key holding a list of objects
  • a ‘__type__’ key holding the original type

Example:

("a","b")

is transformed into:

{"__items__":["a","b"], "__type__":"__tuple__"}

The handled types are:
  • dict: ‘__dict__’
  • tuple: ‘__tuple__’
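The tagging scheme above can be sketched with two small recursive functions. This is not the code of aikit.tools.json_helper: encode and decode are hypothetical names, and storing a non-string-keyed dict as a list of [key, value] pairs under ‘__items__’ is an assumption, since the text does not say how dict items are laid out.

```python
import json


def encode(obj):
    """Replace tuples and non-string-keyed dicts by tagged dictionaries."""
    if isinstance(obj, tuple):
        return {"__items__": [encode(value) for value in obj], "__type__": "__tuple__"}
    if isinstance(obj, list):
        return [encode(value) for value in obj]
    if isinstance(obj, dict):
        if all(isinstance(key, str) for key in obj):
            return {key: encode(value) for key, value in obj.items()}
        # Assumption: items are stored as [key, value] pairs.
        pairs = [[encode(key), encode(value)] for key, value in obj.items()]
        return {"__items__": pairs, "__type__": "__dict__"}
    return obj


def decode(obj):
    """Inverse of encode: rebuild tuples and non-string-keyed dicts."""
    if isinstance(obj, list):
        return [decode(value) for value in obj]
    if isinstance(obj, dict):
        kind = obj.get("__type__")
        if kind == "__tuple__":
            return tuple(decode(value) for value in obj["__items__"])
        if kind == "__dict__":
            return {decode(key): decode(value) for key, value in obj["__items__"]}
        return {key: decode(value) for key, value in obj.items()}
    return obj


spec = ("CountVectorizerWrapper", {"analyzer": "char", "ngram_range": (1, 4)})
text = json.dumps(encode(spec))        # plain JSON: only strings, numbers, lists, dicts
restored = decode(json.loads(text))    # tuples are back
```

Python's json.JSONEncoder.default hook is never called for tuples, because tuples are already serialized as arrays; that is why this sketch rewrites the object before calling json.dumps instead of subclassing the encoder.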

Model Register

To be able to use a given model by name alone, every model must be registered in a dictionary.

This is done in aikit.simple_model_registration. That module defines a DICO_NAME_KLASS object, which stores the class of every model. To add a new model, simply use the add_klass method.

Example:

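from lightgbm import LGBMClassifier, LGBMRegressor
from aikit.simple_model_registration import DICO_NAME_KLASS
DICO_NAME_KLASS.add_klass(LGBMClassifier)
DICO_NAME_KLASS.add_klass(LGBMRegressor)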

Remark: this register is different from the one used for the automatic machine learning part (ml_machine), which contains more information (hyper-parameters, type, …).
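As an illustration of what a simple name-to-class register does, here is a minimal sketch. It is not the DICO_NAME_KLASS implementation: KlassRegistry and MyClassifier are hypothetical names, and keying each class by its __name__ and rejecting conflicting names are assumptions made for the example.

```python
class KlassRegistry:
    """Hypothetical register mapping a class name to the class itself."""

    def __init__(self):
        self._name_to_klass = {}

    def add_klass(self, klass):
        # Assumption: the lookup key is the bare class name.
        name = klass.__name__
        existing = self._name_to_klass.get(name)
        if existing is not None and existing is not klass:
            raise ValueError("another class is already registered as %r" % name)
        self._name_to_klass[name] = klass

    def __getitem__(self, name):
        return self._name_to_klass[name]

    def __contains__(self, name):
        return name in self._name_to_klass


class MyClassifier:
    """Toy model used only to exercise the register."""

    def __init__(self, n_estimators=10):
        self.n_estimators = n_estimators


registry = KlassRegistry()
registry.add_klass(MyClassifier)

# A (name, kwargs) tuple can now be resolved from the name alone.
name, kwargs = ("MyClassifier", {"n_estimators": 100})
model = registry[name](**kwargs)
```

Keying on the bare class name keeps the JSON representation short, at the cost that two different classes with the same name cannot both be registered in this sketch.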