Auto ML Launcher

To help launch the ML machine jobs, you can use the MlMachineLauncher object.

It:

contains the configurations of the auto-ml

has methods to launch a controller, a worker, … (see below)

has a method to process command-line arguments, so that you can quickly create a script that drives the ml-process (see below)

The easiest way is to create a script like the following one.

Example:

from aikit.datasets import load_dataset, DatasetEnum
from aikit.ml_machine import MlMachineLauncher

def loader():
    """ modify this function to load the data

    Returns
    -------
    dfX, y

    Or
    dfX, y, groups

    """
    dfX, y, *_ = load_dataset(DatasetEnum.titanic)
    return dfX, y

def set_configs(launcher):
    """ modify that function to change launcher configuration """

    launcher.job_config.score_base_line = 0.75
    launcher.job_config.allow_approx_cv = True

    return launcher

if __name__ == "__main__":
    launcher = MlMachineLauncher(base_folder = "C:/automl/titanic",
                                 name = "titanic",
                                 loader = loader,
                                 set_configs = set_configs)

    launcher.execute_processed_command_argument()

(In what follows, we assume that this is the content of the “automl_launcher.py” file.)

Here is what is going on:

1. First, import the launcher.
2. Define a loader function: it is better to define a loading function that can be called when needed than to load the data directly, because the data is not always needed.
3. Create a launcher with a base folder, a name and the loading function.
4. (Optional) Change a few things in the configuration by passing a function that modifies it. Here we set the base line to 75% and tell the auto-ml that it can do approximate cross-validation (see the ‘advanced functionalities’ section).
5. Process the command-line arguments to actually start a command.

Remarks:

if no change to the default configuration is needed, you can use set_configs = None

the loading function can also return 3 things: dfX, y and groups (if a group cross-validation is needed)

don’t forget the ‘if __name__ == "__main__"’ part: since the code uses subprocesses, it is really needed
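The reason for the guard can be seen in a minimal sketch. It assumes that the subprocesses are started through Python's multiprocessing module with the "spawn" start method (the default on Windows), which re-runs the launching script under the module name `__mp_main__`. The tiny script below is hypothetical and only stands in for automl_launcher.py.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for automl_launcher.py: the "launch" only happens
# inside the __main__ guard.
SCRIPT = '''
launched = False
if __name__ == "__main__":
    launched = True   # the real script would call the launcher here
'''

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "guarded_script.py")
    with open(path, "w") as f:
        f.write(SCRIPT)

    # Run as the main script, like "python guarded_script.py".
    as_main = runpy.run_path(path, run_name="__main__")
    # Re-run the way a spawned child process would see it
    # (assumption: "spawn" start method).
    as_child = runpy.run_path(path, run_name="__mp_main__")

print(as_main["launched"])   # True: the launch code runs
print(as_child["launched"])  # False: the guard keeps the child from launching again
```

Without the guard, every child process would execute the launch code again when it re-imports the script.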

Once you have created such a script, you can use it to drive the auto-ml process:

to start a controller and n workers
to aggregate the results
to separately start a controller
to separately start worker(s)
to fit a specific model
…
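The commands listed above map naturally onto a small command-line parser. The sketch below is a hypothetical parser written with argparse, not aikit's actual implementation: it only mirrors the command names and the -n and --job_ids options described on this page, and the names `build_parser` and `n_workers` are illustrative.

```python
import argparse

# Commands named on this page (the real launcher may accept others).
COMMANDS = ["run", "init", "controller", "worker", "result", "stop", "fit"]


def build_parser():
    """Build a hypothetical parser for 'python automl_launcher.py <command> ...'."""
    parser = argparse.ArgumentParser(description="hypothetical auto-ml driver")
    parser.add_argument("command", choices=COMMANDS)
    # Number of workers, used by the 'run' and 'worker' commands.
    parser.add_argument("-n", dest="n_workers", type=int, default=1)
    # Comma-separated job ids, used by the 'fit' command.
    parser.add_argument("--job_ids", default=None, type=lambda s: s.split(","))
    return parser


args = build_parser().parse_args(["run", "-n", "4"])
print(args.command, args.n_workers)  # run 4
```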

What you need to specify

For the automl to work you need to specify a few things:

a loader function

This function loads your data. It should return a DataFrame with the features (dfX), the target (y) and, optionally, the groups (if you want to use a group cross-validation). It is called only once, during the initialisation phase, so you don’t need to save your data in a shared folder accessible by all the workers: after the loader is called, the auto-ml persists everything it needs.
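The "load once, then persist" idea can be sketched as follows. This is only an illustration: the file name `data.pkl`, the helpers `initialize` and `read_persisted_data`, and the use of pickle are assumptions, not aikit's actual storage format. Plain lists stand in for dfX and y.

```python
import os
import pickle
import tempfile

calls = {"loader": 0}  # counts how many times the loader is called


def loader():
    """Toy loader: returns features and target (lists stand in for dfX, y)."""
    calls["loader"] += 1
    dfX = [[22.0, 1], [38.0, 0], [26.0, 0]]
    y = [0, 1, 1]
    return dfX, y


def initialize(base_folder, loader):
    """Call the loader once and save its result in the shared base folder."""
    data = loader()
    with open(os.path.join(base_folder, "data.pkl"), "wb") as f:
        pickle.dump(data, f)


def read_persisted_data(base_folder):
    """What a worker or a controller would do: read from disk, never call the loader."""
    with open(os.path.join(base_folder, "data.pkl"), "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as base_folder:
    initialize(base_folder, loader)
    dfX, y = read_persisted_data(base_folder)              # first reader
    dfX_again, y_again = read_persisted_data(base_folder)  # second reader

print(calls["loader"])  # 1
```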

a base folder: the folder in which the automl will work.

This folder should be accessible by all the workers and the controller. It will be used to save the results, the queue of jobs, the logs, …

a set_configs function (optional): a function that modifies the settings of the automl

You can modify the cv, the base line, the scoring, … (See ml_machine_launcher_advanced for details).
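The way a set_configs function is applied can be pictured with a small sketch. The stand-in launcher below is built from SimpleNamespace and is purely hypothetical: only the attribute names score_base_line and allow_approx_cv come from the example script, while `make_fake_launcher`, `apply_configs` and the default values are assumptions.

```python
from types import SimpleNamespace


def make_fake_launcher():
    """Stand-in for a launcher, with only the two settings used on this page."""
    job_config = SimpleNamespace(score_base_line=None, allow_approx_cv=False)
    return SimpleNamespace(job_config=job_config)


def set_configs(launcher):
    """Same shape as the function in the example script: modify, then return."""
    launcher.job_config.score_base_line = 0.75
    launcher.job_config.allow_approx_cv = True
    return launcher


def apply_configs(launcher, set_configs=None):
    """Apply the user function if one is given, otherwise keep the defaults."""
    if set_configs is not None:
        launcher = set_configs(launcher)
    return launcher


default_launcher = apply_configs(make_fake_launcher(), set_configs=None)
custom_launcher = apply_configs(make_fake_launcher(), set_configs=set_configs)

print(default_launcher.job_config.score_base_line)  # None
print(custom_launcher.job_config.score_base_line)   # 0.75
```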

run command

This is the main command: it starts everything that is needed. To start the whole process, use the ‘run’ command. In a command window, run:

python automl_launcher.py run

This command will:

load the data using the loader
initialize everything
modify configuration
save everything needed to disk
start one controller in a subprocess
start one worker

You can also start more than one worker. To do that, use the “-n” option:

python automl_launcher.py run -n 4

This will create a total of 4 workers (and 1 controller), so in the end you’ll have 5 Python processes running.

manual start

You can also use this script to start everything manually. That way you can

do the initialization manually
have one console for the controller
have separate consoles for workers

To do that, you perform the same steps as the ‘run’ command, but one at a time, using the commands described below.

init command

If you only want to initialize everything, you can run the ‘init’ command:

python automl_launcher.py init

This won’t start anything (no worker, no controller), but it will load the data, prepare the configuration, apply the changes and persist everything to disk.

manual init

Alternatively, you can do that manually in a notebook or your favorite IDE. That way you can actually see what the default configuration is, prepare the data, etc.

Here is the code to do that:

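# assumes MlMachineLauncher and loader are imported/defined as in the script above
launcher = MlMachineLauncher(base_folder="C:/automl/titanic", loader=loader)
launcher.initialize()
launcher.job_config.score_base_line = 0.75  # same attribute as in the script above
launcher.auto_ml_config.columns_informations["Pclass"]["TypeOfVariable"] = "TEXT"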

# ... here you can take a look at job_config and auto_ml_config
# ... any other change

launcher.persist()

controller command

If you only want to start a controller, you should use the ‘controller’ command:

python automl_launcher.py controller

This will start one controller (in the main process)

worker command

If you only want to start worker(s) you should use the ‘worker’ command:

python automl_launcher.py worker -n 2

This will start 2 workers (one in the main process and one in a subprocess). For them to do anything, a controller needs to be running somewhere. This command is useful to add new workers to an existing task, or to add new workers on another computer (assuming the controller is running elsewhere).
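When the controller and the workers are spread over several consoles or machines, the same command lines are simply repeated. The helper below is a hypothetical convenience (the name `build_command` is an assumption) that assembles those command lines as argument lists, for instance to pass to subprocess.Popen. It only builds the commands shown on this page and does not start anything.

```python
import sys


def build_command(script, command, n_workers=None):
    """Return the argument list for 'python <script> <command> [-n N]'."""
    cmd = [sys.executable, script, command]
    if n_workers is not None:
        cmd += ["-n", str(n_workers)]
    return cmd


controller_cmd = build_command("automl_launcher.py", "controller")
worker_cmd = build_command("automl_launcher.py", "worker", n_workers=2)

# The first element is the current Python interpreter, so it is skipped here.
print(controller_cmd[1:])  # ['automl_launcher.py', 'controller']
print(worker_cmd[1:])      # ['automl_launcher.py', 'worker', '-n', '2']
```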

result command

If you want to launch the aggregation of the results, you can use the ‘result’ command:

python automl_launcher.py result

This will trigger the aggregation of the results and generate the Excel result file.

stop command

If you want to stop every process, you can use the ‘stop’ command:

python automl_launcher.py stop

It will create the stop file, which triggers the exit of all the processes listening to that folder.
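The stop-file mechanism can be pictured with the following sketch. The file name `stop` and the polling loop are assumptions made for illustration; the actual file name and the way aikit's processes check for it may differ.

```python
import os
import tempfile

STOP_FILE_NAME = "stop"  # hypothetical name, for illustration only


def request_stop(base_folder):
    """What a 'stop' command could do: drop a marker file in the shared folder."""
    with open(os.path.join(base_folder, STOP_FILE_NAME), "w") as f:
        f.write("stop requested\n")


def should_stop(base_folder):
    """Each process checks for the marker file before taking a new job."""
    return os.path.exists(os.path.join(base_folder, STOP_FILE_NAME))


def worker_loop(base_folder, jobs):
    """Process jobs in order until the stop file appears."""
    done = []
    for job in jobs:
        if should_stop(base_folder):
            break
        done.append(job)
        if job == "job_2":
            # Simulate someone running the stop command while job_2 is processed.
            request_stop(base_folder)
    return done


with tempfile.TemporaryDirectory() as base_folder:
    done = worker_loop(base_folder, ["job_1", "job_2", "job_3"])

print(done)  # ['job_1', 'job_2']
```

Because the signal is a file in the base folder, it reaches every process that watches that folder, including workers running on other computers.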

fit command

If you want to fit one or more specific model(s), you can use the ‘fit’ command. You’ll need to specify the job_id(s) to fit:

python automl_launcher.py fit --job_ids 77648ab95306e564c4c230e8469e9470

Or, to fit more than one model:

python automl_launcher.py fit --job_ids 77648ab95306e564c4c230e8469e9470,469ee473a55a4d1376d3c3186c95f048

The models will be saved within ‘saved_models’ along with their json.
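Splitting a comma-separated --job_ids value can be sketched as below. The helper `parse_job_ids` is hypothetical, and the check that every id is made of 32 lowercase hexadecimal characters is an assumption inferred from the two ids shown above, not a documented rule.

```python
import re

# Assumption: job ids look like the examples on this page (32 hex characters).
JOB_ID_PATTERN = re.compile(r"[0-9a-f]{32}")


def parse_job_ids(argument):
    """Split a comma-separated string of job ids and check their shape."""
    job_ids = [part.strip() for part in argument.split(",") if part.strip()]
    for job_id in job_ids:
        if JOB_ID_PATTERN.fullmatch(job_id) is None:
            raise ValueError("unexpected job id: %r" % job_id)
    return job_ids


ids = parse_job_ids(
    "77648ab95306e564c4c230e8469e9470,469ee473a55a4d1376d3c3186c95f048"
)
print(len(ids))  # 2
```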

Summary

To start a new experiment, first create the script as in the example above, then use the run command.

If you want to split everything, you can use, in order:

launcher.initialize()

apply modifications

launcher.persist()

controller command

worker command

Whenever you want an aggregation of the results: use the result command