Integrating MLflow into Python Logging

This article tackles an admittedly niche problem: logging both to the MLflow tracking server and to other destinations, such as a live terminal or some cloud storage destination. Concretely, the aim of this post is to detail how to build a custom Logger class in Python that stays API-compatible with MLflow's most-used calls for storing metrics and parameters, while writing the output both to the MLflow tracking server and to any other destination (sys.stdout, a file, etc.).

Logging in Python

Python's logging machinery is organized roughly as follows:

Loggers

A logger is the class that implements the log methods (debug, info, warning, error, critical). Each log method is associated with a log level (debug is lower than info, which is lower than warning, and so on).

#---------------------------------------------------------------------------
#   Level related stuff
#   Source: https://github.com/python/cpython/blob/3.12/Lib/logging/__init__.py
#---------------------------------------------------------------------------
#
# Default levels and level names, these can be replaced with any positive set
# of values having corresponding names. There is a pseudo-level, NOTSET, which
# is only really there as a lower limit for user-defined levels. Handlers and
# loggers are initialized with NOTSET so that they will log all messages, even
# at user-defined levels.
#

CRITICAL = 50
FATAL = CRITICAL
ERROR = 40
WARNING = 30
WARN = WARNING
INFO = 20
DEBUG = 10
NOTSET = 0

This allows an application to filter log messages by level. For example, a developer working on an application may want to receive even the lowest-level log messages, while an end user integrating the application may wish to receive only critical logs.
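As a sketch of how this filtering works in practice, the same logger can be throttled to different thresholds with setLevel (the logger name here is illustrative):

```python
import logging

# A developer-facing configuration: let everything through.
logger = logging.getLogger("demo")
logger.setLevel(logging.DEBUG)
print(logger.isEnabledFor(logging.DEBUG))     # True: debug records pass

# An end-user configuration: only critical records pass.
logger.setLevel(logging.CRITICAL)
print(logger.isEnabledFor(logging.INFO))      # False: info is filtered out
print(logger.isEnabledFor(logging.CRITICAL))  # True
```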

Handlers

A logger can have one or more handlers attached, and the same handler can be attached to multiple loggers. Handlers send the log records generated by the logger to the appropriate destination. For example, StreamHandler sends output to streams such as sys.stderr (its default) or sys.stdout, while RotatingFileHandler writes logs to a file.

Formatters

Each handler must have a formatter (if one is not assigned explicitly, a default is used). A formatter specifies the layout of log records in the final output.
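For example, a Formatter takes a %-style layout string of record attributes; formatting a hand-built record shows the effect (the record's fields here are arbitrary):

```python
import logging

# A layout string showing the level, logger name, and message.
formatter = logging.Formatter("%(levelname)s | %(name)s | %(message)s")

# Build a LogRecord by hand to see exactly what the formatter produces.
record = logging.LogRecord(
    name="demo", level=logging.INFO, pathname="app.py", lineno=1,
    msg="hello", args=(), exc_info=None,
)
print(formatter.format(record))  # INFO | demo | hello
```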

For more details on Loggers, Handlers and Formatters, see

Logging HOWTO (Official Python Documentation)

Building a Custom Logger

The first step is to create the MlflowLogger class, which subclasses (inherits from) Python's logging.Logger class:

import logging
from typing import Any

import mlflow


class MlflowLogger(logging.Logger):
    def __init__(
        self,
        name: str = 'mlflow',
        run_id: str = None,
        level=logging.DEBUG,
        handlers=None,
        *args,
        **kwargs
    ):
        super().__init__(name, *args, **kwargs)

        self.run_id = run_id

        # Start an MLflow run if one was not provided
        if not self.run_id:
            self.run = mlflow.start_run()
            self.run_id = self.run.info.run_id
        else:
            self.run = mlflow.start_run(run_id=self.run_id)

        # Attach any handlers passed in at construction time
        # (logging.Logger.__init__ does not accept a handlers argument)
        for handler in handlers or []:
            self.addHandler(handler)

        logging.addLevelName(logging.INFO + 5, 'MLFLOW')
        self.setLevel(level)
        self.log(level=logging.INFO + 5, msg=f'run_id={self.run_id}')

There are two main things to note:

  1. We check whether an MLflow run ID was supplied; if not, the logger starts a new active run.

  2. We define a custom logging level by calling logging.addLevelName. I wanted MLflow logs to sit just above the default INFO level but below WARNING. There's probably a cleaner way to do this, but creating the new level via logging.addLevelName(logging.INFO + 5, 'MLFLOW') works for the most part.
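A minimal sketch of the custom level on its own, outside the class:

```python
import logging

# Register a level between INFO (20) and WARNING (30).
MLFLOW_LEVEL = logging.INFO + 5  # 25
logging.addLevelName(MLFLOW_LEVEL, "MLFLOW")

print(logging.getLevelName(MLFLOW_LEVEL))             # MLFLOW
print(logging.INFO < MLFLOW_LEVEL < logging.WARNING)  # True
```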

This gives us a slightly modified Python logger class. To make MLflow calls, we add the following methods to the class:

    def log_param(self, param: str, value: Any, *args, **kwargs):
        mlflow.log_param(param, value)
        if self.isEnabledFor(logging.INFO + 5):
            self.log(logging.INFO + 5, f'PARAM - {param}: {value}',
                     *args, stacklevel=2, **kwargs)

    def log_metric(self, metric: str, value: Any, *args, **kwargs):
        mlflow.log_metric(metric, value)
        if self.isEnabledFor(logging.INFO + 5):
            self.log(logging.INFO + 5, f'METRIC - {metric}: {value}',
                     *args, stacklevel=2, **kwargs)

These methods call MLflow's log_param and log_metric, which record to the MLflow tracking server, while simultaneously emitting a record through Python's logging handlers.

(You can add the respective plural log_params and log_metrics methods in the same way; they are left out for brevity.)

NOTE: The stacklevel parameter is set to 2 rather than the default 1. Without getting into unnecessary detail, this ensures that when module and file names are logged, they refer to the calling module rather than the module containing this logging class.
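To see the difference stacklevel makes, here is a small sketch using a record-capturing handler (the names are illustrative; stacklevel requires Python 3.8+):

```python
import logging

# A handler that keeps records in memory so we can inspect them.
class CaptureHandler(logging.Handler):
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

logger = logging.getLogger("stack-demo")
capture = CaptureHandler()
logger.addHandler(capture)
logger.setLevel(logging.INFO)

def wrapper(msg):
    # stacklevel=2 attributes the record to wrapper's caller.
    logger.info(msg, stacklevel=2)

def business_logic():
    wrapper("computed")

business_logic()
print(capture.records[0].funcName)  # business_logic, not wrapper
```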

Technically speaking, this is all you need: simply instantiate the class and use it. But we also want to create custom handlers and custom formatters for these logs.


Creating a Custom Formatter

This is most useful for on-the-fly monitoring, but knowing how to set it up helps in a variety of scenarios. What we build here is a class that formats the messages output by our logger; in this case, we assign a different color to each log message based on its level:

class RainbowFormatter(logging.Formatter):
    COLORS = {
        'DEBUG': '\033[1;32m',
        'INFO': '\033[1;35m',
        'WARNING': '\033[1;33m',
        'ERROR': '\033[1;31m',
        'CRITICAL': '\033[1;41m',
        'MLFLOW': '\033[1;45m',
    }

    def format(self, record):
        msg = super().format(record)
        # Fall back to no color for level names we don't know about
        color = self.COLORS.get(record.levelname, '')
        return color + record.levelname + '\033[0m: ' + msg

Any subclass of logging.Formatter must implement the format method. The only difference from the default Formatter is that we prepend the ANSI color code for the record's level, which produces colored output in the terminal.
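To check the output without a terminal, you can format a record directly. This is a trimmed-down copy of the class with just two colors, so the snippet runs standalone:

```python
import logging

# Trimmed-down RainbowFormatter, for illustration only.
class RainbowFormatter(logging.Formatter):
    COLORS = {"DEBUG": "\033[1;32m", "WARNING": "\033[1;33m"}

    def format(self, record):
        msg = super().format(record)
        color = self.COLORS.get(record.levelname, "")
        return color + record.levelname + "\033[0m: " + msg

fmt = RainbowFormatter()
record = logging.LogRecord("demo", logging.WARNING, "app.py", 1,
                           "disk nearly full", (), None)
print(repr(fmt.format(record)))
# '\x1b[1;33mWARNING\x1b[0m: disk nearly full'
```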

Tying this All Together

Assuming we have:

  1. The MLFlowLogger class created and

  2. RainbowFormatter defining the desired color-scheme

From there, we do the following:

# 1 - Create our handler; we use StreamHandler
#     since we are printing to the console
handler = logging.StreamHandler()

# 2 - Add the formatter to our handler
handler.setFormatter(RainbowFormatter())

# 3 - Create our logger
logger = MlflowLogger(
    name='mlflowlogger',
    # ensuring we add our handler
    handlers=[handler]
)

And that's it! We now have our custom logger created and configured, logging both to the console and to the MLflow tracking server.

The metric logged in the example output above also appears in the MLflow tracking server's UI.