This article solves an arguably niche issue where we wish to log both to the MLflow tracking server and to other destinations, such as a live terminal or some cloud storage. To be concrete, the aim of this post is to detail the setup process for creating a custom Logger class in Python that maintains API compatibility with MLflow's most-used calls for storing metrics and parameters, while writing the output both to the MLflow tracking server and to any other destination (sys.stdout, a file, etc.).
Logging in Python
Python's logging setup is roughly as follows:
Loggers
A Logger is the parent class that implements the log methods (debug, info, warning, error, critical). Each log method is associated with a log level (debug is lower than info, which is lower than warning, and so on):
#---------------------------------------------------------------------------
# Level related stuff
# Source: https://github.com/python/cpython/blob/3.12/Lib/logging/__init__.py
#---------------------------------------------------------------------------
#
# Default levels and level names, these can be replaced with any positive set
# of values having corresponding names. There is a pseudo-level, NOTSET, which
# is only really there as a lower limit for user-defined levels. Handlers and
# loggers are initialized with NOTSET so that they will log all messages, even
# at user-defined levels.
#
CRITICAL = 50
FATAL = CRITICAL
ERROR = 40
WARNING = 30
WARN = WARNING
INFO = 20
DEBUG = 10
NOTSET = 0
This allows an application to batch-filter log messages. For example, a developer working on an application may filter down to the lowest-level log messages, while an end user integrating the application may wish to receive only critical logs.
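For example, a stand-alone logger set to WARNING silently drops anything lower:

import logging

logger = logging.getLogger('example')
logger.setLevel(logging.WARNING)    # messages below WARNING are filtered out

logger.debug('fetching batch')      # dropped (DEBUG = 10 < 30)
logger.info('epoch finished')       # dropped (INFO = 20 < 30)
logger.warning('loss is NaN')       # emitted (WARNING = 30)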
Handlers
Handlers send the log records generated by a logger to the appropriate destination. A logger can have any number of handlers attached, and the same handler can serve several loggers. For example, StreamHandler typically sends output to sys.stdout or sys.stderr, while RotatingFileHandler is used to write logs to a file.
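For instance, a single logger can fan out to several destinations at once (app.log here is an illustrative path):

import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger('example')
logger.addHandler(logging.StreamHandler())   # console; defaults to sys.stderr
logger.addHandler(RotatingFileHandler(
    'app.log', maxBytes=1_000_000, backupCount=3  # roll the file over at ~1 MB
))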
Formatters
Each handler has a single formatter (if one is not assigned explicitly, a default is used). A formatter specifies the layout of log records in the final output.
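For example, a timestamped layout attached to a handler:

import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    '%(asctime)s | %(levelname)s | %(name)s | %(message)s'
))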
For more details on Loggers, Handlers, and Formatters, see:
Logging HOWTO (Official Python Documentation)
Building a Custom Logger
We will build this by first creating the MlflowLogger class, which subclasses (inherits from) Python's logging.Logger class:
import logging
from typing import Any, List, Optional

import mlflow


class MlflowLogger(logging.Logger):
    def __init__(
        self,
        name: str = 'mlflow',
        run_id: Optional[str] = None,
        level: int = logging.DEBUG,
        handlers: Optional[List[logging.Handler]] = None,
        *args,
        **kwargs,
    ):
        self.run_id = run_id
        # Start an MLflow run if one is not provided
        if not self.run_id:
            self.run = mlflow.start_run()
            self.run_id = self.run.info.run_id
        else:
            mlflow.start_run(run_id=self.run_id)
        super().__init__(name, *args, **kwargs)
        # Attach any handlers passed in; logging.Logger itself
        # does not accept a handlers argument
        for handler in handlers or []:
            self.addHandler(handler)
        logging.addLevelName(logging.INFO + 5, 'MLFLOW')
        self.setLevel(level)
        self.log(level=logging.INFO + 5, msg=f'run_id={self.run_id}')
There are two main things to note:
1. We check whether an MLflow run is already underway (via the run_id argument); if one is not, the logging class starts a new run.
2. We define a custom logging level by calling logging.addLevelName. I wanted MLflow logs to be just above the default INFO level, but below WARNING. There's probably a cleaner way to do this, but creating the new level by calling logging.addLevelName(logging.INFO + 5, 'MLFLOW') works for the most part (a quick sanity check is shown below).
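In an interpreter, the registration can be verified directly:

import logging

logging.addLevelName(logging.INFO + 5, 'MLFLOW')
print(logging.getLevelName(25))               # MLFLOW
print(logging.INFO < 25 < logging.WARNING)    # True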
Now, this just sets up a slightly modified Python logger class. In order to make MLflow calls, we need to add the following methods to our class:
    def log_param(self, param: str, value: Any, *args, **kwargs):
        mlflow.log_param(param, value)
        if self.isEnabledFor(logging.INFO + 5):
            self.log(logging.INFO + 5, f'PARAM - {param}: {value}',
                     *args, stacklevel=2, **kwargs)

    def log_metric(self, metric: str, value: Any, *args, **kwargs):
        mlflow.log_metric(metric, value)
        if self.isEnabledFor(logging.INFO + 5):
            self.log(logging.INFO + 5, f'METRIC - {metric}: {value}',
                     *args, stacklevel=2, **kwargs)
This allows the logging class to call the MLflow methods log_param and log_metric, which log to the MLflow tracking server while simultaneously logging via Python's logging handlers.
(You can add the respective log_params and log_metrics methods as well; they follow the same pattern.)
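A minimal sketch of those batch variants, delegating to mlflow.log_params and mlflow.log_metrics:

    def log_params(self, params: dict, *args, **kwargs):
        mlflow.log_params(params)
        if self.isEnabledFor(logging.INFO + 5):
            self.log(logging.INFO + 5, f'PARAMS - {params}',
                     *args, stacklevel=2, **kwargs)

    def log_metrics(self, metrics: dict, *args, **kwargs):
        mlflow.log_metrics(metrics)
        if self.isEnabledFor(logging.INFO + 5):
            self.log(logging.INFO + 5, f'METRICS - {metrics}',
                     *args, stacklevel=2, **kwargs)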
NOTE: The parameter stacklevel is set to 2 as opposed to the default 1. Without getting into unnecessary detail, this ensures that when module and file names are logged, the name of the calling module is reported rather than the name of the module containing this logging class.
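To see the effect, include location fields in the handler's format string (train.py below stands in for a hypothetical calling script):

fmt = logging.Formatter('%(filename)s:%(lineno)d %(levelname)s: %(message)s')

# With stacklevel=2, a call such as logger.log_metric('loss', 0.1) inside
# train.py is reported as train.py:<line of that call>, rather than as the
# file that defines MlflowLogger.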
Using the above, we can create our MlflowLogger and start logging right away.
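A minimal sketch (the parameter and metric names here are illustrative):

logger = MlflowLogger()                  # no run_id given, so a new MLflow run starts
logger.log_param('learning_rate', 1e-3)  # stored on the tracking server
logger.log_metric('loss', 0.42)          # likewise; console output still needs a handler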
Technically speaking, this is all you need: simply instantiate the class and use it. But we also want to create custom handlers and custom formatters for these logs.
Creating a Custom Formatter
This is most useful for on-the-fly monitoring, but knowing how to set it up is handy in a variety of scenarios. What we build here is a class to format the message output from our logger; in this case, we want to assign a different color to each log message based on its level:
class RainbowFormatter(logging.Formatter):
    # ANSI escape codes, keyed by level name
    COLORS = {
        'DEBUG': '\033[1;32m',
        'INFO': '\033[1;35m',
        'WARNING': '\033[1;33m',
        'ERROR': '\033[1;31m',
        'CRITICAL': '\033[1;41m',
        'MLFLOW': '\033[1;45m',
    }

    def format(self, record):
        msg = super().format(record)
        return RainbowFormatter.COLORS[record.levelname] + record.levelname + '\033[1;0m: ' + msg
Any subclass of logging.Formatter overrides the format method. The only difference from the default Formatter here is prepending the level's ANSI color code to the log record, which gives us colored output.
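For example, wiring the formatter to a plain logger (the messages are illustrative):

import logging

handler = logging.StreamHandler()
handler.setFormatter(RainbowFormatter())

demo = logging.getLogger('demo')
demo.addHandler(handler)
demo.setLevel(logging.DEBUG)

demo.info('loading data')        # 'INFO' printed in magenta
demo.warning('loss plateaued')   # 'WARNING' printed in yellow
demo.error('run failed')         # 'ERROR' printed in red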
Tying this All Together
Assuming we have:
- the MlflowLogger class created, and
- RainbowFormatter defining the desired color scheme.
From there, we do the following:
# 1 - Create our handler; in this case we use StreamHandler
#     since we are printing to the console
handler = logging.StreamHandler()

# 2 - Add the formatter to our handler
handler.setFormatter(RainbowFormatter())

# 3 - Create our logger, ensuring we add our handler
logger = MlflowLogger(
    name='mlflowlogger',
    handlers=[handler],
)
And that's it! We now have our custom logger created and configured, logging both to the console and to the MLflow tracking server.
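For example (the metric name and value are illustrative):

logger.log_metric('accuracy', 0.93)   # colored MLFLOW line on the console,
                                      # and a data point on the tracking server
logger.info('training complete')      # console only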
The metric we logged in the example output above is also recorded on the tracking server.