ExperimentTracker
Use the ExperimentTracker class to connect to the trackserver and
report progress and metrics of experiments.
Example:
    from traintrack.client import ExperimentTracker

    tracker = ExperimentTracker()
    for epoch in range(1, 11):
        tracker.begin_epoch(epoch)
        for i, batch in enumerate(batches):
            # train on a batch
            # ...
            tracker.progress(i+1, n_batches)
        # report metrics for the epoch
        tracker.metric('loss/train', loss_train)
        tracker.metric('loss/valid', loss_valid)
        tracker.metric('acc/valid', acc_valid)
        tracker.end_epoch()
- class traintrack.client.ExperimentTracker(experiment_id=None, host='0.0.0.0', port=4242, first_epoch=1, default_log_level='INFO', async_=False)
  Experiment tracker client.
  The experiment tracker client communicates with a trackserver over ZeroRPC to report experiment configuration, metrics, and progress. The server then forwards this information to the configured backend services.
  - Args:
    - experiment_id (str, optional): identifier for the current experiment that will be tracked. The server uses it to uniquely identify the experiment, and backend services often use it when writing to log files, databases, etc. If unspecified, it is generated from the current date and time.
    - host (str, optional): the host name the server is running on. Default: '0.0.0.0'
    - port (int, optional): TCP port number that the server is running on. Default: 4242
    - first_epoch (int, optional): the number of the epoch that will be sent to the server when begin_epoch is first called. Useful for resuming stopped experiments. Default: 1
    - default_log_level (str, optional): default logging level when none is specified in calls to log. Default: 'INFO'
    - async_ (bool, optional): whether to send messages to the server asynchronously. If enabled, method calls return immediately without waiting for a response from the server. Enable this if you are worried about communication with the server slowing down your experiments. Default: False
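The format of the automatically generated experiment_id is not stated here. As a minimal sketch, assuming a timestamp-based scheme, an identifier of this kind could be built as shown below. The helper name default_experiment_id, the prefix, and the format string are all hypothetical and are not part of traintrack.

```python
from datetime import datetime


def default_experiment_id(now=None, prefix="exp"):
    """Hypothetical helper: build an identifier from a date and time."""
    # Fall back to the current local time when no timestamp is given.
    now = now or datetime.now()
    # datetime supports strftime-style codes inside a format spec.
    return f"{prefix}-{now:%Y%m%d-%H%M%S}"


print(default_experiment_id(datetime(2024, 1, 31, 9, 5, 0)))  # exp-20240131-090500
```

When resuming a stopped run, passing the same experiment_id explicitly (together with first_epoch) is likely the safer choice, since a regenerated identifier would differ from the original one.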
- begin_epoch(epoch=None)
  Start a new training epoch.
  - Args:
    - epoch (int, optional): if specified, the given epoch number is sent to the server. Otherwise the last epoch number is incremented and sent to the server.
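The interaction between first_epoch and begin_epoch can be sketched with a small stand-in class. EpochCounter is hypothetical: it only mirrors the numbering rule described above and does not talk to a server or reflect how traintrack stores the epoch internally.

```python
class EpochCounter:
    """Hypothetical stand-in that mirrors the documented epoch numbering."""

    def __init__(self, first_epoch=1):
        # Start one below first_epoch so the first increment yields first_epoch.
        self.current = first_epoch - 1

    def begin_epoch(self, epoch=None):
        # An explicit epoch wins; otherwise continue from the last one.
        self.current = self.current + 1 if epoch is None else epoch
        return self.current


counter = EpochCounter(first_epoch=5)  # e.g. resuming after epoch 4
print(counter.begin_epoch())    # 5
print(counter.begin_epoch())    # 6
print(counter.begin_epoch(10))  # 10
print(counter.begin_epoch())    # 11
```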
- begin_task(name=None)
  Start a new subtask (e.g. train, validation, etc.).
  - Args:
    - name (str, optional): task name.
- critical(text)
  Convenience method to send a logging message with CRITICAL log level to the server.
  - Args:
    - text (str): the text to log.
- debug(text)
  Convenience method to send a logging message with DEBUG log level to the server.
  - Args:
    - text (str): the text to log.
- description(text)
  Report a description of the current experiment.
- end_epoch()
  End the current epoch.
- end_task()
  End the current subtask.
- error(text)
  Convenience method to send a logging message with ERROR log level to the server.
  - Args:
    - text (str): the text to log.
- image(name, image, pixel_order=None)
  Report an image (e.g. a set of learned filters or the output of a segmentation algorithm). The image will automatically be associated with the current training epoch.
  - Args:
    - name (str): name of the image (e.g. 'filters').
    - image (np.ndarray or PIL.Image): the image to report.
    - pixel_order (str, optional): the order of the pixels in the ndarray. Can be 'CHW' for channels, height, width, or 'HWC' for height, width, channels. By default, the image encoding algorithm will attempt to guess based on the dimensions of the ndarray.
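How the pixel order is guessed is not described here. One plausible heuristic, shown purely as an illustration, is to treat a very small first or last axis as the channel axis. The function guess_pixel_order and its max_channels threshold are hypothetical and may differ from what traintrack actually does; 'HWC' below denotes the height, width, channels layout.

```python
def guess_pixel_order(shape, max_channels=4):
    """Hypothetical heuristic: treat a small first or last axis as channels."""
    if len(shape) != 3:
        raise ValueError("expected a 3-D shape")
    first, _, last = shape
    # Check the last axis first, so tiny square images resolve to 'HWC'.
    if last <= max_channels:
        return "HWC"
    if first <= max_channels:
        return "CHW"
    raise ValueError("ambiguous shape; pass pixel_order explicitly")


print(guess_pixel_order((3, 224, 224)))  # CHW
print(guess_pixel_order((224, 224, 3)))  # HWC
```

Any heuristic of this kind can misjudge very small images such as a 3x5x3 array, which is a good reason to pass pixel_order explicitly when the layout is known.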
- info(text)
  Convenience method to send a logging message with INFO log level to the server.
  - Args:
    - text (str): the text to log.
- log(text, level=None)
  Send a logging message to the server.
  - Args:
    - text (str): the text to log.
    - level (str, optional): the log level. If unspecified, defaults to self.default_log_level.
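The relationship between log, default_log_level, and the convenience methods can be sketched with a stand-in that records messages locally instead of sending them. LogRecorder and its sent list are hypothetical; only the fallback rule for level comes from the description above.

```python
class LogRecorder:
    """Hypothetical stand-in; records messages instead of sending them."""

    def __init__(self, default_log_level="INFO"):
        self.default_log_level = default_log_level
        self.sent = []

    def log(self, text, level=None):
        # Fall back to the instance default when no level is given.
        level = self.default_log_level if level is None else level
        self.sent.append((level, text))

    def warn(self, text):
        # Convenience wrapper: same as log() with a fixed level.
        self.log(text, level="WARNING")


rec = LogRecorder(default_log_level="DEBUG")
rec.log("starting")
rec.log("lr decayed", level="INFO")
rec.warn("loss is NaN")
print(rec.sent)
# [('DEBUG', 'starting'), ('INFO', 'lr decayed'), ('WARNING', 'loss is NaN')]
```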
- metric(name, value)
  Report a (scalar) metric like training loss or validation accuracy. The metric will automatically be associated with the current training epoch.
  - Args:
    - name (str): name of the metric (e.g. 'loss/train').
    - value (float): the value of the metric.
- parameter(name, value)
  Report an experiment parameter or hyperparameter (e.g. learning rate).
  - Args:
    - name (str): name of the parameter (e.g. 'lr').
    - value: value of the parameter being used in the experiment.
- progress(completed, total, info=None)
  Report progress on the current epoch.
  - Args:
    - completed (int): number of items (e.g. batches) completed.
    - total (int): number of items (e.g. batches) in total.
    - info (str, optional): extra information to be shown.
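If calling progress on every batch turns out to be too chatty, one option is to report only every few batches and always on the last one. The generator below is a sketch of that idea; progress_steps and its every parameter are hypothetical helpers, not part of traintrack. The completed counts are 1-based, matching the i+1 used in the example at the top of this page.

```python
def progress_steps(total, every=10):
    """Yield the (completed, total) pairs worth reporting."""
    for completed in range(1, total + 1):
        # Report on every `every`-th item and always on the final one.
        if completed % every == 0 or completed == total:
            yield completed, total


print(list(progress_steps(25, every=10)))  # [(10, 25), (20, 25), (25, 25)]
```

Each yielded pair could then be passed on as tracker.progress(completed, total).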
- warn(text)
  Convenience method to send a logging message with WARNING log level to the server.
  - Args:
    - text (str): the text to log.