Usage

Towhee now provides a debug api for pipe to support profiling and tracing. Users can check the execution efficiency of intermediate nodes as well as their outputs by running a pipeline via debug. Also, users are allowed to specify the nodes to trace or not to trace by passing include or exclude args in debug.

example pipeline

from towhee import pipe, ops

p = (
    pipe.input('text')
        .map('text', 'emb', ops.text_embedding.data2vec())
        .output('text', 'emb')
)

v = p.debug('hello', batch=False, tracer=True, profiler=True, include=['embedding'])
v.tracer.show()
v.profiler.show()

arguments

batch: bool, whether to run in batch mode.

tracer: bool, whether to record intermediate nodes info, defaults to False.

profiler: bool, whether to record intermediate nodes efficiency, defaults to False.

include: str/list, the nodes to trace, defaults to None. If specified, only the specifed nodes will be traced.

exclude: str/list, the nodes not to trace, defaults to None.

details for include/exclude

Users are allowed to specify which nodes to include/exclude for tracing. When executing a video embedding task, the video decoder operator will output a large number of video frames, which we are not interested in, so we want to exclude them. On the other hand, sometimes we are particularly concerned about the information of a single operator and don’t care about the others, so we only want to include this operator only. Thus we provide include/exclude parameters in debug():

from towhee import pipe, ops

p = (
	pipe.input('url', 'title')
        .map('title', 'text_embedding', ops.text_embedding.data2vec())
		.flat_map('url', 'frames', ops.video_decode.ffmpeg())
		.map('frames', 'video_embedding', ops.image_embedding.timm(model_name='resnet34'))
		.output('text_embedding', 'video_embedding')
)

Exclude decode node by partial name

res = p.debug('your_video_path', tracer=True, exclude='decode')

Include embedding nodes only

res = p.debug('your_video_path', tracer=True, include=['embedding'])

Notes: Users can specify include/exclude nodes by partial names, note that all the nodes that contains the partial name will be regraded include/exclude. In the example above, we set include=['embedding'], both the text_embedding and image_embedding nodes will be traced in the pipeline.

show tracer

The tracer can be showed in two format, depends on the running enviroment:

res.tracer.show()

When showed in IPython enviroment:
When showed in command lines:

show profiler

Profiler records the performance of each node, including the total time, operator initialization time, data waiting time, operator execution time, etc…

res.profiler.show()

profiler

get pipeline info

Also, users can view the pipeline information in the form of a dict by property v.nodes:

res.nodes

nodes