towhee.engine.pipeline.Pipeline

class towhee.engine.pipeline.Pipeline(graph_repr: GraphRepr, parallelism: int = 1)[source]

Bases: object

The runtime pipeline context, include graph context, all dataframes.

Parameters:
  • graph_repr – (str or towhee.dag.GraphRepr) The graph representation either as a YAML-formatted string, or directly as an instance of GraphRepr.

  • parallelism – (int) The parallelism parameter dictates how many copies of the graph context we create. This is likely a low number (1-4) for local engines, but may be much higher for cloud instances.

Methods

register

Attributes

graph_repr

parallelism

pipeline_type

__call__(inputs: DataFrame) DataFrame[source]

Process an input DataFrame. This function instantiates an output DataFrame; upon completion, individual GraphContext outputs are merged into this dataframe. Inputs are weaved through the input DataFrame for each GraphContext as follows (parallelism = 3):

data[0] -> ctx[0] data[1] -> ctx[1] data[2] -> ctx[2] data[3] -> ctx[0] data[4] -> ctx[1]

Parameters:

inputs (towhee.dataframe.DataFrame) – Input DataFrame (with potentially multiple rows) to process.

Returns:

(towhee.dataframe.DataFrame)

Output DataFrame with ordering matching the input DataFrame.

__init__(graph_repr: GraphRepr, parallelism: int = 1) None[source]
__repr__() str[source]

Return repr(self).