towhee.engine.pipeline.Pipeline
- class towhee.engine.pipeline.Pipeline(graph_repr: GraphRepr, parallelism: int = 1)
Bases: object
The runtime pipeline context, which includes the graph contexts and all of their dataframes.
- Parameters:
graph_repr – (str or towhee.dag.GraphRepr) The graph representation either as a YAML-formatted string, or directly as an instance of GraphRepr.
parallelism – (int) The parallelism parameter dictates how many copies of the graph context we create. This is likely a low number (1-4) for local engines, but may be much higher for cloud instances.
Methods
register
Attributes
graph_repr
parallelism
pipeline_type
- __call__(inputs: DataFrame) -> DataFrame
Process an input DataFrame. This function instantiates an output DataFrame; upon completion, the outputs of the individual GraphContexts are merged into it. Input rows are distributed across the GraphContexts in round-robin order, as follows (parallelism = 3):
data[0] -> ctx[0]
data[1] -> ctx[1]
data[2] -> ctx[2]
data[3] -> ctx[0]
data[4] -> ctx[1]
- Parameters:
inputs (towhee.dataframe.DataFrame) – Input DataFrame (with potentially multiple rows) to process.
- Returns:
(towhee.dataframe.DataFrame) – Output DataFrame with ordering matching the input DataFrame.
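The row-to-context mapping described for __call__ can be illustrated with a minimal, library-independent sketch. It uses plain Python lists in place of DataFrame and GraphContext objects, and the helper names round_robin_split and merge_in_input_order are hypothetical, not part of towhee. It also assumes that each context produces exactly one output per input row, which the documentation above does not state.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def round_robin_split(rows: Sequence[T], parallelism: int) -> List[List[T]]:
    """Assign rows[i] to context number i % parallelism (hypothetical helper)."""
    if parallelism < 1:
        raise ValueError("parallelism must be >= 1")
    # Slice i::parallelism picks rows i, i + parallelism, i + 2 * parallelism, ...
    return [list(rows[i::parallelism]) for i in range(parallelism)]


def merge_in_input_order(per_ctx_outputs: Sequence[Sequence[R]]) -> List[R]:
    """Interleave per-context outputs back into the original row order.

    Assumes one output per input row, so output i lives in context
    i % parallelism at position i // parallelism.
    """
    parallelism = len(per_ctx_outputs)
    total = sum(len(chunk) for chunk in per_ctx_outputs)
    return [per_ctx_outputs[i % parallelism][i // parallelism] for i in range(total)]


rows = ["data[0]", "data[1]", "data[2]", "data[3]", "data[4]"]

# Distribute five rows over three contexts, as in the mapping shown above.
chunks = round_robin_split(rows, parallelism=3)

# Stand-in for per-context processing: upper-case every row.
processed = [[row.upper() for row in chunk] for chunk in chunks]

# Merge so that the output order matches the input order.
merged = merge_in_input_order(processed)
```

With parallelism = 3, ctx[0] receives data[0] and data[3], ctx[1] receives data[1] and data[4], and ctx[2] receives data[2]. Interleaving the per-context results in the same round-robin order restores the input ordering.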