- class towhee.engine.pipeline.Pipeline(graph_repr: GraphRepr, parallelism: int = 1)
The runtime pipeline context, which includes the graph contexts and all dataframes.
graph_repr – (str or towhee.dag.GraphRepr) The graph representation either as a YAML-formatted string, or directly as an instance of GraphRepr.
parallelism – (int) The parallelism parameter dictates how many copies of the graph context we create. This is likely a low number (1-4) for local engines, but may be much higher for cloud instances.
- __call__(inputs: DataFrame) → DataFrame
Process an input DataFrame. This function instantiates an output DataFrame; upon completion, the individual GraphContext outputs are merged into it. Rows of the input DataFrame are interleaved across the GraphContexts as follows (parallelism = 3):
    data[0] -> ctx[0]
    data[1] -> ctx[1]
    data[2] -> ctx[2]
    data[3] -> ctx[0]
    data[4] -> ctx[1]
inputs (towhee.dataframe.DataFrame) – Input DataFrame (with potentially multiple rows) to process.
Output DataFrame with ordering matching the input DataFrame.
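The dispatch-and-merge behaviour described for `__call__` can be sketched in plain Python. This is only an illustration, not Towhee's implementation: it assumes rows are dealt out round-robin (which is how the parallelism = 3 illustration reads), and the names `assign_round_robin` and `run_and_merge` are hypothetical helpers that stand in for the graph contexts with simple lists.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def assign_round_robin(rows: Sequence[T], parallelism: int) -> List[List[Tuple[int, T]]]:
    """Deal rows out to `parallelism` buckets, remembering each row's original index.

    Hypothetical helper: each bucket stands in for one graph context.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    buckets: List[List[Tuple[int, T]]] = [[] for _ in range(parallelism)]
    for index, row in enumerate(rows):
        buckets[index % parallelism].append((index, row))
    return buckets


def run_and_merge(rows: Sequence[T], parallelism: int, process: Callable[[T], R]) -> List[R]:
    """Process every bucket, then merge results back into input order.

    Hypothetical helper: buckets are processed sequentially here; a real
    engine would presumably run them concurrently.
    """
    results: Dict[int, R] = {}
    for bucket in assign_round_robin(rows, parallelism):
        for index, row in bucket:
            results[index] = process(row)
    # Re-assemble by original index so output ordering matches the input.
    return [results[index] for index in range(len(rows))]


rows = ["a", "b", "c", "d", "e"]
print([[index for index, _ in bucket] for bucket in assign_round_robin(rows, 3)])
# [[0, 3], [1, 4], [2]]
print(run_and_merge(rows, 3, str.upper))
# ['A', 'B', 'C', 'D', 'E']
```

Carrying the original row index alongside each row is what lets the merge step restore the input ordering regardless of which bucket finishes first.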
- __repr__() → str