towhee.dag.graph_repr.GraphRepr

class towhee.dag.graph_repr.GraphRepr(name: str, graph_type: str, op_reprs: Dict[str, OperatorRepr], df_reprs: Dict[str, DataFrameRepr], ir: Optional[str] = None)

Bases: BaseRepr

A GraphRepr represents a complete DAG.

A graph contains individual subcomponents, including Operators, Dataframes, and Variables. Graph representations are used during execution to load functions and pass data to the correct operators.

Parameters:
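  • name (str) – The representation name.

  • graph_type (str) – The type of the graph.

  • op_reprs (Dict[str, OperatorRepr]) – The operator representations that make up the graph.

  • df_reprs (Dict[str, DataFrameRepr]) – The dataframe representations that make up the graph.

  • ir (Optional[str]) – Defaults to None.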

Methods

dfs

Depth-First Search the graph.

from_dict

Generate a GraphRepr from a description dict.

from_yaml

Import a YAML file describing this graph.

get_isolated_df

Get the isolated dataframe(s) in the DAG.

get_isolated_op

Get the isolated operator(s) in the DAG.

get_loop

Get the loop(s) inside the graph.

inject_template

is_valid

Check whether the source is a valid YAML description of a component in Towhee.

load_file

Load the representation(s) information from a local YAML file.

load_src

Load the information for the representation.

load_str

Load the representation(s) information from a YAML file (pre-loaded as string).

load_url

Load the representation information from a remote YAML file.

render_template

to_yaml

Export a YAML file describing this graph.

Attributes

dataframes

graph_type

ir

name

operators

__init__(name: str, graph_type: str, op_reprs: Dict[str, OperatorRepr], df_reprs: Dict[str, DataFrameRepr], ir: Optional[str] = None)

static dfs(cur: str, adj: Dict[str, List[str]], flag: Dict[str, int], cur_list: List[str]) → Tuple[bool, List[str]]

Depth-First Search the graph.

Parameters:
  • cur (str) – The name of the current dataframe.

  • adj (Dict[str, List[str]]) – A dict that stores the adjacent dataframes of each dataframe.

  • flag (Dict[str, int]) – A dict that stores the status of each dataframe:

    • 0 means the dataframe has not been visited yet.

    • 1 means the dataframe has been visited in this search, so reaching it again indicates a loop.

    • 2 means the dataframe has been searched and confirmed not to be part of a loop.

  • cur_list (List[str]) – The list of dataframes that have been visited in this search.

Returns:

(Tuple[bool, List[str]])

A tuple whose first element is False if there is no loop; otherwise True, with the loop as the second element.
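The three status values describe a standard three-state depth-first search for cycle detection. The following is a minimal, self-contained sketch of that technique using the same parameter names; it is not Towhee's actual implementation. Returning an empty list when no loop is found, and repeating the first dataframe at the end of the loop, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def dfs(cur: str, adj: Dict[str, List[str]], flag: Dict[str, int],
        cur_list: List[str]) -> Tuple[bool, List[str]]:
    """Return (True, loop) if a loop is reachable from `cur`, else (False, [])."""
    if flag[cur] == 1:
        # `cur` is already on the current search path, so the path closes a loop.
        return True, cur_list[cur_list.index(cur):] + [cur]
    if flag[cur] == 2:
        # `cur` was fully explored earlier and is known to be loop-free.
        return False, []

    flag[cur] = 1            # mark as "visited in this search"
    cur_list.append(cur)
    for nxt in adj.get(cur, []):
        found, loop = dfs(nxt, adj, flag, cur_list)
        if found:
            return True, loop
    cur_list.pop()
    flag[cur] = 2            # mark as "searched, not part of a loop"
    return False, []


# Hypothetical adjacency: df_1 -> df_2 -> df_3 -> df_1, plus a standalone df_4.
adj = {"df_1": ["df_2"], "df_2": ["df_3"], "df_3": ["df_1"], "df_4": []}
flag = {name: 0 for name in adj}
has_loop, loop = dfs("df_1", adj, flag, [])
print(has_loop, loop)
```

A caller that wants every dataframe checked would start a search from each dataframe whose status is still 0.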

static from_dict(info: Dict[Any, Any]) → GraphRepr

Generate a GraphRepr from a description dict.

Parameters:

info (Dict[Any, Any]) – A dict to describe the DAG.

Returns:

(towhee.dag.GraphRepr)

The GraphRepr object.
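As an illustration, a description dict shaped like the YAML example given for from_yaml might look as follows. The exact keys that from_dict expects are not spelled out here, so this layout is an assumption derived from that example; the snippet only builds and inspects the dict and does not call Towhee.

```python
from typing import Any, Dict

# Hypothetical description dict mirroring the YAML example under from_yaml.
info: Dict[str, Any] = {
    "name": "test_graph",
    "operators": [
        {
            "name": "test_op_1",
            "function": "test_function",
            "inputs": [{"df": "test_df_1", "col": 0}],
            "outputs": [{"df": "test_df_2", "col": 0}],
            "iter_info": {"type": "map"},
        }
    ],
    "dataframes": [
        {"name": "test_df_1", "columns": [{"vtype": "int"}]},
        {"name": "test_df_2", "columns": [{"vtype": "int"}]},
    ],
}

# Collect the component names that the description declares.
op_names = [op["name"] for op in info["operators"]]
df_names = [df["name"] for df in info["dataframes"]]
print(op_names, df_names)
```

With Towhee installed, a dict like this would presumably be passed as GraphRepr.from_dict(info).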

static from_yaml(src: str)

Import a YAML file describing this graph. An example YAML file looks like this:

    name: 'test_graph'
    operators:
      - name: 'test_op_1'
        function: 'test_function'
        inputs:
          - df: 'test_df_1'
            col: 0
        outputs:
          - df: 'test_df_2'
            col: 0
        iter_info:
          type: map
    dataframes:
      - name: 'test_df_1'
        columns:
          - vtype: 'int'
      - name: 'test_df_2'
        columns:
          - vtype: 'int'

Parameters:

src (str) – The YAML file to import; it may also be passed pre-loaded as a string.

Returns:

(towhee.dag.GraphRepr)

The GraphRepr object.

get_isolated_df() → Set[str]

Get the isolated dataframe(s) in the DAG.

Returns:

(Set[str])

The set of isolated dataframes, or an empty set if there are none.
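What counts as "isolated" is not defined here. One plausible reading, assumed in the sketch below, is a dataframe that no operator uses as an input or an output. The helper name isolated_dataframes and the dict layout are hypothetical and follow the YAML example for from_yaml, not Towhee's internal data structures.

```python
from typing import Any, Dict, List, Set


def isolated_dataframes(operators: List[Dict[str, Any]],
                        dataframes: List[Dict[str, Any]]) -> Set[str]:
    """Return names of dataframes that no operator reads from or writes to."""
    used: Set[str] = set()
    for op in operators:
        # Each input/output entry names the dataframe it is attached to.
        for port in op.get("inputs", []) + op.get("outputs", []):
            used.add(port["df"])
    return {df["name"] for df in dataframes} - used


operators = [
    {
        "name": "test_op_1",
        "inputs": [{"df": "test_df_1", "col": 0}],
        "outputs": [{"df": "test_df_2", "col": 0}],
    }
]
dataframes = [{"name": "test_df_1"}, {"name": "test_df_2"}, {"name": "test_df_3"}]
isolated = isolated_dataframes(operators, dataframes)
print(isolated)
```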

get_isolated_op() → Set[str]

Get the isolated operator(s) in the DAG.

Returns:

(Set[str])

The set of isolated operators, or an empty set if there are none.

get_loop() → List[str]

Get the loop(s) inside the graph.

Returns:

(List[str])

The loop if one exists, else an empty list.

static is_valid(info: Dict[str, Any], essentials: Set[str]) → bool

Check whether the source is a valid YAML description of a component in Towhee.

Parameters:
  • info (Dict[str, Any]) – The dict loaded from the source file.

  • essentials (Set[str]) – The essential keys that a valid YAML file should contain.

Returns:

(bool)

True if the source file is a valid YAML description of a component in Towhee, else False.
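Given the two parameters, the simplest form such a check could take is a test that every essential key is present in the loaded dict. The sketch below assumes exactly that; the function name has_essential_keys is hypothetical, and the real method may perform further checks.

```python
from typing import Any, Dict, Set


def has_essential_keys(info: Dict[str, Any], essentials: Set[str]) -> bool:
    """Return True if `info` is a dict containing every key in `essentials`."""
    return isinstance(info, dict) and essentials.issubset(info.keys())


# Hypothetical essential keys for a graph description.
essentials = {"name", "operators", "dataframes"}
complete = {"name": "test_graph", "operators": [], "dataframes": []}
incomplete = {"name": "test_graph"}

print(has_essential_keys(complete, essentials))
print(has_essential_keys(incomplete, essentials))
```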

static load_file(file: str) → dict

Load the representation(s) information from a local YAML file.

Parameters:

file (str) – The file path.

Returns:

(dict)

The dict loaded from the YAML file that contains the representation information.

static load_src(file_or_src: str) → dict

Load the information for the representation. Sources can be local files or files served over HTTP or HDFS.

Parameters:

file_or_src (str) – The path to the source YAML file, a URL that points to the source file, or a string loaded from the source file.

Returns:

(dict)

The YAML file loaded as dict.
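Because one argument can be a path, a URL, or raw YAML text, a loader of this kind has to decide which case it is handling before delegating to load_file, load_url, or load_str. The sketch below shows one way to make that decision. The helper name classify_source, the set of URL schemes, and the order of the checks are all assumptions, not Towhee's actual logic.

```python
import os
from urllib.parse import urlparse


def classify_source(file_or_src: str) -> str:
    """Classify the argument as 'url', 'file', or 'string' (raw YAML text)."""
    scheme = urlparse(file_or_src).scheme.lower()
    if scheme in {"http", "https", "hdfs"}:
        # Remote source: would be handled by something like load_url.
        return "url"
    if os.path.isfile(file_or_src):
        # Existing local path: would be handled by something like load_file.
        return "file"
    # Anything else is treated as YAML text, as load_str would expect.
    return "string"


print(classify_source("https://example.com/graph.yaml"))
print(classify_source("name: 'test_graph'\noperators: []\n"))
```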

static load_str(string: str) → dict

Load the representation(s) information from a YAML file (pre-loaded as string).

Parameters:

string (str) – The string pre-loaded from a YAML file.

Returns:

(dict)

The dict loaded from the YAML file that contains the representation information.

static load_url(url: str) → dict

Load the representation information from a remote YAML file.

Parameters:

url (str) – The URL that points to the remote YAML file.

Returns:

(dict)

The dict loaded from the YAML file that contains the representation information.

to_yaml() → str

Export a YAML file describing this graph.

Returns:

(str)

A string with the graph’s serialized contents.