towhee.functional.mixins.dataframe.DataFrameMixin

class towhee.functional.mixins.dataframe.DataFrameMixin[source]

Bases: object

Mixin that helps a DataFrame work with Entity elements.

Examples:

  1. Define an operator with the register decorator:

>>> from towhee import register
>>> from towhee import DataFrame
>>> @register
... def add_1(x):
...     return x+1
  2. Apply the operator to a named field of each entity and save the result to another named field:

>>> (
...     DataFrame([dict(a=1, b=2), dict(a=2, b=3)])
...         .as_entity()
...         .add_1['a', 'c']() # <-- use field `a` as input and field `c` as output
...         .as_str()
...         .to_list()
... )
["{'a': 1, 'b': 2, 'c': 2}", "{'a': 2, 'b': 3, 'c': 3}"]
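The field-indexed call above can be read as "take the input field, run the operator, store the result in the output field". The following is a minimal plain-Python sketch of that idea, not towhee's implementation; `apply_on_fields` is a hypothetical helper and entities are modeled as plain dicts.

```python
def apply_on_fields(rows, func, in_field, out_field):
    """Hypothetical helper: read `in_field`, write func's result to `out_field`."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        new_row[out_field] = func(row[in_field])
        result.append(new_row)
    return result


def add_1(x):
    return x + 1


rows = [dict(a=1, b=2), dict(a=2, b=3)]
out = apply_on_fields(rows, add_1, "a", "c")
print(out)
```

Each output row keeps its original fields and gains the new field `c`, which matches the shape of the result shown in the doctest.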

Select the specified fields of each entity.

Examples:

  1. Select the entity on one specified field:

>>> from towhee import Entity
>>> from towhee import DataFrame
>>> df = DataFrame([Entity(a=i, b=i, c=i) for i in range(2)])
>>> df.select['a']().to_list()
[<Entity dict_keys(['a'])>, <Entity dict_keys(['a'])>]
  2. Select multiple fields and unpack the entity:

>>> (
...     DataFrame([Entity(a=i, b=i, c=i) for i in range(5)])
...         .select['a', 'b']()
...         .as_raw()
...         .to_list()
... )
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
  3. Another field selection syntax (not recommended):

>>> (
...     DataFrame([Entity(a=i, b=i, c=i) for i in range(5)])
...         .select('a', 'b')
...         .as_raw()
...         .to_list()
... )
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
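As a rough illustration of what selecting fields and then unpacking means, here is a plain-Python sketch over dicts. `select_fields` and `unpack` are hypothetical names chosen for this example; the unpacking rule (a bare value for one field, a tuple for several) is inferred from the `as_raw` examples on this page.

```python
def select_fields(rows, *fields):
    """Keep only the named fields of every row."""
    return [{f: row[f] for f in fields} for row in rows]


def unpack(rows):
    """Turn each row into a bare value (one field) or a tuple (several fields)."""
    raw = []
    for row in rows:
        values = tuple(row.values())
        raw.append(values[0] if len(values) == 1 else values)
    return raw


rows = [dict(a=i, b=i, c=i) for i in range(5)]
pairs = unpack(select_fields(rows, "a", "b"))
singles = unpack(select_fields(rows, "a"))
print(pairs)
print(singles)
```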

Methods

as_entity

Convert elements into Entities.

as_json

Convert entities to JSON strings.

as_raw

Convert entities into raw Python values.

dropna

Drop entities that contain some specific values.

fill_entity

When the DataFrame's iterable consists of Entities and some fields are missing, fill those fields with default values.

parse_json

Parse string to entities.

rename

Rename columns in a DataFrame.

replace

Replace specific attribute values with given values.

Attributes

df

select

Select columns from a DC.

__init__()[source]
as_entity(schema: Optional[List[str]] = None)[source]

Convert elements into Entities.

Parameters:

schema (Optional[List[str]]) – List of field names to assign to the elements.

Examples:

  1. Convert dicts into entities:

>>> from towhee import DataFrame
>>> (
...     DataFrame([dict(a=1, b=2), dict(a=2, b=3)])
...         .as_entity()
...         .as_str()
...         .to_list()
... )
["{'a': 1, 'b': 2}", "{'a': 2, 'b': 3}"]
  2. Convert tuples into entities:

>>> from towhee import DataFrame
>>> (
...     DataFrame([(1, 2), (2, 3)])
...         .as_entity(schema=['a', 'b'])
...         .as_str()
...         .to_list()
... )
["{'a': 1, 'b': 2}", "{'a': 2, 'b': 3}"]
  3. Convert single values into entities:

>>> from towhee import DataFrame
>>> (
...     DataFrame([1, 2])
...         .as_entity(schema=['a'])
...         .as_str()
...         .to_list()
... )
["{'a': 1}", "{'a': 2}"]
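The three input shapes above (dict, tuple, single value) can be summarized in one small function. This is a sketch under stated assumptions, not the library's code: `to_entity` is a hypothetical helper and `types.SimpleNamespace` stands in for towhee's `Entity`.

```python
from types import SimpleNamespace


def to_entity(item, schema=None):
    """Hypothetical helper that mirrors the three cases shown in the examples."""
    if isinstance(item, dict):
        # Dicts already carry their field names.
        return SimpleNamespace(**item)
    if isinstance(item, tuple):
        # Tuples are paired positionally with the schema names.
        return SimpleNamespace(**dict(zip(schema, item)))
    # Any other value is stored under the first schema name.
    return SimpleNamespace(**{schema[0]: item})


from_dicts = [to_entity(x) for x in [dict(a=1, b=2), dict(a=2, b=3)]]
from_tuples = [to_entity(x, schema=["a", "b"]) for x in [(1, 2), (2, 3)]]
from_values = [to_entity(x, schema=["a"]) for x in [1, 2]]
print([vars(e) for e in from_tuples])
```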
as_json()[source]

Convert entities to JSON strings.

Examples:

>>> from towhee import DataFrame, Entity
>>> (
...     DataFrame([Entity(x=1)])
...         .as_json()
... )
['{"x": 1}']
as_raw()[source]

Convert entities into raw Python values.

Examples:

  1. Unpack multiple values from entities:

>>> from towhee import DataFrame
>>> (
...     DataFrame([(1, 2), (2, 3)])
...         .as_entity(schema=['a', 'b'])
...         .as_raw()
...         .to_list()
... )
[(1, 2), (2, 3)]
  2. Unpack a single value from entities:

>>> (
...     DataFrame([1, 2])
...         .as_entity(schema=['a'])
...         .as_raw()
...         .to_list()
... )
[1, 2]
dropna(na: Set[str] = {'', None}) → Union[bool, DataFrame][source]

Drop entities that contain some specific values.

Parameters:

na (Set[str]) – Entities that contain any value in na are dropped.

Examples:

>>> from towhee import Entity, DataFrame
>>> entities = [Entity(a=i, b=i + 1) for i in range(3)]
>>> entities.append(Entity(a=3, b=''))
>>> df = DataFrame(entities)
>>> df
[<Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>]
>>> df.dropna()
[<Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>]
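The filtering rule can be sketched in plain Python as "keep a row only if none of its values is in `na`". This is an illustrative sketch with dicts in place of Entities; `drop_na` is a hypothetical name, and a tuple is used for `na` so that unhashable values do not raise during the membership test.

```python
def drop_na(rows, na=("", None)):
    """Keep only the rows in which no value appears in `na`."""
    return [row for row in rows if not any(v in na for v in row.values())]


rows = [dict(a=i, b=i + 1) for i in range(3)]
rows.append(dict(a=3, b=""))  # this row holds an "empty" value
kept = drop_na(rows)
print(len(rows), len(kept))
```

Note that `0` is not treated as missing here, because `0` is equal to neither `''` nor `None`.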
fill_entity(_DefaultKVs: Optional[Dict[str, Any]] = None, _ReplaceNoneValue: bool = False, **kws)[source]

When the DataFrame’s iterable consists of Entities and some fields are missing, fill those fields with default values.

Parameters:
  • _ReplaceNoneValue (bool) – Whether to replace None in Entity’s value.

  • _DefaultKVs (Dict[str, Any]) – The key-value pairs stored in a dict.

Examples:

>>> from towhee import Entity, DataFrame
>>> entities = [Entity(num=i) for i in range(3)]
>>> df = DataFrame(entities)
>>> df
[<Entity dict_keys(['num'])>, <Entity dict_keys(['num'])>, <Entity dict_keys(['num'])>]
>>> kvs = {'foo': 'bar'}
>>> df.fill_entity(kvs).fill_entity(usage='test').to_list()
[<Entity dict_keys(['num', 'foo', 'usage'])>, <Entity dict_keys(['num', 'foo', 'usage'])>, <Entity dict_keys(['num', 'foo', 'usage'])>]
>>> kvs = {'FOO': None}
>>> df.fill_entity(_ReplaceNoneValue=True, _DefaultKVs=kvs).to_list()[0].FOO
0
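The default-filling part of this method can be pictured as "add each default key only where it is missing". The sketch below covers just that part, with dicts in place of Entities; `fill_defaults` is a hypothetical name, and the `_ReplaceNoneValue` behavior is deliberately not modeled because its exact rules are not spelled out here.

```python
def fill_defaults(rows, defaults=None, **kws):
    """Add missing keys from `defaults` and `kws`; existing keys are left alone."""
    merged = dict(defaults or {})
    merged.update(kws)
    filled = []
    for row in rows:
        new_row = dict(row)
        for key, value in merged.items():
            new_row.setdefault(key, value)  # only fills when the key is absent
        filled.append(new_row)
    return filled


rows = [{"num": i} for i in range(3)]
filled = fill_defaults(fill_defaults(rows, {"foo": "bar"}), usage="test")
print(list(filled[0]))
```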
parse_json()[source]

Parse string to entities.

Examples:

>>> from towhee import DataFrame
>>> df = (
...     DataFrame(['{"x": 1}'])
...         .parse_json()
... )
>>> df[0].x
1
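For intuition, parsing JSON strings into attribute-style objects and serializing them back can be done with the standard library alone. This is a stdlib sketch, not towhee's code: `types.SimpleNamespace` stands in for `Entity`, and the two helper functions are hypothetical stand-ins named after the methods they illustrate.

```python
import json
from types import SimpleNamespace


def parse_json(strings):
    """Parse each JSON object string into an attribute-style object."""
    return [SimpleNamespace(**json.loads(s)) for s in strings]


def as_json(entities):
    """Serialize each object's attributes back to a JSON string."""
    return [json.dumps(vars(e)) for e in entities]


entities = parse_json(['{"x": 1}'])
round_trip = as_json(entities)
print(entities[0].x, round_trip)
```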
rename(column: Dict[str, str])[source]

Rename columns in a DataFrame.

Parameters:

column (Dict[str, str]) – Mapping from each existing column name to its new name.

Examples:

>>> from towhee import Entity, DataFrame
>>> entities = [Entity(a=i, b=i + 1) for i in range(3)]
>>> df = DataFrame(entities)
>>> df
[<Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>]
>>> df.rename(column={'a': 'A', 'b': 'B'})
[<Entity dict_keys(['A', 'B'])>, <Entity dict_keys(['A', 'B'])>, <Entity dict_keys(['A', 'B'])>]
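Renaming with a mapping amounts to rebuilding each row with translated keys. A minimal sketch with dicts follows; `rename_fields` is a hypothetical helper, and leaving unmapped fields under their old name is an assumption of this sketch rather than documented behavior.

```python
def rename_fields(rows, column):
    """Rebuild each row, translating keys through the `column` mapping."""
    return [{column.get(k, k): v for k, v in row.items()} for row in rows]


rows = [dict(a=i, b=i + 1) for i in range(3)]
renamed = rename_fields(rows, {"a": "A", "b": "B"})
print(renamed[0])
```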
replace(**kws)[source]

Replace specific attribute values with given values.

Examples:

>>> from towhee import Entity, DataFrame
>>> entities = [Entity(num=i) for i in range(5)]
>>> df = DataFrame(entities)
>>> [i.num for i in df]
[0, 1, 2, 3, 4]
>>> df = df.replace(num={0: 1, 1: 2, 2: 3, 3: 4, 4: 5})
>>> [i.num for i in df]
[1, 2, 3, 4, 5]
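The example result suggests that each value is looked up in the mapping once, so replacements do not chain (0 becomes 1 and is not then turned into 2). The sketch below models that single lookup with dicts; `replace_values` is a hypothetical helper, not the library's implementation.

```python
def replace_values(rows, **kws):
    """For each `field={old: new}` keyword, swap matching values exactly once."""
    out = []
    for row in rows:
        new_row = dict(row)
        for field, mapping in kws.items():
            # Look up the ORIGINAL value so replacements never cascade.
            if field in row and row[field] in mapping:
                new_row[field] = mapping[row[field]]
        out.append(new_row)
    return out


rows = [{"num": i} for i in range(5)]
replaced = replace_values(rows, num={0: 1, 1: 2, 2: 3, 3: 4, 4: 5})
print([r["num"] for r in replaced])
```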
property select

Select columns from a DC.

Examples:

>>> from towhee import Entity, DataFrame
>>> entities = [Entity(a=i, b=i, c=i) for i in range(3)]
>>> dc = DataFrame(entities)
>>> dc.select('a')
[<Entity dict_keys(['a'])>, <Entity dict_keys(['a'])>, <Entity dict_keys(['a'])>]