towhee.functional.mixins.column.ColumnMixin

class towhee.functional.mixins.column.ColumnMixin

Bases: object

Mixin that adds support for column-based storage.

Methods

cmap

Chunked map: apply a unary operation one chunk at a time.

from_arrow_table

get_chunksize

set_chunksize

Set the chunk size (rows per chunk) of the underlying Arrow table.

to_column

Convert the iterable to a column-based table.

class ModeFlag(value)

Bases: Flag

An enumeration.

__init__() -> None

cmap(unary_op)

Chunked map: apply unary_op one chunk at a time.

Examples:

>>> import towhee
>>> dc = towhee.dc['a'](range(10))
>>> dc = dc.to_column()
>>> dc = dc.runas_op['a', 'b'](func=lambda x: x+1)
>>> dc.show(limit=5, tablefmt='plain')
  a    b
  0    1
  1    2
  2    3
  3    4
  4    5
>>> dc._iterable
pyarrow.Table
a: int64
b: int64
----
a: [[0,1,2,3,4,5,6,7,8,9]]
b: [[1,2,3,4,5,6,7,8,9,10]]
>>> len(dc._iterable)
10
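
The idea of a chunked map can be sketched in plain Python. This is only an illustration of the concept, not towhee's implementation: the `chunked_map` helper, its parameters, and the dict-of-lists table are all hypothetical stand-ins for the Arrow-backed data shown above.

```python
from typing import Callable, Dict, List

Columns = Dict[str, List[int]]


def chunked_map(
    columns: Columns,
    unary_op: Callable[[int], int],
    src: str,
    dst: str,
    chunksize: int,
) -> Columns:
    """Apply unary_op to column `src` in slices of `chunksize` rows.

    Hypothetical helper: results are collected into a new column `dst`,
    and the input mapping is left unmodified.
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    values = columns[src]
    out: List[int] = []
    for start in range(0, len(values), chunksize):
        chunk = values[start:start + chunksize]  # one chunk of rows
        out.extend(unary_op(x) for x in chunk)
    result = dict(columns)  # shallow copy so the caller's dict is untouched
    result[dst] = out
    return result


table = chunked_map({"a": list(range(10))}, lambda x: x + 1, "a", "b", chunksize=4)
```

The result carries the same `a` and `b` columns as the doctest above, whatever chunk size is chosen; chunking changes how the work is batched, not the values produced.
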
set_chunksize(chunksize)

Set the chunk size (rows per chunk) of the underlying Arrow table.

Examples:

>>> import towhee
>>> dc_1 = towhee.dc['a'](range(20))
>>> dc_1 = dc_1.set_chunksize(10)
>>> dc_2 = dc_1.runas_op['a', 'b'](func=lambda x: x+1)
>>> dc_1.get_chunksize(), dc_2.get_chunksize()
(10, 10)
>>> dc_2._iterable.chunks()
[pyarrow.Table
a: int64
b: int64
----
a: [[0,1,2,3,4,5,6,7,8,9]]
b: [[1,2,3,4,5,6,7,8,9,10]], pyarrow.Table
a: int64
b: int64
----
a: [[10,11,12,13,14,15,16,17,18,19]]
b: [[11,12,13,14,15,16,17,18,19,20]]]
>>> dc_3 = towhee.dc['a'](range(20)).stream()
>>> dc_3 = dc_3.set_chunksize(10)
>>> dc_4 = dc_3.runas_op['a', 'b'](func=lambda x: x+1)
>>> dc_4._iterable.chunks()
[pyarrow.Table
a: int64
b: int64
----
a: [[0,1,2,3,4,5,6,7,8,9]]
b: [[1,2,3,4,5,6,7,8,9,10]], pyarrow.Table
a: int64
b: int64
----
a: [[10,11,12,13,14,15,16,17,18,19]]
b: [[11,12,13,14,15,16,17,18,19,20]]]
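
The effect of the chunk size in the example above, 20 rows split into two tables of 10, can be mimicked with a small plain-Python sketch. The `split_chunks` helper and the dict-of-lists representation are assumptions made for illustration; they are not part of towhee or pyarrow.

```python
from typing import Dict, List

Columns = Dict[str, List[int]]


def split_chunks(columns: Columns, chunksize: int) -> List[Columns]:
    """Split a column-based table into consecutive chunks of `chunksize` rows.

    Hypothetical helper: assumes every column has the same length.
    The last chunk may be shorter than `chunksize`.
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    n_rows = len(next(iter(columns.values()), []))
    return [
        {name: values[start:start + chunksize] for name, values in columns.items()}
        for start in range(0, n_rows, chunksize)
    ]


chunks = split_chunks({"a": list(range(20)), "b": list(range(1, 21))}, chunksize=10)
```

Here `chunks` holds two tables whose `a` columns cover 0 to 9 and 10 to 19, matching the layout printed by the doctest.
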
to_column()

Convert the iterable to a column-based table.

Examples:

>>> from towhee import Entity, DataFrame
>>> e = [Entity(a=a, b=b) for a,b in zip(['abc', 'def', 'ghi'], [1,2,3])]
>>> df = DataFrame(e)
>>> df
[<Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>]
>>> df.to_column()
pyarrow.Table
a: string
b: int64
----
a: [["abc","def","ghi"]]
b: [[1,2,3]]
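
Conceptually, the conversion turns a sequence of row-like records into one list per field. A minimal plain-Python sketch of that idea follows; the `rows_to_columns` helper is hypothetical, and plain dicts stand in for `Entity` objects and for the Arrow table.

```python
from typing import Any, Dict, Iterable, List


def rows_to_columns(rows: Iterable[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row dicts into a dict of column lists.

    Hypothetical helper: assumes every row has the same keys.
    """
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


rows = [{"a": a, "b": b} for a, b in zip(["abc", "def", "ghi"], [1, 2, 3])]
columns = rows_to_columns(rows)
```

The resulting `columns` mapping has the same shape as the table printed above: one list of strings for `a` and one list of integers for `b`.
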