towhee.functional.mixins.column.ColumnMixin
- class towhee.functional.mixins.column.ColumnMixin
Bases: object
Mixins to support column-based storage.
Methods
cmap: Chunked map.
from_arrow_table
get_chunksize
set_chunksize: Set the chunk size for Arrow.
to_column: Convert the iterables to a column-based table.
- cmap(unary_op)
chunked map
Examples:
>>> import towhee
>>> dc = towhee.dc['a'](range(10))
>>> dc = dc.to_column()
>>> dc = dc.runas_op['a', 'b'](func=lambda x: x+1)
>>> dc.show(limit=5, tablefmt='plain')
  a    b
  0    1
  1    2
  2    3
  3    4
  4    5
>>> dc._iterable
pyarrow.Table
a: int64
b: int64
----
a: [[0,1,2,3,4,5,6,7,8,9]]
b: [[1,2,3,4,5,6,7,8,9,10]]
>>> len(dc._iterable)
10
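The idea of a chunked map can be sketched in plain Python, independent of Towhee and Arrow. This is only an illustration of one plausible reading of "chunked map", not Towhee's implementation: the helper names `chunked` and `cmap_sketch` are hypothetical, and the sketch assumes that `unary_op` receives a whole chunk (here a list) rather than a single element.

```python
from typing import Callable, Iterable, Iterator, List


def chunked(values: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` items from `values`."""
    if size <= 0:
        raise ValueError("size must be positive")
    chunk: List[int] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter chunk
        yield chunk


def cmap_sketch(
    unary_op: Callable[[List[int]], List[int]],
    values: Iterable[int],
    size: int,
) -> List[List[int]]:
    """Apply `unary_op` once per chunk (hypothetical helper, not Towhee API)."""
    return [unary_op(chunk) for chunk in chunked(values, size)]


# The operation sees five values at a time instead of one.
result = cmap_sketch(lambda chunk: [x + 1 for x in chunk], range(10), size=5)
print(result)  # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
```

Calling the operation once per chunk rather than once per element is what makes column-based storage attractive for vectorized operations.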
- set_chunksize(chunksize)
Set the chunk size for Arrow.
Examples:
>>> import towhee
>>> dc_1 = towhee.dc['a'](range(20))
>>> dc_1 = dc_1.set_chunksize(10)
>>> dc_2 = dc_1.runas_op['a', 'b'](func=lambda x: x+1)
>>> dc_1.get_chunksize(), dc_2.get_chunksize()
(10, 10)
>>> dc_2._iterable.chunks()
[pyarrow.Table
a: int64
b: int64
----
a: [[0,1,2,3,4,5,6,7,8,9]]
b: [[1,2,3,4,5,6,7,8,9,10]], pyarrow.Table
a: int64
b: int64
----
a: [[10,11,12,13,14,15,16,17,18,19]]
b: [[11,12,13,14,15,16,17,18,19,20]]]
>>> dc_3 = towhee.dc['a'](range(20)).stream()
>>> dc_3 = dc_3.set_chunksize(10)
>>> dc_4 = dc_3.runas_op['a', 'b'](func=lambda x: x+1)
>>> dc_4._iterable.chunks()
[pyarrow.Table
a: int64
b: int64
----
a: [[0,1,2,3,4,5,6,7,8,9]]
b: [[1,2,3,4,5,6,7,8,9,10]], pyarrow.Table
a: int64
b: int64
----
a: [[10,11,12,13,14,15,16,17,18,19]]
b: [[11,12,13,14,15,16,17,18,19,20]]]
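How a chunk size interacts with a streamed source can be sketched without Towhee. The function `stream_chunks` below is a hypothetical helper, not part of the Towhee API; it assumes chunks are plain dictionaries of columns and applies the same `x + 1` operation as the examples above, pulling only `chunksize` values from the source at a time.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def stream_chunks(source: Iterable[int], chunksize: int) -> Iterator[Dict[str, List[int]]]:
    """Lazily yield column chunks of at most `chunksize` rows (hypothetical helper)."""
    iterator = iter(source)
    while True:
        # islice consumes only the next `chunksize` values from the stream.
        rows = list(islice(iterator, chunksize))
        if not rows:
            return
        # Column 'a' holds the inputs, column 'b' the per-chunk result of x + 1.
        yield {"a": rows, "b": [x + 1 for x in rows]}


chunks = stream_chunks(iter(range(20)), chunksize=10)
first = next(chunks)   # at this point only values 0..9 have been read
second = next(chunks)  # values 10..19
print(first["b"])      # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(second["b"])     # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```

Because the generator holds at most one chunk in memory, the chunk size bounds memory use regardless of how long the stream is.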
- to_column()
Convert the iterables to a column-based table.
Examples:
>>> from towhee import Entity, DataFrame
>>> e = [Entity(a=a, b=b) for a,b in zip(['abc', 'def', 'ghi'], [1,2,3])]
>>> df = DataFrame(e)
>>> df
[<Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>]
>>> df.to_column()
pyarrow.Table
a: string
b: int64
----
a: [["abc","def","ghi"]]
b: [[1,2,3]]
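The row-to-column conversion shown above can be illustrated with a minimal pure-Python sketch. It is not how Towhee or Arrow performs the conversion: `rows_to_columns` is a hypothetical helper, rows are modeled as plain dictionaries instead of `Entity` objects, and every row is assumed to carry the same keys.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot a list of row dicts into a dict of column lists (hypothetical helper).

    Assumes every row has the same keys; otherwise columns end up with unequal lengths.
    """
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


# Same data as the Entity example above, modeled as plain dicts.
rows = [{"a": a, "b": b} for a, b in zip(["abc", "def", "ghi"], [1, 2, 3])]
columns = rows_to_columns(rows)
print(columns)  # {'a': ['abc', 'def', 'ghi'], 'b': [1, 2, 3]}
```

Each column ends up as one contiguous list of a single type, which mirrors the `a: string` and `b: int64` columns in the Arrow table above.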