towhee.functional.mixins.column.ColumnMixin
- class towhee.functional.mixins.column.ColumnMixin
Bases: object
Mixins to support column-based storage.
Methods
cmap – Chunked map.
create_arrow_table – Convert kwargs to Table.
get_chunksize
set_chunksize – Set the chunk size for Arrow.
to_column – Convert the DataCollection to column-based table DataCollection.
- cmap(unary_op)
Chunked map.
- Parameters:
unary_op (callable) – The operation to map.
- Returns:
A new DataCollection after mapping.
- Return type:
DataCollection
Examples
>>> import towhee
>>> dc = towhee.dc['a'](range(10))
>>> dc = dc.to_column()
>>> dc = dc.runas_op['a', 'b'](func=lambda x: x+1)
>>> dc.show(limit=5, tablefmt='plain')
  a    b
  0    1
  1    2
  2    3
  3    4
  4    5
>>> dc._iterable
pyarrow.Table
a: int64
b: int64
----
a: [[0,1,2,3,4,5,6,7,8,9]]
b: [[1,2,3,4,5,6,7,8,9,10]]
>>> len(dc._iterable)
10
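The idea behind a chunked map can be illustrated without towhee or pyarrow. The following is a minimal sketch, not towhee's implementation: it assumes a table is represented as a list of chunks, each chunk being a plain dict of column name to list of values. The names `Chunk`, `chunked_map`, and `add_column_b` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for one chunk of a column-based table:
# column name -> list of values for the rows in that chunk.
Chunk = Dict[str, List[int]]


def chunked_map(chunks: List[Chunk], unary_op: Callable[[Chunk], Chunk]) -> List[Chunk]:
    """Apply unary_op once per chunk, keeping the chunk boundaries unchanged."""
    return [unary_op(chunk) for chunk in chunks]


def add_column_b(chunk: Chunk) -> Chunk:
    """Return a copy of the chunk with a new column b = a + 1."""
    return {**chunk, "b": [x + 1 for x in chunk["a"]]}


# Two chunks of five rows each, holding the values 0..9 in column "a".
chunks = [{"a": [0, 1, 2, 3, 4]}, {"a": [5, 6, 7, 8, 9]}]
mapped = chunked_map(chunks, add_column_b)
```

Because the operation receives a whole chunk rather than a single row, it can process a column's values in bulk, which is presumably the motivation for mapping over chunks in a column-based layout.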
- classmethod create_arrow_table(**kws)
Convert kwargs to Table.
- Returns:
The Table from the kwargs.
- Return type:
pyarrow.Table
- set_chunksize(chunksize)
Set the chunk size for Arrow.
- Parameters:
chunksize (int) – How many rows per chunk.
- Returns:
New DataCollection converted to Table.
- Return type:
DataCollection
Examples
>>> import towhee
>>> dc_1 = towhee.dc['a'](range(20))
>>> dc_1 = dc_1.set_chunksize(10)
>>> dc_2 = dc_1.runas_op['a', 'b'](func=lambda x: x+1)
>>> dc_1.get_chunksize(), dc_2.get_chunksize()
(10, 10)
>>> dc_2._iterable.chunks()
[pyarrow.Table
a: int64
b: int64
----
a: [[0,1,2,3,4,5,6,7,8,9]]
b: [[1,2,3,4,5,6,7,8,9,10]], pyarrow.Table
a: int64
b: int64
----
a: [[10,11,12,13,14,15,16,17,18,19]]
b: [[11,12,13,14,15,16,17,18,19,20]]]

>>> dc_3 = towhee.dc['a'](range(20)).stream()
>>> dc_3 = dc_3.set_chunksize(10)
>>> dc_4 = dc_3.runas_op['a', 'b'](func=lambda x: x+1)
>>> dc_4._iterable.chunks()
[pyarrow.Table
a: int64
b: int64
----
a: [[0,1,2,3,4,5,6,7,8,9]]
b: [[1,2,3,4,5,6,7,8,9,10]], pyarrow.Table
a: int64
b: int64
----
a: [[10,11,12,13,14,15,16,17,18,19]]
b: [[11,12,13,14,15,16,17,18,19,20]]]
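The streamed example above yields the same chunks as the in-memory one, which suggests the rows are grouped lazily as they arrive. A minimal sketch of that grouping in plain Python, assuming a single column and using only the standard library, might look as follows. The function `iter_chunks` and its `column` parameter are hypothetical and are not part of towhee.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def iter_chunks(rows: Iterable[int], chunksize: int, column: str = "a") -> Iterator[Dict[str, List[int]]]:
    """Lazily group a stream of values into chunks of at most `chunksize` rows."""
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    iterator = iter(rows)
    while True:
        # Pull at most `chunksize` values; an empty block means the stream is exhausted.
        block = list(islice(iterator, chunksize))
        if not block:
            return
        yield {column: block}


# A one-shot iterator stands in for a streamed DataCollection.
chunks = list(iter_chunks(iter(range(20)), chunksize=10))
```

Only one chunk is held in memory at a time while iterating, so the same code works for inputs that are too large to materialize. If the row count is not a multiple of `chunksize`, the last chunk in this sketch is simply shorter.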
- to_column()
Convert the DataCollection to column-based table DataCollection.
- Returns:
The current DC converted to Table DC.
- Return type:
DataCollection
Examples
>>> from towhee import Entity, DataFrame
>>> e = [Entity(a=a, b=b) for a,b in zip(['abc', 'def', 'ghi'], [1,2,3])]
>>> df = DataFrame(e)
>>> df
[<Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>, <Entity dict_keys(['a', 'b'])>]
>>> df.to_column()
pyarrow.Table
a: string
b: int64
----
a: [["abc","def","ghi"]]
b: [[1,2,3]]
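The row-to-column conversion in the example above can be sketched in plain Python. This is an illustration of the layout change only, not towhee's implementation: ordinary dicts stand in for `Entity` objects, a dict of lists stands in for the Arrow table, and `rows_to_columns` is a hypothetical helper.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot a list of row dicts into a dict of column lists."""
    if not rows:
        return {}
    names = list(rows[0].keys())
    columns: Dict[str, List[Any]] = {name: [] for name in names}
    for row in rows:
        # Every row must carry exactly the same fields as the first one.
        if set(row) != set(names):
            raise ValueError("all rows must have the same fields")
        for name in names:
            columns[name].append(row[name])
    return columns


# Same data as the Entity example: three rows with fields a and b.
rows = [{"a": a, "b": b} for a, b in zip(["abc", "def", "ghi"], [1, 2, 3])]
columns = rows_to_columns(rows)
```

After the pivot, all values of one field sit next to each other, which is what allows a column-based table to attach a single type (such as `string` or `int64`) to each column.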