towhee.datacollection.data_collection.DataCollection

class towhee.datacollection.data_collection.DataCollection(data)[source]

Bases: DisplayMixin

A pythonic computation and processing framework.

DataCollection is a pythonic computation and processing framework for unstructured data in machine learning and data science. It allows a data scientist or researcher to assemble data processing pipelines and do their model work (embedding, transforming, or classification) with a method-chaining style API.

Parameters:

data ('towhee.runtime.DataQueue') – The data to store in the DataCollection, provided as a DataQueue.

Examples

>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq.put(('a', 'b1'))
True
>>> DataCollection(dq)
<DataCollection Schema[a: ColumnType.SCALAR, b: ColumnType.QUEUE] SIZE 1>

Methods

as_str

copy

Copy a DataCollection.

show

Print the first n lines of a DataCollection.

to_list

Convert DataCollection to list.

__add__(another: DataCollection) → DataCollection[source]

Concatenate two DataCollections that share the same schema.

Note that this function will consume the data in the second DataCollection.

Parameters:

another ('DataCollection') – Another DataCollection to concatenate.

Examples

>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq1 = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq2 = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq1.put(('a', 'b1'))
True
>>> dq2.put(('a', 'b2'))
True
>>> dc1 = DataCollection(dq1)
>>> dc2 = DataCollection(dq2)
>>> len(dc1)
1
>>> len(dc2)
1
>>> len(dc1 + dc2)
2

__getitem__(index: int)[source]

Get the item at the given index.

Examples

>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq.put(('a', 'b1'))
True
>>> dc = DataCollection(dq)
>>> dc[0]
<Entity dict_keys(['a', 'b'])>

__init__(data)[source]

__repr__() → str[source]

String representation of the DataCollection.

Examples

>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dc = DataCollection(dq)
>>> repr(dc)
'<DataCollection Schema[a: ColumnType.SCALAR, b: ColumnType.QUEUE] SIZE 0>'

__setitem__(index: int, value: Any)[source]

Set the item at the given index to the given value.

Examples

>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq.put(('a', 'b1'))
True
>>> dc = DataCollection(dq)
>>> dc[0] = 'a'
>>> dc[0]
'a'

copy(deep: bool = False)[source]

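Copy a DataCollection. By default the copy is shallow and shares its items with the original; pass deep=True to copy the items as well.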

Examples

>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq.put(('a', 'b1'))
True
>>> dc = DataCollection(dq)
>>> dc_copy = dc.copy()
>>> dc_dcopy = dc.copy(True)
>>> id(dc) == id(dc_copy)
False
>>> id(dc[0]) == id(dc_copy[0])
True
>>> id(dc) == id(dc_dcopy)
False
>>> id(dc[0]) == id(dc_dcopy[0])
False
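
The identity checks above follow the usual shallow versus deep copy distinction in Python. The following is a minimal sketch of that distinction using only the standard library; the `Row` class and the plain list are hypothetical stand-ins, not towhee's Entity or DataCollection, and nothing here is meant to describe how `copy` is implemented internally.

```python
import copy


class Row:
    """Hypothetical stand-in for an item stored in a collection."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


# A plain list plays the role of the collection in this sketch.
rows = [Row(a="a", b="b1")]

shallow = copy.copy(rows)    # new container, same item objects
deep = copy.deepcopy(rows)   # new container, new item objects

print(shallow is rows)        # False: the container itself is new
print(shallow[0] is rows[0])  # True: items are shared with the original
print(deep is rows)           # False: the container itself is new
print(deep[0] is rows[0])     # False: items were copied as well
```

With a shallow copy, mutating an item through either container is visible through both; a deep copy isolates the two.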
show(limit=5, header=None, tablefmt='html', formatter={})

Print the first n lines of a DataCollection.

Parameters:
  • limit (int, optional) – The number of lines to print. Prints all if limit is negative. Defaults to 5.

  • header (_type_, optional) – The field names. Defaults to None.

  • tablefmt (str, optional) – The output format; supports html, plain, and grid. Defaults to 'html'.
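
The `limit` rule described above (first n rows, or everything when the value is negative) can be sketched in plain Python. `head_rows` is a hypothetical helper written for illustration only; it is not part of towhee and does not reproduce the table rendering that `show` performs.

```python
from typing import Any, List, Sequence


def head_rows(rows: Sequence[Any], limit: int = 5) -> List[Any]:
    """Hypothetical helper: select the rows a `limit`-style argument would keep."""
    if limit < 0:
        # A negative limit means "no limit": keep every row.
        return list(rows)
    # Otherwise keep at most `limit` rows from the front.
    return list(rows[:limit])


rows = [("a", f"b{i}") for i in range(8)]

first_five = head_rows(rows)            # default limit of 5
everything = head_rows(rows, limit=-1)  # negative limit keeps all rows

print(len(first_five))
print(len(everything))
```

A sentinel such as a negative number keeps the parameter a plain int while still allowing an "unbounded" setting.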

to_list() → list[source]

Convert DataCollection to list.

Examples

>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq.put(('a', 'b1'))
True
>>> dc = DataCollection(dq)
>>> dc.to_list()
[<Entity dict_keys(['a', 'b'])>]