towhee.datacollection.data_collection.DataCollection
- class towhee.datacollection.data_collection.DataCollection(data)
Bases: DisplayMixin
A pythonic computation and processing framework.
DataCollection is a pythonic computation and processing framework for unstructured data in machine learning and data science. It allows a data scientist or researcher to assemble data processing pipelines and do their model work (embedding, transforming, or classification) with a method-chaining style API.
- Parameters:
data ('towhee.runtime.DataQueue') – The data to be stored in the DataCollection, in the form of a DataQueue.
Examples
>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq.put(('a', 'b1'))
True
>>> DataCollection(dq)
<DataCollection Schema[a: ColumnType.SCALAR, b: ColumnType.QUEUE] SIZE 1>
Methods
as_str
copy – Copy a DataCollection.
show – Print the first n lines of a DataCollection.
to_list – Convert DataCollection to list.
- __add__(another: DataCollection) DataCollection
Concatenate two DataCollections that have the same schema.
Note that this function consumes the data in the second DataCollection.
- Parameters:
another ('DataCollection') – Another DataCollection to concat.
Examples
>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq1 = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq2 = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq1.put(('a', 'b1'))
True
>>> dq2.put(('a', 'b2'))
True
>>> dc1 = DataCollection(dq1)
>>> dc2 = DataCollection(dq2)
>>> len(dc1)
1
>>> len(dc2)
1
>>> len(dc1 + dc2)
2
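The note about consuming the second DataCollection can be illustrated with a small stand-in. This is a sketch only: `ToyCollection` is a hypothetical class built on `collections.deque`, not towhee's implementation, and it assumes that "consume" means the right-hand operand is drained while the left-hand operand is left untouched. The source does not say what happens to the left-hand operand.

```python
from collections import deque


class ToyCollection:
    """Hypothetical queue-backed collection; NOT towhee code."""

    def __init__(self, schema, rows=()):
        self.schema = tuple(schema)
        self._rows = deque(rows)

    def __len__(self):
        return len(self._rows)

    def __add__(self, other):
        # Concatenation is only defined when both sides share a schema.
        if self.schema != other.schema:
            raise ValueError("schemas differ")
        # Copy the left-hand rows into a new collection (assumption).
        merged = ToyCollection(self.schema, self._rows)
        # Draining the right-hand queue is what "consumes" its data.
        while other._rows:
            merged._rows.append(other._rows.popleft())
        return merged


left = ToyCollection(("a", "b"), [("a", "b1")])
right = ToyCollection(("a", "b"), [("a", "b2")])
combined = left + right
print(len(combined), len(left), len(right))  # 2 1 0
```

In practice this suggests taking any measurements of the second collection, such as its length, before the concatenation, as the doctest above does.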
- __getitem__(index: int)
Get the item at the given index.
Examples
>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq.put(('a', 'b1'))
True
>>> dc = DataCollection(dq)
>>> dc[0]
<Entity dict_keys(['a', 'b'])>
- __repr__() str
String representation of the DataCollection.
Examples
>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dc = DataCollection(dq)
>>> repr(dc)
'<DataCollection Schema[a: ColumnType.SCALAR, b: ColumnType.QUEUE] SIZE 0>'
- __setitem__(index: int, value: Any)
Set the item at the given index to the given value.
Examples
>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq.put(('a', 'b1'))
True
>>> dc = DataCollection(dq)
>>> dc[0] = 'a'
>>> dc[0]
'a'
- copy(deep: bool = False)
Copy a DataCollection.
Examples
>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq.put(('a', 'b1'))
True
>>> dc = DataCollection(dq)
>>> dc_copy = dc.copy()
>>> dc_dcopy = dc.copy(True)
>>> id(dc) == id(dc_copy)
False
>>> id(dc[0]) == id(dc_copy[0])
True
>>> id(dc) == id(dc_dcopy)
False
>>> id(dc[0]) == id(dc_dcopy[0])
False
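The identity checks above match the usual shallow versus deep copy distinction. The sketch below reproduces it with Python's standard `copy` module on a plain list of dicts; the list is only a stand-in for a collection of entities, and the assumption that `DataCollection.copy` behaves analogously under mutation is not confirmed by the source.

```python
import copy

# Stand-in for the entities held by a collection (hypothetical data).
rows = [{"a": "a", "b": "b1"}]

shallow = copy.copy(rows)      # new container, same element objects
deep = copy.deepcopy(rows)     # new container, new element objects

# Both copies are distinct containers.
assert shallow is not rows and deep is not rows
# Only the shallow copy shares its elements with the original.
assert shallow[0] is rows[0]
assert deep[0] is not rows[0]

# Mutating an original element is visible through the shallow copy only.
rows[0]["b"] = "changed"
print(shallow[0]["b"])  # changed
print(deep[0]["b"])     # b1
```

A deep copy is the safer choice when the copy will be modified independently; a shallow copy avoids duplicating large payloads.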
- show(limit=5, header=None, tablefmt='html', formatter={})
Print the first n lines of a DataCollection.
- Parameters:
limit (int, optional) – The number of lines to print. Prints all if limit is negative. Defaults to 5.
header (optional) – The field names. Defaults to None.
tablefmt (str, optional) – The format of the output; supports html, plain, and grid. Defaults to 'html'.
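The `limit` rule described above (a negative value means "all rows") can be sketched as a small helper. `first_rows` is a hypothetical function written for illustration and is not part of towhee; the behaviour for `limit=0` is not stated in the source, so returning an empty list here is an assumption.

```python
def first_rows(rows, limit=5):
    """Select the rows a show-style preview would display (hypothetical helper)."""
    if limit < 0:
        # A negative limit means "no truncation".
        return list(rows)
    # Otherwise keep at most `limit` rows from the front.
    return list(rows[:limit])


data = list(range(8))
print(first_rows(data))        # [0, 1, 2, 3, 4]
print(first_rows(data, -1))    # [0, 1, 2, 3, 4, 5, 6, 7]
print(first_rows(data, 2))     # [0, 1]
```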
- to_list() list
Convert DataCollection to list.
Examples
>>> from towhee.runtime.data_queue import DataQueue, ColumnType
>>> from towhee.datacollection.data_collection import DataCollection
>>> dq = DataQueue([('a', ColumnType.SCALAR), ('b', ColumnType.QUEUE)])
>>> dq.put(('a', 'b1'))
True
>>> dc = DataCollection(dq)
>>> dc.to_list()
[<Entity dict_keys(['a', 'b'])>]