towhee.functional.mixins.dataset.DatasetMixin

class towhee.functional.mixins.dataset.DatasetMixin

Bases: object

Mixin for working with datasets.

Methods

from_df

from_glob – Generate a list of files matching a pattern.

read_csv

read_json

read_zip – Load files from a URL or path.

split_train_test – Split a DataCollection into train and test data.

to_csv – Save the DataCollection as a CSV file.

classmethod from_glob(*args)

Generate a list of files matching the given glob pattern(s).
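
To illustrate what a pattern-based file list looks like, here is a minimal sketch built only on the standard library's glob module. It is not towhee's implementation: the helper name list_files and the sorted ordering are assumptions made for this example, and from_glob itself may order or wrap its results differently.

```python
import glob
import os
import tempfile


def list_files(*patterns):
    """Return a sorted list of paths matching any of the given glob patterns.

    Hypothetical helper for illustration only; not part of towhee.
    """
    files = []
    for pattern in patterns:
        files.extend(glob.glob(pattern))
    return sorted(files)


# Build a throwaway directory with a few files to match against.
tmp_dir = tempfile.mkdtemp()
for name in ("a.jpg", "b.jpg", "notes.txt"):
    with open(os.path.join(tmp_dir, name), "w") as f:
        f.write("")

# Only the two .jpg files match the pattern.
jpg_files = list_files(os.path.join(tmp_dir, "*.jpg"))
jpg_names = [os.path.basename(p) for p in jpg_files]
print(jpg_names)
```

Accepting several patterns mirrors the *args signature above, so one call can cover, for example, both *.jpg and *.png.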

classmethod read_zip(url, pattern, mode='r')

Load files from a zip archive given by a URL or local path.

Parameters:
  • url (Union[str, Path]) – The URL or local path of the zip archive.

  • pattern (str) – The filename pattern to extract.

  • mode (str) – The file open mode. Defaults to 'r'.

Returns:

A file handle for a file in the zip archive.

Return type:

(File)
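
The pattern-based extraction described above can be approximated with the standard library's zipfile and fnmatch modules. This is a sketch under stated assumptions, not towhee's code: the helper name read_matching is hypothetical, it handles only a local path or file-like object (no URL download), and it returns each member's bytes rather than an open file handle.

```python
import fnmatch
import io
import zipfile


def read_matching(zip_src, pattern):
    """Yield (name, bytes) for each archive member whose name matches pattern.

    Hypothetical helper for illustration only; not part of towhee.
    """
    with zipfile.ZipFile(zip_src, "r") as zf:
        for name in fnmatch.filter(zf.namelist(), pattern):
            with zf.open(name, mode="r") as member:
                yield name, member.read()


# Build a small archive in memory so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("img/cat.jpg", b"cat-bytes")
    zf.writestr("img/dog.jpg", b"dog-bytes")
    zf.writestr("readme.txt", b"hello")
buffer.seek(0)

# Collect only the members whose names end in .jpg.
matched = dict(read_matching(buffer, "*.jpg"))
print(sorted(matched))
```

Note that fnmatch treats "*" as matching any characters, including "/", so "*.jpg" also matches members inside subdirectories of the archive.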

split_train_test(size: list = [0.9, 0.1], **kws)

Split a DataCollection into train and test data.

Parameters:

size (list) – The fractions of the data assigned to the train and test sets, in that order. Defaults to [0.9, 0.1].

Examples:

>>> from towhee.functional import DataCollection
>>> dc = DataCollection.range(10)
>>> train, test = dc.split_train_test(shuffle=False)
>>> train.to_list()
[0, 1, 2, 3, 4, 5, 6, 7, 8]
>>> test.to_list()
[9]
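
For readers who want to see the arithmetic behind the example, the following sketch reproduces the same 9/1 split on a plain Python list. It is an approximation for illustration, not towhee's implementation: the helper name split_by_fraction and the seed parameter are hypothetical, and shuffle is modelled on the keyword passed in the example above.

```python
import random


def split_by_fraction(items, size=(0.9, 0.1), shuffle=True, seed=None):
    """Split items into (train, test) lists according to the fractions in size.

    Hypothetical helper for illustration only; not part of towhee.
    """
    items = list(items)
    if shuffle:
        # A seeded Random instance keeps the shuffle reproducible.
        random.Random(seed).shuffle(items)
    # Number of train items: the first fraction's share of the total.
    cut = round(len(items) * size[0] / sum(size))
    return items[:cut], items[cut:]


train, test = split_by_fraction(range(10), shuffle=False)
print(train)
print(test)
```

With shuffling disabled, the first 90% of the items go to the train list and the remainder to the test list, which matches the doctest output above.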

to_csv(csv_path: Union[str, Path], encoding: str = 'utf-8-sig')

Save the DataCollection as a CSV file.

Parameters:
  • csv_path (Union[str, Path]) – The path to save the DataCollection to.

  • encoding (str) – The encoding to use in the output file. Defaults to 'utf-8-sig'.
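
The default 'utf-8-sig' encoding writes a UTF-8 byte order mark (BOM) at the start of the file, which some spreadsheet tools use to detect UTF-8. The sketch below shows that effect with the standard library's csv module. It is not towhee's implementation: the helper name save_rows and the row layout are assumptions made for this example.

```python
import csv
import os
import tempfile


def save_rows(rows, csv_path, encoding="utf-8-sig"):
    """Write an iterable of rows to csv_path using the given encoding.

    Hypothetical helper for illustration only; not part of towhee.
    """
    with open(csv_path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)


csv_path = os.path.join(tempfile.mkdtemp(), "out.csv")
save_rows([("id", "label"), (1, "cat"), (2, "dog")], csv_path)

# The raw bytes start with the UTF-8 BOM when 'utf-8-sig' is used.
with open(csv_path, "rb") as f:
    raw = f.read()
has_bom = raw.startswith(b"\xef\xbb\xbf")

# Reading back with the same encoding strips the BOM transparently.
with open(csv_path, newline="", encoding="utf-8-sig") as f:
    rows_back = list(csv.reader(f))
print(has_bom, rows_back)
```

Passing encoding="utf-8" instead would omit the BOM, which may be preferable when the file is consumed by tools that do not expect one.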