towhee.dataframe.dataframe.DataFrame¶
- class towhee.dataframe.dataframe.DataFrame(name: str, columns: List[Tuple[str, str]])[source]¶
Bases:
object
A DataFrame is a collection of immutable, potentially heterogeneous blobs of data.
- Parameters:
columns (list[Tuple[str, Any]]) – The list of the column names and their corresponding data types.
name (str) – Name of the dataframe; a DataFrame's name should match its representation.
Methods
An acknowledgement (ack) notifies the DataFrame that the rows already iterated over are no longer needed and can be garbage collected, up to but not including the given offset.
gc
Garbage collection for blocked iterator list.
Garbage collection function to trigger towhee.Array GC.
Get data from dataframe based on offset and how many values requested.
Window based data retrieval.
Used by map based iterators to notify the dataframe that it is waiting on a value.
Used by window based iterators to notify the dataframe that it is waiting on a value.
Append a new value to the dataframe.
Registering an iter allows the df to keep track of where each iterator is in order to coordinate garbage collection and other processes.
Removing the iterator from the df.
Function to seal the dataframe.
Forceful unblocking/killing of all blocked iter.
Forceful unblocking/killing of the selected iter.
Attributes
columns
current_size
data
iterators
name
sealed
size
types
- ack(offset: int, iter_id: int)[source]¶
An acknowledgement (ack) notifies the DataFrame that the rows already iterated over are no longer needed and can be garbage collected, up to but not including the given offset.
- Parameters:
offset (int) – The latest accepted offset.
iter_id (int) – The iterator id to set offset for.
- get(offset: int, count: int = 1, iter_id: int = False)[source]¶
Get data from dataframe based on offset and how many values requested.
- Parameters:
offset (int) – The starting index to read data from.
count (int) – How many values to retrieve.
iter_id (int) – Which iterator is reading the data.
- Returns:
Different states of response codes depending on the dataframe state.
- Return type:
- get_window(start: int, end: int, step: int, comparator: str, iter_id: int = False)[source]¶
Window based data retrieval. Windows include everything up to but not including the cutoff.
- Parameters:
start (int) – The starting index to read data from.
end (int) – The current upper window limit.
step (int) – How far the window slides for the next call.
comparator ('timestamp' | 'row_id') – Which value is being used for window calculations.
iter_id (int) – Which iterator is reading the data.
- Returns:
Different states of response codes depending on the dataframe state.
- Return type:
Situations:
- Unsealed:
  1. Window start is greater than the latest value in df:
     [0, 1, 2, 3] window = (5, 10)
     block for a value greater than window start
  2. Window end is greater than the latest value in df:
     [0, 1, 2, 3] window = (0, 10)
     block for a value greater than window end
  3. Window end is smaller than all values in df:
     [5, 6, 7, 8] window = (1, 4)
     return None and continue to next window
  4. Values exist between window start and window end:
     [0, 1, 3, 4] window = (0, 3)
     return values, clear data up to first value
- Sealed:
  1. Window start is greater than the latest value in df:
     [0, 1, 2, 3] window = (5, 10)
     stop iteration
  2. Window end is greater than the latest value in df:
     [0, 1, 2, 3] window = (0, 10)
     return remaining window values and stop iteration
  3. Window end is smaller than all values in df:
     [5, 6, 7, 8] window = (1, 4)
     return None and continue to next window
  4. Values exist between window start and window end:
     [0, 1, 3, 4] window = (0, 3)
     return values, clear data up to first value
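The situations above can be read as a small decision table. The following is a minimal sketch of that table in plain Python, not towhee's implementation: the function name `classify_window`, the action labels, and the handling of cases the list does not mention (for example a window that falls into a gap between existing values, which is treated here like situation 3) are assumptions made for illustration. It assumes `values` is a sorted, non-empty list of comparator values and that the window is half-open, `[start, end)`.

```python
def classify_window(values, start, end, sealed):
    """Return (action, rows) for the half-open window [start, end).

    `values` is assumed to be a sorted, non-empty list of the comparator
    values (timestamps or row ids) currently held. Action labels are
    hypothetical names, not towhee response codes.
    """
    latest = values[-1]
    # Windows include everything up to but not including the cutoff.
    in_window = [v for v in values if start <= v < end]

    if start > latest:
        # Situation 1: nothing has reached the window yet.
        return ("stop", None) if sealed else ("block_on_start", None)
    if end > latest:
        # Situation 2: the window may still receive more values.
        return ("return_and_stop", in_window) if sealed else ("block_on_end", None)
    if not in_window:
        # Situation 3: the window is already behind the data.
        return ("skip_window", None)
    # Situation 4: the window is complete and non-empty.
    return ("return_values", in_window)


unsealed_case_1 = classify_window([0, 1, 2, 3], 5, 10, sealed=False)
sealed_case_2 = classify_window([0, 1, 2, 3], 0, 10, sealed=True)
case_3 = classify_window([5, 6, 7, 8], 1, 4, sealed=False)
case_4 = classify_window([0, 1, 3, 4], 0, 3, sealed=False)
```

In this sketch, sealing only changes the outcome of situations 1 and 2, which matches the list above: a sealed dataframe can never receive the value an iterator would otherwise block on.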
- notify_map_block(event: Event, offset: int, count: int, iter_id: int)[source]¶
Used by map based iterators to notify the dataframe that it is waiting on a value.
- Parameters:
event (threading.Event) – The event to trigger once the value being waited on is available.
offset (int) – The offset that is being waited on.
count (int) – How many values are being waited on.
iter_id (int) – The iterator id of the blocked iterator.
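The blocking handshake described here follows the usual `threading.Event` pattern: the iterator hands over an event, then waits on it until the requested rows exist. The sketch below illustrates that pattern with a hypothetical stand-in class, `TinyFrame`; it is not towhee's `DataFrame`, it omits `iter_id`, and its `get` returns plain rows rather than response codes.

```python
import threading


class TinyFrame:
    """Hypothetical stand-in used only to illustrate the Event handshake."""

    def __init__(self):
        self._rows = []
        self._lock = threading.Lock()
        self._waiters = []  # (event, offset, count) for blocked readers

    def notify_map_block(self, event, offset, count):
        with self._lock:
            if len(self._rows) >= offset + count:
                event.set()  # rows arrived before the reader registered
            else:
                self._waiters.append((event, offset, count))

    def put(self, row):
        with self._lock:
            self._rows.append(row)
            still_waiting = []
            for event, offset, count in self._waiters:
                if len(self._rows) >= offset + count:
                    event.set()  # wake the reader that wanted these rows
                else:
                    still_waiting.append((event, offset, count))
            self._waiters = still_waiting

    def get(self, offset, count):
        with self._lock:
            return self._rows[offset:offset + count]


def wait_for_rows(frame, offset, count, timeout=2.0):
    """Block until `count` rows starting at `offset` exist, then read them."""
    event = threading.Event()
    frame.notify_map_block(event, offset, count)
    if not event.wait(timeout):
        return None  # timed out; a real iterator might be unblocked instead
    return frame.get(offset, count)


def demo():
    frame = TinyFrame()
    result = []
    reader = threading.Thread(
        target=lambda: result.append(wait_for_rows(frame, 0, 2))
    )
    reader.start()
    frame.put(("a",))
    frame.put(("b",))
    reader.join()
    return result[0]
```

Checking the row count inside `notify_map_block` under the same lock that `put` uses avoids a lost wake-up when the rows arrive just before the reader registers.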
- notify_window_block(event: Event, edge: str, cutoff: Tuple[str, int], iter_id: int)[source]¶
Used by window based iterators to notify the dataframe that it is waiting on a value.
- Parameters:
iter_id (int) – The iterator id of the blocked iterator.
event (threading.Event) – The event to trigger once the value being waited on is available.
edge ('start' | 'end') – Whether the iterator is waiting for a value >= the window start or for a value greater than the window end.
cutoff (“timestamp” | “row_id”, int) – The window cutoff that is being waited on.
- put(item) → None[source]¶
Append a new value to the dataframe.
- Parameters:
item (dict | list | Array | tuple) – The row to insert.
- register_iter()[source]¶
Registering an iter allows the df to keep track of where each iterator is in order to coordinate garbage collection and other processes.
- Returns:
An iterator ID that is used for a majority of iterator-dataframe interaction.
- remove_iter(iter_id: int)[source]¶
Removing the iterator from the df. Allows for more garbage collecting.
- Parameters:
iter_id (int) – The iterator’s id.
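One way to picture how `register_iter`, `ack`, and `remove_iter` interact with garbage collection is to track one acknowledged offset per iterator and treat the smallest one as the cutoff below which rows are no longer needed. The sketch below is a hypothetical model of that bookkeeping, not towhee's implementation; the class name `IterRegistry`, the `gc_cutoff` helper, and the choice of 0 as the cutoff when no iterators are registered are all assumptions.

```python
class IterRegistry:
    """Hypothetical per-iterator offset bookkeeping for illustration only."""

    def __init__(self):
        self._next_id = 0
        self._offsets = {}  # iter_id -> latest acknowledged offset

    def register_iter(self):
        iter_id = self._next_id
        self._next_id += 1
        self._offsets[iter_id] = 0  # a new iterator has acknowledged nothing
        return iter_id

    def ack(self, offset, iter_id):
        # Rows before `offset` (exclusive) are no longer needed by this iterator.
        self._offsets[iter_id] = max(self._offsets[iter_id], offset)

    def remove_iter(self, iter_id):
        self._offsets.pop(iter_id, None)

    def gc_cutoff(self):
        # Rows with index < cutoff are not needed by any registered iterator.
        return min(self._offsets.values(), default=0)


registry = IterRegistry()
fast = registry.register_iter()
slow = registry.register_iter()
registry.ack(5, fast)
registry.ack(2, slow)
cutoff_with_slow = registry.gc_cutoff()

registry.remove_iter(slow)
cutoff_without_slow = registry.gc_cutoff()
```

In this model the slowest iterator holds the cutoff back, so removing it lets more rows be collected, which is consistent with the note that removing an iterator allows for more garbage collecting.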