---
title: "PyTorch"
weight: 4
type: docs
aliases:
---
This requires `torch` to be installed (for example, via `pip install torch`).
You can read all the data into a `torch.utils.data.Dataset` or `torch.utils.data.IterableDataset`:
```python
from torch.utils.data import DataLoader

table_read = read_builder.new_read()
dataset = table_read.to_torch(splits, streaming=True)
dataloader = DataLoader(
    dataset,
    batch_size=2,
    num_workers=2,  # Concurrency to read data
    shuffle=False
)

# Collect all data from dataloader
for batch_idx, batch_data in enumerate(dataloader):
    print(batch_data)

# output:
# {'user_id': tensor([1, 2]), 'behavior': ['a', 'b']}
# {'user_id': tensor([3, 4]), 'behavior': ['c', 'd']}
# {'user_id': tensor([5, 6]), 'behavior': ['e', 'f']}
# {'user_id': tensor([7, 8]), 'behavior': ['g', 'h']}
```
When the `streaming` parameter is `True`, the data is read iteratively; when it is `False`, the entire dataset is loaded into memory.
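For smaller tables, a non-streaming read can be convenient because the resulting in-memory dataset can be shuffled by the `DataLoader`. Below is a minimal sketch, assuming the same `read_builder` and `splits` as above and that `to_torch(..., streaming=False)` returns a map-style dataset:

```python
from torch.utils.data import DataLoader

table_read = read_builder.new_read()

# Non-streaming read: the full table is materialized in memory
# (assumption: streaming=False yields a map-style Dataset).
dataset = table_read.to_torch(splits, streaming=False)

# A map-style dataset supports random access, so the DataLoader can shuffle it.
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

for batch_data in dataloader:
    print(batch_data)
```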