
Speeding up the data loader

자월현 2021. 8. 9.

https://stackoverflow.com/questions/9619199/best-way-to-preserve-numpy-arrays-on-disk


https://discuss.pytorch.org/t/how-to-speed-up-the-data-loader/13740


According to the discussion above, npy/npz or HDF5 are the best formats for saving NumPy arrays to disk.
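A minimal sketch of the npy/npz route (the array shapes and file names here are arbitrary, just for illustration):

```python
import os
import tempfile

import numpy as np

arr = np.random.rand(1000, 64).astype(np.float32)
labels = np.arange(1000)

with tempfile.TemporaryDirectory() as tmp:
    # Single array: .npy stores raw binary plus dtype/shape header, so loading
    # is basically one big read (no per-object deserialization like pickle).
    npy_path = os.path.join(tmp, "features.npy")
    np.save(npy_path, arr)
    loaded = np.load(npy_path)
    assert np.array_equal(arr, loaded)

    # Multiple arrays in one file: .npz is a zip of .npy files.
    # np.savez_compressed trades save/load speed for smaller files.
    npz_path = os.path.join(tmp, "dataset.npz")
    np.savez(npz_path, features=arr, labels=labels)
    with np.load(npz_path) as data:
        assert data["features"].shape == (1000, 64)
        assert data["labels"][-1] == 999
```

For arrays too large for RAM, `np.load(path, mmap_mode="r")` memory-maps the .npy file so a Dataset can slice rows without loading the whole array.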

NVIDIA DALI is said to be useful for data augmentation, but I still need to check whether it helps even when no augmentation is applied, and whether it stays fast when image data is mixed with other kinds of data.

A speed and storage-size comparison of the different formats is available in the GitHub repo below!

https://github.com/epignatelli/array_storage_benchmark
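To get a rough local feel for the benchmark's claim (timings are machine- and size-dependent; the array size and file names below are arbitrary), a quick save/load comparison against pickle can be sketched as:

```python
import os
import pickle
import tempfile
import time

import numpy as np

arr = np.random.rand(2000, 2000)  # ~32 MB of float64


def timed(fn):
    """Return the wall-clock seconds taken by fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "arr.npy")
    pkl_path = os.path.join(tmp, "arr.pkl")

    t_npy_save = timed(lambda: np.save(npy_path, arr))
    with open(pkl_path, "wb") as f:
        t_pkl_save = timed(
            lambda: pickle.dump(arr, f, protocol=pickle.HIGHEST_PROTOCOL)
        )

    t_npy_load = timed(lambda: np.load(npy_path))
    with open(pkl_path, "rb") as f:
        t_pkl_load = timed(lambda: pickle.load(f))

    print(f"save: npy {t_npy_save:.3f}s  pickle {t_pkl_save:.3f}s")
    print(f"load: npy {t_npy_load:.3f}s  pickle {t_pkl_load:.3f}s")
```

Note that modern pickle protocols also serialize NumPy arrays as raw buffers, so the gap is smaller than with the old cPickle the Stack Overflow question mentions; the repo's numbers cover many more formats (HDF5, zarr, etc.).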
