So in DiffuserCamMirflickrHF.__init__() we get the dataset by this line:
self.dataset = load_dataset(repo_id, split=split)
Sometimes this call is very slow, and sometimes the connection breaks and it does not retry.
It would be nice if we could pass parameters such as num_proc, cache_dir, etc. through to that call, just to make life easier.
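A rough sketch of what I have in mind, not a patch against the actual class: extra keyword arguments are forwarded untouched to load_dataset, with a small retry loop around the call. The names DiffuserCamMirflickrHFSketch, load_with_retries, loader, retries, and delay are made up for illustration. Catching OSError is also an assumption, since I have not checked which exception the library raises when the connection drops.

```python
import time


def load_with_retries(loader, repo_id, split, retries=3, delay=1.0, **load_kwargs):
    """Call loader(repo_id, split=split, **load_kwargs), retrying on failure.

    Assumes retries >= 1. OSError also covers the built-in ConnectionError;
    the real library may raise something else on a dropped connection.
    """
    for attempt in range(1, retries + 1):
        try:
            return loader(repo_id, split=split, **load_kwargs)
        except OSError:
            if attempt == retries:
                raise  # out of attempts, surface the original error
            time.sleep(delay)


class DiffuserCamMirflickrHFSketch:
    """Hypothetical stand-in for DiffuserCamMirflickrHF, showing only the loading step."""

    def __init__(self, repo_id, split, loader=None, retries=3, delay=1.0, **load_kwargs):
        if loader is None:
            # Imported lazily so the sketch can be exercised with a fake loader.
            from datasets import load_dataset as loader
        # load_kwargs carries things like num_proc=..., cache_dir=... straight through.
        self.dataset = load_with_retries(
            loader, repo_id, split, retries=retries, delay=delay, **load_kwargs
        )
```

With something like this, a caller could write DiffuserCamMirflickrHFSketch(repo_id, split, num_proc=4, cache_dir="/some/dir") and both arguments would reach load_dataset unchanged.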