Open
Labels
bug (Something isn't working)
Description
------- Training failed with an exception -------
Traceback (most recent call last):
  File "/mnt/lustre/work/martius/mot030/code/dmlcloud/dmlcloud/core/pipeline.py", line 282, in run
    self._pre_run()
  File "/mnt/lustre/work/martius/mot030/code/dmlcloud/dmlcloud/core/pipeline.py", line 318, in _pre_run
    callback.pre_run(self)
  File "/mnt/lustre/work/martius/mot030/code/dmlcloud/dmlcloud/core/callbacks.py", line 587, in pre_run
    handle = torch.cuda._get_pynvml_handler(pipe.device)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/lustre/work/martius/mot030/conda/envs/torch-nightly/lib/python3.12/site-packages/torch/cuda/__init__.py", line 1073, in _get_pynvml_handler
    handle = pynvml.nvmlDeviceGetHandleByIndex(device)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/lustre/work/martius/mot030/conda/envs/torch-nightly/lib/python3.12/site-packages/pynvml.py", line 2604, in nvmlDeviceGetHandleByIndex
    _nvmlCheckReturn(ret)
  File "/mnt/lustre/work/martius/mot030/conda/envs/torch-nightly/lib/python3.12/site-packages/pynvml.py", line 1042, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_Unknown: Unknown Error
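
A possible mitigation, not a confirmed fix: the traceback shows the NVML handle lookup in the callback's pre_run aborting the whole run. The sketch below wraps that lookup so a failure is logged and None is returned instead. The helper name safe_nvml_handle is hypothetical and not part of dmlcloud, and the sketch assumes the callback can skip its GPU statistics when no handle is available. It does not explain why NVML reports "Unknown Error" in the first place.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def safe_nvml_handle(get_handle: Callable[[], Any]) -> Optional[Any]:
    """Return get_handle(), or None if the lookup raises.

    Hypothetical helper for illustration only. pynvml.NVMLError derives
    from Exception, so the broad except below also covers
    NVMLError_Unknown as seen in the traceback.
    """
    try:
        return get_handle()
    except Exception as exc:
        logger.warning("NVML handle lookup failed (%s); continuing without it", exc)
        return None
```

In the callback, the failing line could then read `handle = safe_nvml_handle(lambda: torch.cuda._get_pynvml_handler(pipe.device))`, with any later use of the handle guarded by a `handle is not None` check.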