FAQ
Frequently asked questions and commonly encountered issues when running OpenFold.
Setup
-
When running unit tests (e.g.
./scripts/run_unit_tests.sh
), I see an error such as ImportError: version GLIBCXX_3.4.30 not found
Solution: Make sure that the
$LD_LIBRARY_PATH
environment variable has been set to include the conda library path, e.g. export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
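The fix above can be sketched as follows. This is a minimal sketch, assuming a conda environment is active and that its libstdc++ sits at `$CONDA_PREFIX/lib/libstdc++.so.6` (the usual location, but an assumption here). The helper name `prepend_path` is hypothetical, and `strings` comes from binutils.

```shell
# Hypothetical helper: prepend a directory to a colon-separated path list
# without leaving a trailing ':' when the list is empty.
prepend_path() {
    dir="$1"
    current="$2"
    if [ -z "$current" ]; then
        printf '%s\n' "$dir"
    else
        printf '%s\n' "$dir:$current"
    fi
}

# Only touch the environment when a conda environment is active.
if [ -n "${CONDA_PREFIX:-}" ]; then
    LD_LIBRARY_PATH="$(prepend_path "$CONDA_PREFIX/lib" "${LD_LIBRARY_PATH:-}")"
    export LD_LIBRARY_PATH
    # Check whether the conda libstdc++ provides the symbol version from the error.
    # The library path below is an assumption about the environment layout.
    strings "$CONDA_PREFIX/lib/libstdc++.so.6" 2>/dev/null | grep 'GLIBCXX_3.4.30' \
        || echo "GLIBCXX_3.4.30 not found under $CONDA_PREFIX/lib"
fi
```

If the `grep` prints nothing, the conda environment itself likely ships an older libstdc++, and setting the path alone will not resolve the import error.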
-
I see a CUDA mismatch error, e.g.
The detected CUDA version (11.8) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.
Solution: Ensure that your system's CUDA driver and toolkit match your intended OpenFold installation (CUDA 11 by default). You can check the CUDA driver version with a command such as
nvidia-smi
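To see which two versions are in conflict, it can help to print the toolkit version and the version PyTorch was compiled against side by side. The sketch below assumes `nvcc` and `python3` are on the `PATH`. The helper name `parse_nvcc_release` is hypothetical. Note that the "CUDA Version" field in `nvidia-smi` reports the highest version the driver supports, not the installed toolkit.

```shell
# Hypothetical helper: extract "major.minor" from `nvcc --version` output,
# which contains a line such as "Cuda compilation tools, release 11.8, V11.8.89".
parse_nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# CUDA toolkit version, as seen by the compiler used to build the custom kernels.
if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA toolkit (nvcc): $(nvcc --version | parse_nvcc_release)"
else
    echo "nvcc not found on PATH"
fi

# CUDA version PyTorch was compiled with (torch.version.cuda).
if command -v python3 >/dev/null 2>&1; then
    python3 -c 'import torch; print("PyTorch built with CUDA:", torch.version.cuda)' 2>/dev/null \
        || echo "PyTorch is not importable in this environment"
fi
```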
- I get an error involving
fatal error: cuda_runtime.h: No such file or directory
and/or
ninja: build stopped: subcommand failed.
Solution: Something went wrong with setting up some of the custom kernels. Try running
install_third_party_dependencies.sh
again, or try python3 setup.py install
from inside the OpenFold folder. Make sure to prepend the conda library path to $LD_LIBRARY_PATH as described above before running either command.
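Before rebuilding, one way to check whether the CUDA headers are visible at all is to search for `cuda_runtime.h`. This is a sketch under assumptions: it looks only under `$CONDA_PREFIX` and `$CUDA_HOME` (the variable PyTorch's extension builder consults), and the helper name `find_header` is hypothetical.

```shell
# Hypothetical helper: print the first file named $2 found under directory $1.
find_header() {
    find "$1" -name "$2" -print 2>/dev/null | head -n 1
}

# Search the active conda environment and CUDA_HOME, when they are set.
for root in "${CONDA_PREFIX:-}" "${CUDA_HOME:-}"; do
    if [ -n "$root" ] && [ -d "$root" ]; then
        hit="$(find_header "$root" cuda_runtime.h || true)"
        echo "$root: ${hit:-cuda_runtime.h not found}"
    fi
done
```

If neither location contains the header, the CUDA toolkit headers are probably not installed in the environment used for the build, and rerunning the install step alone is unlikely to help.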
Training
-
My model training is hanging on the data loading step:
Solution: While each system is different, here are a few general suggestions:
  - Check your $KMP_AFFINITY environment setting and see if it is suitable for your system.
  - Adjust the number of data workers used to prepare data with the --num_workers setting. Increasing the number could help with dataset processing speed. However, too many workers could cause an out-of-memory (OOM) issue.
-
When I reload my pretrained model weights or checkpoints, I get
RuntimeError: Error(s) in loading state_dict for OpenFoldWrapper: Unexpected key(s) in state_dict:
Solution: This suggests that your checkpoint / model weights are in OpenFold v1 format with outdated model layer names. Convert your weights/checkpoints following this guide.