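1. Download the job script archive 512_si_pbe_md.tgz
2. Upload it to the server
3. Extract the archive and edit run.sh
In run.sh, change export CUDA_VISIBLE_DEVICES=3 to the index of the GPU you want to use.
GPU indices can be checked with nvidia-smi.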
    [pengge@mstation ok]$ tar -zxf 512_si_pbe_md.tgz
    [pengge@mstation ok]$ cd 512_si_pbe_md
    [pengge@mstation 512_si_pbe_md]$ vim run.sh

    #!/bin/sh
    module load mkl mpi
    module load cuda/12.1
    module load pwmat
    export CUDA_VISIBLE_DEVICES=3
    mpirun -np 1 PWmat | tee output
4. Run the script with ./run.sh. To stop the job, press Ctrl+C.
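The GPU choice in step 3 can also be scripted. The following is a minimal sketch, assuming the idle GPU is the one with the least memory in use; pick_idle_gpu is a hypothetical helper name and is not part of PWmat or nvidia-smi. It reads "index, memory.used" lines, which nvidia-smi can print through its --query-gpu option, and prints one GPU index.

```sh
#!/bin/sh
# Sketch: choose the GPU index with the least memory in use.
# Input (stdin): one "index, memory.used" line per GPU, as printed by
#   nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits
# pick_idle_gpu is a hypothetical helper, not a PWmat or NVIDIA tool.
pick_idle_gpu() {
    awk -F', *' '
        # Keep the first line, then any line with strictly less memory used.
        NR == 1 || $2 + 0 < best { best = $2 + 0; idx = $1 }
        END { if (NR > 0) print idx }
    '
}

# Possible use inside run.sh, replacing the fixed index (left commented out
# so that sourcing this sketch has no side effects):
#   export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=index,memory.used \
#       --format=csv,noheader,nounits | pick_idle_gpu)
```

Ties are resolved in favour of the lowest index. Memory in use is only a rough proxy for "idle", so on a shared machine it is still worth checking the nvidia-smi process list before starting a long run.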
1. After logging in to the system, run the command: nvidia-smi
    Fri Aug 16 15:47:10 2024
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 550.76                 Driver Version: 550.76         CUDA Version: 12.4     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GeForce RTX 4090 D      On  |   00000000:16:00.0 Off |                  Off |
    |  0%   45C    P8             25W /  425W |       2MiB /  24564MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+
    |   1  NVIDIA GeForce RTX 4090 D      On  |   00000000:34:00.0 Off |                  Off |
    |  0%   37C    P8             20W /  425W |       2MiB /  24564MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+
    |   2  NVIDIA GeForce RTX 4090 D      On  |   00000000:52:00.0 Off |                  Off |
    |  0%   40C    P8             17W /  425W |       2MiB /  24564MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+
    |   3  NVIDIA GeForce RTX 4090 D      On  |   00000000:CA:00.0 Off |                  Off |
    | 30%   44C    P2            223W /  425W |   16108MiB /  24564MiB |     99%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+

    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |    3   N/A  N/A     15863      C   PWmat                                       16100MiB |
    +-----------------------------------------------------------------------------------------+
2. Taking GPU index 3 as an example, note its Bus-Id: 00000000:CA:00.0
3. Log in as root and run the command dmidecode -t slot, filtering the output for the bus address
    [root@mstation ~]# dmidecode -t slot | grep -i -10 CA:00.0
    Handle 0x000D, DMI type 9, 17 bytes
    System Slot Information
            Designation: CPU SLOT1 PCIe 5.0 X16
            Type: x16 <OUT OF SPEC>
            Current Usage: In Use
            Length: Long
            Characteristics:
                    3.3 V is provided
                    Opening is shared
                    PME signal is supported
            Bus Address: 0000:ca:00.0

    Handle 0x000E, DMI type 9, 17 bytes
    System Slot Information
            Designation: CPU SLOT3 PCIe 5.0 X16
            Type: x16 <OUT OF SPEC>
            Current Usage: In Use
            Length: Long
            Characteristics:
                    3.3 V is provided
                    Opening is shared
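The block whose Bus Address is 0000:ca:00.0 belongs to GPU 3, so its slot is Designation: CPU SLOT1 PCIe 5.0 X16. dmidecode prints the address in lower case with a 4-digit domain, which is why the grep uses -i and matches only the CA:00.0 suffix.
On the server motherboard, the slot number is printed next to each PCI slot; find the slot with the matching number.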
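The lookup in steps 2 and 3 can be sketched as two small shell functions. This is an illustration only: normalize_bus_id and slot_for_bus are hypothetical helper names, and the sketch assumes that the four leading domain digits dropped from the nvidia-smi Bus-Id are zeros, as they are in the output above.

```sh
#!/bin/sh
# Sketch: map an nvidia-smi Bus-Id to the slot Designation reported by
# dmidecode. Both function names are hypothetical helpers.

# Convert the nvidia-smi form (8-digit domain, upper case) to the dmidecode
# form (4-digit domain, lower case), e.g. 00000000:CA:00.0 -> 0000:ca:00.0.
# Assumption: the four domain digits that are dropped are zeros.
normalize_bus_id() {
    id=$1
    case $id in
        ????????:*) id=${id#????} ;;   # shorten an 8-digit domain to 4 digits
    esac
    printf '%s\n' "$id" | tr 'A-F' 'a-f'
}

# Read "dmidecode -t slot" output on stdin and print the Designation of the
# slot whose Bus Address equals $1 (already in dmidecode form).
slot_for_bus() {
    awk -v addr="$1" '
        /Designation:/ { sub(/^[ \t]*Designation:[ \t]*/, ""); slot = $0 }
        /Bus Address:/ { if (tolower($NF) == addr) print slot }
    '
}

# Possible use as root (commented out so the sketch has no side effects):
#   dmidecode -t slot | slot_for_bus "$(normalize_bus_id 00000000:CA:00.0)"
```

In dmidecode output, the Designation line precedes the Bus Address line within each slot block, so the function remembers the most recent designation and prints it when the address matches. This avoids relying on a fixed grep context size such as -10.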