mstation:busid
Differences
This shows the differences between the revision you selected and the current version of the page.
| Next revision | Previous revision | ||
| mstation:busid [2024/08/16 15:47] – created pengge | mstation:busid [2024/08/16 16:37] (current) – pengge | ||
|---|---|---|---|
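| Line 1: | Line 1: | ||
| ====== Finding the physical slot of a faulty GPU ====== | ====== Finding the physical slot of a faulty GPU ====== | ||
| - | 1. After logging in to the system, run the command: | + | ===== Running a job on the suspected faulty GPU ===== |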
| + | |||
| + | 1. Download the job script {{ : | ||
| + | |||
| + | 2. Upload it to the server | ||
| + | |||
| + | 3. Extract the archive and edit '' | ||
| + | |||
| + | The GPU index, via <wrap hi> | ||
| + | |||
| + | <code bash> | ||
| + | [pengge@mstation ok]$ tar -zxf 512_si_pbe_md.tgz | ||
| + | [pengge@mstation ok]$ cd 512_si_pbe_md | ||
| + | [pengge@mstation 512_si_pbe_md]$ vim run.sh | ||
| + | #!/bin/sh | ||
| + | |||
| + | module load mkl mpi | ||
| + | module load cuda/12.1 | ||
| + | module load pwmat | ||
| + | |||
| + | export CUDA_VISIBLE_DEVICES=3 | ||
| + | |||
| + | mpirun -np 1 PWmat | tee output | ||
| + | </ | ||
| + | |||
| + | 4. Run the script <wrap hi> | ||
| + | |||
| + | ===== Finding the physical slot of the faulty GPU ===== | ||
| + | |||
| + | 1. After logging in to the system, run the command: | ||
| <code bash> | <code bash> | ||
| Line 37: | Line 66: | ||
| +-----------------------------------------------------------------------------------------+ | +-----------------------------------------------------------------------------------------+ | ||
| </ | </ | ||
| + | |||
| + | 2. Taking index <wrap safety> | ||
| + | |||
| + | 3. Use '' | ||
| + | |||
| + | <code bash> | ||
| + | [root@mstation ~]# dmidecode -t slot | grep -i -10 CA:00.0 | ||
| + | Handle 0x000D, DMI type 9, 17 bytes | ||
| + | System Slot Information | ||
| + | Designation: | ||
| + | Type: x16 <OUT OF SPEC> | ||
| + | Current Usage: In Use | ||
| + | Length: Long | ||
| + | Characteristics: | ||
| + | 3.3 V is provided | ||
| + | Opening is shared | ||
| + | PME signal is supported | ||
| + | Bus Address: 0000: | ||
| + | |||
| + | Handle 0x000E, DMI type 9, 17 bytes | ||
| + | System Slot Information | ||
| + | Designation: | ||
| + | Type: x16 <OUT OF SPEC> | ||
| + | Current Usage: In Use | ||
| + | Length: Long | ||
| + | Characteristics: | ||
| + | 3.3 V is provided | ||
| + | Opening is shared | ||
| + | </ | ||
| + | |||
| + | The slot corresponding to GPU 3 is <wrap safety> | ||
| + | |||
| + | Next to each PCI slot on the server motherboard there is a number indicating the slot number, | ||
| + | |||
| + | <WRAP tip 50%> | ||
| + | - busid in the nvidia-smi output: 00000000: | ||
| + | - Bus Address in the dmidecode -t slot output: 0000: | ||
| + | </ | ||
| + | |||
| + | {{: | ||
| + | |||
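
The tip in the page notes that `nvidia-smi` and `dmidecode -t slot` write the same PCI address with different prefixes (`00000000:` versus `0000:`). The sketch below shows one way to convert between the two forms so the result can be passed to `grep`. The helper name `to_dmi_bus_addr` is hypothetical and is not part of either tool. It assumes the four leading domain digits are zeros, as in the output shown on this page, and it lowercases the hex digits only for convenience, since the page's own `grep -i` is case-insensitive anyway.

```bash
#!/usr/bin/env bash
# Sketch: turn an nvidia-smi PCI bus ID into the shorter form printed by
# "dmidecode -t slot". to_dmi_bus_addr is a hypothetical helper name.

# nvidia-smi uses an 8-hex-digit PCI domain (00000000:CA:00.0), while
# dmidecode uses a 4-hex-digit one (0000:...). Assumption: the four
# leading digits are zeros, so they can simply be dropped.
to_dmi_bus_addr() {
    local bus_id="$1"
    bus_id="${bus_id#????}"                     # strip the first four characters
    printf '%s\n' "$bus_id" | tr 'A-F' 'a-f'    # lowercase the hex digits
}

# Example with the address used in the dmidecode step of this page:
to_dmi_bus_addr "00000000:CA:00.0"

# On a real machine (assumes nvidia-smi and dmidecode are installed and the
# shell runs as root), the two steps could be chained like this:
#   id=$(nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader -i 3)
#   dmidecode -t slot | grep -i -10 "$(to_dmi_bus_addr "$id")"
```

The commented-out lines are left inactive on purpose, since they need the GPU driver and root access; the GPU index `3` there follows the example used on this page.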
mstation/busid.1723794455.txt.gz · Last modified: 2024/08/16 15:47 by pengge
