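Nsight Systems profiling

    nsys profile --trace=cuda,nvtx --gpu-metrics-devices=all -o <out_file_name> python <python_file_name> <python args>

`--trace=cuda,nvtx` records CUDA API calls and NVTX ranges. `--gpu-metrics-devices=all` samples GPU hardware metrics on every GPU. `-o` sets the name of the report file.

Example:

    nsys profile --trace=cuda,nvtx --gpu-metrics-devices=all -o profile_attention_bm128_bn64_w4_s2 python my_flash_attn_test.py

Nsight Compute profiling

    ncu --kernel-name <kernel_name> --set full -o <out_file_name> python <python_file_name> <python args>

`--kernel-name` limits profiling to kernels matching the given name. `--set full` collects every available metric section.

Example:

    ncu --kernel-name flash_attn --set full -o flash_attn_full_bm128_bn64_w4_s2 python my_flash_attn_test.py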
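When you sweep several kernel configurations, it helps to build both commands from one place so the report names stay consistent. The sketch below is a hypothetical helper, not part of either tool. It makes two assumptions. First, the `bm128_bn64_w4_s2` suffix is taken to encode BLOCK_M, BLOCK_N, num_warps and num_stages, in the style of Triton. Second, the functions `nsys_cmd`, `ncu_cmd` and `config_tag` are illustrative names. The script only uses the flags shown above and prints the shell commands without running them.

```python
import shlex


def config_tag(bm: int, bn: int, warps: int, stages: int) -> str:
    # Assumed meaning: BLOCK_M, BLOCK_N, num_warps, num_stages (hypothetical naming scheme).
    return f"bm{bm}_bn{bn}_w{warps}_s{stages}"


def nsys_cmd(out_name: str, script: str, *script_args: str) -> list[str]:
    # Nsight Systems: trace CUDA API + NVTX, sample GPU metrics on all devices.
    return [
        "nsys", "profile",
        "--trace=cuda,nvtx",
        "--gpu-metrics-devices=all",
        "-o", out_name,
        "python", script, *script_args,
    ]


def ncu_cmd(kernel: str, out_name: str, script: str, *script_args: str) -> list[str]:
    # Nsight Compute: profile only kernels matching `kernel`, collecting the full metric set.
    return [
        "ncu",
        "--kernel-name", kernel,
        "--set", "full",
        "-o", out_name,
        "python", script, *script_args,
    ]


if __name__ == "__main__":
    tag = config_tag(128, 64, 4, 2)
    # Print shell-ready strings; each list could instead be passed to subprocess.run(..., check=True).
    print(shlex.join(nsys_cmd(f"profile_attention_{tag}", "my_flash_attn_test.py")))
    print(shlex.join(ncu_cmd("flash_attn", f"flash_attn_full_{tag}", "my_flash_attn_test.py")))
```

Returning argument lists rather than strings means any script arguments containing spaces are passed through intact when the lists are handed to `subprocess.run`.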