Compare commits

306 Commits

Author SHA1 Message Date
Sakura286 af542a7f67 python-triton: keep the nvidia C++ backend in the cmake build
Dropping nvidia from TRITON_CODEGEN_BACKENDS skips generating the NVGPU/NVWS
dialect TableGen output under third_party/nvidia, but Triton core hard-
depends on it (TritonGPUTransforms via Passes.h, TritonInstrumentToLLVM),
so the wheel build failed with

  fatal error: 'nvidia/include/Dialect/NVGPU/IR/Dialect.h.inc' file not found
  fatal error: 'nvidia/include/Dialect/NVWS/IR/Dialect.h.inc' file not found

Keep the nvidia C++ libraries in the cmake build (they only need the
in-tree LLVM NVPTX target, no CUDA SDK, and stay offline) while still
packaging only the AMD Python backend, so neither ptxas nor the
proprietary libdevice.10.bc lands in the RPM.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:00:25 +08:00
CHEN Xuan 1434b62836 python-triton: add pybind11 as deps 2026-06-11 03:00:51 +08:00
CHEN Xuan 13b49f5327 python-triton: fix pybind11 cmake dir lookup
pybind11.get_cmake_dir() only knows the pip-wheel layout and raises
ImportError with the distro python3-pybind11.  Extend PYBIND11_SYSPATH to
also supply the CMake config dir and point it at the system pybind11-devel.

Unconditional get_cmake_dir() call introduced upstream in
https://github.com/triton-lang/triton/pull/4450
2026-06-11 02:56:10 +08:00
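A minimal sketch of the fallback idea in the commit above, not the actual patch: try the pip-wheel lookup first and fall back to a distro location when it raises ImportError. The helper name `resolve_pybind11_cmake_dir` and the path `/usr/share/cmake/pybind11` are assumptions made for illustration; the real change extends PYBIND11_SYSPATH inside Triton's build script.

```python
from typing import Callable

# Hypothetical distro location of pybind11's CMake config; adjust per distro.
SYSTEM_PYBIND11_CMAKE_DIR = "/usr/share/cmake/pybind11"


def resolve_pybind11_cmake_dir(
    get_cmake_dir: Callable[[], str],
    fallback: str = SYSTEM_PYBIND11_CMAKE_DIR,
) -> str:
    """Return the wheel's CMake dir, or the fallback if the lookup fails."""
    try:
        return get_cmake_dir()
    except ImportError:
        # The distro package does not ship the pip-wheel layout.
        return fallback
```

With pybind11 installed, the call would look like `resolve_pybind11_cmake_dir(pybind11.get_cmake_dir)`. Passing the lookup in as a callable keeps the sketch usable without pybind11 present.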
Sakura286 07fe2c2e72 python-torch: fix magma version 2026-06-10 13:29:01 +08:00
Sakura286 0b3f7ba0d4 python-triton: fix source url 2026-06-08 15:58:11 +08:00
Sakura286 9cf4295c81 python-triton: remove unnecessary parts 2026-06-08 15:40:51 +08:00
Sakura286 e81e6e81fc python-triton: fix package name 2026-06-08 15:30:12 +08:00
Sakura286 61a6e27950 python-triton: init 2026-06-08 15:29:00 +08:00
Sakura286 679682ade6 python-torch: fix package buildreq names 2026-06-08 12:51:31 +08:00
Sakura286 4aa6ec5583 python-torch: reformat 2026-06-08 12:43:07 +08:00
Sakura286 b3e30b7caa Fix several package formats 2026-06-08 12:20:21 +08:00
Sakura286 0d111767d8 sha256sum check 2026-06-08 11:43:36 +08:00
Sakura286 6d3fee3cd1 hipsparselt: format buildreq 2026-06-08 11:01:58 +08:00
Sakura286 adecb03429 hipblaslt: use declarative building method 2026-06-08 10:50:57 +08:00
Sakura286 bdabb97a97 hipsparselt: reformat 2026-06-08 10:25:50 +08:00
Sakura286 7a28595e50 hipblaslt: nanobind cmake conf fix 2026-06-08 09:59:52 +08:00
Sakura286 b42f42099f hipblaslt: reorder buildoption part 2026-06-08 09:52:59 +08:00
Sakura286 27ecec9dc0 hipblaslt: remove fedora related sections 2026-06-08 09:49:23 +08:00
Sakura286 a23e6838de hipblaslt: reformat 2026-06-08 09:44:00 +08:00
Sakura286 2f1628b890 hipblaslt: fix autosetup path 2026-06-08 09:21:15 +08:00
Sakura286 6e345911b2 hipblaslt: use system nanobind 2026-06-08 09:06:23 +08:00
Sakura286 07774ffb15 magma: disabling build test 2026-06-07 22:03:23 +08:00
Sakura286 74fc6e16d4 miopen: add hipblas-common-devel as deps 2026-06-07 19:34:50 +08:00
Sakura286 c947b0e310 miopen: reformat 2026-06-07 19:31:38 +08:00
Sakura286 1e02dd5782 magma: add new arch to whitelist 2026-06-07 19:21:09 +08:00
Sakura286 3b28b45dc2 magma: add missing release section 2026-06-07 19:10:05 +08:00
Sakura286 e8fcc2382b magma: reformat 2026-06-07 19:07:14 +08:00
Sakura286 6b0480786d hipsolver: disable test due to lack of lapack 2026-06-07 18:35:41 +08:00
Sakura286 5823e62b6e hipsolver: build test 2026-06-07 18:33:39 +08:00
Sakura286 2ac35cabf1 hiprand: rename test bcond name 2026-06-07 18:19:41 +08:00
Sakura286 054470a536 hipsparse: remove unnecessary parts 2026-06-07 18:17:50 +08:00
Sakura286 edcca2f203 hipsparse: remove unused cmake flags 2026-06-07 18:16:17 +08:00
Sakura286 a04dec285c hipsparse: add benchmark package 2026-06-07 18:11:13 +08:00
Sakura286 7517e1a790 hipsparse: add benchmark 2026-06-07 18:04:47 +08:00
Sakura286 ca26d6d3f1 hipsparse: split build_test and run_test part 2026-06-07 18:01:35 +08:00
Sakura286 9e24eafa24 hipsparse: use clang 2026-06-07 17:31:57 +08:00
Sakura286 522c37e248 hipsparse: re-enable tests 2026-06-07 13:14:37 +08:00
Sakura286 dfbfb5f1c8 hipsolver: re-enable test 2026-06-07 13:13:08 +08:00
Sakura286 f973757713 hiprand: fix header sections order 2026-06-07 13:03:10 +08:00
Sakura286 267edeb9ce hipfft: remove test flags 2026-06-07 13:02:21 +08:00
Sakura286 cf8e376b2a hiprand: build test but do not run 2026-06-07 12:46:55 +08:00
Sakura286 986ac3f9e4 hipfft: re-enable test 2026-06-07 12:42:00 +08:00
Sakura286 55bee7a159 hiprand: export LD_LIBRARY_PATH when testing 2026-06-07 12:40:44 +08:00
Sakura286 ce6ffadda6 hiprand: enable test; remove outdated sed parts 2026-06-07 12:35:19 +08:00
Sakura286 531c906d58 hipfft: fix format 2026-06-07 12:31:49 +08:00
Sakura286 8211251e2f hipcub: reuse clang compiler 2026-06-07 12:07:45 +08:00
Sakura286 94e53fa152 hipcub: reformat 2026-06-06 22:43:55 +08:00
Sakura286 511090debd roctracer: reformat 2026-06-06 22:37:01 +08:00
Sakura286 b278fa2fb0 rocrand: reformat 2026-06-06 22:35:00 +08:00
Sakura286 4c1605ba73 rocrand: reformat 2026-06-06 22:31:10 +08:00
Sakura286 7137ca3df2 rocm-origami: reformat 2026-06-06 22:28:39 +08:00
Sakura286 a318bc7bd0 rocm-core: reformat 2026-06-06 22:26:41 +08:00
Sakura286 0dfea5f5ce rocfft: remove unnecessary files 2026-06-06 22:24:32 +08:00
Sakura286 d0e6019959 amdsmi: upgrade to 7.2.1 2026-06-06 22:21:50 +08:00
Sakura286 14c692ec72 rocfft: do not remove installed files 2026-06-06 22:15:08 +08:00
Sakura286 23c7e15505 amdsmi: downgrade to 7.1.1 version 2026-06-06 22:05:51 +08:00
Sakura286 e12a16239b amdsmi: reformat with correct patch and spdx header 2026-06-06 22:04:44 +08:00
Sakura286 6d4f36a4c6 rccl: remove test bcond 2026-06-06 21:58:10 +08:00
Sakura286 7a8720242c miopen: fix half buildreq 2026-06-06 21:46:04 +08:00
Sakura286 889f532249 half: reformat 2026-06-06 21:35:39 +08:00
Sakura286 969dc270ff frugally-deep: remove unused package 2026-06-06 21:29:32 +08:00
Sakura286 a33fc0de15 fplus: fix license issues 2026-06-06 21:22:14 +08:00
Sakura286 4ef07cd255 fplus: add buildreq cmake 2026-06-06 21:12:06 +08:00
Sakura286 ee4a432f4f fplus: reformat 2026-06-06 21:05:21 +08:00
Sakura286 c7a7aa0f43 amdsmi: reformat 2026-06-04 16:21:09 +08:00
Sakura286 921d1f5d6b amdsmi: reformat 2026-06-04 16:08:45 +08:00
Sakura286 dcd8a79008 amdsmi: fix blank line 2026-06-04 14:41:01 +08:00
Sakura286 a1225f9a53 python-torch: tolerate unparseable source in @overload body check 2026-06-03 16:14:29 +08:00
Sakura286 186a95a2d6 amdsmi: tolerate missing CPU/E-SMI symbols on non-x86_64 via wrapper patch 2026-06-03 11:22:13 +08:00
Sakura286 2e27a8dfff amdsmi: tolerate missing CPU/E-SMI symbols on non-x86_64
With -DENABLE_ESMI_LIB=OFF (all non-x86_64 arches) libamd_smi.so does
not export the CPU/E-SMI API, but the ctypesgen amdsmi_wrapper.py
eager-binds every symbol at import, so `import amdsmi` -- and therefore
`import torch` -- dies with "undefined symbol: amdsmi_get_cpu_handles".

Wrap the loaded library object in a proxy that resolves missing symbols
to stubs which only raise when actually called. This covers the whole
absent CPU API at once and, by wrapping the constructed library rather
than the ctypes.CDLL call, leaves PyTorch's own CDLL load hook intact.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 16:17:38 +08:00
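The proxy approach described in the commit above can be sketched as follows. This is an illustrative reduction under stated assumptions, not the patch itself: the class names `MissingSymbol` and `LibraryProxy` are hypothetical, and the only library behaviour relied on is that a missing symbol surfaces as AttributeError, which is what ctypes.CDLL does.

```python
class MissingSymbol:
    """Placeholder that accepts attribute assignment and raises only when called."""

    def __init__(self, name):
        self._name = name

    def __call__(self, *args, **kwargs):
        raise NotImplementedError(
            f"symbol {self._name!r} is not exported by the library"
        )


class LibraryProxy:
    """Wrap an already-loaded library; unknown symbols resolve to lazy stubs."""

    def __init__(self, lib):
        self._lib = lib
        self._stubs = {}

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for symbol names.
        try:
            return getattr(self._lib, name)
        except AttributeError:
            # Cache the stub so restype/argtypes set at import time persist.
            return self._stubs.setdefault(name, MissingSymbol(name))
```

Because the proxy wraps the constructed library object, code that hooks `ctypes.CDLL` itself is not affected, which matches the design choice stated in the commit message.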
Sakura286 089e273a59 python-pydantic-extra-types: also skip semantic_version in check 2026-06-01 15:45:34 +08:00
Sakura286 63c2f071b4 python-pydantic-extra-types: fix semver module exclusion name 2026-06-01 15:36:53 +08:00
Sakura286 aad046e6f0 python-pydantic-extra-types: fix BuildOptions 2026-06-01 15:30:04 +08:00
Sakura286 67ed132552 python-pydantic-extra-types: skip unpackaged optional submodules in check 2026-06-01 15:26:41 +08:00
Sakura286 f28a94aca9 python-pydantic-extra-types: init 2026-06-01 15:17:29 +08:00
Sakura286 52c35e25cf python-mistral-common: relax jsonschema requirement 2026-06-01 15:17:29 +08:00
Sakura286 e677d70c3a python-mistral-common: init 2026-06-01 15:02:54 +08:00
Sakura286 93f0485f92 python-torch: fix librocm_smi64 link (rsmi_init undefined symbol)
The sed injecting rocm_smi64 into Caffe2_PUBLIC_HIP_DEPENDENCY_LIBS
searched for "hipzrtc::hiprtc" (stray z) and never matched the real
"hiprtc::hiprtc", so librocm_smi64 was never linked and libtorch_hip.so
exported an undefined rsmi_init (called from intra_node_comm.cpp).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 09:37:20 +08:00
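The failure mode in the commit above is a substitution that matches nothing and still exits successfully. One way to guard against it, sketched here in Python rather than the spec's sed, is to fail loudly when the search string is absent. The helper name `replace_or_fail` and the sample CMake line are assumptions for illustration, not content from the PyTorch tree.

```python
def replace_or_fail(text: str, old: str, new: str) -> str:
    """Replace `old` with `new`, raising if `old` does not occur in `text`."""
    if old not in text:
        raise ValueError(f"pattern {old!r} not found; nothing was patched")
    return text.replace(old, new)


# Illustrative input only; the real CMake line may differ.
sample = "list(APPEND Caffe2_PUBLIC_HIP_DEPENDENCY_LIBS hiprtc::hiprtc)"
patched = replace_or_fail(sample, "hiprtc::hiprtc", "hiprtc::hiprtc rocm_smi64")
```

With the mistyped pattern "hipzrtc::hiprtc" the same call raises ValueError instead of silently leaving the file unchanged.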
Sakura286 1726410184 python-torch: fix gemm/bgemm symbol mangling for clang 21
clang 21 mangles the instantiation-dependent SFINAE non-type template
parameter of at::cuda::blas::gemm/bgemm differently at an explicit
specialization (the definition) than at a deduced call site (the
reference), so libtorch_hip.so fails to dlopen with an undefined
gemm<float,float,(float*)0> symbol. Every real dtype already has an
explicit specialization, so drop the redundant SFINAE guard in %prep to
collapse the overloads and emit one consistent mangling.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-31 17:52:17 +08:00
Sakura286 f577a21b37 python-torch: hipcub package name fix 2026-05-29 14:37:18 +08:00
Sakura286 b25727666d python-torch: make ROCm symbol bridge functions weak
Without the weak attribute on the bridge, TensorMethods.cpp.o and the
bridge in Tensor.cpp.o both define the same Li0E mangling family as
strong globals and the libtorch_cpu.so link fails with duplicate symbol
errors for const_data_ptr<T,0> / mutable_data_ptr<T> / data_ptr<T>.

Marking the bridge functions weak lets the linker discard them whenever
the cpp specialisation provides the same name (preserving check_type)
while keeping them around to fill any mangling the cpp does not emit.
2026-05-29 14:18:46 +08:00
Sakura286 6ca7689012 hipsparselt: fix hipblaslt path 2026-05-29 12:03:36 +08:00
Sakura286 ed9f4f6afc hipsparselt: fix download url 2026-05-29 11:39:19 +08:00
Sakura286 45b8c3bfdc hipsparselt: fix %prep section 2026-05-29 11:38:00 +08:00
Sakura286 ddf386b754 hipsparselt: move some buildreq to test suite 2026-05-29 11:24:46 +08:00
Sakura286 8cba90661c python-torch: fix export name of c code 2026-05-29 11:20:52 +08:00
Sakura286 a4c50846a5 hipblaslt: remove unused sections 2026-05-28 22:02:58 +08:00
Sakura286 f6e62fab32 hipblaslt: reformatted 2026-05-28 21:41:30 +08:00
Sakura286 bdbd4e9972 hipblaslt: init 2026-05-28 21:34:15 +08:00
Sakura286 9bd3c11596 hipblaslt: remove 2026-05-28 21:17:30 +08:00
Sakura286 e58ef33461 hipblaslt: fix buildreq name 2026-05-28 19:37:52 +08:00
Sakura286 a6912e8806 roctracer: remove unnecessary options 2026-05-28 19:26:42 +08:00
Sakura286 d780788f9d fplus: remove wrong parameters 2026-05-28 19:19:05 +08:00
Sakura286 90902bf573 fplus: remove unnecessary package 2026-05-28 19:18:28 +08:00
Sakura286 b185400a3d fplus: use original name 2026-05-28 19:11:56 +08:00
Sakura286 fd94a9e6a9 frugally-deep: fix eigen3 buildreq; reorder buildreq 2026-05-28 19:03:53 +08:00
Sakura286 438e34b9e7 frugally-deep: fix requirement 2026-05-28 19:02:33 +08:00
Sakura286 f038bcd3d2 fplus: remove duplicated contents 2026-05-28 19:01:40 +08:00
Sakura286 25befba232 fplus: rename 2026-05-28 19:01:09 +08:00
Sakura286 74205a50f4 python-torch: strict restrictions of long branch factor 2026-05-28 18:58:39 +08:00
Sakura286 0a77959ccc python-torch: drop -DC10_NODEPRECATED to fix runtime undefined symbol in libtorch_hip.so
Importing torch from the resulting RPM fails at runtime with:

  ImportError: /usr/lib64/python3.13/site-packages/torch/lib/libtorch_hip.so:
  undefined symbol:
  _ZNK2at10TensorBase14const_data_ptrIfLi0EEEPKT_v
  (i.e. at::TensorBase::const_data_ptr<float, 0>() const)

Root cause:

  at::TensorBase::const_data_ptr<T>() is declared as a primary template
  in ATen/core/TensorBase.h:

      template <typename T,
                std::enable_if_t<!std::is_const_v<T>, int> = 0>
      const T* const_data_ptr() const;

  while ATen/core/TensorBase.cpp only provides explicit specializations
  for the scalar types (mangled as <float>, <double>, ...).  To make
  every translation unit resolve to those specializations, TensorBase.h
  also forward-declares them via AT_FORALL_SCALAR_TYPES_AND*(DECLARE_CAST).
  Some of those declarations sit behind C10_DEPRECATED-style guards.

  Adding -DC10_NODEPRECATED to the global CXX flags hides the
  specialization declarations from translation units that include
  TensorBase.h, so those TUs instantiate the primary template instead and
  end up referencing the <T, 0> mangling.  libtorch_cpu.so only exports
  the <T> mangling, so the references in libtorch_hip.so remain
  unresolved and only blow up at dlopen() time -- the link itself
  succeeds because lld defaults to --allow-shlib-undefined for shared
  libraries.

Fix:

  Strip -DC10_NODEPRECATED from CFLAGS/CXXFLAGS in %build (and %install
  if applicable) before invoking the build, so TensorBase.h exposes the
  explicit specializations to all TUs and HIP-side code references the
  same <T> symbols that libtorch_cpu.so exports.

Verified with:

  $ python3 -c 'import torch; print(torch.__version__)'
  2.11.0
  $ nm -DC --undefined-only \
      /usr/lib64/python3.13/site-packages/torch/lib/libtorch_hip.so \
      | grep const_data_ptr
  (no <T, 0> references remain)
2026-05-28 18:52:40 +08:00
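The fix in the commit above amounts to removing one token from the compiler flag string before the build runs. A minimal sketch of that transformation, written in Python for illustration (the spec itself does this in %build, presumably in shell); the function name `strip_flag` is hypothetical:

```python
import shlex


def strip_flag(flags: str, unwanted: str = "-DC10_NODEPRECATED") -> str:
    """Return `flags` with every exact occurrence of `unwanted` removed."""
    kept = [tok for tok in shlex.split(flags) if tok != unwanted]
    return shlex.join(kept)
```

Matching whole tokens rather than substrings avoids touching a different flag that merely starts with the same text.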
Sakura286 034e581379 miopen: add clang-tools-extra BuildRequires 2026-05-27 17:02:00 +08:00
Sakura286 cf29fb687b miopen: add PATH for lld during HIP device linking 2026-05-27 16:51:16 +08:00
Sakura286 4480eddaed miopen: fix half.hpp include path 2026-05-27 16:23:10 +08:00
Sakura286 21f537366b fplus: reformatted 2026-05-26 01:05:56 +08:00
Sakura286 6fda8a1adb fplus: init 2026-05-26 01:02:55 +08:00
Sakura286 d72edd1c5b miopen: fix requirements 2026-05-26 00:58:01 +08:00
Sakura286 15956b129b half: fix package name 2026-05-26 00:54:00 +08:00
Sakura286 8b1f7b2fe5 amdsmi: fix install files 2026-05-25 23:55:26 +08:00
Sakura286 221ee2744a amdsmi: fix fuzz 2026-05-25 23:24:38 +08:00
Sakura286 2c72efe66f amdsmi: remove goamdsmi support of non-x86 arch 2026-05-25 23:00:04 +08:00
Sakura286 8a7af5653f amdsmi: enable download esmi package 2026-05-25 21:47:34 +08:00
Sakura286 376d10f7e1 amdsmi: remove esmi support of riscv64 2026-05-25 21:40:59 +08:00
Sakura286 8262896111 amdsmi: fix malformed patch (hunk 1 line count was wrong) 2026-05-25 21:11:34 +08:00
Sakura286 502435c5e2 amdsmi: fix cpuid.h build failure on riscv64
esmi_ib_library unconditionally includes <cpuid.h> which is an
x86-only GCC intrinsic header. Add a patch to guard the include
and the detect_packages() body with #ifdef __x86_64__, so ESMI
still compiles on riscv64 (returning ESMI_NOT_SUPPORTED at runtime).

This replaces the earlier -DENABLE_ESMI_LIB=OFF approach which
broke goamdsmi_shim.
2026-05-25 20:49:17 +08:00
Sakura286 c41e1002d8 amdsmi: disable ESMI on non-x86 to fix riscv64 build 2026-05-25 19:16:56 +08:00
Sakura286 15b1014b36 amdsmi: fix buildreq 2026-05-25 15:29:08 +08:00
Sakura286 a61f56b7fa amdsmi: remove unmet deps 2026-05-25 15:18:29 +08:00
Sakura286 32d8d69ce2 amdsmi: reformatted 2026-05-25 15:13:46 +08:00
Sakura286 572956d105 amdsmi: init 2026-05-25 14:55:37 +08:00
Sakura286 235371a524 magma: revert to init version 2026-05-25 14:35:44 +08:00
Sakura286 6f94f69c0d rocm-origami: fix patch path for standalone tarball structure 2026-05-25 14:27:34 +08:00
Sakura286 ad93398e01 hipsolver: remove redundant files 2026-05-24 17:31:17 +08:00
Sakura286 c23f8a1aae hipsolver: drop already-applied patch, remove unused cmake opts 2026-05-24 15:02:34 +08:00
Sakura286 ab292af57f rocm-origami: fix format 2026-05-24 00:30:11 +08:00
Sakura286 8400712c23 rocm-origami: reformat 2026-05-24 00:29:31 +08:00
Sakura286 c7680921b3 rocm-origami: use declarative build 2026-05-24 00:23:23 +08:00
Sakura286 d9c05758d6 rocm-origami: reformatted 2026-05-23 23:39:00 +08:00
Sakura286 e619019452 rocm-origami: init 2026-05-23 23:38:10 +08:00
Sakura286 6aaf749822 rocm-origami: remove origami 2026-05-23 23:31:44 +08:00
Sakura286 e6a07b5769 hipblaslt: reformatted 2026-05-23 23:13:22 +08:00
Sakura286 d1b8be76a1 hipblaslt: init 2026-05-23 23:05:01 +08:00
Sakura286 e36696889a hipblaslt: remove 2026-05-23 23:00:33 +08:00
Sakura286 4d549c6061 hipblaslt: reformatted 2026-05-23 22:45:51 +08:00
Sakura286 b7eaca59be hipcub: fix install 2026-05-23 22:37:47 +08:00
Sakura286 c7fde28d0d hipcub: reformatted 2026-05-23 22:34:42 +08:00
Sakura286 a5aad5e577 roctracer: remove unpackaged doc files installed by CMake
CMake installs LICENSE.md to both share/doc/roctracer/ and
share/doc/roctracer-asan/, which are not packaged by the spec.
Remove them in %install to fix the unpackaged files error.
2026-05-23 20:23:15 +08:00
Sakura286 246dc9b70b hipfft: fix package name 2026-05-23 20:16:19 +08:00
Sakura286 c66f59a13b hipfft: reformatted 2026-05-23 16:59:32 +08:00
Sakura286 42e4f4af12 rocfft: remove unnecessary flags 2026-05-23 16:38:49 +08:00
Sakura286 e5f6584bdf rocfft: build test but do not run 2026-05-23 10:02:54 +08:00
Sakura286 792db02eec rocfft: add libomp-devel as deps 2026-05-23 09:50:57 +08:00
Sakura286 dba1b00ecc rocfft: fix build req 2026-05-23 09:40:34 +08:00
Sakura286 deb0c24a12 rocfft: enable test 2026-05-22 19:50:34 +08:00
Sakura286 46088a35d2 rocfft: remove unused cmake flags 2026-05-22 19:49:55 +08:00
Sakura286 e8e883a1ad rocfft: reformatted by hand 2026-05-22 18:23:37 +08:00
Sakura286 bfb407073d rocfft: restore original cmake flags and add check section 2026-05-22 18:07:44 +08:00
Sakura286 a0edeb49d7 half: fix files list, remove nonexistent cmake dir 2026-05-22 17:52:09 +08:00
Sakura286 e09773bb8d rocfft: fix missing install section and use GPU_TARGETS 2026-05-22 17:51:04 +08:00
Sakura286 886f23a6bf hiprand magma frugally-deep roctracer hipfft hipcub hipblaslt hipsparselt rocfft rocm-origami hipsolver: reformatted 2026-05-22 12:01:22 +08:00
Sakura286 8d5ea9fd15 hiprand magma frugally-deep roctracer hipfft hipcub hipblaslt hipsparselt rocfft rocm-origami hipsolver: init 2026-05-22 11:31:06 +08:00
Sakura286 bb56dc5180 half: reformatted 2026-05-22 11:12:53 +08:00
Sakura286 29c193b86f half: init 2026-05-22 11:08:19 +08:00
Sakura286 820fcfc4d8 rocthrust: reformat 2026-05-22 10:43:57 +08:00
Sakura286 664ebe92dc miopen: reformat 2026-05-22 10:32:06 +08:00
Sakura286 8ec7b96b76 rocthrust: init 2026-05-22 10:16:06 +08:00
Sakura286 59a058073b miopen: init 2026-05-21 16:10:58 +08:00
Sakura286 38e8a7203b hipsparse: no test 2026-05-21 16:09:31 +08:00
Sakura286 2939420f71 hipsparse: verbose build 2026-05-21 15:56:13 +08:00
Sakura286 ee1b10fd83 hipsparse: ninja 2026-05-21 15:55:28 +08:00
Sakura286 801463d31c hipsparse: enable tests 2026-05-21 15:48:40 +08:00
Sakura286 51104ca5ab rccl: enable test 2026-05-21 15:33:51 +08:00
Sakura286 a91dddfb45 rccl: add heart beat 2026-05-21 10:23:35 +08:00
Sakura286 83ef4a074a rccl: linker verbose 2026-05-20 17:31:02 +08:00
Sakura286 1f83cd1682 rccl: fix comment 2026-05-19 17:15:07 +08:00
Sakura286 f059e086d9 rccl: change link options 2026-05-19 17:13:43 +08:00
Sakura286 299e55ddb3 rccl: remove test package 2026-05-19 16:50:46 +08:00
Sakura286 f8784f3b5f rccl: re-enable lto flags 2026-05-19 16:45:16 +08:00
Sakura286 e4f1ca5192 rccl: remove smp flags 2026-05-19 16:44:49 +08:00
Sakura286 5172319ca9 rccl: remove unused compile options 2026-05-19 16:44:20 +08:00
Sakura286 a9f5c4e5c4 rccl: remove heartbeat test 2026-05-19 16:38:29 +08:00
Sakura286 b32c185b4e rccl: accelerate build 2026-05-18 10:48:19 +08:00
Sakura286 bc2eff50dc rccl: fix lto parallel options 2026-05-18 09:44:54 +08:00
Sakura286 5ab6177d17 rccl: enable ninja and gpu side lto 2026-05-18 09:29:36 +08:00
Sakura286 a0ad358fc2 rccl: disable tests 2026-05-18 01:10:46 +08:00
Sakura286 c94043db39 hipsparse: revert test, disable them 2026-05-17 21:34:52 +08:00
Sakura286 5700573777 hipsparse: enable test 2026-05-17 21:05:52 +08:00
Sakura286 cc5726f97d hipsparse: remove comment of clang 2026-05-17 21:05:36 +08:00
Sakura286 c908c4b131 hipsparse: add gcc-fortran 2026-05-17 21:03:30 +08:00
Sakura286 37454a3766 hipsparse: use gcc 2026-05-17 21:01:20 +08:00
Sakura286 12fea032bf hipsparse: reformat 2026-05-17 20:51:54 +08:00
Sakura286 e085da0100 rccl: add branch fix flags to device linker (target_link_options) 2026-05-17 20:42:38 +08:00
Sakura286 ebe7791e27 rccl: add duplicated -mllvm 2026-05-17 16:30:02 +08:00
Sakura286 526ce0484a rccl: add verbose 2026-05-17 16:24:57 +08:00
Sakura286 ae4dbaea3b rccl: fix llvm options 2026-05-17 10:44:24 +08:00
Sakura286 03ea023b51 rccl: fix format 2026-05-16 16:51:39 +08:00
Sakura286 7fe36b1888 hipsparse: init 2026-05-15 23:19:21 +08:00
Sakura286 f1e1bc7ae7 rccl: add llvm flags to fix exceed branch size 2026-05-15 23:16:59 +08:00
Sakura286 5233271e4f rocrand: skip tests 2026-05-15 23:01:46 +08:00
Sakura286 995a55cf18 rocrand: fix ctest LD_LIBRARY 2026-05-15 22:15:51 +08:00
Sakura286 3fa7e1440d rccl: disable lto 2026-05-15 21:53:32 +08:00
Sakura286 5e74799f7e rocrand: fix test 2026-05-15 21:39:10 +08:00
Sakura286 2124c5fcb3 rccl & rocrand: add llvm build stuffs 2026-05-15 21:05:24 +08:00
Sakura286 b3861a8539 rccl: add llvm 2026-05-15 21:03:14 +08:00
Sakura286 c657d723ad rocrand: add compiler-rt buildreq 2026-05-15 20:59:08 +08:00
Sakura286 014d308db9 rccl: add clang-tools-extra 2026-05-15 20:57:58 +08:00
Sakura286 3aeb362d18 rccl: add buildreq rocprofiler-register-devel 2026-05-15 20:27:34 +08:00
Sakura286 0b53b286dc rccl: add gtest 2026-05-15 20:23:52 +08:00
Sakura286 84612f5bf4 rocrand: reformat 2026-05-15 20:22:36 +08:00
Sakura286 e55b5c7a82 rccl: add clang 2026-05-15 20:05:34 +08:00
Sakura286 bda75b0ec0 rocrand: init 2026-05-15 20:04:16 +08:00
Sakura286 eab1e97471 rccl: build with clang 2026-05-15 19:39:54 +08:00
Sakura286 f10171edfb rccl: fix buildreqs 2026-05-15 19:01:32 +08:00
Sakura286 ee34ee9526 rccl: reformatted 2026-05-15 18:54:12 +08:00
Sakura286 11f7be8eba rccl: init 2026-05-15 17:35:09 +08:00
Sakura286 7fab23cca4 Remove packages 2026-05-15 17:31:13 +08:00
Sakura286 58dc6a5166 rocm-core: init 2026-05-15 17:29:39 +08:00
Sakura286 040e4edf9d hipblaslt: init 2026-05-15 17:28:56 +08:00
Sakura286 fd21b6593a Add some packages 2026-05-09 19:16:13 +08:00
Sakura286 d1ac468a8f python-torch: claude fix 2026-04-30 17:26:42 +08:00
Sakura286 a23c081194 python-torch: mslf added 2026-04-30 15:07:53 +08:00
Sakura286 8d71034ecf python-torch: disable mslk 2026-04-30 15:00:53 +08:00
Sakura286 63d4edcbb7 python-torch: remove unused patch 2026-04-30 11:33:56 +08:00
Sakura286 af5cc788aa python-torch: temporarily remove patch 2026-04-30 10:50:45 +08:00
Sakura286 3b331cbbf2 python-torch: 2.11.0 2026-04-30 10:47:04 +08:00
Sakura286 25007fbfb9 python-torch: parallel build 2026-04-30 10:05:23 +08:00
Sakura286 50860b5898 python-torch: reverse tensorpipe 2026-04-30 08:54:41 +08:00
Sakura286 f8a9ad5b05 python-torch: local tensorpipe 2026-04-30 08:49:11 +08:00
Sakura286 29650623e6 python-torch: use system onnx 2026-04-30 08:45:27 +08:00
Sakura286 498bbf2fda python-torch: cmake_prefix_path mv out 2026-04-30 08:38:15 +08:00
Sakura286 28c476889c python-torch: disable rocm 2026-04-30 08:24:48 +08:00
Sakura286 8b985a8e40 python-torch: use system build req 2026-04-30 08:24:11 +08:00
Sakura286 4b9cbcfb7f python-torch: add amdgpu-inline-max-bb 2026-04-30 01:16:11 +08:00
Sakura286 066157025b python-torch: add o1 to HIP_CLANG_FLAGS 2026-04-29 16:47:13 +08:00
Sakura286 12baacca69 python-torch: long-branch-factor 2026-04-29 13:47:49 +08:00
Sakura286 9b306cd207 python-torch: add both flags to fix error 2026-04-29 11:34:46 +08:00
Sakura286 6798402182 fix pypi version source code 2026-04-23 15:02:16 +08:00
Sakura286 de0d19bb50 Init 2026-04-23 15:01:13 +08:00
Sakura286 b430f90f3b rocm-llvm: fix macros 2026-04-08 09:45:12 +08:00
Sakura286 45f4da568c rocm-llvm: upgrade to 7.2.1 2026-04-07 18:52:31 +08:00
Sakura286 32f381bf61 rocm-llvm: init 2026-04-07 16:22:10 +08:00
Sakura286 eecf6e8321 python-torch: build rel with debug info 2026-04-07 08:11:13 +08:00
Sakura286 1ba033ec4c python-torch: try fix branch size exceeds 2026-04-07 08:09:21 +08:00
Sakura286 ab1a0ae614 python-torch: add cmake buildtype 2026-04-07 07:15:34 +08:00
Sakura286 e8653e7f96 python-torch: remove long-branch 2026-04-05 16:36:06 +08:00
Sakura286 ee000e8ac3 python-torch: hipcc flags 2026-04-05 10:43:36 +08:00
Sakura286 96b0aa7355 python-torch: use longer branch 2026-04-03 18:48:46 +08:00
Sakura286 a1abe8ec3b python-torch: add cmake_cxx_implicit_include_dir 2026-04-03 16:52:19 +08:00
Sakura286 526bdecf36 python-torch: add cmake_no_system_from_imported 2026-04-03 16:42:27 +08:00
Sakura286 e1de2aa0d3 python-torch: disable build 2026-04-03 15:51:56 +08:00
Sakura286 7b8a95b48c python-torch: no onnx 2026-04-03 15:51:22 +08:00
Sakura286 4c6f368f97 python-torch: use cmake_cxx_flags 2026-04-03 15:47:11 +08:00
Sakura286 f77a39d2b2 python-torch: use build_cflags 2026-04-03 15:42:13 +08:00
Sakura286 aadfe3d4d0 python-torch: fix cxx flags 2026-04-03 13:36:13 +08:00
Sakura286 10b7b30807 python-torch: add compiler-rt 2026-04-03 12:43:51 +08:00
Sakura286 7844f7ea9c python-torch: add cmake prefix path 2026-04-03 12:07:31 +08:00
Sakura286 79de0035fe python-torch: add CMAKE_LIBRARY_PATH 2026-04-03 11:48:57 +08:00
Sakura286 64df399df4 python-torch: add cmake search path 2026-04-03 11:34:05 +08:00
Sakura286 fdae57bf35 python-torch: cpuinfo 2026-04-03 11:10:40 +08:00
Sakura286 0f5dd1cf64 python-torch: remove 3rd-party protobuf 2026-04-03 10:59:06 +08:00
Sakura286 6f1bbe64eb python-torch: toolchain 2026-04-03 10:52:39 +08:00
Sakura286 46aaefeddb python-torch: compiler 2026-04-03 10:43:59 +08:00
Sakura286 06bdc9182b python-torch: clang 2026-04-03 10:08:26 +08:00
CHEN Xuan e9585264e5 python-torch: disable parallel jobs of hip 2026-03-30 04:46:24 +00:00
Sakura286 8aba19870d python-torch: remove unused archs 2026-03-30 11:41:33 +08:00
Sakura286 54558460e7 python-torch: remove rocm-core sed 2026-03-30 11:19:57 +08:00
Sakura286 fc2dc9de21 python-torch: parallel build 2026-03-30 11:14:26 +08:00
Sakura286 05533a8fa8 python-torch: add lld linker 2026-03-30 11:09:50 +08:00
Sakura286 0f35a5dccd python-torch: do not parallel build hip code 2026-03-30 10:05:04 +08:00
Sakura286 0e13367518 python-torch: parallel-jobs only used by clang 2026-03-30 09:33:42 +08:00
Sakura286 8660d90f09 python-torch: use clang 2026-03-30 09:12:03 +08:00
Sakura286 f3e3b70934 python-torch: fix HIPOCCUPANCYMAXACTIVEBLOCKSPERMULTIPROCESSOR 2026-03-30 08:51:23 +08:00
Sakura286 371b33204c rocm-core: reformat 2026-03-27 16:18:14 +08:00
Sakura286 06e6e31fe1 fix path 2026-03-27 16:16:56 +08:00
Sakura286 b50f61f025 rocm-core: fix path 2026-03-27 16:14:44 +08:00
Sakura286 5e69e1c7d6 rocm-core: use default includedir path 2026-03-27 16:08:31 +08:00
Sakura286 754dc2e246 rocm-core: fix install 2026-03-27 15:59:40 +08:00
Sakura286 03763cb29f rocm-core: fix typo 2026-03-27 15:43:59 +08:00
Sakura286 ac041f8a77 rocm-core:init 2026-03-27 15:39:26 +08:00
Sakura286 d91bf90adf python-torch: try to fix fbgemm build error 2026-03-26 16:44:04 +08:00
Sakura286 57d576b3a5 python-torch: fix rocm buildreq 2026-03-26 16:22:54 +08:00
Sakura286 4da102d869 python-torch: rocm init 2026-03-26 16:17:16 +08:00
Sakura286 96e2502b07 python-torch: init 2026-03-26 16:16:36 +08:00
Sakura286 cc35d66464 clean some code because they are merged to openruyi 2026-03-26 15:50:20 +08:00
Sakura286 31c1ecba2e ollama: finally fix 2026-03-24 17:04:13 +08:00
Sakura286 8d9e4ba8a8 ollama: add comment; enable tests 2026-03-24 16:07:04 +08:00
Sakura286 36a5044d9e ollama: fix license 2026-03-24 16:00:01 +08:00
Sakura286 5eee66b6f7 ollama: fix %prep 2026-03-24 13:06:37 +08:00
Sakura286 b0557aaa5e ollama: fix ollama user create 2026-03-24 12:08:02 +08:00
Sakura286 7acfaf23d2 ollama: systemd 2026-03-24 11:45:22 +08:00
Sakura286 0cfa984eef ollama: reformat; add comment 2026-03-24 11:33:31 +08:00
Sakura286 d1069acf22 ollama: limit batch size 2026-03-24 10:12:03 +08:00
Sakura286 8455135367 ollama: cmake use lib, not lib64. override them 2026-03-24 09:30:33 +08:00
Sakura286 a49db53e51 ollama: fix lib64 2026-03-24 09:10:18 +08:00
Sakura286 8d7c818772 ollama: fix lib64 2026-03-24 01:21:59 +08:00
Sakura286 60973b86ab ollama: lib64 2026-03-24 01:07:04 +08:00
Sakura286 0022a83da2 ollama: remove bundle 2026-03-23 23:59:40 +08:00
Sakura286 15a0d2df76 ollama: remove bundle 2026-03-23 23:12:21 +08:00
Sakura286 1037a25dd8 ollama: fix prefix 2026-03-23 20:40:30 +08:00
Sakura286 8bad881549 ollama: test install prefix 2026-03-23 20:34:03 +08:00
Sakura286 335f7e8e95 ollama: test build with original cmake 2026-03-23 20:25:05 +08:00
Sakura286 b363778ed1 ollama: backward 2026-03-23 17:03:51 +08:00
Sakura286 bda753f919 hipblas: fix install 2026-03-23 16:50:14 +08:00
Sakura286 94da1d013d hipblas: reformat 2026-03-23 16:46:56 +08:00
Sakura286 4b67dddc3d ollama: use bundle 2026-03-23 15:11:49 +08:00
Sakura286 b3ce9b994b ollama: fixed 2026-03-23 13:48:35 +08:00
Sakura286 3b4ebf7ad0 ollama: test ggml 2026-03-23 13:05:37 +08:00
Sakura286 b296edb629 ollama: use original cmake 2026-03-23 11:27:08 +08:00
Sakura286 8ec1c1cd75 ollama: test riscv64 build, with original cmake command 2026-03-23 10:30:48 +08:00
Sakura286 7aeb0978ca ollama: disable tests temporarily 2026-03-16 16:30:53 +08:00
Sakura286 28491dd4e5 ollama: fix included files 2026-03-16 16:18:12 +08:00
Sakura286 6e795389a9 ollama: fix buildreq 2026-03-16 16:02:00 +08:00
Sakura286 18825b326c ollama: fix install path 2026-03-16 15:48:54 +08:00
Sakura286 271bc01497 ollama: fix install 2026-03-16 15:46:46 +08:00
Sakura286 92618ac659 ollama: add comment 2026-03-16 15:32:01 +08:00
Sakura286 a0343bcd69 ollama: fix build 2026-03-16 15:30:03 +08:00
78 changed files with 5032 additions and 1788 deletions
@@ -0,0 +1,38 @@
From b8b4a6bcfe35ba9539a120cfd16573123ddd9241 Mon Sep 17 00:00:00 2001
From: "Sv. Lockal" <lockalsash@gmail.com>
Date: Mon, 15 Dec 2025 03:46:35 +0800
Subject: [PATCH] Fix compilation with libdrm-2.4.130
Fix error: redefinition of 'struct drm_color_ctm_3x4'.
drm_color_ctm_3x4 structure is now defined in https://github.com/torvalds/linux/commit/e5719e7f19009d4fbedf685fc22eec9cd8de154f#diff-4c51fb416ec7cc69566cd7b795ee57eb070aa1006ad65d6962081f039ffb2718
As this structure is unused and not a part of amdsmi public interface,
it is safe to remove it.
---
include/amd_smi/impl/amdgpu_drm.h | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/include/amd_smi/impl/amdgpu_drm.h b/include/amd_smi/impl/amdgpu_drm.h
index b56a5ac4b20a..0e483d13b382 100644
--- a/include/amd_smi/impl/amdgpu_drm.h
+++ b/include/amd_smi/impl/amdgpu_drm.h
@@ -1625,15 +1625,6 @@ struct drm_amdgpu_info_uq_metadata {
#define AMDGPU_FAMILY_GC_11_5_0 150 /* GC 11.5.0 */
#define AMDGPU_FAMILY_GC_12_0_0 152 /* GC 12.0.0 */
-/* FIXME wrong namespace! */
-struct drm_color_ctm_3x4 {
- /*
- * Conversion matrix with 3x4 dimensions in S31.32 sign-magnitude
- * (not two's complement!) format.
- */
- __u64 matrix[12];
-};
-
#if defined(__cplusplus)
}
#endif
--
2.51.1
@@ -0,0 +1,36 @@
From 21afd2c2d58b8c7895df47f6a16a0781a7f0024a Mon Sep 17 00:00:00 2001
From: Sakura286 <sakura286@outlook.com>
Date: Sat, 6 Jun 2026 20:38:13 +0800
Subject: [PATCH 1/2] Disable goamdsmi_shim when ESMI is off
The Go shim in goamdsmi_shim/smiwrapper/amdsmi_go_shim.c calls CPU-only
APIs such as amdsmi_get_cpu_core_energy, amdsmi_get_threads_per_core
and amdsmi_get_processor_handles_by_type. In include/amd_smi/amdsmi.h
these are all guarded by `#ifdef ENABLE_ESMI_LIB`, so when ESMI is
disabled (e.g. on riscv64 / non-x86 architectures) the declarations
disappear and the shim fails to build with implicit-function-declaration
errors.
---
CMakeLists.txt | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 1b7375f..013c1ad 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -363,8 +363,10 @@ install(
PATTERN "build*" EXCLUDE
PATTERN ".cache*" EXCLUDE)
-# Make for goamdsmi_shim library
-add_subdirectory(goamdsmi_shim)
+# The Go shim uses CPU APIs gated by ENABLE_ESMI_LIB; only build it when ESMI is on
+if(ENABLE_ESMI_LIB)
+ add_subdirectory(goamdsmi_shim)
+endif()
#Debian package specific variables
set(CPACK_DEBIAN_PACKAGE_RECOMMENDS "python3-argcomplete, libdrm-dev, libdrm-amdgpu-dev")
--
2.53.0
@@ -0,0 +1,77 @@
From 41740f15ede6e04e46ff736bcc85ca8fd1aae641 Mon Sep 17 00:00:00 2001
From: Sakura286 <sakura286@outlook.com>
Date: Sat, 6 Jun 2026 21:55:37 +0800
Subject: [PATCH 2/2] Tolerate missing CPU/E-SMI symbols on non-x86_64
With ENABLE_ESMI_LIB=OFF (non-x86_64), libamd_smi.so omits the CPU API,
but the ctypesgen wrapper binds every symbol at import time, so the
missing CPU symbols make `import amdsmi` fail with AttributeError.
Wrap the loaded CDLL in a proxy so missing symbols resolve to a lazy
stub that only raises when actually called. `import amdsmi` then works
and GPU consumers (e.g. PyTorch with ROCM EP, which never calls the CPU
API) are unaffected. The library object is wrapped instead of ctypes.CDLL
itself so callers hooking ctypes.CDLL keep working.
---
py-interface/amdsmi_wrapper.py | 38 +++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/py-interface/amdsmi_wrapper.py b/py-interface/amdsmi_wrapper.py
index 99ed017..0626b72 100644
--- a/py-interface/amdsmi_wrapper.py
+++ b/py-interface/amdsmi_wrapper.py
@@ -176,6 +176,42 @@ from pathlib import Path
# 3. Relative to amdsmi_wrapper.py
# - parent directory
# - current directory
+class _AmdsmiMissingSymbol:
+ # Placeholder for a symbol absent from libamd_smi.so (e.g. the CPU/E-SMI
+ # API when the library is built with ENABLE_ESMI_LIB=OFF on non-x86_64).
+ # It accepts the restype/argtypes the wrapper assigns at import time and
+ # only raises if the symbol is ever actually called.
+ def __init__(self, name):
+ object.__setattr__(self, "_amdsmi_name", name)
+
+ def __setattr__(self, key, value):
+ object.__setattr__(self, key, value)
+
+ def __call__(self, *args, **kwargs):
+ name = object.__getattribute__(self, "_amdsmi_name")
+ raise NotImplementedError(
+ "amdsmi symbol " + repr(name) + " is unavailable: it is not "
+ "exported by this build of libamd_smi.so")
+
+
+class _AmdsmiTolerantLib:
+ # Proxy around the loaded CDLL so that binding a symbol the library does
+ # not export resolves to a stub instead of raising AttributeError at
+ # import time. We wrap the already-constructed library object rather than
+ # the ctypes.CDLL() call so that callers hooking ctypes.CDLL (e.g.
+ # PyTorch's libamd_smi.so loader) keep working.
+ def __init__(self, lib):
+ object.__setattr__(self, "_amdsmi_lib", lib)
+
+ def __getattr__(self, name):
+ try:
+ return getattr(object.__getattribute__(self, "_amdsmi_lib"), name)
+ except AttributeError:
+ if name.startswith("__"):
+ raise
+ return _AmdsmiMissingSymbol(name)
+
+
def find_smi_library():
err = OSError("Could not load libamd_smi.so")
possible_locations = []
@@ -194,7 +230,7 @@ def find_smi_library():
for location in possible_locations:
try:
lib = ctypes.CDLL(location)
- return lib, location
+ return _AmdsmiTolerantLib(lib), location
except OSError as e:
err = e
continue
--
2.53.0
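
Review note: the proxy approach in the patch above can be tried without libamd_smi.so. The following is a minimal Python sketch, not the patch itself. `FakeLib`, `MissingSymbol` and `TolerantLib` are hypothetical stand-ins for the loaded CDLL, `_AmdsmiMissingSymbol` and `_AmdsmiTolerantLib`, and the symbol names are used purely as attribute names.

```python
class FakeLib:
    """Hypothetical stand-in for a loaded ctypes.CDLL exporting one GPU symbol."""

    def amdsmi_init(self, flags):
        return 0


class MissingSymbol:
    """Stub for a symbol the library does not export.

    It accepts attribute writes such as restype/argtypes and only raises
    when it is actually called.
    """

    def __init__(self, name):
        self._name = name

    def __call__(self, *args, **kwargs):
        raise NotImplementedError(
            "symbol %r is not exported by this build" % self._name)


class TolerantLib:
    """Proxy that turns a failed symbol lookup into a lazy stub."""

    def __init__(self, lib):
        self._lib = lib

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so self._lib is
        # normally found in the instance dict; guard against recursion
        # if the proxy is ever inspected before __init__ has run.
        if name == "_lib":
            raise AttributeError(name)
        try:
            return getattr(self._lib, name)
        except AttributeError:
            if name.startswith("__"):
                raise
            return MissingSymbol(name)


lib = TolerantLib(FakeLib())

# Binding a missing CPU symbol no longer fails at "import" time.
stub = lib.amdsmi_get_cpu_core_energy
stub.restype = int
stub.argtypes = []

# Symbols that do exist are forwarded unchanged.
status = lib.amdsmi_init(0)
```

Calling `stub()` raises `NotImplementedError`, which mirrors the behaviour the commit message describes. Each lookup of a missing name returns a fresh stub in this sketch, so `restype` and `argtypes` set on one stub are not retained; that is harmless here because the stub never reaches a real call.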
@@ -0,0 +1,175 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%bcond test 0
%if %{with test}
%global build_test ON
%else
%global build_test OFF
%endif
%global rocm_release 7.2
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
# esmi_ib_library is not suitable for packaging
# https://github.com/amd/esmi_ib_library/issues/13
# This tag was chosen by the amdsmi project because 4.0+ introduced variables
# not found in the upstream kernel.
%global esmi_ver 4.2
%global pkg_library_version 26
Name: amdsmi
Version: %{rocm_version}
Release: %autorelease
Summary: AMD System Management Interface
License: MIT AND (GPL-2.0-only WITH Linux-syscall-note) AND NSCA
# Main license is MIT
#
# These files are GPL-2.0:
# include/amd_smi/impl/amd_hsmp.h
# esmi_ib_library/include/asm/amd_hsmp.h
# Both carry: SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
#
# NSCA covers the bundled esmi_ib_library
Url: https://github.com/ROCm/rocm-systems
#!RemoteAsset: sha256:23c31cd787d86ee35c82746fcde705eacc46517815110376f28417909ef46406
Source0: %{url}/releases/download/rocm-%{version}/%{name}.tar.gz
#!RemoteAsset: sha256:de19d222d09e2171f47f8bbd6608e5648bd547c82543379bb8fb5ed2e379e141
Source1: https://github.com/amd/esmi_ib_library/archive/refs/tags/esmi_pkg_ver-%{esmi_ver}.tar.gz
BuildSystem: cmake
# Support libdrm 2.4.130+
# https://github.com/ROCm/amdsmi/pull/165
Patch0: 0001-Fix-compilation-with-libdrm-2.4.130.patch
# -DENABLE_ESMI_LIB=OFF is not enough on its own:
# goamdsmi_shim references CPU/ESMI-only APIs, so only build it when ESMI is on
Patch1: 2001-Disable-goamdsmi_shim-when-ESMI-is-off.patch
# Without ESMI (non-x86_64) libamd_smi.so omits the CPU API; let the ctypesgen
# wrapper tolerate the missing symbols so `import amdsmi` still works
Patch2: 2002-Tolerate-missing-CPU-E-SMI-symbols-on-non-x86_64.patch
BuildOption(conf): -G Ninja
BuildOption(conf): -DBUILD_TESTS=%{build_test}
BuildOption(conf): -DCMAKE_SKIP_INSTALL_RPATH=TRUE
%ifnarch x86_64
BuildOption(conf): -DENABLE_ESMI_LIB=OFF
%endif
BuildRequires: cmake
%if %{with test}
BuildRequires: cmake(GTest)
%endif
BuildRequires: ninja
BuildRequires: pkgconfig(libdrm)
BuildRequires: pkgconfig(libdrm_amdgpu)
BuildRequires: pkgconfig(python3)
Requires: python3dist(pyyaml)
%description
The AMD System Management Interface Library, or AMD SMI library, is a C
library for Linux that provides a user space interface for applications
to monitor and control AMD devices.
%package devel
Summary: Libraries and headers for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
%{summary}
%if %{with test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%prep
%autosetup -p1 -N -n %{name}
%patch 0 -p1
%patch 1 -p1
%ifnarch x86_64
%patch 2 -p1
%endif
# ESMI - EPYC System Management Interface
# esmi_ib_library uses x86-only cpuid.h; guard it for non-x86 builds
%ifarch x86_64
tar xf %{SOURCE1}
mv esmi_ib_library-* esmi_ib_library
mv esmi_ib_library/License.txt esmi_ib_library_License.txt
# The esmi version check reads git tags, but we build from a tarball without git metadata.
# Substitute the tag we downloaded directly into the version check:
sed -i 's/NOT latest_esmi_tag/NOT "esmi_pkg_ver-%{esmi_ver}"/' CMakeLists.txt
%endif
# /usr/libexec/amdsmi_cli/BDF.py:126: SyntaxWarning: invalid escape sequence '\.'
sed -i -e 's@bdf_regex = "@bdf_regex = r"@' amdsmi_cli/BDF.py
# Fix script shebang
sed -i -e 's@env python3@python3@' amdsmi_cli/*.py
%install -a
mkdir -p %{buildroot}%{python3_sitearch}
mv %{buildroot}%{_datadir}/amdsmi %{buildroot}%{python3_sitearch}
mv %{buildroot}%{_datadir}/pyproject.toml %{buildroot}%{python3_sitearch}/amdsmi/
# W: unstripped-binary-or-object .../amdsmi/libamd_smi.so
# It is opened explicitly, so it cannot just be removed; strip it instead
strip %{buildroot}%{python3_sitearch}/amdsmi/*.so
# E: non-executable-script .../amdsmi_cli/amdsmi_cli_exceptions.py 644 /usr/bin/env python3
chmod a+x %{buildroot}%{_libexecdir}/amdsmi_cli/amdsmi_*.py
rm -rf %{buildroot}%{_datadir}/example
rm -rf %{buildroot}%{_datadir}/amd_smi/example
rm -f %{buildroot}%{_datadir}/_version.py
rm -f %{buildroot}%{_datadir}/amd_smi/_version.py
rm -f %{buildroot}%{_datadir}/setup.py
rm -f %{buildroot}%{_datadir}/amd_smi/setup.py
rm -f %{buildroot}%{_docdir}/amd_smi-asan/LICENSE.txt
rm -f %{buildroot}%{_docdir}/amd-smi-lib/LICENSE.txt
rm -f %{buildroot}%{_docdir}/amd-smi-lib/README.md
rm -rf %{buildroot}%{_docdir}/amd-smi-lib/copyright
if [ -e %{buildroot}%{_datadir}/amd_smi/tests ]; then
mkdir -p %{buildroot}%{_datadir}/amdsmi
mv %{buildroot}%{_datadir}/amd_smi/tests %{buildroot}%{_datadir}/amdsmi/
fi
%files
%doc README.md
%license LICENSE
%{_bindir}/amd-smi
%{_libdir}/libamd_smi.so.%{pkg_library_version}{,.*}
%{_libexecdir}/amdsmi_cli
%{python3_sitearch}/amdsmi
%ifarch x86_64
%license esmi_ib_library_License.txt
%{_libdir}/libgoamdsmi_shim64.so.1{,.*}
%endif
%files devel
%{_includedir}/amd_smi/
%{_libdir}/cmake/amd_smi/
%{_libdir}/libamd_smi.so
%ifarch x86_64
%{_includedir}/*.h
%{_libdir}/libgoamdsmi_shim64.so
%endif
%if %{with test}
%files test
%{_datadir}/amdsmi/
%endif
%changelog
%autochangelog
@@ -0,0 +1,36 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
Name: fplus
Version: 0.2.28
Release: %autorelease
Summary: Helps you write concise and readable C++ code
Url: https://github.com/Dobiasd/FunctionalPlus
License: BSL-1.0
#!RemoteAsset: sha256:8864a3e9bebde6ebed71b49ac2a036cedf9ae0f02ce758bc28c21e6a2ae15803
Source0: %{url}/archive/v%{version}.tar.gz
BuildSystem: cmake
BuildRequires: cmake
%description
FunctionalPlus is a small header-only library supporting you in
reducing code noise and in dealing with only one single level
of abstraction at a time. By increasing brevity and maintainability
of your code it can improve productivity (and fun!) in the long
run. It pursues these goals by providing pure and easy-to-use
functions that free you from implementing commonly used flows of
control over and over again.
%files
%doc README.md
%license LICENSE
%{_includedir}/fplus/
%{_libdir}/cmake/FunctionalPlus/
%changelog
%autochangelog
@@ -0,0 +1,40 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
Name: half
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm half-precision floating point library
License: MIT
Url: https://github.com/ROCm/half
#!RemoteAsset: sha256:1b5de9e50513560265a79022fd74322b77216f9bf938be688709a8e7d1d8d09d
Source0: %{url}/archive/rocm-%{version}.tar.gz
BuildSystem: cmake
BuildRequires: cmake
BuildRequires: rocm-cmake
%description
half is a C++ header-only library providing an IEEE-754 conformant
half-precision floating point type along with arithmetic operators,
type conversions, and common mathematical functions. It is part of
the ROCm software stack.
%install -a
rm -f %{buildroot}%{_datadir}/doc/half/LICENSE.txt
%files
%license LICENSE.txt
%doc README.txt
%{_includedir}/half/
%changelog
%autochangelog
@@ -1,44 +0,0 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global rocm_version 7.1.1
Name: hipblas-common
Version: %{rocm_version}
Release: %autorelease
Summary: Common files shared by hipBLAS and hipBLASLt
License: MIT
Url: https://github.com/ROCm/hipBLAS-common
#!RemoteAsset
Source0: %{url}/archive/rocm-%{rocm_version}.tar.gz
BuildArch: noarch
BuildSystem: cmake
BuildRequires: cmake
BuildRequires: gcc-c++
BuildRequires: rocm-cmake
%description
%summary
%package devel
Summary: Libraries and headers for %{name}
Provides: %{name}-static = %{version}-%{release}
%description devel
%{summary}
%install -a
rm -f %{buildroot}%{_prefix}/share/doc/hipblas-common/LICENSE.md
%files devel
%license LICENSE.md
%{_includedir}/%{name}
%{_libdir}/cmake/%{name}
%changelog
%{?autochangelog}
@@ -1,91 +0,0 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global rocm_version 7.1.1
%bcond test 0
Name: hipblas
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm BLAS marshalling library
License: MIT
Url: https://github.com/ROCm/hipBLAS
#!RemoteAsset
Source0: %{url}/archive/rocm-%{rocm_version}.tar.gz
BuildSystem: cmake
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(hip)
BuildRequires: cmake(hipblas-common)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(rocblas)
BuildRequires: cmake(rocsolver)
BuildRequires: compiler-rt
BuildRequires: gcc-c++
BuildRequires: gcc-fortran
BuildRequires: hipcc
BuildRequires: lld
BuildRequires: llvm
BuildRequires: rocm-cmake
BuildRequires: rocm-llvm-macros
Provides: hipblas = %{version}-%{release}
%description
hipBLAS is a Basic Linear Algebra Subprograms (BLAS) marshalling
library, with multiple supported backends. It sits between the
application and a 'worker' BLAS library, marshalling inputs into
the backend library and marshalling results back to the
application. hipBLAS exports an interface that does not require
the client to change, regardless of the chosen backend. Currently,
hipBLAS supports rocBLAS and cuBLAS as backends.
%package devel
Summary: Libraries and headers for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
Requires: cmake(hipblas-common)
%description devel
%{summary}
%if %{with test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%prep -a
# This is a tarball, no .git to query
sed -i -e 's@find_package(Git REQUIRED)@#find_package(Git REQUIRED)@' library/CMakeLists.txt
%build -a
rm -f %{buildroot}%{_prefix}/share/doc/hipblas/LICENSE.md
%files
%license LICENSE.md
%doc README.md
%{_libdir}/libhipblas.so.3{,.*}
%files devel
%{_includedir}/hipblas/
%{_libdir}/libhipblas.so
%{_libdir}/cmake/hipblas/
%if %{with test}
%files test
%{_bindir}/hipblas*
%endif
%changelog
%{?autochangelog}
@@ -0,0 +1,25 @@
From 43c4a61c5d8836a16feb8e53c72f255790523ff3 Mon Sep 17 00:00:00 2001
From: Tom Rix <Tom.Rix@amd.com>
Date: Mon, 3 Nov 2025 06:11:40 -0800
Subject: [PATCH] hipblaslt find origami package
---
CMakeLists.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index dbccca92c84f..d02df50540c2 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -218,7 +218,7 @@ if(HIPBLASLT_ENABLE_MSGPACK)
endif()
endif()
-add_subdirectory("${CMAKE_CURRENT_SOURCE_DIR}/../../shared/origami" origami)
+find_package(origami CONFIG REQUIRED)
add_subdirectory(tensilelite)
if(HIPBLASLT_ENABLE_HOST)
--
2.52.0
@@ -0,0 +1,57 @@
From 72521fcca77c010b9c8b9ce91cde925164502d6f Mon Sep 17 00:00:00 2001
From: Tom Rix <Tom.Rix@amd.com>
Date: Thu, 25 Sep 2025 13:02:55 -0700
Subject: [PATCH] hipblaslt tensilelite remove yappi dependency
Signed-off-by: Tom Rix <Tom.Rix@amd.com>
---
tensilelite/Tensile/TensileCreateLibrary/Run.py | 15 ---------------
tensilelite/requirements.txt | 2 +-
2 files changed, 1 insertion(+), 16 deletions(-)
diff --git a/tensilelite/Tensile/TensileCreateLibrary/Run.py b/tensilelite/Tensile/TensileCreateLibrary/Run.py
index 835ed9c01916..a02705f6554a 100644
--- a/tensilelite/Tensile/TensileCreateLibrary/Run.py
+++ b/tensilelite/Tensile/TensileCreateLibrary/Run.py
@@ -231,12 +231,6 @@ def writeSolutionsAndKernels(
generateSourcesAndExit=False,
compress=True,
):
- if globalParameters["PythonProfile"]:
- globalParameters["CpuThreads"] = 0
- printWarning("Python profiling is enabled. CpuThreads set to 0.")
- import yappi
- yappi.start()
-
codeObjectFiles = []
outputPath = Path(outputPath)
@@ -299,15 +293,6 @@ def writeSolutionsAndKernels(
writeHelpers(outputPath, kernelHelperObjs, KERNEL_HELPER_FILENAME_CPP, KERNEL_HELPER_FILENAME_H)
srcKernelFile = Path(outputPath) / "Kernels.cpp"
- if globalParameters["PythonProfile"]:
- yappi.stop()
- yappi.get_func_stats().save("yappi_results.profile", type="callgrind")
- with open("yappi_results.txt", "w") as f:
- yappi.get_func_stats().print_all(out=f)
- if globalParameters["CpuThreads"] != 0:
- with open("yappi_thread_stats.txt", "w") as f:
- yappi.get_thread_stats().print_all(out=f)
-
if not generateSourcesAndExit:
codeObjectFiles += buildAssemblyCodeObjectFiles(
asmToolchain.linker,
diff --git a/tensilelite/requirements.txt b/tensilelite/requirements.txt
index 60c4c1144537..e87db8445411 100644
--- a/tensilelite/requirements.txt
+++ b/tensilelite/requirements.txt
@@ -7,4 +7,4 @@ joblib>=1.1.1; python_version < '3.8'
simplejson
ujson
orjson
-yappi
+
--
2.52.0
@@ -0,0 +1,41 @@
From 1ac117ac0591a0f1bb67c34f537354c21412b2d8 Mon Sep 17 00:00:00 2001
From: Tom Rix <Tom.Rix@amd.com>
Date: Sat, 1 Nov 2025 09:43:58 -0700
Subject: [PATCH] hipblaslt tensilelite use fedora paths
---
tensilelite/Tensile/Common/GlobalParameters.py | 2 +-
tensilelite/Tensile/Toolchain/Validators.py | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tensilelite/Tensile/Common/GlobalParameters.py b/tensilelite/Tensile/Common/GlobalParameters.py
index 567188da59bd..1f8037c183a6 100644
--- a/tensilelite/Tensile/Common/GlobalParameters.py
+++ b/tensilelite/Tensile/Common/GlobalParameters.py
@@ -538,7 +538,7 @@ def assignGlobalParameters(config, isaInfoMap: Dict[IsaVersion, IsaInfo]):
else:
print2(" %24s: %8s (unspecified)" % (key, defaultValue))
- globalParameters["ROCmPath"] = "/opt/rocm"
+ globalParameters["ROCmPath"] = "/usr"
if "ROCM_PATH" in os.environ:
globalParameters["ROCmPath"] = os.environ.get("ROCM_PATH")
if "TENSILE_ROCM_PATH" in os.environ:
diff --git a/tensilelite/Tensile/Toolchain/Validators.py b/tensilelite/Tensile/Toolchain/Validators.py
index fd5dab5324c0..3ce024d31f52 100644
--- a/tensilelite/Tensile/Toolchain/Validators.py
+++ b/tensilelite/Tensile/Toolchain/Validators.py
@@ -30,8 +30,8 @@ from typing import List, NamedTuple, Union
from Tensile.Common.Utilities import isRhel8
-DEFAULT_ROCM_BIN_PATH_POSIX = Path("/opt/rocm/bin")
-DEFAULT_ROCM_LLVM_BIN_PATH_POSIX = Path("/opt/rocm/lib/llvm/bin")
+DEFAULT_ROCM_BIN_PATH_POSIX = Path("/usr/bin")
+DEFAULT_ROCM_LLVM_BIN_PATH_POSIX = Path("/usr/lib64/rocm/llvm/bin")
DEFAULT_ROCM_BIN_PATH_WINDOWS = Path("C:/Program Files/AMD/ROCm")
--
2.52.0
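
Review note: the lookup order touched by the patch above can be sketched as a small standalone function. This is a hedged illustration, not tensilelite code: `resolve_rocm_path` is a hypothetical helper, and the assignment in the `TENSILE_ROCM_PATH` branch lies outside the hunk shown, so it is assumed here to mirror the `ROCM_PATH` branch.

```python
import os


def resolve_rocm_path(environ=None, default="/usr"):
    """Return the ROCm prefix using the precedence suggested by the patch.

    Order (lowest to highest priority): packaged default, ROCM_PATH,
    TENSILE_ROCM_PATH.
    """
    env = os.environ if environ is None else environ
    path = default
    if "ROCM_PATH" in env:
        path = env["ROCM_PATH"]
    if "TENSILE_ROCM_PATH" in env:
        # Assumption: the real code assigns the variable's value here;
        # the diff context ends before this line.
        path = env["TENSILE_ROCM_PATH"]
    return path


# With no overrides, the distribution default applies.
default_path = resolve_rocm_path({})
# An upstream-style /opt/rocm install can still be selected via the environment.
override_path = resolve_rocm_path({"ROCM_PATH": "/opt/rocm"})
```

Under that assumption, a build against a vendor install would only need `ROCM_PATH` or `TENSILE_ROCM_PATH` exported, while the packaged default stays at `/usr`.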
@@ -0,0 +1,31 @@
From 9aa4664e02e27b50083be08e5b495cbef02d6f08 Mon Sep 17 00:00:00 2001
From: Sakura286 <sakura286@outlook.com>
Date: Mon, 8 Jun 2026 09:00:08 +0800
Subject: [PATCH] hipblaslt tensilelite use system nanobind
---
tensilelite/rocisa/CMakeLists.txt | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/tensilelite/rocisa/CMakeLists.txt b/tensilelite/rocisa/CMakeLists.txt
index 3918f18..9c6fcd3 100644
--- a/tensilelite/rocisa/CMakeLists.txt
+++ b/tensilelite/rocisa/CMakeLists.txt
@@ -17,13 +17,7 @@ target_include_directories(rocisa-cpp
)
if(HIPBLASLT_BUNDLE_PYTHON_DEPS)
- include(FetchContent)
- FetchContent_Declare(
- nanobind
- GIT_REPOSITORY https://github.com/wjakob/nanobind.git
- GIT_TAG 9b3afa9dbdc23641daf26fadef7743e7127ff92f # v2.6.1
- )
- FetchContent_MakeAvailable(nanobind)
+ find_package(nanobind CONFIG REQUIRED)
set(ROCISAINST_SOURCE "${CMAKE_CURRENT_SOURCE_DIR}/rocisa/src/instruction/instruction.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/rocisa/src/instruction/common.cpp"
--
2.53.0
@@ -0,0 +1,197 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
%global toolchain clang
%bcond build_test 0
%if %{with build_test}
%global cmake_test ON
%else
%global cmake_test OFF
%endif
%global tensile_version 4.33.0
# The upstream hipBLASLt project has a hard fork of the python-tensile package
# that rocBLAS uses. The two versions are incompatible. It appears that the
# fork happened around version 4.33.0. Unfortunately hipBLASLt can no longer be
# built without using this fork.
# https://github.com/ROCm/hipBLASLt/issues/535
# The problem with the fork has been raised here.
# https://github.com/ROCm/hipBLASLt/issues/908
%global tensile_verbose 1
Name: hipblaslt
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm general matrix operations beyond BLAS
License: MIT AND BSD-3-Clause
URL: https://github.com/ROCm/rocm-libraries
#!RemoteAsset: sha256:05d73038b1b4f66f3df4eb595b7cb0c8935f7aa18d0e07dbe5cc740a4b691898
Source0: %{url}/releases/download/rocm-%{version}/%{name}.tar.gz
BuildSystem: cmake
BuildOption(conf): -DBUILD_CLIENTS_TESTS=%{cmake_test}
BuildOption(conf): -DCMAKE_VERBOSE_MAKEFILE=ON
BuildOption(conf): -DGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DHIPBLASLT_ENABLE_CLIENT=%{cmake_test}
BuildOption(conf): -DHIPBLASLT_ENABLE_MARKER=OFF
BuildOption(conf): -DHIPBLASLT_ENABLE_OPENMP=OFF
BuildOption(conf): -DHIPBLASLT_ENABLE_ROCROLLER=OFF
BuildOption(conf): -DHIPBLASLT_ENABLE_SAMPLES=OFF
BuildOption(conf): -DTensile_LIBRARY_FORMAT=msgpack
BuildOption(conf): -DTensile_VERBOSE=%{tensile_verbose}
BuildOption(conf): -DVIRTUALENV_BIN_DIR=%{_bindir}
BuildOption(conf): -Dnanobind_ROOT=%(python3 -m nanobind --cmake_dir)
BuildOption(conf): -G Ninja
# yappi is used in tensilelite to generate profiling data; we do not use that in the build
Patch0: 0001-hipblaslt-tensilelite-remove-yappi-dependency.patch
# Patch from Fedora: change hard-coded vendor paths
Patch1: 0001-hipblaslt-tensilelite-use-system-paths.patch
# https://github.com/ROCm/rocm-libraries/issues/2422
Patch2: 0001-hipblaslt-find-origami-package.patch
# use the distribution-provided nanobind instead of fetching/bundling it
Patch3: 2001-hipblaslt-tensilelite-use-system-nanobind.patch
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(hip)
BuildRequires: cmake(hipblas)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(msgpack)
BuildRequires: cmake(origami)
BuildRequires: cmake(rocblas)
BuildRequires: cmake(rocm_smi)
BuildRequires: compiler-rt
BuildRequires: gcc-fortran
BuildRequires: hipcc
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: pkgconfig(libzstd)
BuildRequires: pkgconfig(python3)
BuildRequires: pkgconfig(zlib)
# https://github.com/ROCm/hipBLASLt/issues/1734
BuildRequires: python3dist(msgpack)
# nanobind is used to build the rocisa native module (build-time only)
BuildRequires: python3dist(nanobind)
BuildRequires: python3dist(setuptools)
BuildRequires: python3dist(pyyaml)
BuildRequires: python3dist(joblib)
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
BuildRequires: rocminfo
%if %{with build_test}
BuildRequires: cmake(openblas)
BuildRequires: cmake(GMock)
BuildRequires: cmake(GTest)
%endif
%description
hipBLASLt is a library that provides general matrix-matrix
operations. It has a flexible API that extends functionalities
beyond a traditional BLAS library, such as adding flexibility
to matrix data layouts, input types, compute types, and
algorithmic implementations and heuristics.
%package devel
Summary: Libraries and headers for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
%{summary}
%if %{with build_test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%prep -a
# Use PATH to find where TensileGetPath and other tensile bins are
sed -i -e 's@${Tensile_PREFIX}/bin/TensileGetPath@TensileGetPath@g' tensilelite/Tensile/cmake/TensileConfig.cmake
# defer to cmdline
sed -i -e 's@set(CMAKE_INSTALL_LIBDIR@#set(CMAKE_INSTALL_LIBDIR@' CMakeLists.txt
# Do not use virtualenv_install
sed -i -e 's@virtualenv_install@#virtualenv_install@' CMakeLists.txt
# Disable trying to download rocm-cmake
sed -i -e 's@if(NOT ROCmCMakeBuildTools_FOUND)@if(FALSE)@' cmake/dependencies.cmake
# HIPBLASLT_ENABLE_OPENMP is OFF yet it is still being used
# https://github.com/ROCm/rocm-libraries/issues/3201
sed -i -e '/OpenMP::OpenMP_CXX/d' clients/CMakeLists.txt
sed -i -e '/omp/d' clients/common/src/blis_interface.cpp
sed -i -e '/#include <omp.h>/d' clients/common/include/testing_matmul.hpp
sed -i -e '/#include <omp.h>/d' clients/common/include/hipblaslt_init.hpp
sed -i -e '/#include <omp.h>/d' clients/common/src/cblas_interface.cpp
# We are building from a tarball, not a git repo
sed -i -e 's@find_package(Git REQUIRED)@#find_package(Git REQUIRED)@' cmake/dependencies.cmake
# Forcefully replace all mentions of 'amdclang' with 'clang' in the Tensile Python files
find tensilelite -type f -name "*.py" -exec sed -i 's/amdclang++/clang++/g; s/amdclang/clang/g' {} +
%build -p
# Do a manual install instead of cmake's virtualenv
cd tensilelite
TL=$PWD
python3 setup.py install --root $TL
cd ..
# Should not have to do this
CLANG_PATH=$(hipconfig --hipclangpath)
ROCM_CLANG=${CLANG_PATH}/clang
RESOURCE_DIR=$(${ROCM_CLANG} -print-resource-dir)
export DEVICE_LIB_PATH=${RESOURCE_DIR}/amdgcn/bitcode
export TENSILE_ROCM_ASSEMBLER_PATH=${CLANG_PATH}/clang++
export TENSILE_ROCM_OFFLOAD_BUNDLER_PATH=${CLANG_PATH}/clang-offload-bundler
# Look for the just built tensilelite
export PATH=${TL}%{_bindir}:$PATH
export PYTHONPATH=${TL}%{python3_sitelib}:$PYTHONPATH
export Tensile_DIR=${TL}%{python3_sitelib}/Tensile
%install -a
rm -f %{buildroot}%{_datadir}/doc/hipblaslt/LICENSE.md
%files
%doc README.md
%license LICENSE.md
%{_libdir}/libhipblaslt.so.*
%{_libdir}/hipblaslt/
%files devel
%{_includedir}/hipblaslt/
%{_includedir}/hipblaslt-export.h
%{_includedir}/hipblaslt-version.h
%{_libdir}/cmake/hipblaslt/
%{_libdir}/libhipblaslt.so
%if %{with build_test}
%files test
%{_bindir}/hipblaslt*
%{_bindir}/sequence.yaml
%endif
%changelog
%autochangelog
@@ -0,0 +1,84 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%bcond test 0
%if %{with test}
%global build_test ON
%else
%global build_test OFF
%endif
%global toolchain clang
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
Name: hipcub
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm port of CUDA CUB (header-only)
License: BSD-3-Clause AND MIT
Url: https://github.com/ROCm/rocm-libraries
#!RemoteAsset: sha256:6dadbb7689c7906493ec42f56792d9557f0293670a86059c9c188851f399647b
Source: %{url}/releases/download/rocm-%{version}/hipcub.tar.gz
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DBUILD_TEST=%{build_test}
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(hip)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(rocprim)
BuildRequires: compiler-rt
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: rocm-cmake
BuildRequires: rocm-llvm-macros
%description
hipCUB is a thin header-only wrapper library on top of rocPRIM that lets
developers write portable HIP code. Existing CUDA CUB source code can
be recompiled in HIP using hipCUB.
%if %{with test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%prep -a
# Fix cmake install lib directory
sed -i -e 's/ROCM_INSTALL_LIBDIR lib/ROCM_INSTALL_LIBDIR %{_lib}/' \
cmake/ROCMExportTargetsHeaderOnly.cmake
%install -a
rm -f %{buildroot}/%{_datadir}/doc/hipcub/LICENSE.txt
%files
%doc README.md
%license LICENSE.txt
%{_includedir}/hipcub/
%{_libdir}/cmake/hipcub/
%if %{with test}
%files test
%{_bindir}/test_*
%{_bindir}/hipcub/
%endif
%changelog
%autochangelog
@@ -0,0 +1,24 @@
From cfa8e85698486f791008fcade4ec2dff8ddd99d9 Mon Sep 17 00:00:00 2001
From: Tom Rix <Tom.Rix@amd.com>
Date: Fri, 31 Oct 2025 09:10:07 -0700
Subject: [PATCH] hipfft hipfftw soversion
---
library/CMakeLists.txt | 1 +
1 file changed, 1 insertion(+)
diff --git a/library/CMakeLists.txt b/library/CMakeLists.txt
index 8c97cc4eeb97..97cacb3451d3 100644
--- a/library/CMakeLists.txt
+++ b/library/CMakeLists.txt
@@ -164,6 +164,7 @@ endif()
# nvcc can not recognize shared library file name with suffix other than *.so when linking.
if (NOT BUILD_WITH_COMPILER STREQUAL "HIP-NVCC")
rocm_set_soversion(hipfft ${hipfft_SOVERSION})
+ rocm_set_soversion(hipfftw ${hipfft_SOVERSION})
endif()
# Generate export header
--
2.51.0
@@ -0,0 +1,105 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%bcond test 1
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
# rocm stack builds with clang
%global toolchain clang
Name: hipfft
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm FFT marshalling library
Url: https://github.com/ROCm/rocm-libraries
VCS: git:https://github.com/ROCm/hipFFT.git
License: MIT
#!RemoteAsset: sha256:f6f0352b5f9ffe53c88cea5fa40572eef0c0c1e2e50dce6f85d2c68e47afc63e
Source: %{url}/releases/download/rocm-%{version}/hipfft.tar.gz
Patch1: 0001-hipfft-hipfftw-soversion.patch
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DBUILD_CLIENTS_TESTS=ON
BuildOption(conf): -DBUILD_CLIENTS_TESTS_OPENMP=OFF
BuildRequires: boost-devel
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(GTest)
BuildRequires: cmake(hip)
BuildRequires: cmake(hiprand)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(rocfft)
BuildRequires: cmake(rocrand)
BuildRequires: compiler-rt
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: pkgconfig(fftw3)
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
%description
hipFFT is an FFT marshalling library. Currently, hipFFT supports either
rocFFT or cuFFT as backends. hipFFT exports an interface that does not
require the client to change, regardless of the chosen backend.
%package devel
Summary: The hipFFT development package
Requires: %{name}%{?_isa} = %{version}-%{release}
Requires: cmake(rocfft)
%description devel
The hipFFT development package.
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%prep -a
# CMake Error at clients/tests/CMakeLists.txt:87 (find_package):
# No "FindHIP.cmake" found in CMAKE_MODULE_PATH.
# Remove MODULE
sed -i -e 's@find_package( HIP MODULE REQUIRED )@find_package( HIP REQUIRED )@' \
clients/tests/CMakeLists.txt
%install -a
rm -f %{buildroot}/%{_datadir}/doc/hipfft/LICENSE.md
%if %{with test}
%check -p
export LD_LIBRARY_PATH=$PWD/%{__cmake_builddir}/library:$LD_LIBRARY_PATH
%endif
%files
%doc README.md
%license LICENSE.md
%{_libdir}/libhipfft.so.0{,.*}
%{_libdir}/libhipfftw.so.0{,.*}
%files devel
%{_includedir}/hipfft/
%{_libdir}/cmake/hipfft/
%{_libdir}/libhipfft.so
%{_libdir}/libhipfftw.so
%files test
%{_bindir}/hipfft-test
%changelog
%autochangelog
@@ -0,0 +1,102 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
# HIP error 100: no ROCm-capable device is detected
# hipRAND needs a GPU to run tests, but we could still
# keep the test cases for packagers who have a GPU, so make it optional.
%bcond run_test 0
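# A minimal sketch of how a packager with a ROCm-capable GPU could opt in to
# the test run. The spec file name below is an assumption for illustration;
# --with is the standard rpmbuild switch that flips a %%bcond default.
#   rpmbuild -ba hiprand.spec --with run_test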
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
# rocm stack builds with clang
%global toolchain clang
Name: hiprand
Version: %{rocm_version}
Release: %autorelease
Summary: HIP random number generator
License: MIT AND BSD-3-Clause
Url: https://github.com/ROCm/rocm-libraries
#!RemoteAsset: sha256:41e4053a3c16ea4bdc6e94fff428d8ffe7279e9cfa7ec142afc50169aae2c1f8
Source: %{url}/releases/download/rocm-%{version}/hiprand.tar.gz
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DCMAKE_VERBOSE_MAKEFILE=ON
BuildOption(conf): -DAMDGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DBUILD_TEST=ON
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(GTest)
BuildRequires: cmake(hip)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(rocrand)
BuildRequires: compiler-rt
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
%description
hipRAND is a RAND marshalling library, with multiple supported backends. It
sits between the application and the backend RAND library, marshalling inputs
into the backend and results back to the application. hipRAND exports an
interface that does not require the client to change, regardless of the chosen
backend. Currently, hipRAND supports either rocRAND or cuRAND.
%package devel
Summary: The hipRAND development package
Requires: %{name}%{?_isa} = %{version}-%{release}
Requires: cmake(rocrand)
%description devel
The hipRAND development package.
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%prep -a
# Remove RPATH
sed -i '/INSTALL_RPATH/d' CMakeLists.txt
%install -a
rm -f %{buildroot}%{_datadir}/doc/hiprand/LICENSE.md
rm -f %{buildroot}%{_bindir}/hipRAND/CTestTestfile.cmake
%check -p
export LD_LIBRARY_PATH=$PWD/%{__cmake_builddir}/library:$LD_LIBRARY_PATH
%if %{without run_test}
%check
%endif
%files
%doc README.md
%license LICENSE.md
%{_libdir}/libhiprand.so.1{,.*}
%files devel
%{_includedir}/hiprand/
%{_libdir}/cmake/hiprand/
%{_libdir}/libhiprand.so
%files test
%{_bindir}/test*
%changelog
%autochangelog
@@ -0,0 +1,24 @@
From 2d228ee883a538d67af24e944a2a24a95d779345 Mon Sep 17 00:00:00 2001
From: Tom Rix <Tom.Rix@amd.com>
Date: Sun, 21 Sep 2025 08:37:33 -0700
Subject: [PATCH] hipsolver so version fortran bindings
---
library/src/CMakeLists.txt | 1 +
1 file changed, 1 insertion(+)
diff --git a/library/src/CMakeLists.txt b/library/src/CMakeLists.txt
index a4e8649cdcae..69437d603d7a 100644
--- a/library/src/CMakeLists.txt
+++ b/library/src/CMakeLists.txt
@@ -75,6 +75,7 @@ set(hipsolver_f90_source
if(BUILD_FORTRAN_BINDINGS)
# Create hipSOLVER Fortran module
add_library(hipsolver_fortran ${hipsolver_f90_source})
+ rocm_set_soversion(hipsolver_fortran ${hipsolver_SOVERSION})
rocm_install(TARGETS hipsolver_fortran)
endif()
--
2.51.0
@@ -0,0 +1,124 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
# TODO: hipSOLVER needs LAPACK to build its tests, benchmarks, and samples,
# but OpenBLAS on openRuyi does not provide it.
%bcond build_test 0
%if %{with build_test}
%global cmake_test ON
%else
%global cmake_test OFF
%endif
# hipSOLVER needs a GPU to run tests, but we could still
# keep the test cases for packagers who have a GPU.
%bcond run_test 0
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
# rocm stack builds with clang
%global toolchain clang
# Fortran is only used in testing
# clang and gfortran fedora toolchain args do not mix
%global build_fflags %{nil}
Name: hipsolver
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm SOLVER marshalling library (LAPACK)
License: MIT
Url: https://github.com/ROCm/hipSOLVER
#!RemoteAsset: sha256:bd664e3cd43bfcc7e94d5a387c27262c4b218d6d2e71e086992b174349dd1c10
Source: %{url}/archive/rocm-%{version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DBUILD_CLIENTS_TESTS=%{cmake_test}
BuildOption(conf): -DBUILD_CLIENTS_BENCHMARKS=%{cmake_test}
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(hip)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(rocblas)
BuildRequires: cmake(rocsolver)
BuildRequires: cmake(rocsparse)
BuildRequires: compiler-rt
BuildRequires: gcc-fortran
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
%if %{with build_test}
BuildRequires: cmake(GTest)
BuildRequires: cmake(hipsparse)
BuildRequires: pkgconfig(openblas)
%endif
%description
hipSOLVER is a LAPACK marshalling library, with multiple supported backends.
It sits between the application and a "worker" SOLVER library, marshalling
inputs into the backend library and results back to the application. hipSOLVER
exports an interface that does not require the client to change, regardless of
the chosen backend.
%package devel
Summary: The hipSOLVER development package
Requires: %{name}%{?_isa} = %{version}-%{release}
Requires: cmake(rocblas)
Requires: cmake(rocsolver)
Requires: cmake(rocsparse)
%description devel
The hipSOLVER development package.
%if %{with build_test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%install -a
rm -f %{buildroot}%{_datadir}/doc/hipsolver/LICENSE.md
%check -p
export LD_LIBRARY_PATH=$PWD/%{__cmake_builddir}/library:$LD_LIBRARY_PATH
%if %{without run_test}
%check
%endif
%files
%doc README.md
%license LICENSE.md
%{_libdir}/libhipsolver.so.1{,.*}
%{_libdir}/libhipsolver_fortran.so.1{,.*}
%files devel
%{_includedir}/hipsolver/
%{_libdir}/libhipsolver.so
%{_libdir}/libhipsolver_fortran.so
%{_libdir}/cmake/hipsolver/
%if %{with build_test}
%files test
%{_datadir}/hipsolver/
%{_bindir}/hipsolver*
%endif
%changelog
%autochangelog
@@ -0,0 +1,128 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
# hipSPARSE needs to download about 19 test matrices.
# Adding them all as Source entries and to the %%prep section would be too verbose.
%bcond build_test 0
%if %{with build_test}
%global cmake_test ON
%else
%global cmake_test OFF
%endif
# hipSPARSE needs a GPU to run tests, but we could still
# keep the test cases for packagers who have a GPU.
%bcond run_test 0
# This ROCm package is built with clang by default
%global toolchain clang
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
Name: hipsparse
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm SPARSE marshalling library
License: MIT
Url: https://github.com/ROCm/hipSPARSE
#!RemoteAsset: sha256:b001834d8e65c3878d1a69d08803d5b6ce4fe623e78099fe51cb146d0ffa10e7
Source: %{url}/archive/rocm-%{version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DCMAKE_VERBOSE_MAKEFILE=ON
BuildOption(conf): -DGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DBUILD_CLIENTS_SAMPLES=OFF
BuildOption(conf): -DBUILD_CLIENTS_BENCHMARKS=ON
BuildOption(conf): -DBUILD_CLIENTS_TESTS=%{cmake_test}
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
%if %{with build_test}
BuildRequires: cmake(GTest)
%endif
BuildRequires: cmake(hip)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(rocprim)
BuildRequires: cmake(rocsparse)
BuildRequires: compiler-rt
BuildRequires: gcc-fortran
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
%description
hipSPARSE is a SPARSE marshalling library with multiple
supported backends. It sits between your application and
a 'worker' SPARSE library, where it marshals inputs to
the backend library and marshals results to your
application. hipSPARSE exports an interface that doesn't
require the client to change, regardless of the chosen
backend. Currently, hipSPARSE supports rocSPARSE and
cuSPARSE backends.
%package benchmark
Summary: Benchmark for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description benchmark
%{summary}
%package devel
Summary: Libraries and headers for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
%{summary}
%if %{with build_test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%install -a
rm -f %{buildroot}%{_datadir}/doc/hipsparse/LICENSE.md
%check -p
export LD_LIBRARY_PATH=$PWD/%{__cmake_builddir}/library:$LD_LIBRARY_PATH
%if %{without run_test}
%check
%endif
%files
%doc README.md
%license LICENSE.md
%{_libdir}/libhipsparse.so.4{,.*}
%files benchmark
%{_bindir}/hipsparse-bench
%files devel
%{_includedir}/hipsparse/
%{_libdir}/cmake/hipsparse/
%{_libdir}/libhipsparse.so
%if %{with build_test}
%files test
%{_bindir}/hipsparse*
%{_datadir}/hipsparse/
%endif
%changelog
%autochangelog
@@ -0,0 +1,25 @@
From 43c4a61c5d8836a16feb8e53c72f255790523ff3 Mon Sep 17 00:00:00 2001
From: Tom Rix <Tom.Rix@amd.com>
Date: Mon, 3 Nov 2025 06:11:40 -0800
Subject: [PATCH] hipblaslt find origami package
---
CMakeLists.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index dbccca92c84f..d02df50540c2 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -218,7 +218,7 @@ if(HIPBLASLT_ENABLE_MSGPACK)
endif()
endif()
-add_subdirectory("${CMAKE_CURRENT_SOURCE_DIR}/../../shared/origami" origami)
+find_package(origami CONFIG REQUIRED)
add_subdirectory(tensilelite)
if(HIPBLASLT_ENABLE_HOST)
--
2.52.0
@@ -0,0 +1,57 @@
From c20b846b6d594464eccf865045ef0ef10384f407 Mon Sep 17 00:00:00 2001
From: Tom Rix <Tom.Rix@amd.com>
Date: Thu, 25 Sep 2025 13:02:55 -0700
Subject: [PATCH] hipblaslt tensilelite remove yappi dependency
Signed-off-by: Tom Rix <Tom.Rix@amd.com>
---
tensilelite/Tensile/TensileCreateLibrary/Run.py | 15 ---------------
tensilelite/requirements.txt | 2 +-
2 files changed, 1 insertion(+), 16 deletions(-)
diff --git a/tensilelite/Tensile/TensileCreateLibrary/Run.py b/tensilelite/Tensile/TensileCreateLibrary/Run.py
index f0bbe8acd127..fc076e6935e8 100644
--- a/tensilelite/Tensile/TensileCreateLibrary/Run.py
+++ b/tensilelite/Tensile/TensileCreateLibrary/Run.py
@@ -231,12 +231,6 @@ def writeSolutionsAndKernels(
generateSourcesAndExit=False,
compress=True,
):
- if globalParameters["PythonProfile"]:
- globalParameters["CpuThreads"] = 0
- printWarning("Python profiling is enabled. CpuThreads set to 0.")
- import yappi
- yappi.start()
-
codeObjectFiles = []
outputPath = Path(outputPath)
@@ -299,15 +293,6 @@ def writeSolutionsAndKernels(
writeHelpers(outputPath, kernelHelperObjs, KERNEL_HELPER_FILENAME_CPP, KERNEL_HELPER_FILENAME_H)
srcKernelFile = Path(outputPath) / "Kernels.cpp"
- if globalParameters["PythonProfile"]:
- yappi.stop()
- yappi.get_func_stats().save("yappi_results.profile", type="callgrind")
- with open("yappi_results.txt", "w") as f:
- yappi.get_func_stats().print_all(out=f)
- if globalParameters["CpuThreads"] != 0:
- with open("yappi_thread_stats.txt", "w") as f:
- yappi.get_thread_stats().print_all(out=f)
-
if not generateSourcesAndExit:
codeObjectFiles += buildAssemblyCodeObjectFiles(
asmToolchain.linker,
diff --git a/tensilelite/requirements.txt b/tensilelite/requirements.txt
index 60c4c1144537..e87db8445411 100644
--- a/tensilelite/requirements.txt
+++ b/tensilelite/requirements.txt
@@ -7,4 +7,4 @@ joblib>=1.1.1; python_version < '3.8'
simplejson
ujson
orjson
-yappi
+
--
2.51.0
@@ -0,0 +1,16 @@
--- ./hipBLASLt/tensilelite/Tensile/Toolchain/Validators.py 2025-11-14 05:43:51
+++ ./hipBLASLt/tensilelite/Tensile/Toolchain/Validators.py.mod 2026-03-04 16:10:09
@@ -114,11 +114,11 @@
class ToolchainDefaults(NamedTuple):
- CXX_COMPILER = osSelect(linux="amdclang++", windows="clang++.exe")
- C_COMPILER = osSelect(linux="amdclang", windows="clang.exe")
+ CXX_COMPILER = osSelect(linux="clang++", windows="clang++.exe")
+ C_COMPILER = osSelect(linux="clang", windows="clang.exe")
OFFLOAD_BUNDLER = osSelect(linux="clang-offload-bundler", windows="clang-offload-bundler.exe")
DEVICE_ENUMERATOR = osSelect(linux="rocm_agent_enumerator" if isRhel8() else "amdgpu-arch", windows="hipinfo")
- ASSEMBLER = osSelect(linux="amdclang++", windows="clang++.exe")
+ ASSEMBLER = osSelect(linux="clang++", windows="clang++.exe")
HIP_CONFIG = osSelect(linux="hipconfig", windows="hipconfig.exe")
@@ -0,0 +1,41 @@
From 1ac117ac0591a0f1bb67c34f537354c21412b2d8 Mon Sep 17 00:00:00 2001
From: Tom Rix <Tom.Rix@amd.com>
Date: Sat, 1 Nov 2025 09:43:58 -0700
Subject: [PATCH] hipblaslt tensilelite use fedora paths
---
tensilelite/Tensile/Common/GlobalParameters.py | 2 +-
tensilelite/Tensile/Toolchain/Validators.py | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tensilelite/Tensile/Common/GlobalParameters.py b/tensilelite/Tensile/Common/GlobalParameters.py
index 567188da59bd..1f8037c183a6 100644
--- a/tensilelite/Tensile/Common/GlobalParameters.py
+++ b/tensilelite/Tensile/Common/GlobalParameters.py
@@ -538,7 +538,7 @@ def assignGlobalParameters(config, isaInfoMap: Dict[IsaVersion, IsaInfo]):
else:
print2(" %24s: %8s (unspecified)" % (key, defaultValue))
- globalParameters["ROCmPath"] = "/opt/rocm"
+ globalParameters["ROCmPath"] = "/usr"
if "ROCM_PATH" in os.environ:
globalParameters["ROCmPath"] = os.environ.get("ROCM_PATH")
if "TENSILE_ROCM_PATH" in os.environ:
diff --git a/tensilelite/Tensile/Toolchain/Validators.py b/tensilelite/Tensile/Toolchain/Validators.py
index fd5dab5324c0..3ce024d31f52 100644
--- a/tensilelite/Tensile/Toolchain/Validators.py
+++ b/tensilelite/Tensile/Toolchain/Validators.py
@@ -30,8 +30,8 @@ from typing import List, NamedTuple, Union
from Tensile.Common.Utilities import isRhel8
-DEFAULT_ROCM_BIN_PATH_POSIX = Path("/opt/rocm/bin")
-DEFAULT_ROCM_LLVM_BIN_PATH_POSIX = Path("/opt/rocm/lib/llvm/bin")
+DEFAULT_ROCM_BIN_PATH_POSIX = Path("/usr/bin")
+DEFAULT_ROCM_LLVM_BIN_PATH_POSIX = Path("/usr/lib64/rocm/llvm/bin")
DEFAULT_ROCM_BIN_PATH_WINDOWS = Path("C:/Program Files/AMD/ROCm")
--
2.52.0
@@ -0,0 +1,205 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
%global toolchain clang
%global tensile_version 4.33.0
%global tensile_verbose 1
%bcond build_test 0
%if %{with build_test}
%global cmake_test ON
%else
%global cmake_test OFF
%endif
Name: hipsparselt
Version: %{rocm_version}
Release: %autorelease
Summary: A SPARSE marshaling library
License: MIT
URL: https://github.com/ROCm/rocm-libraries
#!RemoteAsset: sha256:7672d1ac94d2694999b6937d19f5e92e67fb844eea394b4e8525c531fd1acd8c
Source0: %{url}/releases/download/rocm-%{version}/%{name}.tar.gz
#!RemoteAsset: sha256:05d73038b1b4f66f3df4eb595b7cb0c8935f7aa18d0e07dbe5cc740a4b691898
Source1: %{url}/releases/download/rocm-%{version}/hipblaslt.tar.gz
# Patches for hipBLASLt's tensilelite (applied during prep inside hipBLASLt/)
Source2: 0001-hipblaslt-tensilelite-remove-yappi-dependency.patch
Source3: 0001-hipblaslt-tensilelite-use-system-paths.patch
Source4: 0001-hipblaslt-find-origami-package.patch
BuildSystem: cmake
BuildOption(conf): -DBLAS_INCLUDE_DIR=%{_includedir}/flexiblas
BuildOption(conf): -DBUILD_CLIENTS_TESTS=%{cmake_test}
BuildOption(conf): -DBUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF
BuildOption(conf): -DBUILD_VERBOSE=ON
BuildOption(conf): -DCMAKE_Fortran_COMPILER=gfortran
BuildOption(conf): -DCMAKE_VERBOSE_MAKEFILE=ON
BuildOption(conf): -DGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DTensile_COMPILER=clang++
BuildOption(conf): -DTensile_LIBRARY_FORMAT=msgpack
BuildOption(conf): -DTensile_VERBOSE=%{tensile_verbose}
BuildOption(conf): -DVIRTUALENV_BIN_DIR=%{_bindir}
BuildOption(conf): -Dnanobind_ROOT=%(python3 -m nanobind --cmake_dir)
BuildOption(conf): -G Ninja
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(hip)
BuildRequires: cmake(hipsparse)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(origami)
BuildRequires: cmake(rocm_smi)
BuildRequires: cmake(rocsparse)
BuildRequires: compiler-rt
BuildRequires: gcc-fortran
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: pkgconfig(libzstd)
BuildRequires: pkgconfig(msgpack)
BuildRequires: pkgconfig(python3)
BuildRequires: pkgconfig(zlib)
BuildRequires: python3dist(joblib)
BuildRequires: python3dist(msgpack)
# nanobind is used to build the rocisa native module (build-time only)
BuildRequires: python3dist(nanobind)
BuildRequires: python3dist(pyyaml)
BuildRequires: python3dist(setuptools)
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocminfo
BuildRequires: rocm-llvm-macros
BuildRequires: roctracer-devel
%if %{with build_test}
BuildRequires: chrpath
BuildRequires: pkgconfig(openblas)
BuildRequires: pkgconfig(gtest)
BuildRequires: pkgconfig(gmock)
%endif
%description
hipSPARSELt is a SPARSE marshaling library that provides general sparse
matrix-matrix multiplication using structured sparsity. It offers a flexible
API and supports multiple backends.
%package devel
Summary: The hipSPARSELt development package
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
The hipSPARSELt development package.
%if %{with build_test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%prep
%autosetup -p1 -n %{name}
tar xf %{SOURCE1}
cd hipblaslt
patch -p1 < %{SOURCE2}
patch -p1 < %{SOURCE3}
patch -p1 < %{SOURCE4}
# Use PATH to find where TensileGetPath and other tensile bins are
sed -i -e 's@${Tensile_PREFIX}/bin/TensileGetPath@TensileGetPath@g' \
tensilelite/Tensile/cmake/TensileConfig.cmake
# Make sure hip/hip_runtime.h is found
sed -i -e 's@-x hip @-I%{_includedir} -x hip @' device-library/matrix-transform/CMakeLists.txt
sed -i -e 's@"-D__HIP_HCC_COMPAT_MODE__=1"@"-D__HIP_HCC_COMPAT_MODE__=1","-I%{_includedir}"@' \
tensilelite/Tensile/Toolchain/Component.py
# Use the distribution-provided nanobind instead of fetching/bundling it
sed -i -e 's@FetchContent_MakeAvailable(nanobind)@find_package(nanobind CONFIG REQUIRED)@' \
tensilelite/rocisa/CMakeLists.txt
# disable openmp in hipBLASLt
sed -i -e 's@option(HIPBLASLT_ENABLE_OPENMP "Use OpenMP to improve performance." ON)@option(HIPBLASLT_ENABLE_OPENMP "Use OpenMP to improve performance." OFF)@' CMakeLists.txt
cd ..
# Point hipBLASLt path at the bundled in-source copy (default looks in ../hipblaslt)
sed -i -e 's@${CMAKE_CURRENT_SOURCE_DIR}/../hipblaslt@${CMAKE_CURRENT_SOURCE_DIR}/hipblaslt@' CMakeLists.txt
# Prevent the virtualenv install from cmake
sed -i -e 's@virtualenv_install@#virtualenv_install@' CMakeLists.txt
# Unforce the setting of libdir
sed -i -e 's@set(CMAKE_INSTALL_LIBDIR@#set(CMAKE_INSTALL_LIBDIR@' CMakeLists.txt
# Change looking for cblas to flexiblas
sed -i -e 's@find_package( cblas REQUIRED CONFIG )@#find_package( cblas REQUIRED CONFIG )@' clients/CMakeLists.txt
sed -i -e 's@set( BLAS_LIBRARY "blas" )@set( BLAS_LIBRARY "flexiblas" )@' clients/CMakeLists.txt
sed -i -e 's@lapack cblas@flexiblas@' clients/gtest/CMakeLists.txt
# We are building from a tarball, not a git repo
sed -i -e 's@find_package(Git REQUIRED)@#find_package(Git REQUIRED)@' hipblaslt/cmake/dependencies.cmake
sed -i -e 's@find_package(Git REQUIRED)@#find_package(Git REQUIRED)@' cmake/Dependencies.cmake
# Replace all mentions of 'amdclang' with 'clang' in Tensile Python files
find hipblaslt/tensilelite -type f -name "*.py" -exec sed -i 's/amdclang++/clang++/g; s/amdclang/clang/g' {} +
%build -p
# Do a manual install of tensilelite instead of cmake's virtualenv, then point
# Tensile at it for build-time kernel generation (same approach as hipblaslt)
cd hipblaslt/tensilelite
TL=$PWD
python3 setup.py install --root $TL
cd ../..
export PATH=%{_prefix}/bin:%{rocmllvm_bindir}:$PATH
CLANG_PATH=$(hipconfig --hipclangpath)
ROCM_CLANG=${CLANG_PATH}/clang
RESOURCE_DIR=$(${ROCM_CLANG} -print-resource-dir)
export DEVICE_LIB_PATH=${RESOURCE_DIR}/amdgcn/bitcode
export TENSILE_ROCM_ASSEMBLER_PATH=${CLANG_PATH}/clang++
export TENSILE_ROCM_OFFLOAD_BUNDLER_PATH=${CLANG_PATH}/clang-offload-bundler
export PATH=${TL}/%{_bindir}:$PATH
export PYTHONPATH=${TL}%{python3_sitelib}:$PYTHONPATH
export Tensile_DIR=${TL}%{python3_sitelib}/Tensile
%install -a
rm -f %{buildroot}%{_datadir}/doc/hipsparselt/LICENSE.md
# Strip and fix permissions on hsaco kernel files
%{rocmllvm_bindir}/llvm-strip %{buildroot}%{_libdir}/hipsparselt/library/Kernels*.hsaco
chmod a+x %{buildroot}%{_libdir}/hipsparselt/library/Kernels*.hsaco
%files
%doc README.md
%license LICENSE.md
%{_libdir}/libhipsparselt.so.*
%{_libdir}/hipsparselt/
%files devel
%{_includedir}/hipsparselt/
%{_libdir}/cmake/hipsparselt/
%{_libdir}/libhipsparselt.so
%if %{with build_test}
%files test
%{_bindir}/hipsparselt*
%endif
%changelog
%autochangelog
@@ -0,0 +1,135 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
# rocm toolchain uses the hipcc wrapper of clang
%global toolchain clang
%bcond test 0
Name: magma
Version: 2.10.0
Release: %autorelease
Summary: Matrix Algebra on GPU and Multi-core Architectures
License: BSD-3-Clause
Url: https://icl.utk.edu/magma/
VCS: git:https://github.com/icl-utk-edu/magma.git
#!RemoteAsset: sha256:26347adbccbe7a6693d6b3f3c0ab5620037eb3a62b5ef69d05e40289472a82a4
Source0: https://github.com/icl-utk-edu/%{name}/archive/v%{version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DBLA_VENDOR=OpenBLAS
BuildOption(conf): -DAMDGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DMAGMA_ENABLE_HIP=ON
BuildOption(conf): -DUSE_FORTRAN=OFF
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(hip)
BuildRequires: cmake(hipblas)
BuildRequires: cmake(hipsparse)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(openblas)
BuildRequires: compiler-rt
BuildRequires: gcc-c++
BuildRequires: hipcc
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: python3
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
%description
Matrix Algebra on GPU and Multi-core Architectures (MAGMA) is a collection
of next-generation linear algebra libraries for heterogeneous computing.
The MAGMA package supports interfaces for current linear algebra packages
and standards (e.g., LAPACK and BLAS) to enable computational scientists
to easily port any linear algebra-reliant software component to
heterogeneous computing systems. MAGMA enables applications to fully
exploit the power of current hybrid systems of many-core CPUs and
multi-GPUs/coprocessors to deliver the fastest possible time to accurate
solutions within given energy constraints.
MAGMA features LAPACK-compliant routines for multi-core CPUs enhanced with
NVIDIA or AMD GPUs. MAGMA includes more than 400 routines that
cover one-sided dense matrix factorizations and solvers, two-sided
factorizations, and eigen/singular-value problem solvers, as well as a
subset of highly optimized BLAS for GPUs. A MagmaDNN package has been
added and further enhanced to provide high-performance data analytics,
including functionalities for machine learning applications that use MAGMA
as their computational back end. The MAGMA Sparse and MAGMA Batched
packages have been included since MAGMA 1.6.
%package devel
Summary: Libraries and headers for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
%{summary}
%prep -a
# Add newer gfx targets to Makefile's valid arch whitelist
# https://bitbucket.org/icl/magma/issues/76/a-few-new-rocm-gpus
sed -i -e 's@1032 1033@1032 1033 1100 1101 1102 1103 1150 1151 1152 1153 1200 1201@' Makefile
%if %{with test}
# Remove a test that fails to link (undefined magma_generate_matrix)
sed -i -e '/testing_zgenerate.cpp/d' testing/Makefile.src
%else
# Disable building tests
sed -i -e 's@include_directories( testing )@#include_directories( testing )@' CMakeLists.txt
sed -i -e 's@foreach( filename ${testing_all} )@foreach( filename ${no_testing_all} )@' CMakeLists.txt
sed -i -e 's@add_custom_target( testing DEPENDS ${testing} )@#add_custom_target( testing DEPENDS ${testing} )@' CMakeLists.txt
sed -i -e 's@foreach( TEST ${sparse_testing_all} )@foreach( TEST ${no_sparse_testing_all} )@' CMakeLists.txt
sed -i -e 's@add_custom_target( sparse-testing DEPENDS ${sparse-testing} )@#add_custom_target( sparse-testing DEPENDS ${sparse-testing} )@' CMakeLists.txt
%endif
# Change the bin,lib install locations
sed -i -e 's@DESTINATION lib@DESTINATION ${CMAKE_INSTALL_LIBDIR}@' CMakeLists.txt
sed -i -e 's@DESTINATION bin@DESTINATION ${CMAKE_INSTALL_BINDIR}@' CMakeLists.txt
# python to python3, need env to find local bits like magmasubs.py
sed -i -e 's@env python@env python3@' tools/checklist_run_tests.py
sed -i -e 's@env python@env python3@' tools/check-style.py
sed -i -e 's@env python@env python3@' tools/parse-magma.py
# Remove some files we do not need, to simplify licensing
# GPL: results for CUDA
rm -rf results/*
# ISC: bundled copy of strlcpy; use the system strlcpy instead
sed -i -e '/strlcpy/d' control/Makefile.src
sed -i -e '/strlcpy/d' include/magma_auxiliary.h
sed -i -e 's@magma_strlcpy@strlcpy@' control/trace.cpp
rm control/strlcpy.cpp
%build -p
echo "BACKEND = hip" > make.inc
echo "FORT = false" >> make.inc
echo "GPU_TARGET = gfx1100;gfx1200;gfx1201" >> make.inc
make generate
%if %{with test}
%check
%{_vpath_builddir}/testing/testing_sgemm
%endif
%files
%license COPYRIGHT
%{_libdir}/libmagma.so{,.*}
%{_libdir}/libmagma_sparse.so{,.*}
%files devel
%{_includedir}/*.h
%{_libdir}/pkgconfig/%{name}.pc
%{_libdir}/libmagma.so
%{_libdir}/libmagma_sparse.so
%changelog
%autochangelog
@@ -0,0 +1,147 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
# Tests require an AMD GPU; keep the bcond for packagers with hardware.
%bcond test 0
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
# rocm stack builds with clang
%global toolchain clang
Name: miopen
Version: %{rocm_version}
Release: %autorelease
Summary: AMD's Machine Intelligence Library
License: MIT AND BSD-2-Clause AND Apache-2.0
Url: https://github.com/ROCm/MIOpen
#!RemoteAsset: sha256:98c72a2b5ca541d6c172facdf0f15729207ab52ca9af36c00e2480c5b27c5b99
Source: %{url}/archive/rocm-%{version}.tar.gz
BuildSystem: cmake
# Adds MIOPEN_PARALLEL_{COMPILE,LINK}_JOBS options to limit Ninja job pools
# and avoid OOM on memory-constrained build hosts (upstream patch)
Patch0: 0001-miopen-add-link-and-compile-pools.patch
BuildOption(conf): -G Ninja
BuildOption(conf): -DGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DBoost_USE_STATIC_LIBS=OFF
BuildOption(conf): -DMIOPEN_BUILD_DRIVER=OFF
BuildOption(conf): -DMIOPEN_ENABLE_AI_IMMED_MODE_FALLBACK=OFF
BuildOption(conf): -DMIOPEN_ENABLE_AI_KERNEL_TUNING=OFF
%if %{with test}
BuildOption(conf): -DBUILD_TESTING=ON
BuildOption(conf): -DMIOPEN_TEST_ALL=ON
%else
BuildOption(conf): -DBUILD_TESTING=OFF
%endif
# Disable optional backends not yet packaged on openRuyi
BuildOption(conf): -DMIOPEN_USE_COMPOSABLEKERNEL=OFF
BuildOption(conf): -DMIOPEN_USE_MLIR=OFF
BuildRequires: boost-devel
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(hip)
BuildRequires: cmake(hipblaslt)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(rocblas)
BuildRequires: cmake(rocrand)
%if %{with test}
BuildRequires: cmake(GTest)
%endif
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: compiler-rt
BuildRequires: half
BuildRequires: hipcc
BuildRequires: hipblas-common-devel
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: pkgconfig(bzip2)
BuildRequires: pkgconfig(libzstd)
BuildRequires: pkgconfig(nlohmann_json)
BuildRequires: pkgconfig(sqlite3)
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
# roctracer uses find_path/find_library rather than find_package; no cmake()/pkgconfig() provided
# FIXME
BuildRequires: roctracer-devel
Requires: cmake(hip)
Requires: cmake(rocrand)
Requires: gcc-c++
%description
AMD's library for high performance machine learning primitives.
MIOpen supports convolution, batch normalization, activation, pooling,
RNN/LSTM/GRU, and attention/transformer operations for the HIP backend.
%package devel
Summary: Libraries and headers for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
%{summary}
%if %{with test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%prep -a
# clang-tidy is brittle and not needed when rebuilding from a tarball
sed -i -e 's@clang-tidy@true@' cmake/ClangTidy.cmake
# half_float::detail::expr is not present in all half versions
sed -i -e 's@std::is_same_v<T, half_float::detail::expr>@0@' test/verify.hpp
# MIOpen tries to download googletest; disable when not needed
%if %{without test}
sed -i -e 's@add_subdirectory(test)@#add_subdirectory(test)@' CMakeLists.txt
sed -i -e 's@add_subdirectory(speedtests)@#add_subdirectory(speedtests)@' CMakeLists.txt
%endif
# Use the standard data directory for the MIOpen kernel database
sed -i -e 's@GetLibPath().parent_path() / "share/miopen/db"@"%{_datadir}/miopen/db"@' src/db_path.cpp.in
# -fno-offload-uniform-block is unsupported on this ROCm version
sed -i -e 's@opts.push_back("-fno-offload-uniform-block");@//opts.push_back("-fno-offload-uniform-block");@' src/comgr.cpp
# Fix the path used to locate the ROCm clang binary at build time
sed -i -e 's@llvm/bin/clang@bin/clang@' src/hip/hip_build_utils.cpp
%install -a
rm -f %{buildroot}%{_datadir}/doc/miopen-hip/LICENSE.md
%files
%doc README.md
%license LICENSE.md
%{_libdir}/libMIOpen.so.1{,.*}
%{_libexecdir}/miopen/
%files devel
%{_datadir}/miopen/
%{_includedir}/miopen/
%{_libdir}/cmake/miopen/
%{_libdir}/libMIOpen.so
%if %{with test}
%files test
%{_bindir}/test*
%endif
%changelog
%autochangelog
@@ -1,61 +0,0 @@
From 53d2ea9ad3cc20e1beac2e1c014082c25e221182 Mon Sep 17 00:00:00 2001
From: Takatoshi Kondo <redboltz@gmail.com>
Date: Sun, 26 Aug 2018 10:58:47 +0900
Subject: [PATCH] Fixed #724.
Fixed type mismatch in msgpack_timestamp.
Added 64bit signed postfix.
---
include/msgpack/timestamp.h | 8 ++++++--
include/msgpack/v1/adaptor/cpp11/chrono.hpp | 4 ++--
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/msgpack/timestamp.h b/include/msgpack/timestamp.h
index 4d7df83d..76139312 100644
--- a/include/msgpack/timestamp.h
+++ b/include/msgpack/timestamp.h
@@ -28,13 +28,17 @@ static inline bool msgpack_object_to_timestamp(const msgpack_object* obj, msgpac
switch (obj->via.ext.size) {
case 4:
ts->tv_nsec = 0;
- _msgpack_load32(uint32_t, obj->via.ext.ptr, &ts->tv_sec);
+ {
+ uint32_t v;
+ _msgpack_load32(uint32_t, obj->via.ext.ptr, &v);
+ ts->tv_sec = v;
+ }
return true;
case 8: {
uint64_t value;
_msgpack_load64(uint64_t, obj->via.ext.ptr, &value);
ts->tv_nsec = (uint32_t)(value >> 34);
- ts->tv_sec = value & 0x00000003ffffffffL;
+ ts->tv_sec = value & 0x00000003ffffffffLL;
return true;
}
case 12:
diff --git a/include/msgpack/v1/adaptor/cpp11/chrono.hpp b/include/msgpack/v1/adaptor/cpp11/chrono.hpp
index 1e08355e..db2035b7 100644
--- a/include/msgpack/v1/adaptor/cpp11/chrono.hpp
+++ b/include/msgpack/v1/adaptor/cpp11/chrono.hpp
@@ -41,7 +41,7 @@ struct as<std::chrono::system_clock::time_point> {
uint64_t value;
_msgpack_load64(uint64_t, o.via.ext.data(), &value);
uint32_t nanosec = static_cast<uint32_t>(value >> 34);
- uint64_t sec = value & 0x00000003ffffffffL;
+ uint64_t sec = value & 0x00000003ffffffffLL;
tp += std::chrono::duration_cast<std::chrono::system_clock::duration>(
std::chrono::nanoseconds(nanosec));
tp += std::chrono::seconds(sec);
@@ -79,7 +79,7 @@ struct convert<std::chrono::system_clock::time_point> {
uint64_t value;
_msgpack_load64(uint64_t, o.via.ext.data(), &value);
uint32_t nanosec = static_cast<uint32_t>(value >> 34);
- uint64_t sec = value & 0x00000003ffffffffL;
+ uint64_t sec = value & 0x00000003ffffffffLL;
tp += std::chrono::duration_cast<std::chrono::system_clock::duration>(
std::chrono::nanoseconds(nanosec));
tp += std::chrono::seconds(sec);
--
2.17.1
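The 8-byte case touched by the patch above splits one 64-bit value into a nanosecond part (upper 30 bits) and a second part (lower 34 bits), which is why the mask `0x00000003ffffffff` needs a 64-bit literal. A minimal Python sketch of that bit layout follows, assuming the big-endian byte order of the MessagePack timestamp extension; the function names are hypothetical and are not part of msgpack-c.

```python
import struct

# Low 34 bits of the 64-bit payload hold the seconds (same mask as the patch).
SEC_MASK = 0x00000003FFFFFFFF


def encode_timestamp64(sec: int, nsec: int) -> bytes:
    """Pack seconds (34 bits) and nanoseconds (30 bits) into 8 big-endian bytes."""
    if not (0 <= sec <= SEC_MASK and 0 <= nsec < (1 << 30)):
        raise ValueError("value does not fit the timestamp 64 layout")
    return struct.pack(">Q", (nsec << 34) | sec)


def decode_timestamp64(payload: bytes) -> tuple:
    """Return (seconds, nanoseconds) from an 8-byte timestamp payload."""
    if len(payload) != 8:
        raise ValueError("timestamp 64 payload must be 8 bytes")
    (value,) = struct.unpack(">Q", payload)
    nsec = value >> 34       # upper 30 bits
    sec = value & SEC_MASK   # lower 34 bits
    return sec, nsec
```

A seconds value above 2^32 round-trips only if the mask is applied at full 64-bit width, which is the property the `LL` suffix is meant to guarantee in the C and C++ sources.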
@@ -1,47 +0,0 @@
From 232fff18d4f07aa25338da88ce704675f9fea465 Mon Sep 17 00:00:00 2001
From: Takatoshi Kondo <redboltz@gmail.com>
Date: Tue, 6 Aug 2024 09:36:04 +0900
Subject: [PATCH 1/5] Fixed cmake warnings.
---
CMakeLists.txt | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 8dc6d610a..c75c908f9 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,8 +1,7 @@
-CMAKE_MINIMUM_REQUIRED (VERSION 2.8.12)
+CMAKE_MINIMUM_REQUIRED (VERSION 3.10)
-IF ((CMAKE_VERSION VERSION_GREATER 3.1) OR
- (CMAKE_VERSION VERSION_EQUAL 3.1))
- CMAKE_POLICY(SET CMP0054 NEW)
+IF (MSGPACK_USE_BOOST)
+ CMAKE_POLICY(SET CMP0167 NEW)
ENDIF ()
PROJECT (msgpack)
@@ -285,7 +285,6 @@
# MEMORYCHECK_COMMAND_OPTIONS needs to place prior to CTEST_MEMORYCHECK_COMMAND
SET (MEMORYCHECK_COMMAND_OPTIONS "--leak-check=full --show-leak-kinds=definite,possible --error-exitcode=1")
FIND_PROGRAM(CTEST_MEMORYCHECK_COMMAND NAMES valgrind)
- INCLUDE(Dart)
ADD_SUBDIRECTORY (test)
ENDIF ()
diff --git a/include/msgpack/type.hpp b/include/msgpack/type.hpp
index 1ab49745f..9ef3e86d3 100644
--- a/include/msgpack/type.hpp
+++ b/include/msgpack/type.hpp
@@ -60,7 +60,9 @@
#if defined(MSGPACK_USE_BOOST)
#include "adaptor/boost/fusion.hpp"
+#if !defined(MSGPACK_USE_CPP03)
#include "adaptor/boost/msgpack_variant.hpp"
+#endif // !defined(MSGPACK_USE_CPP03)
#include "adaptor/boost/optional.hpp"
#include "adaptor/boost/string_ref.hpp"
#include "adaptor/boost/string_view.hpp"
@@ -1,63 +0,0 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
Name: msgpack
Version: 3.1.0
Release: %autorelease
Summary: Binary-based efficient object serialization library
License: BSL-1.0
URL: http://msgpack.org
#!RemoteAsset
Source0: https://github.com/msgpack/msgpack-c/releases/download/cpp-%{version}/%{name}-%{version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -DCMAKE_POLICY_VERSION_MINIMUM=3.5
# https://github.com/msgpack/msgpack-c/commit/53d2ea9ad3cc20e1beac2e1c014082c25e221182
Patch0: 0001-Fixed-724.patch
Patch1: 0002-msgpack-cmake4.patch
BuildRequires: cmake
BuildRequires: gcc-c++
BuildRequires: doxygen
# for %%check
BuildRequires: pkgconfig(gtest)
BuildRequires: pkgconfig(zlib)
%description
MessagePack is an efficient binary object serialization
library. Like JSON, it enables the exchange of structured objects
between many languages, but unlike JSON it is very fast and small.
%package devel
Summary: Libraries and header files for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
Libraries and header files for %{name}
%prep -a
# gtest 1.17.0 requires at least C++17
sed -i "s|-std=c++98|-std=gnu++17|g" CMakeLists.txt
%check -p
# https://github.com/msgpack/msgpack-c/issues/697
export GTEST_FILTER=-object_with_zone.ext_empty
%files
%license LICENSE_1_0.txt COPYING
%doc AUTHORS ChangeLog NOTICE README README.md
%{_libdir}/*.so.*
%files devel
%{_includedir}/*
%{_libdir}/*.so
%{_libdir}/pkgconfig/msgpack.pc
%{_libdir}/cmake/msgpack
%changelog
%autochangelog
@@ -0,0 +1,22 @@
From b976f614ed2ce3bf98f15e9a93761aafe15ba5a9 Mon Sep 17 00:00:00 2001
From: Sakura286 <chenxuan@iscas.ac.cn>
Date: Mon, 16 Mar 2026 15:26:02 +0800
Subject: [PATCH] disable httpmuxgo121 on newer version of go
---
main.go | 2 ++
1 file changed, 2 insertions(+)
diff --git a/main.go b/main.go
index 650e03a..9a343f3 100644
--- a/main.go
+++ b/main.go
@@ -1,3 +1,5 @@
+//go:debug httpmuxgo121=0
+
package main
import (
--
2.53.0
@@ -0,0 +1,69 @@
From eb50883178cae3a721ca8658dde6988ee22c8918 Mon Sep 17 00:00:00 2001
From: Sakura286 <chenxuan@iscas.ac.cn>
Date: Mon, 16 Mar 2026 15:45:18 +0800
Subject: [PATCH] use lib64 instead of lib
---
CMakeLists.txt | 4 ++--
discover/types.go | 2 +-
ml/backend/ggml/ggml/src/ggml.go | 2 +-
ml/path.go | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 1aa976a..4bd9250 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -37,8 +37,8 @@ if (CMAKE_OSX_ARCHITECTURES MATCHES "x86_64")
set(CMAKE_INSTALL_RPATH "@loader_path")
endif()
-set(OLLAMA_BUILD_DIR ${CMAKE_BINARY_DIR}/lib/ollama)
-set(OLLAMA_INSTALL_DIR ${CMAKE_INSTALL_PREFIX}/lib/ollama/${OLLAMA_RUNNER_DIR})
+set(OLLAMA_BUILD_DIR ${CMAKE_BINARY_DIR}/lib64/ollama)
+set(OLLAMA_INSTALL_DIR ${CMAKE_INSTALL_PREFIX}/lib64/ollama/${OLLAMA_RUNNER_DIR})
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${OLLAMA_BUILD_DIR})
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY_DEBUG ${OLLAMA_BUILD_DIR})
diff --git a/discover/types.go b/discover/types.go
index efc69ec..5848ea4 100644
--- a/discover/types.go
+++ b/discover/types.go
@@ -31,7 +31,7 @@ func LogDetails(devices []ml.DeviceInfo) {
for _, dev := range devices {
var libs []string
for _, dir := range dev.LibraryPath {
- if strings.Contains(dir, filepath.Join("lib", "ollama")) {
+ if strings.Contains(dir, filepath.Join("lib64", "ollama")) {
libs = append(libs, filepath.Base(dir))
}
}
diff --git a/ml/backend/ggml/ggml/src/ggml.go b/ml/backend/ggml/ggml/src/ggml.go
index 7e21591..23f58a1 100644
--- a/ml/backend/ggml/ggml/src/ggml.go
+++ b/ml/backend/ggml/ggml/src/ggml.go
@@ -65,7 +65,7 @@ var OnceLoad = sync.OnceFunc(func() {
case "windows":
value = filepath.Join(filepath.Dir(exe), "lib", "ollama")
default:
- value = filepath.Join(filepath.Dir(exe), "..", "lib", "ollama")
+ value = filepath.Join(filepath.Dir(exe), "..", "lib64", "ollama")
}
// Avoid potentially loading incompatible GGML libraries
diff --git a/ml/path.go b/ml/path.go
index ac93af4..3af726c 100644
--- a/ml/path.go
+++ b/ml/path.go
@@ -28,7 +28,7 @@ var LibOllamaPath string = func() string {
case "windows":
libPath = filepath.Join(filepath.Dir(exe), "lib", "ollama")
case "linux":
- libPath = filepath.Join(filepath.Dir(exe), "..", "lib", "ollama")
+ libPath = filepath.Join(filepath.Dir(exe), "..", "lib64", "ollama")
case "darwin":
libPath = filepath.Dir(exe)
}
--
2.53.0
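The Go code changed by the patch above derives the library directory from the location of the running executable, so the directory cmake installs into has to match what the binary computes. The Python sketch below mirrors that lookup for the Linux branch only; `lib_ollama_path` is a hypothetical helper written for illustration, and `posixpath.normpath` stands in for the path cleaning that Go's `filepath.Join` performs.

```python
import posixpath


def lib_ollama_path(exe: str, libdir: str = "lib") -> str:
    """Resolve <dir of exe>/../<libdir>/ollama, as the Linux branch does."""
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(exe), "..", libdir, "ollama")
    )


# Unpatched lookup for a binary installed in /usr/bin.
print(lib_ollama_path("/usr/bin/ollama"))
# Lookup with the lib64 patch applied.
print(lib_ollama_path("/usr/bin/ollama", "lib64"))
```

If the shared objects are installed under one directory while the binary looks in the other, the dlopen of the ggml backends cannot find them, so the patch and the cmake install directory have to be changed together.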
@@ -0,0 +1,13 @@
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 2820dee..44e0b43 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -28,7 +28,7 @@ set(GGML_CUDA_FA ON)
set(GGML_CUDA_COMPRESSION_MODE default)
if((CMAKE_OSX_ARCHITECTURES AND NOT CMAKE_OSX_ARCHITECTURES MATCHES "arm64")
- OR (NOT CMAKE_OSX_ARCHITECTURES AND NOT CMAKE_SYSTEM_PROCESSOR MATCHES "arm|aarch64|ARM64|ARMv[0-9]+"))
+ OR (NOT CMAKE_OSX_ARCHITECTURES AND NOT CMAKE_SYSTEM_PROCESSOR MATCHES "arm|aarch64|ARM64|ARMv[0-9]+|riscv64"))
set(GGML_CPU_ALL_VARIANTS ON)
endif()
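The condition changed above decides whether `GGML_CPU_ALL_VARIANTS` is switched on. A rough Python model of that condition is sketched below, assuming CMake's `MATCHES` behaves like an unanchored regex search; `all_variants_enabled` is a hypothetical name used only for this illustration.

```python
import re

OLD_PATTERN = r"arm|aarch64|ARM64|ARMv[0-9]+"
NEW_PATTERN = OLD_PATTERN + r"|riscv64"


def all_variants_enabled(processor: str, pattern: str, osx_architectures: str = "") -> bool:
    """Model of the if() condition guarding set(GGML_CPU_ALL_VARIANTS ON)."""
    if osx_architectures:
        # First branch: macOS build whose architecture list lacks arm64.
        return re.search("arm64", osx_architectures) is None
    # Second branch: no macOS architectures set, processor is not excluded.
    return re.search(pattern, processor) is None


for cpu in ("x86_64", "aarch64", "riscv64"):
    print(cpu, all_variants_enabled(cpu, OLD_PATTERN), all_variants_enabled(cpu, NEW_PATTERN))
```

With the original pattern a riscv64 host falls through to the "all variants" branch; adding `riscv64` to the pattern keeps it out, in the same way as the ARM targets.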
@@ -0,0 +1,25 @@
From 85aee3e710288343eca393272a418483ac547b83 Mon Sep 17 00:00:00 2001
From: Sakura286 <chenxuan@iscas.ac.cn>
Date: Tue, 24 Mar 2026 10:11:07 +0800
Subject: [PATCH] limit batch size to stabilize
---
api/types.go | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/api/types.go b/api/types.go
index 3ccf3ce..cba3ffd 100644
--- a/api/types.go
+++ b/api/types.go
@@ -896,7 +896,7 @@ func DefaultOptions() Options {
Runner: Runner{
// options set when the model is loaded
NumCtx: int(envconfig.ContextLength()),
- NumBatch: 512,
+ NumBatch: 8,
NumGPU: -1, // -1 here indicates that NumGPU should be set dynamically
NumThread: 0, // let the runtime decide
UseMMap: nil,
--
2.53.0
@@ -0,0 +1,13 @@
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
[Install]
WantedBy=default.target
@@ -1,6 +1,7 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: misaka00251 <liuxin@iscas.ac.cn>
# SPDX-FileContributor: Sakura286 <chenxuan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
@@ -11,24 +12,24 @@
# Ollama bundles some ggml libs
# They should be kept private and the scans of these files should be disabled
%global __provides_exclude_from ^%{_exec_prefix}/lib/ollama/.*\\.so(\\..*)?$
%global __requires_exclude ^libggml-base\\.so\\.0\\(\\).*
%global __provides_exclude lib.*\\.so(\\..*)?
%global __requires_exclude libggml-.*\\.so(\\..*)?
Name: ollama
Version: 0.13.5
Release: %autorelease
Summary: Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
License: Apache-2.0 AND MIT
URL: https://github.com/ollama/ollama
#!RemoteAsset
Source0: %{url}/archive/refs/tags/v%{version}.tar.gz
License: MIT
URL: https://ollama.com/
VCS: git:https://github.com/ollama/ollama
#!RemoteAsset: sha256:6b6bc20a52c11341aa296eecce5ee6782f05815224a4196983b0aa2f1453c19f
Source0: https://github.com/ollama/ollama/archive/refs/tags/v%{version}.tar.gz
Source1: ollama.service
Source2: ollama.sysusers
BuildSystem: golang
BuildOption(prep): -n %{_name}-%{version}
Patch0: 0001-ollama-0.14.2_add-riscv.patch
Patch1: 0002-go-riscv64.patch
BuildRequires: cmake
BuildRequires: fdupes
BuildRequires: gcc-c++
@@ -60,12 +61,15 @@ BuildRequires: go(golang.org/x/tools)
BuildRequires: go(gonum.org/v1/gonum)
BuildRequires: go(google.golang.org/protobuf)
BuildRequires: ninja
BuildRequires: systemd-rpm-macros
%if %{with rocm}
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(Clang)
BuildRequires: cmake(hip)
BuildRequires: cmake(hipblas)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(LLD)
BuildRequires: cmake(LLVM)
BuildRequires: cmake(rocblas)
BuildRequires: cmake(rocsolver)
BuildRequires: pkgconfig(libdrm_amdgpu)
@@ -73,17 +77,37 @@ BuildRequires: pkgconfig(libelf)
BuildRequires: pkgconfig(numa)
BuildRequires: rocm-llvm-macros
BuildRequires: rocminfo
BuildRequires: clang-devel
BuildRequires: clang-tools-extra-devel
BuildRequires: compiler-rt
BuildRequires: hipcc
BuildRequires: lld-devel
BuildRequires: llvm-devel
%endif
%{?systemd_requires}
%if %{with rocm}
Requires: hipblas
Requires: rocblas
%endif
%patchlist
# Ollama vendors ggml code, but it does not sync riscv64 code by default
# Manually sync riscv64 code here
0001-ollama-0.14.2_add-riscv.patch
# Ollama puts the ggml-cpu code (C++) inside the 'ollama' binary (Go)
0002-go-riscv64.patch
# The golang build system on openRuyi uses GO111MODULE=off, which makes
# httpmuxgo121=1 the default; that setting is deprecated in newer versions of Go
# Without this patch, ollama cannot provide even basic HTTP functionality
# https://github.com/jkroepke/openvpn-auth-oauth2/pull/706
0003-disable-httpmuxgo121-on-newer-version-of-go.patch
# This patch breaks dlopen in ollama, so it is temporarily disabled
# Install ollama to /usr/lib as a workaround
# 0004-use-lib64-instead-of-lib.patch
# GGML_CPU_ALL_VARIANTS only supports x86_64
0005-disable-cpu-variants.patch
# llama.cpp (ggml) on riscv64 with ROCm frequently produces nonsense output
# Passing the parameters '-b 8 -ub 8' stabilizes it
0006-limit-batch-size-to-stabilize.patch
%description
Ollama is an open-source platform designed to run large language models locally.
It allows users to generate text, assist with coding, and create content privately
@@ -95,28 +119,62 @@ rm -rf llama/llama.cpp/vendor
# Ollama uses a mixed build of cmake and go.
# The ollama binary built by go uses dlopen to load the *.so files built by cmake.
# Building order is not important.
# Building order of go/cmake is not important.
%build -a
cmake \
-B build \
%if %{with rocm}
-DCMAKE_HIP_COMPILER=%rocmllvm_bindir/clang++ \
-DAMDGPU_TARGETS=%{rocm_gpu_list_default} \
%endif
%cmake \
-G Ninja \
-W no-dev
cmake --build build --parallel
-W no-dev \
-DCMAKE_INSTALL_LIBDIR:PATH=lib \
-DCMAKE_INSTALL_FULL_LIBDIR:PATH=/usr/lib \
-DLIB_INSTALL_DIR:PATH=/usr/lib \
-DLIB_SUFFIX= \
%if %{with rocm}
-DCMAKE_HIP_COMPILER=%{rocmllvm_bindir}/clang++ \
-DAMDGPU_TARGETS=%{rocm_gpu_list_default}
%endif
%cmake_build
%install
%buildsystem_golang_install
%cmake_install
# Remove bundled contents
rm -rvf %{buildroot}%{_bindir}/lib* \
%{buildroot}%{_exec_prefix}/lib/ollama/libamd* \
%{buildroot}%{_exec_prefix}/lib/ollama/libdrm* \
%{buildroot}%{_exec_prefix}/lib/ollama/libelf* \
%{buildroot}%{_exec_prefix}/lib/ollama/libhip* \
%{buildroot}%{_exec_prefix}/lib/ollama/libhsa* \
%{buildroot}%{_exec_prefix}/lib/ollama/libnuma* \
%{buildroot}%{_exec_prefix}/lib/ollama/libroc* \
%{buildroot}%{_exec_prefix}/lib/ollama/libroc* \
%{buildroot}%{_exec_prefix}/lib/ollama/rocblas/
install -p -D -m 0644 %{SOURCE1} %{buildroot}%{_unitdir}/ollama.service
install -p -D -m 0644 %{SOURCE2} %{buildroot}%{_sysusersdir}/ollama.conf
# home dir
mkdir -p %{buildroot}%{_var}/lib/ollama
%pre
%sysusers_create_package ollama %{SOURCE2}
%preun
%systemd_preun ollama.service
%post
%systemd_post ollama.service
%postun
%systemd_postun_with_restart ollama.service
%files
%license LICENSE*
%doc README*
%{_bindir}/%{_name}
%dir %{_exec_prefix}/lib/ollama
%attr(0755,ollama,ollama) %dir %{_var}/lib/ollama/
%{_bindir}/ollama
%{_exec_prefix}/lib/ollama/*
%{_unitdir}/ollama.service
%{_sysusersdir}/ollama.conf
%changelog
%{?autochangelog}
@@ -0,0 +1 @@
u ollama - "Runs Ollama" /var/lib/ollama /sbin/nologin
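The sysusers entry above is a single whitespace-separated record with a quoted description field. As a reading aid, the Python sketch below splits it into the sysusers.d columns (type, name, ID, GECOS, home, shell); `parse_sysusers_line` is a hypothetical helper and not part of systemd or of this package.

```python
import shlex

# Column order of a sysusers.d line.
FIELDS = ("type", "name", "id", "gecos", "home", "shell")


def parse_sysusers_line(line: str) -> dict:
    """Split one sysusers.d line into named fields, honouring quotes."""
    return dict(zip(FIELDS, shlex.split(line)))


entry = parse_sysusers_line('u ollama - "Runs Ollama" /var/lib/ollama /sbin/nologin')
print(entry)
```

Here the `-` in the ID column leaves the UID to be allocated automatically, and the home directory matches the `%{_var}/lib/ollama` directory the spec creates and owns.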
@@ -0,0 +1,56 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global srcname mistral-common
%global pypi_name mistral_common
Name: python-%{srcname}
Version: 1.11.2
Release: %autorelease
Summary: Library of common utilities for Mistral AI
License: Apache-2.0
URL: https://github.com/mistralai/mistral-common
#!RemoteAsset: sha256:79f68fc2d1190f28637f40e053f919c8c2697e00b2aa679ddee562a95183f4ad
Source0: https://files.pythonhosted.org/packages/source/m/%{pypi_name}/%{pypi_name}-%{version}.tar.gz
# Upstream does not ship the license file in the sdist, fetch it separately
#!RemoteAsset: sha256:5ed6f79e77734b5a60740dd821af5ecac9a6f33709c860eea4e20fcb6cca7fcc
Source1: https://raw.githubusercontent.com/mistralai/mistral-common/v%{version}/LICENCE
BuildArch: noarch
BuildSystem: pyproject
BuildOption(install): %{pypi_name}
# These modules require the optional "server" extra (fastapi, click, pydantic-settings)
BuildOption(check): -e 'mistral_common.experimental.app.*'
BuildRequires: pyproject-rpm-macros
BuildRequires: pkgconfig(python3)
BuildRequires: python3dist(pip)
BuildRequires: python3dist(setuptools)
BuildRequires: python3dist(wheel)
Provides: python3-%{srcname} = %{version}-%{release}
%python_provide python3-%{srcname}
%description
mistral-common is a library of common utilities for Mistral AI, providing
tokenizers, request and response schemas, and validation helpers used across
Mistral's models and tooling.
%prep -a
cp -p %{SOURCE1} LICENCE
# Relax jsonschema lower bound to match the version available in the repo
sed -i 's/jsonschema>=4.21.1/jsonschema>=4.17.3/' pyproject.toml
%generate_buildrequires
%pyproject_buildrequires
%files -f %{pyproject_files}
%doc README.md
%license LICENCE
%{_bindir}/mistral_common
%changelog
%autochangelog
@@ -0,0 +1,48 @@
# SPDX-FileCopyrightText: (C) 2025, 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2025, 2026 openRuyi Project Contributors
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global srcname msgpack
Name: python-%{srcname}
Version: 1.1.2
Release: %autorelease
Summary: Python MessagePack (de)serializer
License: Apache-2.0
URL: https://msgpack.org/
#!RemoteAsset: sha256:3b60763c1373dd60f398488069bcdc703cd08a711477b5d480eecc9f9626f47e
Source0: https://files.pythonhosted.org/packages/source/m/%{srcname}/%{srcname}-%{version}.tar.gz
BuildSystem: pyproject
BuildOption(install): -l %{srcname}
BuildRequires: gcc-c++
BuildRequires: pyproject-rpm-macros
BuildRequires: pkgconfig(python3)
BuildRequires: python3dist(pip)
BuildRequires: python3dist(setuptools)
Provides: python3-%{srcname} = %{version}-%{release}
Provides: python3-%{srcname}%{?_isa} = %{version}-%{release}
%python_provide python3-%{srcname}
%description
MessagePack is a binary-based efficient data interchange format that is
focused on high performance. It is like JSON, but very fast and small.
This is a Python (de)serializer for MessagePack.
%prep -a
# There is a circular dependency with python-msgpack-ext
rm -rf test/test_timestamp.py
%generate_buildrequires
%pyproject_buildrequires
%files -f %{pyproject_files}
%doc README.md
%license COPYING
%changelog
%autochangelog
@@ -0,0 +1,54 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global srcname pydantic-extra-types
%global pypi_name pydantic_extra_types
Name: python-%{srcname}
Version: 2.11.1
Release: %autorelease
Summary: Extra Pydantic types
License: MIT
URL: https://github.com/pydantic/pydantic-extra-types
#!RemoteAsset: sha256:46792d2307383859e923d8fcefa82108b1a141f8a9c0198982b3832ab5ef1049
Source0: https://files.pythonhosted.org/packages/source/p/%{pypi_name}/%{pypi_name}-%{version}.tar.gz
BuildArch: noarch
BuildSystem: pyproject
BuildOption(install): %{pypi_name}
# Skip submodules whose optional dependencies are not packaged yet
BuildOption(check): -e 'pydantic_extra_types.cron'
BuildOption(check): -e 'pydantic_extra_types.mongo_object_id'
BuildOption(check): -e 'pydantic_extra_types.pendulum_dt'
BuildOption(check): -e 'pydantic_extra_types.phone_numbers'
BuildOption(check): -e 'pydantic_extra_types.semantic_version'
BuildOption(check): -e 'pydantic_extra_types.semver'
BuildOption(check): -e 'pydantic_extra_types.ulid'
BuildRequires: pyproject-rpm-macros
BuildRequires: pkgconfig(python3)
BuildRequires: python3dist(hatchling)
BuildRequires: python3dist(pip)
BuildRequires: python3dist(pycountry)
BuildRequires: python3dist(setuptools)
Provides: python3-%{srcname} = %{version}-%{release}
%python_provide python3-%{srcname}
%description
Extra Pydantic types provides a collection of additional field types and
validators for Pydantic, such as country codes, phone numbers, colors,
coordinates and currency codes.
%generate_buildrequires
%pyproject_buildrequires
%files -f %{pyproject_files}
%doc README.md
%license LICENSE
%changelog
%autochangelog
@@ -1,55 +0,0 @@
From 353550790f659b320ea8753b7d4a6fd701bd1a79 Mon Sep 17 00:00:00 2001
From: Sakura286 <chenxuan@iscas.ac.cn>
Date: Fri, 6 Mar 2026 16:36:36 +0800
Subject: [PATCH 1/6] fix python shebang
---
Tensile/Configs/miopen/convert_cfg.py | 2 +-
Tensile/Tests/create_tests.py | 2 +-
Tensile/bin/Tensile | 2 +-
Tensile/bin/TensileCreateLibrary | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/Tensile/Configs/miopen/convert_cfg.py b/Tensile/Configs/miopen/convert_cfg.py
index c62d26f..3b5c114 100644
--- a/Tensile/Configs/miopen/convert_cfg.py
+++ b/Tensile/Configs/miopen/convert_cfg.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/python3
################################################################################
#
diff --git a/Tensile/Tests/create_tests.py b/Tensile/Tests/create_tests.py
index 2b08e3f..94f7345 100755
--- a/Tensile/Tests/create_tests.py
+++ b/Tensile/Tests/create_tests.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/python3
################################################################################
#
diff --git a/Tensile/bin/Tensile b/Tensile/bin/Tensile
index 1c53682..2ac7d57 100755
--- a/Tensile/bin/Tensile
+++ b/Tensile/bin/Tensile
@@ -1,4 +1,4 @@
-#!/usr/bin/env python3
+#!/usr/bin/python3
################################################################################
#
diff --git a/Tensile/bin/TensileCreateLibrary b/Tensile/bin/TensileCreateLibrary
index e90be28..8e966c3 100755
--- a/Tensile/bin/TensileCreateLibrary
+++ b/Tensile/bin/TensileCreateLibrary
@@ -1,4 +1,4 @@
-#!/usr/bin/env python3
+#!/usr/bin/python3
################################################################################
#
--
2.51.0
@@ -1,25 +0,0 @@
From 9d2c031ba924572914b72794f94f1def07aa225c Mon Sep 17 00:00:00 2001
From: Sakura286 <chenxuan@iscas.ac.cn>
Date: Fri, 6 Mar 2026 16:37:40 +0800
Subject: [PATCH 2/6] fix tensile get path
---
Tensile/cmake/TensileConfig.cmake | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Tensile/cmake/TensileConfig.cmake b/Tensile/cmake/TensileConfig.cmake
index 62682d7..de275b0 100644
--- a/Tensile/cmake/TensileConfig.cmake
+++ b/Tensile/cmake/TensileConfig.cmake
@@ -45,7 +45,7 @@ if(NOT DEFINED Tensile_ROOT)
if (WIN32)
execute_process(COMMAND "${Tensile_PREFIX}/bin/TensileGetPath.exe" OUTPUT_VARIABLE Tensile_ROOT)
else()
- execute_process(COMMAND "${Tensile_PREFIX}/bin/TensileGetPath" OUTPUT_VARIABLE Tensile_ROOT)
+ execute_process(COMMAND "TensileGetPath" OUTPUT_VARIABLE Tensile_ROOT)
endif()
endif()
list(APPEND CMAKE_MODULE_PATH "${Tensile_ROOT}/Source/cmake/")
--
2.51.0
@@ -1,34 +0,0 @@
From b1a90000e009daf6c91dc9e0837a36d9f4735a34 Mon Sep 17 00:00:00 2001
From: Sakura286 <chenxuan@iscas.ac.cn>
Date: Fri, 6 Mar 2026 16:39:22 +0800
Subject: [PATCH 3/6] reduce requirements
---
docs/sphinx/requirements.in | 1 -
docs/sphinx/requirements.txt | 2 --
2 files changed, 3 deletions(-)
diff --git a/docs/sphinx/requirements.in b/docs/sphinx/requirements.in
index 4184a90..f818da0 100644
--- a/docs/sphinx/requirements.in
+++ b/docs/sphinx/requirements.in
@@ -1,3 +1,2 @@
rocm-docs-core==1.20.0
autodoc
-joblib # Required dependency for API doc-string generation
diff --git a/docs/sphinx/requirements.txt b/docs/sphinx/requirements.txt
index e9b7e28..2dd28d9 100644
--- a/docs/sphinx/requirements.txt
+++ b/docs/sphinx/requirements.txt
@@ -91,8 +91,6 @@ jinja2==3.1.4
# via
# myst-parser
# sphinx
-joblib==1.5.1
- # via -r requirements.in
jsonschema==4.23.0
# via nbformat
jsonschema-specifications==2024.10.1
--
2.51.0
@@ -1,25 +0,0 @@
From 5e9b360710400fcace517f055edf54da2ff0e076 Mon Sep 17 00:00:00 2001
From: Sakura286 <chenxuan@iscas.ac.cn>
Date: Fri, 6 Mar 2026 16:41:52 +0800
Subject: [PATCH 4/6] ignore asm cap cache
---
Tensile/Common.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Tensile/Common.py b/Tensile/Common.py
index 86c6c57..376afc4 100644
--- a/Tensile/Common.py
+++ b/Tensile/Common.py
@@ -307,7 +307,7 @@ globalParameters["SeparateArchitectures"] = False # write Tensile library metada
globalParameters["LazyLibraryLoading"] = False # Load library and code object files when needed instead of at startup
-globalParameters["IgnoreAsmCapCache"] = False # Ignore checking for discrepancies between derived and cached asm caps
+globalParameters["IgnoreAsmCapCache"] = True # Ignore checking for discrepancies between derived and cached asm caps
globalParameters["ExperimentalLogicDir"] = "/experimental/"
--
2.51.0
@@ -1,71 +0,0 @@
From 746af14c11ee1b455f1adbab8bf6bc3c93fb3fde Mon Sep 17 00:00:00 2001
From: Sakura286 <chenxuan@iscas.ac.cn>
Date: Fri, 6 Mar 2026 16:45:10 +0800
Subject: [PATCH 5/6] no amdclang when rocm-llvm is unbundled
---
Tensile/Common.py | 4 ++--
Tensile/Utilities/Toolchain.py | 10 +++++-----
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/Tensile/Common.py b/Tensile/Common.py
index 376afc4..90c579e 100644
--- a/Tensile/Common.py
+++ b/Tensile/Common.py
@@ -269,7 +269,7 @@ globalParameters["DictLibraryLogic"] = False
globalParameters["CurrentISA"] = (0,0,0)
globalParameters["ROCmAgentEnumeratorPath"] = None # /opt/rocm/bin/rocm_agent_enumerator
globalParameters["ROCmSMIPath"] = None # /opt/rocm/bin/rocm-smi
-globalParameters["AssemblerPath"] = None # /opt/rocm/llvm/bin/clang++
+globalParameters["AssemblerPath"] = "clang++" # /opt/rocm/llvm/bin/clang++
globalParameters["WorkingPath"] = os.getcwd() # path where tensile called from
globalParameters["IndexChars"] = "IJKLMNOPQRSTUVWXYZ" # which characters to use for C[ij]=Sum[k] A[ik]*B[jk]
globalParameters["ScriptPath"] = os.path.dirname(os.path.realpath(__file__)) # path to Tensile/Tensile.py
@@ -279,7 +279,7 @@ globalParameters["HipClangVersion"] = "0.0.0"
globalParameters["RuntimeLanguage"] = "HIP"
globalParameters["CodeObjectVersion"] = "default"
-globalParameters["CxxCompiler"] = "amdclang++" if os.name != "nt" else "clang++"
+globalParameters["CxxCompiler"] = "hipcc" if os.name != "nt" else "clang++"
globalParameters["CCompiler"] = "amdclang" if os.name != "nt" else "clang"
globalParameters["Architecture"] = "all"
diff --git a/Tensile/Utilities/Toolchain.py b/Tensile/Utilities/Toolchain.py
index ee9cbee..e3de82b 100644
--- a/Tensile/Utilities/Toolchain.py
+++ b/Tensile/Utilities/Toolchain.py
@@ -106,10 +106,10 @@ def _posixSearchPaths() -> List[Path]:
class ToolchainDefaults(NamedTuple):
- CXX_COMPILER = osSelect(linux="amdclang++", windows="clang++.exe")
- C_COMPILER = osSelect(linux="amdclang", windows="clang.exe")
+ CXX_COMPILER = osSelect(linux="hipcc", windows="clang++.exe")
+ C_COMPILER = osSelect(linux="clang", windows="clang.exe")
OFFLOAD_BUNDLER = osSelect(linux="clang-offload-bundler", windows="clang-offload-bundler.exe")
- ASSEMBLER = osSelect(linux="amdclang++", windows="clang++.exe")
+ ASSEMBLER = osSelect(linux="clang++", windows="clang++.exe")
HIP_CONFIG = osSelect(linux="hipconfig", windows="hipconfig")
DEVICE_ENUMERATOR = osSelect(linux="rocm_agent_enumerator", windows="hipinfo.exe")
@@ -132,7 +132,7 @@ def supportedCCompiler(compiler: str) -> bool:
Return:
If supported True; otherwise, False.
"""
- return _supportedComponent(compiler, ["amdclang", "clang", "hipcc"])
+ return _supportedComponent(compiler, ["clang", "clang", "hipcc"])
def supportedCxxCompiler(compiler: str) -> bool:
@@ -144,7 +144,7 @@ def supportedCxxCompiler(compiler: str) -> bool:
Return:
If supported True; otherwise, False.
"""
- return _supportedComponent(compiler, ["amdclang++", "clang++", "hipcc"])
+ return _supportedComponent(compiler, ["clang++", "clang++", "hipcc"])
def supportedOffloadBundler(bundler: str) -> bool:
--
2.51.0
@@ -1,128 +0,0 @@
From dfefd5482684998206290e2e62dc0c84dcc7d64e Mon Sep 17 00:00:00 2001
From: Sakura286 <chenxuan@iscas.ac.cn>
Date: Fri, 6 Mar 2026 16:49:43 +0800
Subject: [PATCH 6/6] use system path instead of default
---
Tensile/Common.py | 2 +-
Tensile/Source/CMakeLists.txt | 4 ++--
Tensile/Source/FindHIP.cmake | 4 ++--
Tensile/Source/cmake/FindROCmSMI.cmake | 2 +-
Tensile/Tests/hipModuleLoad_timing/Makefile | 6 +++---
Tensile/Utilities/Toolchain.py | 5 ++---
6 files changed, 11 insertions(+), 12 deletions(-)
diff --git a/Tensile/Common.py b/Tensile/Common.py
index 90c579e..3589e01 100644
--- a/Tensile/Common.py
+++ b/Tensile/Common.py
@@ -2415,7 +2415,7 @@ def assignGlobalParameters( config, capabilitiesCache: Optional[dict] = None ):
if "KeepBuildTmp" in config:
globalParameters["KeepBuildTmp"] = config["KeepBuildTmp"]
- globalParameters["ROCmPath"] = "/opt/rocm"
+ globalParameters["ROCmPath"] = "/usr"
if "ROCM_PATH" in os.environ:
globalParameters["ROCmPath"] = os.environ.get("ROCM_PATH")
if "TENSILE_ROCM_PATH" in os.environ:
diff --git a/Tensile/Source/CMakeLists.txt b/Tensile/Source/CMakeLists.txt
index b96e308..c756756 100644
--- a/Tensile/Source/CMakeLists.txt
+++ b/Tensile/Source/CMakeLists.txt
@@ -26,7 +26,7 @@ cmake_minimum_required(VERSION 3.13)
# Override all paths arguments as they do not work properly
file(TO_CMAKE_PATH "$ENV{ROCM_PATH}" ROCM_PATH_ENV_VALUE)
-list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH_ENV_VALUE} /opt/rocm)
+list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH_ENV_VALUE} /usr)
project(Tensile)
@@ -65,7 +65,7 @@ CMAKE_DEPENDENT_OPTION(TENSILE_BUILD_CLIENT "Build the benchmarking client" ON
"TENSILE_USE_HIP" OFF)
if(TENSILE_USE_HIP)
- find_package(HIP REQUIRED CONFIG PATHS ${ROCM_PATH_ENV_VALUE} /opt/rocm)
+ find_package(HIP REQUIRED CONFIG PATHS ${ROCM_PATH_ENV_VALUE} /usr)
endif()
if(TENSILE_USE_OPENMP)
diff --git a/Tensile/Source/FindHIP.cmake b/Tensile/Source/FindHIP.cmake
index d299357..ba8597f 100644
--- a/Tensile/Source/FindHIP.cmake
+++ b/Tensile/Source/FindHIP.cmake
@@ -79,7 +79,7 @@ else()
hip/hip_runtime.h
PATHS
ENV HIP_PATH
- /opt/rocm
+ /usr
PATH_SUFFIXES
/include/hip
/include
@@ -98,7 +98,7 @@ else()
NAMES hipcc
PATHS
ENV HIP_PATH
- /opt/rocm
+ /usr
PATH_SUFFIXES
/bin
)
diff --git a/Tensile/Source/cmake/FindROCmSMI.cmake b/Tensile/Source/cmake/FindROCmSMI.cmake
index 0498766..071232a 100644
--- a/Tensile/Source/cmake/FindROCmSMI.cmake
+++ b/Tensile/Source/cmake/FindROCmSMI.cmake
@@ -24,7 +24,7 @@
if(NOT ROCM_ROOT)
if(NOT ROCM_DIR)
- set(ROCM_ROOT "/opt/rocm")
+ set(ROCM_ROOT "/usr")
else()
set(ROCM_DIR "${ROCM_DIR}/../../..")
endif()
diff --git a/Tensile/Tests/hipModuleLoad_timing/Makefile b/Tensile/Tests/hipModuleLoad_timing/Makefile
index 671167d..2177143 100644
--- a/Tensile/Tests/hipModuleLoad_timing/Makefile
+++ b/Tensile/Tests/hipModuleLoad_timing/Makefile
@@ -22,10 +22,10 @@
#
################################################################################
-CXX?=/opt/rocm/hip/bin/amdclang++
-LIBFLAGS=-L/opt/rocm/hip/lib/
+CXX?=/usr/bin/amdclang++
+LIBFLAGS=-L/usr/lib64/
LIBS=-lamdhip64
-INCFLAGS=-I/opt/rocm/hip/include/
+INCFLAGS=-I/usr/include/
hipModuleLoadTiming.out: hipModuleLoadTiming.o
$(CXX) -o $@ $(LIBFLAGS) $^
diff --git a/Tensile/Utilities/Toolchain.py b/Tensile/Utilities/Toolchain.py
index e3de82b..e6ee7f3 100644
--- a/Tensile/Utilities/Toolchain.py
+++ b/Tensile/Utilities/Toolchain.py
@@ -29,8 +29,8 @@ from subprocess import PIPE, run
from typing import List, NamedTuple, Union
from warnings import warn
-DEFAULT_ROCM_BIN_PATH_POSIX = Path("/opt/rocm/bin")
-DEFAULT_ROCM_LLVM_BIN_PATH_POSIX = Path("/opt/rocm/lib/llvm/bin")
+DEFAULT_ROCM_BIN_PATH_POSIX = Path("/usr/bin")
+DEFAULT_ROCM_LLVM_BIN_PATH_POSIX = Path("/usr/bin")
DEFAULT_ROCM_BIN_PATH_WINDOWS = Path("C:/Program Files/AMD/ROCm")
@@ -89,7 +89,6 @@ def _posixSearchPaths() -> List[Path]:
if os.environ.get("ROCM_PATH"):
for p in os.environ["ROCM_PATH"].split(os.pathsep):
searchPaths.append(Path(p) / "bin")
- searchPaths.append(Path(p) / "lib" / "llvm" / "bin")
searchPaths.extend(
[
--
2.51.0
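The effect of the `Toolchain.py` hunk on tool lookup can be sketched as follows. This is a minimal illustration, not Tensile's actual `_posixSearchPaths`: the function name `posix_search_paths`, the `env` parameter, and the assumption that the default bin path is appended as a fallback are all hypothetical.

```python
import os
from pathlib import Path
from typing import List, Mapping

# Value taken from the patched Toolchain.py.
DEFAULT_ROCM_BIN_PATH_POSIX = Path("/usr/bin")


def posix_search_paths(env: Mapping[str, str]) -> List[Path]:
    """Hypothetical stand-in for the patched search-path logic."""
    paths: List[Path] = []
    for entry in env.get("ROCM_PATH", "").split(os.pathsep):
        if entry:
            # After the patch only <ROCM_PATH>/bin is searched;
            # <ROCM_PATH>/lib/llvm/bin is no longer added.
            paths.append(Path(entry) / "bin")
    # Assumed fallback: the system default directory.
    paths.append(DEFAULT_ROCM_BIN_PATH_POSIX)
    return paths


if __name__ == "__main__":
    print(posix_search_paths({"ROCM_PATH": "/usr"}))
```

With the LLVM tools unbundled, both ROCm and LLVM binaries are expected in the same directory, which is why the separate `lib/llvm/bin` entry is dropped.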
@@ -1,91 +0,0 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global srcname tensile
%global upstreamname Tensile
%global rocm_version 7.1.1
Name: python-%{srcname}
Version: %{rocm_version}
Release: %autorelease
Summary: Tool for creating benchmark-driven backend libraries for GEMMs
License: MIT
URL: https://github.com/ROCm/Tensile
#!RemoteAsset
Source0: %{url}/archive/rocm-%{rocm_version}.tar.gz
BuildSystem: pyproject
BuildOption(install): -l %{upstreamname}
BuildRequires: python3-devel
Requires: cmake-filesystem
Requires: hipcc
Requires: rocminfo
Requires: python3dist(msgpack)
Requires: python3dist(pyyaml)
Provides: python3-%{srcname}
%python_provide python3-%{srcname}
%patchlist
0001-fix-python-shebang.patch
0002-fix-tensile-get-path.patch
# TODO: joblib is not enabled on openRuyi
0003-reduce-requirements.patch
0004-ignore-asm-cap-cache.patch
# no bundled clang is used on openRuyi
0005-no-amdclang-when-rocm-llvm-is-unbundled.patch
# /opt is not used on openRuyi packaging
0006-use-system-path-instead-of-default.patch
%description
Tensile is a tool for creating benchmark-driven backend libraries for GEMMs,
GEMM-like problems (such as batched GEMM), and general N-dimensional tensor
contractions on a GPU. The Tensile library is mainly used as backend library to
rocBLAS. Tensile acts as the performance backbone for a wide variety of
'compute' applications running on AMD GPUs.
%prep -a
#Fix a few things:
chmod 755 Tensile/Configs/miopen/convert_cfg.py
%generate_buildrequires
%pyproject_buildrequires
%install -a
# /usr/cmake/* -> /usr/share/cmake/Tensile
mkdir -p %{buildroot}%{_datadir}/cmake/Tensile
mv %{buildroot}%{_prefix}/cmake/* %{buildroot}%{_datadir}/cmake/Tensile/
rm -rf %{buildroot}%{_prefix}/cmake
# Do not distribute broken bins
rm %{buildroot}%{_bindir}/tensile*
# rm hard links and replace
rm %{buildroot}%{python3_sitelib}/%{upstreamname}/cmake/*.cmake
mv %{buildroot}%{_datadir}/cmake/Tensile/*.cmake %{buildroot}%{python3_sitelib}/%{upstreamname}/cmake/
%pyproject_save_files %{upstreamname}
%check
# 1. tensile requires GPU hardware at runtime
# 2. optional dependencies (joblib) are intentionally excluded
%files -f %{pyproject_files}
%doc README.md
%license LICENSE.md
# Do not distribute tests
%exclude %{python3_sitelib}/%{upstreamname}/Tests
%{_bindir}/Tensile
%{_bindir}/TensileBenchmarkCluster
%{_bindir}/TensileCreateLibrary
%{_bindir}/TensileGetPath
%{_bindir}/TensileRetuneLibrary
%changelog
%{?autochangelog}
@@ -0,0 +1,45 @@
--- a/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp
+++ b/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp
@@ -53,11 +53,11 @@
} initializer;
} // namespace (anonymous)
-#define AT_MAGMA_VERSION MAGMA_VERSION_MAJOR*100 + MAGMA_VERSION_MINOR*10 + MAGMA_VERSION_MICRO
+#define AT_MAGMA_VERSION MAGMA_VERSION_MAJOR*10000 + MAGMA_VERSION_MINOR*100 + MAGMA_VERSION_MICRO
-// Check that MAGMA never releases MAGMA_VERSION_MINOR >= 10 or MAGMA_VERSION_MICRO >= 10
-#if MAGMA_VERSION_MINOR >= 10 || MAGMA_VERSION_MICRO >= 10
-#error "MAGMA release minor or micro version >= 10, please correct AT_MAGMA_VERSION"
+// Check that MAGMA never releases MAGMA_VERSION_MINOR >= 100 or MAGMA_VERSION_MICRO >= 100
+#if MAGMA_VERSION_MINOR >= 100 || MAGMA_VERSION_MICRO >= 100
+#error "MAGMA release minor or micro version >= 100, please correct AT_MAGMA_VERSION"
#endif
#else
@@ -153,7 +153,7 @@
scalar_t** dB_array, magma_int_t lddb, magma_int_t& info,
magma_int_t batchsize, const MAGMAQueue& magma_queue, magma_trans_t trans);
-#if AT_MAGMA_VERSION >= 254
+#if AT_MAGMA_VERSION >= 20504
template <>
void magmaLdlHermitian<double>(
@@ -209,7 +209,7 @@
AT_CUDA_CHECK(cudaGetLastError());
}
-#endif // AT_MAGMA_VERSION >= 254
+#endif // AT_MAGMA_VERSION >= 20504
template<>
void magmaLu<double>(
@@ -818,7 +818,7 @@
// If cusolver and magma 2.5.4+ are both available and hermitian=true,
// call magma for complex inputs
#ifdef USE_LINALG_SOLVER
-#if AT_MAGMA_ENABLED() && (AT_MAGMA_VERSION >= 254)
+#if AT_MAGMA_ENABLED() && (AT_MAGMA_VERSION >= 20504)
if (LD.is_complex() && hermitian) {
return ldl_factor_magma(
LD, pivots, info, upper, hermitian);
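The arithmetic behind this patch can be checked with a short sketch. The two formulas are copied from the `AT_MAGMA_VERSION` macro before and after the change; the Python function names are made up for illustration, and MAGMA 3.0.0 is a hypothetical version used only to show the collision.

```python
def old_at_magma_version(major: int, minor: int, micro: int) -> int:
    """Encoding removed by the patch: one decimal digit per component."""
    return major * 100 + minor * 10 + micro


def new_at_magma_version(major: int, minor: int, micro: int) -> int:
    """Encoding added by the patch: two decimal digits for minor and micro."""
    return major * 10000 + minor * 100 + micro


if __name__ == "__main__":
    # (3, 0, 0) is hypothetical; it only illustrates the overlap with 2.10.0.
    for version in [(2, 5, 4), (2, 9, 0), (2, 10, 0), (3, 0, 0)]:
        print(version, old_at_magma_version(*version), new_at_magma_version(*version))
```

Under the old scheme 2.10.0 and a hypothetical 3.0.0 both encode to 300, which is what the original `#error` guard protected against. Under the new scheme 2.5.4 encodes to 20504, matching the updated `#if AT_MAGMA_VERSION >= 20504` checks.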
@@ -0,0 +1,350 @@
#
# License Details
# Main license BSD 3-Clause
#
# Apache-2.0
# android/libs/fbjni/LICENSE
# android/libs/fbjni/CMakeLists.txt
# android/libs/fbjni/build.gradle
# android/libs/fbjni/cxx/fbjni/ByteBuffer.cpp
# android/libs/fbjni/cxx/fbjni/ByteBuffer.h
# android/libs/fbjni/cxx/fbjni/Context.h
# android/libs/fbjni/cxx/fbjni/File.h
# android/libs/fbjni/cxx/fbjni/JThread.h
# android/libs/fbjni/cxx/fbjni/NativeRunnable.h
# android/libs/fbjni/cxx/fbjni/OnLoad.cpp
# android/libs/fbjni/cxx/fbjni/ReadableByteChannel.cpp
# android/libs/fbjni/cxx/fbjni/ReadableByteChannel.h
# android/libs/fbjni/cxx/fbjni/detail/Boxed.h
# android/libs/fbjni/cxx/fbjni/detail/Common.h
# android/libs/fbjni/cxx/fbjni/detail/CoreClasses-inl.h
# android/libs/fbjni/cxx/fbjni/detail/CoreClasses.h
# android/libs/fbjni/cxx/fbjni/detail/Environment.cpp
# android/libs/fbjni/cxx/fbjni/detail/Environment.h
# android/libs/fbjni/cxx/fbjni/detail/Exceptions.cpp
# android/libs/fbjni/cxx/fbjni/detail/Exceptions.h
# android/libs/fbjni/cxx/fbjni/detail/FbjniApi.h
# android/libs/fbjni/cxx/fbjni/detail/Hybrid.cpp
# android/libs/fbjni/cxx/fbjni/detail/Hybrid.h
# android/libs/fbjni/cxx/fbjni/detail/Iterator-inl.h
# android/libs/fbjni/cxx/fbjni/detail/Iterator.h
# android/libs/fbjni/cxx/fbjni/detail/JWeakReference.h
# android/libs/fbjni/cxx/fbjni/detail/Log.h
# android/libs/fbjni/cxx/fbjni/detail/Meta-forward.h
# android/libs/fbjni/cxx/fbjni/detail/Meta-inl.h
# android/libs/fbjni/cxx/fbjni/detail/Meta.cpp
# android/libs/fbjni/cxx/fbjni/detail/Meta.h
# android/libs/fbjni/cxx/fbjni/detail/MetaConvert.h
# android/libs/fbjni/cxx/fbjni/detail/ReferenceAllocators-inl.h
# android/libs/fbjni/cxx/fbjni/detail/ReferenceAllocators.h
# android/libs/fbjni/cxx/fbjni/detail/References-forward.h
# android/libs/fbjni/cxx/fbjni/detail/References-inl.h
# android/libs/fbjni/cxx/fbjni/detail/References.cpp
# android/libs/fbjni/cxx/fbjni/detail/References.h
# android/libs/fbjni/cxx/fbjni/detail/Registration-inl.h
# android/libs/fbjni/cxx/fbjni/detail/Registration.h
# android/libs/fbjni/cxx/fbjni/detail/SimpleFixedString.h
# android/libs/fbjni/cxx/fbjni/detail/TypeTraits.h
# android/libs/fbjni/cxx/fbjni/detail/utf8.cpp
# android/libs/fbjni/cxx/fbjni/detail/utf8.h
# android/libs/fbjni/cxx/fbjni/fbjni.cpp
# android/libs/fbjni/cxx/fbjni/fbjni.h
# android/libs/fbjni/cxx/lyra/cxa_throw.cpp
# android/libs/fbjni/cxx/lyra/lyra.cpp
# android/libs/fbjni/cxx/lyra/lyra.h
# android/libs/fbjni/cxx/lyra/lyra_breakpad.cpp
# android/libs/fbjni/cxx/lyra/lyra_exceptions.cpp
# android/libs/fbjni/cxx/lyra/lyra_exceptions.h
# android/libs/fbjni/gradle.properties
# android/libs/fbjni/gradle/android-tasks.gradle
# android/libs/fbjni/gradle/release.gradle
# android/libs/fbjni/gradlew
# android/libs/fbjni/gradlew.bat
# android/libs/fbjni/host.gradle
# android/libs/fbjni/java/com/facebook/jni/CppException.java
# android/libs/fbjni/java/com/facebook/jni/CppSystemErrorException.java
# android/libs/fbjni/java/com/facebook/jni/DestructorThread.java
# android/libs/fbjni/java/com/facebook/jni/HybridClassBase.java
# android/libs/fbjni/java/com/facebook/jni/HybridData.java
# android/libs/fbjni/java/com/facebook/jni/IteratorHelper.java
# android/libs/fbjni/java/com/facebook/jni/MapIteratorHelper.java
# android/libs/fbjni/java/com/facebook/jni/NativeRunnable.java
# android/libs/fbjni/java/com/facebook/jni/ThreadScopeSupport.java
# android/libs/fbjni/java/com/facebook/jni/UnknownCppException.java
# android/libs/fbjni/java/com/facebook/jni/annotations/DoNotStrip.java
# android/libs/fbjni/scripts/android-setup.sh
# android/libs/fbjni/scripts/run-host-tests.sh
# android/libs/fbjni/settings.gradle
# android/libs/fbjni/test/BaseFBJniTests.java
# android/libs/fbjni/test/ByteBufferTests.java
# android/libs/fbjni/test/DocTests.java
# android/libs/fbjni/test/FBJniTests.java
# android/libs/fbjni/test/HybridTests.java
# android/libs/fbjni/test/IteratorTests.java
# android/libs/fbjni/test/PrimitiveArrayTests.java
# android/libs/fbjni/test/ReadableByteChannelTests.java
# android/libs/fbjni/test/jni/CMakeLists.txt
# android/libs/fbjni/test/jni/byte_buffer_tests.cpp
# android/libs/fbjni/test/jni/doc_tests.cpp
# android/libs/fbjni/test/jni/expect.h
# android/libs/fbjni/test/jni/fbjni_onload.cpp
# android/libs/fbjni/test/jni/fbjni_tests.cpp
# android/libs/fbjni/test/jni/hybrid_tests.cpp
# android/libs/fbjni/test/jni/inter_dso_exception_test_1/Test.cpp
# android/libs/fbjni/test/jni/inter_dso_exception_test_1/Test.h
# android/libs/fbjni/test/jni/inter_dso_exception_test_2/Test.cpp
# android/libs/fbjni/test/jni/inter_dso_exception_test_2/Test.h
# android/libs/fbjni/test/jni/iterator_tests.cpp
# android/libs/fbjni/test/jni/modified_utf8_test.cpp
# android/libs/fbjni/test/jni/no_rtti.cpp
# android/libs/fbjni/test/jni/no_rtti.h
# android/libs/fbjni/test/jni/primitive_array_tests.cpp
# android/libs/fbjni/test/jni/readable_byte_channel_tests.cpp
# android/libs/fbjni/test/jni/simple_fixed_string_tests.cpp
# android/libs/fbjni/test/jni/utf16toUTF8_test.cpp
# android/pytorch_android/host/build.gradle
# aten/src/ATen/cuda/llvm_basic.cpp
# aten/src/ATen/cuda/llvm_complex.cpp
# aten/src/ATen/native/quantized/cpu/qnnpack/confu.yaml
# aten/src/ATen/native/quantized/cpu/qnnpack/src/requantization/gemmlowp-neon.c
# aten/src/ATen/native/quantized/cpu/qnnpack/src/requantization/gemmlowp-scalar.h
# aten/src/ATen/native/quantized/cpu/qnnpack/src/requantization/gemmlowp-sse.h
# aten/src/ATen/nnapi/codegen.py
# aten/src/ATen/nnapi/NeuralNetworks.h
# aten/src/ATen/nnapi/nnapi_wrapper.cpp
# aten/src/ATen/nnapi/nnapi_wrapper.h
# binaries/benchmark_args.h
# binaries/benchmark_helper.cc
# binaries/benchmark_helper.h
# binaries/compare_models_torch.cc
# binaries/convert_and_benchmark.cc
# binaries/convert_caffe_image_db.cc
# binaries/convert_db.cc
# binaries/convert_encoded_to_raw_leveldb.cc
# binaries/convert_image_to_tensor.cc
# binaries/core_overhead_benchmark.cc
# binaries/core_overhead_benchmark_gpu.cc
# binaries/db_throughput.cc
# binaries/dump_operator_names.cc
# binaries/inspect_gpu.cc
# binaries/load_benchmark_torch.cc
# binaries/make_cifar_db.cc
# binaries/make_image_db.cc
# binaries/make_mnist_db.cc
# binaries/optimize_for_mobile.cc
# binaries/parallel_info.cc
# binaries/predictor_verifier.cc
# binaries/print_core_object_sizes_gpu.cc
# binaries/print_registered_core_operators.cc
# binaries/run_plan.cc
# binaries/run_plan_mpi.cc
# binaries/speed_benchmark.cc
# binaries/speed_benchmark_torch.cc
# binaries/split_db.cc
# binaries/tsv_2_proto.cc
# binaries/tutorial_blob.cc
# binaries/zmq_feeder.cc
# c10/test/util/small_vector_test.cpp
# c10/util/FunctionRef.h
# c10/util/SmallVector.cpp
# c10/util/SmallVector.h
# c10/util/llvmMathExtras.h
# c10/util/sparse_bitset.h
# caffe2/contrib/aten/gen_op.py
# caffe2/contrib/fakelowp/fp16_fc_acc_op.cc
# caffe2/contrib/fakelowp/fp16_fc_acc_op.h
# caffe2/contrib/gloo/allgather_ops.cc
# caffe2/contrib/gloo/allgather_ops.h
# caffe2/contrib/gloo/reduce_scatter_ops.cc
# caffe2/contrib/gloo/reduce_scatter_ops.h
# caffe2/core/hip/common_miopen.h
# caffe2/core/hip/common_miopen.hip
# caffe2/core/net_async_tracing.cc
# caffe2/core/net_async_tracing.h
# caffe2/core/net_async_tracing_test.cc
# caffe2/experiments/operators/fully_connected_op_decomposition.cc
# caffe2/experiments/operators/fully_connected_op_decomposition.h
# caffe2/experiments/operators/fully_connected_op_decomposition_gpu.cc
# caffe2/experiments/operators/fully_connected_op_prune.cc
# caffe2/experiments/operators/fully_connected_op_prune.h
# caffe2/experiments/operators/fully_connected_op_sparse.cc
# caffe2/experiments/operators/fully_connected_op_sparse.h
# caffe2/experiments/operators/funhash_op.cc
# caffe2/experiments/operators/funhash_op.h
# caffe2/experiments/operators/sparse_funhash_op.cc
# caffe2/experiments/operators/sparse_funhash_op.h
# caffe2/experiments/operators/sparse_matrix_reshape_op.cc
# caffe2/experiments/operators/sparse_matrix_reshape_op.h
# caffe2/experiments/operators/tt_contraction_op.cc
# caffe2/experiments/operators/tt_contraction_op.h
# caffe2/experiments/operators/tt_contraction_op_gpu.cc
# caffe2/experiments/operators/tt_pad_op.cc
# caffe2/experiments/operators/tt_pad_op.h
# caffe2/experiments/python/SparseTransformer.py
# caffe2/experiments/python/convnet_benchmarks.py
# caffe2/experiments/python/device_reduce_sum_bench.py
# caffe2/experiments/python/funhash_op_test.py
# caffe2/experiments/python/net_construct_bench.py
# caffe2/experiments/python/sparse_funhash_op_test.py
# caffe2/experiments/python/sparse_reshape_op_test.py
# caffe2/experiments/python/tt_contraction_op_test.py
# caffe2/experiments/python/tt_pad_op_test.py
# caffe2/mobile/contrib/libvulkan-stub/include/vulkan/vk_platform.h
# caffe2/mobile/contrib/libvulkan-stub/include/vulkan/vulkan.h
# caffe2/mobile/contrib/nnapi/NeuralNetworks.h
# caffe2/mobile/contrib/nnapi/dlnnapi.c
# caffe2/mobile/contrib/nnapi/nnapi_benchmark.cc
# caffe2/observers/profile_observer.cc
# caffe2/observers/profile_observer.h
# caffe2/operators/hip/conv_op_miopen.hip
# caffe2/operators/hip/local_response_normalization_op_miopen.hip
# caffe2/operators/hip/pool_op_miopen.hip
# caffe2/operators/hip/spatial_batch_norm_op_miopen.hip
# caffe2/operators/quantized/int8_utils.h
# caffe2/operators/stump_func_op.cc
# caffe2/operators/stump_func_op.cu
# caffe2/operators/stump_func_op.h
# caffe2/operators/unique_ops.cc
# caffe2/operators/unique_ops.cu
# caffe2/operators/unique_ops.h
# caffe2/operators/upsample_op.cc
# caffe2/operators/upsample_op.h
# caffe2/opt/fusion.h
# caffe2/python/layers/label_smooth.py
# caffe2/python/mint/static/css/simple-sidebar.css
# caffe2/python/modeling/get_entry_from_blobs.py
# caffe2/python/modeling/get_entry_from_blobs_test.py
# caffe2/python/modeling/gradient_clipping_test.py
# caffe2/python/operator_test/unique_ops_test.py
# caffe2/python/operator_test/upsample_op_test.py
# caffe2/python/operator_test/weight_scale_test.py
# caffe2/python/pybind_state_int8.cc
# caffe2/python/transformations.py
# caffe2/python/transformations_test.py
# caffe2/quantization/server/batch_matmul_dnnlowp_op.cc
# caffe2/quantization/server/batch_matmul_dnnlowp_op.h
# caffe2/quantization/server/compute_equalization_scale_test.py
# caffe2/quantization/server/elementwise_linear_dnnlowp_op.cc
# caffe2/quantization/server/elementwise_linear_dnnlowp_op.h
# caffe2/quantization/server/elementwise_sum_relu_op.cc
# caffe2/quantization/server/fb_fc_packed_op.cc
# caffe2/quantization/server/fb_fc_packed_op.h
# caffe2/quantization/server/fbgemm_fp16_pack_op.cc
# caffe2/quantization/server/fbgemm_fp16_pack_op.h
# caffe2/quantization/server/fully_connected_fake_lowp_op.cc
# caffe2/quantization/server/fully_connected_fake_lowp_op.h
# caffe2/quantization/server/int8_gen_quant_params_min_max_test.py
# caffe2/quantization/server/int8_gen_quant_params_test.py
# caffe2/quantization/server/int8_quant_scheme_blob_fill_test.py
# caffe2/quantization/server/spatial_batch_norm_relu_op.cc
# caffe2/sgd/weight_scale_op.cc
# caffe2/sgd/weight_scale_op.h
# caffe2/utils/bench_utils.h
# functorch/examples/maml_omniglot/maml-omniglot-higher.py
# functorch/examples/maml_omniglot/maml-omniglot-ptonly.py
# functorch/examples/maml_omniglot/maml-omniglot-transforms.py
# functorch/examples/maml_omniglot/support/omniglot_loaders.py
# modules/detectron/group_spatial_softmax_op.cc
# modules/detectron/group_spatial_softmax_op.cu
# modules/detectron/group_spatial_softmax_op.h
# modules/detectron/ps_roi_pool_op.cc
# modules/detectron/ps_roi_pool_op.h
# modules/detectron/roi_pool_f_op.cc
# modules/detectron/roi_pool_f_op.cu
# modules/detectron/roi_pool_f_op.h
# modules/detectron/sample_as_op.cc
# modules/detectron/sample_as_op.cu
# modules/detectron/sample_as_op.h
# modules/detectron/select_smooth_l1_loss_op.cc
# modules/detectron/select_smooth_l1_loss_op.cu
# modules/detectron/select_smooth_l1_loss_op.h
# modules/detectron/sigmoid_cross_entropy_loss_op.cc
# modules/detectron/sigmoid_cross_entropy_loss_op.cu
# modules/detectron/sigmoid_cross_entropy_loss_op.h
# modules/detectron/sigmoid_focal_loss_op.cc
# modules/detectron/sigmoid_focal_loss_op.cu
# modules/detectron/sigmoid_focal_loss_op.h
# modules/detectron/smooth_l1_loss_op.cc
# modules/detectron/smooth_l1_loss_op.cu
# modules/detectron/smooth_l1_loss_op.h
# modules/detectron/softmax_focal_loss_op.cc
# modules/detectron/softmax_focal_loss_op.cu
# modules/detectron/softmax_focal_loss_op.h
# modules/detectron/spatial_narrow_as_op.cc
# modules/detectron/spatial_narrow_as_op.cu
# modules/detectron/spatial_narrow_as_op.h
# modules/detectron/upsample_nearest_op.cc
# modules/detectron/upsample_nearest_op.h
# modules/module_test/module_test_dynamic.cc
# modules/rocksdb/rocksdb.cc
# scripts/apache_header.txt
# scripts/apache_python.txt
# torch/distributions/lkj_cholesky.py
#
# Apache 2.0 AND BSD 2-Clause
# caffe2/operators/deform_conv_op.cu
#
# Apache 2.0 AND BSD 2-Clause AND MIT
# modules/detectron/ps_roi_pool_op.cu
#
# Apache 2.0 AND BSD 2-Clause
# modules/detectron/upsample_nearest_op.cu
#
# BSD 0-Clause
# torch/csrc/utils/pythoncapi_compat.h
#
# BSD 2-Clause
# aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/LICENSE
# caffe2/image/transform_gpu.cu
# caffe2/image/transform_gpu.h
#
# BSL-1.0
# c10/util/flat_hash_map.h
# c10/util/hash.h
# c10/util/Optional.h
# c10/util/order_preserving_flat_hash_map.h
# c10/util/strong_type.h
# c10/util/variant.h
#
# GPL-3.0-or-later AND MIT
# c10/util/reverse_iterator.h
#
# Khronos
# These files are for OpenCL, an unused option
# Replace them later, as-needed with the opencl-headers.rpm
#
# caffe2/contrib/opencl/OpenCL/cl.hpp
# caffe2/mobile/contrib/libopencl-stub/include/CL/cl.h
# caffe2/mobile/contrib/libopencl-stub/include/CL/cl.hpp
# caffe2/mobile/contrib/libopencl-stub/include/CL/cl_ext.h
# caffe2/mobile/contrib/libopencl-stub/include/CL/cl_gl.h
# caffe2/mobile/contrib/libopencl-stub/include/CL/cl_gl_ext.h
# caffe2/mobile/contrib/libopencl-stub/include/CL/cl_platform.h
# caffe2/mobile/contrib/libopencl-stub/include/CL/opencl.h
#
# MIT
# android/libs/fbjni/googletest-CMakeLists.txt.in
# c10/util/BFloat16-math.h
# caffe2/mobile/contrib/libvulkan-stub/include/libvulkan-stub.h
# caffe2/mobile/contrib/libvulkan-stub/src/libvulkan-stub.c
# caffe2/onnx/torch_ops/defs.cc
# cmake/Modules_CUDA_fix/upstream/FindCUDA/make2cmake.cmake
# cmake/Modules_CUDA_fix/upstream/FindCUDA/parse_cubin.cmake
# cmake/Modules_CUDA_fix/upstream/FindCUDA/run_nvcc.cmake
# functorch/einops/_parsing.py
# test/functorch/test_parsing.py
# test/functorch/test_rearrange.py
# third_party/miniz-2.1.0/LICENSE
# third_party/miniz-2.1.0/miniz.c
# tools/coverage_plugins_package/setup.py
# torch/_appdirs.py
# torch/utils/hipify/hipify_python.py
#
# Public Domain
# caffe2/mobile/contrib/libopencl-stub/LICENSE
# caffe2/utils/murmur_hash3.cc
# caffe2/utils/murmur_hash3.h
#
# Zlib
# aten/src/ATen/native/cpu/avx_mathfun.h
@@ -0,0 +1,578 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
#
# Originally extracted from Fedora Project
# Authors: The Fedora Project Contributors
%global srcname torch
%global toolchain clang
%global pypi_version 2.11.0
%global miniz_version 3.0.2
# For -test subpackage
# suitable only for local testing
# Install and do something like
# export LD_LIBRARY_PATH=/usr/lib64/python3.12/site-packages/torch/lib
# /usr/lib64/python3.12/site-packages/torch/bin/test_api, test_lazy
%bcond test 0
%bcond rocm 1
# For testing distributed+rccl etc.
# TODO: openmpi not included in openRuyi
%bcond mpi 0
%global _lto_cflags %nil
# Disable dwz with rocm because memory can be exhausted
%if %{with rocm}
%define _find_debuginfo_dwz_opts %{nil}
%endif
# Pytorch third-party buildrequires
#
# The system_xxx conditionals are kept for debugging, for the following reasons:
#
# 1. some packages are not included in openRuyi.
# 2. some packages on openRuyi lack a required component.
# 3. the version packaged on openRuyi does not match the one PyTorch expects.
%bcond system_flatbuffers 0
# PyTorch hardcodes httplib to third_party/cpp-httplib
%bcond system_httplib 0
# TODO: kineto not included in openRuyi
%bcond system_kineto 0
# TODO: tensorpipe not included in openRuyi
%bcond system_tensorpipe 0
Name: python-%{srcname}
Version: %{pypi_version}
Release: %autorelease
Summary: PyTorch AI/ML framework
# See license.txt for license details
License: BSD-3-Clause AND BSD-2-Clause AND 0BSD AND Apache-2.0 AND MIT AND BSL-1.0 AND GPL-3.0-or-later AND Zlib
URL: https://pytorch.org/
#!RemoteAsset: sha256:52872a6bbdc42334b00051d88a92f801cfd9be730abdd2b37a2d08996f53bb29
Source0: https://github.com/pytorch/pytorch/archive/refs/tags/v%{version}.tar.gz
%if %{without system_flatbuffers}
%global flatbuffers_version 24.12.23
#!RemoteAsset: sha256:7e2ef35f1af9e2aa0c6a7d0a09298c2cb86caf3d4f58c0658b306256e5bcab10
Source1: https://github.com/google/flatbuffers/archive/refs/tags/v%{flatbuffers_version}.tar.gz
%endif
%if %{without system_tensorpipe}
# Development on tensorpipe has stopped; the repo was made read-only on July 1, 2023, and this is the last commit
%global tp_commit 2b4cd91092d335a697416b2a3cb398283246849d
%global tp_scommit 2b4cd91
#!RemoteAsset: sha256:0e85ca56bfe25ed7b3026d2784f716eb10ed1328ade346e3a252814752c57eeb
Source2: https://github.com/pytorch/tensorpipe/archive/%{tp_commit}/tensorpipe-%{tp_scommit}.tar.gz
# The old libuv tensorpipe uses
#!RemoteAsset: sha256:6cfeb5f4bab271462b4a2cc77d4ecec847fdbdc26b72019c27ae21509e6f94fa
Source3: https://github.com/libuv/libuv/archive/refs/tags/v1.41.0.tar.gz
# Development on libnop has stopped as far as we know; this is the last commit
%global nop_commit 910b55815be16109f04f4180e9adee14fb4ce281
%global nop_scommit 910b558
#!RemoteAsset: sha256:ec3604671f8ea11aed9588825f9098057ebfef7a8908e97459835150eea9f63a
Source4: https://github.com/google/libnop/archive/%{nop_commit}/libnop-%{nop_scommit}.tar.gz
%endif
%if %{without system_httplib}
%global hl_commit 4d7c9a788de136071ccf0dd4e96239151e2adadb
%global hl_scommit 4d7c9a7
#!RemoteAsset: sha256:8ecb7bbe844f9b4a1418b8a015d0f815d021d2c0d53291387122cb510c8783ef
Source5: https://github.com/yhirose/cpp-httplib/archive/%{hl_commit}/cpp-httplib-%{hl_scommit}.tar.gz
%endif
%if %{without system_kineto}
%global ki_commit 23b5bb5764b3dec988e25c52098407e508d84bb4
%global ki_scommit 23b5bb5
#!RemoteAsset: sha256:5b85352628319e22c48b589d2f423f3761479058f87a3ecc328818f16e4394c6
Source6: https://github.com/pytorch/kineto/archive/%{ki_commit}/kineto-%{ki_scommit}.tar.gz
%endif
%global mslk_commit 3d332d1c0c0ac7765852c97b3979c9ef913e037f
%global mslk_scommit 3d332d1
#!RemoteAsset: sha256:1944e67d1baeffef3bb8f89793ea06e0f05b88aac4d5cd89b4558a21aca6754b
Source7: https://github.com/meta-pytorch/MSLK/archive/%{mslk_commit}/MSLK-%{mslk_scommit}.tar.gz
# pytorch upstream issue #173707: libtorch_hip.so references the
# const_data_ptr / mutable_data_ptr / data_ptr template family with a
# different (non-SFINAE) mangling than libtorch_cpu.so exports.
# Appended to aten/src/ATen/core/Tensor.cpp in %prep when rocm is enabled.
Source8: pytorch-rocm-symbol-bridge.cpp
# Fix magma version encoding
# https://github.com/pytorch/pytorch/pull/180388
Patch0: 0001-pytorch-magma-2.10.0-version-encoding.patch
BuildRequires: cmake
BuildRequires: cmake(concurrentqueue)
BuildRequires: cmake(sleef)
BuildRequires: cpuinfo
# Although eigen3 is available on openRuyi, it is not detected during configuration
# TODO: Fix this
BuildRequires: eigen3
BuildRequires: foxi-devel
BuildRequires: libomp-devel
BuildRequires: ninja
BuildRequires: pkgconfig(fmt)
BuildRequires: pkgconfig(nlohmann_json)
BuildRequires: pkgconfig(numa)
BuildRequires: pkgconfig(openblas64)
BuildRequires: pkgconfig(protobuf)
BuildRequires: pkgconfig(valgrind)
BuildRequires: pocketfft-devel
BuildRequires: pthreadpool-devel
BuildRequires: fp16-devel
BuildRequires: fxdiv-devel
BuildRequires: psimd-devel
BuildRequires: xnnpack-devel = 0+git20260211.312eb7e
BuildRequires: pkgconfig(python3)
BuildRequires: python3dist(filelock)
BuildRequires: python3dist(jinja2)
BuildRequires: python3dist(networkx)
BuildRequires: python3dist(numpy)
BuildRequires: python3dist(pip)
BuildRequires: python3dist(pybind11)
BuildRequires: python3dist(pyyaml)
BuildRequires: python3dist(setuptools)
BuildRequires: python3dist(sympy)
BuildRequires: python3dist(typing-extensions)
%if %{with system_httplib}
BuildRequires: cmake(httplib)
%endif
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: libstdc++-devel
BuildRequires: compiler-rt
BuildRequires: cmake(LLVM)
BuildRequires: lld
BuildRequires: cmake(ONNX)
BuildRequires: cmake(onnxruntime)
%if %{with mpi}
BuildRequires: openmpi-devel
%endif
%if %{with system_flatbuffers}
BuildRequires: pkgconfig(flatbuffers)
%endif
%if %{with rocm}
BuildRequires: cmake(hipblas)
BuildRequires: cmake(hipblaslt)
BuildRequires: cmake(hipcub)
BuildRequires: cmake(hipfft)
BuildRequires: cmake(hiprand)
BuildRequires: cmake(hipsparse)
BuildRequires: cmake(hipsparselt)
BuildRequires: cmake(hipsolver)
BuildRequires: cmake(miopen)
BuildRequires: cmake(rocblas)
BuildRequires: cmake(rocrand)
BuildRequires: cmake(rocfft)
BuildRequires: cmake(rccl)
BuildRequires: cmake(rocprim)
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(rocm-core)
BuildRequires: cmake(hip)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(rocsolver)
BuildRequires: cmake(rocm_smi)
BuildRequires: cmake(rocthrust)
BuildRequires: pkgconfig(magma)
BuildRequires: rocm-cmake
BuildRequires: rocm-llvm-macros
BuildRequires: roctracer-devel
%endif
Requires: python3dist(dill)
Requires: python3dist(pyyaml)
%if %{with rocm}
Requires: amdsmi
%endif
# By convention
Provides: pytorch = %{version}-%{release}
Provides: python3-%{srcname} = %{version}-%{release}
Provides: python3-%{srcname}%{?_isa} = %{version}-%{release}
%python_provide python3-%{srcname}
%description
PyTorch is a Python package that provides two high-level features:
* Tensor computation (like NumPy) with strong GPU acceleration
* Deep neural networks built on a tape-based autograd system
You can reuse your favorite Python packages such as NumPy, SciPy,
and Cython to extend PyTorch when needed.
%prep
%autosetup -p1 -n pytorch-%{version}
# GitHub release tarballs identify the version as an alpha, so replace that
echo "%{pypi_version}" > version.txt
# Remove bundled egg-info
rm -rf %{srcname}.egg-info
%if %{without system_flatbuffers}
tar xf %{SOURCE1}
rm -rf third_party/flatbuffers/*
cp -r flatbuffers-%{flatbuffers_version}/* third_party/flatbuffers/
%endif
%if %{without system_tensorpipe}
tar xf %{SOURCE2}
rm -rf third_party/tensorpipe/*
cp -r tensorpipe-*/* third_party/tensorpipe/
tar xf %{SOURCE3}
rm -rf third_party/tensorpipe/third_party/libuv/*
cp -r libuv-*/* third_party/tensorpipe/third_party/libuv/
tar xf %{SOURCE4}
rm -rf third_party/tensorpipe/third_party/libnop/*
cp -r libnop-*/* third_party/tensorpipe/third_party/libnop/
# GCC 15 needs an explicit <cstdint> include
sed -i '/#include <tensorpipe.*/a#include <cstdint>' third_party/tensorpipe/tensorpipe/common/allocator.h
sed -i '/#include <tensorpipe.*/a#include <cstdint>' third_party/tensorpipe/tensorpipe/common/memory.h
%endif
%if %{without system_httplib}
tar xf %{SOURCE5}
rm -rf third_party/cpp-httplib/*
cp -r cpp-httplib-*/* third_party/cpp-httplib/
%endif
%if %{without system_kineto}
tar xf %{SOURCE6}
rm -rf third_party/kineto/*
cp -r kineto-*/* third_party/kineto/
%endif
tar xf %{SOURCE7}
rm -rf third_party/mslk/*
cp -r MSLK-*/* third_party/mslk/
# Adjust for the AMD GPU targets currently supported
# Only gfx1100 is supported on openRuyi
sed -i -e 's@"gfx90a", "gfx942",@@' aten/src/ATen/native/cuda/Blas.cpp
sed -i -e 's@"gfx1100", "gfx1101", "gfx1200", "gfx1201", "gfx908"@"gfx1100", "gfx1101",@' aten/src/ATen/native/cuda/Blas.cpp
sed -i -e 's@"gfx950", "gfx1150", "gfx1151"@@' aten/src/ATen/native/cuda/Blas.cpp
# fsspec would have to be installed with pip, so drop it from setup.py
sed -i -e '/fsspec/d' setup.py
# Use system sympy
sed -i -e 's@sympy==1.13.1@sympy>=1.13.1@' setup.py
# aotriton is a new dependency
# It is tied to USE_FLASH_ATTENTION; since that is off, it is not needed
sed -i -e '/aotriton.cmake/d' cmake/Dependencies.cmake
# Compress hip
sed -i -e 's@HIP_CLANG_FLAGS -fno-gpu-rdc@HIP_CLANG_FLAGS -fno-gpu-rdc --offload-compress@' cmake/Dependencies.cmake
# Silence noisy warning
sed -i -e 's@HIP_CLANG_FLAGS -fno-gpu-rdc@HIP_CLANG_FLAGS -fno-gpu-rdc -Wno-pass-failed@' cmake/Dependencies.cmake
sed -i -e 's@HIP_CLANG_FLAGS -fno-gpu-rdc@HIP_CLANG_FLAGS -fno-gpu-rdc -Wno-unused-command-line-argument@' cmake/Dependencies.cmake
sed -i -e 's@HIP_CLANG_FLAGS -fno-gpu-rdc@HIP_CLANG_FLAGS -fno-gpu-rdc -Wno-unused-result@' cmake/Dependencies.cmake
sed -i -e 's@HIP_CLANG_FLAGS -fno-gpu-rdc@HIP_CLANG_FLAGS -fno-gpu-rdc -Wno-deprecated-declarations@' cmake/Dependencies.cmake
# Fix: error: branch size exceeds simm16 (AMDGPUAsmBackend.cpp)
# -amdgpu-s-branch-bits=15 (default is 16) and -amdgpu-long-branch-factor=2 are needed to avoid the 'branch size exceeds simm16' error
sed -i -e 's@HIP_CLANG_FLAGS -fno-gpu-rdc@HIP_CLANG_FLAGS -fno-gpu-rdc -mllvm --amdgpu-s-branch-bits=15@' cmake/Dependencies.cmake
sed -i -e 's@HIP_CLANG_FLAGS -fno-gpu-rdc@HIP_CLANG_FLAGS -fno-gpu-rdc -mllvm --amdgpu-long-branch-factor=2@' cmake/Dependencies.cmake
# Use parallel jobs for GPU offload compilation
sed -i -e 's@HIP_CLANG_FLAGS -fno-gpu-rdc@HIP_CLANG_FLAGS -fno-gpu-rdc --offload-jobs=8@' cmake/Dependencies.cmake
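The repeated `sed` substitutions above all match the same anchor and keep it in the replacement, so each later call still finds it and the flags accumulate. A minimal sketch of that behaviour, assuming a single stand-in line for `cmake/Dependencies.cmake` (the input line and the helper name `add_flag` are illustrative, and only three of the flags are shown):

```python
ANCHOR = "HIP_CLANG_FLAGS -fno-gpu-rdc"


def add_flag(line: str, flag: str) -> str:
    """Mimic sed 's@ANCHOR@ANCHOR flag@' on one line (first match only)."""
    return line.replace(ANCHOR, f"{ANCHOR} {flag}", 1)


if __name__ == "__main__":
    # Assumed stand-in for the matching line in cmake/Dependencies.cmake.
    line = "list(APPEND HIP_CLANG_FLAGS -fno-gpu-rdc)"
    for flag in ["--offload-compress", "-Wno-pass-failed", "--offload-jobs=8"]:
        line = add_flag(line, flag)
    print(line)
```

Because every new flag is inserted directly after `-fno-gpu-rdc`, the final order is the reverse of the order in which the `sed` commands run. For these flags the order should not matter.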
# Need to link with librocm_smi64 (intra_node_comm.cpp calls rsmi_init /
# rsmi_is_P2P_accessible). The target string is "hiprtc::hiprtc" — the previous
# pattern "hipzrtc::hiprtc" had a stray 'z' so the sed was a no-op and
# libtorch_hip.so ended up with an undefined rsmi_init symbol.
sed -i -e 's@hiprtc::hiprtc@hiprtc::hiprtc rocm_smi64@' cmake/Dependencies.cmake
# No third_party fmt, use system
sed -i -e 's@fmt::fmt-header-only@fmt@' CMakeLists.txt
sed -i -e 's@fmt::fmt-header-only@fmt@' aten/src/ATen/CMakeLists.txt
sed -i -e 's@list(APPEND ATen_HIP_INCLUDE $<TARGET_PROPERTY:fmt,INTERFACE_INCLUDE_DIRECTORIES>)@@' aten/src/ATen/CMakeLists.txt
sed -i -e 's@fmt::fmt-header-only@fmt@' third_party/kineto/libkineto/CMakeLists.txt
sed -i -e 's@fmt::fmt-header-only@fmt@' c10/CMakeLists.txt
sed -i -e 's@fmt::fmt-header-only@fmt@' torch/CMakeLists.txt
sed -i -e 's@fmt::fmt-header-only@fmt@' cmake/Dependencies.cmake
sed -i -e 's@fmt::fmt-header-only@fmt@' caffe2/CMakeLists.txt
sed -i -e 's@add_subdirectory(${PROJECT_SOURCE_DIR}/third_party/fmt)@#add_subdirectory(${PROJECT_SOURCE_DIR}/third_party/fmt)@' cmake/Dependencies.cmake
sed -i -e 's@set_target_properties(fmt-header-only PROPERTIES INTERFACE_COMPILE_FEATURES "")@#set_target_properties(fmt-header-only PROPERTIES INTERFACE_COMPILE_FEATURES "")@' cmake/Dependencies.cmake
sed -i -e 's@list(APPEND Caffe2_DEPENDENCY_LIBS fmt::fmt-header-only)@#list(APPEND Caffe2_DEPENDENCY_LIBS fmt::fmt-header-only)@' cmake/Dependencies.cmake
# No third_party FXdiv
sed -i -e 's@if(NOT TARGET fxdiv)@if(MSVC AND USE_XNNPACK)@' caffe2/CMakeLists.txt
sed -i -e 's@TARGET_LINK_LIBRARIES(torch_cpu PRIVATE fxdiv)@#TARGET_LINK_LIBRARIES(torch_cpu PRIVATE fxdiv)@' caffe2/CMakeLists.txt
# https://github.com/pytorch/pytorch/issues/149803
# Tries to checkout nccl
sed -i -e 's@ checkout_nccl()@ True@' tools/build_pytorch_libs.py
# Disable the check_submodules() call in setup.py; we are a tarball, not a git repo
sed -i -e 's@check_submodules()$@#check_submodules()@' setup.py
# Release comes fully loaded with third party src
# Remove what we can
#
# For 2.1 this is all but miniz-2.1.0
# Instead of building as a library, caffe2 reaches into
# the third_party dir to compile the file.
# miniz is licensed MIT
# https://github.com/richgel999/miniz/blob/master/LICENSE
mv third_party/miniz-%{miniz_version} .
#
# setup.py depends on this script
mv third_party/build_bundled.py .
%if %{without system_flatbuffers}
# Need the just untarred flatbuffers/flatbuffers.h
mv third_party/flatbuffers .
%endif
%if %{without system_tensorpipe}
mv third_party/tensorpipe .
%endif
%if %{without system_httplib}
mv third_party/cpp-httplib .
%endif
%if %{without system_kineto}
mv third_party/kineto .
%endif
mv third_party/mslk .
# Remove everything
rm -rf third_party/*
# Put stuff back
mv build_bundled.py third_party
mv miniz-%{miniz_version} third_party
%if %{without system_flatbuffers}
mv flatbuffers third_party
%endif
%if %{without system_tensorpipe}
mv tensorpipe third_party
%endif
%if %{without system_httplib}
mv cpp-httplib third_party
%endif
%if %{without system_kineto}
mv kineto third_party
%endif
mv mslk third_party
# Fake out pocketfft so that the system header is used
mkdir third_party/pocketfft
cp /usr/include/pocketfft_hdronly.h third_party/pocketfft/
# Use the system valgrind headers
mkdir third_party/valgrind-headers
cp %{_includedir}/valgrind/* third_party/valgrind-headers
# Fix installing to /usr/lib64
sed -i -e 's@DESTINATION ${PYTHON_LIB_REL_PATH}@DESTINATION ${CMAKE_INSTALL_PREFIX}/${PYTHON_LIB_REL_PATH}@' caffe2/CMakeLists.txt
# Disable foxi linking (comment out the foxi_loader dependency)
sed -i -e 's@list(APPEND Caffe2_DEPENDENCY_LIBS foxi_loader)@#list(APPEND Caffe2_DEPENDENCY_LIBS foxi_loader)@' cmake/Dependencies.cmake
%if %{without system_tensorpipe}
# cmake version changed
sed -i -e 's@cmake_minimum_required(VERSION 3.4)@cmake_minimum_required(VERSION 3.5)@' third_party/tensorpipe/third_party/libuv/CMakeLists.txt
sed -i -e 's@cmake_minimum_required(VERSION 3.4)@cmake_minimum_required(VERSION 3.5)@' libuv*/CMakeLists.txt
%endif
%if %{with rocm}
# Fix: hipOccupancyMaxActiveBlocksPerMultiprocessor is overloaded in new ROCm,
# force using hipModuleOccupancyMaxActiveBlocksPerMultiprocessor
sed -i -e 's/TORCH_HIP_VERSION < 305/TORCH_HIP_VERSION < 305 \&\& TORCH_HIP_VERSION > 0/' \
aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h
# pytorch upstream issue #173707 (gemm/bgemm variant):
# clang 21 mangles the instantiation-dependent SFINAE non-type template parameter
# typename std::enable_if<...,Dtype>::type* = nullptr
# of at::cuda::blas::gemm/bgemm differently at an explicit specialization (the
# definition, Tn...enable_if form) than at a deduced call site (the reference,
# ...IffLPf0E... form), so libtorch_hip.so fails to dlopen with e.g.
# undefined symbol: _ZN2at4cuda4blas4gemmIffLPf0EEEvcclllNS_10OpMathTypeIT_E4typeEPKS5_lS9_lS7_PT0_l
# Every real dtype is provided by an explicit specialization, so the SFINAE guard
# is redundant: drop it so the two overloads collapse to one primary template and
# clang emits a single consistent mangling everywhere. Must run before hipify.
sed -i \
-e 's/, typename std::enable_if<!CUDABLAS_GEMM_DTYPE_IS_FLOAT_TYPE_AND_C_DTYPE_IS_FLOAT, Dtype>::type\* = nullptr>/>/g' \
-e 's/, typename std::enable_if<CUDABLAS_GEMM_DTYPE_IS_FLOAT_TYPE_AND_C_DTYPE_IS_FLOAT, Dtype>::type\* = nullptr>/>/g' \
aten/src/ATen/cuda/CUDABlas.h
# hipify
./tools/amd_build/build_amd.py
# use any hip, correct CMAKE_MODULE_PATH
sed -i -e 's@lib/cmake/hip@lib64/cmake/hip@' cmake/public/LoadHIP.cmake
sed -i -e 's@HIP 1.0@HIP MODULE@' cmake/public/LoadHIP.cmake
# silence an assert
# sed -i -e '/qvalue = std::clamp(qvalue, qmin, qmax);/d' aten/src/ATen/native/cuda/IndexKernel.cu
# Append ROCm symbol bridge — see Source8 header for full context.
# Without this, libtorch_hip.so dlopen fails on:
# undefined symbol: _ZNK2at10TensorBase14const_data_ptrI*Li0EEEPK*v
cat %{SOURCE8} >> aten/src/ATen/core/Tensor.cpp
%endif
# moodycamel include path needs adjusting to use the system's
sed -i -e 's@${PROJECT_SOURCE_DIR}/third_party/concurrentqueue@/usr/include/concurrentqueue@' cmake/Dependencies.cmake
%build
# Control the number of jobs
# The build can fail if too many threads exceed the physical memory
# Run at least two jobs (see the floor below), more if CPU & memory resources are available.
COMPILE_JOBS=`nproc`
if [ ${COMPILE_JOBS}x = x ]; then
COMPILE_JOBS=1
fi
# Take into account memory usage per core, do not thrash real memory
# TraceType/VariableType files can consume 4GB+ per compilation unit
# Use a more conservative estimate: 4GB per job for safety
BUILD_MEM=4
MEM_KB=`awk '/MemTotal/ { print $2 }' /proc/meminfo`
MEM_MB=`expr ${MEM_KB} / 1024`
MEM_GB=`expr ${MEM_MB} / 1024`
COMPILE_JOBS_MEM=`expr 1 + ${MEM_GB} / ${BUILD_MEM}`
if [ "$COMPILE_JOBS_MEM" -lt "$COMPILE_JOBS" ]; then
COMPILE_JOBS=$COMPILE_JOBS_MEM
fi
# Ensure at least 2 jobs to avoid single-threading the large files
if [ "$COMPILE_JOBS" -lt 2 ]; then
COMPILE_JOBS=2
fi
export MAX_JOBS=$COMPILE_JOBS
# For verbose cmake output
# export VERBOSE=ON
# For verbose linking
# export CMAKE_SHARED_LINKER_FLAGS=-Wl,--verbose
# Manually set this hardening flag
export CMAKE_EXE_LINKER_FLAGS=-pie
export BUILD_CUSTOM_PROTOBUF=OFF
export BUILD_NVFUSER=OFF
export BUILD_SHARED_LIBS=ON
export BUILD_TEST=OFF
# Use Release instead of RelWithDebInfo to reduce compile time and memory
# for huge generated files like TraceType/VariableType (saves ~30% compile time)
export CMAKE_BUILD_TYPE=Release
export CMAKE_FIND_PACKAGE_PREFER_CONFIG=ON
export CAFFE2_LINK_LOCAL_PROTOBUF=OFF
export INTERN_BUILD_MOBILE=OFF
export USE_DISTRIBUTED=OFF
export USE_CUDA=OFF
export USE_FAKELOWP=OFF
export USE_FBGEMM=OFF
export USE_FLASH_ATTENTION=OFF
export USE_GLOO=OFF
export USE_ITT=OFF
export USE_KINETO=OFF
export USE_KLEIDIAI=OFF
export USE_LITE_INTERPRETER_PROFILER=OFF
export USE_LITE_PROTO=OFF
export USE_MAGMA=OFF
export USE_MEM_EFF_ATTENTION=OFF
export USE_MKLDNN=OFF
export USE_MPI=OFF
export USE_MSLK=OFF
export USE_NCCL=OFF
export USE_NNPACK=OFF
export USE_NUMPY=ON
export USE_OPENMP=ON
export USE_PYTORCH_QNNPACK=OFF
export USE_ROCM=OFF
export USE_SYSTEM_SLEEF=ON
export USE_SYSTEM_EIGEN_INSTALL=ON
export USE_SYSTEM_ONNX=ON
export USE_SYSTEM_PYBIND11=ON
export USE_SYSTEM_LIBS=OFF
export USE_SYSTEM_NCCL=OFF
export USE_XNNPACK=OFF
export USE_XPU=OFF
export USE_SYSTEM_PTHREADPOOL=ON
export USE_SYSTEM_CPUINFO=ON
export USE_SYSTEM_FP16=ON
export USE_SYSTEM_FXDIV=ON
export USE_SYSTEM_PSIMD=ON
export USE_SYSTEM_XNNPACK=OFF
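# Distributed support is wanted after all: this overrides the
# USE_DISTRIBUTED=OFF default exported above.
export USE_DISTRIBUTED=ON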
export USE_TENSORPIPE=ON
%if %{without system_tensorpipe}
export TP_BUILD_LIBUV=OFF
%endif
%if %{with mpi}
export USE_MPI=ON
%endif
%if %{with rocm}
export USE_ROCM=ON
export USE_ROCM_CK_SDPA=OFF
export USE_ROCM_CK_GEMM=OFF
export USE_FBGEMM_GENAI=OFF
export USE_MAGMA=ON
export HIP_PATH=`hipconfig -p`
export ROCM_PATH=`hipconfig -R`
# pytorch uses clang, not hipcc
export HIP_CLANG_PATH=%{rocmllvm_bindir}
export PYTORCH_ROCM_ARCH=%{rocm_gpu_list_default}
export CMAKE_NO_SYSTEM_FROM_IMPORTED=ON
# export CMAKE_BUILD_TYPE=Debug
%endif
export CMAKE_CXX_IMPLICIT_INCLUDE_DIRECTORIES="/usr/include"
export CMAKE_C_IMPLICIT_INCLUDE_DIRECTORIES="/usr/include"
export LDFLAGS="-fuse-ld=lld %{?__global_ldflags}"
export CMAKE_LIBRARY_PATH=/usr/lib64
export CMAKE_PREFIX_PATH="/usr:/usr/lib64/cmake:/usr/lib/python3.13/site-packages"
%pyproject_wheel
%install
%if %{with rocm}
export USE_ROCM=ON
export USE_ROCM_CK=OFF
export HIP_PATH=`hipconfig -p`
export ROCM_PATH=`hipconfig -R`
# pytorch uses clang, not hipcc
export HIP_CLANG_PATH=%{rocmllvm_bindir}
export PYTORCH_ROCM_ARCH=%{rocm_gpu_list_default}
%endif
%pyproject_install
%pyproject_save_files '*torch*'
%check
# Not working yet
%files
%license LICENSE
%doc README.md
%{_bindir}/torchrun
%{python3_sitearch}/%{srcname}*
%{python3_sitearch}/functorch
%changelog
%{?autochangelog}
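The hiprtc comment in %prep above records that a mistyped sed pattern was silently a no-op. A minimal sketch of a guard against that failure mode follows. The helper name `sed_or_fail` is hypothetical (it is not part of this spec or of RPM), and the sketch assumes GNU sed and that neither the search text nor the replacement contains `@`, `&`, backslashes, or regex metacharacters.

```shell
# Hypothetical helper: apply a fixed-string substitution, but fail loudly
# when the search text is absent, so a typo cannot turn into a silent no-op.
# Usage: sed_or_fail FILE SEARCH REPLACEMENT
# Assumption: SEARCH and REPLACEMENT contain no '@', '&', '\' or regex
# metacharacters (true for plain CMake target names such as hiprtc::hiprtc).
sed_or_fail() {
    file=$1
    search=$2
    replacement=$3
    if ! grep -qF -- "$search" "$file"; then
        echo "sed_or_fail: '$search' not found in $file" >&2
        return 1
    fi
    sed -i -e "s@$search@$replacement@" "$file"
}
```

Because RPM runs scriptlets with `sh -e`, a non-zero return from such a helper would stop %prep at the offending line instead of surfacing later as an undefined symbol.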
@@ -0,0 +1,183 @@
// === openRuyi ROCm symbol bridge for pytorch 2.11 / clang 21 ===
//
// Appended to aten/src/ATen/core/Tensor.cpp by python-torch.spec.
//
// Background:
// libtorch_hip.so references the TensorBase data-pointer template family
// (const_data_ptr / mutable_data_ptr / data_ptr) using a NON-SFINAE
// mangling form (...Li0EEE... — only the non-type template parameter
// value is encoded). The explicit specialisations emitted from
// TensorMethods.cpp into libtorch_cpu.so may or may not be mangled
// the same way depending on clang's handling of SFINAE NTTPs in this
// specific clang/ROCm/arch combination (we have observed both: cpu
// exporting the Li0E form, and cpu only exporting the Tn...enable_if
// form). No link-time error in the latter case thanks to lld's
// --allow-shlib-undefined → runtime dlopen reports
// "undefined symbol: _ZNK2at10TensorBase14const_data_ptrI*Li0EEEPK*v"
// when libtorch_hip.so is loaded.
//
// Reference: https://github.com/pytorch/pytorch/issues/173707
// (closed as not planned; pytorch treats this as a clang/ROCm gap)
//
// Bridge strategy:
// Provide every plausibly-missing mangling as a weak free function
// linked into libtorch_cpu.so with default visibility. Where the cpp
// specialisation already provides the same mangled name as a strong
// symbol, the linker discards the bridge weak symbol and uses the
// cpp version (which preserves the runtime check_type call). Where
// the cpp does NOT emit the same mangling, the bridge weak symbol
// fills the gap so libtorch_hip.so dlopen resolves. Each bridge body
// delegates to the non-templated public accessor on TensorBase,
// which returns the raw underlying data pointer.
//
// Semantics note:
// The non-templated TensorBase::const_data_ptr() / mutable_data_ptr() /
// data_ptr() skip the scalar-type runtime check that the templated
// specialisations perform. In practice this only matters for HIP code
// paths that already dispatched on dtype before reaching this call —
// which is the common case in ATen kernels.
//
// gemm stubs from the upstream issue are intentionally NOT included:
// `nm -DC --undefined-only libtorch_hip.so` on this build shows no
// undefined at::cuda::blas::gemm symbols, and the upstream stubs have
// empty bodies (silent functional failure if ever called).
#include <ATen/core/TensorBase.h>
#include <c10/util/BFloat16.h>
#include <c10/util/Float8_e4m3fn.h>
#include <c10/util/Float8_e4m3fnuz.h>
#include <c10/util/Float8_e5m2.h>
#include <c10/util/Float8_e5m2fnuz.h>
#include <c10/util/Float8_e8m0fnu.h>
#include <c10/util/Half.h>
#include <c10/util/complex.h>
#include <c10/util/qint8.h>
#include <c10/util/qint32.h>
#include <c10/util/quint8.h>
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmissing-prototypes"
#pragma GCC visibility push(default)
extern "C" {
// `weak` is essential: on some build configurations the explicit
// specialisations in TensorMethods.cpp emit the same mangled name as a
// strong global symbol. Without `weak` the link step would fail with a
// duplicate-symbol error.
#define BRIDGE_READ(MangledName) \
__attribute__((weak, visibility("default"))) \
const void* MangledName(const at::TensorBase* t) { return t->const_data_ptr(); }
#define BRIDGE_WRITE(MangledName) \
__attribute__((weak, visibility("default"))) \
void* MangledName(const at::TensorBase* t) { return t->mutable_data_ptr(); }
// ---- const_data_ptr<T, 0> (non-const T, plain Li0E form) ----
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIaLi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIbLi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIdLi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIfLi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIhLi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIiLi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIjLi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIlLi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrImLi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIsLi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrItLi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c104HalfELi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c108BFloat16ELi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c107complexIdEELi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c107complexIfEELi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c107complexINS2_4HalfEEELi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c1011Float8_e5m2ELi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c1013Float8_e4m3fnELi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c1014Float8_e8m0fnuELi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c1015Float8_e4m3fnuzELi0EEEPKT_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c1015Float8_e5m2fnuzELi0EEEPKT_v)
// ---- const_data_ptr<KT, 0> (const-qualified T, plain Li0E form) ----
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKaLi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKbLi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKdLi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKfLi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKhLi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKiLi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKjLi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKlLi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKmLi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKsLi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKtLi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c104HalfELi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c108BFloat16ELi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c107complexIdEELi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c107complexIfEELi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c107complexINS2_4HalfEEELi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c105qint8ELi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c106qint32ELi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c106quint8ELi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c1011Float8_e5m2ELi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c1013Float8_e4m3fnELi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c1014Float8_e8m0fnuELi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c1015Float8_e4m3fnuzELi0EEEPKNSt12remove_constIT_E4typeEv)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIKN3c1015Float8_e5m2fnuzELi0EEEPKNSt12remove_constIT_E4typeEv)
// ---- const_data_ptr<T, Tn enable_if<!is_const_v<T>>... 0> (SFINAE form, non-const T) ----
// libtorch_hip.so emits these for a few primitives even when most TUs use the Li0E form.
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIdTnNSt9enable_ifIXntsr3stdE10is_const_vIT_EEiE4typeELi0EEEPKS3_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIfTnNSt9enable_ifIXntsr3stdE10is_const_vIT_EEiE4typeELi0EEEPKS3_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIiTnNSt9enable_ifIXntsr3stdE10is_const_vIT_EEiE4typeELi0EEEPKS3_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIlTnNSt9enable_ifIXntsr3stdE10is_const_vIT_EEiE4typeELi0EEEPKS3_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c104HalfETnNSt9enable_ifIXntsr3stdE10is_const_vIT_EEiE4typeELi0EEEPKS5_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c108BFloat16ETnNSt9enable_ifIXntsr3stdE10is_const_vIT_EEiE4typeELi0EEEPKS5_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c107complexIdEETnNSt9enable_ifIXntsr3stdE10is_const_vIT_EEiE4typeELi0EEEPKS6_v)
BRIDGE_READ(_ZNK2at10TensorBase14const_data_ptrIN3c107complexIfEETnNSt9enable_ifIXntsr3stdE10is_const_vIT_EEiE4typeELi0EEEPKS6_v)
// ---- mutable_data_ptr<T> ----
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIaEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIbEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIdEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIfEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIhEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIiEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIjEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIlEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrImEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIsEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrItEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c104HalfEEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c108BFloat16EEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c107complexIdEEEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c107complexIfEEEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c107complexINS2_4HalfEEEEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c1011Float8_e5m2EEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c1013Float8_e4m3fnEEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c1014Float8_e8m0fnuEEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c1015Float8_e4m3fnuzEEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c1015Float8_e5m2fnuzEEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c105qint8EEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c106qint32EEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase16mutable_data_ptrIN3c106quint8EEEPT_v)
// ---- data_ptr<T> (legacy mutable accessor) ----
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIaEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIbEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIdEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIfEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIhEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIiEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIlEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIsEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIN3c104HalfEEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIN3c108BFloat16EEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIN3c107complexIdEEEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIN3c107complexIfEEEEPT_v)
BRIDGE_WRITE(_ZNK2at10TensorBase8data_ptrIN3c107complexINS2_4HalfEEEEEPT_v)
#undef BRIDGE_READ
#undef BRIDGE_WRITE
} // extern "C"
#pragma GCC visibility pop
#pragma GCC diagnostic pop
// === openRuyi ROCm symbol bridge end ===
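The bridge header cites `nm -DC --undefined-only libtorch_hip.so` as the way the missing manglings were found. A rough sketch of how the "needed but not exported" set could be computed is shown below. The function name `missing_symbols` and the file names in the usage comments are hypothetical, and the nm post-processing assumes unversioned symbols, where the last column of each line is the mangled name.

```shell
# Hypothetical helper: print every symbol listed in NEEDED that is absent
# from PROVIDED. Both arguments are text files with one mangled name per line.
# Usage: missing_symbols NEEDED PROVIDED
missing_symbols() {
    needed_sorted=$(mktemp)
    provided_sorted=$(mktemp)
    sort -u "$1" > "$needed_sorted"
    sort -u "$2" > "$provided_sorted"
    # comm -23 keeps the lines that appear only in the first file.
    comm -23 "$needed_sorted" "$provided_sorted"
    rm -f "$needed_sorted" "$provided_sorted"
}

# Possible usage against a finished build (paths are assumptions):
#   nm -D --undefined-only libtorch_hip.so | awk '{print $NF}' | grep data_ptr > needed.txt
#   nm -D --defined-only   libtorch_cpu.so | awk '{print $NF}' | grep data_ptr > provided.txt
#   missing_symbols needed.txt provided.txt
```

An empty result would suggest that every data-pointer mangling referenced by the HIP library is exported by the CPU library, either natively or through a bridge symbol.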
@@ -0,0 +1,44 @@
diff --git a/setup.py b/setup.py
index fe4a78d..788dc1c 100644
--- a/setup.py
+++ b/setup.py
@@ -463,7 +463,14 @@
"-DCMAKE_EXPORT_COMPILE_COMMANDS=ON", "-DLLVM_ENABLE_WERROR=ON",
"-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=" + extdir, "-DTRITON_BUILD_PYTHON_MODULE=ON",
"-DPython3_EXECUTABLE:FILEPATH=" + sys.executable, "-DPython3_INCLUDE_DIR=" + python_include_dir,
- "-DTRITON_CODEGEN_BACKENDS=" + ';'.join([b.name for b in backends if not b.is_external]),
+ # openRuyi: core Triton hard-depends on the NVGPU/NVWS dialects that
+ # live under third_party/nvidia (TritonGPUTransforms and
+ # TritonInstrumentToLLVM include their TableGen output and link
+ # NVGPUIR/NVWSIR), so the nvidia backend must stay in the CMake
+ # build even though its Python side is not packaged (see the
+ # `backends` list below).
+ "-DTRITON_CODEGEN_BACKENDS=" +
+ ';'.join([b.name for b in backends if not b.is_external] + ["nvidia"]),
"-DTRITON_PLUGIN_DIRS=" + ';'.join([b.src_dir for b in backends if b.is_external]),
"-DTRITON_WHEEL_DIR=" + wheeldir
]
@@ -534,6 +541,10 @@
def download_and_copy_dependencies():
+ # openRuyi: this package ships only the AMD/ROCm backend, so the NVIDIA
+ # CUDA toolchain (ptxas, cuobjdump, ...) is neither needed nor downloaded.
+ # Skipping this also keeps the build fully offline for the OBS sandbox.
+ return
nvidia_version_path = os.path.join(get_base_dir(), "cmake", "nvidia-toolchain-version.json")
with open(nvidia_version_path, "r") as nvidia_version_file:
# parse this json file to get the version of the nvidia toolchain
@@ -619,7 +630,11 @@
)
-backends = [*BackendInstaller.copy(["nvidia", "amd"]), *BackendInstaller.copy_externals()]
+# openRuyi: ship the AMD/ROCm backend only. The NVIDIA C++ libraries are
+# still compiled into libtriton (core requires them; see the cmake_args note
+# above), but the NVIDIA Python backend -- which would bundle ptxas and the
+# proprietary libdevice.10.bc -- is intentionally not packaged.
+backends = [*BackendInstaller.copy(["amd"]), *BackendInstaller.copy_externals()]
def get_package_dirs():
@@ -0,0 +1,21 @@
diff --git a/CMakeLists.txt b/CMakeLists.txt
index c9620e3..6c1fbb0 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -262,6 +262,16 @@ if(TRITON_BUILD_PYTHON_MODULE)
LLVMPowerPCAsmParser
LLVMPowerPCCodeGen
)
+ elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "riscv64")
+ # There is no LLVM RISC-V GPU target; Triton never emits host code on
+ # riscv64. However llvm::InitializeAllTargets() (referenced from
+ # llvm.cc) pulls in the X86 codegen symbols, so link them here to avoid
+ # an "undefined symbol: LLVMInitializeX86Target" failure at import time.
+ # The matching bundled LLVM must therefore be built with the X86 target.
+ list(APPEND TRITON_LIBRARIES
+ LLVMX86CodeGen
+ LLVMX86AsmParser
+ )
else()
message(FATAL_ERROR "LLVM codegen/ASM parser libs: This HW architecture (${CMAKE_SYSTEM_PROCESSOR}) is not configured in cmake lib dependencies.")
endif()
@@ -0,0 +1,25 @@
diff --git a/setup.py b/setup.py
index fe4a78d..400c34f 100644
--- a/setup.py
+++ b/setup.py
@@ -424,10 +424,19 @@ class CMakeBuild(build_ext):
def get_pybind11_cmake_args(self):
pybind11_sys_path = get_env_with_keys(["PYBIND11_SYSPATH"])
if pybind11_sys_path:
+ # openRuyi: distro pybind11 packages install the headers and the
+ # CMake config under a filesystem prefix (include/pybind11 and
+ # share/cmake/pybind11 below /usr), not inside the Python package
+ # the way pip wheels do, so pybind11.get_cmake_dir() raises
+ # ImportError ("pybind11 not installed"). When PYBIND11_SYSPATH
+ # is given, derive the CMake dir from it as well instead of only
+ # the include dir.
pybind11_include_dir = os.path.join(pybind11_sys_path, "include")
+ pybind11_cmake_dir = os.path.join(pybind11_sys_path, "share", "cmake", "pybind11")
else:
pybind11_include_dir = pybind11.get_include()
- return [f"-Dpybind11_INCLUDE_DIR='{pybind11_include_dir}'", f"-Dpybind11_DIR='{pybind11.get_cmake_dir()}'"]
+ pybind11_cmake_dir = pybind11.get_cmake_dir()
+ return [f"-Dpybind11_INCLUDE_DIR='{pybind11_include_dir}'", f"-Dpybind11_DIR='{pybind11_cmake_dir}'"]
def get_proton_cmake_args(self):
cmake_args = get_thirdparty_packages([get_json_package_info()])
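The patch above derives both the include directory and the CMake config directory from `PYBIND11_SYSPATH`. A small pre-flight check along these lines could confirm that a prefix really has that layout before the wheel build starts. The function name `check_pybind11_prefix` is hypothetical; the two relative paths follow the layout described in the patch comment, and the file names `pybind11.h` and `pybind11Config.cmake` are what pybind11 normally installs.

```shell
# Hypothetical pre-flight check: verify that PREFIX holds the pybind11
# headers and CMake package config in the distro layout the patch assumes.
# Usage: check_pybind11_prefix PREFIX
check_pybind11_prefix() {
    prefix=$1
    for path in \
        "$prefix/include/pybind11/pybind11.h" \
        "$prefix/share/cmake/pybind11/pybind11Config.cmake"
    do
        if [ ! -f "$path" ]; then
            echo "check_pybind11_prefix: missing $path" >&2
            return 1
        fi
    done
}
```

In a spec file this would sit just before `export PYBIND11_SYSPATH=...`, so a packaging change in pybind11 fails early with a readable message rather than deep inside CMake.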
@@ -0,0 +1,210 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
#
# SPDX-License-Identifier: MulanPSL-2.0
#
# Originally extracted from Fedora Project
# Authors: The Fedora Project Contributors
# riscv64 build hints contributed by the openRuyi AI working group.
%global srcname triton
# Triton pins an exact, in-development LLVM *commit* (not a release version).
# It calls unstable MLIR/LLVM C++ internals, so it only builds against that one
# revision; no released distro LLVM (nor ROCm's bundled LLVM) matches it, and it
# additionally needs MLIR and LLD. We therefore build LLVM from source at the
# pinned commit and link it statically into the Triton extension, exactly like
# upstream's CI does.
#
# !!! WHEN BUMPING %%{version} !!!
# Triton and LLVM must move together. Set %%{llvm_commit} to the value of
# cmake/llvm-hash.txt for the new Triton tag and refresh Source1's sha256. A
# mismatched LLVM will fail to compile or crash at runtime.
%global llvm_commit f6ded0be897e2878612dd903f7e8bb85448269e5
# Build everything (the bundled LLVM and the Triton extension) with clang,
# matching the rest of the openRuyi ROCm stack.
%global toolchain clang
# The bundled static LLVM is large; drop LTO and skip the dwz pass which can
# exhaust memory on the giant libtriton.so.
%global _lto_cflags %{nil}
%define _find_debuginfo_dwz_opts %{nil}
Name: python-%{srcname}
Version: 3.6.0
Release: %autorelease
Summary: A language and compiler for custom Deep Learning operations
# Triton itself is MIT. The statically bundled LLVM/MLIR/LLD is
# "Apache-2.0 WITH LLVM-exception OR NCSA"; pybind11 headers are BSD-3-Clause.
License: MIT AND (Apache-2.0 WITH LLVM-exception OR NCSA) AND BSD-3-Clause
URL: https://github.com/triton-lang/triton
# Triton's PyPI sdist does not ship the C++ / third_party sources needed to
# build, so the source is taken from the GitHub release tag instead.
#!RemoteAsset: sha256:be270ed11ca5a8fbd9d7941c5bbe9a23a9f6e2ffd372c8398346928bee464774
Source0: %{url}/archive/refs/tags/v%{version}.tar.gz#/%{srcname}-%{version}.tar.gz
# NOTE: GitHub generates llvm-project's commit archive on the fly via
# codeload. If the github.com/.../archive redirect to it times out behind the
# build proxy, point straight at codeload instead (identical bytes, same sha256).
#!RemoteAsset: sha256:f63c624aa63eda73508b9df2be2a6945ea4fddbee58615fbe1cd747b6884dd5e
Source1: https://github.com/llvm/llvm-project/archive/%{llvm_commit}.tar.gz
# Ship only the AMD/ROCm Python backend and never reach out to the network
# for the NVIDIA CUDA toolchain (ptxas, libdevice, ...). The NVIDIA C++
# libraries are still compiled into libtriton: Triton core hard-depends on
# the NVGPU/NVWS dialects living under third_party/nvidia.
Patch0: 0001-Ship-only-the-AMD-ROCm-backend-offline.patch
# Link the X86 codegen libraries on the riscv64 host so that
# llvm::InitializeAllTargets() resolves at import time.
Patch1: 0002-Add-riscv64-host-codegen-libraries.patch
# pybind11.get_cmake_dir() only knows the pip-wheel layout and raises
# ImportError with a distro python3-pybind11, so let PYBIND11_SYSPATH (set in
# %%build) supply the CMake dir as well. The unconditional call came in with
# https://github.com/triton-lang/triton/pull/4450
Patch2: 0003-Use-PYBIND11_SYSPATH-for-the-pybind11-CMake-dir-too.patch
BuildSystem: pyproject
BuildOption(install): %{srcname}
# --- Python build backend --------------------------------------------------
BuildRequires: pyproject-rpm-macros
BuildRequires: pkgconfig(python3)
BuildRequires: python3dist(pip)
BuildRequires: python3dist(setuptools)
BuildRequires: python3dist(wheel)
BuildRequires: python3dist(pybind11)
# Supplies %%{_includedir}/pybind11 and %%{_datadir}/cmake/pybind11, which
# PYBIND11_SYSPATH points the build at (see Patch2).
BuildRequires: pkgconfig(pybind11)
# --- Toolchain for the bundled LLVM and the Triton extension ---------------
BuildRequires: clang
BuildRequires: lld
BuildRequires: libstdc++-devel
BuildRequires: compiler-rt
BuildRequires: cmake
BuildRequires: ninja
# --- Libraries the bundled LLVM links against ------------------------------
BuildRequires: pkgconfig(libffi)
BuildRequires: pkgconfig(libxml-2.0)
BuildRequires: pkgconfig(zlib)
BuildRequires: pkgconfig(libzstd)
# --- Runtime ROCm stack ----------------------------------------------------
# Triton JIT-compiles kernels at runtime: the GPU path uses the statically
# linked LLD + AMDGPU code generator, but the per-kernel CPU launcher shim is
# compiled on the fly (triton/runtime/build.py) with a host C compiler against
# the Python headers, and the HIP runtime is dlopen'd.
Requires: gcc
Requires: pkgconfig(python3)
Requires: cmake(hip)
Requires: rocm-device-libs
Provides: python3-%{srcname} = %{version}-%{release}
Provides: python3-%{srcname}%{?_isa} = %{version}-%{release}
%python_provide python3-%{srcname}
%description
Triton is a language and compiler for writing highly efficient custom
Deep-Learning primitives. The aim of Triton is to provide an open-source
environment to write fast code at higher productivity than CUDA, but also
with higher flexibility than other existing DSLs.
This build ships the AMD ROCm (HIP) backend.
%prep -a
# Unpack the pinned LLVM next to the Triton tree (built in %%build).
tar -xf %{SOURCE1}
# Drop any pre-generated metadata shipped in the tarball.
rm -rf %{srcname}.egg-info
# Triton's CMake turns warnings into errors; a from-source LLVM occasionally
# emits new warnings, so relax this for both Triton and the embedded
# add_llvm/add_mlir targets.
sed -i -e 's@ -Werror @ @' CMakeLists.txt
# The wheel is built with --no-build-isolation, so cmake/ninja/pybind11 are
# supplied as system BuildRequires. Strip them from build-system.requires so
# %%pyproject_buildrequires does not emit unsatisfiable python3dist(cmake<4),
# python3dist(ninja) dependencies.
sed -i -e 's@^requires = .*@requires = ["setuptools>=40.8.0", "wheel"]@' pyproject.toml
%generate_buildrequires
%pyproject_buildrequires
# Build the pinned LLVM+MLIR+LLD first, then let the pyproject build system
# compile the Triton wheel against it. Both run in the same shell, so the
# environment exported here reaches %%pyproject_wheel.
%build -p
llvm_src="$(pwd)/llvm-project-%{llvm_commit}"
llvm_install="$(pwd)/llvm-install"
# Cap parallelism by available memory: LLVM/MLIR compile units and the final
# Triton link are memory hungry and will thrash or OOM otherwise.
mem_gb=$(awk '/MemTotal/ {print int($2/1024/1024)}' /proc/meminfo)
compile_jobs=$(nproc)
mem_jobs=$(( 1 + mem_gb / 2 ))
[ "$mem_jobs" -lt "$compile_jobs" ] && compile_jobs=$mem_jobs
[ "$compile_jobs" -lt 1 ] && compile_jobs=1
# Linking the static archives needs far more memory per job.
link_jobs=$(( 1 + mem_gb / 16 ))
[ "$link_jobs" -lt 1 ] && link_jobs=1
%ifarch x86_64
llvm_targets="X86;AMDGPU;NVPTX"
%endif
%ifarch riscv64
# X86 is required by the riscv64 codegen-libs patch; AMDGPU drives the ROCm
# backend; NVPTX is always linked by Triton's core; RISCV is the host.
llvm_targets="RISCV;X86;AMDGPU;NVPTX"
%endif
cmake -S "$llvm_src/llvm" -B "$llvm_src/build" -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX="$llvm_install" \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_USE_LINKER=lld \
-DLLVM_ENABLE_PROJECTS="mlir;lld" \
-DLLVM_TARGETS_TO_BUILD="$llvm_targets" \
-DLLVM_ENABLE_ASSERTIONS=OFF \
-DBUILD_SHARED_LIBS=OFF \
-DLLVM_BUILD_LLVM_DYLIB=OFF \
-DLLVM_INSTALL_UTILS=ON \
-DLLVM_ENABLE_TERMINFO=OFF \
-DLLVM_ENABLE_ZSTD=ON \
-DLLVM_INCLUDE_BENCHMARKS=OFF \
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_INCLUDE_TESTS=OFF \
-DLLVM_PARALLEL_COMPILE_JOBS=$compile_jobs \
-DLLVM_PARALLEL_LINK_JOBS=$link_jobs
cmake --build "$llvm_src/build" --target install -- -j$compile_jobs
# Point Triton at the freshly built LLVM and keep the build offline + ROCm-only.
export LLVM_SYSPATH="$llvm_install"
export PATH="$llvm_install/bin:$PATH"
# System pybind11 from pybind11-devel: headers and CMake config under
# %%{_prefix} (see Patch2).
export PYBIND11_SYSPATH=%{_prefix}
export CC=clang
export CXX=clang++
export MAX_JOBS=$compile_jobs
export TRITON_PARALLEL_LINK_JOBS=$link_jobs
export TRITON_BUILD_WITH_CLANG_LLD=ON
export TRITON_BUILD_WITH_CCACHE=OFF
# Proton needs CUPTI/roctracer/json; not needed for a plain ROCm backend.
export TRITON_BUILD_PROTON=OFF
# Don't fetch googletest, and don't trip over new LLVM warnings.
export TRITON_APPEND_CMAKE_ARGS="-DTRITON_BUILD_UT=OFF -DLLVM_ENABLE_WERROR=OFF"
%files -f %{pyproject_files}
# Triton's own LICENSE is captured from the wheel metadata by
# %%pyproject_save_files; only the bundled LLVM license must be added by hand.
%license llvm-project-%{llvm_commit}/llvm/LICENSE.TXT
%doc README.md
%changelog
%autochangelog
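The memory-aware job clamping in the %build script above can be sketched as a standalone shell function. This is only an illustrative sketch: `calc_jobs` is a hypothetical helper, and the 2 GiB-per-compile-job budget is an assumption, because the lines that compute `mem_gb` and `mem_jobs` are not part of this hunk. Only the clamping and the `1 + mem_gb / 16` link formula mirror the spec.

```shell
#!/bin/sh
# Hypothetical helper (not in the spec): derive compile and link job
# counts from the available memory in GiB and the CPU count.
calc_jobs() {
    mem_gb=$1
    cpus=$2
    compile_jobs=$cpus
    # ASSUMPTION: budget roughly 2 GiB of RAM per compile job.
    mem_jobs=$(( mem_gb / 2 ))
    # Never run more compile jobs than memory allows, and never fewer than one.
    if [ "$mem_jobs" -lt "$compile_jobs" ]; then compile_jobs=$mem_jobs; fi
    if [ "$compile_jobs" -lt 1 ]; then compile_jobs=1; fi
    # Linking static archives is far more memory hungry: one job per 16 GiB, plus one.
    link_jobs=$(( 1 + mem_gb / 16 ))
    echo "$compile_jobs $link_jobs"
}
```

For example, `calc_jobs 64 16` keeps all 16 compile jobs but allows only 5 concurrent link jobs, while a 4 GiB builder drops to 2 compile jobs and a single link job.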
@@ -0,0 +1,133 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
# rocm stack builds with clang
%global toolchain clang
Name: rccl
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm Communication Collectives Library
License: BSD-3-Clause AND MIT AND Apache-2.0
# Per License.txt, the main license is BSD-3-Clause.
# The modifications from Microsoft are MIT.
# The NVIDIA-derived header files below are Apache-2.0:
# src/include/nvtx3/nv*.h and similar
# The NVIDIA URL given in License.txt, https://github.com/NVIDIA/NVTX, is Apache-2.0.
Url: https://github.com/ROCm/rccl
#!RemoteAsset: sha256:eaa60bcf62feb3198553f2bcf6dcbfdfcecd0fdfabda41f1dae7d3f15fadbd68
Source0: %{url}/archive/rocm-%{rocm_version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DEXPLICIT_ROCM_VERSION=%{rocm_version}
BuildOption(conf): -DROCM_PATH=%{_prefix}
BuildOption(conf): -DCMAKE_VERBOSE_MAKEFILE=ON
BuildOption(conf): -DBUILD_TESTS=ON
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(fmt)
BuildRequires: cmake(GTest)
BuildRequires: cmake(hip)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(rocm_smi)
BuildRequires: cmake(rocm-core)
BuildRequires: cmake(rocprofiler-register)
BuildRequires: compiler-rt
BuildRequires: hipify
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: python3
BuildRequires: rocm-cmake
BuildRequires: rocm-llvm-macros
Requires: %{name}-data = %{version}-%{release}
%description
RCCL (pronounced "Rickle") is a stand-alone library of standard
collective communication routines for GPUs, implementing all-reduce,
all-gather, reduce, broadcast, reduce-scatter, gather, scatter, and
all-to-all. There is also initial support for direct GPU-to-GPU
send and receive operations. It has been optimized to achieve high
bandwidth on platforms using PCIe, xGMI as well as networking using
InfiniBand Verbs or TCP/IP sockets. RCCL supports an arbitrary
number of GPUs installed in a single node or multiple nodes, and
can be used in either single- or multi-process (e.g., MPI)
applications.
The collective operations are implemented using ring and tree
algorithms and have been optimized for throughput and latency. For
best performance, small operations can be either batched into
larger operations or aggregated through the API.
%package devel
Summary: Headers and libraries for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
Provides: rccl-devel = %{version}-%{release}
%description devel
Headers and libraries for %{name}
%package data
Summary: Data for %{name}
BuildArch: noarch
%description data
Data for %{name}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%prep -a
# Do not let upstream force CMAKE_INSTALL_LIBDIR
sed -i -e 's@set(CMAKE_INSTALL_LIBDIR@#set(CMAKE_INSTALL_LIBDIR@' cmake/Dependencies.cmake
# -amdgpu-s-branch-bits=15 and -amdgpu-long-branch-factor=2 are needed to avoid the 'branch size exceed simm16' error
# --lto-partitions speeds up linking
sed -i -e 's@target_link_options(rccl PRIVATE "SHELL:-Xoffload-linker -mllvm=-amdgpu-kernarg-preload-count=16")@target_link_options(rccl PRIVATE "SHELL:-Xoffload-linker -mllvm=-amdgpu-s-branch-bits=15" "SHELL:-Xoffload-linker -mllvm=-amdgpu-long-branch-factor=2" "SHELL:-Xoffload-linker -mllvm=-amdgpu-kernarg-preload-count=16" "SHELL:-Xoffload-linker --lto-partitions=%(nproc)" "SHELL:-Xoffload-linker --verbose")@' CMakeLists.txt
%build
# The AMDGPU device linker runs as a single process that prints nothing to stdout for about 8 to 12 hours on riscv64; print a heartbeat meanwhile
timeout 12h bash -c 'while sleep 300; do echo "[heartbeat] $(date)"; done' & TIME_OUT=$!
%cmake_build
kill $TIME_OUT 2>/dev/null || true
%install -a
rm -f %{buildroot}%{_datadir}/doc/rccl/LICENSE.txt
%files
%license LICENSE.txt
%{_libdir}/librccl.so.*
%{_bindir}/rcclras
%files data
%{_datadir}/rccl/msccl-algorithms/
%{_datadir}/rccl/msccl-unit-test-algorithms/
%files devel
%doc README.md
%{_includedir}/rccl/
%{_libdir}/cmake/rccl/
%{_libdir}/librccl.so
%files test
%{_bindir}/rccl-UnitTests
%changelog
%autochangelog
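The heartbeat trick used in the rccl %build section above can be isolated into a small wrapper. This is a minimal sketch, assuming GNU coreutils `timeout` is available; `with_heartbeat` and its interval argument are hypothetical names, not part of the spec.

```shell
#!/bin/sh
# Hypothetical wrapper (not in the spec): run a long, silent command while
# a background loop prints a timestamp every <interval> seconds.
with_heartbeat() {
    interval=$1
    shift
    # timeout bounds the heartbeat's lifetime and, when terminated, also
    # signals the sleep it spawned, so nothing lingers after the build.
    timeout 12h sh -c 'while sleep "$1"; do echo "[heartbeat] $(date)"; done' sh "$interval" &
    hb_pid=$!
    status=0
    "$@" || status=$?
    # Stop the heartbeat and reap it; errors are ignored if it already exited.
    kill "$hb_pid" 2>/dev/null || true
    wait "$hb_pid" 2>/dev/null || true
    return "$status"
}
```

A call such as `with_heartbeat 300 cmake --build build` would correspond to the spec's 300-second interval, and the wrapper passes the wrapped command's exit status through unchanged.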
@@ -1,25 +0,0 @@
From 87f3c0b3ebab78aa6d126633683720031d886313 Mon Sep 17 00:00:00 2001
From: Tom Rix <trix@redhat.com>
Date: Sat, 13 Jan 2024 14:36:01 -0500
Subject: [PATCH] fixup install of tensile output
---
library/src/CMakeLists.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/library/src/CMakeLists.txt b/library/src/CMakeLists.txt
index f4bdfb5f9742..316779134314 100644
--- a/library/src/CMakeLists.txt
+++ b/library/src/CMakeLists.txt
@@ -823,7 +823,7 @@ if( BUILD_WITH_TENSILE )
if (WIN32)
set( ROCBLAS_TENSILE_LIBRARY_DIR "\${CPACK_PACKAGING_INSTALL_PREFIX}/bin/rocblas" CACHE PATH "path to tensile library" )
else()
- set( ROCBLAS_TENSILE_LIBRARY_DIR "\${CPACK_PACKAGING_INSTALL_PREFIX}${CMAKE_INSTALL_LIBDIR}/rocblas" CACHE PATH "path to tensile library" )
+ set( ROCBLAS_TENSILE_LIBRARY_DIR "${CMAKE_INSTALL_LIBDIR}/rocblas" CACHE PATH "path to tensile library" )
endif()
# For ASAN package, Tensile library files(which are not shared libraries) are not required
if( NOT ENABLE_ASAN_PACKAGING )
--
2.51.0
@@ -1,314 +0,0 @@
diff --git a/clients/include/blas2/testing_gbmv.hpp b/clients/include/blas2/testing_gbmv.hpp
index d02e1a5..bfe1046 100644
--- a/clients/include/blas2/testing_gbmv.hpp
+++ b/clients/include/blas2/testing_gbmv.hpp
@@ -267,11 +267,11 @@ void testing_gbmv(const Arguments& arg)
hy_gold = hy;
// copy data from CPU to device
- dAb.transfer_from(hAb);
- dx.transfer_from(hx);
- dy.transfer_from(hy);
- d_alpha.transfer_from(halpha);
- d_beta.transfer_from(hbeta);
+ CHECK_HIP_ERROR(dAb.transfer_from(hAb));
+ CHECK_HIP_ERROR(dx.transfer_from(hx));
+ CHECK_HIP_ERROR(dy.transfer_from(hy));
+ CHECK_HIP_ERROR(d_alpha.transfer_from(halpha));
+ CHECK_HIP_ERROR(d_beta.transfer_from(hbeta));
double cpu_time_used;
double error_host = 0.0, error_device = 0.0;
@@ -290,12 +290,12 @@ void testing_gbmv(const Arguments& arg)
(handle, transA, M, N, KL, KU, &h_alpha, dAb, lda, dx, incx, &h_beta, dy, incy));
handle.post_test(arg);
- hy.transfer_from(dy);
+ CHECK_HIP_ERROR(hy.transfer_from(dy));
}
if(arg.pointer_mode_device)
{
- dy.transfer_from(hy_gold);
+ CHECK_HIP_ERROR(dy.transfer_from(hy_gold));
CHECK_ROCBLAS_ERROR(rocblas_set_pointer_mode(handle, rocblas_pointer_mode_device));
handle.pre_test(arg);
@@ -308,7 +308,7 @@ void testing_gbmv(const Arguments& arg)
{
HOST_MEMCHECK(host_vector<T>, hy_copy, (dim_y, incy));
// copy output from device to CPU
- hy.transfer_from(dy);
+ CHECK_HIP_ERROR(hy.transfer_from(dy));
// multi-GPU support
int device_id, device_count;
@@ -330,17 +330,17 @@ void testing_gbmv(const Arguments& arg)
DEVICE_MEMCHECK(device_vector<T>, d_alpha_copy, (1));
DEVICE_MEMCHECK(device_vector<T>, d_beta_copy, (1));
- dAb_copy.transfer_from(hAb);
- dx_copy.transfer_from(hx);
- d_alpha_copy.transfer_from(halpha);
- d_beta_copy.transfer_from(hbeta);
+ CHECK_HIP_ERROR(dAb_copy.transfer_from(hAb));
+ CHECK_HIP_ERROR(dx_copy.transfer_from(hx));
+ CHECK_HIP_ERROR(d_alpha_copy.transfer_from(halpha));
+ CHECK_HIP_ERROR(d_beta_copy.transfer_from(hbeta));
CHECK_ROCBLAS_ERROR(
rocblas_set_pointer_mode(handle_copy, rocblas_pointer_mode_device));
for(int runs = 0; runs < arg.iters; runs++)
{
- dy_copy.transfer_from(hy_gold);
+ CHECK_HIP_ERROR(dy_copy.transfer_from(hy_gold));
DAPI_CHECK(rocblas_gbmv_fn,
(handle_copy,
transA,
@@ -357,7 +357,7 @@ void testing_gbmv(const Arguments& arg)
dy_copy,
incy));
// copy output from device to CPU
- hy_copy.transfer_from(dy_copy);
+ CHECK_HIP_ERROR(hy_copy.transfer_from(dy_copy));
unit_check_general<T>(1, dim_y, incy, hy, hy_copy);
}
}
@@ -383,7 +383,7 @@ void testing_gbmv(const Arguments& arg)
if(arg.pointer_mode_device)
{
// copy output from device to CPU
- hy.transfer_from(dy);
+ CHECK_HIP_ERROR(hy.transfer_from(dy));
if(arg.unit_check)
{
diff --git a/clients/include/blas2/testing_sbmv.hpp b/clients/include/blas2/testing_sbmv.hpp
index feb1148..95e09e2 100644
--- a/clients/include/blas2/testing_sbmv.hpp
+++ b/clients/include/blas2/testing_sbmv.hpp
@@ -204,9 +204,9 @@ void testing_sbmv(const Arguments& arg)
hy_gold = hy;
// copy data from CPU to device
- dx.transfer_from(hx);
- dy.transfer_from(hy);
- dAb.transfer_from(hAb);
+ CHECK_HIP_ERROR(dx.transfer_from(hx));
+ CHECK_HIP_ERROR(dy.transfer_from(hy));
+ CHECK_HIP_ERROR(dAb.transfer_from(hAb));
double cpu_time_used;
double error_host = 0.0, error_device = 0.0;
@@ -231,7 +231,7 @@ void testing_sbmv(const Arguments& arg)
CHECK_HIP_ERROR(d_alpha.transfer_from(alpha));
CHECK_HIP_ERROR(d_beta.transfer_from(beta));
- dy.transfer_from(hy_gold);
+ CHECK_HIP_ERROR(dy.transfer_from(hy_gold));
handle.pre_test(arg);
diff --git a/clients/include/blas2/testing_sbmv_batched.hpp b/clients/include/blas2/testing_sbmv_batched.hpp
index 4812be9..9a91392 100644
--- a/clients/include/blas2/testing_sbmv_batched.hpp
+++ b/clients/include/blas2/testing_sbmv_batched.hpp
@@ -322,9 +322,9 @@ void testing_sbmv_batched(const Arguments& arg)
hy_gold.copy_from(hy);
// copy data from CPU to device
- dx.transfer_from(hx);
- dy.transfer_from(hy);
- dAb.transfer_from(hAb);
+ CHECK_HIP_ERROR(dx.transfer_from(hx));
+ CHECK_HIP_ERROR(dy.transfer_from(hy));
+ CHECK_HIP_ERROR(dAb.transfer_from(hAb));
double cpu_time_used;
double h_error = 0.0, d_error = 0.0;
@@ -363,7 +363,7 @@ void testing_sbmv_batched(const Arguments& arg)
CHECK_HIP_ERROR(d_alpha.transfer_from(alpha));
CHECK_HIP_ERROR(d_beta.transfer_from(beta));
- dy.transfer_from(hy_gold);
+ CHECK_HIP_ERROR(dy.transfer_from(hy_gold));
handle.pre_test(arg);
DAPI_CHECK(rocblas_sbmv_batched_fn,
diff --git a/clients/include/blas2/testing_sbmv_strided_batched.hpp b/clients/include/blas2/testing_sbmv_strided_batched.hpp
index a32538a..ad902a3 100644
--- a/clients/include/blas2/testing_sbmv_strided_batched.hpp
+++ b/clients/include/blas2/testing_sbmv_strided_batched.hpp
@@ -385,9 +385,9 @@ void testing_sbmv_strided_batched(const Arguments& arg)
hy_gold.copy_from(hy);
// copy data from CPU to device
- dx.transfer_from(hx);
- dy.transfer_from(hy);
- dAb.transfer_from(hAb);
+ CHECK_HIP_ERROR(dx.transfer_from(hx));
+ CHECK_HIP_ERROR(dy.transfer_from(hy));
+ CHECK_HIP_ERROR(dAb.transfer_from(hAb));
double cpu_time_used;
double error_host = 0.0, error_device = 0.0;
@@ -428,7 +428,7 @@ void testing_sbmv_strided_batched(const Arguments& arg)
CHECK_HIP_ERROR(d_alpha.transfer_from(alpha));
CHECK_HIP_ERROR(d_beta.transfer_from(beta));
- dy.transfer_from(hy_gold);
+ CHECK_HIP_ERROR(dy.transfer_from(hy_gold));
handle.pre_test(arg);
DAPI_CHECK(rocblas_sbmv_strided_batched_fn,
diff --git a/clients/include/blas2/testing_symv.hpp b/clients/include/blas2/testing_symv.hpp
index 2b31355..d170478 100644
--- a/clients/include/blas2/testing_symv.hpp
+++ b/clients/include/blas2/testing_symv.hpp
@@ -213,7 +213,7 @@ void testing_symv(const Arguments& arg)
CHECK_HIP_ERROR(d_alpha.transfer_from(alpha));
CHECK_HIP_ERROR(d_beta.transfer_from(beta));
- dy.transfer_from(hy_gold);
+ CHECK_HIP_ERROR(dy.transfer_from(hy_gold));
handle.pre_test(arg);
DAPI_CHECK(rocblas_symv_fn,
diff --git a/clients/include/blas2/testing_symv_batched.hpp b/clients/include/blas2/testing_symv_batched.hpp
index 6fd3f7b..ceed6f3 100644
--- a/clients/include/blas2/testing_symv_batched.hpp
+++ b/clients/include/blas2/testing_symv_batched.hpp
@@ -345,7 +345,7 @@ void testing_symv_batched(const Arguments& arg)
CHECK_HIP_ERROR(d_alpha.transfer_from(alpha));
CHECK_HIP_ERROR(d_beta.transfer_from(beta));
- dy.transfer_from(hy_gold);
+ CHECK_HIP_ERROR(dy.transfer_from(hy_gold));
handle.pre_test(arg);
DAPI_CHECK(rocblas_symv_batched_fn,
diff --git a/clients/include/blas2/testing_symv_strided_batched.hpp b/clients/include/blas2/testing_symv_strided_batched.hpp
index d96e17c..dbaa454 100644
--- a/clients/include/blas2/testing_symv_strided_batched.hpp
+++ b/clients/include/blas2/testing_symv_strided_batched.hpp
@@ -433,7 +433,7 @@ void testing_symv_strided_batched(const Arguments& arg)
CHECK_HIP_ERROR(d_alpha.transfer_from(alpha));
CHECK_HIP_ERROR(d_beta.transfer_from(beta));
- dy.transfer_from(hy_gold);
+ CHECK_HIP_ERROR(dy.transfer_from(hy_gold));
handle.pre_test(arg);
DAPI_CHECK(rocblas_symv_strided_batched_fn,
diff --git a/clients/include/blas_ex/testing_gemm_batched_ex.hpp b/clients/include/blas_ex/testing_gemm_batched_ex.hpp
index 214f0b4..54ca0b5 100644
--- a/clients/include/blas_ex/testing_gemm_batched_ex.hpp
+++ b/clients/include/blas_ex/testing_gemm_batched_ex.hpp
@@ -103,7 +103,7 @@ void testing_gemm_batched_ex_bad_arg(const Arguments& arg)
rocblas_seedrand();
rocblas_init_matrix<To>(
hC, arg, rocblas_client_beta_sets_nan, rocblas_client_general_matrix);
- dC.transfer_from(hC);
+ CHECK_HIP_ERROR(dC.transfer_from(hC));
// clang-format off
// check for invalid enum
diff --git a/clients/include/blas_ex/testing_gemm_ex.hpp b/clients/include/blas_ex/testing_gemm_ex.hpp
index 4977995..be36f2e 100644
--- a/clients/include/blas_ex/testing_gemm_ex.hpp
+++ b/clients/include/blas_ex/testing_gemm_ex.hpp
@@ -102,7 +102,7 @@ void testing_gemm_ex_bad_arg(const Arguments& arg)
HOST_MEMCHECK(host_matrix<To>, hC, (M, N, ldc));
rocblas_seedrand();
rocblas_init_matrix(hC, arg, rocblas_client_beta_sets_nan, rocblas_client_general_matrix);
- dC.transfer_from(hC);
+ CHECK_HIP_ERROR(dC.transfer_from(hC));
// clang-format off
diff --git a/clients/include/testing_set_get_matrix_async.hpp b/clients/include/testing_set_get_matrix_async.hpp
index 01d0648..de8d301 100644
--- a/clients/include/testing_set_get_matrix_async.hpp
+++ b/clients/include/testing_set_get_matrix_async.hpp
@@ -127,7 +127,7 @@ void testing_set_get_matrix_async(const Arguments& arg)
cpu_time_used = get_time_us_no_sync() - cpu_time_used;
- hipStreamSynchronize(stream);
+ CHECK_HIP_ERROR(hipStreamSynchronize(stream));
if(arg.unit_check)
{
@@ -160,7 +160,7 @@ void testing_set_get_matrix_async(const Arguments& arg)
(rows, cols, sizeof(T), dD, ldd, hB, ldb, stream));
}
- hipStreamSynchronize(stream);
+ CHECK_HIP_ERROR(hipStreamSynchronize(stream));
gpu_time_used = get_time_us_sync(stream) - gpu_time_used;
ArgumentModel<e_M, e_N, e_lda, e_ldb, e_ldd>{}.log_args<T>(
diff --git a/clients/include/testing_set_get_vector_async.hpp b/clients/include/testing_set_get_vector_async.hpp
index b88cc0c..0bddc20 100644
--- a/clients/include/testing_set_get_vector_async.hpp
+++ b/clients/include/testing_set_get_vector_async.hpp
@@ -113,7 +113,7 @@ void testing_set_get_vector_async(const Arguments& arg)
cpu_time_used = get_time_us_no_sync() - cpu_time_used;
- hipStreamSynchronize(stream);
+ CHECK_HIP_ERROR(hipStreamSynchronize(stream));
if(arg.unit_check)
{
@@ -144,7 +144,7 @@ void testing_set_get_vector_async(const Arguments& arg)
DAPI_DISPATCH(rocblas_get_vector_async_fn, (N, sizeof(T), db, ldd, hy, incy, stream));
}
- hipStreamSynchronize(stream);
+ CHECK_HIP_ERROR(hipStreamSynchronize(stream));
gpu_time_used = get_time_us_sync(stream) - gpu_time_used;
ArgumentModel<e_N, e_incx, e_incy, e_ldd>{}.log_args<T>(rocblas_cout,
diff --git a/library/src/include/handle.hpp b/library/src/include/handle.hpp
index a0a1760..c80cc5e 100644
--- a/library/src/include/handle.hpp
+++ b/library/src/include/handle.hpp
@@ -147,16 +147,20 @@ private:
: device_id(device_id)
, old_device_id(-1)
{
- hipGetDevice(&old_device_id);
+ THROW_IF_HIP_ERROR(hipGetDevice(&old_device_id));
if(device_id != old_device_id)
- hipSetDevice(device_id);
+ {
+ THROW_IF_HIP_ERROR(hipSetDevice(device_id));
+ }
}
// Old device ID is restored on destruction
~_rocblas_saved_device_id()
{
if(device_id != old_device_id)
- hipSetDevice(old_device_id);
+ {
+ (void)(hipSetDevice(old_device_id));
+ }
}
// Move constructor
diff --git a/library/src/src64/blas1/rocblas_dot_kernels_64.cpp b/library/src/src64/blas1/rocblas_dot_kernels_64.cpp
index 0bd3061..2b38dd7 100644
--- a/library/src/src64/blas1/rocblas_dot_kernels_64.cpp
+++ b/library/src/src64/blas1/rocblas_dot_kernels_64.cpp
@@ -325,7 +325,7 @@ rocblas_status rocblas_internal_dot_launcher_64(rocblas_handle __restrict__ hand
if(handle->pointer_mode == rocblas_pointer_mode_host)
{
// sync here to match legacy BLAS
- hipStreamSynchronize(handle->get_stream());
+ RETURN_IF_HIP_ERROR(hipStreamSynchronize(handle->get_stream()));
}
return rocblas_status_success;
@@ -1,168 +0,0 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
# Tests consume too much time and space
%bcond test 0
%global rocm_version 7.1.1
Name: rocblas
Summary: BLAS implementation for ROCm
Version: %{rocm_version}
Release: %autorelease
License: MIT AND BSD-3-Clause
URL: https://github.com/ROCm/rocBLAS
#!RemoteAsset
Source0: %{url}/archive/rocm-%{rocm_version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DBLAS_INCLUDE_DIR=%{_includedir}/cblas
BuildOption(conf): -DBLAS_LIBRARY=cblas
BuildOption(conf): -DCMAKE_CXX_COMPILER=hipcc
BuildOption(conf): -DCMAKE_C_COMPILER=clang
BuildOption(conf): -DCMAKE_LINKER=%rocmllvm_bindir/ld.lld
BuildOption(conf): -DCMAKE_AR=%rocmllvm_bindir/llvm-ar
BuildOption(conf): -DCMAKE_RANLIB=%rocmllvm_bindir/llvm-ranlib
BuildOption(conf): -DCMAKE_PREFIX_PATH=%{rocmllvm_cmakedir}/..
BuildOption(conf): -DCMAKE_SKIP_RPATH=ON
BuildOption(conf): -DCMAKE_VERBOSE_MAKEFILE=ON
BuildOption(conf): -DBUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF
# Avoid using external tensile
BuildOption(conf): -DBUILD_WITH_PIP=OFF
BuildOption(conf): -DROCM_SYMLINK_LIBS=OFF
BuildOption(conf): -DHIP_PLATFORM=amd
# These may be enabled in the distant future
BuildOption(conf): -DBUILD_CLIENTS_BENCHMARKS=%{?with_test:ON}%{!?with_test:OFF}
BuildOption(conf): -DBUILD_CLIENTS_TESTS=%{?with_test:ON}%{!?with_test:OFF}
BuildOption(conf): -DBUILD_CLIENTS_TESTS_OPENMP=OFF
BuildOption(conf): -DBUILD_FORTRAN_CLIENTS=OFF
BuildOption(conf): -DBUILD_OFFLOAD_COMPRESS=ON
BuildOption(conf): -DBUILD_WITH_HIPBLASLT=OFF
BuildOption(conf): -DBUILD_WITH_TENSILE=ON
BuildOption(conf): -DGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DTensile_LIBRARY_FORMAT=msgpack
BuildOption(conf): -DTensile_VERBOSE=1
BuildOption(conf): -DTensile_DIR=$(%{_bindir}/TensileGetPath)/cmake
BuildOption(conf): -DTensile_LOGIC=asm_full
BuildOption(conf): -DTensile_CODE_OBJECT_VERSION=default
BuildOption(conf): -DTensile_SEPARATE_ARCHITECTURES=ON
BuildOption(conf): -DTensile_LAZY_LIBRARY_LOADING=ON
BuildOption(conf): -DTensile_ASSEMBLER=clang++
Patch0: 0001-fixup-install-of-tensile-output.patch
# https://github.com/ROCm/rocm-libraries/commit/6221075881f3ea8e9dfa0d985f22005c74ae1f52
Patch1: 0002-fix-nodiscard-return-value-ignored.patch
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(hip)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(msgpack)
BuildRequires: compiler-rt
BuildRequires: gcc-c++
BuildRequires: hipcc
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: pkgconfig(libzstd)
BuildRequires: python3dist(tensile)
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
%if %{with test}
BuildRequires: gcc-fortran
BuildRequires: cmake(openmp)
BuildRequires: cmake(rocm_smi)
BuildRequires: pkgconfig(blas)
BuildRequires: pkgconfig(GTest)
BuildRequires: python3dist(pyyaml)
BuildRequires: rocminfo
%endif
Provides: rocblas = %{version}-%{release}
Requires: python3dist(msgpack)
%description
rocBLAS is the AMD library for Basic Linear Algebra Subprograms
(BLAS) on the ROCm platform. It is implemented in the HIP
programming language and optimized for AMD GPUs.
%package devel
Summary: Libraries and headers for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
Requires: rocm-hip-devel
%description devel
%{summary}
%if %{with test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
Requires: diffutils
%description test
%{summary}
%endif
%prep -a
sed -i -e 's@target_link_libraries( rocblas-test PRIVATE ${BLAS_LIBRARY} ${GTEST_BOTH_LIBRARIES} roc::rocblas )@target_link_libraries( rocblas-test PRIVATE cblas ${GTEST_BOTH_LIBRARIES} roc::rocblas )@' clients/gtest/CMakeLists.txt
# no git in this build
sed -i -e 's@find_package(Git REQUIRED)@find_package(Git)@' library/CMakeLists.txt
# /usr/include/gtest/internal/gtest-port.h:279:2: error: C++ versions less than C++14 are not supported.
# 279 | #error C++ versions less than C++14 are not supported.
sed -i -e 's@CXX_STANDARD 11@CXX_STANDARD 17@' clients/samples/CMakeLists.txt
sed -i "s@/opt/rocm@%{_prefix}@g" \
clients/cmake/FindROCmSMI.cmake \
clients/CMakeLists.txt \
rmake.py \
rmake.py \
rmake.py \
toolchain-linux.cmake \
header_compilation_tests.sh \
library/src/tensile_host.cpp \
library/src/include/handle.hpp \
scripts/utilities/check_for_pretuned_sizes_c/Makefile \
scripts/performance/blas/getspecs.py \
scripts/performance/blas/commandrunner.py \
CMakeLists.txt \
library/CMakeLists.txt
sed -i "s@llvm/bin@bin@g" CMakeLists.txt library/CMakeLists.txt
%install -a
rm -f %{buildroot}%{_prefix}/share/doc/rocblas/LICENSE.md
%check
%if %{with test}
export LD_LIBRARY_PATH=%{_vpath_builddir}/library/src:$LD_LIBRARY_PATH
%{_vpath_builddir}/clients/staging/rocblas-test --gtest_brief=1
%endif
%files
%license LICENSE.md
%{_libdir}/librocblas.so.5{,.*}
%{_libdir}/rocblas/
%files devel
%doc README.md
%{_includedir}/rocblas/
%{_libdir}/cmake/rocblas/
%{_libdir}/librocblas.so
%if %{with test}
%files test
%{_bindir}/rocblas*
%endif
%changelog
%{autochangelog}
@@ -0,0 +1,13 @@
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 14bb20b..a5b1ee4 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -49,6 +49,8 @@ set( ROCFFT_BUILD_SCOPE ON )
project( rocfft LANGUAGES CXX C )
+include(GNUInstallDirs)
+
# This finds the rocm-cmake project, and installs it if not found
# rocm-cmake contains common cmake code for rocm projects to help setup and install
set( PROJECT_EXTERN_DIR ${CMAKE_CURRENT_BINARY_DIR}/extern )
@@ -0,0 +1,111 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
# rocFFT needs a GPU to run its tests; without one they fail with
# `what(): hipGetDeviceCount failed`. Keep the test cases for packagers
# who have a GPU, but make running them optional.
%bcond test 0
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
# rocm stack builds with clang
%global toolchain clang
Name: rocfft
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm Fast Fourier Transforms library
License: MIT
Url: https://github.com/ROCm/rocFFT
#!RemoteAsset: sha256:047e4e93e0b12869bf42136b5eb683df3a1635b01a58bbb25c8861df291ab285
Source: %{url}/archive/rocm-%{version}.tar.gz
BuildSystem: cmake
Patch0: 0001-cmake-use-gnu-installdirs.patch
BuildOption(conf): -G Ninja
BuildOption(conf): -DAMDGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DBUILD_CLIENTS_TESTS=ON
BuildOption(conf): -DROCFFT_BUILD_OFFLINE_TUNER=OFF
BuildOption(conf): -DROCFFT_KERNEL_CACHE_ENABLE=OFF
BuildOption(conf): -DSQLITE_USE_SYSTEM_PACKAGE=ON
BuildRequires: boost-devel
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(hip)
BuildRequires: cmake(hiprand)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(GTest)
BuildRequires: cmake(rocrand)
BuildRequires: compiler-rt
BuildRequires: libomp-devel
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: pkgconfig(fftw3)
BuildRequires: pkgconfig(sqlite3)
BuildRequires: python3
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
%description
rocFFT is a software library for computing fast Fourier transforms (FFTs) written
in HIP. It is part of AMD's software ecosystem based on ROCm. In addition to
AMD GPU hardware, rocFFT also works on CPU devices to facilitate testing.
%package devel
Summary: The rocFFT development package
Requires: %{name}%{?_isa} = %{version}-%{release}
Requires: cmake(hip)
%description devel
The rocFFT development package.
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%prep -a
# Do not require a specific SQLite version
sed -i -e 's@SQLite3 3.50.2 @SQLite3 @' cmake/sqlite.cmake
%install -a
# we don't need the rocfft_rtc_helper binary and client-info file
find %{buildroot} -type f -name "rocfft_rtc_helper" -print0 | xargs -0 -I {} /usr/bin/rm -rf "{}"
rm -rf %{buildroot}/%{_prefix}/.info
rm -f %{buildroot}%{_datadir}/doc/rocfft/LICENSE.md
%if %{with test}
%check
%{_vpath_builddir}/clients/staging/rocfft-test
%endif
%files
%doc README.md
%license LICENSE.md
%{_libdir}/librocfft.so.0{,.*}
%files devel
%{_includedir}/rocfft/
%{_libdir}/cmake/rocfft/
%{_libdir}/librocfft.so
%files test
%{_bindir}/rocfft-test
%{_bindir}/rtc_helper_crash
%changelog
%autochangelog
@@ -0,0 +1,52 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: Sakura286 <chenxuan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global rocm_version 7.1.1
Name: rocm-core
Version: %{rocm_version}
Release: %autorelease
Summary: A utility to get the ROCm release version
License: MIT
URL: https://github.com/ROCm/rocm-core
#!RemoteAsset: sha256:0171b82a4d028d57035d0d57a01a058f50f1a23959d230cdeab14972dcd94da8
Source0: %{url}/archive/refs/tags/rocm-%{version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -DROCM_VERSION=%{rocm_version}
BuildRequires: cmake
Provides: rocm-core = %{version}-%{release}
%description
%{summary}
%package devel
Summary: Libraries and headers for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
%{summary}
%install -a
rm -rvf %{buildroot}/%{_exec_prefix}/.info
rm -rvf %{buildroot}/%{_exec_prefix}/libexec/rocm-core
rm -rvf %{buildroot}/%{_exec_prefix}/share/doc/*/LICENSE.md
rm -rvf %{buildroot}/%{_libdir}/rocmmod
%files
%doc README.md
%license LICENSE.md
%{_libdir}/librocm-core.so.*
%files devel
%{_includedir}/rocm-core/*.h
%{_libdir}/cmake/rocm-core/*.cmake
%{_libdir}/librocm-core.so
%changelog
%{?autochangelog}
@@ -0,0 +1,15 @@
diff --git a/amd/comgr/CMakeLists.txt b/amd/comgr/CMakeLists.txt
index cfa170f94..e03049224 100644
--- a/amd/comgr/CMakeLists.txt
+++ b/amd/comgr/CMakeLists.txt
@@ -169,6 +169,10 @@ if (ADDRESS_SANITIZER)
"${CMAKE_SHARED_LINKER_FLAGS} ${ASAN_LINKER_FLAGS}")
endif()
+
+set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ASAN_COMPILER_FLAGS} -fsigned-char")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ASAN_COMPILER_FLAGS} -fsigned-char")
+
set(AMD_COMGR_PRIVATE_COMPILE_OPTIONS)
set(AMD_COMGR_PRIVATE_COMPILE_DEFINITIONS ${LLVM_DEFINITIONS})
set(AMD_COMGR_PUBLIC_LINKER_OPTIONS)
@@ -0,0 +1,122 @@
ROCm 7.1.1 builds with llvm-20, but openRuyi provides only llvm-21,
so some backport work needs to be done.
This patch includes:
* https://github.com/ROCm/llvm-project/commit/a7ad03285bf9ff361acb5e721386870be9354620
* https://github.com/ROCm/llvm-project/commit/75074634076c0e3b8b2a18bbcf6ffef01094b069
and another diff:
old https://github.com/ROCm/llvm-project/commit/f2987311af76ab1d5e6f770861865b6952002b0a
new https://github.com/ROCm/llvm-project/commit/e3dc0a41658572aecb595e55a79b9f4a85224187
---
diff --git a/amd/comgr/src/comgr-cache-bundler-command.cpp b/amd/comgr/src/comgr-cache-bundler-command.cpp
index 514262725..23e919dbd 100644
--- a/amd/comgr/src/comgr-cache-bundler-command.cpp
+++ b/amd/comgr/src/comgr-cache-bundler-command.cpp
@@ -155,10 +155,8 @@ void UnbundleCommand::addOptionsIdentifier(HashAlgorithm &H) const {
Error UnbundleCommand::addInputIdentifier(HashAlgorithm &H) const {
StringRef InputFilename = Config.InputFileNames.front();
- constexpr size_t LargestHeaderSize = CompressedOffloadBundle::V3HeaderSize;
-
ErrorOr<std::unique_ptr<MemoryBuffer>> MaybeInputBuffer =
- MemoryBuffer::getFileSlice(InputFilename, LargestHeaderSize, 0);
+ MemoryBuffer::getFile(InputFilename);
if (!MaybeInputBuffer) {
std::error_code EC = MaybeInputBuffer.getError();
return createStringError(EC, Twine("Failed to open ") + InputFilename +
@@ -167,14 +165,17 @@ Error UnbundleCommand::addInputIdentifier(HashAlgorithm &H) const {
MemoryBuffer &InputBuffer = **MaybeInputBuffer;
- uint8_t Header[LargestHeaderSize];
- memset(Header, 0, sizeof(Header));
- memcpy(Header, InputBuffer.getBufferStart(),
- std::min(LargestHeaderSize, InputBuffer.getBufferSize()));
-
- // only hash the input file, not the whole header. Colissions are unlikely
- // since the header includes a hash (weak) of the contents
- H.update(Header);
+ using Header = CompressedOffloadBundle::CompressedBundleHeader;
+ Expected<Header> MaybeHeader = Header::tryParse(InputBuffer.getBuffer());
+ if (!MaybeHeader)
+ return MaybeHeader.takeError();
+
+ // The hash represents the contents of the bundle. Extracting the same
+ // contents should give the same result, regardless of the compression
+ // algorithm or header version. Since the hash used by the offload bundler is
+ // not a cryptographic hash, we also add the uncompressed file size.
+ H.update(MaybeHeader->Hash);
+ H.update(MaybeHeader->UncompressedFileSize);
return Error::success();
}
diff --git a/amd/comgr/src/comgr-compiler.cpp b/amd/comgr/src/comgr-compiler.cpp
index 82102910a..96ca28b94 100644
--- a/amd/comgr/src/comgr-compiler.cpp
+++ b/amd/comgr/src/comgr-compiler.cpp
@@ -70,6 +70,7 @@
#include "llvm/MC/MCCodeEmitter.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCInstrInfo.h"
+#include "llvm/MC/MCInstPrinter.h"
#include "llvm/MC/MCObjectFileInfo.h"
#include "llvm/MC/MCObjectWriter.h"
#include "llvm/MC/MCParser/MCAsmParser.h"
@@ -462,8 +463,9 @@ bool executeAssemblerImpl(AssemblerInvocation &Opts, DiagnosticsEngine &Diags,
// FIXME: There is a bit of code duplication with addPassesToEmitFile.
if (Opts.OutputType == AssemblerInvocation::FT_Asm) {
- MCInstPrinter *IP = TheTarget->createMCInstPrinter(
- llvm::Triple(Opts.Triple), Opts.OutputAsmVariant, *MAI, *MCII, *MRI);
+ std::unique_ptr<MCInstPrinter> InstructionPrinter(
+ TheTarget->createMCInstPrinter(
+ llvm::Triple(Opts.Triple), Opts.OutputAsmVariant, *MAI, *MCII, *MRI));
std::unique_ptr<MCCodeEmitter> MCE;
std::unique_ptr<MCAsmBackend> MAB;
if (Opts.ShowEncoding) {
@@ -472,7 +474,7 @@ bool executeAssemblerImpl(AssemblerInvocation &Opts, DiagnosticsEngine &Diags,
MAB.reset(TheTarget->createMCAsmBackend(*STI, *MRI, Options));
}
auto FOut = std::make_unique<formatted_raw_ostream>(*Out);
- Str.reset(TheTarget->createAsmStreamer(Ctx, std::move(FOut), IP,
+ Str.reset(TheTarget->createAsmStreamer(Ctx, std::move(FOut), std::move(InstructionPrinter),
std::move(MCE), std::move(MAB)));
} else if (Opts.OutputType == AssemblerInvocation::FT_Null) {
Str.reset(createNullStreamer(Ctx));
@@ -653,9 +655,9 @@ void logArgv(raw_ostream &OS, StringRef ProgramName,
amd_comgr_status_t executeCommand(const Command &Job, raw_ostream &LogS,
DiagnosticOptions &DiagOpts,
llvm::vfs::FileSystem &FS) {
- TextDiagnosticPrinter DiagClient(LogS, &DiagOpts);
+ TextDiagnosticPrinter DiagClient(LogS, DiagOpts);
IntrusiveRefCntPtr<DiagnosticIDs> DiagID(new DiagnosticIDs);
- DiagnosticsEngine Diags(DiagID, &DiagOpts, &DiagClient, false);
+ DiagnosticsEngine Diags(DiagID, DiagOpts, &DiagClient, false);
auto Arguments = Job.getArguments();
SmallVector<const char *, 128> Argv;
@@ -750,7 +752,7 @@ AMDGPUCompiler::executeInProcessDriver(ArrayRef<const char *> Args) {
// here is mostly copy-and-pasted from driver.cpp/cc1_main.cpp/various Clang
// tests to try to approximate the same behavior as running the `clang`
// executable.
- IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts(new DiagnosticOptions);
+ std::unique_ptr<DiagnosticOptions> DiagOpts(new DiagnosticOptions);
unsigned MissingArgIndex, MissingArgCount;
InputArgList ArgList = getDriverOptTable().ParseArgs(
Args.slice(1), MissingArgIndex, MissingArgCount);
@@ -759,9 +761,9 @@ AMDGPUCompiler::executeInProcessDriver(ArrayRef<const char *> Args) {
// DiagnosticsEngine actually exists.
(void)ParseDiagnosticArgs(*DiagOpts, ArgList);
TextDiagnosticPrinter *DiagClient =
- new TextDiagnosticPrinter(LogS, &*DiagOpts);
+ new TextDiagnosticPrinter(LogS, *DiagOpts);
IntrusiveRefCntPtr<DiagnosticIDs> DiagID(new DiagnosticIDs);
- DiagnosticsEngine Diags(DiagID, &*DiagOpts, DiagClient);
+ DiagnosticsEngine Diags(DiagID, *DiagOpts, DiagClient);
ProcessWarningOptions(Diags, *DiagOpts, *OverlayFS, /*ReportDiags=*/false);
@@ -0,0 +1,55 @@
##Fix issue with HIP, where compilation flags are incorrect, see issue:
#https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/49
#Remove redundant includes:
sed -i '/Args.push_back("-isystem");/,+3d' amd/comgr/src/comgr-compiler.cpp
#Source hard codes the libdir too:
sed -i 's/lib\(\/clang\)/%{_lib}\1/' amd/comgr/src/comgr-compiler.cpp
# Unsupported options
sed -i 's@Args.push_back("-mlink-builtin-bitcode-postopt");@//Args.push_back("-mlink-builtin-bitcode-postopt");@' amd/comgr/src/comgr-compiler.cpp
# Use system perl
sed -i 's|\(/usr/bin/\)env perl|\1perl|' amd/hipcc/bin/hipvars.pm
# Default rocm path is _prefix
# HIPCC fixes to find clang++
sed -i 's| or -e "$HIP_PATH/bin/clang"||' amd/hipcc/bin/hipvars.pm
sed -i 's|lib/llvm/bin|%{_lib}/llvm%{llvm_maj_ver}/bin|' \
amd/hipcc/bin/hipvars.pm amd/hipcc/src/hipBin_amd.h amd/hipcc/src/hipBin_base.h
sed -i 's|-e "$HIP_ROCCLR_HOME/bin/clang" or ||' amd/hipcc/bin/hipvars.pm
# Fixup finding /opt/llvm
sed -i -e 's@sys::path::append(LLVMPath, "llvm");@//sys::path::append(LLVMPath, "llvm");@' amd/comgr/src/comgr-env.cpp
# Fixup finding /opt/rocm/hip
sed -i -e 's@sys::path::append(HIPPath, "hip");@//sys::path::append(HIPPath, "hip");@' amd/comgr/src/comgr-env.cpp
# Default rocm path is _prefix
sed -i -e 's@/opt/rocm@%{_prefix}@' amd/hipcc/src/hipBin_base.h
LLVM_BINDIR=`llvm-config --bindir`
if [ ! -x ${LLVM_BINDIR}/clang++ ]; then
echo "Something wrong with llvm-config"
false
fi
echo "s@\$ROCM_PATH/lib/llvm/bin@${LLVM_BINDIR}@" > pm.sed
echo "s@hipClangPath /= \"lib/llvm/bin\"@hipClangPath = \"${LLVM_BINDIR}\"@" > h.sed
sed -i -f pm.sed amd/hipcc/bin/hipvars.pm
sed -i -f h.sed amd/hipcc/src/hipBin_amd.h
# ROCm upstream uses /opt for rocm-runtime, but we use /usr
# Don't include it again since /usr/include is already included:
sed -i '/" -isystem " + hsaPath + "\/include"/d' amd/hipcc/src/hipBin_amd.h
sed -i 's/find_package(Clang REQUIRED CONFIG)/find_package(Clang REQUIRED)/' amd/comgr/CMakeLists.txt
sed -i 's/find_package(LLD REQUIRED CONFIG)/find_package(LLD REQUIRED)/' amd/comgr/CMakeLists.txt
sed -i 's@${CLANG_CMAKE_DIR}/../../../@/usr/lib/clang/%{llvm_maj_ver}/@' amd/comgr/cmake/opencl_pch.cmake
# CMP0053 OLD is only needed on Windows, and newer CMake versions deprecate the OLD behavior.
sed -i 's/cmake_policy(SET CMP0053 OLD)/cmake_policy(SET CMP0053 NEW)/' amd/device-libs/cmake/OCL.cmake
# Fix up the path to the device libs hipcc uses
sed -i -e 's@amdgcnBitcode = roccmPath@amdgcnBitcode = "%{_prefix}/%{amd_device_libs_prefix}/"@' amd/hipcc/src/hipBin_amd.h
# Fix up the location AMD_DEVICE_LIBS_PREFIX
sed -i 's|@AMD_DEVICE_LIBS_PREFIX_CODE@|set(AMD_DEVICE_LIBS_PREFIX "%{_prefix}/%{amd_device_libs_prefix}")|' amd/device-libs/AMDDeviceLibsConfig.cmake.in
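The prep script carries literal RPM macro tokens such as `%{_lib}` and `%{_prefix}`. rpm does not expand macros inside a plain source file, so the spec's `%prep` section rewrites them with sed before sourcing the script. A minimal sketch of that substitution step, assuming GNU sed, example values (`/usr`, `lib64`, `lib/clang/21`), and a hypothetical `expand_placeholders` helper that is not part of the package:

```shell
#!/bin/sh
# Hypothetical helper (not in the package): replace literal %{name} tokens
# in FILE with the given values.
# usage: expand_placeholders FILE name=value ...
expand_placeholders() {
    file=$1
    shift
    for pair in "$@"; do
        name=${pair%%=*}
        value=${pair#*=}
        # '@' is the sed delimiter, as in the spec, so values may contain '/'.
        sed -i -e "s@%{${name}}@${value}@g" "$file"
    done
}

# Example template lines and values only; a real build takes the values
# from the RPM macros of the build host.
demo=$(mktemp)
printf '%s\n' 'libdir=%{_prefix}/%{_lib}' \
              'bitcode=%{_prefix}/%{amd_device_libs_prefix}/amdgcn' > "$demo"
expand_placeholders "$demo" _prefix=/usr _lib=lib64 \
    amd_device_libs_prefix=lib/clang/21

result=$(cat "$demo")
# Count lines that still contain an unexpanded token.
leftover=$(grep -c '%{' "$demo" || true)
rm -f "$demo"
printf '%s\n' "$result"
```

Counting leftover `%{` tokens after substitution is a cheap way to notice a placeholder that the spec forgot to handle.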
@@ -0,0 +1,298 @@
# SPDX-FileCopyrightText: (C) 2025, 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2025, 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
# SPDX-FileContributor: misaka00251 <liuxin@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
# The package follows LLVM's major version, but API version is still important:
%global comgr_maj_api_ver 3
%global comgr_full_api_ver %{comgr_maj_api_ver}.0
# What LLVM is upstream using (use LLVM_VERSION_MAJOR from cmake/Modules/LLVMVersion.cmake):
%global llvm_maj_ver 21
# Sakura286: ROCm 7.1.1 uses LLVM 20, but only LLVM 21 is on openRuyi.
# Backport is needed.
%global rocm_llvm_maj_ver 20
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
%global bundle_prefix %{_libdir}/llvm%{llvm_maj_ver}
%global llvm_triple %{_target_platform}
%global amd_device_libs_prefix lib/clang/%{llvm_maj_ver}
%global toolchain clang
%ifarch x86_64
%global targets_to_build "X86;AMDGPU"
%endif
%ifarch riscv64
%global targets_to_build "RISCV;AMDGPU"
%endif
# Neither Fedora nor Debian enables these tests:
# https://salsa.debian.org/rocm-team/rocm-llvm/-/blob/debian/unstable/debian/rules
# https://src.fedoraproject.org/rpms/rocm-compilersupport/blob/rawhide/f/rocm-compilersupport.spec
# Disabled by default.
%bcond device_libs_test 0
%bcond comgr_test 0
Name: rocm-llvm
Version: %{rocm_version}
Release: %autorelease
Summary: Various AMD ROCm LLVM related services
# llvm is Apache-2.0 WITH LLVM-exception OR NCSA
# hipcc is MIT, comgr and device-libs are NCSA:
License: (Apache-2.0 WITH LLVM-exception OR NCSA) AND NCSA AND MIT
URL: https://github.com/ROCm/llvm-project
#!RemoteAsset: sha256:d76a16db4a56914383029e241823f7bc2a3d645f2967dd22230f11c11cfe189e
Source0: %{url}/archive/refs/tags/rocm-%{rocm_version}.tar.gz
Source1: rocm-llvm.prep.in
# RISC-V support patches
# https://salsa.debian.org/rocm-team/rocm-llvm/-/merge_requests/2
Patch0: 0002-Use-signed-char-in-comgr-building.patch
# Backport mainline comgr patches since 7.1.1 is built on llvm-20
Patch1: 0003-adapt-comgr-api-to-llvm-21.patch
BuildRequires: clang >= %{llvm_maj_ver}
BuildRequires: clang-devel >= %{llvm_maj_ver}
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: fdupes
BuildRequires: lld >= %{llvm_maj_ver}
BuildRequires: lld-devel >= %{llvm_maj_ver}
BuildRequires: llvm-devel >= %{llvm_maj_ver}
BuildRequires: llvm-test >= %{llvm_maj_ver}
BuildRequires: pkgconfig(libffi)
BuildRequires: pkgconfig(libxml-2.0)
BuildRequires: pkgconfig(libzstd)
# comgr requires python
BuildRequires: pkgconfig(python3)
BuildRequires: pkgconfig(zlib)
BuildRequires: rocm-cmake >= %{rocm_release}
%description
%{summary}
%package macros
Summary: ROCm Compiler RPM macros for RPM Build
BuildArch: noarch
%description macros
This package contains ROCm compiler related RPM macros.
%package -n rocm-device-libs
Summary: AMD ROCm LLVM bit code libraries
%description -n rocm-device-libs
This package contains a set of AMD specific device-side language runtime
libraries in the form of bit code. Specifically:
- Open Compute library controls
- Open Compute Math library
- Open Compute Kernel library
- OpenCL built-in library
- HIP built-in library
- Heterogeneous Compute built-in library
%package -n rocm-comgr
Summary: AMD ROCm LLVM Code Object Manager
Provides: comgr(major) = %{comgr_maj_api_ver}
Provides: rocm-comgr = %{comgr_full_api_ver}-%{release}
%description -n rocm-comgr
The AMD Code Object Manager (Comgr) is a shared library which provides
operations for creating and inspecting code objects.
%package -n rocm-comgr-devel
Summary: AMD ROCm LLVM Code Object Manager
Requires: rocm-comgr%{?_isa} = %{version}-%{release}
Requires: rocm-device-libs
%description -n rocm-comgr-devel
The AMD Code Object Manager (Comgr) development package.
%package -n hipcc
Summary: HIP compiler driver
Requires: rocm-device-libs = %{version}-%{release}
Suggests: rocminfo
%description -n hipcc
hipcc is a compiler driver utility that will call clang or nvcc, depending on
target, and pass the appropriate include and library options for the target
compiler and HIP infrastructure.
hipcc will pass-through options to the target compiler. The tools calling hipcc
must ensure the compiler options are appropriate for the target compiler.
%prep
%autosetup -p1 -n llvm-project-rocm-%{rocm_version}
# llvm_maj_ver sanity check (we should be matching the bundled llvm major ver):
if ! grep -q "set(LLVM_VERSION_MAJOR %{llvm_maj_ver})" cmake/Modules/LLVMVersion.cmake; then
echo "ERROR llvm_maj_ver macro is not correctly set"
# Sakura286: ROCm 7.1.1 uses LLVM 20, but only 21 is on openRuyi. Sad.
# TODO: Need to re-enable this 'if' when rocm upstream bump to llvm-21
# exit 1
fi
# Make sure we only build the AMD bits by discarding the bundled llvm code:
ls | grep -xv "amd" | xargs rm -r
install -pm 755 %{SOURCE1} prep.sh
sed -i -e 's@%%{_prefix}@%{_prefix}@' prep.sh
sed -i -e 's@%%{_lib}@%{_lib}@' prep.sh
sed -i -e 's@%%{amd_device_libs_prefix}@%{amd_device_libs_prefix}@' prep.sh
sed -i -e 's@%%{bundle_prefix}@%{bundle_prefix}@' prep.sh
sed -i -e 's@%%{llvm_maj_ver}@%{llvm_maj_ver}@' prep.sh
grep -v '%%{' prep.sh
. ./prep.sh
%build
CLANG_VERSION=%llvm_maj_ver
# Maybe use llvm-config-%{llvm_maj_ver} in the future
LLVM_BINDIR=`%{_libdir}/llvm%{llvm_maj_ver}/bin/llvm-config --bindir`
LLVM_CMAKEDIR=`%{_libdir}/llvm%{llvm_maj_ver}/bin/llvm-config --cmakedir`
# Only enable a few GPU targets to speed up the build
GPU_TARGET="gfx1100;gfx1101;gfx1200;gfx1201"
echo "%%rocmllvm_version $CLANG_VERSION" > macros.rocmcompiler
echo "%%rocmllvm_bindir $LLVM_BINDIR" >> macros.rocmcompiler
echo "%%rocmllvm_cmakedir $LLVM_CMAKEDIR" >> macros.rocmcompiler
echo "%%rocm_gpu_list_default \"$GPU_TARGET\"" >> macros.rocmcompiler
export PATH=%{_libdir}/llvm%{llvm_maj_ver}/bin:$PATH
export INCLUDE_PATH=%{_libdir}/llvm%{llvm_maj_ver}/include
# Build device-libs first, hipcc and comgr need it
%define _vpath_srcdir amd/device-libs
%define _vpath_builddir build-devicelibs
# Workaround for bug in cmake tests not finding amdgcn:
ln -s %{amd_device_libs_prefix}/amdgcn amdgcn
#TODO ROCM_DEVICE_LIBS_BITCODE_INSTALL_LOC_* should be removed in ROCm 7.0:
%cmake -DROCM_DEVICE_LIBS_BITCODE_INSTALL_LOC_NEW="%{amd_device_libs_prefix}/amdgcn" \
-DROCM_DEVICE_LIBS_BITCODE_INSTALL_LOC_OLD="" \
-DCMAKE_EXE_LINKER_FLAGS:STRING="-fuse-ld=lld" \
%{?__cmake_build_type:-DCMAKE_BUILD_TYPE="%{__cmake_build_type}"}
%cmake_build -- %{?_smp_mflags}
# Used by comgr to find device libs when building:
export ROCM_PATH=$(realpath %__cmake_builddir)
# Build comgr
%define _vpath_srcdir amd/comgr
%define _vpath_builddir build-comgr
%cmake -DCMAKE_PREFIX_PATH=$ROCM_PATH \
-DCMAKE_MODULE_PATH=%{_libdir}/llvm%{llvm_maj_ver}/lib \
-DCMAKE_BUILD_TYPE="RELEASE" \
-DCMAKE_EXE_LINKER_FLAGS:STRING="-fuse-ld=lld" \
-DBUILD_TESTING=%{?with_comgr_test:ON}%{!?with_comgr_test:OFF}
%cmake_build -- %{?_smp_mflags}
# Build hipcc
%define _vpath_srcdir amd/hipcc
%define _vpath_builddir build-hipcc
%cmake -DHIPCC_BACKWARD_COMPATIBILITY=OFF \
-DCMAKE_EXE_LINKER_FLAGS:STRING="-fuse-ld=lld"
%cmake_build -- %{?_smp_mflags}
%check
# Test device-libs
%define _vpath_srcdir amd/device-libs
%define _vpath_builddir build-devicelibs
# Workaround for bug in cmake tests not finding amdgcn:
ln -s %{amd_device_libs_prefix}/amdgcn build-devicelibs/amdgcn
# The following tests fail:
# 6 - compile_native_rcp__gfx600 (Failed)
# 7 - compile_native_rsqrt__gfx600 (Failed)
# 10 - compile_native_rcp__gfx700 (Failed)
# 11 - compile_native_rsqrt__gfx700 (Failed)
# 14 - compile_native_rcp__gfx803 (Failed)
# 15 - compile_native_rsqrt__gfx803 (Failed)
# 18 - compile_atomic_work_item_fence__gfx803 (Failed)
# 19 - compile_atomic_work_item_fence__gfx900 (Failed)
# 20 - compile_atomic_work_item_fence__gfx90a (Failed)
# 21 - compile_atomic_work_item_fence__gfx1030 (Failed)
# 22 - compile_atomic_work_item_fence__gfx1100 (Failed)
# 23 - compile_atomic_work_item_fence__gfx1200 (Failed)
%{?with_device_libs_test:%ctest}
# Test comgr
%define _vpath_srcdir amd/comgr
%define _vpath_builddir build-comgr
# The following tests fail:
# 2 - comgr_disasm_llvm_reloc_test (SEGFAULT)
# 3 - comgr_disasm_llvm_so_test (SEGFAULT)
# 5 - comgr_disasm_options_test (SEGFAULT)
# 13 - comgr_compile_test (Failed)
# 14 - comgr_compile_minimal_test (Failed)
# 16 - comgr_compile_log_remarks_test (Failed)
# 17 - comgr_compile_source_with_device_libs_to_bc_with_vfs_test (Failed)
# 21 - comgr_get_data_isa_name_test (Failed)
# 29 - comgr_mangled_names_test (Failed)
# 30 - comgr_multithread_test (SEGFAULT)
# 32 - comgr_compile_hip_test (Failed)
# 33 - comgr_compile_hip_to_relocatable (Failed)
# 34 - comgr_mangled_names_hip_test (Failed)
# 35 - comgr_unbundle_hip_test (Failed)
%{?with_comgr_test:%ctest}
%install
# Install macros
install -Dpm 644 macros.rocmcompiler \
%{buildroot}%{_rpmmacrodir}/macros.rocmcompiler
# Install device-libs
%define _vpath_builddir build-devicelibs
%cmake_install
# Install comgr
%define _vpath_builddir build-comgr
%cmake_install
# Install hipcc
%define _vpath_builddir build-hipcc
%cmake_install
rm -f %{buildroot}%{_datadir}/doc/ROCm-Device-Libs/LICENSE.TXT
rm -rf %{buildroot}%{_datadir}/doc/amd_comgr
rm -f %{buildroot}%{_datadir}/doc/hipcc/LICENSE.txt
rm -f %{buildroot}%{_datadir}/doc/hipcc/README.md
%files macros
%{_rpmmacrodir}/macros.rocmcompiler
%files -n rocm-device-libs
%doc amd/device-libs/README.md amd/device-libs/doc/*.md
%license amd/device-libs/LICENSE.TXT
%dir %{_libdir}/cmake/AMDDeviceLibs
%{_libdir}/cmake/AMDDeviceLibs/*.cmake
%{_prefix}/%{amd_device_libs_prefix}/amdgcn
%files -n rocm-comgr
%doc amd/comgr/README.md
%license amd/comgr/LICENSE.txt
%license amd/comgr/NOTICES.txt
%{_libdir}/libamd_comgr.so.*
%files -n rocm-comgr-devel
%dir %{_includedir}/amd_comgr
%dir %{_libdir}/cmake/amd_comgr
%{_includedir}/amd_comgr/amd_comgr.h
%{_libdir}/libamd_comgr.so
%{_libdir}/cmake/amd_comgr/*.cmake
%files -n hipcc
%doc amd/hipcc/README.md
%license amd/hipcc/LICENSE.txt
%{_bindir}/hipcc
%{_bindir}/hipconfig
%{_bindir}/hipvars.pm
%changelog
%{?autochangelog}
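The `%build` section of this spec writes `macros.rocmcompiler`, which other ROCm specs then consume through `%{rocmllvm_bindir}` and `%{rocm_gpu_list_default}`. A minimal sketch of what that file looks like and how a value can be read back, assuming example paths and a hypothetical `macro_value` helper (the spec itself derives the paths from `llvm-config`):

```shell
#!/bin/sh
# Sketch: write an RPM macros file the way %build does, then read values back.
# Inside a spec the leading percent sign is doubled (%%); plain shell needs one.
macros=$(mktemp)

# Example values (assumptions); the spec obtains the paths from llvm-config.
CLANG_VERSION=21
LLVM_BINDIR=/usr/lib64/llvm21/bin
GPU_TARGET="gfx1100;gfx1101;gfx1200;gfx1201"

{
    echo "%rocmllvm_version $CLANG_VERSION"
    echo "%rocmllvm_bindir $LLVM_BINDIR"
    echo "%rocm_gpu_list_default \"$GPU_TARGET\""
} > "$macros"

# Hypothetical lookup helper: print the body of macro NAME from FILE.
# usage: macro_value FILE NAME
macro_value() {
    awk -v name="%$2" '$1 == name { sub(/^[^ ]+ /, ""); print }' "$1"
}

bindir=$(macro_value "$macros" rocmllvm_bindir)
gpus=$(macro_value "$macros" rocm_gpu_list_default)
rm -f "$macros"
echo "$bindir"
echo "$gpus"
```

The double quotes are part of the macro body, so an option such as `-DAMDGPU_TARGETS=%{rocm_gpu_list_default}` expands to a quoted, semicolon-separated list that the shell passes to cmake as a single argument.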
@@ -0,0 +1,32 @@
From 3d470b52d1cc06a3e33cbf38bdde6d3f4f008a25 Mon Sep 17 00:00:00 2001
From: Tom Rix <Tom.Rix@amd.com>
Date: Mon, 3 Nov 2025 06:33:31 -0800
Subject: [PATCH] rocm-origami remove scope for variables
---
cmake/origami-config.cmake.in | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/cmake/origami-config.cmake.in b/cmake/origami-config.cmake.in
index d6c8000d0261..19370e7dd5bd 100644
--- a/cmake/origami-config.cmake.in
+++ b/cmake/origami-config.cmake.in
@@ -6,15 +6,4 @@ find_dependency(hip REQUIRED)
include("${CMAKE_CURRENT_LIST_DIR}/origami-targets.cmake")
-block(SCOPE_FOR VARIABLES)
- if(NOT TARGET origami::origami)
- message(FATAL_ERROR "origami::origami target is missing")
- endif()
-
- get_target_property(link_libraries origami::origami INTERFACE_LINK_LIBRARIES)
-
- if(link_libraries AND "hip::device" IN_LIST link_libraries)
- message(FATAL_ERROR "Do not export targets with hip::device as an interface link library")
- endif()
-endblock()
--
2.51.0
@@ -0,0 +1,81 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
Name: rocm-origami
Version: %{rocm_version}
Release: %autorelease
Summary: Analytical GEMM Solution Selection
License: MIT
Url: https://github.com/ROCm/rocm-libraries
#!RemoteAsset: sha256:1fb56e620a06e198aeec2cf37c11e6879d0c67c62e295b48779b7f486e34acb4
Source0: %{url}/releases/download/rocm-%{version}/origami.tar.gz
# License file is not included in the release tarball
#!RemoteAsset: sha256:b185aaa652b0bf066c37a0d6314ce4bf4521e4a3c9bf46edd2f6a777ac522223
Source1: https://raw.githubusercontent.com/ROCm/rocm-libraries/develop/shared/origami/LICENSE.md
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DCMAKE_VERBOSE_MAKEFILE=ON
# Workaround hipblaslt build issue:
# origami::origami target is missing
# https://github.com/ROCm/rocm-libraries/issues/2422
Patch0: 0001-rocm-origami-remove-scope-for-variables.patch
BuildRequires: clang
BuildRequires: cmake
BuildRequires: cmake(hip)
BuildRequires: lld
BuildRequires: llvm
BuildRequires: rocm-cmake
BuildRequires: rocm-llvm-macros
BuildRequires: ninja
%description
The name "origami" still evokes the elegance of transforming
a flat (2-D) sheet into intricate higher dimensional
structures. In this context, however, Origami has evolved
into a tool set for GEMM solution selection and optimization.
Inspired by the art of paper folding, the library now enables
users to explore a range of tiling and mapping configurations
and to make informed decisions on data and computation mapping
for high-performance GEMM operations.
%package devel
Summary: Libraries and headers for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
%{summary}
%prep -a
# License file is not in the tarball
cp %{SOURCE1} .
# Use system rocm-cmake, no downloading
sed -i -e 's@if(NOT ROCM_FOUND)@if(FALSE)@' cmake/dependencies.cmake
# We are building from a tarball, not a git repo
sed -i -e 's@find_package(Git REQUIRED)@#find_package(Git REQUIRED)@' cmake/dependencies.cmake
%install -a
rm -f %{buildroot}%{_datadir}/doc/origami/LICENSE.md
%files
%doc README.md
%license LICENSE.md
%{_libdir}/liborigami.so.0{,.*}
%files devel
%{_includedir}/origami/
%{_libdir}/cmake/origami/
%{_libdir}/liborigami.so
%changelog
%autochangelog
@@ -1,69 +0,0 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global rocm_version 7.1.1
Name: rocprim
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm parallel primitives
License: MIT AND BSD-3-Clause
URL: https://github.com/ROCm/rocPRIM
#!RemoteAsset
Source0: %{url}/archive/rocm-%{rocm_version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -DBUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF
BuildOption(conf): -DBUILD_TEST=OFF
BuildOption(conf): -DCMAKE_AR=%{rocmllvm_bindir}/llvm-ar
BuildOption(conf): -DCMAKE_C_COMPILER=%{rocmllvm_bindir}/clang
BuildOption(conf): -DCMAKE_CXX_COMPILER=%{rocmllvm_bindir}/clang++
BuildOption(conf): -DCMAKE_LINKER=%{rocmllvm_bindir}/ld.lld
BuildOption(conf): -DCMAKE_PREFIX_PATH=%{rocmllvm_cmakedir}/..
BuildOption(conf): -DCMAKE_RANLIB=%{rocmllvm_bindir}/llvm-ranlib
BuildOption(conf): -DGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DROCM_SYMLINK_LIBS=OFF
BuildRequires: clang-tools-extra-devel
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(Clang)
BuildRequires: cmake(hip)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(LLD)
BuildRequires: cmake(LLVM)
BuildRequires: gcc-c++
BuildRequires: python3
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
BuildRequires: rocminfo
%description
The rocPRIM is a header-only library providing HIP parallel primitives
for developing performant GPU-accelerated code on AMD ROCm platform.
%package devel
Summary: ROCm parallel primitives
BuildArch: noarch
%description devel
The rocPRIM is a header-only library providing HIP parallel primitives
for developing performant GPU-accelerated code on AMD ROCm platform.
%install -a
rm -f %{buildroot}%{_prefix}/share/doc/rocprim/LICENSE.md
%files devel
%doc README.md
%license LICENSE.md
%license NOTICES.txt
%{_includedir}/%{name}
%{_libdir}/cmake/rocprim
%changelog
%{?autochangelog}
@@ -1,42 +0,0 @@
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 2fdf5c4..0aac139 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -78,7 +78,8 @@ set(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake ${PROJECT_SOURCE_DIR}/cmake/Mo
include(GNUInstallDirs) # install directories
# ROCm does not use lib64
-set(CMAKE_INSTALL_LIBDIR "lib")
+# But distributions may use lib64, so keep the GNUInstallDirs default
+# set(CMAKE_INSTALL_LIBDIR "lib")
include(rocprofiler_register_utilities) # various functions/macros
include(rocprofiler_register_interfaces) # interface libraries
@@ -113,6 +114,7 @@ if(ROCPROFILER_REGISTER_BUILD_SAMPLES)
add_subdirectory(samples)
endif()
-include(rocprofiler_register_config_packaging)
+# Distribution packaging does not need CPack
+# include(rocprofiler_register_config_packaging)
rocprofiler_register_print_features()
diff --git a/source/lib/rocprofiler-register/CMakeLists.txt b/source/lib/rocprofiler-register/CMakeLists.txt
index e15fa88..6d9591e 100644
--- a/source/lib/rocprofiler-register/CMakeLists.txt
+++ b/source/lib/rocprofiler-register/CMakeLists.txt
@@ -20,10 +20,13 @@ target_include_directories(
target_link_libraries(
rocprofiler-register
PUBLIC rocprofiler-register::headers
- PRIVATE fmt::fmt glog::glog rocprofiler-register::build-flags
+ PRIVATE fmt glog rocprofiler-register::build-flags
rocprofiler-register::memcheck rocprofiler-register::stdcxxfs
rocprofiler-register::dl)
+target_compile_definitions(rocprofiler-register
+ PRIVATE GLOG_USE_GLOG_EXPORT)
+
set_target_properties(
rocprofiler-register
PROPERTIES OUTPUT_NAME rocprofiler-register
@@ -1,78 +0,0 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global rocm_version 7.1.0
Name: rocprofiler-register
Version: %{rocm_version}
Release: %autorelease
Summary: A rocprofiler helper library
License: MIT AND BSD-3-Clause
Url: https://github.com/ROCm/rocprofiler-register
#!RemoteAsset
Source0: %{url}/archive/rocm-%{rocm_version}.tar.gz
BuildSystem: cmake
# Use system glog
Patch0: 0001-use-system-buildreq.patch
BuildOption(conf): -DROCPROFILER_REGISTER_BUILD_FMT=OFF
BuildOption(conf): -DROCPROFILER_REGISTER_BUILD_GLOG=OFF
BuildRequires: cmake
BuildRequires: cmake(glog)
BuildRequires: gcc-c++
BuildRequires: pkgconfig(fmt)
BuildRequires: pkgconfig(gflags)
%description
The rocprofiler-register library is a helper library that coordinates
the modification of the intercept API table(s) of the HSA/HIP/ROCTx
runtime libraries by the ROCprofiler (v2) library. The purpose of this
library is to provide a consistent and automated mechanism of enabling
performance analysis in the ROCm runtimes which does not rely on
environment variables or unique methods for each runtime library.
When a runtime is initialized (either explicitly or lazily) and the
intercept API table is constructed, it passes this API table to
rocprofiler-register. Rocprofiler-register scans the symbols in the
address space and if it detects there is at least one visible symbol
named rocprofiler_configure (which is a function provided by tools),
it passes the intercept API table to the rocprofiler library (dlopening
the rocprofiler library if it is not already loaded). The rocprofiler
library then does an extensive scan for all the instances of the
rocprofiler_configure symbols and invokes each of them. The
rocprofiler_configure function (again, provided by a tool) returns
effectively tells rocprofiler which behaviors it wants to be notified
about, features it wants to use (e.g. API tracing, kernel dispatch
timing), etc.
%package devel
Summary: The development package for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
%{summary}
%install -a
# Do not install the test source etc
rm -rf %{buildroot}%{_prefix}/share/rocprofiler-register
rm -rf %{buildroot}%{_prefix}/share/modulefiles
rm -rf %{buildroot}%{_prefix}/share/doc/rocprofiler-register/LICENSE.md
%files
%license LICENSE.md
%{_libdir}/librocprofiler-register.so.0{,.*}
%files devel
%doc README.md
%{_includedir}/rocprofiler-register/
%{_libdir}/librocprofiler-register.so
%{_libdir}/cmake/rocprofiler-register/
%changelog
%{?autochangelog}
@@ -0,0 +1,98 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: Sakura286 <chenxuan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
# rocRAND needs a GPU to run tests, but we could still
# keep the test cases for packagers who have a GPU, so make it optional.
%bcond test 0
%if %{with test}
%global build_test ON
%else
%global build_test OFF
%endif
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
# rocm builds with clang
%global toolchain clang
Name: rocrand
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm random number generator
License: MIT AND BSD-3-Clause
Url: https://github.com/ROCm/rocRAND
#!RemoteAsset: sha256:15c33c595aa8e4de1d8b3736df9eaf2ceba7914ffebe718f0997b0da28215d9e
Source: %{url}/archive/rocm-%{version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DAMDGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DBUILD_TEST=%{build_test}
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
%if %{with test}
BuildRequires: cmake(GTest)
%endif
BuildRequires: cmake(hip)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: compiler-rt
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
%description
The rocRAND project provides functions that generate pseudo-random and
quasi-random numbers.
The rocRAND library is implemented in the HIP programming language and
optimized for AMD's latest discrete GPUs. It is designed to run on top of AMD's
Radeon Open Compute ROCm runtime, but it also works on CUDA enabled GPUs.
%package devel
Summary: The rocRAND development package
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
The rocRAND development package.
%if %{with test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%install -a
rm -f %{buildroot}%{_datadir}/doc/rocrand/LICENSE.md
%files
%doc README.md
%license LICENSE.md
%{_libdir}/librocrand.so.1{,.*}
%files devel
%{_includedir}/rocrand/
%{_libdir}/cmake/rocrand/
%{_libdir}/librocrand.so
%if %{with test}
%files test
%{_bindir}/rocRAND/
%{_bindir}/test_*
%endif
%changelog
%autochangelog
@@ -1,49 +0,0 @@
From 22d2be00dc2289037144abeffab7a5526a8014ea Mon Sep 17 00:00:00 2001
From: Tom Rix <Tom.Rix@amd.com>
Date: Thu, 30 Oct 2025 11:27:03 -0700
Subject: [PATCH] rocsolver ninja job pools
---
CMakeLists.txt | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 32757570f70f..003b37f98fc5 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -125,6 +125,32 @@ option(BUILD_SHARED_LIBS "Build rocSOLVER as a shared library" ON)
include(util)
include(CheckLanguage)
include(CMakeDependentOption)
+#
+# Separate linking jobs from compiling
+# Too many concurrent linking jobs can break the build
+# Copied from LLVM
+set(ROCSOLVER_PARALLEL_LINK_JOBS "" CACHE STRING
+ "Define the maximum number of concurrent link jobs (Ninja only).")
+if(CMAKE_GENERATOR MATCHES "Ninja")
+ if(ROCSOLVER_PARALLEL_LINK_JOBS)
+ set_property(GLOBAL APPEND PROPERTY JOB_POOLS link_job_pool=${ROCSOLVER_PARALLEL_LINK_JOBS})
+ set(CMAKE_JOB_POOL_LINK link_job_pool)
+ endif()
+elseif(ROCSOLVER_PARALLEL_LINK_JOBS)
+ message(WARNING "Job pooling is only available with Ninja generators.")
+endif()
+# Similar for compiling
+set(ROCSOLVER_PARALLEL_COMPILE_JOBS "" CACHE STRING
+ "Define the maximum number of concurrent compile jobs (Ninja only).")
+if(CMAKE_GENERATOR MATCHES "Ninja")
+ if(ROCSOLVER_PARALLEL_COMPILE_JOBS)
+ set_property(GLOBAL APPEND PROPERTY JOB_POOLS compile_job_pool=${ROCSOLVER_PARALLEL_COMPILE_JOBS})
+ set(CMAKE_JOB_POOL_COMPILE compile_job_pool)
+ endif()
+elseif(ROCSOLVER_PARALLEL_COMPILE_JOBS)
+ message(WARNING "Job pooling is only available with Ninja generators.")
+endif()
+
include(CheckCXXCompilerFlag)
option(BUILD_TESTING "Build rocSOLVER tests" OFF)
--
2.51.0
@@ -1,42 +0,0 @@
From 10affbe2ed6ad66b8a3940bc077161d71d8a8d54 Mon Sep 17 00:00:00 2001
From: Tom Rix <Tom.Rix@amd.com>
Date: Thu, 30 Oct 2025 11:38:27 -0700
Subject: [PATCH] rocsolver parallel jobs
---
CMakeLists.txt | 3 +++
library/src/CMakeLists.txt | 4 ++++
2 files changed, 7 insertions(+)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 003b37f98fc5..1f93a519e537 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -177,6 +177,9 @@ option(WERROR "Treat warnings as errors" OFF)
option(BUILD_COMPRESSED_DBG "Enable compressed debug symbols" ON)
check_cxx_compiler_flag("--offload-compress" CXX_COMPILER_SUPPORTS_OFFLOAD_COMPRESS)
cmake_dependent_option(BUILD_OFFLOAD_COMPRESS "Build with offload compression" ON CXX_COMPILER_SUPPORTS_OFFLOAD_COMPRESS OFF)
+check_cxx_compiler_flag("-parallel-jobs=4" CXX_COMPILER_SUPPORTS_PARALLEL_HIP_JOBS)
+cmake_dependent_option(BUILD_PARALLEL_HIP_JOBS "Build with parallel hip jobs" ON CXX_COMPILER_SUPPORTS_PARALLEL_HIP_JOBS OFF)
+
message(STATUS "Tests: ${BUILD_CLIENTS_TESTS}")
message(STATUS "Benchmarks: ${BUILD_CLIENTS_BENCHMARKS}")
diff --git a/library/src/CMakeLists.txt b/library/src/CMakeLists.txt
index b39646ee0f1d..7c5cc98b19ba 100755
--- a/library/src/CMakeLists.txt
+++ b/library/src/CMakeLists.txt
@@ -448,6 +448,10 @@ if(BUILD_OFFLOAD_COMPRESS)
target_compile_options(rocsolver PRIVATE "--offload-compress")
endif()
+if(BUILD_PARALLEL_HIP_JOBS)
+ target_compile_options(rocsolver PRIVATE "-parallel-jobs=4")
+endif()
+
target_include_directories(rocsolver
PUBLIC
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/library/include>
--
2.51.0
@@ -1,109 +0,0 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%global rocm_version 7.1.1
# These consume too much build time
%bcond test 0
%bcond sample 0
%bcond benchmark 0
Name: rocsolver
Version: %{rocm_version}
Release: %autorelease
Summary: Next generation LAPACK implementation for ROCm platform
License: BSD-3-Clause AND BSD-2-Clause
Url: https://github.com/ROCm/rocSOLVER
#!RemoteAsset
Source0: %{url}/archive/rocm-%{rocm_version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DCMAKE_CXX_COMPILER=hipcc
BuildOption(conf): -DCMAKE_C_COMPILER=clang
BuildOption(conf): -DCMAKE_AR=%rocmllvm_bindir/llvm-ar
BuildOption(conf): -DCMAKE_RANLIB=%rocmllvm_bindir/llvm-ranlib
BuildOption(conf): -DCMAKE_PREFIX_PATH=%{rocmllvm_cmakedir}/..
BuildOption(conf): -DBUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF
BuildOption(conf): -DROCM_SYMLINK_LIBS=OFF
BuildOption(conf): -DHIP_PLATFORM=amd
BuildOption(conf): -DAMDGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DBUILD_OFFLOAD_COMPRESS=ON
BuildOption(conf): -DBUILD_PARALLEL_HIP_JOBS=ON
BuildOption(conf): -DBUILD_CLIENTS_TESTS=%{?with_test:ON}%{!?with_test:OFF}
BuildOption(conf): -DBUILD_CLIENTS_BENCHMARKS=%{?with_benchmark:ON}%{!?with_benchmark:OFF}
BuildOption(conf): -DBUILD_CLIENTS_SAMPLES=%{?with_sample:ON}%{!?with_sample:OFF}
# https://github.com/ROCm/rocSOLVER/pull/652
Patch0: 0001-rocsolver-ninja-job-pools.patch
# https://github.com/ROCm/rocSOLVER/pull/962
Patch1: 0001-rocsolver-parallel-jobs.patch
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(fmt)
BuildRequires: cmake(hip)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(rocblas)
BuildRequires: cmake(rocprim)
BuildRequires: compiler-rt
BuildRequires: gcc-c++
BuildRequires: hipcc
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: pkgconfig(libzstd)
BuildRequires: rocm-cmake
BuildRequires: rocm-llvm-macros
BuildRequires: rocsparse-devel
BuildRequires: rocminfo
Provides: rocsolver = %{version}-%{release}
%description
rocSOLVER is a work-in-progress implementation of a subset
of LAPACK functionality on the ROCm platform.
%package devel
Summary: Libraries and headers for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
%{summary}
%if %{with test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%install -a
rm -f %{buildroot}%{_prefix}/share/doc/rocsolver/LICENSE.md
%files
%license LICENSE.md
%doc README.md
%{_libdir}/librocsolver.so.0{,.*}
%files devel
%{_includedir}/rocsolver/
%{_libdir}/librocsolver.so
%{_libdir}/cmake/rocsolver/
%if %{with test}
%files test
%{_datadir}/rocsolver/
%{_bindir}/rocsolver*
%endif
%changelog
%{?autochangelog}
@@ -1,134 +0,0 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
%bcond test 0
%global rocm_version 7.1.1
Name: rocsparse
Version: %{rocm_version}
Release: %autorelease
Summary: SPARSE implementation for ROCm
License: MIT
Url: https://github.com/ROCm/rocSPARSE
#!RemoteAsset
Source0: %{url}/archive/rocm-%{rocm_version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -DBUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF
BuildOption(conf): -DBUILD_WITH_OFFLOAD_COMPRESS=ON
BuildOption(conf): -DCMAKE_CXX_COMPILER=hipcc
BuildOption(conf): -DCMAKE_C_COMPILER=clang
BuildOption(conf): -DCMAKE_LINKER=%rocmllvm_bindir/ld.lld
BuildOption(conf): -DCMAKE_AR=%rocmllvm_bindir/llvm-ar
BuildOption(conf): -DCMAKE_RANLIB=%rocmllvm_bindir/llvm-ranlib
BuildOption(conf): -DCMAKE_PREFIX_PATH=%{rocmllvm_cmakedir}/..
BuildOption(conf): -DHIP_PLATFORM=amd
BuildOption(conf): -DROCM_SYMLINK_LIBS=OFF
BuildOption(conf): -DBUILD_CLIENTS_BENCHMARKS=%{?with_test:ON}%{!?with_test:OFF}
BuildOption(conf): -DBUILD_CLIENTS_TESTS=%{?with_test:ON}%{!?with_test:OFF}
BuildOption(conf): -DBUILD_CLIENTS_TESTS_OPENMP=OFF
BuildOption(conf): -DBUILD_FORTRAN_CLIENTS=OFF
BuildOption(conf): -DGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -G Ninja
%if %{with test}
BuildOption(conf): -DCMAKE_MATRICES_DIR=%{_builddir}/rocsparse-test-matrices/
%endif
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: compiler-rt
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(hip)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: cmake(rocprim)
BuildRequires: hipcc
BuildRequires: lld
BuildRequires: llvm
BuildRequires: gcc-c++
BuildRequires: ninja
BuildRequires: pkgconfig(libzstd)
BuildRequires: python3
BuildRequires: rocm-cmake
BuildRequires: rocm-llvm-macros
BuildRequires: rocminfo
%if %{with test}
BuildRequires: cmake(GTest)
BuildRequires: cmake(rocblas)
BuildRequires: cmake(openmp)
BuildRequires: gcc-gfortran
BuildRequires: python3dist(pyyaml)
%endif
Provides: %{name} = %{version}-%{release}
%description
rocSPARSE exposes a common interface that provides Basic
Linear Algebra Subroutines for sparse computation
implemented on top of AMD's Radeon Open eCosystem Platform
ROCm runtime and toolchains. rocSPARSE is created using
the HIP programming language and optimized for AMD's
latest discrete GPUs.
%package devel
Summary: Libraries and headers for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
%{summary}
%if %{with test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%prep -a
# /usr/include/gtest/internal/gtest-port.h:273:2: error: C++ versions less than C++17 are not supported.
# The system GTest requires C++17, so raise CMAKE_CXX_STANDARD from 14 to 17.
sed -i -e 's@set(CMAKE_CXX_STANDARD 14)@set(CMAKE_CXX_STANDARD 17)@' {,clients/}CMakeLists.txt
%install -a
rm -f %{buildroot}%{_prefix}/share/doc/rocsparse/LICENSE.md
%if %{with test}
mkdir -p %{buildroot}/%{_datadir}/rocsparse/matrices
install -pm 644 %{_builddir}/rocsparse-test-matrices/* %{buildroot}/%{_datadir}/rocsparse/matrices
%endif
%check
%if %{with test}
export LD_LIBRARY_PATH=%{_vpath_builddir}/library:$LD_LIBRARY_PATH
%{_vpath_builddir}/clients/staging/rocsparse-test
%endif
%files
%doc README.md
%license LICENSE.md
%{_libdir}/librocsparse.so.1{,.*}
%files devel
%{_includedir}/rocsparse/
%{_libdir}/librocsparse.so
%{_libdir}/cmake/rocsparse/
%if %{with test}
%files test
%{_bindir}/rocsparse*
%{_datadir}/rocsparse/test/rocsparse_*
%{_datadir}/rocsparse/
%{_libdir}/rocsparse/
%{_libexecdir}/rocsparse/
%endif
%changelog
%{?autochangelog}
@@ -0,0 +1,108 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
# The rocThrust tests need a GPU to run, so they are disabled by default.
# Packagers who have a GPU can still enable them through the test bcond.
%bcond test 0
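# A sketch of enabling the tests locally, assuming the spec is saved as
# rocthrust.spec (the file name is not shown here) and a GPU is available:
#   rpmbuild -ba --with test rocthrust.spec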
%if %{with test}
%global build_test ON
%else
%global build_test OFF
%endif
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
# rocm builds with clang
%global toolchain clang
Name: rocthrust
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm Thrust library
Url: https://github.com/ROCm/rocThrust
VCS: git:https://github.com/ROCm/rocThrust.git
License: Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND BSL-1.0 AND MIT
# All files are Apache 2.0 with some exceptions:
# ./cmake contains only files under MIT
# ./internal/benchmark/*.py are dual licensed Apache 2.0 and Boost 1.0
# ./thrust/ contain some header files that are Boost 1.0 licensed
# ./thrust/ contain some headers that are dual Apache 2.0 and Boost 1.0
# ./thrust/cmake/FindTBB.cmake is public domain
# ./thrust/detail/allocator/allocator_traits.h is dual Apache 2.0 and MIT
# ./thrust/detail/complex contains BSD 2 clause licensed headers
#!RemoteAsset: sha256:995f9498402f207d04aac1edeb845abea295f6f132151ae1e04a6f0d0dc5edf5
Source: %{url}/archive/rocm-%{version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DAMDGPU_TARGETS=%{rocm_gpu_list_default}
BuildOption(conf): -DBUILD_TEST=%{build_test}
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
%if %{with test}
BuildRequires: cmake(GTest)
%endif
BuildRequires: cmake(hip)
BuildRequires: cmake(rocprim)
BuildRequires: compiler-rt
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
%description
Thrust is a parallel algorithm library. This library has been
ported to HIP/ROCm platform, which uses the rocPRIM library.
%package devel
Summary: Libraries and headers for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
%{summary}
%if %{with test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%prep -a
# ROCMExportTargetsHeaderOnly.cmake hardcodes 'lib' as the library directory.
# Change it to the correct platform-specific library directory.
sed -i -e 's/ROCM_INSTALL_LIBDIR lib/ROCM_INSTALL_LIBDIR %{_lib}/' cmake/ROCMExportTargetsHeaderOnly.cmake
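# Illustrative only: if %%{_lib} expands to lib64 and the file contained the
# hypothetical line 'set(ROCM_INSTALL_LIBDIR lib)', the substitution would
# behave like this:
#   echo 'set(ROCM_INSTALL_LIBDIR lib)' | sed -e 's/ROCM_INSTALL_LIBDIR lib/ROCM_INSTALL_LIBDIR lib64/'
#   # prints: set(ROCM_INSTALL_LIBDIR lib64)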
%install -a
rm -f %{buildroot}%{_docdir}/rocthrust/LICENSE
%files
%doc README.md
%license LICENSE
%license NOTICES.txt
%files devel
%{_includedir}/thrust/
%{_libdir}/cmake/rocthrust/
%if %{with test}
%files test
%{_bindir}/test_*
%{_bindir}/rocthrust/
%endif
%changelog
%autochangelog
@@ -0,0 +1,106 @@
# SPDX-FileCopyrightText: (C) 2026 Institute of Software, Chinese Academy of Sciences (ISCAS)
# SPDX-FileCopyrightText: (C) 2026 openRuyi Project Contributors
# SPDX-FileContributor: CHEN Xuan <chenxuan@iscas.ac.cn>
# SPDX-FileContributor: Yifan Xu <xuyifan@iscas.ac.cn>
#
# SPDX-License-Identifier: MulanPSL-2.0
# The roctracer tests need a GPU to run, so they are disabled by default.
# Packagers who have a GPU can still enable them through the test bcond.
%bcond test 0
%if %{with test}
%global build_test ON
%else
%global build_test OFF
%endif
%global rocm_release 7.1
%global rocm_patch 1
%global rocm_version %{rocm_release}.%{rocm_patch}
# rocm stack builds with clang
%global toolchain clang
Name: roctracer
Version: %{rocm_version}
Release: %autorelease
Summary: ROCm Tracer Callback/Activity Library
Url: https://github.com/ROCm/roctracer
VCS: git:https://github.com/ROCm/roctracer.git
License: MIT
#!RemoteAsset: sha256:dec80803c6d2d684759172145177849efda65672645b95a2f2ad1a84335043bb
Source: %{url}/archive/rocm-%{version}.tar.gz
BuildSystem: cmake
BuildOption(conf): -G Ninja
BuildOption(conf): -DGPU_TARGETS=%{rocm_gpu_list_default}
BuildRequires: clang
BuildRequires: clang-tools-extra
BuildRequires: cmake
BuildRequires: cmake(amd_comgr)
BuildRequires: cmake(hip)
BuildRequires: cmake(hsa-runtime64)
BuildRequires: compiler-rt
BuildRequires: lld
BuildRequires: llvm
BuildRequires: ninja
BuildRequires: pkgconfig(atomic_ops)
BuildRequires: python3dist(cppheaderparser)
BuildRequires: rocm-cmake
BuildRequires: rocm-device-libs
BuildRequires: rocm-llvm-macros
%description
roctracer is a callback and activity tracing library for ROCm. It provides
function call tracing for HIP and other ROCm runtimes, activity (asynchronous)
tracing, and ROCTx user-defined event markers.
%package devel
Summary: The roctracer development package
Requires: %{name}%{?_isa} = %{version}-%{release}
%description devel
The roctracer development package.
%if %{with test}
%package test
Summary: Tests for %{name}
Requires: %{name}%{?_isa} = %{version}-%{release}
%description test
%{summary}
%endif
%prep -a
# Upstream CMake has no option to disable the tests, so comment out the test subdirectory.
%if %{without test}
sed -i -e 's@add_subdirectory(test)@#add_subdirectory(test)@' CMakeLists.txt
%else
# Point the test runner script at the platform-specific library directory.
# The dots are escaped so that they match literally.
sed -i -e 's@\.\./lib/@../%{_lib}/@' test/run.sh
%endif
%install -a
rm -f %{buildroot}%{_datadir}/doc/%{name}/LICENSE.md
rm -rf %{buildroot}%{_datadir}/doc/%{name}-asan
%files
%license LICENSE.md
%doc README.md
%{_libdir}/libroctracer64.so.*
%{_libdir}/libroctx64.so.*
%{_libdir}/roctracer/
%files devel
%{_includedir}/roctracer/
%{_libdir}/libroctracer64.so
%{_libdir}/libroctx64.so
%if %{with test}
%files test
%{_datadir}/roctracer/
%endif
%changelog
%autochangelog