- Nov 07, 2022
Nathanaël Schaeffer authored
- Nov 01, 2022
Nathanaël Schaeffer authored
- Oct 31, 2022
Nathanaël Schaeffer authored
Nathanaël Schaeffer authored
- Oct 26, 2022
Nathanaël Schaeffer authored
- Oct 25, 2022
Nathanaël Schaeffer authored
CUDA: decouple LSPAN from BLOCKSIZE in leg_m_kernel() for greater flexibility and better performance with multiple warps per block.
- Oct 24, 2022
Nathanaël Schaeffer authored
HIP/CUDA: new flag SHT_ALLOW_SH2ISH_FUSE to fuse sh2ishioka() into leg_m_kernel() for better performance.
Nathanaël Schaeffer authored
Nathanaël Schaeffer authored
CUDA: allow NW>1 for HI_LLIM in leg_m_kernel() with the polar optimization, leading to performance improvements (5-10% on V100)
Nathanaël Schaeffer authored
CUDA: in (i)leg_m_kernel(), swap the z and y dimensions of the grid (possibly better cache usage), for a 3% performance improvement on V100
- Oct 21, 2022
Nathanaël Schaeffer authored
Nathanaël Schaeffer authored
- Oct 20, 2022
Nathanaël Schaeffer authored
- Oct 08, 2022
Nathanaël Schaeffer authored
- Sep 26, 2022
Nathanaël Schaeffer authored
Nathanaël Schaeffer authored
- Jun 30, 2022
Nathanael Schaeffer authored
- Jun 24, 2022
Nathanael Schaeffer authored
- Jun 20, 2022
Nathanael Schaeffer authored
- Dec 17, 2021
Nathanaël Schaeffer authored
- Dec 10, 2021
Nathanaël Schaeffer authored
fix(batch gpu vector transforms): in batch mode, the on-device vector transform functions cu_SH*_to_spat() and cu_spat_to_SH*() were wrong.
- Nov 30, 2021
Nathanaël Schaeffer authored
- Nov 17, 2021
Nathanaël Schaeffer authored
- Nov 16, 2021
Nathanaël Schaeffer authored
- Nov 15, 2021
Nathanaël Schaeffer authored
Nathanael Schaeffer authored
Nathanaël Schaeffer authored
Nathanael Schaeffer authored
Nathanaël Schaeffer authored
Nathanaël Schaeffer authored
Nathanaël Schaeffer authored
Nathanaël Schaeffer authored
- Nov 12, 2021
Nathanaël Schaeffer authored
- Nov 09, 2021
Nathanael Schaeffer authored
feat(batch): shtns_set_batch() must now be called before shtns_set_grid*(). This avoids planning for a single transform before planning for many.
- Nov 07, 2021
Nathanael Schaeffer authored
Nathanael Schaeffer authored
feat(cuda): change the way CUDA streams are managed; cushtns_set_streams() should now be called just after shtns_create()
Nathanaël Schaeffer authored
Nathanaël Schaeffer authored
- Nov 06, 2021
Alexandru Fikl authored
- Nov 04, 2021
Nathanael Schaeffer authored