Directions for compiling and running the benchmark on Ubuntu Linux:
Install Intel's Threading Building Blocks library (TBB):
$ sudo apt-get install libtbb-dev
Compile the benchmark:
$ nvcc -O3 -arch=sm_20 bench.cu -ltbb -o bench
Run the benchmark:
$ ./bench
Typical output (Tesla C2050):
Benchmarking with input size 33554432
Core Primitive Performance (elements per second)
Algorithm, STL, TBB, Thrust
reduce, 3121746688, 3739585536, 26134038528
transform, 1869492736, 2347719424, 13804681216
scan, 1394143744, 1439394816, 5039195648
sort, 11070660, 34622352, 673543168
Sorting Performance (keys per second)
Type, STL, TBB, Thrust
char, 24050078, 62987040, 2798874368
short, 15644141, 41275164, 1428603008
int, 11062616, 33478628, 682295744
long, 11249874, 33972564, 219719184
float, 9850043, 29011806, 692407232
double, 9700181, 27153626, 224345568
The reported numbers are throughput rates in elements (or keys) per second; higher is better.
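As a rough illustration of how such a rate can be derived, the sketch below times a single std::accumulate pass (the STL counterpart of the "reduce" row) and divides the element count by the elapsed seconds. This is not taken from bench.cu: the helper names elements_per_second and measure_reduce_rate are hypothetical, and the actual benchmark may average over several trials or time things differently.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Rate in elements per second: elements processed divided by elapsed seconds.
// (Hypothetical helper, not part of the benchmark source.)
double elements_per_second(std::size_t n, double seconds) {
    return static_cast<double>(n) / seconds;
}

// Hypothetical helper: times one std::accumulate pass over n ints
// and returns the measured rate.
double measure_reduce_rate(std::size_t n) {
    assert(n > 0);
    std::vector<int> data(n, 1);

    auto start = std::chrono::steady_clock::now();
    // volatile keeps the compiler from discarding the reduction at -O3.
    volatile long long sum = std::accumulate(data.begin(), data.end(), 0LL);
    auto stop = std::chrono::steady_clock::now();
    (void)sum;

    std::chrono::duration<double> elapsed = stop - start;
    return elements_per_second(n, elapsed.count());
}
```

Calling measure_reduce_rate(33554432) would use the same input size as the sample run, although the result depends on the machine and compiler flags. Rates from different columns can be compared directly: in the sample output, the Thrust reduce rate is roughly 8.4 times the STL rate (26134038528 / 3121746688).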