File size: 1,143 Bytes
be11144
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
Directions for compiling and running the benchmark with Ubuntu Linux:

Install Intel's Threading Building Blocks library (TBB):
$ sudo apt-get install libtbb-dev

Compile the benchmark:
$ nvcc -O3 -arch=sm_20 bench.cu -ltbb -o bench

Run the benchmark:
$ ./bench

Typical output (Tesla C2050):

Benchmarking with input size 33554432
Core Primitive Performance (elements per second)
      Algorithm,          STL,          TBB,       Thrust
         reduce,   3121746688,   3739585536,  26134038528
      transform,   1869492736,   2347719424,  13804681216
           scan,   1394143744,   1439394816,   5039195648
           sort,     11070660,     34622352,    673543168
Sorting Performance (keys per second)
  Type,          STL,          TBB,       Thrust
  char,     24050078,     62987040,   2798874368
 short,     15644141,     41275164,   1428603008
   int,     11062616,     33478628,    682295744
  long,     11249874,     33972564,    219719184
 float,      9850043,     29011806,    692407232
double,      9700181,     27153626,    224345568

The reported numbers are performance rates in "elements per second" (higher is better).