As a starting point, I used the code from the previous post, "The summation from 1 to n". I moved that code into a class called GpuInterface, which is defined in GpuSolver.cu and declared in the header file GpuSolver.h, and wrote a small main.cpp that uses the class. The files are shown below:
GpuSolver.h
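#ifndef EXAMPLE6_H
#define EXAMPLE6_H
// Plain C++ declaration (no CUDA syntax), so g++ can compile any file that includes it
class GpuInterface
{
public:
    int n[20];   // host-side data; fixed capacity of 20 ints
    int y;       // number of elements of n actually in use
    int asize;   // size in bytes of those y elements
    GpuInterface();
    int calculateSum();
    void setY(int);
};
#endif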
GpuSolver.cu
#include <iostream>
#include <cuda_runtime.h>
#include "GpuSolver.h"
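// Pairwise (tree) reduction within one block: afterwards n[0] holds n[0] + ... + n[limit-1]
__global__
void findSumToN(int *n, int limit)
{
    int tId = threadIdx.x;
    // stride takes the values 1, 2, 4, ... using integer arithmetic instead of pow() and log2()
    for (int stride = 1; stride < limit; stride *= 2)
    {
        // Threads whose index is a multiple of 2*stride add in the partial sum stride elements away
        if (tId % (2*stride) == 0 && tId + stride < limit)
            n[tId] += n[tId + stride];
        // No thread leaves the loop early, so every thread reaches this barrier on every iteration
        __syncthreads();
    }
}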
GpuInterface::GpuInterface()
{
    y = 20;
    asize = y*sizeof(int);
    for (int i=0; i<y; i++)
        n[i] = i;
}
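int GpuInterface::calculateSum()
{
    int *n_d;
    // Allocate device memory and copy the input array to the GPU
    cudaMalloc( (void**)&n_d, asize );
    cudaMemcpy(n_d, n, asize, cudaMemcpyHostToDevice );
    // One block of y threads; __syncthreads() in the kernel only synchronizes within a block
    dim3 dimBlock( y, 1 );
    dim3 dimGrid( 1, 1 );
    findSumToN<<<dimGrid, dimBlock>>>(n_d, y);
    // This copy waits for the kernel to finish, then overwrites n with the partial sums
    cudaMemcpy(n, n_d, asize, cudaMemcpyDeviceToHost);
    cudaFree (n_d);
    // The total ends up in element 0
    return n[0];
}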
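void GpuInterface::setY(int newVal)
{
    // n has room for 20 ints, so clamp newVal to [0, 20] to avoid writing past the array
    if (newVal < 0) newVal = 0;
    if (newVal > 20) newVal = 20;
    y = newVal;
    asize = y*sizeof(int);
    for (int i=0; i<y; i++)
        n[i] = i;
}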
main.cpp
#include <iostream>
#include "GpuSolver.h"
int main()
{
    GpuInterface obj;
    obj.setY(16);   // n becomes 0, 1, ..., 15
    std::cout << obj.calculateSum() << std::endl;
    return 0;
}
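To sanity-check what the program should print before involving the GPU at all, the kernel's reduction can be replayed on the CPU. The following is only a sketch: simulateFindSumToN and expectedSum are hypothetical helper names that are not part of the files above, and the inner loop over tId stands in for the threads of the single block.

```cpp
#include <cassert>
#include <vector>

// Host-only sketch: replays the stride-doubling steps of findSumToN on the CPU.
// simulateFindSumToN is a hypothetical helper, not part of the original files.
int simulateFindSumToN(std::vector<int> n, int limit)
{
    // The kernel assumes the array holds at least limit elements.
    assert(limit <= static_cast<int>(n.size()));
    if (limit <= 0) return 0;
    // Each pass of the outer loop corresponds to one kernel iteration,
    // i.e. the work done between two __syncthreads() barriers.
    for (int stride = 1; stride < limit; stride *= 2)
    {
        // Each value of tId plays the role of one GPU thread.
        for (int tId = 0; tId < limit; tId++)
        {
            if (tId % (2 * stride) == 0 && tId + stride < limit)
                n[tId] += n[tId + stride];
        }
    }
    return n[0];
}

// Builds the same input as GpuInterface::setY (n[i] = i) and reduces it.
int expectedSum(int y)
{
    if (y <= 0) return 0;
    std::vector<int> n(y);
    for (int i = 0; i < y; i++)
        n[i] = i;
    return simulateFindSumToN(n, y);
}
```

Running the threads one after another is safe here because, within a single pass, the elements that get written (indices that are multiples of 2*stride) are never the elements that get read (those indices plus stride). Calling expectedSum(16) gives 120, which is the value the main.cpp above should print.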
Typically, a compiler such as g++ compiles each source file into an object file and then links the object files into an executable, all in one invocation. However, you can make it stop once the object files have been produced and do the linking as a separate step later on. This is what we will have to do, because main.cpp is compiled with g++ while GpuSolver.cu has to be compiled with nvcc.
We run g++ on main.cpp with the -c flag, which tells g++ to stop after generating the object file instead of going on to link. The -I. flag adds the current folder to the header search path, and the -o flag names the output file (in this case main.cpp.o). The full command looks like:
g++ -c -I. main.cpp -o main.cpp.o
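GpuSolver.cu is compiled in the same way, but with nvcc and with the CUDA include directory added to the header search path:
nvcc -c -I. -I/usr/local/cuda/include GpuSolver.cu -o GpuSolver.cu.o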
So now we have a bunch of files in our project directory:
- main.cpp
- GpuSolver.cu
- GpuSolver.h
- main.cpp.o
- GpuSolver.cu.o
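We now need to link the two .o files into one executable. We do this with g++, adding the CUDA runtime library (-lcudart) because GpuSolver.cu.o calls into it. On 64-bit Linux installations the library directory is usually /usr/local/cuda/lib64 rather than /usr/local/cuda/lib, so adjust the -L path if the linker cannot find cudart. The command is: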
g++ -o exec GpuSolver.cu.o main.cpp.o -L/usr/local/cuda/lib -lcudart
When all of this has been done, you get a neat little executable called exec that calculates the sum. With the main.cpp above, the array holds 0 through 15, so running ./exec should print 120.
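A side effect of splitting compilation and linking is that main.cpp depends only on the declarations in GpuSolver.h, so any object file that defines those member functions can be linked in its place. As a sketch of that idea, here is a CPU-only stand-in that could be used to exercise the g++ half of the build on a machine without a GPU. The file name GpuSolverCpu.cpp is hypothetical, and the class declaration is repeated inline only to keep the example self-contained.

```cpp
#include <cassert>
#include <numeric>

// Same declaration as GpuSolver.h, repeated here so the sketch is self-contained.
// In the project this would simply be: #include "GpuSolver.h"
class GpuInterface
{
public:
    int n[20];
    int y;
    int asize;
    GpuInterface();
    int calculateSum();
    void setY(int);
};

GpuInterface::GpuInterface()
{
    setY(20);
}

// CPU stand-in: sums the first y entries on the host instead of launching a kernel.
// Unlike the GPU version, it leaves the contents of n unchanged.
int GpuInterface::calculateSum()
{
    // y is a public member, so guard against values outside the array's capacity.
    assert(y >= 0 && y <= 20);
    return std::accumulate(n, n + y, 0);
}

void GpuInterface::setY(int newVal)
{
    // Clamp to the fixed capacity of n so we never write out of bounds.
    y = newVal < 0 ? 0 : (newVal > 20 ? 20 : newVal);
    asize = static_cast<int>(y * sizeof(int));
    for (int i = 0; i < y; i++)
        n[i] = i;
}
```

Compiled with g++ -c and linked against main.cpp.o in place of GpuSolver.cu.o (and without -lcudart), this should give the same output as the GPU build, which makes it a cheap way to check that a wrong answer comes from the kernel rather than from the surrounding code.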
I packed all of these commands into a makefile, which I've put down here:
CUDA_INSTALL_PATH := /usr/local/cuda
CXX := g++
CC := gcc
LINK := g++ -fPIC
NVCC := nvcc
# Includes
INCLUDES = -I. -I$(CUDA_INSTALL_PATH)/include
# Common flags
COMMONFLAGS += $(INCLUDES)
NVCCFLAGS += $(COMMONFLAGS)
CXXFLAGS += $(COMMONFLAGS)
CFLAGS += $(COMMONFLAGS)
LIB_CUDA := -L$(CUDA_INSTALL_PATH)/lib -lcudart
OBJS = GpuSolver.cu.o main.cpp.o
TARGET = exec
LINKLINE = $(LINK) -o $(TARGET) $(OBJS) $(LIB_CUDA)
.SUFFIXES: .c .cpp .cu .o
# Note: each recipe line below must start with a tab character, not spaces
%.c.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@

%.cu.o: %.cu
	$(NVCC) $(NVCCFLAGS) -c $< -o $@

%.cpp.o: %.cpp
	$(CXX) $(CXXFLAGS) -c $< -o $@

$(TARGET): $(OBJS) Makefile
	$(LINKLINE)
Makefiles are a way to script the entire compilation and installation process for programs on Linux. They have a rather unusual syntax, and there is no way I can explain all of its details here. However, there are excellent tutorials on makefiles elsewhere on the internet, so I'd suggest doing some research.
The main point is that this makefile does almost exactly what I described above: the pattern rules turn each .c, .cu and .cpp file into an object file with the matching compiler, and the final rule links the objects into exec. It also adds a few extras, such as the -fPIC (position-independent code) flag on the link step, which is intended to make the result easier to reuse from other programs rather than only run by hand.
That's it for now! I'll leave the integration into MOOSE for next time.