所以我的目标是将 GPU 用于我全新的 Java 项目,即创建游戏和游戏引擎本身(我认为这是深入了解其工作原理的好方法)。
我在 CPU 上使用 java.awt.Graphics2D 的多线程来显示我的游戏,但我在其他 PC 上观察到游戏运行速度低于 40FPS,所以我决定学习如何使用 GPU(我仍将渲染for 循环中的所有对象,然后在屏幕上绘制图像)。
出于这个原因,我开始按照 OpenCL 文档编写代码,并且 JOCL 采样了一个简单的小测试,即将纹理绘制到背景图像上(让我们假设每个实体都有一个纹理)。
在每次渲染调用中都会调用此方法,并将此实体的背景、纹理和位置作为参数。
下面的两个代码都已更新以适应 @ProjectPhysX 建议。
public static void XXX(final BufferedImage output_image, final BufferedImage input_image, float x, float y) {
cl_image_format format = new cl_image_format();
format.image_channel_order = CL_RGBA;
format.image_channel_data_type = CL_UNSIGNED_INT8;
//allocate ouput pointer
cl_image_desc output_description = new cl_image_desc();
output_description.buffer = null; //must be null for 2D image
output_description.image_depth = 0; //is only used if the image is a 3D image
output_description.image_row_pitch = 0; //must be 0 if host_ptr is null
output_description.image_slice_pitch = 0; //must be 0 if host_ptr is null
output_description.num_mip_levels = 0; //must be 0
output_description.num_samples = 0; //must be 0
output_description.image_type = CL_MEM_OBJECT_IMAGE2D;
output_description.image_width = output_image.getWidth();
output_description.image_height = output_image.getHeight();
output_description.image_array_size = output_description.image_width * output_description.image_height;
cl_mem output_memory = clCreateImage(context, CL_MEM_WRITE_ONLY, format, output_description, null, null);
//set up first kernel arg
clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(output_memory));
//allocates input pointer
cl_image_desc input_description = new cl_image_desc();
input_description.buffer = null; //must be null for 2D image
input_description.image_depth = 0; //is only used if the image is a 3D image
input_description.image_row_pitch = 0; //must be 0 if host_ptr is null
input_description.image_slice_pitch = 0; //must be 0 if host_ptr is null
input_description.num_mip_levels = 0; //must be 0
input_description.num_samples = 0; //must be 0
input_description.image_type = CL_MEM_OBJECT_IMAGE2D;
input_description.image_width = input_image.getWidth();
input_description.image_height = input_image.getHeight();
input_description.image_array_size = input_description.image_width * input_description.image_height;
DataBufferInt input_buffer = (DataBufferInt) input_image.getRaster().getDataBuffer();
int input_data[] = input_buffer.getData();
cl_mem input_memory = clCreateImage(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, format, input_description, Pointer.to(input_data), null);
//loads the input buffer to the gpu memory
long[] input_origin = new long[] { 0, 0, 0 };
long[] input_region = new long[] { input_image.getWidth(), input_image.getHeight(), 1 };
int input_row_pitch = input_image.getWidth() * Sizeof.cl_uint; //the length of each row in bytes
clEnqueueWriteImage(commandQueue, input_memory, CL_TRUE, input_origin, input_region, input_row_pitch, 0, Pointer.to(input_data), 0, null, null);
//set up second kernel arg
clSetKernelArg(kernel, 1, Sizeof.cl_mem, Pointer.to(input_memory));
//set up third and fourth kernel args
clSetKernelArg(kernel, 2, Sizeof.cl_float, Pointer.to(new float[] { x }));
clSetKernelArg(kernel, 3, Sizeof.cl_float, Pointer.to(new float[] { y }));
//blocks until all previously queued commands are issued
clFinish(commandQueue);
//enqueue the program execution
long[] globalWorkSize = new long[] { input_description.image_width, input_description.image_height };
clEnqueueNDRangeKernel(commandQueue, kernel, 2, null, globalWorkSize, null, 0, null, null);
//transfer the output result back to host
DataBufferInt output_buffer = (DataBufferInt) output_image.getRaster().getDataBuffer();
int output_data[] = output_buffer.getData();
long[] output_origin = new long[] { 0, 0, 0 };
long[] output_region = new long[] { output_description.image_width, output_description.image_height, 1 };
int output_row_pitch = output_image.getWidth() * Sizeof.cl_uint;
clEnqueueReadImage(commandQueue, output_memory, CL_TRUE, output_origin, output_region, output_row_pitch, 0, Pointer.to(output_data), 0, null, null);
//free pointers
clReleaseMemObject(input_memory);
clReleaseMemObject(output_memory);
}
这是在内核上运行的程序源。
常量 sampler_t 采样器 = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;
__kernel void drawImage(__write_only image2d_t dst_image, __read_only image2d_t src_image, float xoff, float yoff)
{
const int x = get_global_id(0);
const int y = get_global_id(1);
int2 in_coords = (int2) { x, y };
uint4 pixel = read_imageui(src_image, sampler, in_coords);
pixel = -16184301;
printf("%d, %d, %u\n", x, y, pixel);
const int sx = get_global_size(0);
const int sy = get_global_size(1);
int2 out_coords = (int2) { ((int) xoff + x) % sx, ((int) yoff + y) % sy};
write_imageui(dst_image, out_coords, pixel);
}
如果没有调用 write_imageui,背景将被涂成黑色,否则为白色。目前,我有点难以理解为什么 C 函数中的 pixel = 0,但我认为熟悉 JOCL 的人会很快发现我在这段代码中的错误。今天,也许明天,我对这段代码感到非常困惑,但我觉得我永远不会发现自己的错误。出于这个原因,我请求您帮助查看我的代码。我觉得自己像个白痴,我当时想不通。