I need to operate on each element of a matrix in CUDA (once loaded, the matrix is basically a vector of float values in memory).
The matrix dimensions are not known a priori and may vary in the range [2, 20,000].
I was thinking of using a one-dimensional grid of blocks (as Jonathan suggested):
int thread_id = blockDim.x * blockIdx.x + threadIdx.x;
and then checking that thread_id is less than rows * columns. It is very simple and straightforward.
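For concreteness, the one-dimensional approach described above could look roughly like the sketch below. The kernel name `scale_elements`, the scaling operation, the matrix size, and the block size of 256 are all illustrative assumptions, not part of the original question. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical element-wise kernel: multiplies every matrix element by `factor`.
// The matrix is treated as a flat array of n = rows * cols floats.
__global__ void scale_elements(float *data, int n, float factor)
{
    int thread_id = blockDim.x * blockIdx.x + threadIdx.x;
    // The last block may contain threads past the end of the array.
    if (thread_id < n) {
        data[thread_id] *= factor;
    }
}

int main()
{
    const int rows = 1000, cols = 333;   // assumed sizes, only known at run time in practice
    const int n = rows * cols;
    std::vector<float> host(n, 1.0f);

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;                         // assumed block size
    const int blocks = (n + threads - 1) / threads;  // round up to cover all elements
    scale_elements<<<blocks, threads>>>(dev, n, 2.0f);

    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("first = %f, last = %f\n", host[0], host[n - 1]);
    return 0;
}
```

For the largest case mentioned (20,000 x 20,000 elements) the element count still fits in a 32-bit `int`, but the number of blocks exceeds 65,535, so this sketch assumes a device whose grid x-dimension limit is larger than that.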
But is there any particular performance reason that should lead me to use a two-dimensional (or even three-dimensional) grid of blocks for such calculations (keeping in mind that I have a matrix) instead of just a one-dimensional one?
Are there any coalescing issues to keep in mind, such as making all the threads read sequentially?

The dimensions exist only for convenience; internally everything is linear, so there is no advantage in terms of efficiency either way. Avoiding the computation of the (contrived) linear index you showed above might seem a bit faster, but there will be no difference in how the threads coalesce.
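To illustrate why the choice of dimensionality does not change the memory access pattern, here is a minimal sketch of a two-dimensional launch. It assumes row-major storage; the kernel name `scale_elements_2d`, the 32 x 8 block shape, and the matrix size are hypothetical choices made for the example. The kernel still ends up computing a linear offset, and because `threadIdx.x` varies fastest within a warp, neighbouring threads touch neighbouring floats just as in the one-dimensional version.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical 2D variant: each thread handles one (row, col) element
// of a row-major matrix stored as a flat array.
__global__ void scale_elements_2d(float *data, int rows, int cols, float factor)
{
    int col = blockDim.x * blockIdx.x + threadIdx.x;
    int row = blockDim.y * blockIdx.y + threadIdx.y;
    if (row < rows && col < cols) {
        // Same linear offset a 1D kernel would use for this element.
        data[row * cols + col] *= factor;
    }
}

int main()
{
    const int rows = 1000, cols = 333;   // assumed sizes
    const int n = rows * cols;
    std::vector<float> host(n, 1.0f);

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // 32 threads along x, so one warp covers 32 consecutive columns of a row.
    dim3 block(32, 8);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    scale_elements_2d<<<grid, block>>>(dev, rows, cols, 2.0f);

    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("first = %f, last = %f\n", host[0], host[n - 1]);
    return 0;
}
```

Whether the small difference in index arithmetic matters is best settled by profiling on the target device; for a simple element-wise operation like this one, memory traffic is likely to dominate.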