Age | Commit message | Author | Files | Lines
2016-04-20anv: fix build without Wayland platformHEADmasterMarcin Ślusarz2-7/+5
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-20anv: fix building on i686 with -mcpu=genericLaurent Carlier1-1/+1
mcpu=generic doesn't enable sse2, and anvil definitely needs it Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-20spirv: Trivially handle the NonWriteable decorationJason Ekstrand2-0/+4
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-20nir: rename nir_foreach_block*() to nir_foreach_block*_call()Connor Abbott59-89/+92
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-20nvc0: avoid tex read fault from compute shaders on GK110Samuel Pitoiset1-0/+3
After some investigation, it seems that disabling the UNK02C4 command avoids a read fault with texelFetch() from a compute shader. I have no clue what this method actually does, but this keeps the GPU from hanging with basic-texelFetch.shader_test without introducing any compute-related regressions. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-20i965/vec4: Always split uniforms in array_access_to_pull_constantsJason Ekstrand1-1/+3
Normally, we split uniforms at the end but in Vulkan, we bail because we don't want pull constants. However, we still need them split because pack_uniforms relies on it. I really don't like this patch not because it doesn't work (it does) but because now that we're using MOV_INDIRECT, uniform numbers and sizes don't really matter anymore. In the FS backend, uniform splitting and packing is handled all at once (actual re-assignment of locations happens later) and we really should do it that way in vec4 eventually as well. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
2016-04-20i965/vec4: Use the correct offset for the swizzle shift in push constantsJason Ekstrand1-1/+1
This was actually caught by Ken in review the first time around but somehow didn't get fixed before the patches were pushed. :-( Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
2016-04-20i965/vec4: Use nir_intrinsic_base in the load_uniform implementationJason Ekstrand1-1/+1
We shouldn't be reading the const_index directly Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
2016-04-20anv/apply_dynamic_offsets: Provide a range on the load_uniformJason Ekstrand1-1/+3
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
2016-04-20anv/lower_push_constants: Stop treating scalar speciallyJason Ekstrand3-28/+4
All of the code that did something special based on vec4 vs. scalar is bogus. In the backend, everything is now in units of bytes and the vec4 backend can handle full std140 packing so we don't need to do anything special anymore. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
2016-04-20swr: fix resource backed constant buffersTim Rowley2-7/+7
Code was using an incorrect address for the base pointer. v2: use swr_resource_data() utility function. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94979 Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com> Tested-by: Markus Wick <markus@selfnet.de>
2016-04-20nouveau: codegen: Add support for OpenCL global memory buffersHans de Goede1-2/+10
Add support for OpenCL global memory buffers. Note that this has only been tested with regular loads and stores and likely needs more work for e.g. atomic ops. Tested with piglit on a gf119 and a gk107: ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 ./piglit run -o shader -t '.*arb_compute_shader.*' results/shader [20/20] skip: 4, pass: 16 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-04-20nouveau: codegen: Use FILE_MEMORY_BUFFER for buffersHans de Goede6-5/+13
Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for OpenCL global buffers. This commit changes the buffer code to use FILE_MEMORY_BUFFER at the ir_from_tgsi and lowering steps, freeing FILE_MEMORY_GLOBAL for use with OpenCL global buffers. Note that after lowering, buffer accesses use the FILE_MEMORY_GLOBAL register file. Tested with piglit on a gf119 and a gk107: ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 ./piglit run -o shader -t '.*arb_compute_shader.*' results/shader [20/20] skip: 4, pass: 16 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-04-20scons: Build dri_common_interop.c.Jose Fonseca1-0/+1
2016-04-20st/dri: implement the GL interop DRI extension (v2.2)Marek Olšák1-0/+258
v2: - set interop_version - simplify the offset_after macro v2.1: - use version numbers, remove offset_after - set "out_driver_data_written" v2.2: - set buf_offset & buf_size for GL_ARRAY_BUFFER too - add whandle.offset to buf_offset - disable the minmax cache for GL_TEXTURE_BUFFER
2016-04-20glx: implement GLX part of interop interface (v2)Marek Olšák8-6/+192
v2: - use const
2016-04-20egl: implement EGL part of interop interface (v2)Marek Olšák4-0/+114
v2: - use const
2016-04-20dri_interface: add interface for GL interop with other APIs (v2)Marek Olšák1-0/+26
v2: - use const
2016-04-20include/GL: add mesa_glinterop.h for OpenGL-OpenCL interop (v4.2)Marek Olšák1-0/+287
v2: - use "enum" to define stuff v3: - more comments, define MESA_GLINTEROP_UNSUPPORTED v4: - add mesa_glinterop_device_info::interop_version - more comments - remove #define MESA_GLINTEROP_VERSION - use const for "in" v4.1: - use version numbers for structures - add "out_driver_data_written" v4.2: - buf_offset & buf_size affect GL_ARRAY_BUFFER too, this is required for sharing suballocations within a larger buffer
2016-04-20st/dri: Fix RGB565 EGLImage creationNicolas Dufresne1-20/+24
When creating EGL images we do a bytes-to-pixels conversion by dividing by 4 regardless of the pixel format. This does not work for RGB565. In this patch, we avoid the useless conversion and use the proper API when the conversion cannot be avoided. Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
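To illustrate why a fixed divide-by-4 breaks for RGB565, here is a minimal sketch of a format-aware bytes-to-pixels conversion. The enum and function names are hypothetical and exist only for this example; they are not the helpers the patch itself uses.

```c
#include <assert.h>

/* Hypothetical format tags for this sketch; not Mesa's PIPE_FORMAT values. */
enum sketch_format { SKETCH_RGB565, SKETCH_XRGB8888 };

/* Bytes occupied by one pixel of the given format. */
static unsigned sketch_bytes_per_pixel(enum sketch_format fmt)
{
    switch (fmt) {
    case SKETCH_RGB565:   return 2; /* 5 + 6 + 5 bits */
    case SKETCH_XRGB8888: return 4; /* 4 x 8 bits */
    }
    return 0;
}

/* The problematic conversion: assumes every format is 4 bytes per pixel. */
static unsigned pitch_to_pixels_fixed4(unsigned pitch_bytes)
{
    return pitch_bytes / 4;
}

/* Format-aware conversion: divide by the real pixel size. */
static unsigned pitch_to_pixels(unsigned pitch_bytes, enum sketch_format fmt)
{
    unsigned bpp = sketch_bytes_per_pixel(fmt);
    assert(bpp != 0);
    return pitch_bytes / bpp;
}
```

For a 1280-byte row, the fixed divide reports 320 pixels, while an RGB565 row of that size actually holds 640.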
2016-04-20st/dri: Factor out DRI2 to PIPE_FORMAT conversionNicolas Dufresne1-34/+27
This code is already duplicated twice and will be useful again. This will also help when adding formats. Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-04-19freedreno/a4xx: lower srgb in shader for astc texturesRob Clark7-6/+62
This *seems* like a hw bug, and maybe only applies to certain a4xx variants/revisions. But setting the SRGB bit in sampler view state (texconst0) causes invalid alpha for ASTC textures. Work around this by doing the srgb->linear conversion in the shader instead. This fixes 392 dEQP tests: dEQP-GLES3.functional.texture.*astc*srgb* (The remaining fails seem to be a bug w/ ASTC + linear filtering, also possibly a420.0 specific.) Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19nir/lower-tex: add srgb->linear loweringRob Clark2-0/+53
Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-19nir/builder: const'ify swiz paramRob Clark1-1/+1
No need for it not to be const, and lets caller declare it const if desired. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-19nir/lower-tex: make options a local varRob Clark1-8/+8
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19freedreno: cleanup fd_set_sampler_viewsRob Clark1-37/+24
The separate FS/VS entrypoints are no longer used since a3ed98f. So just inline them. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19tgsi/lowering: improved lowering for LRPRussell King1-35/+20
Provide an improved lowering for LRP, which can be implemented in two MAD instructions with a bit of rearranging of the equation, rather than the literal implementation of two multiplies, an add and a subtract. Signed-off-by: Russell King <rmk@arm.linux.org.uk> Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Rob Clark <robclark@freedesktop.org>
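The rearrangement can be sketched in scalar C as follows. This assumes the usual LRP definition, dst = src0 * src1 + (1 - src0) * src2, and models MAD as a plain multiply-add. The helper names are made up for the example, and the operand order in the actual patch may differ.

```c
#include <assert.h>

/* Model of a MAD instruction: a * b + c. */
static float sketch_mad(float a, float b, float c)
{
    return a * b + c;
}

/* Literal form: two multiplies, a subtract and an add. */
static float lrp_literal(float a, float x, float y)
{
    return a * x + (1.0f - a) * y;
}

/* Rearranged form: a*x + (y - a*y), which is two MADs. */
static float lrp_two_mad(float a, float x, float y)
{
    float tmp = sketch_mad(-a, y, y); /* y - a*y */
    return sketch_mad(a, x, tmp);     /* a*x + tmp */
}

/* Small self-check using values that are exact in binary floating point. */
static void lrp_self_check(void)
{
    assert(lrp_literal(0.25f, 8.0f, 4.0f) == lrp_two_mad(0.25f, 8.0f, 4.0f));
    assert(lrp_two_mad(0.0f, 8.0f, 4.0f) == 4.0f); /* weight 0 selects y */
    assert(lrp_two_mad(1.0f, 8.0f, 4.0f) == 8.0f); /* weight 1 selects x */
}
```

For general inputs the two forms can differ in the last bits because of rounding, so the self-check sticks to exactly representable values.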
2016-04-19tgsi/lowering: improved lowering for XPDRussell King1-22/+13
Improve XPD lowering to consume fewer instructions by using the MAD instruction to perform the multiply and subtraction together. Signed-off-by: Russell King <rmk@arm.linux.org.uk> Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Rob Clark <robclark@freedesktop.org>
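A minimal sketch of the idea: one MUL for the subtracted products, then one MAD with a negated addend for the rest. The struct and function names are hypothetical, only the x/y/z components are modelled, and the swizzles chosen here are one possible arrangement rather than necessarily the one in the patch.

```c
#include <assert.h>

/* Three-component vector for this sketch only. */
struct sketch_vec3 { float x, y, z; };

/* Cross product as MUL + MAD:
 *   MUL tmp, a.zxy, b.yzx
 *   MAD dst, a.yzx, b.zxy, -tmp
 */
static struct sketch_vec3 xpd_mul_mad(struct sketch_vec3 a, struct sketch_vec3 b)
{
    struct sketch_vec3 tmp, dst;

    /* MUL: the products that get subtracted. */
    tmp.x = a.z * b.y;
    tmp.y = a.x * b.z;
    tmp.z = a.y * b.x;

    /* MAD: the remaining products plus the negated temporaries. */
    dst.x = a.y * b.z + (-tmp.x);
    dst.y = a.z * b.x + (-tmp.y);
    dst.z = a.x * b.y + (-tmp.z);
    return dst;
}

static void xpd_self_check(void)
{
    struct sketch_vec3 a = { 1.0f, 2.0f, 3.0f };
    struct sketch_vec3 b = { 4.0f, 5.0f, 6.0f };
    struct sketch_vec3 c = xpd_mul_mad(a, b);

    assert(c.x == -3.0f && c.y == 6.0f && c.z == -3.0f);
}
```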
2016-04-19tgsi/lowering: add support for lowering TRUNCRussell King2-0/+85
Add support for lowering TRUNC using the following sequence: FRC tmpA, |src| SUB tmpA, |src|, tmpA CMP dst, -tmpA, tmpA Note that this is incompatible with FRC lowering. Signed-off-by: Russell King <rmk@arm.linux.org.uk> Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Rob Clark <robclark@freedesktop.org>
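The sequence can be modelled in scalar C as below. This is a sketch under two assumptions: FRC is taken to be x - floor(x), and the final CMP is assumed to select between -tmpA and tmpA based on the sign of the original source. The FRC model uses an integer cast, so it is only valid for values that fit in an int. All names are hypothetical.

```c
#include <assert.h>

/* Model of FRC: x - floor(x), for values that fit in an int. */
static float sketch_frc(float x)
{
    float whole = (float)(int)x;   /* the cast truncates toward zero */
    if (whole > x)
        whole -= 1.0f;             /* step down to floor for negatives */
    return x - whole;
}

/* TRUNC via FRC, SUB and CMP, following the sequence in the commit message. */
static float trunc_lowered(float src)
{
    float abs_src = src < 0.0f ? -src : src;

    float tmp = sketch_frc(abs_src);   /* FRC tmpA, |src|       */
    tmp = abs_src - tmp;               /* SUB tmpA, |src|, tmpA */

    /* CMP: pick -tmpA for negative inputs, tmpA otherwise (assumed). */
    return src < 0.0f ? -tmp : tmp;
}

static void trunc_self_check(void)
{
    assert(trunc_lowered(2.75f) == 2.0f);
    assert(trunc_lowered(-2.75f) == -2.0f);
    assert(trunc_lowered(0.0f) == 0.0f);
}
```

Because the sequence itself emits FRC, it cannot be combined with a pass that lowers FRC away, which matches the incompatibility noted in the commit message.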
2016-04-19tgsi/lowering: add support for lowering FLR and CEILRussell King2-20/+149
Add support for lowering FLR and CEIL to FRC/SUB and FRC/ADD instructions for GPUs that support FRC but not FLR or CEIL. Since these use FRC, it is invalid to ask for FLR or CEIL to be lowered along with FRC, so add an assert to catch this invalid configuration. We also need to deal with FLR instructions emitted by the lowering code. Fix these up with the FRC+SUB equivalent when FLR lowering is enabled. Signed-off-by: Russell King <rmk@arm.linux.org.uk> Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Rob Clark <robclark@freedesktop.org>
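The identities behind the FRC/SUB and FRC/ADD pairs can be sketched in scalar C. This assumes FRC(x) = x - floor(x), which gives FLR(x) = x - FRC(x) and CEIL(x) = x + FRC(-x). The helper names are hypothetical, and the FRC model below is only valid for values that fit in an int.

```c
#include <assert.h>

/* Model of FRC: x - floor(x), for values that fit in an int. */
static float sketch_frc(float x)
{
    float whole = (float)(int)x;   /* the cast truncates toward zero */
    if (whole > x)
        whole -= 1.0f;             /* step down to floor for negatives */
    return x - whole;
}

/* FLR as FRC + SUB: x - frc(x). */
static float flr_lowered(float x)
{
    return x - sketch_frc(x);
}

/* CEIL as FRC + ADD: x + frc(-x). */
static float ceil_lowered(float x)
{
    return x + sketch_frc(-x);
}

static void flr_ceil_self_check(void)
{
    assert(flr_lowered(2.75f) == 2.0f);
    assert(flr_lowered(-2.75f) == -3.0f);
    assert(ceil_lowered(2.25f) == 3.0f);
    assert(ceil_lowered(-2.25f) == -2.0f);
    assert(ceil_lowered(2.0f) == 2.0f);
}
```

Both lowered forms depend on FRC being available, which is why the commit asserts against lowering FRC at the same time.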
2016-04-19radeonsi: enable TGSI support cap for compute shadersBas Nieuwenhuizen4-9/+33
v2: Use chip_class instead of family. v3: Check kernel version for SI. v4: Preemptively allow amdgpu winsys for SI. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19radeonsi: Consider input SGPR count for compute shader SGPR count.Bas Nieuwenhuizen2-6/+13
si_shader_create corrects the SGPR count with si_fix_num_sgprs. We then recompute the rsrc1 register to use the new SGPR count. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19radeonsi: Add CE synchronization for compute dispatches.Bas Nieuwenhuizen3-2/+8
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19mesa/st: enable compute shaders if images are also supportedBas Nieuwenhuizen1-3/+4
v2: Also depend on atomic counters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19radeonsi: clean up compute flushBas Nieuwenhuizen2-18/+8
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19radeonsi: do not do two full flushes on every compute dispatchBas Nieuwenhuizen5-22/+17
v2: Add more CS_PARTIAL_FLUSH events. Essentially every place with waits on finishing for pixel shaders also has a write after read hazard with compute shaders. Invalidating L2 waits implicitly on pixel and compute shaders, so, we don't need a CS_PARTIAL_FLUSH for switching FBO. v3: Add CS_PARTIAL_FLUSH events even if we already have INV_GLOBAL_L2. According to Marek the INV_GLOBAL_L2 events don't wait for compute shaders to finish, so wait for them explicitly. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19radeonsi: split setting graphics and compute descriptorsBas Nieuwenhuizen4-14/+59
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19radeonsi: split texture decompression for compute shadersBas Nieuwenhuizen4-4/+16
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19radeonsi: update predicate condition for compute dispatchesBas Nieuwenhuizen2-0/+15
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19radeonsi: implement TGSI compute dispatchBas Nieuwenhuizen1-27/+77
v2: - Use radeon_set_sh_reg_seq. - Set predicate bit for conditional rendering. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19radeonsi: only emit compute shader state when switching shadersBas Nieuwenhuizen2-59/+88
v2: - Do check if anything changed earlier - Use emitted_program instead of emitted_bo to prevent shaders with shader->bo = NULL confusing the check - Use radeon_set_sh_reg* Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19radeonsi: rework compute scratch bufferBas Nieuwenhuizen3-93/+47
Instead of having a scratch buffer per program, have one per context. Also removed the per kernel wave count calculations, but that only helped if the total number of waves in the dispatch was smaller than sctx->scratch_waves. v2: Fix style issue. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19radeonsi: do per cs setup for compute shaders once per csBas Nieuwenhuizen3-32/+48
Also removes PKT3_CONTEXT_CONTROL as that is already being done by si_begin_new_cs, when emitting init_config. v2: - Use radeon_set_sh_reg_seq. - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+ Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19radeonsi: don't pass scratch buffer to user SGPRsBas Nieuwenhuizen1-8/+0
As far as I can see we use relocations for clover too. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19radeonsi: split input upload off from si_launch_gridBas Nieuwenhuizen1-41/+52
Also uses a dynamically allocated buffer using u_upload_alloc. The old buffer per program approach required serializing all dispatches of the same program. v2: - Clarified commit message. - Use radeon_set_sh_reg_seq. - Also upload input buffer for clover kernels, even when input_size is 0, as it contains grid parameters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19radeonsi: implement TGSI compute shader creationBas Nieuwenhuizen1-18/+58
v2: Moved scratch_enabled initialization after compile. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19radeonsi: update shader count for compute shadersBas Nieuwenhuizen1-1/+2
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19radeonsi: set maximum work group size based on block sizeBas Nieuwenhuizen1-0/+12
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19radeonsi: implement shared atomicsBas Nieuwenhuizen1-1/+76
v2: - Use single region - Use get_memory_ptr Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19radeonsi: implement shared memory load/storeBas Nieuwenhuizen1-2/+82
v2: - Use single region - Combine address calculation Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>