262 commits

Author SHA1 Message Date
yzct12345
e13e98d99d nvdec: Implement VA-API hardware video acceleration (#6713)
* nvdec: VA-API

* Verify formatting

* Forgot a semicolon for Windows

* Clarify comment about AV_PIX_FMT_NV12

* Fix assert log spam from missing negation

* vic: Remove forgotten debug code

* Address lioncash's review

* Mention VA-API is Intel/AMD

* Address v1993's review

* Hopefully fix CMakeLists style this time

* vic: Improve cache locality

* vic: Fix off-by-one error

* codec: Async

* codec: Forgot the GetValue()

* nvdec: Address ameerj's review

* codec: Fallback to CPU without VA-API support

* cmake: Address lat9nq's review

* cmake: Make VA-API optional

* vaapi: Multiple GPU

* Apply suggestions from code review

Co-authored-by: Ameer J <52414509+ameerj@users.noreply.github.com>

* nvdec: Address ameerj's review

* codec: Use anonymous instead of static

* nvdec: Remove enum and fix memory leak

* nvdec: Address ameerj's review

* codec: Remove preparation for threading

Co-authored-by: Ameer J <52414509+ameerj@users.noreply.github.com>
2021-08-03 23:43:11 -04:00
ReinUsesLisp
482c1ec8e5 renderer_vulkan: Add setting to log pipeline statistics
Use VK_KHR_pipeline_executable_properties when enabled and available to
log statistics about the pipeline cache in a game.

For example, this is on Turing GPUs when generating a pipeline cache
from Super Smash Bros. Ultimate:

Average pipeline statistics
==========================================
Code size:       6433.167
Register count:    32.939

More advanced results could be presented; at the moment it's just an
average of all 3D and compute pipelines.
2021-07-27 21:29:24 -03:00
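As a rough illustration of the averaging described in the commit above, here is a minimal C++ sketch. The `PipelineStats` and `AverageStats` structs and the `Average` function are hypothetical names chosen for this example; they are not taken from the yuzu source, and the real code reads its values through VK_KHR_pipeline_executable_properties rather than from a plain vector.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-pipeline statistics; the field names are assumptions.
struct PipelineStats {
    std::uint64_t code_size;
    std::uint64_t register_count;
};

// Averages over every collected pipeline.
struct AverageStats {
    double code_size;
    double register_count;
};

// Computes the arithmetic mean of each statistic across all pipelines.
// Returns zeros when no pipeline has been recorded.
AverageStats Average(const std::vector<PipelineStats>& pipelines) {
    AverageStats avg{0.0, 0.0};
    if (pipelines.empty()) {
        return avg;
    }
    for (const PipelineStats& stats : pipelines) {
        avg.code_size += static_cast<double>(stats.code_size);
        avg.register_count += static_cast<double>(stats.register_count);
    }
    const double count = static_cast<double>(pipelines.size());
    avg.code_size /= count;
    avg.register_count /= count;
    return avg;
}
```

A plain mean is what the commit message describes; per-stage or per-pipeline-type breakdowns would be the "more advanced results" it mentions.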
ameerj
7e661303d5 gl_shader_cache: Implement async shaders 2021-07-22 21:51:38 -04:00
ReinUsesLisp
ddb24146b6 gl_shader_cache: Rename Program abstractions into Pipeline 2021-07-22 21:51:33 -04:00
ReinUsesLisp
06ae1bff4b video_core: Abstract transform feedback translation utility 2021-07-22 21:51:33 -04:00
ReinUsesLisp
5ca5988c63 shader: Initial OpenGL implementation 2021-07-22 21:51:30 -04:00
ReinUsesLisp
ed6c131c92 shader: Move pipeline cache logic to separate files
Move code to separate files to be able to reuse it from OpenGL. This
greatly simplifies the pipeline cache logic on Vulkan.

Transform feedback state is not yet abstracted and it's still
intrusively stored inside vk_pipeline_cache. It will be moved when
needed on OpenGL.
2021-07-22 21:51:29 -04:00
lat9nq
a4e7a41e7f shader_recompiler,video_core: Cleanup some GCC and Clang errors
Mostly fixing unused *, implicit conversion, braced scalar init,
fpermissive, and some others.

Some Clang errors likely remain in video_core, and std::ranges is still
a pertinent issue in shader_recompiler

shader_recompiler: cmake: Force bracket depth to 1024 on Clang
Increases the maximum fold expression depth

thread_worker: Include condition_variable

Don't use list initializers in control flow

Co-authored-by: ReinUsesLisp <reinuseslisp@airmail.cc>
2021-07-22 21:51:26 -04:00
ReinUsesLisp
33090a74dd shader: Add partial rasterizer integration 2021-07-22 21:51:23 -04:00
ReinUsesLisp
a5f87011d3 shader: Primitive Vulkan integration 2021-07-22 21:51:22 -04:00
ReinUsesLisp
65069df8aa shader: Remove old shader management 2021-07-22 21:51:22 -04:00
Morph
a487c17aff video_core: Enforce C4242 2021-06-28 14:20:25 -04:00
ReinUsesLisp
cc3a6c6f51 video_core: Enforce C4244
Enforce implicit integer casts to a smaller type as errors.
2021-06-26 03:29:34 -03:00
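For context, MSVC warning C4244 flags implicit conversions to a smaller type that may lose data. The sketch below shows one possible way to make such a conversion explicit; `NarrowToU16` is a hypothetical helper for illustration only and is not claimed to be how the commit above resolved its warnings.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: converts a 32-bit value to 16 bits explicitly.
// Writing `std::uint16_t x = value;` instead would be an implicit narrowing
// conversion, which is what C4244 reports.
std::uint16_t NarrowToU16(std::uint32_t value) {
    if (value > std::numeric_limits<std::uint16_t>::max()) {
        throw std::out_of_range("value does not fit in 16 bits");
    }
    // The explicit cast documents that the narrowing is intentional.
    return static_cast<std::uint16_t>(value);
}
```

Whether to range-check, clamp, or simply cast depends on the call site; the point is only that the narrowing becomes visible in the code.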
ameerj
e5da434498 textures: Reintroduce CPU ASTC decoder
Users may want to fall back to the CPU ASTC texture decoder due to hangs
and crashes that may be caused by keeping the GPU under compute heavy
loads for extended periods of time. This is especially the case in games
such as Astral Chain which make extensive use of ASTC textures.
2021-06-15 20:19:00 -04:00
ameerj
e0977af861 astc_decoder: Refactor for style and more efficient memory use 2021-03-25 16:53:51 -04:00
ReinUsesLisp
2dfce2fca6 video_core: Reimplement the buffer cache
Reimplement the buffer cache using cached bindings and page level
granularity for modification tracking. This also drops the usage of
shared pointers and virtual functions from the cache.

- Bindings are cached, allowing work to be skipped when the game changes
  few bits between draws.
- OpenGL Assembly shaders no longer copy when a region has been modified
  from the GPU to emulate constant buffers, instead GL_EXT_memory_object
  is used to alias sub-buffers within the same allocation.
- OpenGL Assembly shaders stream constant buffer data using
  glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In
  theory this should save one hash table resolve inside the driver
  compared to glBufferSubData.
- A new OpenGL stream buffer is implemented based on fences for drivers
  that are not Nvidia's proprietary, due to their low performance on
  partial glBufferSubData calls synchronized with 3D rendering (that
  some games use a lot).
- Most optimizations are shared between APIs now, allowing Vulkan to
  cache more bindings than before, skipping unnecessary work.

This commit adds the necessary infrastructure to use Vulkan objects from
OpenGL. Overall, it improves performance and fixes some bugs present in
the old cache. There are still some edge cases hit by some games that
harm performance on some vendors; these are planned to be fixed in later
commits.
2021-02-13 02:17:22 -03:00
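The "cached bindings" idea from the commit above can be sketched roughly as follows. This is a simplified illustration, not the yuzu implementation: `Binding`, `BindingCache`, `NUM_SLOTS`, and `Update` are hypothetical names, and the real cache tracks considerably more state.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical description of what is bound to one slot.
struct Binding {
    std::uint32_t buffer_id = 0;
    std::uint32_t offset = 0;
    std::uint32_t size = 0;

    bool operator==(const Binding& rhs) const {
        return buffer_id == rhs.buffer_id && offset == rhs.offset && size == rhs.size;
    }
};

// Remembers the last binding per slot so unchanged slots can be skipped.
class BindingCache {
public:
    static constexpr std::size_t NUM_SLOTS = 8; // assumed slot count

    // Returns true when the backend has to rebind the slot, and false when
    // the requested binding matches what is already cached.
    // Throws std::out_of_range (from std::array::at) for an invalid slot.
    bool Update(std::size_t slot, const Binding& binding) {
        if (valid.at(slot) && bindings.at(slot) == binding) {
            return false;
        }
        bindings.at(slot) = binding;
        valid.at(slot) = true;
        return true;
    }

private:
    std::array<Binding, NUM_SLOTS> bindings{};
    std::array<bool, NUM_SLOTS> valid{};
};
```

A caller would issue the actual API bind only when `Update` returns true, which is where the skipped work between draws would come from.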
Ameer J
a4606a986a Merge pull request #5880 from lat9nq/ffmpeg-external
cmake: FFmpeg linking rework
2021-02-08 21:13:10 -05:00
Chloe Marcec
66c653566c video_core: Delete morton
morton.h & morton.cpp are not used anywhere and are just empty files
2021-02-08 10:20:21 +11:00
lat9nq
1eaff4546b CMake: Port citra-emu/citra FindFFmpeg.cmake
Also renames related CMake variables to match both the Find*FFmpeg* and
variables defined within the file. Fixes odd errors produced by the old
FindFFmpeg.

Citra's FindFFmpeg is slightly modified here: adds Citra's copyright at
the beginning, renames FFmpeg_INCLUDES to FFmpeg_INCLUDE_DIR, disables a
few components in _FFmpeg_ALL_COMPONENTS, and adds the missing avutil
component to the comment above.
2021-02-05 15:39:19 -05:00
lat9nq
e6211c3753 CMake: Implement YUZU_USE_BUNDLED_FFMPEG
For Linux, instructs CMake to use the FFmpeg submodule in externals.
This is HEAVILY based on our usage of the late Unicorn. Minimal change
to MSVC as it uses the yuzu-emu/ext-windows-bin. MinGW now targets the
same ext-windows-bin libraries as MSVC for FFmpeg. Adds FFMPEG_LIBRARIES
to WIN32 and simplifies video_core/CMakeLists.txt a bit.
2021-02-05 14:49:51 -05:00
ReinUsesLisp
80f235a8cc video_core/cmake: Properly generate fatal errors on Aftermath
Fix "message(ERROR ..." to "message(FATAL_ERROR ..." to properly stop
cmake when Nsight Aftermath can't be configured.
2021-01-23 04:15:30 -03:00
Rodrigo Locatti
2fccc35fa8 Merge pull request #5262 from ReinUsesLisp/buffer-base
buffer_cache/buffer_base: Add a range tracking buffer container and tests
2021-01-16 19:48:26 -03:00
ReinUsesLisp
fa012cc7e6 vulkan_common: Move allocator to the common directory
Allow using the abstraction from the OpenGL backend.
2021-01-15 16:19:39 -03:00
ReinUsesLisp
98ad500af1 video_core/cmake: Remove Werror flags already defined code-base wide
These flags are already defined in src/cmake.
2021-01-15 03:37:34 -03:00
ReinUsesLisp
77efe79868 buffer_cache/buffer_base: Add a range tracking buffer container
It keeps track of the modified CPU and GPU ranges on a CPU page
granularity, notifying the given rasterizer about state changes
in the tracking behavior of the buffer.

Use a small vector optimization to store buffers smaller than 256 KiB
locally instead of using free store memory allocations.
2021-01-13 04:14:58 -03:00
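A minimal sketch of page-granularity modification tracking, in the spirit of the commit above, might look like this. `PageTracker`, `BYTES_PER_PAGE`, and the method names are hypothetical, the 4 KiB page size is an assumption, and the sketch tracks a single "modified" state rather than separate CPU and GPU ranges. It also uses `std::vector<bool>` for brevity instead of the small vector optimization the commit describes.

```cpp
#include <cstddef>
#include <vector>

// Illustrative tracker: one flag per page of the buffer.
class PageTracker {
public:
    static constexpr std::size_t BYTES_PER_PAGE = 4096; // assumed page size

    explicit PageTracker(std::size_t size_bytes)
        : modified((size_bytes + BYTES_PER_PAGE - 1) / BYTES_PER_PAGE, false) {}

    // Marks every page touched by [offset, offset + size) as modified.
    void MarkModified(std::size_t offset, std::size_t size) {
        SetRange(offset, size, true);
    }

    // Clears the modified flag of every page touched by the range.
    void ClearModified(std::size_t offset, std::size_t size) {
        SetRange(offset, size, false);
    }

    // Returns true if any page touched by the range is marked as modified.
    bool IsModified(std::size_t offset, std::size_t size) const {
        if (size == 0) {
            return false;
        }
        const std::size_t first = offset / BYTES_PER_PAGE;
        const std::size_t last = (offset + size - 1) / BYTES_PER_PAGE;
        for (std::size_t page = first; page <= last && page < modified.size(); ++page) {
            if (modified[page]) {
                return true;
            }
        }
        return false;
    }

private:
    void SetRange(std::size_t offset, std::size_t size, bool value) {
        if (size == 0) {
            return;
        }
        const std::size_t first = offset / BYTES_PER_PAGE;
        const std::size_t last = (offset + size - 1) / BYTES_PER_PAGE;
        for (std::size_t page = first; page <= last && page < modified.size(); ++page) {
            modified[page] = value;
        }
    }

    std::vector<bool> modified;
};
```

Because tracking is per page, a two-byte write that straddles a page boundary marks both pages, which is the cost of the coarse granularity.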
ReinUsesLisp
eb04c63df5 renderer_vulkan/nsight_aftermath_tracker: Move to vulkan_common 2021-01-04 02:22:22 -03:00
ReinUsesLisp
fc515aed5f renderer_vulkan: Move device abstraction to vulkan_common 2021-01-04 02:22:22 -03:00
Rodrigo Locatti
b0764f3823 Merge pull request #5230 from ReinUsesLisp/vulkan-common
vulkan_common: Move reusable Vulkan abstractions to a separate directory
2021-01-03 17:38:29 -03:00
bunnei
41e8f75c82 Merge pull request #5208 from bunnei/service-threads
Service threads
2020-12-30 22:06:05 -08:00
ReinUsesLisp
7334e9e212 renderer_vulkan: Initialize surface in separate file
Move surface initialization code to a separate file. It's unlikely to
use this code outside of Vulkan, but keeping platform-specific code
(Win32, Xlib, Wayland) in its own translation unit keeps things cleaner.
2020-12-31 02:07:33 -03:00
ReinUsesLisp
d7f0249d2e renderer_vulkan: Create debug callback on separate file and throw
Initialize debug callbacks (messenger) from a separate file. This allows
sharing code with different backends.

Change our Vulkan error handling to use exceptions instead of error
codes, simplifying the initialization process.
2020-12-31 02:07:33 -03:00
ReinUsesLisp
74276df159 renderer_vulkan: Move instance initialization to a separate file
Simplify Vulkan's backend initialization code by moving it to a separate
file, allowing us to initialize a Vulkan instance from different
backends.
2020-12-31 02:07:33 -03:00
ReinUsesLisp
ddddd25033 vulkan_common: Rename renderer_vulkan/wrapper.h to vulkan_common/vulkan_wrapper.h
Allows sharing Vulkan wrapper code between different rendering backends.
2020-12-31 02:07:14 -03:00
ReinUsesLisp
b05cecfbd8 vulkan_common: Move dynamic library load to a separate file
Allows us to initialize a Vulkan dynamic library from different backends
without duplicating code.
2020-12-31 02:02:48 -03:00
ReinUsesLisp
d25b097e84 video_core: Rewrite the texture cache
The current texture cache has several points that hurt maintainability
and performance. It's easy to break unrelated parts of the cache
when doing minor changes. The cache can easily forget valuable
information about the cached textures by CPU writes or simply by its
normal usage.

This commit aims to address those issues.
2020-12-30 03:38:50 -03:00
ReinUsesLisp
2d951b73bf video_core: Add a delayed destruction ring abstraction 2020-12-30 02:10:19 -03:00
bunnei
927976c86c video_core: gpu: Refactor out synchronous/asynchronous GPU implementations.
- We must always use a GPU thread now, even with synchronous GPU.
2020-12-28 16:33:48 -08:00
Rodrigo Locatti
2ee2a45da2 Merge pull request #5226 from ReinUsesLisp/c4715-vc
video_core: Enforce C4715 (not all control paths return a value)
2020-12-25 03:11:47 -03:00
ReinUsesLisp
4df8b8a0f5 cmake: Always enable Vulkan
Removes the unnecessary burden of maintaining separate #ifdef paths and
allows us to share generic Vulkan code across APIs.
2020-12-24 21:07:24 -03:00
ReinUsesLisp
472e86da85 video_core: Enforce C4715 (not all control paths return a value)
Most of the time people write code that always returns a value,
terminates execution, throws an exception, or uses an unconventional
jump primitive.

This is not always true when we build without asserts on mainline builds.
To avoid introducing undefined behavior on our most used builds, enforce
this warning as an error, stopping the build from shipping.
2020-12-24 21:01:23 -03:00
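The situation C4715 guards against can be illustrated with a small sketch. The `Format` enum, `BytesPerPixel` function, and `SKETCH_UNREACHABLE` macro below are hypothetical; the macro stands in for an assert that compiles to nothing, as in a build without asserts.

```cpp
// Hypothetical enum for illustration.
enum class Format { RGBA8, BGRA8 };

// Stand-in for an assert/unreachable macro that expands to nothing when
// asserts are disabled.
#define SKETCH_UNREACHABLE() ((void)0)

int BytesPerPixel(Format format) {
    switch (format) {
    case Format::RGBA8:
    case Format::BGRA8:
        return 4;
    }
    // With asserts compiled out, execution can reach this point for an
    // unexpected enum value. Without the return below, flowing off the end of
    // a non-void function would be undefined behavior, which is the case that
    // "not all control paths return a value" reports.
    SKETCH_UNREACHABLE();
    return 0;
}
```

Treating the warning as an error forces every such function to end in an explicit return or a construct the compiler knows does not return.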
Lioncash
5db4785535 video_core: Resolve more variable shadowing scenarios pt.3
Cleans out the rest of the occurrences of variable shadowing and makes
any further occurrences of shadowing compiler errors.
2020-12-05 16:02:23 -05:00
LC
cf0f8d0969 Merge pull request #4848 from ReinUsesLisp/type-limits
video_core: Enforce -Werror=type-limits
2020-10-28 03:16:10 -04:00
ReinUsesLisp
de16b5a409 video_core: Enforce -Wredundant-move and -Wpessimizing-move
Silence three warnings and make them errors to avoid introducing more in the future.
2020-10-28 02:44:50 -03:00
ReinUsesLisp
1ae83819d9 video_core: Enforce -Werror=type-limits
Silences one warning and avoids introducing more in the future.
2020-10-28 02:37:47 -03:00
ameerj
9ef5c53e52 video_core: NVDEC Implementation
This commit aims to implement the NVDEC (Nvidia Decoder) functionality, with video frame decoding being handled by the FFmpeg library.

The process begins with Ioctl commands being sent to the NVDEC and VIC (Video Image Composer) emulated devices. These allocate the necessary GPU buffers for the frame data, along with providing information on the incoming video data. A Submit command then signals the GPU to process and decode the frame data.

To decode the frame, the respective codec's header must be manually composed from the information provided by NVDEC, then sent with the raw frame data to the ffmpeg library.

Currently, H264 and VP9 are supported, with VP9 having some minor artifacting issues related mainly to the reference frame composition in its uncompressed header.

Async GPU is not properly implemented at the moment.

Co-Authored-By: David <25727384+ogniK5377@users.noreply.github.com>
2020-10-26 23:07:36 -04:00
Lioncash
09a096c6ff video_core: Conditionally activate relevant compiler warnings
These compiler flags aren't shared with clang, so specifying these flags
unconditionally can lead to a bit of warning spam.

While we're in the area, we can also enable -Wunused-but-set-parameter
given this is almost always a bug.
2020-10-20 20:28:25 -04:00
ReinUsesLisp
31c5fbd15d video_core: Enforce -Wclass-memaccess 2020-10-09 16:46:11 -03:00
ReinUsesLisp
ca90a52bea video_core: Enforce -Wunused-variable and -Wunused-but-set-variable 2020-10-02 21:19:35 -03:00
ReinUsesLisp
7c747f9bea renderer_vulkan: Make unconditional use of VK_KHR_timeline_semaphore
This reworks how host<->device synchronization works on the Vulkan
backend. Instead of "protecting" resources with a fence and signalling
these as free when the fence is known to be signalled by the host GPU,
use timeline semaphores.

Vulkan timeline semaphores allow us to work on a subset of D3D12
fences. As far as we are concerned, timeline semaphores are a value set
by the host or the device that can be waited by either of them.

Taking advantage of this, we can have a monotonically increasing
atomic value for each submission to the graphics queue. Instead of
protecting resources with a fence, we simply store the current logical
tick (the atomic value stored in CPU memory). When we want to know if a
resource is free, it can be compared to the current GPU tick.

This greatly simplifies resource management code and the free status of
resources should have fewer false negatives.

To work around bugs in validation layers, when these are attached there's
a thread waiting for timeline semaphores.
2020-09-19 01:46:37 -03:00
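The tick-based scheme described in the commit above can be sketched without any Vulkan calls. In this illustration `TickTracker` and its methods are hypothetical names, and `SignalFromGpu` merely simulates the device advancing the timeline semaphore; a real backend would query the semaphore's counter value instead.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative model of a logical tick compared against a GPU tick.
class TickTracker {
public:
    // Tick that resources used by the pending submission should remember.
    std::uint64_t CurrentTick() const {
        return current_tick.load(std::memory_order_relaxed);
    }

    // Called once per queue submission. Returns the tick being submitted and
    // advances the logical tick for subsequent work.
    std::uint64_t NextTick() {
        return current_tick.fetch_add(1, std::memory_order_relaxed);
    }

    // Simulates the device signalling that work up to `tick` has completed.
    void SignalFromGpu(std::uint64_t tick) {
        gpu_tick.store(tick, std::memory_order_release);
    }

    // A resource is free once the GPU has reached the tick it was last used in.
    bool IsFree(std::uint64_t resource_tick) const {
        return gpu_tick.load(std::memory_order_acquire) >= resource_tick;
    }

private:
    std::atomic<std::uint64_t> current_tick{1}; // monotonically increasing
    std::atomic<std::uint64_t> gpu_tick{0};     // last tick completed by the GPU
};
```

Each resource then only needs to store one integer, and checking whether it is free becomes a single comparison rather than a fence query.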
ReinUsesLisp
ee7e70cfbc video_core: Enforce -Werror=switch
This forces us to fix all -Wswitch warnings in video_core.
2020-09-16 17:48:01 -03:00