That wasn't early DDR3, mate. I'm talking Intel P35 boards with 65nm first-gen Core 2s, back when 2GB kits of it were $300+. DDR2 outperformed it at first because the frequency boosts were offset by the hilariously looser timings.
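Rough maths for anyone who doubts it (illustrative parts from that era, not any specific kit): first-word latency is CAS cycles divided by the IO clock, so the clock bump bought you nothing at first.

```cpp
// Back-of-envelope CAS latency, illustrative parts only:
// first-word latency (ns) = CL cycles / IO clock (GHz).
#include <cstdio>

int main() {
    double ddr2_ns = 4.0 / 0.400;  // DDR2-800 CL4, 400 MHz IO clock -> 10.0 ns
    double ddr3_ns = 7.0 / 0.533;  // DDR3-1066 CL7, 533 MHz IO clock -> ~13.1 ns
    std::printf("DDR2-800  CL4: %.1f ns\n", ddr2_ns);
    std::printf("DDR3-1066 CL7: %.1f ns\n", ddr3_ns);
    return 0;
}
```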
Was not aware any Core2 supported DDR3, my mistake.
This shows me that you haven't a bloody clue of what it actually entails. It's essentially multithreading shaders using the queue process, breaking things into sectors.
This shows me you "haven't a bloody clue of what it actually entails". It is actually closer to Hyper-Threading, with no real parallelism occurring. Only one command queue will execute at a time, but another command queue can execute with no overhead during cycles when the main command queue's job blocks. This lets hardware manufacturers reduce the overhead of context switching between pipelines to virtually nothing, making pipeline pre-emption viable as a software technique. It also allows the functional hardware to be utilized more heavily in cases where commands result in stalls.
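For reference, this is roughly all the API surface there is, a minimal sketch (names are mine): you create extra queues and submit to them. Whether work on the second queue actually overlaps the first is entirely down to the driver and hardware, which is the whole argument here.

```cpp
// Sketch: a compute queue alongside the main graphics queue in Direct3D 12.
// The API only hands you queues; any actual overlap is up to the hardware.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfxQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue) {
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy only
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}
```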
The specification defines a small set of required command queue types (direct, compute, copy) and lets you create any number of queues of each. Hardware acceleration of concurrent queues is recommended but not mandatory. Also, no hardware implementation does it perfectly, with many having limits (such as 16 or 32 queues for the Xbox One).
GCN 1.0 does it fine; VLIW4/5 cannot. You might be confusing the older APUs with what's in the consoles, as those were VLIW4 while the consoles use GCN. Nvidia also theoretically supported asynchronous compute for CUDA using drivers going back to Fermi, but there was no hardware implementation.
AMD must have messed up their own press releases, then. They clearly stated that async was only fully supported by their newest cards, or possibly that what the game does is only supported on their newest cards. They are probably referring to some limitation of the command queues, since the Xbox One and PlayStation 4 likely have fewer or worse optional features compared with their newer cards.
It's a basic feature both knew about; Nvidia thought they could cut corners and use a software solution because it was originally just a niche for CUDA programmers. They're going to have to redesign their queueing engine from the ground up, which won't be fun given their thread/warp model.
You must remember the feature comes at a cost: the extra logic required can reduce performance in other areas. If a non-standard feature is not being used, you would be stupid to keep it in.
For the most part a driver solution could work, with at most some performance loss. It is only if you start to depend on the feature, specifically for mixing Direct Compute results into the graphics pipeline, that it becomes a real problem. This is why Ashes of the Singularity specifically does this: it needs a proper hardware implementation to perform, otherwise the context switching kills it. The fact that it simply did not work means the feature was not implemented properly; it should still have run, just probably very badly.
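To be concrete about what "mixing Direct Compute results into the graphics pipeline" looks like at the API level, here's a minimal sketch (all names are mine, nothing from the game): the graphics queue does a GPU-side wait on a fence the compute queue signals. A driver that can't overlap the queues can legally serialize this, slow but correct, which is why "just doesn't work" points at a broken implementation.

```cpp
// Sketch: graphics queue consuming a compute queue's result via a fence.
// If the hardware can't overlap the queues, this degenerates to serial
// execution: slow but still correct.
#include <d3d12.h>

void SubmitFrame(ID3D12CommandQueue* computeQueue,
                 ID3D12CommandQueue* gfxQueue,
                 ID3D12Fence* fence, UINT64& fenceValue,
                 ID3D12CommandList* const* computeLists, UINT numCompute,
                 ID3D12CommandList* const* gfxLists, UINT numGfx) {
    computeQueue->ExecuteCommandLists(numCompute, computeLists);
    computeQueue->Signal(fence, ++fenceValue);  // mark compute work done

    gfxQueue->Wait(fence, fenceValue);          // GPU-side wait, no CPU stall
    gfxQueue->ExecuteCommandLists(numGfx, gfxLists);
}
```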
They're already being used in console games, bruv. Battlefield 4 was the first one, IIRC.
Anything using Direct3D 12 will use command queues. Whether they gain anything from it is another question entirely. Taking advantage of some form of asynchronous hardware implementation is not required. The only thing Direct3D 12 guarantees is the order in which command queues run (which is where Nvidia probably failed). AMD is selling a specific implementation of command queues, which is their "asynchronous compute" or whatever it is called.
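A minimal sketch of that one guarantee (queue and list names are mine): submissions to the same queue run in submission order; across queues there is no implied order at all unless you add fences yourself.

```cpp
// Sketch: the ordering rule. Same queue => submission order is respected.
// Different queues => no implied ordering without explicit fences.
#include <d3d12.h>

void SubmitExample(ID3D12CommandQueue* gfxQueue,
                   ID3D12CommandQueue* computeQueue,
                   ID3D12CommandList* listA,
                   ID3D12CommandList* listB,
                   ID3D12CommandList* listC) {
    gfxQueue->ExecuteCommandLists(1, &listA);      // runs before listB,
    gfxQueue->ExecuteCommandLists(1, &listB);      // same queue => ordered
    computeQueue->ExecuteCommandLists(1, &listC);  // other queue => unordered vs A/B
}
```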
It's funny: if you use GPUView on the game, you can see that Nvidia gets calls to put things in the async queue, but then just queues them normally as if they weren't async. That said, the game wasn't a heavy user of the queue; neither was Ashes, for that matter.
Once again, asynchronous compute is an AMD implementation of Direct3D 12 command queues. How the queues are implemented is not defined, and the actual documentation hints that a GPU could in theory run each command queue completely in parallel (not only during pipeline stalls).
As long as the synchronization between command queues remains correct, it still complies, even if priorities do not work that well.
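Priority is literally one field in the queue description, a sketch (function name is mine); the spec lets you ask, but how strongly the scheduler honours it is implementation-defined:

```cpp
// Sketch: requesting a high-priority compute queue. The priority value
// is a request to the scheduler, not a guarantee of behaviour.
#include <d3d12.h>
#include <wrl/client.h>

Microsoft::WRL::ComPtr<ID3D12CommandQueue>
MakeHighPriorityComputeQueue(ID3D12Device* device) {
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type     = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_HIGH;  // a hint, not a promise
    Microsoft::WRL::ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}
```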
Stardock is partnered with AMD; Oxide, the engine programmers working with Stardock on it, are not.
http://www.overclock.net/t/1569897/v...#post_24356995
Stardock's involvement will be shaky as always with such projects. Usually they help make it, but then leave the other company to look after it (see Demigod). They seem more like a team of consultant programmers than an actual software company.