
Nvidia apparently gimped the 900 series.


Dr Super Good

Spell Reviewer
Level 63
Joined
Jan 18, 2005
Messages
27,197
UPDATE: Nvidia is apparently putting pressure on the devs of AotS to change the benchmark to better suit them
You are aware that the company that made the benchmark has a partnership with AMD, right? Obviously they will be using everything that AMD does best in their benchmark while leaving out everything NVidia does best.

Get a benchmark targeting NVidia and the opposite will be the case.

And the feature they say is missing from NVidia?
Robert continues his explanation: “Graphics rendering tasks naturally have gaps or ‘bubbles’ in the pipeline. AS fills those bubbles with compute. Instead of having to wait for a graphics task to end to do the compute work, or having to pause graphics to work on compute. This reduces scene rendering time, and increases GPU utilization by activating resources that would have been dormant in DX11.”
So it only affects games that use DirectCompute anyway, since it allows those tasks to be run in a way similar to how Hyper-Threading works for CPUs.

How much performance does it give? Basically it gives you some useful time during pipeline changes. I reckon 5%-10% at most, and only when DirectCompute is being heavily used (tasks still waiting to finish) and the GPU is near maximum utilization (otherwise chances are the compute tasks have already finished).
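
For reference, here is a minimal Direct3D 12 sketch of what "feeding compute next to graphics" looks like on the API side. The function and variable names are my own illustration; only the D3D12 calls themselves are real, and whether the compute work actually overlaps the graphics "bubbles" is entirely up to the driver and hardware.

// Sketch only: create a graphics (DIRECT) queue and a separate COMPUTE queue.
// On hardware with real async compute the compute queue's work can fill idle
// shader cycles; otherwise the driver is free to serialize it.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    // Graphics queue: accepts draw, dispatch and copy commands.
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&graphicsQueue));

    // Compute queue: dispatch/copy only; candidate for "bubble filling".
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}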
 

Deleted member 212788



Isn't that the point of a benchmark though? To show what the hardware is fully capable of. In a way, Nvidia is asking the developer to gimp AMD's performance so that Nvidia looks better. The fact of the matter is that Nvidia had access to the code of both AotS and DX12 quite a bit in advance and did nothing. They marketed the 900 series as being fully DX12 ready, yet it isn't; the lack of async compute shows that. Nvidia have been doing this for quite some time: they under-develop a card, be it via VRAM (680 and 770) or via the lack of key elements that will become important in the future, so that people need to upgrade to the newer generation of cards. That, I feel, is anti-consumer, as we are essentially getting cheated out of our money. Buying a $1000 flagship like the Titan X should mean that the buyer doesn't need to upgrade a year down the line but instead enjoys "future-proofness" to a certain degree. A lot of people bought Titan Xs expecting them to last as long as the original Titan, four years or more, and yet that seems not to be the case.
 

Dr Super Good

Spell Reviewer
Level 63
Joined
Jan 18, 2005
Messages
27,197
The fact of the matter is that Nvidia had access to the code of both AotS and DX12 quite a bit in advance and did nothing
No, they did not have it in advance. When today's GPUs were being developed, DX12 had not been fully specified. GPUs take years to develop, not months.

Isn't that the point of a benchmark though? To show what the hardware is fully capable of.
A benchmark is there to compare a certain type of performance. In this case it showcases the performance of AMD cards, since it exercises everything AMD cards were designed to do well. Run it on an NVidia card and it will perform much worse, since NVidia designs its cards with a different type of performance in mind.

In a way, Nvidia is asking the developer to gimp AMD's performance so that Nvidia looks better.
No, they were asking for the benchmark to also target NVidia performance features, making it non-biased. Sure, everything it uses is in the Direct3D 12 standard, but that does not mean it has to or should be using those features. Specifically, it uses all the features which AMD cards do well, as opposed to the features which NVidia cards do well.

Buying a $1000 flagship like the Titan X should mean that the buyer doesn't need to upgrade a year down the line but instead enjoys "future-proofness" to a certain degree. A lot of people bought Titan Xs expecting them to last as long as the original Titan, four years or more, and yet that seems not to be the case.
And AMD's cards are any more future proof? Sure, they are more standard compliant, but seeing how they used their corporate muscle to write those parts of the standard, one would expect so.

The Titan X will still run most games very well. Most of the common Direct3D 12 features will run very well on it. Just look at the Final Fantasy 15 tech demo for proof.

As I said, it is basically GPU Hyper-Threading, allowing other tasks to run when normally nothing is happening. NVidia is likely reluctant to support this because it probably eats up a large area of the die. As such, cards which use it will potentially have fewer processing elements, be less energy efficient, or suffer other side effects.

The tech demo was clearly designed around use of that feature at the request of AMD to show off how well their cards work. NVidia likely wanted them to disable it and instead fall back to traditional methods of feeding compute and graphics shaders, as the performance would be better.

So what performance difference does it make? As a rough guess, it works wonders for lots of small compute tasks, as they can fit seamlessly inside wasted graphics shader cycles and so be completed very fast (low latency, probably what the benchmark does and depends on to work well). On the other hand, big compute jobs will only see a performance gain equal to the number of graphics shader cycles wasted waiting to be fed, and will still have the same latency.

In the case of big compute jobs NVidia can emulate the feature and no one will notice the difference: it can finish one thing, then do the other, and the resulting latency will be roughly the same (or possibly better/worse due to hardware differences). However, if you want lots of small low-latency compute tasks, NVidia cannot handle that, since it will run the graphics to completion and only then start answering them.

The reality is that this is Hyper-Threading. One should never depend on it to produce low-latency results, since the number of cycles available for compute during graphics shaders can vary from model to model. One can use it for a little extra performance in the form of parallelism, giving access to normally wasted cycles, but that is about it, as ultimately any large job cannot fit into wasted cycles. Also, this is only one way of improving performance at the hardware level; others exist (faster clock speed, more processing elements, etc.).

Even if it allows for priorities (switching main execution from graphics to compute, etc.), one could still ask whether you really need so many small low-latency compute jobs.
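
To illustrate the "emulation" point above, here is a hedged sketch of how an application submits to both queues and synchronizes them with a fence. The function and parameter names are invented for the example; the point is that the same code is valid whether the GPU truly overlaps the two queues or quietly runs them one after the other.

// Sketch only: submit graphics and compute to separate queues, then make the
// graphics queue wait on a fence before any later work that consumes the
// compute results. If the hardware cannot overlap the queues, this simply
// behaves as if the work had been serialized.
#include <d3d12.h>

void SubmitFrame(ID3D12CommandQueue* gfxQueue,
                 ID3D12CommandQueue* computeQueue,
                 ID3D12CommandList* gfxList,
                 ID3D12CommandList* computeList,
                 ID3D12Fence* fence,
                 UINT64& fenceValue)
{
    // Kick off this frame's graphics work.
    ID3D12CommandList* gfx[] = { gfxList };
    gfxQueue->ExecuteCommandLists(1, gfx);

    // Kick off compute on its own queue; it may or may not overlap the above.
    ID3D12CommandList* cmp[] = { computeList };
    computeQueue->ExecuteCommandLists(1, cmp);

    // Compute signals the fence when done; the graphics queue will not start
    // any work submitted after this Wait until the fence has been signalled.
    computeQueue->Signal(fence, ++fenceValue);
    gfxQueue->Wait(fence, fenceValue);
}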
 

Deleted member 212788


Go for it, the 950 is decent for the price, unlike its older brother, the 960.

Nah - it would be a significant downgrade to my 280 and with recent news I'd rather go for a 370X and be on the safe side.
 

Dr Super Good

Spell Reviewer
Level 63
Joined
Jan 18, 2005
Messages
27,197
People are aware that this is just a marketing move by AMD right?

I am pretty sure nothing in the standard forces them to implement the feature in a hardware-parallel way; as long as it can be fed by the program in a parallel way the API will be happy, so they can advertise the availability of the feature. Sure, the performance may be terrible, but then again the API has nothing to do with performance.

It would be very stupid if NVidia broke entirely when applying "Synchronization and Multi-Engine". I am guessing it might have problems with only one specific feature.

Specifically, I am guessing the following points are where NVidia is struggling (see the rough API sketch below the list).
•Asynchronous and low priority GPU work. This enables concurrent execution of low priority GPU work and atomic operations that enable one GPU thread to consume the results of another unsynchronized thread without blocking.
•High priority compute work. With background compute it is possible to interrupt 3D rendering to do a small amount of high priority compute work. The results of this work can be obtained early for additional processing on the CPU.
•Background compute work. A separate low priority queue for compute workloads allows an application to utilize spare GPU cycles to perform background computation without negative impact on the primary rendering (or other) tasks. Background tasks may include decompression of resources or updating simulations or acceleration structures. Background tasks should be synchronized on the CPU infrequently (approximately once per frame) to avoid stalling or slowing foreground work.
All of that would require extra registers in hardware to maintain state, as it is probably implemented in a way similar to Hyper-Threading.
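
The sketch mentioned above, purely as an assumption of how such "background" compute work is expressed through the D3D12 API (the helper name is mine; how strongly the priority hint is honoured is hardware and driver dependent):

// Sketch only: a dedicated compute queue intended for background work that
// should just soak up spare GPU cycles. D3D12_COMMAND_QUEUE_PRIORITY_HIGH
// exists for the opposite case (urgent compute interrupting rendering).
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12CommandQueue> CreateBackgroundComputeQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;

    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}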

Will most games use such a feature heavily? I doubt it.
 

Deleted member 212788


" In regards to the purpose of Async compute, there are really 2 main reasons for it:

1) It allows jobs to be cycled into the GPU during dormant phases. In can vaguely be thought of as the GPU equivalent of hyper threading. Like hyper threading, it really depends on the workload and GPU architecture for as to how important this is. In this case, it is used for performance. I can’t divulge too many details, but GCN can cycle in work from an ACE incredibly efficiently. Maxwell’s schedular has no analog just as a non hyper-threaded CPU has no analog feature to a hyper threaded one.

2) It allows jobs to be cycled in completely out of band with the rendering loop. This is potentially the more interesting case since it can allow gameplay to offload work onto the GPU as the latency of work is greatly reduced. I’m not sure of the background of Async Compute, but it’s quite possible that it is intended for use on a console as sort of a replacement for the Cell Processors on a ps3. On a console environment, you really can use them in a very similar way. This could mean that jobs could even span frames, which is useful for longer, optional computational tasks."


It is physically missing from the PCB, and I wouldn't call something that gives a boost of up to 45% minor. It's something that will be used very heavily on consoles and, post PC port, on the PC platform as well.
 

Dr Super Good

Spell Reviewer
Level 63
Joined
Jan 18, 2005
Messages
27,197
I can’t divulge too many details
Of an open standard? Man, your employer must have a watertight contract on you...
Maxwell’s scheduler has no analog, just as a non-hyper-threaded CPU has no analog feature to a hyper-threaded one.
Which is what one could guess.

However, do remember that there are only three standard command queue types. Nothing stops one from creating more queues, as the standard is designed to support any number of them. Throw in 2 compute and 2 graphics queues and I am sure even AMD will struggle.
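
As a hedged illustration of that point: D3D12 exposes three standard queue types (graphics, compute, copy) and lets an application create as many queues of each type as it likes. The counts and names below are arbitrary assumptions; nothing here guarantees the hardware can actually service them all in parallel.

// Sketch only: several queues of each of the three standard types.
#include <d3d12.h>
#include <vector>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

std::vector<ComPtr<ID3D12CommandQueue>> CreateManyQueues(ID3D12Device* device)
{
    std::vector<ComPtr<ID3D12CommandQueue>> queues;
    const D3D12_COMMAND_LIST_TYPE types[] = {
        D3D12_COMMAND_LIST_TYPE_DIRECT,   // graphics (can also dispatch/copy)
        D3D12_COMMAND_LIST_TYPE_COMPUTE,  // compute and copy only
        D3D12_COMMAND_LIST_TYPE_COPY      // copy only
    };
    for (D3D12_COMMAND_LIST_TYPE type : types)
    {
        for (int i = 0; i < 2; ++i)       // e.g. 2 graphics + 2 compute + 2 copy
        {
            D3D12_COMMAND_QUEUE_DESC desc = {};
            desc.Type = type;
            ComPtr<ID3D12CommandQueue> queue;
            device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
            queues.push_back(queue);
        }
    }
    return queues;
}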

It can allow gameplay to offload work onto the GPU as the latency of work is greatly reduced.
No, it does not, as gameplay has to be deterministic for multiplayer. Not everyone might have compliant GPUs, or GPUs with enough idle time even if they are compliant. It can offload graphic effects traditionally run on the CPU, but it still cannot alter gameplay, as that would then become platform dependent.

I’m not sure of the background of Async Compute, but it’s quite possible that it is intended for use on a console as sort of a replacement for the Cell Processors on a ps3.
We have quad, hexa and octa core CPUs for that. No, it is intended by AMD to give them a selling point. They invested in a technology no one would touch for years and that finally everyone wants. Does it give performance? Yes. Does it give them a lead? No. NVidia has performance in other areas, as they invested in other technologies.

On a console environment, you really can use them in a very similar way. This could mean that jobs could even span frames, which is useful for longer, optional computational tasks."
Do the consoles even support it? As far as I know, only the latest AMD PC graphics cards support it in the way they advertise. The consoles support D3D 12_0, which will run such code, but that does not mean they will run it well, as I think those GPUs are in the same position as NVidia at the moment. They support Mantle, hence D3D feature level 12_0, but not 12_1 like the new AMD cards (and what NVidia claims).

This could mean that jobs could even span frames, which is useful for longer, optional computational tasks."
No, since game developers would rather use it to add extra glitter to their frames than do anything to improve gameplay. Most likely it will be used for destruction effects, particles, paper blowing in the wind, etc.

and I wouldn't call something that gives a boost of up to 45% minor
NVidia have technology in some of their cards that can give up to a 200% boost. It all depends on the workload, and that benchmark was clearly aimed at showing this particular one off. I would expect gains for the average user in the average game of 5-10% at most (like Hyper-Threading), and that can easily be compensated for with better, faster or more GPU units, which not supporting the feature can allow (this is why Hyper-Threading was dropped for the Core 2 range and only brought back for the i7). A game like WC3 will see a 0% or negative boost, as the GPU is burdened by support for a feature which a D3D8 game cannot use. SC2 again will see a 0% or negative gain, as D3D10 does not benefit from it.

It's something that will be used very heavily on consoles and, post PC port, on the PC platform as well.
No. If you read carefully, only the latest AMD cards support it (which is what the tech demo was made to show off). The consoles are based on older cards, so they do not. Yes, they will support it better than old NVidia cards, but they still lack the feature level. The article was specifically targeting feature level D3D 12_1, which NVidia advertises support for but which the benchmark shows performs badly in the async part. All current consoles are D3D 12_0, so they are not expected to have good support for it anyway.
 

Deleted member 212788


260? Damn, ancient... well, wait... I had a 230 before getting the 760, so I'm even more so :D

I can't brag with an older card sadly :/ - my first actual one (not iGPU) was a Radeon 5670. So many good memories with that card. I knew a lot less about hardware and was considerably calmer back then :D
 
Level 15
Joined
Mar 9, 2008
Messages
2,174
Geforce 4 440 MX > 8600 GS > 9600 GT (8600 caught fire) > GTX 660 and in the future probably Pascal x60 or maybe even x70.
 

Deleted member 212788


Geforce 4 440 MX > 8600 GS > 9600 GT (8600 caught fire) > GTX 660 and in the future probably Pascal x60 or maybe even x70.

If I had the cash I would grab a Fury X or R9 Nano - I love small cool cards ^_^
 

Deleted member 212788


Good luck with finding either of those; they are apparently really low in supply due to the low yield of HBM gen 1. (I'm not sure if the R9 Nano even came out.)

Both are available at every retailer in my country :D

I even have the option to choose between Sapphire and Asus for the Fury X :D
 