An 8 GB GPU could run 2.5x more threads at 8x more bandwidth by simply porting the CPU code (no special parallelizing needed). The problem with the Equihash paper is that it referenced a 2011 GPU and did not point out that modern GPUs have a lot more on-board RAM. So 2.5 x 8 = 20x is an upper limit. But GPU cores run at 1/3 the clock speed of CPU cores, and my experiments in acquiring blocks on the testnet indicate that core caching and/or clock speed on the CPU matters a lot. Either way, the real advantage should be below 20x, with maybe 8x as a minimum. The important point is that even this minimum is double the 4x GPU advantage claimed in the Equihash paper, and unlike that 4x figure, it does not require any special parallel programming.

The paper also used a 2011 CPU for the comparison, so I had not seen a problem with it looking at an old GPU, since both have advanced. So the problem (if you wanted CPUs to win instead of GPUs) is that Zcash has chosen parameters that are good for 2011 but not for 2016. I am not being critical, as I did not realize the implications myself until now.

Even without the GUI, I could not get 4 threads to run well on 4 GB, and 6 GB seemed even worse. So in practice 8 GB is needed, and with 8 GB needed, 750 MB/thread is too low. The requirement should have been 1.2 GB/thread, which would still leave room for Ubuntu while holding back GPUs.
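A minimal sketch of the ceiling arithmetic above, under the post's own simplified model: GPU advantage is roughly the thread ratio times the bandwidth ratio, ignoring clock speed and caching. The 8 GB, 750 MB/thread, 4 CPU threads and 8x bandwidth figures come from the post; the helper names `max_threads` and `upper_bound_speedup` are hypothetical, and counting 8 GB as 8 x 1024 MB is an assumption.

```python
# Back-of-the-envelope GPU-vs-CPU ceiling, following the post's model.
# Assumptions (illustrative, not measured): 8 GB of GPU RAM, 750 MB per
# solver thread, a 4-core CPU running one thread per core, and an 8x
# memory-bandwidth advantage for the GPU.

def max_threads(ram_mb: int, mb_per_thread: int) -> int:
    """Number of whole solver threads that fit in the given RAM."""
    return ram_mb // mb_per_thread


def upper_bound_speedup(gpu_ram_mb: int, mb_per_thread: int,
                        cpu_threads: int, bandwidth_ratio: float) -> float:
    """Thread ratio times bandwidth ratio; ignores the GPU's lower clock speed."""
    thread_ratio = max_threads(gpu_ram_mb, mb_per_thread) / cpu_threads
    return thread_ratio * bandwidth_ratio


gpu_threads = max_threads(8 * 1024, 750)  # 8192 // 750 -> 10 threads
bound = upper_bound_speedup(8 * 1024, 750, cpu_threads=4, bandwidth_ratio=8)
print(gpu_threads, bound)  # 10 20.0
```

Because the model leaves out clock speed and caching, the result is only a ceiling; the post's own estimate of the real advantage sits well below it.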
Update to the above:
The Equihash paper was from 2016, but its GPU vs. CPU data was from 2011. I wanted nothing more than for CPUs to win, but even if GPU miners are no better than the stock miner, an 8 GB GPU should be about 10x better than a CPU at launch. The Equihash paper assumed cryptocurrency developers would choose a RAM requirement higher than a GPU's on-board RAM. But with new GPUs, a GPU coder can copy the stock miner and run it on 10 cores, giving 2.5x more threads than a 4-core CPU at 20x the bandwidth (on a $400 GPU). The GPU is not 20 x 2.5 = 50x faster than a CPU only because its cores are so slow. The 4x statement in the Equihash paper has nothing to do with this: by assuming the coin's RAM requirement would be larger than the GPU's RAM, the authors assumed advanced parallel programming would be needed to make use of the GPU's many cores. That is not the case. Zcash was not able to force a larger RAM requirement, so the Equihash paper is not relevant as far as GPUs are concerned. They could perhaps raise the RAM to about 1200 MB per core if they go to 5-minute intervals. By my math above, this would reduce the GPU advantage to 7.5x.
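A minimal sketch of the updated arithmetic, taking the post's figures at face value: 10 GPU threads vs. 4 CPU threads, 20x bandwidth, and roughly 10x real-world advantage at launch. The variable `fraction_of_ceiling` is a hypothetical name for the ratio of that 10x estimate to the 50x naive ceiling, i.e. how much of the ceiling would survive the slower GPU cores if both figures hold.

```python
# Updated ceiling: 10 GPU threads vs 4 CPU threads at 20x bandwidth,
# compared with the post's estimate of roughly 10x at launch.
gpu_threads = 10
cpu_threads = 4
bandwidth_ratio = 20

naive_ceiling = (gpu_threads / cpu_threads) * bandwidth_ratio  # 2.5 * 20 = 50.0
estimated_speedup = 10  # the post's figure for a $400, 8 GB GPU running the stock miner

# Share of the naive ceiling left after the GPU's slower cores,
# assuming both figures above are taken at face value (hypothetical metric).
fraction_of_ceiling = estimated_speedup / naive_ceiling
print(naive_ceiling, fraction_of_ceiling)  # 50.0 0.2
```

Read this way, the 10x estimate implies that the slow GPU cores would keep only about a fifth of the naive 50x ceiling.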
But we have not actually seen any results from a $400 GPU that beat a $200 desktop.