so, that I actually did not know; but it makes sense that "small wires melt when they get too hot", if that is more or less what you are implying in layman's terms...
That has nothing to do with what I said...
Capacitive elements have an impedance of 1/(jwC), where j = i = sqrt(-1), w = omega = 2*pi*f, and C is the capacitance in farads. This impedance translates to a resistance (its magnitude) with a phase offset (since it is a complex quantity). One can easily see that the resulting resistance is inversely proportional to frequency. At DC ((f)requency = 0) you end up with division by zero, which can be viewed as an impossibly high resistance; this explains why capacitors block DC currents. As f goes to infinity, you end up dividing by infinity, which gives 0, meaning that as f increases a capacitor tends towards a short circuit.
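To make that concrete, here is a minimal sketch of |Z| = 1/(2*pi*f*C) at a few frequencies. The 1 uF value is just an arbitrary example capacitance:

```python
import math

def cap_impedance_magnitude(f_hz, c_farad):
    """Magnitude of an ideal capacitor's impedance: |Z| = 1/(2*pi*f*C)."""
    if f_hz == 0:
        return math.inf  # DC: the capacitor looks like an open circuit
    return 1.0 / (2 * math.pi * f_hz * c_farad)

C = 1e-6  # 1 uF, arbitrary illustrative value
for f in (0, 50, 1e3, 1e6):
    print(f"f = {f:>9} Hz  ->  |Z| = {cap_impedance_magnitude(f, C):.3f} ohm")
```

Running this shows the trend described above: infinite resistance at DC, and a magnitude that shrinks towards zero (a short circuit) as frequency rises.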
The way processors are currently designed is very compact. This means that connections between components inside them are very, very small, which brings about a variety of problems. Firstly, the tracks sit very close together, separated by semiconductor acting as an insulator (which is basically what a capacitor is), so dissimilar connections can be viewed as being joined by parasitic capacitors, and current leakage occurs through them (more so at higher frequencies). Secondly, the metal tracks that make up a channel have a very small inductance (as all wires do), resulting in systematic signal skew. Finally, distortion from magnetic fields adds an element of randomness to the skew.
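The leakage point follows directly from the impedance formula: the current coupled through a parasitic capacitance is |I| = V * 2*pi*f * C, so it grows linearly with frequency. A rough sketch, where the 1 fF capacitance and 1 V swing are made-up, order-of-magnitude figures purely for illustration:

```python
import math

def leakage_current(v_volts, f_hz, c_farad):
    """Current coupled through a parasitic capacitance: |I| = V * 2*pi*f * C."""
    return v_volts * 2 * math.pi * f_hz * c_farad

C_PARASITIC = 1e-15  # 1 fF between adjacent tracks (assumed value)
V = 1.0              # 1 V signal swing (assumed value)
for f in (1e6, 1e9, 5e9):
    print(f"f = {f:.0e} Hz -> leakage ~ {leakage_current(V, f, C_PARASITIC):.3e} A")
```

Doubling the frequency doubles the leakage, which is one reason pushing clock rates up gets progressively harder.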
This is not even getting into how the actual logic elements built from transistors have skew due to their own means of operation. Worse still, if a signal that is not very close to a logic level (low or high) enters a transistor, there is a chance of metastability occurring, which can potentially cascade through the entire processor and cause it to produce unintended results.
In order to minimise these effects, a certain minimum time per transistor step (receiving a signal, producing a reliable result, and letting that result arrive at the next transistor element) is defined. Of course, logic elements are made of multiple transistors, and a computer functions by chaining many logic elements together. The end result is that the maximum clock rate of a computer is defined by the length of the critical path (the longest chain of logic elements), chosen so that it achieves a usable reliability (will not error in months of operation). The allocated clock is based on worst-case tolerances, which is why you can overclock (raise the clock frequency of) most CPUs by a variable amount while still keeping reliability high.
Of course, this does not mean that processors are stuck in the 3-4 GHz range. Only CISC processors like the x86/64 architecture (what most PCs use) are limited to reliable operation in this range. RISC architectures have already gone past 6 GHz thanks to shorter critical paths, but this does not mean they perform better than lower-clocked processors (the predecessor of the processor used in Watson ran at 6 GHz, but it proved more efficient to reduce the clock to 4 GHz and spend the headroom this freed on improving performance in other ways).
You might be wondering how a square wave behaves when passed through a capacitor (after all, computer clocks are square waves). A square wave is actually a spectrum of sine waves at odd multiples of the square wave's fundamental frequency. As the frequency of the square wave is increased, those harmonics move to higher absolute frequencies, so more of the signal's energy sits in the range where the capacitive effects above matter.
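The spectrum in question is the standard Fourier series of a square wave: sq(t) = (4/pi) * sum over odd n of sin(2*pi*n*f*t)/n. A minimal sketch showing the partial sums converging towards the square wave's value of +1 at a quarter period (the 1 Hz fundamental is arbitrary):

```python
import math

def square_partial_sum(t, f, n_harmonics):
    """Partial Fourier sum of a +/-1 square wave using the first
    n_harmonics odd harmonics: (4/pi) * sum sin(2*pi*(2k-1)*f*t)/(2k-1)."""
    return (4 / math.pi) * sum(
        math.sin(2 * math.pi * (2 * k - 1) * f * t) / (2 * k - 1)
        for k in range(1, n_harmonics + 1)
    )

f = 1.0   # 1 Hz fundamental, arbitrary choice
t = 0.25  # quarter period: the true square wave equals +1 here
for n in (1, 5, 50):
    print(f"{n:>3} odd harmonics -> {square_partial_sum(t, f, n):+.4f}")
```

With a single harmonic you just get a sine wave; adding more odd harmonics sharpens the edges, which is why the high-frequency content matters when the wave passes through anything capacitive.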