CPUs do not actually use gates. Although their logic can be modelled to some degree with gates, and in the old days mainframes were made out of gates, the reality is that gates are neither compact nor fast enough to make modern processors.
Modern processors are designed using VLSI (very-large-scale integration). They are made using transistor logic to form functional units, which are then arranged over a comparatively huge (though physically tiny) area into bigger units. This repeats until you get the overall processor layout, which is what you can make out by looking at the bare piece of silicon through a microscope once it is out of its protective casing.
Transistor logic operates by moving charge carriers (electrons or holes, supplied by doping) inside silicon when it is exposed to an electric field. The gate of such a transistor creates a conducting bridge of charge carriers between two doped sections once an appropriate voltage level is reached. This allows current to flow, and that switching action is fundamentally what makes it a transistor.
If only it were that easy: there are actually two different kinds of doping. N-type transistors use negative charge carriers inside a P-type (positively doped) well. As the name implies, an N-type transistor is very good at carrying negative charge but is incapable of carrying positive charge. You also have P-type transistors (positive charge carriers), which are the inverse: they have P-type doping in an N-type well. Since negative charge carriers flow more freely than positive charge carriers, most substrate is P-type by default. This does mean that for PMOS transistors you need to create an N-well.
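To make the N-type/P-type distinction a bit more concrete, here is a minimal switch-level sketch in Python. It treats each transistor as an ideal on/off switch and ignores every analog effect, so it only illustrates how complementary pairs are wired together. The function names are made up for this example.

```python
def nmos(gate: bool) -> bool:
    """Idealised N-type transistor: conducts when its gate is high."""
    return gate


def pmos(gate: bool) -> bool:
    """Idealised P-type transistor: conducts when its gate is low."""
    return not gate


def cmos_inverter(a: bool) -> bool:
    pull_up = pmos(a)    # P-type device connects the output to the supply rail
    pull_down = nmos(a)  # N-type device connects the output to ground
    # In a complementary pair exactly one of the two networks conducts.
    assert pull_up != pull_down
    return pull_up


def cmos_nand(a: bool, b: bool) -> bool:
    pull_up = pmos(a) or pmos(b)     # two P-type devices in parallel
    pull_down = nmos(a) and nmos(b)  # two N-type devices in series
    assert pull_up != pull_down
    return pull_up


if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            print(a, b, cmos_nand(a, b))
```

The point of the pairing is that the output is always driven by exactly one network, either up to the supply or down to ground.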
Due to the size and scale of the circuits involved, parasitic capacitance is a huge issue, so layouts have to minimize parallel tracks. Additionally, to carry away rogue charges and prevent signal leakage, all transistor structures need ample bulk connections so that their well or substrate keeps isolating the transistor's active area.
The capacitance of the transistors themselves is also huge. For the circuits to function correctly at high clock rates, transistors have to be sized according to what they are driving, so simply branching from one transistor into many is not possible without making that transistor larger. There is also the issue of signal propagation delays causing race conditions, but digital circuit design handles that. In doing so, however, it introduces the idea of critical paths, which limit the maximum attainable clock rate.
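As a rough illustration of how a critical path limits the clock, the sketch below finds the slowest chain through a small network of logic blocks and derives the highest clock rate that chain allows. The block names and delays are invented for the example, and real timing analysis also has to account for wires, setup times and process variation.

```python
from functools import lru_cache

# Hypothetical block delays in picoseconds.
DELAY_PS = {"decode": 300, "regread": 200, "alu": 600, "shift": 400, "writeback": 200}

# Which blocks feed which (a directed acyclic graph).
FEEDS = {
    "decode": ["regread"],
    "regread": ["alu", "shift"],
    "alu": ["writeback"],
    "shift": ["writeback"],
    "writeback": [],
}


@lru_cache(maxsize=None)
def longest_delay_from(block: str) -> int:
    """Delay of the slowest chain that starts at `block`."""
    downstream = max((longest_delay_from(n) for n in FEEDS[block]), default=0)
    return DELAY_PS[block] + downstream


# The critical path is the slowest chain anywhere in the network.
critical_path_ps = max(longest_delay_from(b) for b in DELAY_PS)

# One clock period must be at least as long as the critical path.
max_clock_ghz = 1000 / critical_path_ps

if __name__ == "__main__":
    print(critical_path_ps)  # 1300 (decode -> regread -> alu -> writeback)
    print(round(max_clock_ghz, 3))
```

Note that the faster "shift" branch does not help here: the clock is set by the slowest chain, not the average one.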
This pretty much concludes the VLSI part of circuit design. From there on it is all digital circuit design and the need to optimize the logic. This area is itself immensely complicated. To reduce the critical path, a technique called pipelining is used, which breaks instruction execution into small stages of approximately equal length that can be executed in a chain. Doing this introduces the problem of decision logic (branches) stalling the pipeline, which is combated to some degree with prediction.
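A back-of-the-envelope sketch of why pipelining helps, and what stalls cost, might look like this. The stage delays, the register overhead and the assumption that every wrong prediction flushes the whole pipeline are all simplifications made up for illustration.

```python
# Hypothetical per-stage logic delays in picoseconds.
STAGE_PS = [300, 250, 350, 300, 200]
# Hypothetical overhead of the register between two stages.
LATCH_PS = 50


def unpipelined_period_ps(stages):
    # Without pipelining, one clock period must cover the whole chain.
    return sum(stages)


def pipelined_period_ps(stages, latch):
    # With pipelining, the period only has to cover the slowest stage.
    return max(stages) + latch


def pipelined_time_ps(n_instructions, stages, latch, mispredictions=0):
    period = pipelined_period_ps(stages, latch)
    fill_cycles = len(stages) - 1  # cycles before the first result appears
    # Simplifying assumption: each wrong prediction throws away a full
    # pipeline's worth of in-flight work.
    flush_cycles = mispredictions * (len(stages) - 1)
    return (n_instructions + fill_cycles + flush_cycles) * period


if __name__ == "__main__":
    n = 1000
    print(n * unpipelined_period_ps(STAGE_PS))                           # 1400000
    print(pipelined_time_ps(n, STAGE_PS, LATCH_PS))                      # 401600
    print(pipelined_time_ps(n, STAGE_PS, LATCH_PS, mispredictions=50))   # 481600
```

The uneven stage lengths are why the stages should be "approximately equally long": the 350 ps stage sets the period for all of them.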
Even logically simple operations like addition can be too slow for high clock rates. Either faster ways of computing them are used, or they are split off into dedicated functional units. These units can then run in parallel inside a highly open-ended pipeline. This in turn requires logic to manage the pipeline so that everything executes correctly and stalls when necessary.
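One well-known example of a "faster way of computation" for addition is carry lookahead, which I am using here purely as an illustration (nothing above says which scheme a given processor uses). In a ripple-carry adder each bit waits for the carry from the bit below it. A carry-lookahead adder instead computes every carry directly from the inputs. The Python sketch below only checks that the two give the same answer; the speed advantage exists in hardware, where the expanded carry expressions are evaluated in parallel.

```python
def to_bits(value, width):
    """Least-significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    sums, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))  # each carry waits on the previous one
    return sums, carry


def carry_lookahead_add(a_bits, b_bits, carry_in=0):
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate: this bit creates a carry
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate: this bit passes a carry on
    carries = [carry_in]
    for i in range(len(g)):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[0]c[0]
        # Built only from g, p and carry_in, never from another carry.
        c, prefix = 0, 1
        for j in range(i, -1, -1):
            c |= prefix & g[j]
            prefix &= p[j]
        c |= prefix & carry_in
        carries.append(c)
    sums = [p[i] ^ carries[i] for i in range(len(p))]
    return sums, carries[-1]


if __name__ == "__main__":
    s, c = carry_lookahead_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(s), c)  # 1 1  (11 + 6 = 17 = 16 + 1)
```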
Eventually you get to a working instruction machine which you can call a processor. The processor can then communicate with external resources to gain more memory, persistent storage, communication with other processors and so on. All of this is done through communication buses and protocols.
And what's the role of CPU and RAM and HDD at all? Couldn't the gates be like on the MB itself?
The motherboard is a circuit board, not an integrated circuit. Although circuit boards are manufactured with similar principles and even similar tools, they are a completely different technology. You cannot build the logic of an integrated circuit directly into a circuit board.
Additionally, the integrated circuits that make up your computer can themselves differ in technology. The most widely known example is RAM, which uses a completely different technology from the CPU because of the way RAM operates. For that reason RAM is far more efficient power-wise than the cache on your CPU, but RAM technology cannot be used to make high-performance CPUs. Until recently it was impossible to make affordable integrated circuits with a storage density similar to mechanical HDDs; with solid-state storage, however, that is changing. Solid-state storage is printed much like RAM and CPUs, but again using an entirely different technology.
Many modern computers, such as those from Apple, integrate all three to some extent. A CPU/GPU on a single die from AMD is soldered onto the motherboard along with RAM (often mounted directly on the CPU/GPU package), and flash memory chips (solid state) are also soldered directly onto the motherboard. The result is a highly optimized system that can sit in an unbelievably thin case and weigh practically nothing, at the expense of being completely unable to upgrade or replace anything in the future.