What is Bitcoin mining actually doing?
From trying to design my own ASIC (I got as far as a simulated, but not completely debugged, Verilog implementation), I can tell you how mine would have worked. Although I have not checked other designs, the design choices seem so obvious to me that I doubt anyone would do it differently.
The inner loop of the mining process is a double SHA-256 hash of data in which only one 32-bit word, essentially a counter or "nonce", changes. It looks for a result with enough zeros in the right place after the second SHA-256, and only needs to output the counter value (if any) for which this is the case.
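A minimal Python sketch of that inner loop, assuming a 76-byte header prefix (everything except the nonce) and Bitcoin's convention of reading the 32-byte digest as a little-endian integer that must be at or below a target. The names `find_nonce` and `header76` are hypothetical; this only illustrates the logic, not how fast hardware does it.

```python
import hashlib
import struct
from typing import Optional


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for Bitcoin block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def find_nonce(header76: bytes, target: int, max_nonce: int = 2**32) -> Optional[int]:
    """Scan 32-bit nonces appended to a hypothetical 76-byte header prefix.

    Returns the first nonce whose double SHA-256, read as a little-endian
    integer, is <= target, or None if no nonce below max_nonce qualifies.
    """
    assert len(header76) == 76
    for nonce in range(max_nonce):
        header = header76 + struct.pack("<I", nonce)  # nonce = last 4 bytes
        digest = double_sha256(header)
        # "Enough zeros in the right place": small value as a little-endian int.
        if int.from_bytes(digest, "little") <= target:
            return nonce
    return None
```

Only the winning nonce leaves the loop, which is why so little data has to come back out of the hardware.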
This inner loop is perfect for implementation in hardware: SHA-256 has reasonably low complexity and is itself a loop that can easily be unrolled and pipelined (after an optimization I'll describe later, there are 64 identical steps in each of the two applications). And if the counter loop is included in the hardware, the required I/O is very low, both in the amount of data actually transmitted and in the sense that the link can be very slow with negligible impact on overall performance.
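To make the "64 identical steps" concrete, here is a minimal, unoptimized Python sketch of the SHA-256 compression function following FIPS 180-4. Every round runs the same logic and differs only in the constant `K[t]` and the schedule word `w[t]`, which is what lets hardware unroll the loop into 64 pipeline stages. The helper names are my own, and the constants are derived from their published definitions rather than typed in.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep arithmetic within 32-bit words


def _first_primes(count: int) -> list:
    """Return the first `count` primes by trial division."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


def _icbrt(n: int) -> int:
    """Floor of the cube root of a positive integer (integer Newton iteration)."""
    x = 1 << ((n.bit_length() + 2) // 3)  # starting guess >= true root
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y


_PRIMES = _first_primes(64)
# K: fractional parts of the cube roots of the first 64 primes (32 bits each).
# H0: fractional parts of the square roots of the first 8 primes.
K = [_icbrt(p << 96) & MASK for p in _PRIMES]
H0 = [math.isqrt(p << 64) & MASK for p in _PRIMES[:8]]


def _rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state: list, block: bytes) -> list:
    """One SHA-256 compression: 64 identical rounds over one 64-byte block."""
    assert len(block) == 64
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):  # expand the message schedule to 64 words
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):  # same round logic every time; only K[t], w[t] change
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(msg: bytes) -> bytes:
    """Full SHA-256: pad the message, then compress each 64-byte block."""
    pad = b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
    padded = msg + pad + struct.pack(">Q", 8 * len(msg))
    state = list(H0)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)
```

The second SHA-256 in mining hashes a 32-byte digest, which pads to exactly one block, so it costs a single `compress` call of 64 rounds.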
In the source code you reference, the loop I am talking about constitutes the entire function FindShare (lines 85 through 107). Note, however, that of the actual work, implemented in FindShare as Sha256(Sha256(Current)) on line 90, only half of the first (inner) Sha256 call has to be executed inside the loop, thanks to an easy and very common optimization. The first half of that call only processes the first 64-byte block of the 80-byte block header, which does not contain the nonce, so it can easily be moved outside the loop.
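The same "move the first half out of the loop" idea can be sketched in Python with `hashlib`'s `copy()` method: absorb the first 64 bytes once, then clone that saved state for every nonce. This assumes the standard 80-byte header layout with the nonce in bytes 76 to 79; `scan_with_midstate` is a hypothetical name, and whether `hashlib` actually runs the compression at the moment of the first update is an implementation detail. The state reuse is what mirrors the hardware optimization.

```python
import hashlib
import struct
from typing import Iterable, Iterator, Tuple


def scan_with_midstate(header76: bytes, nonces: Iterable[int]) -> Iterator[Tuple[int, bytes]]:
    """Yield (nonce, double-SHA-256 digest) pairs, reusing the first block's state.

    The 80-byte header spans two 64-byte SHA-256 blocks and the nonce sits in
    the second one, so the first 64 bytes are absorbed once, outside the loop.
    """
    assert len(header76) == 76
    midstate = hashlib.sha256(header76[:64])  # computed once, outside the loop
    tail = header76[64:]                      # the 12 bytes preceding the nonce
    for nonce in nonces:
        h = midstate.copy()                   # start from the saved state
        h.update(tail + struct.pack("<I", nonce))
        yield nonce, hashlib.sha256(h.digest()).digest()
```

Each iteration then only pays for the second block of the first hash plus the single block of the second hash, which matches the two sets of 64 rounds described above.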