We cannot begin an insightful discussion of the relationship between PYNQ and accelerated computing without first touching on the origins of FPGAs and computing acceleration. Although FPGA-based acceleration has existed for many years and has been widely used in professional high-performance computing applications, it has only recently become a hot topic, with FPGA technology gradually finding its killer application. This is largely due to the explosive growth of big data and the large-scale deployment of FPGA accelerator cards by several giant cloud computing companies. Today, as Moore's Law fades from the industry's focus, engineers are looking for better ways to process more data at higher speed and lower power. FPGAs are considered an excellent platform for accelerating computation, especially for tasks that demand massive parallelism. By implementing critical compute-intensive algorithm modules in programmable hardware, an FPGA can offload work from traditional processors, greatly reducing latency and power consumption.
It is generally accepted that the purpose of accelerated computing is to improve computational efficiency and shorten the time needed to develop and verify an algorithm. Popular acceleration approaches today include multi-node distributed computing and multi-processor, multi-threaded parallel computing; there are also distributed computing engines such as Spark and languages designed for parallelism such as Scala. These frameworks deliver excellent performance in different application fields, such as cloud computing, silicon design, and software engineering. Acceleration methods range from parallel optimization of the algorithm itself, to parallel processing in the system architecture, to computing architectures that directly leverage multi-core processors. FPGAs are bound to play an important role in accelerated computing. Compared with traditional CPUs, DSPs, and other processors, the most obvious advantage of an FPGA is its native parallelism: by parallelizing hardware logic resources, it can achieve high data throughput at relatively low clock frequencies while delivering lower latency and power consumption.
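The multi-processor parallelism mentioned above can be illustrated entirely in software. The sketch below, a minimal example and not part of any framework discussed here, splits a compute-bound task (counting primes by trial division) across worker processes with Python's standard `concurrent.futures` module; the way the same workload is divided among parallel workers mirrors, at a high level, how an FPGA divides work among parallel hardware units.

```python
import math
from concurrent.futures import ProcessPoolExecutor

def is_prime(n: int) -> bool:
    """Trial-division primality test: a deliberately compute-bound task."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

def count_primes_serial(numbers) -> int:
    # One worker processes every number in turn.
    return sum(map(is_prime, numbers))

def count_primes_parallel(numbers, workers: int = 4) -> int:
    # The same work split across processes; each chunk is handled
    # independently, analogous to parallel logic blocks on an FPGA.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(is_prime, numbers, chunksize=64))

if __name__ == "__main__":
    data = range(2, 10_000)
    assert count_primes_serial(data) == count_primes_parallel(data)
```

The parallel version trades scheduling overhead for concurrency, so the speedup only appears once each chunk of work is large enough; the same trade-off governs when offloading to hardware pays off.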
For years there has been precedent for using hardware boards to accelerate algorithm simulation and data modeling, running several orders of magnitude faster than simulation programs that execute only on processors, and many algorithm engineers have been deeply impressed by the value of such applications. The term "hardware in the loop" (HIL) comes from the simulation industry. Put simply, it refers to inserting a hardware device into the simulation loop to imitate the actual hardware environment while verifying an algorithm model; at the same time, it makes the simulation run much faster, significantly more efficiently than traditional software-only simulation environments. FPGAs are a very important platform choice for this type of hardware, which in turn reflects the computing efficiency of FPGAs relative to traditional desktop computers.
In general, FPGA-based algorithm acceleration means speeding up compute-intensive algorithms by offloading them from the processor to FPGA logic, which is exactly what FPGA engineers pursue. Years of practice have proven this concept, and FPGAs have made great strides in both embedded computing and cloud computing. In cloud computing, for example, a PCIe-based FPGA accelerator card can offload compute-intensive algorithms from the CPU to the card, gaining a great advantage in both processing speed and power consumption.
The PYNQ framework, mentioned repeatedly in this article, not only improves productivity for embedded developers but also validates the concept of hardware acceleration for software developers, letting them feel and experience the benefits of hardware acceleration directly. By loading an overlay from Python code, we can implement algorithms in hardware and achieve acceleration that is a remarkable improvement over running the same algorithm in Python on a processor. Hands-on practice with PYNQ unveils the mystery of accelerated computing and lets us focus on offloading algorithms to hardware. In the coming articles, we will detail the concept of algorithm acceleration under the PYNQ framework, which is also the core value of FPGA for big data processing.
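To make the overlay-calling workflow concrete, here is a minimal sketch of what offloading looks like from Python on a PYNQ board. It assumes a bitstream file `fir_accel.bit` containing an AXI DMA engine named `axi_dma_0` wired to a streaming accelerator; the bitstream and IP names are illustrative assumptions, not part of any shipped overlay, and the code only runs on a PYNQ-enabled board.

```python
# Sketch: offloading a streaming computation to FPGA fabric with PYNQ.
# Assumes a PYNQ board plus a custom bitstream "fir_accel.bit" whose
# design contains an AXI DMA ("axi_dma_0") feeding a hardware filter.
import numpy as np
from pynq import Overlay, allocate

ol = Overlay("fir_accel.bit")   # program the FPGA fabric with the design
dma = ol.axi_dma_0              # handle to the DMA engine in the overlay

# Physically contiguous buffers that the DMA engine can access directly
in_buf = allocate(shape=(1024,), dtype=np.int32)
out_buf = allocate(shape=(1024,), dtype=np.int32)
in_buf[:] = np.arange(1024, dtype=np.int32)

# Stream the samples through the accelerator and wait for completion;
# the computation itself happens in programmable logic, not the CPU.
dma.sendchannel.transfer(in_buf)
dma.recvchannel.transfer(out_buf)
dma.sendchannel.wait()
dma.recvchannel.wait()

result = np.array(out_buf)      # results computed in hardware
```

The Python side only moves data and sequences the transfer; all the compute-intensive work runs in the fabric, which is exactly the offloading pattern the coming articles explore in detail.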