A case study of OpenCV Optical flow based on Ultra96 Development Board

This article walks through a specific case: an OpenCV optical flow project on the Avnet Ultra96 development board. The Ultra96 is a cost-effective board built around a Xilinx 16nm Zynq MPSoC device and is well suited to embedded computing applications such as machine vision and artificial intelligence. Xilinx's Zynq devices are SoCs that combine an ARM-based Processing System (PS) with Programmable Logic (PL). This combination lets software and hardware work together and makes better use of the device's resources. Zynq devices are widely used in automotive, consumer electronics, test instruments, industrial production and other fields.

This case shows how Xilinx Zynq devices support development in the popular Python language, including OpenCV applications. The PYNQ framework brings Python's development productivity to embedded systems. PYNQ also adds support for hardware libraries, called overlays, provides a Python API for calling them, and allows bitstreams to be loaded dynamically. In this optical flow case, we compare the performance of the Farneback optical flow algorithm implemented in software with that of the dense non-pyramidal Lucas-Kanade optical flow algorithm implemented in hardware logic.

Optical flow is a computationally demanding algorithm used mainly to detect the motion of objects between consecutive frames. Several algorithms exist for computing it. In this case, we use the Farneback algorithm from OpenCV and the dense non-pyramidal Lucas-Kanade algorithm from the Xilinx xfOpenCV library. The input can be images stored on the SD card or frames captured in real time from an external USB camera.
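To make the idea of dense, non-pyramidal Lucas-Kanade more concrete, here is a minimal pure-Python sketch. It is an illustration only, not the xfOpenCV implementation: the function names, the window radius and the synthetic test pattern are all assumptions chosen for readability. For each pixel it solves a 2x2 least-squares system built from image gradients in a small window, at a single image scale (no pyramid).

```python
def lucas_kanade_at(prev, curr, cx, cy, radius=2):
    """Estimate the flow (u, v) at pixel (cx, cy) from a single-scale window.

    prev and curr are 2-D lists of grayscale values indexed as img[y][x].
    Returns None when the window has too little texture to solve for motion.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # horizontal gradient
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # vertical gradient
            it = curr[y][x] - prev[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve [[sxx, sxy], [sxy, syy]] * [u, v] = [-sxt, -syt] with Cramer's rule.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def dense_lucas_kanade(prev, curr, radius=2):
    """Apply the window solver to every pixel far enough from the border."""
    height, width = len(prev), len(prev[0])
    margin = radius + 1  # the gradients read one pixel beyond the window
    return {
        (x, y): lucas_kanade_at(prev, curr, x, y, radius)
        for y in range(margin, height - margin)
        for x in range(margin, width - margin)
    }


def bowl(x, y, x0, y0):
    """Synthetic test pattern: a smooth intensity bowl centred at (x0, y0)."""
    return float((x - x0) ** 2 + (y - y0) ** 2)


SIZE = 11
prev_frame = [[bowl(x, y, 5, 5) for x in range(SIZE)] for y in range(SIZE)]
curr_frame = [[bowl(x, y, 6, 5) for x in range(SIZE)] for y in range(SIZE)]  # moved 1 px right

flow_field = dense_lucas_kanade(prev_frame, curr_frame)
u, v = flow_field[(5, 5)]
```

For this synthetic pair, the estimate at the centre pixel should come out close to one pixel of motion to the right. Because the method is a first-order approximation at a single scale, it is only reliable for small displacements, which is the trade-off of skipping the image pyramid.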

By working through this case, users can gain insight into the following:

Overlay: Hardware library

  • Different algorithms can be implemented in different bitstreams. A bitstream can be thought of simply as a hardware accelerator for an algorithm, implemented in the PL of the device.
  • Overlays can be accessed through APIs, which makes downloading a bitstream very easy.

PYNQ: Python productivity for Zynq

  • PYNQ supports Python programming on the Xilinx Zynq platform.
  • Python is a very productive programming language.

Jupyter Notebook

  • This is an interactive computing environment in which engineers can create notebook documents that combine code, interactive widgets, plots, annotated text, and embedded images and videos.
  • This tool allows engineers to edit and run Python code in a browser.

Infographic comparing Software Stack to Hardware Stack

Figure 1: Optical flow implementation on software stack and hardware stack

Figure 1 shows the difference between the software and hardware implementations. The software implementation takes the traditional approach: it calls OpenCV functions from Python and runs them on the PS of the Zynq device, leaving the PL unused. The hardware implementation is also programmed in Python, but it calls xfOpenCV, the Xilinx-optimized OpenCV library, whose functions have corresponding hardware overlays in the PL. Finally, we compare the performance of the two implementations in terms of processing time per frame and frame rate.


3 people meeting on a street intersection

Figure 2: Result pictures

Bar graph comparing the execution time per frame of Software Optical Flow to Hardware Optical Flow

Bar graph comparing the frames per second of Software Optical Flow to Hardware Optical Flow

Figure 3: Performance comparison results
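The two metrics in Figure 3, execution time per frame and frames per second, can be measured with a small timing harness like the sketch below. The function name `benchmark` and the shape of the returned dictionary are assumptions made for illustration. Either implementation, software or hardware, could be passed in as `process_frame`, provided it accepts a previous and a current frame.

```python
import time


def benchmark(process_frame, frames):
    """Run process_frame on each consecutive frame pair and report timing."""
    pairs = 0
    start = time.perf_counter()
    for prev, curr in zip(frames, frames[1:]):
        process_frame(prev, curr)
        pairs += 1
    elapsed = time.perf_counter() - start
    return {
        "pairs": pairs,
        "seconds_per_frame": elapsed / pairs if pairs else 0.0,
        "frames_per_second": pairs / elapsed if elapsed > 0 else float("inf"),
    }


# Placeholder workload standing in for an optical flow call.
stats = benchmark(lambda prev, curr: sum(range(1000)), list(range(10)))
```

Timing the whole loop rather than a single call averages out per-call jitter, which matters when comparing a slow software path against a much faster accelerated one.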

Code reference:

This code loads the algorithm overlay and sets up the memory manager as part of the program's preprocessing.

Screenshot of code for program overlay

Figure 4: Program overlay example
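As a rough text illustration of this preprocessing step, the sketch below loads an overlay and allocates a frame buffer using PYNQ's `Overlay` class and `allocate` function. It is not the article's original code: the helper names and the bitstream name `optical_flow.bit` are hypothetical, and `allocate` assumes PYNQ v2.5 or later (older releases provided a memory manager through `Xlnk` and its `cma_array` method instead). The imports are deferred so that the file can still be read and imported on a machine without PYNQ installed.

```python
def frame_shape(height, width, channels=1):
    """NumPy-style shape for a frame buffer (rows first, then columns)."""
    return (height, width) if channels == 1 else (height, width, channels)


def load_optical_flow_overlay(bitfile):
    """Program the PL with the given bitstream and return the overlay handle.

    Must run on a PYNQ-enabled board such as the Ultra96.
    """
    from pynq import Overlay  # deferred import: only available on the board
    return Overlay(bitfile)   # downloads the bitstream to the PL by default


def allocate_frame_buffer(height, width, channels=1, dtype="uint8"):
    """Allocate a physically contiguous buffer that the PL can access."""
    from pynq import allocate  # assumes PYNQ v2.5 or later
    return allocate(shape=frame_shape(height, width, channels), dtype=dtype)


# On the board (hypothetical bitstream name):
#   overlay = load_optical_flow_overlay("optical_flow.bit")
#   frame = allocate_frame_buffer(480, 640)
```

Buffers shared with the PL need to be physically contiguous, which is why they come from PYNQ's allocator rather than from an ordinary NumPy array.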

This is the code implemented in the software environment:

Screenshot of software code

Figure 5: Software code example
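For readers who want a text version of the software path, a minimal sketch using OpenCV's `cv2.calcOpticalFlowFarneback` might look like the following. The parameter values are commonly used starting points and are an assumption, not values taken from this project. The helper names are also illustrative. The `cv2` import is deferred so the helper `to_polar` can be used without OpenCV installed.

```python
import math

# Assumed starting values for Farneback; tune them for the actual video source.
PYR_SCALE = 0.5   # image scale between pyramid levels
LEVELS = 3        # number of pyramid levels
WINSIZE = 15      # averaging window size
ITERATIONS = 3    # iterations per pyramid level
POLY_N = 5        # pixel neighbourhood for the polynomial expansion
POLY_SIGMA = 1.2  # Gaussian sigma used with POLY_N
FLAGS = 0


def software_optical_flow(prev_bgr, curr_bgr):
    """Compute dense Farneback flow between two BGR frames on the PS (CPU)."""
    import cv2  # deferred import: requires the opencv-python package
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    # Returns an array of shape (height, width, 2) holding (u, v) per pixel.
    return cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        PYR_SCALE, LEVELS, WINSIZE, ITERATIONS, POLY_N, POLY_SIGMA, FLAGS,
    )


def to_polar(u, v):
    """Convert one flow vector to (magnitude, angle in radians) for display."""
    return math.hypot(u, v), math.atan2(v, u)


magnitude, angle = to_polar(3.0, 4.0)
```

Flow fields are often visualized by mapping the angle to hue and the magnitude to brightness, which is why the polar form is handy.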

This is the code implemented in the hardware environment; the two major differences are highlighted in red.

  1. Definition of memory variables: in software design, usually only the size of the memory space needs to be considered, because the memory architecture is generally fixed. In hardware design, implementation factors must also be considered, including fixed-point versus floating-point representation, the number of data bits, the depth and width of the memory, and whether it is implemented as an array, a FIFO, dual-port RAM, etc. In hardware, the memory space and architecture can be fully customized, and in embedded products in particular it is important to save as much memory as possible.
  2. Calling the optical flow function: this is the main difference from the software program. In the hardware code, the function is invoked by calling the overlay through its API. The optical flow function is actually implemented in the PL of the Zynq device, which demonstrates hardware acceleration of an OpenCV algorithm.

Screenshot of hardware accelerator code

Figure 6: Hardware code example
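The first difference above, choosing a number format and word length for hardware memory, can be illustrated with a small fixed-point sketch. The word length (16 bits), the number of fractional bits (6) and the 640x480 frame size are assumptions picked for the example, not values from this design.

```python
def to_fixed(value, frac_bits, total_bits=16):
    """Quantize a real value to a signed fixed-point integer, saturating on overflow."""
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def from_fixed(raw, frac_bits):
    """Convert a fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)


def buffer_bytes(width, height, bits_per_element, elements_per_pixel=1):
    """Storage needed for one frame-sized buffer."""
    return width * height * elements_per_pixel * bits_per_element // 8


raw = to_fixed(1.5, frac_bits=6)        # a flow component of 1.5 pixels
restored = from_fixed(raw, frac_bits=6)

# Two flow components (u, v) per pixel for a 640x480 frame:
float32_bytes = buffer_bytes(640, 480, 32, elements_per_pixel=2)
fixed16_bytes = buffer_bytes(640, 480, 16, elements_per_pixel=2)
```

Halving the word length halves the buffer size, at the cost of a limited range and a resolution of 1/64 of a pixel with 6 fractional bits. This is the kind of trade-off a hardware designer can make and a software designer usually does not have to.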

A brief introduction to Xilinx's xfOpenCV library: it is an OpenCV library optimized for Xilinx devices, aimed mainly at embedded developers working with Zynq-7000 and Zynq MPSoC devices, although it can also run on pure FPGA devices. Using the Xilinx SDx development tool or the current Vitis tool chain, engineers can call these computer vision functions in a software development environment and accelerate them in the FPGA fabric. The xfOpenCV functions are also very similar to their traditional OpenCV counterparts, which makes them easier for developers to understand and port. For further information on xfOpenCV, refer to Xilinx User Guide UG1233.

Xilinx PYNQ

PYNQ is an open-source project from Xilinx
