Developing multicamera vision systems for autonomous vehicles
There’s growing interest in autonomous vehicles, but there’s a wide gap between the technology that today’s ‘self-driving’ cars use to steer along well-lit, well-marked highways and the technology needed to drive safely along an unlit, unmarked country lane in the rain at dusk. Many companies are trying to bridge this gap by developing sophisticated embedded-vision systems that can capture large volumes of scene data and then apply machine-learning strategies to plan the next stage of a vehicle’s route. Creating such systems requires a development system that enables rapid experimentation, systemic optimization, and a quick route to commercialization.
The hardware challenge
One of the biggest challenges in developing vision systems for autonomous vehicles is handling the very large amounts of data created by multiple cameras and sensors.
For example, the advanced driver assistance systems used in cars have multiple image cameras, as well as radar and LIDAR sensors, to achieve a 360° field of view. There are often cameras in the vehicle to sense driver alertness and other safety-related factors. As we move towards greater vehicle autonomy, the number of cameras involved will grow, their resolution will increase from 1Mpixel to 8Mpixel, and their frame rates will rise from the 10 to 30frame/s of today to 60frame/s.
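To give a feel for what this growth means for the hardware, the raw data rate can be estimated as cameras × pixels per frame × frames per second × bytes per pixel. The short Python sketch below does that arithmetic. The camera count (four) and the pixel depth (two bytes per pixel, uncompressed) are assumptions chosen for illustration, not figures from any particular vehicle.

```python
def raw_data_rate_mbytes(num_cameras, megapixels, frames_per_second, bytes_per_pixel=2):
    """Return the uncompressed aggregate data rate in Mbyte/s.

    bytes_per_pixel=2 is an assumed raw pixel depth, not a measured value.
    """
    return num_cameras * megapixels * frames_per_second * bytes_per_pixel


# Assumed configuration: four cameras in both cases.
today = raw_data_rate_mbytes(num_cameras=4, megapixels=1, frames_per_second=30)
future = raw_data_rate_mbytes(num_cameras=4, megapixels=8, frames_per_second=60)

print(f"today:  {today} Mbyte/s")
print(f"future: {future} Mbyte/s")
print(f"growth: {future / today:.0f}x")
```

Under these assumptions the move from 1Mpixel at 30frame/s to 8Mpixel at 60frame/s multiplies the raw data rate by 16, before any increase in the number of cameras is taken into account.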
It’s a similar story for UAVs (unmanned aerial vehicles) or drones, which use multiple cameras to sense their environment, and application-specific sensors for tasks such as agricultural inspection, security and surveillance. Autonomous guided robots also use multiple cameras so they can tightly coordinate situational sensing, motion tracking and decision-making to enable fast reactions in busy environments.
Autonomous vehicle vision systems must also offer very high levels of reliability and functional safety, real-time execution with low latency, minimal power consumption, the flexibility to work with different camera configurations, and extensive capabilities to implement computer-vision and machine-learning algorithms for object detection, recognition, classification and verification.
Vision systems for cars also need to achieve automotive qualifications, which means their constituent components must be available in automotive quality grades throughout a relatively long product lifecycle. Developers, therefore, need to ensure that there’s an easy route from the systems upon which they validate their hardware and software choices to a qualified commercial system.
The software challenge
On the software side, the amount of data that has to be managed is large, and working with the heterogeneous multiprocessor system architectures needed to process that data adds complexity to the code base. Machine-learning algorithms are also evolving very rapidly, making it more likely that core code has to be revised late in the development project. Reliability and safety issues further complicate the development process.
One of the key tasks in developing vision systems for autonomous vehicles is sensor fusion, correlating inputs from multiple sensors of different types – for example an image sensor, LIDAR and radar in a car – to correct deficiencies in individual sensors, and to enable more accurate position and orientation tracking. If the end application is something like obstacle avoidance, then sensor fusion has to be done with low latency. This is obviously challenging, but the good news is that traditional computer-vision experts have been developing solutions for years.
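As a minimal illustration of how fusing two sensors can correct the deficiencies of each, the Python sketch below combines two range estimates by inverse-variance weighting, so that the more precise sensor dominates the result. This is a textbook technique that assumes independent measurement errors; it is not taken from any specific production system, where approaches such as Kalman filtering are more typical. The sensor readings and variances are hypothetical values.

```python
def fuse(estimates):
    """Fuse (value, variance) pairs by inverse-variance weighting.

    Returns the fused value and the variance of the fused estimate.
    Assumes the measurement errors are independent.
    """
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


# Hypothetical range readings to the same obstacle, in metres,
# each paired with an assumed variance in square metres.
camera = (21.0, 4.0)  # camera depth estimate, assumed noisier
radar = (20.0, 1.0)   # radar range, assumed more precise

distance, variance = fuse([camera, radar])
print(f"fused distance: {distance:.2f} m, variance: {variance:.2f} m^2")
```

The fused estimate lies closer to the radar reading, and its variance is lower than that of either sensor alone, which is the benefit that sensor fusion is meant to deliver.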
The second major software task is to implement efficient machine-learning algorithms to do the object recognition, classification and verification necessary for an autonomous vehicle to ‘understand’ its environment and react appropriately. Machine-learning strategies, especially those based upon mimicking human brains, have been around for decades. However, the steady rise in general computing power, the development of specialised processors such as graphics processing units (GPUs), and the availability of flexible computing resources such as field programmable gate arrays (FPGAs), have provided the processing power necessary to move from lab experiments to practical applications. The availability of very large datasets, for example of tagged images, has provided the raw material upon which these neural networks can be trained. And recent research shows that although training a neural network takes a lot of computing power, there are shortcuts that can reduce the computational load of using a trained neural network to infer information from the data with which it is presented.
A development solution
Creating a vision system for autonomous vehicles obviously involves a lot of experimentation with real-world data and systemic optimisation among the hardware, software, computer-vision and machine-learning aspects of the design. Developers therefore need tools that help them make trade-offs efficiently, so that they minimise their development time and risk.
The development solution presented here is for an embedded vision system using multiple cameras. It includes a design methodology, hardware platform, software environment, drivers, and access to example computer-vision and machine-learning algorithms and implementations.
The hardware (see Figure 1) is architected to support up to six cameras. The current design supports four 2Mpixel cameras, and this capability will be upgraded in a future design to support an additional two 8Mpixel cameras. It is based on a Xilinx Zynq UltraScale+ MPSoC, which combines a quad-core ARM Cortex-A53 64bit processor with a programmable logic fabric. The cameras connect over coaxial cables of up to 15 metres in length to a multi-camera deserialiser on an FPGA mezzanine card (FMC). That module plugs on top of a board that carries the MPSoC.
Figure 1: Overall hardware architecture with camera modules, multi-camera FMC module, and a carrier card with a multi-processor SoC or SOM
The Zynq UltraScale+ MPSoC devices (see Figure 2) are available with a variety of CPUs, and the option of additional GPUs and video codecs. They also have a programmable logic fabric, based on an FPGA, which can be configured as a hardware accelerator for computationally intensive algorithms, or for other functions using Xilinx or third-party IP blocks.
Figure 2: The Zynq UltraScale+ MPSoC offers rich processing resources and configurability.
The design methodology
The design methodology is based on reVISION, a set of software resources for developing algorithms, applications and platforms. It works with current development tools for hardware and software application design, including Xilinx’s SDSoC environment, PetaLinux, and libraries for computer vision and machine learning. Figure 3 shows a simplified diagram of the overall hardware/software architecture.
The development toolchain also supports GStreamer, the pipeline-based multimedia framework that enables developers to link various media-processing elements to complete complex workflows. Its pipeline-based design has so far been used to build applications such as video editors, transcoders and streaming media broadcasters, and it also serves as a useful basis upon which to build deep-learning-based analysis of streaming media data.
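To show what linking elements into a pipeline looks like in practice, the Python sketch below builds one textual pipeline description per camera, in the `element ! element` syntax used by GStreamer’s gst-launch tool. It only constructs the strings and does not run GStreamer. The element choice (v4l2src, videoconvert, appsink), the /dev/video device paths and the 1920 × 1080 resolution are assumptions for illustration; the platform described here may expose its cameras through different elements.

```python
def camera_pipeline(index, width=1920, height=1080, fps=30):
    """Build a gst-launch style pipeline description for one camera.

    The device path and element choice are illustrative assumptions.
    """
    elements = [
        f"v4l2src device=/dev/video{index}",  # capture from a V4L2 device
        f"video/x-raw,width={width},height={height},framerate={fps}/1",  # caps filter
        "videoconvert",  # convert to a format the application accepts
        f"appsink name=cam{index}",  # hand frames to application code
    ]
    return " ! ".join(elements)


# One description per camera in an assumed four-camera setup.
pipelines = [camera_pipeline(i) for i in range(4)]
for description in pipelines:
    print(description)
```

In a design of this kind, the frames delivered to each appsink are where a computer-vision or deep-learning stage would typically pick up the data.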
Figure 3: The general architecture of the reVISION hardware/software platform
Applying machine-learning strategies, including the deep-learning algorithms made possible by neural networks with multiple internal layers, to multicamera vision systems means finding efficient ways to access the very large amounts of computing resources these algorithms can demand. The programmable logic fabric available on Xilinx MPSoCs can provide large amounts of dedicated computing power, while the toolchains that the company offers can ensure that this resource is used as effectively as possible.
Recent research has shown that, at least for inference tasks, it is possible to reduce computational loads by discarding any matrix calculations that have zero, or a number close to zero, as an operand. It also shows that, in some cases, developers can reduce the precision with which values are represented in these calculations, from the 8bit and 16bit integer data types widely used during algorithm development all the way down to binary representations.
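The two ideas can be sketched in a few lines of Python: pruning sets near-zero weights to zero so that the corresponding multiplications can be skipped, and quantisation stores the remaining weights at lower precision. The weights, inputs and threshold below are made-up values, and the starting point is floating-point numbers purely for illustration. This is a simplified sketch of the general techniques, not the method used by any particular toolchain.

```python
def prune(weights, threshold):
    """Zero out weights whose magnitude is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize_int8(weights):
    """Map weights to signed 8-bit integer codes with one shared scale."""
    peak = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = peak / 127.0
    return [round(w / scale) for w in weights], scale


def binarize(weights):
    """Keep only the sign of each weight (+1 or -1)."""
    return [1 if w >= 0 else -1 for w in weights]


def sparse_dot(weights, inputs):
    """Dot product that skips multiplications by zero weights."""
    return sum(w * x for w, x in zip(weights, inputs) if w != 0)


# Hypothetical weights and inputs for a single neuron.
weights = [0.50, -0.01, 0.02, -1.27, 0.00, 0.25]
inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

pruned = prune(weights, threshold=0.05)
codes, scale = quantize_int8(pruned)

print("full precision:", sparse_dot(weights, inputs))
print("pruned:        ", sparse_dot(pruned, inputs))
print("pruned + int8: ", sparse_dot(codes, inputs) * scale)
print("binary weights:", binarize(weights))
```

In this toy case, half of the multiplications are skipped after pruning and the result moves only slightly. How much accuracy a real network loses depends on the model and data, which is why tools for exploring these trade-offs are useful.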
Xilinx has acquired a company called DeePhi Tech, which provides tools with which users can explore the impact of their choices of deep-learning algorithms, and the efficiencies that may be gained from making the sorts of simplifications discussed above.
The DeePhi Deep Neural Network Development Kit (DNNDK) takes as its input the output from the most popular algorithm exploration toolkits, such as Caffe or Google’s TensorFlow. These toolkits enable developers to experiment with different types of deep-learning algorithms, and various configurations of each type. Their output is then fed into the DNNDK toolchain, which includes a graph compression tool (known as DECENT), a deep neural network compiler (DNNC), an assembler (DNNAS), and a neural network runtime (N2Cube). DeePhi has also developed hardware architectures optimised for video and image recognition, which can be instantiated on an FPGA fabric. As you would expect, the DNNDK toolchain includes simulators and profilers for these architectures.
Figure 4: An example development solution, with a ZCU102 carrier card and the multi-camera FMC bundle including the multi-camera FMC module and four 2Mpixel modules
There’s a lot of interest in developing embedded vision systems for autonomous vehicle control. Avnet Silica offers embedded vision solutions bundled with tools and reference designs to minimize development time in creating autonomous vision systems (see Figure 4). Given the novelty of some of the technology involved, and the need to work with real-world data, engineers need to start experimenting with algorithms and systemic optimizations as soon as possible. The development system described enables this kind of fast start, as well as making it simple for developers to access the benefits of hardware acceleration and model compression strategies.
For more information go to avnet-silica.com/ai4cameras, email firstname.lastname@example.org, or contact your local Avnet Silica offices.
This article is also available in German at elektronikpraxis.de and next-mobility.news.