Unreliable autonomous driving? Take a look at how difficult vehicle-mounted vision processing is
We all have high expectations for autonomous driving, as the display of autonomous driving technologies at every major tech expo shows. Development in this sector has, however, been accompanied by an increase in accidents related to autonomous driving, including fatal accidents involving Tesla and Uber vehicles.
To enable autonomous driving, we first have to build a reliable Advanced Driver Assistance System (ADAS), which is in itself a daunting task. On the one hand, ADAS vision processing must respond to increasingly complicated applications and environments while ensuring reliable performance in low light or severe weather conditions. On the other hand, to improve the identification accuracy of the vision system and give it the capacity for autonomous learning, machine learning, neural networks and other AI algorithms must be incorporated.
These requirements necessarily increase the complexity and load of visual processing tasks and consume more computing resources and time. This is in direct conflict with the embedded environment of vehicle-mounted applications, which has limited resources and places stringent demands on hardware. This is the predicament developers of vehicle-mounted applications face on a daily basis.
Figure 1 Typical process for vehicle-mounted vision processing
To find a way out of this predicament, let us take a look at the typical process for vehicle-mounted vision processing. The process includes four steps:
Pre-processing: Pre-processing includes frame technology, color adjustment, white balance, contrast balancing, and image rectification. Processing these graphics output primitives involves massive amounts of data, and each primitive is largely independent of the others, so this step requires high bandwidth as well as parallel data processing capacity.
Feature extraction: Features, particularly key edges and corners, are extracted from the pre-processed images.
Target identification: Objects in the images (people, vehicles, traffic signals, etc.) are identified from the extracted feature data, which requires machine learning and neural network algorithms.
Target tracking: The results for each frame are recorded and accumulated over multiple frames so that targets can be determined and stable identification and judgment achieved.
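The four steps above can be sketched as a chain of small functions. This is a minimal illustration in plain Python, not code for any real ADAS stack: the function names, the thresholds, and the `Tracker` class are all hypothetical, and the "identification" step is a trivial placeholder where a real system would run a trained model.

```python
from dataclasses import dataclass

Frame = list[list[int]]  # grayscale image, pixel values 0..255


def preprocess(frame: Frame) -> Frame:
    """Step 1: contrast stretch. Every pixel is handled independently."""
    lo = min(min(row) for row in frame)
    hi = max(max(row) for row in frame)
    span = max(hi - lo, 1)  # avoid division by zero on flat images
    return [[(p - lo) * 255 // span for p in row] for row in frame]


def extract_edges(frame: Frame, threshold: int = 128) -> list[tuple[int, int]]:
    """Step 2: return (row, col) positions with a large horizontal gradient."""
    return [
        (r, c)
        for r, row in enumerate(frame)
        for c in range(len(row) - 1)
        if abs(row[c + 1] - row[c]) >= threshold
    ]


def identify(edges: list[tuple[int, int]], min_edges: int = 2) -> bool:
    """Step 3: placeholder classifier (assumption: enough edges = a target)."""
    return len(edges) >= min_edges


@dataclass
class Tracker:
    """Step 4: confirm a target only after several consecutive detections."""
    needed: int = 3
    streak: int = 0

    def update(self, detected: bool) -> bool:
        # Depends on the previous frame's state, so it must run in order.
        self.streak = self.streak + 1 if detected else 0
        return self.streak >= self.needed


def run(frames: list[Frame]) -> list[bool]:
    tracker = Tracker()
    return [tracker.update(identify(extract_edges(preprocess(f)))) for f in frames]
```

Note how steps 1 to 3 only look at the current frame, while `Tracker.update` carries state from one frame to the next. That difference is what drives the hardware discussion that follows.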
The first three steps are usually considered low- and mid-level processing and lend themselves to a relatively high degree of parallelism. Step 4, by contrast, involves logical relationships that require sequential execution and continuous processing. We can therefore conclude that each task in vehicle-mounted vision processing has different requirements and that it is difficult for a hardware platform with a single architecture to satisfy all of them. That’s why we need to build a more complex and comprehensive heterogeneous system architecture that enables different hardware resources to handle different calculation and processing tasks.
The S32V vehicle-mounted vision processor produced by NXP Semiconductors, for example, is equipped with specific computing units corresponding to the different steps within vehicle-mounted vision processing.
Figure 2 Diagram of the NXP Semiconductors S32V vehicle-mounted vision processor (source: NXP)
To pre-process graphics output primitives for feature extraction, S32V utilizes a programmable image sensor processor (ISP) to accelerate stream processing. The programmable design also provides bottom-level processing with the flexibility to respond to the pre-processing requirements in different applications.
AI algorithms needed to perform tasks from feature extraction to target identification require vision acceleration. For this purpose, S32V incorporates two dedicated APEX-2 coprocessors to facilitate high-speed parallel single instruction multiple data (SIMD) accelerated computing.
High-level processing tasks from target identification to target tracking involve serial computing. S32V completes these tasks using a multi-core (up to quad-core) ARM Cortex-A53 processor with a clock speed of 1 GHz. Its processing system also has an integrated Cortex-M4 core running at up to 133 MHz to implement control functions and real-time tasks.
Additional functions such as a 3D GPU, hardware security encryption, storage, and peripheral interfaces are integrated into S32V to form a comprehensive automotive-grade embedded security vision processing platform.
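The assignment of processing steps to computing units described above can be written down as a simple lookup table. The sketch below only restates the mapping given in the text; the stage labels, the `Unit` enum, and `units_for` are illustrative names and do not come from any NXP SDK.

```python
from enum import Enum


class Unit(Enum):
    ISP = "programmable ISP"
    APEX2 = "APEX-2 coprocessor"
    A53 = "ARM Cortex-A53"
    M4 = "ARM Cortex-M4"


# Stage-to-unit assignment as described in the text. Target identification
# appears under both APEX-2 (acceleration) and Cortex-A53 (high-level logic).
STAGE_TO_UNITS = {
    "pre-processing": [Unit.ISP],
    "feature extraction": [Unit.APEX2],
    "target identification": [Unit.APEX2, Unit.A53],
    "target tracking": [Unit.A53],
    "control and real-time tasks": [Unit.M4],
}


def units_for(stage: str) -> list[Unit]:
    """Look up which computing units a processing stage is assigned to."""
    try:
        return STAGE_TO_UNITS[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}") from None


for stage, units in STAGE_TO_UNITS.items():
    print(f"{stage:28s} -> {', '.join(u.value for u in units)}")
```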
Appropriate hardware is, however, only the first step toward a comprehensive solution or product. The software must work hand in hand with it. The key lies in achieving optimal integration of hardware and software in embedded vision applications that have limited resources and are highly sensitive to power consumption. This means that each software task must be allocated to the most suitable hardware unit so that the hardware's capacity is fully used.
To achieve this goal, we need to first analyze and classify typical computing models for applications, identify tasks that require or potentially require acceleration, and assign suitable hardware to perform the acceleration. In practice, there are several methods we can try to accelerate parallel computing:
- Data parallelism: Assign data that require parallel processing to units with parallel processing capacity, such as dedicated coprocessors like APEX-2. For the workloads they are designed for, dedicated processors are typically faster than general-purpose processors.
- Pipeline parallelism: Organize and assign various computing units so all units can operate at full capacity at the same time, ensuring that no units are idle.
- Task parallelism: Assign different vision processing tasks to be performed concurrently.
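The three forms of parallelism listed above differ mainly in what is split up: the data, the stages, or the tasks. The following sketch shows the structure of each using Python's standard library. It is an illustration only: the function names are hypothetical, the per-row and per-frame work is a stand-in, and Python threads here show the organization of the work rather than the speedup a dedicated coprocessor would deliver.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def brighten_row(row):
    """Per-row work with no dependency on any other row."""
    return [min(p + 10, 255) for p in row]


def data_parallel(image, workers=4):
    """Data parallelism: the same operation applied to independent rows."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten_row, image))


def pipeline_parallel(frames, stages):
    """Pipeline parallelism: one thread per stage, linked by FIFO queues."""
    done = object()  # sentinel marking the end of the frame stream
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        # While this stage handles frame n, the previous stage can
        # already be working on frame n + 1.
        while (item := inbox.get()) is not done:
            outbox.put(stage(item))
        outbox.put(done)

    threads = [
        threading.Thread(target=worker, args=(s, queues[i], queues[i + 1]))
        for i, s in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for f in frames:
        queues[0].put(f)
    queues[0].put(done)

    results = []
    while (item := queues[-1].get()) is not done:
        results.append(item)
    for t in threads:
        t.join()
    return results


def task_parallel(frame, tasks):
    """Task parallelism: different analyses of the same frame run concurrently."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn, frame) for name, fn in tasks.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

For example, `pipeline_parallel(frames, [preprocess_fn, detect_fn])` would stream frames through two hypothetical stage functions, and `task_parallel(frame, {"lanes": find_lanes, "signs": find_signs})` would run two hypothetical detectors on the same frame side by side.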
Only with this kind of comprehensive optimization and close integration of software and hardware can the speed and overall performance of vehicle-mounted vision processing be greatly enhanced.
Figure 3 Process of vehicle-mounted vision processing and corresponding S32V hardware resources (source: NXP)
Vehicle-mounted vision is a challenging sector for embedded vision processing and requires closer synergy and comprehensive integration of different resources. Hardware developers must fully understand the requirements of the target application in order to formulate high-performance hardware acceleration structures. Software developers must also make full use of hardware and allocate resources reasonably to maximize hardware performance. This may be difficult, but it’s a necessary step for us to achieve fully autonomous driving.