Imagine cruising down a bustling street, not behind the wheel, but as a passenger in a self-driving car. Every inch of the environment is a blur of information: lane markings, traffic signs, vehicles weaving in and out, pedestrians crossing intersections. This constant data stream is crucial for safe navigation, but how do Autonomous Vehicles (AVs) decipher it all?
The answer lies in sensor data perception, a technology akin to the human senses. LiDAR, one of the most prominent sensors, paints a 3D picture of the world using laser pulses, providing accurate information even in challenging conditions, day and night. But raw LiDAR data needs interpretation. Just as our brain processes sensory inputs, an AV's perception system analyzes LiDAR data to interpret the world. This is where pctFusion, a cutting-edge 3D deep learning architecture co-developed by SimDaaS, steps in.
Think of pctFusion as a highly trained translator. It takes the LiDAR point cloud and transforms it into a rich understanding of the environment. It identifies objects such as cars and vegetation, even when they are faded or obscured, and differentiates erratic two-wheelers from pedestrians. This intricate understanding is crucial for decision-making and for building a real-time map for safe navigation.
pctFusion (see Figure 1) has several features that make it efficient and effective for point cloud segmentation:
Dual convolutions and positional encodings: These capture local spatial relationships within point neighborhoods. Extracting features from multiple neighborhoods (much like overlapping patches in images) aids high-level feature extraction. However, geometric information about objects is often lost when features are learned purely through convolutions. The reason is that when a neighborhood is built around a representative point, only relative encodings are learned: the aggregating representative point knows the relative positions of its neighbors, but the neighbors are not aware of each other's positions. As features are transformed into higher dimensions, relying on relative encodings alone therefore leads to geometric information loss. pctFusion addresses this by also incorporating positional encodings, so each point is aware of its neighbors' positions, which preserves geometric structure in higher-dimensional feature space (a minimal sketch follows below).

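To make the idea concrete, here is a minimal PyTorch sketch of a neighborhood encoder that combines relative offsets with a learned positional encoding of each neighbor's absolute coordinates. It illustrates the concept described above rather than pctFusion's actual implementation; the neighborhood size, MLP shapes, and max-pooling aggregation are all assumptions.

```python
# Sketch only: neighborhood encoding with both relative offsets and
# learned positional encodings. Shapes and layer sizes are assumptions,
# not pctFusion's exact design.
import torch
import torch.nn as nn

class NeighborhoodEncoder(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, k: int = 16):
        super().__init__()
        self.k = k
        # Encodes each neighbor's offset relative to the center point.
        self.rel_mlp = nn.Sequential(nn.Linear(3, out_dim), nn.ReLU())
        # Encodes each neighbor's absolute xyz, so points carry their
        # own position into higher-dimensional feature space.
        self.pos_mlp = nn.Sequential(nn.Linear(3, out_dim), nn.ReLU())
        self.feat_mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, xyz: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) point coordinates; feats: (N, in_dim) point features.
        dists = torch.cdist(xyz, xyz)                    # (N, N) pairwise distances
        idx = dists.topk(self.k, largest=False).indices  # (N, k) nearest neighbors
        nbr_xyz = xyz[idx]                               # (N, k, 3)
        rel = nbr_xyz - xyz.unsqueeze(1)                 # offsets to the center point
        h = self.feat_mlp(feats)[idx]                    # (N, k, out_dim) neighbor features
        # Relative encoding alone leaves neighbors unaware of each other;
        # adding a positional encoding of absolute xyz preserves geometry.
        h = h + self.rel_mlp(rel) + self.pos_mlp(nbr_xyz)
        return h.max(dim=1).values                       # (N, out_dim) aggregated per point
```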
Hierarchical local & global attention: Self-attention mechanisms at different levels capture long-range dependencies and global context across the entire point cloud. The resulting local and global embeddings are then fused into more informative feature vectors, enhancing representational ability. Notably, global attention is applied only after multiple convolutions, which keeps it computationally efficient (see the sketch below).
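The fusion step can be sketched in a few lines, again as an illustration rather than the paper's exact design; the multi-head attention layer and the concatenation-based fusion are assumptions.

```python
# Sketch only: fusing local convolutional embeddings with global
# self-attention context. Layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class LocalGlobalFusion(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # Global self-attention over the (already downsampled) point set,
        # applied after the convolution stages to keep it affordable.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, local_feats: torch.Tensor) -> torch.Tensor:
        # local_feats: (B, N, dim) per-point embeddings from local convolutions.
        global_feats, _ = self.attn(local_feats, local_feats, local_feats)
        # Concatenate local and global context into one feature vector.
        return self.fuse(torch.cat([local_feats, global_feats], dim=-1))
```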
Pointwise Geometric Anisotropy (PGA) loss: pctFusion introduces an attention-based loss function that weights each point according to the semantic distribution of points in its neighborhood. This overcomes a limitation of existing loss functions, which neglect semantic and positional importance, and improves accuracy, especially at sharp class boundaries. The impact of the PGA loss is visualized in Figure 2 below, and a simplified sketch of the idea follows:

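As a rough illustration of the idea behind the PGA loss, the sketch below up-weights points whose neighborhoods mix several classes, i.e. points near class boundaries. The actual PGA formulation in the paper differs; the neighborhood size and the weighting rule here are assumptions.

```python
# Sketch only: a boundary-aware weighted loss in the spirit of PGA.
# The k value and the 1 + disagreement weighting are assumptions.
import torch
import torch.nn.functional as F

def boundary_weighted_loss(logits, labels, xyz, k=8):
    # logits: (N, C) class scores; labels: (N,) ground truth; xyz: (N, 3).
    idx = torch.cdist(xyz, xyz).topk(k, largest=False).indices  # (N, k) neighbors
    nbr_labels = labels[idx]                                    # (N, k)
    # Fraction of neighbors whose label differs from the center point.
    disagreement = (nbr_labels != labels.unsqueeze(1)).float().mean(dim=1)
    weights = 1.0 + disagreement           # boundary points weigh up to 2x
    per_point = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_point).mean()
```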
pctFusion's ability to exploit the characteristics of LiDAR point clouds is a significant step forward in segmentation. AV perception engines benefit by developing a better understanding of their surroundings and, in turn, making better decisions. Whether on bustling city streets or winding country roads, LiDAR and pctFusion pave the way toward a future where autonomous transportation becomes a reality. The full paper and the code can be accessed at: https://link.springer.com/article/10.1007/s42979-024-02627-5