Friday, 2 May 2025

RK3588 - Implementing a Vectorscope for processing video in real time

Following on from my previous post covering decoding and rendering HDMI input on the RK3588, I encountered a new requirement: implementing a real-time vectorscope to visualize chrominance information from the video stream. This proved to be a challenging task due to the need for efficient frame processing and rendering without impacting video playback performance.


Extracting UV Data from Video Frames

The challenges to overcome were:

  • Accessing U and V (chrominance) values for every pixel in the video stream.
  • For RGB-formatted frames, a costly color space conversion (RGB to YUV) is required.
  • The processing overhead increases significantly with higher-resolution frames.

To minimize CPU overhead and offload the color space conversion, I utilized RGA3, which efficiently converts each RGB frame into NV12 or NV16 format. This step significantly reduces the processing time needed to access UV data.

Once converted, the UV plane needed to be imported into an OpenGL ES texture for further processing and visualization. To preserve performance and avoid unnecessary memory copies, the goal was to directly bind the UV data as a texture.
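To make the layout concrete, here is a minimal Python sketch of where the interleaved UV plane sits in an NV12 or NV16 frame. It assumes a tightly packed buffer (row stride equal to the width) and even dimensions; real RGA or DMA buffers often pad rows. The function names are hypothetical and are not part of any Rockchip API.

```python
def uv_plane_layout(width, height, fmt):
    """Return (offset, uv_width, uv_height, size_bytes) of the interleaved UV plane.

    Assumes a tightly packed buffer (stride == width); real buffers may pad rows.
    """
    if fmt == "NV12":      # 4:2:0, chroma halved in both directions
        uv_w, uv_h = width // 2, height // 2
    elif fmt == "NV16":    # 4:2:2, chroma halved horizontally only
        uv_w, uv_h = width // 2, height
    else:
        raise ValueError(f"unsupported format: {fmt}")
    offset = width * height            # the UV plane follows the Y plane
    size = uv_w * uv_h * 2             # two bytes (U, V) per chroma sample
    return offset, uv_w, uv_h, size


def read_uv(frame, width, height, fmt, cx, cy):
    """Read the (U, V) pair at chroma coordinate (cx, cy)."""
    offset, uv_w, uv_h, _ = uv_plane_layout(width, height, fmt)
    if not (0 <= cx < uv_w and 0 <= cy < uv_h):
        raise IndexError("chroma coordinate out of range")
    i = offset + (cy * uv_w + cx) * 2
    return frame[i], frame[i + 1]
```

Because each chroma sample is a pair of bytes, a two-channel 8-bit texture is a natural fit for the UV plane.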

Computing the UV Histogram

The primary processing here involved:

  • Computing a UV histogram in real time based on pixel data from the video frame.

  • Normalizing the histogram values so that they scale properly regardless of resolution or luminance.

Building a real-time UV histogram from each frame requires scanning and processing a large volume of pixel data efficiently. Traditional OpenGL fragment shaders are not well-suited for this type of arbitrary data accumulation, especially when dealing with high-resolution video. 
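As a CPU reference for what the GPU has to compute, the sketch below accumulates a 256 x 256 histogram from an interleaved UV plane and then normalizes it. Dividing by the peak bin is an assumption made for illustration; other schemes (for example logarithmic scaling) would work too. The function names are hypothetical.

```python
def uv_histogram(uv_plane):
    """Count how often each (U, V) pair occurs in an interleaved 8-bit UV plane."""
    hist = [[0] * 256 for _ in range(256)]     # indexed as hist[v][u]
    for i in range(0, len(uv_plane) - 1, 2):
        u = uv_plane[i]
        v = uv_plane[i + 1]
        hist[v][u] += 1
    return hist


def normalize(hist):
    """Scale counts into [0.0, 1.0] by dividing by the largest bin."""
    peak = max(max(row) for row in hist)
    if peak == 0:                              # empty input: avoid dividing by zero
        return [[0.0] * len(row) for row in hist]
    return [[count / peak for count in row] for row in hist]
```

Dividing by the peak makes the result independent of how many pixels the frame contains. On the GPU the same accumulation would typically rely on atomic increments, since many invocations can hit the same bin at once.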

Given that the Mali-G610 GPU is compliant with OpenGL ES 3.1, I explored the use of compute shaders, a more flexible approach for performing general-purpose GPU (GPGPU) operations like histogram generation.

Compute shaders for OpenGL ES are sparsely documented, and practical usage examples, especially for embedded platforms, are limited. As a result, much of the development involved experimentation and iterative debugging, making it feel like a bit of a black art. In the end I managed to chain together three compute shaders, each performing one step in the pipeline.
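One detail that is easy to get wrong when starting out with compute shaders is sizing the dispatch: the values passed to glDispatchCompute count work groups, not individual invocations. The Python sketch below shows the usual rounding-up calculation. The 8 x 8 local size is an assumption for illustration and is not necessarily what the shaders described here use.

```python
def dispatch_groups(width, height, local_x=8, local_y=8):
    """Number of work groups needed to cover a width x height grid."""
    groups_x = (width + local_x - 1) // local_x    # ceiling division
    groups_y = (height + local_y - 1) // local_y
    return groups_x, groups_y


def overshoot(width, height, local_x=8, local_y=8):
    """Invocations launched beyond the grid edge in x and y."""
    gx, gy = dispatch_groups(width, height, local_x, local_y)
    return gx * local_x - width, gy * local_y - height
```

When the grid size is not a multiple of the local size, the last row or column of groups runs past the edge, so the shader needs a bounds check before it reads or writes.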

Rendering the Vectorscope Output

The final step involved visualizing the processed chroma data in a way that mirrors a traditional vectorscope. This required overlaying the normalized UV histogram along with reference markers without disrupting video playback.

I drew inspiration from the OBS monitor plugin, which includes a vectorscope feature. Its rendering approach informed how I structured the visualization pipeline, particularly around how histogram data is mapped to screen coordinates.
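The mapping from chroma values to screen coordinates can be sketched in Python as follows. This is an illustration under stated assumptions, not the OBS plugin's code or the code used here: it assumes BT.709 coefficients, limited-range 8-bit chroma, a square scope with V pointing up, and 75% color bars as the reference markers. All names are hypothetical.

```python
# 75% color bars as 8-bit full-range R'G'B' (assumed reference set).
BARS_75 = {
    "R": (191, 0, 0), "Mg": (191, 0, 191), "B": (0, 0, 191),
    "Cy": (0, 191, 191), "G": (0, 191, 0), "Yl": (191, 191, 0),
}


def rgb_to_uv_bt709(r, g, b):
    """8-bit R'G'B' -> 8-bit limited-range (Cb, Cr) with BT.709 coefficients."""
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
    y = 0.2126 * rn + 0.7152 * gn + 0.0722 * bn
    cb = (bn - y) / 1.8556      # Cb in [-0.5, 0.5]
    cr = (rn - y) / 1.5748      # Cr in [-0.5, 0.5]
    return round(128 + 224 * cb), round(128 + 224 * cr)


def uv_to_screen(u, v, size):
    """Map 8-bit (U, V) to pixel (x, y) in a size x size scope, with V pointing up."""
    scale = (size - 1) / 255.0
    return round(u * scale), round((255 - v) * scale)


def graticule_targets(size):
    """Screen positions of the color-bar reference markers."""
    return {name: uv_to_screen(*rgb_to_uv_bt709(*rgb), size)
            for name, rgb in BARS_75.items()}
```

Neutral gray lands at the center of the scope, and saturated colors move outward along fixed angles. The correct matrix and range depend on what the HDMI source signals.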

The demo video shows that the vectorscope is capable of processing a 1080p@60 video stream.

Work was carried out on a ROCK 5B board running a tailored Ubuntu image maintained by Joshua Riek.

 

Thursday, 1 May 2025

RK3588 - Building a simple HDMI analyzer

While developing and debugging HDMI drivers and custom video applications, you quickly come to appreciate the convenience of having a straightforward way to debug the HDMI output, both video and audio, in real time. This led me to develop a utility specifically designed to display basic HDMI metadata and render the incoming video and audio stream. The goal was to create a tool that could simplify the process of diagnosing HDMI-related issues when connected to an HDMI source.


One of the undervalued features of the RK3588 SoC is its built-in HDMI receiver, an often overlooked capability that eliminates the need for external HDMI-to-CSI adapters like the TC358743XBG or RK628D. This built-in receiver makes the RK3588 an ideal platform for developing such a utility, as it allows direct access to an HDMI input source without additional hardware.

While the BSP HDMI RX kernel driver may not be of the highest quality, it offers several useful IOCTLs for detecting the presence of a valid signal, in addition to providing valuable video and audio metadata. With this in mind, the application was developed as a Weston (Wayland) client, which turned out to be more challenging than initially anticipated. The primary difficulties arose from:

  1. Video Input Rendering – Optimizing the pipeline to minimize latency between frame acquisition and on-screen rendering. For rendering, I went with GStreamer pipelines paired with WaylandSink, to ease integration with the Wayland compositor.
  2. Overlaying graphical information on top of video playback – synchronizing real-time overlays with live video in a Wayland compositor like Weston required careful handling of surface layers and rendering pipelines. For rendering overlays, I adopted an approach based on Weston subsurfaces.
  3. Converting and scaling the video input – processing the raw video feed and resizing it dynamically while maintaining real-time performance introduced both technical and performance challenges. To handle conversion and scaling, I resorted to developing a custom GStreamer plugin built around RGA3, deliberately bypassing RGA2 due to the known constraints with 32-bit memory addressing (it's no different to the NPU).
  4. Audio Detection – The HDMI RX kernel driver monitors for the presence of an audio stream. The challenge was to dynamically add or remove audio from the GStreamer pipeline based on its availability and without disrupting the ongoing video stream.

As illustrated in the image below, basic metadata for both the video and audio streams is displayed in the overlay window.

 

In the first video (above), the HDMI output from a Lenovo Windows laptop is used as the input source. The laptop is configured to duplicate its display over HDMI, functioning as a secondary screen. This setup effectively demonstrates minimal latency in both user interaction and video playback, highlighting the responsiveness of the HDMI capture pipeline.


The second video showcases the utility’s scaling capabilities. With the output display set to 1080p, I used a Lindy HDMI 18G Signal Generator to feed input signals at various resolutions—720p, 1080p, 4K@30Hz, and 4K@40Hz—in multiple formats including RGB, NV16, and NV12. All input signals were successfully scaled to 1080p.

During testing, I discovered a limitation: although the RK3588’s HDMI receiver supports NV24 format input, neither RGA3 nor RGA2 (according to the documentation and my own validation) is capable of processing NV24. Interestingly, the RK3576 is equipped with the newer RGA2-Pro, which supports NV24, despite lacking an HDMI receiver.
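For context, the three semi-planar formats mentioned here differ only in chroma subsampling: NV12 is 4:2:0, NV16 is 4:2:2 and NV24 is 4:4:4. The Python sketch below compares the resulting frame sizes, assuming 8-bit samples and tightly packed buffers with no stride padding. The helper name is hypothetical.

```python
# Bytes per pixel as a (numerator, denominator) pair to keep the math in integers.
BYTES_PER_PIXEL = {
    "NV12": (3, 2),   # 4:2:0 -> Y plane plus a quarter-resolution UV pair
    "NV16": (2, 1),   # 4:2:2 -> Y plane plus a half-resolution UV pair
    "NV24": (3, 1),   # 4:4:4 -> Y plane plus a full-resolution UV pair
}


def frame_bytes(width, height, fmt):
    """Size in bytes of one tightly packed 8-bit frame."""
    num, den = BYTES_PER_PIXEL[fmt]
    return width * height * num // den
```

An NV24 frame is therefore twice the size of the same frame in NV12, with the entire difference in the UV plane.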

Work was performed on a ROCK 5B board running a tailored Ubuntu image maintained by Joshua Riek.