Sunday, 15 September 2024

AX650N - Sipeed Maix-IV (AXeraPi-Pro) NPU teardown

After spending a significant amount of time reverse-engineering the RK3588 NPU and examining Rockchip's 6 TOPS claim, I turned to the AXera AX650N SoC. It piqued my interest because of AXera's ambitious claims about its NPU: "72T mixed precision computing power, native support for Transformer intelligent processing platform". On closer inspection, the AX650N is rated at 72.0 TOPS@INT4 and 18.0 TOPS@INT8. Interestingly, the jump from INT8 to INT4 is a 4x gain rather than the typical 2x. There is an ongoing effort to port some of the smaller Transformer models to showcase its capabilities. Given the performance claims, however, I would have expected larger models to be showcased, but that doesn't seem to be the case.
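A quick sanity check of the ratio between the two quoted figures is below. The 2x baseline is the usual expectation when halving the operand width, not a number published by AXera, and the helper names are purely illustrative.

```python
# Peak NPU throughput figures quoted for the AX650N (TOPS).
CLAIMED_TOPS = {"INT8": 18.0, "INT4": 72.0}


def scaling_factor(high_bits_tops: float, low_bits_tops: float) -> float:
    """Ratio of the lower-precision figure to the higher-precision figure."""
    return low_bits_tops / high_bits_tops


def expected_int4_tops(int8_tops: float, typical_gain: float = 2.0) -> float:
    """INT4 throughput if halving the bit width merely doubled MAC throughput (assumption)."""
    return int8_tops * typical_gain


if __name__ == "__main__":
    ratio = scaling_factor(CLAIMED_TOPS["INT8"], CLAIMED_TOPS["INT4"])
    print(f"Claimed INT8 -> INT4 gain: {ratio:.1f}x")
    print(f"INT4 at a typical 2x gain: {expected_int4_tops(CLAIMED_TOPS['INT8']):.1f} TOPS")
```

Under the usual 2x assumption the INT4 figure would be 36 TOPS, so the quoted 72 TOPS implies either a wider INT4 datapath or a different way of counting operations.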

4 comments:

  1. Hello, I also have an AX650N dev board bought from Sipeed. When I use its NPU for neural network inference, the system's load average becomes particularly high, and the built-in serial port loses data it receives from external devices. I used cyclictest to measure the system's real-time performance and found latencies of 9000~13000 microseconds (9~13 ms), which is significantly worse than products like the Jetson Nano or Rockchip boards. I tried contacting Axera and Sipeed but received no response. Have you encountered this issue before, or, based on your experience, what might be causing it? Thank you, and I hope to hear back from you.

    1. When I tested Phi-3 Mini, I didn't encounter the issue you're experiencing; in fact, the CPU load was very low and the serial port worked fine. That said, I suspect the kernel drivers may be poorly written, but without access to the source code this can't be confirmed. Which model(s) were you trying to run?

    2. In fact, I encountered this issue while running the FRTDemo included in the SDK. The test command is /opt/bin/FRTDemo/run.sh -s 0 -p 0, which pulls video from the camera and runs inference on the NPU. It seems this command requires two external MIPI cameras to run properly. After running it, the system's load average becomes particularly high (around 16~18), and vmstat 1 shows extremely frequent context switching.

    3. Could you also confirm which version of the system SDK you are using? I have encountered this issue on SDK versions 1.40, 1.45, and 2.0.2.

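To compare an idle board with one running FRTDemo, the two numbers discussed above (load average and the context-switch rate that vmstat 1 reports in its cs column) can be sampled directly from the kernel. The sketch below is a rough stand-in for vmstat, assuming a standard Linux /proc: it differences the cumulative ctxt counter in /proc/stat over one second and reads /proc/loadavg. The function names are hypothetical helpers, not part of the AXera SDK.

```python
import os
import time


def parse_ctxt(stat_text: str) -> int:
    """Return the cumulative context-switch count (the 'ctxt' line) from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found in /proc/stat text")


def parse_loadavg(loadavg_text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = loadavg_text.split()[:3]
    return float(one), float(five), float(fifteen)


def read_text(path: str) -> str:
    with open(path) as f:
        return f.read()


def sample(interval: float = 1.0) -> tuple[float, tuple[float, float, float]]:
    """Context switches per second over `interval` seconds, plus the current load averages."""
    before = parse_ctxt(read_text("/proc/stat"))
    time.sleep(interval)
    after = parse_ctxt(read_text("/proc/stat"))
    return (after - before) / interval, parse_loadavg(read_text("/proc/loadavg"))


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    # Run once while the NPU demo is active and once while idle, then compare.
    for _ in range(3):
        cs_per_sec, (load1, load5, load15) = sample()
        print(f"cs/s={cs_per_sec:.0f}  load avg={load1:.2f} {load5:.2f} {load15:.2f}")
```

On Linux, the load average counts tasks in uninterruptible (D) sleep as well as runnable ones, so a value of 16~18 does not by itself mean the CPUs are busy. Pairing it with the context-switch rate and CPU utilisation helps distinguish real CPU contention from threads blocked in a driver.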