Tuesday, 27 August 2019

Jetson Nano - Developing a Pi v1.3 camera driver Part 2

I'd like to thank motiveorder.com for sponsoring the hardware and development time for this article.

Following on from my previous post, I am finally in a position to release an alpha version of the driver, unfortunately at this stage only in binary form. Development of the driver has been complicated by the fact that determining the correct settings for the OV5647 is extremely time consuming given the lack of good documentation.

The driver supports the following resolutions:

2592 x 1944 @15 fps
1920 x 1080 @30 fps
1280 x 960  @45 fps
1280 x 720  @60 fps

I have added support for 720p because most of the clone cameras seem to be targeting 1080p or 720p based on their lens configuration. I mainly tested with an original RPI V1.3 camera to ensure backward compatibility.

The driver is pre-compiled against the latest L4T R32.2 release, so there is a requirement to deploy a kernel plus modules along with a new dtb file. Therefore I recommend you do some background reading to understand the process before deploying. Furthermore I recommend you have access to the Linux console via the UART interface in case the new kernel fails to boot or the camera is not recognised.

Deployment of the kernel and modules will be done on the Nano itself while flashing of the dtb file has to be done from a Linux machine where the SDK Manager is installed.

Download nano_ov5647.tar.gz and extract it on your Nano:

mkdir ov5647
cd ov5647
wget  https://drive.google.com/open?id=1qA_HwiLXIAHbQN-TTEU1daEIW9z7R2vy

tar -xvf ../nano_ov5647.tar.gz

After extraction you will see the following files:

-rw-r--r-- 1 user group 291462110 Aug 26 17:23 modules_4_9_140.tar.gz
-rw-r--r-- 1 user group 200225    Aug 26 17:26 tegra210-p3448-0000-p3449-0000-a02.dtb
-rw-r--r-- 1 user group  34443272 Aug 26 17:26 Image-ov5647


Copy the kernel to the /boot directory:

sudo cp  Image-ov5647 /boot/Image-ov5647

Change the boot configuration file to load our kernel by editing /boot/extlinux/extlinux.conf. Comment out the following line and add the new kernel, so the change is from this:

      LINUX /boot/Image

to

       #LINUX /boot/Image
       LINUX /boot/Image-ov5647
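The same edit can also be scripted. What follows is only a sketch: the function name switch_kernel is hypothetical, and it assumes the file contains a single LINUX entry laid out as shown above. It operates on the text of the file, so it can be tried on a copy before touching /boot/extlinux/extlinux.conf.

```python
def switch_kernel(conf_text, new_image="/boot/Image-ov5647"):
    """Comment out the existing LINUX entry and add one for new_image.

    Hypothetical helper: works on the text of extlinux.conf and leaves
    every other line untouched.
    """
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("LINUX ") and stripped.split()[1] != new_image:
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f"{indent}#{stripped}")           # keep the old entry as a comment
            out.append(f"{indent}LINUX {new_image}")     # boot the new kernel instead
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "LABEL primary\n      LINUX /boot/Image\n      INITRD /boot/initrd\n"
    print(switch_kernel(sample), end="")
```

Running the helper a second time on its own output leaves the text unchanged, because the only remaining active LINUX entry already points at the new image.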


Next step is to extract the kernel modules:

cd /lib/modules/
sudo tar -xvf <path to where files were extracted>/modules_4_9_140.tar.gz


The last step is to flash the dtb file, tegra210-p3448-0000-p3449-0000-a02.dtb. As discussed in the comments section (below) by jiangwei, it is possible to copy the dtb file directly to the Nano; refer to this link on how this can be achieved. See the section "Flash custom DTB on the Jetson Nano".

Alternatively you can use SDK Manager. Flashing requires copying the dtb file to the Linux host machine, into the directory Linux_for_Tegra/kernel/dtb/ where your SDK is installed. Further instructions on how to flash the dtb are covered in a post I made here, however since we don't want to replace the kernel the command to use is:

sudo ./flash.sh --no-systemimg -r -k DTB jetson-nano-qspi-sd mmcblk0p1

There seems to be some confusion about how to put the Nano into recovery mode. The steps to do that are:

1. Power down the Nano
2. J40 - Connect recovery pins 3-4 together
3. Power up the Nano
4. J40 - Disconnect pins 3-4
5. Flash the file


After flashing the dtb the Nano should boot the new kernel and hopefully the desktop will reappear. To verify the new kernel we can run the following command:

uname -a

It should report the kernel version as 4.9.140+ :

Linux jetson-desktop 4.9.140+

If successful, power down the Nano and connect your camera to FPC connector J13. Power up the Nano and once the desktop reappears verify the camera is detected with:

dmesg | grep ov5647

It should report the following:

[    3.584908] ov5647 6-0036: tegracam sensor driver:ov5647_v2.0.6
[    3.603566] ov5647 6-0036: Found ov5647 with model id:5647 process:11 version:1
[    5.701298] vi 54080000.vi: subdev ov5647 6-0036 bound



The above indicates the camera was detected and initialised. Finally we can try streaming; commands for the different resolutions are below:

#2592x1944@15fps
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=2592, height=1944, framerate=15/1' ! nvvidconv flip-method=0 ! 'video/x-raw,width=2592, height=1944' ! nvvidconv ! nvegltransform ! nveglglessink -e

#1920x1080@30fps
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=1920, height=1080, framerate=30/1' ! nvvidconv flip-method=0 ! 'video/x-raw,width=1920, height=1080' ! nvvidconv ! nvegltransform ! nveglglessink -e

#1280x960@45fps
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=1280, height=960, framerate=45/1' ! nvvidconv flip-method=0 ! 'video/x-raw,width=1280, height=960' ! nvvidconv ! nvegltransform ! nveglglessink -e


#1280x720@60fps
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=1280, height=720, framerate=60/1' ! nvvidconv flip-method=0 ! 'video/x-raw,width=1280, height=720' ! nvvidconv ! nvegltransform ! nveglglessink -e
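Since the four commands differ only in width, height and frame rate, they can be generated rather than copied. This is a small sketch under that assumption: MODES and build_pipeline are hypothetical names, and the element names and caps strings are taken verbatim from the commands above.

```python
import shlex

# Sensor modes listed for the alpha driver: (width, height, fps).
MODES = [(2592, 1944, 15), (1920, 1080, 30), (1280, 960, 45), (1280, 720, 60)]


def build_pipeline(width, height, fps):
    """Return the gst-launch-1.0 argument list for one supported mode."""
    if (width, height, fps) not in MODES:
        raise ValueError(f"unsupported mode {width}x{height}@{fps}")
    caps_nvmm = (f"video/x-raw(memory:NVMM),width={width}, "
                 f"height={height}, framerate={fps}/1")
    caps_raw = f"video/x-raw,width={width}, height={height}"
    return ["gst-launch-1.0", "nvarguscamerasrc", "!", caps_nvmm, "!",
            "nvvidconv", "flip-method=0", "!", caps_raw, "!",
            "nvvidconv", "!", "nvegltransform", "!", "nveglglessink", "-e"]


if __name__ == "__main__":
    for mode in MODES:
        # shlex.join quotes the caps strings so the line can be pasted into a shell.
        print(shlex.join(build_pipeline(*mode)))
```

The argument list can also be handed straight to subprocess.run on the Nano, which avoids shell quoting altogether.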

The driver supports control of the analogue gain, which has a range of 16 to 128. This can be set using the 'gainrange' property, example below:

gst-launch-1.0 nvarguscamerasrc gainrange="16 16" ! 'video/x-raw(memory:NVMM),width=1280, height=720, framerate=60/1' ! nvvidconv flip-method=0 ! 'video/x-raw,width=1280, height=720' ! nvvidconv ! nvegltransform ! nveglglessink -e
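If the gain is set from a script, it is worth checking the values against the 16 to 128 range before building the command. A minimal sketch follows; gainrange_property is a hypothetical helper name, and the only thing taken from the driver is the range and the property syntax shown above.

```python
GAIN_MIN, GAIN_MAX = 16, 128  # analogue gain range quoted for the driver


def gainrange_property(low, high):
    """Format the nvarguscamerasrc gainrange property after a range check."""
    if not (GAIN_MIN <= low <= high <= GAIN_MAX):
        raise ValueError(
            f"gain range must satisfy {GAIN_MIN} <= low <= high <= {GAIN_MAX}")
    return f'gainrange="{low} {high}"'


if __name__ == "__main__":
    print(gainrange_property(16, 16))   # fixed gain, as in the example above
    print(gainrange_property(16, 128))  # let the camera stack use the full range
```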

If you require commercial support, please contact motiveorder.com.

Sunday, 23 June 2019

Jetson Nano - Developing a Pi v1.3 camera driver Part 1

I'd like to thank motiveorder.com for sponsoring the hardware and development time for this article.

The Jetson Nano is a fairly capable device considering its appealing price point. In fact it's one of the few ARM devices which out of the box provides a decent (and usable) X11 graphics stack (even though the drivers are closed source).
Although the Jetson Nano supports the same 15 pin CSI connector as the RPI, camera support is currently limited to Pi V2 cameras, which host the IMX219. The older Pi v1.3 cameras are appealing partly because there are numerous low cost clones available and partly because there are numerous add-ons such as lenses and night mode options.

The v1.3 camera uses the OV5647, which apparently has been discontinued by OmniVision; furthermore the full datasheet isn't freely available (only under NDA). There is a preliminary datasheet on the internet but it seems to be incomplete or, worse, inconsistent in places. This does hinder the process somewhat, as debugging errors can be very time consuming and at times frustrating.

One noticeable difference is that the v1.3 camera hosts a 25MHz crystal whereas most non-RPI OV5647 boards use a standard 24MHz one. This can make the tuning more difficult as some of the default settings need adjustment.
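To get a feel for why the crystal matters, consider register settings tuned for a 24MHz input being reused unchanged. Assuming every sensor clock is derived from the crystal through the same PLL and divider settings (an assumption made for illustration, not a statement about the actual driver), all timings scale by 25/24. The function name scaled_fps is hypothetical.

```python
def scaled_fps(fps_at_reference, reference_mhz=24.0, actual_mhz=25.0):
    """Frame rate obtained when settings tuned for reference_mhz run at actual_mhz.

    Assumes the pixel clock scales linearly with the input clock.
    """
    return fps_at_reference * actual_mhz / reference_mhz


if __name__ == "__main__":
    # A mode tuned for 30 fps at 24MHz would run about 4% fast at 25MHz.
    print(scaled_fps(30.0))  # 31.25
```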


The first step in bringing up the camera was ensuring the board was powered on so that it could be detected through its i2c interface (address 0x36). After numerous attempts the OV5647 finally appeared:


Warning: Can't use SMBus Quick Write command, will skip some addresses
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-6.
I will probe address range 0x03-0x77.
Continue? [Y/n] Y
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                                                
10:                                                
20:                                                
30: -- -- -- -- -- -- UU --                        
40:                                                
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60:                                                
70:                                             


The second step was to develop enough of a skeleton kernel driver to initialise the OV5647 and enable it as a v4l2 device. Although this may sound easy it turned out to be extremely time consuming for two reasons. Firstly due to the lack of documentation for the OV5647, and secondly because the NVIDIA camera driver documentation is also poor and in a number of cases doesn't match the code. Finally after a few weeks a v4l2 device appeared:


jetson-nano@jetsonnano-desktop:~$ v4l2-ctl -d /dev/video0 -D
Driver Info (not using libv4l2):
    Driver name   : tegra-video
    Card type     : vi-output, ov5647 6-0036
    Bus info      : platform:54080000.vi:0
    Driver version: 4.9.140
    Capabilities  : 0x84200001
        Video Capture
        Streaming
        Extended Pix Format
        Device Capabilities
    Device Caps   : 0x04200001
        Video Capture
        Streaming
        Extended Pix Format



The next step was to put the camera in test pattern mode and capture a raw image. The OV5647 outputs raw Bayer format, in our case 10 bit, so the captured raw data file needs to be converted to a displayable format. Conversion can be done using a utility like bayer2rgb. Finally I arrived at a valid test pattern.
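Before demosaicing, the 10-bit samples usually have to be reduced to 8 bits for a viewer or converter that expects 8-bit data. The sketch below assumes each sample is stored in a 16-bit little-endian word; whether the 10 bits sit in the low or the high bits of that word depends on the capture path, so the shift is left as a parameter. The function name is hypothetical and the layout is an assumption, not something taken from the driver.

```python
import struct


def raw10_words_to_8bit(data, shift=0):
    """Reduce 10-bit samples held in 16-bit little-endian words to 8 bits.

    shift is the number of bits the 10-bit sample is offset from bit 0
    (0 if right-aligned, 6 if left-aligned in the 16-bit word).
    """
    count = len(data) // 2
    words = struct.unpack(f"<{count}H", data[: count * 2])
    # Isolate the 10-bit sample, then drop its two least significant bits.
    return bytes(((w >> shift) & 0x3FF) >> 2 for w in words)


if __name__ == "__main__":
    sample = struct.pack("<3H", 0x000, 0x3FF, 0x200)
    print(list(raw10_words_to_8bit(sample)))  # [0, 255, 128]
```

The resulting byte string still holds a Bayer mosaic, so it needs a demosaicing step afterwards to become an RGB image.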

The next stage was to configure the OV5647 for a valid image capture resolution, which again has been extremely challenging for the reasons stated above. Some of the images from numerous attempts are shown on the left and right.

Current progress is that the camera is outputting 1920x1080@30fps, however this is a work in progress as the driver is in a primitive state and the output format requires further improvements. On the plus side it is now possible to stream with the nvarguscamerasrc gstreamer plugin. Below is a 1080p recording from the OV5647 with a pipeline based on nvarguscamerasrc and nvv4l2h264enc.

Update: In my 2nd post we have a driver that you can test with.




Friday, 29 March 2019

Machine learning with the i.MX6 and the Intel NCS2




Last October Intel released an upgraded Neural Compute Stick known as the NCS2, hosting the Movidius Myriad X VPU (MA2485). Intel claim "NCS 2 delivers up to eight times the performance boost compared to the previous generation NCS". Intel also provide OpenVINO, an open visual inference and neural network optimization toolkit with multiplatform support for Intel based hardware. With release R5 of OpenVINO, support was added for the NCS2/NCS and the ARMv7-A CPU architecture through the introduction of library support for Raspberry Pi boards. As a progression from my previous post this gives us the opportunity to test the NCS2 with OpenVINO on the i.mx6 platform. The first video above shows the sample security_barrier_camera_demo and the second is running the model vehicle-detection-adas-0002.xml. These are executed on an imx6q board (BCM AR6MXQ).



 

To maximise performance the NCS2 should ideally be connected to a USB 3.0 port. Unfortunately the i.mx6 doesn't have native USB 3.0 support, however most of the i.mx6 range do support a PCIe interface. So our plan was to deploy a mini PCIe to USB 3.0 card, in our case using the NEC UPD720202 chipset. Using PCIe also avoids saturating the USB bus when testing inference with a USB camera.


The target board for testing was the BCM AR6MXQ, which hosts an on-board mini PCIe interface. The mini PCIe card hosts a 20 pin USB connector and a SATA connector for USB power. We used an adapter card to expose two USB 3.0 ports, hence the NCS2 ending up in an upright position.


OpenVINO provides an easy to use interface to OpenCV via Python and C++. In our case, for an embedded platform, C++ is best suited for optimum performance. Testing was done using a number of the existing OpenVINO samples, with the primary code modification being to accelerate resizing of the camera input and rendering of the OpenCV buffer to screen.

The face recognition video above uses object_detection_demo_ssd_async with the face-detection-retail-0004.xml model, which is rated at 1.067 GFLOPs complexity. NCS2 inference times average 22ms, although the model lacks some accuracy given its inability to distinguish between a human face and 'Dora'. The overall rate of 19 fps is pretty good. With regards to CPU usage, on an i.mx6q only one of the 4 cores is fully occupied, as suggested by the output of 'top'.

What is nice about OpenVINO is that we can easily compare these benchmarks against the original NCS by simply plugging it in and re-running the test.




As shown above, with the original NCS the inference times rise from 22 to 62 ms, although from our testing the trade-off for the faster NCS2 seems to be a rise in power consumption and heat dissipation between the two releases of the NCS.
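To put the two inference times in perspective, the short calculation below works out the ratio between them and the frame rate each stick could sustain if inference were the only cost per frame. It uses only the 22 ms and 62 ms figures quoted in this post; the function name is hypothetical, and real frame rates will be lower because capture, resizing and rendering also take time.

```python
NCS2_MS = 22.0  # average inference time measured with the NCS2
NCS_MS = 62.0   # average inference time measured with the original NCS


def inference_only_fps(inference_ms):
    """Upper bound on frames per second if inference were the only per-frame cost."""
    return 1000.0 / inference_ms


if __name__ == "__main__":
    print(round(NCS_MS / NCS2_MS, 2))             # 2.82 (NCS2 speed-up on this model)
    print(round(inference_only_fps(NCS2_MS), 1))  # 45.5
    print(round(inference_only_fps(NCS_MS), 1))   # 16.1
```

On this model the measured gain is therefore roughly 2.8x, well short of the "up to eight times" headline figure.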


I'd like to thank motiveorder.com for sponsoring the hardware and development time for this article.




Sunday, 22 April 2018

ESP32 - Enabling RS485 half duplex UART support

Although the ESP32 UART claims to support RS485, the documentation provided in the Technical Reference Manual is pretty poor as the register and interrupt descriptions are extremely vague in terms of each feature and its purpose. This is further exacerbated by the fact there are no examples provided in the SDK. The subject has been widely discussed in an Espressif forum thread and a pull request was submitted. Unfortunately the main problem with that solution was that spurious break characters were observed in the RX FIFO and needed to be filtered. Another issue is that toggling of the RTS pin happens within the RX interrupt handler, so we can't control it at an application level.

Having spent quite a few (or too many) hours debugging the UART behaviour in RS485 mode I think I have an improved implementation for a driver. It's important to note that the driver only supports half duplex mode, i.e. only one node on the RS485 bus can transmit at any time. See the commits in my fork; the main changes are:

1. The RTS pin is toggled outside of the driver code, i.e. in the application code, therefore both data direction controlled and auto direction transceivers are supported.
2. Spurious break characters shouldn't occur in the RX FIFO.
3. RS485 interrupts are enabled, although currently only collision detection is implemented. Further work is required to correctly handle the framing or parity error interrupts.

The uart_rs485_echo example (under examples/peripherals/) provides a simple demonstration of its use. The example receives data from the UART and echoes the same data back to the sender. After configuring and enabling the UART we can use the existing uart_set_rts function to control the data direction pin of the transceiver. Note this is optional and can be disabled in the code for auto direction transceivers.

The example has been tested with the following boards :

1. SparkFun Transceiver RS485 Breakout board which hosts a SP3485 transceiver.


2. XY-017 RS485 to TTL Module which is an auto direction transceiver with unmarked ICs.







Saturday, 24 March 2018

Machine learning with the i.MX6 SoloX and the Movidius Neural Compute Stick



The i.MX6 SoloX processor is fairly unique in the i.MX6 family, primarily because it co-hosts a single Cortex A9 along with a Cortex M4. The heterogeneous architecture proves very useful for hard real-time processing on the M4 while concurrently running a Linux stack on the A9 (the heterogeneous architecture is also implemented on the i.MX8 line of processors). In previous posts using the UDOO Neo I have covered how these features can be exploited when interfacing different peripheral devices. The processor architecture lends itself nicely to IOT (Internet of Things) Edge devices, where sensor capture and data preprocessing/conversion can occur on the device before being forwarded to the cloud where a richer set of analytic processing can be performed. If we could perform some (or all) of the analytic processing on the edge device then we might dramatically reduce the amount of device data traffic sent to the cloud. Alternatively the edge device could make decisions for itself and not completely rely on the cloud; furthermore it opens up the possibility of the edge device partially functioning when the network isn't available. This concept is known as Edge Analytics.

A single Cortex A9 practically isn't up to the job of performing intensive analytical processing especially if we would like to implement a machine learning algorithm. In terms of machine learning techniques Neural Networks are one branch that has gained considerable popularity in the last few years primarily because it offers new avenues for the types of analytical processing that can be done ie image recognition or text processing. The Movidius Neural Compute Stick (NCS) is an intriguing concept as it opens up the possibility of deploying deep neural networks on embedded devices. In the video we demonstrate feeding a number of images (loaded from png files) to a caffe GoogLeNet model, for each inference it displays the top matching label and probability score. As a performance enhancement we utilise the PXP engine to perform hardware image resizing and BGRA conversion before feeding a 224 x 224 image to the model for classification. The resized image is also rendered to the screen (using the 2D acceleration). To gain an acceptable level of performance the application was developed in C/C++.

So, the first challenge was to see if we could get the NCS running with the UDOO Neo (the i.MX6SX board). My starting point was the Movidius article on deploying on the Raspberry Pi. As mentioned in the article, it's important to highlight that training, conversion or profiling of the Neural Network can't be done on the embedded device, i.e. "Full SDK mode". This implies that the Neural Network needs to be trained and converted using a standard PC or cloud environment. Deployment to an embedded device is restricted to "API only mode". This first step turned out to be a challenge mainly because my starting point was Ubuntu 14.04 and not 16. It took a few days to get the correct packages compiled and installed before caffe would compile without errors. The Neural Compute Application Zoo provides a number of sample applications; you can use hello_ncs_cpp or hello_ncs_py to verify the OS can communicate with the NCS. The other gotcha is that the NCS is power hungry and requires a powered USB hub, especially if you have other USB peripherals attached. On the Neo the NCS can be plugged directly into the USB type A socket if you don't have a need for additional peripherals.

The second step was to see if we could deploy a Neural Network graph on the NCS and perform simple inferences. Most of the sample applications in the 'Zoo' are Python based, with some having a further dependency on OpenCV. Unfortunately running OpenCV and Python on the Neo (or in fact on most low power ARM embedded devices) would introduce too much of a performance bottleneck. The 2 reasons for this are the single A9 core and the fact that the X11 interface doesn't support hardware accelerated graphics. With ARM processors there is a trade off between power and performance, and for 'always on' IOT devices this does become a major deciding factor. Fortunately caffe provides a C++ interface, although there's little documentation available about the API. Within the 'Zoo' there is a multistick_cpp C++ application which demonstrates communicating with multiple NCS devices.

My starting point was altering multistick_cpp to use one stick with the GoogLeNet model. In multistick_cpp, after the GoogLeNet graph is loaded, subsequent processing can be broken down into two further steps. Firstly it loads a png image file, resizes it and converts it to BGRA; secondly it sends the image data to the NCS for inference and displays the result. Sample timings for each step running on the Neo are shown below.

1. Load png, resize and convert : approximately 800 milliseconds
2. NCS inference : approximately 130 milliseconds

We can't do much about Step 2 without re-tuning (redesigning) the Neural Network or reducing the image size (possibly leading to less accuracy). Step 1 is slow because the image files are roughly 800x800 pixels and software resizing to 224 x 224 is painfully slow. Fortunately we can address the resizing and conversion time in Step 1: the i.MX6SX contains an image processing unit known as PXP (Pixel Pipeline) which can rescale and perform colour space conversions on graphic buffers. I re-factored the code in step 1 as below:

1. Use libpng to read the png file
2. Resize and colour space convert the image using PXP
3. 2D blit re-sized image to screen

With the above changes sample timings dramatically improved for step 1 (as shown in the video):

1. Load png, resize and convert : approximately 233 milliseconds
2. NCS inference : approximately 112 milliseconds
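Adding the two steps together gives the time per image before and after the change. The calculation below uses only the approximate timings listed above and assumes the two steps run strictly one after the other, which is how they are described here.

```python
# Approximate timings in milliseconds: (load/resize/convert, NCS inference).
BEFORE_MS = (800.0, 130.0)  # software resize
AFTER_MS = (233.0, 112.0)   # PXP resize and colour space conversion


def images_per_second(timings_ms):
    """Throughput when the steps run back to back for every image."""
    return 1000.0 / sum(timings_ms)


if __name__ == "__main__":
    print(sum(BEFORE_MS), sum(AFTER_MS))             # 930.0 345.0
    print(round(images_per_second(BEFORE_MS), 2))    # 1.08
    print(round(images_per_second(AFTER_MS), 2))     # 2.9
    print(round(sum(BEFORE_MS) / sum(AFTER_MS), 2))  # 2.7 (overall speed-up)
    print(round(BEFORE_MS[0] / AFTER_MS[0], 2))      # 3.43 (speed-up of step 1 alone)
```

With step 1 now down to roughly twice the inference time, image loading and conversion remain the larger share of the per-image cost.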

Hopefully this article provides a useful introduction to deploying the NCS with the i.MX6 or i.MX7 line of processors. Going forward I would like to get a camera working with the Neo and see what FPS rate we can achieve. The other interesting avenue is deploying SSD MobileNet and using the PXP overlay feature to render matches.

I'd like to thank motiveorder.com for sponsoring the hardware and development time for this article.

Thursday, 8 February 2018

i.mx6sx - IMU Sensor fusion with the UDOO NEO

A key feature of the UDOO NEO is the hosting of a 9-DOF IMU through the inclusion of the NXP FXOS8700CQ and FXAS21002C sensors. The FXOS8700CQ provides a 6-axis accelerometer and magnetometer and the FXAS21002C provides a 3-axis digital angular rate gyroscope. Note that these sensors are only available on the Extended and Full models. In the video we demonstrate 9-DOF sensor fusion running on the Cortex M4. The M4 fusion output is sent from the serial port and fed to a modified OrientationVisualiser application run on a PC. OrientationVisualiser is a Processing application that displays an Arduino-like board in 3D and is part of the NXPMotionSense library; refer to the Teensy example and code.




The combination of the two sensors offers the ability to track absolute orientation with respect to a fixed Earth frame of reference with reasonable accuracy. Orientation can be modelled in 3 dimensions and most descriptions refer to the analogy of an aircraft, where yaw represents movement of the nose of the aircraft from side to side, pitch represents the up or down movement of the nose and roll represents the up and down movement of the wing tips. Refer to this post for a detailed description. For our solution the sensor fusion algorithm implements a Kalman filter. The filter smooths noise from the accelerometer/magnetometer and drift from the gyroscope. We also run magnetic calibration to reduce the effect of hard and soft iron magnetic interference.
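Fusion algorithms commonly report orientation as a unit quaternion, which then has to be converted to yaw, pitch and roll for display. This post does not say which form the M4 output takes, so the sketch below is an assumption on both counts: it takes a unit quaternion (w, x, y, z) and applies the usual aerospace Z-Y-X (yaw, pitch, roll) convention. The function name is hypothetical.

```python
import math


def quaternion_to_euler(w, x, y, z):
    """Convert a unit quaternion to (yaw, pitch, roll) in degrees, Z-Y-X order."""
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to guard against rounding pushing the value just outside [-1, 1].
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return tuple(math.degrees(angle) for angle in (yaw, pitch, roll))


if __name__ == "__main__":
    # A quarter turn about the vertical (Z) axis: nose swings 90 degrees sideways.
    half = math.radians(90.0) / 2.0
    print(quaternion_to_euler(math.cos(half), 0.0, 0.0, math.sin(half)))
```

Near a pitch of plus or minus 90 degrees yaw and roll become ambiguous (gimbal lock), which is one reason fusion code tends to work with quaternions internally.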

It's important to note that the FXOS8700CQ is mounted on the underside of the pcb, however the axes should be aligned to a Cartesian coordinate system that follows the Right Hand Rule (RHR). Therefore the X, Y, Z values read from the FXOS8700CQ need to be adjusted accordingly before being applied to the fusion algorithm. Although the FXOS8700CQ has been placed at the edge of the board, it's important to remember that the pcb spacer hole is about 1 centimetre away. To reduce magnetic interference on the magnetometer a metal pcb spacer or screw shouldn't be used in this hole.

In this example we reallocated I2C4 to the M4 in order to read data from both sensors. In our set-up the FXOS8700CQ output data rate (ODR) was 200Hz while the FXAS21002C was configured for 400Hz. As per previous posts the code was developed using the i.mx6sx FreeRTOS SDK.








Thursday, 28 December 2017

i.mx6sx - M4 SD Card access on the UDOO NEO

In this post I demonstrate that it is possible to interface the M4 to an SD card shield (similar to using an Arduino SD card shield) in order to retrieve or save data locally (without relying on the A9). This work is the result of a larger data logger project, where the A9 remains in sleep mode to conserve power while the M4 performs data logging from numerous sensors.



In the video an SD card shield is interfaced to the UDOO NEO and accessed from the M4. The code initialises the SD card, then mounts and reads the FAT32 partition. Subsequently we read bitmap files from the FAT32 partition and display the contents on the LCD display (320x240). The code is written using the FreeRTOS BSP and executes on the M4 while the A9 boots Linux. Each bitmap file is 230400 bytes, and when reading 720 byte blocks throughput is around 230KB/sec. If we increase the block size to 23040 bytes then throughput is around 340KB/sec.
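These figures translate directly into the time needed to load one image. The calculation below uses the numbers quoted above and assumes 1 KB means 1000 bytes; the function names are hypothetical. It also shows that 230400 bytes matches a 320x240 frame at 3 bytes per pixel.

```python
FILE_BYTES = 230400  # size of each bitmap file
assert FILE_BYTES == 320 * 240 * 3  # one full frame at 3 bytes per pixel


def reads_per_file(block_bytes):
    """Number of block reads needed to fetch one file."""
    return FILE_BYTES // block_bytes


def seconds_per_file(throughput_kb_s):
    """Load time for one file, taking 1 KB as 1000 bytes (an assumption)."""
    return FILE_BYTES / (throughput_kb_s * 1000.0)


if __name__ == "__main__":
    print(reads_per_file(720), round(seconds_per_file(230), 2))    # 320 1.0
    print(reads_per_file(23040), round(seconds_per_file(340), 2))  # 10 0.68
```

So the larger block size cuts the number of reads from 320 to 10 per image and the load time from about 1 second to about 0.68 seconds.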

NEO Connectivity

I chose to use a WeMos data logger shield over an Arduino SD card shield mainly for the following reasons:

1. 3.3v compatible
2. An RTC (plus battery backup) is available on the shield, although the accuracy of the DS1307 compared to DS3231 is questionable.
3. Nice stackable design for inclusion of additional WeMos shields

The shield (just like the Arduino equivalent) supports an SPI interface. The Physical Layer of the SD Card Specification mentions that the primary hardware interface is the SD bus, which is implemented through 4 data lines and one command line. On power up the native operating mode of the card is SD bus, however it is possible to switch the card to SPI bus, which is considered a secondary operating mode. The main disadvantages of SPI mode versus SD mode are:

1. The loss of performance from using a single data line versus 4 data lines.
2. Only a subset of the SD commands and functions are supported.
3. The maximum SPI clock speed is limited to 25MHz regardless of the SD card Class.


Minimum connectivity from a host MCU for SPI mode requires 3 SPI pins plus a GPIO pin for CS.


From the NEO the shield can be connected to ECSPI 2 plus an arbitrary GPIO pin and 3.3v; see the above (NEO Connectivity) image for wiring.

Power up and initialisation of the card, along with commands and responses, are well documented in the Physical Layer of the SD Card Specification. After powering up the card it should be initialised by applying at least 74 clock cycles (e.g. sending 10 bytes with 0xff as the payload). This is followed by CMD0 as the first command, which switches the card to SPI mode; a positive R1 response will contain 0x01. The next step is to interrogate SD version support by sending CMD8, and lastly we can use ACMD41 to set or determine:

1. Card is initialised
2. Card capacity type (SDHC or SDXC)
3. Switch to 1.8V signal voltage
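The byte sequences for the first part of this initialisation can be sketched as follows. Each SPI mode command is a 6 byte frame: the command index with the top bits set to 01, a 32 bit argument, then a CRC7 followed by a stop bit. CMD0 and CMD8 are sent while the card still checks the CRC, so it has to be correct. The function names are hypothetical, the 0x1AA argument for CMD8 (2.7-3.6V range plus the 0xAA check pattern) is the commonly used value rather than something taken from my code, and the actual SPI transfer is left out.

```python
def crc7(data):
    """CRC7 used by SD commands (polynomial x^7 + x^3 + 1)."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            feedback = ((byte >> bit) & 1) ^ ((crc >> 6) & 1)
            crc = (crc << 1) & 0x7F
            if feedback:
                crc ^= 0x09
    return crc


def command_frame(index, argument):
    """Build the 6 byte frame for command `index` with a 32 bit argument."""
    body = bytes([0x40 | index]) + argument.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])  # CRC7 plus the stop bit


WAKE_UP = b"\xff" * 10          # 80 clock cycles, which covers the 74 required
CMD0 = command_frame(0, 0)      # GO_IDLE_STATE, expect R1 = 0x01
CMD8 = command_frame(8, 0x1AA)  # SEND_IF_COND

if __name__ == "__main__":
    print(CMD0.hex(), CMD8.hex())  # 400000000095 48000001aa87
```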

After initialising the card we can interrogate the card data, for example:

1. Read the Card Identification (CID) Register, a 16 byte code that contains information that uniquely identifies the SD card, including the card serial number (PSN), manufacturer ID number (MID) and manufacture date (MDT).

2. Read the Card Specific Data (CSD) Register, which defines the data format, error correction type, maximum data access time, etc.
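As an illustration of the first item, the 16 CID bytes can be decoded as shown below. The field layout follows the SD Physical Layer specification; parse_cid is a hypothetical name and the sample bytes are made up for the example, not read from a real card.

```python
def parse_cid(cid):
    """Decode the 16 byte CID register into its main fields."""
    if len(cid) != 16:
        raise ValueError("CID register must be exactly 16 bytes")
    return {
        "mid": cid[0],                                       # manufacturer ID
        "oid": cid[1:3].decode("ascii", errors="replace"),   # OEM/application ID
        "pnm": cid[3:8].decode("ascii", errors="replace"),   # product name
        "prv": f"{cid[8] >> 4}.{cid[8] & 0x0F}",             # product revision (BCD)
        "psn": int.from_bytes(cid[9:13], "big"),             # serial number
        # MDT: 4 reserved bits, 8 bit year offset from 2000, 4 bit month.
        "mdt_year": 2000 + (((cid[13] & 0x0F) << 4) | (cid[14] >> 4)),
        "mdt_month": cid[14] & 0x0F,
    }


# Made-up CID for illustration only; the last byte (CRC7 + stop bit) is not checked.
SAMPLE_CID = (bytes([0x03]) + b"SD" + b"SU08G" + bytes([0x80])
              + (0x12345678).to_bytes(4, "big") + bytes([0x01, 0x23, 0xFF]))

if __name__ == "__main__":
    print(parse_cid(SAMPLE_CID))
```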



Subsequently I built a simple library to read FAT32 partitions and their file contents as shown in the above screen shot.

In order to improve performance we would need to see if we can enable DMA for the SPI transfers. However this represents a challenge, as the DMA engine is initialised on the A9 and therefore we would need to wait for Linux to boot before accessing it.