Tiny Devices: 2018

Sunday 22 April 2018

ESP32 - Enabling RS485 half duplex UART support

Although the ESP32 UART claims to support RS485, the documentation in the Technical Reference Manual is poor: the register and interrupt descriptions are vague about each feature and its purpose. This is exacerbated by the fact that there are no examples in the SDK. The subject has been widely discussed in an Espressif forum thread and a pull request was submitted. Unfortunately the main problem with that solution was that spurious break characters were observed in the RX FIFO and needed to be filtered. Another issue is that the RTS pin is toggled within the RX interrupt handler, so it can't be controlled at the application level.

Having spent quite a few (or too many) hours debugging the UART behaviour in RS485 mode, I think I have an improved implementation for a driver. It's important to note that the driver only supports half duplex mode, i.e. only one node on the RS485 bus can transmit at any time. See the commits in my fork; the main changes are:

1. The RTS pin is toggled outside of the driver code, i.e. in the application code, so both data direction controlled and auto direction transceivers are supported.
2. Spurious break characters shouldn't occur in the RX FIFO.
3. RS485 interrupts are enabled, although currently only collision detection is implemented. Further work is required to correctly handle the framing and parity error interrupts.

The uart_rs485_echo example (under examples/peripherals/) provides a simple demonstration of its use. The example receives data from the UART and echoes the same data back to the sender. After configuring and enabling the UART, we can use the existing uart_set_rts function to drive the data direction pin of the transceiver. This is optional and can be disabled in the code for auto direction transceivers.
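The half duplex echo sequence can be sketched in plain C as below. This is not the code from the fork: rs485_port_t, fake_bus_t and the helper names are hypothetical, and the RTS polarity (1 = transmit) is an assumption that depends on how the transceiver is wired. On the ESP32 the callbacks would wrap uart_read_bytes, uart_write_bytes, uart_wait_tx_done and uart_set_rts; here a fake bus stands in so the ordering can be checked on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical port abstraction. On the ESP32 the callbacks would wrap
 * uart_read_bytes(), uart_write_bytes(), uart_wait_tx_done() and uart_set_rts(). */
typedef struct {
    size_t (*read)(void *ctx, uint8_t *buf, size_t max_len);
    size_t (*write)(void *ctx, const uint8_t *buf, size_t len);
    void   (*wait_tx_done)(void *ctx);
    void   (*set_rts)(void *ctx, int level);
    void   *ctx;
} rs485_port_t;

/* Assumed wiring: RTS level 1 enables the driver, level 0 the receiver. */
enum { RS485_RECEIVE = 0, RS485_TRANSMIT = 1 };

/* Echo whatever has been received. Returns the number of bytes sent back. */
size_t rs485_echo_once(const rs485_port_t *port, uint8_t *buf, size_t buf_len)
{
    assert(port != NULL && buf != NULL);
    size_t n = port->read(port->ctx, buf, buf_len);
    if (n == 0) {
        return 0;                              /* nothing to echo, keep listening */
    }
    port->set_rts(port->ctx, RS485_TRANSMIT);  /* claim the bus */
    size_t sent = port->write(port->ctx, buf, n);
    port->wait_tx_done(port->ctx);             /* wait until the last byte has left */
    port->set_rts(port->ctx, RS485_RECEIVE);   /* release the bus */
    return sent;
}

/* Host-side fake bus so the sequence can be exercised without hardware. */
typedef struct {
    const uint8_t *rx; size_t rx_len;   /* bytes waiting to be received */
    uint8_t tx[64];    size_t tx_len;   /* bytes the node transmitted */
    int rts;           int rts_changes; /* current RTS level and change count */
} fake_bus_t;

static size_t fake_read(void *ctx, uint8_t *buf, size_t max_len)
{
    fake_bus_t *bus = ctx;
    size_t n = bus->rx_len < max_len ? bus->rx_len : max_len;
    if (n == 0) {
        return 0;
    }
    memcpy(buf, bus->rx, n);
    bus->rx += n;
    bus->rx_len -= n;
    return n;
}

static size_t fake_write(void *ctx, const uint8_t *buf, size_t len)
{
    fake_bus_t *bus = ctx;
    assert(bus->rts == RS485_TRANSMIT);        /* must own the bus before sending */
    size_t room = sizeof bus->tx - bus->tx_len;
    size_t n = len < room ? len : room;
    memcpy(bus->tx + bus->tx_len, buf, n);
    bus->tx_len += n;
    return n;
}

static void fake_wait_tx_done(void *ctx) { (void)ctx; }

static void fake_set_rts(void *ctx, int level)
{
    fake_bus_t *bus = ctx;
    bus->rts = level;
    bus->rts_changes++;
}

/* Bundle the fake callbacks into a port for rs485_echo_once(). */
rs485_port_t fake_port(fake_bus_t *bus)
{
    rs485_port_t port = { fake_read, fake_write, fake_wait_tx_done, fake_set_rts, bus };
    return port;
}
```

The important detail is the wait before RTS is dropped: releasing the bus as soon as the bytes are queued would cut off the tail of the frame. For an auto direction transceiver the set_rts callback can simply do nothing.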

The example has been tested with the following boards:

1. SparkFun Transceiver RS485 Breakout board which hosts a SP3485 transceiver.

2. XY-017 RS485 to TTL Module which is an auto direction transceiver with unmarked ICs.

Saturday 24 March 2018

Machine learning with the i.MX6 SoloX and the Movidius Neural Compute Stick

The i.MX6 SoloX processor is fairly unique in the i.MX6 family, primarily because it co-hosts a single Cortex A9 along with a Cortex M4. The heterogeneous architecture proves very useful for hard real-time processing on the M4 while concurrently running a Linux stack on the A9 (the heterogeneous architecture is also implemented on the i.MX8 line of processors). In previous posts using the UDOO Neo I have covered how these features can be exploited when interfacing different peripheral devices. The processor architecture lends itself nicely to IoT (Internet of Things) edge devices, where sensor capture and data preprocessing/conversion can occur on the device before being forwarded to the cloud, where a richer set of analytic processing can be performed. If we could perform some (or all) of the analytic processing on the edge device then we might dramatically reduce the amount of device data traffic sent to the cloud. Alternatively the edge device could make decisions for itself and not rely completely on the cloud. Furthermore, it opens up the possibility of the edge device partially functioning when the network isn't available. This concept is known as Edge Analytics.

A single Cortex A9 isn't really up to the job of performing intensive analytical processing, especially if we would like to implement a machine learning algorithm. In terms of machine learning techniques, Neural Networks are one branch that has gained considerable popularity in the last few years, primarily because they offer new avenues for the types of analytical processing that can be done, e.g. image recognition or text processing. The Movidius Neural Compute Stick (NCS) is an intriguing concept as it opens up the possibility of deploying deep neural networks on embedded devices. In the video we demonstrate feeding a number of images (loaded from png files) to a caffe GoogLeNet model; for each inference it displays the top matching label and probability score. As a performance enhancement we utilise the PXP engine to perform hardware image resizing and BGRA conversion before feeding a 224 x 224 image to the model for classification. The resized image is also rendered to the screen (using the 2D acceleration). To gain an acceptable level of performance the application was developed in C/C++.
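Picking the top matching label from an inference result is a simple scan for the largest score. The sketch below assumes the result has already been converted to an array of single-precision probabilities, one per label, in the same order as the label list; top1_t and top1_result are hypothetical names and not part of the NCS API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t index;        /* position of the winning class */
    const char *label;   /* its human-readable label */
    float probability;   /* its score */
} top1_t;

/* Return the class with the highest score, together with its label. */
top1_t top1_result(const float *probs, const char *const *labels, size_t count)
{
    assert(probs != NULL && labels != NULL && count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; ++i) {
        if (probs[i] > probs[best]) {
            best = i;
        }
    }
    top1_t result = { best, labels[best], probs[best] };
    return result;
}
```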

So, the first challenge was to see if we could get the NCS running with the UDOO Neo (the i.MX6SX board). My starting point was the Movidius article on deploying on the Raspberry Pi. As mentioned in the article, it's important to highlight that training, conversion or profiling of the Neural Network can't be done on the embedded device, i.e. in "Full SDK mode". This implies that the Neural Network needs to be trained and converted using a standard PC or cloud environment. Deployment to an embedded device is restricted to "API only mode". This first step turned out to be a challenge, mainly because my starting point was Ubuntu 14.04 and not 16. It took a few days to get the correct packages compiled and installed before caffe would compile without errors. The Neural Compute Application Zoo provides a number of sample applications; you can use hello_ncs_cpp or hello_ncs_py to verify the OS can communicate with the NCS. The other gotcha is that the NCS is power hungry and requires a powered USB hub, especially if you have other USB peripherals attached. On the Neo the NCS can be plugged directly into the USB type A socket if you don't have a need for additional peripherals.

The second step was to see if we could deploy a Neural Network graph on the NCS and perform simple inferences. Most of the sample applications in the 'Zoo' are Python based, with some having a further dependency on OpenCV. Unfortunately running OpenCV and Python on the Neo (or in fact on most low power ARM embedded devices) would introduce too much of a performance bottleneck. The two reasons for this are the single A9 core and the fact that the X11 interface doesn't support hardware accelerated graphics. With ARM processors there is a trade-off between power and performance, and for 'always on' IoT devices this does become a major deciding factor. Fortunately caffe provides a C++ interface, although there's little documentation available about the API. Within the 'Zoo' there is a multistick_cpp C++ application which demonstrates communicating with multiple NCS devices.

My starting point was altering multistick_cpp to use one stick with the GoogLeNet model. In multistick_cpp, after the GoogLeNet graph is loaded, subsequent processing can be broken down into two further steps. Firstly it loads a png image file, resizes it and converts it to BGRA. Secondly it sends the image data to the NCS for inference and displays the result. Sample timings for each step running on the Neo are shown below.

1. Load png, resize and convert : approximately 800 milliseconds
2. NCS inference : approximately 130 milliseconds

We can't do much about Step 2 without re-tuning (redesigning) the Neural Network or reducing the image size (possibly leading to less accuracy). Step 1 is slow because the image files are roughly 800x800 pixels and software resizing to 224 x 224 is painfully slow. Fortunately we can address the resizing and conversion time in Step 1: the i.MX6SX contains an image processing unit known as the PXP (Pixel Pipeline) which can rescale and perform colour space conversions on graphic buffers. I re-factored the code in step 1 as below:

1. Use libpng to read the png file
2. Resize and colour space convert the image using PXP
3. 2D blit re-sized image to screen
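For reference, the work handed to the PXP in the resize and conversion step amounts to something like the plain C routine below. It is only a software illustration and not PXP driver code: resize_rgb_to_bgra is a hypothetical name, nearest-neighbour sampling is my simplification, and the source is assumed to be a packed RGB888 buffer as decoded from the png.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nearest-neighbour resize of a packed RGB888 image into a packed
 * BGRA8888 buffer. dst must hold dst_w * dst_h * 4 bytes. */
void resize_rgb_to_bgra(const uint8_t *src, size_t src_w, size_t src_h,
                        uint8_t *dst, size_t dst_w, size_t dst_h)
{
    assert(src != NULL && dst != NULL);
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);

    for (size_t y = 0; y < dst_h; ++y) {
        size_t sy = y * src_h / dst_h;          /* nearest source row */
        for (size_t x = 0; x < dst_w; ++x) {
            size_t sx = x * src_w / dst_w;      /* nearest source column */
            const uint8_t *p = src + (sy * src_w + sx) * 3;
            uint8_t *q = dst + (y * dst_w + x) * 4;
            q[0] = p[2];                        /* blue  */
            q[1] = p[1];                        /* green */
            q[2] = p[0];                        /* red   */
            q[3] = 0xFF;                        /* opaque alpha */
        }
    }
}
```

Every output pixel costs the CPU an address calculation and four byte stores, and a better quality filter costs considerably more, which is why moving this step to dedicated hardware pays off on a single A9 core.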

With the above changes sample timings dramatically improved for step 1 (as shown in the video):

1. Load png, resize and convert : approximately 233 milliseconds
2. NCS inference : approximately 112 milliseconds
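To put the two sets of timings side by side, the small helpers below turn them into a throughput figure. They assume the two steps run strictly one after the other for each image, with no overlap between loading and inference; the function names are hypothetical.

```c
#include <assert.h>

/* Images per second when preparation and inference run back to back. */
double images_per_second(double prepare_ms, double inference_ms)
{
    assert(prepare_ms >= 0.0 && inference_ms >= 0.0);
    assert(prepare_ms + inference_ms > 0.0);
    return 1000.0 / (prepare_ms + inference_ms);
}

/* How many times faster a step became. */
double speedup(double before_ms, double after_ms)
{
    assert(before_ms > 0.0 && after_ms > 0.0);
    return before_ms / after_ms;
}
```

With the figures above, images_per_second(800, 130) is roughly 1.1 and images_per_second(233, 112) is roughly 2.9, while speedup(800, 233) puts step 1 at about 3.4 times faster.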

Hopefully this article provides a useful introduction to deploying the NCS with the i.MX6 or i.MX7 line of processors. Going forward I would like to get a camera working with the Neo and see what FPS rate we can achieve. The other interesting avenue is deploying SSD MobileNet and using the PXP overlay feature to render matches.

I would like to thank motiveorder.com for sponsoring the hardware and development time for this article.

Thursday 8 February 2018

i.mx6sx - IMU Sensor fusion with the UDOO NEO

A key feature of the UDOO NEO is the hosting of a 9-DOF IMU through the inclusion of the NXP FXOS8700CQ and FXAS21002C sensors. The FXOS8700CQ provides a 6-axis accelerometer and magnetometer and the FXAS21002C provides a 3-axis digital angular rate gyroscope. Note that these sensors are only available on the Extended and Full models. In the video we demonstrate 9-DOF sensor fusion running on the Cortex M4. The M4 fusion output is sent from the serial port and fed to a modified OrientationVisualiser application run on a PC. OrientationVisualiser is a Processing application that displays an Arduino-like board in 3D and is part of the NXPMotionSense library; refer to the Teensy example and code.

The combination of the two sensors offers the ability to track absolute orientation with respect to a fixed Earth frame of reference with reasonable accuracy. Orientation can be modelled in 3 dimensions and most descriptions refer to the analogy of an aircraft: yaw represents movement of the nose of the aircraft from side to side, pitch represents the up or down movement of the nose, and roll represents the up and down movement of the wing tips. Refer to this post for a detailed description. For our solution the sensor fusion algorithm implements a Kalman filter. The filter smooths noise from the accelerometer/magnetometer and drift from the gyroscope. We also run magnetic calibration to reduce the effect of hard and soft iron magnetic interference.
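To illustrate how a Kalman filter trades off gyroscope drift against accelerometer/magnetometer noise, here is a deliberately simplified single-angle version. It is not the 9-DOF filter running on the M4: the kalman1d_t type, the scalar state and the noise parameters q and r are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double angle;   /* estimated angle, degrees */
    double p;       /* variance of the estimate */
    double q;       /* process noise added per step (models gyro drift) */
    double r;       /* measurement noise (accelerometer/magnetometer) */
} kalman1d_t;

/* Predict: integrate the gyroscope rate and grow the uncertainty. */
void kalman1d_predict(kalman1d_t *k, double rate_dps, double dt_s)
{
    assert(k != NULL && dt_s >= 0.0);
    k->angle += rate_dps * dt_s;
    k->p += k->q;
}

/* Correct: pull the estimate towards an absolute angle measurement. */
void kalman1d_correct(kalman1d_t *k, double measured_angle)
{
    assert(k != NULL && k->p + k->r > 0.0);
    double gain = k->p / (k->p + k->r);     /* 0 = trust gyro, 1 = trust measurement */
    k->angle += gain * (measured_angle - k->angle);
    k->p *= (1.0 - gain);
}
```

A large r makes the filter lean on the integrated gyroscope rate and smooth out noisy absolute readings, while a large q makes it lean on the absolute readings and so cancel accumulated drift.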

It's important to note that the FXOS8700CQ is mounted on the underside of the pcb, whereas the axes should be aligned to a Cartesian coordinate system that follows the Right Hand Rule (RHR). Therefore the X, Y, Z values read from the FXOS8700CQ should be adjusted accordingly before being applied to the fusion algorithm. Although the FXOS8700CQ has been placed at the edge of the board, it's important to remember that the pcb spacer hole is about 1 centimetre away. To reduce magnetic interference on the magnetometer, a metal pcb spacer or screw shouldn't be used in this hole.
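One way to express such an axis adjustment is a small signed remapping matrix, which also makes it easy to check that the result is still right-handed. The mapping below (a 180 degree rotation about the sensor X axis) is purely hypothetical; the correct signs for the NEO have to be taken from the board layout and the sensor data sheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int32_t x, y, z; } vec3_t;

/* Each row describes how one board axis is built from the sensor axes. */
typedef struct { int8_t m[3][3]; } axis_map_t;

/* Hypothetical mapping: sensor flipped by rotating 180 degrees about X. */
static const axis_map_t FLIP_ABOUT_X = {{
    { 1,  0,  0 },
    { 0, -1,  0 },
    { 0,  0, -1 },
}};

/* Convert a raw sensor reading into the board frame. */
vec3_t axis_map_apply(const axis_map_t *map, vec3_t s)
{
    assert(map != NULL);
    vec3_t b;
    b.x = map->m[0][0] * s.x + map->m[0][1] * s.y + map->m[0][2] * s.z;
    b.y = map->m[1][0] * s.x + map->m[1][1] * s.y + map->m[1][2] * s.z;
    b.z = map->m[2][0] * s.x + map->m[2][1] * s.y + map->m[2][2] * s.z;
    return b;
}

/* +1 means a proper rotation (still right-handed); -1 means the frame
 * was mirrored and no longer follows the Right Hand Rule. */
int axis_map_determinant(const axis_map_t *map)
{
    assert(map != NULL);
    const int8_t (*m)[3] = map->m;
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}
```

Negating a single axis on its own gives a determinant of -1, i.e. a mirrored frame, which is an easy mistake to make when compensating for a sensor on the underside of a board.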

In this example we reallocated I2C4 to the M4 in order to read data from both sensors. In our set-up the FXOS8700CQ output data rate (ODR) was 200Hz while the FXAS21002C was configured for 400Hz. As per previous posts the code was developed using the i.mx6sx FreeRTOS SDK.