Sunday, 22 April 2018

ESP32 - Enabling RS485 half duplex UART support

Although the ESP32 UART claims to support RS485 the documentation provided in the Technical Reference Manual is pretty poor as the register and interrupt descriptions are extremely vague in terms of each feature and its purpose. This is further exacerbated by the fact there are no examples provided in the SDK. The subject has been widely discussed in a espressif forum thread and a pull request was submitted. Unfortunately the main problem with the solution was that spurious break characters were observed in the RX FIFO and needed to be filtered. Another issue is that toggling of RTS pin happens with in the RX interrupt handler, so we can't control at an application level.

Having spent quite a few (or too many) hours debugging the UART behaviour in RS485 mode I think I have an improved implementation for a driver.  Its important to note that the driver only supports half duplex mode, ie only one node on the RS485 bus can transmit at any time. See commits in my fork , the main changes are:

1. The RTS pin is toggled outside of the driver code, ie in the application code therefore data direction and auto direction transceiver are supported.
2. Spurious break characters shouldn't occur in the RX FIFO.
3. Enabling of RS485 interrupts, currently only the collision detection is implemented. Further work is required to correctly handle the framing or parity error interrupts.

The uart_rs485_echo example (under  examples/peripherals/) provides a simple demonstration of its use. The example receives data from the UART and echoes the same data back to the sender. After configuring and enabling the UART we can control the RTS pin using the existing uart_set_rts function to control the data direction pin of the transceiver, note this is optional and can be disabled in the code for auto direction transceivers.

The example has been tested with the following boards :

1. SparkFun Transceiver RS485 Breakout board which hosts a SP3485 transceiver.

2. XY-017 RS485 to TTL Module which is an auto direction transceiver with unmarked ICs.

Saturday, 24 March 2018

Machine learning with the i.MX6 SoloX and the Movidius Neural Compute Stick

The i.MX6 SoloX processor is fairly unique in the i.MX6 family primarily because it co-hosts a single Cortex A9 along with a Cortex M4. The heterogeneous architecture proves very useful for hard real-time processing occurring on the M4 while concurrently running a Linux stack running on the A9 (the heterogeneous architecture is implemented on the i.MX8 line of processors). In previous posts using the UDOO Neo I have covered how these features can be exploited when interfacing different peripheral devices. The processor architecture lends itself nicely to IOT (Internet of Things) Edge devices where sensor capture and data preprocessing/conversion can occur on the device before being forwarded to the cloud where a richer set of analytic processing can be performed. If we could perform some (or all) of the analytic processing on the edge device then we might dramatically reduce the amount of device data traffic sent to the cloud. Alternatively the edge device could make decisions for itself and not completely rely on the cloud, furthermore it opens ups the possibility of the edge device partially functioning when the network isn't available. This concept is know as Edge Analytics.

A single Cortex A9 practically isn't up to the job of performing intensive analytical processing especially if we would like to implement a machine learning algorithm. In terms of machine learning techniques Neural Networks are one branch that has gained considerable popularity in the last few years primarily because it offers new avenues for the types of analytical processing that can be done ie image recognition or text processing. The Movidius Neural Compute Stick (NCS) is an intriguing concept as it opens up the possibility of deploying deep neural networks on embedded devices. In the video we demonstrate feeding a number of images (loaded from png files) to a caffe GoogLeNet model, for each inference it displays the top matching label and probability score. As a performance enhancement we utilise the PXP engine to perform hardware image resizing and BGRA conversion before feeding a 224 x 224 image to the model for classification. The resized image is also rendered to the screen (using the 2D acceleration). To gain an acceptable level of performance the application was developed in C/C++.

So, the first challenge was to see if we could get NCS running with the UDOO Neo (the i.MX6SX board). My starting point was referring to the Movidius article of deploying on the Raspberry PI. As mentioned in the article its important to highlight that training, conversion or profiling of the Neural Network can't be done on the embedded device ie "Full SDK mode". This implies that the Neural Network needed to be trained and converted using a standard PC or cloud environment. Deployment to an embedded device is restricted to "API only mode". This first step turned out to be a challenge mainly because my starting pointing was Ubuntu 14.04 and not 16. It took a few days to get the correct packages compiled and installed before caffe would compile without errors. The Neural Compute Application Zoo  provides a number of sample applications, you can use hello_ncs_cpp or hello_ncs_py to verify the OS can communicate with NCS. The other gotach is that the NCS is power hungry and requires a powered usb hub especially if you have other usb peripherals attached. On the NEO the NCS can be plugged directly into the USB type A socket if you don't have a need for additional peripherals.

The second step was to see if we could deploy a Neural Network graph on the NCS and perform simple inferences. Most of the sample applications in the 'Zoo' are Python based with some having further dependency on OpenCV. Unfortunately running OpenCV and Python on Neo would introduce too much of a bottleneck with regards to performance (or in fact most low power ARM embedded devices). The 2 reasons for this are the single A9 core and the fact that the X11 interface doesn't support hardware accelerated graphics. With ARM processors there is trade off between power and performance and for 'always on' IOT devices this does become a major deciding factor. Fortunately caffe provides a C++ interface although there's little documentation available about the API interface. Within 'Zoo' there is  multistick_cpp C++ application which demonstrates communicating with multiple NCS devices.

My starting point was altering multistick_cpp to use one stick with the GoogLeNet model. In multistick_cpp after the GoogleLeNet graph is load subsequent processing can be broken down into two further steps. Firstly it loads, resizes and converts to BGRA a png image file and secondly it sends the image data to NCS for inference and finally displays the result. Sample timings for each steps running on the Neo are shown below.

1. Load png, resize and convert : approximately 800 milliseconds
2. NCS inference : approximately 130 milliseconds

We can't do much about Step 2 without re-tuning (a redesign) the Neural Network or by reducing the image size (possibly leading to less accuracy). Step 1 is slow because the file images are roughly around 800x800 pixels and software resizing to 224 x 224 is painfully slow. Fortunately we can address the resizing and conversion time in Step 1, the i.MX6SX contains an image processing unit known as PXP (Pixel Pipeline) which can rescale and perform colour space conversions on graphic buffers. I re-factored the code in step 1 as below:

1. Use libpng to read the png file
2. Resize and color space the image using PXP
3. 2D blit re-sized image to screen

With the above changes sample timings dramtically improved for step 1 (as show in the video):

1. Load png, resize and convert : approximately 233 milliseconds
2. NCS inference : approximately 112 milliseconds

Hopefully this article provides useful introduction to deploying the NCS with i.MX6 or i.MX7 line of processors. Going forward I would like to get a camera working with Neo and see what FPS rate we can achieve. The other interesting avenue is deploying SSD MobileNet and using the PXP overlay feature to render matches.

I liked to thank for sponsoring the hardware and development time for this article.

Thursday, 8 February 2018

i.mx6sx - IMU Sensor fusion with the UDOO NEO

A key feature of the UDOO NEO is the hosting of a 9-DOF IMU through the inclusion of the NXP FXOS8700CQ and FXAS21002C sensors. The FXOS8700CQ provides a 6-axis accelerometer and magnetometer and FXAS21002C provides a 3-Axis Digital Angular Rate Gyroscope. Note that these sensors are only available on the Extended and Full models. In the video we demonstrate 9-DOF sensor fusion running on the cortex M4. The M4 fusion output is sent from the serial port and fed to a modified OrientationVisualiser application run on a PC. OrientationVisualiser is Processing application that displays a Arduino like board in 3D is part of the NXPMotionSense library refer to Teensys example and code.

The combination of the two sensors offers the ability to track absolute orientation with respect to a fixed Earth frame of reference with reasonable accuracy. Orientation can be modelled in 3 dimensions and most descriptions refer to the analogy of an aircraft. Where yaw represents movement of the nose of the aircraft from side to side, pitch represents the up or down movement of the nose of the aircraft and finally roll represents the up and down movement of the wing tips of the aircraft. Refer to this post for a detailed description. For our solution the sensor fusion algorithm implements a Kalaman filter. The filter smooths noise from the accelerometer/magnetometer and drift from the gyroscope. We also run magnetic calibration to reduce the effect of hard and soft iron magnetic interference.

Its important to note that the FXOS8700CQ is mounted on the underside of the pcb, however the axes should be aligned to a Cartesian coordinate system that follows the Right Hand Rule (RHR). Therefore its important that X,Y,Z values read from the FXOS8700CQ should to be adjusted accordingly before applying to the fusion algorithm. Although the FXOS8700CQ has been placed at the edge of the board its important to remember that the pcb spacer hole is about a 1 centimetre away. To reduce magnetic interference on the magnetometer a metal pcb spacer or screw shouldn't be used in this hole.

In this example we reallocated I2C4 to the M4 in order read data from both sensors. In our set-up the FXOS8700CQ output data rate (ODR) was 200Hz while the FXAS21002C was configured for 400Hz. As per previous posts the code was developed using the i.mx6sx FreeRTOS SDK.

Thursday, 28 December 2017

i.mx6sx - M4 SD Card access on the UDOO NEO

In this post I demonstrate that it is possible to interface the M4 to a SD card shield (similar to using an Arduino SD card shield) in order to retrieve or save data locally (without relying on the A9). This work is the result of a larger data logger project, where the A9 remains in sleep mode to conserve power while the M4 performs data logging from numerous sensors.

In the video an SD card shield is interfaced to the UDOO NEO and accessed from the M4. The code initialises the SD card, mounts and reads the FAT32 partition. Subsequently we read bitmap files from the FAT32 partition and display the contents to the LCD display (320x240). Code is written using the FreeRTOS bsp and exeutes on the M4 while the A9 boots linux. Each bitmap file is 230400 bytes and when reading 720 byte blocks  throughput is around 230KB/sec. If we increase the block size to 23040 bytes then throughput is around 340KB/sec.

NEO Connectivty

I chose to use a WeMos data logger shield over a Arduino SD card shield mainly for the following reasons:

1. 3.3v compatible
2. An RTC (plus battery backup) is available on the shield, although the accuracy of the DS1307 compared to DS3231 is questionable.
3. Nice stackable design for inclusion of additional WeMos shields

The shield (just the Arduino equivalent) supports an SPI interface. The Phyical Layer of the SD Card Specification mentions that the primary hardware interface is the SD bus which is implemented through 4 data lines and one command line. On power up the native operating mode of the card is SD bus however it possible to switch the card to SPI bus which is considered a secondary operating mode. The main disadvantages of SPI mode versus SD mode are:

1. The loss of performance single data line versus 4 data lines .
2. Only a subset of the SD commands and functions are supported.
3. The maximum SPI clock speed is limited to 25Mhz regardless of the SD card Class.

Minimum connectivity from a host MCU for SPI mode requires 3 SPI pins plus a GPIO pin for CS.

From the NEO the shield can be connected to ECSPI 2 plus a arbitrary GPIO pin and 3.3v, see above (NEO Connectivity) image for wiring.

Power up and initialisation of the card along with commands and responses are well documented in the Phyical Layer of the SD Card Specification. After powering up the card it should be initialised by applying 74 clock cycles (eg. sending 10 bytes with 0xff as the payload). Followed by CMD0 as the first command to send the card to SPI mode, a positive R1 response will contain 0x01. Next step is to interrogate SD version support by sending CMD8 and lastly we can use ACMD41 to set or determine :

1. Card is initialised
2. Card capacity type (SDHC or SDXC)
3. Switch to 1.8V signal voltage

After initialising the card we can interrogate the card data for example:

1. Read the Card Identification (CID) Register, a 16 byte code that contains information that uniquely identifies the SD card, including the card serial number (PSN), manufacturer ID number (MID) and manufacture date (MDT).

2. Read the Card Specific Data (CSD) Register which defines the data format, error correction type, maximum data access time .. etc

Subsequently I built a simple library to read FAT32 partitions and their file contents as shown in the above screen shot.

In order to improve performance we would need to see if we can enable DMA for the SPI transfers however this represents a challenge as the DMA engine is initialised on the A9 therefore we would need to wait for Linux to boot before accessing the DMA engine.

Monday, 3 July 2017

i.mx6sx - Tiny NEO Scope

This project span from two requirements, firstly testing the performance of the ADC (analogue to digital) and ECSPI interfaces when running against the M4. Secondly as a diagnostic aid to verify low bandwidth clock signs (<100Khz) from the PWM or clock/GPIO pins were correctly mux'd from the A9 side. The end result is a simple low bandwidth (<100Khz) oscilloscope. In the above video we first demonstrate the capture of the 50Khz PWM (50% duty cycle) which is generated from the A9 side, see how scope manages to display the signal and correctly calculate the frequency. Secondly we feed the scope a PWM ranging from 3Khz to 95Khz to demonstrate its ability to track and best calculate the input frequency of the signal.

The imx6sx ADC is rated to produce up to 1 Million samples per seconds, however the Reference Manual isn't particularly clear on settings to get this level of performance. It's clear that the fastest conversion are only possible in 8 bit mode (lowest precision) using the IPG clock. From our testing we roughly achieved 500 thousand samples per second by applying no hardware averaging and no clock divider ratio. By disabling hardware averaging we trade off precision for speed in 8 bit mode.

The scope simply consist of reading an ADC pin (in our case A3), buffering 128 samples, outputting samples to the OLED display through the ECSPI interface clocked at 8Mhz. A primitive trigger mechanism is implemented to catch start of a rising signal. This is complemented with high precision EPIT timer to calculate the frequency of input signal for display.

Saturday, 10 June 2017

i.mx6sx - SPI interfacing an OLED display for fast updates using the cortex M4 on the UDOO Neo

It has taken me a considerable amount of time to prove that the SPI interface can be made to work with cortex M4 on the UDOO Neo. To demonstrate the speed of SPI I chose to interface with a SSD1306 OLED which can be driven at a clock speed of 8Mhz. Which theoretically should allow a complete RAM buffer to transmitted to the SSD1306 in a very short time period hence offering fast screen updates. To control the speed of screen updates I hooked up an old potentiometer to the ADC input. By varying the potentiometer we control the rate of updates from slow to fast as shown in the video.

The SPI interface is configured for Master mode using interrupts for data transmission resulting in acceptable performance. To reduce interrupt latency further, DMA could be used to transmit the whole RAM buffer in one go.
I chose to use ECSPI 2 (normally allocated to the A9 side) and not ECSPI 5 because after reviewing the schematics I think there is a hardware bug with ECSPI 5 as the ECSPI5_SCLK line is shared with Red on board LED (see SPI3_CLK on J6 connector).

This is a C application developed using i.mx6sx FreeRTOS SDK. The graphics rendering code was converted from the Haricord example by hwiguna.

Friday, 19 May 2017

i.mx6sx - One Wire Digital Temperature gauge using DS18B20 + UDOO Neo + LCD

The challenge here was to implement the Dallas one wire protocol so that two DS18B20 sensors could be wired to the cortex M4 to form the basis for a temperature gauge. I recycled the LCD display from the previous post to display temperature readings using 'ring meter' widgets. The end result is that the M4 is used for reading DS18D20 sensors and updating the 'ring meter' widgets. In the video we have one sensor in a 3 pin TO-92 package reading room temperature and the other as a waterproof probe dipped into hot/cold water.

The DS18D20 one wire protocol simply requires a gpio pin that can be toggled between input and output modes albeit using precise timings to the microsecond (between 1 and 480 microseconds). The protocol and timings are described in the DS18B20 datasheet. To achieve the necessary level of time precision we utilised the imx6 Enhanced Periodic Interrupt Timer (EPIT) timers sourced from a 24Mhz clock. Since there are two DS18B20 sensors on the one wire bus we first query for the address of the sensors and then in turn poll for the temperature from each which takes approximately 480 microseconds.