tag:blogger.com,1999:blog-22071909302363493952024-03-23T10:47:47.764-07:00Tiny DevicesEmbedded Software Development Unknownnoreply@blogger.comBlogger68125tag:blogger.com,1999:blog-2207190930236349395.post-70899274255716105652024-02-08T07:46:00.000-08:002024-02-09T01:58:26.864-08:00RK3588 - Reverse engineering the RKNN (Rockchip Neural Processing Unit) <p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9-5DrGuxC7KeAnZL-VXfTK2cYoatY9N8xb104tL4Arps2Q7wK-FSGUrbSphHs24DgHQRvZ6F84-WMu8JHrDV6rcCkTO_VzPas9uwKEiTjKuc9LaxwOV2mqc6C4VxeuEyg9IHZaSirl6Y_CMzv2ZXIhDYq3G52hd85w2rAlcIYppwVGWg9ZfR08R2WY5Fd/s938/Screenshot%20from%202024-02-08%2015-28-39.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="701" data-original-width="938" height="239" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9-5DrGuxC7KeAnZL-VXfTK2cYoatY9N8xb104tL4Arps2Q7wK-FSGUrbSphHs24DgHQRvZ6F84-WMu8JHrDV6rcCkTO_VzPas9uwKEiTjKuc9LaxwOV2mqc6C4VxeuEyg9IHZaSirl6Y_CMzv2ZXIhDYq3G52hd85w2rAlcIYppwVGWg9ZfR08R2WY5Fd/s320/Screenshot%20from%202024-02-08%2015-28-39.png" width="320" /></a></div><br /><p></p><p>The internal operations and capabilities of the RK3588 NPUs are mainly concealed within a closed-source SDK known as <a href="https://github.com/airockchip/rknn-toolkit2/tree/master/rknpu2">RKNPU2</a>. Given the huge interest in Large Language Models (LLMs) and the quest for optimal matrix multiplications for transformer models, I was curious to understand the implementation of the matrix multiplication API (rknn_matmul_run) newly introduced to the SDK. A thorough examination of the RKNN section in the TRM (Technical Reference Manual) reveals no native mechanism for matrix multiplication, especially for vectors. <br /></p><p>To grasp what's going on, the initial step was to understand how the NPU functioned. While the TRM furnished a detailed list of registers and a brief overview of the core units constituting the NPU, it notably lacked essential information on programming the registers for
executing operations. For example, there were no specifics about deriving
or calculating register values based on factors such as data formats
(e.g., int8 vs. float16) or the size of input data or weights. Furthermore, there was no information on how to construct a pipeline for the NPU to execute. Fortunately, I had a slight advantage from a previous reverse engineering attempt on the <a href="http://jas-hacks.blogspot.com/2021/04/reverse-engineering-v831-npu-neural.html">V831 NPU</a>. Nevertheless, even armed with this knowledge, it still required several months of trial and error, extensive analysis of data streams, a few dead ends, and numerous reverse engineering attempts. Finally, I managed to understand how to activate the NPU and get it to execute simple operations.</p><p>The RK3588 NPU seems to be a distant cousin of the<i> NVDLA </i>architecture, in that some of the terminology is similar and the core units have similar functions and pipelines to NVDLA, although they have been named differently. One of the primary differences is that we can give the NPU a list of tasks (RKNN terminology) to execute and then wait for completion. For example, if I have a simple neural network consisting of 3 layers, each consisting of convolution + bias, then it is possible to feed 3 tasks (each performing convolution + bias) to the NPU along with the necessary input, weight and bias values. Subsequently we just wait for the NPU to notify us when it's complete.<br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih2ztFLEpM4scmDzuwebNC1__UFR23fU78FhdDz2PiTtYE2FFnE2CpCg-kys8O6OZD8_Zar3-jwScLDvFKiSQkHgjvrVMs9aaIFEN6I3oVUzefLp4h9OHvhohx4uM8LqcRHlFuDeasXKTUSAVnp4fCacAVXLKaRFgJ0d_-YDzNUhlH6q7HX7f7nOSJw4yr/s702/rknn_npu.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="650" data-original-width="702" height="296" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih2ztFLEpM4scmDzuwebNC1__UFR23fU78FhdDz2PiTtYE2FFnE2CpCg-kys8O6OZD8_Zar3-jwScLDvFKiSQkHgjvrVMs9aaIFEN6I3oVUzefLp4h9OHvhohx4uM8LqcRHlFuDeasXKTUSAVnp4fCacAVXLKaRFgJ0d_-YDzNUhlH6q7HX7f7nOSJw4yr/s320/rknn_npu.png" width="320" /></a></div><br /><p></p>The image presented above is extracted from the TRM and has been altered, because the description provided in the TRM doesn't entirely align with their diagram nor, more crucially, with the register naming convention. Here is my interpretation; each NPU comprises three distinct units: <ul><li>CNA - Convolution Network Accelerator (including the CORE rectangle). In the TRM it is referred to as the Neural Network Accelerating Engine; the CNA itself isn't described.<br /></li><li>DPU - Data Processing Unit</li><li>PPU - Planar Processing Unit</li></ul><p></p><p>Based on the above, the NPU is primarily designed for running conventional Convolutional Neural Networks (CNNs). This is attributed to the CNA core feature, which revolves around executing convolutions by inputting image or feature data along with the corresponding weights. The emphasis on CNNs is further evidenced by the majority of RKNPU2 samples provided, such as YOLOX, Mobilenet, and ResNet. The CNA output can be directed to the DPU, where element-wise operations such as addition, multiplication, and RELU can be carried out. Subsequently, the DPU's output can be channeled to the PPU, where operations like min, max, and average pooling are executed. 
Additionally, there is the option to directly feed data to the DPU or PPU without necessitating a convolution step.<br /></p><p>To execute convolutions efficiently, the CNA employs multiply-accumulate (MAC) operations. The performance of a CNA is partially determined by the number of MAC units used. According to the TRM, for a single NPU core the count of MAC operations depends on the input data type:</p><ul><li>1024 int8 MAC operations per cycle</li><li>512 float16 MAC operations per cycle</li></ul><p>Each MAC cell caches 1x1x16 weight bytes: for int8 that's 16 values, whilst for float16 it reduces to 8. We require 2 MAC cells to perform a float16 operation, hence the reduction in operations per cycle. Internally, feature and weight data must conform to Rockchip's NC1HWC2 format, where C2 is the aforementioned value. One 1x1x16 cube of feature data is then shared by all MAC cells to calculate partial sums, which are then sent to the accumulator. At a higher level the CNA appears to execute a block operation, as observed in my tests where, for instance, the MAC caches 32 channels of weight data for fp16. Hence the requirement to lay out weights in kernel groups, each with 32 channels.<br /></p><p>Performance is also affected by the access time to input and weight data, so the CNA incorporates a second-level cache known as the convolution buffer (cbuf). In the above diagram the 384KB onboard memory is partly for that purpose. Importantly, the number of MAC units plus the cbuf size influence how large a convolution can be completed in one task.</p><p>Some of you may have already deduced that the matrix multiplication API is essentially executed through a 2D convolution. For instance, let's consider matrix A as [M x K] and matrix B as [K x N]. Matrix A represents the feature data arranged in an Mx1xK (hwc) format, while matrix B denotes the weight data organized in a 1x1xNxK (hwck) format. Consequently, the resulting matrix C [M x N] is arranged as Mx1xN.</p>
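<p>To make that mapping concrete, here is a plain CPU reference of the computation a single matmul task reproduces, with A laid out as Mx1xK feature data (hwc) and B as N kernels of 1x1xK weights (hwck). This is purely illustrative, not the NPU code path:</p><pre>/* Illustrative CPU reference for the matmul-as-convolution mapping.
 * A: M x K feature data laid out as Mx1xK (hwc)
 * B: K x N weights laid out as 1x1xNxK (hwck), i.e. N kernels of 1x1xK
 * C: M x N output laid out as Mx1xN
 */
void matmul_reference(const float *A, const float *B, float *C,
                      int M, int K, int N)
{
    for (int m = 0; m < M; m++)          /* one "pixel" of feature data */
        for (int n = 0; n < N; n++) {    /* one 1x1xK kernel */
            float acc = 0.0f;
            for (int k = 0; k < K; k++)  /* channel-wise MAC */
                acc += A[m * K + k] * B[n * K + k];
            C[m * N + n] = acc;
        }
}</pre>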
<p>I'm at the point where I have a simple test running which asks the NPU to perform a matrix multiplication. I'm using matrix data derived from a GGML testcase (<a href="http://test-mul-mat.cpp">test-mul-mat.cpp</a>) to verify the output is correct. To run the test, check out my <a href="https://github.com/mtx512/rk3588-npu/tree/main">repo</a> and build; sadly, I'm still testing against a 5.10 kernel on a Rock-5b. If the test runs, the output should be as below and in the screenshot above.<br /></p><p><span style="font-family: courier;"><span style="font-size: x-small;">rock@rock-5b:~/rk3588-npu/build$ ./matmul_4_36_16<br />drm name is rknpu - 20220829 - RKNPU driver<br />input dma is ffff8000, output dma is ffffa000, weights dma is ffff9000<br />Size of npu_regs 112<br />RKNPU_SUBMIT returned 0<br />=========================================================================================================<br /> 1224.0 1023.0 1158.0 1259.0 1359.0 1194.0 1535.0 1247.0 1185.0 1029.0 889.0 1182.0 955.0 1179.0 1147.0<br /> 1216.0 1087.0 1239.0 1361.0 1392.0 1260.0 1247.0 1563.0 1167.0 1052.0 942.0 1214.0 1045.0 1134.0 1264.0<br /> 1125.0 966.0 1079.0 1333.0 1287.0 1101.0 1185.0 1167.0 1368.0 990.0 967.0 1121.0 971.0 1086.0 1130.0<br /> 999.0 902.0 1020.0 1056.0 1076.0 929.0 1029.0 1052.0 990.0 1108.0 823.0 989.0 759.0 1041.0 1003.0<br />=========================================================================================================</span></span></p><p>Regarding reverse engineering, I've reached a stage where I understand the majority of register settings that impact convolution when dealing with feature data as input. The primary uncertainty lies in determining the bank sizes for feature/weight data; however, I'm hopeful that this can be deduced. After dedicating a significant amount of time to analyzing the NPU, here is a list of key areas that you should be aware of: <br /></p>1. All data pointers within the NPU (e.g., input, weights, outputs, task lists) are 32-bit and must reference physical memory. Consequently, this restricts the memory range to 4GB, making it impractical to leverage a board with 16/32GB memory for the NPU to use. Moreover, it potentially imposes limitations on the types of models that can be executed on the NPU. <br /><p>2. The claim of 6 TOPS should be approached with caution. While each NPU core is rated at 2 TOPS, there are registers that could potentially enable convolution across all 3 cores. However, after analyzing the data streams generated by the SDK, it appears that this feature is never utilized. Additionally, there doesn't seem to be a similar capability available for the DPU/PPU units, which would restrict its usability. In my view, the most effective approach is to treat them as individual cores and execute models on each one, taking into account the memory constraints mentioned earlier.</p><p>3. The SDK matrix multiplication API, in certain aspects, represents an inefficient utilization of the NPU. There is the overhead of memory allocation, a kernel call, and instructing the NPU to execute a single convolution. Ideally, the NPU should be tasked with executing multiple operations and provided with all the data for those operations up front. Typically this is how the NPU is utilized when running a CNN model (i.e. YOLOvX). The caveat here is that the converted model is limited to containing layers whose operations are supported by the NPU.<br /></p><p>4. Initial benchmarking for the multiplication of two fp16 [512 x 512] matrices suggests that I could achieve completion in a respectable time of around 1ms. Please note, this involves sending 2 tasks to the NPU, as mentioned earlier, due to the cbuf limitation. Unfortunately, this is only part of the story when it comes to using vector data as input. The costly operations involve converting the matrices to feature and weight data formats, and vice versa for the output, if done at runtime.</p>
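<p>To give a feel for what such a conversion involves, here is a naive (deliberately unoptimized) sketch that packs a row-major fp16 matrix into the Mx1xK feature layout with C2 = 8; the layout details here are my interpretation from reverse engineering, not an SDK definition:</p><pre>/* Naive pack of a row-major [M x K] fp16 matrix into NC1HWC2 feature
 * layout (N=1, H=M, W=1, C=K), assuming C2 = 8 for fp16. Columns beyond
 * K within the last channel group are zero-padded.
 */
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

typedef uint16_t fp16; /* raw IEEE fp16 bits */

void pack_nc1hwc2(const fp16 *src, fp16 *dst, int M, int K)
{
    const int C2 = 8;             /* values per atom for fp16 */
    int C1 = (K + C2 - 1) / C2;   /* number of channel groups */

    memset(dst, 0, (size_t)C1 * M * C2 * sizeof(fp16));
    for (int c1 = 0; c1 < C1; c1++)        /* channel group */
        for (int h = 0; h < M; h++)        /* matrix row */
            for (int c2 = 0; c2 < C2; c2++) {
                int k = c1 * C2 + c2;      /* source column */
                if (k < K)
                    dst[(c1 * M + h) * C2 + c2] = src[h * K + k];
            }
}</pre>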
<p>I made an effort to create a highly optimized routine for the vector to feature data conversion. According to my benchmarks, this process takes approximately 2ms for fp16 [512 x 512] matrices. I would estimate 12-15ms to perform all the conversions for the matrices mentioned above. Ideally, the matrix for the weight data should be converted ahead of time to reduce conversion overhead and, if possible, persisted for reuse.</p><p>5. I was hoping there was the capability to use a programmable core to perform custom operations. Unfortunately this isn't the case, and you're left with using OpenCL as the alternative. This brings its own challenges if you need to shuffle data between OpenCL and the NPU.</p><p>There is still more to discover about the other units (DPU/PPU) and I'll spend time doing that. Lastly, TRM v1.0 contains numerous gaps and inconsistencies for RKNN; if anyone has a later version it would be greatly appreciated.<br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2207190930236349395.post-78272874016733531402023-06-07T10:27:00.001-07:002023-06-07T10:29:14.152-07:00RK3588 - RKNN Object detection on multiple video streams<p></p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/ZWeFYyx0pio" width="320" youtube-src-id="ZWeFYyx0pio"></iframe></div><br /> <p></p><p>Having previously reverse engineered the <a href="http://jas-hacks.blogspot.com/2021/04/reverse-engineering-v831-npu-neural.html">V831 NPU</a>, let's now examine the RK3588 NPU. While the RK3588 RKNN advertises 6 TOPs@int8, it is not entirely clear what this figure represents, since the RKNN processing unit comprises a tri-core NPU. Referring to the Technical Reference Manual (TRM), we can gather further information:</p><p><span style="font-family: arial;">1024x3 integer 8 MAC operations per cycle</span><br /></p><p>The RKNN clock is 1GHz, therefore, based on the standard TOPS formula:</p><p>TOPS = MACs * Frequency * 2</p><p> = (1024x3) * 1GHz * 2 = 6.144 TOPS<br /></p><p>If all three cores (1024x3) are utilized, the total computational power reaches 6 TOPS. The RKNN framework offers various workload configurations, including tri-core, dual-core, and single-core. 
However, upon reviewing the RKNN documentation, it appears that out of the 43 operators, only around 10 support tri-core or dual-core execution (as of v1.5.0 of the RKNPU SDK):</p><p><span style="font-family: courier;"><span style="font-size: x-small;">Conv, DepthwiseConvolution, Add, Concat, Relu, Clip, Relu6, ThresholdedRelu, Prelu, LeakyRelu</span></span></p><p>Deploying a single RKNN model in tri-core mode allows for achieving a maximum computational power of 6 TOPS, but this relies on encountering operators that support tri-core execution or having the model compiler identify parallelizable operations. Consequently, the utilization of the full 6 TOPS may be limited to specific scenarios. Given this constraint, an alternative approach could be running three instances of the model, with each instance allocated to a core. Although this approach increases memory usage, it may provide improved efficiency. For instance, when running rknn_benchmark against yolov5s-640-640.rknn for 1000 iterations with a core mask of 7 (tri-core), the results observed are (v1.5.0 SDK): <br /></p>Avg Time 9.86ms, Avg FPS = 101.416<p>Running 3 separate instances of rknn_benchmark for the same model with core masks 1, 2 & 4 (single core), the average per instance is:</p><p>Avg Time 18.84ms, Avg FPS = 53.084</p><p>The initial benchmark results suggest a potential improvement with this approach, as running three object detection streams in parallel could yield better overall performance. Furthermore, this opens up the possibility of multi-stream object detection. However, it is crucial to acknowledge that the frames per second (fps) figures reported by the benchmark are quite optimistic, primarily because the test input is a static pre-cropped RGB (640x640) image and the outputs are not sorted based on confidence levels. Hence, in a real-world deployment, additional pre- and post-processing steps would be necessary and affect the overall processing time.<br /></p><p>In order to assess the feasibility of the aforementioned approach, I developed a C++ application that performs several tasks concurrently. This application includes the decoding of an H264 stream, resizing and converting each frame to RGB (640x640), and running the yolov5 model on each frame for object detection, whilst simultaneously rendering the video. It's worth noting that video playback occurs independently of rendering the rectangles generated by yolov5 through an overlay. The primary challenge encountered during development was optimizing the frequency of frame conversions and resizing for both inference and rendering. This optimization was crucial to ensure that the output rectangles from yolov5 remained synchronized with the corresponding video frame intended for rendering; otherwise, fast moving objects in the video stream are noticeably out of sync with the detected rectangle for that frame. The main argument passed to the application is the core mask, which allows the selection of which NPU core(s) to utilize for the processing tasks. </p><p>As shown in the showcase video above, by running three instances of the application with each assigned a single NPU core, we were able to achieve sufficient performance to keep up (well, almost, in the case of the 60fps stream) with the video playback rate.</p>
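<p>For reference, pinning an instance to a single core goes through the SDK's core-mask API; a minimal sketch (based on the v1.5.0 rknn_api.h - check your SDK headers) is below. How the model data is loaded is assumed to happen elsewhere.</p><pre>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include "rknn_api.h"

/* Sketch: initialise a model and pin this context to NPU core 0. */
int init_pinned(rknn_context *ctx, void *model_data, uint32_t model_size)
{
    int ret = rknn_init(ctx, model_data, model_size, 0, NULL);
    if (ret < 0)
        return ret;

    /* RKNN_NPU_CORE_0 / _1 / _2 map to core masks 1, 2 and 4;
     * RKNN_NPU_CORE_0_1_2 (mask 7) requests tri-core execution. */
    ret = rknn_set_core_mask(*ctx, RKNN_NPU_CORE_0);
    if (ret < 0)
        fprintf(stderr, "rknn_set_core_mask failed: %d\n", ret);
    return ret;
}</pre>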
<p>The application was tested on the following boards running under Weston:<br /></p><ul style="text-align: left;"><li>Mekotronics <a href="https://www.mekotronics.com/h-pd-76.html">R58 Mini HDD</a></li><li>Radxa <a href="https://radxa.com/product/detailed?productName=rock5b">Rock 5-b</a> </li></ul>The test videos, sourced from the <a href="http://www.kaggle.com">kaggle</a> site, are either 1080p at 60 or 30 frames per second (fps). To fit all the videos on the same display (1080p resolution), they are not resized back to their original format. The detected objects are color-coded as follows:<ul style="text-align: left;"><li>Red: person</li><li>Green: car, truck, bus, bicycle</li><li>Blue: anything else <br /></li></ul><p>Benchmarks from concurrently running 3 instances show an average per instance of:</p><p>Avg Time = 25.20ms Avg FPS = 38.49</p><p></p><p>Compared to a single instance running with the NPU in tri-core mode:</p><p>Avg Time = 15.92ms Avg FPS = 61.42 <br /></p><p>Based on my testing, it is possible to run object detection on 3 video streams at 1080p@30, assuming the inference time of your model on a single NPU core is less than 25ms. This work was done as part of a suite of video applications that I'm developing for the RK3588.<br /></p><p>CPU usage while running the 3 instances:<br /></p><p><span style="font-family: courier; font-size: x-small;">Tasks: 236 total, 2 running, 234 sleeping, 0 stopped, 0 zombie<br />%Cpu(s): 10.6 us, 3.3 sy, 0.0 ni, 84.5 id, 1.3 wa, 0.0 hi, 0.3 si, 0.0 st<br />MiB Mem : 7691.7 total, 6531.9 free, 558.4 used, 601.4 buff/cache<br />MiB Swap: 0.0 total, 0.0 free, 0.0 used. 6942.5 avail Mem <br /><br /> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND <br /> 1422 rock 1 -19 1153180 128604 89000 S 32.1 1.6 0:18.35 subsurf+<br /> 1439 rock 1 -19 1158676 128880 89480 S 31.8 1.6 0:12.97 subsurf+<br /> 1404 rock 1 -19 1161504 132664 89456 S 28.1 1.7 0:24.37 subsurf+<br /> 1000 rock 20 0 705784 99024 76756 S 21.2 1.3 0:56.09 weston <br /> 363 root 20 0 94212 48112 47004 R 4.6 0.6 0:24.04 systemd+<br /> 212 root -51 0 0 0 0 S 4.3 0.0 0:10.57 irq/34-+<br /> 927 rock 20 0 16096 4608 3416 S 0.7 0.1 0:00.60 sshd <br /> 1100 root 20 0 0 0 0 I 0.7 0.0 0:01.05 kworker+<br /> 1395 root 0 -20 0 0 0 I 0.7 0.0 0:01.23 kworker+<br /> 1402 root 0 -20 0 0 0 I 0.7 0.0 0:00.76 kworker+<br /> 139 root 20 0 0 0 0 S 0.3 0.0 0:01.07 queue_w+<br /> 371 root 0 -20 0 0 0 I 0.3 0.0 0:00.16 kworker+<br /> 910 rock 20 0 16096 4600 3408 S 0.3 0.1 0:00.89 sshd <br /> 1329 root 0 -20 0 0 0 I 0.3 0.0 0:00.43 kworker+<br /> 1330 root 20 0 0 0 0 I 0.3 0.0 0:00.76 kworker+<br /> 1403 root 20 0 7124 3128 2364 R 0.3 0.0 0:00.60 top <br /> 1421 root 20 0 0 0 0 I 0.3 0.0 0:00.46 kworker+<br /></span><span style="font-family: courier;"> </span><br /></p><p><br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2207190930236349395.post-26627140577441544182023-04-30T12:01:00.004-07:002023-05-01T03:47:37.143-07:00RK3588 - Adventures with an external GPU through PCIE Gen3 x4 (Radxa Rock-5b)<p>One of the interesting features of the RK3588 is the PCIe controller, because of its support for a Gen3 x4 link. I'd started looking into using the controller for a forthcoming project, and subsequently this led me to the idea of testing the controller against an external GPU card to gain an understanding of its limitations and potential. 
From what I understand, Jeff Geerling has been on a similar journey with the RPi CM4 and has had limited <a href="https://www.jeffgeerling.com/blog/2022/external-graphics-cards-work-on-raspberry-pi">success</a> with help from numerous developers. Furthermore, there was a Radxa <a href="https://twitter.com/theradxa/status/1592374915231289344?cxt=HHwWgMDUuY6joJksAAAA">tweet</a> which gave a teasing glimpse of a working GPU. So let's see what is or isn't possible using a Rock-5b.</p><p> </p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/ZqTd7LbIbMM" width="320" youtube-src-id="ZqTd7LbIbMM"></iframe></div><p> </p><p>I'd managed to get hold of a Radeon R7 520 (<a href="https://www.techpowerup.com/gpu-specs/xfx-r7-250-low-profile.b2510">XFX</a> R7 250 low-profile) card, along with an M.2 Key M extender cable to PCIe x16 <a href="https://www.aliexpress.com/item/4000202776933.html?spm=a2g0o.store_pc_groupList.0.0.9e5f1d75rqTIoI&pdp_ext_f=%7B%22sku_id%22:%2210000000772646313%22,%22ship_from%22:%22%22%7D&gps-id=pcStoreJustForYou&scm=1007.23125.137358.0&scm_id=1007.23125.137358.0&scm-url=1007.23125.137358.0&pvid=0436eac5-5b3f-4f16-87cf-63c9cd0b053d">Graphics Card Riser Adapter</a>. To power the card I reused an old <a href="https://www.aliexpress.com/item/32428858468.html">LR1007</a> 120W 12VDC ATX board which was to hand. With the setup as shown, we reuse the NVMe slot for the M.2 adapter and revert to an SD card for booting the OS. I used the Radxa Debian image with a custom-compiled Radxa kernel to include the graphics card drivers and fixes. Having reviewed the PCIe BAR definitions in the rk3588.dtsi, there should be enough address space available for the card to use. 
After removing the HDMI and Mali drivers from the kernel config, I initially tried the amdgpu driver, but it reported an error and gave no display output:<br /></p><p></p><blockquote><p><span style="font-size: x-small;">[ 11.844163] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)<br />[ 11.844378] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v6_0> failed -110<br />[ 11.844383] amdgpu 0000:01:00.0: amdgpu: amdgpu_device_ip_init failed<br />[ 11.844388] amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init<br />[ 11.844414] amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.<br />[ 11.846559] [drm] amdgpu: ttm finalized<br />[ 11.848018] amdgpu: probe of 0000:01:00.0 failed with error -110<br /></span></p><p></p></blockquote><p>The radeon driver fared slightly better, with a similar error but at least display output for a console login:<br /></p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.blogger.com/video.g?token=AD6v5dyEbid05664YMTdvqAn--bZg47CavBoSIoGUv-1NEVNqOkmLxG-nx4E246IIb5gCSRWvGvNjgfhHoykWh_sRA' class='b-hbp-video b-uploaded' frameborder='0'></iframe></div><br /><p></p><blockquote><p><span style="font-size: x-small;">[ 12.059398] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)<br />[ 12.059408] radeon 0000:01:00.0: disabling GPU acceleration<br /></span></p><p></p></blockquote><p>This was puzzling, as the card relies on PCIe memory-mapped I/O, which the RK3588 should see as standard memory and be able to read/write. It turns out Peter Geis, while attempting to mainline a PCIe driver for the RK3566, raised 2 issues per this <a href="https://lore.kernel.org/lkml/2791594e-db60-e1d0-88e5-7e5bbd98ae4d@rock-chips.com/T/#m99c47b07bfbf4818542d0d545f41a52e2f5275ae">thread</a>, to which Rockchip replied. The same issues weren't improved/fixed on the RK3588, as mentioned <a href="https://lore.kernel.org/lkml/20230227151847.207922-1-lucas.tanure@collabora.com/T/#m39a31643562646aa54a463fc211aac156492b30b">here</a>. In simple terms, for our requirements:<br /></p><p>1. For PCIe DMA transfers, memory allocations are limited to 32 bits, so a 4GB board might not see an issue, while on an 8GB board like mine the kernel could pick an address range above 4GB (see the sketch below).<br /></p><p>2. AMD cards rely on PCIe snooping; there is no CPU snooping on the RK3588 interconnect. 
So any cached copies of the same device memory block won't get updated to remain in sync.<br /></p><p>If we hack the Radeon driver to work around these issues, we get:</p><p><span style="font-size: xx-small;"></span></p><blockquote><p><span style="font-size: xx-small;">[ 12.529087] [drm] ring test on 0 succeeded in 1 usecs<br />[ 12.529094] [drm] ring test on 1 succeeded in 1 usecs<br />[ 12.529102] [drm] ring test on 2 succeeded in 1 usecs<br />[ 12.529121] [drm] ring test on 3 succeeded in 8 usecs<br />[ 12.529132] [drm] ring test on 4 succeeded in 3 usecs<br />[ 12.706419] [drm] ring test on 5 succeeded in 2 usecs<br />[ 12.706427] [drm] UVD initialized successfully.<br />[ 12.816582] [drm] ring test on 6 succeeded in 18 usecs<br />[ 12.816625] [drm] ring test on 7 succeeded in 5 usecs<br />[ 12.816627] [drm] VCE initialized successfully.<br />[ 12.816879] [drm:si_irq_set [radeon]] si_irq_set: sw int gfx<br />[ 12.816921] [drm] ib test on ring 0 succeeded in 0 usecs<br />[ 12.816989] [drm:si_irq_set [radeon]] si_irq_set: sw int cp1<br />[ 12.817028] [drm] ib test on ring 1 succeeded in 0 usecs<br />[ 12.817088] [drm:si_irq_set [radeon]] si_irq_set: sw int cp2<br />[ 12.817127] [drm] ib test on ring 2 succeeded in 0 usecs<br />[ 12.817185] [drm:si_irq_set [radeon]] si_irq_set: sw int dma<br />[ 12.817224] [drm] ib test on ring 3 succeeded in 0 usecs<br />[ 12.817281] [drm:si_irq_set [radeon]] si_irq_set: sw int dma1<br />[ 12.817319] [drm] ib test on ring 4 succeeded in 0 usecs<br />[ 13.477677] [drm] ib test on ring 5 succeeded<br />[ 13.984454] [drm] ib test on ring 6 succeeded<br />[ 14.491404] [drm] ib test on ring 7 succeeded<br />...</span></p><p><span style="font-size: xx-small;">[ 14.549296] [drm] Initialized radeon 2.50.0 20080528 for 0000:01:00.0 on minor 1<br /></span></p></blockquote>
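<p>For the first issue, the workaround amounts to capping the DMA addresses the driver will hand to the card; a hedged sketch of the idea is below (the actual hack lives inside the radeon driver's allocation paths rather than a helper like this):</p><pre>#include &lt;linux/dma-mapping.h&gt;
#include &lt;linux/pci.h&gt;

/* Sketch: force a PCIe device's streaming and coherent DMA masks down
 * to 32 bits so that buffers handed to the card always sit below 4GB.
 */
static int limit_dma_to_32bit(struct pci_dev *pdev)
{
    return dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
}</pre>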
<p>So potentially we have graphics acceleration ... let's try kmstest:<br /></p><p><span style="font-size: x-small;"></span></p><blockquote><p><span style="font-size: x-small;">rock@rock-5b:~$ kmstest<br />trying to open device 'i915'...failed<br />trying to open device 'amdgpu'...failed<br />trying to open device 'radeon'...done<br />main: All ok!</span></p><p></p><p></p></blockquote><p>Next (fingers crossed) kmscube</p><p><span style="font-size: xx-small;"></span></p><blockquote><span style="font-size: xx-small;">rock@rock-5b:~$ kmscube <br />Using display 0x55b67f0020 with EGL version 1.5<br />===================================<br />EGL information:<br /> version: "1.5"<br /> vendor: "Mesa Project"<br /> client extensions: "EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query EGL_EXT_platform_base EGL_KHR_client_get_all_proc_addresses EGL_EXT_client_extensions EGL_KHR_debug EGL_EXT_platform_device EGL_EXT_platform_wayland EGL_KHR_platform_wayland EGL_EXT_platform_x11 EGL_KHR_platform_x11 EGL_MESA_platform_gbm EGL_KHR_platform_gbm EGL_MESA_platform_surfaceless"<br /> display extensions: "EGL_ANDROID_blob_cache EGL_EXT_buffer_age EGL_EXT_create_context_robustness EGL_EXT_image_dma_buf_import EGL_EXT_image_dma_buf_import_modifiers EGL_KHR_cl_event2 EGL_KHR_config_attribs EGL_KHR_create_context EGL_KHR_create_context_no_error EGL_KHR_fence_sync EGL_KHR_get_all_proc_addresses EGL_KHR_gl_colorspace EGL_KHR_gl_renderbuffer_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_no_config_context EGL_KHR_reusable_sync EGL_KHR_surfaceless_context EGL_EXT_pixel_format_float EGL_KHR_wait_sync EGL_MESA_configless_context EGL_MESA_drm_image EGL_MESA_image_dma_buf_export EGL_MESA_query_driver EGL_WL_bind_wayland_display "<br />===================================<br />OpenGL ES 2.x information:<br /> version: "OpenGL ES 3.2 Mesa 20.3.5"<br /> shading language version: "OpenGL ES GLSL ES 3.20"<br /> vendor: "AMD"<br /> renderer: "AMD VERDE (DRM 2.50.0, 5.10.110-99-rockchip-g6e21553c2116, LLVM 11.0.1)"<br /> extensions: "GL_EXT_blend_minmax GL_EXT_multi_draw_arrays GL_EXT_texture_filter_anisotropic GL_EXT_texture_compression_s3tc GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_rgtc GL_EXT_texture_format_BGRA8888 GL_OES_compressed_ETC1_RGB8_texture GL_OES_depth24 GL_OES_element_index_uint GL_OES_fbo_render_mipmap GL_OES_mapbuffer GL_OES_rgb8_rgba8 GL_OES_standard_derivatives GL_OES_stencil8 GL_OES_texture_3D GL_OES_texture_float GL_OES_texture_float_linear GL_OES_texture_half_float GL_OES_texture_half_float_linear GL_OES_texture_npot GL_OES_vertex_half_float GL_EXT_draw_instanced GL_EXT_texture_sRGB_decode GL_OES_EGL_image GL_OES_depth_texture GL_AMD_performance_monitor GL_OES_packed_depth_stencil GL_EXT_texture_type_2_10_10_10_REV GL_NV_conditional_render GL_OES_get_program_binary GL_APPLE_texture_max_level GL_EXT_discard_framebuffer GL_EXT_read_format_bgra GL_EXT_frag_depth GL_NV_fbo_color_attachments GL_OES_EGL_image_external GL_OES_EGL_sync GL_OES_vertex_array_object GL_OES_viewport_array GL_ANGLE_pack_reverse_row_order GL_ANGLE_texture_compression_dxt3 GL_ANGLE_texture_compression_dxt5 GL_EXT_occlusion_query_boolean GL_EXT_robustness GL_EXT_texture_rg GL_EXT_unpack_subimage GL_NV_draw_buffers GL_NV_read_buffer GL_NV_read_depth GL_NV_read_depth_stencil GL_NV_read_stencil GL_EXT_draw_buffers GL_EXT_map_buffer_range GL_KHR_debug GL_KHR_robustness GL_KHR_texture_compression_astc_ldr GL_NV_pixel_buffer_object 
GL_OES_depth_texture_cube_map GL_OES_required_internalformat GL_OES_surfaceless_context GL_EXT_color_buffer_float GL_EXT_sRGB_write_control GL_EXT_separate_shader_objects GL_EXT_shader_group_vote GL_EXT_shader_implicit_conversions GL_EXT_shader_integer_mix GL_EXT_tessellation_point_size GL_EXT_tessellation_shader GL_ANDROID_extension_pack_es31a GL_EXT_base_instance GL_EXT_compressed_ETC1_RGB8_sub_texture GL_EXT_copy_image GL_EXT_draw_buffers_indexed GL_EXT_draw_elements_base_vertex GL_EXT_gpu_shader5 GL_EXT_polygon_offset_clamp GL_EXT_primitive_bounding_box GL_EXT_render_snorm GL_EXT_shader_io_blocks GL_EXT_texture_border_clamp GL_EXT_texture_buffer GL_EXT_texture_cube_map_array GL_EXT_texture_norm16 GL_EXT_texture_view GL_KHR_blend_equation_advanced GL_KHR_context_flush_control GL_KHR_robust_buffer_access_behavior GL_NV_image_formats GL_OES_copy_image GL_OES_draw_buffers_indexed GL_OES_draw_elements_base_vertex GL_OES_gpu_shader5 GL_OES_primitive_bounding_box GL_OES_sample_shading GL_OES_sample_variables GL_OES_shader_io_blocks GL_OES_shader_multisample_interpolation GL_OES_tessellation_point_size GL_OES_tessellation_shader GL_OES_texture_border_clamp GL_OES_texture_buffer GL_OES_texture_cube_map_array GL_OES_texture_stencil8 GL_OES_texture_storage_multisample_2d_array GL_OES_texture_view GL_EXT_blend_func_extended GL_EXT_buffer_storage GL_EXT_float_blend GL_EXT_geometry_point_size GL_EXT_geometry_shader GL_EXT_shader_samples_identical GL_KHR_no_error GL_KHR_texture_compression_astc_sliced_3d GL_OES_EGL_image_external_essl3 GL_OES_geometry_point_size GL_OES_geometry_shader GL_OES_shader_image_atomic GL_EXT_clip_cull_distance GL_EXT_disjoint_timer_query GL_EXT_texture_compression_s3tc_srgb GL_EXT_window_rectangles GL_MESA_shader_integer_functions GL_EXT_clip_control GL_EXT_color_buffer_half_float GL_EXT_memory_object GL_EXT_memory_object_fd GL_EXT_texture_compression_bptc GL_KHR_parallel_shader_compile GL_NV_alpha_to_coverage_dither_control GL_EXT_EGL_image_storage GL_EXT_texture_sRGB_R8 GL_EXT_texture_shadow_lod GL_INTEL_blackhole_render GL_MESA_framebuffer_flip_y GL_EXT_depth_clamp GL_EXT_texture_query_lod "<br />===================================<br />Using modifier ffffffffffffff<br />Modifiers failed!<br />Bus error<span style="font-size: xx-small;"><p> </p></span></span></blockquote>The 'bus error' indicates a memory alignment issue and turns out to be a bit of a rabbit hole. The fix to the Radeon kernel driver ensures the card's memory is mapped as '<a href="https://developer.arm.com/documentation/den0024/a/Memory-Ordering/Memory-types/Device-memory">Device memory</a>' of type Device-nGnRnE. If it were 'Normal Memory' then unaligned access would be <a href="https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions">allowed</a>. This implies fixing up userspace drivers/applications as these errors are encountered, since these applications can directly manipulate the card's memory. 
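<p>To illustrate the class of fix involved: where a plain memcpy may issue unaligned or wide accesses that fault on a Device-nGnRnE mapping, the copy is rewritten using explicitly aligned, fixed-size accesses. A simplified sketch of the idea (not the actual Mesa patch):</p><pre>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Sketch: copy into a Device-nGnRnE (MMIO-like) mapping using only
 * aligned 32-bit stores, instead of memcpy which may generate
 * unaligned or wide accesses that raise SIGBUS on such a mapping.
 * Assumes dst/src are 4-byte aligned and len is a multiple of 4.
 */
static void copy_to_device_u32(volatile uint32_t *dst,
                               const uint32_t *src, size_t len)
{
    for (size_t i = 0; i < len / 4; i++)
        dst[i] = src[i];
}</pre>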
This particular bus error was caused by a memcpy in the radeon gallium driver; a fix was applied there and, as shown in the video, kmscube runs:<br /><p></p><p></p><blockquote><span style="font-size: xx-small;">===================================<br />Using modifier ffffffffffffff<br />Modifiers failed!<br />Using modifier ffffffffffffff<br />Modifiers failed!<br />Rendered 120 frames in 2.000246 sec (59.992635 fps)<br />Rendered 240 frames in 4.000428 sec (59.993577 fps)<br />Rendered 361 frames in 6.016865 sec (59.998019 fps)<br />Rendered 481 frames in 8.017015 sec (59.997390 fps)<br />Rendered 601 frames in 10.017050 sec (59.997704 fps)<br />Rendered 721 frames in 12.017079 sec (59.997942 fps)<br />Rendered 841 frames in 14.017118 sec (59.998067 fps)<br />Rendered 961 frames in 16.017314 sec (59.997574 fps)<br />Rendered 1082 frames in 18.033850 sec (59.998280 fps)</span></blockquote>Similar fixes were applied to glmark2-drm & glmark2-es2-drm, which then ran successfully (1680x1050 resolution), although the terrain scene displayed a bunch of colored bars on the screen.<p></p><p></p><blockquote><p><span style="font-size: xx-small;">=======================================================<br /> glmark2 2021.12<br />=======================================================<br /> OpenGL Information<br /> GL_VENDOR: AMD<br /> GL_RENDERER: AMD VERDE (DRM 2.50.0, 5.10.110-99-rockchip-g6e21553c2116, LLVM 11.0.1)<br /> GL_VERSION: 4.5 (Compatibility Profile) Mesa 20.3.5<br /> Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0<br /> Surface Size: 1680x1050 fullscreen<br />=======================================================<br />[build] use-vbo=false: FPS: 939 FrameTime: 1.066 ms<br />[build] use-vbo=true: FPS: 2411 FrameTime: 0.415 ms<br />[texture] texture-filter=nearest: FPS: 1957 FrameTime: 0.511 ms<br />[texture] texture-filter=linear: FPS: 1958 FrameTime: 0.511 ms<br />[texture] texture-filter=mipmap: FPS: 2003 FrameTime: 0.499 ms<br />[shading] shading=gouraud: FPS: 1975 FrameTime: 0.506 ms<br />[shading] shading=blinn-phong-inf: FPS: 1973 FrameTime: 0.507 ms<br />[shading] shading=phong: FPS: 1976 FrameTime: 0.506 ms<br />[shading] shading=cel: FPS: 1974 FrameTime: 0.507 ms<br />[bump] bump-render=high-poly: FPS: 1739 FrameTime: 0.575 ms<br />[bump] bump-render=normals: FPS: 2373 FrameTime: 0.422 ms<br />[bump] bump-render=height: FPS: 2330 FrameTime: 0.429 ms<br />[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 1254 FrameTime: 0.798 ms<br />[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 707 FrameTime: 1.415 ms<br />[pulsar] light=false:quads=5:texture=false: FPS: 1338 FrameTime: 0.747 ms<br />[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 456 FrameTime: 2.194 ms<br />[desktop] effect=shadow:windows=4: FPS: 600 FrameTime: 1.667 ms<br />[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 214 FrameTime: 4.684 ms<br />[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 233 FrameTime: 4.306 ms<br />[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 347 FrameTime: 2.885 ms<br />[ideas] speed=duration: FPS: 1430 FrameTime: 0.700 ms<br />[jellyfish] <default>: FPS: 806 FrameTime: 1.242 ms<br />[terrain] <default>: FPS: 150 FrameTime: 6.706 ms<br />[shadow] <default>: FPS: 843 FrameTime: 1.188 ms<br />[refract] <default>: FPS: 115 FrameTime: 8.718 ms<br />[conditionals] 
fragment-steps=0:vertex-steps=0: FPS: 1970 FrameTime: 0.508 ms<br />[conditionals] fragment-steps=5:vertex-steps=0: FPS: 1980 FrameTime: 0.505 ms<br />[conditionals] fragment-steps=0:vertex-steps=5: FPS: 1972 FrameTime: 0.507 ms<br />[function] fragment-complexity=low:fragment-steps=5: FPS: 1979 FrameTime: 0.505 ms<br />[function] fragment-complexity=medium:fragment-steps=5: FPS: 1971 FrameTime: 0.507 ms<br />[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 1972 FrameTime: 0.507 ms<br />[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 1972 FrameTime: 0.507 ms<br />[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 1968 FrameTime: 0.508 ms<br />=======================================================<br /> glmark2 Score: 1450 <br />=======================================================</span></p><p></p></blockquote><p>Next up was to see if startx would run; unfortunately, it drops out with a shader compiler error. It looks like glamor is using EGL but encounters an OpenGL shader to compile; this requires further investigation.</p><p></p><blockquote><span style="font-size: xx-small;">[ 7916.924] (II) modeset(0): Modeline "360x202"x119.0 11.25 360 372 404 448 202 204 206 211 doublescan -hsync +vsync (25.1 kHz d)<br />[ 7916.924] (II) modeset(0): Modeline "360x202"x118.3 10.88 360 384 400 440 202 204 206 209 doublescan +hsync -vsync (24.7 kHz d)<br />[ 7916.924] (II) modeset(0): Modeline "320x180"x119.7 9.00 320 332 360 400 180 181 184 188 doublescan -hsync +vsync (22.5 kHz d)<br />[ 7916.924] (II) modeset(0): Modeline "320x180"x118.6 8.88 320 344 360 400 180 181 184 187 doublescan +hsync -vsync (22.2 kHz d)<br />[ 7916.925] (II) modeset(0): Output DVI-D-1 status changed to disconnected.<br />[ 7916.925] (II) modeset(0): EDID for output DVI-D-1<br />[ 7916.939] (II) modeset(0): Output VGA-1 status changed to disconnected.<br />[ 7916.939] (II) modeset(0): EDID for output VGA-1<br />[ 7916.939] (II) modeset(0): Output HDMI-1 connected<br />[ 7916.939] (II) modeset(0): Output DVI-D-1 disconnected<br />[ 7916.939] (II) modeset(0): Output VGA-1 disconnected<br />[ 7916.939] (II) modeset(0): Using exact sizes for initial modes<br />[ 7916.939] (II) modeset(0): Output HDMI-1 using initial mode 1680x1050 +0+0<br />[ 7916.939] (==) modeset(0): Using gamma correction (1.0, 1.0, 1.0)<br />[ 7916.939] (==) modeset(0): DPI set to (96, 96)<br />[ 7916.939] (II) Loading sub module "fb"<br />[ 7916.939] (II) LoadModule: "fb"<br />[ 7916.940] (II) Loading /usr/lib/xorg/modules/libfb.so<br />[ 7916.944] (II) Module fb: vendor="X.Org Foundation"<br />[ 7916.944] compiled for 1.20.11, module version = 1.0.0<br />[ 7916.944] ABI class: X.Org ANSI C Emulation, version 0.4<br />[ 7916.964] Failed to compile VS: 0:1(1): error: syntax error, unexpected NEW_IDENTIFIER<br /><br />[ 7916.964] Program source:<br />precision highp float;<br />attribute vec4 v_position;<br />attribute vec4 v_texcoord;<br />varying vec2 source_texture;<br /><br />void main()<br />{<br /> gl_Position = v_position;<br /> source_texture = v_texcoord.xy;<br />}<br />[ 7916.964] (EE)<br />Fatal server error:<br />[ 7916.964] (EE) GLSL compile failure<br />[ 7916.964] (EE)<br /></span> </blockquote><p></p><p>Lastly, I installed vaapi to attempt video playback; unfortunately, even after fixing a couple of bus errors in gallium there's more to fix. So this pretty much sums up the nature of the problem to address. 
Furthermore, this does raise the question of whether the tweet from Radxa is using accelerated graphics, given the hardware restrictions of the RK3588.<br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2207190930236349395.post-87965921198124633782023-01-15T06:51:00.005-08:002023-03-27T00:37:56.879-07:00RK3588 - Decoding & rendering 16 1080p streams<p></p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/Ew4ZMljSIZI" width="320" youtube-src-id="Ew4ZMljSIZI"></iframe></div><br /> <p></p><p> </p><p>I'm currently working on a video application for the RK3588, given it is one of the few processors on the market that currently has native HDMI input support (up to 4K30). As part of that work, one of the first tasks has been trying to render video efficiently within a Wayland/Weston window (not full screen). I reverted to Wayland for video because, from my testing on X11, it can result in tearing if not played full screen, as the graphics stack (ARM Mali) has no ability to vsync. The existing Rockchip SDK patches the gstreamer waylandsink plugin to provide video rendering support for Wayland. However, there are a number of challenges in getting the waylandsink to render to a Weston window, as by default it resorts to full screen, resulting in a Weston application launching a secondary full screen window to display video within. Whilst trying to find a solution to this problem I came across a number of claims about the video decoder (part of the VPU):<br /></p><p><span style="font-size: x-small;">Up to 32-channel 1080P@30fps decoding (<a href="https://en.t-firefly.com/product/industry/rocrk3588pc">FireFly ROC-RK3588-PC)</a></span></p><p><span style="font-size: x-small;">x32 1080P@60fps channels (H.265/VP9) (<a href="https://www.khadas.com/edge2">Khadas Edge 2</a>)</span></p><p><span style="font-size: x-small;">Up to 32 channels 1080P@30fps decoding (<a href="https://www.pepper-jobs.eu/en/x3588-i.html">PEPPER JOBS X3588</a>)</span></p><p>After reviewing the RK3588 <a href="https://github.com/pengyixing/RK3588-Development-Board/blob/main/mini-pc.md">datasheet</a> and TRM I can't find a mention of this capability by Rockchip, so I'd assume this is a derived figure based on this statement in the datasheet: "<i>Multi-channel decoder in parallel for less resolution</i>". From the datasheet, the H264 max resolution decode is 8K@30 and for H265 it is 8K@60; theoretically that would mean 16 channels for H264 1080p@30 and possibly 32 for H265 if each stream is 1080p@30.</p><p>So the challenge turned out to be: could I decode 16 1080p streams and render each within its own window on a 1080p@60 display? As you can tell from the above video, it is possible. This is a custom Weston application running on a <a href="https://wiki.radxa.com/Rock5">Rock 5B board</a>; each video is being read/decoded from a separate file (there is a mixture of trailers/open videos & an fps test video) and then rendered. Initially I tried resizing each video using RGA3 (Raster Graphics Acceleration); however, this turned out to be non-performant, as RGA doesn't seem to cope well with more than a few videos. It turns out the only way to render efficiently is to use AFBC (Arm framebuffer compression). For this test there are 14 H264 streams (a mixture of 30 & 60 fps) and 2 H265 60fps streams.</p>
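<p>For those curious, attaching an AFBC-compressed decoder buffer to a Wayland surface goes through the linux-dmabuf protocol with an explicit format modifier. A rough sketch is below; it assumes the compositor advertises the AFBC modifier, and the fd, stride and exact AFBC mode flags are placeholders that would come from the decoder setup:</p><pre>#include &lt;stdint.h&gt;
#include &lt;drm_fourcc.h&gt;
#include "linux-dmabuf-unstable-v1-client-protocol.h"

/* Sketch: wrap a decoded NV12 dma-buf that uses AFBC into a wl_buffer.
 * 'dmabuf' is the bound zwp_linux_dmabuf_v1 global. */
static struct wl_buffer *afbc_buffer(struct zwp_linux_dmabuf_v1 *dmabuf,
                                     int fd, int width, int height,
                                     uint32_t stride)
{
    uint64_t mod = DRM_FORMAT_MOD_ARM_AFBC(AFBC_FORMAT_MOD_BLOCK_SIZE_16x16 |
                                           AFBC_FORMAT_MOD_SPARSE);
    struct zwp_linux_buffer_params_v1 *params =
        zwp_linux_dmabuf_v1_create_params(dmabuf);

    zwp_linux_buffer_params_v1_add(params, fd, 0 /* plane */, 0 /* offset */,
                                   stride, mod >> 32, mod & 0xffffffff);
    return zwp_linux_buffer_params_v1_create_immed(params, width, height,
                                                   DRM_FORMAT_NV12, 0);
}</pre>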
Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-2207190930236349395.post-20191315709586405492022-08-26T09:12:00.011-07:002022-08-28T10:13:16.475-07:00Inside another fake ELM327 adaptor (filled with Air)<p>I'd ordered a couple of ELM327 <a href="https://www.aliexpress.com/item/1005004189070205.html?spm=a2g0o.store_pc_groupList.8148356.59.32496163euY7VD&pdp_npi=2%40dis%21GBP%21%EF%BF%A12.61%21%EF%BF%A12.61%21%21%21%21%21%402100bb5116615304065653759e5843%2112000028326389741%21sh">compatible</a> <a href="https://www.aliexpress.com/item/1005004074085562.html?spm=a2g0o.store_pc_saleItems.0.0.562e3456cGDc8h&pdp_ext_f=%7B%22sku_id%22:%2212000027952844105%22,%22ship_from%22:%22%22%7D&gps-id=pcStoreJustForYou&scm=1007.23125.137358.0&scm_id=1007.23125.137358.0&scm-url=1007.23125.137358.0&pvid=9d191848-dd72-4442-aefa-47861ba7464e">adapters</a> from Aliexpress, expecting that these would be similar to the item in the image below. Normally these contain a PCB sized to fit the enclosure, populated with an unknown MCU (covered with epoxy), a Bluetooth chip, a CAN transceiver and the necessary circuitry to support a K-Line interface.<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgByRzrr4FmfHGbVb_qKXG0478Ysn3_qS3eYSfAbEL5JVwrpHeaBMk6aM_vKaHqIrRYNdSYDVdYXxpqLSnfCni97ynGH-rh98BlBEkmD2MKuQZgJPoTrQZCQkYt8Od0po3BGWmW2ndcyc21tHw-RU2pSZKLfEA6aXGTiDEyW6lsvxOqG5WUZNoohhB17g/s1000/elm327.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1000" data-original-width="1000" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgByRzrr4FmfHGbVb_qKXG0478Ysn3_qS3eYSfAbEL5JVwrpHeaBMk6aM_vKaHqIrRYNdSYDVdYXxpqLSnfCni97ynGH-rh98BlBEkmD2MKuQZgJPoTrQZCQkYt8Od0po3BGWmW2ndcyc21tHw-RU2pSZKLfEA6aXGTiDEyW6lsvxOqG5WUZNoohhB17g/s320/elm327.jpg" width="320" /></a></div><p>After dissecting the received adapters, here is what we have: 80% air and a small PCB.<br /></p><div class="separator" style="clear: both; text-align: center;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8kxGhG6ppdPFO_HTLB4vrbEyhrWEF3JFFO8WiqCrf79j5xtytJZSFkqcbdHfPB_TKytrQelXisvdJHgLi2sbcDLjS26dE-FqnTEiE9Ssyh1IKYGj_BfmRJQ-OR1jtc0pYzEsUVWQRzkK7XC6zooszCb6Fv4ZvZjFA3Q-66pqsVP-squWRu2RnCl5muQ/s4032/IMG_1976.JPG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="4032" data-original-width="3024" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8kxGhG6ppdPFO_HTLB4vrbEyhrWEF3JFFO8WiqCrf79j5xtytJZSFkqcbdHfPB_TKytrQelXisvdJHgLi2sbcDLjS26dE-FqnTEiE9Ssyh1IKYGj_BfmRJQ-OR1jtc0pYzEsUVWQRzkK7XC6zooszCb6Fv4ZvZjFA3Q-66pqsVP-squWRu2RnCl5muQ/s320/IMG_1976.JPG" width="240" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9SC3eVqfiVD_8RPmSVrh5d4DDVs2MYLOtg2GELXq5BLvBpIdHZraUxP74UfX1blL27JEcdVVvsSH4qIbMT0N6cEL1Q7AQch9eSoSOOuBmJmQI3pJM6x8TUvgjS5KeFezTik5P2MYQHwhsa51FiKsdNmrCHmkoZE5mSjiFkKMd_RWnhPwVXqrO_lU5lA/s4032/IMG_1979.JPG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="4032" data-original-width="3024" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9SC3eVqfiVD_8RPmSVrh5d4DDVs2MYLOtg2GELXq5BLvBpIdHZraUxP74UfX1blL27JEcdVVvsSH4qIbMT0N6cEL1Q7AQch9eSoSOOuBmJmQI3pJM6x8TUvgjS5KeFezTik5P2MYQHwhsa51FiKsdNmrCHmkoZE5mSjiFkKMd_RWnhPwVXqrO_lU5lA/s320/IMG_1979.JPG" width="240" 
/></a></div></div><p></p><br /><p>Pictures of the small PCB reveal a single 16-pin SOP package (and a 24MHz crystal) with the chip markings etched out, and no BLE chip or CAN transceiver present 😒. Is this one chip doing all the work?<br /></p><div class="separator" style="clear: both; text-align: center;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEja71m9WsddV2kbp0vXTUn2b7sj40pWpVspogF193fUljGJLQDUTaYwiQaAIS0lKbwr2EpVN-WfiDld2yTu_xl35wJcT2Py2UILmkBW6y9AdM-MIyfjQ0zun0d4vIBzhnCT8ecwZFrcVupJHGgXv1F1ZKCdItz-KmfdWsPNZoSfHkLHiufGGXiNC4Uplw/s4032/IMG_1966.JPG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="4032" data-original-width="3024" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEja71m9WsddV2kbp0vXTUn2b7sj40pWpVspogF193fUljGJLQDUTaYwiQaAIS0lKbwr2EpVN-WfiDld2yTu_xl35wJcT2Py2UILmkBW6y9AdM-MIyfjQ0zun0d4vIBzhnCT8ecwZFrcVupJHGgXv1F1ZKCdItz-KmfdWsPNZoSfHkLHiufGGXiNC4Uplw/s320/IMG_1966.JPG" width="240" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj561DtuMIvCXM_tHiYnwk6uHZ_lLq35otZvvPnTUpyL4Gqqb8XA5bU-sVpWrqC620501Ro_B-UOZaMk0bBmImUZG_iJ9bOfyniC3ZvTTKGO16ZoiP25-nBlCS4pqzag40JsF4QHDdALbRkqCKBhuO3Jm0QGg7BcP28Ru6jXgtx5ohHnGeBnHhMr26tFA/s4032/IMG_1972.JPG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="4032" data-original-width="3024" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj561DtuMIvCXM_tHiYnwk6uHZ_lLq35otZvvPnTUpyL4Gqqb8XA5bU-sVpWrqC620501Ro_B-UOZaMk0bBmImUZG_iJ9bOfyniC3ZvTTKGO16ZoiP25-nBlCS4pqzag40JsF4QHDdALbRkqCKBhuO3Jm0QGg7BcP28Ru6jXgtx5ohHnGeBnHhMr26tFA/s320/IMG_1972.JPG" width="240" /></a></div><br /></div><p>From a software point of view, the device reports itself as ELM V2.1, and I managed to retrieve the firmware version as TDA99 V0.34.0628C (not sure what it means though). The firmware is extremely buggy and feature-wise incomplete for ELM V2.1.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhd-J9UrCatK34bYJ9ldGiEi2_6TLjfkI57G9BBTuFDhSjg4MYQIea6s1u2sVKtCVbQm7bxdIpMao7Aqqs0V8SQjhopAqdOnp7YZy539oRQsRlxrDPMIOTXghEEv5wqYP9fFQSIY3bM8sztgb52uIdBekJgKUpTn8BhXgTx45NXx46M2iAW6qHdemM9Kg/s1334/image0.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1334" data-original-width="750" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhd-J9UrCatK34bYJ9ldGiEi2_6TLjfkI57G9BBTuFDhSjg4MYQIea6s1u2sVKtCVbQm7bxdIpMao7Aqqs0V8SQjhopAqdOnp7YZy539oRQsRlxrDPMIOTXghEEv5wqYP9fFQSIY3bM8sztgb52uIdBekJgKUpTn8BhXgTx45NXx46M2iAW6qHdemM9Kg/s320/image0.png" width="180" /></a> <br /></div><p>The intriguing question was whether a 16-pin chip could replace a number of discrete components. After days of research, it turns out the chip seems to be a repurposed Bluetooth audio/toy chip (possibly from <a href="http://www.zh-jieli.com/">ZhuHai Jieli Technology</a>). The same unmarked chip seems to be present on the <a href="http://www.thinmi.com/elmc327.html">Thinmi ELM327C</a>, where it is referred to as QBD255; I can't locate any information on the QBD255.<i> Worse still, the CAN implementation seems to be completely written in software (hence no CAN transceiver) and is therefore prone to timing errors and limited data rates. </i>Furthermore, this chip must have limited memory/flash, hence the incomplete implementation of ELM features. 
</p><p>Buyer beware! <br /></p><p>I suspect this chip may be the Jieli AC6329F or AC6329C, but I need to prove it somehow.</p><p><span style="color: #2b00fe;"><b>Update 28-08-2022: </b></span></p><p>There seems to be another chipset floating around from YMIOT, described as "<i>ELM327 V2.1 Bluetooth universal diagnostic adapter with 16-pin YM1130 1343E38 chip</i>".<br /></p><p>The history of this chipset is below:</p><p>2017 - YM1120 (131G76)<br />2018 - YM1122 (1218F57) & YM1121<br />2019 - YM1130 (1343E38)<br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2207190930236349395.post-9983518657854728142021-04-29T00:19:00.002-07:002021-04-29T00:27:39.284-07:00Reverse engineering the V831 NPU (Neural Processor Unit) <p>I took up the challenge posted on the <a href="https://twitter.com/SipeedIO/status/1316721429400887296">sipeed</a> twitter feed </p><p><span style="font-size: small;"><i>"We are reversing V831's NPU register, and make opensource AI toolchian based on NCNN~
If you are interested in making opensource AI toolchain and familiar with NCNN, please contact support at sipeed.com, we will send free sample board for you to debug"</i></span></p><p><span style="font-size: small;"><a href="https://www.sipeed.com/">Sipeed</a> were kind enough to send me one of the initial prototype boards of the <a href="https://www.cnx-software.com/2021/01/06/sipeed-maix-ii-dock-is-an-allwinner-v831-powered-aiot-vision-devkit/">MAXI-II</a>. To give you a brief introduction, the V831 is a camera SoC targeting video encoding applications (CCTV, body cams, etc.). It comprises a Cortex-A7 processor combined with 64MB of embedded RAM; for those interested, full details of the V831's capabilities can be found in the <a href="https://www.cnx-software.com/pdf/datasheet/V833%EF%BC%8FV831_Datasheet_V1.0(For%20%E7%B4%A2%E6%99%BA).pdf">datasheet</a>.<br /></span></p><p>The datasheet is sparse on information about the NPU:</p><ul style="text-align: left;"><li><i>V831: Maximum performance up to 0.2Tops</i></li><li><i>Supports Conv, Activation, Pooling, BN, LRN, FC/Inner Product</i><br /></li></ul><p>In addition, the datasheet briefly mentions two registers in reference to the NPU: one for enabling/resetting the NPU and the other for setting the clock source. There is no mention of how it can be programmed to perform the operations specified in the datasheet.<br /></p><p>Fortunately, the registers listed in the sipeed twitter post provided a first clue, and after many months of trial and error, endless deciphering of data dumps, a few dead ends and numerous reverse engineering attempts, parts of the NPU operations have been decoded. Fundamentally, a large portion of the NPU is a customised implementation of the Nvidia Deep Learning Accelerator (NVDLA) architecture. More details about the project can be found on the <a href="http://nvdla.org/">NVDLA</a> site and here is a quote of its aims:</p><p><i>The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that promotes a standard way to design deep learning inference accelerators.</i></p><p><span style="font-size: small;">What I have determined so far about the NPU is:</span></p><p><span style="font-size: small;">1. The NPU clock can be set between 100-1200 MHz, with the code defaulting to 400 MHz. My hunch is that this may tie to the clock speed of the onboard DDR2 memory. <br /></span></p><p><span style="font-size: small;">2. The NPU is implemented with the nv_small configuration (<a href="http://nvdla.org/primer.html">NV Small Model</a>) and relies on system memory for all data operations. Importantly, the CPU and NPU share the memory bus.<br /></span></p><p><span style="font-size: small;">3. It supports both int8 and int16; however, I haven't verified if FP16 is supported or not. Theoretically int8 should be twice as fast as int16 while also preserving memory, given the V831's limited onboard memory (</span><span style="font-size: small;">64MB).<br /></span></p><p><span style="font-size: small;">4. 
The number of MACs is 64 <i>(Atomic-C * Atomic-K)</i>.</span><br /></p><p>5. NPU registers are memory mapped and can therefore be programmed from userspace, which proved to be extremely useful for initial debugging & testing (see the sketch after this list).<br /></p><p>6. The NPU requires physical addresses when referencing weights & input/output data locations; therefore<span style="font-size: small;"> kernel memory needs to be allocated and the physical addresses retrieved if accessed from userspace.</span></p><p><span style="font-size: small;">7. NPU weights and input/output data follow a similar layout to the NVDLA private formats. Therefore, well-known formats like nhwc or nchw require transformation before they can be fed to the NPU.<br /></span></p>
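<p>Putting points 5 and 6 together, the register window can be poked from userspace with nothing more than /dev/mem; a minimal sketch is below. The base address and size here are placeholders for illustration only, not values taken from the datasheet:</p><pre>#include &lt;fcntl.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;unistd.h&gt;

/* Placeholders: take the real NPU register base/size from the TRM. */
#define NPU_BASE 0x03050000UL
#define NPU_SIZE 0x10000UL

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    volatile uint32_t *regs = mmap(NULL, NPU_SIZE, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, NPU_BASE);
    if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* Dump the first few registers, e.g. to spot a version/status reg. */
    for (int i = 0; i < 4; i++)
        printf("reg[0x%02x] = 0x%08x\n", i * 4, regs[i]);

    munmap((void *)regs, NPU_SIZE);
    close(fd);
    return 0;
}</pre>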
Run './nna_cifar10', the output should be as below, given the input image was a boat:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtYH8erQShwdFSFltI26RiBnUeErFk_PB3SL478AEKSx4bukh4ndKg_IEvzPBNpOXAwETzkI6yVZFJKaN_vI5imlqbknqHfLgjLIqWnyN6UGiOHSWftg1m3DvFzBzTVGfAd_2O_pVhJECu/s806/Screenshot+from+2021-04-28+15-52-35.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="615" data-original-width="806" height="244" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtYH8erQShwdFSFltI26RiBnUeErFk_PB3SL478AEKSx4bukh4ndKg_IEvzPBNpOXAwETzkI6yVZFJKaN_vI5imlqbknqHfLgjLIqWnyN6UGiOHSWftg1m3DvFzBzTVGfAd_2O_pVhJECu/w320-h244/Screenshot+from+2021-04-28+15-52-35.png" width="320" /></a></div><br /><p>There is still quite a bit of work left to be done, such as:</p><p>1. A weight and input/output data conversion utility</p><p>2. The NPU should support pixel input formats; this needs to be verified. <br /></p><p>3. Decoding the remaining hardware units</p><p>4. Possibly integrating with an existing AI framework or writing a compiler.<br /></p><p>By the way the new <a href="https://beagleboard.org/beaglev">Beagle V</a> is also spec'd to implement NVDLA with a larger MAC size of 1024.<br /></p><p>I would like to thank <a href="https://www.sipeed.com/">sipeed</a> for providing the hardware/software.</p><p><br /></p><p> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMxUV3bGxupCh0viv6NHWSqzXryROxl5G4pyToanv1G7D8oDK09DFLgIK2vsMYPthzFAUKvuy2IDlOK9ATz-a4Xw7qmFCGUYUIMBaWzHVNAu4QaqkXZ-EmLcqbp3OSh9yunWyHHZZcMVzU/s1600/thumbnail_MotiveOrder_logo_V2_withURL_web_150pxW.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="78" data-original-width="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMxUV3bGxupCh0viv6NHWSqzXryROxl5G4pyToanv1G7D8oDK09DFLgIK2vsMYPthzFAUKvuy2IDlOK9ATz-a4Xw7qmFCGUYUIMBaWzHVNAu4QaqkXZ-EmLcqbp3OSh9yunWyHHZZcMVzU/s1600/thumbnail_MotiveOrder_logo_V2_withURL_web_150pxW.png" /></a>I would like to thank <a href="http://motiveorder.com/">motiveorder.com</a> for sponsoring the development time for this work. </p>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-2207190930236349395.post-91573625207510796072020-03-08T06:53:00.003-07:002020-03-08T06:53:42.991-07:00ESP32 impersonates a Particle Xenon<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/4wrN15wUR94/0.jpg" src="https://www.youtube.com/embed/4wrN15wUR94?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<br />
<br />
With the <a href="https://community.particle.io/t/particle-mesh-update-a-note-from-the-ceo/54394">announcement</a> that Particle will no longer manufacture the Xenon development board and will drop their OpenThread based mesh networking solution, we decided to see if we could impersonate an existing claimed Xenon (i.e. one that is already registered on the cloud) on alternative hardware. Hence the idea of 'bring your own device' to connect to the cloud.<br />
<br />
After reviewing the <a href="https://github.com/particle-iot/device-os">device-os source code</a> for a few months, it turned out that to get a proof of concept working I needed to implement at minimum the following:<br />
<br />
1. Port across the DTLS protocol layer, as it turns out the Gen 3 devices create a secure UDP socket connection over DTLS.<br />
2. Extract the device's private key and the cloud public key (no certificates are stored). Particle's implementation of the DTLS handshake relies purely on Raw Public Key support (RFC 7250).<br />
3. Implement a CoAP layer, as the '<a href="https://github.com/particle-iot/device-os/tree/develop/communication">Spark protocol</a>' is built on top of this; a minimal sketch of the CoAP framing is shown below.<br />
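<p>To illustrate the last point, here is a minimal sketch of packing the fixed 4-byte CoAP header defined in RFC 7252 (version, type, token length, code and message ID). This is purely illustrative and not taken from device-os; the field values a caller would pass are placeholders.</p>
<pre>#include <stdint.h>
#include <stddef.h>

/* Pack the fixed 4-byte CoAP header (RFC 7252).
 * type: 0=CON, 1=NON, 2=ACK, 3=RST; tkl: token length;
 * code: e.g. 0x01 for GET; mid: message ID. */
static size_t coap_header(uint8_t *buf, uint8_t type, uint8_t tkl,
                          uint8_t code, uint16_t mid)
{
    buf[0] = (uint8_t)((1 << 6) | ((type & 0x3) << 4) | (tkl & 0xF)); /* version 1 */
    buf[1] = code;
    buf[2] = (uint8_t)(mid >> 8);    /* message ID, big endian */
    buf[3] = (uint8_t)(mid & 0xFF);
    return 4;                        /* token, options and payload follow */
}</pre>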
<br />
The above was implemented as a set of library functions using the ESP32-IDF, and I reused the ESP32 (<a href="https://www.banggood.com/LILYGO-TTGO-16M-bytes-128M-Bit-Pro-ESP32-OLED-V2_0-Display-WiFi-bluetooth-ESP-32-Module-For-Arduino-p-1205876.html?cur_warehouse=CN">LILYGO TTGO</a>) from the previous post which fortunately hosts an OLED 128x64 display. In the video we demonstrate:<br />
<br />
1. Connects to a WiFi access point.<br />
2. Retrieves time from an SNTP server.<br />
3. Connects to the Particle Cloud via a DTLS handshake.<br />
4. Sends a number of 'Spark protocol' messages to let the cloud know the Xenon is alive. <br />
5. Awaits commands from the Cloud, including ping and signal operations.
When receiving the signal command the screen scrolls the text from left
to right.<br />
<br />
I would like to thank <a href="http://motiveorder.com/">motiveorder.com</a> for sponsoring the hardware and development time for this article. Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2207190930236349395.post-7863736443351607982019-11-30T11:37:00.003-08:002019-12-01T09:39:18.178-08:00Particle Xenon - Adding WIFI support with an ESP32<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/_YGgOhIjr-g/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/_YGgOhIjr-g?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSq4_ycZ1B1Tn4CD8uERwIU2sC16tCX5CkbLglzbJ865ru1_Wz7T6dVq5OROT_-kS1A4PMlmLdI2IedkfRbywX3er7UzmVvV-1gHB2PtfM4LWcUjPT9VVTmcZ8_MUSBYYrlbIC_uEt6XU9/s1600/thumbnail_MotiveOrder_logo_V2_withURL_web_150pxW.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="78" data-original-width="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSq4_ycZ1B1Tn4CD8uERwIU2sC16tCX5CkbLglzbJ865ru1_Wz7T6dVq5OROT_-kS1A4PMlmLdI2IedkfRbywX3er7UzmVvV-1gHB2PtfM4LWcUjPT9VVTmcZ8_MUSBYYrlbIC_uEt6XU9/s1600/thumbnail_MotiveOrder_logo_V2_withURL_web_150pxW.png" /></a>The preferred option for WIFI support with Gen 3 devices is to deploy a <a href="https://docs.particle.io/argon/">Particle Argon</a>. The Argon consists of a Nordic nRF52840 paired with Espressif ESP32. The EPS32 simply provides the WIFI interface and is running a customised version of EPS-AT firmware (<a href="https://github.com/particle-iot/argon-ncp-firmware">argon-ncp-firmware</a>). The nRF52840 communicates with the EPS32 using one its serial ports using fours pins TX,RX,CTS & RTS. The challenge here was to see if we could enable WIFI support on Particle Xenon by connecting it to a ESP32 running the argon-ncp-firmware. As demonstrated in the video it was possible although it required a number of hoops to jump through.<br />
<br />
Unfortunately the only spare ESP32 board I had was a <a href="https://www.banggood.com/LILYGO-TTGO-16M-bytes-128M-Bit-Pro-ESP32-OLED-V2_0-Display-WiFi-bluetooth-ESP-32-Module-For-Arduino-p-1205876.html?cur_warehouse=CN">LILYGO TTGO</a>; this is a 16MB board with an OLED display. So the first task was porting the argon-ncp-firmware and re-factoring the pin mappings to support this board. Once this was complete it was fairly easy to validate that the firmware was functioning by simply executing the AT commands the Argon issues to establish WiFi connectivity.<br />
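<p>For example, over the serial console the standard ESP-AT style commands can be used to join an access point (the exact command set in Particle's customised firmware may differ, and the SSID/password below are placeholders):</p>
<pre>AT
OK
AT+CWMODE=1
OK
AT+CWJAP="my-ssid","my-password"
WIFI CONNECTED
WIFI GOT IP
OK</pre>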
<br />
For the Xenon the primary change was to port across the Argon ESP32 networking code. This turned out to be more challenging than envisaged, primarily because the Xenon firmware isn't expecting a WiFi configuration and the command line tools don't support provisioning a WiFi connection for a Xenon. After 4 weeks of effort I finally had built a working version of the Xenon firmware. It took another 2 weeks to get the Xenon provisioned with a WiFi configuration so it could connect to the Particle Cloud.<br />
<br />
The main drawback of this approach is that the firmware on both the Xenon and the ESP32 is customised, therefore any updates from the Cloud would override the changes. Hence a customised rebuild is required when new firmware is released.<br />
<br />
I would like to thank <a href="http://motiveorder.com/">motiveorder.com</a> for sponsoring the hardware and development time for this article. Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2207190930236349395.post-6592162111207195782019-11-26T10:42:00.000-08:002019-12-01T09:22:46.546-08:00Particle Xenon - Enable Ethernet connectivity with a low cost W5500 module<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHESR6EwDNhE3fatqzcnxdEYJi05vKgoD1yGz5FStJEUgFouloJKpQ_rti5zHLkXlmrq8wnpwohyphenhyphenaqWq-hTXpfhZjQhYZBPWd-Oq4m2z6GI2dWWgo_oJZvdqzBPnrO708l15dTmgDXf9RB/s1600/w5500.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="1000" data-original-width="1000" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHESR6EwDNhE3fatqzcnxdEYJi05vKgoD1yGz5FStJEUgFouloJKpQ_rti5zHLkXlmrq8wnpwohyphenhyphenaqWq-hTXpfhZjQhYZBPWd-Oq4m2z6GI2dWWgo_oJZvdqzBPnrO708l15dTmgDXf9RB/s200/w5500.jpg" width="200" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1S-z48UisPJIT9JzJhZa3EhY9NDohss2O1gawUrb5A2SFpRf2Pyb4h6B1NCRob6lb2Tn5rwlUxf7DuL3K58jq9QSCjKustEQM14hz5p9p8Ay3fP8JT0uFsre2M7eV_Ui46_45KvzvEn3T/s1600/w5500_1.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="666" data-original-width="1000" height="133" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1S-z48UisPJIT9JzJhZa3EhY9NDohss2O1gawUrb5A2SFpRf2Pyb4h6B1NCRob6lb2Tn5rwlUxf7DuL3K58jq9QSCjKustEQM14hz5p9p8Ay3fP8JT0uFsre2M7eV_Ui46_45KvzvEn3T/s200/w5500_1.jpg" width="200" /></a>The preferred option to enable a Xenon to act as a Gateway is to deploy the <a href="https://store.particle.io/products/particle-ethernet-featherwing">Particle Ethernet FeatherWing</a>. Unfortunately I didn't have one to hand, however after reviewing the schematics it turns out this FeatherWing simply relies on the WIZnet W5500 Ethernet controller.<br />
<br />
From a previous project I did have a <a href="https://uk.banggood.com/W5500-Ethernet-Network-Module-Hardware-TCPIP-Interface-51STM32-Program-Driver-Development-Board-p-982668.html?gmcCountry=GB&currency=GBP&createTmp=1&utm_source=googleshopping&utm_medium=cpc_bgcs&utm_content=zouzou&utm_campaign=ssc-gbg-all-newcustom-0822&gclid=EAIaIQobChMIjcaGxJKD5gIVSbDtCh30kQJPEAQYBiABEgKiIfD_BwE&cur_warehouse=CN#jsReviewsWrap">W5500 Ethernet Module</a> (which seems to be widely available and relatively cheap), so the challenge was to see if the Xenon could work with it.<br />
<br />
<br />
In the end it turned out to be relatively simple to connect the Xenon to the module through the exposed SPI interface. The back of the pcb indicates the pin out details for the W5500. The diagram below details which pins on the Xenon need to be connected to the W5500 module. <br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDz-x3RuA6Ev9RnsEGABvUcuGEFBh7TG5ovLEiry9VL1AIko2by4Lj9Y_ehT30b2LtyaSSNxOhTRNWe-fy3zg4M7J7f0u5k6mvBXWMeVncDlQ68xs8JQOWc0f_zY9lNoXaMtqKdXTS8zAc/s1600/w5500_2.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="1219" data-original-width="1082" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDz-x3RuA6Ev9RnsEGABvUcuGEFBh7TG5ovLEiry9VL1AIko2by4Lj9Y_ehT30b2LtyaSSNxOhTRNWe-fy3zg4M7J7f0u5k6mvBXWMeVncDlQ68xs8JQOWc0f_zY9lNoXaMtqKdXTS8zAc/s200/w5500_2.jpg" width="176" /></a></div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWD4bcmQioaeJ7y3oVeVVfNNIIqWQP9auXkAj5CeZJCWq6AYQtFgyE8FcFWY3om7rSFk0bLc21c32fRpUzXZZYfPBlsTKADCWU-Y4Jp2dJxn_jxAh4OmMgRzqL4D6WEvwISPTfuGpGfCOp/s1600/IMG_20191126_181141.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="1600" data-original-width="1200" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWD4bcmQioaeJ7y3oVeVVfNNIIqWQP9auXkAj5CeZJCWq6AYQtFgyE8FcFWY3om7rSFk0bLc21c32fRpUzXZZYfPBlsTKADCWU-Y4Jp2dJxn_jxAh4OmMgRzqL4D6WEvwISPTfuGpGfCOp/s200/IMG_20191126_181141.jpg" width="150" /></a>This <a href="https://docs.particle.io/support/particle-devices-faq/mesh-setup-over-usb/#xenon-with-ethernet-gateway-setup">post</a> on the Particle site covers how to enable Ethernet and fingers crossed your Xenon should connect to the Particle cloud as mine did. <br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<br />
<br />
<br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2207190930236349395.post-70937810632377364502019-08-27T01:46:00.004-07:002019-08-30T03:09:28.760-07:00Jetson Nano - Developing a Pi v1.3 camera driver Part 2<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUCPFuOFl8dd0Xo3hc0G_oAcGjnXyXprHM6XMxcX9DhEg2wKy5jSQWtO4PPLPVkxpP2pmoGHwCRS7_wSX2kFzr9dyjnkIaFIqcs7N_ZBihE8GTvtgI7WDvssny0qYpkHVHGkIoXFnsGd0o/s1600/thumbnail_MotiveOrder_logo_V2_withURL_web_150pxW.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="78" data-original-width="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUCPFuOFl8dd0Xo3hc0G_oAcGjnXyXprHM6XMxcX9DhEg2wKy5jSQWtO4PPLPVkxpP2pmoGHwCRS7_wSX2kFzr9dyjnkIaFIqcs7N_ZBihE8GTvtgI7WDvssny0qYpkHVHGkIoXFnsGd0o/s1600/thumbnail_MotiveOrder_logo_V2_withURL_web_150pxW.png" /></a>I would like to thank <a href="http://motiveorder.com/">motiveorder.com</a> for sponsoring the hardware and development time for this article. <br />
<br />
Following on from my previous post, I am finally in a position to release an alpha version of the driver, unfortunately at this stage only in binary form. Development of the driver has been complicated by the fact that
determining the correct settings for the OV5647 is extremely time consuming given the lack of good documentation.<br />
<br />
The driver supports the following resolutions:<br />
<br />
<span style="color: red;"><span style="font-family: "courier new" , "courier" , monospace;">2592 x 1944 @15 fps</span></span><br />
<span style="color: red;"><span style="font-family: "courier new" , "courier" , monospace;">1920 x 1080 @30 fps</span></span><br />
<span style="color: red;"><span style="font-family: "courier new" , "courier" , monospace;">1280 x 960 @45 fps</span></span><br />
<span style="color: red;"><span style="font-family: "courier new" , "courier" , monospace;">1280 x 720 @60 fps</span></span><br />
<br />
I have added support for 720p because most of the clone cameras seem to be targeting 1080p or 720p based on the lens configuration. I mainly tested with an original RPI V1.3 camera to ensure backward compatibility.<br />
<br />
The driver is pre-compiled against the latest L4T R32.2 release so there is a requirement to deploy a kernel plus modules along with a new dtb file. Therefore I recommend you do some background reading to understand the process before deploying. Furthermore I recommend you have access to the linux console via the UART interface in case the new kernel fails to boot or the camera is not recognised.<br />
<br />
Deployment of the kernel and modules will be done on the Nano itself while flashing of the dtb file has to be done from a Linux machine where the SDK Manager is installed. <br />
<br />
Download <span style="font-family: "courier new" , "courier" , monospace;">nano_ov5647.tar.gz</span> and extract to your nano :<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">mkdir ov5647</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">cd ov5647</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">wget https://drive.google.com/open?id=1qA_HwiLXIAHbQN-TTEU1daEIW9z7R2vy</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">tar -xvf ../nano_ov5647.tar.gz</span><br />
<br />
After extraction you will see the following files:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">-rw-r--r-- 1 user group 291462110 Aug 26 17:23 modules_4_9_140.tar.gz<br />-rw-r--r-- 1 user group 200225 Aug 26 17:26 tegra210-p3448-0000-p3449-0000-a02.dtb<br />-rw-r--r-- 1 user group 34443272 Aug 26 17:26 Image-ov5647</span><br />
<br />
Copy kernel to /boot directory :<br />
<br />
sudo cp <span style="font-family: "courier new" , "courier" , monospace;">Image-ov5647 /boot/</span><span style="font-family: "courier new" , "courier" , monospace;"><span style="font-family: "courier new" , "courier" , monospace;">Image-ov5647</span></span><br />
<br />
Change the boot configuration file to load our kernel by editing /boot/extlinux/extlinux.conf. Comment out the following line and add the new kernel entry, so the change is from this: <br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"> LINUX /boot/Image</span><br />
<br />
to<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"> #LINUX /boot/Image</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> LINUX /boot/Image-ov5647</span><br />
<br />
<br />
Next step is to extract the kernel modules:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">cd /lib/modules/</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">sudo tar -xvf <path to where files were extracted>/modules_4_9_140.tar.gz </span><br />
<br />
<br />
The last step is to flash the dtb file, <span style="font-family: "courier new" , "courier" , monospace;">tegra210-p3448-0000-p3449-0000-a02.dtb.<span style="font-family: "times" , "times new roman" , serif;"> </span></span><i>As discussed in the comments section (below) by <a href="https://www.blogger.com/profile/16618322902593863966?authuser=0">jiangwei</a>, it is possible to copy the dtb file directly to the Nano; refer to this <a href="https://developer.ridgerun.com/wiki/index.php?title=Jetson_Nano/Development/Building_the_Kernel_from_Source#Flash_DTB_from_the_Jetson_device_itself">link</a> for how this can be achieved. See the section "<span class="mw-headline" id="Flash_custom_DTB_on_the_Jetson_Nano">Flash custom DTB on the Jetson Nano"</span></i><br />
<br />
Alternatively you can use the SDK Manager; flashing requires copying the dtb file to the Linux host machine into the directory <span style="font-family: "courier new" , "courier" , monospace;">Linux_for_Tegra/kernel/dtb</span>/ where your SDK is installed. Further instructions on how to flash the dtb are covered in a post I made <a href="https://devtalk.nvidia.com/default/topic/1050427/jetson-nano/enabling-spidev-on-the-jetson-nano-is-hanging-when-flashing/3">here</a>, however since we don't want to replace the kernel the command to use is:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">sudo ./flash.sh --no-systemimg -r -k DTB jetson-nano-qspi-sd mmcblk0p1</span><br />
<br />
There seems to be some confusion about how to put the nano into recovery mode. The steps to do that are:<br />
<br />
<span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;">1. Power down nano<br />2. J40 - Connect recovery pins 3-4 together<br />3. Power up nano<br />4. J40 - Disconnect pins 3-4<br />5. Flash file</span></span><br />
<span style="background-color: #dedede; color: #333333; display: inline; float: none; font-family: "dinwebpro" , "trebuchet" , "trebuchet ms" , "helvetica" , "arial" , sans-serif; font-size: 14px; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><br /></span>
After flashing the dtb the nano should boot the new kernel and hopefully the desktop will reappear. To verify the new kernel we can run the following command:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">uname -a</span><br />
<br />
It should report the kernel version as 4.9.140+ :<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">Linux jetson-desktop 4.9.140+ </span><br />
<br />
If successful, power down the Nano and now you can connect your camera to the FPC connector J13. Power up the Nano and once the desktop reappears verify the camera is detected by:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">dmesg | grep ov5647</span><br />
<br />
It should report the following:<br />
<br />
<span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;">[ 3.584908] ov5647 6-0036: tegracam sensor driver:ov5647_v2.0.6<br />[ 3.603566] ov5647 6-0036: Found ov5647 with model id:5647 process:11 version:1<br />[ 5.701298] vi 54080000.vi: subdev ov5647 6-0036 bound</span></span><br />
<br />
<br />
The above indicates the camera was detected and initialised. Finally we can try streaming; commands for the different resolutions are below:<br />
<br />
<span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;">#2592x1944@15fps</span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=2592, height=1944, framerate=15/1' ! nvvidconv flip-method=0 ! 'video/x-raw,width=2592, height=1944' ! nvvidconv ! nvegltransform ! nveglglessink -e</span><br />
<span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;"><br /></span></span>
<span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;">#1920x1080@30fps</span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=1920, height=1080, framerate=30/1' ! nvvidconv flip-method=0 ! 'video/x-raw,width=1920, height=1080' ! nvvidconv ! nvegltransform ! nveglglessink -e</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: blue;">#1280x960@45fps<br /><span style="color: black;">gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=1280, height=960, framerate=45/1' ! nvvidconv flip-method=0 ! 'video/x-raw,width=1280, height=960' ! nvvidconv ! nvegltransform ! nveglglessink -e</span></span></span><br />
<br />
<span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;">#1280x720@60fps</span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=1280, height=720, framerate=60/1' ! nvvidconv flip-method=0 ! 'video/x-raw,width=1280, height=720' ! nvvidconv ! nvegltransform ! nveglglessink -e</span><br />
<br />
The driver supports controlling the analogue gain, which has a range of 16 to 128. This can be set using the 'gainrange' property, example below:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">gst-launch-1.0 nvarguscamerasrc gainrange="16 16" ! 'video/x-raw(memory:NVMM),width=1280, height=720, framerate=60/1' ! nvvidconv flip-method=0 ! 'video/x-raw,width=1280, height=720' ! nvvidconv ! nvegltransform ! nveglglessink -e</span><br />
<br />
If you require commercial support please contact motiveorder.com.Unknownnoreply@blogger.com35tag:blogger.com,1999:blog-2207190930236349395.post-37578435252771926652019-06-23T08:48:00.002-07:002019-08-30T03:03:04.197-07:00Jetson Nano - Developing a Pi v1.3 camera driver Part 1<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVmbXlL6MHU4pUvw7RUFKsKDRsL5pOBRclVG7JR5mgLnrHTm-emNaIvlcHv27r0ybffUTn4omybllhPIrh5Q1MhnZE6SZLKb-i9AR7KW65y1Jl2qI0m71IAjjbL9h0rWXC696Xi_HgVEbb/s1600/thumbnail_MotiveOrder_logo_V2_withURL_web_150pxW.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="78" data-original-width="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVmbXlL6MHU4pUvw7RUFKsKDRsL5pOBRclVG7JR5mgLnrHTm-emNaIvlcHv27r0ybffUTn4omybllhPIrh5Q1MhnZE6SZLKb-i9AR7KW65y1Jl2qI0m71IAjjbL9h0rWXC696Xi_HgVEbb/s1600/thumbnail_MotiveOrder_logo_V2_withURL_web_150pxW.png" /></a></div>
I would like to thank <a href="http://motiveorder.com/">motiveorder.com</a> for sponsoring the hardware and development time for this article. <br />
<br />
The Jetson Nano is a fairly capable device considering its appealing price point. In fact it's one of the few ARM devices which out of the box provides a decent (and usable) X11 graphics stack (even though the drivers are closed source).<br />
Although the Jetson Nano supports the same 15 pin CSI connector as the RPI, camera support is currently limited to Pi V2 cameras which host the imx219. The older Pi v1.3 cameras are appealing partly because there are numerous low cost clones available and partly because there are numerous add-ons such as lenses and night mode options.<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi47pC3TM67upv6q35nVqJlI6pAGcjmibMnaYALbAuqirAGATDQ41gCYK6o793fB47Nxvp0mqFsO1mjMUaPPoovvWepoOf6dCyctewVZ7KG0tcp0W0rsBzyM3Flk8XV-asuSHsL1Z003SdF/s1600/rpi_camera_1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="410" data-original-width="382" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi47pC3TM67upv6q35nVqJlI6pAGcjmibMnaYALbAuqirAGATDQ41gCYK6o793fB47Nxvp0mqFsO1mjMUaPPoovvWepoOf6dCyctewVZ7KG0tcp0W0rsBzyM3Flk8XV-asuSHsL1Z003SdF/s200/rpi_camera_1.png" width="186" /></a>The v1.3 cameras uses the OV5647 which apparently is discontinued by OmniVision furthermore the full datasheet isn't freely available (only under NDA). There is a <a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=2ahUKEwj9ocWj4__iAhU_QEEAHQBUBJAQFjAAegQIAxAC&url=https%3A%2F%2Fcdn.sparkfun.com%2Fdatasheets%2FDev%2FRaspberryPi%2Fov5647_full.pdf&usg=AOvVaw1b2hRYt-jJ-2Rwj9epS6OA">preliminary datasheet</a> on the internet but it seems to be incomplete or worse inconsistent in places. This does hinder the process some what as debugging errors can be very time consuming and at time frustrating.<br />
<br />
One noticeable difference is that the v1.3 camera hosts a 25MHz crystal where most non-RPI OV5647 boards use a standard 24MHz. This can make the tuning more difficult as some of the default settings need adjustments.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYAy8HfssIX_wCZgDOmQZHnuCbGki2gGYv3HcyzAYKIVF7OWj7tLGD3POLgHNO5Luww0pU-_MCCIaZpKXm9klO0TR06XB4rKJLhXFjtiX-ABpT7BMkUg_hk7waXgElb9DhacFeDgjfKnnU/s1600/rpi_camera_2.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="428" data-original-width="377" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYAy8HfssIX_wCZgDOmQZHnuCbGki2gGYv3HcyzAYKIVF7OWj7tLGD3POLgHNO5Luww0pU-_MCCIaZpKXm9klO0TR06XB4rKJLhXFjtiX-ABpT7BMkUg_hk7waXgElb9DhacFeDgjfKnnU/s200/rpi_camera_2.png" width="175" /></a></div>
<br />
The first step in bringing up the camera was ensuring the board was powered on so that it could be detected through its i2c interface (address 0x36). After numerous attempts the OV5647 finally appeared:<br />
<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;">Warning: Can't use SMBus Quick Write command, will skip some addresses<br />WARNING! This program can confuse your I2C bus, cause data loss and worse!<br />I will probe file /dev/i2c-6.<br />I will probe address range 0x03-0x77.<br />Continue? [Y/n] Y<br /> 0 1 2 3 4 5 6 7 8 9 a b c d e f<br />00: <br />10: <br />20: <br />30: -- -- -- -- -- -- UU -- <br />40: <br />50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- <br />60: <br />70: </span></span><br />
<br />
The second step was to develop enough of a skeleton kernel driver to initialise the OV5647 and enable it as a v4l2 device. Although this may sound easy, it turned out to be extremely time consuming for two reasons. Firstly, due to the lack of documentation for the OV5647, and secondly, the NVIDIA <a href="https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%2520Linux%2520Driver%2520Package%2520Development%2520Guide%2Fjetson_xavier_camera_soft_archi.html%23">camera driver documentation</a> is also poor and in a number of cases the documentation doesn't match the code. Finally after a few weeks a v4l2 device appeared:<br />
<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;">jetson-nano@jetsonnano-desktop:~$ v4l2-ctl -d /dev/video0 -D<br />Driver Info (not using libv4l2):<br /> Driver name : tegra-video<br /> Card type : vi-output, ov5647 6-0036<br /> Bus info : platform:54080000.vi:0<br /> Driver version: 4.9.140<br /> Capabilities : 0x84200001<br /> Video Capture<br /> Streaming<br /> Extended Pix Format<br /> Device Capabilities<br /> Device Caps : 0x04200001<br /> Video Capture<br /> Streaming<br /> Extended Pix Format</span></span><br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1W5XTlfgAkZEq9vFoiYf0BfhBVGK-1pEZ-10NgqnkEZCF8VNn383OcsGNfxrtG9QY0Utr4OWuEMOqHNH7UJUqU8XNeqgQf-_7tQCxro5gXEJCZH5EQqc6I2xUmmt5EXUOWCGGG80Bm6el/s1600/IMG_20190620_195329.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="1200" data-original-width="1600" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1W5XTlfgAkZEq9vFoiYf0BfhBVGK-1pEZ-10NgqnkEZCF8VNn383OcsGNfxrtG9QY0Utr4OWuEMOqHNH7UJUqU8XNeqgQf-_7tQCxro5gXEJCZH5EQqc6I2xUmmt5EXUOWCGGG80Bm6el/s200/IMG_20190620_195329.jpg" width="200" /></a><br />
The next step was to put the camera in test pattern mode and capture a raw image. The OV5647 outputs raw Bayer format, in our case 10 bit, so the captured raw data file needs to be converted to a displayable format. Conversion can be done using a utility like <a href="https://github.com/jdthomas/bayer2rgb">bayer2rgb</a>. Finally I arrived at a valid test pattern.<br />
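<p>For reference, a raw frame can be grabbed from the v4l2 device with something along these lines (the pixelformat fourcc depends on how the driver registers the 10-bit Bayer mode, so RG10 here is an assumption):</p>
<pre>v4l2-ctl -d /dev/video0 --set-fmt-video=width=1920,height=1080,pixelformat=RG10 \
         --stream-mmap --stream-count=1 --stream-to=frame.raw</pre>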
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBlvPXb-kMpIBVGtPWlw9kyimuvguxuTAo95L38RyMaqLAMfIrggjkSFapgj9CAS6btNeSIvdoO-PBXztNXcwZ_Rpfp6iNCwWQLv_XdN8DzJbNsW-wIh7FicwIG8ZT8g7qZxscJ_bXDPvL/s1600/data_1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="1200" data-original-width="1600" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBlvPXb-kMpIBVGtPWlw9kyimuvguxuTAo95L38RyMaqLAMfIrggjkSFapgj9CAS6btNeSIvdoO-PBXztNXcwZ_Rpfp6iNCwWQLv_XdN8DzJbNsW-wIh7FicwIG8ZT8g7qZxscJ_bXDPvL/s200/data_1.png" width="200" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha7bcRhdAOvt0F6YB_dNXg8XqYB5emnO-jxtO0OfBfjd7eXguIBasP4hNQFf9UNVZIgEdfrc5x_6gKyd-ocpXVvSHGtGJqnMEVIGGmMdVFxfM1fdoXSZUqN8mmiE2tWhvaGzyBmXSphM9a/s1600/data_5.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="1200" data-original-width="1600" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha7bcRhdAOvt0F6YB_dNXg8XqYB5emnO-jxtO0OfBfjd7eXguIBasP4hNQFf9UNVZIgEdfrc5x_6gKyd-ocpXVvSHGtGJqnMEVIGGmMdVFxfM1fdoXSZUqN8mmiE2tWhvaGzyBmXSphM9a/s200/data_5.png" width="200" /></a>Next stage was to configure the OV5647 to a valid resolution for image capture again which has been extremely challenging for the reasons stated above. Some of the images from numerous attempts are shown on the left and right.<br />
<br />
Current progress is that the camera is outputting 1920x1080@30fps, however this is work in progress as the driver is in a primitive state and the output format requires further improvements. On the plus side it is now possible to stream with the nvarguscamerasrc gstreamer plugin. Below is a 1080p recording from the OV5647 with a pipeline based on nvarguscamerasrc and nvv4l2h264enc.<br />
<br />
<b><span style="color: blue;">Update: In my 2nd <a href="https://jas-hacks.blogspot.com/2019/08/jetson-nano-developing-pi-v13-camera.html">post</a> we have a driver that you can test with. </span></b><br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.blogger.com/video.g?token=AD6v5dyxP16pYvgIKIe9N8v5uhe46I5oqVr32OyXpJ5eSm3-I17D6czflm1c3ElFMnlqklYN2TtlNfM6vqerYLFMhg' class='b-hbp-video b-uploaded' frameborder='0'></iframe><br />
<br />
<br />
<span style="color: black; display: inline; float: none; font-family: "trebuchet ms" , "arial" , "freesans" , sans-serif; font-size: x-small; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"></span>Unknownnoreply@blogger.com5tag:blogger.com,1999:blog-2207190930236349395.post-20297455681705871812019-03-29T04:45:00.000-07:002019-03-29T04:45:06.793-07:00Machine learning with the i.MX6 and the Intel NCS2<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/m-jvQBvpuLY/0.jpg" src="https://www.youtube.com/embed/m-jvQBvpuLY?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/iCvgdNPr_7g/0.jpg" src="https://www.youtube.com/embed/iCvgdNPr_7g?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<div style="text-align: center;">
<br /></div>
<br />
Last October Intel released an upgraded Neural Compute Stick known as the NCS2, hosting the <a href="https://www.movidius.com/MyriadX">Movidius Myriad X VPU</a> (MA2485). Intel claim <i>"NCS 2 delivers up to eight times the performance boost compared to the previous generation NCS"</i>. Intel also provide <a href="https://software.intel.com/en-us/openvino-toolkit">OpenVINO</a>, an open visual inference and neural network optimization toolkit with multi-platform support for Intel based hardware. With release R5 of OpenVINO, support was added for the NCS2/NCS and the ARMv7-A CPU architecture through the introduction of library support for Raspberry Pi boards. As a progression from my previous <a href="https://jas-hacks.blogspot.com/2018/03/machine-learning-with-imx6-solox-and.html">post</a>, this gives us the opportunity to test the NCS2 with OpenVINO on the i.mx6 platform. The first video above shows the security_barrier_camera_demo sample and the second is running the vehicle-detection-adas-0002.xml model. These are executed on an imx6q board (<a href="http://www.bcmcom.com/product_ARM_Motherboards.html">BCM AR6MXQ</a>). <br />
<br />
<br />
<div style="text-align: center;">
</div>
<div style="text-align: center;">
</div>
<div style="text-align: center;">
</div>
<div style="text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/8QDmVRUH3BA/0.jpg" src="https://www.youtube.com/embed/8QDmVRUH3BA?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<div style="text-align: center;">
<br /></div>
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3qCeGEcyso1POTQZ_qs9OrQqqZ05Eyb7KakPp6uM5FmkZnZp9-xxrdH2BeaGqllzpacGJZ1oFvI3_FKXWgUHzWOPI1GEHhDhsC6MQU2fMswekGLOdebFPvm8TRsZOlN8MhoLh7MwLNns4/s1600/mpcie2u-r01.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="686" data-original-width="998" height="136" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3qCeGEcyso1POTQZ_qs9OrQqqZ05Eyb7KakPp6uM5FmkZnZp9-xxrdH2BeaGqllzpacGJZ1oFvI3_FKXWgUHzWOPI1GEHhDhsC6MQU2fMswekGLOdebFPvm8TRsZOlN8MhoLh7MwLNns4/s200/mpcie2u-r01.png" width="200" /></a> <br />
To maximise performance the NCS2 should ideally be connected to a USB 3.0 port. Unfortunately the i.mx6 doesn't host native support for USB 3.0, however most of the i.mx6 range does support a PCIe interface. So our plan was to deploy a mini PCIe to USB 3.0 card, in our case using the NEC UPD720202 chipset. Using PCIe also alleviates saturating the USB bus when testing inference with a USB camera.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh27NaJfATdydF-okPSxIZBvkY7JDsDcDJkYbpeO0v1k5CZAirVQYn1vZLAx1EfBeqHaZ0XkqVdqMLQBdwxbKZ5otQ5b0iAJzRHm4OkId5_nu3S0kk29yRomYtBxncx7M_iuCuON09zxjpI/s1600/IMG_20190324_104658.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1200" data-original-width="1600" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh27NaJfATdydF-okPSxIZBvkY7JDsDcDJkYbpeO0v1k5CZAirVQYn1vZLAx1EfBeqHaZ0XkqVdqMLQBdwxbKZ5otQ5b0iAJzRHm4OkId5_nu3S0kk29yRomYtBxncx7M_iuCuON09zxjpI/s320/IMG_20190324_104658.jpg" width="320" /></a></div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNR_Z5lulgSvU00GeAhgAs7Z1811ydOa4XX2tdGru-bfFikr9KhSLMSwC45TSCB8rFXpOUzC04qdkBt84Ih3-53kb1ofFYZNxb_AqJr2jdD2yUSaZ66hEbHw8HG7PF7b24TRWyCxB9jRco/s1600/usb3_0_adapter.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="472" data-original-width="622" height="151" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNR_Z5lulgSvU00GeAhgAs7Z1811ydOa4XX2tdGru-bfFikr9KhSLMSwC45TSCB8rFXpOUzC04qdkBt84Ih3-53kb1ofFYZNxb_AqJr2jdD2yUSaZ66hEbHw8HG7PF7b24TRWyCxB9jRco/s200/usb3_0_adapter.png" width="200" /></a><br />
The target board for testing was the <a href="http://www.bcmcom.com/product_ARM_Motherboards.html">BCM AR6MXQ</a> which hosts an on-board mini-PCIe interface. The mini PCIe card hosts a 20 pin USB connector and a SATA connector for USB power. We used an adapter card to expose two USB 3.0 ports, hence the NCS2 ending up in an upright position.<br />
<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhe6NK3-l7lWN57aECpCUQ8Pt9veg5A3rZlrLEamHLtCsLtRf-jSDZu6FgdDFpAk1fkV7yJgdzvr64-Zfe_BmupvYHjdjTtX71cJMUqAkz7vEiAGugPSWpfNirSU4T1coBJrVkoOrhBOAfK/s1600/IMG_20190324_104617.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="1200" data-original-width="1600" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhe6NK3-l7lWN57aECpCUQ8Pt9veg5A3rZlrLEamHLtCsLtRf-jSDZu6FgdDFpAk1fkV7yJgdzvr64-Zfe_BmupvYHjdjTtX71cJMUqAkz7vEiAGugPSWpfNirSU4T1coBJrVkoOrhBOAfK/s200/IMG_20190324_104617.jpg" width="200" /></a>OpenVINO provides a easy to use interface to OpenCV via python and C++. In our case for a embedded platform C++ is best suited for optimum performance. Testing was done using a number of the existing OpenVINO samples with the primary code modification being to accelerate resizing the camera input and rendering of the OpenCV buffer to screen.<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhY-SyKOpmaLiXINy60h-lRKID4jJKKDRYSIbufRV9UrvePYjxn88BxVYuTqfu1Ftq45dMBpzjpPRpvxwnygC5luWb9_OrRNOYvOrkCDBa_14jFxNCYYi7h-zbcAJ07U7k1RdjOmBk2tTgR/s1600/Screenshot+from+2019-03-24+11%253A57%253A38.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="675" data-original-width="926" height="145" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhY-SyKOpmaLiXINy60h-lRKID4jJKKDRYSIbufRV9UrvePYjxn88BxVYuTqfu1Ftq45dMBpzjpPRpvxwnygC5luWb9_OrRNOYvOrkCDBa_14jFxNCYYi7h-zbcAJ07U7k1RdjOmBk2tTgR/s200/Screenshot+from+2019-03-24+11%253A57%253A38.png" width="200" /></a>The face recoginition video above is using object_detection_demo_ssd_async with model face-detection-retail-0004.xml model and is rated <a href="https://github.com/opencv/open_model_zoo/blob/master/intel_models/index.md">1.067 GFLOPs Complexitiy</a>. NCS2 interference times average 22ms although the model lacks some accuracy with its ability not to distinguish between a human face and 'Dora'. The overall fps rate at 19 is pretty good. In regards to CPU usage on a i.mx6q only one of the 4 cores is fully occupied as suggested by the output of 'top'.<br />
<br />
What is nice about OpenVINO is that we can easily compare these benchmarks against the original NCS by simply plugging it in and re-running the test. <br />
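<p>The swap is painless because device selection is just a string. A minimal sketch using the R5-era OpenVINO C++ API (model paths are placeholders, input/output blob handling omitted):</p>
<pre>#include <inference_engine.hpp>
using namespace InferenceEngine;

int main() {
    // Load the IR model produced by the model optimizer (paths are placeholders)
    CNNNetReader reader;
    reader.ReadNetwork("face-detection-retail-0004.xml");
    reader.ReadWeights("face-detection-retail-0004.bin");

    // "MYRIAD" covers both the NCS and NCS2, so the same binary runs on either stick
    InferencePlugin plugin = PluginDispatcher({""}).getPluginByDevice("MYRIAD");
    ExecutableNetwork network = plugin.LoadNetwork(reader.getNetwork(), {});
    InferRequest request = network.CreateInferRequest();
    request.Infer();   // input blobs would be filled before this call
    return 0;
}</pre>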
<br />
<span style="-webkit-text-stroke-width: 0px; background-color: white; color: #24292e; display: inline !important; float: none; font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol"; font-size: 16px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"></span> <br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuSvFSa8CWeyFmcQjhB85c5ObFECmuZNHoubKKdDf35Iw4adLsCUWLoVPaNdz3oBWXATO-AQ-QZSGmBKL_3A8-yjhiTY-wmm_F94ul6L0TUpdya9EsIUmk9s7hO-AQpY3rVRsVw2aXa8Cm/s1600/IMG_20190324_133658.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1200" data-original-width="1600" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuSvFSa8CWeyFmcQjhB85c5ObFECmuZNHoubKKdDf35Iw4adLsCUWLoVPaNdz3oBWXATO-AQ-QZSGmBKL_3A8-yjhiTY-wmm_F94ul6L0TUpdya9EsIUmk9s7hO-AQpY3rVRsVw2aXa8Cm/s320/IMG_20190324_133658.jpg" width="320" /></a></div>
<br />
As shown above, inference times rise from 22 to 62 ms on the original NCS, although from our testing the trade-off for the faster NCS2 seems to be a rise in power consumption and heat dissipation between the two releases of the NCS.<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMxUV3bGxupCh0viv6NHWSqzXryROxl5G4pyToanv1G7D8oDK09DFLgIK2vsMYPthzFAUKvuy2IDlOK9ATz-a4Xw7qmFCGUYUIMBaWzHVNAu4QaqkXZ-EmLcqbp3OSh9yunWyHHZZcMVzU/s1600/thumbnail_MotiveOrder_logo_V2_withURL_web_150pxW.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="78" data-original-width="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMxUV3bGxupCh0viv6NHWSqzXryROxl5G4pyToanv1G7D8oDK09DFLgIK2vsMYPthzFAUKvuy2IDlOK9ATz-a4Xw7qmFCGUYUIMBaWzHVNAu4QaqkXZ-EmLcqbp3OSh9yunWyHHZZcMVzU/s1600/thumbnail_MotiveOrder_logo_V2_withURL_web_150pxW.png" /></a><br />
<br />
I would like to thank <a href="http://motiveorder.com/">motiveorder.com</a> for sponsoring the hardware and development time for this article. <br />
<br />
<br />
<br />
<br />Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-2207190930236349395.post-27037088807157517632018-04-22T11:15:00.003-07:002018-04-23T09:58:08.709-07:00ESP32 - Enabling RS485 half duplex UART supportAlthough the ESP32 UART claims to support RS485, the documentation provided in the <a href="https://espressif.com/sites/default/files/documentation/esp32_technical_reference_manual_en.pdf">Technical Reference Manual</a> is pretty poor as the register and interrupt descriptions are extremely vague in terms of each feature and its purpose. This is further exacerbated by the fact there are no examples provided in the SDK. The subject has been widely discussed in an <a href="https://www.esp32.com/viewtopic.php?t=1858">espressif forum thread</a> and a <a href="https://github.com/espressif/esp-idf/pull/667">pull request</a> was submitted. Unfortunately the main problem with that solution was that spurious break characters were observed in the RX FIFO and needed to be filtered. Another issue is that toggling of the RTS pin happens within the RX interrupt handler, so we can't control it at an application level. <br />
<br />
Having spent quite a few (or too many) hours debugging the UART behaviour in RS485 mode, I think I have an improved implementation for a driver. It's important to note that the driver only supports half duplex mode, i.e. only one node on the RS485 bus can transmit at any time. See the commits in my <a href="https://github.com/mtx512/esp-idf">fork</a>; the main changes are:<br />
<br />
1. The RTS pin is toggled outside of the driver code, i.e. in the application code, therefore both data direction and auto direction transceivers are supported.<br />
2. Spurious break characters shouldn't occur in the RX FIFO.<br />
3. Enabling of RS485 interrupts; currently only collision detection is implemented. Further work is required to correctly handle the framing and parity error interrupts.<br />
<br />
The <i>uart_rs485_echo</i> example (under examples/peripherals/) provides a simple demonstration of its use. The example receives data from the UART and echoes the same data back to the sender. After configuring and enabling the UART we can toggle the RTS pin using the existing uart_set_rts function to drive the data direction pin of the transceiver; note this is optional and can be disabled in the code for auto direction transceivers.<br />
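<p>As a rough sketch of the initialisation flow (pin numbers are placeholders; the <span style="font-family: courier;">uart_set_mode()</span> call reflects the RS485 interface that later landed in mainline ESP-IDF, while in the fork described here the RTS pin is driven from application code via <span style="font-family: courier;">uart_set_rts()</span>):</p>
<pre>#include "driver/uart.h"

#define RS485_TXD  17   /* placeholder pins */
#define RS485_RXD  16
#define RS485_RTS  18   /* drives DE/!RE on the transceiver */

void rs485_init(void)
{
    uart_config_t cfg = {
        .baud_rate = 115200,
        .data_bits = UART_DATA_8_BITS,
        .parity    = UART_PARITY_DISABLE,
        .stop_bits = UART_STOP_BITS_1,
        .flow_ctrl = UART_HW_FLOWCTRL_DISABLE,  /* RTS is ours to toggle */
    };
    uart_param_config(UART_NUM_1, &cfg);
    uart_set_pin(UART_NUM_1, RS485_TXD, RS485_RXD, RS485_RTS, UART_PIN_NO_CHANGE);
    uart_driver_install(UART_NUM_1, 256, 0, 0, NULL, 0);
    uart_set_mode(UART_NUM_1, UART_MODE_RS485_HALF_DUPLEX);

    uart_set_rts(UART_NUM_1, 1);  /* default to receive direction */
}</pre>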
<br />
The example has been tested with the following boards :<br />
<br />
1. SparkFun Transceiver RS485 Breakout board which hosts an SP3485 transceiver.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLMp5-7pjX_usnCGsSESOdsowGnuR3o5l1kPGUv4FdaCjqP_k6vC_wWJkKuOw-gD8CUzd0aXE-jzQm7MBg_tItWwsZuBvNZlR34uVApdzlLC9dMdEjVST-v6IkMXknM-SHbLtYMh5sCLW8/s1600/sp3485_breakout.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="311" data-original-width="262" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLMp5-7pjX_usnCGsSESOdsowGnuR3o5l1kPGUv4FdaCjqP_k6vC_wWJkKuOw-gD8CUzd0aXE-jzQm7MBg_tItWwsZuBvNZlR34uVApdzlLC9dMdEjVST-v6IkMXknM-SHbLtYMh5sCLW8/s200/sp3485_breakout.png" width="168" /></a></div>
<br />
2. XY-017 RS485 to TTL Module which is an auto direction transceiver with unmarked ICs.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjD0veJqArL6Qnzi4uD7A8p3SdLQTAXz8SzMt-WlHbiXQABRPEWtpSboGMROYmK8E1wz14uyxnecOi17wOS5q0mjyd6Njmpuj4i4IxUnG5I4fS_06wVqYqxr_LIXYVgP13MIGlBJbRni6wH/s1600/XY-017_RS485.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="238" data-original-width="596" height="127" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjD0veJqArL6Qnzi4uD7A8p3SdLQTAXz8SzMt-WlHbiXQABRPEWtpSboGMROYmK8E1wz14uyxnecOi17wOS5q0mjyd6Njmpuj4i4IxUnG5I4fS_06wVqYqxr_LIXYVgP13MIGlBJbRni6wH/s320/XY-017_RS485.png" width="320" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-2207190930236349395.post-38317593111363391412018-03-24T04:12:00.002-07:002018-03-25T03:51:27.061-07:00Machine learning with the i.MX6 SoloX and the Movidius Neural Compute Stick<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/F2wHrbnqHDY/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/F2wHrbnqHDY?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
The i.MX6 SoloX processor is fairly unique in the i.MX6 family, primarily because it co-hosts a single Cortex A9 along with a Cortex M4. The heterogeneous architecture proves very useful for hard real-time processing occurring on the M4 while concurrently running a Linux stack on the A9 (the heterogeneous architecture is also implemented on the i.MX8 line of processors). In previous posts using the UDOO Neo I have covered how these features can be exploited when interfacing different peripheral devices. The processor architecture lends itself nicely to IOT (Internet of Things) Edge devices where sensor capture and data preprocessing/conversion can occur on the device before being forwarded to the cloud, where a richer set of analytic processing can be performed. If we could perform some (or all) of the analytic processing on the edge device then we might dramatically reduce the amount of device data traffic sent to the cloud. Alternatively the edge device could make decisions for itself and not completely rely on the cloud; furthermore it opens up the possibility of the edge device partially functioning when the network isn't available. This concept is known as Edge Analytics.<br />
<br />
A single Cortex A9 practically isn't up to the job of performing intensive analytical processing, especially if we would like to implement a machine learning algorithm. In terms of machine learning techniques, Neural Networks are one branch that has gained considerable popularity in the last few years, primarily because they offer new avenues for the types of analytical processing that can be done, e.g. image recognition or text processing. The Movidius Neural Compute Stick (NCS) is an intriguing concept as it opens up the possibility of deploying deep neural networks on embedded devices. In the video we demonstrate feeding a number of images (loaded from png files) to a caffe GoogLeNet model; for each inference it displays the top matching label and probability score. As a performance enhancement we utilise the PXP engine to perform hardware image resizing and BGRA conversion before feeding a 224 x 224 image to the model for classification. The resized image is also rendered to the screen (using the 2D acceleration). To gain an acceptable level of performance the application was developed in C/C++.<br />
<br />
So, the first challenge was to see if we could get the NCS running with the UDOO Neo (the i.MX6SX board). My starting point was the <a href="https://movidius.github.io/blog/ncs-apps-on-rpi/">Movidius article</a> on deploying on the Raspberry PI. <i>As mentioned in the article, it's important to highlight that training, conversion or profiling of the Neural Network can't be done on the embedded device, i.e. "Full SDK mode". This implies that the Neural Network needs to be trained and converted using a standard PC or cloud environment. Deployment to an embedded device is restricted to "API only mode".</i> This first step turned out to be a challenge mainly because my starting point was Ubuntu 14.04 and not 16. It took a few days to get the correct packages compiled and installed before caffe would compile without errors. The <a href="https://github.com/movidius/ncappzoo">Neural Compute Application Zoo</a> provides a number of sample applications; you can use hello_ncs_cpp or hello_ncs_py to verify the OS can communicate with the NCS. The other gotcha is that the NCS is power hungry and requires a powered usb hub, especially if you have other usb peripherals attached. On the NEO the NCS can be plugged directly into the USB type A socket if you don't have a need for additional peripherals.<br />
<br />
The second step was to see if we could deploy a Neural Network graph on the NCS and perform simple inferences. Most of the sample applications in the 'Zoo' are Python based, with some having a further dependency on OpenCV. Unfortunately running OpenCV and Python on the Neo would introduce too much of a bottleneck with regards to performance (as it would on most low power ARM embedded devices). The two reasons for this are the single A9 core and the fact that the X11 interface doesn't support hardware accelerated graphics. With ARM processors there is a trade off between power and performance, and for 'always on' IOT devices this does become a major deciding factor. Fortunately caffe provides a C++ interface, although there's little documentation available about the API interface. Within the 'Zoo' there is the multistick_cpp C++ application which demonstrates communicating with multiple NCS devices.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitmb5U9laPwqKb43VKDN5E-O1F6tYWo2syVb5p4kRa4oTOYiYUOvB-fTQcaQcqsfXztlzh7cXV_xks8J0ZbGHgWzI2M1wK0i5fR5GxWytC0J_oXns_tmTnh9qi4jGj6qi7BuQd828VgVst/s1600/IMG_20180325_114655.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="1200" data-original-width="1600" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitmb5U9laPwqKb43VKDN5E-O1F6tYWo2syVb5p4kRa4oTOYiYUOvB-fTQcaQcqsfXztlzh7cXV_xks8J0ZbGHgWzI2M1wK0i5fR5GxWytC0J_oXns_tmTnh9qi4jGj6qi7BuQd828VgVst/s320/IMG_20180325_114655.jpg" width="320" /></a></div>
My starting point was altering multistick_cpp to use one stick with the GoogLeNet model. In multistick_cpp, after the GoogLeNet graph is loaded, subsequent processing can be broken down into two further steps. Firstly it loads, resizes and converts to BGRA a png image file, and secondly it sends the image data to the NCS for inference and finally displays the result. Sample timings for each step running on the Neo are shown below.<br />
<br />
<span style="font-family: "trebuchet ms" , sans-serif;">1. Load png, resize and convert : approximately 800 milliseconds</span><br />
<span style="font-family: "trebuchet ms" , sans-serif;">2. NCS inference : approximately 130 milliseconds</span><br />
<br />
We can't do much about Step 2 without re-tuning (a redesign of) the Neural Network or reducing the image size (possibly leading to less accuracy). Step 1 is slow because the file images are roughly around 800x800 pixels and software resizing to 224 x 224 is painfully slow. Fortunately we can address the resizing and conversion time in Step 1: the i.MX6SX contains an image processing unit known as PXP (Pixel Pipeline) which can rescale and perform colour space conversions on graphics buffers. I re-factored the code in step 1 as below:<br />
<br />
1. Use libpng to read the png file<br />
2. Resize and color space the image using PXP<br />
3. 2D blit re-sized image to screen<br />
<br />
With the above changes sample timings dramatically improved for step 1 (as shown in the video):<br />
<br />
<span style="font-family: "trebuchet ms" , sans-serif;">1. Load png, resize and convert : approximately 233 milliseconds</span><br />
<span style="font-family: "trebuchet ms" , sans-serif;">2. NCS inference : approximately 112 milliseconds</span><br />
<br />
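<p>For completeness, the NCS side of step 2 boils down to a handful of NCSDK 'API only mode' calls. Below is a minimal sketch (error handling omitted; it assumes the compiled graph file has already been read into memory and the image has been converted to the fp16 tensor the graph expects):</p>
<pre>#include <mvnc.h>

/* Minimal NCSDK flow: open the stick, load the graph, run one inference. */
void run_inference(const void *graphData, unsigned int graphLen,
                   const void *img, unsigned int imgLen)
{
    char name[100];
    void *dev, *graph, *result, *userParam;
    unsigned int resultLen;

    mvncGetDeviceName(0, name, sizeof(name));
    mvncOpenDevice(name, &dev);
    mvncAllocateGraph(dev, &graph, graphData, graphLen);

    mvncLoadTensor(graph, img, imgLen, NULL);              /* fp16 image in  */
    mvncGetResult(graph, &result, &resultLen, &userParam); /* scores out     */

    mvncDeallocateGraph(graph);
    mvncCloseDevice(dev);
}</pre>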
Hopefully this article provides a useful introduction to deploying the NCS with the i.MX6 or i.MX7 line of processors. Going forward I would like to get a camera working with the Neo and see what FPS rate we can achieve. The other interesting avenue is deploying SSD MobileNet and using the PXP overlay feature to render matches.<br />
<br />
I would like to thank motiveorder.com for sponsoring the hardware and development time for this article. Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-2207190930236349395.post-34358639490883666252018-02-08T10:56:00.002-08:002018-02-08T10:56:28.031-08:00 i.mx6sx - IMU Sensor fusion with the UDOO NEOA key feature of the UDOO NEO is the hosting of a 9-DOF IMU through the inclusion of the NXP FXOS8700CQ and FXAS21002C sensors. The FXOS8700CQ provides a 6-axis accelerometer and magnetometer and the FXAS21002C provides a 3-axis digital angular rate gyroscope. <i>Note that these sensors are only available on the Extended and Full models. </i>In the video we demonstrate 9-DOF sensor fusion running on the Cortex M4. The M4 fusion output is sent from the serial port and fed to a modified OrientationVisualiser application running on a PC. OrientationVisualiser is a <a href="https://processing.org/download/?processing">Processing</a> application that displays an Arduino-like board in 3D and is part of the NXPMotionSense library; refer to <a href="https://www.pjrc.com/store/prop_shield.html">Teensy's example</a> and <a href="https://github.com/PaulStoffregen/NXPMotionSense">code</a>.<br />
<i></i><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/NbAL-_vt7Mc/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/NbAL-_vt7Mc?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
The combination of the two sensors offers the ability to track absolute orientation with respect to a fixed Earth frame of reference with reasonable accuracy. Orientation can be modelled in 3 dimensions, and most descriptions refer to the analogy of an aircraft, where <i>yaw</i> represents the side to side movement of the nose of the aircraft, <i>pitch</i> represents the up or down movement of the nose, and <i>roll</i> represents the up and down movement of the wing tips. Refer to this <a href="http://www.chrobotics.com/library/understanding-euler-angles">post</a> for a detailed description. For our solution the sensor fusion algorithm implements a Kalman filter. The filter smooths noise from the accelerometer/magnetometer and drift from the gyroscope. We also run magnetic calibration to reduce the effect of hard and soft iron magnetic interference.<br />
<br />
It's important to note that the FXOS8700CQ is mounted on the underside of the PCB, while the axes fed to the fusion algorithm must be aligned to a Cartesian coordinate system that follows the Right Hand Rule (RHR). Therefore the X, Y, Z values read from the FXOS8700CQ need to be adjusted accordingly before being applied to the fusion algorithm (see the sketch below). Although the FXOS8700CQ has been placed at the edge of the board, remember that the PCB spacer hole is only about 1 centimetre away; to reduce magnetic interference on the magnetometer, a metal PCB spacer or screw shouldn't be used in this hole.<br />
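As a minimal sketch of what that adjustment looks like: a board flipped about its X axis sees its Y and Z axes negated. The actual mapping for the NEO depends on how the package is oriented on the PCB, so the signs below are illustrative and must be checked against the schematics.<br />
<pre style="font-family: 'courier new', courier, monospace; font-size: x-small;">
typedef struct { float x, y, z; } vec3_t;

/* Illustrative remap for an underside-mounted sensor: a 180 degree flip
 * about X negates Y and Z. Verify the real mapping against the schematics. */
static vec3_t remap_to_rhr(vec3_t raw)
{
    vec3_t out = { raw.x, -raw.y, -raw.z };
    return out;
}
</pre>
<br />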
<br />
In this example we reallocated I2C4 to the M4 in order to read data from both sensors. In our set-up the FXOS8700CQ output data rate (ODR) was 200Hz while the FXAS21002C was configured for 400Hz. As per previous posts, the code was developed using the i.mx6sx FreeRTOS SDK.<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2207190930236349395.post-81753290341390371222017-12-28T11:27:00.004-08:002017-12-28T11:39:12.441-08:00i.mx6sx - M4 SD Card access on the UDOO NEOIn this post I demonstrate that it is possible to interface the M4 to an SD card shield (similar to using an Arduino SD card shield) in order to retrieve or save data locally (without relying on the A9). This work is the result of a larger data logger project, where the A9 remains in sleep mode to conserve power while the M4 performs data logging from numerous sensors.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/Xx9HmvqID6U/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/Xx9HmvqID6U?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
In the video an SD card shield is interfaced to the UDOO NEO and accessed from the M4. The code initialises the SD card, then mounts and reads the FAT32 partition. Subsequently we read bitmap files from the FAT32 partition and display the contents on the LCD display (320x240). The code is written using the FreeRTOS BSP and executes on the M4 while the A9 boots Linux. Each bitmap file is 230400 bytes; when reading 720 byte blocks throughput is around 230KB/sec, and if we increase the block size to 23040 bytes then throughput rises to around 340KB/sec.<br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgszqkregugUi2vMkfpJzIZUf3jGtxNOMrRpafi6ucpJBqX2x5VmHl9a9oon_UQiPIuzt1h1qgPKfhcNmu31VX3G9wXhnsU8ozZ2QHHky4PkalDGsWuHtxmJCSwNmjdEyc40NwpoDv0sAYf/s1600/wemos_data_logger_5.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" data-original-height="828" data-original-width="1004" height="164" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgszqkregugUi2vMkfpJzIZUf3jGtxNOMrRpafi6ucpJBqX2x5VmHl9a9oon_UQiPIuzt1h1qgPKfhcNmu31VX3G9wXhnsU8ozZ2QHHky4PkalDGsWuHtxmJCSwNmjdEyc40NwpoDv0sAYf/s200/wemos_data_logger_5.jpg" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">NEO Connectivity</td></tr>
</tbody></table>
<br />
I chose to use a WeMos data logger shield over an Arduino SD card shield mainly for the following reasons:<br />
<br />
1. 3.3v compatible<br />
2. An RTC (plus battery backup) is available on the shield, although the accuracy of the DS1307 compared to the DS3231 is questionable.<br />
3. Nice stackable design for inclusion of additional WeMos shields<br />
<div style="text-align: left;">
</div>
<br />
<span style="font-size: small;">The shield (just like the Arduino equivalent) supports an SPI interface. The <a href="https://www.sdcard.org/downloads/pls/index.html">Physical Layer of the SD Card Specification</a> mentions that the primary hardware interface is the SD bus, which is implemented through 4 data lines and one command line. On power up the native operating mode of the card is SD bus, however it is possible to switch the card to SPI bus, which is considered a secondary operating mode. The main disadvantages of SPI mode versus SD mode are:</span><br />
<span style="font-size: small;"><br /></span>
<span style="font-family: inherit;"><span style="font-size: small;">1. The loss of performance: a single data line versus 4 data lines.</span></span><br />
<span style="font-family: inherit;"><span style="font-size: small;">2. Only a subset of the SD commands and functions are supported.</span></span><br />
<span style="font-family: inherit;"><span style="font-size: small;">3. The maximum SPI clock speed is limited to 25MHz regardless of the SD card class.</span></span><br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3ZTYYrnFxy3YiMpa9C1rZufL2voGaAn8DE4nXeEWJT4RKmFHCTiq3RYqqXTER13V3G25Ng08Nj3y5-LxK9Eop0Q2fKsUNn8TIeXfOVOUQg-crSTWx8kVViUrIG9dETNiwtgTVq1TvB2KB/s1600/sd_shield_spi.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="130" data-original-width="280" height="92" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3ZTYYrnFxy3YiMpa9C1rZufL2voGaAn8DE4nXeEWJT4RKmFHCTiq3RYqqXTER13V3G25Ng08Nj3y5-LxK9Eop0Q2fKsUNn8TIeXfOVOUQg-crSTWx8kVViUrIG9dETNiwtgTVq1TvB2KB/s200/sd_shield_spi.jpg" width="200" /></a><span style="font-family: inherit;"><span style="font-size: small;"><br /></span></span><br />
<span style="font-family: inherit;"><span style="font-size: small;">Minimum connectivity from a host MCU for SPI mode requires 3 SPI pins plus a GPIO pin for CS.</span></span><br />
<br />
<br />
<span style="font-family: inherit;"><span style="font-size: small;">From the NEO the shield can be connected to ECSPI 2 plus an arbitrary GPIO pin and 3.3V; see the above (NEO Connectivity) image for wiring.</span></span><br />
<span style="font-family: inherit;"><span style="font-size: small;"></span></span>
<span style="font-family: inherit;"><span style="font-size: small;"></span></span>
<span style="font-family: inherit;"><span style="font-size: small;"></span></span><br />
<span style="font-family: inherit;"><span style="font-size: small;">Power up and initialisation of the card, along with commands and responses, are well documented in the <a href="https://www.sdcard.org/downloads/pls/index.html">Physical Layer of the SD Card Specification</a>. After powering up the card it should be initialised by applying 74 clock cycles (e.g. sending 10 bytes with 0xff as the payload). This is followed by CMD0 as the first command, which switches the card to SPI mode; a positive R1 response will contain 0x01. The next step is to interrogate SD version support by sending CMD8, and lastly we can use ACMD41 to set or determine (a sketch of the sequence follows the list below):</span></span><br />
<span style="font-family: inherit; font-size: small;"><span style="font-size: x-small;"><br /></span></span>
<span style="font-size: small;"><span style="font-family: inherit;">1. Card is initialised</span></span><br />
<span style="font-size: small;"><span style="font-family: inherit;">2. Card capacity type (SDHC or SDXC)</span></span><br />
<span style="font-size: small;"><span style="font-family: inherit;">3. Switch to 1.8V signal voltage</span></span><br />
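Below is a minimal sketch of that sequence in C. The command framing and the CRC bytes for CMD0/CMD8 come from the specification; spi_xfer(), cs_assert() and cs_release() are hypothetical board-level helpers standing in for the ECSPI driver calls.<br />
<pre style="font-family: 'courier new', courier, monospace; font-size: x-small;">
#include <stdint.h>

extern uint8_t spi_xfer(uint8_t out);    /* send one byte, return byte read */
extern void cs_assert(void);
extern void cs_release(void);

static uint8_t sd_cmd(uint8_t cmd, uint32_t arg, uint8_t crc)
{
    uint8_t r1 = 0xFF;
    spi_xfer(0x40 | cmd);
    spi_xfer(arg >> 24); spi_xfer(arg >> 16);
    spi_xfer(arg >> 8);  spi_xfer(arg);
    spi_xfer(crc);                       /* CRC only checked for CMD0/CMD8 */
    for (int i = 0; i < 8; i++)          /* card answers within 8 bytes */
        if ((r1 = spi_xfer(0xFF)) != 0xFF)
            break;
    return r1;
}

static int sd_init(void)
{
    cs_release();
    for (int i = 0; i < 10; i++)         /* >= 74 clocks with CS high */
        spi_xfer(0xFF);

    cs_assert();
    if (sd_cmd(0, 0, 0x95) != 0x01)      /* CMD0: enter SPI mode, expect idle */
        return -1;

    uint8_t r1 = sd_cmd(8, 0x1AA, 0x87); /* CMD8: 2.7-3.6V + check pattern */
    if (r1 == 0x01)
        for (int i = 0; i < 4; i++)      /* consume the 4 extra R7 bytes */
            spi_xfer(0xFF);

    do {                                 /* ACMD41 until the card leaves idle */
        sd_cmd(55, 0, 0xFF);             /* CMD55 prefixes every ACMD */
        r1 = sd_cmd(41, 0x40000000, 0xFF); /* HCS bit: host supports SDHC */
    } while (r1 == 0x01);
    return (r1 == 0x00) ? 0 : -1;
}
</pre>
<br />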
<br />
After initialising the card we can interrogate the card's data, for example:<br />
<br />
1. Read the Card Identification (CID) Register, a 16 byte code that contains information that uniquely identifies the SD card, including the card serial number (PSN), manufacturer ID number (MID) and manufacture date (MDT).<br />
<br />
2. Read the Card Specific Data (CSD) Register, which defines the data format, error correction type, maximum data access time, etc.<br />
<br />
<br />
<span style="font-size: small;"></span>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: inherit;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhToBwffjCR7XiYLEOmxlHLhIW9MojAoRSaEFiKjAx0XR3RbEM9RgOcxb5X_4a8ES48s5SErjcSarWXDqJWtw2fLyALrDKE8hpdrFdzmIXl1VMC42he7KUjNYNlyy-8QZXksEpJkUeZ8Azo/s1600/sd_card_fat_dump_1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="936" data-original-width="1600" height="187" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhToBwffjCR7XiYLEOmxlHLhIW9MojAoRSaEFiKjAx0XR3RbEM9RgOcxb5X_4a8ES48s5SErjcSarWXDqJWtw2fLyALrDKE8hpdrFdzmIXl1VMC42he7KUjNYNlyy-8QZXksEpJkUeZ8Azo/s320/sd_card_fat_dump_1.jpg" width="320" /></a></span></div>
<br />
<span style="font-size: small;"><span style="font-family: inherit;">Subsequently I built a simple library to read FAT32 partitions and their file contents, as shown in the above screenshot.<br /><br />In order to improve performance we would need to see if we can enable DMA for the SPI transfers; however, this represents a challenge, as the DMA engine is initialised on the A9, meaning we would need to wait for Linux to boot before accessing it.</span></span><br />
<br /><br />
<br /><span style="font-size: small;"><span style="font-family: inherit;"><br /></span></span>
<span style="font-size: small;"><span style="font-family: inherit;"><br /></span></span>
<span style="font-size: small;"><span style="font-family: inherit;"><br /></span></span>
Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-2207190930236349395.post-19496917808561174232017-07-03T12:47:00.001-07:002017-07-03T12:47:51.146-07:00i.mx6sx - Tiny NEO Scope<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/l7peNaJ0IDQ/0.jpg" src="https://www.youtube.com/embed/l7peNaJ0IDQ?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<br />
<br />
This project sprang from two requirements: firstly, testing the performance of the ADC (analogue to digital) and ECSPI interfaces when running against the M4; secondly, a diagnostic aid to verify that low bandwidth clock signals (<100kHz) from the PWM or clock/GPIO pins were correctly mux'd from the A9 side. The end result is a simple low bandwidth (<100kHz) oscilloscope. In the above video we first demonstrate the capture of a 50kHz PWM (50% duty cycle) generated from the A9 side; see how the scope manages to display the signal and correctly calculate the frequency. Secondly we feed the scope a PWM ranging from 3kHz to 95kHz to demonstrate its ability to track and calculate the input frequency of the signal.<br />
<br />
The imx6sx ADC is rated to produce up to 1 million samples per second, however the Reference Manual isn't particularly clear on the settings required to reach this level of performance. It is clear that the fastest conversions are only possible in 8 bit mode (lowest precision) using the IPG clock. From our testing we achieved roughly 500 thousand samples per second by applying no hardware averaging and no clock divider ratio. By disabling hardware averaging we trade precision for speed in 8 bit mode.<br />
<br />
The scope simply consists of reading an ADC pin (in our case A3), buffering 128 samples and outputting the samples to the OLED display through the ECSPI interface clocked at 8MHz. A primitive trigger mechanism is implemented to catch the start of a rising signal. This is complemented with a high precision EPIT timer to calculate the frequency of the input signal for display. A sketch of the trigger and frequency logic is shown below.<br />
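This is a minimal sketch of the idea, assuming 8-bit samples and a known effective sample rate; the real code timestamps the edges with the EPIT counter rather than counting samples, and its threshold handling is more robust.<br />
<pre style="font-family: 'courier new', courier, monospace; font-size: x-small;">
#include <stdint.h>
#include <stddef.h>

#define THRESHOLD 128  /* mid-scale for 8-bit samples */

/* Return the index of the first rising crossing of THRESHOLD, or -1. */
static int find_trigger(const uint8_t *buf, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (buf[i - 1] < THRESHOLD && buf[i] >= THRESHOLD)
            return (int)i;
    return -1;
}

/* Estimate frequency from the spacing of two consecutive rising edges. */
static uint32_t estimate_freq_hz(const uint8_t *buf, size_t n,
                                 uint32_t sample_rate_hz)
{
    int first = find_trigger(buf, n);
    if (first < 0) return 0;
    int next = find_trigger(buf + first + 1, n - first - 1);
    if (next < 0) return 0;
    return sample_rate_hz / (uint32_t)(next + 1); /* samples per period */
}
</pre>
<br />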
<br />
<br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2207190930236349395.post-34990867657380154682017-06-10T12:30:00.001-07:002017-06-10T12:46:32.210-07:00i.mx6sx - SPI interfacing an OLED display for fast updates using the cortex M4 on the UDOO Neo<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/ei1EpixUArE/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/ei1EpixUArE?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
It has taken me a considerable amount of time to prove that the SPI interface can be made to work with the cortex M4 on the UDOO Neo. To demonstrate the speed of SPI I chose to interface with an SSD1306 OLED, which can be driven at a clock speed of 8MHz. This theoretically allows a complete RAM buffer to be transmitted to the SSD1306 in a very short time period, hence offering fast screen updates. To control the speed of screen updates I hooked up an old potentiometer to the ADC input. By varying the potentiometer we control the rate of updates from slow to fast, as shown in the video. <br />
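To put rough numbers on this (assuming a common 128x64 SSD1306 module): the display RAM is 128 x 64 / 8 = 1024 bytes, i.e. 8192 bits. At an 8MHz SPI clock that is about 8192 / 8,000,000 seconds, or roughly a millisecond per full buffer, so in theory several hundred complete screen refreshes per second before command overhead and interrupt latency are taken into account.<br />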
<br />
The SPI interface is configured for Master mode using interrupts for data transmission, resulting in acceptable performance. To reduce the interrupt overhead further, DMA could be used to transmit the whole RAM buffer in one go.<br />
<br />
I chose to use ECSPI 2 (normally allocated to the A9 side) and not ECSPI 5 because, after reviewing the schematics, I think there is a hardware bug with ECSPI 5: the ECSPI5_SCLK line is shared with the red on-board LED (see SPI3_CLK on the J6 connector).<br />
<br />
This is a C application developed using the i.mx6sx FreeRTOS SDK. The graphics rendering code was converted from the <a href="https://github.com/hwiguna/g33k/blob/master/ArduinoProjects/2015/_Done/099-Arduino_OLED_Display/HariChord/HariChord.ino">HariChord</a> example by hwiguna.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2207190930236349395.post-82173420677136188922017-05-19T11:42:00.001-07:002017-05-19T11:50:52.407-07:00i.mx6sx - One Wire Digital Temperature gauge using DS18B20 + UDOO Neo + LCD<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/tB0AkWVZzrc/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/tB0AkWVZzrc?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
The challenge here was to implement the Dallas one wire protocol so that two DS18B20 sensors could be wired to the cortex M4 to form the basis of a temperature gauge. I recycled the LCD display from the <a href="http://jas-hacks.blogspot.com/2017/04/imx6sx-using-cortex-m4-to-drive-4-touch.html">previous post</a> to display temperature readings using 'ring meter' widgets. The end result is that the M4 is used for reading the DS18B20 sensors and updating the 'ring meter' widgets. In the video we have one sensor in a 3 pin TO-92 package reading room temperature and the other as a waterproof probe dipped into hot/cold water.<br />
<span style="background-color: white; color: #333333; display: inline; float: none; font-family: "whitney ssm a" , "whitney ssm b" , "arial" , "helvetica" , sans-serif; font-size: 16px; font-style: normal; font-weight: normal; letter-spacing: normal; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWApOJ4_G6JmKp8DFZrFcwq1tQmmWkUjZcO1585X2hyphenhyphen1eEQ1Sd9Me5BPh3pnR6FUuSGMnaG-wwqAiAJptLoWHgYI0tTpi7AAvuLbQfV2VJ8u91aMMgR50-5Gjws8QigrLFGmUa-KKsdtwW/s1600/IMG_20170516_190112.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWApOJ4_G6JmKp8DFZrFcwq1tQmmWkUjZcO1585X2hyphenhyphen1eEQ1Sd9Me5BPh3pnR6FUuSGMnaG-wwqAiAJptLoWHgYI0tTpi7AAvuLbQfV2VJ8u91aMMgR50-5Gjws8QigrLFGmUa-KKsdtwW/s320/IMG_20170516_190112.jpg" width="240" /></a>The DS18B20 one wire protocol simply requires a GPIO pin that can be toggled between input and output modes, albeit with timings precise to the microsecond (between 1 and 480 microseconds). The protocol and timings are described in the <a href="https://www.google.co.uk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjKh5mghfXTAhWMBcAKHeTQCAkQFgiFATAA&url=https%3A%2F%2Fdatasheets.maximintegrated.com%2Fen%2Fds%2FDS18B20.pdf&usg=AFQjCNFcz140ptYlFe4x2zjCdaMuJiUxCw">DS18B20 datasheet</a>. To achieve the necessary level of timing precision we utilised the imx6 Enhanced Periodic Interrupt Timer (EPIT) timers sourced from a 24MHz clock. Since there are two DS18B20 sensors on the one wire bus we first query for the addresses of the sensors and then in turn poll each for its temperature, which takes approximately 480 microseconds. A sketch of the bus reset handshake is shown below.<br />
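As an illustration of the timing-critical flavour of the protocol, here is a minimal sketch of the bus reset / presence-detect handshake, with timings taken from the DS18B20 datasheet. ow_drive_low(), ow_release(), ow_read() and delay_us() are hypothetical wrappers around the GPIO direction toggling and the EPIT-based microsecond delays described above.<br />
<pre style="font-family: 'courier new', courier, monospace; font-size: x-small;">
#include <stdbool.h>

extern void ow_drive_low(void);   /* pin as output, driven low */
extern void ow_release(void);     /* pin as input, pulled up   */
extern bool ow_read(void);        /* sample the pin            */
extern void delay_us(unsigned us);

static bool ow_reset(void)
{
    ow_drive_low();
    delay_us(480);             /* reset pulse: hold low for at least 480us */
    ow_release();
    delay_us(70);              /* sensors answer 15-60us after release */
    bool present = !ow_read(); /* a low level is the presence pulse */
    delay_us(410);             /* complete the 480us presence window */
    return present;
}
</pre>
<br />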
<br />Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-2207190930236349395.post-62575859421034136272017-04-30T07:16:00.001-07:002017-05-05T09:54:51.726-07:00i.MX6SX - Using the cortex M4 on a UDOO NEO to drive a $4 Touch ScreenThe challenge here was to see if an ultra-cheap touch screen could be made to work with the cortex M4 on the UDOO NEO. The video (below) demonstrates that it is possible: the M4 is controlling the display and a simple UI allows the toggling of the green LED.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/YEfxjBVWJ0U/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/YEfxjBVWJ0U?feature=player_embedded" width="320"></iframe></div>
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJzqHR7nwPAA0NeNpyMbhXvFltlMxkYno3e5_GnfQhU27r9CXuP9m40HWtM6T8zLIiGZYCb3u9nyP8F1a3uXikpBldG5LT4Fv8LkyKmBBSKNje0P1awuVHkHD3nj3Do7YaAOxa1u3uMsEO/s1600/20-33V-TFT-LCD-Touch-Screen-Breakout-Board-_57+%25281%2529.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJzqHR7nwPAA0NeNpyMbhXvFltlMxkYno3e5_GnfQhU27r9CXuP9m40HWtM6T8zLIiGZYCb3u9nyP8F1a3uXikpBldG5LT4Fv8LkyKmBBSKNje0P1awuVHkHD3nj3Do7YaAOxa1u3uMsEO/s320/20-33V-TFT-LCD-Touch-Screen-Breakout-Board-_57+%25281%2529.jpg" width="320" /></a>I managed to pick up a 2.0" touch screen for around $4; although the screen is on the small side, it still manages to support a resolution of 320x240. Normally these types of screens are sold into a secondary market with the hope that they can be interfaced to an Arduino; their primary use seems to be a PDA or home communication device, given the imprinted icons (which aren't removable).<br />
<br />
<br />
The display supports an 8 bit parallel LCD interface (using the S6D1125) along with a 4 wire resistive touch panel. The downside of using such a display is that it requires a large number of pins; in our case 12 GPIO pins and 2 analogue pins (for the resistive touch). A sketch of how one axis of the touch panel is read is shown below.<br />
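Reading a 4 wire resistive panel is conceptually simple: drive a voltage gradient across one plane and sample the voltage picked up by the other plane with the ADC. The sketch below shows the idea for the X axis; the gpio_* and adc_read() helpers and the pin assignments are hypothetical.<br />
<pre style="font-family: 'courier new', courier, monospace; font-size: x-small;">
#include <stdint.h>

extern void gpio_output_high(int pin);
extern void gpio_output_low(int pin);
extern void gpio_input(int pin);        /* high impedance */
extern uint16_t adc_read(int channel);

#define XP      1   /* X+ */
#define XM      2   /* X- */
#define YM      3   /* Y- */
#define YP_ADC  0   /* Y+ routed to an ADC channel */

static uint16_t touch_read_x(void)
{
    gpio_output_high(XP);    /* voltage gradient across the X plane */
    gpio_output_low(XM);
    gpio_input(YM);          /* Y plane floats and acts as the probe */
    return adc_read(YP_ADC); /* position along X as a fraction of Vcc */
}
</pre>
<br />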
<br />
The code to drive the display was built using the i.mx6sx FreeRTOS SDK.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2207190930236349395.post-41385362935576295812017-03-26T06:13:00.003-07:002017-03-26T12:08:31.484-07:00i.MX6SX - Pulse oximetry with MAX30100, SSD1306 and UDOO NEOIn this blog I demonstrate how the cortex m4 on the imx6sx could be used to develop a pulse oximeter.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/3Jc-bfvOTl8/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/3Jc-bfvOTl8?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
I chose the MAX30100 because it features an integrated pulse oximetry and heart-rate monitor sensor. Unfortunately Maxim provides little documentation or application notes covering how best to determine heart rate or SpO2 values using the data returned from the sensor. The sensor hosts an IR and a RED LED that can be adjusted to provide pulses for SpO2 and heart rate measurements. As a trade-off between adequate data samples, I2C bus speed and post-measurement HR/SpO2 calculation processing, the MAX30100 was configured to return 100 IR and RED values per second. Below is a graph showing raw data values gathered from the NEO at 100 readings per second.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJ6Hax8sTs_MdyQ0mg5ANS25U9bfUWknAHSe7s0PEibiFuwFPZfrFnLvEIypvVbQMUmB9Tc1_vR-A9uA_kiz_m4Vyi4ag2om43axB6ud_skuIwJ2o-BpWXp-6nn4wF1JMJ0iYT5aBl_pG1/s1600/Screenshot+from+2017-02-07+20%253A49%253A16.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="232" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJ6Hax8sTs_MdyQ0mg5ANS25U9bfUWknAHSe7s0PEibiFuwFPZfrFnLvEIypvVbQMUmB9Tc1_vR-A9uA_kiz_m4Vyi4ag2om43axB6ud_skuIwJ2o-BpWXp-6nn4wF1JMJ0iYT5aBl_pG1/s320/Screenshot+from+2017-02-07+20%253A49%253A16.png" width="320" /></a></div>
<br />
<br />
<span style="font-family: inherit;"><span style="font-size: small;">For the SpO2 calculation we have taken a simplified approach of taking the AC components of both signals and determining their ratio. The ratio is referenced in an in-memory table containing empirical SpO2 values. <i>Hence the SpO2 reading isn't clinically accurate; for greater accuracy the table would normally be based on experimental measurements from healthy patients cross-referenced against clinically accurate readings.</i></span></span><br />
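A minimal sketch of that approach follows. The AC amplitude is taken as the peak-to-peak swing over a window of samples, and the lookup table below is a placeholder, not a calibrated curve.<br />
<pre style="font-family: 'courier new', courier, monospace; font-size: x-small;">
#include <stdint.h>
#include <stddef.h>

/* peak-to-peak swing of a window of samples = crude AC amplitude */
static uint32_t ac_amplitude(const uint32_t *s, size_t n)
{
    uint32_t lo = s[0], hi = s[0];
    for (size_t i = 1; i < n; i++) {
        if (s[i] < lo) lo = s[i];
        if (s[i] > hi) hi = s[i];
    }
    return hi - lo;
}

static int spo2_estimate(const uint32_t *red, const uint32_t *ir, size_t n)
{
    /* placeholder: indices 0..9 map a RED/IR ratio of 0.0..1.0 to SpO2 % */
    static const int table[10] = { 100, 99, 98, 97, 96, 95, 93, 91, 89, 87 };
    uint32_t ac_red = ac_amplitude(red, n);
    uint32_t ac_ir  = ac_amplitude(ir, n);
    if (ac_ir == 0) return -1;
    size_t idx = (size_t)((10 * ac_red) / ac_ir);
    if (idx > 9) idx = 9;
    return table[idx];
}
</pre>
<br />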
<span style="font-family: inherit;"><span style="font-size: small;"><span style="-webkit-text-stroke-width: 0px; background-color: white; display: inline ! important; float: none; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><span class="Apple-converted-space"><span style="-webkit-text-stroke-width: 0px; background-color: white; display: inline ! important; float: none; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><span class="Apple-converted-space"><br /></span></span></span></span></span></span>
<span style="font-family: inherit;"><span style="font-size: small;">As shown in the graph, the IR values are smoother than the RED values, possibly due to secondary emissions from the RED LED. For this reason the IR values are normally used to determine the heart rate. The heart rate is calculated by feeding the IR values into a first order 6Hz low pass filter, the output of which is used to calculate the time interval between 2 peaks. Sample output of applying the low pass filter is shown below (<i>ignore the graph labels; the top trace is the IR values, the bottom is the low pass filter output</i>).</span></span><br />
<span style="background-color: white; color: #333333; display: inline; float: none; font-family: "whitney ssm a" , "whitney ssm b" , "arial" , "helvetica" , sans-serif; font-size: 16px; font-style: normal; font-weight: normal; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><span class="Apple-converted-space"><br /></span></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjr3qywYWRyXVplIgTvV5ZJzyFVswBTBEs4hej4f0WUA6plWuE85U9HSNkmqkxQQvXhTtOot_ePhlZcuBnqOyefmbGXCNS3X2qpEuBeGq9jJg4HocL1hvVdEU-rXG3o7VEF-6qv7gsCfozr/s1600/Screenshot+from+2017-02-25+18%253A09%253A41.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="232" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjr3qywYWRyXVplIgTvV5ZJzyFVswBTBEs4hej4f0WUA6plWuE85U9HSNkmqkxQQvXhTtOot_ePhlZcuBnqOyefmbGXCNS3X2qpEuBeGq9jJg4HocL1hvVdEU-rXG3o7VEF-6qv7gsCfozr/s320/Screenshot+from+2017-02-25+18%253A09%253A41.png" width="320" /></a></div>
<span style="background-color: white; color: #333333; display: inline; float: none; font-family: "whitney ssm a" , "whitney ssm b" , "arial" , "helvetica" , sans-serif; font-size: 16px; font-style: normal; font-weight: normal; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><span class="Apple-converted-space"><br /></span></span>
<span style="background-color: white; color: #333333; display: inline; float: none; font-family: "whitney ssm a" , "whitney ssm b" , "arial" , "helvetica" , sans-serif; font-size: 16px; font-style: normal; font-weight: normal; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><span class="Apple-converted-space"><br /></span></span>
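For reference, a first order low pass filter of the kind described above takes only a few lines. The 6Hz cutoff and 100Hz sample rate follow the configuration used here; everything else is a generic sketch. The heart rate then follows from the spacing of successive peaks in the filtered signal: 60 x 100 / N beats per minute, where N is the number of samples between two peaks.<br />
<pre style="font-family: 'courier new', courier, monospace; font-size: x-small;">
#include <math.h>

#define FS 100.0f   /* sample rate in Hz (MAX30100 configured for 100 sps) */
#define FC 6.0f     /* low pass cutoff in Hz */

/* One step of a first order IIR low pass: y += alpha * (x - y),
 * where alpha = dt / (RC + dt) and RC = 1 / (2 * pi * FC). */
static float lpf_step(float x, float *y)
{
    const float dt    = 1.0f / FS;
    const float rc    = 1.0f / (2.0f * (float)M_PI * FC);
    const float alpha = dt / (rc + dt);   /* roughly 0.27 for 6Hz at 100Hz */
    *y += alpha * (x - *y);
    return *y;
}
</pre>
<br />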
<span style="font-size: small;">I also hooked up an SSD1306 OLED display to the same I2C bus so that the calculated heart rate and SpO2 values are displayed. The main challenge of this exercise has been to ensure the code running on the M4 is as efficient as possible, because there are many time critical elements, such as reading data samples, HR/SpO2 calculation and display updates, which can interfere with the output results. As with my previous examples on the imx6sx, this was developed using the FreeRTOS SDK.</span><br />
<br />Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-2207190930236349395.post-78629909452342979942017-01-24T04:55:00.002-08:002017-02-04T10:10:24.321-08:00Adding a DS3231 Real Time Clock to UDOO NEO/QUADIt's well known that the in-built RTC on the imx6 processor isn't the best in terms of battery life. Using an external RTC provides better battery life, and fortunately the process isn't too complicated to implement. The DS3231 is a popular RTC, especially with the RPI community, given its ease of integration (via I2C) and accuracy. There are a few variations of the DS3231 for the RPI; the one I'm using is shown below and can be easily sourced.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-X82BMj2TRFPeT4vipXuo-98TVsRu05enHwlmszu8AnkJY5ehKl71b7z0GAOUSmDO1fsCI0fk45VEtvA6P6lsFxfl-KAAagkouud7UW-Cdu1QBP9q6h3ItY3ZVzPHwuQ-rdPNFxxUvKD7/s1600/DS3231_moduleq_2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-X82BMj2TRFPeT4vipXuo-98TVsRu05enHwlmszu8AnkJY5ehKl71b7z0GAOUSmDO1fsCI0fk45VEtvA6P6lsFxfl-KAAagkouud7UW-Cdu1QBP9q6h3ItY3ZVzPHwuQ-rdPNFxxUvKD7/s200/DS3231_moduleq_2.jpg" width="196" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
In the image I have highlighted the pin out to simplify wiring. I'm going to take the UDOO NEO as an example and use I2C4 (alternatively you can use I2C2). For <a href="http://www.udoo.org/docs-neo/Hardware_&_Accessories/I2C_bus.html">I2C4</a> wire SDA to pin 35 and SCL to pin 34 on header J5; 3.3V and GND are available on J7. On power up you can verify the DS3231 is visible by executing:<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">udooneo:~# i2cdetect 3</span></span><br />
<br />
which should return the DS3231 at address 0x68.<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">WARNING! This program can confuse your I2C bus, cause data loss and worse!<br />I will probe file /dev/i2c-3.<br />I will probe address range 0x03-0x77.<br />Continue? [Y/n] Y<br /> 0 1 2 3 4 5 6 7 8 9 a b c d e f<br />00: -- -- -- -- -- -- -- -- -- -- -- -- -- <br />10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- UU -- <br />20: UU -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- <br />30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- <br />40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- <br />50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- <br />60: -- -- -- -- -- -- -- -- UU -- -- -- -- -- -- -- <br />70: -- -- -- -- -- -- -- --</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> </span></span>
<br />
The next step is to enable kernel support by selecting the Dallas/Maxim DS1307 driver, as below.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhq1-Go8YMhPQkBg2K-p3xg2uslTzEkRJbmb5nJWBj790shO2HTLr_W9pVf_BX-yUHd5G6jRWv6UYeWnki8rmLYjyLZrxAX9-K-T9yjetbnyu6fuJaxolDdhqlwrXJs04fXkLg-orm5dpmR/s1600/Screenshot+from+2017-01-24+11%253A30%253A29.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhq1-Go8YMhPQkBg2K-p3xg2uslTzEkRJbmb5nJWBj790shO2HTLr_W9pVf_BX-yUHd5G6jRWv6UYeWnki8rmLYjyLZrxAX9-K-T9yjetbnyu6fuJaxolDdhqlwrXJs04fXkLg-orm5dpmR/s320/Screenshot+from+2017-01-24+11%253A30%253A29.png" width="320" /></a></div>
<br />
Build the kernel and modules (this is important). Lastly we need to add the DS3231 to the I2C4 node of the device tree; below is an example diff,<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;">diff --git a/arch/arm/boot/dts/imx6sx-udoo-neo.dtsi b/arch/arm/boot/dts/imx6sx-udoo-neo.dtsi<br />index abbf0d8..2ffa6cb 100644<br />--- a/arch/arm/boot/dts/imx6sx-udoo-neo.dtsi<br />+++ b/arch/arm/boot/dts/imx6sx-udoo-neo.dtsi<br />@@ -298,6 +298,11 @@<br /> compatible = "fsl,fxas2100x";<br /> reg = <0x20>;<br /> };<br />+<br />+ rtc@68 {<br />+ compatible = "dallas,ds1307";<br />+ reg = <0x68>;<br />+ };<br /> };</span></span><br />
<br />
Rebuild the relevant dtb file depending on your set-up. Deploy the newly generated kernel, modules and dtb to the NEO. <br />
<br />
On power up the kernel output should include the following lines ( <span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;"><span style="font-family: inherit;"><span style="font-size: small;">try</span></span> dmesg | grep ds1307</span></span>)<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">[ 8.095963] rtc-ds1307 3-0068: rtc core: registered ds1307 as rtc0<br />[ 8.095989] rtc-ds1307 3-0068: 56 bytes nvram</span></span><br />
<br />
If all is ok we can query the clock for its current time by using the <i>hwclock</i> utility:<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">udooneo:~# hwclock -r<br />Tue 24 Jan 2017 12:32:25 PM UTC -0.858087 seconds</span></span><br />
<br />
We can sync the system clock from the RTC:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;">udooneo:~# hwclock -s</span></span><br />
<br />
On reboots the RTC time may become corrupted with the udooubuntu release; to overcome this, the ntp service needs to be disabled with the following commands:<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">echo manual | sudo tee /etc/init/ntp.override</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">timedatectl set-ntp false</span></span><br />
<br />
The <i>timedatectl</i> command is extremely useful as it provides a complete picture of the system and RTC times. For example, to sync the system time from the RTC:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;">udooneo:~# timedatectl <br /> Local time: Fri 2016-01-01 01:18:06 UTC<br /> Universal time: Fri 2016-01-01 01:18:06 UTC<br /> RTC time: Tue 2017-01-24 12:40:36<br /> Timezone: Etc/UTC (UTC, +0000)<br /> NTP enabled: no<br />NTP synchronized: no<br /> RTC in local TZ: no<br /> DST active: n/a<br />udooneo:~# hwclock -s<br />udooneo:~# timedatectl <br /> Local time: Tue 2017-01-24 12:42:03 UTC<br /> Universal time: Tue 2017-01-24 12:42:03 UTC<br /> RTC time: Tue 2017-01-24 12:42:03<br /> Timezone: Etc/UTC (UTC, +0000)<br /> NTP enabled: no<br />NTP synchronized: no<br /> RTC in local TZ: no<br /> DST active: n/a</span></span><br />
<br />
Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-2207190930236349395.post-63470857135987736712017-01-18T12:18:00.000-08:002017-02-24T13:00:00.202-08:00i.MX6SX - Prototype VW (VAG) vehicle diagnostic adapter for KWP2000 services (UDOO NEO)<br />
An interesting use case for the i.mx6sx is as a vehicle diagnostic (or interface) adapter. In this blog I will demonstrate how we can re-purpose a UDOO NEO as a prototype diagnostic adapter.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/FR6Yilzh6vE/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/FR6Yilzh6vE?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
The adapter targets VW vehicles supporting KWP2000 services. Typically an adapter requires a real time interface to the vehicle in order to keep the diagnostic session alive after establishing communications with an ECU. The real time needs can easily be met on the M4, and we can exploit the A9 side to offer data transformation and API services, for example to make the data available to a mobile application. The end goal is demonstrated in the video, where a custom-developed Android application retrieves diagnostic information from the vehicle in real time. The application communicates over WiFi with the NEO, which in turn is connected to the vehicle's OBD-II port.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSz-qzIULS2sGIm5XfgHfaNItX7E0oPnilAf4FT5eb82SdRhq4CNHwYjdVOchfHerBCb5881dQGX3vLtVG-6syGh8vVSy5Z7BEtoL53q1dYD0BV7sqFJK90Leqd-B1LY2BjI4dQvlsVcdD/s1600/vw_screen_1.JPG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSz-qzIULS2sGIm5XfgHfaNItX7E0oPnilAf4FT5eb82SdRhq4CNHwYjdVOchfHerBCb5881dQGX3vLtVG-6syGh8vVSy5Z7BEtoL53q1dYD0BV7sqFJK90Leqd-B1LY2BjI4dQvlsVcdD/s200/vw_screen_1.JPG" width="150" /></a></div>
The application is first used to query the vehicle for a list of available ECUs (modern vehicles can contain tens of ECUs). For each ECU the physical address and overall status are displayed (OK or has DTC errors). <br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPMOJffydve3C2MBmnbxXaFqmSNiPWqIQ2lGKsO2mmBWZlk0Nhyphenhyphenn7PXslMAG6A0NJInjoLXMPkIJQPxFnHyV_2CdbP3jxQt5sONLAj8ACDWPiC7BE_WTMjO1Y-_kbSLt9UtDhvBG8Igma0/s1600/vw_screen_3.JPG" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPMOJffydve3C2MBmnbxXaFqmSNiPWqIQ2lGKsO2mmBWZlk0Nhyphenhyphenn7PXslMAG6A0NJInjoLXMPkIJQPxFnHyV_2CdbP3jxQt5sONLAj8ACDWPiC7BE_WTMjO1Y-_kbSLt9UtDhvBG8Igma0/s200/vw_screen_3.JPG" width="150" /></a>Subsequently, after selecting an individual ECU, the application retrieves information about it, including the short/long coding value (if applicable). <br />
<br />
<br />
<br />
Although the video demonstrates only a few KWP2000 services being invoked, it's actually possible to invoke most if not all of the available services. Furthermore it could be enhanced to support UDS services.<br />
<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8Kgm4FZiC9n97UPx9whuRZ71zEWXRVQk10owv2EpGDTI0QHiz_SCZfYMGbFEeZeeIJBZtFANtFmf7LG54LbVkBWhPsnBIXy6cwGlWK8OaGAa0Nd_dGEiKp2OeV8_Glu8bElRoH8G0kggh/s1600/P1030884.JPG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8Kgm4FZiC9n97UPx9whuRZ71zEWXRVQk10owv2EpGDTI0QHiz_SCZfYMGbFEeZeeIJBZtFANtFmf7LG54LbVkBWhPsnBIXy6cwGlWK8OaGAa0Nd_dGEiKp2OeV8_Glu8bElRoH8G0kggh/s200/P1030884.JPG" width="200" /></a>At a hardware level VW KWP2000 is supported over CAN (some older vehicles use K-Line) and is accessible through the vehicle's on-board 16-pin OBD-II connector. For the prototype (in the photo to the right) the NEO is simply connected to an SN65HVD230 CAN transceiver, which in turn is connected to the CAN pins of the OBD-II connector.<br />
<br />
<br />
Typically VW KWP2000 services are carried over VW's proprietary TP2.0 protocol. TP2.0 is used to establish a session and pass datagrams between 2 ECUs, one of which, in our case, is the NEO, normally referred to as the 'tester'. Implementing TP2.0 is a challenge, as accurate timings are required to correctly deal with time-out and error scenarios, in addition to implementing logic to cater for varying ECU behaviours depending on their age. Above TP2.0 sits the KWP2000 protocol, which implements a simpler request/response model. As shown in the diagram below, a complete TP2.0 and KWP2000 stack was developed to run on the M4; a sketch of the keep-alive aspect follows the diagram.<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWCNm3zllkXHFS5bBa9IpXjP7J-7wPdFEQ4ZVuVNfW6a8cJbqPbIOPHvAGe5IrN7ViApWoBnf3s0Cj5y8u-FPyS6N72bm-s_PVfmbL0FHhmvAyb16Qt6GeLHDinA_vMMJSVAwfLNLOZg1Q/s1600/vag_adapter_design_2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="188" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWCNm3zllkXHFS5bBa9IpXjP7J-7wPdFEQ4ZVuVNfW6a8cJbqPbIOPHvAGe5IrN7ViApWoBnf3s0Cj5y8u-FPyS6N72bm-s_PVfmbL0FHhmvAyb16Qt6GeLHDinA_vMMJSVAwfLNLOZg1Q/s400/vag_adapter_design_2.jpg" width="400" /></a></div>
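To give a feel for the timing responsibility that sits on the M4, here is a minimal sketch of a FreeRTOS keep-alive task. The TP2.0 channel-test opcodes (0xA3 request, 0xA1 response) follow publicly available TP2.0 write-ups and should be verified against a bus trace; the CAN IDs are negotiated during channel setup, and can_send()/can_wait() are hypothetical wrappers around the FlexCAN driver.<br />
<pre style="font-family: 'courier new', courier, monospace; font-size: x-small;">
#include "FreeRTOS.h"
#include "task.h"
#include <stdint.h>
#include <stdbool.h>

#define TP20_CHANNEL_TEST 0xA3   /* per public TP2.0 documentation */
#define TP20_PARAMS_RESP  0xA1

extern bool can_send(uint32_t id, const uint8_t *data, uint8_t len);
extern bool can_wait(uint32_t id, uint8_t *data, uint8_t *len, uint32_t ms);

typedef struct { uint32_t tx_id, rx_id; } tp20_chan_t; /* from channel setup */

static void keepalive_task(void *param)
{
    tp20_chan_t *ch = (tp20_chan_t *)param;
    uint8_t frame[1] = { TP20_CHANNEL_TEST };
    uint8_t resp[8];
    uint8_t len;

    for (;;) {
        can_send(ch->tx_id, frame, 1);
        if (!can_wait(ch->rx_id, resp, &len, 100) ||
            resp[0] != TP20_PARAMS_RESP) {
            /* session considered lost - tear down and re-establish */
        }
        vTaskDelay(pdMS_TO_TICKS(1000));  /* keep the session alive */
    }
}
</pre>
<br />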
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
On the A9 side KWP2000 services are exposed through a custom API interface that, when invoked, communicates with the M4 over a bidirectional link. The A9 allows the data to be enriched and transformed (e.g. to XML/JSON) before being exposed via a number of network interfaces such as Bluetooth, WiFi or even Ethernet. For the demo this is done by enabling a WiFi Access Point on the NEO. Sharp-eyed readers will notice the prototype uses a NEO Basic, which has no on-board WiFi support; instead a WiFi dongle was plugged into the USB port to create the Access Point.<br />
<br />Unknownnoreply@blogger.com7tag:blogger.com,1999:blog-2207190930236349395.post-56948676008444347542016-08-12T11:03:00.003-07:002017-01-22T03:07:55.380-08:00i.MX6SX - Realtime CAN interfacing Part 3 (UDOO Neo)<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/B6qFNVDGuIU/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/B6qFNVDGuIU?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
In the last post we covered how the RX8 instrument cluster could be controlled from the A9 side. Given the A9 side is feature rich in its capabilities, it offers numerous possibilities. As a simple demonstration we will attempt to replicate the RPM and Speed gauges and two warning indicators on the HDMI (720p) display, to represent a simplified virtual instrument cluster. The main challenges here are:<br />
<br />
1. On screen widgets need high performance.<br />
2. Keeping the on screen gauges/warning indicators in sync with the instrument cluster.<br />
3. Minimising CPU usage.<br />
<br />
<br />
<br />
In the video, the on screen RPM gauge (on the left) fairly accurately tracks the RPM needle on the instrument cluster. The on screen Speed gauge (on the right) tracks the digital Speed indicator. We also toggle the battery and oil warning indicators on the screen and cluster. Notice that the cluster_gauges application is consuming roughly 10% of the CPU.<br />
<br />
In order to deliver the necessary performance the on screen gauges were rendered using custom OpenGL ES 2.0 code. Note the graphic images used for the gauges are modified versions derived from this <a href="https://github.com/riis/AndroidArduino">Android Arduino</a> project. Compared to the previous post, there is now a single application (cluster_gauges) on the A9 side which renders the widgets and also controls the instrument cluster through the M4.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMF1xmJ-oacBl4Z1gsS6EO7_OBmZ6FOUg-TqbT6d8mkxYNUZWO0Aw9k_VZvO2-NO-i1hChuWBoQuxTK35bNZKqk0cdEu6WYcpQ_nM4-kHtNpzQBQZasUbngFkXGlp9iThQeiXKfah7x8Bp/s1600/cluster_control_design_5.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="178" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMF1xmJ-oacBl4Z1gsS6EO7_OBmZ6FOUg-TqbT6d8mkxYNUZWO0Aw9k_VZvO2-NO-i1hChuWBoQuxTK35bNZKqk0cdEu6WYcpQ_nM4-kHtNpzQBQZasUbngFkXGlp9iThQeiXKfah7x8Bp/s400/cluster_control_design_5.jpg" width="400" /></a></div>
<br />
<br />
<br />
I hope that through these 3 blog posts I have managed to demonstrate how the M4 and A9 processors can be combined to provide a rich real-time interface and data distribution mechanism for your applications. <br />
<br />
<br />
<br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2207190930236349395.post-73318521069372643372016-08-11T10:41:00.003-07:002016-08-11T10:54:36.536-07:00i.MX6SX - Realtime CAN interfacing Part 2 (UDOO Neo)As mentioned in my previous post, the next stage of development with the RX8 instrument cluster is to offer the ability to control the gauges/indicators on the cluster from the A9 side. In the last post I also pointed out that, to keep the cluster active, the M4 side has the responsibility of regularly sending/receiving CAN messages to/from the cluster. Hence we can't use the CAN interface from the A9; instead we get the A9 'to talk to' the M4. Within the i.mx6sx the Messaging Unit module enables the two processors to communicate with each other by passing coordinated messages. We can make use of this feature to send messages from the A9 to the M4 asking it to update the cluster, and to use the M4 to forward CAN messages received from the cluster to the A9. All of this happens while the M4 is also feeding the cluster with its own CAN messages. For the i.mx6sx the inter-processor communication is abstracted and implemented through the Remote Processor Messaging (RPMSG) framework, which is supported in both the Linux and FreeRTOS BSPs. A minimal sketch of the A9 side is shown below.<br />
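In the NXP BSPs one convenient way to reach the M4 from Linux user space is the RPMSG tty driver (imx_rpmsg_tty), which exposes the channel as a serial-like device node. The device node name varies between BSP releases, so /dev/ttyRPMSG below is an assumption, and the command format ("rpm 3000") is purely illustrative of the scheme used here.<br />
<pre style="font-family: 'courier new', courier, monospace; font-size: x-small;">
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* device node name is BSP-dependent - an assumption, not a given */
    int fd = open("/dev/ttyRPMSG", O_RDWR | O_NOCTTY);
    if (fd < 0) { perror("open"); return 1; }

    const char *cmd = "rpm 3000";   /* illustrative: ask the M4 to update */
    write(fd, cmd, strlen(cmd));

    char buf[64];                   /* M4 forwards cluster CAN traffic back */
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("from M4: %s\n", buf);
    }
    close(fd);
    return 0;
}
</pre>
<br />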
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9yvNx7B6fnUlv7DCBAEAZNaq3yEPAOkbKYlbpyCl6rNTZNsjRGg0dil7pEe-A6WphcRUmpzTWJ6gJuBdPHh_4mzP1k6NLnRfqbaqMq_KCCuafkVcvwv8bVmkARbFVP0PozddHCkxI_2Rd/s1600/cluster_control_design_3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9yvNx7B6fnUlv7DCBAEAZNaq3yEPAOkbKYlbpyCl6rNTZNsjRGg0dil7pEe-A6WphcRUmpzTWJ6gJuBdPHh_4mzP1k6NLnRfqbaqMq_KCCuafkVcvwv8bVmkARbFVP0PozddHCkxI_2Rd/s400/cluster_control_design_3.jpg" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
Compared to the last demonstration, the main change to the configuration is the introduction of a relay board (the Power Control box in the diagram) so that the instrument cluster can be powered on and off by the M4.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVnYmjGBIbIpNZmSKseVq0HnuOG3WlWc0QKDqYOfETQabDq2eLzi-B9Xtib9YnxftAA74zF4zY2FiZ9TACx_edOhwazlCONUZxmJ-hFOWIlBtxp6SPz8szQrmcfeGSkC7CM529Gd5dOVEf/s1600/IMG_20160811_184953.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVnYmjGBIbIpNZmSKseVq0HnuOG3WlWc0QKDqYOfETQabDq2eLzi-B9Xtib9YnxftAA74zF4zY2FiZ9TACx_edOhwazlCONUZxmJ-hFOWIlBtxp6SPz8szQrmcfeGSkC7CM529Gd5dOVEf/s320/IMG_20160811_184953.jpg" width="320" /></a></div>
<br />
The M4 firmware has been amended to accept commands from the A9 via RPMSG. On the A9 side we have 2 applications:<br />
<br />
1. read_can - constantly reads CAN messages sent from the instrument cluster via the M4.<br />
2. cluster_control - Controls the instrument cluster via the M4.<br />
<br />
cluster_control has the following capabilities through a number of command line options:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">-i - turns the instrument on/off</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-r - set the RPM</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-s - set the SPEED (in kilometres although display is in mph)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-b - set Battery Warning indicator</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-o - set Oil Warning indicator</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-c - set Cruise control indicator</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/WGaW7RhLtJY/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/WGaW7RhLtJY?feature=player_embedded" width="320"></iframe></div>
<br />
The short video demonstrates the two applications running on the A9: read_can is running in the window on the left, and on the right cluster_control is used to manipulate the cluster. Note that prior to the 2 applications being launched, the M4 was preloaded with new firmware and RPMSG was initialised on the A9 side. Once cluster_control changes the cluster state, the M4 is responsible for continually updating the instrument cluster with the new state until the next change. <br />
<br />
Now that we have the ability to control the cluster and receive messages on the A9 side, in the final post I will demonstrate how another feature of the i.mx6sx can be used alongside it. Unknownnoreply@blogger.com1