Rockchip is releasing a low-power SOC with an NPU targeting deep learning.



We are hacking a cheap Chinese soldering robot, aiming to make it usable with camera fiducials and solder joint inspection. I shared some info at the Hackaday Belgrade 2018 conference.

As we want to make the robot easy to use, we are looking around for a capable SOC with deep learning capability. It seems the only embedded solution available right now is from nVidia.

Allwinner has put info about AI and TensorFlow support in their V5 SOC materials, but judging from the only board available on the market, it looks like just a statement with no actual implementation.

The AI they advertise looks more like OpenCV / Tensorflow lite libraries using the V5 GPUs, but not real NPU.

This time Rockchip seems to be a little bit ahead of Allwinner and has released the RK1808 and RK3399Pro SOCs.

Some info is also starting to appear in their rockchip-linux repositories.

We got the RK1808 brief datasheet, and here are the SOC internals:

screenshot from 2019-01-25 12-52-48

  • Dual core Cortex-A35
  • Internal 2MB SRAM
  • DDR 32-bit data width, 2 ranks max 2GB of DDR3/DDR3L/LPDDR3/LPDDR3L -1600
  • Neural Processing Unit with 512KB internal buffer and support for max 1920 Int8, max 192 Int16 and max 64 FP16 MAC operations per cycle
  • eMMC 4.5 1-4-8 bit max 150MB/s
  • SD/MMC support
  • SPI Flash x1-4-8 data
  • video encoder/decoder up to 1080p
  • video input DPI 8-10-12-16 bit up to 150MB/s
  • camera input MIPI CSI up to 4 data lane, 2.0Gbps, MIPI-HS, MIPI-LP
  • LCD RGB 8/8/8 up to 1280×800@60fps
  • MIPI DSI 1920×1080 up to 4 data lane, 2.0Gbps
  • Audio I2S
  • Gigabit Ethernet
  • USB2.0 HOST/OTG
  • USB3.0 5Gbps
  • PCIe 1/2 links with 2.5Gbps per link
  • SPI, I2C, UART
  • x4 10bit SAR ADC 1Msps
  • -40 to +125C operating temperature, targeting automotive and industrial vision apps
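
To get a feel for what the MAC and DDR figures in the list above mean, here is a small back-of-the-envelope sketch. The MACs-per-cycle values and the 32-bit / 1600 figures come from the datasheet list; the NPU clock is NOT in the brief datasheet, so `ASSUMED_NPU_CLOCK_HZ` is only a placeholder we picked for illustration, and the function names are ours. Counting one MAC as two operations (a multiply and an add) is the usual convention for TOPS numbers.

```python
# Rough peak-throughput arithmetic from the RK1808 brief datasheet figures.
# These are theoretical ceilings, not measured performance.

# MAC operations per cycle, taken from the datasheet list above.
NPU_MACS_PER_CYCLE = {"int8": 1920, "int16": 192, "fp16": 64}

# ASSUMPTION: the NPU clock is not stated in the brief datasheet.
# 800 MHz is a placeholder value used only to make the example concrete.
ASSUMED_NPU_CLOCK_HZ = 800e6


def peak_tops(macs_per_cycle: int, clock_hz: float) -> float:
    """Peak tera-operations per second, counting one MAC as two operations."""
    return macs_per_cycle * 2 * clock_hz / 1e12


def ddr_peak_bytes_per_s(bus_width_bits: int, transfers_per_s: float) -> float:
    """Theoretical peak DDR bandwidth: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfers_per_s


if __name__ == "__main__":
    for dtype, macs in NPU_MACS_PER_CYCLE.items():
        tops = peak_tops(macs, ASSUMED_NPU_CLOCK_HZ)
        print(f"{dtype}: {tops:.3f} TOPS peak at the assumed clock")

    # 32-bit bus at 1600 MT/s, as listed for the DDR interface.
    bandwidth = ddr_peak_bytes_per_s(32, 1600e6)
    print(f"DDR peak bandwidth: {bandwidth / 1e9:.1f} GB/s")
```

Real networks will land well below these ceilings, since they ignore memory stalls, layer shapes that do not fill the MAC array, and everything else the SOC is doing on the same DDR bus.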

This chip is definitely not hobbyist-friendly: the FCCSP 420 package has 0.3mm balls spaced at 0.5/0.35mm!

screenshot from 2019-01-25 13-41-56

Price info is not available yet. The first evaluation boards will be ready at the end of March 2019. Rockchip will also sell an SDK with the NPU API, at a price that is not yet known.

Rockchip also upgraded their RK3399 by including an RK1808 inside and naming the result RK3399Pro.
They keep the same RK3399 ball layout, so people who already made RK3399 boards can upgrade to RK3399Pro without changing a lot on their PCB layout.

How do they do it? They bond the RK1808 in the same package and connect the RK3399 to the RK1808 via USB3.0. This is why the RK3399Pro has NO externally available USB3.0:

screenshot from 2019-01-25 13-48-45
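
Since every frame the RK3399 wants to run through the NPU has to cross that internal USB3.0 link, it is worth a rough estimate of the ceiling this puts on input data. The sketch below uses the 5Gbps line rate from the spec list and the 8b/10b line coding that USB 3.0 SuperSpeed uses; the 224×224×3 Int8 input size is only an example we chose, the function names are ours, and USB protocol overhead is ignored, so real numbers will be lower.

```python
# Upper bound on how many input frames per second fit through a USB3.0 link.
# Ignores USB protocol overhead, so this is a ceiling, not an expected rate.

USB3_LINE_RATE_BPS = 5e9        # 5Gbps, as listed in the RK1808 specs
ENCODING_EFFICIENCY = 8 / 10    # USB 3.0 SuperSpeed uses 8b/10b line coding


def link_payload_bytes_per_s(line_rate_bps: float,
                             efficiency: float = ENCODING_EFFICIENCY) -> float:
    """Bytes per second left after line coding, before protocol overhead."""
    return line_rate_bps * efficiency / 8


def max_frames_per_s(width: int, height: int, channels: int,
                     bytes_per_value: int, link_bytes_per_s: float) -> float:
    """Frames per second if the link carried nothing but raw input tensors."""
    frame_bytes = width * height * channels * bytes_per_value
    return link_bytes_per_s / frame_bytes


if __name__ == "__main__":
    payload = link_payload_bytes_per_s(USB3_LINE_RATE_BPS)
    print(f"Link payload ceiling: {payload / 1e6:.0f} MB/s")

    # ASSUMPTION: a 224x224 RGB Int8 input, chosen only as an example size.
    fps = max_frames_per_s(224, 224, 3, 1, payload)
    print(f"224x224x3 Int8 frames per second (ceiling): {fps:.0f}")
```

For small network inputs the link itself is unlikely to be the bottleneck; it matters more if full-resolution video frames are pushed across uncompressed.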

How they will manage power dissipation with two quite power-hungry chips in one package is yet to be seen. The RK3399 alone requires quite a big heatsink, as it dissipates up to 20W when the Cortex-A72 cores are running.

10 Comments

  1. bobby
    Jan 25, 2019 @ 16:24:49

    Teres-II with RK3399?


  2. Shervin Emami
    Jan 25, 2019 @ 22:45:01

    Wow so do you think in an RK3399Pro it only uses the NPU of the RK1808 and the rest of the RK1808 silicon is turned off? Or is the whole RK1808 powered and running with 2 CPU clusters, 2 GPUs, 2 DDR subsystems, etc?


  3. Koen
    Jan 26, 2019 @ 15:51:59

    If you need an NPU for automation, why not simply buy a Huawei mobile phone with a Kirin 980 chipset (dual NPU), connect the robot using usb otg, and write the app in Java? Just wondering.


  4. Trackback: #rockchip https://olimex.wordpress.com/2019/01/25/rockchip-is-relea… | Dr. Roy Schestowitz (罗伊)
  5. tkaiser
    Feb 02, 2019 @ 19:33:43

    Wrt ‘RK3399Pro has NO externally available USB3.0’. What is most probably Rockchip’s RK3399Pro reference design has both an USB3-A receptacle as well as SuperSpeed data lines via USB-C: https://www.96rocks.com/blog/2018/12/11/toybrick-rk3399pro-board-is-pre-order-now/


  6. jonsmirl
    Feb 10, 2019 @ 20:05:19

    “The AI they advertise looks more like OpenCV / Tensorflow lite libraries using the V5 GPUs, but not real NPU.”

    The V5 supports an image recognition technique called “HAAR cascade”. This predates the discovery, made about three years ago, that so hugely improved CNNs. Today’s CNNs can easily beat HAAR systems in terms of accuracy, so it is likely that HAAR will fall into disuse. This doesn’t mean that HAAR doesn’t work; CNN is simply better.

    The hardware needed to implement HAAR is nothing like the hardware needed for CNNs. CNNs need massively parallel FP or INT MAC. HAAR is custom hardware that implements the HAAR elements.

    OpenCV is a little confusing. OpenCV supports both HAAR and CNN. But it looks like they are removing HAAR in OpenCV 4.01+. Tensorflow is only CNN.

    So V5 has hardware accelerated HAAR, and it has a quadcore NEON FPU which can run a decent CNN. The HAAR hardware can run in real-time over 30FPS video.

    The RK1808 can’t do HAAR but it has an NPU which accelerates running a CNN.

    A complex CNN like Inception_V4 runs in about four seconds on V5 NEON, same operation on RK1808 might be 100ms(?).

    There are pluses and minuses to using both chips. The HAAR hardware on the V5 is very fast and it has almost zero impact on the CPU. One strategy I’m playing with is to use the HAAR hardware to detect regions of interest in 4K video. Then I feed those regions into a CNN. Doing it that way greatly reduces the surface area the CNN has to look at. I can run CNNs like MobileNet over these smaller regions in 100-200ms.

    I’ll buy an RK1808 dev board as soon as they are available. But there are still some unanswered questions. Is the NPU going to be fast enough to process real-time video? How many FPS can the h.264 encoder do? Hopefully it is 60FPS+, which is needed to make two 30FPS streams with different encodings (hires/lowres, can’t ship 8Mb/s 1080P to cell phones).

