How to correctly power off OLinuXino running Android



OLinuXino boards are now used in many different places and environments. A few days ago we got a call from a customer who uses the A13-OLinuXino-WIFI in industrial applications and has installations all around the world, including China 🙂 His industrial controller runs Android, and his problem is that from time to time, at random, the Android NAND image gets corrupted. We have also noticed forum posts about boards whose NAND flash got corrupted and needed to be re-flashed. Usually we blame cheap power supplies for NAND corruption, but this customer assured me that he uses a very good industrial power supply with full noise protection.

Together with our engineers we decided to run some extensive tests on this subject, and here are the results:

1. It is very important that the power supply stays ON, uninterrupted, while Android boots. For a freshly programmed image this may take SEVERAL minutes, because on its very first run Android creates a lot of files and buffers.

After the first boot Android usually starts faster, in less than a minute after you apply power (provided it is not configured to start only with the POWER key).

What we found is that if the power supply is interrupted while Android boots, the NAND is almost certain to be corrupted: in 9 out of 10 cases where we powered off the board during boot, the NAND got corrupted.

This is usually the critical part for impatient and inexperienced users, I guess: they plug in the TV and the power supply, wait a few seconds, see nothing on the TV and power off the board. At that point the NAND is already corrupted!
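
Where a shell on the board is available (for example over adb or a serial console), boot completion can be checked instead of guessed from the screen. The sketch below polls the standard Android property `sys.boot_completed` with `getprop`. The function name, the `GETPROP` override and the timeout value are assumptions made for illustration; they are not part of the Olimex image.

```shell
# Sketch: block until Android reports that boot has finished.
# GETPROP is a hypothetical override so the function can be tried off-device.
GETPROP="${GETPROP:-getprop}"

# wait_for_boot LIMIT_SECONDS
# Returns 0 once sys.boot_completed reads "1", or 1 if the limit expires first.
wait_for_boot() {
    limit="$1"
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        if [ "$($GETPROP sys.boot_completed)" = "1" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

For example, `wait_for_boot 600 && echo "boot finished"` waits up to ten minutes, which should leave room for the long first boot of a freshly programmed image.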

2. After Android boot is complete and power is switched off, there is a rare but still real chance of killing your NAND content. In our tests about 1 in 50 power ON/OFF cycles corrupted the NAND. Our guess is that even after boot Android writes files to the NAND file system from time to time, and if you hit such a time slot when you power off your board, your NAND will be corrupted.
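
To put the 1-in-50 figure in perspective, here is a rough estimate. It assumes that every power cut is independent and carries the same corruption probability p = 1/50, which is a simplification and not something the tests above verified:

```latex
P(\text{corruption within } n \text{ power cuts}) = 1 - (1 - p)^n, \qquad p = \tfrac{1}{50}

n = 35: \quad 1 - 0.98^{35} \approx 0.51
\qquad
n = 100: \quad 1 - 0.98^{100} \approx 0.87
```

Under that assumption, a controller that loses power once a day would be more likely than not to hit a corruption within about five weeks.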

OK, we started thinking about how to avoid this. First we thought it would be clever to add a brown-out protection circuit which disables NAND writes (Write Protect) when a glitch on the power supply is detected.

Unfortunately (and logically) this made things even worse: now even plugging in a USB device or switching the LCD backlight on or off causes a small glitch on the power supply that disables NAND writes, while the processor keeps working. And what happens when the running processor tries to write a file while NAND writes are disabled? MESS!

Reading our forum we saw a suggestion for how to correctly power off an Android device, using this command sequence:

$ sync
$ reboot -p

and cutting power only after these commands finish executing. This way we didn’t manage to corrupt the NAND flash even after hundreds of power on/off cycles.

The conclusion: the one and only correct way to power OFF an OLinuXino running Android/Linux is to execute the above commands and cut the power only after they finish. Otherwise there is always a chance that the processor starts writing to the file system, and if power is cut at that point the NAND Flash will be corrupted.

Experiments with SD cards show that they are much more robust against such power interruptions, probably because they have their own internal Flash write controller.

12 Comments

  1. cnxsoft
    Jan 24, 2014 @ 17:01:36

    We had the same problem with a Linux-based system. Asking customers to shut the system down properly may not always be satisfactory because they may not be the end users, and power failures do happen.

    One solution was to call fsync after each write operation in our application, but a better solution was to make the critical files (e.g. the root file system) read-only, store the config files (e.g. the /etc directory) in another partition together with a backup, and mount them to a ramdisk. This obviously requires some extra work, and I’m not sure how feasible it is in Android.
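
A minimal sketch of the first idea in this comment, adapted to a shell script. Plain sh has no per-file fsync, so the sketch substitutes `sync`, which flushes all filesystems. The function name `save_config` and the temporary-file naming are hypothetical. The new content goes to a temporary file that is then renamed over the target, which relies on rename being atomic on the filesystem in use:

```shell
# save_config FILE CONTENT
# Writes CONTENT to FILE through a temporary file. With an atomic rename, a
# power cut should leave either the old or the new version, not a partial file.
save_config() {
    target="$1"
    tmp="$target.tmp.$$"
    printf '%s\n' "$2" > "$tmp" || return 1
    sync                             # flush the new data (shell stand-in for fsync)
    mv "$tmp" "$target" || return 1  # replace the old file in one step
    sync                             # flush the rename itself
}
```

Usage would look like `save_config /data/app.conf "mode=1"`, where the path is only an example.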


  2. Georg Sassen
    Jan 24, 2014 @ 17:56:25

    If the OLinuXino has a battery charger circuit, you can connect a small rechargeable battery and tell Linux/Android to power down gracefully when mains power is lost, effectively behaving like a big UPS-powered machine. You will have to take care of situations such as power coming back while the board is powering down, though.


    • Ramon La Pietra
      May 22, 2014 @ 08:52:09

      With the OLinuXino A20, if you shut down while both the main power supply and the LiPo battery are connected, the board will reboot.

      I.e. the OLinuXino will reboot if main power comes back during the graceful shutdown caused by a main power failure.
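
The battery-backed approach discussed in this thread could be sketched as a small watcher script. It assumes the kernel exposes the external supply through the Linux power supply class in sysfs; the supply name `ac` in the path is a guess that depends on the kernel and PMU driver, and the function names and grace period are hypothetical. The grace period is there so that a short flicker does not trigger a shutdown:

```shell
# Hypothetical sysfs path: check which supply name your kernel actually uses.
AC_ONLINE="${AC_ONLINE:-/sys/class/power_supply/ac/online}"

# mains_present: succeeds while the file reports "1" (external power connected).
mains_present() {
    [ "$(cat "$AC_ONLINE" 2>/dev/null)" = "1" ]
}

# watch_mains GRACE_SECONDS
# Polls once per second. Returns 0 after mains has been absent for
# GRACE_SECONDS in a row, or 2 if the status file cannot be read at all.
watch_mains() {
    grace="$1"
    lost=0
    [ -r "$AC_ONLINE" ] || return 2
    while [ "$lost" -lt "$grace" ]; do
        if mains_present; then lost=0; else lost=$((lost + 1)); fi
        sleep 1
    done
    return 0
}
```

A boot script could then run `watch_mains 5 && sync && reboot -p` in the background. Because the function returns 2 when the status file is missing, a wrong path does not power the board off.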


  3. Radu - Eosif Mihailescu
    Jan 24, 2014 @ 18:59:10

    With all due respect for everybody involved, this should have been obvious from the beginning. Did anyone (from Olimex and your customers) actually expect to be able to turn power off at ANY time without ANY side-effects? Really?

    “Android” is only meaningful for the GUI part, otherwise it’s a Linux system. A Linux system will use a filesystem with advanced features (like atime support) which is, surprise, liable to corruption anytime it’s uncleanly unmounted. That liability may decrease if journalling is used and, comparatively, may be less than the one for NTFS — but it’s still there, no matter how much you try to ignore it.

    As the previous poster suggested (and like the inventors of the LiveCD discovered), while this liability cannot be avoided (unless you use monsters like XFS or ZFS), its effects can be 🙂

    The first step is to allow write access on a need-to-have basis, i.e. have (most of) the root filesystem mounted read-only. One good starting point is here: https://wiki.debian.org/ReadonlyRoot (covers Debian, but applies equally to any distribution). By definition, a filesystem mounted read-only receives no writes, so there is no state to lose by losing writes, which means there’s no corruption.
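
As a small illustration of this first step, the sketch below checks whether a mount point is currently read-only by parsing `/proc/mounts`. The function name `is_readonly` and its optional second argument (an alternative mounts file, handy for trying it out) are hypothetical:

```shell
# is_readonly MOUNTPOINT [MOUNTS_FILE]
# Succeeds if the last entry for MOUNTPOINT lists "ro" among its mount options.
# Field layout of /proc/mounts: device mountpoint fstype options dump pass.
is_readonly() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }
        END { exit !(opts ~ /(^|,)ro(,|$)/) }
    ' "${2:-/proc/mounts}"
}
```

A start-up script run as root could use it as `is_readonly / || mount -o remount,ro /`.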

    The second step, like the previous poster explained, is to switch to a transactional architecture for configuration data: instead of allowing anyone to write to (let’s say) /var/lib whenever they wish, read all configuration from flash into a ramdisk on startup and have people write there. A separate cron job can then dump the ramdisk back to flash every few seconds.
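
The second step might look roughly like the sketch below. The function names and the `.new`/`.old` naming are assumptions made for illustration. The dump writes a complete snapshot next to the live copy, flushes it, and only then swaps it in, keeping the previous snapshot as a backup:

```shell
# load_config FLASH_DIR RAM_DIR: copy the stored configuration into the ramdisk.
load_config() {
    mkdir -p "$2" && cp -R "$1"/. "$2"/
}

# dump_config RAM_DIR FLASH_DIR: snapshot the ramdisk back to flash.
dump_config() {
    ram="$1"
    flash="$2"
    rm -rf "$flash.new"
    mkdir -p "$flash.new" && cp -R "$ram"/. "$flash.new"/ || return 1
    sync                                  # make sure the snapshot is on flash
    rm -rf "$flash.old"
    if [ -d "$flash" ]; then
        mv "$flash" "$flash.old" || return 1
    fi
    mv "$flash.new" "$flash" || return 1  # swap the new snapshot in
    sync
}
```

With hypothetical paths, startup would call `load_config /data/config /mnt/ramdisk/config`, and a loop such as `while sleep 5; do dump_config /mnt/ramdisk/config /data/config; done` would stand in for the cron job. The swap takes two renames, so a boot script would still need to fall back to the `.old` or `.new` directory if power is cut between them.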

    The third step, if you want to go further and make your device really bulletproof, is to do what PC motherboards (and SmartCard readers) do and have a means to detect loss of power (/POWER_GOOD tied to /NMI) in conjunction with a power reserve (a capacitor). You then write good system firmware/software that, when a power loss is detected, immediately makes the read-write filesystems consistent and then locks the board up (allowing the power-on reset circuitry or the brown-out detector to unlock it via reset when the power comes back up). Of course, this requires that the code knows how much time it has at its disposal: if “very little”, then probably not more than the SysRq-U equivalent can be done (i.e. umount everything or remount readonly if still in use); if “a few seconds”, then more advanced things could be done such as asking the cron job above to make one last dump and then unmounting everything.

    It goes without saying that if the particular flash chip used has a manufacturer-recommended shutdown/freeze procedure, that is to be followed to the letter 🙂 It also goes without saying that the power-loss handler will be a top priority interrupt (equivalent of NMI on i386) and that the first thing it does is switch everything it can off on the board and run the CPU in its lower power state to make the most of the power reserve.
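
For the "very little time" case, the SysRq-U equivalent mentioned above can be triggered from software through `/proc/sysrq-trigger`. The sketch below assumes root access and a kernel built with magic SysRq support; the function name and the `SYSRQ` override (which lets the function be pointed at a scratch file for a dry run) are hypothetical:

```shell
# SYSRQ can be redirected to a scratch file to try the function safely.
SYSRQ="${SYSRQ:-/proc/sysrq-trigger}"

# emergency_quiesce: ask the kernel to flush and then freeze filesystem state.
emergency_quiesce() {
    echo s > "$SYSRQ" || return 1   # SysRq-S: sync all mounted filesystems
    echo u > "$SYSRQ" || return 1   # SysRq-U: remount all filesystems read-only
}
```

A real power-loss handler would run from the interrupt path described above, so a shell function like this only illustrates the order of operations.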

    It is possible to make an “I don’t care when you unplug me” board/device, just requires attention to design 😉


    • OLIMEX Ltd
      Jan 24, 2014 @ 19:10:51

      Actually, this is what one forum member wrote: “I always cut the power of my Linux machine at any time and never had problems”, which puzzled us a bit 🙂
      Also, these cheap OpenWRT routers never lose their SPI Flash content no matter how many power cuts occur, but I guess they do not write to this flash: they just read the image, uncompress it and boot at the beginning, and then everything is done in RAM.


      • Martin Schleisner Nisted
        Jan 27, 2014 @ 09:55:36

        Some days ago my Linksys router running OpenWRT lost its configuration when I cut off the power in my house. I had never tried this before and thought that it was designed to be “fool”-proof with regard to power cut-off ;)

  4. Jess
    Jan 25, 2014 @ 01:54:36

    For some reason this “feature” of NAND corruption does NOT happen in A20 Olinuxino design – go figure….


    • vincenet
      Jan 27, 2016 @ 12:00:36

      Jess, I am interested in your information. Do you have proof of that? Did you run a test similar to the Olimex one, and did you also confirm that the same test can corrupt the A13 NAND? Olimex still recommends shutting the board down properly in the A20 documentation.
      I wonder if I can save the cost of the lithium battery on the A20, but I will probably need the same battery anyway to keep the RTC time updated during long disconnections (a few days or weeks) of the main supply.


  5. Tom
    Jan 25, 2014 @ 10:31:30

    I can tell you from experience that SD cards and USB sticks are just as vulnerable to corruption as Flash file systems. The only thing you can do to make absolutely sure that things don’t break is to use some sort of UPS (battery or otherwise) that guarantees there is enough power left to do a clean shutdown, in which you sync and unmount any writable filesystems.


  6. Jeroen
    Jan 25, 2014 @ 12:58:28

    When I need a “foolproof” board I use a circuit that provides the board with enough power to shut down. The circuit is controlled by a GPIO output that carries the power state of the board (inverted) and a GPIO input that detects the presence of mains power. The log files and other files that need read/write access are in a RAM file system which is synced every 24 hours and at every start/shutdown. There’s only one weak point: when the system is shutting down due to power loss and the power comes back during that time, the system doesn’t come back on again (a very rare condition).
    link to the schema: http://leachy.homeip.net/olinuxino/pwr/schema.png
    link to a ramlog: http://www.tremende.com/ramlog or http://leachy.homeip.net/olinuxino/pwr/ramlog_2.0.0_all.deb


  7. freakingtux
    Jan 26, 2014 @ 12:48:51

    Also, one needs to remember that devices like MMC and managed NAND contain firmware that runs on its own. Its operations, which make wear leveling and bad block management transparent, happen regardless of the operations done on the device.

    It is therefore quite probable that a read action on a read-only mounted MMC still causes writes to happen internally. Datasheets often will specify a certain inactivity timeout one needs to obey before one powers down the hardware. So powering off the device while such an internal action is happening can cause the whole MMC to become corrupted.


