Mi­cro­con­trollers

August 7, 2020 · Daniel Tompkins

Archive KB

Rasp­berry Pi

Freezing

System sta­bility with the Rasp­berry Pi has been a struggle for me. It can re­ally ruin the fun of dis­cov­ering mi­cro­con­trollers/​SBCs if you're con­stantly re­flashing SDs and swap­ping power sup­plies.

Once in a while, pretty much anyone who tin­kers with the Rasp­berry Pi may find that their system starts freezing up. It can be a total night­mare to debug. If you're dealing with this, first of all— you have my sym­pa­thies.

Now, let's get down to brass tacks and figure out what's causing this issue. A Raspberry Pi that suddenly locks up and requires a hard reboot (power cycle) usually suffers from one of the following issues:

Power

Make sure you have an officially-supported power supply. A good one for the Raspberry Pi 4 (where I most often encounter this problem) needs to supply 5V at 3A or more.

WARNING: Even official PSUs can go bad

I had a CanaKit PSU that met these specs, but it ended up being the smoking gun. I tried reflashing the OS on a new microSD, disconnecting peripherals, ensuring swap was enabled, and removing all overclocking, and I was still having power issues.

After switching to a backup PSU (also rated at 5V/3A), the RPi stopped experiencing the random freezes. There had been some thunderstorms recently, so maybe a power surge damaged the original. I really don't know.

If you have other pe­riph­erals (like a shield, USB de­vices, etc.), try dis­con­necting them and using the Pi for a while. If you're on a cir­cuit in your house with other power-hungry de­vices, try moving the Pi to an­other plug.

Tem­per­a­ture

If an RPi gets past a certain temperature limit, it might freeze up and require a power cycle, which could potentially corrupt your microSD card! If your Pi boots at all, check the temperature:

bash
# CPU temperature
# This returns the temperature in millidegrees Celsius (e.g., `54000` = 54.0°C)
cat /sys/class/thermal/thermal_zone0/temp
# To get a human-readable format:
awk '{printf "CPU Temp: %.1f°C\n", $1/1000}' /sys/class/thermal/thermal_zone0/temp
# GPU temperature
# This outputs something like `temp=54.0'C`
vcgencmd measure_temp
# Both together (quick alias idea):
echo "GPU: $(vcgencmd measure_temp)" && echo "CPU: $(awk '{printf "%.1f°C", $1/1000}' /sys/class/thermal/thermal_zone0/temp)"

If your temperature is consistently spiking above 70 or 80°C, consider a better cooling solution: make sure any heatsinks have a solid thermal connection, or try adding a fan or a larger heatsink. Ensure that your Pi isn't closed off from ambient airflow (like in a sealed enclosure).
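To keep an eye on the 70 and 80°C figures mentioned above, the raw sysfs reading can be classified with a small shell function. This is only a sketch: the name `temp_status` and the ok/warm/hot labels are made up for illustration, and the two thresholds are simply the numbers from the paragraph above, not official limits.

```bash
# temp_status MILLIDEGREES
# Classify a reading taken from /sys/class/thermal/thermal_zone0/temp.
# Function name and labels are hypothetical; thresholds come from the text above.
temp_status() {
    local milli=$1
    local whole=$(( milli / 1000 ))          # whole degrees Celsius
    local tenth=$(( (milli % 1000) / 100 ))  # first decimal place
    local label="ok"
    if [ "$milli" -ge 80000 ]; then
        label="hot"
    elif [ "$milli" -ge 70000 ]; then
        label="warm"
    fi
    printf '%d.%d°C %s\n' "$whole" "$tenth" "$label"
}

# Fixed sample reading so the snippet runs anywhere.
temp_status 54000
```

On a Pi, you would pass the live value instead: `temp_status "$(cat /sys/class/thermal/thermal_zone0/temp)"`.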

CPU/​GPU Over­clocking

This is an easy one to debug, because you can just remove the overclocking settings from your /boot/config.txt and see if the lock-ups still occur. Often, this issue piggybacks on temperature/power issues.

To check and en­sure your Rasp­berry Pi has no over­clocking set­tings, in­spect the boot con­fig­u­ra­tion file:

Check current config
bash
cat /boot/config.txt
# or on newer Raspberry Pi OS versions:
cat /boot/firmware/config.txt
Look for these overclocking-related parameters
plaintext
arm_boost           # Set to 0 to disable
arm_freq            # CPU frequency (default varies by model, 1.5GHz on RPi4)
gpu_freq            # GPU frequency (default varies by model, 500MHz on RPi4)
core_freq           # Core frequency
sdram_freq          # SDRAM frequency
over_voltage        # CPU/GPU voltage adjustment
over_voltage_sdram  # SDRAM voltage
force_turbo         # Forces max frequency at all times

If any of these are pre­sent and set above de­fault values, the Pi is over­clocked. To re­move over­clocking, com­ment them out with # or delete those lines en­tirely.
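Rather than reading the whole file by eye, the active settings can be pulled out with `grep`. The sketch below assumes the parameter list above is the full set you care about. The function name `list_overclock_settings` is made up for this example. It prints only uncommented lines, so settings already disabled with # are ignored.

```bash
# list_overclock_settings CONFIG_FILE
# Print active (uncommented) lines that set one of the overclocking-related
# parameters. The function name is hypothetical.
list_overclock_settings() {
    grep -E '^[[:space:]]*(arm_boost|arm_freq|gpu_freq|core_freq|sdram_freq|over_voltage|over_voltage_sdram|force_turbo)[[:space:]]*=' "$1"
}

# Usage on a Pi (pick whichever path exists on your system):
#   list_overclock_settings /boot/config.txt
#   list_overclock_settings /boot/firmware/config.txt
```

No output means none of these parameters is set in that file. A line that does show up still needs to be compared against the default for your model.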

Un­sup­ported GPU Dri­vers

One of my Rasp­berry Pi SBCs has En­deav­ourOS in­stalled. This is an awe­some op­tion if you want an easy Arch Linux in­stall on the Pi. I don't know if it's doc­u­mented, but I kept seeing system lock-ups when the RPi's v3d kernel module (Video­Core VI 3D ac­cel­er­a­tion driver) was en­abled. It could be un­re­lated to the OS, but here were the symp­toms:

  • Random system freezes/​lockups re­quiring hard power cy­cles
  • No kernel panic, no er­rors on screen— just an un­re­spon­sive system
  • Progressively shorter uptimes (possibly made worse by a faulty PSU; after replacing it, the freezes were less frequent but still happening)
  • Cor­rup­tion on boot par­ti­tion (journal uncleanly shut down)

Root Causes

The v3d kernel module has a bug in the DRM GPU sched­uler on kernel 6.18.18. The failure se­quence:

  1. A GPU bin job hangs
  2. The timeout han­dler fires: v3d_bin_job_timedout
  3. It calls v3d_gpu_reset_for_timeout -> drm_sched_stop
  4. drm_sched_stop calls dma_fence_wait_timeout on a fence that never sig­nals
  5. The kernel worker thread enters uninterruptible sleep (D state) permanently
  6. Re­peated GPU re­sets flood the sched­uler, cas­cading into a full system lockup
  7. No panic or oops is logged— the system simply be­comes un­re­spon­sive

This occurs regardless of the display overlay (both vc4-kms-v3d and vc4-fkms-v3d), because the v3d module loads independently for 3D acceleration.

Key log ev­i­dence

txt
kernel: v3d fec00000.v3d: [drm:v3d_reset v3d] ERROR Resetting GPU for hang.
kernel: v3d fec00000.v3d: [drm:v3d_reset v3d] ERROR V3D_ERR_STAT: 0x00001000
kernel: v3d fec00000.v3d: MMU error from client L2T (13) at 0x200, pte invalid
kernel: INFO: task kworker/1:3:25194 blocked for more than 120 seconds.
kernel: Workqueue: events drm_sched_job_timedout gpu_sched
kernel: Call trace:
  dma_fence_default_wait
  dma_fence_wait_timeout
  drm_sched_stop gpu_sched
  v3d_gpu_reset_for_timeout v3d
  v3d_bin_job_timedout v3d
  drm_sched_job_timedout gpu_sched
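To see how often these two signatures appear in a given boot, the log can be piped through a short counter. This is a rough sketch that only matches the exact phrases shown in the log excerpt ("Resetting GPU for hang" and "blocked for more than"). The function name `summarize_v3d_log` is hypothetical.

```bash
# summarize_v3d_log
# Read kernel log text on stdin and count v3d GPU resets and hung-task
# warnings. The function name is made up for this example.
summarize_v3d_log() {
    awk '
        /Resetting GPU for hang/ { resets++ }
        /blocked for more than/  { hung++ }
        END { printf "gpu_resets=%d hung_tasks=%d\n", resets, hung }
    '
}

# Usage on the affected system (kernel messages from the previous boot):
#   journalctl -k -b -1 | summarize_v3d_log
```

A boot that ends with several resets followed by a hung task would fit the failure sequence described above. It is a hint, though, not proof that v3d caused the freeze.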

Fixes

  1. Black­list v3d module (pri­mary fix)
/etc/modprobe.d/blacklist-v3d.conf
conf
blacklist v3d
blacklist gpu_sched

Then re­build initramfs:

shell
sudo mkinitcpio -P
Trade-off

You will have no hardware OpenGL/3D acceleration. Desktop, VNC, 2D, and video playback all work fine via vc4 FKMS.

  2. Switch display overlay to FKMS
/boot/config.txt
txt
dtoverlay=vc4-fkms-v3d

This uses the firmware-backed dis­play path in­stead of full KMS (Kernel Mode Set­ting).

  3. Enable hardware watchdog
/etc/systemd/system.conf.d/watchdog.conf
ini
[Manager]
RuntimeWatchdogSec=15
WatchdogDevice=/dev/watchdog
RebootWatchdogSec=2min

Also, in /boot/config.txt:

txt
dtparam=watchdog=on

  4. Fix fstab for automatic fsck
/etc/fstab
txt
UUID=A4C6-A593                             /boot  vfat  defaults  0  2
UUID=5e1cff67-6a84-4e49-b936-503c8bc142d9  /      ext4  defaults  0  1

The root par­ti­tion was missing from fstab en­tirely, and the boot par­ti­tion had pass 0 (no fsck). Now both are checked au­to­mat­i­cally on boot.
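The pass number is the sixth field of each fstab line, so entries that skip fsck can be listed with a one-line `awk` filter. This is a sketch with a made-up function name, `list_unchecked_mounts`. It flags every entry whose pass field is 0 or missing, which is normal for things like swap or tmpfs, so read the output with that in mind.

```bash
# list_unchecked_mounts FSTAB_FILE
# Print mount points whose sixth field (the fsck pass number) is 0 or absent.
# Comment lines are skipped. The function name is hypothetical.
list_unchecked_mounts() {
    awk '$1 !~ /^#/ && NF >= 4 && ($6 == "" || $6 == 0) { print $2 }' "$1"
}

# Usage:
#   list_unchecked_mounts /etc/fstab
```

With the corrected fstab above, neither /boot nor / should appear in the output.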

Di­ag­nostic Tools

Check throttle/undervoltage flags
shell
vcgencmd get_throttled
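The command prints a hex bitmask such as `throttled=0x50005`, which is hard to read at a glance. The sketch below decodes it using the bit meanings from the Raspberry Pi documentation for `get_throttled`: bits 0 to 3 are current conditions, and bits 16 to 19 record that the condition has occurred since boot. The function name `decode_throttled` is made up for this example.

```bash
# decode_throttled VALUE
# VALUE is the hex number printed by `vcgencmd get_throttled`, e.g. 0x50005.
# The function name is hypothetical.
decode_throttled() {
    local value=$(( $1 ))
    local found=0 bit text
    while IFS=: read -r bit text; do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $text"
            found=1
        fi
    done <<'EOF'
0:under-voltage detected
1:ARM frequency capped
2:currently throttled
3:soft temperature limit active
16:under-voltage has occurred
17:ARM frequency capping has occurred
18:throttling has occurred
19:soft temperature limit has occurred
EOF
    [ "$found" -eq 1 ] || echo "no flags set"
}

# Fixed sample value so the snippet runs anywhere.
decode_throttled 0x50005
```

On a Pi, feed it the live value: `decode_throttled "$(vcgencmd get_throttled | cut -d= -f2)"`. Any under-voltage flag points back at the power supply section.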
Check for v3d errors
shell
dmesg | grep -iE "v3d|gpu.*reset|hang"
journalctl -b -1 -p err   # errors from the previous boot
Check for hung tasks
shell
journalctl -b -1 | grep "blocked for more than"   # previous boot
Check boot history for crash patterns
shell
journalctl --list-boots
Check filesystem corruption
shell
dmesg | grep -iE "corrupt|unclean|fsck"
Check SD card health
shell
cat /sys/block/mmcblk0/device/life_time
cat /sys/block/mmcblk0/device/pre_eol_info

VNC

Rem­mina

Taken from the "Linux Con­sulting and Training" docs:

Note

Tested with Raspberry Pi OS in July 2020

When you try to connect from your Linux machine using Remmina to a Raspberry Pi running Raspberry Pi OS with RealVNC enabled, you get the error:

Un­known au­then­ti­ca­tion scheme from VNC server

RealVNC only supports a few security schemes. Authentication=VncAuth seems to be the only scheme that allows direct connections from third-party VNC-compatible viewers. To switch to the VncAuth scheme on your Raspberry Pi and set a password for connections from the Remmina VNC plugin, open an SSH session (or a terminal window) on the Pi and generate your VNC password with:

bash
sudo vncpasswd -service

Now, edit the file /root/.vnc/config.d/vncserver-x11:

bash
sudo nano /root/.vnc/config.d/vncserver-x11

and add the fol­lowing line at the end of the file:

plaintext
Authentication=VncAuth

Now your config file should look more or less like mine:

plaintext
_AnlLastConnTime=int64:0000000000000000
_LastUpdateCheckSuccessTime=int64:01d65c12272dff1a
_LastUpdateCheckTime=int64:01d65c12272dff1a
Password=c3abbea3b003a0b231737c0541892d72
Authentication=VncAuth

c3abbea3b003a0b231737c0541892d72 is the en­crypted ver­sion of "rasp­berry"; your line will likely be dif­ferent.

Finally, restart the VNC server service with:

bash
sudo systemctl restart vncserver-x11-serviced

...and you are ready to con­nect to your Rasp­berry Pi using Rem­mina.
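If you script this setup, the config edit can be made safe to run more than once, so the `Authentication=VncAuth` line is not appended twice. A minimal sketch, where `ensure_line` is a made-up helper name:

```bash
# ensure_line LINE FILE
# Append LINE to FILE unless an identical line is already present.
# The helper name is hypothetical.
ensure_line() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Usage from a root shell, since the file lives under /root:
#   ensure_line 'Authentication=VncAuth' /root/.vnc/config.d/vncserver-x11
```

The helper only checks for an exact match of the whole line. If the file already has a different `Authentication=` value, it will not replace it, so check the file by hand in that case.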

Im­ager

How do I open "Advanced Settings" in the Raspberry Pi Imager application?

  • Press Ctrl+Shift+X on the main Imager screen