Debugging @ 5.3Mbit/sec (5333333 Baud) on Arduino and other Embedded systems

When debugging embedded systems, the long-tried method has been to distribute printf() statements generously through your code. Once everything works as expected, you then have to comment them out or remove them.

Over the years I have perfected my way of performing this art of debugging, addressing many of its shortcomings, namely:

  • Serial port – not all devices have a UART, or the UART might already be in use. Out-of-band debugging is still preferable, like having separate lines for stdout and stderr.
  • RAM vs ROM – on Harvard machines (8-bit AVR) you need some special magic to put strings in ROM and use them from there. Even if you are not limited by the Harvard architecture, your linker may not place your const data in flash automatically, and you still need some magic.
  • Timing impact – sending text over a serial line takes time. Transmitting “hello world” takes about 1 millisecond @ 115200 Baud, which might ruin the functionality of the routine you are debugging. You could of course postpone the problem with ring buffers and interrupts, and now you have those to debug as well ;-(. With the solution presented here, “hello world” transmits in about 20 µs @ 5.3Mbit.
  • Cleaning up – once it is all working, you comment out, or worse remove, the debug prints that got things rolling. Unfortunately bugs might return. Hopefully we were using git, which can help get the debug prints back.
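
The timing figures in the list above can be checked with a little arithmetic. This is only a sketch, and it assumes the usual 8N1 framing (10 bits on the wire per character), which the text does not state explicitly; the function name transmit_time_us is made up for the illustration.

```cpp
#include <cassert>
#include <cstring>

// Assumption: 8N1 framing, i.e. 1 start bit + 8 data bits + 1 stop bit.
constexpr double kBitsPerChar = 10.0;

// Microseconds needed to clock `text` out at `baud` bits per second.
double transmit_time_us(const char *text, double baud) {
    assert(text != nullptr && baud > 0.0);
    return static_cast<double>(std::strlen(text)) * kBitsPerChar / baud * 1e6;
}
```

For the 11 characters of “hello world” this gives roughly 955 µs at 115200 Baud and roughly 20.6 µs at 5333333 Baud, in line with the numbers quoted above.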

When I worked on my first embedded system 37 years ago, 1200 baud was not bad, and 9600 baud was just blazing. Today we use USB-serial adapters costing less than a dollar, delivered all the way from Ali-Land. Looking at the data sheet for the PL2303MX, for example, we see that it supports up to 12Mbit/sec and has 256-byte buffers for both input and output. This is certainly something we should take advantage of – let's go.

Serial Port

So how fast can we go? I am only interested in output, with no feedback, so the simplest approach is a bit-banged serial implementation. An 8-bit AVR, and probably most CPUs, can handle a bit in 3 instructions:

  • test bit in the character we want to send
  • set bit in byte we have to output on the port
  • and finally write the byte to the output pins

The code to send a byte on an atmega* looks like this. We copy the full 8-bit port to _tmp_reg_ so we do not have to re-read it for every bit (this saves a cycle). Interrupts must be disabled while sending, because other parts of the code might change other pins on that port, and such a change would be overwritten by our saved copy.
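
As a portable illustration of the idea (not the AVR assembler from the download), the sketch below computes the ten values that would be written to the port for one character. It assumes standard UART framing: line idle high, one low start bit, eight data bits least significant first, one high stop bit. The names frame_port_writes, port_snapshot and tx_mask are hypothetical, and the sketch shows only the values, not the cycle-exact timing that the real routine has to guarantee.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Returns the ten port values for one 8N1 frame, one value per bit time.
// `port_snapshot` is the port read once up front, so the seven other pins
// are rewritten unchanged; `tx_mask` selects the TX pin within the port.
std::array<std::uint8_t, 10> frame_port_writes(std::uint8_t port_snapshot,
                                               std::uint8_t tx_mask,
                                               std::uint8_t c) {
    assert(tx_mask != 0);
    const std::uint8_t low  = static_cast<std::uint8_t>(port_snapshot & ~tx_mask);
    const std::uint8_t high = static_cast<std::uint8_t>(port_snapshot | tx_mask);
    std::array<std::uint8_t, 10> writes{};
    writes[0] = low;                          // start bit
    for (int bit = 0; bit < 8; ++bit) {       // data bits, LSB first
        writes[1 + bit] = ((c >> bit) & 1u) ? high : low;
    }
    writes[9] = high;                         // stop bit, line back to idle
    return writes;
}
```

On a real target each of these values would go straight to the port register, with every write exactly one bit time after the previous one.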

We need a little initialization, and on the atmega2560 we need to use a port that can be accessed as a register.

Well, we got our message out there; now we just need to pick it up.

The CPU, which in this case is an atmega2560, operates at 16 MHz. With a bit time of 3 instructions we end up with a baud rate of 16000000 / 3 = 5333333, that is, about 5.3 Mbit/sec.

Newer versions of Linux already support non-standard baud rates, so we just need a terminal emulation program which does the same – pyserial will do the job.

My favorite, picocom, ships as an older version (1.7) on Ubuntu 16.04, which is too old for this. The remedy is simply to get and install the newest version (3.2a).

And voilà, we will be greeted with “Hello World” once the ATMEGA2560 calls hello().

The best thing about this approach is that you can get going in a very short time on a new processor: literally just translate the above assembler code to whatever is spoken on the new processor, with no need to care about baud rates or silly things like that on the embedded CPU. Just adjust the baud rate on the host system (Linux): take the processor's clock rate and divide it by the number of cycles one bit takes. Or, if you are the lazy type, try them all: FREQ/1, FREQ/2, FREQ/3 and so on. Before you know it you are ready to get working on the new toy.
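
The “try them all” approach can be sketched as a small helper that lists the candidate host baud rates for a given clock. The function name candidate_baud_rates is invented for this example, and integer division is assumed to be close enough for the host side.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Baud rates to try on the host when the cycles-per-bit of the bit-bang
// loop is unknown: clock/1, clock/2, ... clock/max_cycles.
std::vector<std::uint32_t> candidate_baud_rates(std::uint32_t cpu_hz,
                                                unsigned max_cycles) {
    assert(max_cycles > 0);
    std::vector<std::uint32_t> rates;
    for (unsigned cycles = 1; cycles <= max_cycles; ++cycles) {
        rates.push_back(cpu_hz / cycles);
    }
    return rates;
}
```

For a 16 MHz clock the third candidate is 5333333, the rate used throughout this article.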

Debug strings in Flash

Harvard CPUs have code and data in separate address spaces; 8-bit AVR, i8051 and stm8 are of this type. This can be a painful experience, and I certainly prefer Von Neumann CPUs like the MSP430. In any case, depending on the compiler, we might have to do something to make sure that constant text strings are placed in Flash and not in RAM.

Using a little “include-file” magic we can hide some of this idiosyncrasy, so that the same source code works on Harvard and Von Neumann CPUs, and if we are using C++ we can use overloaded functions to make life easier. The goal here is to supply 3 functions to print our debugging information:

  • dprint( variable )
  • DPRINT( "text which will be stored in Flash", variables… )
  • DPRINTLN( "text which will be stored in Flash", variables… )

The actual C++ overloaded function is called from within a macro, which tests debug_level and serializes the print statements.
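
To make the mechanism concrete, here is a minimal sketch that runs on a PC. It is not the dprint.h from the download: the serial line is replaced by a std::string called debug_sink, the arguments are serialized with a variadic template (the original macro may well do this differently), and the level thresholds and the space between items are assumptions. Note that with this template version the evaluation order of the arguments is not guaranteed, so arguments with side effects should be avoided.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

std::string debug_sink;   // stand-in for the bit-banged serial line
int debug_level = 1;      // 0 = silent, 1 = DPRINT, 2 = DDPRINT as well

// One overload per printable type.
inline void dprint(const char *s) {
    assert(s != nullptr);
    debug_sink += s;
}
inline void dprint(std::uint16_t v) {
    char buf[5];          // 4 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "%04x", static_cast<unsigned>(v));
    debug_sink += buf;
}

// Print every argument in turn, separated by single spaces.
inline void dprint_all() {}
template <typename First, typename... Rest>
void dprint_all(First first, Rest... rest) {
    dprint(first);
    if (sizeof...(rest) > 0) dprint(" ");
    dprint_all(rest...);
}

#define DPRINT(...)   do { if (debug_level >= 1) dprint_all(__VA_ARGS__); } while (0)
#define DPRINTLN(...) do { if (debug_level >= 1) { dprint_all(__VA_ARGS__); dprint("\n"); } } while (0)
#define DDPRINT(...)  do { if (debug_level >= 2) dprint_all(__VA_ARGS__); } while (0)
```

With debug_level at 1, a call such as DPRINTLN("Name #", name, length) appends the text, the string and the number as four hex digits to debug_sink; with debug_level at 0 the same call costs one comparison and prints nothing.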

In the C world we normally use printf(), so let's compare the two. On a Linux box you might write something like

  • if (debug) printf("Name \"%s\" is %d bytes tall\n", cpt="StorePeter", strlen(cpt));
  • result: Name "StorePeter" is 10 bytes tall

The printf() function parses "Name \"%s\" is %d bytes tall\n" at run time, printing the text and calling subroutines on the way to print the string and the number.

On our embedded system we could write:

  • DPRINTLN("Name #", cpt="StorePeter", strlen(cpt));
  • result: Name # StorePeter 000a

Not as readable as the printf(), but for debug purposes this is fine, and it is much more efficient in both code size and speed than using printf(). The macro DPRINTLN() basically expands to

The file dprint.h with all its magic is shown below

You can turn debugging on or off dynamically by changing the debug_level variable.

When you have got parts of your code bug-free, turn their DPRINT() calls into DDPRINT(), which only prints at a higher debug level. You will no longer see those prints, but you can still get them back without much effort.

Once your code is working, set debug_level = 0 (pulling down the tx_pin does exactly that) and you will have a production version with very little overhead. If an error resurfaces, you can change the debug_level variable and follow what is going on.

Another tip is to modify the debug_level variable around the code you are working on, for example:
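
One way to do this without forgetting to restore the old value is a small scope guard. This is a variation of my own on the tip, not something taken from dprint.h: the plain form is simply to assign debug_level before the code in question and set it back afterwards. The class name ScopedDebugLevel is hypothetical, and debug_level is assumed to be a plain global int.

```cpp
#include <cassert>

int debug_level = 0;   // assumed to be a plain global, as in the text

// Sets debug_level for the lifetime of the object and restores the
// previous value when the enclosing scope is left.
class ScopedDebugLevel {
public:
    explicit ScopedDebugLevel(int level) : saved_(debug_level) {
        assert(level >= 0);
        debug_level = level;
    }
    ~ScopedDebugLevel() { debug_level = saved_; }
    ScopedDebugLevel(const ScopedDebugLevel &) = delete;
    ScopedDebugLevel &operator=(const ScopedDebugLevel &) = delete;

private:
    int saved_;
};
```

Declaring ScopedDebugLevel verbose(2); at the top of the block under investigation turns on the higher-level prints for just that block.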

The _dprint() overload functions are trivial as you see below

If you are tight on space, or want to move the code to a smaller CPU, you can get rid of all the debugging code: just include the file no_dprint.h (instead of dprint.h), which is shown below.
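
The principle behind such a header can be sketched as follows. This is an assumption about how a “no debug” header might look, not the contents of the actual no_dprint.h: every macro expands to an empty statement, so the preprocessor throws the arguments away and they are never evaluated. The checksum() function is a made-up caller to show that the instrumented source compiles unchanged.

```cpp
#include <cassert>

// Every debug macro becomes an empty statement; the arguments vanish.
#define DPRINT(...)   do { } while (0)
#define DPRINTLN(...) do { } while (0)
#define DDPRINT(...)  do { } while (0)

// Example caller that keeps its debug prints in the source.
int checksum(const unsigned char *data, int len) {
    assert(data != nullptr);
    int sum = 0;
    for (int i = 0; i < len; ++i) {
        sum += data[i];
        DPRINTLN("sum #", sum);   // generates no code in this configuration
    }
    return sum;
}
```

Because the arguments disappear before compilation, the debug strings take up no Flash either.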

Show me the code

The idea and the code are yours, free to use and explore, released under the BeerWare license with no strings attached. You might even have had this idea yourself long before this was written.

I know I have invented a lot of stuff which other people invented before me – and they didn’t even ask.

Just download it and try it out on your Arduino; it worked for me under Arduino-1.8.5.

stm8s Microcontroller

I have implemented the same functionality for the stm8s microprocessor, with a different instruction set, a different compiler, and a slightly different language (C vs C++). See: Debugging @ 5.3Mbit/sec (5333333 Baud) on stm8s microprocessors

Other Microcontrollers

And the homework until next time: write a version for your favorite microcontroller and I will put a link right here. Maybe you can even make it faster than mine.

Stress test

If you are curious whether the system can sustain this speed, please have a look at: Stress testing 5.3Mbit/sec debug-stream from Arduino

Happy hacking

