Debugging @ 5.3Mbit/sec (5333333 Baud) on Arduino and other Embedded systems

When debugging embedded system the long tried method has been to generously distribute printf() statements through your code, once everything is working as expected, you then have to comment them out or remove them.

Over the years I have perfected my way of performing this art of debugging, addressing many of its shortcomings namely:

  • Serial port – not all devices have an UART, or the UART might be used already.  Out of band debugging is still preferable – like having separate lines for stdout and stderr.
  • RAM vs ROM – on Harvard machines (8-bit AVR) you have to do some special magic to use/put strings in ROM, even if you are not limited by the Harvard architecture, your linker may not put your const in flash automatically, and you still need some magic.
  • Timing impact – sending text over a serial line takes time,  transmission of “hello world” takes 1 millisecond @ 115200 Baud, this might ruin the functionality of the routine you are debugging, you could of course postpone your problem with ring-buffers, use interrupts…., and now you have also got that to debug ;-(. With the solution presented here “hello world” would transmit in 20usec @5.3Mbit.
  • Cleaning up – once it is all working you comment out, or worse remove the debug prints which we used to get it rolling.  Unfortunately bugs might return – well hopefully we were using git, which can help getting the debug prints back.

When I worked on my first embedded system 37 years ago, 1200 was not bad, and 9600 baud was just blazing. Today we are using USB-serial adapters costing less than a dollar delivered all the way from Ali-Land. Looking at the data-sheet for f.ex. PL2303MX we see that it supports up to 12Mbit/sec and have 256 bytes buffers for both input and output, this is certainly something we should take advantage of.​ – Lets go.

Serial Port

So how fast can we go? – I am only interested in output, using no feedback, the simplest is just a bit-bang serial implementation. An 8-bit AVR and probably most CPUs can handle a bit in 3 instruction

  • test bit in the character we want to send
  • set bit in byte we have to output on the port
  • and finally write the byte to the output pins

The code to send a byte on an atmega* looks like this, we copy the full 8bit-port to _tmp_reg_ so we do not have to re-read it on every bit (saves a cycle), interrupts must be disabled, because other parts of the code might change other pins on that port.

void dprint_init(uint8_t level)
{
        DPRINT_TX_DDR &= ~_BV(DPRINT_TX_BIT);   // set tx_bit as input
        DPRINT_TX_PORT |= _BV(DPRINT_TX_BIT);   // stop bit and weak pull-up ~ 20kohm
        if ((DPRINT_TX_PORT & _BV(DPRINT_TX_BIT)) == 0) {
                level = 0; // pull the tx_bit down by a 2k2 resitor to disable debug prints
        }
        DPRINT_TX_DDR |= _BV(DPRINT_TX_BIT);    // set tx_pin as output
        debug_level = 1;
        DPRINTLN("debug_level", level);
        debug_level = level;
}
void _dprintc(char ch)  // irq disabled for 33 instruction that is 2usec
{
        __asm__ __volatile__(
        "   in r25,__SREG__\n\t"                // get interrupt state
        "   cli\n\t"                            // disable irq
        "   in __tmp_reg__,%[tx_port]\n\t"      // read the full port
        "   clt\n\t"                            // T=0
        "   bld __tmp_reg__,%[tx_bit]\n\t"      // set tx_bit=T
        "   out %[tx_port],__tmp_reg__\n\t"     // sending Startbit
        "0: bst %[ch],0\n\t"                    // T=lsb
        "   bld __tmp_reg__,%[tx_bit]\n\t"
        "   out %[tx_port],__tmp_reg__\n\t"     // sending lsb
        "1: bst %[ch],1\n\t"
        "   bld __tmp_reg__,%[tx_bit]\n\t"
        "   out %[tx_port],__tmp_reg__\n\t"
        "2: bst %[ch],2\n\t"
        "   bld __tmp_reg__,%[tx_bit]\n\t"
        "   out %[tx_port],__tmp_reg__\n\t"
        "3: bst %[ch],3\n\t"
        "   bld __tmp_reg__,%[tx_bit]\n\t"
        "   out %[tx_port],__tmp_reg__\n\t"
        "4: bst %[ch],4\n\t"
        "   bld __tmp_reg__,%[tx_bit]\n\t"
        "   out %[tx_port],__tmp_reg__\n\t"
        "5: bst %[ch],5\n\t"
        "   bld __tmp_reg__,%[tx_bit]\n\t"
        "   out %[tx_port],__tmp_reg__\n\t"
        "6: bst %[ch],6\n\t"
        "   bld __tmp_reg__,%[tx_bit]\n\t"
        "   out %[tx_port],__tmp_reg__\n\t"
        "7: bst %[ch],7\n\t"
        "   bld __tmp_reg__,%[tx_bit]\n\t"
        "   out %[tx_port],__tmp_reg__\n\t"     // sending msb
        "8: set\n\t"                            // T=1
        "   bld __tmp_reg__,%[tx_bit]\n\t"
        "   out %[tx_port],__tmp_reg__\n\t"     // sending Stopbit
        "   out __SREG__,r25\n\t"               // restore irq
        :
          [ch] "+r" (ch)
        :
          [tx_port] "M" (DPRINT_TX_PORT_ADDR),
          [tx_bit] "M" (DPRINT_TX_BIT)
        :
          "r25"
        );
}

We need a little initialization and on the atmega2560 we need to use a port which can be accessed as a register.

#include "dprint.h"

#define DPRINT_TX_PORT PORTB
#define DPRINT_TX_DDR DDRB
#define DPRINT_TX_BIT 5

#define DPRINT_TX_PORT_ADDR _SFR_IO_ADDR(DPRINT_TX_PORT)

uint8_t debug_level;
void dprint_init(uint8_t level)
{
        DPRINT_TX_DDR &= ~_BV(DPRINT_TX_BIT);   // set tx_bit as input
        DPRINT_TX_PORT |= _BV(DPRINT_TX_BIT);   // stop bit and weak pull-up ~ 20kohm
        if (DPRINT_TX_PORT & _BV(DPRINT_TX_BIT) == 0) {
                level = 0; // pull the tx_bit down by a 2k2 resitor to disable debug prints
        }
        DPRINT_TX_DDR |= _BV(DPRINT_TX_BIT);    // set tx_pin as output
        debug_level = 1;
        DPRINTLN("debug_level", level);
        debug_level = level;
}
void hello(void)
{
  char *txt = "Hello World";
  while (*txt) {
    dprintc(*txt++);
  }
}

Well we got our message out there, now we just need to pick it up.

The CPU which in this case is an atmega2560 operates at 16Mhz, with a bit time of 3 instructions we end up with a baudrate of 5333333   that is about 5.3 Mbit/sec

Newer versions of Linux already support non-standard baudrate, so we just need a terminal emulation program which does the same – pyserial will do the job .

My favorite picocom is an older version (1.7) on Ubuntu-16.04.  The remedy is to just get and install the newest version (3.2a)

git clone https://github.com/npat-efault/picocom.git
cd picocom
make
sudo cp picocom /usr/local/bin
sudo cp picocom.1 /usr/local/share/man/man1

picocom -l /dev/ttyUSB0 -b 5333333 --imap lfcrlf

And voila we will be greeted with “Hello World” once the ATMEGA2560 calls hello()

The best thing about this approach is that you can get going in a very short time on a new processor, literally just translate the above assembler code to what ever is spoken on the new processor, no care about baudrate or silly things like that on the embedded CPU. Just adjust the baudrate on the host system (Linux), take the processors clock-rate and divide with how long one bit takes – or if you are the lazy type try them all FREQ/1 FREQ/2 FREQ/3 …. And before you know it you are ready to get working on the new toy.

Debug strings in Flash

Harvard CPU have code and data in separate spaces, 8-bit AVR, i8051, stm8 are of this type. and can be a painfull experience, I certainly prefer Von Neumann CPU like MSP430 in any case depending on the compiler we might have to do something to make sure that constant text strings are placed in Flash and not in RAM.

Using a little “include-file” magic we can hide some of this idiocrasy, that way we can use the same source code on Harvard and Von Neumann CPUs, and if we are using C++ we can use overloaded funtions to make our life easier. The goal here is to supply 3 funtions,  to print our debugging information

  • dprint( variable )
  • DPRINT( “text which will be stored in Flash” ,  variables…) ;
  • DPRINTLN( “text which will be stored in Flash” ,  variables… )

The actual c++ overload function, is called from within a macro which test for debug_level  and serializes the print statements.

In the C-world we normally use the printf(), so lets compare the two.   On a Linux box you might write something like

  • if (debug) printf(“Name \”%s\” is %d bytes tall\n”, cpt=“StorePeter”, strlen(cpt));
  • result: Name “StorePeter” is 10 bytes tall

The printf() function will parse “Name \”%s\” is %d bytes tall\n”, printing the text and calling subroutines on the way to print the string and the number.

On our embedded system we could write:

  • DPRINTLN(“Name #”, cpt=“StorePeter”, strlen(cpt));
  • result: Name # StorePeter 000a

Not as readable as the printf(), but for debug purposes this is fine, and much more efficient both in code size and speed compared to using printf().  The macro DPRINTLN() basicly expands to

if (debug_level > 0) {
    _dprint(F("NAME #"));
    _dprintc(' ');
    _dprint(cpt = "StorePeter");
    _dprintc(' ');
    _dprint(strlen(cpt));
    _dprintc('\n');
}

The file dprint.h with all its magic is shown below

extern uint8_t debug_level;
char nibble2ascii(uint8_t);
void _dprintc(char);
void _dprint(uint8_t);
void _dprint(uint16_t);
void _dprint(uint32_t);
void _dprint(int8_t);
void _dprint(int16_t);
void _dprint(int32_t);
void _dprint(char *);
void _dprint(const char *);
void _dprint(const __FlashStringHelper *);
void dprint_init(uint8_t);
void dwheel(void);  // have one in loop() to show your system is alive

#define   dprint(v) if (debug_level>0) _dprint(v)
#define  ddprint(v) if (debug_level>1) _dprint(v)
#define dddprint(v) if (debug_level>2) _dprint(v)

#define dprint3(level,trail,string) do {\
	if (debug_level>level) {\
		_dprint(F(string)); _dprintc(trail);\
	}\
} while (0)
#define dprint4(level,trail,string,value) do {\
	if (debug_level>level) {\
		_dprint(F(string)); _dprintc(' ');\
		_dprint(value); _dprintc(trail);\
	}\
} while (0)
#define dprint5(level,trail,string,val1,val2) do {\
	if (debug_level>level) {\
		_dprint(F(string)); _dprintc(' ');\
		_dprint(val1); _dprintc(' ');\
		_dprint(val2); _dprintc(trail);\
	}\
} while (0)
#define dprint6(level,trail,string,val1,val2,val3) do {\
	if (debug_level>level) {\
		_dprint(F(string)); _dprintc(' ');\
		_dprint(val1); _dprintc(' ');\
		_dprint(val2); _dprintc(' ');\
		_dprint(val3); _dprintc(trail);\
	}\
} while (0)

#define GET_MACRO(_3,_4,_5,_6,NAME,...) NAME
//https://stackoverflow.com/questions/11761703/overloading-macro-on-number-of-arguments
#define   DPRINT(...) GET_MACRO(__VA_ARGS__, dprint6, dprint5, dprint4, dprint3, dprint2)(0,' ',__VA_ARGS__)
#define  DDPRINT(...) GET_MACRO(__VA_ARGS__, dprint6, dprint5, dprint4, dprint3, dprint2)(1,' ',__VA_ARGS__)
#define DDDPRINT(...) GET_MACRO(__VA_ARGS__, dprint6, dprint5, dprint4, dprint3, dprint2)(2,' ',__VA_ARGS__)

#define   DPRINTLN(...) GET_MACRO(__VA_ARGS__, dprint6, dprint5, dprint4, dprint3, dprint2)(0,'\n',__VA_ARGS__)
#define  DDPRINTLN(...) GET_MACRO(__VA_ARGS__, dprint6, dprint5, dprint4, dprint3, dprint2)(1,'\n',__VA_ARGS__)
#define DDDPRINTLN(...) GET_MACRO(__VA_ARGS__, dprint6, dprint5, dprint4, dprint3, dprint2)(2,'\n',__VA_ARGS__)

You can dynamically turn debugging on or off, by changing the debug_level variable,

When you got parts of your code bug-free turn DPRINT() into DDPRINT() which will only print at a higher debug level. so you will not see them, but ypu can still get the debug prints back without much effort.

Once your code is working set debug_level = 0, pulling down the tx_pin does exactly that, and you will have an production version with very little overhead, if an error resurfaces, you can change the debug_level variable and follow what is going on.

Another tip could be modifying the debug_level variable around code you working on f.ex.

...
debug_level++;
this_is_function_I_am_debugging();
debug_level--;
...

The _dprint() overload functions are trivial as you see below

char nibble2ascii(uint8_t d)
{
        d &= 0xf;
        if (d < 10)
                d += '0';
        else
                d += 'a' - 10;
        return (d);
}
void _dprint(uint8_t i)
{
        dprintc(nibble2ascii(i>>4));
        dprintc(nibble2ascii(i));
}
void _dprint(uint16_t i)
{
        dprint((uint8_t)(i>>8));
        dprint((uint8_t)i);
}
void _dprint(uint32_t i)
{
        dprint((uint16_t)(i>>16));
        dprint((uint16_t)i);
}
void _dprint(int8_t i)
{
        dprint((uint8_t)i);
}
void _dprint(int16_t i)
{
        dprint((uint16_t)i);
}
void _dprint(int32_t i)
{
        dprint((uint32_t)i);
}
void _dprint(char *string)
{
        unsigned char c;
        while ((c = *string++)) {
                dprintc(c);
        }
}
void _dprint_F(const char *string)
{
        unsigned char c;
        while ((c = pgm_read_byte(string++))) {
                dprintc(c);
        }
}
void dwheel(void)
{
	static uint8_t n;
	_dprintc('\b');
	n++;
	if (n>3) n=0;
	_dprintc("-/|\\"[n]);
}

If you are tight on space, or want to move the code to a smaller CPU you can get rid of all the debugging code, just include the file no_dprint.h (instead of dprint.h) which is shown below.

#define nibble2ascii(...)
#define dprint_init(...)
#define _dprintc(char)
#define _dprint(...)
#define _dprint(...)
#define _dprint(...)
#define _dprint(...)
#define _dprint(...)
#define _dprint(...)
#define _dprint(...)
#define _dprint(...)
#define _dprint(...)
#define   dprint(v)
#define  ddprint(v)
#define dddprint(v)
#define   DPRINT(...)
#define  DDPRINT(...)
#define DDDPRINT(...)
#define   DPRINTLN(...)
#define  DDPRINTLN(...)
#define DDDPRINTLN(...)

Show me the code

The idea and the code is yours free to use and explore, released on the BeerWare license no strings attached, you might even have had this idea yourself long before this was written.

I know I have invented a lot of stuff which other people invented before me – and they didn’t even ask.

Just download  test_dprint.zip and try it out on your arduino, it worked for me under Arduino-1.8.5

stm8s Microcontroller

I have implemented the same functionality for stm8s microprocessor, different instruction-set, different compiler, and slightly different language – C vs C++ Debugging @ 5.3Mbit/sec (5333333 Baud) on stm8s microprocessors

Other Microcontrollers

And the home-work until next time:  Write a version for your favorite microcontroller and I will put a link in right here, maybe you can even make it faster than mine.

Stress test

If you are curious if the system can sustain this speed please have look at Stress testing 5.3Mbit/sec debug-stream from Arduino

DebugSerial the Arduino way

I have implemented a version which can replace the standard Serial class normally used on Arduino, you can read more about it here: MySensors Debug print @ 5.3Mbit/sec freeing Hardware Serial Port

Happy hacking

StorePeter peter@lorenzen.us

Print Friendly, PDF & Email
This entry was posted in Arduino, Embedded, on Hackaday. Bookmark the permalink.