Storing a typical event now takes only 130 cycles, less than 0.7 µs on a 200 MHz Cortex-M4 MCU.
This makes SystemView even less intrusive, reducing the CPU load to less than 1% in a system with 10k events/sec. No more excuses for not verifying system behavior!
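
The figures above follow from simple arithmetic, and it can be handy to redo the estimate for a different clock speed or event rate. The short sketch below does so, assuming a fixed cost of 130 cycles per event (real events may vary in size). The function names `event_time_us` and `cpu_load_percent` are illustrative only and are not part of any SystemView API.

```python
def event_time_us(cycles_per_event: int, cpu_hz: int) -> float:
    """Time needed to store one event, in microseconds."""
    return cycles_per_event / cpu_hz * 1e6


def cpu_load_percent(cycles_per_event: int, events_per_sec: int, cpu_hz: int) -> float:
    """Share of CPU cycles spent recording events, in percent."""
    return cycles_per_event * events_per_sec / cpu_hz * 100.0


# Values taken from the text; the per-event cost is assumed to be constant.
CYCLES_PER_EVENT = 130
CPU_HZ = 200_000_000       # 200 MHz Cortex-M4
EVENTS_PER_SEC = 10_000    # 10k events/sec

print(f"{event_time_us(CYCLES_PER_EVENT, CPU_HZ):.2f} us per event")
print(f"{cpu_load_percent(CYCLES_PER_EVENT, EVENTS_PER_SEC, CPU_HZ):.2f} % CPU load")
```

With these inputs, one event costs 0.65 µs and the recording overhead is 0.65% of the CPU, which matches the "less than 0.7 µs" and "less than 1%" figures quoted above.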