SPI-nning up

As part of a bigger project that I'll go into in a future post, I'm beginning to mess around with the SPI peripheral on the STM32G4, a new-as-of-last-year microcontroller geared towards BLDC motor control. Instead of starting with a middleware like Mbed, I figured that if I really wanted to learn electrical engineering, it'd probably be in my best interest to bite the bullet and start at the bare metal to understand exactly what's going on at the chip level.

After reading all 2075 pages of the datasheet (heh) I set about using the HAL drivers provided by ST's STM32CubeMX to get the SPI port up and running in order to talk to the LSM9DS1 9-DoF chip (datasheet). Adafruit has created a solid breakout board for it:

I had previously played around with Adafruit's other 9-DoF breakout board, but that requires communication with two different chips. On top of that, according to the datasheet for the LSM303DLHC chip the accelerometer registers are big-endian while the magnetometer registers are little-endian. Only lost a whole day to that, grrr...
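To make the byte-order trap concrete, here's a minimal sketch of assembling a signed 16-bit sample from two raw register bytes in either order. The helper names `sample_from_be` and `sample_from_le` are made up for illustration and aren't from any driver; which one applies to which register block is whatever the datasheet says.

```c
#include <stdint.h>

/* "first" is the byte read from the lower register address,
 * "second" is the byte read from the address right after it. */

/* Big-endian layout: the high byte sits at the lower address. */
static int16_t sample_from_be(uint8_t first, uint8_t second)
{
    return (int16_t)(uint16_t)(((uint16_t)first << 8) | second);
}

/* Little-endian layout: the low byte sits at the lower address. */
static int16_t sample_from_le(uint8_t first, uint8_t second)
{
    return (int16_t)(uint16_t)(((uint16_t)second << 8) | first);
}
```

Feeding the same two bytes into the wrong helper swaps the high and low halves, which is exactly the kind of bug that looks like noisy sensor data rather than a coding error.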

To bootstrap learning, I'm using the NUCLEO-G474RE dev board (datasheet). CubeMX has built-in support for this board and an initial pinout for its peripherals, which is nice to get up and running quickly. On top of that, it's got a shiny new ST-LINK V3 on board (well, the V3E version anyway), which makes for super nice USB hi-speed debugging.

On to exploring the SPI interface. I configured the chip to use SPI2, since SPI1 uses PB3, which is tied to T_SWO.

Easy enough. Next, I exported the code and looked through the auto-generated source files created by CubeMX.

Yeah, no, not doing that. I guess I'll use it as an example of how to twiddle the registers, but that's about it. I ended up spending a few days getting a toolchain up and running using a combination of VSCode, Cortex-Debug, gcc-arm-.*, and the awesome build system Bazel. And once this patch to OpenOCD landed, I had a full debugging setup up and running. Never underestimate the utility of full IDE debugging support, even in embedded systems.

Here's the wiring from the LSM9DS1's SPI port to SPI2 on the Nucleo, along with my trusty (read: I barely know how to use it) Rigol DS1054Z:

I initially tried to bit-bang the chip select (CS, or NSS according to ST) with the SPI clock at 5.3125 Mbit/s, polling the WHO_AM_I register. According to the datasheet, we're expecting to read 0x68.

Nice! If we expand the MOSI/MISO decoding lines we see that 0x8F is written first (register 0x0F, with the 0x80 read bit set), and the chip responds with 0x68. Woo! Looking closer though, the CS goes low, then about 3 µs later the clock begins to switch. The clock appears to be switching at the appropriate 5.3 MHz, but there's a *huge* delay between the initial clock pulses and the subsequent pulses. The total write-then-read takes almost 20 µs (!).
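For reference, the 0x8F on the wire splits into the read bit (0x80) OR'd with the register address (0x0F). A tiny sketch of building that command byte follows; the macro and function names are my own placeholders, not anything from ST's headers.

```c
#include <stdint.h>

#define SPI_READ_BIT   0x80u /* MSB set marks the transfer as a read */
#define REG_WHO_AM_I   0x0Fu /* address implied by the 0x8F seen on MOSI */
#define WHO_AM_I_VALUE 0x68u /* response the datasheet says to expect */

/* Build the first byte of a register read: read bit plus 7-bit address. */
static uint8_t spi_read_cmd(uint8_t reg)
{
    return (uint8_t)(SPI_READ_BIT | (reg & 0x7Fu));
}
```

So `spi_read_cmd(REG_WHO_AM_I)` gives 0x8F, matching the decoded MOSI byte on the scope.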

Looking back at the code, I'm using the blocking SPI HAL calls: one transmit to send the register address, followed by a separate receive. My suspicion was that the gap in the clock came from the time spent between the end of the first SPI frame, the remaining HAL code, and my next read call. The compilation target is in dbg mode though, so this isn't entirely unexpected.

I wondered whether, instead of making two sequential write/read calls, a single call to the HAL function HAL_SPI_TransmitReceive would have less CPU overhead.
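Roughly, the idea is to send the command byte and a dummy byte in one two-byte full-duplex transfer and pick the answer out of the second received byte. The sketch below isn't my actual firmware: `spi_xfer_fn`, `read_reg`, and `fake_lsm9ds1_xfer` are hypothetical names, and the transfer hook is a stand-in so the logic can run on a PC. On the target, the hook could wrap HAL_SPI_TransmitReceive.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transfer hook: clocks `len` bytes out of `tx` while
 * filling `rx` with what comes back. Returns 0 on success. */
typedef int (*spi_xfer_fn)(const uint8_t *tx, uint8_t *rx, size_t len);

/* Read one register using a single two-byte full-duplex transfer. */
static int read_reg(spi_xfer_fn xfer, uint8_t reg, uint8_t *value)
{
    uint8_t tx[2] = { (uint8_t)(0x80u | reg), 0x00u }; /* command, dummy */
    uint8_t rx[2] = { 0u, 0u };

    if (xfer(tx, rx, sizeof tx) != 0) {
        return -1;
    }
    /* rx[0] was clocked in while the command went out, so ignore it. */
    *value = rx[1];
    return 0;
}

/* Host-side stand-in for the sensor: answers 0x68 to a WHO_AM_I read. */
static int fake_lsm9ds1_xfer(const uint8_t *tx, uint8_t *rx, size_t len)
{
    if (len != 2) {
        return -1;
    }
    rx[0] = 0xFFu; /* junk while the command byte is shifted out */
    rx[1] = (tx[0] == 0x8Fu) ? 0x68u : 0x00u;
    return 0;
}
```

Calling `read_reg(fake_lsm9ds1_xfer, 0x0F, &v)` leaves 0x68 in `v`. The point of the single call is that the HAL only goes through its setup and teardown once per register read instead of twice.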

Much better. There's still a tiny delay between the write and the read, but it's an improvement. There's also still a large delay between NSS going low and the start of the SPI transfer. Again, we're in debug here, so I wondered what it would look like if we switched to opt compilation mode.

Even better! Digging through the datasheet, I found out that the SPI interface does in fact have support for hardware chip-select switching. I switched that on, for opt+hardware NSS performance.

Wow. Yeah, I don't think there's any way bit-banging will be faster than hardware. That said, it's probably more than acceptable for reading/writing config registers. For doing multi-reads on the other hand, probably best to try a hardware implementation.

One other trick I tried was to take advantage of the fact that the LSM9DS1 does most transactions in 16-bit windows. The STM32G4 supports SPI frames of up to 16 bits. Switching from the standard 8-bit DataSize to 16-bit resulted in even tighter timing, just about as optimal as you can get:
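Here's a sketch of how a register read might be packed into a single 16-bit frame, assuming the peripheral shifts MSB first so the command byte lands in the upper half of the word and the reply in the lower half. The function names are placeholders of mine.

```c
#include <stdint.h>

/* Pack a register read into one 16-bit frame: command byte in the
 * high half (sent first when shifting MSB first), dummy zeros below. */
static uint16_t frame16_read(uint8_t reg)
{
    return (uint16_t)((uint16_t)(0x80u | reg) << 8);
}

/* The register value arrives during the second 8 clocks, i.e. in the
 * low byte of the received word. The high byte is don't-care. */
static uint8_t frame16_value(uint16_t rx_word)
{
    return (uint8_t)(rx_word & 0xFFu);
}
```

For WHO_AM_I that means sending 0x8F00 and masking the received word down to its low byte, which should come back as 0x68.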

Including overhead, this setup was able to transfer 16 bits in about 3.2 µs, or almost exactly 5 Mbit/s. Not too shabby!
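As a quick sanity check on that number, bits per microsecond is the same thing as Mbit/s, so the arithmetic is a one-liner. The helpers below are just illustrative (the names are mine), and the 5.3125 figure is the SPI clock rate mentioned earlier in the post.

```c
/* Bits per microsecond is numerically equal to Mbit/s. */
static double effective_mbit_per_s(unsigned bits, double duration_us)
{
    return (double)bits / duration_us;
}

/* Fraction of the raw SPI clock rate actually spent moving data. */
static double bus_utilisation(double effective_mbit, double clock_mbit)
{
    return effective_mbit / clock_mbit;
}
```

With 16 bits in 3.2 µs this works out to 5 Mbit/s, or about 94% of the 5.3125 Mbit/s clock, so only around 0.2 µs per frame goes to chip-select and setup overhead.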

