Skip to content

Slow SPI speed with SPI library #898

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stas2z opened this issue Jan 29, 2020 · 9 comments · Fixed by #912
Closed

Slow SPI speed with SPI library #898

stas2z opened this issue Jan 29, 2020 · 9 comments · Fixed by #912
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@stas2z
Copy link
Contributor

stas2z commented Jan 29, 2020

Im trying to work with several SPI LCD displays and found SPI transfer made by SPI library is slower then for example ESP32 SPI library running the same code. It looks like SPI working at lower speed than initialized.

Im not completely sure, but seems like i found a problem causing this issue (at least it can be one of it)

All SPI library methods use calls to internal spi_transfer routine, which also calls HAL_SPI_TransmitReceive to do the job. But LCDs are not answer to master, so seems like its waiting for reply timeout.
As a workaround ive changed all (except on which called directly with answer buffer) spi_transfer calls to spi_send calls and it improved throughput dramatically (flickering i experience is almost gone).

here is my diff

diff --git a/libraries/SPI/src/SPI.cpp b/libraries/SPI/src/SPI.cpp
index e162338e..e3d0a74e 100644
--- a/libraries/SPI/src/SPI.cpp
+++ b/libraries/SPI/src/SPI.cpp
@@ -278,7 +278,7 @@ byte SPIClass::transfer(uint8_t _pin, uint8_t data, SPITransferMode _mode)
     digitalWrite(_pin, LOW);
   }
 
-  spi_transfer(&_spi, &data, &rx_buffer, sizeof(uint8_t), SPI_TRANSFER_TIMEOUT);
+  spi_send(&_spi, &data, sizeof(uint8_t), SPI_TRANSFER_TIMEOUT);
 
   if ((_pin != CS_PIN_CONTROLLED_BY_USER) && (_mode == SPI_LAST) && (_spi.pin_ssel == NC)) {
     digitalWrite(_pin, HIGH);
@@ -330,7 +330,7 @@ uint16_t SPIClass::transfer16(uint8_t _pin, uint16_t data, SPITransferMode _mode
     digitalWrite(_pin, LOW);
   }
 
-  spi_transfer(&_spi, (uint8_t *)&data, (uint8_t *)&rx_buffer, sizeof(uint16_t), SPI_TRANSFER_TIMEOUT);
+  spi_send(&_spi, (uint8_t *)&data, sizeof(uint16_t), SPI_TRANSFER_TIMEOUT);
 
   if ((_pin != CS_PIN_CONTROLLED_BY_USER) && (_mode == SPI_LAST) && (_spi.pin_ssel == NC)) {
     digitalWrite(_pin, HIGH);
@@ -379,7 +379,7 @@ void SPIClass::transfer(uint8_t _pin, void *_buf, size_t _count, SPITransferMode
     digitalWrite(_pin, LOW);
   }
 
-  spi_transfer(&_spi, ((uint8_t *)_buf), ((uint8_t *)_buf), _count, SPI_TRANSFER_TIMEOUT);
+  spi_send(&_spi, ((uint8_t *)_buf), _count, SPI_TRANSFER_TIMEOUT);
 
   if ((_pin != CS_PIN_CONTROLLED_BY_USER) && (_mode == SPI_LAST) && (_spi.pin_ssel == NC)) {
     digitalWrite(_pin, HIGH);
@stas2z
Copy link
Contributor Author

stas2z commented Jan 29, 2020

By the way, related issue exists with curren spi implementation - SPI.transfer(buffer, size) will modify your buffer unexpectedly.

@stas2z
Copy link
Contributor Author

stas2z commented Feb 2, 2020

Seems like nobody cares.

Yes, i understand that my workaround can't be a solution (thats why ive not made a PR), but speed issue is huge, using HAL_SPI_Transmit instead of HAL_SPI_TransmitRecieve boosts SPI transmit speed about 4+ times.
It's a pity, but in the current state its preffered to use atmega328p for SPI LCD cuz it's working faster than my stm32 F4 (401&405)

@stas2z
Copy link
Contributor Author

stas2z commented Feb 2, 2020

here is my suggestion, it doubles SPI speed (but at least twice slower than transmit w/o recieve)

spi_status_e spi_transfer(spi_t *obj, uint8_t *tx_buffer, uint8_t *rx_buffer,
                          uint16_t len, uint32_t Timeout) {
  spi_status_e ret = SPI_OK;
  uint32_t tickstart;
  SPI_TypeDef *_SPI = obj->handle.Instance;

  if ((obj == NULL) || (len == 0)) {
    return SPI_ERROR;
  }
  tickstart = HAL_GetTick();
  while (len--) {
    while (!LL_SPI_IsActiveFlag_TXE(_SPI))
      ;
    LL_SPI_TransmitData8(_SPI, *tx_buffer++);
    while (!LL_SPI_IsActiveFlag_RXNE(_SPI))
      ;
    *rx_buffer++ = LL_SPI_ReceiveData8(_SPI);
    if ((((HAL_GetTick() - tickstart) >= Timeout) &&
         ((Timeout != HAL_MAX_DELAY))) ||
        (Timeout == 0U)) {
      ret = SPI_TIMEOUT;
      break;
    }
  }
  return ret;
}

what do you think?

@fpistm
Copy link
Member

fpistm commented Feb 3, 2020

Hi @stas2z
this will be checked, no worries.
This is already an opened issue: #257
Anyway, this will require some times to properly invest this. The SPI is functional, slow but functional.

@fpistm fpistm added the enhancement New feature or request label Feb 3, 2020
@stas2z
Copy link
Contributor Author

stas2z commented Feb 3, 2020

probably we need an extra flag for spisettings to skip recieve if slave is never answer? it will double (4x atm, but with optimized spi_transfer it will double) spi transactions speed for silent slaves like SPI LCDs and will not break default behaviour and will be compatible with multiple slaves on the same bus usage case

@stas2z
Copy link
Contributor Author

stas2z commented Feb 3, 2020

@fpistm a little offtopic, which tool do you use to format a code for this core? my ide formatted it with clang-format and i don't want to reformat it back manually in case of making pull requests etc

@fpistm
Copy link
Member

fpistm commented Feb 3, 2020

https://github.com/stm32duino/wiki/wiki/Astyle

@ABOSTM
Copy link
Contributor

ABOSTM commented Feb 7, 2020

Hi @stas2z,
I review your PR Thanks.
By the way I have 2 remarks:

  • Currently we use theoretical default SPI speed SPI_SPEED_CLOCK_DEFAULT = 4MHz , but this value is rounded down: in my case L476 due system clock 80MHZ and prescaler which is a power of 2, I get really 2,5MHZ (80MHz/32). You can try to increase SPI frequency (taking into account you systemCore clock and the power of 2 prescaler). The limit depends on your hardware setup (length of wires, slave device, ... ) but it is worth to give it a try.
  • I don't know which LCD library you are using, but it is probably similar than mine: each byte on SPI is send one by one : SPI.transfer(c) this cause a lot of useless overhead compare to sending several bytes at once (ex: SPI.transfer(buf, _count)). In other word, there may be some room for improvement in library itself too.

@stas2z
Copy link
Contributor Author

stas2z commented Feb 7, 2020

Im using my own library for st7735 and ili9341 (only init and setting orientation different) written from scratch and sure i use buffered transfers to speed up lcd operations
also im using fastest spi clock available for my mcu (84mhz/2 for f401 currently)

@fpistm fpistm added this to the 1.9.0 milestone Feb 12, 2020
@fpistm fpistm linked a pull request Feb 12, 2020 that will close this issue
@fpistm fpistm closed this as completed Feb 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants