Olimex boards ESP32-EVB/Gateway ethernet fix #6188

Closed

Stanimir-Petev
Contributor


Summary

Added a short delay (350 ms) for the ESP32-EVB at the start of the Ethernet begin function (in ETH.cpp) to work around the PHY failing to initialize immediately after reset.

Added values for the revision macros (in boards.txt) for the ESP32-Gateway so that they match the #if conditions in the variant file and the changes are applied for the respective revision.

Impact

I have described the ESP32-EVB delay issue in more detail here: #6142

The values for the ESP32-Gateway macros are needed so that the changes apply not only to one specific revision but also to all later revisions. This is achieved by comparing the revision value to a constant in the "pins_arduino.h" file in the variants folder. For example:

#if	ARDUINO_ESP32_GATEWAY >= 'D'
#define ETH_CLK_MODE ETH_CLOCK_GPIO17_OUT
#define ETH_PHY_POWER 5
#endif

The hardware changes apply not only to revision D but also to E, F, etc. Without the values, those comparisons were meaningless and the code inside them was ignored. As a result, the default Ethernet example wasn't working. With these changes, the Ethernet clock and power pins are defined and the example works properly.
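
To illustrate why the macro value matters, here is a minimal standalone sketch. It assumes the build passes the revision as a character constant such as 'E'; the exact boards.txt syntax is not shown in this description, so the `#define` below is only a stand-in, and `kEthPhyPower` and `kRevDOrLater` are hypothetical names. In an `#if`, an undefined macro evaluates to 0 and a macro passed as a bare `-D` flag is defined as 1, both of which compare lower than 'D'.

```cpp
// Stand-in for the value the build system would pass (assumption: a
// character constant naming the board revision).
#ifndef ARDUINO_ESP32_GATEWAY
#define ARDUINO_ESP32_GATEWAY 'E'
#endif

// Same shape as the variant-file check: true for revision D and later.
#if ARDUINO_ESP32_GATEWAY >= 'D'
constexpr int  kEthPhyPower = 5;     // power pin is defined for rev D and later
constexpr bool kRevDOrLater = true;
#else
constexpr int  kEthPhyPower = -1;    // branch skipped: no power pin configured
constexpr bool kRevDOrLater = false;
#endif
```

Compiling the same file with `-DARDUINO_ESP32_GATEWAY` (no value) should take the `#else` branch, which matches the behaviour described above.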


@sauttefk
Contributor

This is due to the reset/supervisor chip U8 on the ESP32-EVB, which creates a 470 ms delay (OSC_DIS) on cold start before the crystal oscillator for the PHY clock starts.
Caveat: This board has an issue with warm starts, because the crystal oscillator is then already running and sending its 50 MHz signal to GPIO0, causing a 50:50 chance of ending up in the serial bootloader instead of starting the application.

@@ -228,6 +228,9 @@ ETHClass::~ETHClass()

bool ETHClass::begin(uint8_t phy_addr, int power, int mdc, int mdio, eth_phy_type_t type, eth_clock_mode_t clock_mode)
{
#if defined ARDUINO_ESP32_EVB
delay (350); // Olimex board ESP32-EVB requires short delay before the phy initialization after reset
Contributor

Doesn't it make more sense to check the clock mode?
If the clock mode is external crystal on GPIO-0, then this should apply.

Contributor

@sauttefk sauttefk Jan 25, 2022

No, this is just the very long 470 ms delay on this particular board, which is implemented this way to save a GPIO pin.

Contributor

470 ms?
The change mentions 350 ms

Contributor

470 ms is the RC time constant of the supervisor chip that enables the PHY clock.
The ESP32 also has an RC reset circuit with a 100 ms delay.
So 470 ms - 100 ms = 370 ms. Plus the startup time the ESP32 takes until it reaches the code where the Ethernet is being configured.

Contributor

Hmm, that sounds awfully timing-critical and extremely board-specific.
Capacitor tolerances are quite large (tens of percent), and the capacitance of a capacitor may decrease over time as the component ages.
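
The tolerance concern can be made concrete with a small worked calculation. This is only a sketch: the 470 ms and 100 ms figures come from the comments above, while the tolerance percentage is an illustrative assumption rather than a measured value, and the ESP32's own boot time is ignored. `worstCaseGapMs` is a hypothetical helper name.

```cpp
// All times in milliseconds. 470 and 100 are the figures quoted in the thread.
constexpr int kSupervisorDelayMs = 470;  // supervisor chip enabling the PHY clock
constexpr int kEspResetDelayMs   = 100;  // ESP32 RC reset circuit

// Gap between the ESP32 leaving reset and the PHY clock starting, when the
// supervisor delay runs `tolPct` percent long and the ESP32 reset delay runs
// `tolPct` percent short (the worst case for the firmware delay).
constexpr int worstCaseGapMs(int tolPct) {
    return kSupervisorDelayMs * (100 + tolPct) / 100
         - kEspResetDelayMs   * (100 - tolPct) / 100;
}
```

With no tolerance this reproduces the 370 ms from the thread; with an assumed 20 percent tolerance the gap grows to 484 ms, which is why a fixed 350 ms delay may leave little margin.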

Contributor Author

Hey, and sorry for the late reply. I didn't expect this much interest: I posted an issue about a week ago and it got only one suggestion, so I thought it would take days (if at all) before this pull request was discussed.

Anyway, I thought the issue was specific to the ESP32-EVB board, and after some testing with colleagues we found that this delay solves it, although it is more of a workaround than an actual fix. We don't know whether other boards will need it.

The delay value was derived empirically by testing about 20 of our boards. Some of them behaved properly, and the sketch worked as intended with or without the delay. Others needed between 100 and 250 ms to work at all. The most demanding boards required ~275 ms, at which point they sometimes worked and sometimes failed. At 300 ms I didn't find any board that failed. The extra 50 ms on top of that is more of an "insurance", although you might be right that more will be needed.

As for the suggestion with the clock mode check: what exactly do you mean? I knew my solution (or should I say workaround) is lame, but considering it gets the job done I decided it's better than not having it at all. I am open to suggestions in that regard. I'm just uncertain how to implement a more elegant solution.

Contributor

Regarding the clock mode.
There are 2 configuration related issues here:

  • The need to power manage (or reset) the LAN controller
  • Whether or not there is an actual need for blocking a clock signal to GPIO-0

Resetting the LAN controller via the reset pin takes roughly 100 msec before the LAN controller works properly.
Power cycling the LAN controller can be useful for saving power (it needs 40 - 100 mA, depending on whether it is connected to a switch). This can also be used to suppress any clock signal to GPIO-0 when using an external crystal.
Depending on the power supply and R/C timings, this may take a few hundred msec to become stable, plus the same 100 msec as for a reset.
Blocking a clock signal to GPIO-0 can be handled with an analog switch chip or by holding down the EN pin of the crystal.
Switching this takes a few msec at most (when toggling the EN pin of the crystal).

If the clock mode is set to have an external crystal, then it is likely the power pin is either used for power cycling the LAN controller and/or switching the clock signal to GPIO-0. I assume most boards will power cycle the LAN controller as this also can be used as a reset to clear any unrecoverable error state on the LAN controller (which does happen every now and then)

Thus having either the PWR or RST pin set, or the clock mode set to GPIO-0 external crystal, can be used to add some extra delay.
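
A minimal sketch of that selection logic, under stated assumptions: `ClockMode`, `EthPinConfig` and `phyStartupDelayMs` are hypothetical names and not part of ETH.cpp, the 100 msec figure is the reset time mentioned above, and 300 msec is an assumed stand-in for "a few hundred msec".

```cpp
#include <cstdint>

// Hypothetical configuration types for illustration only.
enum class ClockMode { Gpio0In, Gpio0Out, Gpio16Out, Gpio17Out };

struct EthPinConfig {
    int powerPin;     // -1 if the board has no PHY power pin
    int resetPin;     // -1 if the board has no PHY reset pin
    ClockMode clock;  // Gpio0In = external crystal feeding GPIO-0
};

constexpr uint32_t kResetSettleMs = 100;  // "roughly 100 msec" after a reset
constexpr uint32_t kPowerSettleMs = 300;  // assumed value for the supply to settle

// Delay to wait before PHY initialization, derived from the configuration
// instead of from the board name.
constexpr uint32_t phyStartupDelayMs(const EthPinConfig& cfg) {
    return (cfg.powerPin >= 0 || cfg.clock == ClockMode::Gpio0In)
               ? kPowerSettleMs + kResetSettleMs       // power cycle or gated clock
               : (cfg.resetPin >= 0 ? kResetSettleMs   // plain reset only
                                    : 0u);             // nothing to wait for
}
```

The point of this shape is that boards without a power pin, reset pin or external crystal would pay no extra startup delay.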

@Jason2866
Collaborator

Is there a chance of a more general fix? Other boards will fail too when their chips are changed.

@TD-er
Contributor

TD-er commented Jan 25, 2022

Is it possible to 'catch' a crash and then set a GPIO pin right before the actual reboot?
For example to set some value for some GPIO at a boot? Before the bootloader evaluates the GPIO-0 state?
Or maybe disable the ESP to boot into flash mode except for a cold boot?

@me-no-dev
Member

@VojtechBartoska could you please poke the ETH team and see what they think about it and if there is a more proper way to fix this for everyone?

@VojtechBartoska VojtechBartoska self-assigned this Jan 31, 2022
@VojtechBartoska
Contributor

@me-no-dev yes, assigning to myself.

@kostaond

kostaond commented Feb 9, 2022

Is it possible to 'catch' a crash and then set a GPIO pin right before the actual reboot?
For example to set some value for some GPIO at a boot? Before the bootloader evaluates the GPIO-0 state?
Or maybe disable the ESP to boot into flash mode except for a cold boot?

In my opinion, the most appropriate way to handle this kind of scenario is to reserve a separate GPIO for enabling/disabling REFCLK and to enable the clock only when it is safe (e.g. from the user program after boot is done). Typically, this is achieved by pulling CLK EN low with a resistor so that the clock is disabled during boot, and then driving the GPIO high to enable it. However, that is a HW design decision, and it is not applicable to this ESP32-EVB board, where the design goal was to keep as many GPIOs available as possible.

@kostaond

kostaond commented Feb 9, 2022

@VojtechBartoska could you please poke the ETH team and see what they think about it and if there is a more proper way to fix this for everyone?

This is very much board-specific, and I unfortunately don't see any better solution than waiting until the PHY has properly started. The question is where to wait, though... Frankly speaking, I am not very familiar with the Arduino project, so I cannot provide any well-informed help. However, from a philosophical point of view, it should not be done at a level close to the driver but in some board-specific initialization function.

@kostaond

kostaond commented Feb 9, 2022

@Stanimir-Petev, @sauttefk I also noticed that the PHY reset is de-asserted (driving the input high) right after power is applied to the LAN8710A. Am I right? Didn't you observe any issues? The reason I am asking is that the LAN8710A datasheet states: "A hardware reset (nRST assertion) is required following power-up".

@sauttefk
Contributor

sauttefk commented Feb 9, 2022

@kostaond The Olimex ESP32-EVB is broken by design! Warm-starts have a 50/50 chance of waiting forever in the bootloader instead of starting the application. As far as I can see there is a R/C reset circuit that delays the NRST input of the LAN8710 after a cold-start.

@kostaond

kostaond commented Feb 10, 2022

@kostaond The Olimex ESP32-EVB is broken by design! Warm-starts have a 50/50 chance of waiting forever in the bootloader instead of starting the application.

I see. So shouldn't it be stated in some Olimex documentation/errata, along with the other workaround suggested in this PR, rather than creating a board-specific update? However, as I said, I am not an Arduino guy, so maybe the philosophy is different here...

As far as I can see there is a R/C reset circuit that delays the NRST input of the LAN8710 after a cold-start.

Do you mean C18?

@Jason2866
Collaborator

Jason2866 commented Feb 24, 2022

IMHO a fix for a specific faulty device should not be placed in the general driver.
It litters the driver. You could provide your modified driver via a library.
Btw, such a big delay in a driver is a bad solution.

@gonzabrusco
Contributor

Sorry to hijack this, but since the Ethernet team is here: can somebody confirm that the LAN8720 does not output REFCLKO during reset? I would like to use the LAN8720 with a 25 MHz crystal, outputting REFCLKO (50 MHz) to GPIO0. But I need to make sure the LAN8720 shuts down the clock during reset so that the ESP32 boots OK. The datasheet does not say so explicitly.

@suda-morris

I remember we did such a test years ago. The result showed that the LAN8720 keeps outputting REF_CLK even in the reset state.

@sauttefk
Contributor

sauttefk commented Mar 23, 2022

@gonzabrusco

does the LAN8720 shutdown the clock during reset

no, unfortunately not. I also wished this would be so :-(
I tested this and I recall the datasheet does mention this (maybe it was the LAN8742 datasheet)

@sauttefk
Contributor

sauttefk commented Mar 23, 2022

@gonzabrusco
When using GPIO0 as input for the EMAC clock, I ended up pulling GPIO2 high by default.
Then you have a 50/50 chance of the ESP32 booting normally from the flash on the SPI-port (GPIO0=high; GPIO2=high) or booting from a non-existing flash on the HSPI-port (GPIO0 = low; GPIO2 = high). When booting from the HSPI port, the bootloader does not find anything to continue booting and therefore resets, giving you the next 50/50 chance of booting from the SPI-flash.
Kind of hacky, but works....
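
As a rough sanity check on this retry approach, the sketch below computes how likely the application is to have started after a given number of resets. It assumes each attempt independently succeeds with probability 0.5, which is a simplification of the "50/50" figure above (the real outcome depends on the clock phase when GPIO0 is sampled). `probBootedWithin` is a hypothetical helper name.

```cpp
// Probability that the application has started within `attempts` boot
// attempts, assuming each attempt independently succeeds with probability 0.5.
// Computed recursively: the chance of still failing halves with each attempt.
constexpr double probBootedWithin(int attempts) {
    return attempts <= 0
               ? 0.0
               : 1.0 - (1.0 - probBootedWithin(attempts - 1)) * 0.5;
}
```

Under that assumption, three attempts already give a 0.875 chance of a normal boot, and the expected number of attempts is 2, so the extra startup time stays small on average.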

I also wish it were possible to make most of the bootstrap pins obsolete by programming the eFuse, as is already possible with GPIO12 for selecting the flash voltage.

@gonzabrusco
Contributor

@gonzabrusco When using GPIO0 as input for the EMAC clock, I ended up pulling GPIO2 high by default. Then you have a 50/50 chance of the ESP32 booting normally from the flash on the SPI-port (GPIO0=high; GPIO2=high) or booting from a non-existing flash on the HSPI-port (GPIO0 = low; GPIO2 = high). When booting from the HSPI port, the bootloader does not find anything to continue booting and therefore resets, giving you the next 50/50 chance of booting from the SPI-flash. Kind of hacky, but works....

I also wish it were possible to make most of the bootstrap pins obsolete by programming the eFuse, as is already possible with GPIO12 for selecting the flash voltage.

Thanks @sauttefk @suda-morris . So with the LAN8720 there's no way to use it with GPIO0 as a clock input (except with that hack). Or maybe fully power it down.

@sauttefk
Contributor

sauttefk commented Mar 23, 2022

@gonzabrusco

So with the LAN8720 there's no way to use it with GPIO0 as a clock input

Well you could use a 50MHz crystal oscillator with an enable pin. Pull this enable pin low by default and connect it to the same GPIO as your LAN8720 reset.

[Image: screenshot, 2022-03-23 14:25:41]

@VojtechBartoska
Contributor

So let's just keep this pull request open as a reminder to investigate Ethernet further in general.

@Jason2866
Collaborator

A clean reinitialization of all GPIOs involved in RMII before starting up the ETH module seems to fix the restart issue. This was done by @s-hadinger for Tasmota, fixing the issue with Tube ZB (based on the Olimex POE module).

  // fix a disconnection issue after rebooting Olimex POE - this forces a clean state for all GPIO involved in RMII
  gpio_reset_pin((gpio_num_t)GPIO_ETH_PHY_POWER);
  gpio_reset_pin((gpio_num_t)GPIO_ETH_PHY_MDC);
  gpio_reset_pin((gpio_num_t)GPIO_ETH_PHY_MDIO);
  gpio_reset_pin(GPIO_NUM_19);    // EMAC_TXD0 - hardcoded
  gpio_reset_pin(GPIO_NUM_21);    // EMAC_TX_EN - hardcoded
  gpio_reset_pin(GPIO_NUM_22);    // EMAC_TXD1 - hardcoded
  gpio_reset_pin(GPIO_NUM_25);    // EMAC_RXD0 - hardcoded
  gpio_reset_pin(GPIO_NUM_26);    // EMAC_RXD1 - hardcoded
  gpio_reset_pin(GPIO_NUM_27);    // EMAC_RX_CRS_DV - hardcoded
  switch (Settings->eth_clk_mode) {
    case 0:   // ETH_CLOCK_GPIO0_IN
    case 1:   // ETH_CLOCK_GPIO0_OUT
      gpio_reset_pin(GPIO_NUM_0);
      break;
    case 2:   // ETH_CLOCK_GPIO16_OUT
      gpio_reset_pin(GPIO_NUM_16);
      break;
    case 3:   // ETH_CLOCK_GPIO17_OUT
      gpio_reset_pin(GPIO_NUM_17);
      break;
  }
  delay(1);

@TD-er
Contributor

TD-er commented Aug 31, 2022

@Jason2866 That's great and would explain a lot of issues I'm seeing on my own boards as I did base some of my own board designs on the Olimex boards.

Where/when do you reset these pins? I assume right before calling ETH.begin() obviously, but maybe also before an intended reboot?
I also have split the handling of the Eth power pin to pull it high 400 msec before calling ETH.begin() since there is a known issue where a call to this function may fail if the LAN chip isn't ready yet.
So maybe it makes sense to reset the other pins too when powering the chip, to avoid strange states where pins might be high at boot of the lan chip?
Following this idea, I think it makes sense to call the reset of the power pin as the last command.
Have you tested this?
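
The "power up, wait, then begin" split described above can be done without a blocking delay. The sketch below is one way to check the elapsed time; it is not taken from any of the projects mentioned, `phyReady` and `kPhyPowerUpMs` are hypothetical names, and 400 ms is simply the figure quoted in this comment.

```cpp
#include <cstdint>

constexpr uint32_t kPhyPowerUpMs = 400;  // wait time quoted in the comment above

// True once at least kPhyPowerUpMs have passed since the power pin went high.
// Unsigned subtraction keeps the result correct when the millisecond counter
// wraps around.
constexpr bool phyReady(uint32_t nowMs, uint32_t poweredAtMs) {
    return static_cast<uint32_t>(nowMs - poweredAtMs) >= kPhyPowerUpMs;
}
```

In a sketch, one would drive the power pin high, store `millis()` as the start time, and call `ETH.begin()` only once `phyReady(millis(), start)` returns true.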

@s-hadinger
Contributor

Oh, I didn't know about this issue. I don't know what the root cause is, but I've seen this issue with the latest Olimex POE.

The forced reset of the GPIOs is done just before calling ETH.begin(). If I understand correctly, it also shuts down the power to the ETH module. I didn't experience any issue, but adding a 400 ms pause wouldn't do any harm anyway.

However I suspect more a problem in the ESP32 IO matrix between reboots. It always works fine after power up, but fails often after a reboot.

@TD-er
Contributor

TD-er commented Aug 31, 2022

However I suspect more a problem in the ESP32 IO matrix between reboots. It always works fine after power up, but fails often after a reboot.

Yep, I've seen that too, also for other pins. (e.g. I2C needing some tricks to get the bus unstuck after reboot)
The thing is, the LAN chip (at least the LAN87x0) does assume some internal states based on the state of some pins at boot.
The ones I'm aware of are thus explicitly pulled up or down on my boards to make sure their state is known. But perhaps the state of some other pins wired to a GPIO may also set some internal state in the LAN chip. Have to check the datasheet for it to know for sure.

@s-hadinger
Contributor

I will be lazy on this issue. This small fix above makes it work 100% of the time for Olimex POE, that will do it for now. I hope you will find the root cause.

@s-hadinger
Contributor

Hmmm. Unfortunately the problem came back. My patch above is not enough.

@TD-er
Contributor

TD-er commented Sep 2, 2022

Hmmm. Unfortunately the problem came back. My patch above is not enough.

What exactly is the problem you're experiencing?

@s-hadinger
Contributor

What exactly is the problem you're experiencing?

After a first start Ethernet works well. When I restart (no reset button, no power off), the Ethernet seems to connect and after 2 seconds goes off (the green led lights for 1-2 seconds). Then it tries to reconnect and goes off again.

Surprisingly after some time, restarting the device does work.

@TD-er
Contributor

TD-er commented Sep 3, 2022

Does this happen on all switches/routers, or only on some?
For example I have seen that some 4G routers may take quite a while to "acknowledge" something is plugged into one of the ports.
This may cause a number of "connect" and "disconnect" events.
The ETH object registers those events only once in its lifetime.
So what I do is that on a disconnect event, I immediately destroy the Eth object and recreate it.

@s-hadinger
Contributor

I have only tried on a Unifi switch with auto-negotiation enabled. This could also come from the auto-nego failing. I'm sorry that I couldn't spend more time on it, nor enable more logs. I will try to gather more information in the following days.

@Stanimir-Petev Stanimir-Petev closed this by deleting the head repository Nov 16, 2023