Skip to content

Add a couple of sanity checks#2

Open
danielg4 wants to merge 1 commit into
tchebb:mainfrom
danielg4:main
Open

Add a couple of sanity checks#2
danielg4 wants to merge 1 commit into
tchebb:mainfrom
danielg4:main

Conversation

@danielg4

Copy link
Copy Markdown
Contributor

After the Arduino responds over I2C, its TWI interface was being left in some kind of undefined state, so if the target device were to lose power momentarily while the Arduino is still powered, it wouldn't respond a second time, and the target device would attempt a full boot. I borrowed a bit of code from Wire.cpp in the Arduino IDE that seems to clean this up such that the Arduino can respond again and again most of the time if need be.

Clarify connection instructions

@tchebb tchebb left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, and sorry for the slow review! I assume the unhandled TW_BUS_ERROR state is what was causing your issue, and I'd love to fix that. Some of the other changes seem unneeded, though. Take a look at my individual comments, which all reference the ATmega48P/V/88P/V/168P/V Datasheet, chapter 22.

Comment thread i2c-flash.c
uint8_t offset;

switch (TWSR) {
switch (TW_STATUS) {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, TW_STATUS is definitely more correct here. The datasheet says we need to mask out the prescaler bits, and I hadn't done that.

Comment thread i2c-flash.c

// Acknowledge interrupt to let hardware continue.
TWCR |= _BV(TWINT);
TWCR = _BV(TWEN) | _BV(TWIE) | _BV(TWINT) | _BV(TWEA);

@tchebb tchebb Mar 20, 2025

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this change? I know that, in general, read-modify-write of registers can have unintended side effects, but I don't think that's the case for TWCR. The only bits that might potentially cause the hardware to do things when rewritten are TWSTA and TWSTO, but the only place we use them is in the new case above, which should have an early return anyway. Unless I'm missing something, I'd prefer to keep the old line for clarity.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this change? I know that, in general, read-modify-write of registers can have unintended side effects, but I don't think that's the case for TWCR. The only bits that might potentially cause the hardware to do things when rewritten are TWSTA and TWSTO, but the only place we use them is in the new case above, which should have an early return anyway. Unless I'm missing something, I'd prefer to keep the old line for clarity.

THIS is what was causing the issue. Read-modify-write is a big no-no here because the read bits are volatile and not meant to be written back.

@tchebb tchebb Mar 20, 2025

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I stand corrected. Do you know which specific bit is the issue? Out of the R/W bits in the register, the only ones your version doesn't write are TWSTA and TWSTO. The hardware never sets those according to the datasheet, and they're not write only, so they should read as zero in my version and result in the exact same write you have.

Also, this ISR has to successfully fire many times (once for each byte) for even a single boot to work. So I'm curious what the error state is and how we enter it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I stand corrected. Do you know which specific bit is the issue?

All three of the other ones I'm writing are needed. I tested.

Also, this ISR has to successfully fire many times (once for each byte) for even a single boot to work. So I'm curious what the error state is and how we enter it.

When the i2c read wraps around the end of the buffer, the state reported by the bits changes to something other than what we want.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three of the other ones I'm writing are needed. I tested.

Right, all three of those are for sure needed. I'm asking which extra bits were written in my implementation and broke things. Or, conversely, if any of those three were somehow cleared by hardware, which would also break things. My reading of the datasheet is that, for this register, read-modify-write is equivalent to what you've written. But if this fixed your issue, I must be wrong. I'll do some more investigation on my own and see if I can repro the issue.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, so I guess that means that they all get set to zero by hardware in the error state. I'm very curious what it is now. By "wraps around the end of the buffer", do you mean when this check starts hitting the second case and setting TWDR to 0?

TWDR = offset < sizeof(emulated_data) ? emulated_data[offset] : 0;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean when offset starts being less than sizeof again after having been greater than or equal, yes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, IIRC, not all of them are cleared each time, but sometimes they are. Either way, the library code makes it plain as day that preserving the values of those bits is not the thing to be done here.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are often multiple correct ways to program hardware. I wouldn't necessarily take the Arduino code as gospel with regards to AVR: it's gotten better over the years, but it's often had some pretty suboptimal stuff. For example, the entire reason this repo exists is because the Arduino I2C library can't manage to respond to interrupts fast enough to avoid clock stretching.

That's not to say it's wrong in this case, but it is why I want to investigate further before merging this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, the entire reason this repo exists is because the Arduino I2C library can't manage to respond to interrupts fast enough to avoid clock stretching.

I would attribute that to the significantly higher overhead the library has.

Comment thread i2c-flash.c
// Turn on TWI in slave mode, enable interrupt.
sei();
TWSR = TW_NO_INFO & 1;
TWBR = 18;

@tchebb tchebb Mar 20, 2025

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither of these lines should be needed, since they're both for clock setup but we don't generate an I2C clock in slave mode. Did they make a difference for you?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither of these lines should be needed, since they're both for clock setup but we don't generate an I2C clock in slave mode. Did they make a difference for you?

That might be true if you're using a real Arduino, but I'm using an Adafruit Boarduino, which doesn't have a crystal and is therefore more prone to drift without this.

@tchebb tchebb Mar 20, 2025

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really shouldn't matter regardless. I2C is timed using pulses on the SCL line, which is driven by the master. As a slave device, the AVR isn't timing anything at all. See 22.5.2 in the datasheet on the Bit Rate Generator Unit, which is what both these writes configure:

This unit controls the period of SCL when operating in a Master mode. … Slave operation does not depend on Bit Rate or Prescaler settings, but the CPU clock frequency in the Slave must be at least 16 times higher than the SCL frequency.

The only thing these lines change are the "Bit Rate or Prescaler settings".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK, I tried it both ways and its behavior just seemed more stable this way. It could be mere coincidence, but it can't hurt anything.

Comment thread README.md
apply power to the device. It should enter BootROM USB DFU mode with no
additional interaction needed.
on the ATmega48/88/168/328), attach HDMI pin 17 to GND, and shunt HDMI pins 18
and 19 together, then connect that to your target device and apply power to it.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are hotplug detect, right? Are they needed on every device? I don't remember having to do anything specific with them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SoC uses hotplug detect as a sanity check to make sure a dongle is actually plugged in. It doesn't attempt i2c if there is nothing there.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which SoC? That might be new behavior, as I developed this with reference to an S905D3 BootROM dump, and that seems to poll I2C unconditionally without checking any GPIO first. I'll double check on real hardware, though, since maybe there's a check hidden somewhere and I got lucky by leaving it floating.

This is a good thing to mention regardless, but I'd like to be specific about whether it's needed for all chips or only for some.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one I was messing with is S905W4, so that one at the very least. And no, the W4 part is not a typo.

Comment thread i2c-flash.c
TWCR = _BV(TWEN) | _BV(TWIE) | _BV(TWEA) | _BV(TWINT) | _BV(TWSTO);
while (TWCR & _BV(TWSTO))
continue;
case TW_NO_INFO:

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The datasheet says that we should never observe this value after an interrupt. If we do, it's still probably more correct to clear the interrupt instead of returning immediately. So I'd remove this case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may say that, but it did not work correctly until I copied this here from the Arduino IDE's library.

Comment thread i2c-flash.c
switch (TWSR) {
switch (TW_STATUS) {
case TW_BUS_ERROR:
TWCR = _BV(TWEN) | _BV(TWIE) | _BV(TWEA) | _BV(TWINT) | _BV(TWSTO);

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case should be written as TWCR |= _BV(TWSTO) | _BV(TWINT) followed by an early return. There's no point spinning on the TWSTO clear when we could just sleep and wait for the next interrupt instead. And the early return stops us from clearing the interrupt twice, which is probably harmless but isn't what the datasheet instructs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case should be written as TWCR |= _BV(TWSTO) | _BV(TWINT) followed by an early return. There's no point spinning on the TWSTO clear when we could just sleep and wait for the next interrupt instead. And the early return stops us from clearing the interrupt twice, which is probably harmless but isn't what the datasheet instructs.

Again: Read-modify-write == bad juju

@danielg4

Copy link
Copy Markdown
Contributor Author

Thanks, and sorry for the slow review! I assume the unhandled TW_BUS_ERROR state is what was causing your issue, and I'd love to fix that.

Nope. This part was just bikeshedding for completeness. It never entered this state. What was causing the issue is below.

@tchebb

tchebb commented Mar 20, 2025

Copy link
Copy Markdown
Owner

Gonna hold off on merging until I can investigate more myself. There's clearly something I'm misunderstanding about the AVR's behavior, and I don't want to merge code I don't understand. Thanks for all the info, that should help me repro.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants