- Type: Bug
- Resolution: Unresolved
- Priority: Major
I've only observed this issue on one specific SMuRF crate + server node at the LAT; other crates I've tried have not run into it. It occurs when I start up all 6 slots simultaneously, with at least two or three of the slots crashing during initialization. I haven't seen it happen when I start them up individually (although I have a smaller sample size for that).
The failure is a timeout error (raised as `rogue.GeneralError`) in the `Root._read` call. A typical traceback looks like the following:
    Traceback (most recent call last):
      File "/usr/local/src/smurf-streamer/scripts/stream.py", line 140, in <module>
        main()
      File "/usr/local/src/smurf-streamer/scripts/stream.py", line 133, in main
        with CmbRoot(pcie=pcie, **root_kwargs):
      File "/usr/local/src/rogue/python/pyrogue/_Root.py", line 174, in __enter__
        self.start()
      File "/usr/local/src/pysmurf/python/pysmurf/core/roots/Common.py", line 189, in start
        pyrogue.Root.start(self)
      File "/usr/local/src/rogue/python/pyrogue/_Root.py", line 420, in start
        self._read()
      File "/usr/local/src/rogue/python/pyrogue/_Root.py", line 732, in _read
        self.checkBlocks(recurse=True)
      File "/usr/local/src/rogue/python/pyrogue/_Device.py", line 646, in checkBlocks
        value.checkBlocks(recurse=True, **kwargs)
      File "/usr/local/src/rogue/python/pyrogue/_Device.py", line 646, in checkBlocks
        value.checkBlocks(recurse=True, **kwargs)
      File "/usr/local/src/rogue/python/pyrogue/_Device.py", line 646, in checkBlocks
        value.checkBlocks(recurse=True, **kwargs)
      [Previous line repeated 3 more times]
      File "/usr/local/src/rogue/python/pyrogue/_Device.py", line 642, in checkBlocks
        pr.checkTransaction(block, **kwargs)
      File "/usr/local/src/rogue/python/pyrogue/_Block.py", line 71, in checkTransaction
        block._checkTransaction()
    rogue.GeneralError: Block::checkTransaction: General Error: Transaction error for block AMCc.FpgaTopLevel.AppTop.AppCore.RtmCryoDet.LutCtrl.Lut[1].MEM[244] with address 0x823203d0. Error Timeout (5.000000s) waiting for register transaction 7205 message response.
The precise register that times out varies from run to run.
After the read, `checkBlocks` is called on the entire tree, and a timeout occurs at some point during that traversal. At this stage `checkTransaction` should have no new work to do, since only read operations have been performed, so it must be failing while waiting for an existing transaction to complete. I don't understand how it gets into this state: from reading the code, it looks like each transaction waits for existing ones to finish (`waitTransaction(0)` at the start of the `startTransaction` function). If that is right, it would have to be the very last transaction that times out? I don't understand the rogue code well enough to draw that conclusion. The timeout is set to 5 s.
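To make the reasoning above concrete, here is a toy model of a single block. It is only a sketch under the assumption that starting a transaction first drains the previous one; it is not rogue's actual implementation, and `ToyBlock`, `run`, and all method names are hypothetical.

```python
import threading


class ToyBlock:
    """Toy stand-in for one register block (hypothetical, not rogue's Block)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._pending = None  # (transaction id, completion event) or None

    def wait_transaction(self):
        # Wait for the pending transaction, if any, to complete or time out.
        if self._pending is None:
            return
        tid, done = self._pending
        self._pending = None
        if not done.wait(self.timeout):
            raise TimeoutError(f"transaction {tid} timed out")

    def start_transaction(self, tid, responds):
        # ASSUMPTION: a new transaction first waits for the previous one.
        self.wait_transaction()
        done = threading.Event()
        if responds:
            done.set()  # the simulated hardware answers immediately
        self._pending = (tid, done)

    def check_transaction(self):
        # Only waits on whatever is still outstanding; starts nothing new.
        self.wait_transaction()


def run(responses, timeout=0.05):
    """Issue one read per entry; return (stage, message) on timeout, else None."""
    block = ToyBlock(timeout)
    for tid, responds in enumerate(responses):
        try:
            block.start_transaction(tid, responds)
        except TimeoutError as exc:
            return ("start", str(exc))
    try:
        block.check_transaction()
    except TimeoutError as exc:
        return ("check", str(exc))
    return None
```

In this model, a lost response to an earlier transaction surfaces in the next `start_transaction`, and only a lost response to the final transaction can surface in `check_transaction`. Whether the same holds in rogue depends on whether the wait is per block or global, which this sketch does not settle.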
After reverting to the old version of the code (based on rogue 4), I have not encountered this issue, despite multiple attempts to reproduce it.