Here are my meeting notes from the Jungfrau DMA data corruption meeting:

When the failure happens, the DMA payload size is correct but the data is all zeros.
They use dmaReadBulkIndex to read their DMA buffers:
- https://github.com/slac-lcls/lcls2/blob/66e454ec7e70db005802f88aa969ab47bbf5bfe6/psdaq/psdaq/aes-stream-drivers/DmaDriver.h#L387
The issue is observed with both the v5.17.3 and v6.5.1 tag releases of aes-stream-drivers.
cfgMode=1 (default) and cfgMode=2 did NOT affect the issue.
- Tried both; no change to the failure mode.
Matt Weaver says they are using SURF v2.54.0 for their application.
- v2.55.0 is the latest release and includes Mudit's change, so we are using an older version.
- I asked Matt for the SURF version used by an older application that he thinks works fine.
This failure mode does not appear to be related to cfgRxCount.
- Increasing or decreasing cfgRxCount has no impact on when the index failure happens.
This failure mode does appear to be related to cfgSize.
- Increasing cfgSize lowers the index at which the failure happens.
- Decreasing cfgSize raises the index at which the failure happens.
This failure mode does not appear to be related to cfgTxCount.
- Tried both cfgTxCount=4 and cfgTxCount=16.
This failure mode does not appear to be a data race/timing issue.
- Adding a 1-second sleep after the error and then printing the DMA buffer again shows the same data.
- The data was still all zeros after 1 second.
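A minimal sketch of the failure check implied by the notes above: the reported payload size is non-zero, yet every payload byte is zero. The helper name `is_zero_payload` is hypothetical and not part of aes-stream-drivers; in the real application the bytes would come from the DMA buffer returned through dmaReadBulkIndex, which is not modeled here.

```python
def is_zero_payload(buf: bytes, size: int) -> bool:
    """Return True if the first `size` bytes of `buf` are all zero.

    Hypothetical helper mirroring the failure signature in the notes:
    the payload size is correct, but the data is all zeros.
    """
    if size <= 0 or size > len(buf):
        raise ValueError("size must be in 1..len(buf)")
    # Counting zero bytes in the payload slice; all-zero means count == size.
    return buf[:size].count(b"\x00") == size


PAYLOAD_SIZE = 1056928  # Data RX payload size from Configuration #1 (bytes)
good = b"\xab" * PAYLOAD_SIZE   # stand-in for a healthy payload
bad = bytes(PAYLOAD_SIZE)       # stand-in for the all-zero failure
print(is_zero_payload(good, PAYLOAD_SIZE), is_zero_payload(bad, PAYLOAD_SIZE))
```

Running the same check again after the 1-second sleep would, per the notes, still report an all-zero payload.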

Configuration #1: Original configuration
- Data RX payload size = 1056928 bytes
- cfgTxCount=16
- cfgRxCount=4096
- cfgSize=2097152 bytes
- Fails when index >= 2000
Configuration #2: Increase buffer size by 1.5x
- Everything the same as Configuration #1 but with a larger buffer size
- cfgSize=3145728 bytes
- Fails when index >= 1000
Configuration #3: Decrease buffer count
- Everything the same as Configuration #1 but with a smaller buffer count
- cfgRxCount=1024
- No FAILURES (WORKAROUND FOR NOW)
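A back-of-the-envelope sketch of the buffer arithmetic behind the three configurations. It assumes each RX buffer is exactly cfgSize bytes, so the RX pool is cfgRxCount × cfgSize; the failing indices are the approximate values recorded above. The names `summarize` and `CONFIGS` are hypothetical.

```python
# Assumption: each RX buffer is exactly cfgSize bytes.
GIB = 1024 ** 3
PAYLOAD = 1056928  # Data RX payload size (bytes)

CONFIGS = [
    # (name, cfgRxCount, cfgSize in bytes, failing index or None)
    ("Configuration #1", 4096, 2097152, 2000),
    ("Configuration #2", 4096, 3145728, 1000),
    ("Configuration #3", 1024, 2097152, None),
]


def summarize(name, rx_count, size, fail_index):
    """Return a one-line summary of the RX buffer arithmetic."""
    pool = rx_count * size   # total RX buffer memory
    fill = PAYLOAD / size    # fraction of each buffer used by the payload
    text = f"{name}: RX pool={pool / GIB:.1f} GiB, payload fill={fill:.0%}"
    if fail_index is not None:
        below = fail_index * size  # RX memory in indices 0..fail_index-1
        text += f", RX memory below index {fail_index}={below / GIB:.2f} GiB"
    else:
        text += f", highest index={rx_count - 1}"
    return text


for cfg in CONFIGS:
    print(summarize(*cfg))
```

Under this assumption Configuration #3 tops out at index 1023, below the index at which Configuration #1 failed; the arithmetic alone does not explain why the failure occurs.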