Friday, March 7, 2014

BBB Performance Notes

As part of this activity, several aspects of BBB processor and network performance have been measured, investigated and characterized. One of the basic findings is that 1 MSPS (treated as 16-bit samples) is readily processed by the ARM processor and transported over the network. In the SDR server application, the samples are down-converted, filtered with an integer FIR filter, and sent via UDP. This consumes roughly 87% of the processor (as measured with the "top" utility). While the processing itself is minimal, it is sensitive and right at the edge: if the FIR filter is not compiled with "-O3", the processed rate drops from a steady 500k complex samples/second to fluctuating around 300k complex samples/second.
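To make the down conversion and integer FIR step concrete, here is a minimal sketch in C. It is not the SDR server's code: the fs/4 local oscillator, the 8-tap Q15 boxcar coefficients, the decimate-by-2 structure and the name `ddc_fs4_decim2` are all assumptions chosen for illustration. It shows a real 1 MSPS input becoming 500k complex samples/second using only integer arithmetic. The nested multiply-accumulate loop is the kind of code whose throughput depends heavily on the optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NTAPS 8

/* Hypothetical Q15 low-pass taps: an 8-tap boxcar whose taps sum to 32768
   (unity DC gain). A real design would use a proper windowed or equiripple
   low-pass filter. */
static const int16_t taps[NTAPS] = { 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096 };

/* Assumed local oscillator at fs/4: e^{-j*pi*n/2} only takes the values
   1, -j, -1, +j, so mixing needs no multiplies by non-trivial constants. */
static const int8_t lo_cos[4] = { 1, 0, -1, 0 };
static const int8_t lo_sin[4] = { 0, -1, 0, 1 };

/* Down-convert a block of real 16-bit samples by fs/4, low-pass filter the
   I and Q branches with the Q15 taps, and keep every second output
   (decimate by 2: 1 MSPS real in, 500k complex samples/s out).
   No filter state is carried between blocks in this sketch.
   Returns the number of outputs written; out_i/out_q need room for
   (len - NTAPS) / 2 + 1 samples. */
size_t ddc_fs4_decim2(const int16_t *x, size_t len, int16_t *out_i, int16_t *out_q)
{
    assert(len >= NTAPS);
    size_t n_out = 0;
    for (size_t n = NTAPS - 1; n < len; n += 2) {
        /* With these taps the worst case is 8 * 32768 * 4096 = 2^30,
           so a 32-bit accumulator cannot overflow. */
        int32_t acc_i = 0, acc_q = 0;
        for (size_t k = 0; k < NTAPS; k++) {
            size_t idx = n - k;
            acc_i += (int32_t)taps[k] * (x[idx] * lo_cos[idx & 3]);
            acc_q += (int32_t)taps[k] * (x[idx] * lo_sin[idx & 3]);
        }
        /* Q15 back to integer; relies on arithmetic right shift of
           negative values, which GCC provides. */
        out_i[n_out] = (int16_t)(acc_i >> 15);
        out_q[n_out] = (int16_t)(acc_q >> 15);
        n_out++;
    }
    return n_out;
}
```

For example, a real tone at exactly fs/4 with amplitude 1000 (samples 1000, 0, -1000, 0, ...) lands at DC after mixing, so every I output is 500 (half the amplitude, since only one of the two spectral images is kept) and every Q output is 0. A streaming version would carry the last NTAPS - 1 input samples and the LO phase across block boundaries.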

One open question for which I had no data was the FFT performance of the ARM processor. The Si (spectral investigation) application was designed to perform all processing except display on the BBB. This allows thin Java clients (e.g. on a tablet or phone) to handle only control and presentation. FFT performance is an important aspect of spectral evaluation in the application (at this point I haven't moved to a polyphase filter; I wanted to focus on FFT-based processing as a start). The original work used a simple FFT in C from a reference text. That approach was intended to be instructive, not high performance. It was then updated to use the FFTW package. The table below captures the measured FFT performance on the ARM, including a magnitude-squared calculation, for both implementations. Note: these are double-precision floating-point FFTs, not integer FFTs (integer versions will be evaluated later if needed; I wanted to start simple). Also, the magnitude-squared operation appears to take only a very small fraction of the time.
FFT size    Reference FFT time (µs)    FFTW time (µs)
256         907                        557
512         2069                       777
1024        4978                       1619
2048        10156                      4128
4096        21990                      9143
8192        49383                      20219
16384       116065                     45883
The impetus for focusing on this metric is that in a spectrum-analyzer-like application, one of the key metrics is the refresh rate at a given frequency span. With the hardware at hand, this translates into frequency-stepping speed, and the driving factor is the collection of samples. Based on previous noise measurements, a good starting point seems to be around an 8k FFT. At 1 MSPS, collecting 8k samples takes about 8 ms [i.e. (8×10^3 samples)/(10^6 samples/s) = 8×10^-3 s]. Sample collection can be overlapped with the power spectrum estimate calculation (i.e. the FFT) and transmission of results. So the bottom line is that the target is 8 ms per 8k FFT, which is not being met based on the data above (FFTW takes about 20 ms at 8192 points). There are a couple of options, including switching to an alternate power spectrum estimation technique or evaluating integer FFT performance (FFTW 3.3.3 includes ARM NEON support). This will be deferred until further hardware characterization is complete. An interim target of ~30 steps per second appears readily achievable; with 250 kHz per step, this yields a sweep rate of 7.5 MHz per second.
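The time-budget reasoning above can be written out as a few lines of arithmetic. The sketch below is a simplified pipeline model, assuming that collection fully overlaps processing so each step is paced by the slower stage; retuning, settling and UDP transmission time are deliberately ignored, so the step rates it reports are upper bounds rather than predictions. The FFTW timings are copied from the table; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* FFTW timings copied from the table above (microseconds). */
static const double fft_size[]  = { 256, 512, 1024, 2048, 4096, 8192, 16384 };
static const double t_fftw_us[] = { 557, 777, 1619, 4128, 9143, 20219, 45883 };
#define NSIZES (sizeof fft_size / sizeof fft_size[0])

/* Time to collect n samples at fs samples/s, in microseconds. */
double collect_us(double n, double fs)
{
    assert(fs > 0.0);
    return 1e6 * n / fs;
}

/* Assumed pipeline model: collection overlaps processing, so one step takes
   as long as the slower stage. Retuning and transmission are ignored. */
double step_period_us(double collect, double process)
{
    return collect > process ? collect : process;
}

/* Upper bound on frequency steps per second for a given step period. */
double max_steps_per_s(double period_us)
{
    assert(period_us > 0.0);
    return 1e6 / period_us;
}

/* Sweep rate in Hz/s for a step rate and a per-step span in Hz. */
double sweep_rate_hz_per_s(double steps_per_s, double step_hz)
{
    return steps_per_s * step_hz;
}

/* For each FFT size, report whether FFTW keeps up with collection at fs
   and the resulting upper bound on steps per second. */
void print_budget(double fs)
{
    for (size_t i = 0; i < NSIZES; i++) {
        double c = collect_us(fft_size[i], fs);
        double p = step_period_us(c, t_fftw_us[i]);
        printf("N=%5.0f collect=%6.0f us fftw=%6.0f us real-time=%s max steps/s=%6.1f\n",
               fft_size[i], c, t_fftw_us[i],
               t_fftw_us[i] <= c ? "yes" : "no", max_steps_per_s(p));
    }
}
```

With the table values at 1 MSPS, FFTW takes longer than the collection time at every size, and the 8192-point case gives an upper bound of roughly 49 steps per second under this model. The interim target of 30 steps per second therefore leaves some margin for the ignored overheads, and `sweep_rate_hz_per_s(30.0, 250e3)` reproduces the 7.5 MHz per second figure.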
