Tuesday, August 23, 2016

Prj 146 - Device Interface Software (Part 3)

The software to control and interface with the dual channel ADC board is fundamentally an SPI interface.  There are a couple of aspects which make this interesting.

First, the primary SPI software is a function of the F board and was developed for multiple projects. This software provides both a CPU (C++) SPI interface and a PRU SPI interface.  The former toggles the SPI lines directly via BeagleBone Black gpios, while the latter sends commands to PRU0, which toggles the io lines to the fpga.  The advantage is that the latter can achieve SPI clock rates in the 9 to 16 MHz range, while the former is limited to the kilohertz range.
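As a rough illustration of the CPU path, the following Python sketch bit-bangs one 16 bit transfer through three pin callbacks. The callback names, the LoopbackPins class, SPI mode 0, and MSB-first ordering are all assumptions made for illustration; the actual F board software is C++ and is not reproduced here.

```python
from typing import Callable


def spi_xfer16(word_out: int,
               set_sclk: Callable[[int], None],
               set_mosi: Callable[[int], None],
               get_miso: Callable[[], int]) -> int:
    """Shift 16 bits out MSB first and return the 16 bits shifted in."""
    word_in = 0
    for bit in range(15, -1, -1):
        set_mosi((word_out >> bit) & 1)  # present the next bit while SCLK is low
        set_sclk(1)                      # rising edge: sample MISO (mode 0 assumed)
        word_in = (word_in << 1) | (get_miso() & 1)
        set_sclk(0)                      # falling edge ends the bit period
    return word_in


class LoopbackPins:
    """Hypothetical stand-in for the gpio pins: MISO is wired to MOSI."""

    def __init__(self) -> None:
        self.mosi = 0
        self.clock_edges = 0

    def set_sclk(self, level: int) -> None:
        self.clock_edges += 1            # count edges instead of driving a pin

    def set_mosi(self, level: int) -> None:
        self.mosi = level

    def get_miso(self) -> int:
        return self.mosi


if __name__ == "__main__":
    pins = LoopbackPins()
    echoed = spi_xfer16(0xA5C3, pins.set_sclk, pins.set_mosi, pins.get_miso)
    print(hex(echoed), pins.clock_edges)
```

Every bit costs one data write, two clock writes, and one read, so a path where each pin access is slow ends up far below the rates the PRU can reach.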

The second interesting aspect is that the basic digital control block of the VHDL image not only provides registers to control the signal block and quadrature down conversion but also two ports of 6 pin gpio.  These ports can be used simultaneously by other boards and their controlling software.  The software needs to provide an independent mechanism to access these ports while not significantly impacting streaming of IQ ADC streams.

The final aspect of interest is that the down conversion software on the BeagleBone Black provides a standard IQ ADC interface to other software applications.  In the cases where the 40MSPS and 4MSPS channels are selected, the cpu cannot process the full set of streaming data.  In the case of the 200kHz channel (2 bytes/sample or 400kB/s), the processor can stream the channel to a network SDR application.

To address the above considerations, the software was structured along the lines of the following diagram.
Software Organization
Starting at the bottom, PRU0 controls the SPI pins to the FPGA.  Based on simple throughput measurements the PRU can drive a SPI clock of ~ 9MHz driving 1x SPI width.  This results in ~515 kwords/second measured (with a word being 16 bits).  The 2x SPI would give slightly less than 2x performance (due to additional instructions for manipulating two bits at a time).  PRU0 has two command mail boxes in its SRAM which it monitors.  They are serviced round robin.  Each command contains an operation, a number of words/bytes, and the words/bytes to transfer.  As the words are shifted onto the SPI interface, the values shifted in are placed in the SRAM mailbox.  One mail box is used by the CPU to request SPI transfers while the other can be used by PRU1.
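The throughput figures above can be sanity checked with a few lines of arithmetic. This is only a sketch using the approximate numbers quoted in the text; the "ideal" figure assumes every SPI clock carries a data bit with no gaps between words, which the real PRU loop will not achieve.

```python
SPI_CLOCK_HZ = 9_000_000         # approximate PRU0 SPI clock from the text
BITS_PER_WORD = 16
MEASURED_WORDS_PER_S = 515_000   # measured figure from the text


def ideal_words_per_second(clock_hz: int, bits_per_word: int) -> float:
    """Upper bound if every clock carried a data bit with no gaps."""
    return clock_hz / bits_per_word


ideal = ideal_words_per_second(SPI_CLOCK_HZ, BITS_PER_WORD)
efficiency = MEASURED_WORDS_PER_S / ideal
measured_bytes_per_s = MEASURED_WORDS_PER_S * BITS_PER_WORD // 8

if __name__ == "__main__":
    print(f"ideal: {ideal:.0f} words/s")
    print(f"efficiency: {efficiency:.1%}")
    print(f"measured: {measured_bytes_per_s} bytes/s")
```

With these inputs the measured rate is roughly 92% of the bound and corresponds to about 1.03 MB/s, which leaves margin over the 400kB/s quoted earlier for the 200kHz channel.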

The CPU interface SpiXferArray16 is used by the BDC software to read and write 16 bit registers.  A write can occur in a SpiXferArray16 of length one, while a read requires two operations in a single array.  Register reads are conducted this way so that a read command (command in the first word, result obtained in the second) is not split across transfers done by PRU1 (or vice versa).  The BDC software SpiRead16/Write16 is used by the QDC100 software to configure the quadrature conversion as well as the gpio ports implemented by the BDC software.

Tboard, Mboard, DDC/QDC
The BDC gpio interface is used by the user space I2C (UI2C) software to support the devices on the T board (MAX2112 tuner, and MCP4725 DAC for variable gain amplifier), while the BDC gpio interface is also used by the SPI interface within the ADF4351 control software used by the M board (synthesizer section).

Concurrent with all of the above operations PRU1 under the control of the QDC100 software continuously requests PRU0 to conduct SPI read operations to the FPGA.  These operations are: a) read fifo samples in sets of 256, and b) read the sample fifo threshold register to ensure it is not empty (and can safely be read).   PRU1 places the samples it streams from the fpga fifo to a DRAM circular buffer shared with the cpu.  The QDC100 Get2kSamples() interface extracts those from the DRAM buffer and provides them to the calling application.
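The producer/consumer relationship described above can be sketched as a simple ring buffer. This is a minimal model, not the QDC100 code: the class and method names are hypothetical, the block sizes (256 in, 2048 out) come from the text, and how the real shared DRAM buffer handles overflow is not described, so refusing the write is just one possible policy.

```python
from typing import List, Optional


class SampleRing:
    """Fixed-size circular buffer with one writer and one reader."""

    def __init__(self, capacity: int) -> None:
        self.buf = [0] * capacity
        self.capacity = capacity
        self.head = 0   # total samples written (PRU1 in the text)
        self.tail = 0   # total samples read (the cpu in the text)

    def available(self) -> int:
        return self.head - self.tail

    def write_block(self, samples: List[int]) -> bool:
        """Append a block; refuse it if it would overwrite unread samples."""
        if self.available() + len(samples) > self.capacity:
            return False
        for s in samples:
            self.buf[self.head % self.capacity] = s
            self.head += 1
        return True

    def read_block(self, count: int) -> Optional[List[int]]:
        """Return `count` samples in order, or None if too few are buffered."""
        if self.available() < count:
            return None
        out = [self.buf[(self.tail + i) % self.capacity] for i in range(count)]
        self.tail += count
        return out


if __name__ == "__main__":
    ring = SampleRing(8192)
    for block in range(8):                       # eight fifo reads of 256 samples
        ring.write_block(list(range(block * 256, (block + 1) * 256)))
    samples = ring.read_block(2048)              # one 2k extraction
    print(len(samples), samples[0], samples[-1])
```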

Other interfaces are present at all horizontal lines and not shown for clarity (e.g. the QDC100 provides an interface to select the channel to be streamed through the fifo and the digital downconversion frequency, the M board provides an interface to set the LO frequency, and the T board provides a tuning frequency for its LO, rf and baseband gain settings and baseband filter cutoff frequency).

The above structure allows each of the pieces of software to support various environments and configurations while at the same time maintaining device independence and allowing concurrent operation and control of the devices.

Wednesday, July 20, 2016

Prj 146 - VHDL for Quadrature Downconverter (Part 2)

This post walks through a VHDL application for quadrature down conversion on a dual channel 40MSPS ADC. A previous project constructed a digital down converter in an LX9 with a single real input channel, CIC filters and decimators, and a compensation filter.  For this project I wanted to do something similar but not use CIC filters and provide multiple stages of down conversion to allow variable bandwidth sampling.  The following is a block diagram of the VHDL image - referred to as QDC-100.  All of the VHDL discussed is here.
QDC-100 Summary Block Diagram
The digital control portion of the FPGA image is the BDC block described here.  It provides the SPI interface to the BeagleBone Black along with control and status discretes to the signal processing block. Internally it executes at 96MHz (using the DCM). All of the sample data from the signal block is transferred via a fifo, and all of the control information to the signal block is carried as discretes.
QDC-100 Signal Block Summary
The channel match and test block provides a mux to select the actual ADC samples or a test signal, and equalizes the channels to compensate for DC bias and gain differences in the analog front end.  The mixer and LO stage is just as it sounds: a numerically controlled oscillator (a DDS) whose output is complex multiplied by the input signal.  The decimators are two stages of filter and decimate to bring the sample rate down to a manageable level.  Two stages are used to provide different bandwidths and reduce the filter requirements.  The final stage selects samples from any of the previous stages and writes them to a fifo in I/Q 16 bit signed format for extraction by the digital control block.  The fifo provides a minimum number of samples (2048) indicator and a flush/reset control.

The following is a block diagram of the channel match and test pattern generator section.
QDC-100 Channel Match and Test Pattern Generator
The test pattern generator creates an I/Q signal with a known level using a small DDS (8 bit phase, 12 bit output) and a shift right level control (0dB, -6dB, …).  The ability to use a known, high quality digital signal is extremely helpful in testing the rest of the design.  If the TPG value is 0 the muxes select the ADC data; otherwise they select the test pattern, which is determined by the TPG value.  The output of the muxes for each of the I and Q channels is then run through an adder and divider to remove the DC offset bias and equalize the amplitude of the channels.  Removing the DC bias removes downstream carrier products of the subsequent mixer, and equalizing the amplitudes (removing channel gain variation) reduces downstream images.  The gain equalization is accomplished by multiplying the signal by an 8 bit signed value and then selecting the high bits of the output.  This has the net effect of multiplying the signal by N/128, where N is the Inum or Qnum.  This allows small variations to be matched, with essentially the higher gain signal being reduced by a small fraction.  Using a signed quantity has the added benefit of being able to switch the I/Q channels.  If we use a –N value for one channel it shifts the channel by 180 degrees.  If that channel lagged the other by 90 degrees, it now leads it by 90 degrees, thus swapping I/Q.  The outputs are 12 bit I and Q values which have been equalized.  The entire block works on a sample clock basis.
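The N/128 scaling and the shift based level control are easy to model in integer arithmetic. The sketch below is an illustration only: the function names are hypothetical, and it assumes that "selecting the high bits" amounts to dropping the low 7 bits of the product, which is what an arithmetic right shift does.

```python
def remove_dc(sample: int, offset: int) -> int:
    """Subtract a per-channel DC offset (the adder stage)."""
    return sample - offset


def equalize(sample: int, num: int) -> int:
    """Scale by num/128 using an 8 bit signed multiplier and a 7 bit shift."""
    if not -128 <= num <= 127:
        raise ValueError("num must fit in 8 bit signed")
    return (sample * num) >> 7   # Python's >> is arithmetic (floors toward -inf)


def tpg_level(sample: int, shift: int) -> int:
    """Test pattern level control: each right shift is about -6 dB."""
    return sample >> shift


if __name__ == "__main__":
    print(equalize(1024, 127))    # slightly reduced gain
    print(equalize(1024, -127))   # same magnitude, 180 degree shift
    print(tpg_level(2047, 1))     # one step down in level
```

A negative Inum or Qnum gives the same magnitude with the sign flipped, which is the 180 degree shift used above to swap I and Q.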

The following is a block diagram of the mixer and LO section.
QDC-100 Mixer and Local Oscillator
This block is relatively straightforward and uses Xilinx generated DDS and complex multiply cores. The DDS block generates an in phase (12 bits) and quadrature (12 bits) sinusoid which is fed to the multiplier.  The multiplier's output is two 16 bit values (I and Q), routed to subsequent stages as a 32 bit vector.  Since the PINC bits are a set of discrete lines from another clock domain, a small state machine (the AXI Slave writer) monitors for value changes and writes the phase increment register of the DDS core. Again, the entire block works on a sample clock basis.
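For reference, the relationship between a desired LO frequency and the phase increment written to a DDS is the standard one sketched below, along with the complex multiply the mixer performs. The 24 bit phase width in the example is an assumption (the post does not state the width used), the function names are hypothetical, and the multiply is shown at full precision without the truncation to 16 bit outputs.

```python
from typing import Tuple


def phase_increment(f_out_hz: float, f_clk_hz: float, phase_bits: int) -> int:
    """Phase accumulator step: round(f_out / f_clk * 2**phase_bits), wrapped."""
    modulus = 1 << phase_bits
    return round(f_out_hz / f_clk_hz * modulus) % modulus


def actual_frequency(pinc: int, f_clk_hz: float, phase_bits: int) -> float:
    """Frequency actually produced by a given phase increment."""
    return pinc * f_clk_hz / (1 << phase_bits)


def mix(i_in: int, q_in: int, lo_i: int, lo_q: int) -> Tuple[int, int]:
    """Complex multiply (i_in + j*q_in) * (lo_i + j*lo_q)."""
    return (i_in * lo_i - q_in * lo_q, i_in * lo_q + q_in * lo_i)


if __name__ == "__main__":
    pinc = phase_increment(10e6, 40e6, 24)   # 10 MHz LO at the 40 MHz sample clock
    print(pinc, actual_frequency(pinc, 40e6, 24))
    print(mix(100, 50, 3, -2))
```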

The following is a block diagram of the decimator section.
QDC-100 Two Stage Decimation
There are two decimator blocks.  Each operates on 16 bit inputs and outputs and has its own internal clock.  The decimators are Xilinx generated cores and include a FIR filter applied prior to the decimation.  The first filter decimates by 10 and the second by 20.  Both A and B are structurally the same, with the differences being the decimation rate, filter applied, and internal clock rate used.

The following is an internal block diagram of a single decimation stage:
QDC-100 Decimation Stage Internals
Each decimator uses an independent filter clock derived from the sample clock.  A DCM is used to generate a filter clock at a multiple of the sample clock (e.g. from 40MHz to 200MHz).  Each decimator has input and output fifos with independent read and write clocks.  This isolates the higher frequency filter clock domain.  The higher rate filter clock allows a smaller number of DSPs to be used to generate a FIR filter and decimator with a larger number of taps than would be possible using a filter clock at the same rate as the sample clock.
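The benefit of the faster filter clock can be estimated with a simple budget of multiply-accumulate operations per output sample. This is a back-of-the-envelope sketch, assuming one multiply-accumulate per DSP slice per filter clock and ignoring core overhead and any coefficient symmetry; the 200MHz figure is the example from the text, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 40_000_000
DECIMATIONS = (10, 20)           # stage A then stage B, from the text


def stage_rates(input_rate_hz: int, decimations: tuple) -> list:
    """Output sample rate after each decimation stage."""
    rates = []
    rate = input_rate_hz
    for d in decimations:
        rate //= d
        rates.append(rate)
    return rates


def macs_per_output_per_dsp(filter_clk_hz: int, output_rate_hz: int) -> int:
    """Multiply-accumulates one DSP slice can perform per output sample."""
    return filter_clk_hz // output_rate_hz


if __name__ == "__main__":
    rate_a, rate_b = stage_rates(SAMPLE_RATE_HZ, DECIMATIONS)
    print(rate_a, rate_b)
    print(macs_per_output_per_dsp(200_000_000, rate_a))
    print(macs_per_output_per_dsp(SAMPLE_RATE_HZ, rate_a))
```

Under these assumptions a 200MHz filter clock gives one DSP slice about 50 multiply-accumulates per stage A output sample, versus 10 at the 40MHz sample clock.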

The signal block output is simply a fifo.  The input to the fifo is a 32 bit wide interface (16 bits I, 16 bits Q), while the output is a 16 bit wide interface.  This takes advantage of the Xilinx generated fifo capability to have different input/output widths.  This allows single cycle writes of I and Q data during a single sample clock while allowing the digital processor interface to retain a 16 bit wide register/fifo read interface.  The following is a block diagram of the output.
QDC-100 Signal Block Sample Output to Digital Block
The mux select determines what gets written to the fifo.  Output from any of the stages or processing can be selected.  The input fifo clock is the sample clock with the write valid always being true for non-decimated inputs or the valid signal from the decimators.  The output fifo clock is the digital control clock with the read enable driven from the digital control register read block.
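The width conversion can be pictured as one 32 bit write turning into two 16 bit reads. The sketch below is only a model: which half holds I and which half is read first are assumptions here, not something stated in the post, and the helper names are hypothetical.

```python
def pack_iq(i: int, q: int) -> int:
    """Pack signed 16 bit I and Q into one 32 bit word (I in the high half, assumed)."""
    return ((i & 0xFFFF) << 16) | (q & 0xFFFF)


def read_words(word32: int) -> list:
    """Split one 32 bit write into two 16 bit reads, high half first (assumed)."""
    return [(word32 >> 16) & 0xFFFF, word32 & 0xFFFF]


def to_signed16(word: int) -> int:
    """Interpret a 16 bit register value as two's complement."""
    return word - 0x10000 if word & 0x8000 else word


if __name__ == "__main__":
    words = read_words(pack_iq(-2, 3))
    print([hex(w) for w in words], [to_signed16(w) for w in words])
```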


Wednesday, June 22, 2016

Prj 146 - Dual Channel 40MSPS ADC (Part1)

With the F board (BeagleBone Black Spartan6 LX9 FPGA), I now have the option of using both faster ADCs as well as dual channel variants to support quadrature down converters.  This board uses an LTC2292, which is a dual channel 40MSPS 12 bit ADC.  It is similar to previous LTC ADCs I have used (FPGA variant and non-FPGA).  The schematic of the board is shown below.
Dual ADC Schematic - LTC2292, IF inputs, and sample clock.
Dual ADC Schematic - Voltage Regulators
There are a couple of variants and decisions on the board worth noting.  First, I configured it to use 0.5V full scale rather than 1V and configured the output to be signed 12 bit values.  Second, the analog inputs are similar to previous versions, using a center tap transformer with 50 ohm input termination. This comes directly from the manufacturer's reference design and has worked well in the past.  Finally, a 40MHz CMOS oscillator is used, but with a buffer to the FPGA on the carrier board. Separate regulators are used for the analog and digital supply voltages.  Everything is 3.3V and liberally supplied with bypass capacitors and chokes. The KiCad source material for the board is available on github here. The board is a two layer OSHPark board with the layout shown below.
Dual ADC Two Layer PCB Layout
A picture of the first unit assembled and mounted on an F board (mounted on a BBB) is shown below.
Dual 40MSPS ADC board.  Connectors at right are from the underlying F board it is mounted on.  Connectors on the left are from the underlying Beagle Bone Black which the F board is mounted on.
For testing purposes the anti-aliasing low pass filter elements were not populated.  This board is designed to work with a digital down converter VHDL image on a Spartan 6 LX9 (F board).  The construction of this board is straightforward.  The only difference is that with this unit I tried applying solder paste using a 22 gauge plastic needle on the paste syringe.  In previous work I only had the default large metal needle that comes with the paste syringe itself.  Using the fine plastic needles makes a huge difference: you can dispense fine amounts of paste and let the needle touch the pads without fear of the metal scraping.

Prj 145 - Beagle Bone Black Simple LX9 FPGA board (Part 1)
Prj 145 - BBB LX9 FPGA Board Design (Part 2)
Prj 145 - BBB LX9 FPGA Board Construction (Part 3)
Prj 145 - BBB LX9 JTAG Boundary Scan Utilities (Part 4)
Prj 145 - BBB LX9 C++ and VHDL (Part5)

Thursday, June 2, 2016

Prj 144 - DVB Tuner Board

I wanted to use one of the existing DVB tuners for various RF applications.  The appeal of these ICs is that they include a high level of integration with quadrature mixers, amplifiers and filters, are cheap and easily accessible, come in small packages, and are relatively easy to use.

I settled on the MAX2112.  Based on experience with other devices with high levels of integration I decided to start simple and build a small board based on the circuit in the manufacturer evaluation board (I would have just used their eval board, however, these are always incredibly expensive).  The following circuit captures that board.
DVB Tuner Board Schematic
Two separate low noise regulators are provided although one is sufficient given the low power of the device.  The MAX2112 has a 75 ohm input impedance so provisions are made for a resistive broadband input match (and associated loss of input power) or an LC tuned input match to bring the board input impedance to 50 ohms.  The loop filter for the synthesizer was copied from the evaluation circuit without modification.  The differential outputs were converted to single ended using op-amps.  The only change from the evaluation board is the inclusion of a DAC to provide a programmable voltage to the AGC input of the device.  The DAC was selected to have an I2C address different from the tuner.

Since this was my first part using an I2C interface it took some time to develop and debug the software (Tboard and user space I2C).  The board was populated with only the I2C DAC and an LED on its output.  Normally adding an LED is not good practice as it can add noise (something not desirable on the input voltage to a high gain AGC amplifier), however, for testing purposes it proved very helpful. Initial development and testing was conducted using an I board and then updated to support an Fboard with BDC VHDL.  The first unit used a 20MHz crystal, while the loop filter values were specified for a 27MHz crystal from the evaluation board.  This worked out ok since via software control I was able to divide the reference oscillator by 2 and achieve lock.  Below is a picture of that unit.
MAX2112 Based Tuner Board (second regulator not populated)
The programming information is a little sparse.  If you have used a synthesizer before it makes sense, but I would not choose this as the first part to work with a PLL (loop filter, lock debug).  The one subtlety was the initial value of the VCO filter registers.  When I changed these from the power on default, I had problems with the device locking.  It seems to conduct the VCO search in only one direction in frequency (this wasn’t entirely clear from the data sheet or reference board material).  Having overcome this, I was able to test both IF channels using RF inputs across the fully specified range.

There are all kinds of characterizations I wanted to perform, but without much test equipment, and particularly equipment set up for quadrature baseband evaluation, I decided to keep it simple and move on to a dual channel ADC I could use with this board.  A quick check of the input amplifier gain showed reasonable and expected performance.  The other quick test easily accomplished was checking the programmable filter response.  The simplest approach, albeit not quite so accurate, was to set a tuning value and filter cut off frequency and scan an RF tone about the tuning frequency.  I could then use a spectrum analyzer with max hold history on and get the outer envelope of the fundamental as it was swept through the frequency range.  The following diagram captures those results.

Programmable Filter Response (See text for measurement approach and caveats)
The downside with this approach is that the second harmonic of the spectrum analyzer is higher than the filter roll off very far into the stop band.  What this translates into is that frequencies far away from the pass band end up seeing a higher max hold value than actual due to the second harmonic pushing up the history value.  So basically once you get -30dB or more down in the response you cannot see the true roll off of the filter, rather something less which is pushed up due to harmonics in the measurement device when it is seeing the fundamental at lower frequencies.  The tuner is set to 975MHz with the input swept from this to +20MHz.  An attenuator is used at the analyzer input to keep the signal level low to minimize its harmonic responses.  The analyzer is a 50 ohm input on a single output IF channel with the other terminated in 50 ohms.

So in short, the tuner is working as expected and within my current measurement capabilities.  Further characterization will have to wait until I have a dual channel ADC. 

Tuesday, May 17, 2016

Prj145 - BBB LX9 C++ and VHDL (Part5)

With an FPGA board and JTAG tools to load an image, the next step is to actually develop and use a VHDL application.  Since much of the VHDL is custom to the daughter board I focused on trying to get a basic digital control interface (BDC) that I could use with different daughter cards.  This has the added benefit of testing out the BBB-Fboard SPI and providing another set of GPIO ports.  The effort includes both the C++ and VHDL sides.  The following is a conceptual organization of the software.
C++ and VHDL Organization
The Fboard software primarily provides an interface to read and write the SPI interface to the Fboard hardware.  There are different flavors of SPI access including 8 and 16 bit accesses.  A device tree is installed and determines how the SPI interface is accessed, either via the BBB sysfs interface or a PRU0 image.  The PRU0 version is significantly faster and drives the SPI at multiple MHz, while the sysfs version is on the order of kilohertz and is easier to start debugging and testing with.  The intent is that PRU1 can be used to issue SPI commands to PRU0 for application specific streaming of high speed data.

The basic digital control block (BDC) can be included in any FPGA image and provides a register interface via the SPI link between the BBB and the Fboard.  The control block is based on previous efforts and looks a lot like the Prj 141 digital interface.  The difference is that I wanted to simplify the programming model and allow each access to specify the register rather than needing a register access to control a selector for the next access to get to the target register.  A basic block diagram is below.
VHDL ControlBlock Implementing Basic Digital Control Interface
Each control block contains a set of 8 bit read/write registers, some 16 bit counters, and two GPIO units.  Registers/counters can be added with each bit or set of bits interfacing with other application specific VHDL logic. Each GPIO unit has three registers (one for input, output, and direction of pins). The GPIO units connect directly with each of the 10 pin ports on the Fboard. Each of these registers has its data and read/write valid signals muxed based on a register id from a finite state machine.
This state machine conducts the read/write of a register based on an SPI command.  The finite state machine has a register fifo with a 16 bit SPI slave.  The state machine to process SPI commands operates at 96MHz (8x the XO on the F board using a DCM) and takes multiple clocks to read or write a register.

The SPI interface timing is specified to support a 20MHz SPI clock (for FPGA timing and layout only, the BBB even with the PRU will not exceed 16MHz with 9MHz being more realistic).  The software is designed to clock 16 bits into the SPI register at the same time 16 bits are clocked out.  The format of a SPI word is below.
         -- Bit position
         -- 1111 1100 0000 0000
         -- 5432 1098 7654 3210
         -- Where:
         -- R = 1 => read
         -- W = 1 => write
         -- N = register selector
         -- V = 8 bit write value
The upper two bits select whether a read or write is conducted.  In both cases the next 6 bits identify the register to be operated on.  In the case of a write, the following 8 bits of value are written to the specified register.  In the case of a read, these 8 bits are ignored, and the specified register is read (16 bits) and saved in an internal register.  On the next SPI access, while the new command bits are being shifted in, these 16 bits are being shifted out.
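The word format and the one-transfer delay on reads can be modelled in a few lines. This is a sketch under stated assumptions, not the BDC implementation: it assumes R is bit 15 and W is bit 14 (the text only says they are the upper two bits), that a word with neither bit set is harmless, and the class and function names are hypothetical.

```python
READ_BIT = 1 << 15    # assumed position of R
WRITE_BIT = 1 << 14   # assumed position of W


def write_cmd(reg: int, value: int) -> int:
    """Build a write word: W, 6 bit register selector, 8 bit value."""
    if not (0 <= reg < 64 and 0 <= value < 256):
        raise ValueError("register is 6 bits, value is 8 bits")
    return WRITE_BIT | (reg << 8) | value


def read_cmd(reg: int) -> int:
    """Build a read word: R, 6 bit register selector, low 8 bits ignored."""
    if not 0 <= reg < 64:
        raise ValueError("register is 6 bits")
    return READ_BIT | (reg << 8)


class RegisterBlockModel:
    """Toy slave model: a read result is shifted out on the NEXT transfer."""

    def __init__(self) -> None:
        self.regs = [0] * 64
        self.pending = 0              # value latched by the previous read command

    def xfer16(self, word: int) -> int:
        shifted_out = self.pending    # goes out while the new command shifts in
        reg = (word >> 8) & 0x3F
        if word & WRITE_BIT:
            self.regs[reg] = word & 0xFF
        if word & READ_BIT:
            self.pending = self.regs[reg]
        return shifted_out


def read_register(dev: RegisterBlockModel, reg: int) -> int:
    """Two word read: issue the command, then clock out the result."""
    dev.xfer16(read_cmd(reg))
    return dev.xfer16(0)              # assumed no-op word (neither R nor W set)


if __name__ == "__main__":
    dev = RegisterBlockModel()
    dev.xfer16(write_cmd(5, 0xAB))
    print(hex(write_cmd(5, 0xAB)), hex(read_cmd(5)), hex(read_register(dev, 5)))
```

When streaming, the second word of one read can itself be the next read command, so back-to-back reads cost one word each after the first.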

This might seem a bit counterintuitive, however, it keeps the finite state machine simple and extensible, focuses on single operation writes (8 bits of which is sufficient for my purposes), and allows streaming 16 bit reads with little overhead (e.g. for ADCs).  It also allows 2^6 = 64 registers to be defined. The first 15 registers are dedicated to common operations like: a) FPGA image and version identification, b) Debug counters, c) LED control/signaling, and d) the two GPIO ports.  The second set of 15 registers is dedicated to the application specific basics (for example the DDS frequency value).  A 16 bit read fifo is used as the last register available.


Saturday, April 30, 2016

Prj145 - BBB LX9 JTAG Boundary Scan Utilities (Part 4)

Previous posts walked through the overview, schematic, and fabrication of an fpga board for mixed signal use with a BeagleBone Black.  This post summarizes the boundary scan tools used to load the fpga image.

The JTAG boundary scan (JTAG for short here after) along with the DONE, INIT, and PROGRAM_B pins are accessible via BBB GPIO pins.  There are two utilities used to provide key functionality with these pins.
JTAG Boundary Scan Tools for BeagleBone Black LX9 Board
The first is Fxvc which is a virtual cable daemon based on software from Xilinx and tmbinc. This utility allows the Xilinx tool set to program and interrogate the fpga without a hardware cable. Within iMPACT, you select a loadable module under cable setup and supply:

xilinx_xvc host= disableversioncheck=true

The tool then uses the network connection to conduct all JTAG operations.  The code is factored into two components: the general server which handles network transactions and the board specific portion which turns JTAG operations into pin level settings.  The tool was developed using a JTAG device simulator and was actually a really insightful exercise in understanding JTAG boundary scan.
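The split between the general server and the board specific portion can be illustrated with the inner loop that turns a shift request into per-bit pin operations. This is a hedged sketch, not the Fxvc code: the clock_bit callback is a hypothetical stand-in for the board specific pin handling, and the LSB-first packing of the TMS/TDI/TDO vectors reflects my understanding of the virtual cable shift command, so treat it as an assumption.

```python
from typing import Callable


def jtag_shift(num_bits: int, tms: bytes, tdi: bytes,
               clock_bit: Callable[[int, int], int]) -> bytes:
    """Clock num_bits through the TAP; vectors are packed LSB first per byte."""
    tdo = bytearray((num_bits + 7) // 8)
    for i in range(num_bits):
        byte, mask = i // 8, 1 << (i % 8)
        tms_bit = 1 if tms[byte] & mask else 0
        tdi_bit = 1 if tdi[byte] & mask else 0
        # Board specific part: set TMS/TDI, pulse TCK, and return the TDO level.
        if clock_bit(tms_bit, tdi_bit):
            tdo[byte] |= mask
    return bytes(tdo)


if __name__ == "__main__":
    # Loopback stand-in for real pins: TDO simply echoes TDI.
    result = jtag_shift(12, b"\x00\x00", b"\xa5\x0f", lambda tms, tdi: tdi)
    print(result.hex())
```

Keeping the pin handling behind one callback is also what makes it practical to develop against a simulated JTAG device before touching hardware.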

The second tool is Fxsvf which is an embedded SVF player.  SVF is a way to express JTAG operations in a text file, while XSVF is a Xilinx binary form of SVF which results in more compact files.  Again, the application is broken into two components, a portion which handles the reading and parsing of an XSVF file and a portion which is board specific and sets the JTAG pins appropriately. The general part comes from the Xilinx XAPP058.  Due to licensing, this portion is currently not open source and can only be obtained by registering with Xilinx.  For this reason, the general XSVF player portion is treated as an installed library that you link the board specific pin manipulation code against to produce the final application.

One of the downsides to the current approach is the performance.  Manipulating GPIO pins from user space with the sysfs interface is quite slow (but simple).  I knew this going in but underestimated the convenience of being able to just attach to the fpga JTAG interface by running an application and having Ethernet connected (which is always the case for my BBB work).  Not having to drag out yet another cable is really nice.

One of the issues I encountered was getting the ISE 14.7 tools to properly program the flash.  This process is what Xilinx calls indirect programming.  It involves loading an fpga image via JTAG that can manipulate the flash SPI pins via the JTAG interface.  This would not work for me.  At first I suspected a problem with my layout of the flash SPI, then I suspected a fabrication error, then I investigated Fxvc errors.  Eventually I ended up developing my own utility, Fflash, to access the SPI flash and found no problems.  I found a couple of data points indicating the ISE tools sometimes have issues with SPI flash access (e.g. my identical issue - ID check failing, however, the workaround failed to solve my problem).  Given this along with the support state of ISE, I decided to abandon this approach and just work with my own flashing utility.  This is less of an issue than I first thought since my general use model is to load an FPGA image with iMPACT while debugging and then, once the image is finalized, save a copy and flash it.

The process involves first loading via Fxsvf an fpga image which directly connects the host SPI pins to the flash SPI pins. The Fflash utility then programs the SPI flash using the host SPI lines.  When using the PRU interface to the host SPI pins this is extremely fast - about 3 seconds to erase the device (device limited) and less than a second to program and verify the image into the flash.

Thursday, April 14, 2016

Prj 145 - BBB LX9 FPGA Board Construction (Part 3)

This post captures a few notes on the fabrication of a BeagleBone Black minimal FPGA board. Previous posts covered the block diagram and schematic.  The board is a 2 layer OSH Park order at roughly 3" x 2".  One of the differences with this board is that I used 0603 resistors and capacitors for density reasons rather than the 0805s I normally use.  I have used these in the past in a limited capacity.  The mechanics of mounting these are no different, however, I did find that the smaller parts slowed me down.  In the end, I think it was worth it as there were a couple of places where 0805s would have made the board layout more difficult. Beyond this, and the TQG package, there is nothing too challenging about this build.
Spartan 6 LX9 Board.

LX9 Board Mounted to BeagleBone Black
This was the first time I had used a TQG package, so I was a little nervous about how it would turn out.  I have gotten reasonably good at working with 0.5mm pitch QFNs but only in the 40 pin range. Airgunning the QFNs works really well and they self align nicely if you get the solder paste application right.  I only have a jeweler's loupe, not a microscope, so manual alignment was a concern. My attempt on the first version of this board used an airgun.  This was not a good idea.  The problem is the sheer area: it's 22mm x 22mm.  It took forever to get the paste to melt and I had a hard time evenly distributing the hot air around the perimeter of the part.  There are hoods for air guns (which I do not have).  The board above used manual placement with a soldering iron.  I tacked down a pin on one corner, inspected, and then tacked down a pin on the opposing corner.  This was followed up with running a solder bead down each side and then wicking off the excess solder (you can see the flux residue from this around the part).  This worked out extremely well and was simple to do.  The picture below captures a closeup of the end result.
Closeup of hand soldering and alignment of TQG-144.
The only issue with the technique is that if too much solder is applied it tends to walk up the knee of the pins, where it creates shorts with adjacent pins.  Solder this high in the knee is difficult to wick off. I found that inspecting all of the pins from three different angles (front on, top angled left, and top angled right) allowed me to catch all instances of this.


Thursday, March 24, 2016

Prj 145 - BBB LX9 FPGA Board Design (Part 2)

This post summarizes the schematic of a BeagleBone Black FPGA board (overview and block diagram).  This is a minimal FPGA board intended for use with ADCs and DACs so it does not include complicating aspects like DRAM, HDMI, or high speed serial.  Having said this, Xilinx UG-380 does a good job describing the JTAG boundary scan interface, SPI flash interface and configuration interfaces (done, program_b, M0, M1).  You just have to read and study it quite a bit.  The only subtle aspect is the power-on/reset sequencing.

I wanted to use the +5V directly from the unregulated BBB input to allow for higher current draw of this board and daughter boards.  The power on sequence of the LX9 is well defined with high-Z IO pins.  It is the reset and shutdown sequence that has to be addressed.  In this scenario, the BBB and the FPGA board can be powered on, loaded, and operating normally when the user provides a shutdown command to the BBB linux which disables the on board regulated supplies but leaves the +5V unregulated input on (i.e. the wallwart is plugged in but a shutdown has been issued and the BBB is off).  In this case, the LX9 could be left powered up, configured and driving the SPI and JTAG pins while the ARM on the BBB is powered off.  To avoid this problem and allow use of the unregulated supply, a high side switcher is used to control the +5V with the enable being the BBB 3.3V regulated supply.

There are 32 IO pins at one edge of the board with a ground every 4 pins.  The very end provides +5V on 4 pins.  This nicely fits within a 2x23 header (the same used by the BBB).  Since additional pins are available, those are brought out to two 2x5 headers.  These are the same as used on the I board and include +5V power.  This allows interfacing with other boards using a low speed SPI or I2C.

Schematic Page 1 - LX9, Connectors, Flash, Clock.

Schematic Page 2 - Power and Bypass Capacitors

Screenshot of PCB Layout

The KiCad source material along with pdf of schematic and zip of Gerbers is available here.

The overall cost, sourcing from OSHPark and Digikey, is $12 for the PCB (quantity 3), $18 for the LX9 (quantity 1), $3 for connectors, $3 for oscillators and regulators, and $3 for flash and passives, for a rough total of $39 per board.


Friday, March 11, 2016

Prj 145 - Beagle Bone Black Simple LX9 FPGA board (Part 1)

One of my objectives is still to have a simple and low cost FPGA board to use with an ADC or DAC of intermediate sampling rate (10MSPS – 60MSPS).  There are a number of good off the shelf FPGA boards available to support this (for instance).  After looking at several I kept coming back to the concern that the I/O configuration was just a little less than optimal for what I wanted to do.  For me, I only need one or two FPGA boards, while I pay per square inch for each analog board.  If the headers are wide and offset, then it takes another 3x2 inches I do not really need just to mate the analog and FPGA board.  This then ends up costing 6sq. inches x $5/sq. inch for each analog design in dead area. Said differently, if I could just get the I/O in a slightly different configuration… (this appears to be a costly and slippery slope akin to telling yourself that your current house would be just fine if you only had one more room).

Given the IO desire and a better understanding of what could fit in a Spartan6 LX9, along with what I did and did not require for additional hardware, I decided to try a minimalist hand solderable FPGA board.  The LX9 is the largest part available in a TQG-144; beyond that it's BGAs.  To keep things simple all of the IO is 3.3V with a minimal set.  There are 32 pins with a ground every 4 pins in a single header at one edge of the board.  The SPI and JTAG boundary scan ports are at the other edge of the board.  This configuration allows mounting on a BeagleBoneBlack while keeping analog daughter card growth at one end of the board.

Notional Board Stack Up of BBB, FPGA Board, and Mixed Signal Board.
Since there are plenty of IO pins (and in an accessible spot) there is also room to add 2 ports of 6 pin GPIO at the end of the board.  These would match the pin-out used in previous I board peripherals and include a +5V supply.  A few additional BBB GPIOs are needed to address the LX9 reset and control pins (DONE, INIT, PROGRAM_B).  A block diagram of the interface configuration is shown below.
BeagleBoneBlack-FPGA Board Interface
One of the key points is that the SPI interface is on BBB pins which are accessible via PRU0.  This is important since the PRUs can be used to obtain higher performance SPI interfaces than the built-in hardware units (examples here, here, and here).  While the basic SPI is 4 pins (SCLK, SS, MISO, MOSI), an additional 2 pins were dedicated to provide a 2x SPI (i.e. MISO0, MISO1, MOSI0, MOSI1).  I would dedicate more; however, there are virtually no more easily accessible PRU mappable GPIO pins available on P9.  P8 could be used, but P9 has the +5V from the main power, so this would require pins (and obstructions) on both ends of the FPGA board and subsequently the stacked mixed signal board.
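As a rough illustration of why the 2x SPI halves the number of clocks per word, the following Python sketch splits a 16 bit word across two data lines and puts it back together.  The bit ordering (MSB first, with MOSI1 carrying the higher bit of each pair) and the function names are assumptions made for the example, not the mapping used in the actual PRU or FPGA code.

```python
def split_word_2x(word, width=16):
    """Return the per-clock (mosi1, mosi0) bit pairs for one word, MSB first.

    Assumed (hypothetical) mapping: on each clock MOSI1 carries the higher
    bit of the pair and MOSI0 carries the lower bit.
    """
    pairs = []
    for shift in range(width - 2, -1, -2):
        pairs.append(((word >> (shift + 1)) & 1, (word >> shift) & 1))
    return pairs


def join_word_2x(pairs):
    """Rebuild the word from the per-clock bit pairs (inverse of the split)."""
    word = 0
    for hi, lo in pairs:
        word = (word << 2) | (hi << 1) | lo
    return word


pairs = split_word_2x(0xA5C3)
print(len(pairs))                # 8 clocks, versus 16 for a 1x SPI
print(hex(join_word_2x(pairs)))  # 0xa5c3
```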

Prj 141 - Spartan6 LX9, ADC, and BBB (Part1)

Saturday, February 20, 2016

AdcHttpd - HTML5/Browser Based Signal Analysis Tool

As part of the X board/DDC efforts I decided to redo the adc tool I was using.  The original set used a java application on the PC and a server on the BBB.  The java application provided controls, presentation of the data, and processing of the samples.  This worked fine, but each time I went to take a measurement I needed the PC application.  There also seemed to be a lot of interface work in both the java client and the server every time I wanted to make a small change (particularly for controlling the acquisition and hardware configuration).  Previous experiments with javascript/browsers indicated that network and processor performance and loading would be sufficient to support the intended use.  There are two advantages to this approach in my mind: a) there is no client application, just a browser, and b) the interfacing software is simpler and better localized by using JSON, which provides good support for serialization/deserialization with javascript software and is easy to produce/parse within a C++ application.

The following image shows the browser presentation.
Chrome with AdcHttpd on a BeagleBone Black with 16 bit DDC data from a LX9

The main area is an HTML5 canvas which is drawn using a basic graphing toolkit I developed for instrumentation purposes.  The buttons at the bottom are HTML buttons with javascript actions.  The first row allows presentation changes (envelope, peak picking, user markers, storing a memory, changing Y limits).  The second row controls server side parameters such as the function to perform (in this case PSE, or power spectral estimation), the number of points to produce, the number of averages to use, the channel to select, and the LO frequency used in down conversion.  The channel is the tap point referenced in the DDC.  The final row sets the state to run or not run, or resets various pieces of software.  The line above the graph also includes key parameters in effect along with the date and status of the connection to the http server (it turns red and indicates not connected if a threshold of failed server requests is reached).

The numeric quantities that are free form input (e.g. F1(Hz) or LO mixer or the upper and lower graph limits) are entered via a javascript keypad.  The following screen capture shows the same channel or signal as the previous capture but this time in the time domain and with the keypad popped up to modify the Y limits.
Javascript Keypad Input (Time series in background of signal in previous figure)
The browser tab consumes about 10% of a CPU core on an older PC (AMD 1.8GHz quad core) and about 1% load on the Ethernet interface.  The BeagleBone is at 98% CPU load running the http server.  The BBB limits the update rate based on its ability to process the samples (fftw for ARM is used for best performance).

There are several components to the internals of the browser code, http server code, hardware model, and device code on the BeagleBone Black.  The basic model is that the server is the definitive controller of state and operations while the browser code simply requests changes and conducts presentation.  At the highest level the following sequence diagram summarizes the interactions of the application.

Browser-AdcHttpd Summary Sequence Diagram 
The Browser lifeline captures the javascript code behavior executing within the browser while the AdcHttpd lifeline captures the software executing on the BeagleBone Black.  There are only two events handled by the javascript code: a timer and user input.  The timer forces a sequence of events which boils down to getting state and data from the server and drawing a plot of the current data.  All information is obtained from the server via http GET requests.  The results coming from the server are JSON text representing application variables.  The diagram is slightly off in that the responses coming back are processed asynchronously to the request.  There is additional logic to keep the state requests at a lower request rate than the requests for data.  In addition, there is logic to adjust the polling rate based upon the operational state of the instrument.  Quantities in the overall state include things like channel number to process, run/stop state, and mixer frequency.  The data coming from the server is just an x,y list of points to be plotted on the graph.
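The timer logic described above can be sketched as follows.  This is written in Python purely for illustration (the real client is javascript), and the request paths, the JSON field names, and the one-in-five state request ratio are all assumptions rather than the values used by AdcHttpd.

```python
import json


def requests_for_tick(tick, running, state_every=5):
    """Return the GET paths to issue on a given timer tick.

    Assumptions (not from the original code): the paths "/state" and "/data",
    and a state request once every `state_every` ticks.
    """
    paths = []
    if tick % state_every == 0:
        paths.append("/state")   # low rate: channel, run/stop, mixer frequency
    if running:
        paths.append("/data")    # higher rate: x,y points for the plot
    return paths


def parse_data(body):
    """Decode a JSON reply holding x and y lists into (x, y) tuples."""
    doc = json.loads(body)
    return list(zip(doc["x"], doc["y"]))


# Hypothetical reply body for a data request.
reply = '{"x": [0, 100, 200], "y": [-80.5, -12.0, -79.1]}'
print(requests_for_tick(0, running=True))   # ['/state', '/data']
print(requests_for_tick(1, running=True))   # ['/data']
print(parse_data(reply))
```

When the instrument is stopped only the occasional state request is issued, which is one simple way to lower the polling load based on operational state.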

The other events processed by the browser javascript are user inputs.  These end up taking two forms: a) those which change server state and are sent directly to the server, and b) those which change local browser application state and can be processed without interaction with the server.  Examples of the former include run/stop state or channel number, and examples of the latter include graphing x and y limits, peak picking enable/disable, and markers on/off.

The server is more involved and is responsible for not only acquiring the data and processing it but also presenting it in JSON format to the client.  The following sequence diagram summarizes the key server components and interactions.

Summary Sequence Diagram of AdcHttpd Internals
There are only two autonomous lifelines here by design.  The first is the http server thread that interacts directly with the client and processes the http GET requests.  The second is the hardware model thread which drives processing of data and hardware interactions.  The only point the two threads interact is the hardware data model.  The interactions are designed to require minimal locking and interaction between the two autonomous threads.  This keeps things conceptually simple and decouples the browser client and its interactions and requests from the core hardware servicing and operations.  The hardware object model contains the current and desired operation parameters along with the most recent XY data to be presented in the browser application.

The hardware model thread checks various operating parameters based on their current value in the memory shared with the http thread.  If changes have occurred, the hardware model thread calls the appropriate device object methods to change the operating parameters.  Examples include which channel is being processed/exported by the hardware and the frequency of the downconversion local oscillator.  After a check on current parameters the hardware model thread invokes ProcessCoherentInterval() on one of the processing objects.  There are processing objects for each of the main types of processing: power spectral estimation, time series, and histogram.  The product of processing a coherent interval is a set of XY points which is updated in the hardware model (and can then be accessed by the remote client via the main http thread).  The hardware model thread continually loops on this processing until a reset is conducted.
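A minimal sketch of this arrangement is shown below, in Python for brevity (the actual server is C++).  Apart from the idea of a process-coherent-interval call, every class, method, and parameter name here is hypothetical, and the fake device and processor exist only to make the example runnable.

```python
import threading


class HardwareModel:
    """State shared by the http thread and the hardware model thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._desired = {"channel": 0, "lo_hz": 0.0}  # set by http requests
        self._current = {}                             # last values applied
        self._xy = []                                  # most recent results

    def request(self, name, value):
        """Called by the http thread when the browser asks for a change."""
        with self._lock:
            self._desired[name] = value

    def pending_changes(self):
        """Desired parameters that differ from what the device is using."""
        with self._lock:
            return {k: v for k, v in self._desired.items()
                    if self._current.get(k) != v}

    def commit(self, applied, xy):
        """Record the applied parameters and publish the new XY points."""
        with self._lock:
            self._current.update(applied)
            self._xy = list(xy)

    def latest_xy(self):
        """Called by the http thread to serve a data request."""
        with self._lock:
            return list(self._xy)


def model_step(model, device, processor):
    """One pass of the hardware model loop: apply changes, then process."""
    changes = model.pending_changes()
    for name, value in changes.items():
        device.set_parameter(name, value)      # device calls happen unlocked
    xy = processor.process_coherent_interval(device)
    model.commit(changes, xy)


def run_model(model, device, processor, stop_event):
    """Thread body: loop on model_step() until a reset/stop is requested."""
    while not stop_event.is_set():
        model_step(model, device, processor)


class FakeDevice:
    """Stand-in for a board object; records the parameters it is given."""

    def __init__(self):
        self.params = {}

    def set_parameter(self, name, value):
        self.params[name] = value


class FakeProcessor:
    """Stand-in processing object returning one point per interval."""

    def process_coherent_interval(self, device):
        return [(0.0, float(device.params.get("channel", -1)))]


model, device, processor = HardwareModel(), FakeDevice(), FakeProcessor()
model.request("channel", 2)          # as if a browser GET had arrived
model_step(model, device, processor)
print(device.params, model.latest_xy())
```

The lock is held only while copying small dictionaries and lists, never during device access or processing, which is one way to keep the interaction between the two threads minimal.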

Each of the processing objects deals with a coherent interval a little differently.  The key one is the power spectral estimation processing.  Since the data rates of the selected ADC channel can exceed the processing capability (and indeed the network transport capacity), a processing interval is treated as a flush of current ADC data followed by the acquisition of one or more sets of 2k samples.  The collected data is treated as a coherent set and, in the case of spectral estimation, has a window applied along with an FFT, magnitude, and normalization to produce the XY results.  In the case of low sample rate channels, 16k or more samples can be treated coherently since successive Get2k calls retrieve contiguously sampled data.  For streams with high bandwidth, the user simply has to avoid configuring power spectral estimates larger than 2k points (if you do, you just get phase discontinuities across 2k sample sets and see spectrum broadening).
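The flush, acquire, window, FFT, magnitude, and normalize sequence can be sketched as follows.  This is an illustrative Python version, not the C++/fftw implementation: the Hann window, the normalization by window gain, the 64 sample block size (standing in for 2k), the 200kHz sample rate, and the source object with flush() and get_block() are all assumptions made for the example.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum_db(iq, fs):
    """Hann window, FFT, magnitude, normalization; returns (Hz, dB) points."""
    n = len(iq)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    gain = sum(window)                       # so a full scale tone reads 0 dB
    spectrum = fft([s * w for s, w in zip(iq, window)])
    points = []
    for k, value in enumerate(spectrum):
        freq = (k if k < n // 2 else k - n) * fs / n
        mag = max(abs(value) / gain, 1e-12)  # floor avoids log10(0)
        points.append((freq, 20.0 * math.log10(mag)))
    return sorted(points)


def process_coherent_interval(source, sets, fs):
    """Flush stale data, then gather `sets` back-to-back blocks and process."""
    source.flush()
    iq = []
    for _ in range(sets):
        iq.extend(source.get_block())
    return power_spectrum_db(iq, fs)


class ToneSource:
    """Hypothetical ADC stand-in producing a contiguous complex tone."""

    def __init__(self, tone_hz, fs, block=64):
        self.step = 2j * math.pi * tone_hz / fs
        self.block = block
        self.index = 0

    def flush(self):
        pass                                 # nothing is queued in this fake

    def get_block(self):
        start, self.index = self.index, self.index + self.block
        return [cmath.exp(self.step * i) for i in range(start, self.index)]


xy = process_coherent_interval(ToneSource(25e3, 200e3), sets=2, fs=200e3)
peak = max(xy, key=lambda p: p[1])
print(peak)   # peak lands at 25 kHz, at essentially 0 dB
```

Because the fake source keeps its sample index running across get_block() calls, the two blocks are phase continuous and can be transformed as one 128 point set.  A source that dropped samples between blocks would show the spectral broadening described above.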

The device object encapsulates different boards.  There is a common set of C++ ADC and Mixer interfaces that each board is required to export.  The startup method for the device object selects which boards and interfacing software to use.  This can be done by configuration file, code changes, or reasoning on which device tree(s) are currently installed.

Tuesday, February 9, 2016

Prj 141 - DDC at 60MSPS (Part 8 - Final)

This is the final post walking through a digital down converter and ADC on a Beaglebone Black (Part1, Part2, Part3, Part4, Part5, Part6 and Part7).

All of the previous work was conducted with a 10MSPS clock and ADC.  The next step was to increase to 60MSPS.  The only changes required were updates to the DDC filters.  Due to the increased decimation some of the internal paths experienced bit growth and needed to be widened.  The internal test pattern generation capability within the fpga image allowed all of these changes to be worked through.

Everything worked fine until an actual signal was applied.  At that point I had problems with digital values being read twice.  This shifts the IQ values by one every now and then, which destroys the quadrature relationship.  The transfer rate across the IDC was 100k samples per second with each sample being 16 bits.  The transfer rate (or down converted bandwidth) was dropped below 50kSPS, which still did not fully rectify the problem.  Based on numerous experiments and trial and error, I think the single ground pin (seen in the schematics in this post) is causing problems with the SPI port voltage level sensing.  Using a single ground is a bit much to ask; I’m using the FPGA board in ways it was not designed for.  I believe that as the digital pins begin switching (changing ADC values) the current required through the ground pin increases.  A small residual resistance on the ground return with a large current can push the ground reference at the FPGA up.  This results in a lower voltage across the SPI digital inputs (i.e. between the IO pin input from the SPI clock or slave select and the FPGA ground reference).  The final confirmation of this was getting error free results with the internally generated test pattern with an open analog input, while the same setup began generating errors when a simple wire antenna was attached to the input.  I contemplated using a differential converter at the SPI headers but decided against it, as I suspected additional problems at this sample rate with a single ground through 0.1 inch headers.

This project was a great non-trivial introduction to VHDL and could be broken up into purchased pieces (XuLA2) and simple unique analog designs (ADC board).  At this point I'm going to have to contemplate my options for alternate approaches.

Prj141 Schematic
Prj141 Overview
Prj141 Digital Down Converter
Prj141 Digital Interface
Prj141 Software
Prj141 Filter Design
Prj141 Filter Evaluation
Prj141 LX9 Utilization
Prj141 Higher Sampling Rates

Thursday, February 4, 2016

Prj 141 - DDC LX9 Utilization (Part 7)

As this was my first FPGA I had no feeling for what would and would not fit within an LX9.  The design (see Part1, Part2, Part3, Part4, Part5 and Part6) takes roughly 40% of the slices in an LX9 with 17% of the slice registers and 28% of the slice LUTs being used.  A single DCM is used for internal logic, resulting in a 25% utilization, and only 4 of 16 DSP48s are used for a 25% DSP utilization.  A full report from ISE 14.7 is attached below.

Device Utilization Summary
Slice Logic Utilization   Used   Available   Utilization   Note(s)
Number of Slice Registers 2,037 11,440 17%
    Number used as Flip Flops 2,037
    Number used as Latches 0
    Number used as Latch-thrus 0
    Number used as AND/OR logics 0
Number of Slice LUTs 1,640 5,720 28%
    Number used as logic 1,355 5,720 23%
        Number using O6 output only 719
        Number using O5 output only 98
        Number using O5 and O6 538
        Number used as ROM 0
    Number used as Memory 126 1,440 8%
        Number used as Dual Port RAM 0
        Number used as Single Port RAM 0
        Number used as Shift Register 126
            Number using O6 output only 8
            Number using O5 output only 0
            Number using O5 and O6 118
    Number used exclusively as route-thrus 159
        Number with same-slice register load 151
        Number with same-slice carry load 8
        Number with other load 0
Number of occupied Slices 577 1,430 40%
Number of MUXCYs used 704 2,860 24%
Number of LUT Flip Flop pairs used 1,931
    Number with an unused Flip Flop 288 1,931 14%
    Number with an unused LUT 291 1,931 15%
    Number of fully used LUT-FF pairs 1,352 1,931 70%
    Number of unique control sets 66
    Number of slice register sites lost to control set restrictions 175 11,440 1%
Number of bonded IOBs 25 186 13%
    Number of LOCed IOBs 25 25 100%
Number of RAMB16BWERs 5 32 15%
Number of RAMB8BWERs 2 64 3%
Number of BUFIO2/BUFIO2_2CLKs 0 32 0%
Number of BUFIO2FB/BUFIO2FB_2CLKs 0 32 0%
Number of BUFG/BUFGMUXs 3 16 18%
    Number used as BUFGs 3
    Number used as BUFGMUX 0
Number of DCM/DCM_CLKGENs 1 4 25%
    Number used as DCMs 0
    Number used as DCM_CLKGENs 1
Number of ILOGIC2/ISERDES2s 0 200 0%
Number of IODELAY2/IODRP2/IODRP2_MCBs 0 200 0%
Number of OLOGIC2/OSERDES2s 0 200 0%
Number of BSCANs 0 4 0%
Number of BUFHs 0 128 0%
Number of BUFPLLs 0 8 0%
Number of BUFPLL_MCBs 0 4 0%
Number of DSP48A1s 4 16 25%
Number of ICAPs 0 1 0%
Number of MCBs 0 2 0%
Number of PCILOGICSEs 0 2 0%
Number of PLL_ADVs 0 2 0%
Number of PMVs 0 1 0%
Number of STARTUPs 0 1 0%
Number of SUSPEND_SYNCs 0 1 0%
Average Fanout of Non-Clock Nets 3.37
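
As a quick cross check of the headline numbers, the utilization column can be recomputed from the Used and Available columns.  The short Python sketch below does this for the rows quoted in the text; truncating to a whole percent (rather than rounding) is an inference from the report values, for example 2,037 of 11,440 is 17.8% but is listed as 17%.

```python
# (used, available) pairs copied from the ISE report above.
used_available = {
    "Slice Registers": (2037, 11440),
    "Slice LUTs": (1640, 5720),
    "Occupied Slices": (577, 1430),
    "DCM/DCM_CLKGENs": (1, 4),
    "DSP48A1s": (4, 16),
}


def utilization_percent(used, available):
    """Whole-number percentage, truncated to match the report's values."""
    return (100 * used) // available


for name, (used, available) in used_available.items():
    print(f"{name}: {utilization_percent(used, available)}%")
# Slice Registers: 17%
# Slice LUTs: 28%
# Occupied Slices: 40%
# DCM/DCM_CLKGENs: 25%
# DSP48A1s: 25%
```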


Wednesday, January 20, 2016

Prj 141 - DDC Filter Evaluation (Part 6)

The previous post walked through the design of the DDC filter and this note covers the evaluation and measurement of that filter.  The following are links for Part1, Part2, Part3, Part4 and Part5.

The DDC filter was evaluated by updating Xutil (here) to scan the filter response.  This is accomplished by using the built-in test pattern generator to generate DDC input data consisting of a 1.25MHz square wave [ 1/8th Fs ].  The DDS internal frequency was swept from this frequency up to 500kHz above it.  At each step 2k samples are collected, an FFT is done on them, and the peak is picked across the spectrum.  These values are normalized to the first response (at DC) and converted to dB.  The following figure shows those results compared to Octave calculations.
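
The peak pick and normalization step of this scan can be sketched as below.  This Python fragment is only an illustration of the bookkeeping: the measure function stands in for "collect 2k samples and FFT them" and returns made-up magnitudes, and all names are hypothetical rather than taken from Xutil.

```python
import math


def scan_response_db(offsets_hz, measure_magnitudes):
    """Peak-pick each sweep step and normalize to the first step, in dB."""
    peaks = [max(measure_magnitudes(f)) for f in offsets_hz]
    ref = peaks[0]                      # first response (at DC) is 0 dB
    return [(f, 20.0 * math.log10(p / ref)) for f, p in zip(offsets_hz, peaks)]


def fake_measure(offset_hz):
    """Stand-in spectrum: a peak that halves every 100 kHz over a small floor."""
    peak = 1000.0 * 0.5 ** (offset_hz / 100e3)
    return [1.0, peak, 2.0]


offsets = [0.0, 100e3, 200e3]
response = scan_response_db(offsets, fake_measure)
for f, db in response:
    print(f"{f / 1e3:6.1f} kHz  {db:7.2f} dB")
```

Note that because the largest bin is always taken, once the fundamental drops below some other spectral component the scan reports that component instead.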

Firmware 0603 DDC Filter Response Measured vs. Predicted.  See text for measurement technique used.
The measured values agree quite well with the predicted response.  The somewhat high noise floor is suspected to be due to the fact that peaks were picked in the response and the input is not a true tonal input but rather a square wave.  This would result in harmonics being present across multiple Nyquist regions, which may end up being selected as a peak in the cases where the fundamental response is quite low.  In addition there is quantization between the CIC and CFIR stage as well as at the CFIR output stage (all to signed 16 bits), and the input is only 12 bits.  Due to all of these I am not concerned about the stop band region.  The peak responses in the 2nd and 3rd Nyquist regions align well with the predicted response and indicate there are no missing factors of 2 or pi lying around anywhere in the modelling or implementation.

The same setup was used to evaluate ripple in the passband and is shown below.

Firmware 0603 DDC Filter Passband Ripple Evaluation Using FPGA Internal Test Pattern Generator.
Fc/R = 0.25 is the CFIR cutoff frequency.  The design backed this up to Fc/R=0.225 to provide some margin.  This is the source of the tail of the response prior to 50kHz.  

The final part of the evaluation is to use a real analog input rather than the internal test pattern generator.  Those results are shown below.

Firmware 0603 DDC Measured Response with Signal Swept Through Passband
The figure was generated by using an external analog input (DDS filtered through a 2x 10.7MHz ceramic filter).  This was stepped in 100Hz increments on the left and about 1kHz on the right.  The envelope history was enabled and produces the green line.  The subtle level variance is due to the analog input variance.  This agrees quite well with the predicted and test pattern generator measured response including the small side lobe in the 49kHz region.
