06.05.2014
Summary
This document summarises requirements, design changes made since the FDR, production issues, and plans, as well as the test measurements made so far. The summary is aimed particularly at readers already acquainted with the L1Topo prototype. For details see the other documents provided for the PRR, in particular the L1Topo specifications document, which has been updated with respect to the FDR-reviewed version.
Requirements
There are a few hard requirements against which the L1Topo module needs to be assessed. The module has to meet the target data rates at its external interfaces: 6.4 Gb/s (TDR baseline) on the optical real-time I/O (MGT links), 160 MB/s standard RO-link operation towards DAQ and ROI (bi-directional), and 80 Mb/s on the electrical LVDS links into the CTP. The links have been tested successfully (testing is ongoing); see the summary below and the separate documents.
Total link bandwidth and logic resource availability are fixed by the components chosen (the largest devices on the market at the time of the detailed design work) and cannot be changed. The availability of resources was checked against the needs of the physics algorithms; the algorithms easily fit into the resources available on a total of two L1Topo modules, i.e. four processor chips. The detailed assessment is documented in a separate paper.
The ATLAS latency envelope for Phase 1 (documented in EDMS as ATL-DA-ES-0059) includes requirements to be met by L1Topo from 2015. For the total MGT latency (RX and TX), 4 LHC bunch ticks are available. Measurements have shown that this requirement is met, and the results are consistent with the manufacturer’s datasheets. Deserialization into the LHC bunch clock domain takes one bunch tick, as required by the envelope document. The algorithmic latency is required to be five bunch ticks or lower; all algorithms projected and implemented so far have been kept to three bunch ticks or below. CRC checksum decoding (0.25 bunch ticks) had not been accounted for in the EDMS document, but it is easily covered by the spare algorithmic latency. See the separate document on menu and algorithms.
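For illustration, the latency figures above can be put side by side in a short bookkeeping sketch (Python; all values in LHC bunch ticks and taken directly from this section, the margin is merely spelled out):

    # Latency figures quoted above, in LHC bunch ticks (25 ns each)
    MGT_BUDGET       = 4.0    # total MGT latency (RX and TX), met per measurement
    DESER_BUDGET     = 1.0    # deserialization into the LHC bunch clock domain
    ALGO_BUDGET      = 5.0    # algorithmic latency allowed by the envelope
    ALGO_IMPLEMENTED = 3.0    # worst case of the algorithms implemented so far
    CRC_DECODING     = 0.25   # CRC checksum decoding, not in the EDMS envelope

    spare = ALGO_BUDGET - ALGO_IMPLEMENTED - CRC_DECODING
    print(f"spare algorithmic latency: {spare} bunch ticks")   # 1.75

The 0.25 bunch ticks of CRC decoding are thus comfortably absorbed by the two bunch ticks of algorithmic headroom.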
Real-time I/O mapping
The baseline fibre scheme relies on supplying all four processor FPGAs (two modules, two FPGAs each) with separate copies of the input data, as required. This avoids data re-transmission on L1Topo and the associated latency penalties.
The cluster and tau CMXes generate four copies of all TOBs, each copy comprising six fibres. The jet CMXes generate three copies of all TOBs, each copy comprising eight fibres; to supply all four topo processor FPGAs, one of these copies has to be split optically. The energy-sum data per CMX consist of four copies of two fibres each. The Muon signals run on three fibres, duplicated at source and split two-fold with optical fibre splitters.
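A minimal bookkeeping sketch (Python, for illustration only; the copy and fibre counts are exactly those stated above) makes explicit where an optical split is needed to serve all four processor FPGAs:

    # (copies generated at source, fibres per copy) for each real-time input
    sources = {
        "cluster/tau CMX": (4, 6),
        "jet CMX":         (3, 8),
        "energy-sum CMX":  (4, 2),
        "muon":            (2, 3),   # duplicated at source, then split two-fold optically
    }

    N_PROCESSOR_FPGAS = 4            # two L1Topo modules, two FPGAs each

    for name, (copies, fibres) in sources.items():
        need_split = copies < N_PROCESSOR_FPGAS
        print(f"{name}: {copies} copies x {fibres} fibres"
              + (" -> optical split required" if need_split else ""))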
Electrical output to the CTP is provided by 32 differential (LVDS) lanes from each L1Topo module. At the baseline rate of 80 Mb/s this allows for 64 trigger/overflow bits per module. Optical output at 6.4 Gb/s will allow for 128 trigger bits per fibre and is limited only by the capabilities of the CTP.
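The bit counts quoted above follow directly from the line rates and the bunch-crossing frequency. A short worked example (assuming the nominal 40 MHz bunch clock and, for the optical case, 8b/10b encoding, which is consistent with the 128-bit figure):

    BC_RATE = 40e6                               # nominal LHC bunch-crossing rate, Hz

    # Electrical LVDS output: 32 lanes at 80 Mb/s each
    lvds_bits_per_bc = 32 * 80e6 / BC_RATE       # 64 trigger/overflow bits per module

    # Optical output: 6.4 Gb/s line rate, 8b/10b encoded (80% payload)
    optical_bits_per_bc = 6.4e9 * 0.8 / BC_RATE  # 128 trigger bits per fibre

    print(lvds_bits_per_bc, optical_bits_per_bc) # 64.0 128.0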
Module production
There had been massive issues during the initial production of the L1Topo prototype, and the module production was delayed considerably. The causes have been investigated. The PCB production process itself had not run smoothly and had been left incomplete: the micro-vias had not been filled, which led to faulty solder joints in the assembly process. The module had to undergo rework, and it took the assembly company one month to deliver a partially functional module. The connectivity issues led to further significant delays in the lab tests.
The second round of PCB production (same manufacturer) was successful, with a production time of 25 working days. For module assembly a company different from the initial one was chosen (prodesign), and assembly was accomplished within three working days. No faults have been observed on the assembled module.
For the production modules, prodesign will act as a one-stop shop, with a total production time of 25 working days (PCB and assembly).
Module tests
The production modules will be subjected to rigorous tests, as reported for the prototype. After initial “smoke tests” the modules will be boundary-scanned, the high-speed links will be iBERT-tested for data eye width and bit error rate, and then full module tests with production firmware and final data formats will follow. The latter tests, covering all high-speed interfaces (including DAQ/ROI), will take place at CERN (bldg. 104 and P1). Note that for the production modules the initial link tests will be performed at 12.8 Gb/s (the Phase-1 jFEX rate), whereas the full module tests can only be done at the pre-Phase-1 target bit rate of 6.4 Gb/s.
Design changes
There have been several reasons for changes to the L1Topo design on its evolution towards the production module; several were raised by the reviewers at the FDR. Overall, the L1Topo design team has tried to keep the design modifications to a minimum.
The MGT clocking scheme has been amended so that each MGT quad can be driven by each of the MGT clocks. Signal integrity can thus be guaranteed up to the highest bit rates, since internal quad crossing is avoided. On the other hand, this modification has reduced the number of independent clocks, since there are only two separate clock paths from the pins into each quad. The compromise found is a total of four independent clock trees; two sub-trees each can be joined via low-jitter CML logic devices, so that in the default configuration one crystal clock and one low-jitter LHC bunch clock run directly into each quad. The plethora of LHC clock multiples anticipated for the prototype is no longer considered necessary, since the successful operation of the L1Topo prototype at 12.8 Gb/s suggests this rate as a straightforward upgrade path after Run 2. This bit rate can coexist with the 6.4 Gb/s target rate on the same clock tree.
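That the two rates can share a clock tree is plausible because 12.8 Gb/s is exactly twice 6.4 Gb/s, so the same reference clock can serve both. A small illustration (assuming the nominal LHC bunch-clock frequency of 40.08 MHz, the quoted rates being rounded values):

    BUNCH_CLOCK = 40.08e6                        # nominal LHC bunch-clock frequency, Hz

    for multiple in (160, 320):                  # 160x ~ 6.4 Gb/s, 320x ~ 12.8 Gb/s
        line_rate = multiple * BUNCH_CLOCK
        print(f"{multiple} x bunch clock = {line_rate / 1e9:.2f} Gb/s")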
For the global clocks into the FPGA fabric, provision has been made for the connectivity required to insert jitter-reduction hardware, should that ever be needed.
The SystemACE configuration scheme had been questioned because of component obsolescence. The L1Topo design team has decided to stick with SystemACE for Run 2, but to move the configurator onto a separate mezzanine. The mezzanine carries all the connectivity needed to replace the SystemACE with either a JTAG-based or a serial (CCLK/DIN) based configurator. The SPI memory is kept on the mainboard as documented for the prototype.
As presented by the L1Topo designers at the FDR, Phase-1 compatibility regarding clocking and readout (DAQ/ROI) is achieved by routing the ATCA base interface (2×4 lanes) and the fabric interface (2×8 lanes) onto the extension mezzanine, which will be replaced if an upgrade to the Phase-1 L1Calo scheme is required. The L1Calo clock/readout scheme is described in the PDR documents that have been or will be written for the eFEX, jFEX, hub, and ROD modules. The now confirmed Phase-1 compatibility scheme has allowed the removal of some spare readout/clock circuitry present on the prototype (SFPs). The use of up to 12 optical-fibre channels of S-Link output from the embedded ROD (the default is the standard 160 MB/s) is not affected by this design modification.
The Zynq-based Mars module was already considered unnecessary at the time of the FDR and has consequently been removed.
Several issues have come up during the L1Topo prototype tests, most of them very minor. Some additional minor changes were made for convenience only.
Due to a design error, a bank of the wrong type (HR rather than HP) had been used for the LVDS links between the Virtex processors and the Kintex control FPGA. This affected the maximum data rate on the ROD links (pre-Phase-1 embedded ROD). The error has been corrected for the production module.
The power regulators for the MGT supply voltages were suspected of having too little margin for operation at the highest speed on all channels concurrently. In addition, the noise on the power distribution (ripple from the switcher) was found to be in need of improvement. A power-distribution mezzanine was built for test purposes and successfully run on the prototype; the scheme is now being moved onto the production mainboard. The main power brick pin-out was modified.
Bandwidth between the processor FPGAs and the control FPGA has been increased by routing spare MGT transmitter outputs onto the extension module, where spare bandwidth into the Kintex control FPGA is available. This improves connectivity on the ROD and control (IPBus) paths.
Minor modifications have been applied to the I2C-based environmental monitoring so as to ease the IPMC firmware design. Minor changes were made to the optical TTC input, and serial numbering was added.
Prototype tests
The prototype modules have been used to test the performance of the high-speed communication channels up to data rates of 12.8 Gb/s in a realistic environment, with realistic fibre lengths and configurations. The first L1Topo prototype is equipped with XC7VX485T-2 FPGAs. This device type supports data rates (in terms of LHC clock multiples) of up to 10.2 Gb/s. The module has been operated successfully up to these rates in standalone link tests; the results are documented elsewhere. System tests are currently being conducted at CERN with the Muon subsystem and the L1Calo CMX modules at the target rate of 6.4 Gb/s. First results from these measurements will follow.
The second L1Topo prototype is currently under test in the home lab. The XC7VX690T-3 device mounted on the board supports (in terms of LHC bunch clock multiples) rates of up to 12.8 Gb/s. Together with a suitable speed grade of the attached Avago MiniPOD devices, the circuit has been successfully tested up to this rate. Standalone link tests have been and will continue to be conducted up to these rates, to allow for some headroom and for a possible link-speed upgrade after ATLAS Run 2.
So far, bit error rates at 12.8 Gb/s have been measured down to 10^-16, and not a single error has been observed. Power dissipation tests are to be continued; initial results suggest a power consumption consistent with the Xilinx power estimator spreadsheet. Approximately 10 A of MGT supply current are expected, with a maximum of 16 A available from the regulators; the current limit for the Vccint regulator is 20 A. It should be noted that the estimates and measurements were made at a bit rate of 12.8 Gb/s, a regime well beyond the TDR baseline that might only be reached by the time of the Phase-1 or Phase-2 upgrade.
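For context, probing bit error rates at the 10^-16 level at 12.8 Gb/s requires link runs of the order of weeks; the relation is sketched below (illustration only, not a description of the actual test campaign; the factor of about 3 for zero observed errors at 95% confidence is a standard Poisson rule of thumb):

    LINE_RATE  = 12.8e9        # bit rate, b/s
    TARGET_BER = 1e-16

    # Bits (and run time) needed so that zero observed errors implies
    # BER < TARGET_BER at ~95% confidence.
    bits_needed = 3.0 / TARGET_BER
    seconds     = bits_needed / LINE_RATE
    print(f"{bits_needed:.1e} bits ~ {seconds / 86400:.1f} days")   # ~27 days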
Change log
May 01, 2014  updated BER @ 12.8 Gb/s
May 02, 2014  updated power figures
May 05, 2014  added section on real-time input mapping
May 06, 2014  added CTP signals