
 

Summary

This document summarises requirements, design changes made since the FDR, production issues, and plans, as well as the test measurements performed so far. The summary will be most useful to readers already acquainted with the L1Topo prototype. For details see the other documents provided for the PRR, in particular the L1Topo specification document, which has been updated with respect to the version reviewed at the FDR.

Requirements

There are a few hard requirements against which the L1Topo module needs to be assessed. The module has to meet the target data rates at its external interfaces: 6.4 Gb/s (TDR baseline) on the optical real-time I/O (MGT links), 160 MB/s standard RO-link operation out to the DAQ and ROI (bi-directional), and 80 Mb/s on the electrical LVDS links into the CTP. These links have been tested successfully, and tests are ongoing; see the summary below and the separate documents.

Total link bandwidth and logic resource availability are fixed by the components chosen (the largest devices on the market at the time of the detailed design work) and cannot be changed. The availability of resources was checked against the needs of the physics algorithms: the algorithms fit easily into the resources available on a total of two L1Topo modules, i.e. four processor chips. The detailed assessment is documented in a separate paper.

The ATLAS latency envelope for Phase 1 (documented in EDMS as ATL-DA-ES-0059) includes requirements to be met by L1Topo from 2015. For the total MGT latency (RX and TX), four LHC bunch ticks are available; measurements have shown that this requirement is met, and the results are consistent with the manufacturer’s datasheets. Deserialization into the LHC bunch clock domain takes one bunch tick, as required by the envelope document. The algorithmic latency is required to be five bunch ticks or less; all algorithms projected and implemented so far have been kept to three bunch ticks or below. CRC checksum decoding (0.25 bunch ticks) had not been accounted for in the EDMS document, but it is easily covered by the spare algorithmic latency. See the separate document on the menu and algorithms.
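
To illustrate how these figures fit together, the quoted numbers can be combined into a simple bunch-tick budget (a sketch only, using the nominal 25 ns bunch-crossing period; the authoritative allocation remains ATL-DA-ES-0059):

    # Illustrative L1Topo latency budget, in LHC bunch ticks (1 tick = 25 ns nominal).
    # Figures are those quoted above; ATL-DA-ES-0059 remains the reference.
    TICK_NS = 25.0

    mgt_rx_tx_allowed  = 4.0    # total MGT (RX + TX) latency, measured to be met
    deserialisation    = 1.0    # transfer into the LHC bunch clock domain
    algorithms_allowed = 5.0    # algorithmic latency allowed by the envelope
    algorithms_used    = 3.0    # all algorithms implemented so far
    crc_decoding       = 0.25   # not accounted for in the envelope document

    spare = algorithms_allowed - algorithms_used
    print(f"spare algorithmic latency: {spare} ticks ({spare * TICK_NS:.0f} ns)")
    print(f"CRC decoding needs {crc_decoding} ticks, well within the spare {spare} ticks")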

 

Real-time I/O mapping

The baseline fibre scheme supplies each of the four processor FPGAs (two modules with two FPGAs each) with a separate copy of the input data, as required. This avoids data re-transmission on L1Topo and the associated latency penalties.

The cluster and tau CMXs generate four copies of all TOBs, six fibres’ worth of data each. The jet CMXs generate three copies of all TOBs, eight fibres’ worth of data each; to supply all four topology processor FPGAs, one copy has to be split optically. The energy-sum data per CMX consist of four copies of two fibres each. The muon signals run on three fibres, duplicated at source and split two-fold with optical fibre splitters.

Electrical output to the CTP is accomplished by 32 differential (LVDS) lanes from each L1Topo module. At the baseline rate of 80 Mb/s this allows for 64 trigger/overflow bits per module. Optical output at 6.4 Gb/s will allow for 128 trigger bits per fibre and is limited by CTP capabilities only.
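
The quoted bit counts follow directly from the lane counts and rates; a minimal arithmetic sketch, assuming the nominal 40 MHz bunch-crossing rate and, for the optical case, 8b/10b line encoding (an assumption consistent with the 128 bits per fibre quoted above):

    # Trigger-bit capacity per bunch crossing, from the rates quoted above.
    BC_RATE_HZ = 40e6   # nominal bunch-crossing rate

    lvds_lanes      = 32
    lvds_rate_bps   = 80e6
    bits_per_module = lvds_lanes * lvds_rate_bps / BC_RATE_HZ
    print(f"electrical CTP output: {bits_per_module:.0f} bits per module per BC")   # 64

    optical_rate_bps = 6.4e9
    payload_fraction = 8 / 10   # 8b/10b line encoding (assumption)
    bits_per_fibre   = optical_rate_bps * payload_fraction / BC_RATE_HZ
    print(f"optical CTP output: {bits_per_fibre:.0f} bits per fibre per BC")        # 128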

 

Module production

There were serious issues during the initial production of the L1Topo prototype, which delayed module production considerably. The causes have been investigated. The PCB production process itself had not run smoothly and was incomplete: the micro-vias had not been filled, which led to faulty solder joints in the assembly process. The module had to be reworked, and it took the assembly company one month to deliver a partially functional module. The resulting connectivity issues led to further significant delays in the lab tests.

The second round of PCB production (same manufacturer) was successful, with a production time of 25 working days. For module assembly a different company (prodesign) was chosen, and assembly was accomplished within three working days. No faults have been observed on the assembled module.

For the production modules, prodesign will act as a one-stop shop, with a total production time of 25 working days (PCB and assembly).

 

Module tests

The production modules will be subjected to rigorous tests, as reported for the prototype. After initial “smoke tests” the modules will be boundary-scanned, the high-speed links will be iBERT-tested for data eye width and bit error rate, and then full module tests with production firmware and final data formats will follow. The latter tests, covering all high-speed interfaces (including DAQ/ROI), will take place at CERN (bldg. 104 and P1). Note that for the production modules the initial link tests will be performed at 12.8 Gb/s (the phase-1 jFEX rate), whereas the full module tests can be done only at the pre-phase-1 target bit rate of 6.4 Gb/s.

 

Design changes

There have been several reasons for changes to the L1Topo design on its evolution towards the production module; several were raised by the reviewers at the FDR. Overall, the L1Topo design team has tried to keep the design modifications to a minimum.

The MGT clocking scheme has been amended so that each MGT quad can be driven by each of the MGT clocks. Signal integrity can thus be guaranteed up to the highest bit rates, since internal quad crossing of reference clocks is avoided. On the other hand, this modification has reduced the number of independent clocks, since there are only two separate clock paths from the pins into each quad. The compromise found is a total of four independent clock trees; two sub-trees each can be joined via low-jitter CML logic devices, so that in the default configuration one crystal clock and one low-jitter LHC bunch clock are routed directly into each quad. The plethora of LHC clock multiples anticipated for the prototype is no longer considered necessary, since the successful operation of the L1Topo prototype at 12.8 Gb/s suggests this rate as a straightforward upgrade path after Run 2. This bit rate can coexist with the 6.4 Gb/s target rate on the same clock tree.

For the global clocks into the FPGA fabric, provision was made for the connectivity needed to insert jitter-reduction hardware, should that ever be required.

The SystemACE configuration scheme had been questioned on grounds of component obsolescence. The L1Topo design team has decided to stick with SystemACE for Run 2, but to move the configurator onto a separate mezzanine. The mezzanine carries all connectivity needed to replace the SystemACE with either a JTAG-based or serial (CCLK/DIN) configurator. The SPI memory is kept on the mainboard, as documented for the prototype.

As presented by the L1Topo designers at the FDR, phase-1 compatibility regarding clocking and readout (DAQ/ROI) is achieved by routing the ATCA base interface (2×4 lanes) and the fabric interface (2×8 lanes) onto the extension mezzanine, which will be replaced if an upgrade to the phase-1 L1Calo scheme is required. The L1Calo clock/readout scheme is described in the PDR documents that have been or will be written for the eFEX, jFEX, Hub, and ROD modules. The now-confirmed phase-1 compatibility scheme has allowed the removal of some spare readout/clock circuitry present on the prototype (SFPs). The use of up to 12 channels of optical S-Link output from the embedded ROD (default is the 160 MB/s standard) is not affected by this design modification.

The Zynq-based Mars module was already considered unnecessary at the time of the FDR and has consequently been removed.

Several issues came up during the L1Topo prototype tests, most of them very minor. Some additional very minor changes were made for convenience only.

Due to a design error, a bank of the wrong type (HR rather than HP) had been used for the LVDS links between the Virtex processors and the Kintex control FPGA. This affected the maximum data rate on the ROD links (pre-phase-1 embedded ROD). The error has been corrected for the production module.

The power regulators for the MGT supply voltages were suspected of having too little margin for operation at the highest speed on all channels concurrently, and the noise on the power distribution (ripple from the switching regulator) was also found to need improvement. A power-distribution mezzanine was built for test purposes and run successfully on the prototype; this scheme is now being moved onto the production mainboard. The main power brick pin-out was modified.

Bandwidth between the processor FPGAs and the control FPGA has been increased by routing spare MGT transmitter outputs onto the extension module, where spare bandwidth into the Kintex control FPGA is available. That will improve connectivity on the ROD and control (IPBus) paths.

Minor modifications have been applied to the I2C-based environmental monitoring to ease the IPMC firmware design. Minor changes were made to the optical TTC input, and serial numbering was added.

 

Prototype tests

The prototype modules have been used to test the performance of the high-speed communication channels, up to data rates of 12.8 Gb/s, in a realistic environment with realistic fibre lengths and configurations. The first L1Topo prototype is equipped with XC7VX485T-2 FPGAs; this device type supports data rates (in terms of LHC clock multiples) of up to 10.2 Gb/s, and the module has been operated successfully up to these rates in standalone link tests. The results are documented elsewhere. System tests with the muon subsystem and the L1Calo CMX modules are currently being conducted at CERN at the target rate of 6.4 Gb/s; first results from these measurements will follow.

The second L1Topo prototype is currently under test in the home lab. The XC7VX690T-3 device mounted on the board supports (in terms of LHC bunch clock multiples) data rates of up to 12.8 Gb/s. Together with a suitable speed grade of the attached Avago MiniPOD devices, the circuit has been tested successfully up to this rate. Standalone link tests have been and will continue to be conducted up to these rates, to allow for some headroom and for a possible link-speed upgrade after ATLAS Run 2.
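
For orientation, the rates quoted here correspond to simple multiples of the LHC bunch clock, which is what “in terms of LHC clock multiples” refers to. A small sketch, using the nominal 40 MHz clock rather than the exact ~40.08 MHz bunch clock:

    # MGT bit rates expressed as multiples of the nominal 40 MHz LHC bunch clock.
    BUNCH_CLOCK_HZ = 40e6

    for multiple in (160, 256, 320):
        rate_gbps = multiple * BUNCH_CLOCK_HZ / 1e9
        print(f"{multiple:3d} x bunch clock = {rate_gbps:5.2f} Gb/s")
    # 160 -> 6.40 Gb/s (TDR baseline), 256 -> 10.24 Gb/s (XC7VX485T-2 limit quoted
    # as 10.2 Gb/s above), 320 -> 12.80 Gb/s (XC7VX690T-3 / phase-1 jFEX rate)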

So far, bit error rates at 12.8 Gb/s have been probed down to the 10⁻¹⁶ level, and not a single error has been observed. Power dissipation tests are to be continued; initial results suggest a power consumption consistent with the Xilinx power estimator spreadsheet. Approximately 10 A of MGT supply current is expected, with a maximum of 16 A available from the regulators; the current limit for the Vccint regulator is 20 A. It should be noted that these estimates and measurements were made at bit rates of 12.8 Gb/s, a regime well beyond the TDR baseline that might only be reached by the time of the Phase-1 or Phase-2 upgrade.
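
As an indication of the test time such a limit implies, probing bit error rates down to 10⁻¹⁶ requires on the order of 10¹⁶ transmitted bits per channel. A rough sketch of the arithmetic (error-free observation time only, without confidence-level statistics):

    # Rough estimate of the observation time needed to probe a given BER level
    # with zero observed errors (no confidence-level statistics applied).
    line_rate_bps = 12.8e9
    target_ber    = 1e-16

    bits_needed = 1.0 / target_ber
    seconds     = bits_needed / line_rate_bps
    print(f"{bits_needed:.0e} bits at {line_rate_bps/1e9:.1f} Gb/s "
          f"take about {seconds/86400:.1f} days per channel")   # roughly 9 days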

 

Change log

May 01, 2014      updated BER @ 12.8Gb/s
May 02, 2014      updated power figures
May 05, 2014      added section on real-time input mapping
May 06, 2014      added CTP signals