Reviewers' Questions/Comments and Designers' Answers

 

Feb 01, 2013 9:53 added drawing of layer stackup and high speed routes

http://www.staff.uni-mainz.de/uschaefe/browsable/L1Calo/Topo-Proto/review-stackup.pdf

 

 


Will be updated as questions come in.

 

Weiming:

1. Jitter cleaner chip Si5326 (Page11). The datasheet says "The input to output skew is not specified". This means that the jitter cleaned clock would have a random phase after a power cycle. Since the trigger system needs fixed latency, this random phase could be a problem. Si5326 does have "digitally-controlled output phase adjustment" capability, but an automatic phase calibration function is needed to ensure the fixed latency after a power cycle. Is there any thinking/solution on this problem?

The jitter-cleaned, multiplied clocks are used for MGT reference only. Inside the MGT they are again multiplied up and divided down, so we suspect the phase relationship is lost anyway. We therefore expect the data sent out on the fibre (to the CTP) to have an arbitrary phase offset of up to one usrclk2 tick after any reconfiguration of the FPGAs. If that assumption were wrong, or if any scheme were known to suppress that phase offset, we'd appreciate being told about it. Please note that baseline operation of the MGTs will be buffered mode (at low buffer depth), not phase-aligned, due to difficulties experienced with the phase-alignment scheme on Virtex-6 at high link count. "Buffer Bypass in Multi-Lane Auto Mode", available in GTH transceivers only, might be an interesting option for latency reduction with respect to the baseline, but that can be explored only once GTH devices are available. We are currently building a test adapter to explore the properties of the Si5326 before the Topo prototype goes into production.

 

I think a deterministic, fixed low latency is a must for a trigger system. I had thought the Virtex-7 MGT was capable of that; certainly we need more exploration in this area. I am currently testing the TI CDCE62005 PLL, which, according to its datasheet, has a sync input and can guarantee a fixed phase relationship with a precision of ~1 ns.

We will need further discussion. On the prototype a different cleaner could possibly be mounted on the extension mezzanine, if required (need to check).
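To put that "one usrclk2 tick" offset in perspective, here is a minimal sketch of the tick period. Both figures used are assumptions, not stated in the answer above: the 6.4 Gb/s baseline line rate and a 32-bit MGT fabric datapath.

```python
# Worst-case phase ambiguity after reconfiguration, expressed as one
# usrclk2 tick. Line rate and datapath width are ASSUMPTIONS:
# 6.4 Gb/s baseline rate, 32-bit fabric interface width.
line_rate_gbps = 6.4
datapath_bits = 32

usrclk2_mhz = line_rate_gbps * 1e3 / datapath_bits  # parallel-clock frequency
tick_ns = 1e3 / usrclk2_mhz                         # one usrclk2 period

print(f"usrclk2 = {usrclk2_mhz:.0f} MHz, one tick = {tick_ns:.1f} ns")
```

Under these assumptions one tick is 5 ns, i.e. well above the ~1 ns precision quoted for the CDCE62005 sync input, which is why the deterministic-latency question matters.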


2. TTCDec Mezzanine card (Page7). It seems assumed that TTCDec cards are available. Is this true? How many spares does L1Calo still have?

 

Thanks for pointing this out. At present we have just some old models plus the modules plugged onto the spare JEMs. I would suspect that availability of the TTCrx might be the real problem, since another run of PCB production and assembly would be possible, in particular if RAL were to provide the Gerber files. Since the TTCdec design exists and the modules work well, it would probably be pointless to design something new. Does anybody happen to know about availability?

 

CERN has some spare TTCrx chips, but those are only for the maintenance of existing systems. So probably you can only take some from L1Calo spares, if any; it's better to find out beforehand.

3. Connect the MGT clocks to the central quad of 3 quads (Page 8). Although the user guide for Virtex-7 transceivers allows this usage, the user guide for Virtex-6 transceivers forbids it for line rates above 2.8 Gbps because of additional jitter. I wonder if it would be better to route a reference clock to every MGT quad for best performance.

 

All quads are supplied with clocks, though on separate trees. The plan is to use crystal references on the receive part of the MGTs. Segmenting those trees allows for various reference clocks in case of differing bit rates in the various upstream modules. The baseline is jitter-cleaned LHC clock multiples for the transmitter references only. If there were any problems with that scheme on the prototype, we'd modify it before production.

4. Use local POL linear regulators for MGT link supplies (Page 18). I have used TI DC-DC switching power supplies for all the MGT links on the RAL High-Speed Demonstrator (HSD), and they work fine. I also measured the ripple noise on those DC-DC supplies to be less than ~5 mV, which is well within the requirement (10 mV) for MGT power supplies. So it is nice to have linear regulators, but not absolutely necessary.

That was an error in the documentation; we are using switching supplies. What regulator model have you been using?

 

5. Power supply (Page 18). The power-up sequence is not mentioned. It is worth checking that it meets the requirements of Virtex-7.

 

Thanks for pointing this out. We are sequencing the supplies; we have checked the documentation but should double-check.

6. Place coupling capacitors close to source (Page 19). The position of these capacitors doesn't matter; what matters is their parasitics. My advice is to use the smallest package you can for these capacitors. On the RAL HSD we used 0201 capacitors, and they work well up to 10 Gbps.

 

Thanks. It would be good if we could learn the details. Did larger components fail your test? Is any specific make of capacitor shown to be good/better?

 

7. Minimise crosstalk by running buses as widely spread as possible (Page 19). This is a very general statement and not measurable in practice. Some quantitative numbers are needed to ensure the crosstalk is under control in a high-density, high-speed design. For example, on the RAL HSD I required S/H > 5 for stripline differential traces to reduce the crosstalk below 1%.

We are always keen to learn, since we cannot simulate. We are using tightly coupled pairs on both the high-speed and parallel links. The coupled length between adjacent high-speed pairs is up to 6 cm; on a few links it is 10 cm. We are using striplines only; the distance to the planes is asymmetric, 100 µm and 140 µm. Trace width is 80 µm, in-pair distance is 140 µm, and pairs are separated by 400 µm trace edge to trace edge. However, we are looking into possible improvements. There are no single-ended aggressors on the same plane.

 

Crosstalk control: 400 µm separation between pairs is a bit on the low side; if possible, make it 700 µm (5 × 140 µm).
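The numbers in this exchange can be checked directly against the S/H > 5 rule quoted above. A minimal sketch, taking 140 µm as the reference dimension H per the "5 × 140 µm" recommendation; that choice is an assumption, since the stated plane spacing is asymmetric at 100/140 µm:

```python
# Compare the stated pair-to-pair separation with the reviewer's
# S/H > 5 crosstalk rule of thumb. H = 140 um is ASSUMED, following
# the "5 x 140 um" figure in the recommendation above.
h_um = 140                     # reference height H (assumed)
s_current_um = 400             # current edge-to-edge pair separation
s_recommended_um = 5 * h_um    # rule of thumb: S >= 5 * H

print(f"current S/H   = {s_current_um / h_um:.2f}")   # below the target of 5
print(f"recommended S = {s_recommended_um} um")
```

With these numbers the current layout sits at S/H ≈ 2.9, consistent with the reviewer's suggestion to widen the separation to 700 µm.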


8. Avoid in-pair skew (Page 19). I do not know if there are any long (say > 10 cm) high-speed links on L1Topo. If yes, you may want to consider rotating the design (say 22 degrees) with respect to the PCB panel during PCB manufacture to control the intra-pair differential skew. We did that for the RAL HSD with good results.

Thanks, we will talk to the manufacturer.


9. Avoid large numbers of vias perforating power and ground planes near critical components (Page 19). I do not know how this could possibly be done under a Virtex-7 near the MGTs. I think the more important thing is to design the vias properly for the required speed range. Always use the smallest, shortest vias (blind vias) that you can afford for MGT links if no simulation tool is used to aid the design.

You are right there. Please note we are using micro vias on the high-speed links. Is anything known about their suitability for the highest speeds?

 

What kind of size are you talking about? There shouldn't be a problem for a 10G design as long as it can be manufactured reliably.


10. Module control (page 13). Is there an Ethernet addressing scheme for L1Topo? Can it get its IP address from ATCA IPMI or what?

 

Data buses are available to transmit the required information. Alternatively, DHCP could be used.

11. In various places in the document, various speed ranges are mentioned: 6.4 Gbps / 10 Gbps / 13.1 Gbps. What exactly is the target? The test methods and pass/fail criteria (e.g. eye mask) are not mentioned at all. L1Topo is going to interface to a lot of external modules running MGTs from different series of Virtex FPGAs. Compatibility is extremely important.

 

The baseline is 6.4 Gbps; the module will need to run at that rate. The acceptance criterion is reliable operation at that speed. Since the MiniPODs are currently rated at 10G, we will not be able to do any meaningful tests at higher rates. It has not yet been decided which FPGA speed grade to buy for the production modules. We will in any case explore the accessible phase space once we have the prototype. That is particularly important since we are unfortunately not able to do detailed simulations. We intend to measure eye widths with the help of IBERT. If you have any suggestions for the criteria, please let us know.

As far as the acceptance criteria for the high-speed links are concerned, I'd recommend the SFF-8431 standard as a baseline.
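To turn "reliable operation" into a concrete test duration, the standard confidence-level formula N = -ln(1 - CL) / BER can be used. A sketch assuming the 1e-12 bit-error-rate target used by SFF-8431, 95% confidence, and the 6.4 Gbps baseline rate (all three figures are assumptions for illustration):

```python
import math

# Number of error-free bits needed to claim BER below a target at a
# given confidence level, and the run time at the baseline line rate.
# Target, confidence level, and rate are ASSUMPTIONS for illustration.
ber_target = 1e-12
confidence = 0.95
line_rate_bps = 6.4e9

bits_needed = -math.log(1 - confidence) / ber_target
test_seconds = bits_needed / line_rate_bps

print(f"error-free bits needed: {bits_needed:.2e}")
print(f"test time at 6.4 Gb/s : {test_seconds / 60:.1f} min")
```

Under these assumptions roughly 3e12 error-free bits are needed, i.e. just under eight minutes of IBERT running per lane, which is a useful lower bound when planning link scans.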

 

 

Sam:

1. CTP parallel output: In 1.2.1 it says that a mezzanine board will be required for parallel LVDS output to CTPcore. For the optical output, one ribbon per FPGA through the front panel is clearly overkill, but how many parallel LVDS outputs will be possible per FPGA if the CTP output goes through the mezzanine? This has some impact on our flexibility in assigning different algorithms to different FPGAs.

 

We run 22 pairs from each processor onto the mezzanine. There we can pick and route them into 32 lanes on the connector (at low cost if it needs to be redone). That's what was decided for the prototype. I wouldn't rule out minor improvements to the scheme for production.

 

2. In addition to Weiming's detailed comments on the power supply, I also wanted to flag that linear-regulator noise increases with load current, and that Texas Instruments recommends using as large an output capacitor as possible for this reason. I have already discussed this with Eduard, but it may be useful information for others as well.

 

As pointed out already: that was an error in the specs. We use switched converters.

 

3. This is more a comment than a question, but I have given some thought to the parallel LVDS lines between the two processor FPGAs. While the bandwidth is not enough to share ALL TOBs, a good goal is to be able to share all sorted TOBs, so that most algorithms can be implemented in either FPGA (i.e. wherever they fit).

 

In my back-of-the-envelope calculation, Phase-0 generic TOBs can be sent in 20 bits each (em/tau TOBs have 8 bits ET and 6 bits each for eta and phi; jets have 10 bits ET and 5 bits each for eta and phi). Phase-1 generic TOBs might have a little more resolution, let's say 24 bits each.

 

For 238 lines running at 960 Mbit/s each (the specification is 1 Gbit/s), this yields up to 285 Phase-0 generic TOBs, or 238 Phase-1 TOBs. I haven't talked about muon TOBs (which are smaller) or the missing-ET vector (just a few bits), but this seems sufficient to me, even in Phase-1.
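The back-of-the-envelope figures above are easy to reproduce. A sketch, assuming one payload transfer per 25 ns bunch crossing (40 MHz), which is how I read the calculation:

```python
# Reproduce the inter-FPGA TOB budget: 238 lines at 960 Mbit/s each,
# one payload per 40 MHz bunch crossing (ASSUMED transfer cadence).
lines = 238
line_rate = 960e6      # bits per second per line
bc_rate = 40e6         # bunch crossings per second

bits_per_bc = lines * line_rate / bc_rate  # 24 bits per line per BC
phase0_tobs = int(bits_per_bc // 20)       # 20-bit Phase-0 TOBs
phase1_tobs = int(bits_per_bc // 24)       # 24-bit Phase-1 TOBs

print(f"{bits_per_bc:.0f} bits/BC -> {phase0_tobs} Phase-0 or {phase1_tobs} Phase-1 TOBs")
```

This reproduces the 285 / 238 figures quoted above: 24 payload bits per line per bunch crossing, 5712 bits in total.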

 

Thanks.

 

4. In the implementation section, the board thickness is given as about 2 mm, with discussion of milling if necessary. ATCA supports board thicknesses up to 2.4 mm, so no milling should be necessary.

 

Again an error in the specs that escaped our attention…

 

5. In 3.1 (scalable design), the possibility of mounting five shrouds in Zone 3 is mentioned. If we are going to use commercial ATCA backplanes, I am not sure we can make room for that many. That said, 4 shrouds should be more than enough: even with only 48 fibres per shroud, that is 192 fibres.

 

Yes, our assumption at the time had been that we could mill down the shrouds and fit 5 rather than 4. We are no longer following that idea.

 

 6. In section 4.0, there is a paragraph on higher-level software and firmware development. Is it clear who will do these, and on what timescale?

 

No. Mainz is in charge of the infrastructure firmware (we are getting in contact with Krakow colleagues, who are looking into supporting us on the ROD firmware). Mainz is doing some of the algorithms. Software-wise, Mainz will attempt to cover L1Topo-specific coding (module services). However, there is the wide field of Ethernet-based control, plus database work, plus…

 

 7. (5.1) Is it decided that TTC will come in on the front panel? eFEX is looking at backplane-based distribution, but perhaps the L1Topo timescale is too short for implementing this.

 

Well, for the prototype that might be the only option, unless someone tells us immediately which backplane lane to route the alternative path to. Any suggestions?

 

Victor:

1) As far as I know (I might be wrong), Xilinx announced some time ago that it will stop supporting System ACE. If this is true, do you see any disadvantage for the long-term operation and maintenance of L1Topo?

2) The documentation mentions System ACE as the standard mode for FPGA configuration at power-up, and also presents alternative solutions for each group of FPGAs (processors, control). For the control FPGA, the proposed solution is a local SPI memory. With both methods implemented on the module (SysACE, SPI), how do you decide which entity configures the control FPGA at power-up? Or is the SPI intended to store a secondary bit-file version, whose download into the FPGA is triggered from software (via IPMC or MARS) at a different time?

 

3) Do you plan a reduction of the configuration schemes in future module iterations, especially if the SD card and the SPI memory do not provide/store different firmware versions? My personal feeling is that once a final board-level control scheme is adopted (IPMC or MARS), one could remove either System ACE or the SD & SPI, and eventually relax the hardware design a bit.

 

1-3: We will be using the SystemACE only initially, on the prototype; it is just easier for us to start with. We will narrow down the number of schemes before production: the unused parts will either just not be mounted on the PCB, or be removed from the layout if the real estate is required for any fixes.

SystemACE will always configure if it is mounted and a flash card is inserted. SPI configuration can be enabled or disabled via the CPLD. We are not anticipating frequent changes of scheme. To my understanding, a 2nd image in SPI memory should be possible if a sufficiently large device is chosen.

 

4) How large do you estimate the implementation on the Zynq to be, and the corresponding power consumption? The same for the processor FPGAs.

The Zynq sits on a commercial MARS module and we do not have any estimate other than the data sheets; consumption should be low. Power consumption of the processors cannot be estimated until the algorithms are final. We do expect considerable dissipation, and heat sinks will be mounted. Initial operation will be on the bench with active cooling. Since the L1Topo crate will be sparsely populated, exotic coolers in adjacent slots are not ruled out. If it eventually turned out that certain algorithms couldn't be run for reasons of dissipation, that would be unfortunate; however, there is nothing we could possibly do about that, except a redesign with a next-generation FPGA.

5) I guess that the final system for Phase-I will include a maximum of two L1Topo modules. Will these be mounted alone in a dedicated ATCA crate or will they share a crate with other modules (e.g. FEXs)? If the latter, would the height of the mezzanine cards be an issue in a high-density crate configuration? Is the crate airflow sufficient for optimal heat dissipation, or would a cooling body be needed for certain (hot) spots on the board (e.g. FPGAs)?

We are talking about 3 modules (see above); no mix with other modules is anticipated.

E: The FPGA power consumption depends to a large extent on the algorithms in place. A metallic radiator on top of the FPGA might be sufficient, but we can't state this for sure now. A local fan on the FPGA can optionally be put in place if needed; maybe a power plug can be provided for that purpose on the L1Topo production module.

6) The implementation of high-speed outputs in Spartan-6 is limited by a few annoying geographical constraints (i.e. only the top PLLs can drive high-speed clocks, and only in one bank). Is there any similar constraint in Virtex-7 that could restrict the implementation of the ~240 high-speed communication links between the two processor FPGAs?

Thanks for pointing this out. To the best of our knowledge, no. It is assumed that both global-clock-based and clock-forwarding schemes should be viable. We hope we can cross-check before production.

7) The documentation mentions that the real-time data are copied to the DAQ upon reception of the L1A signal. I did not find further text or a schematic description of this, but I guess the event data are going to be accumulated in the processor FPGAs and, upon arrival of the L1A, transferred to the control FPGA, which should send them to the DAQ in a format that has still to be defined. Is this scenario correct?

Latency buffers and de-randomizer buffers will be located on the processors. We are about to revise the baseline for the ROD functionality. There is enough bandwidth available out of the processor FPGAs to send the data. The data format would need to be defined soon, though it would be very difficult to predict what data might be required by specific (future) algorithms.

8) In section 4 (page 18) it is said that initial tests of the L1Topo receiving stage will employ the GOLD as a data source. I was wondering whether the same GOLD might not also be used as a receiver for data-integrity tests on the DAQ and ROI optical channels. No official DAQ/ROI format would be required, only playback vectors. I think such a setup would help spot any eventual misbehaviour already in the home lab, before performing system-level tests with other L1Calo modules in the CERN test rig.

Thanks, yes, we will definitely do initial simplified tests of that kind, though full tests will be possible at CERN only.

 

Dave:

i)      Power management.
My understanding is that this is the first module to actually go in an ATCA crate - GOLD was only ever on the bench.
I am therefore particularly concerned with the situation when only management power is on the board and payload power is off.
So there is no Zone 2 access and no main FPGAs, with the possibility of sneak paths between powered and unpowered devices. And which devices are on management power?
I'm really just looking for confirmation that you're on top of this area as it's not really addressed in the specification.
Thanks for pointing this out. We have looked into which components are powered from which source, and we will cross-check.
Regarding currents flowing from powered to unpowered signal pins, that's a bit tricky. We are using the LAPP IPMC and would hope that this issue is addressed there: no communication to a section unless it is powered. Failing that, we would have to use buffers on all the lines, which doesn't make a lot of sense. I think we have to design the prototype on the assumption that the IPMC is well designed; in case of problems, design modifications of either L1Topo or the IPMC would follow.

 
ii)     Module control.
You're looking at either the LAPP IPMC or the MARS bar as providing the IPMC functionality. Does this mean you also need to route the IPMB through the extension mezzanine? In case we want to run without the IPMC, we will have to insert a jumper module in the slot, bridging the lines to the mezzanine. However, that is right now considered a backup. Historically we moved from an FPGA-based IPMC (now ruled out) to the Zynq (not sure that scheme is suitable) to the IPMC DIMM. That is the consequence of various reviews, availability of information, and the evolution of standard solutions.
I'm assuming that the control FPGA is module control in the sense we used to think of it, rather than the very low-level IPMC control. So I'm assuming that it's on payload power? Correct.

iii)    R/O.
You are implying glink-style R/O to the L1Calo ROD. Is this feasible?
There is no way we can build more RODs, and there is a risk in using one of the spares, so this means using so-far-unused inputs on existing RODs. Has this been thought through? I would say it is far better to have local ROD functionality to S-Link. As you say, the hardware does allow for this; it's just that I think this is really the only solution!
The current baseline is an embedded ROD with S-Link via MiniPOD. We are just trying to get the effort sorted out.

 

iv)     and finally…
Still using SystemACE??? Yes, initially. It is still better than most alternatives; NOR flash is a pain in the … due to write latency.