Reviewers' Questions/Comments and Designers' Answers
Feb 01, 2013 9:53: added drawing of layer stackup and high-speed routes:
http://www.staff.uni-mainz.de/uschaefe/browsable/L1Calo/Topo-Proto/review-stackup.pdf
This document will be updated as questions come in.
Weiming:
1. Jitter cleaner chip Si5326 (Page 11). The datasheet says "The input to output skew is not specified". This means that the jitter-cleaned clock would have a random phase after a power cycle. Since the trigger system needs fixed latency, this random phase could be a problem. The Si5326 does have a "digitally-controlled output phase adjustment" capability, but an automatic phase calibration function would be needed to ensure fixed latency after a power cycle. Is there any thinking/solution on this problem?
The jitter-cleaned, multiplied clocks are used for the MGT reference only. Inside the MGT they are again multiplied up and divided down. It can be suspected that the phase relationship is therefore lost anyway. We expect the data sent out on the fibre (to the CTP) to have an arbitrary phase offset of up to one usrclk2 tick after any reconfiguration of the FPGAs. If that assumption were wrong, or if any scheme were known to suppress that phase offset, we would appreciate being told about it. Please note that the baseline operation of the MGTs will be buffered mode (at low buffer depth), not phase aligned, due to difficulties experienced with the phase alignment scheme on Virtex-6 at high link count. "Buffer Bypass in Multi-Lane Auto Mode", available in GTH transceivers only, might be an interesting option for latency reduction with respect to the baseline, but that can be explored only once GTH devices are available. We are currently building a test adapter to explore the properties of the Si5326 before the Topo prototype goes into production.
I think deterministic, fixed, low latency is a must for a trigger system. I had thought the Virtex-7 MGTs are capable of that; certainly we need more exploration in this area. I am currently testing the TI CDCE62005 PLL, which, according to its datasheet, has a sync input and can guarantee a fixed phase relationship with a precision of ~1 ns.
We will need further discussion. On the prototype a different jitter cleaner might possibly be mounted on the extension mezzanine, if required (to be checked).
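For scale, here is a minimal sketch of how large the "one usrclk2 tick" phase uncertainty mentioned above could be. The 8b/10b encoding and the 32-bit fabric interface width used here are illustrative assumptions, not values taken from the specification.

```python
# Rough magnitude of "one usrclk2 tick" of phase uncertainty after reconfiguration.
# Assumptions (illustrative only, not from the specification): 6.4 Gb/s line rate,
# 8b/10b encoding, 32-bit fabric (usrclk2) interface width.
line_rate_bps = 6.4e9
fabric_width_bits = 32
line_bits_per_word = fabric_width_bits * 10 // 8   # 8b/10b: 40 line bits per 32 payload bits
usrclk2_hz = line_rate_bps / line_bits_per_word
tick_ns = 1e9 / usrclk2_hz
print(f"usrclk2 ~ {usrclk2_hz/1e6:.0f} MHz, one tick ~ {tick_ns:.2f} ns")
# -> usrclk2 ~ 160 MHz, one tick ~ 6.25 ns, a sizeable fraction of one bunch crossing (25 ns)
```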
2. TTCdec mezzanine card (Page 7). It seems to be assumed that TTCdec cards are available. Is this true? How many spares does L1Calo still have?
Thanks for pointing this out. At present we have just some old models plus the modules plugged onto the spare JEMs. I would suspect that the availability of the TTCrx chip might be the real problem, since another run of PCB production and assembly would be possible, in particular if RAL were to provide the Gerber files. Since the TTCdec design exists and the modules work well, it would probably be pointless to design something new. Does anybody happen to know about availability?
CERN has some spare TTCrx chips, but those are reserved for the maintenance of existing systems. So you can probably only take some from the L1Calo spares, if there are any. It is better to find out beforehand.
3. Connecting the MGT clocks to the central quad of 3 quads (Page 8). Although the user guide for Virtex-7 transceivers allows this usage, the user guide for Virtex-6 transceivers forbids it for line rates above 2.8 Gbps because of additional jitter. I wonder whether it would be better to route a reference clock to every MGT quad for best performance.
All quads are supplied with clocks, though on separate trees. The plan is to use crystal references for the receive side of the MGTs. Segmenting those trees allows for different reference clocks in case of differing bit rates in the various upstream modules. The baseline is jitter-cleaned LHC clock multiples for the transmitter references only. If there were any problems with that scheme on the prototype, we would modify it before production.
4. Use local POL linear regulators for MGT link supplies (Page 18). I have used TI DC-DC switching power supplies for all the MGT links on the RAL High-Speed Demonstrator (HSD), and they work fine. I also measured the ripple noise on those DC-DC supplies to be less than ~5 mV, which is well within the requirement (10 mV) for MGT power supplies. So it is nice to have linear regulators, but not absolutely necessary.
That is an error in the documentation; we are using switching supplies. What regulator model have you been using?
5. Power supply (Page 18). The power-up sequence is not mentioned. It is worth checking that it meets the requirements of Virtex-7.
Thanks for pointing this out. We are sequencing the supplies; we have checked the documentation, but should double check.
6. Place coupling capacitors close to the source (Page 19). The position of these capacitors doesn't matter; what matters is their parasitics. My advice is to use the smallest package you can for these capacitors. On the RAL HSD we used 0201 capacitors and they work well up to 10 Gbps.
Thanks. It would be good if we could learn the details. Did larger components fail your tests? Has any specific make of capacitor been shown to be good/better?
7. Minimise crosstalk by running buses as widely spread as possible (Page 19). This is a very general statement and not measurable in practice. Some quantitative numbers are needed to ensure the crosstalk is under control in a high-density, high-speed design. For example, on the RAL HSD I required S/H > 5 for stripline differential traces to keep the crosstalk below 1%.
We are always keen to learn, since we cannot simulate. We are using tightly coupled pairs on both the high-speed and the parallel links. The coupled length between adjacent high-speed pairs is up to 6 cm; on a few links it is 10 cm. We are using striplines only; the distance to the planes is asymmetric, 100µm and 140µm. The trace width is 80µm, the in-pair spacing is 140µm, and pairs are separated by 400µm trace edge to trace edge. However, we are looking into possible improvements. There are no single-ended aggressors on the same layer.
Regarding crosstalk control: 400µm separation between pairs is a bit on the low side; if possible make it 700µm (5 x 140µm).
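To set the numbers above against the reviewer's S/H > 5 rule of thumb, here is a minimal sketch using the stated geometry; it is plain arithmetic on the quoted numbers, not a field-solver result.

```python
# Compare the quoted stripline geometry with the S/H > 5 rule of thumb stated above.
plane_distances_um = (100, 140)   # asymmetric distances to the reference planes
for separation_um in (400, 700):  # current separation and the suggested 5 x 140 um
    ratios = ", ".join(f"{separation_um / h:.1f}" for h in plane_distances_um)
    print(f"S = {separation_um} um -> S/H = {ratios}")
# S = 400 um -> S/H = 4.0, 2.9   (below the S/H > 5 target for the 140 um plane distance)
# S = 700 um -> S/H = 7.0, 5.0   (meets the target)
```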
8. Avoid in-pair skew (Page 19). I do not know whether there are any long (say >10 cm) high-speed links on L1Topo. If yes, you may want to consider rotating the design (by, say, 22 degrees) with respect to the PCB panel during PCB manufacture to control the intra-pair differential skew. We did that for the RAL HSD with good results.
Thanks, we will talk to the manufacturer.
9. Avoid a large number of vias perforating power and ground planes near critical components (Page 19). I do not know how this could possibly be done underneath a Virtex-7 near the MGTs. I think the more important thing is to design the vias properly for the required speed range. Always use the smallest, shortest vias (blind vias) that you can afford for MGT links if no simulation tool is used to aid the design.
You are right there. Please note that we are using micro vias on the high-speed links. Is anything known about their suitability for the highest speeds?
What kind of size are you talking about? There shouldn't be a problem for a 10G design as long as they can be manufactured reliably.
10. Module control (page 13). Is there an Ethernet addressing scheme for
L1Topo? Can it get its IP address from ATCA IPMI or what?
Data buses are available to transmit the required information. Alternatively, DHCP could be used.
11. In various places in the document, various speeds are mentioned: 6.4 Gbps / 10 Gbps / 13.1 Gbps. What exactly is the target? The test methods and pass/fail criteria (e.g. eye mask) are not mentioned at all. L1Topo is going to interface to a lot of external modules running MGTs from different series of Virtex FPGAs. Compatibility is extremely important.
The baseline is 6.4 Gbps, and the module will need to run at that rate. The acceptance criterion is reliable operation at that speed. Since MiniPODs are currently rated at 10 Gbps, we will not be able to do any meaningful tests at rates above that. It has not yet been decided which FPGA speed grade to buy for the production modules. We will in any case explore the accessible phase space once we have the prototype. That is particularly important since we are unfortunately not able to do detailed simulations. We intend to measure eye widths with the help of IBERT. In case you have any suggestions for the criteria, please let us know.
As far as the acceptance criteria for the high-speed links are concerned, I'd recommend the SFF-8431 standard as a baseline.
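Since IBERT-based error-rate measurements are foreseen, the sketch below estimates how long a link must run error-free to claim a given BER with 95% confidence (using the standard approximation N ≈ 3/BER). The target BER values are illustrative; the specification does not quote one.

```python
import math

# Error-free test length needed to claim "BER better than target" at a given
# confidence level: N >= -ln(1 - CL) / BER bits without a single error.
line_rate_bps = 6.4e9            # baseline line rate quoted above
confidence = 0.95
for target_ber in (1e-12, 1e-15):   # illustrative targets, not from the specification
    bits_needed = -math.log(1.0 - confidence) / target_ber   # ~3/BER at 95% CL
    hours = bits_needed / line_rate_bps / 3600
    print(f"BER < {target_ber:.0e}: ~{bits_needed:.1e} error-free bits, ~{hours:.2f} h per link")
# BER < 1e-12: ~3.0e12 bits, ~0.13 h; BER < 1e-15: ~3.0e15 bits, ~130 h per link
```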
Sam:
1. CTP parallel
output: In 1.2.1 it says that a mezzanine board will be required for parallel
LVDS output to CTPcore. For the optical output, one
ribbon per FPGA through the front panel is clearly overkill, but how many
parallel LVDS outputs will be possible per FPGA if the CTP output goes through
the mezzanine? This has some impact on our flexibility in assigning different
algorithms to different FPGAs.
We run 22 pairs from each processor FPGA onto the mezzanine. There we can (at low cost, if it needs to be redone) pick and route them into 32 lanes on the connector. That is what was decided for the prototype. I wouldn't rule out minor improvements to the scheme for production.
2. In addition to Weiming's
detailed comments on power supply, I also wanted to flag that linear regulator
noise increases with load current, and that Texas Instruments recommends using
as large an output capacitor as possible for this reason. I have already
discussed this with Eduard, but it may be useful information for others as
well.
As
pointed out already: that was an error in the specs. We use switched
converters.
3. This is more a comment than a question, but I have given some thought to the parallel LVDS lines between the two processor FPGAs. While the bandwidth is not enough to share ALL TOBs, a good goal is to be able to share all sorted TOBs, so that most algorithms can be implemented in either FPGA (i.e. wherever they fit).
In my back-of-the-envelope calculation, Phase-0 generic TOBs can be sent in 20 bits each (em/tau TOBs have 8 bits of ET and 6 bits each for eta and phi; jets have 10 bits of ET and 5 bits each for eta and phi). Phase-1 generic TOBs might have a little more resolution, let's say 24 bits each. For 238 lines running at 960 Mbit/s each (the specification is 1 Gbit/s), this yields up to 285 Phase-0 generic TOBs, or 238 Phase-1 TOBs, per bunch crossing.
I haven't talked about muon TOBs (which are smaller) or the missing-ET vector (just a few bits), but this seems sufficient to me, even in Phase-1.
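A minimal sketch of the back-of-the-envelope arithmetic above; the 40 MHz bunch-crossing rate is an assumption added here, while the line count and bit widths are those quoted.

```python
# Back-of-the-envelope check of the TOB-sharing bandwidth between the processor FPGAs.
lines = 238                # parallel LVDS lines between the two processor FPGAs
line_rate_bps = 960e6      # rate used per line (the specification allows 1 Gbit/s)
bc_rate_hz = 40e6          # LHC bunch-crossing rate (assumed here, not stated above)

bits_per_bc = lines * line_rate_bps / bc_rate_hz   # 238 lines x 24 bits = 5712 bits per BC
print(f"{bits_per_bc:.0f} bits per bunch crossing")
print(f"Phase-0 generic TOBs (20 bits each): {int(bits_per_bc // 20)}")   # -> 285
print(f"Phase-1 generic TOBs (24 bits each): {int(bits_per_bc // 24)}")   # -> 238
```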
Thanks.
4. In the implementation section, the board thickness is given as about 2 mm, with discussion of milling if necessary. ATCA supports board thicknesses up to 2.4 mm, so no milling should be necessary.
Again
an error in the specs that escaped our attention…
5. In 3.1 (scalable design), the possibility
of mounting five shrouds in Zone 3 is given. If we are going to use commercial
ATCA backplanes, I am not sure we can make room for that many. That said, 4
shrouds should be more than enough. Even with only 48 fibers
per shroud, that is enough for 192 fibers.
Yes, our assumption at the time had been that we could mill down the shrouds and fit five rather than four. We are no longer pursuing that idea.
6. In section 4.0, there is a paragraph on
higher-level software and firmware development. Is it clear who will do these,
and on what timescale?
No. Mainz is in charge of the infrastructure firmware (and is getting in contact with Krakow colleagues, who are looking into supporting us on the ROD firmware). Mainz is doing some of the algorithms. Software-wise, Mainz will attempt to cover the L1Topo-specific coding (module services). However, there is the wide field of Ethernet-based control, plus databases, plus…
7. (5.1) Is it
decided that TTC will come in on the front panel? eFEX is looking at backplane-based distribution, but
perhaps the L1Topo timescale is too short for implementing this.
Well, for the prototype that might be the only option, unless someone tells us immediately to which backplane lane to route the alternative path. Any suggestions?
Victor:
1) As far as I know (I might be wrong), Xilinx announced some time ago that it will stop supporting System ACE. If this is true, do you see any disadvantage for the long-term operation and maintenance of L1Topo?
2) The documentation mentions System ACE as the standard mode for FPGA configuration at power-up, and also presents alternative solutions for each group of FPGAs (processors, control). For the control FPGA, the proposed solution is a local SPI memory. With both methods implemented on the module (SysACE, SPI), how do you decide which entity configures the control FPGA at power-up? Or is the SPI memory intended to store a secondary bit-file version, whose download into the FPGA is triggered from software (via IPMC or MARS) at a different time?
3) Do you plan to reduce the number of configuration schemes in future module iterations, especially if the SD card and the SPI memory do not provide/store different firmware versions? My personal feeling is that once a final board-level control scheme is adopted (IPMC or MARS), one could remove either System ACE or the SD & SPI, and possibly relax the hardware design a bit.
1-3: On the prototype we will initially be using SystemACE only; it is just easier for us to start with. We will narrow down the number of schemes before production: the unused parts will either just not be mounted on the PCB, or be removed from the layout if the real estate is required for any fixes. SystemACE will always configure the FPGAs if it is mounted and a flash card is inserted. SPI configuration can be enabled or disabled via the CPLD. We are not anticipating frequent changes of scheme. To my understanding, a second image in the SPI memory should be possible if a sufficiently large device is chosen.
4) How large do you estimate the implementation on the Zynq to be, and what is the corresponding power consumption? The same question for the processor FPGAs.
The Zynq sits on a commercial MARS module and we do not have any estimate other than the data sheets; its consumption should be low. The power consumption of the processors cannot be estimated until the algorithms are final. We do expect considerable dissipation, and heat sinks will be mounted. Initial operation will be on the bench with active cooling. Since the L1Topo crate will be sparsely populated, exotic coolers in adjacent slots are not ruled out. If it eventually turned out that certain algorithms couldn't be run for reasons of dissipation, that would be unfortunate; however, there is nothing we could possibly do about that, except a redesign with a next-generation FPGA.
5) I guess that the final system for Phase-I will include a maximum of two L1Topo modules. Will these be mounted alone in a dedicated ATCA crate, or will they share a crate with other modules (e.g. FEXs)? If the latter, would the height of the mezzanine cards be an issue in the case of a high-density crate configuration? Is the crate airflow sufficient for optimal heat dissipation, or would a cooling body be needed for certain (hot) spots on the board (e.g. the FPGAs)?
We are talking about 3 modules (see above); no mix with other modules is anticipated.
E: The FPGA power consumption depends to a large degree on the algorithms in place. A metallic heat sink on top of the FPGA might be sufficient, but we cannot state this for sure now. A local fan on the FPGA can optionally be put in place if needed; perhaps a power plug could be provided for that purpose on the L1Topo production module.
6) The implementation of high-speed outputs in Spartan-6 is limited by a few annoying geographical constraints (i.e. only the top PLLs can drive high-speed clocks, and only in one bank). Is there any similar constraint in Virtex-7 that could restrict the implementation of the ~240 high-speed communication links between the two processor FPGAs?
Thanks for pointing this out. To the best of our knowledge, no. It is assumed that both global-clock-based and clock-forwarding schemes should be viable. We hope we can cross-check before production.
7) The documentation mentions that the real-time data are copied to the DAQ upon reception of the L1A signal. I did not find more text or a schematic description of this, but I guess the event data is going to be accumulated in the processor FPGAs and, upon arrival of the L1A, transferred to the control FPGA, which should send it to the DAQ in a format that has still to be defined. Is this scenario correct?
Latency buffers and de-randomizer buffers will be
located on the processors. We are about to revise the baseline for the ROD
functionality. There is enough bandwidth available out of the processor FPGAs
to send the data. The data format would need to be defined soon, though it
would be very difficult to predict any data that might be required from within
specific (future) algorithms.
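As a purely illustrative toy model of the readout scenario described above (Python, not firmware; the pipeline depth, derandomizer depth and data content are invented for the example): a fixed-latency pipeline per processor FPGA, from which an L1A copies the oldest slice into a derandomizer FIFO for transmission to the DAQ.

```python
from collections import deque

# Toy model of the readout path sketched above: a fixed-latency pipeline
# (latency buffer) in a processor FPGA; on L1A, the oldest slice in the
# pipeline is copied into a derandomizer FIFO for readout to the DAQ.
# All depths and the data content are illustrative only.
LATENCY_BC = 100        # pipeline depth in bunch crossings (invented)
DERAND_DEPTH = 16       # derandomizer depth (invented)

latency_buffer = deque(maxlen=LATENCY_BC)   # oldest element drops out automatically
derandomizer = deque()

def bunch_crossing(bcid, rtdp_data, l1a):
    """Advance one bunch crossing: shift the pipeline, capture a slice on L1A."""
    latency_buffer.append((bcid, rtdp_data))
    if l1a and len(latency_buffer) == LATENCY_BC:
        if len(derandomizer) < DERAND_DEPTH:
            derandomizer.append(latency_buffer[0])   # oldest slice in the pipeline
        else:
            print(f"BC {bcid}: derandomizer full, event lost (busy would be asserted)")

# Drive the model with a dummy data stream and a single L1A at BC 150.
for bc in range(200):
    bunch_crossing(bc, rtdp_data=f"TOBs@{bc}", l1a=(bc == 150))
print(derandomizer.popleft())   # -> (51, 'TOBs@51'): the slice captured by the L1A
```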
8) In section 4 (page 18) it is said that initial tests of the L1Topo receiving stage will employ the GOLD as a data source. I was wondering whether the same GOLD might not also be used as a receiver for data-integrity tests on the DAQ and ROI optical channels. No official DAQ/ROI format would be required, only playback vectors. I think such a setup would help spot any misbehaviour already in the home lab, before performing system-level tests with other L1Calo modules in the CERN test rig.
Thanks, yes, we will definitely do initial simplified tests of that kind, though full tests will be possible only at CERN.
Dave:
i) Power management.
My understanding is that this is the first module to actually go in an ATCA crate - GOLD was only ever on the bench. I am therefore particularly concerned with the situation when only management power is on the board and payload power is off: no Zone 2 access, no main FPGAs, the possibility of sneak paths between powered and unpowered devices, and the question of which devices are on management power. I'm really just looking for confirmation that you're on top of this area, as it's not really addressed in the specification.
Thanks for pointing this out. We have looked into which components are powered from which source, and we will cross-check. Regarding currents flowing from powered to unpowered signal pins, that is a bit tricky. We are using the LAPP IPMC and would hope that this issue is addressed there: no communication to a section unless it is powered. Failing that, we would have to use buffers on all the lines, which doesn't make a lot of sense. I would think we have to design the prototype on the assumption that the IPMC is well designed, and in case of problems modify the design of either L1Topo or the IPMC.
ii) Module control.
You're looking at either the LAPP IPMC or the MARS bar to provide the IPMC functionality. Does this mean you also need to route the IPMB through the extension mezzanine?
In case we want to run without the IPMC, we will have to insert a jumper module in the slot, bridging the lines through to the mezzanine. However, that is right now considered a backup. Historically we moved from an FPGA-based IPMC (now ruled out) to the Zynq (not sure that scheme is suitable) to the IPMC DIMM. That is the consequence of various reviews, the availability of information, and the evolution of standard solutions.
I'm assuming that the control FPGA is module control in the sense we used to think of it, rather than the very low-level IPMC control. So I'm assuming that it's on payload power?
Correct.
iii) Readout (R/O).
You are implying G-Link-style readout to an L1Calo ROD. Is this feasible? There is no way we can build more RODs, and there is a risk in using one of the spares, so this means using so-far-unused inputs on existing RODs. Has this been thought through? I would say it is far better to have local ROD functionality going to S-Link. As you say, the hardware does allow for this; it's just that I think this is really the only solution!
The current baseline is an embedded ROD with S-Link via MiniPOD. We are just trying to get the effort sorted out.
iv) And finally… still using SystemACE???
Yes, initially. It is still better than most alternatives; NOR flash is a pain in the … due to its write latency.