Reported on 29 Dec 2011:
On 12/29/11 11:23 PM, Barott, William Chauncey wrote:
Third, not so good: B14.bfa has begun misbehaving. It appears to have been
spontaneously rebooting during my measurements. Two hard power cycles did not fix
it.
...
From: Colby Gutierrez-Kraybill [mailto:colby@hcro.org]
Sent: Friday, December 30, 2011 12:35 AM
To: Barott, William Chauncey
Cc: colby@sri.com; Jon Richards
Subject: Re: Status of array?
At 2131, I cannot ping b14 so I went and checked on it. It seemed to be
providing power to the ethernet port and to the various powered 10gb
transceivers but no cylon lights. I just power cycled it and the cylon
lights came back and were on for the 30 seconds I watched it. Fans in
front seem to be working okay.
Can ssh into it now.
- Colby
...
On 12/31/11 12:03 PM, Jill Tarter wrote:
> who should begin the process of diagnosing B14.bfa? how soon can this happen?
Matt Dexter and/or Dennis Kennedy. It may be the same issue of screws shorting/pressing on critical areas.
- Colby
...
From: Matt Dexter mdexter@berkeley.edu
Subject: [ata-staff] BEE2 2.2.26 aka b14.bfa
Date: 1/2/12 1:57PM
To: ata-staff
FYI:
Here's a cut-n-paste of my historical testing notes related
to BEE2 S/N 2.2.26, which I believe is now in use as b14.bfa.
I won't be able to schedule a trip to work on this onsite until
contracts are worked out.
summary:
I didn't find any records of previous rebooting problems.
Or any other interesting problems since the very initial debug
efforts.
2007jul11
first debugging: delivered with error report "1.8V shorted to GND"
visual inspection found CF-0 CX4 connector w/ munged GND pin (not
related to 1.8V short to GND).
Visual inspection under stereo microscope:
removed solder block at CF3-0 Infiniband 4x connector
removed solder blob across AC coupling caps near CF-0 Infiniband 4x conn.
measure Z between XTAL diff outputs see 100 ohms as expected
except at FPGA 4's 125 MHz XTALs X14, X15
remove X15, and nearby bypass cap, and find perfect solder short
across differential outputs.
remove short now impedance is 101 ohms.
remove X14 and find perfect solder short across differential outputs.
remove short now impedance is 100 ohms.
remove X11 and find perfect solder short across differential outputs.
remove short now impedance is 100 ohms.
2007oct-nov
SAE Materials performed the X11, X14 and X15 related rework
discussed above.
2007nov-
used in the PAPER systems.
2009jun08
UCB's PAPER project finished using this and moved it to the ATA BF application.
set up to Boot off of ATA's server.
2009jun26
Billy installed into the BF as b14.bfa.
I didn't find any records of previous rebooting problems.
Or any problems for that matter.
Colby is correct to flag the issue of
long chassis screws mashing into the PSU.
Matt
...
Followup
Subject: Re: b14 rebooting/freezing up
by colby on 13 February 2012 - 8:06pm
Conferred with Mike Huff today about further testing of b14 and recommended he keep b14 until I have a CX4 cable sent down so that he can run the exhaustive self tests. Dan C points out further testing can also happen using laptops to allow b14 to boot.
The open question on the ata-staff list is whether b14 should stay at Menlo Park for further testing and repair or return to HCRO ASAP for Matt Dexter to take over testing and repair.
Subject: Re: b14 rebooting/freezing up
by mhuff on 14 February 2012 - 11:24am
I'd be interested in assisting in the testing and repair with Matt Dexter if schedules permit. I think there is a lot that I could learn from troubleshooting this component with him. I'm sure Dennis would like to join as well.
Subject: Re: b14 rebooting/freezing up
by colby on 14 February 2012 - 11:55am
Right now, what would make the most sense would be to come to HCRO at the same time that Matt Dexter is here. He is planning on making a trip up Feb 15-17th. Confirming with him now. Understandably short notice. Or, a visit between Matt and SRI in the Bay Area.
Subject: Re: b14 rebooting/freezing up
by colby on 13 February 2012 - 8:07pm
Added SRI/SETI/et al to notify list.
Subject: Re: b14 rebooting/freezing up
by mdexter on 14 February 2012 - 1:43pm
OK by me to delay trip to HCRO until Feb 21-24 to allow time to plan for a more extensive
BEE2 debug session.
Or even the following week.
goal of travel this week was to get FB heatsink rework/experiment off my todo list.
Someone else needs to set priorities ...
Subject: Re: b14 rebooting/freezing up
by mdexter on 14 February 2012 - 3:06pm
In reply to M Huff's test report (on SRI's bug tracking system?)
Using a few high watt resistors (0.2 Ohms/ea), we were able to load the power
supply unit at various currents. The power supply failed or shut down at 25A
and 12.5A loads, far below the unit's spec. At 8A, the supply seemed to
be pretty stable, delivering 4.93V to the load.
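As a rough cross-check of the load points quoted above, here is a minimal Python sketch. It assumes a nominal 5.0 V output and identical 0.2 ohm resistors wired in series across the PSU output; the report does not say how the resistors were wired, and the function names are made up for illustration. Under those assumptions, one, two and three resistors draw about 25 A, 12.5 A and 8.3 A, which lines up with the three load points reported.

```python
# Assumptions (not stated in the test report): nominal 5.0 V PSU output and
# identical 0.2-ohm resistors connected in series across the output posts.
R_EACH_OHMS = 0.2


def load_current(volts, n_series, r_each=R_EACH_OHMS):
    """Ohm's law: current in amps through n identical series resistors."""
    return volts / (n_series * r_each)


def load_power(volts, n_series, r_each=R_EACH_OHMS):
    """Total power in watts dissipated by the whole resistor chain."""
    return volts * load_current(volts, n_series, r_each)


if __name__ == "__main__":
    for n in (1, 2, 3):
        amps = load_current(5.0, n)
        watts = load_power(5.0, n)
        print(f"{n} resistor(s): {amps:.2f} A, {watts:.1f} W")
    # Current implied by the measured 4.93 V across three resistors
    print(f"measured case: {load_current(4.93, 3):.2f} A")
```

With a single resistor the chain would dissipate about 125 W, which fits the mention of high-wattage parts.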
I don't recall another BEE2 having this exact same problem but I do recall a BEE2
that had very poor crimping at the end of the ~8 or 10 AWG copper cable from the
PSU to the connector that mates to the PCB. There was so much resistance that
the voltage delivered to the PCB was much less than the PSU's output, so
the PCB would crash and then reboot itself or hang while the PSU output stayed at 5.0 VDC.
This doesn't seem to match the behavior with BEE2 2.2.26, especially if the load resistors
were directly connected to the PSU's output posts, but it may be useful for future debug efforts...
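To put the bad-crimp failure mode in numbers, here is a small Python sketch of the voltage drop across a resistive joint. The 20 A current and 0.05 ohm joint resistance are purely hypothetical values chosen for illustration; neither was measured or reported in this thread, and the function name is made up.

```python
def delivered_voltage(v_psu, amps, r_series_ohms):
    """Voltage reaching the PCB after the I*R drop across a series resistance."""
    return v_psu - amps * r_series_ohms


if __name__ == "__main__":
    # Hypothetical example values, not measurements from b14 or any other BEE2.
    v_pcb = delivered_voltage(v_psu=5.0, amps=20.0, r_series_ohms=0.05)
    print(f"PSU posts: 5.00 V, PCB connector: {v_pcb:.2f} V")
```

The point is only that a few tens of milliohms in the cable path can cost a large fraction of a volt at these currents while the PSU posts still read 5.0 VDC.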
Has the PSU's internal fan died ?
A different BEE2 had a bulging cap C5 on the middle board called
"BEE2 Power Supply HUB Board V2"
That cap is labeled : NIPPON CHEMI_CON 200V 1000uF 680068 SMQ
The BOM calls out :
Digikey 565-2738-ND ERSMQ201VSN102MR30S
Now Digikey 565-2738-ND maps to ESMQ201VSN102MR30S
Newark had them in stock back in Feb 2011 and called them 16M8462.
That 1 cap, C5, was replaced with a new cap (new part from Newark) and the PSU and BEE2
have been AOK since.
Even if the ripple reducing caps haven't died maybe the inverter itself has died ?
this would be a first that I've heard of.
Watch out : 300 V internal voltages - probe w/ care!
sorry for these rather basic or generic thoughts.