Difference between revisions of "Logbook CameraCommissioning ORM Oct18"


Revision as of 14:19, 17 December 2018

Enter comments in reverse time order

Glossary at the end!

Date Actor/Author Action summary Comments Documents
2018-12-17 Léa Jouvin, Oscar Blanch Rate Scans
  • We do three rate scans, first one with step 10, the other two with step 2. For DAC 1, and trigger mod 1.
    • All three look fine, with no strange behaviour.
  • We reconfigure the adder and take a rate scan; this is done twice (the first time, the rate scan is taken twice without reconfiguring the adder).
    • All three look fine, with no strange behaviour.
  • We enable/disable the test pulse (only disable/enable, no reconfiguration) and take a rate scan:
    • Previous ones were finishing at DT~120
    • First one, module 13 stays at 300 Hz until DT~170
    • Second one, module 13 stays at 300 Hz until DT~170
    • Third one, module 13 stays at 300 Hz until DT~170
2018-12-17 Léa Jouvin, Oscar Blanch Start up

1) 2 modules OFF!: 10.1.5.16 and 10.1.5.3

2) 1 module OFF!: 10.1.5.3

3) 1 module OFF!: 10.1.5.3

2018-12-17 Léa Jouvin, Oscar Blanch Camera Inspection

Visual inspection once the drive tests had finished. We checked that nothing had fallen down. We also checked some connectors and screws. Everything looks fine.


2018-12-14 Léa, Seiya, Shunsuke Data taking

In 20181214:

- Run00001.0000 to Run00001.0008 (Run00073): camera at 27 degrees zenith, HV: 1000 V, L1 DT applied to each module at 20% more than the NSB DT

Run stopped, started again

- Run00001.0008 to Run00001.0013 (Run00074): camera at 27 degrees zenith, HV: 1000 V, L1 DT applied to each module at 20% more than the NSB DT

Run stopped, started again

- Run00001.0014 to Run00001.0027 (Run00075): camera at 27 degrees zenith, HV: 1000 V, L1 DT applied to each module at 20% more than the NSB DT

2018-12-14 Léa, Seiya, Shunsuke Rate scan

During the afternoon, shutter closed

- mode 1: without TP

- mode 3: without TP

- mode 1: TP, 300 Hz, 5 pe -> TP saturation issue for some modules, which show a rate of 300 Hz at high DT

- mode 3: TP, 300 Hz, 5 pe -> module 44 strange, as on Wednesday night, but the others are fine

2018-12-14 Léa, Seiya, Shunsuke Start up

1) Two modules off: 10.1.5.3 and 10.1.5.16

2) One module off: 10.1.5.3

2018-12-14 Léa, Seiya, Shunsuke Test IR ON/OFF one by one by ECC

1) From safe, impossible to switch ON the relay

2) From ready:

- (-1, false) means all IR OFF: only IR 0,1,2,3,4 were OFF

- (-1, True) means all ON: all ON except IR 1, which remained at 0, so we did (1, True) again and then IR 1 was also ON.

- Then we switched them all again with (-1,off): all OFF except IR 6. We did (-1,off) again and then all were OFF.

- Again (-1,True): all OFF but IR 6, so we did (-1,True) again and then the ECC went to error due to the PDB

-> hard reset

From ready:

- OFF one by one: from 0 to 7, all ok

- ON one by one: all ok until IR 7. We started switching ON from 0 to 7, and for 7 the ECC went to error again with the PDB

-> hard reset

- ECC still in error and the error says PDB communication error, so -> hard reset again

- ECC safe, but after one minute it went to error

- hard reset again

- ECC safe and seems stabilized



2018-12-13 Léa, Seiya, Shunsuke Start up

1) one module off: 10.1.5.3

2) one module off: 10.1.5.3

3) Two modules off: 10.1.5.3 and 10.1.6.28

4) Two modules off: 10.1.5.3 and 10.1.6.28

5) Two modules off: 10.1.5.3


2018-12-13 Léa, Seiya, Shunsuke Datataking


In 20181213

- Run0001.0000 - Run0001.0001 : TP in all modules, 300Hz, change module by module generated trigger. But TIB didn't send any trigger during the test, so we tried this test again

- Run0001.0002 - Run0001.0009 : TP in all modules, 300Hz, change module by module generated trigger.

- Run0001.0010: we wanted to do some random trigger tests from the TIB, but for an unknown reason the EVB didn't receive any trigger whereas the collected rate was 600 Hz

From here, we copy again the config xml file in ~hoffman/20181212

- Run0001.0010: one test with EVB recorData=False to confirm that for a pedestalfrequency higher than 6500 Hz, we have a collected rate that doesn't match the camera rate and busy rate

- Run0001.0011 - Run0001.0044: AnalogPedestal Run

In 20181212

- Run0001.0000 : TP in all modules, 1 Hz, change module by module generated trigger

- Run0001.0000: test new pixel id scheme -> Run confusion

- Run0001.0001: Park position, shutter open, HV 1000 V, threshold for L1 DT was 10% of NSB level,

- Run0001.0002, Run0001.0003, 00001.00004: Park position, shutter open, Nominal HV 1000 V,first we trigger on noise for the EVB to receive high rate (1000 HZ) and then we move threshold for L1 DT at 10% of NSB level so trigger rate was between 5 and 10 Hz


- Run00001.00005: Park position, shutter open, Nominal HV 1000 V,threshold for L1 DT at 10% of NSB level so trigger rate was between 5 and 10 Hz

2018-12-12 UCTS:new MOS on tcs01

- Stopped the MOS on Osaka; it is now on tcs01 and we can configure the UCTS properly

2018-12-12/2018-12-13 Léa, Nadia, Julie, Jan Luc ECC test

Short summary of tests done at ORM this week on the ECC:

-Remote loading is now understood and available.

-Release “V32 patch” is available. This version is similar to the one used since September, however it:

· Corrects the issues met with the intelligent relays in the transition alarm to safe.

· Adds more understandable shutter datapoints

· Heart beat with CaCo is temporarily disabled to avoid the disconnection from CaCo seen last week.

This version has been extensively tested on Thursday and was used during the previous night.


-A new ECC version called V34 has been finalized. In addition to the “V32 patch” features,

· More data points are available to improve the camera monitoring & control (individual power control of IR, PSB (avoid), TIB, UCTS, data switches, …)

· A configuration file is available. It allows different parameters to be configured without recompiling the ECC (delays, CaCo heart beat …)

· Better alarm identification & recovery is also available

This version has been tested for ~1h on the camera. More tests will be done in the coming days to gain enough confidence before using it in the night runs.


-The 3 exe versions (V32, V32 patch, V34) are available for the shifters. A script will be given to the shift leader to facilitate the exe loading.


2018-12-11 Léa, Seiya, Shunsuke start up

1) one module off: 10.1.5.3

2) one module off: 10.1.5.3

3) Two modules off: 10.1.5.3 and 10.1.6.28

4) one module off: 10.1.5.3

2018-12-11 Léa, Seiya, Shunsuke Operation with HV

- Shutter closed: 265 modules to 400 V and then nominal HV. Everything went smoothly so then:

- Shutter open: 1 module (central one): 400 V then 800 V then nominal HV. Everything went smoothly so we went for 265

- L1 and L0 scan. For data taking we went from 60 to 40 in the DT in steps of 5. From 40, some modules show too high a rate for L1

2018-12-11 Léa, Seiya, Shunsuke (Daniel, Daniela from remote) Monitoring fix

Now we use only ClusCo on tcs01, and L0 and L1 internal and external are also monitored

2018-12-11 Léa, Seiya, Shunsuke Data Taking

In 20181211:

- All TP synchronised, 1 Hz, trigger sent module by module, 10 ns additional delay in TIB and 40 ns in the UP trigger propagation of the trigger for the central BP: from run0001

- run0002 should be deleted; it was to test ZFW writing

- At night with HV ON and shutter open: run 0004

2018-12-11 Léa, Seiya, Shunsuke start up


1) Two modules off: 10.1.5.3 and 10.1.6.28

2) one module off: 10.1.5.3

3) Two modules off: 10.1.5.3 and 10.1.6.28

4) Two modules off: 10.1.5.3 and 10.1.6.28

5) Two modules off: 10.1.5.3 and 10.1.6.28

6) Two modules off: 10.1.5.3 and 10.1.6.28

7) one module off: 10.1.5.3

8) one module off: 10.1.5.3

9) Two modules off: 10.1.5.3 and 10.1.6.28

10) Two modules off: 10.1.5.3 and 10.1.6.28

2018-12-10 Shunsuke, Léa, Seiya HV

We perform tests as follows.

- HV Supplying Test for Central Module with shutter closed.

--We supplied 400 V, 500 V, 600 V, 700 V, 800 V, 900V, 1000 V and Nominal Voltage to central module (module:133).

--During the test, the HV was switched off by the script. This was due to a mistake by Shunsuke, but we confirmed that his script works well. No other problems were found.

- HV Supplying for central 19 modules as before test.

- HV Supplying for all modules as before test.

- L0 & L1 rate scan with nominal HV applied to all modules, shutter closed


2018-12-10 Léa, Seiya, Shunsuke Data taking

In 2018/12/10

-TP synchronised in all the modules, trigger sent by all the modules, 10 ns external delay added in the TIB: 0001.0000, 0001.0001 and 0001.0002.

-TP synchronised in all the modules, trigger sent by all the modules, 10 ns external delay added in the TIB, 40 ns added in the trigger propagation from CBP to TIB: 0001.0004, 0001.0005 and 0001.0006.

-TP synchronised in all the modules, trigger sent only by module 265, 10 ns external delay added in the TIB, 40 ns added in the trigger propagation from CBP to TIB: 0001.0007 to 0001.0009

-TP synchronised in all the modules, trigger sent only by module 100, 10 ns external delay added in the TIB, 40 ns added in the trigger propagation from CBP to TIB: 0001.0010 to 0001.0012

2018-12-10 Léa, Seiya, Shunsuke start up

1) one module off: 10.1.5.3

2) Two modules off: 10.1.5.3 and 10.1.6.28

3) Two modules off: 10.1.5.3 and 10.1.6.28

4) Two modules off: 10.1.5.3 and 10.1.6.28

5) one module off: 10.1.5.3

6) one module off: 10.1.5.3

7) one module off: 10.1.5.3


2018-12-07 Léa, Seiya, Shunsuke Rate scan

- L0 and L1 scan

- With No TP, L0 from 300 to 650 step=5 and L1 from 0 to 200 step=2

- TP, 300 Hz, gain=40 (50 p.e.): L0 from 400 to 900 step=5 and L1 from 0 to 200 step=2

- TP, 300 Hz, gain=20 (5 p.e.): L0 from 400 to 700 step=5 and L1 from 0 to 200 step=2
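
The scan ranges listed above can be enumerated as follows. This is only a minimal sketch: it assumes that both end points are included in each scan (the log does not state this), and `scan_points` is a hypothetical helper, not a ClusCo function.

```python
def scan_points(start, stop, step):
    """Return the threshold values of a scan, assuming 'stop' itself is scanned."""
    return list(range(start, stop + 1, step))

# Ranges quoted in the log entry (threshold units as used in the log)
l0_no_tp = scan_points(300, 650, 5)
l0_tp_gain40 = scan_points(400, 900, 5)
l0_tp_gain20 = scan_points(400, 700, 5)
l1_all = scan_points(0, 200, 2)

# Number of points per scan, useful to estimate how long a scan takes
print(len(l0_no_tp), len(l0_tp_gain40), len(l0_tp_gain20), len(l1_all))
```

Under that assumption the L1 scan has the same number of points in all three configurations, while the L0 scan length depends on the TP setting.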

2018-12-07 Léa, Seiya, Shunsuke Datataking

- TP synchronisation test with the 1 us window of the legacy DAQ -> adding a 10 ns delay for the 10 MHz clock to all modules seems to fix the problem

- Random trigger. 1) Ran for 2 minutes and several modules remained busy, no more trigger rate coming -> restart

2018-12-06 Léa, Seiya, Shunsuke Osaka interface p1p2

Something strange happened. As often, p1p2 was down, but it was impossible to bring it up again. We had to switch OFF the camera.

2018-12-07 Léa, Seiya, Daniel and Daniela (from remote) Test of slow control from Japan
- Problem with monitor/slow control seems solved

- temperature monitoring of each pixel in the monitor function

2018-12-07 Léa, Seiya, Daniel and Daniela (from remote) start up

1) Two modules off: 10.1.5.3 and 10.1.6.28

2) One module off: 10.1.5.3

3) One module off: 10.1.5.3

4) Two modules off: 10.1.5.3 and 10.1.6.40

5) One module off: 10.1.5.3

6) Two modules off: 10.1.5.3 and 10.1.6.40

7) One module off: 10.1.5.3

8) One module off: 10.1.5.3

9) One module off: 10.1.5.3

2018-12-06 Léa, Seiya, Shunsuke, Satoshi Module remaining OFF during the whole day

10.1.5.3

2018-12-06 Léa, Seiya, Shunsuke, Satoshi ECC/CAco: Error and undefined

- After one hour with the modules ON, the ECC went to error and Caco to an undefined state. This implies a hard reset of the ECC to get current back in the bus bars

- SwitchOn() from Caco, Caco fine in state 2 but ECC in error state...

- The ECC went to error, and this time we saw Caco fine at state 3 while the ECC was in error, and then Caco going to undefined as expected since the ECC was OFF. This ECC error state appeared twice today, in the middle of data taking and after around one and a half hours of camera switch on.

2018-12-06 Léa, Seiya, Shunsuke, Satoshi Data taking

- With legacy DAQ to have the 1024 ns window to work on BP synchronisation

- Tried random trigger with writing with the EVB for 7 minutes: fine, even if at one moment the TIB rate went to 0, but it came back. ECC error, so the camera stopped and we couldn't test more

- Tried a legacydaq run with random trigger without writing, but module 10.1.6.38 reached a connected-but-busy state and so no trigger was sent anymore. Tried again. Legacy DAQ data show the same result of too high a busy rate, even higher than with the EVB

- Tried a long data run with EVB and random trigger: we ran for one hour, run in 20181206


2018-12-06 Léa, Seiya, Shunsuke, Satoshi start up

1) One module off: 10.1.5.3

2) One module off: 10.1.5.3

3) One module off: 10.1.5.3

4) One module off: 10.1.5.3

5) One module off: 10.1.5.3: Today we will perform operation without this module

Caco to undefined, ECC to error state -> hard reset

6) Two modules OFF: 10.1.5.3 and 10.1.6.28

7) One module off: 10.1.5.3

8) ECC went to error at the switchon() -> hard reset then One module off: 10.1.5.3

9) After one hour and half of Camera on, ECC went to error suddenly -> hard reset. Then Two modules OFF: 10.1.5.3 and 10.1.6.28

10) One module off: 10.1.5.3

11) Human mistake -> current went too high because the dragon was reset before DAQ disconnection... -> switch OFF/ON the camera

12) One module off: 10.1.5.3

2018-12-04 Léa, Seiya, Shunsuke, Satoshi start up

1) Multiple this morning but with current busbar to 0

2) startup in the afternoon with normal version of ECC: ALL modules ON, All pixels ON

3) One module OFF: 10.1.6.28

4) ALL modules ON, All pixels ON

2018-12-04 Léa, Seiya, Shunsuke, Satoshi Data taking
With legacy DAQ to have 1024 ns window to work on BP synchronisation

- TP synchronised in all modules and all sending trigger. Some BP setting was updated since it was previously not done

- TP synchronised in all modules and all sending trigger with default BP setting from the ring distribution

- Random Trigger with EVB, all runs in 20181205

- Random trigger with writing: some TIB crashes where all the rates and the digital pedestal frequency go to 0 at one moment. It really depends on the trial, but sometimes we can reach 6500 Hz.

- Random trigger without writing: running for 13 minutes without any crash, reaching 6500 Hz camera rate with 1700 Hz of busy rate.

- Last try: random trigger with writing and no crash for more than 10 minutes, so we are lost now...

- 3 initialization of modules without monitoring and no problem of busy modules

2018-12-05 Léa + Jean Luc and NAdia from remote ECC version
- Installed the current version v32 + small update (delay applied when coming back from error state) with an executable they produced themselves. ECC to ready but current in the busbars at 0

->hard reset but then fan off so second hard reset

- Second try, still 0 in the bus bars -> hardreset and installation of the new ECC version v34 -> fan at 0 so new hard reset -> fan still at 0

- Went back to the old version v32 we are using since two months and fan ON but current in the busbars to 0 ->hardreset

- Fan ON. Tried to kill the ECC program and start it again -> Fan OFF. This test indicates that there is a communication issue between ECC and PDB, and that when the PDB loses the ECC connection, it goes to a safe mode

- Tried again to kill the ECC and start it again -> Fan ON, so the PDB got the ECC connection back without a hard reset. Then tried kill/start ECC again -> fan ON for a few seconds and then went to 0. Then tried kill/start ECC again -> fans still at 0 -> hard reset

- New version v32 that cuts the communication with the PDB -> fan ON after the hard reset. Two attempts of kill/start ECC and fan still ON, so this clearly indicates that Fan OFF comes from a communication problem between ECC and PDB.

- Now coming back to the old version we have been using for two months. Restart -> Fan OFF -> hard reset -> Fan OFF -> second hard reset -> fan ON

Two things learned:

- Fan OFF seems to be due to no communication between PDB and ECC. It is possible that the PDB doesn't connect to the ECC after the first hard reset and then goes into safe mode. This may be due to a connection timing problem of the ECC; to investigate, since this problem has appeared only in the last week... Moreover, after a hard reset this is normally the first connection from ECC to PDB, so the PDB should wait as long as it needs and not go into safe mode.

- The ECC executable created remotely seems to work for some functions, like controlling the mode, controlling the fans, etc., but the current in the bus bars, for example, always went to 0



2018-12-04 Léa, Seiya, Shunsuke, Satoshi Data taking
- Data taken with TP synchronised but trigger sent by only one module

- Data taken with TP only in central. Up to 15000 Hz, no busy since we are below the maximal writing speed. Then at 15 kHz, we have: 265 modules * 1344 octets * 15 kHz = 5.3 GB/s = 40 Gbit/s, which is what we expect with the four links at 10 Gbit/s.
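
The throughput estimate above can be checked with a few lines. This is a sketch under one assumption: the quoted 5.3 GB/s is only reproduced if the per-module event fragment is taken as 1344 bytes, so that value is used here. All constant names are illustrative.

```python
N_MODULES = 265
FRAGMENT_BYTES = 1344        # assumption: per-module event size in bytes
TRIGGER_RATE_HZ = 15_000
N_LINKS = 4
LINK_GBIT_PER_S = 10

def throughput_bytes_per_s(n_modules, fragment_bytes, rate_hz):
    """Data volume per second produced by the whole camera."""
    return n_modules * fragment_bytes * rate_hz

throughput = throughput_bytes_per_s(N_MODULES, FRAGMENT_BYTES, TRIGGER_RATE_HZ)
throughput_gbit = throughput * 8 / 1e9      # bytes/s -> Gbit/s
capacity_gbit = N_LINKS * LINK_GBIT_PER_S   # nominal capacity of the four links

print(f"{throughput / 1e9:.2f} GB/s = {throughput_gbit:.1f} Gbit/s "
      f"(link capacity {capacity_gbit} Gbit/s)")
```

With these numbers the camera output at 15 kHz comes out slightly above the nominal 40 Gbit/s of the four links, consistent with busy appearing around that rate.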

In /fefs/onsite/data/20181204

- Run0001.0000 to .00101: one TP in the central BP, triggering the whole camera

- Run0001.00103, .00104, .00105: TP in every modules synchronised, only module 265 sent trigger, new BP delay: it seems now we can see the pulse in the central module but we don't see it in other, to investigate...

- Run0001.0106 to .00107, .00108: old BP delay, TP in every modules synchronised, all modules sending trigger: no TP visible in the data, to investigate....

- Run0001.0109 to Run0001.0154: random trigger pedestal run. At the pedestal frequency of 3100 Hz, all pedestal frequency, collected rate, Camera rate and busy rate went to 0 at the same time.


2018-12-04 Léa, Seiya, Shunsuke, Satoshi Caco-ECC error/undefined state

- First power up, ECC went to error state whereas Caco was fine

- For one of the power-ups, in which we stayed 1h30 in ready mode, the ECC went to error and Caco to undefined (we don't know in which order). ECC to state 1 after disabling the _error_heart_bit, but then no current in the busbar -> hard reset. Then impossible to get Caco back into a normal mode even after ECC to state 1 -> kill and restart Cacolaucher...

- After the second hard reset, the fan was off so we had to do another hard reset... Then everything ok

2018-12-04 Léa, Seiya, Shunsuke, Satoshi TIB 255 issue fixed

- We forgot to call the reset() method between each initialization of the modules. Now it works fine.

2018-12-04 Léa, Seiya, Shunsuke, Satoshi start up

3)All modules ON and all pixel ON (data taking but then TIB state 255)

4)All modules ON and all pixel ON (data taking but then TIB state 255)

5)All modules ON and all pixel ON (Caco went to undefined and ECC to error state, we don't know in which order): for this start up we stayed one and a half hours in ready mode

7)All modules ON and all pixel ON


2018-12-04 Léa, Seiya, Shunsuke, Satoshi start up

1) SwitchOn() from Caco, went to state 2 normally, but ECC to state 4 -> hard reset

2) Caco and ECC fine but one module OFF: 10.1.6.28 -> OFF/ON again

6) ECC went to error state and Caco to state undefined (we don't know in which order...). Before the hard reset, we tried to go to state ready directly from the ECC, but the current in the bus bars was 0 -> hard reset. Then the fan speeds were at 0, so hard reset again. Then it worked fine

2018-12-03 Dirk CamerasToACTL v1.7

Installed on tcs03 and tcs04 from repository (https://cta.cppm.in2p3.fr/repo/x86_64/) and tested with/by Léa and Seiya.

2018-12-03 Léa,Seiya increase of the bus bars current

Due to the BP reset while the EVB was still connected to the modules, the current increased to 40. It is now known that the current increases when the DAQ is connected and no clock is distributed... We still don't know why; under study!

2018-12-03 Léa, Seiya Data Taking

- TP synchronised with legacy and EVB data

- Too many files created compared to the number of ZFW instances

2018-12-03 Léa,Seiya TIB/UCTS

- TIB went to state 255 even after reset, so we shut down and switched off the camera...

- Again, TIB went to 255 5 seconds after reaching state 5, all rates at 1444O.

- TIB went to 255 from state 4 directly. The impression after one day is that after 3 cycles of the TIB going to 5 and then reset, it goes to 255 and we have to switch off the camera

2018-12-03 Léa Fix UCTS configuration

- A virtual machine was using the IP 10.4.8.4 of the UCTS... this is why it was not possible to configure it.

- I changed it to the IP it should take in the future, 10.1.4.4, and now it works


2018-12-03 Léa Power up

- 1) All modules ON

- 2) All modules ON

- 3) All modules ON

- 4) All modules ON

- 5) All modules ON

- 6) Module 10.1.6.40 off

- 7) All modules ON

- 7) All modules OFF (I think due to the previous increase of the current in the bus bars)

- 8) All modules ON

2018-11-30 Léa TIB/UCTS

- TIB remains in state 2 even when UCTS is configured

2018-11-29 Léa Data Taking

- Procedure of TP synchronised in all the modules

- EVB configuration file in /home/dragon/EVB/20181130

- First try, EVB connected but one module busy: 10.1.6.10 -> initialisation of the modules again

- Second try, EVB connected, no module busy, but TIB remains at state 2 even if the UCTS configuration seems ok -> I switched the camera OFF and ON...

- Third try, same as before. Tried now to disconnect the cable from the WR switch to TCS07. Same problem: TIB remains at state 2 when the UCTS is configured.


2018-11-29 Léa Power up

- Fourth startup: ECC and Caco work well, all modules ON

- Fifth startup: ECC and Caco work well, all modules ON

- Sixth startup: ECC and Caco work well, all modules ON

2018-11-30 Léa Power up

- Second startup: ECC and Caco work well but module 10.1.6.28 was OFF so I started again

- Third startup: ECC and Caco work well, all modules ON. But human mistake (mine): ECC went to error state and then no current in the pulse bar -> hard reset

2018-11-30 Léa Power up
- From Caco, switchON(): it went to its state 2, then good communication with the ECC. Then GetCameraStandby(): the ECC was fine and went to state ready, all modules ON, but Caco was in an undefined state, so I did a sleep(); Caco recovered its ready state (state=3) and the ECC was still ready. I did a second call of the sleep() method to start from a clean environment and everything was fine: Caco went to state safe and the ECC also.
2018-11-29 Léa Power up

- After the second hard reset, fan ON and ECC went to ready from Caco. Day finished (-:

- All modules ON, configuration for TP synchronisation in all the modules seems fine. EVB segfault in GOTOREADY s

2018-11-29 Léa Fan Off

Following the hard reset after the ECC went to error, the fans were down again, as yesterday morning... It is a heart_beat problem between the PDB and ECC

- Second hard reset

2018-11-29 Léa Power up

1) Power up from Caco, powerON and GetCameraStandby(), ECC to Ready and all modules ON. Monitoring issue, so we go back to safe until it is fixed

2) From Caco, SwitchON, then ECC goes to error state 4 with _error_heart_bit to 4 without any clear reason. Caco was ok in state 2

3) Fixed the _error_heart_bit issue of the ECC and tried again to switch on from Caco. Same issue: Caco state fine but ECC went to error state 4 due to _error_heart_beat at true.

4) Fixed _error_heart_beat and tried to switch on directly from the ECC. Works fine: ECC went to Ready but no current in the pulse bar, only the 4 one had current.

I did a hardreset

2018-11-29 Léa WR switch

RJ45 port installed on port 9 of the WR switch for the connection to tcs07

2018-11-28 Daniel K., Léa test of cluscolauncher

We tried the connection between Caco and Clusco: all fine. The current monitoring was not active because not the same files were updated. Will be fixed soon and then tested again.

2018-11-28 Daniel K., Léa fans stopped

This morning around 8:45am the fans stopped running before we arrived on site. When we arrived we noticed the ECC was still in safe state (we expected error state, but this was not the case). We checked the rest of the ECC variables and everything looked fine. Using a multimeter we checked that the 400 was properly arriving at the PDB inside the camera. We contacted the ECC experts, who asked for screenshots of the ECC datapoints regarding the PDB for later evaluation of the problem. Then we hard reset the ECC and the fans started just fine.


2018-11-27 Daniel K., Cristobal (remote) fix of a compilation problem for ClusCo

Small fix for compilation, tested and merged to the master branch. Compilation on site works again.

2018-11-27 Daniel K. test of new ECC version

Follow-up and more extensive tests of the control of the individual intelligent relays with the new version of the ECC. No improvement. A detailed description of the tests performed will be emailed to the experts. As a consequence the old ECC version was reinstalled for now.

2018-11-27 Daniel K., Otger (remote) installation of new version of libcluster

Following the successful test of last week, the fixes of libcluster were merged into the master branch and installed on site.

2018-11-26 Daniel K. test of new ECC version

After some small fixes of the data point "Error description" and of the control of the fans, the new version of the ECC was tested. The control of the individual intelligent relays (main update with this version) was unstable. As a consequence the old ECC version was reinstalled for now.

2018-11-23 Daniel K., Yuki, Seiya too high temperature

The status of ECC monitoring went from "green" to "red" around 16:30. The change in the temperature we are monitoring was quite different from usual. It may be related to the water pressure of the chiller. It is above 1 and stable as usual, but it was too low in the morning and rose during the day.

Media:bptemp.JPG Media:Tempbad.JPG Media:ECCTemp.png Media:CameraPressure14-16.png

2018-11-23 Daniel K., Yuki, Seiya some network interface of osaka sometimes not running

some network interface of the osaka server doesn't start running at first every day... We activated p1p2 manually.


2018-11-23 Daniel K., Yuki, Seiya bad behaviour of mezzanine

After configuration of the modules (init7 & pulse_injection_all), three modules showed bad mezzanine behaviour.

  • mod115: L0 & L1 trigger rate was 0; there had been no problem until yesterday.
  • mod167: L0 & L1 trigger rate was 0; there had been no problem until yesterday.
  • mod226: L0 trigger rate was 65535, but the L1 trigger rate had no problem. So there may be a problem only on the line for IPR. This has happened several times this week.
2018-11-23 Daniel K., Yuki, Seiya take data for TP synchronization

We took data with ClusCo monitoring for the test pulse synchronization.

1) 300Hz, ROI=1024, trigger was generated by mod265, 3000events

  • This was almost all the same condition as yesterday, the difference was only that ClusCo monitoring was being done.
  • The file is /mnt/cs1/store/DragonDaqData/Data20181123/TP300HzTrigMod265RD1024Delay3028_RD1024_FEB...

2) 300Hz, ROI=40, trigger was generated by all modules, 3000events

  • During the operation I made a mistake in the DAQ setting, so I restarted after the TIB state had gone to 5 (so the PPS and 10 MHz counters should be synchronized); this result may therefore be worse
  • The file is /mnt/cs1/store/DragonDaqData/Data20181123/TP300HzTrigModAllRD40Delay3528_again_RD40_FEB...


2018-11-23 Daniel K., Yuki, Seiya validation test of ClusCo

- the strange value of humidity

- SiTCP reset

  • This function worked well, but due to the bug in DragonFPGA it worked with only half of the camera and took too much time(~5min) to finish this command.
  • Seiya will fix this problem from DragonFW side.
2018-11-23 Daniel K., Yuki, Seiya new version of ECC

We implemented the new version of ECC. After reboot of ECC the fan didn't start working.

  • We did a hardware reset (13:40). But the problem still remained and the ECC state went to 4 (error state).
  • We changed the default settings of the ECC: 1) T_safe_min -> 5, 2) disable light sensor. After a hardware reset (13:50), the situation was the same...
  • Moreover we changed the default value of T_safe_min to 2. After a hardware reset (14:00), the result was the same (error state and the fan still stopped)

As a result we decided to replace it with the current version of the ECC. After a reboot of the ECC, all functions worked well.

2018-11-23 Daniel K., Yuki, Seiya monitoring plots were not updated

ECC monitoring plots were not updated after 9:30am. We could still get the various values (temperature etc.) in the OPCUA client; only the monitoring plots were not updated. After the reboot of the ECC for the version update, the monitoring plots started to be updated again.

2018-11-22 Yuki, Seiya take data for TP synchronization study

I discussed with Taka, then I tried to take data as below;

  • set test pulse frequency for external reference clock
  • after that start TP synchronization

We took data with the following conditions and managed to synchronize test pulse at all modules finally.

0)

  • I wanted to take data with 300 Hz at first, but L1_local trigger rate was ~22Hz after initialization even though we set 300Hz as TP frequency.
  • So I decided to take data without changing test pulse frequency from the default one(444 444 counts for 10MHz = 22Hz)

1) 22Hz, ROI=1024, trigger was generated by mod265, 1000events

  • During initialization we didn't change test pulse frequency, so TP frequency was 22 Hz at that time.
  • The data is in /mnt/cs1/store/DragonDaqData/Data20181122/Trigger22HzRD1024...

2) 300Hz, ROI=1024, trigger was generated by mod1, 3000events

  • Before TP synchronization, we changed the test pulse frequency with "SET_TP_FREQUENCY 0 Off 33333" from ClusCo instead of "SET_TP_FREQUENCY 0 On 300". Then L1_local trigger was 300Hz with external reference clock.
  • The data is in /mnt/cs1/store/DragonDaqData/Data20181122/Trigger300HzRD1024...
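
The relation between the counter values quoted above and the resulting test pulse rate can be sketched as below. This assumes, from the numbers in this entry only, that the value passed with "Off" is a period expressed in counts of the 10 MHz reference clock; the function names are illustrative and not ClusCo commands.

```python
REF_CLOCK_HZ = 10_000_000  # 10 MHz external reference clock

def tp_frequency_hz(period_counts, clock_hz=REF_CLOCK_HZ):
    """Test pulse frequency for a period given in reference-clock counts."""
    return clock_hz / period_counts

def period_counts_for(frequency_hz, clock_hz=REF_CLOCK_HZ):
    """Nearest integer number of clock counts for a wanted TP frequency."""
    return round(clock_hz / frequency_hz)

print(tp_frequency_hz(444_444))   # default setting, about 22.5 Hz
print(tp_frequency_hz(33_333))    # value used in the log, about 300 Hz
print(period_counts_for(300))     # counts needed for 300 Hz
```

This reproduces both the ~22 Hz seen with the default 444 444 counts and the 300 Hz obtained with 33333 counts.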

Media:TPSynchNov22nd22Hz.gif Media:TPSynchNov22nd300Hz.gif

2018-11-21 Seiya home directory of osaka server was full

The home directory of osaka (/home) became full today.

 
 Osaka ~ > df -h
 Filesystem                   Size  Used Avail Use% Mounted on
 /dev/mapper/scientific-root   50G   22G   29G  43% /
 devtmpfs                     252G     0  252G   0% /dev
 tmpfs                        252G     0  252G   0% /dev/shm
 tmpfs                        252G   50M  252G   1% /run
 tmpfs                        252G     0  252G   0% /sys/fs/cgroup
 /dev/sdb                      15T  8.5T  5.3T  62% /mnt/cs1
 /dev/sda1                    497M  272M  226M  55% /boot
 /dev/mapper/scientific-home  504G  504G   20K 100% /home
 tmpfs                         51G   12K   51G   1% /run/user/42
 tmpfs                         51G  4.0K   51G   1% /run/user/1000
 tmpfs                         51G     0   51G   0% /run/user/1001
 tmpfs                         51G     0   51G   0% /run/user/1002

Almost all of the files (~80%) are data taken by LegacyDAQ for the tests, located in /home/dragon/IACMiniCamSetup/DragonDaqM

Osaka DragonDaqM > du -sh .
417G

So I moved the data taken by LegacyDAQ to /mnt/cs1/store/DragonDaqData temporarily. (We could transfer those data to the Lustre system (/fefs/ on tcs) later.)
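
To catch this situation earlier, the `df -h` output can be scanned for nearly full filesystems. The following is a minimal sketch using a few lines copied from the listing in this entry; the 90% threshold and the helper name `nearly_full` are arbitrary choices for illustration.

```python
# A few lines copied from the df -h listing in this entry
DF_OUTPUT = """\
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/scientific-root   50G   22G   29G  43% /
/dev/sdb                      15T  8.5T  5.3T  62% /mnt/cs1
/dev/sda1                    497M  272M  226M  55% /boot
/dev/mapper/scientific-home  504G  504G   20K 100% /home
"""

def nearly_full(df_text, threshold_percent=90):
    """Return (mount point, use%) for filesystems at or above the threshold."""
    result = []
    for line in df_text.splitlines()[1:]:      # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue                           # ignore malformed lines
        use_percent = int(fields[4].rstrip("%"))
        if use_percent >= threshold_percent:
            result.append((fields[5], use_percent))
    return result

print(nearly_full(DF_OUTPUT))
```

The sketch does not handle mount points containing spaces, which do not occur in the listing above.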

2018-11-21 Daniel K., Seiya take data with LegacyDAQ for EVB tests

Julien wants to use raw data of the full camera for EVB debug tests. We took data with LegacyDAQ using a random trigger (300 Hz), i.e. the digital pedestal trigger generated by the TIB. These files are in /mnt/cs1/store/DragonDaqData/Data20181121.

I wanted to take 30 min of data (300 Hz*(60*30)=540,000 events), but the disk of the osaka server became full during the test. The size of each file is ~219MB, which is equivalent to ~168,000 events and ~10 min of data.
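
The event counts quoted above follow directly from the trigger rate. A short check, assuming a constant 300 Hz rate over the whole run (variable names are illustrative):

```python
TRIGGER_RATE_HZ = 300

# Planned run: 30 minutes at 300 Hz
planned_events = TRIGGER_RATE_HZ * 30 * 60

# Recorded before the disk filled up: ~168,000 events
recorded_events = 168_000
recorded_minutes = recorded_events / TRIGGER_RATE_HZ / 60

print(planned_events)               # events expected for the 30 min run
print(round(recorded_minutes, 1))   # minutes actually covered
```

So the recorded ~168,000 events correspond to a bit more than 9 minutes, in line with the "~10 min" quoted in the entry.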


2018-11-21 Seiya how to run again the network interface

Some network interfaces of the osaka server sometimes stop running.

p2p2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.1.6.192  netmask 255.255.255.128  broadcast 10.1.6.255
        inet6 fe80::a236:9fff:fef0:ccd6  prefixlen 64  scopeid 0x20<link>
        ether a0:36:9f:f0:cc:d6  txqueuelen 1000  (Ethernet)
        RX packets 68478703  bytes 95858361688 (89.2 GiB)
        RX errors 1  dropped 9  overruns 0  frame 1
        TX packets 30112278  bytes 1622848602 (1.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

When this happens, restart the interface with:

  • sudo ifconfig <name of interface> down
  • sudo ifconfig <name of interface> up
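
In the ifconfig output above, the flags list UP, BROADCAST and MULTICAST but not RUNNING, which is how a stopped interface shows up. The numeric value can be decoded as sketched below, using the standard Linux interface flag bits (only the four relevant ones are listed); the helper names are illustrative.

```python
# Subset of the Linux interface flag bits (IFF_* constants)
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x40: "RUNNING",
    0x1000: "MULTICAST",
}

def decode_flags(value):
    """Return the names of the known flag bits set in 'value'."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]

def is_running(value):
    """True if the RUNNING bit is set in the flags value."""
    return bool(value & 0x40)

print(decode_flags(4099), is_running(4099))   # value seen in the log
print(decode_flags(4163), is_running(4163))   # same interface with RUNNING set
```

A check like `is_running` on the flags value could be used to decide when the down/up sequence above is needed.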
2018-11-21 Otger, Daniel K., Seiya ECC went to error state

I used ClusCo@tcs01 for the monitoring; all the plots except "Amp. Temp" were indeed updated. After that, I ran init7 from ClusCo@cacooperator and waited for the "Amp. Temp" plot to update. At that point ClusCo@tcs01 showed a timeout, and I realized that I could not ping these modules, that the relay current had gone to 0 and that the ECC had gone to the error state (4). I powered up again and the ECC status went to 2 (ready) as usual, but the relay current was still 0.

Taka explained why relay current was still 0 as below;

When ECC goes to Error state, relay modules are also in a strange state. You need to reset relay modules as well.
 However, even if you go to "safe" state in ECC, relays are still powered (not bus bars, but relay modules). That means, "safe" does not reset relays.

Lea explained why the ECC went to error as follows:

Another possibility is that you lost the slow control connection for a few seconds and then got it back without realising. If the modules are ON and the slow control connection is lost, the ECC goes to error and the relay current remains at 0, as Taka explained.

We did a hardware reset three times (15:00, 15:50, 16:45), but the situation stayed the same. This ECC error state seems to be caused by the loss of the CaCo heartbeat. We managed without CaCo (using the ECC directly) for data taking today.


1228269 [PublishTask-com.prosysopc.ua.client.UaClient@166f6c4f] ERROR com.prosysopc.ua.client.UaClient - Exception in ServerStatusListener
java.lang.ClassCastException: cat.ifae.cta.opcua.dataaccess.uaobjects.OPCUAVariable$DataInformation cannot be cast to java.lang.Integer
	at cat.ifae.cta.cameracontrol.server.base.clients.ecc.OPCUAECCControl$ECCVariableStatus.update(OPCUAECCControl.java:25)
	at java.util.Observable.notifyObservers(Observable.java:159)
	at cat.ifae.cta.opcua.dataaccess.basicobjects.BasicCallbackVariable$ObservableVariable.setValue(BasicCallbackVariable.java:36)
	at cat.ifae.cta.opcua.dataaccess.uaobjects.OPCUAAssembly._newStateWarn(OPCUAAssembly.java:533)
	at cat.ifae.cta.opcua.dataaccess.uaobjects.OPCUAAssembly.consumeMessage(OPCUAAssembly.java:526)
	at cat.ifae.cta.opcua.dataaccess.uaobjects.OPCUAServerStatusListener.statusChanged(OPCUAServerStatusListener.java:59)
	at cat.ifae.cta.opcua.dataaccess.uaobjects.OPCUAServerStatusListener.onStateChange(OPCUAServerStatusListener.java:33)
	at com.prosysopc.ua.client.UaClient.a(Unknown Source)
	at com.prosysopc.ua.client.UaClient.updateServerStatus(Unknown Source)
	at com.prosysopc.ua.client.UaClient$a.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:745)
1228371 [PublishTask-com.prosysopc.ua.client.UaClient@166f6c4f] WARN com.prosysopc.ua.client.Subscription - Server sent a previously acknowledged sequence number 0 for Subscription 47786
1228372 [PublishTask-com.prosysopc.ua.client.UaClient@166f6c4f] INFO org.opcfoundation.ua.transport.tcp.io.SecureChannelTcp - 47856 Closed
1228372 [PublishTask-com.prosysopc.ua.client.UaClient@166f6c4f] INFO org.opcfoundation.ua.transport.tcp.io.TcpConnection - /10.1.4.66:4841 Closed
1228373 [TcpConnection/Read] INFO org.opcfoundation.ua.transport.tcp.io.TcpConnection - /10.1.4.66:4841 Closed (expected)

2018-11-21 Otger, Daniel K., Seiya dhcpd server for TIB restart

DHCPd server for TIB stopped due to the shutdown of tcs01 yesterday, so we activated the server as below,

[ifae@tcs01 ~]$ sudo service dhcpd status
Redirecting to /bin/systemctl status  dhcpd.service
● dhcpd.service - DHCPv4 Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/dhcpd.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:dhcpd(8)
           man:dhcpd.conf(5)
[ifae@tcs01 ~]$ sudo service dhcpd start
Redirecting to /bin/systemctl start  dhcpd.service
[ifae@tcs01 ~]$ sudo service dhcpd status
Redirecting to /bin/systemctl status  dhcpd.service
● dhcpd.service - DHCPv4 Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/dhcpd.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-11-21 09:14:08 WET; 2s ago
     Docs: man:dhcpd(8)
           man:dhcpd.conf(5)
 Main PID: 453 (dhcpd)
   Status: "Dispatching packets..."
   CGroup: /system.slice/dhcpd.service
           └─453 /usr/sbin/dhcpd -f -cf /etc/dhcp/dhcpd.conf -user dhcpd -group dhcpd --no-pid

Nov 21 09:14:08 tcs01 dhcpd[453]: All rights reserved.
Nov 21 09:14:08 tcs01 dhcpd[453]: For info, please visit https://www.isc.org/software/dhcp/
Nov 21 09:14:08 tcs01 dhcpd[453]: Not searching LDAP since ldap-server, ldap-port and ldap-base-dn were not specified in...ig file
Nov 21 09:14:08 tcs01 dhcpd[453]: Wrote 0 deleted host decls to leases file.
Nov 21 09:14:08 tcs01 dhcpd[453]: Wrote 0 new dynamic host decls to leases file.
Nov 21 09:14:08 tcs01 dhcpd[453]: Wrote 4 leases to leases file.
Nov 21 09:14:08 tcs01 dhcpd[453]: Listening on LPF/ens1f0/a0:36:9f:eb:51:34/10.1.0.0/16
Nov 21 09:14:08 tcs01 dhcpd[453]: Sending on   LPF/ens1f0/a0:36:9f:eb:51:34/10.1.0.0/16
Nov 21 09:14:08 tcs01 dhcpd[453]: Sending on   Socket/fallback/fallback-net
Nov 21 09:14:08 tcs01 systemd[1]: Started DHCPv4 Server Daemon.
Hint: Some lines were ellipsized, use -l to show in full.
2018-11-20 Daniel K., Seiya ClusCo monitoring restart The ClusCo monitoring map was not updated after the shutdown of tcs01. We contacted Carlos and Carlos and they restarted it. Now it works.
2018-11-20 TCS01 shutdown One of the memory cards of tcs01 is damaged and will be replaced by an authorized technician today starting 9am La Palma time. We will shut down the server before that and start it up again once the card is exchanged.

2018-11-19 Seiya, Daniel K. Cannot connect to some modules With configuration 2 (100 Hz, ROI=1024) we could not connect to some modules (IP 10.1.6.148-173) and they stayed busy (busy state=1).

After re-initialization, this problem disappeared.

2018-11-19 Seiya, Daniel K. Test pulse data with DragonDaqM(LegacyDAQ) We took test pulse data under the following conditions:

1) 300Hz, ROI=1024, trigger was generated by mod265 (for reproducing the problem)

  • File name is "TP300HzTrigMod265RD1024Delay3028RD1024_***"

2) 100Hz, ROI=1024, trigger was generated by mod265 (suggested by Taka)

  • File name is "TP100HzTrigMod265RD1024Delay3028RD1024_***"

3) 300Hz, ROI=1024, trigger was generated by mod265 (suggested by Taka)

  • I sent each command by hand and checked the registers (register8 & scalar) after PPS disable.
  • It seems PPS disable worked well.
  • File name is "TP300HzTrigMod265RD1024Delay3028_CHECKEDRD1024***".

4) 100Hz, ROI=1024, trigger was generated by mod265

  • I set test pulse frequency before PPS synchronization.
  • File name is "TP300HzTrigMod265RD1024Delay3028_TPconfigSynchroRD1024***".
2018-11-19 Seiya, Daniel K. 24V supply problem We powered up the camera with the usual procedure, but only one busbar (the 4th one) worked. We tried the procedure again with the same result (only the 4th busbar worked). So we switched the camera breaker off and on around 15:00. The fan did not start at first, so I cycled the breaker again and the fan started to work. After that we could power up the whole camera.


2018-11-13 Mitsunari, Daniel K. Software deployment All the setup (except the uaexpert for ecc, tib and ucts) to control, monitor and take data with the camera was moved to the LST_CALP iMac (+ 1 screen) of the commissioning container.
2018-11-12 Mitsunari, Daniel K. Test pulse data with DragonDaqM Test pulse data were taken by DragonDaqM triggering by the module 264, which did not have a test pulse on 11-09.
2018-11-12 Mitsunari, Satoshi Connect tcs07 to White Rabbit The WR switch management port and the Management switch (mgtsw2 port 42) are connected by an Ethernet cable. Mitsunari tried to change the IP of the WR switch to 10.200.10.140, which is in VLAN 1001, but failed. The WR interface file dot-config was not found, contrary to what the WR manual says. Even when we created the file ourselves, it was lost after rebooting.
2018-11-12 Mitsunari, Daniel K., Carlos Diaz Software deployment Installing and compiling caco, cacoconsole, cacogui on tcs01 under /home/ifae/development. Compiling /home/ifae/clusco on tcs01 and adapting monitoring from CIEMAT. Setting up one additional screen for monitoring to the imac (monitoring computer), adding two forms (one for powering on the camera, one for shutting it down) to be filled by the operators.
2018-11-09 Mitsunari, Daniel K. Test pulse data with EVB Test pulse data were taken by EVB waiting PPS reaching all modules for 2 s. For the read depth 40, DAQ seemed to be successful. For the read depth 1024, however, the data were not stored.
2018-11-09 Mitsunari, Daniel K. Test pulse data with DragonDaqM Test pulse data were taken by DragonDaqM waiting PPS reaching all modules for 2 s. The waveform data of six modules besides the central one were checked, and five modules had test pulses though the other module (No. 0) did not.
2018-11-03 Mitsunari Test pulse injection timing Test pulse data were taken with an L1 threshold at which all modules can produce a camera trigger. According to the data, the timing of the test pulse injection is distributed over ~70 ms. Test pulse injection rate: 1 Hz, Read depth: 40, Sampling speed: 1 GHz
2018-11-03 Mitsunari Test pulse injection timing Test pulse data were taken with an L1 threshold at which all modules can produce a camera trigger. According to the data, the timing of the test pulse injection is distributed over ~70 ms. Test pulse injection rate: 1 Hz, Read depth: 1, Sampling speed: 5 GHz
2018-11-02 Mitsunari Test pulse data with EVB Data for investigating the test pulse issue were taken with EVB, but the acquisition seems to have failed. This should be inspected.

Pulse rate: 300Hz, Read depth: 1024, Event number: ~9000, /fefs/onsite/data/20181102

2018-11-01 Mitsunari Large data with random trigger Data of ~10^5 events were taken with the pedestal random trigger, EVB, a read depth of 40 slices, and a delay of 3528 ns. The data are stored in /fefs/onsite/data/20181101.
  • 1kHz: Run 0001.0275-0001.0288
  • 2kHz: Run 0001.0289-0001.0315
2018-11-01 Mitsunari Avoiding TIB State 255 The TIB state can go to 5 without a reset at state 255 by a combination of resetting the TIB at state 0 and configuring the Dragons without resetting the BPs.
  • ECC->SetMode(2)
  • TIB->Reset()
  • TIB->DisablePPS()
  • TIB->ResetRun()
  • ClusCo->Main->@config/init7_woBPreset
  • UCTS->XMLConfiguration
  • UCTS->Start()
  • TIB->EnableTrigger()

Mitsunari repeated this procedure four times and it succeeded every time. DAQ also seemed to be successful in the last trial. (In the first three trials, DAQ failed for another reason.)

2018-10-31 Mitsunari TIB State 255 problem init7 without BP reset at the beginning was tested. The first trial failed, i.e. the state ended up at 255. However, in the second trial, when the TIB was reset just after turning on the camera, the TIB state went directly to 5. This behavior should be confirmed later.
2018-10-31 Mitsunari Check for test pulse synchronization It should be confirmed whether the TenMHz counter value is identical among the modules for each test pulse event. Data for the check were taken by DragonDaqM at 300 Hz. The L1 threshold was set so that only the central module sent triggers. The data were stored in /home/dragon/IACMiniCamSetUp/DragonDaqM/Data20181031. The TenMHz counter appeared to be synchronized, but this should be confirmed.
2018-10-31 Oscar, Mitsunari PDB Fixation

PDB fixation: the front plate is now fixed through a screw and nut attached to the back plate with a metal-bonding compound (Pattex Nural 21), plus an additional nut to fix the front plate.

We have started Modules twice with one hour break in between. Both times all Dragons and BP went up.

2018-10-30 Taka, Mitsunari, Julien, Dirk Random trigger runs with EVB

Two runs (#30, #31) taken at various trigger rates as documented in Run Catalog and Slack.

Corrected pixel map implemented (spiral numbering).


2018-10-29 Oscar, Taka, Mitsunari Power up

The Dragon with IP 10.1.6.28 (3rd column counting from the left as seen from outside, 5th module from below) was moved to the busbar powered by relay 1 instead of relay 0. In exchange, the module in the 4th column, 5th from below, was moved to relay 0 instead of relay 1. The camera was powered up only once and all modules and BPs went up.


2018-10-27 Taka, Mitsunari Random Trigger

We took random trigger data. Following the instructions from Lea, the random trigger could be produced easily. With DragonDaqM:

300 Hz injection -> 300 Hz DAQ rate

1 kHz -> 783 Hz

3 kHz -> 1162 Hz

6.5 kHz -> 1303 Hz
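To make the saturation of DragonDaqM easier to see, the four rate pairs listed above can be turned into the fraction of injected triggers that was actually recorded. The sketch below only reuses the numbers from this entry; the variable and function names are illustrative.

```python
# Injected random-trigger rate (Hz) -> DAQ rate observed with DragonDaqM (Hz),
# copied from the log entry.
measured_rates_hz = {300: 300, 1000: 783, 3000: 1162, 6500: 1303}


def recorded_fraction(injected_hz, daq_hz):
    """Fraction of injected triggers that ended up in the data."""
    return daq_hz / injected_hz


for injected_hz, daq_hz in measured_rates_hz.items():
    fraction = recorded_fraction(injected_hz, daq_hz)
    print(f"{injected_hz:5d} Hz injected -> {daq_hz:5d} Hz recorded ({fraction:.1%})")
```

The recorded fraction drops from 100% at 300 Hz to about 20% at 6.5 kHz.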

With EVB, we first tried 6.5 kHz. EVB then crashed because its buffer was full, but the busy state of the modules was 03, which means that EVB was still connected and the modules were busy. To recover from this state, we had to reboot the Dragons. A few minutes later, Carlos Diaz called us: the current consumption at the bus bars was ~10 A higher than usual, 35 A after rebooting the Dragons instead of the normal 25-27 A. We shut down the 24V. After 10 min or so, Carlos allowed us to restart. All Dragons could be reached from cacoserver, but not from Osaka. ip link set p*p* down/up didn't help. We rebooted Osaka. Then Osaka could ping all (but one) modules. However, EVB didn't work. Later we learned from Dirk and Julien that we had to run

sudo modprobe -r ixgbe; sudo modprobe ixgbe


2018-10-27 Oscar, Laia , Taka, Mitsunari Power up

After checking that the Dragon and BP regulators can withstand input voltages above 30 V, we increased the voltage provided by the Power Supplies to 27.5V (the same for the 8 Power Supplies).

With this configuration, the voltage rises to 20.3 V while ramping up and then only drops to 19.8 V for about 1 ms. This should be completely fine for the Dragons.

We power up the camera with the ECC 10 times. All BP went up all times. Only one Dragon (always the same) does not power up the first time after a ~1 hour break (tried two times), after this first power up all Dragons power up.

2018-10-26 Taka, Mitsunari TIB state machine.

We tried to solve the "State 255" problem in TIB. Luis Angel suggested to configure modules at state 2. We followed his instruction, but we reached state 255. So we tried modules configuration at state 0. Same result. We tried module configuration at state 4, resulting in the same state 255.

We also tried to place the test pulse at the center of the readout window, but we could not see the test pulse at all. The delay setting in the TIB or backplane is not correct.


2018-10-26 Oscar, Laia , Taka, Mitsunari Power up

The drop in the voltage is due to a current limit in the circuitry of the relay. Increasing the voltage of the power supplies should raise the minimum of the voltage dip so that it does not reach 18V.

We measure again the transients for relay 0 with Power Supply at 24.98 V as reference. We increase the voltage of Power Supplies to 25.25 V, the dip is about 100 mV higher.

2018-10-25 Taka, Mitsunari, Yusuke Event Mixing

We understood the origin of the event mixing. It is due to the slow control command "Dragon - Start" being sent after "Enable Trigger" in TIB. "Enable Trigger" should have come after "Dragon Start". This is actually dangerous: such a mistake will be noticed only during analysis.
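Since a wrong command order is only noticed during analysis, a run script could check the order before anything is sent. A minimal Python sketch, assuming the commands of a run are available as an ordered list of labels; the labels and the function name are illustrative only and do not correspond to an existing ClusCo or TIB interface.

```python
def trigger_enabled_too_early(commands):
    """Return True if 'Enable Trigger' is issued before 'Dragon Start'.

    `commands` is the ordered list of slow control command labels of one run.
    Both labels must be present, otherwise a ValueError is raised.
    """
    dragon_start = commands.index("Dragon Start")
    enable_trigger = commands.index("Enable Trigger")
    return enable_trigger < dragon_start


bad_order = ["Configure", "Enable Trigger", "Dragon Start"]
good_order = ["Configure", "Dragon Start", "Enable Trigger"]
print(trigger_enabled_too_early(bad_order))   # True
print(trigger_enabled_too_early(good_order))  # False
```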


2018-10-25 Oscar , Laia Power up

No water was found inside the camera. We measured the voltage at the output of the Redundancy modules: 24.98 V. We connected a current sensor between the master bus bar and relay 0. We powered up relay 0 and measured the transient for both current and voltage:

  • Voltage shows a drop of around 1.5 V once it arrives at 20V, which is recovered afterwards (4 ms); it then keeps increasing until about 24.5 V.
  • Current increases steadily, with a small slope change when the drop in the voltage happens. It also shows a drop of about 30% when the voltage reaches 24.5 V, from which it recovers after about 80 ms.

The voltage reduction for 4 ms brings the voltage very close to 18V, and it may sometimes go slightly below.

The same is observed in relay 1.

2018-10-21 Taka, Mitsunari, Yusuke Timing Calibration.

We tried to see the test pulse in the center of window. But we did not succeed. DAQ was with EthDisp from Taka's macbook through slow control network. We need to understand the delay in TIB and backplane. Since it was already 5:50 pm, (though we announced that we use camera until 5:00 pm) we had to shutdown. We kept 230 and 400V on, chiller on, only 24V off.

2018-10-21 Taka, Mitsunari, Yusuke Event Mixing Test

To confirm the event mixing problem again, we took data with the LegacyDaq. After init7.uic, we injected the test pulse in the central module at 300 Hz. The TIB could see the rate properly. We took 20000 events. After that, we tried to take data with EVB, but it was not successful: EVB could not connect to all modules. We had the same problem a few times in a row. One of the reasons was dead ports in Osaka. Sometimes, ports in Osaka go to sleep for no obvious reason. This is actually a critical problem that we need to investigate further. Finally we gave up taking data with EVB.

2018-10-21 Taka, Mitsunari, Yusuke TIB/UCTS study

After power up, we tried to initialize the TIB, but the state didn't reach "5". After state 4, if we enabled the trigger, the state went to 255. We knew that the RJ45 cable on the WR had been damaged by the rack door, so we replaced it with a new cable. We also used a different port on the WR (port 8->5), and we reset the TIB. Then, with the standard procedure, the state reached 5. We were happy. Just to be sure, we changed back to the damaged cable and retried. The state was again 5. So the cable was not the reason; the "Reset" of the TIB was the key.

After initializing PMT modules, TIB didn't work well. It didn't send back the trigger. Since temperature was too high (BP 35 deg.) We had to switch off the 24V. During this break, we changed the WR port from 5 to 8.

After power up, we repeated the procedure. Again, TIB didn't send back the trigger. But, TIB reset helped. So, currently, startup recipe is that 0->1->2->3->4->255->TIB Reset->0->1->2->3->4->5->configure modules -> TIB Reset -> 0 -> 1 ->2 ->3 ->4 ->5.


2018-10-21 Taka, Mitsunari, Yusuke Restart the Camera.

Before powering up 400V for the first time since last Tuesday, we examined the camera visually. The camera is properly parked. There was water condensation on the camera body. The platform is not perfectly closed: there was a 2 cm gap between left and right, but it is not dangerous for us. At 11:45, we applied 400 V by switching on the breaker at the Drive container. After 15 min of stabilization, we started 24V from ECC (state ready). Then we realized that TIB and UCTS did not respond to ping. This was because dhcpd on tcs01 was dead. Also, uctsd on Osaka was dead. We restarted dhcpd and uctsd and switched the 24V off and on. Then TIB and UCTS could be booted.


2018-10-15 Dirk UCTSd dead.
● uctsd.service - Execute the UCTS OPC-UA server
   Loaded: loaded (/etc/systemd/system/uctsd.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Di 2018-10-16 13:22:28 WEST; 2h 59min ago
  Process: 152844 ExecStart=/home/dragon/ucm_temp/ucts_opcua_server.sh (code=exited, status=134)
 Main PID: 152844 (code=exited, status=134)

Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]: (MOS) : Info : 2018-10-13.12:30:47 : Connected to Server : opc.tcp://osaka:48010
Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]: 
Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]: (MOS) : Info : 2018-10-13.12:30:47 : Verification of MOS version with lappweb
Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]: 
Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]: ********************
Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]:  Press CTRL-C to shutdown server
Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]: /home/dragon/ucm_temp/ucts_opcua_server.sh: line 9: 152847 Aborted (core dumped) ./MOS_Device -d /MOS/plugins/Plugin_UCTS/UCTS.xml
Okt 16 13:22:28 osaka systemd[1]: uctsd.service: main process exited, code=exited, status=134/n/a
Okt 16 13:22:28 osaka systemd[1]: Unit uctsd.service entered failed state.
Okt 16 13:22:28 osaka systemd[1]: uctsd.service failed.

Restarted.

2018-10-15 Taka MOXA Switch connected SLOW control connection intact. Drive network can be used from remote tomorrow.
2018-10-16 Léa, Dirk, Julien, Taka, Seiya, Mitsunari Modules disconnection

- Twice today, after about 25 minutes, around 15 modules were no longer powered although the ECC was in state 2 and there was current in the pulse bar. At the next switch ON, all modules were powered.

2018-10-16 Léa, Dirk, Julien, Taka, Seiya, Mitsunari uaexpert disconnection

- Again, we lost uaexpert, which was completely stuck. To get the monitoring back, the DataLogger files are now written in /home/cacooperator/CoolingSystem/20181016_003 and 20181016_004

2018-10-16 Léa, Dirk, Julien, Taka, Seiya, Mitsunari TIB issues

- TIB goes from state 0 to 4, but when we enable the trigger it goes to state 255, as does the alarms vector

2018-10-16 Léa, Dirk, Julien, Taka, Seiya, Mitsunari Small run summary

- 7 GotoSafe and GoToReady for the ECC due to too high temperatures so switch ON/Off of the Modules/BP:

1) All module ON, 2 BPs OFF associated to module 10.1.6.12 and 10.1.6.27

2) All module ON, 2 BPs OFF associated to module 10.1.6.24 and 10.1.6.27

3) All module ON, 1 BP OFF associated to module 10.1.7.171

4) All modules and BPs ON

5) All modules and BPs ON

6) All modules and BPs ON

7) All modules and BPs ON

8) All modules and BPs ON

9) All modules and BPs ON

10) All modules and BPs ON

11) All module ON, 2 BPs OFF associated to module 10.1.7.147 and 10.1.7.149

12) All module ON, 2 BPs OFF associated to module 10.1.7.147 and 10.1.7.149

13) ALL modules ON

14) ALL modules ON


2018-10-15 Dirk Charging Walkie-Talkies with our private mini-USB adapters, while waiting for the real charger to reappear Alternative: $8.99 on Amazon


2018-10-15 Léa, Dirk, Julien, Taka, Seiya, Mitsunari Slow control and uaexpert disconnection
- Slow control connection lost in ready mode, so no more current in the pulse bar. After GoToSafe and GoToReady there was still no current, with a negative value in the pulse bar. We had to switch the 233 and 400 V off and on

- We lost uaexpert, which was completely stuck. To get the monitoring back, the DataLogger files are now written in /home/cacooperator/CoolingSystem/20181015_005 and 20181015_006

2018-10-15 Léa, Dirk, Julien, Taka, Seiya, Mitsunari Small run summary

- 7 GotoSafe and GoToReady for the ECC due to too high temperatures so switch ON/Off of the Modules/BP:

- Cut busy propagation from BP, dragon on local clock

1) 1 module OFF: 10.1.6.28, 2 BPs OFF associated to module 10.1.6.24 and 10.1.6.27

2) 1 module OFF: 10.1.6.28, 3 BPs OFF associated to module 10.1.6.24, 10.1.6.27 and 10.1.7.147

3) 1 module OFF: 10.1.6.28, 1 BP OFF associated to module 10.1.7.147

4) All modules ON, 2 BPs OFF associated to module 10.1.7.147 and 10.1.7.149

5) All modules ON and All BPs ON

- Problem of internal/external trigger clock for the Dragon fixed: for DRS4, the reference clock is now the 10 MHz external clock.

- Configuration of UCTS and TIB: the last two runs were taken with the TIB, i.e. with external clock, external trigger and busy propagation.

6) 1 module OFF: 10.1.5.16, 3 BPs OFF associated to module 10.1.6.27 and 10.1.7.146 and 10.1.7.149

7) 1 module OFF: 10.1.6.28, 1 BP OFF associated to module 10.1.6.28

8) ALL modules On


2018-10-15 Léa, Taka, Dirk, Daniel SLOW Control lost

While Camera was on, the SLOW control connection was interrupted in the Drive container to prepare connection of Drive/AMC network.

Consequently the EMC went to SAFE. But also the UaExpert interface was stuck (which is the current base for Camera monitoring). The setup was then restored as well as we could, including DataLogger function.

2018-10-15 Léa, Taka, Julien, Seiya, Dirk writing speed limitation in data taking

  • 1 ZFW: validated writing speed 300 MB/s
  • 8 ZFW: validated writing speed 8*300 MB/s
  • 16 ZFW: writing speed 16*150 MB/s

Maybe a problem due to the disk. To be investigated.
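Multiplying out the per-writer speeds quoted above gives the aggregate throughput for each number of ZFW instances. The sketch below assumes all figures are in MB/s (one of them is written as "MB" only in the original note); the names are illustrative.

```python
# (number of ZFW instances, writing speed per instance in MB/s), from the entry.
zfw_configs = [(1, 300), (8, 300), (16, 150)]


def aggregate_mb_per_s(n_writers, per_writer_mb_per_s):
    """Total writing speed summed over all ZFW instances."""
    return n_writers * per_writer_mb_per_s


for n_writers, speed in zfw_configs:
    print(n_writers, "ZFW:", aggregate_mb_per_s(n_writers, speed), "MB/s")
```

With these numbers the total is the same for 8 and 16 instances (2400 MB/s), which would be compatible with a shared bottleneck such as the disk suspected in the entry.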

2018-10-15 Léa, Dirk, Julien Slow control disconnection + disconnect from OPC-UA
2018-10-14 Léa, Dirk, Julien Small run summary

Runs0016-0019

- No TIB/UCTS

- Cut busy propagation from BP, dragon on local clock

- 7 GotoSafe and GoToReady for the ECC due to too high temperatures so 7 switch ON/Off of the Modules/BP

1) All modules ON, 2 BPs OFF associated to module 10.1.6.24 and 10.1.6.27

2) All modules ON, didn't check the BPs

3) All modules ON, didn't check the BPs

4) All modules ON, 1 BPs OFF associated to module 10.1.7.147

5) All modules ON, 2 BPs OFF associated to module 10.1.7.147 and 10.1.7.171

6) All modules ON, 1 BPs OFF associated to 10.1.7.147

7) All modules ON, 1 BP OFF associated to module 10.1.7.147


2018-10-14 Eric, Dirk All (DATA) fibres straight now!
  • There are straight and crossed fibre patch cords (AB->AB and AB->BA)! They are obviously used indifferently and mixed on our site. :-(
  • The fibres in the IC-PP that are connected to the couplers are all yellow! (No colour code to trace them.)
  • We have chosen the same convention as on the transceivers for input/output of the LC connectors
  • Problem/drawback: All fibres at the IC-PP are now reversed. Need to think/investigate that (last?) point.
  • All PP boxes now closed and secure. Should not be touched any more without agreement by INFRA experts!
Cisco-Transceiver 13927.jpg
2018-10-14 Dirk Direct measurement of TX lasers

INFO: Direct measurements can be done without danger for Photom-211

Measurement range: -70 to +5 dBm

according to the datasheet. That is 0.1 nW to 3.16 mW.
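The conversion of the datasheet range from dBm to absolute power follows from the definition P[mW] = 10^(P[dBm]/10). A short Python sketch of that conversion (the function name is illustrative):

```python
def dbm_to_mw(power_dbm):
    """Convert a power level in dBm to milliwatts (0 dBm = 1 mW)."""
    return 10 ** (power_dbm / 10)


upper_mw = dbm_to_mw(5)     # upper end of the measurement range
lower_mw = dbm_to_mw(-70)   # lower end of the measurement range
print(f"+5 dBm  = {upper_mw:.2f} mW")
print(f"-70 dBm = {lower_mw * 1e6:.2f} nW")  # 1 mW = 1e6 nW
```

The upper end of the range evaluates to about 3.16 mW and the lower end to 1e-7 mW, i.e. 0.1 nW.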

2018-10-14 Léa, Dirk, Julien First full-cam data run up to 15kHz! That is what we would have liked to see last week.

Now it's Champagne time. :-)

2018-10-14 Julien, Eric Fibres DATAsp1-6 tested Optically, between DC and Cam. Data1-6spare measurements 20181014.jpg
2018-10-13 Léa, Dirk, Julien Run0015 Still no UCTS(/TIB); fibre broken between DC and Cam. Running with half-cam and two additional missing modules (BP problem): 6.24, 6.27.

- r0015 all events (at runstart), 300Hz and 10kHz, but ZFW problems (testing with 16 instances).

ALL Door knob! Falling apart from the CC door. Urgent action needed. (Bigger screw?)
Taka, Seiya, Mitsunari Fiber Check We checked the optical connections between DC and Camera because some labels were lost due to UV damage. We checked Data2, Data 6, SlowControl and UCTS. Only UCTS had a problem (no splicing at Drive PP). The rest were OK.
Taka, Seiya, Mitsunari Labeling fibers We labeled the optical fibers of the data (DATA 1 - 6) at the patch panels in the Drive container and in the IT container. The spare cables have not been done yet because we ran out of ribbon.
Seiya, Mitsunari Connection validation We validated 12 optical fiber connections (No. 1-6, 13-18) from the Drive container to the IT container. Strength is -35 to -38 dBm.
2018-10-12 Léa, Taka, Julien, Seiya, Dirk Runs0012 sqq. No UCTS/TIB today.

These runs have 3 modules missing (as identified in the preparatory phase: 6.21, 6.25, 6.28). According to a quick check, all EventNb=TriggerNb otherwise for all runs today. See RunCatalog for details.

Dirk Creation of logbook
Dirk, Taka, Julien, Seiya, Léa Data acquisition

- Pb with the ClusCo on tcs01. The root propagation for the BPs for the trigger doesn't work. Using exactly the same script it works on CacoOperator.

- We validate for 3 fibers the new connections to the dataswitch fiber. Eric is fixing the one missing or broken. So for now only the right part of the camera is used for data acquisition

- No TIB/ UCTS

- A few runs were taken with no external trigger from the TIB. 3 modules didn't appear busy but didn't send any data. In those tests the busy from the CBP was cut. These missing modules have to be investigated in more detail, but this was not possible because of many slow control disconnection problems and the high temperature in the camera. Script used in ClusCo: init7_noextTrigger_Test.uic. To not cut the busy, the script name is: init7_noextTrigger_noBusyCut_Test.uic

- One try with no external trigger and clock, but with the CBP delivering the clock and PPS and using the 10 MHz clock as the default clock for the Dragons. The L1 local trigger was not generated. Now we are back to a configuration with the Dragons on their local clock, but this issue has to be investigated. Script used in ClusCo: init7_noextTriggerClock_Test.uic

Dirk, Taka, Julien, Seiya, Léa Too high temperatures in the Camera

- Limits fixed at 27 for the air temperature inside and 35 for the BP temperature. The pressure also raised some alarms

- During the day, due to high temperatures, we had to GoToSafe and wait for the camera to cool almost 11 times, but the BP maximum temperature never went above 34 degrees. The air inside reached 26.5 at most.
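The two limits quoted in this entry can be written as a simple threshold check. This is only an illustrative Python sketch: the constant and function names are invented, the values are assumed to be in degrees Celsius, and the real alarm logic of the ECC (for example whether the comparison is strict) is not described in the log.

```python
AIR_TEMP_LIMIT = 27.0  # limit for the air temperature inside the camera (from the entry)
BP_TEMP_LIMIT = 35.0   # limit for the BP temperature (from the entry)


def exceeded_limits(air_temp, bp_temp):
    """Return the names of the limits that are exceeded (strict comparison assumed)."""
    exceeded = []
    if air_temp > AIR_TEMP_LIMIT:
        exceeded.append("air")
    if bp_temp > BP_TEMP_LIMIT:
        exceeded.append("BP")
    return exceeded


# Maximum values reported for the day: air 26.5, BP 34.
print(exceeded_limits(26.5, 34.0))  # []
```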

Dirk, Taka, Julien, Seiya, Léa ECC lost connection (2 times!)

- In the afternoon, the ECC slow control communication was lost 3 times due to the interruption between IT-Container/Drive-Container. Miscommunication with AMC people... The first time, the temperature was already high in the camera and we had to switch OFF the 233 and 400V for safety reasons. The two other times, we got the ECC connection back quite fast and the ECC was in the same state as when the connection was lost, i.e. state 2 ready. There was just no more current in the pulse bar, so we had to GoToSafe and GoToReady both times. After that the current was -247 in the pulse bar... Not understood for the moment - The second interruption happened when the Moxa switch was reconnected, probably not correctly configured. It was disconnected again. Presently this impacts AMC and drive operation, until the Moxa can be reconnected.

Léa, Taka, Julien, Seiya, Dirk Discovered SLOW control fiber lost, fibers changed connection recover Interruption. Using UCTS section for replacement.
2018-10-11 Eric, Armand Cable splicing UCTS fibres ready and checked.
Léa, Taka, Julien, Seiya, Dirk DATA5-upstream broken Located between DC-PP. and IC-PP. Eric is going to have a look on Friday, when working on the other (spare) fibres.
Léa, Taka, Julien, Seiya, Dirk Found correct order of DATA1-DATA6 We eventually found that the fibres DATA1-6 were connected in (exactly) the wrong order to the camera, which led to a mismatch of switches/modules with respect to interfaces/addresses in osaka.

This is an item for our "learned lessons": The indoor fibres had been labelled (switch-interface), but stayed in Mirca. The new fibres had been confectioned at ORM, and labels had to be "guessed" in one way or the other.

Glossary

  • CC = Commissioning Container (present LST1 Control Room)
  • DC = Drive Container
  • IC = IT-Container