Logbook CameraCommissioning ORM Oct18

Revision as of 20:14, 23 November 2018 by Nozaki.

Enter comments in reverse time order

Glossary at the end!

Date Actor/Author Action summary Comments Documents
2018-11-23 Daniel K., Yuki, Seiya too high temperature
2018-11-23 Daniel K., Yuki, Seiya p1p2 not running
2018-11-23 Daniel K., Yuki, Seiya mezzanine behaviour
2018-11-23 Daniel K., Yuki, Seiya take data for TP synchronization
2018-11-23 Daniel K., Yuki, Seiya validation test of ClusCo

- the strange value of humidity

- SiTCP reset

  • This function worked, but due to a bug in the DragonFPGA it worked on only half of the camera and took too long (~5 min) to finish.
  • Seiya will fix this problem on the DragonFW side.
2018-11-23 Daniel K., Yuki, Seiya new version of ECC

We installed the new version of the ECC. After rebooting the ECC, the fan did not start.

  • We did a hardware reset (13:40), but the problem remained and the ECC state went to 4 (error state).
  • We changed the ECC default settings: 1) T_safe_min -> 5, 2) light sensor disabled. After a hardware reset (13:50), the situation was the same.
  • We then changed the default value of T_safe_min to 2. After a hardware reset (14:00), the result was the same (error state, fan still stopped).

We therefore decided to revert to the current version of the ECC. After rebooting the ECC, all functions worked well.

2018-11-23 Daniel K., Yuki, Seiya monitoring plots were not updated

The ECC monitoring plots stopped updating after 9:30. Values (temperature etc.) could still be read in the OPC UA client; only the monitoring plots were not updated. After the ECC was rebooted for the version update, the plots started updating again.

2018-11-22 Yuki, Seiya take data for TP synchronization study

After discussing with Taka, I tried to take data as follows:

  • set the test pulse frequency for the external reference clock
  • then start the TP synchronization

We took data under the following conditions and finally managed to synchronize the test pulse in all modules.

0)

  • I first wanted to take data at 300 Hz, but the L1_local trigger rate was ~22 Hz after initialization even though we had set 300 Hz as the TP frequency.
  • So I decided to take data with the default test pulse frequency (444 444 counts of the 10 MHz clock = 22 Hz).

1) 22 Hz, ROI=1024, trigger generated by mod265, 1000 events

  • We did not change the test pulse frequency during initialization, so the TP frequency was 22 Hz.
  • The data is in /mnt/cs1/store/DragonDaqData/Data20181122/Trigger22HzRD1024...

2) 300 Hz, ROI=1024, trigger generated by mod1, 3000 events

  • Before the TP synchronization, we set the test pulse frequency from ClusCo with "SET_TP_FREQUENCY 0 Off 33333" instead of "SET_TP_FREQUENCY 0 On 300". The L1_local trigger rate was then 300 Hz with the external reference clock.
  • The data is in /mnt/cs1/store/DragonDaqData/Data20181122/Trigger300HzRD1024...

Media:TPSynchNov22nd22Hz.gif Media:TPSynchNov22nd300Hz.gif
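The numbers in this entry are consistent with the test pulse period being given as a count of 10 MHz reference-clock ticks, i.e. f = 10 MHz / N. A minimal sketch of that conversion, assuming this interpretation (inferred from the log, not from the DragonFW or ClusCo documentation); the helper names are hypothetical:

```python
# Assumption (inferred from this log): the test pulse period is set as a
# number of ticks of the 10 MHz external reference clock.
REF_CLOCK_HZ = 10_000_000


def tp_frequency_hz(period_counts: int) -> float:
    """Test pulse frequency for a period given in reference-clock ticks."""
    return REF_CLOCK_HZ / period_counts


def tp_period_counts(freq_hz: int) -> int:
    """Period in ticks for a target frequency (integer division, rounds down)."""
    return REF_CLOCK_HZ // freq_hz


print(f"{tp_frequency_hz(444_444):.1f} Hz")  # default period -> 22.5 Hz
print(f"{tp_frequency_hz(33_333):.1f} Hz")   # "SET_TP_FREQUENCY 0 Off 33333" -> 300.0 Hz
print(tp_period_counts(300))                 # 33333
```

Under this reading, 33333 ticks gives about 300.003 Hz, which is why the tick-count form of the command reproduces the intended 300 Hz.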

2018-11-21 Seiya home directory of osaka server was full

The home partition of the osaka server (/home) filled up today.

 
 Osaka ~ > df -h
 Filesystem                   Size  Used Avail Use% Mounted on
 /dev/mapper/scientific-root   50G   22G   29G  43% /
 devtmpfs                     252G     0  252G   0% /dev
 tmpfs                        252G     0  252G   0% /dev/shm
 tmpfs                        252G   50M  252G   1% /run
 tmpfs                        252G     0  252G   0% /sys/fs/cgroup
 /dev/sdb                      15T  8.5T  5.3T  62% /mnt/cs1
 /dev/sda1                    497M  272M  226M  55% /boot
 /dev/mapper/scientific-home  504G  504G   20K 100% /home
 tmpfs                         51G   12K   51G   1% /run/user/42
 tmpfs                         51G  4.0K   51G   1% /run/user/1000
 tmpfs                         51G     0   51G   0% /run/user/1001
 tmpfs                         51G     0   51G   0% /run/user/1002

Almost all of the files (~80%) are test data taken by LegacyDAQ, stored in /home/dragon/IACMiniCamSetup/DragonDaqM:

 Osaka DragonDaqM > du -sh .
 417G

So I temporarily moved the LegacyDAQ data to /mnt/cs1/store/DragonDaqData. (We could transfer these data to the Lustre system (/fefs/ on tcs) later.)
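The share quoted above follows from the two measurements (417G of a 504G partition, about 83%). A small sketch for checking a partition before a long LegacyDAQ run, using Python's standard `shutil.disk_usage`; the 90% threshold and the helper names are illustrative assumptions, not part of the existing setup:

```python
import shutil


def usage_fraction(path: str) -> float:
    """Used fraction of the filesystem holding `path`, approximately like df's Use%."""
    du = shutil.disk_usage(path)
    # df's Use% is used / (used + available); `free` here is the space
    # available to unprivileged users, so reserved blocks are excluded.
    denom = du.used + du.free
    return du.used / denom if denom else 0.0


def check_before_run(path: str, threshold: float = 0.9) -> bool:
    """Return True if there is headroom; the threshold is a hypothetical choice."""
    frac = usage_fraction(path)
    print(f"{path}: {frac:.0%} used")
    return frac < threshold


print(f"DragonDaqM share of /home: {417 / 504:.0%}")  # 83%
check_before_run("/")
```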

2018-11-21 Daniel K., Seiya take data with LegacyDAQ for EVB tests

Julien wants raw data from the full camera for EVB debugging tests. We took data with LegacyDAQ using the random trigger (300 Hz), i.e. the digital pedestal trigger generated by the TIB. These files are in /mnt/cs1/store/DragonDaqData/Data20181121.

I wanted to take 30 min of data (300 Hz * (60*30) s = 540,000 events), but the disk of the osaka server filled up during the test. Each file is ~219 MB, equivalent to ~168,000 events or ~10 min of data.
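The arithmetic behind this run plan, as a quick sketch using only the numbers quoted above (300 Hz, 30 min, ~168,000 events and ~219 MB per file); rounding up to whole files is an assumption about how LegacyDAQ splits its output:

```python
import math

RATE_HZ = 300              # random (digital pedestal) trigger rate from the TIB
TARGET_S = 30 * 60         # planned 30 min of data
EVENTS_PER_FILE = 168_000  # approximate, from the files actually written
FILE_MB = 219              # approximate size of one file

target_events = RATE_HZ * TARGET_S
seconds_per_file = EVENTS_PER_FILE / RATE_HZ
# Assumption: a partially filled last file still counts as one file.
files_needed = math.ceil(target_events / EVENTS_PER_FILE)

print(target_events)                                          # 540000
print(f"{seconds_per_file / 60:.1f} min per file")            # 9.3 min per file
print(files_needed, "files,", files_needed * FILE_MB, "MB")   # 4 files, 876 MB
```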


2018-11-21 Seiya how to restart the network interface

Some network interfaces of the osaka server occasionally stop running, for example:

p2p2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.1.6.192  netmask 255.255.255.128  broadcast 10.1.6.255
        inet6 fe80::a236:9fff:fef0:ccd6  prefixlen 64  scopeid 0x20<link>
        ether a0:36:9f:f0:cc:d6  txqueuelen 1000  (Ethernet)
        RX packets 68478703  bytes 95858361688 (89.2 GiB)
        RX errors 1  dropped 9  overruns 0  frame 1
        TX packets 30112278  bytes 1622848602 (1.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

When this happens, restart the interface with:

  • sudo ifconfig <name of interface> down
  • sudo ifconfig <name of interface> up
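The `flags=4099<UP,BROADCAST,MULTICAST>` line in the output above is consistent with this: 4099 is 0x1003, i.e. UP, BROADCAST and MULTICAST are set but RUNNING (0x40) is missing, so the interface was administratively up without a running link. A healthy interface typically shows `flags=4163<UP,BROADCAST,RUNNING,MULTICAST>`. A minimal sketch that decodes these bits, using the standard Linux `IFF_*` values from `<net/if.h>`; the helper names are hypothetical, and on a live system `/sys/class/net/<name>/operstate` gives a comparable up/down indication without parsing ifconfig output:

```python
from pathlib import Path

# Interface flag bits from Linux <net/if.h> (subset relevant here).
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x10: "POINTOPOINT",
    0x40: "RUNNING",
    0x100: "PROMISC",
    0x200: "ALLMULTI",
    0x1000: "MULTICAST",
}
IFF_UP = 0x1
IFF_RUNNING = 0x40


def decode_flags(value: int) -> list[str]:
    """Names of the known flag bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


def needs_restart(value: int) -> bool:
    """True if the interface is administratively UP but not RUNNING."""
    return bool(value & IFF_UP) and not (value & IFF_RUNNING)


def operstate(ifname: str) -> str:
    """Kernel's operational state ('up', 'down', 'unknown', ...) from sysfs."""
    return Path(f"/sys/class/net/{ifname}/operstate").read_text().strip()


print(decode_flags(4099))   # ['UP', 'BROADCAST', 'MULTICAST']
print(decode_flags(4163))   # ['UP', 'BROADCAST', 'RUNNING', 'MULTICAST']
print(needs_restart(4099))  # True
```

If `needs_restart` is true, the `ifconfig` down/up pair above (or the equivalent `ip link set <name> down` / `up`) is the manual remedy recorded in this log.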
2018-11-21 Otger, Daniel K., Seiya ECC went to error state

I used ClusCo@tcs01 for monitoring, and all plots except "Amp. Temp" were being updated. I then ran init7 from ClusCo@cacooperator and waited for the "Amp. Temp" plot to update. At that point ClusCo@tcs01 showed a timeout, and I realized that I could not ping the modules, the relay current had gone to 0, and the ECC state had gone to error (4). I powered up again and the ECC status went to 2 (ready) as usual, but the relay current stayed at 0.

Taka explained why the relay current was still 0:

When the ECC goes to the error state, the relay modules are also in a strange state. You need to reset the relay modules as well. However, even if you go to the "safe" state in the ECC, the relays are still powered (not the bus bars, but the relay modules). That means "safe" does not reset the relays.

Lea explained why the ECC went to error:

It is also possible that you lost the slow control connection for a few seconds and got it back without realizing. If the modules are ON and the slow control connection is lost, the ECC goes to error and the relay current remains at 0, as Taka explained.

We did a hardware reset three times (15:00, 15:50, 16:45), but the situation was the same. This ECC error state seems to be caused by the loss of the CaCo heartbeat. We managed without CaCo (using the ECC directly) for today's data taking.


1228269 [PublishTask-com.prosysopc.ua.client.UaClient@166f6c4f] ERROR com.prosysopc.ua.client.UaClient - Exception in ServerStatusListener
java.lang.ClassCastException: cat.ifae.cta.opcua.dataaccess.uaobjects.OPCUAVariable$DataInformation cannot be cast to java.lang.Integer
	at cat.ifae.cta.cameracontrol.server.base.clients.ecc.OPCUAECCControl$ECCVariableStatus.update(OPCUAECCControl.java:25)
	at java.util.Observable.notifyObservers(Observable.java:159)
	at cat.ifae.cta.opcua.dataaccess.basicobjects.BasicCallbackVariable$ObservableVariable.setValue(BasicCallbackVariable.java:36)
	at cat.ifae.cta.opcua.dataaccess.uaobjects.OPCUAAssembly._newStateWarn(OPCUAAssembly.java:533)
	at cat.ifae.cta.opcua.dataaccess.uaobjects.OPCUAAssembly.consumeMessage(OPCUAAssembly.java:526)
	at cat.ifae.cta.opcua.dataaccess.uaobjects.OPCUAServerStatusListener.statusChanged(OPCUAServerStatusListener.java:59)
	at cat.ifae.cta.opcua.dataaccess.uaobjects.OPCUAServerStatusListener.onStateChange(OPCUAServerStatusListener.java:33)
	at com.prosysopc.ua.client.UaClient.a(Unknown Source)
	at com.prosysopc.ua.client.UaClient.updateServerStatus(Unknown Source)
	at com.prosysopc.ua.client.UaClient$a.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:745)
1228371 [PublishTask-com.prosysopc.ua.client.UaClient@166f6c4f] WARN com.prosysopc.ua.client.Subscription - Server sent a previously acknowledged sequence number 0 for Subscription 47786
1228372 [PublishTask-com.prosysopc.ua.client.UaClient@166f6c4f] INFO org.opcfoundation.ua.transport.tcp.io.SecureChannelTcp - 47856 Closed
1228372 [PublishTask-com.prosysopc.ua.client.UaClient@166f6c4f] INFO org.opcfoundation.ua.transport.tcp.io.TcpConnection - /10.1.4.66:4841 Closed
1228373 [TcpConnection/Read] INFO org.opcfoundation.ua.transport.tcp.io.TcpConnection - /10.1.4.66:4841 Closed (expected)
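Taka's and Lea's explanations can be summarized as a toy model: losing the slow-control connection while modules are ON sends the ECC to error and leaves the relay modules in a bad state, and going to safe does not reset them. The sketch below only encodes those two statements; the class, method names and state strings are hypothetical and do not reflect the real ECC firmware or its OPC UA interface:

```python
from dataclasses import dataclass


@dataclass
class EccToyModel:
    """Toy model of the behaviour described in this entry, not the real ECC."""
    state: str = "READY"    # ECC state 2 in this log
    modules_on: bool = True
    relays_ok: bool = True

    def heartbeat_lost(self) -> None:
        # Lea: losing slow control while modules are ON sends the ECC to error.
        if self.modules_on:
            self.state = "ERROR"     # ECC state 4 in this log
            self.relays_ok = False   # Taka: relay modules end up in a strange state

    def go_to_safe(self) -> None:
        # Taka: relay modules stay powered in safe, so they are NOT reset here.
        self.state = "SAFE"

    def go_to_ready(self) -> None:
        self.state = "READY"

    def reset_relays(self) -> None:
        self.relays_ok = True

    def relay_current_flows(self) -> bool:
        return self.state == "READY" and self.relays_ok


ecc = EccToyModel()
ecc.heartbeat_lost()
ecc.go_to_safe()
ecc.go_to_ready()
print(ecc.relay_current_flows())  # False: ready again, but relay current stays 0
ecc.reset_relays()
print(ecc.relay_current_flows())  # True
```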

2018-11-21 Otger, Daniel K., Seiya dhcpd server for TIB restart

DHCPd server for TIB stopped due to the shutdown of tcs01 yesterday, so we activated the server as below,

[ifae@tcs01 ~]$ sudo service dhcpd status
Redirecting to /bin/systemctl status  dhcpd.service
● dhcpd.service - DHCPv4 Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/dhcpd.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:dhcpd(8)
           man:dhcpd.conf(5)
[ifae@tcs01 ~]$ sudo service dhcpd start
Redirecting to /bin/systemctl start  dhcpd.service
[ifae@tcs01 ~]$ sudo service dhcpd status
Redirecting to /bin/systemctl status  dhcpd.service
● dhcpd.service - DHCPv4 Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/dhcpd.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-11-21 09:14:08 WET; 2s ago
     Docs: man:dhcpd(8)
           man:dhcpd.conf(5)
 Main PID: 453 (dhcpd)
   Status: "Dispatching packets..."
   CGroup: /system.slice/dhcpd.service
           └─453 /usr/sbin/dhcpd -f -cf /etc/dhcp/dhcpd.conf -user dhcpd -group dhcpd --no-pid

Nov 21 09:14:08 tcs01 dhcpd[453]: All rights reserved.
Nov 21 09:14:08 tcs01 dhcpd[453]: For info, please visit https://www.isc.org/software/dhcp/
Nov 21 09:14:08 tcs01 dhcpd[453]: Not searching LDAP since ldap-server, ldap-port and ldap-base-dn were not specified in...ig file
Nov 21 09:14:08 tcs01 dhcpd[453]: Wrote 0 deleted host decls to leases file.
Nov 21 09:14:08 tcs01 dhcpd[453]: Wrote 0 new dynamic host decls to leases file.
Nov 21 09:14:08 tcs01 dhcpd[453]: Wrote 4 leases to leases file.
Nov 21 09:14:08 tcs01 dhcpd[453]: Listening on LPF/ens1f0/a0:36:9f:eb:51:34/10.1.0.0/16
Nov 21 09:14:08 tcs01 dhcpd[453]: Sending on   LPF/ens1f0/a0:36:9f:eb:51:34/10.1.0.0/16
Nov 21 09:14:08 tcs01 dhcpd[453]: Sending on   Socket/fallback/fallback-net
Nov 21 09:14:08 tcs01 systemd[1]: Started DHCPv4 Server Daemon.
Hint: Some lines were ellipsized, use -l to show in full.
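The `Loaded:` line above reports the unit as `disabled`, which means systemd does not start dhcpd at boot; that would be consistent with it being dead after tcs01 was shut down on 2018-11-20. With systemd, `sudo systemctl enable dhcpd` makes a service start at boot. A small sketch that extracts the enablement and activity states from logged `systemctl status` text, so a check script could flag this case; the regular expressions are an assumption based on the output format shown here, and the helper names are hypothetical:

```python
import re
from typing import Optional


def unit_file_state(status_text: str) -> Optional[str]:
    """Unit-file state ('enabled', 'disabled', 'static', ...) from the 'Loaded:' line."""
    m = re.search(r"Loaded: loaded \([^;]+; (\w+);", status_text)
    return m.group(1) if m else None


def active_state(status_text: str) -> Optional[str]:
    """First word of the 'Active:' line ('active', 'inactive', 'failed', ...)."""
    m = re.search(r"Active: (\w+)", status_text)
    return m.group(1) if m else None


# Excerpt of the tcs01 output above, before the service was started.
sample = """\
   Loaded: loaded (/usr/lib/systemd/system/dhcpd.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
"""

print(unit_file_state(sample), active_state(sample))  # disabled inactive
if unit_file_state(sample) == "disabled":
    print("dhcpd will not start at boot; 'sudo systemctl enable dhcpd' would change that")
```

On the machine itself, `systemctl is-enabled dhcpd` and `systemctl is-active dhcpd` print these two states directly, so parsing is only needed when working from logged output like this entry.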
2018-11-20 Daniel K., Seiya ClusCo monitoring restart The ClusCo monitoring map was not updated after the shutdown of tcs01. We contacted Carlos and Carlos, and they restarted it. Now it works.
2018-11-20 TCS01 shutdown One of the memory cards of tcs01 is damaged and will be replaced by an authorized technician today starting at 9am La Palma time. We will shut down the server before that and start it up again once the card has been exchanged.

2018-11-19 Seiya, Daniel K. cannot connect with some modules With configuration 2 (100 Hz, ROI=1024) we could not connect to some modules (IP 10.1.6.148-173), and they stayed busy (busy state=1).

After re-initialization, this problem disappeared.

2018-11-19 Seiya, Daniel K. Test pulse data with DragonDaqM(LegacyDAQ) We took test pulse data under the following conditions:

1) 300 Hz, ROI=1024, trigger generated by mod265 (to reproduce the problem)

  • File name is "TP300HzTrigMod265RD1024Delay3028RD1024_***"

2) 100 Hz, ROI=1024, trigger generated by mod265 (suggested by Taka)

  • File name is "TP100HzTrigMod265RD1024Delay3028RD1024_***"

3) 300 Hz, ROI=1024, trigger generated by mod265 (suggested by Taka)

  • I sent each command by hand and checked the registers (register8 & scalar) after PPS disable.
  • PPS disable seems to have worked well.
  • File name is "TP300HzTrigMod265RD1024Delay3028_CHECKEDRD1024***".

4) 100 Hz, ROI=1024, trigger generated by mod265

  • I set the test pulse frequency before PPS synchronization.
  • File name is "TP300HzTrigMod265RD1024Delay3028_TPconfigSynchroRD1024***".
2018-11-19 Seiya, Daniel K. 24V supply problem We powered up the camera with the usual procedure, but only one busbar (the 4th) came up and the others did not. We repeated the procedure with the same result (only the 4th busbar worked), so we switched the camera breaker off and on around 15:00. The fan did not start at first, so I switched the breaker off and on again and the fan started. After that we could power up the whole camera.


2018-11-13 Mitsunari, Daniel K. Software deployment The whole setup to control, monitor and take data with the camera (except UaExpert for the ECC, TIB and UCTS) was moved to the LST_CALP iMac (+ 1 screen) in the commissioning container.
2018-11-12 Mitsunari, Daniel K. Test pulse data with DragonDaqM Test pulse data were taken with DragonDaqM, triggered by module 264, which had no test pulse on 11-09.
2018-11-12 Mitsunari, Satoshi Connect tcs07 to White Rabbit The WR switch management port and the management switch (mgtsw2 port 42) are connected by an Ethernet cable. Mitsunari tried to change the IP of the WR switch to 10.200.10.140, which is in VLAN 1001, but failed. The WR interface file dot-config was not found, contrary to the WR manual. Even when we created the file ourselves, it was lost after rebooting.
2018-11-12 Mitsunari, Daniel K., Carlos Diaz Software deployment Installed and compiled caco, cacoconsole and cacogui on tcs01 under /home/ifae/development. Compiled /home/ifae/clusco on tcs01 and adapted the monitoring from CIEMAT. Set up one additional monitoring screen on the iMac (monitoring computer) and added two forms (one for powering on the camera, one for shutting it down) to be filled in by the operators.
2018-11-09 Mitsunari, Daniel K. Test pulse data with EVB Test pulse data were taken with EVB, waiting 2 s for the PPS to reach all modules. With read depth 40, DAQ seemed to be successful. With read depth 1024, however, the data were not stored.
2018-11-09 Mitsunari, Daniel K. Test pulse data with DragonDaqM Test pulse data were taken with DragonDaqM, waiting 2 s for the PPS to reach all modules. The waveform data of six modules besides the central one were checked: five modules had test pulses, but the other one (No. 0) did not.
2018-11-03 Mitsunari Test pulse injection timing Test pulse data were taken with an L1 threshold at which all modules can produce a camera trigger. According to the data, the test pulse injection times are spread over ~70 ms. Test pulse injection rate: 1 Hz, Read depth: 40, Sampling speed: 1 GHz
2018-11-03 Mitsunari Test pulse injection timing Test pulse data were taken with an L1 threshold at which all modules can produce a camera trigger. According to the data, the test pulse injection times are spread over ~70 ms. Test pulse injection rate: 1 Hz, Read depth: 1, Sampling speed: 5 GHz
2018-11-02 Mitsunari Test pulse data with EVB Data for investigating the test pulse issue were taken with EVB, but the acquisition seems to have failed. This should be inspected.

Pulse rate: 300Hz, Read depth: 1024, Event number: ~9000, /fefs/onsite/data/20181102

2018-11-01 Mitsunari Large data with random trigger About 10^5 events were taken with the pedestal random trigger, EVB, read depth 40 slices and delay 3528 ns. The data are stored in /fefs/onsite/data/20181101.
  • 1kHz: Run 0001.0275-0001.0288
  • 2kHz: Run 0001.0289-0001.0315
2018-11-01 Mitsunari Avoiding TIB State 255 The TIB state can reach 5 without a reset at state 255 by combining a TIB reset at state 0 with configuring the Dragons without resetting the BPs:
  • ECC->SetMode(2)
  • TIB->Reset()
  • TIB->DisablePPS()
  • TIB->ResetRun()
  • ClusCo->Main->@config/init7_woBPreset
  • UCTS->XMLConfiguration
  • UCTS->Start()
  • TIB->EnableTrigger()

Mitsunari repeated this procedure four times and it succeeded every time. DAQ also seemed to be successful in the last trial. (In the first three trials, DAQ failed for another reason.)

2018-10-31 Mitsunari TIB State 255 problem init7 without the initial BP reset was tested. The first trial failed: the state went to 255. In the second trial, however, when the TIB was reset just after turning on the camera, the TIB state went directly to 5. This behavior should be confirmed later.
2018-10-31 Mitsunari Check for test pulse synchronization We need to confirm whether the TenMHz counter value is identical among the modules for each test pulse event. Data for the check were taken with DragonDaqM at 300 Hz, with the L1 threshold set so that only the central module sent triggers. The data were stored in /home/dragon/IACMiniCamSetUp/DragonDaqM/Data20181031. The TenMHz counter appeared to be synchronized, but this should be confirmed.
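The check described here amounts to comparing, event by event, the TenMHz counter values read out from each module: a synchronized test pulse gives zero spread, and one tick is 100 ns. A minimal sketch, assuming the counters have already been extracted into a dict of module id to counter value per event (the extraction from DragonDaqM files is not shown, the names are hypothetical, and the event values below are made up; the second one mimics the ~70 ms spread reported on 2018-11-03):

```python
TICK_S = 1e-7  # one tick of the 10 MHz counter = 100 ns


def spread_ticks(counters: dict[int, int]) -> int:
    """Max minus min TenMHz counter value across modules for one event."""
    values = counters.values()
    return max(values) - min(values)


def is_synchronized(counters: dict[int, int], tolerance_ticks: int = 0) -> bool:
    """True if all modules latched the same counter value (within tolerance)."""
    return spread_ticks(counters) <= tolerance_ticks


def spread_ms(counters: dict[int, int]) -> float:
    return spread_ticks(counters) * TICK_S * 1e3


def fraction_synchronized(events: list[dict[int, int]]) -> float:
    ok = sum(is_synchronized(ev) for ev in events)
    return ok / len(events) if events else 0.0


# Illustrative (made-up) events: module id -> TenMHz counter value.
events = [
    {0: 123_456, 1: 123_456, 264: 123_456},  # same value in every module
    {0: 200_000, 1: 900_000, 264: 550_000},  # 700,000 ticks = 70 ms spread
]
print([is_synchronized(ev) for ev in events])  # [True, False]
print(f"{spread_ms(events[1]):.1f} ms")         # 70.0 ms
print(fraction_synchronized(events))            # 0.5
```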
2018-10-31 Oscar, Mitsunari PDB Fixation

PDB fixation: the front plate is now fixed by a screw and nut bonded to the back plate with a metal adhesive (Pattex Nural 21), plus an additional nut holding the front plate.

We started the modules twice with a one-hour break in between. Both times all Dragons and BPs came up.

2018-10-30 Taka, Mitsunari, Julien, Dirk Random trigger runs with EVB

Two runs (#30, #31) were taken at various trigger rates, as documented in the Run Catalog and on Slack.

Corrected pixel map implemented (spiral numbering).


2018-10-29 Oscar, Taka, Mitsunari Power up

The Dragon with IP 10.1.6.28 (3rd column from the left seen from outside, 5th module from the bottom) was moved to the busbar powered by relay 1 instead of relay 0. In exchange, the module in the 4th column, 5th from the bottom, was moved to relay 0 instead of relay 1. The camera was powered up only once, and all modules and BPs came up.


2018-10-27 Taka, Mitsunari Random Trigger

We tested the random trigger. Following Lea's instructions, the random trigger could be produced easily. With DragonDaqM (injected rate -> DAQ rate):

  • 300 Hz -> 300 Hz
  • 1 kHz -> 783 Hz
  • 3 kHz -> 1162 Hz
  • 6.5 kHz -> 1303 Hz
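Dividing the DragonDaqM rate by the injected trigger rate makes the saturation explicit: the fraction of triggers recorded falls from 100% at 300 Hz to about 20% at 6.5 kHz, while the DAQ rate levels off around 1.3 kHz. A short sketch using only the four pairs above:

```python
# (injected trigger rate, measured DragonDaqM rate) in Hz, from the list above
measurements = [(300, 300), (1_000, 783), (3_000, 1_162), (6_500, 1_303)]

for injected, recorded in measurements:
    efficiency = recorded / injected
    print(f"{injected:>5} Hz -> {recorded:>4} Hz  ({efficiency:.1%} recorded)")
```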

With EVB, we first tried 6.5 kHz, and EVB crashed because its buffer was full. The busy state of the modules was 03, meaning EVB was connected and the modules were busy. To recover from this state, we had to reboot the Dragons. A few minutes later Carlos Diaz called us: the current consumption at the bus bars was ~10 A higher than usual (normally 25-27 A, but 35 A after rebooting the Dragons). We shut down the 24 V. After 10 min or so, Carlos allowed us to restart. All Dragons could be reached from cacoserver, but not from Osaka; ip link set p*p* down/up did not help. We rebooted Osaka, after which Osaka could ping all modules but one. However, EVB still did not work. Later we learned from Dirk and Julien that we had to run

sudo modprobe -r ixgbe; sudo modprobe ixgbe


2018-10-27 Oscar, Laia , Taka, Mitsunari Power up

After checking that the Dragon and BP regulators can withstand input voltages above 30 V, we increased the output voltage of the power supplies to 27.5 V (the same for all 8 power supplies).

With this configuration, the voltage rises to 20.3 V while ramping up and then dips only to 19.8 V for about 1 ms. This should be completely fine for the Dragons.

We powered up the camera with the ECC 10 times. All BPs came up every time. Only one Dragon (always the same one) does not power up the first time after a ~1 hour break (tried twice); after this first power-up, all Dragons power up.

2018-10-26 Taka, Mitsunari TIB state machine.

We tried to solve the "State 255" problem in the TIB. Luis Angel suggested configuring the modules at state 2. We followed his instructions, but we reached state 255. We then tried configuring the modules at state 0, with the same result, and at state 4, again ending in state 255.

We also tried to move the test pulse to the center of the readout window, but we could not see the test pulse at all. The delay setting in the TIB or backplane is not correct.


2018-10-26 Oscar, Laia , Taka, Mitsunari Power up

The voltage drop is due to a current limit in the relay circuitry. Increasing the power supply voltage should raise the minimum of the dip so that it does not reach 18 V.

We measured the transients for relay 0 again, with the power supply at 24.98 V as reference. After increasing the power supply voltage to 25.25 V, the dip is about 100 mV higher.

2018-10-25 Taka, Mitsunari, Yusuke Event Mixing

We understood the origin of the event mixing: the slow control command "Dragon - Start" was sent after "Enable Trigger" in the TIB, whereas "Enable Trigger" should come after "Dragon Start". This is actually dangerous, because the mistake is only noticed during analysis.
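Since a wrong order is only noticed during analysis, a check on the command sequence before a run could catch it earlier. A minimal sketch, assuming the issued slow-control commands are available as an ordered list of labels; the labels and the helper are hypothetical, and the single rule encodes only what this entry states (Dragon Start before Enable Trigger):

```python
def order_violations(commands: list[str],
                     rules: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (first, then) rules broken by `commands`.

    A rule is broken if `then` was issued while `first` had not been issued
    before it. Only the first occurrence of each command is considered.
    """
    first_pos: dict[str, int] = {}
    for i, cmd in enumerate(commands):
        first_pos.setdefault(cmd, i)
    broken = []
    for first, then in rules:
        if then in first_pos and first_pos.get(first, len(commands)) > first_pos[then]:
            broken.append((first, then))
    return broken


RULES = [("Dragon Start", "TIB Enable Trigger")]

bad = ["TIB Reset", "TIB Enable Trigger", "Dragon Start"]
good = ["TIB Reset", "Dragon Start", "TIB Enable Trigger"]
print(order_violations(bad, RULES))   # [('Dragon Start', 'TIB Enable Trigger')]
print(order_violations(good, RULES))  # []
```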


2018-10-25 Oscar , Laia Power up

No water was found inside the camera. We measured the voltage at the output of the redundancy modules: 24.98 V. We connected a current sensor between the master bus bar and relay 0, powered up relay 0, and measured the transients of both current and voltage:

- Voltage: drops by around 1.5 V once it reaches 20 V, recovers afterwards (after ~4 ms) and keeps increasing up to about 24.5 V.

- Current: increases steadily, with a small change of slope when the voltage drop happens. It also shows a drop of about 30% when the voltage reaches 24.5 V, and recovers about 80 ms later.

The 4 ms voltage dip brings the voltage very close to 18 V, and sometimes slightly below.

The same is observed in relay 1.
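Collecting the dip measurements from the three Power up entries (2018-10-25 to 2018-10-27) gives a rough margin above the 18 V level discussed there. The values are approximate readings from this log: on 2018-10-25 the minimum is estimated as ~20 V minus the ~1.5 V drop (a rough figure, since the entry notes the dip sometimes goes slightly below 18 V), and on 2018-10-26 as that estimate plus ~0.1 V. Treating 18 V as the level to stay above follows the 2018-10-26 entry:

```python
THRESHOLD_V = 18.0

# power supply voltage -> approximate minimum during the relay-0 dip, in V
dips = {
    24.98: 20.0 - 1.5,  # 2018-10-25: ~1.5 V drop starting at ~20 V
    25.25: 18.5 + 0.1,  # 2018-10-26: dip about 100 mV higher
    27.5: 19.8,         # 2018-10-27: dips only to 19.8 V
}

for supply, minimum in dips.items():
    print(f"PS {supply:5.2f} V: dip minimum ~{minimum:.1f} V, "
          f"margin ~{minimum - THRESHOLD_V:+.1f} V")
```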

2018-10-21 Taka, Mitsunari, Yusuke Timing Calibration.

We tried to get the test pulse in the center of the readout window, but did not succeed. DAQ was done with EthDisp from Taka's MacBook over the slow control network. We need to understand the delays in the TIB and backplane. Since it was already 5:50 pm (although we had announced that we would use the camera until 5:00 pm), we had to shut down. We kept 230 V and 400 V on and the chiller on; only the 24 V was switched off.

2018-10-21 Taka, Mitsunari, Yusuke Event Mixing Test

To confirm the event mixing problem again, we took data with the LegacyDaq. After init7.uic, we injected the test pulse in the central module at 300 Hz. The TIB saw the rate properly. We took 20000 events. After that we tried to take data with EVB, but without success: EVB could not connect to all modules. We had the same problem several times in a row. One of the reasons was dead ports on Osaka; ports on Osaka sometimes go to sleep for no obvious reason. This is a critical problem and needs further investigation. In the end we gave up taking data with EVB.

2018-10-21 Taka, Mitsunari, Yusuke TIB/UCTS study

After power-up, we tried to initialize the TIB, but the state did not reach 5: after state 4, enabling the trigger sent the state to 255. We knew that the RJ45 cable on the WR had been damaged by the rack door, so we replaced it with a new cable. We also used a different WR port (port 8 -> 5) and reset the TIB. Then, with the standard procedure, the state reached 5. We were happy. To be sure, we changed back to the damaged cable and retried, and the state was again 5. So the cause was not the cable; the TIB "Reset" was the key.

After initializing the PMT modules, the TIB did not work well: it did not send back the trigger. Since the temperature was too high (BP 35 deg.), we had to switch off the 24 V. During this break we changed the WR port from 5 back to 8.

After power-up, we repeated the procedure. Again the TIB did not send back the trigger, but a TIB reset helped. So the current startup recipe is: 0 -> 1 -> 2 -> 3 -> 4 -> 255 -> TIB Reset -> 0 -> 1 -> 2 -> 3 -> 4 -> 5 -> configure modules -> TIB Reset -> 0 -> 1 -> 2 -> 3 -> 4 -> 5.


2018-10-21 Taka, Mitsunari, Yusuke Restart the Camera.

Before powering up the 400 V for the first time since last Tuesday, we inspected the camera visually. The camera is properly parked. There was water condensation on the camera body. The platform is not perfectly closed: there was a 2 cm gap between the left and right sides, but this is not dangerous for us. At 11:45 we applied 400 V by switching on the breaker in the Drive container. After 15 min of stabilization, we started the 24 V from the ECC (state ready). We then realized that the TIB and UCTS did not respond to ping, because dhcpd on tcs01 was dead. uctsd on Osaka was also dead. We restarted dhcpd and uctsd and switched the 24 V off and on. Then the TIB and UCTS could boot.


2018-10-15 Dirk UCTSd dead.
● uctsd.service - Execute the UCTS OPC-UA server
   Loaded: loaded (/etc/systemd/system/uctsd.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Di 2018-10-16 13:22:28 WEST; 2h 59min ago
  Process: 152844 ExecStart=/home/dragon/ucm_temp/ucts_opcua_server.sh (code=exited, status=134)
 Main PID: 152844 (code=exited, status=134)

Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]: (MOS) : Info : 2018-10-13.12:30:47 : Connected to Server : opc.tcp://osaka:48010
Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]: 
Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]: (MOS) : Info : 2018-10-13.12:30:47 : Verification of MOS version with lappweb
Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]: 
Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]: ********************
Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]:  Press CTRL-C to shutdown server
Okt 16 13:22:28 osaka ucts_opcua_server.sh[152844]: /home/dragon/ucm_temp/ucts_opcua_server.sh: line 9: 152847 Aborted (core dumped) ./MOS_Device -d /MOS/plugins/Plugin_UCTS/UCTS.xml
Okt 16 13:22:28 osaka systemd[1]: uctsd.service: main process exited, code=exited, status=134/n/a
Okt 16 13:22:28 osaka systemd[1]: Unit uctsd.service entered failed state.
Okt 16 13:22:28 osaka systemd[1]: uctsd.service failed.

Restarted.
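The `status=134` in the systemd output follows the shell convention of 128 + signal number, so 134 corresponds to signal 6, SIGABRT, matching the "Aborted (core dumped)" line. A small sketch of that decoding with Python's standard `signal` module (signal numbers as on Linux); the helper name is hypothetical:

```python
import signal
from typing import Optional


def signal_from_exit_status(status: int) -> Optional[str]:
    """Name of the signal encoded as 128 + N in a shell-style exit status."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        return None


print(signal_from_exit_status(134))  # SIGABRT
print(signal_from_exit_status(137))  # SIGKILL
print(signal_from_exit_status(1))    # None
```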

2018-10-15 Taka MOXA Switch connected The slow control connection is intact. The Drive network can be used remotely from tomorrow.
2018-10-16 Léa, Dirk, Julien, taka, Saiya, Mitsunari Modules disconnection

- Twice today, after around 25 minutes, around 15 modules were no longer powered, although the ECC was in state 2 and there was current in the bus bar. At the next switch-on, all were powered.

2018-10-16 Léa, Dirk, Julien, taka, Saiya, Mitsunari UaExpert disconnection

- Again, UaExpert got completely stuck; to restore the monitoring, the DataLogger files are now written to /home/cacooperator/CoolingSystem/20181016_003 and 20181016_004

2018-10-16 Léa, Dirk, Julien, taka, Saiya, Mitsunari TIB issues

- TIB goes from state 0 to 4, but when we enable the trigger it goes to state 255, as does the alarms vector

2018-10-16 Léa, Dirk, Julien, taka, Saiya, Mitsunari Small run summary

- 7 GoToSafe/GoToReady cycles of the ECC due to too high temperatures, hence switching the modules/BPs off and on:

1) All modules ON, 2 BPs OFF associated to modules 10.1.6.12 and 10.1.6.27

2) All modules ON, 2 BPs OFF associated to modules 10.1.6.24 and 10.1.6.27

3) All modules ON, 1 BP OFF associated to module 10.1.7.171

4) All modules and BPs ON

5) All modules and BPs ON

6) All modules and BPs ON

7) All modules and BPs ON

8) All modules and BPs ON

9) All modules and BPs ON

10) All modules and BPs ON

11) All modules ON, 2 BPs OFF associated to modules 10.1.7.147 and 10.1.7.149

12) All modules ON, 2 BPs OFF associated to modules 10.1.7.147 and 10.1.7.149

13) ALL modules ON

14) ALL modules ON


2018-10-15 Dirk Charging Walkie-Talkies with our private mini-USB adapters, while waiting for the real charger to reappear. Alternative: $8.99 on Amazon


2018-10-15 Léa, Dirk, Julien, taka, Saiya, Mitsunari Slow control and UaExpert disconnection
- Slow control connection lost in ready mode, after which there was no more current in the bus bar. After GoToSafe and GoToReady there was still no current, with a negative value in the bus bar. We had to switch the 233 V and 400 V off and on.

- UaExpert got completely stuck; to restore the monitoring, the DataLogger files are now written to /home/cacooperator/CoolingSystem/20181015_005 and 20181015_006

2018-10-15 Léa, Dirk, Julien, taka, Saiya, Mitsunari Small run summary

- 7 GoToSafe/GoToReady cycles of the ECC due to too high temperatures, hence switching the modules/BPs off and on:

- Busy propagation from the BPs cut, Dragons on local clock

1) 1 module OFF: 10.1.6.28, 2 BPs OFF associated to modules 10.1.6.24 and 10.1.6.27

2) 1 module OFF: 10.1.6.28, 3 BPs OFF associated to modules 10.1.6.24, 10.1.6.27 and 10.1.7.147

3) 1 module OFF: 10.1.6.28, 1 BP OFF associated to module 10.1.7.147

4) All modules ON, 2 BPs OFF associated to modules 10.1.7.147 and 10.1.7.149

5) All modules ON and all BPs ON

- Problem with the internal/external trigger clock for the Dragons fixed: the DRS4 reference clock is now the 10 MHz external clock.

- UCTS and TIB configured: the last two runs were taken with the TIB, i.e. with external clock, external trigger and busy propagation.

6) 1 module OFF: 10.1.5.16, 3 BPs OFF associated to modules 10.1.6.27, 10.1.7.146 and 10.1.7.149

7) 1 module OFF: 10.1.6.28, 1 BP OFF associated to module 10.1.6.28

8) All modules ON


2018-10-15 Léa, Taka, Dirk, Daniel SLOW Control lost

While the camera was on, the slow control connection was interrupted in the Drive container to prepare the connection of the Drive/AMC network.

Consequently the EMC went to SAFE. The UaExpert interface (currently the basis for camera monitoring) also got stuck. The setup was then restored as well as possible, including the DataLogger function.

2018-10-15 Léa, Taka, Julien, Seiya, Dirk writing speed limitation in data taking

- 1 ZFW: 300 MB/s writing speed validated
- 8 ZFW: 8*300 MB/s writing speed validated
- 16 ZFW: 16*150 MB/s writing speed

Possibly a disk limitation; to be investigated.
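Multiplying out the per-writer speeds shows why the disk is suspected: the aggregate rate is the same with 8 and 16 ZFW instances (2400 MB/s), which is consistent with a shared bottleneck rather than a per-writer limit. A quick sketch with the figures above (MB/s per instance, as in the log):

```python
# number of ZFW instances -> measured writing speed per instance (MB/s)
per_writer = {1: 300, 8: 300, 16: 150}

for n, speed in per_writer.items():
    print(f"{n:>2} ZFW x {speed} MB/s = {n * speed} MB/s aggregate")
```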

2018-10-15 Léa, Dirk, Julien Slow control disconnection + disconnection from OPC-UA
2018-10-14 Léa, Dirk, Julien Small run summary

Runs0016-0019

- No TIB/UCTS

- Busy propagation from the BPs cut, Dragons on local clock

- 7 GoToSafe/GoToReady cycles of the ECC due to too high temperatures, hence 7 switch-off/on cycles of the modules/BPs

1) All modules ON, 2 BPs OFF associated to modules 10.1.6.24 and 10.1.6.27

2) All modules ON, BPs not checked

3) All modules ON, BPs not checked

4) All modules ON, 1 BP OFF associated to module 10.1.7.147

5) All modules ON, 2 BPs OFF associated to modules 10.1.7.147 and 10.1.7.171

6) All modules ON, 1 BP OFF associated to module 10.1.7.147

7) All modules ON, 1 BP OFF associated to module 10.1.7.147


2018-10-14 Eric, Dirk All (DATA) fibres straight now!
  • There are straight and crossed fibre patch cords (AB->AB and AB->BA)! They are obviously used interchangeably and mixed on our site. :-(
  • The fibres in the IC-PP that are connected to the couplers are all yellow! (No colour code to trace them.)
  • We have chosen the same convention as on the transceivers for input/output of the LC connectors
  • Problem/drawback: All fibres at the IC-PP are now reversed. Need to think/investigate that (last?) point.
  • All PP boxes now closed and secure. Should not be touched any more without agreement by INFRA experts!
Cisco-Transceiver 13927.jpg
2018-10-14 Dirk Direct measurement of TX lasers

INFO: Direct measurements can be done without danger for Photom-211

Measurement range: -70 to +5 dBm, according to the datasheet. That is 0.1 nW to 3.16 mW.
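The conversion behind these figures is P[mW] = 10^(dBm/10). A small sketch, also applied to the -35 to -38 dBm readings from the Drive-to-IT container fibre validation; the helper names are hypothetical:

```python
import math


def dbm_to_mw(dbm: float) -> float:
    """Optical power in mW for a level in dBm (0 dBm = 1 mW)."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw: float) -> float:
    """Level in dBm for an optical power in mW."""
    return 10 * math.log10(mw)


print(f"{dbm_to_mw(5):.2f} mW")          # 3.16 mW
print(f"{dbm_to_mw(-70) * 1e6:.1f} nW")  # 0.1 nW
print(f"{dbm_to_mw(-35) * 1e3:.2f} uW, {dbm_to_mw(-38) * 1e3:.2f} uW")
```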

2018-10-14 Léa, Dirk, Julien First full-cam data run up to 15kHz! That is what we would have liked to see last week.

Now it's Champagne time. :-)

2018-10-14 Julien, Eric Fibres DATAsp1-6 tested optically between DC and Cam. Data1-6spare measurements 20181014.jpg
2018-10-13 Léa, Dirk, Julien Run0015 Still no UCTS(/TIB); fibre broken between DC and Cam. Running with half-cam and two additional missing modules (BP problem): 6.24, 6.27.

- r0015 all events (at runstart), 300Hz and 10kHz, but ZFW problems (testing with 16 instances).

ALL Door knob! Falling apart from the CC door. Urgent action needed. (Bigger screw?)
Taka, Seiya, Mitsunari Fiber Check We checked the optical connections between DC and Camera because some labels were lost due to UV damage. We checked Data2, Data 6, SlowControl and UCTS. Only UCTS had a problem (no splicing at the Drive PP). The rest were OK.
Taka, Seiya, Mitsunari Labeling fibers We labeled the optical data fibers (DATA 1 - 6) at the patch panels in the Drive container and in the IT container. The spare cables have not been labeled yet because the ribbon ran out.
Seiya, Mitsunari Connection validation We validated 12 optical fiber connections (No. 1-6, 13-18) from the Drive container to the IT container. The signal strength is -35 to -38 dBm.
2018-10-12 Léa, Taka, Julien, Seiya, Dirk Runs0012 sqq. No UCTS/TIB today.

These runs have 3 modules missing (as identified in the preparatory phase: 6.21, 6.25, 6.28). Otherwise, according to a quick check, EventNb = TriggerNb for all of today's runs. See RunCatalog for details.

Dirk Creation of logbook
Dirk, Taka, Julien, Seiya, Léa Data acquisition

- Problem with ClusCo on tcs01: the root propagation to the BPs for the trigger doesn't work, although exactly the same script works on CacoOperator.

- We validated the new connections to the data-switch fibers for 3 fibers. Eric is fixing the missing or broken one, so for now only the right part of the camera is used for data acquisition.

- No TIB/ UCTS

- A few runs were taken with no external trigger from the TIB. 3 modules didn't appear busy but didn't send any data. In those tests the busy from the CBP was cut. Those missing modules have to be investigated in more detail, but this was not possible due to many slow-control disconnection problems and the high temperature in the camera. Script used in ClusCo: init7_noextTrigger_Test.uic. To not cut the busy, the script is: init7_noextTrigger_noBusyCut_Test.uic

- One try with no external trigger and clock, but with the CBP delivering the clock and PPS and using the 10 MHz clock as the default clock for the Dragons. The L1 local trigger was not generated. We have now gone back to a configuration with the Dragons on their local clock, but this issue has to be investigated. Script used in ClusCo: init7_noextTriggerClock_Test.uic

Dirk, Taka, Julien, Seiya, Léa Too high temperatures in the Camera

- Limits set to 27 °C for the air temperature inside and 35 °C for the BP temperature. The pressure also triggered some alarms.

- During the day, because of the high temperatures, we had to gotosafe almost 11 times to wait for the camera to cool down, but the maximum BP temperature never exceeded 34 °C. The air inside reached at most 26.5 °C.

Dirk, Taka, Julien, Seiya, Léa ECC lost connection (2 times!)

- In the afternoon, ECC slow-control communication was lost 3 times due to interruptions of the link between the IT Container and the Drive Container. Miscommunication with the AMC people... The first time, the temperature in the camera was already high, so we had to switch OFF the 233 V and 400 V for safety reasons. The two other times, we got the ECC connection back quite fast, and ECC was in the same state as when the connection was lost, i.e. state 2 (ready). There was just no more current in the pulse bar, so we had to gotosafe and gotoready both times. After that, the current in the pulse bar was -247... Not understood for the moment. The second interruption happened when the Moxa switch was reconnected, probably not correctly configured. It was disconnected again. Presently this impacts AMC and drive operation until the Moxa can be reconnected.

Léa, Taka, Julien, Seiya, Dirk Discovered SLOW control fiber lost Fibers changed and connection recovered from the interruption. Using the UCTS section as a replacement.
2018-10-11 Eric, Armand Cable splicing UCTS fibres ready and checked.
Léa, Taka, Julien, Seiya, Dirk DATA5-upstream broken Located between DC-PP and IC-PP. Eric is going to have a look on Friday, when working on the other (spare) fibres.
Léa, Taka, Julien, Seiya, Dirk Found correct order of DATA1-DATA6 We eventually found that the fibres DATA1-6 were connected to the camera in (exactly) the wrong order, which led to a mismatch of switches/modules with respect to interfaces/addresses in osaka.

This is an item for our "lessons learned": the indoor fibres had been labelled (switch-interface), but stayed in Mirca. The new fibres had been assembled at ORM, and the labels had to be "guessed" one way or the other.

Glossary

  • CC = Commissioning Container (present LST1 Control Room)
  • DC = Drive Container
  • IC = IT-Container