[Termtools] run script errors on StarExec

Florian Frohn florian.frohn at cs.rwth-aachen.de
Mon Aug 14 14:37:34 CEST 2017


Dear all,

in our tests for TermComp on StarExec, between 1% and 5% of all pairs
fail with status "run script error" (I attached a log file). This
behavior is not reproducible (the same pairs usually work perfectly fine
in the next benchmark, but others fail) and even occurs with a trivial
configuration that just echos "MAYBE". It sounds similar to the
following bug report:

https://www.tapatalk.com/groups/starexec/job-pair-killed-immediately-setting-up-cpuset-cgro-t141.html

Does anyone else have similar problems? Is the workaround proposed in
the bug report (rerunning pairs with run script errors) going to be
applied at the competition or is there another workaround?

Best
Florian


-------------- next part --------------
tomcat
STAREXEC Job Log
(C) 2017 The University of Iowa
=====================================
date: Mon Aug 14 06:46:12 CDT 2017
user: 77
pair id: 303724923
execution host: n090.star.cs.uiowa.edu
benchmarking framework: BENCHEXEC
08/14/17 06:46:12 AM CDT: Using benchexec.

08/14/17 06:46:12 AM CDT: sent job status 4 to starexec2.star.cs.uiowa.edu
08/14/17 06:46:12 AM CDT: got the right to use the lock for sandbox 1
mkdir: cannot create directory ‘/export/starexec/sandboxlock.lock’: File exists
08/14/17 06:46:12 AM CDT: found the pairID = 14967
08/14/17 06:46:12 AM CDT: isPairRunning called on pair pid = 14967
08/14/17 06:46:12 AM CDT: isInteger called on 14967
08/14/17 06:46:12 AM CDT: 14967 -bash /cluster/gridengine-8.1.8/default/spool/n090/job_scripts/8012161
08/14/17 06:46:12 AM CDT: 
08/14/17 06:46:12 AM CDT: found that the pair is not running in sandbox 1
08/14/17 06:46:12 AM CDT: doing rm all on  /export/starexec/sandboxlock.lock
08/14/17 06:46:12 AM CDT: able to get sandbox 1!
08/14/17 06:46:12 AM CDT: Found data for this process 15551 -bash /cluster/gridengine-8.1.8/default/spool/n090/job_scripts/8012647
08/14/17 06:46:12 AM CDT: putting this job into sandbox 1 15551
08/14/17 06:46:12 AM CDT: isInteger called on 1
08/14/17 06:46:12 AM CDT: sent Node Id n090.star.cs.uiowa.edu to starexec2.star.cs.uiowa.edu in sandbox 1
08/14/17 06:46:12 AM CDT: cleaning execution host workspace...
08/14/17 06:46:12 AM CDT: WORKING_DIR is /export/starexec/sandbox
08/14/17 06:46:12 AM CDT: Doing rm all on /export/starexec/sandbox/output (output-directory)
08/14/17 06:46:12 AM CDT: Doing rm all on /export/starexec/sandbox/solver (local-solver-directory)
08/14/17 06:46:12 AM CDT: Doing rm all on /export/starexec/sandbox/tmp (local-tmp-directory)
08/14/17 06:46:12 AM CDT: Doing rm all on /export/starexec/sandbox/benchinputs (bench-inputs)
08/14/17 06:46:12 AM CDT: Doing rm all on /export/starexec/sandbox/benchmark (local-benchmark-directory)
08/14/17 06:46:12 AM CDT: Doing rm all on /export/starexec/sandbox/savedoutput (saved-output-dir)
08/14/17 06:46:12 AM CDT: about to check whether to delete lock files given 1
08/14/17 06:46:12 AM CDT: execution host n090.star.cs.uiowa.edu cleaned
08/14/17 06:46:12 AM CDT: set start time for pair id = 303724923
08/14/17 06:46:12 AM CDT: there are this many benchmark inputs 0
08/14/17 06:46:12 AM CDT: decoding all base 64 encoded strings
08/14/17 06:46:12 AM CDT: node memory in megabytes = 257749
08/14/17 06:46:12 AM CDT: node memory after accounting for pairs allowed to run on node = 128874
truncating max memory from requested 131072 to 128874
08/14/17 06:46:12 AM CDT: has depends = 0
08/14/17 06:46:12 AM CDT: dependenciesExist was 1
08/14/17 06:46:12 AM CDT: copying benchmark dependencies to execution host...
08/14/17 06:46:12 AM CDT: benchmark dependencies copy complete
08/14/17 06:46:12 AM CDT: the temp directory is /tmp/8012647.1.all.q/starexec_base64.Hu29F4GA
08/14/17 06:46:12 AM CDT: the number of stages is 1
08/14/17 06:46:12 AM CDT: solver exists in cache at /export/starexec/solvercache/2017-08-14 06:44:38.0/12687
08/14/17 06:46:12 AM CDT: copying solver: Doing safeCpAll on /export/starexec/solvercache/2017-08-14 06:44:38.0/12687 to /export/starexec/sandbox/solver
08/14/17 06:46:13 AM CDT: solver copy complete
08/14/17 06:46:13 AM CDT: chmod gu+rwx on the solver directory on the execution host (/export/starexec/sandbox/solver)
08/14/17 06:46:13 AM CDT: copying runSolver to execution host...
08/14/17 06:46:13 AM CDT: runsolver copy complete
-rwxr-x---. 1 tomcat star-dev 114628 Aug 14 06:46 /export/starexec/sandbox/solver/bin/runsolver
08/14/17 06:46:13 AM CDT: copying benchmark /home/starexec/Benchmarks/6/20150730-15.26.20.092/TPDB-10.3/Runtime_Complexity_Full_Rewriting/AProVE_04/Liveness8.xml to /export/starexec/sandbox/benchmark/theBenchmark.xml on execution host...
08/14/17 06:46:13 AM CDT: benchmark copy complete
08/14/17 06:46:13 AM CDT: execution host solver configuration verified
08/14/17 06:46:13 AM CDT: execution host benchmark verified
08/14/17 06:46:13 AM CDT: sandboxing workspace with first sandbox user
sudo: no tty present and no askpass program specified
chmod: changing permissions of ‘/export/starexec/sandbox/benchmark/theBenchmark.xml’: Operation not permitted
/export/starexec/sandbox:
total 24
drwxrwsr-x. 2 sandbox star-dev 4096 Aug 14 06:46 benchinputs
drwxrwsr-x. 2 sandbox star-dev 4096 Aug 14 06:46 benchmark
drwxrwsr-x. 3 sandbox star-dev 4096 Aug 14 06:46 output
drwxrwsr-x. 2 sandbox star-dev 4096 Aug 14 06:46 savedoutput
drwxrwsr-x. 9 sandbox star-dev 4096 Aug 14 06:46 solver
drwxrwsr-x. 2 sandbox star-dev 4096 Aug 14 06:46 tmp

/export/starexec/sandbox/benchinputs:
total 0

/export/starexec/sandbox/benchmark:
total 8
-rw-r-----. 1 sandbox star-dev 4611 Aug 14 06:46 theBenchmark.xml

/export/starexec/sandbox/output:
total 4
drwxrwsr-x. 2 sandbox star-dev 4096 Aug 14 06:46 output_files

/export/starexec/sandbox/output/output_files:
total 0

/export/starexec/sandbox/savedoutput:
total 0

/export/starexec/sandbox/solver:

...

/export/starexec/sandbox/tmp:
total 0
08/14/17 06:46:13 AM CDT: listing contents of sandbox...
total 24
drwxrwsr-x. 2 sandbox star-dev 4096 Aug 14 06:46 benchinputs
drwxrwsr-x. 2 sandbox star-dev 4096 Aug 14 06:46 benchmark
drwxrwsr-x. 3 sandbox star-dev 4096 Aug 14 06:46 output
drwxrwsr-x. 2 sandbox star-dev 4096 Aug 14 06:46 savedoutput
drwxrwsr-x. 9 sandbox star-dev 4096 Aug 14 06:46 solver
drwxrwsr-x. 2 sandbox star-dev 4096 Aug 14 06:46 tmp
08/14/17 06:46:13 AM CDT: doing stage index 0
08/14/17 06:46:13 AM CDT: stage number : 1
08/14/17 06:46:13 AM CDT: cpu timeout : 1200
08/14/17 06:46:13 AM CDT: clock timeout : 300
08/14/17 06:46:13 AM CDT: global cpu timeout : 1200
08/14/17 06:46:13 AM CDT: global wallclock timeout : 300
08/14/17 06:46:13 AM CDT: extra time for job pair after wallclock timeout : 120
08/14/17 06:46:13 AM CDT: mem limit : 128874
08/14/17 06:46:13 AM CDT: solver id : 12687
08/14/17 06:46:13 AM CDT: solver name : AProVE 0a22cfd
08/14/17 06:46:13 AM CDT: solver path : /export/starexec/solvercache/2017-08-14 06:44:38.0/12687
08/14/17 06:46:13 AM CDT: solver timestamp : 2017-08-14 06:44:38.0
08/14/17 06:46:13 AM CDT: config name : trivial
08/14/17 06:46:13 AM CDT: pre processor path : 
08/14/17 06:46:13 AM CDT: post processor path : /home/starexec/processor_scripts/1128/20170724-10.06.18.416/plain.5
08/14/17 06:46:13 AM CDT: space id : 
08/14/17 06:46:13 AM CDT: bench path : /home/starexec/Benchmarks/6/20150730-15.26.20.092/TPDB-10.3/Runtime_Complexity_Full_Rewriting/AProVE_04/Liveness8.xml
08/14/17 06:46:13 AM CDT: benchmark suffix : 
08/14/17 06:46:13 AM CDT: argument string : 
08/14/17 06:46:13 AM CDT: results interval : 0
08/14/17 06:46:13 AM CDT: stdout save option : 2
08/14/17 06:46:13 AM CDT: additional output save option : 1
08/14/17 06:46:13 AM CDT: user disk quota : 42949672960
Checking who current user is using: whoami ...
tomcat
Calling killDeadlockedJobPair
KILL_DEADLOCKED_JOB_PAIR_PID = 15735
/export/starexec/sandbox/solver/bin
08/14/17 06:46:13 AM CDT: Wallclock timeout for current jobpair = 300
08/14/17 06:46:13 AM CDT: listing contents of the output directory
total 4
drwxrwsr-x. 2 sandbox star-dev 4096 Aug 14 06:46 output_files
08/14/17 06:46:13 AM CDT: Extra time given to jobpair on top of wallclock timeout before we kill it = 120
08/14/17 06:46:13 AM CDT: Setting up cpuset cgroup...
08/14/17 06:46:13 AM CDT: User whose job will be killed if it exceeds it's runtime = sandbox
/cluster/gridengine-8.1.8/default/spool/n090/job_scripts/8012647: line 431: /sys/fs/cgroup/cpuset/system.slice/benchexec-cgroup.service/tasks: No such file or directory
Mon Aug 14 06:46:13 CDT 2017
08/14/17 06:46:13 AM CDT: Setting up cpuacct cgroup...
/cluster/gridengine-8.1.8/default/spool/n090/job_scripts/8012647: line 433: /sys/fs/cgroup/cpuacct/system.slice/benchexec-cgroup.service/tasks: No such file or directory
08/14/17 06:46:13 AM CDT: Setting up memory cgroup...
/cluster/gridengine-8.1.8/default/spool/n090/job_scripts/8012647: line 435: /sys/fs/cgroup/memory/system.slice/benchexec-cgroup.service/tasks: No such file or directory
08/14/17 06:46:13 AM CDT: Setting up freezer cgroup...
/cluster/gridengine-8.1.8/default/spool/n090/job_scripts/8012647: line 437: /sys/fs/cgroup/freezer/system.slice/benchexec-cgroup.service/tasks: No such file or directory
08/14/17 06:46:13 AM CDT: Running runexec.
runexec --no-container --memlimit 128874000000 --timelimit 1200 --walltimelimit 300  --cores 0-3 --output /export/starexec/sandbox/output/stdout.txt /export/starexec/sandbox/solver/bin/starexec_run_trivial /export/starexec/sandbox/benchmark/theBenchmark.xml /export/starexec/sandbox/output/output_files  
2017-08-14 06:46:13,662 - WARNING - Cannot use cgroup hierarchy mounted at /sys/fs/cgroup/cpu,cpuacct/ for subsystem cpuacct, reason: Permission denied. If permissions are wrong, please run "sudo chmod o+wt '/sys/fs/cgroup/cpu,cpuacct/'".
2017-08-14 06:46:13,662 - WARNING - Without cpuacct cgroups, cputime measurement and limit might not work correctly if subprocesses are started.
2017-08-14 06:46:13,662 - WARNING - Cannot use cgroup hierarchy mounted at /sys/fs/cgroup/freezer/ for subsystem freezer, reason: Permission denied. If permissions are wrong, please run "sudo chmod o+wt '/sys/fs/cgroup/freezer/'".
2017-08-14 06:46:13,662 - WARNING - Cannot reliably kill sub-processes without freezer cgroup.
2017-08-14 06:46:13,662 - WARNING - Cannot use cgroup hierarchy mounted at /sys/fs/cgroup/memory/ for subsystem memory, reason: Permission denied. If permissions are wrong, please run "sudo chmod o+wt '/sys/fs/cgroup/memory/'".
2017-08-14 06:46:13,663 - WARNING - Cannot measure memory consumption without memory cgroup.
2017-08-14 06:46:13,663 - WARNING - Cannot use cgroup hierarchy mounted at /sys/fs/cgroup/cpuset/ for subsystem cpuset, reason: Permission denied. If permissions are wrong, please run "sudo chmod o+wt '/sys/fs/cgroup/cpuset/'".
2017-08-14 06:46:13,663 - INFO - Starting command /export/starexec/sandbox/solver/bin/starexec_run_trivial /export/starexec/sandbox/benchmark/theBenchmark.xml /export/starexec/sandbox/output/output_files
2017-08-14 06:46:13,663 - INFO - Writing output to /export/starexec/sandbox/output/stdout.txt
Cannot limit CPU cores without cpuset cgroup.
08/14/17 06:46:13 AM CDT: Runsolver finished. Killing the defensive killall function.
08/14/17 06:46:13 AM CDT: listing contents of the output directory
total 4
drwxrwsr-x. 2 sandbox star-dev 4096 Aug 14 06:46 output_files
-rw-r--r--. 1 tomcat  star-dev    0 Aug 14 06:46 var.out
-rw-r--r--. 1 tomcat  star-dev    0 Aug 14 06:46 watcher.out
/cluster/gridengine-8.1.8/default/spool/n090/job_scripts/8012647: line 612: 15735 Terminated              killDeadlockedJobPair $CURRENT_STAGE_WALLCLOCK $JOB_PAIR_EXTRA_TIME $SANDBOX_PARAM
08/14/17 06:46:13 AM CDT: recording header of job pair output
head: cannot open ‘/export/starexec/sandbox/output/stdout.txt’ for reading: No such file or directory
08/14/17 06:46:13 AM CDT: checking to see if runsolver output is valid for stage 1
08/14/17 06:46:13 AM CDT: Runsolver output could not be found
08/14/17 06:46:13 AM CDT: sent job status 11 to starexec2.star.cs.uiowa.edu
08/14/17 06:46:13 AM CDT: sending status for stage numbers greater than 0
08/14/17 06:46:13 AM CDT: setting all stats to 0 for stages greater than 0
08/14/17 06:46:13 AM CDT: runsolver output was not valid. Terminating job pair.
08/14/17 06:46:13 AM CDT: sent job status 11 to starexec2.star.cs.uiowa.edu
08/14/17 06:46:13 AM CDT: sending status for stage numbers greater than 0
08/14/17 06:46:13 AM CDT: setting all stats to 0 for stages greater than 0


More information about the Termtools mailing list