0f261debdbba7664ef7b0eacbac0e7daba89eebd | 02-Feb-2017 | Allen Li <ayatane@chromium.org>
[autotest] [atomic] Remove atomic groups from scheduler BUG=chromium:681906 TEST=Run unittest suite Change-Id: If9c144aae8d2a8df567a5a03b02bc3fec5d14c0d Reviewed-on: https://chromium-review.googlesource.com/435565 Commit-Ready: Allen Li <ayatane@chromium.org> Tested-by: Allen Li <ayatane@chromium.org> Reviewed-by: Richard Barnette <jrbarnette@google.com>
/external/autotest/scheduler/monitor_db.py

5e2efb71ffebead22aa4f0744ad843ee79814b43 | 07-Feb-2017 | Dan Shi <dshi@google.com>
[autotest] Use the metrics_mock object in case chromite is not set up. BUG=chromium:688166 TEST=unittest Change-Id: Ic0077cb2dba75a8d820f229060f3f70f507850a1 Reviewed-on: https://chromium-review.googlesource.com/438754 Commit-Ready: Dan Shi <dshi@google.com> Tested-by: Dan Shi <dshi@google.com> Reviewed-by: Dan Shi <dshi@google.com>
/external/autotest/scheduler/monitor_db.py
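
The pattern this change describes is a guarded import with a no-op fallback. A minimal sketch, assuming a hand-rolled mock (the real tree provides its own mock object; the class names here are illustrative):

```python
# Prefer chromite's real metrics library; fall back to a no-op mock so the
# scheduler still runs where chromite is not set up.
try:
    from chromite.lib import metrics
except ImportError:
    metrics = None

if metrics is None:
    class _MockMetric(object):
        """No-op stand-in: accepts any metric call and does nothing."""
        def __getattr__(self, _name):
            return lambda *args, **kwargs: _MockMetric()

    class _MockMetrics(object):
        def __getattr__(self, _name):
            return lambda *args, **kwargs: _MockMetric()

    metrics = _MockMetrics()

# Call sites look the same with either backend:
metrics.Counter('chromeos/autotest/scheduler/tick').increment()
```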

4de4e74d1eeedf605e272530c780f2632a62181c | 30-Jan-2017 | xixuan <xixuan@chromium.org>
autotest: Remove backend dead code related to recurring jobs. This CL removes all backend dead code related to recurring jobs. BUG=chromium:681913 TEST=Run local AFE, run a repair job for a DUT. Run unittest. Run 'python models.test'. Change-Id: I95d4ba9bdf47741cf82ca05808985a181e3ce467 Reviewed-on: https://chromium-review.googlesource.com/434838 Commit-Ready: Xixuan Wu <xixuan@chromium.org> Tested-by: Xixuan Wu <xixuan@chromium.org> Reviewed-by: Xixuan Wu <xixuan@chromium.org>
/external/autotest/scheduler/monitor_db.py

468c32c935332a5801b29523aaa609cf37dd4beb | 21-Dec-2016 | Prathmesh Prabhu <pprabhu@chromium.org>
[scheduler] Don't read config values at module import. We were reading some {global,shadow}_config values at module import time. This makes it harder to mock out the config for unittests. BUG=None. TEST=unittests. Change-Id: Id2cd76979289c42b092f2ae7dc1c0349d9f89a6c Reviewed-on: https://chromium-review.googlesource.com/422654 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
/external/autotest/scheduler/monitor_db.py
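
A sketch of the refactor this message describes: move a module-import-time config read into a lazy accessor so unittests can mock the config first (the key below is illustrative, not necessarily one this CL touched):

```python
from autotest_lib.client.common_lib import global_config

# Before: evaluated once, at import time -- a test that mocks the config
# after importing this module is too late.
#
#   _DB_HOST = global_config.global_config.get_config_value(
#           'AUTOTEST_WEB', 'host')

# After: evaluated at call time, so tests can patch the config first.
def _db_host():
    return global_config.global_config.get_config_value(
            'AUTOTEST_WEB', 'host', default='localhost')
```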

3c6a3bfd883d9770af66f9e14fa595271774c903 | 20-Dec-2016 | Prathmesh Prabhu <pprabhu@chromium.org>
scheduler: Don't schedule host_jobs when inline acquisition is false. host_scheduler, when run as an independent service, assumes that it is the sole service scheduling host_jobs. A fallback path in the scheduler would try to schedule these jobs. This situation was never possible in the normal case, but the fallback was dangerous -- it could leave the DB corrupted. Instead, report errors in the fallback path and refuse to schedule the host jobs. BUG=chromium:675048 TEST=unittests, local jobs. Change-Id: I60bb56ded41c34484f6e281fdd12ace5601e739d Reviewed-on: https://chromium-review.googlesource.com/422535 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Richard Barnette <jrbarnette@google.com>
/external/autotest/scheduler/monitor_db.py

7ab1ef65b4696b486849af98a0c7fe57b8e60e5d | 15-Dec-2016 | Prathmesh Prabhu <pprabhu@chromium.org>
scheduler: Start reporting tick() time breakdown. BUG=chromium:668288 TEST=unittests, local scheduler run. Change-Id: If1df494743cbcc396c130c10abf402b31ddb8f92 Reviewed-on: https://chromium-review.googlesource.com/420476 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org>
/external/autotest/scheduler/monitor_db.py
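
A hedged sketch of what a tick() time breakdown looks like: time each phase of the tick and report the durations. The reporter function is a hypothetical stand-in for the real ts_mon plumbing, and the phase names are only examples of monitor_db dispatcher methods:

```python
import time

def _report_phase_seconds(phase, seconds):
    # Hypothetical reporter; the real CL sends these to ts_mon/monarch.
    print('tick phase %s took %.3fs' % (phase, seconds))

def tick(dispatcher):
    """Run one scheduler tick, reporting a per-phase time breakdown."""
    for phase in ('_schedule_new_jobs', '_handle_agents'):
        start = time.time()
        getattr(dispatcher, phase)()
        _report_phase_seconds(phase.lstrip('_'), time.time() - start)
```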

c29b4c7ec10db41f38e0361febe9846a95629b5a | 15-Dec-2016 | Aviv Keshet <akeshet@chromium.org>
autotest: delete some email alerts; replace some with monarch metrics For email alerts that seem (based on searching my email) to never be sent, I simply deleted them. For those that are sent sometimes and seem easily amenable to a monarch metric instead, I changed them to a metric. This is a first step; there are still many remaining unnecessary email alerts. BUG=chromium:672726 TEST=None Change-Id: Ib1d3715e618623faa16f3faaceabf4218dbad49a Reviewed-on: https://chromium-review.googlesource.com/420468 Commit-Ready: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org>
/external/autotest/scheduler/monitor_db.py

c95a66ae3a1881a6ce2506c3549dd80fb941ed24 | 23-Nov-2016 | Prathmesh Prabhu <pprabhu@chromium.org>
Remove autotest_stats from site_monitor_db. This is the first of a two-part change. This CL moves some timer metrics from site_monitor_db to monitor_db, and migrates them to monarch. BUG=chromium:667171 TEST=Run jobs on moblab. Change-Id: I503f112fdc5e1d51f0feb61165c11347cbabb883 Reviewed-on: https://chromium-review.googlesource.com/413610 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org>
/external/autotest/scheduler/monitor_db.py

ed7ece9fdbb4018f853c96d388f96143322f8f7d | 23-Nov-2016 | Prathmesh Prabhu <pprabhu@chromium.org>
autotest: Remove autotest_stats from monitor_db This is the second of a two part change. This CL removes the rest of the autotest_stats references in monitor_db. BUG=chromium:667171 TEST=Run jobs on moblab. Change-Id: Ie4e588056e2bf12564ed7bc03b2d82c5d6ce8109 Reviewed-on: https://chromium-review.googlesource.com/414271 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
/external/autotest/scheduler/monitor_db.py

eedcb8b81de7a686746b342be4732d25ccbfb955 | 06-Oct-2016 | Paul Hobbs <phobbs@google.com>
[autotest] Fix for indirect=True metrics Currently, chromite's @_Indirect metrics check whether they should be indirect at import-time. Therefore, we need to create the metrics at run-time after ts_mon has already been set up. A fix to the @_Indirect decorator is in progress. TEST=Tested with a local autotest instance. Unit tests still pass BUG=chromium:652620 Change-Id: I46f5d79921f56f5292df08a56b1de77399875861 Reviewed-on: https://chromium-review.googlesource.com/394249 Commit-Ready: Paul Hobbs <phobbs@google.com> Tested-by: Paul Hobbs <phobbs@google.com> Reviewed-by: Paul Hobbs <phobbs@google.com>
/external/autotest/scheduler/monitor_db.py

abd3b05597bfa2e5244209b7f5146900a58290a9 | 03-Oct-2016 | Paul Hobbs <phobbs@google.com>
Revert "Revert "Revert "[autotest] Set up ts_mon with indirect=True""" It turns out that even with the upgraded oauth2client, we're still not seeing metrics. This reverts commit abaaf9a979b52dbd48f33dc147203d631a046443. Change-Id: I5836ae76e0224d3554a9710ee32a62808bac8652 Reviewed-on: https://chromium-review.googlesource.com/391869 Reviewed-by: David Riley <davidriley@chromium.org> Tested-by: Paul Hobbs <phobbs@google.com>
/external/autotest/scheduler/monitor_db.py

abaaf9a979b52dbd48f33dc147203d631a046443 | 01-Oct-2016 | Paul Hobbs <phobbs@google.com>
Revert "Revert "[autotest] Set up ts_mon with indirect=True"" This reverts commit 02af0131ff73c73294b0f080a853fe52e7889299. Change-Id: Ibc86bfc489b9c8a0fd918333c1a3ef55038193f9 Reviewed-on: https://chromium-review.googlesource.com/391129 Reviewed-by: David Riley <davidriley@chromium.org> Tested-by: Paul Hobbs <phobbs@google.com>
/external/autotest/scheduler/monitor_db.py

02af0131ff73c73294b0f080a853fe52e7889299 | 30-Sep-2016 | Paul Hobbs <phobbs@google.com>
Revert "[autotest] Set up ts_mon with indirect=True" This reverts commit a065866598b020e05191cfd4fc0f4e7ced069640. Change-Id: I248ec1e0d2355fea9acb0db7232fd26af695929f Reviewed-on: https://chromium-review.googlesource.com/391124 Reviewed-by: David Riley <davidriley@chromium.org> Tested-by: Paul Hobbs <phobbs@google.com>
/external/autotest/scheduler/monitor_db.py

a065866598b020e05191cfd4fc0f4e7ced069640 | 20-Sep-2016 | Paul Hobbs <phobbs@google.com>
[autotest] Set up ts_mon with indirect=True The scheduler should use indirect=True to support the reset_after=True flag for global metrics. BUG=chromium:648312 TEST=None Change-Id: I4187c84361a51cdc37ca3827493d848e7dcfa3a3 Reviewed-on: https://chromium-review.googlesource.com/387197 Commit-Ready: Paul Hobbs <phobbs@google.com> Tested-by: Paul Hobbs <phobbs@google.com> Reviewed-by: David Riley <davidriley@chromium.org> Reviewed-by: Paul Hobbs <phobbs@google.com>
/external/autotest/scheduler/monitor_db.py
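
For reference, the setup call this series toggles back and forth looks roughly like the sketch below; chromite's ts_mon_config exposes the indirect flag, which flushes metrics from a helper process and is what reset_after=True global metrics need. Treat the exact call shape as an approximation:

```python
from chromite.lib import ts_mon_config

def main():
    # indirect=True: metrics are flushed by a separate process rather
    # than in-line, enabling reset_after=True on global metrics.
    ts_mon_config.SetupTsMonGlobalState('autotest_scheduler', indirect=True)
    # ... enter the scheduler's main loop ...
```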

65fed0746a6002aae3e37d3486cb6836f71dda28 | 29-Jun-2016 | Aviv Keshet <akeshet@chromium.org>
autotest: add a monarch stat for scheduler tick BUG=None TEST=None Change-Id: I6c80490c6dd9dbc9cb8ec461266a913ca8d44b61 Reviewed-on: https://chromium-review.googlesource.com/357110 Tested-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Richard Barnette <jrbarnette@google.com>
/external/autotest/scheduler/monitor_db.py

ffed172ee1d49f70b996814b481e7af19aae403b | 19-May-2016 | Richard Barnette <jrbarnette@chromium.org>
[autotest] Report DUT repair status to monarch. At the end of each special task, if the task knows whether the target DUT was working or broken, post that information using ts_mon. BUG=None TEST=run repair and verify jobs in a local instance Change-Id: I713a8584eb66820d890e3733c8790b421720672a Reviewed-on: https://chromium-review.googlesource.com/345972 Reviewed-by: Aviv Keshet <akeshet@chromium.org> Tested-by: Richard Barnette <jrbarnette@chromium.org>
/external/autotest/scheduler/monitor_db.py

ed0c4b5bd0473d325098f27783ad68eee1b0eff1 | 03-Mar-2016 | Fang Deng <fdeng@chromium.org>
[autotest] Silence scheduler/host-scheduler email for non-primary servers. TEST=Run scheduler and host-scheduler locally. BUG=None Change-Id: Ie38c213d8779301204f593909b80171e6e6fb33a Reviewed-on: https://chromium-review.googlesource.com/330191 Commit-Ready: Fang Deng <fdeng@chromium.org> Tested-by: Fang Deng <fdeng@chromium.org> Reviewed-by: Fang Deng <fdeng@chromium.org>
/external/autotest/scheduler/monitor_db.py

114e17228efd62ab595690be30cb1e3f26fabebe | 11-Jan-2016 | Dan Shi <dshi@google.com>
[autotest] Support selecting drones in restricted subnets For an agent task using a host in a restricted subnet, only use drones in that subnet. For an agent task using a host NOT in a restricted subnet, only use drones NOT in any restricted subnet. BUG=chromium:574872 TEST=local run, unittest Change-Id: I3492fe14660e7629f982937d428d230ca9dcf3dc Reviewed-on: https://chromium-review.googlesource.com/321116 Commit-Ready: Dan Shi <dshi@google.com> Tested-by: Dan Shi <dshi@chromium.org> Reviewed-by: Fang Deng <fdeng@chromium.org> Reviewed-by: Simran Basi <sbasi@chromium.org>
/external/autotest/scheduler/monitor_db.py
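
The selection rule reads naturally as a two-branch filter. A self-contained sketch using the stdlib ipaddress module; the data shapes are illustrative, as the real code works on drone objects and config-defined subnets:

```python
import ipaddress

def usable_drones(drones, host_ip, restricted_subnets):
    """drones: iterable of (name, ip) pairs; restricted_subnets: CIDR strings."""
    nets = [ipaddress.ip_network(cidr) for cidr in restricted_subnets]
    addr = ipaddress.ip_address(host_ip)
    host_net = next((n for n in nets if addr in n), None)
    if host_net is not None:
        # Host in a restricted subnet: only drones inside that same subnet.
        return [(name, ip) for name, ip in drones
                if ipaddress.ip_address(ip) in host_net]
    # Host not restricted: exclude drones inside any restricted subnet.
    return [(name, ip) for name, ip in drones
            if not any(ipaddress.ip_address(ip) in net for net in nets)]
```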

8d7f3561b90d88c985004104c18be44c2af6be70 | 11-Jan-2016 | Dan Shi <dshi@google.com>
[autotest] Fix a reference bug SchedulerError was moved from host_scheduler to scheduler_lib. Also change suite scheduler driver to use contextlib.closing for threadpool handling. BUG=None TEST=unittest, test suite scheduler run Change-Id: I56016c4817a5a7fece7076a82bede462f74d5d59 Reviewed-on: https://chromium-review.googlesource.com/321380 Commit-Ready: Dan Shi <dshi@google.com> Tested-by: Dan Shi <dshi@google.com> Reviewed-by: Simran Basi <sbasi@chromium.org>
/external/autotest/scheduler/monitor_db.py

1bf60eb788365f083d0ee8045a6556f906149dec | 02-Dec-2015 | Simran Basi <sbasi@google.com>
[autotest] autoserv: add --lab & --host_attributes arguments Added two new flags to autoserv. --lab indicates that autoserv is running in the lab and has the full Autotest infrastructure at its disposal. --host_attributes allows host attribute information that is usually in the database to be retrievable from the command line arguments. If --lab is passed, autoserv will request the host attributes from the database at test runtime. From here, this change updates the concept of the "machines" list that test control files receive to now be a list of dicts that contain the machine hostname and host attributes. This will enable the identifying information the hosts library needs to create host objects to be available whether or not there is a database present. BUG=chromium:564343 TEST=local autoserv runs. Also verified scheduler changes work via MobLab. Waiting on trybot results. DEPLOY=scheduler Change-Id: I6021de11317e29e2e6c084d863405910c7d1a71d Reviewed-on: https://chromium-review.googlesource.com/315230 Commit-Ready: Simran Basi <sbasi@chromium.org> Tested-by: Simran Basi <sbasi@chromium.org> Reviewed-by: Simran Basi <sbasi@chromium.org>
/external/autotest/scheduler/monitor_db.py
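
A sketch of the data-shape change this describes: control files now receive machines as dicts rather than bare hostnames. The exact key names below are illustrative assumptions, not the CL's actual schema:

```python
# Old shape: machines = ['chromeos1-row1-rack1-host1', ...]
# New shape (keys illustrative): hostname plus the host attributes that
# would otherwise require a database lookup at test runtime.
machines = [
    {
        'hostname': 'chromeos1-row1-rack1-host1',
        'host_attributes': {'powerunit_hostname': 'rpm1', 'serials': 'abc123'},
    },
]

for machine in machines:
    print(machine['hostname'], machine['host_attributes'])
```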

70647cafbd061a7754ac304fd9dc067f2b6dbab4 | 17-Jul-2015 | Dan Shi <dshi@chromium.org>
[autotest] Save parent job id, build, board and suite info to tko_jobs. The parent job id is passed in through the autoserv command line. autoserv saves the value to the keyval file in the results folder. The parser job then reads the parent job id from the keyval file. Build, board and suite info are parsed from the job name. The label column in tko_jobs is essentially the job name; however, that column has a size limit of 100 characters, so the name could be truncated. This CL parses the actual job name to get the build, board and suite info and saves them to the tko_jobs table. BUG=chromium:509770,chromium:509901 TEST=local test CQ-DEPEND=CL:285026 Change-Id: I06b073b052a9d07ffd36308b1682a7bc12699898 Reviewed-on: https://chromium-review.googlesource.com/286265 Tested-by: Dan Shi <dshi@chromium.org> Reviewed-by: Mungyung Ryu <mkryu@google.com> Commit-Queue: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py

36cfd831af781eed114337efa5b90c103a49b502 | 10-Oct-2014 | Dan Shi <dshi@chromium.org>
[autotest] Support updating both firmware and CrOS from run_suite Add firmware_build option for run_suite to pass in firmware build. Save the build containing server-side test to job_keyvals. Design doc: https://docs.google.com/a/google.com/document/d/115aAHyZaatFDzuKWm61Sj2fImH4NDWikXqifvWKbbW0/edit# BUG=chromium:270258 TEST=local run_suite: make sure it works with older build: ./run_suite.py -b veyron_jerry -i trybot-veyron_jerry-paladin/R45-7086.0.0-b8 -p suites -s dummy_server Call failed as --firmware_ro_build not supported yet. ./run_suite.py -b veyron_jerry -i trybot-veyron_jerry-paladin/R45-7122.0.0-b11 --firmware_build veyron_jerry-firmware/R41-6588.9.0 --test_source_build trybot-veyron_jerry-paladin/R45-7122.0.0-b11 --firmware_ro_build veyron_jerry-firmware/R41-6588.9.0 -p suites -s dummy_server Call failed as the firmware build does not have test_suites package built: ./run_suite.py -b veyron_jerry -i trybot-veyron_jerry-paladin/R45-7122.0.0-b11 --firmware_build veyron_jerry-firmware/R41-6588.9.0 --test_source_build veyron_jerry-firmware/R41-6588.9.0 -p suites -s dummy_server make sure it works with new build: ./run_suite.py -b veyron_jerry -i trybot-veyron_jerry-paladin/R45-7122.0.0-b11 --firmware_build veyron_jerry-firmware/R41-6588.9.0 --test_source_build trybot-veyron_jerry-paladin/R45-7122.0.0-b11 -p suites -s dummy_server DEPLOY=apache,scheduler Change-Id: I50d23a7e81e4d6224b3483111110f910eb407074 Reviewed-on: https://chromium-review.googlesource.com/272793 Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org> Trybot-Ready: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py

cf2e8dd3f81d5eb4c9720db396ebbf64fd7b9ae4 | 08-May-2015 | Dan Shi <dshi@chromium.org>
[autotest] Add a new thread to upload metadata reported by the scheduler Currently, host state changes are reported to metadb before the change is committed to the database. Each change makes an ES post call to send data. To avoid performance overhead for the scheduler, UDP is used. UDP has a data-loss issue, especially since the ES server now lives in GCE while the scheduler runs in a different network. This CL attempts to fix the issue by reporting metadata in a separate thread, in bulk. The performance of the ES bulk API is much better than individual calls: a single index request through HTTP might take 80ms, while the bulk API can index 1000 records in less than 0.5 seconds. BUG=chromium:471015 TEST=run local scheduler, make sure all metadata was uploaded. Also, confirm scheduler can be properly shut down. Change-Id: I38991b9e647bb7a6fcaade8e8ef9eea27d9aa035 Reviewed-on: https://chromium-review.googlesource.com/270074 Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org> Trybot-Ready: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org> Reviewed-by: Keith Haddow <haddowk@chromium.org>
/external/autotest/scheduler/monitor_db.py
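
The design is a classic producer/consumer split: the scheduler enqueues records without blocking, and a background thread drains them and indexes each batch with one bulk call. A minimal sketch under those assumptions, with the bulk-index callable standing in for the real ES client:

```python
import queue
import threading

_records = queue.Queue()

def report_metadata(record):
    """Called from the scheduler tick; never blocks on the network."""
    _records.put(record)

def _upload_loop(bulk_index, batch_size=1000, poll_sec=0.5):
    while True:
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(_records.get(timeout=poll_sec))
            except queue.Empty:
                break
        if batch:
            bulk_index(batch)  # one ES bulk call per batch, not per record

def start_uploader(bulk_index):
    t = threading.Thread(target=_upload_loop, args=(bulk_index,))
    t.daemon = True  # don't block scheduler shutdown
    t.start()
    return t
```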

55d5899addf09ad0fa4a6ea7260e7c7b143b430b | 05-May-2015 | Dan Shi <dshi@chromium.org>
[autotest] Queue calls in drones after drone refresh. Drone refresh is done in a non-thread-safe fashion: it starts the refresh at the beginning of the tick, follows with a couple of other operations, then waits for the refresh to finish. When it starts, it executes all queued calls in each drone using drone_utils. After drone_utils finishes processing the calls, the scheduler empties the queued calls in the drones. That means any calls added between the start of the drone refresh and its completion will be removed without being called. This CL moves the cleanup call to after the drone refresh and adds a comment about potential future issues. A better fix would address the root cause, for example by adding a tracker to each drone's call queue so that after a drone refresh only the calls processed within the refresh are cleared. crbug.com/484715 is filed to track this issue. BUG=chromium:484039 TEST=local scheduler run, make sure lxc_cleanup is kicked off and finished. Change-Id: I1bb3229a3da578299949a00af25b3d4674eeed4b Reviewed-on: https://chromium-review.googlesource.com/269255 Trybot-Ready: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org> Reviewed-by: Richard Barnette <jrbarnette@chromium.org> Reviewed-by: Simran Basi <sbasi@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py

c458f66959fde1d934abfff92d20b2dbf115b9c2 | 29-Apr-2015 | Dan Shi <dshi@chromium.org>
[autotest] Add lxc_cleanup to the scheduler's TwentyFourHourUpkeep so the scheduler can kick off the lxc_cleanup script in each drone every 24 hours. A cron job would require a puppet change, which isn't supported in moblab. BUG=chromium:479383 TEST=start scheduler locally, check logs/lxc_cleanup.log to confirm the script finished running. Change-Id: I83ebfd6b0888b6f3b2c58d1f3824a692660bd4f7 Reviewed-on: https://chromium-review.googlesource.com/268318 Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org> Trybot-Ready: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py

b92af21b84c5e27af7f2023ea54409c124d0968e | 10-Apr-2015 | Paul Hobbs <phobbs@google.com>
[autotest] Remove per-tick process restriction. The per-tick process restriction was causing a performance problem when a tick took a long time, and there isn't a good reason to keep the per-tick process constraint as there is already a total process constraint. TEST=Ran the scheduler. The unit tests pass. BUG=chromium:471352 Change-Id: I2b669fb758fbcc898e1727da51bd6d4cd99cd5d2 Reviewed-on: https://chromium-review.googlesource.com/265072 Trybot-Ready: Paul Hobbs <phobbs@google.com> Tested-by: Paul Hobbs <phobbs@google.com> Commit-Queue: Paul Hobbs <phobbs@google.com> Reviewed-by: Fang Deng <fdeng@chromium.org>
/external/autotest/scheduler/monitor_db.py

ce59fe57b88c962412a20eeffbd8fa963d35d51a | 18-Mar-2015 | Shuqian Zhao <shuqianz@chromium.org>
[autotest] Adjust the pause in each tick of the lab scheduler Add a global config to set a minimum tick time of the scheduler. BUG=chromium:448940 TEST=Test on the moblab DEPLOY=scheduler,host_scheduler Change-Id: Icb9096d4a01be8441a6c4583f213ef0b67b63405 Reviewed-on: https://chromium-review.googlesource.com/262213 Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Shuqian Zhao <shuqianz@chromium.org> Tested-by: Shuqian Zhao <shuqianz@chromium.org> Trybot-Ready: Shuqian Zhao <shuqianz@chromium.org>
/external/autotest/scheduler/monitor_db.py
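
The knob amounts to padding short ticks with a sleep. A sketch, assuming a global_config value like the one this CL adds (the key name is illustrative):

```python
import time

from autotest_lib.client.common_lib import global_config

def run_forever(tick):
    min_tick_sec = global_config.global_config.get_config_value(
            'SCHEDULER', 'minimum_tick_sec', type=float, default=5.0)
    while True:
        start = time.time()
        tick()
        # Pad fast ticks so each iteration takes at least min_tick_sec,
        # keeping a near-empty tick from hammering the database.
        remaining = min_tick_sec - (time.time() - start)
        if remaining > 0:
            time.sleep(remaining)
```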

ec1d47d85cc83f30631518d8fbb6406036a3ac39 | 13-Feb-2015 | Dan Shi <dshi@chromium.org>
[autotest] Add support for the scheduler to honor the require_ssp attribute in control files This CL adds changes to pipe the require_ssp attribute in the control file through to the autoserv command. The workflow is as follows: 1. The control file parser stores the require_ssp attribute value in the afe_jobs table. 2. QueueTask compiles the command line list; the --require-ssp option is added to the command line list if the following conditions are met: a. AUTOSERV/enable_ssp_container in global config is True b. The test is a server-side test c. require_ssp for the job entry is None or True. 3. When the agent_task calls the run method to run the command, it first checks whether any drone supports server-side packaging. If no such drone is found, the agent task will run the command in a drone without using server-side packaging, and a warning will be posted in the autoserv log. 4. If a drone without SSP support is assigned to a test that requires SSP, the test will be run without SSP. BUG=chromium:453624 TEST=unittest, local test: set AUTOSERV/enable_ssp_container to True in shadow config; create a job for dummy_PassServer in AFE, check require SSP, confirm the job succeeds but with a warning in the autoserv log. Create a job for dummy_PassServer_nossp in AFE, uncheck require SSP, confirm the job passes without a warning in the autoserv log. Set AUTOSERV/enable_ssp_container to False in shadow config, restart the scheduler. Create a job for dummy_PassServer in AFE, check require SSP, confirm the job succeeds without a warning in the autoserv log. Also run test_that in a local chroot to make sure test_that is not affected. DEPLOY=apache,scheduler; db migration must be done before pushing this CL to prod. Change-Id: I02f3d137186676ae570e8380d975a1bcd9ffbb94 Reviewed-on: https://chromium-review.googlesource.com/249841 Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org> Trybot-Ready: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py
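
The three conditions in step 2 collapse into one predicate. A sketch; the function and parameter names are illustrative:

```python
def maybe_add_require_ssp(command, ssp_enabled, is_server_test, require_ssp):
    """Append --require-ssp when all conditions from the CL hold.

    ssp_enabled:    AUTOSERV/enable_ssp_container from global config.
    is_server_test: the job runs a server-side test.
    require_ssp:    the job's attribute; None means 'no preference'.
    """
    if ssp_enabled and is_server_test and require_ssp in (None, True):
        command.append('--require-ssp')
    return command

# e.g. maybe_add_require_ssp(['autoserv'], True, True, None)
# -> ['autoserv', '--require-ssp']
```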

1e1c41b1b4a1b97c0b7086b8430856ed45e064d3 | 05-Feb-2015 | Gabe Black <gabeblack@chromium.org>
graphite: Separate out configuration from the statsd classes. The new version of the statsd classes should be created using an instance of the new Statsd class which sets up some defaults without having to specify them over and over. This makes it essentially compatible with the existing usage in autotest, but will allow chromite to configure things differently and avoid having side effects from importing the module or global state. BUG=chromium:446291 TEST=Ran unit tests, ran stats_es_functionaltest.py, ran the stats_mock_unittest, ran a butterfly-paladin tryjob with --hwtest, testing by fdeng. DEPLOY=apache,scheduler,host-scheduler Change-Id: I1071813db197c0e5e035b4d8db615030386f1c1c Reviewed-on: https://chromium-review.googlesource.com/246428 Reviewed-by: Fang Deng <fdeng@chromium.org> Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Gabe Black <gabeblack@chromium.org> Tested-by: Gabe Black <gabeblack@chromium.org>
/external/autotest/scheduler/monitor_db.py

8c98ac10beaa08bfb975c412b0b3bda23178763a | 23-Dec-2014 | Prashanth Balasubramanian <beeps@google.com>
[autotest] Send frontend jobs to shards. Frontend jobs on hosts that are on a shard are currently disallowed, because the host-scheduler on the master ignores jobs based on meta-host, but frontend jobs have no meta-host. This CL makes the following changes: - Make host-scheduler ignore frontend jobs that are supposed to be picked up by a shard. - Send such frontend jobs in the heartbeat. - Allow creation of frontend jobs in the rpc. TEST=Test the following: - Create a job on a host on a shard from the AFE frontend. Observe it runs on the shard and completes on the master. - Create a job on two hosts (one host on a shard, the other on the master) from the AFE frontend. Make sure an exception is raised with the correct message. - Run a normal dummy suite on a shard, make sure the normal flow still works and the heartbeat contains the right information. - Run a normal dummy suite on the master, make sure it works. BUG=chromium:444790 DEPLOY=apache, host-scheduler Change-Id: Ibca3d36cb59fed695233ffdc89506364c402cc37 Reviewed-on: https://chromium-review.googlesource.com/240396 Reviewed-by: Mungyung Ryu <mkryu@google.com> Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Fang Deng <fdeng@chromium.org> Tested-by: Fang Deng <fdeng@chromium.org>
/external/autotest/scheduler/monitor_db.py

047e1c56e50f28d9b45692bfe2d25719e8af8112 | 23-Dec-2014 | Prashanth Balasubramanian <beeps@google.com>
[autotest] Allow shards to abort their tests and sync back status. Currently, if a job is running on a shard and we abort its parent, e.g. via abort_suite.py, the child jobs on the shard *may not* abort. They will abort only if the shard does a heartbeat in between when abort_suite sets the aborted bit on the hqes and when the master scheduler sets the completed bit (in _find_aborting). This change makes the master scheduler ignore all jobs that are running on shards. This way, whenever the shard does a heartbeat it will learn about the aborted bit set by abort_suite, kill the jobs, and sync the aborted status back to the master. BUG=chromium:444754 TEST=Aborted jobs running on a shard via their parent suites on the master. DEPLOY=scheduler,shard Change-Id: I00784138e9080988651b463410668221cf6f0267 Reviewed-on: https://chromium-review.googlesource.com/237325 Trybot-Ready: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org> Trybot-Ready: Dan Shi <dshi@chromium.org> Reviewed-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py

b9144a457ac2be505eac18f61e75d68751b3cea0 | 02-Dec-2014 | Dan Shi <dshi@chromium.org>
[autotest] Check the server database before services start Add code in the scheduler, host scheduler and suite scheduler to check whether the server has the required role in the server database when use_server_db is set to True in the global config. The scheduler will also get the drone list from the server database, and the drone manager will load drone attributes like max_processes from the server database. BUG=chromium:422535 CQ-DEPEND=CL:232003 TEST=set up a local server database, add servers for the scheduler, host scheduler and drone. Check the following conditions: 1. Disable use_server_db; make sure all services can start and run tests. 2. Enable use_server_db; make sure all services can start and run tests. 3. Add a new server configured as primary for the scheduler, host scheduler and suite scheduler, and try to start each service on a different server. Confirm that the service can't start. Change-Id: I5105f6de7ed959c76ed6b920240f1ba5898bebd6 Reviewed-on: https://chromium-review.googlesource.com/232525 Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org> Trybot-Ready: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py

c330bee1a59b0c2bb385b79db43b49098b1ee55a | 22-Oct-2014 | Fang Deng <fdeng@chromium.org>
[autotest] Deprecate the use of site_parse Some digging through the git history shows that site_parse was introduced to generate a json file about the test results and crash reports, which was later included in an email sent out by the old dashboard (see crbug.com/196118). Since we've completely deprecated the old dashboard (crosreview.com/220690), there is no point in calling site_parse any more. And since around Nov 2013 we have actually stopped calling site_parser anyway, because import_site_function has been failing to load it; search the scheduler logs and you'll see tko/parser is what's called currently. So it won't harm to just remove the dead code that tries to call site_parser. This will also simplify measuring the duration of parsing (no need to take care of site_parser). BUG=chromium:422581 TEST=run scheduler locally. run run_suite. DEPLOY=scheduler Change-Id: I2232f245ab9e532c18bce723fb559617e6dab121 Reviewed-on: https://chromium-review.googlesource.com/224844 Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Fang Deng <fdeng@chromium.org> Tested-by: Fang Deng <fdeng@chromium.org>
/external/autotest/scheduler/monitor_db.py

35d661e09666d315325f8942d06949ca7283666f | 26-Sep-2014 | MK Ryu <mkryu@google.com>
[autotest] Integrate crashlog collection into the repair workflow. When a DUT goes offline before logs are gathered, we lose those logs if the DUT is re-imaged. To grab as many such logs as we can, we integrate crashlog collection into the repair workflow. BUG=chromium:215160 TEST=./server/autoserv -R -m <DUT ip> -r ~/tmp/repair CQ-DEPEND=CL:221510 Change-Id: Ifd562bfd539b133381572aeec503d9a3940ab448 Reviewed-on: https://chromium-review.googlesource.com/219999 Reviewed-by: Fang Deng <fdeng@chromium.org> Commit-Queue: Mungyung Ryu <mkryu@google.com> Tested-by: Mungyung Ryu <mkryu@google.com>
/external/autotest/scheduler/monitor_db.py

628bfcff1b8dd3137bdb7cf5a6fec515be717bbd | 02-Oct-2014 | Prashanth Balasubramanian <beeps@google.com>
[autotest] Add stats for jobs started/finished. BUG=None TEST=Ran suites, unittests. Change-Id: I86fa156a428e16e65071cee386d637d429fd50aa Reviewed-on: https://chromium-review.googlesource.com/221215 Reviewed-by: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

06b09b7db7e65b45d07a204d92bf434fe7f6f2b9 | 10-Sep-2014 | Dan Shi <dshi@chromium.org>
[autotest] Change the production setting's default to False The puppet CL was merged and prod was updated. This CL forces the option to default to False, so any local scheduler run targeting a remote database will fail without the --production option. BUG=chromium:409091 TEST=local scheduler test DEPLOY=scheduler,host_scheduler Change-Id: Id49a4d8080534a16cb827038d6121d3c94a75dfe Reviewed-on: https://chromium-review.googlesource.com/217373 Tested-by: Dan Shi <dshi@chromium.org> Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py

d615a1e8c37ff5af145d9ef6b5af219c20728321 | 04-Sep-2014 | Jakob Juelich <jakobjuelich@chromium.org>
[autotest] Fix missing execution_subdir for hostless jobs The status of hostless jobs is set to Starting in schedule_new_jobs. If the scheduler is interrupted after doing that, it will try to restore the agents after starting again. The execution_subdir is not set at that point though, so an assertion will fail and an exception will be raised. Before this commit, the execution_subdir was set to 'hostless' in the prolog of hostless jobs. This commit moves that into start_new_jobs as well, before setting the status, so when the status is set to Starting the execution subdir will always already be set. In case the scheduler is interrupted after setting the execution_subdir but before setting the status, nothing bad will happen, as the execution_subdir is never accessed unless the status is Starting, Running, Gathering, Parsing or Archiving. BUG=chromium:334353 DEPLOY=scheduler TEST=Ran utils/unittest_suite.py and manually killed+restarted the scheduler Change-Id: I048bf18883857d6ff5016ace64526729f631bc26 Reviewed-on: https://chromium-review.googlesource.com/215394 Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Jakob Jülich <jakobjuelich@chromium.org> Tested-by: Jakob Jülich <jakobjuelich@chromium.org>
/external/autotest/scheduler/monitor_db.py

f6c65bd150f50771145545b58b89b88ca97bb250 | 30-Aug-2014 | Dan Shi <dshi@chromium.org>
[autotest] Add a --production argument to stop the scheduler from starting with the database set to a remote server This is the first CL to add the option. Once this CL is merged and pushed to prod, a second CL will update the upstart job in puppet to use the option. Once the puppet change is propagated to prod, a third CL will remove the override of the production argument and make it effective in prod. BUG=chromium:409091 TEST=local setup, change host in AUTOTEST_WEB to various settings and verify the option works. DEPLOY=scheduler,host-scheduler Change-Id: I0b3b961db3ae7a95b887f5ae32b73c3a33b6e82c Reviewed-on: https://chromium-review.googlesource.com/215672 Tested-by: Dan Shi <dshi@chromium.org> Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py

f47a6bbb9971efd228eaa22425431b91fa9f69bf | 29-Aug-2014 | Prashanth B <beeps@chromium.org>
Revert "[autotest] Restore from inconsistent state after the scheduler was interrupted." This reverts commit b7c842f8c8ba135bb03a0862ac0c880d3158bf07. Change-Id: I8d34329b8a2771eb4068ab50414c9eac6fd73d3f Reviewed-on: https://chromium-review.googlesource.com/215612 Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

b7c842f8c8ba135bb03a0862ac0c880d3158bf07 | 24-Jul-2014 | Jakob Juelich <jakobjuelich@google.com>
[autotest] Restore from an inconsistent state after the scheduler was interrupted. If the scheduler assigns hosts to hqes but hasn't set an execution_subdir yet, an exception is thrown. With this, the database will be cleaned up once, when the scheduler starts. Jobs that are in an inconsistent state will simply be reset so they can be scheduled again. BUG=chromium:334353 DEPLOY=scheduler TEST=Ran utils/unittest_suite.py and manually set the db into an inconsistent state. Change-Id: I96cc5634ae5120beab59b160e735245be736ea92 Reviewed-on: https://chromium-review.googlesource.com/209635 Tested-by: Jakob Jülich <jakobjuelich@chromium.org> Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Jakob Jülich <jakobjuelich@chromium.org>
/external/autotest/scheduler/monitor_db.py

36accc6a2a572e9d502407b34701f535a169f524 | 23-Jul-2014 | Jakob Jülich <jakobjuelich@google.com>
[autotest] Fixing and re-enabling monitor_db_functional_test. The test was disabled and outdated; database access and mocking of the drone manager have changed. This fixes those issues, updates the unit tests to the current status and re-enables them. BUG=chromium:395756 DEPLOY=scheduler TEST=ran ./utils/unittest_suite.py Change-Id: I6a3eda5ddfaf07f06d6b403692b004b22939ffb6 Reviewed-on: https://chromium-review.googlesource.com/209567 Reviewed-by: Alex Miller <milleral@chromium.org> Tested-by: Jakob Jülich <jakobjuelich@google.com> Commit-Queue: Jakob Jülich <jakobjuelich@google.com>
/external/autotest/scheduler/monitor_db.py

ac189f3c6cafa7d445162b5ec54e4162d0e679b2 | 23-Jun-2014 | Alex Miller <milleral@chromium.org>
[autotest] Remove indirection in scheduler config. We shouldn't encourage things to be named two different things in two different places. BUG=None DEPLOY=scheduler TEST=ran scheduler Change-Id: I0cfac73f7c2dbc0130f0399d96feda257915cd34 Reviewed-on: https://chromium-review.googlesource.com/205720 Reviewed-by: Prashanth B <beeps@chromium.org> Reviewed-by: Fang Deng <fdeng@chromium.org> Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py

675480946295e6a834f571f5aff92dd8d92dfe2c | 12-Jul-2014 | Prashanth B <beeps@google.com>
[autotest] Aborting jobs after drone manager refresh. Aborting a job that has an active special task forces the task's epilog, causing it to unregister its pidfile from the drone manager, which then proceeds to look for the pidfile in sync_refresh because it has stale cached state. The effect is an unnecessary scheduler restart. Avoid this confusion by aborting jobs after the complete refresh, like we did before. TEST=Ran suites, triggered the race with and without the fix. BUG=None. DEPLOY=scheduler. Change-Id: I53f1fd161b57dc3d1fcbf20e78f02c0f4f2c717b Reviewed-on: https://chromium-review.googlesource.com/207715 Tested-by: Prashanth B <beeps@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

340fd1e49a96391f2143a68c1a8795633be19622 | 22-Jun-2014 | Prashanth B <beeps@google.com>
[autotest] Threaded asynchronous task execution on drones. This cl does the following: 1. Creates a ThreadedTaskQueue capable of executing calls across drones in parallel. 2. Breaks drone_manager.refresh into 2 stages, a trigger and sync, thereby making it asynchronous. 3. Creates a localhost host object for the localhost drone so we run drone_utility through another process instead of directly importing it as a module. This fits better with the overall drone manager design, and allows us to multithread the monitoring of drones while still using signals within drone utility. 4. Adds stats, unittests and documentation. TEST=Ran jobs, added unittests. BUG=chromium:374322, chromium:380459 DEPLOY=scheduler Change-Id: I950cf260fdc3e5d1a2d4f6fdb4f5954c6371c871 Reviewed-on: https://chromium-review.googlesource.com/207094 Reviewed-by: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py
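
Point 2 -- splitting drone_manager.refresh into a trigger and a sync -- is the heart of this change. A simplified sketch of that two-stage shape using plain threads; the class and method names are illustrative, and the real ThreadedTaskQueue is more general:

```python
import threading

class TwoStageRefresh(object):
    """Trigger refreshes on all drones, do other work, then sync."""

    def __init__(self, drones):
        self._drones = drones
        self._threads = []
        self._results = {}

    def trigger_refresh(self):
        # Start one worker per drone and return immediately; the
        # scheduler can keep working while the refreshes run.
        for drone in self._drones:
            t = threading.Thread(target=self._refresh_one, args=(drone,))
            t.start()
            self._threads.append(t)

    def _refresh_one(self, drone):
        self._results[drone.hostname] = drone.refresh()

    def sync_refresh(self):
        # Block until every triggered refresh has finished.
        for t in self._threads:
            t.join()
        self._threads = []
        return self._results
```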

3ef82138e74c0ff62c4ac24f090532f78c72e320 | 09-Jul-2014 | Prashanth B <beeps@chromium.org>
Revert "[autotest] Threaded asynchronous task execution on drones." Problems with the retry decorator and localhost threads. This reverts commit 0933899b0dd1320e90e06025cced8096aed44908. Change-Id: I99318b4bdf4c11e9c4e5181c4ff5b1bdcdcbb89c Reviewed-on: https://chromium-review.googlesource.com/207038 Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

0933899b0dd1320e90e06025cced8096aed44908 | 22-Jun-2014 | Prashanth B <beeps@google.com>
[autotest] Threaded asynchronous task execution on drones. This cl does the following: 1. Creates a ThreadedTaskQueue capable of executing calls across drones in parallel. 2. Breaks drone_manager.refresh into 2 stages, a trigger and sync, thereby making it asynchronous. 3. Adds stats, unittests and documentation. TEST=Ran jobs, unittests. BUG=chromium:374322, chromium:380459 DEPLOY=scheduler Change-Id: Ib1257b362a6e4e335e46d51006aedb6b4a341bae Reviewed-on: https://chromium-review.googlesource.com/205884 Reviewed-by: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

da8c60af1e1e3ee97170c700d0b72991687e35a2 | 03-Jun-2014 | Michael Liang <michaelliang@chromium.org>
[autotest] Migrate graphite directory to client/common_lib/cros This change allows us to report stats in client tests. 1. Change import paths for all files that import modules from graphite 2. Clean up some unused modules Related CL: https://chromium-review.googlesource.com/#/c/202467/ BUG=chromium:237255 TEST=Ran scheduler locally, scheduled reboot jobs, verified stats such as monitor_db_cleanup.user_cleanup._cleanup were reported on chromeos-stats. DEPLOY = apache, scheduler, host_scheduler Change-Id: Iebfe3b8acc1c363a0b70ea555744e85d1367cb67 Reviewed-on: https://chromium-review.googlesource.com/202727 Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Michael Liang <michaelliang@chromium.org> Tested-by: Michael Liang <michaelliang@chromium.org>
/external/autotest/scheduler/monitor_db.py

4ec9867f46deb969c154bebf2e64729d56c3a1d3 | 15-May-2014 | Prashanth B <beeps@google.com>
[autotest] Split host acquisition and job scheduling II. This CL creates a stand-alone service capable of acquiring hosts for new jobs. The host scheduler will be responsible for assigning a host to a job and scheduling its first special tasks (to reset and provision the host). Thereafter, the special tasks will either change the state of a host or schedule more tasks against it (e.g. repair), till the host is ready to run the job associated with the Host Queue Entry to which it was assigned. The job scheduler (monitor_db) will only run jobs, including the special tasks created by the host scheduler. Note that the host scheduler won't go live till we flip the inline_host_acquisition flag in the shadow config, and restart both services. The host scheduler is dead, long live the host scheduler. TEST=Ran the schedulers, created suites. Unittests. BUG=chromium:344613, chromium:366141, chromium:343945, chromium:343937 CQ-DEPEND=CL:199383 DEPLOY=scheduler, host-scheduler Change-Id: I59a1e0f0d59f369e00750abec627b772e0419e06 Reviewed-on: https://chromium-review.googlesource.com/200029 Reviewed-by: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

f66d51b5caa96995b91e7c155ff4378cdef4baaf | 06-May-2014 | Prashanth B <beeps@google.com>
[autotest] Split host acquisition and job scheduling. This is phase one of two in the plan to split host acquisition out of the scheduler's tick. The idea is to have the host scheduler use a job query manager to query the database for new jobs without hosts and assign hosts to them, while the main scheduler uses the same query managers to look for hostless jobs. Currently the main scheduler uses the class to acquire hosts inline, like it always has, and will continue to do so till the inline_host_acquisition feature flag is turned on via the shadow_config. TEST=Ran the scheduler, suites, unittests. BUG=chromium:344613 DEPLOY=Scheduler Change-Id: I542e4d1e509c16cac7354810416ee18ac940a7cf Reviewed-on: https://chromium-review.googlesource.com/199383 Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

0e960285b022fad77f0b087a2007867363bf6ab9 | 14-May-2014 | Prashanth B <beeps@google.com>
[autotest] Consolidate methods required to setup a scheduler. Move methods/classes that will be helpful in setting up another scheduler process into scheduler_lib: 1. Make a connection manager capable of managing connections. Create, access, close the database connection through this manager. 2. Cleanup setup_logging so it's usable by multiple schedulers if they just change the name of the logfile. TEST=Ran suites, unittests. BUG=chromium:344613 DEPLOY=Scheduler Change-Id: Id0031df96948d386416ce7cfc754f80456930b95 Reviewed-on: https://chromium-review.googlesource.com/199957 Reviewed-by: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

f7d3516fc0ec29c1903889acc793a0a848d367a9 | 14-Feb-2014 | beeps <beeps@chromium.org>
[autotest] Hosts that are being used by special tasks don't get jobs. BUG=None TEST=Ran a suite with special tasks. Change-Id: I5a29f225fb1d2dc79c273deb1494c1eb387c67cd Reviewed-on: https://chromium-review.googlesource.com/186478 Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Reviewed-by: Fang Deng <fdeng@chromium.org>
/external/autotest/scheduler/monitor_db.py

cc9fc70587d37775673e47b3dcb4d6ded0c6dcb4 | 02-Dec-2013 | beeps <beeps@chromium.org>
[autotest] RDB Refactor II + Request/Response API. Scheduler Refactor: 1. Batched processing of jobs. 2. Rdb hits the database instead of going through host_scheduler. 3. Migration to add a leased column. The scheduler releases hosts back to the rdb every tick. 4. Client rdb host that queue_entries use to track a host, instead of a database model. Establishes a basic request/response api for the rdb: rdb_utils: 1. Requests: Assert the format and fields of some basic request types. 2. Helper client/server modules to communicate with the rdb. rdb_lib: 1. Request managers for rdb methods: a. Match request-response b. Abstract the batching of requests. 2. JobQueryManager: Regulates database access for job information. rdb: 1. QueryManagers: Regulate database access 2. RequestHandlers: Use query managers to get things done. 3. Dispatchers: Send incoming requests to the appropriate handlers. Ignores wire formats. TEST=unittests, functional verification. BUG=chromium:314081, chromium:314083, chromium:314084 DEPLOY=scheduler, migrate Change-Id: Id174c663c6e78295d365142751053eae4023116d Reviewed-on: https://chromium-review.googlesource.com/183385 Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

0237793390d44e6a39ec3245d3e5966b7dd89a30 | 15-Jan-2014 | beeps <beeps@chromium.org>
[Autotest] Avoid race conditions between special tasks and hqes. The order of things that occur within a tick is: - create an agent for a special task - schedule new jobs on ready hosts - activate and run the special task, which changes the status of the host, removing it from the ready queue. We sometimes find ourselves in a situation where a special task scheduled through the frontend (e.g. mass reverify) creates an agent for a 'Ready' host, thereby making it unusable, but the host still gets assigned to an hqe. This is because the host_scheduler, till now, didn't understand that an inactive special task could still invalidate a host by assigning an agent to it, without actually changing its status till later. In such situations the hqe would sit around in a Queued state. BUG=chromium:333122 TEST=Created special tasks for hosts through the frontend and checked that hosts don't get assigned to hqes. DEPLOY=Scheduler Change-Id: Icccb42e14ed86daeb2dcb38f3c492614fcd8d7ac Reviewed-on: https://chromium-review.googlesource.com/182704 Tested-by: Prashanth B <beeps@chromium.org> Reviewed-by: Fang Deng <fdeng@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

d361404e7c9a5bd26700f14cbd160a5dae8db0c8 | 14-Jan-2014 | Alex Miller <milleral@chromium.org>
[autotest] Group suites together when scheduling HQEs. This is a feeble attempt at doing more intelligent scheduling for provisioning, as now suites that scheduled their tests interleaved will run one and then the other instead of thrashing between two different suites (and thus probably images). BUG=chromium:250589 DEPLOY=scheduler TEST=ran scheduler Change-Id: Idbabdedd2a20f0a456503fc870a9c64556801d8e Reviewed-on: https://chromium-review.googlesource.com/182346 Reviewed-by: Alex Miller <milleral@chromium.org> Tested-by: Alex Miller <milleral@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py

47cd247d77228d5a9a78e81abc4892c71b1f9e3c | 26-Nov-2013 | Alex Miller <milleral@chromium.org>
[autotest] Sneak stats calls in to log queued and running HQEs. This isn't the most well designed stats logging, but it should get the right numbers logged at some point in time during the tick. BUG=None DEPLOY=scheduler TEST=ran locally Change-Id: I7e60f20d88462f077e9469c0cf9cbbb7d8ed11f9 Reviewed-on: https://chromium-review.googlesource.com/177949 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py

76af802bd80edf50fd34efae25205c3aeaf82f25 | 19-Oct-2013 | Dan Shi <dshi@chromium.org>
[autotest] abort Starting suite job leads to scheduler crash abort Starting suite job leads to scheduler crash with error: AssertionError: self.execution_subdir not found BUG=chromium:309207,276507 TEST=unittest, and manual test: 1. add |max_hostless_processes: 1| and |max_processes_per_drone: 3| to SCHEDULER section of shadow config. 2. restart scheduler 3. add three new suite jobs. When the third job shows status of Starting in afe, try to abort it in afe. 4. abort all other suite jobs, and scheduler should abort all suite jobs. DEPLOY=scheduler Change-Id: I763918c34569643edb5e0acd94a3ca54cc6e5949 Reviewed-on: https://chromium-review.googlesource.com/173770 Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py

e50d875ec755a9110e9ac76c2edc8abba44fe167 | 21-Nov-2013 | beeps <beeps@chromium.org>
[autotest] Abort all special tasks with the hqe. A suite runs and kicks off jobs. We schedule a Reset and a Provision per job. Reset creates an agent for a host and starts running. An abort happens, then shenaniganses:
find_aborting: finds the hqe for the test job; abort agents aborts Reset -> cleanup(), while provision doesn't have an agent yet; abort hqe: status == RESETTING, so it schedules a cleanup.
schedule new jobs: the scheduled cleanup gets an agent, since it is higher priority than provision.
handle_agents: Cleanup runs.
--next tick-- provision gets an agent, and starts.
--next tick-- find_aborting: finds the hqe that is now in provision (it's still aborted=1, but is now re-activated because of provision); abort agents aborts provision midway, calls cleanup(); abort hqe: status == PROVISIONING, so it schedules another cleanup.
This change does the following: - Aborts the provisioning job up front so we don't incur the overhead. - Schedules a repair instead of a cleanup if we abort a provision while it's running, as we could have left the DUT in a transient state midway between images such that the cleanup actually passes. TEST=Ran suites, aborted in resetting and provisioning. Ran other suites that used the same duts. BUG=chromium:321871 Change-Id: I18858aef6ef753fa889a28aefcd9c4b1b30030d5 Reviewed-on: https://chromium-review.googlesource.com/177485 Tested-by: Prashanth B <beeps@chromium.org> Reviewed-by: Dan Shi <dshi@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

7d8273bad1318c13698a162a6e5910bea060d167 | 06-Nov-2013 | beeps <beeps@chromium.org>
[autotest] RDB refactor I Initial refactor for the rdb; implements 1 in this schematic: https://x20web.corp.google.com/~beeps/rdb_v1_midway.jpg Also achieves the following: - Don't process an hqe more than once, after having assigned a host to it. - Don't assign a host to a queued, aborted hqe. - Drop the metahost concept. - Stop using labelmetahostscheduler to find hosts for non-metahost jobs. - Include a database migration script for jobs that were still queued during the scheduler restart, since they will now need a meta_host dependency. This CL also doesn't support the scheduler's ability to: - Schedule an atomic group * Consequently, also the ability to block a host even when the hqe using it is no longer active. - Schedule a metahost differently from a non-metahost * Both metahosts and non-metahosts are just labels now * Jobs which are already assigned hosts are still given precedence, though - Schedule based on only_if_needed. And fixes the unittests appropriately. TEST=Ran suites, unittests. Restarted the scheduler after applying these changes and tested migration. Ran suite scheduler. BUG=chromium:314082,chromium:314219,chromium:313680,chromium:315824,chromium:312333 DEPLOY=scheduler, migrate Change-Id: I70c3c3c740e51581db88fe3ce5879c53d6e6511e Reviewed-on: https://chromium-review.googlesource.com/175957 Reviewed-by: Alex Miller <milleral@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

5e2bb4aa28611aaacaa8798fd07943ede1df46c6 | 28-Oct-2013 | beeps <beeps@chromium.org>
[autotest] Scheduler refactor. Break scheduler into simpler modules. This change also modifies run_pylint to check for undefined variables. BUG=chromium:312338 TEST=Ran smoke suite against multiple duts. Triggered agents like repair, verify etc. Pylint, Unittests. DEPLOY=scheduler Change-Id: Ibd685a27b5b50abd26cdf2976ac4189c3e9acc0a Reviewed-on: https://chromium-review.googlesource.com/174080 Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py

7d8a1b139e6390f6dad06bf0cf2bfeb9a4a69304 | 30-Oct-2013 | beeps <beeps@chromium.org>
[autotest] De-prioritize hostless hqes in favor of tests. Currently, hostless hqes get precedence over tests. In situations when we're flooded with suites this is a problem, as it leads to a deadlock situation where many hostless jobs are waiting on tests that the drone doesn't have the capacity to run. Note that even after this change such scenarios are possible, just a little less likely. TEST=Started suites, set a low limit, checked that we run tests before hostless jobs. Checked that we prioritize (host+no metahost) over (no host+metahost). Ran unittests. BUG=chromium:312847 DEPLOY=scheduler Change-Id: Ibe66e8a0b6319561cc24e491ec7b9b370a840bad Reviewed-on: https://chromium-review.googlesource.com/175028 Tested-by: Prashanth B <beeps@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py

1a18905776c0a53e2a169f61dbf5bdad3bd0cb74 | 28-Oct-2013 | Dan Shi <dshi@chromium.org>
[autotest] revert suite throttling Undo CL: https://chromium-review.googlesource.com/#/c/167175 but keep the stats call and _notify_process_limit_hit. BUG=chromium: TEST=suite run in local setup, unittest DEPLOY=scheduler Change-Id: I713b69651fabfb8cbb4f9c1ca3a8605900753bc9 Reviewed-on: https://chromium-review.googlesource.com/174896 Commit-Queue: Dan Shi <dshi@chromium.org> Reviewed-by: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py

5b6b3fd9fec63b0d5f15db51cdc94c584269358f | 18-Oct-2013 | Dan Shi <dshi@chromium.org>
[autotest] abort Starting job leads to scheduler crash abort Starting job leads to scheduler crash with error: AssertionError: self.execution_subdir not found BUG=none TEST=none Change-Id: I5c9c6b43e043759582a0f7d10efa2e301d5a0ee8 Reviewed-on: https://chromium-review.googlesource.com/173688 Reviewed-by: Alex Miller <milleral@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py

d0e09ab5697f48012bdf4b426d55cd0fb58f4926 | 10-Sep-2013 | Dan Shi <dshi@chromium.org>
[autotest] Fix SelfThrottledTask._num_running_processes when a suite job is aborted When a suite job is aborted, the variable SelfThrottledTask._num_running_processes is not decremented. The cause is that the abort path calls AbstractQueueTask.abort, which bypasses SelfThrottledTask.finished. The change is made in the Agent.abort method: when a task is aborted from an AgentTask, BaseAgentTask.finished(False) is called, allowing the finished method in SelfThrottledTask to update the counters properly. BUG=chromium:288175 TEST=unittest; add logging in the SelfThrottledTask._increment_running_processes and _decrement_running_processes methods to print out the value of _num_running_processes. Start the scheduler (monitor_db) on a local workstation, create several suite jobs via run_suite, and cancel some of the suite jobs. After all jobs are finished or aborted, confirm the values of _num_running_processes are all 0. Change-Id: I80545fc68a75db645c9b8b5330b05b64e7609a9d Reviewed-on: https://chromium-review.googlesource.com/168649 Reviewed-by: Alex Miller <milleral@chromium.org> Tested-by: Dan Shi <dshi@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py
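
The invariant being repaired: every increment of the class-wide process counter must be matched by a decrement, even on abort. A simplified sketch of that balanced shape (the real methods, per the message, are Agent.abort and BaseAgentTask.finished):

```python
class SelfThrottledTask(object):
    """Sketch of the counter this fix keeps balanced."""
    _num_running_processes = 0

    def __init__(self):
        self._running = False

    def start(self):
        SelfThrottledTask._num_running_processes += 1
        self._running = True

    def finished(self, success):
        # Runs on normal completion AND, after this fix, on abort too.
        if self._running:
            SelfThrottledTask._num_running_processes -= 1
            self._running = False

    def abort(self):
        # The bug: abort used to bypass finished(), so an aborted suite
        # job leaked a slot in _num_running_processes forever.
        self.finished(False)
```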

b255fc55ee52e2259f6fad153ee6fc3d4e5e0b8e | 14-Oct-2013 | beeps <beeps@chromium.org>
[autotest] Stat number of new jobs per scheduler tick. Send new hostless, atomic and host-job-assignment counts per tick as a gauge to statsd. This will be useful in figuring out the expected overhead of querying the rdb for a host/job and designing a sensible host acquisition api. TEST=Ran it, checked stat keys. BUG=None. Change-Id: I4fcc88945af6c6ad41e5df43932e9a9de2991ec6 Reviewed-on: https://chromium-review.googlesource.com/173329 Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Reviewed-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py

7bcec08fbd08f29580fef7a0d7c85a575d5243ba | 19-Sep-2013 | Alex Miller <milleral@chromium.org>
[autotest] max_provision_retries=1 means retry once. The current code would count the just-failed provision as an attempt, and thus max_provision_retries=1 would cause nothing to retry, and this feels counter-intuitive. BUG=None DEPLOY=scheduler TEST=None Change-Id: I0ed0412a074997c581f5bedaa638ed93220d2c97 Reviewed-on: https://chromium-review.googlesource.com/170023 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
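A toy model of the retry accounting this CL fixes: the just-failed attempt no longer consumes the retry budget, so max_provision_retries=1 yields exactly one retry (the function name is illustrative):

    def should_retry(failed_attempts, max_provision_retries=1):
        # The initial failure is not counted against the budget.
        return failed_attempts - 1 < max_provision_retries

    print(should_retry(1))  # True: first failure, one retry allowed
    print(should_retry(2))  # False: the single retry has been used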
|
7bcc0987a8e935cdcd694c16b8d8927af05ec61f |
|
06-Sep-2013 |
Fang Deng <fdeng@chromium.org> |
[autotest] Send an email when SelfThrottledTask hits process limit. SelfThrottledTask now sends out an email when the process limit is hit. Once the limit is hit, it won't send out another email until the count drops below 80% of max processes. Because HostlessQueueTask subclasses SelfThrottledTask, we will be notified when there are too many suites, assuming all Hostless jobs are currently suites (See crbug.com/279627). Other subclasses of SelfThrottledTask will also send out emails. Also send _num_running_processes to graphite. BUG=chromium:286025 TEST=Set max_hostless_processes to 2, schedule three suites, confirm that when the second one is kicked off, an email is sent out; confirm that another email will only be sent after the count drops below 80% of the max limit. DEPLOY=scheduler Change-Id: I78fc93de3a82c2c85853dab288411f884de500d4 Reviewed-on: https://chromium-review.googlesource.com/168370 Reviewed-by: Dan Shi <dshi@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org> Tested-by: Fang Deng <fdeng@chromium.org> Commit-Queue: Fang Deng <fdeng@chromium.org>
/external/autotest/scheduler/monitor_db.py
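A minimal sketch of the hysteresis described here, with made-up names: notify once when the limit is hit, and only re-arm after the count falls below 80% of the maximum:

    class LimitNotifier:
        def __init__(self, max_processes):
            self._max = max_processes
            self._notified = False

        def update(self, num_running):
            if num_running >= self._max and not self._notified:
                self._notified = True
                print('email: process limit (%d) hit' % self._max)
            elif num_running < 0.8 * self._max:
                # Re-arm so the next limit hit sends a fresh email.
                self._notified = False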
|
a4a78efbc0ab93bb57ddc1ea010a30b5c07a7ace |
|
04-Sep-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Add retries to provisioning. There is now a setting in global_config.ini which controls provision retries, and this value can be reloaded on-the-fly in the scheduler. Be cautioned that provision failures are basically silently hidden: there is currently no reporting to indicate that a retry happened. Implementing this also pointed out the way to clean up the ProvisionTask code, so there's also some free cleanup packaged into this CL. BUG=chromium:279667 DEPLOY=scheduler TEST=forced provision failures, and watched the HQE get requeued a finite number of times. Change-Id: I66d967fb8f3ab9f199571764821e1a39d0e81f39 Reviewed-on: https://chromium-review.googlesource.com/167990 Reviewed-by: Dan Shi <dshi@chromium.org> Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
c23a7f0357a6109c7e3ed91798d0d433f5de0a21 |
|
28-Aug-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Throttle HostlessQueueEntries. This is a quick hack to throttle the number of suites we have active. I'd like it if there were a nice, easy way to single out what a suite is, or to have a job elect to be throttled, but in terms of making sure the system doesn't go down again soon for the same reason, this will be good enough for now. Note that, while throttled, the suite jobs show as running, but the autoserv process isn't kicked off. BUG=chromium:279627 DEPLOY=scheduler TEST=Kicked off multiple suites with max_hostless_processes set to 500. All ran. Set max_hostless_processes to 1, and reloaded via :13467. Scheduled another suite, which didn't start running until all other suites finished. Change-Id: I141a9bc4bd64d9a40c1ec4cd05a624739bb85076 Reviewed-on: https://chromium-review.googlesource.com/167175 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Fang Deng <fdeng@chromium.org> Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
9fe39666fcb007cd87273c95d2efb4928206d883 |
|
09-Aug-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Don't use implicit true/false in scheduler query. This ruins MySQL's ability to use an index, even though the field is a TINYINT(1). BUG=chromium:270215 DEPLOY=scheduler TEST=scheduler runs and aborts Change-Id: I48fd3135941a86801e48a4f67e0c1f9ef307ab05 Reviewed-on: https://gerrit.chromium.org/gerrit/65339 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
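An illustration of the query shape change the CL describes (the table name and exact predicates are guesses): testing the TINYINT(1) column implicitly defeats the index, while comparing against a literal keeps it usable.

    # Implicit boolean test; per this CL, MySQL cannot use the
    # index on the TINYINT(1) column here.
    slow = 'SELECT id FROM afe_host_queue_entries WHERE NOT complete'

    # Explicit comparison against 0/1 keeps the index usable.
    fast = 'SELECT id FROM afe_host_queue_entries WHERE complete = 0'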
|
9f01d5dbbcbc95c0ac1b114a728c2fd597a091da |
|
08-Aug-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Don't pass --verify_job_repo_url on Hostless Queue Entries. There's currently this ugly message 08/08 01:54:38.933 WARNI|subcommand:0080| parallel_simple was called with an empty arglist, did you forget to pass in a list of machines? in suite autoserv.DEBUG logs because we're trying to invoke the verify job repo url control segment on hostless tasks, so we're doing `parallel_simple(fn, [])`, which isn't accepted. This moves it into QueueTask, which is a something that has a host. BUG=chromium:269999 DEPLOY=scheduler TEST=Ran a suite, and no longer see the message. Watching scheduler output, and suites don't get the arg, but tests do. Change-Id: I37a56a590677893833102f99ffa242560fe6b48d Reviewed-on: https://gerrit.chromium.org/gerrit/65183 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
f314b262354e35d8f7b8eef78c1b3d98f73dcc8a |
|
08-Aug-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Fail, not abort, tests that fail to provision. If we abort the test, something across dynamic_suites and run_suite will refuse to look at the reason field to see if one is listed, and thus we get no reason displayed on the run_suite output, and by proxy, the waterfall. Testing also revealed that another of my previous assertions, that we need to set the complete bit on the HQE, was actually wrong, and will also be done by the FinalReparseTask, so that code and comments are being dropped. BUG=chromium:188217 DEPLOY=scheduler TEST=run_suite now shows useful output Change-Id: I7369d3847687cd0f0daad50a4e5658eef96f39e4 Reviewed-on: https://gerrit.chromium.org/gerrit/65116 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
e8949c9bf5495bb1d009b2448994ab27961d6ab6 |
|
08-Aug-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Make ProvisionTask call AgentTask's epilog so we call finish Without a call to finish(), is_complete never gets set to 1, and thus the scheduler, when it starts up again, will think all previously run provision tasks are still running. BUG=chromium:249437 DEPLOY=scheduler TEST=Scheduler was leaking provision tasks, currently is not. This code also existed in a previous CL, but I couldn't explain the effect that it had at the time, so it was removed (incorrectly). Change-Id: I7941a11b2edd38aff0546cbb77d94cba62b5ebf9 Reviewed-on: https://gerrit.chromium.org/gerrit/65040 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
8bb1f7d8766d202d220879642a7d5c40e2775e52 |
|
05-Aug-2013 |
beeps <beeps@chromium.org> |
[autotest] Teach the scheduler to abort SpecialTasks. 1. Database migration that adds a column to afe_special_tasks. 2. An RPC that modifies the db with the abort bit. 3. A scheduler method that polls the database for aborted jobs and handles the agent. TEST=Scheduled a job and aborted its special tasks. Checked that subsequent special tasks were scheduled. Aborted Repair->host goes into repair fail. Aborted (Reset, Verify, Cleanup)->Repair queued. Checked that the HQE is requeued through PreJobTask epilog. Aborted SpecialTasks without HQEs. Aborted jobs normally through the frontend. BUG=chromium:234223 DEPLOY=migrate, apache, scheduler Change-Id: I1a47bc2d801486a8abdffb44091c59a8f5bdbefc Reviewed-on: https://gerrit.chromium.org/gerrit/64753 Commit-Queue: Prashanth Balasubramanian <beeps@chromium.org> Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org> Tested-by: Prashanth Balasubramanian <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py
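A sketch of the polling step (item 3), with an assumed column name and a stand-in db handle; the real migration, RPC, and agent plumbing are in the CL itself:

    def handle_aborted_special_tasks(db, agents_by_task_id):
        # Find tasks the RPC layer flagged as aborted but that the
        # scheduler still considers active, then tear down each agent.
        rows = db.execute(
            'SELECT id FROM afe_special_tasks '
            'WHERE is_aborted = 1 AND is_complete = 0')
        for (task_id,) in rows:
            agent = agents_by_task_id.get(task_id)
            if agent is not None:
                agent.abort()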
|
dfff2fdc8477be3ba89fd915fde2afe8d3716624 |
|
28-May-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Add a provision special task. We now insert a special task which calls |autoserv --provision| with the host that the HQE is about to run on to provision the machine correctly before the test runs. If the provisioning fails, the HQE will also be marked as failed. No provisioning special task will be queued if no provisioning needs to be done to the host before the job can/will run. With *just* this CL, no provisioning tasks should actually get scheduled, because the part of the scheduler that maps HQEs to hosts hasn't been taught about provisioning yet. That will come in a later CL. Once this CL goes in, it should not be reverted. The scheduler will become very unhappy if it sees special tasks in its database, but can't find a corresponding AgentTask definition for them. One would need to do manual database cleanup to revert this CL. However, since one can disable provisioning by reverting the (future) scheduling change CL, this shouldn't be an issue. BUG=chromium:249437 DEPLOY=scheduler TEST=lots: * Ran a job on a host with a non-matching cros-version:* label, and a provision special task was correctly created. It ran after Reset, and correctly kicked off the HQE after it finished. * Ran a job on a host with a matching cros-version:* label, and no provision special task was created. * Ran a job on a host with a non-matching cros-version:* label, and modified Reset so that it would fail. When reset failed, it canceled the provision task, and the HQE was still rescheduled. * Ran a job on a host with a non-matching cros-version:* label, and modified the cros-version provisioning test to throw an exception. The provision special task aborted the HQE with the desired semantics (see comments in the ProvisionTask class in monitor_db), and scheduled a repair to run after its failure. The provision failures were all deduped against each other when bug filing was enabled. See https://code.google.com/p/autotest-bug-filing-test/issues/detail?id=1678 * Successfully debugged an autoupdate/devserver issue from provision logs, thus proving that sufficient information is collected for debug. Change-Id: I96dbfc7b001b90e7dc09e1196c0901adf35ba4d8 Reviewed-on: https://gerrit.chromium.org/gerrit/58385 Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org> Tested-by: Alex Miller <milleral@chromium.org> Commit-Queue: Prashanth Balasubramanian <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
5e36cccd5f776579b004cc9ad19953a61ba4fdfe |
|
04-Aug-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Remove pointless if statement. `for x in []: pass` is a no-op anyway, so there's no point in checking to see if there are elements in the list. No scheduler deploy, as this is just a code beauty change. TEST=scheduler runs BUG=None Change-Id: Ia895da329b8529c7c6b64ccd10964be25a000ccd Reviewed-on: https://gerrit.chromium.org/gerrit/64445 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
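The pattern being removed, for illustration: iterating an empty list is already a no-op, so the guard buys nothing.

    items = []

    # Before: redundant guard.
    if items:
        for x in items:
            print(x)

    # After: the loop body simply never runs for an empty list.
    for x in items:
        print(x)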
|
47715eb70a3c50007ec6d43dd949bd5d331b0cc0 |
|
24-Jul-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Correct updated scheduler docstring. Dear god is `task` overloaded with meanings here. It's great that everything has a task field, but with different types. BUG=None TEST=logged types, and raged at code Change-Id: I40136c4f68e5ab6d8b4e3fb1c525cefc5542c8bf Reviewed-on: https://gerrit.chromium.org/gerrit/63185 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
dc608d5b5e2a33e4a9b654623f1f2db57f19f10c |
|
30-Jul-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Reset shouldn't unconditionally put host into Ready. The current transitions go Ready -> Resetting -> Ready -> Pending whereas this should look like Ready -> Resetting -> Pending The extra Ready doesn't really matter, since we immediately change it later, but it'd be nice to remove it for sanity reasons. BUG=chromium:266051 DEPLOY=scheduler STATUS=verified TEST=There is no longer an extra Ready. Change-Id: Ic9925ac6016bce750e6638e777930c6e8944d54a Reviewed-on: https://gerrit.chromium.org/gerrit/63770 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
f3f1945adebff65fe81522c7dda1183a5527e2b9 |
|
30-Jul-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Requeueing a HQE should abort all queued special tasks. If one schedules multiple pre-job special tasks, and one of them fails, the scheduler still thinks it should try to run the rest of them. We need to make sure to kill off any remaining pre-job tasks if one of them fails and we requeue the HQE. They will get re-created when the HQE is assigned another host again. BUG=chromium:249437 DEPLOY=scheduler TEST=A reset and provision task were scheduled. The reset task failed, and the provision task was marked as complete. This is in stark opposition to the previous behavior, which was a scheduler crash. Change-Id: I8261751f63b34892ecbad68f117ed9b85e41c1ed Reviewed-on: https://gerrit.chromium.org/gerrit/63653 Commit-Queue: Alex Miller <milleral@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org> Tested-by: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
d51d886ed9a03d36cf3607f8d8f7252820472b3c |
|
24-Jul-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Correct an autotest scheduler comment. Misleading comments are misleading. TEST=None BUG=None Change-Id: I7f145a0499a1e812adc6090f797a66aac8b65380 Reviewed-on: https://gerrit.chromium.org/gerrit/63119 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
ba076c56dbc28a9ccc602c643f2a6653191b0c45 |
|
11-Jul-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Reset needs to abide by _should_pending(). Currently, if Reset succeeds, it immediately forces the HQE into Pending. This breaks the ability to have multiple pre-job special tasks. BUG=None DEPLOY=scheduler TEST=Provision now runs Change-Id: If59435bb0d3882fc643d2add3f268c49b3ce5224 Reviewed-on: https://gerrit.chromium.org/gerrit/61607 Reviewed-by: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
07e09aff0baf871b33e5479e337e5e3e0523b729 |
|
12-Apr-2013 |
Dan Shi <dshi@chromium.org> |
[Autotest] merge cleanup and verify The objective of this CL is to merge cleanup and verify into a single job to reduce the run time of each test. In the existing design, by default, a cleanup job is scheduled after a test is finished, and a verify job is scheduled before a test is started. By merging these two jobs together, the total run time of the two is reduced from about 47s to 37s, a saving of around 10s. That does not include the saving on the scheduler from scheduling two jobs, which may take another 5-10s. The design is to create a new special task, reset, which runs at the beginning of a job by default. The verify task is changed to not run by default before a job starts. The cleanup job will only run if a job is scheduled to reboot and any test in that job failed. BUG=chromium:220679 TEST=tested with run_suite on a local machine DEPLOY=afe,apache,scheduler,change all users' preference on reboot_after to Never, sql: |update chromeos_autotest_db.afe_users set reboot_after=0| Change-Id: Ia38baf6b73897b7e09fdf635eadedc752b5eba2f Reviewed-on: https://gerrit.chromium.org/gerrit/48685 Commit-Queue: Dan Shi <dshi@chromium.org> Reviewed-by: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
23676a2b7f0b5c9831cd176697d7f4107e4b9009 |
|
03-Jul-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Failing a queue entry fails associated special tasks. If we end up in a weird situation where we have multiple special tasks scheduled, and more than one of them will fail and call _fail_queue_entry when they do, then we hit a weird situation where we try to fail a queue entry multiple times. Instead, when we fail a queue entry, we should also fail any leftover special tasks, as running any special task against a host that we just put into repair failed makes no sense at all. BUG=chromium:256431 DEPLOY=scheduler TEST=Reproduced bug happening in prod, and verified this fixes it Change-Id: I2b0e0d6149b6e75b7d30e403a3452ea4ad11d992 Reviewed-on: https://gerrit.chromium.org/gerrit/60871 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Dan Shi <dshi@chromium.org> Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
cb6f1e2209ae6ea09c78317b114db6e51f6e255b |
|
29-Jun-2013 |
beeps <beeps@chromium.org> |
[autotest] Teach autoserv to check a host's job_repo_url. Autoserv now understands a verify_job_repo_url flag to mean 'execute a control segment to verify said hosts contain a valid job_repo_url'. It does this by re-staging autotest packages at the job_repo_url; if the devserver embedded in the job_repo_url is unresponsive we get another devserver, stage autotest on it, and update the job_repo_url. If the job_repo_url is None we leave it, on the assumption that the host will go through a re-image before it runs any tests, which will give it a new job_repo_url. Pros: 1. If we are unable to stage autotest packages at job_repo_url the test will fail, soon and fast. 2. Since we perform these actions in the server_job itself we won't see package installation exceptions later on, which are just misleading today since we end up rsyncing client autotest anyway. Cons: 1. The re-image job will actually try staging the contents of an autotest build it will surely not need. Something along these lines is unavoidable in most clean solutions, though. TEST= 1. Ran smoke suite, deleted autotest packages on devserver just before security_Minijail, made sure it passed. Also deleted the entire build directory. 2. Ran a suite with a bad devserver in the job_repo_url and confirmed we resolve to a good one, restage autotest and reset the job_repo_url. 3. Ran a suite after recovery (job_repo_url=None). 4. Invoked autoserv with a list of machines. 5. Checked that cleanup, verify, repair don't get the verify_job_repo_url flag, and hence, don't re-stage. Cleanup still passes because it rsyncs client autotest in the invalid job_repo_url case. 6. Server job fails before running tests if devservers are down or we're unable to change job_repo_url. 7. BVTs pass BUG=chromium:255114 Change-Id: I3c5f445962707a0a089f9d755aed4c4d0fdbd8f2 Reviewed-on: https://gerrit.chromium.org/gerrit/60578 Reviewed-by: Alex Miller <milleral@chromium.org> Tested-by: Prashanth Balasubramanian <beeps@chromium.org> Commit-Queue: Prashanth Balasubramanian <beeps@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
42437f9ebf13ea8fe4b10ff165c6747105a0f278 |
|
28-May-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Allow for multiple pre-hqe special tasks. Previously, every special task would call |queue_entry.on_pending()| once it finished. This means that if you queued up multiple special tasks to run before a host queue entry starts running, only the first would actually run. Now, each special task checks to see if it is the last one before it makes the call to |queue_entry.on_pending()|, so that if we have multiple special tasks before a queue entry, all of them will get run. BUG=chromium:249437 DEPLOY=scheduler TEST=Manually created a job and host queue entry and two special tasks (one verify and one pre-job cleanup) and verified that both of them ran before the host queue entry started. Change-Id: Id00296192388ee256cf3071a572f8a019959c158 Reviewed-on: https://gerrit.chromium.org/gerrit/58381 Reviewed-by: Dan Shi <dshi@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Scott Zawalski <scottz@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org> Tested-by: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
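A toy version of the check described above (class and method names are illustrative): each finishing pre-job task only promotes the queue entry once no queued tasks remain.

    class QueueEntry:
        def __init__(self, pending_tasks):
            self.pending_tasks = list(pending_tasks)

        def on_pending(self):
            print('HQE -> Pending')

    def finish_special_task(entry, task):
        entry.pending_tasks.remove(task)
        # The fix: only the last remaining pre-job task calls
        # on_pending(), so every queued task gets its turn to run.
        if not entry.pending_tasks:
            entry.on_pending()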
|
308e7367e576ea26b2723aa112c178cb0568b5a4 |
|
21-May-2013 |
Aviv Keshet <akeshet@chromium.org> |
[autotest] pull _autoserv_command_line functionality into utility module This CL pulls the logic used by monitor_db.py to create an autoserv command line from a job into its own function, in a separate new module -- autoserv_utils.py. This will allow for code reuse with test_that. monitor_db still contains a number of other functions which create autoserv command lines for other tasks like log collection. These have not yet been pulled out to shared utility functions, because those parts of the scheduler didn't seem to have any unit test coverage. A future CL may pull some of these out as well. BUG=chromium:236471 TEST=unit tests pass. Ran a smoke suite in local autotest. DEPLOY=scheduler Change-Id: I6317b70aa9fb7e9968739582b9379112baa4507b Reviewed-on: https://gerrit.chromium.org/gerrit/56136 Reviewed-by: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org> Commit-Queue: Aviv Keshet <akeshet@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
1d6c2a0c4cc8d4a7e5e455e5bcddf36446868e49 |
|
18-Apr-2013 |
Fang Deng <fdeng@chromium.org> |
[autotest] Time each step of scheduler tick and some database queries We are interested in seeing how much impact migrating to CloudSQL will have on the performance of the scheduler. In this CL, we are adding some timing stats to the following places: - each sub-step in tick() of the scheduler. The stats actually sit in site_* files. - some database queries in host_scheduler. - where job/host statuses get updated in scheduler_models. BUG=chromium:196392 DEPLOY=scheduler TEST=scheduler works and stat results are shown in graphite. Change-Id: If3a142a0c1135ead6b36b35f2781037be16a90aa Reviewed-on: https://gerrit.chromium.org/gerrit/47375 Tested-by: Fang Deng <fdeng@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org> Commit-Queue: Fang Deng <fdeng@chromium.org>
/external/autotest/scheduler/monitor_db.py
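One common way to time each sub-step, sketched with the standard library; the real CL reports through its stats layer rather than a local dict:

    import time
    from contextlib import contextmanager

    @contextmanager
    def timed(step_name, timings):
        # Record wall-clock seconds for one tick sub-step.
        start = time.time()
        try:
            yield
        finally:
            timings.setdefault(step_name, []).append(time.time() - start)

    timings = {}
    with timed('_schedule_new_jobs', timings):
        pass  # the real tick sub-step would run here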
|
72822020510c0bef3e242a00da492ce7a6ad55f1 |
|
12-Apr-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Do not write queue or .machines files. These files were causing us to not properly redirect to GS for job folders. However, these files don't really serve a purpose for us, so rather than fixing up and running gs_offloader to handle these files, it's better to just not write them in the first place. BUG=chromium:230838 DEPLOY=scheduler TEST=Ran a job, made sure queue and .machines files don't appear Change-Id: Ie1f0014b31f2ed274cad6ce03d98d7f6ce947f43 Reviewed-on: https://gerrit.chromium.org/gerrit/48182 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Scott Zawalski <scottz@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
05d7b4cd023d4dcaee3c0744dc960f3e01ec6fbe |
|
04-Mar-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Add a tick stat to the scheduler. BUG=chromium-os:36682 TEST=ran scheduler, saw stat on chromeos-stats Change-Id: Ic084d23be424e3d8e7e4de32c1d6ffe1e3342728 Reviewed-on: https://gerrit.chromium.org/gerrit/45139 Reviewed-by: Scott Zawalski <scottz@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org> Tested-by: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
3664d07a735ce591d184855f9c18c9ee90d642d2 |
|
05-Mar-2013 |
Aviv Keshet <akeshet@chromium.org> |
[autotest] monitor_db passes the test retry CLI flag to autoserv BUG=chromium-os:37158 TEST=Ran dummyflake suite, and client side flake tests were retried correctly when they failed. Server side flake test is not currently working when run in a suite, for reasons unrelated to this CL. Change-Id: If8134fe263bb33ee5d52bc92e78faee05388b239 Reviewed-on: https://gerrit.chromium.org/gerrit/44642 Reviewed-by: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org> Commit-Queue: Scott Zawalski <scottz@chromium.org> Reviewed-by: Scott Zawalski <scottz@chromium.org> Tested-by: Scott Zawalski <scottz@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
225bdfe2d2da112897f7ea8ba424099ff6f97e2a |
|
05-Mar-2013 |
Aviv Keshet <akeshet@chromium.org> |
[autotest] fix pylint complaints in monitor_db.py Removed some unused imports, fixed a logging call, and disabled missing docstring in the entire file. I have some pending changes in a separate CL to monitor_db.py, but the file is full of pylint violations and I'm tired of uploading things with --no-verify. Splitting off the pylint warning fixes into this separate commit to make my future one cleaner. BUG=None TEST=repo upload works without pylint warnings. scheduler starts and behaves normally. Change-Id: Id48e30782296cd6d1beb6c40ef909fd32a1cead6 Reviewed-on: https://gerrit.chromium.org/gerrit/44641 Commit-Queue: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
05c08ac8fa55dae7b7662070f4fa6943d0014beb |
|
11-Jan-2013 |
Aviv Keshet <akeshet@chromium.org> |
[autotest] Typo fix in error message. Fix a small error message typo, and remove one unnecessary temporary variable. TEST=None BUG=None Change-Id: I7d31790217b55b675e0fc76ab4576dacfb84202a Reviewed-on: https://gerrit.chromium.org/gerrit/41085 Commit-Queue: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
def92873af1712d6ac6bb062ae82793f15422fd2 |
|
20-Sep-2012 |
Simran Basi <sbasi@chromium.org> |
Autotest: Limit Extra Debugging Messages We added a lot of extra debugging messages in order to determine the amount of time certain operations were taking; however, this has caused our log files to grow at about 1 gig a day. I have changed the most frequently occurring messages to only log if a flag in global_config.ini has been set. This way we can turn on/off this extra level of debugging. BUG=None TEST=local system Change-Id: Ib866a016a5f397aa293b4904e485ac603a1c8f51 Reviewed-on: https://gerrit.chromium.org/gerrit/33713 Reviewed-by: Chris Sosa <sosa@chromium.org> Commit-Ready: Simran Basi <sbasi@chromium.org> Tested-by: Simran Basi <sbasi@chromium.org>
/external/autotest/scheduler/monitor_db.py
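The gating pattern described, in miniature; the flag name is made up, and the real code reads it from global_config.ini rather than a module constant:

    import logging

    EXTRA_DEBUGGING = False  # assumed stand-in for the config flag

    def extra_debug(msg, *args):
        # High-volume messages only hit the log when the flag is on,
        # keeping steady-state log growth under control.
        if EXTRA_DEBUGGING:
            logging.debug(msg, *args)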
|
3f6717d84e3c245ec4d5831d2a8f3d018b68519a |
|
14-Sep-2012 |
Simran Basi <sbasi@chromium.org> |
Autotest: Add more logging. Went through the logs and added more logging, mostly to _schedule_new_jobs() and _handle_agents(). _handle_agents is eating up the most time, sometimes with a 5-7 second jump on status updates in scheduler_models. Also changed the tick debug message to say 'Calling' instead of 'Starting' to make it easier to grep since there is a 'Starting' state. BUG=chromium-os:34416 TEST=Ran on my system and completed a job to make sure the scheduler processed it correctly. Change-Id: Ic8f21dd8a8f237b5c5cfbd4e101b9150cdb2a818 Reviewed-on: https://gerrit.chromium.org/gerrit/33234 Commit-Ready: Simran Basi <sbasi@chromium.org> Reviewed-by: Simran Basi <sbasi@chromium.org> Tested-by: Simran Basi <sbasi@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
0ec94dd0bb1b11b8aec292fbdc1135244db570d7 |
|
28-Aug-2012 |
Simran Basi <sbasi@chromium.org> |
Autotest: Add more logging to the tick. Added a line of logging before each major step of the tick so that we can better determine where the tick time is being used. I put this in monitor_db.py instead of site_monitor_db.py since site_monitor_db.py does not have the proper scope to view the global variable _drone_manager. BUG=chromium-os:33890 TEST=Made sure monitor_db ran locally and my new messages were being printed. Change-Id: I2c8e08bab835c4ee1a92c0a6502242048b13293a Reviewed-on: https://gerrit.chromium.org/gerrit/31601 Commit-Ready: Simran Basi <sbasi@google.com> Reviewed-by: Simran Basi <sbasi@google.com> Tested-by: Simran Basi <sbasi@google.com>
/external/autotest/scheduler/monitor_db.py
|
a858a233889949263ded6d0d6578495aba54a9eb |
|
21-Aug-2012 |
Simran Basi <sbasi@chromium.org> |
Autotest: Fix Reverification Code so we don't get stuck in a loop. Currently if the scheduler dies in a bad state it gets stuck in a loop as it tries to create cleanup jobs on machines that were left in Repairing, Verifying, Cleanup. However the call to create the Cleanup Task is not implemented correctly and will cause an error as it tries to modify the database and create the job. In order to fix this I fill in the missing argument (requested_by) with a User object defaulting to an id of 1 (we can change this in review, but it must be a proper id from afe_users). This seems to fix the issue and we should recover properly after dying poorly from now on. BUG=chromium-os:33150 TEST=On my own setup, I set a host to 'Repairing' status and verified that the bug occurred as monitor_db starts up. Then I added my fix and then started monitor_db where it properly created the cleanup job and recovered properly. Tested autotest_system id lookup on the test server 'chromeos-autotest.cbf'. Change-Id: I4a555002e8bfc69ffd08d99261c7a28a4ebf5fbf Reviewed-on: https://gerrit.chromium.org/gerrit/31008 Reviewed-by: Scott Zawalski <scottz@chromium.org> Commit-Ready: Simran Basi <sbasi@google.com> Reviewed-by: Simran Basi <sbasi@google.com> Tested-by: Simran Basi <sbasi@google.com>
/external/autotest/scheduler/monitor_db.py
|
52ce11d6291bbbd1bde435a62afcaf364db1b502 |
|
02-Aug-2012 |
Yu-Ju Hong <yjhong@google.com> |
Autotest: Make archiving step configurable and disable it by default. This change makes the archiving step configurable in global_config.ini. The variable "enable_archiving" is disabled by default. The Autotest scheduler performs the archive step after parsing. This step spawns an autoserv process, runs site_archive_results, and executes rsync to copy .archive.log back to cautotest. We do not need this step since all our test results are rsync'd back after running the tests. BUG=chromium-os:33061 TEST=run tests with local autotest setup Change-Id: I1f2aac8f92ebd2a4d10c4bd85be2d111063ad251 Reviewed-on: https://gerrit.chromium.org/gerrit/29056 Commit-Ready: Yu-Ju Hong <yjhong@chromium.org> Reviewed-by: Yu-Ju Hong <yjhong@chromium.org> Tested-by: Yu-Ju Hong <yjhong@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
456d3c115952bf1ae984770e226c5a50676b31c0 |
|
19-Jul-2011 |
Dale Curtis <dalecurtis@chromium.org> |
Upstream Autotest merge. Merged from d9d64b855363d214996b187380532d4cc9991d29. BUG=none TEST=emerge autotest-tests, local server, run_remote_tests. Change-Id: Id8cf1ef930bc0cd80347d77f2de65561be2a12a4 Reviewed-on: http://gerrit.chromium.org/gerrit/4664 Reviewed-by: Mike Truty <truty@chromium.org> Reviewed-by: Scott Zawalski <scottz@chromium.org> Tested-by: Dale Curtis <dalecurtis@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
30cb8ebea60afefcb0da10ecf4566cbc5b92d846 |
|
09-Jun-2011 |
Dale Curtis <dalecurtis@chromium.org> |
Fix --image support so it only runs on normal jobs. Right now --image is applied to special tasks like Cleanup and Verify, when it should only apply to base job tasks. The best short term solution is to move the command line building to QueueTask. Longer term we should investigate moving all --image code into site extensions. BUG=none TEST=Running in production. Change-Id: I983ae1a3e42f3a2cc3d285f763114c7fcb572a56 Reviewed-on: http://gerrit.chromium.org/gerrit/2389 Reviewed-by: Paul Pendlebury <pauldean@chromium.org> Reviewed-by: Scott Zawalski <scottz@chromium.org> Reviewed-by: Dale Curtis <dalecurtis@chromium.org> Tested-by: Dale Curtis <dalecurtis@chromium.org>
/external/autotest/scheduler/monitor_db.py
|
aa5133608fb8ea153fb396f332121b617869dcb7 |
|
02-Mar-2011 |
Dale Curtis <dalecurtis@chromium.org> |
Host scheduler refactoring. Move HostScheduler out of monitor_db. In order to facilitate site extensibility of HostScheduler we need to factor out the dependence on global variables in monitor_db. I modeled this refactoring off of monitor_db_cleanup. The main changes I've made are as follows: 1. Move BaseHostScheduler, site import, and SchedulerError out of monitor_db. SchedulerError must be moved to prevent a cyclical dependency. 2. Convert staticmethod/classmethods in BaseHostScheduler, to normal methods. 3. Fix unit tests and monitor_db to import SchedulerError from host_scheduler. Change-Id: I0c10b79e70064b73121bbb347bb71ba15e0353d1 BUG=chromium-os:12654 TEST=Ran unit tests. Tested with private Autotest instance. Review URL: http://codereview.chromium.org/6597047
/external/autotest/scheduler/monitor_db.py
|
a82dc35ef721720e73db887d895bbd5cb835291c |
|
23-Feb-2011 |
Eric Li <ericli@chromium.org> |
Merge remote branch 'autotest-upstream/master' into try-box1 Merged from rev @5236 ~ @5269. This should complete the merge performed last week, with all the missed make patch file checked in. BUG=none TEST= 1. Manually update autotest-tests-9999.ebuild to include all new test cases under client/tests. 2. emerge-x86-generic with autotest-tests package. 3. emerge-arm-generic with autotest-tests package. 4. run_remote_tests bvt with emerged autotest against a chromeos netbook. Change-Id: Ia8dd98af5472f38e723fa364d310dd40b06b6d58 KVM test: Remove last references to env variables on unattended setup In the conversion from stand alone script to KVM autotest infrastructure, we missed to convert some places inside the code that looked for environment variables. Fix it so providing windows CD keys gets the keys written on the answer file again. Signed-off-by: Lucas Meneghel Rodrigues <lmr@redhat.com> git-svn-id: svn://test.kernel.org/autotest/trunk@5269 592f7852-d20e-0410-864c-8624ca9c26a4 KVM test: Fixing migration_control.srv to use new config API Signed-off-by: Lucas Meneghel Rodrigues <lmr@redhat.com> git-svn-id: svn://test.kernel.org/autotest/trunk@5268 592f7852-d20e-0410-864c-8624ca9c26a4 KVM test: Move enumerate test dicts code to kvm_utils.run_tests() Signed-off-by: Lucas Meneghel Rodrigues <lmr@redhat.com> git-svn-id: svn://test.kernel.org/autotest/trunk@5267 592f7852-d20e-0410-864c-8624ca9c26a4 KVM test: encapsulate unittests build and move it to parent class In installer.py, we realized that we could benefit from having the unittests build in pretty much all build types, after all, if the userspace is sufficiently new we can run the unittests, regardless of the way the binaries were build. Encapsulate the unittests build and install to a method and move that method to the base installer, making all install methods benefit from it. Just take extra care to certify the unittests are properly linked. Signed-off-by: Lucas Meneghel Rodrigues <lmr@redhat.com> git-svn-id: svn://test.kernel.org/autotest/trunk@5266 592f7852-d20e-0410-864c-8624ca9c26a4 Review URL: http://codereview.chromium.org/6551020
/external/autotest/scheduler/monitor_db.py
|
861b2d54aec24228cdb3895dbc40062cb40cb2ad |
|
04-Feb-2011 |
Eric Li <ericli@chromium.org> |
Merge remote branch 'cros/upstream' into master Merged to upstream autotest @4749~@5215. The entire change list description is too big to enlist here. Please refer to upstream (http://autotest.kernel.org/browser) for more details. BUG= TEST=emerged both x86 and arm build. Tested emerged x86 build bvt against a chromeos device. Review URL: http://codereview.chromium.org/6246035 Change-Id: I8455f2135c87c321c6efc232e2869dc8f675395e
/external/autotest/scheduler/monitor_db.py
|
5a8c6ad7e67d86090ed6c4ef216b82e45c0f9283 |
|
01-Feb-2011 |
Paul Pendlebury <pauldean@chromium.org> |
Add support for an --image flag to atest. This enables creating jobs from the CLI that use autoserv to install a new OS image before running the requested test. This was not added to the database like the reboot before/reboot after flags to maintain backward compatibility and avoid any changes to the database schema. Also it is not working as a pure parameterized job as the current implementation is not 100% complete and would require more work to finish. And since mixed jobs are not allowed it would also mean moving existing control file jobs to parameterized jobs. So the implementation of adding a parameterized id to control jobs and using a known test to hold the OS image path is the most straightforward of the options. Change-Id: I77cdda0c50c222a4c594da2626a71fa55f5957cb BUG=chromium-os:11486 TEST=Manual testing using atest cli to create jobs with --image parameters and verifying the value is passed to autoserv. Review URL: http://codereview.chromium.org/6181003
/external/autotest/scheduler/monitor_db.py
|
e0493a4af57c1a73376a7bafaed542c01f588196 |
|
15-Nov-2010 |
Eric Li <ericli@chromium.org> |
Merge remote branch 'cros/upstream' into tempbranch BUG= TEST= Review URL: http://codereview.chromium.org/4823005 Change-Id: I5d56f1c10d0fce7f9d7dc3ad727ea52dcb9b2d6c
/external/autotest/scheduler/monitor_db.py
|
6f27d4f22a1ba5063968b8c322fa0845f3279ade |
|
29-Sep-2010 |
Eric Li <ericli@chromium.org> |
Merge remote branch 'cros/upstream' into tempbranch3 Merge to trunk@4817 BUG= TEST= Review URL: http://codereview.chromium.org/3554003 Change-Id: I83376bc7d28104ec2678e157eadbe7df7c05c0e0
/external/autotest/scheduler/monitor_db.py
|
517d95a1ef4edb04da427763f86068a447d45ec7 |
|
29-Sep-2010 |
Benson Leung <bleung@chromium.org> |
Revert "Merge remote branch 'cros/upstream' into tempbranch2" This reverts commit 25fc6d1f28e54c46689f12d3b93c2540ef45323a. TBR=ericli@chromium.org Review URL: http://codereview.chromium.org/3541002 Change-Id: Ib0165b19bfdf02264f8a6a74ddf3ae74c8c0f7df
/external/autotest/scheduler/monitor_db.py
|
25fc6d1f28e54c46689f12d3b93c2540ef45323a |
|
29-Sep-2010 |
Eric Li <ericli@chromium.org> |
Merge remote branch 'cros/upstream' into tempbranch2 Merged to trunk@4816. BUG= TEST=we will build a new autotest server instance, and keep cautotest running and then later do a cname switch. Review URL: http://codereview.chromium.org/3511003 Change-Id: Iee5f52f45f28f84927d6c6f9a74edc370d40288a
/external/autotest/scheduler/monitor_db.py
|
c7d387e5205ba33a54f800dc29772f346656530d |
|
10-Aug-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Adds a diagnostic message. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4747 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
17cadd637acd6967e59b2f40cdde62d97370a83b |
|
17-Jun-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix a logging message Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4623 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b8f3f354dc07aea89a5301522c8a79d394cba79e |
|
10-Jun-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Set host status to RUNNING on QueueTask abort, since queue entry will be in GATHERING state. Also modify a logging string to be more precise. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4591 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
e7c65cbace24181c9bd364569de7e05742b8a162 |
|
08-Jun-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Don't try stopping the job on HQE abort, and have the dispatcher stop all necessary jobs in bulk. This avoids a scheduler crash on an assertion. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4585 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
42318f7a1272f07dbb482c2ae174790038ec7c3b |
|
11-May-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Scheduler VerifyTask should only delete queued manual reverify tasks, not all queued verify tasks. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4487 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
dd77e01701cef9e97f586294565f1fed41d0e7f8 |
|
28-Apr-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix an error in drone sets in monitor_db. Also added more unit tests. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4449 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
256635661a8eebfab27500c59a541fae320c380b |
|
27-Apr-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix process counting for SelfThrottledPostJobTask. Would previously lose a slot for a process permanently if the paired results were lost before the process started. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4447 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
76fcf19ec42d5c7580d2e7891e4610e5fe725286 |
|
21-Apr-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add ability to associate drone sets with jobs. This restricts a job to running on a specified set of drones. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4439 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b7c5d278f4388958e28a0c8244a7148d032e7c6e |
|
16-Apr-2010 |
lmr <lmr@592f7852-d20e-0410-864c-8624ca9c26a4> |
monitor_db.py: Fix SyntaxWarning I've noticed that monitor_db.py issues a SyntaxWarning as soon as it is started: 19:56:55 INFO | Killing monitor_db 19:56:55 INFO | STARTING monitor_db with log file /usr/local/autotest/logs/scheduler.log.2010-04-15-19.56.55 /usr/local/autotest/scheduler/monitor_db.py:1779: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert (self.TASK_TYPE is not None, I noticed that the whole statement fits under 80 chars, so the parentheses can be removed safely, getting rid of the warning. Signed-off-by: Lucas Meneghel Rodrigues <lmr@redhat.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4421 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
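The bug class in miniature: wrapping an assert's condition and message in one set of parentheses creates a two-element tuple, which is always truthy, so the assertion can never fire.

    TASK_TYPE = 'Verify'  # illustrative stand-in for self.TASK_TYPE

    # Buggy form: asserts a non-empty tuple, i.e. always passes
    # (CPython flags it with the SyntaxWarning quoted above).
    assert (TASK_TYPE is not None, 'TASK_TYPE must be overridden')

    # Fixed form: condition and message as separate assert operands.
    assert TASK_TYPE is not None, 'TASK_TYPE must be overridden'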
|
47bd737d76b61b40f4f321a1e88919caf74dacc3 |
|
13-Mar-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Set hostless queue entries to STARTING upon scheduling the agent. This fixes an issue where the scheduler created multiple HostlessQueueTask objects for a single hostless queue entry, causing several autoserv processes to be launched when the agents are run. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4304 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
e0cbc912c81acfb8844510267b2cefd64d9ae478 |
|
11-Mar-2010 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add support to autoserv for a --control-filename parameter, to allow users to control where in the results directory autoserv will store the server control file. This also changes the archiving stage in the scheduler to make use of this argument, so that the control file from the archiving stage is written to control.archive and does not overwrite the control.srv from the job itself. Signed-off-by: John Admanski <jadmanski@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4294 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
dd855244f44b65d0508345c6fef74846652c8c26 |
|
02-Mar-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Abstract out common models used in the frontend's models.py so that django is not required to interact with non Django portions of the code. This includes the enums RebootBefore, RebootAfter and Test.Type git-svn-id: http://test.kernel.org/svn/autotest/trunk@4280 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
e21bf415db76b009956d7e9b8877a2b898cdf2fd |
|
26-Feb-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Minor fix to new metahost handlers code in scheduler to ensure handlers get a tick every cycle, even if there are no queued metahost jobs. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4274 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
c44ae99354228290914326d42ef1e743b5b7e4b8 |
|
19-Feb-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Refactor scheduler models into a separate module, scheduler_models. This module doesn't depend on monitor_db, only the other way around. The separation and isolation of dependencies should help us organize the scheduler code a bit better. This was made possible largely by the many changes we made late last year to improve statelessness of the scheduler. It was motivated here by my work on pluggable metahost handlers, which will need to depend on scheduler models. Without this separation, we'd end up with circular dependencies. Also includes some fixes for metahost schedulers. Signed-off-by: Steve Howard <showard@google.com> Property changes on: scheduler/scheduler_models.py git-svn-id: http://test.kernel.org/svn/autotest/trunk@4252 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
883492a628bfe5a24bd281cfcac036d77a2acc4e |
|
12-Feb-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
First iteration of pluggable metahost handlers. This change adds the basic framework and moves the default, label-based metahost assignment code into a handler. It includes some refactorings to the basic scheduling code to make things a bit cleaner. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4232 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
cc92936b577707a0c1314d1140b66518d2b7feef |
|
25-Jan-2010 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Basic support for "summary results" -- artificial test results that are explicitly recorded by a server-side control file or code that it calls. This CL just adds the record_summary() method to the server_job object. It lacks any special parser support or TKO DB changes; those will come later. This also includes a couple of minor changes to support continuous parsing and final reparsing for hostless jobs. Since hostless jobs are a common intended use case for summary results, they'll need full parsing support to be useful. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4161 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
7e67b433965702c0ffd8205ac08f5e801d9f98a6 |
|
20-Jan-2010 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
New code for performing explicit joins with custom join conditions. * added ExtendedManager.join_custom_field(), which uses the introspection magic from populate_relationships (now factored out) to infer the type of relationship between two models and construct the correct join. join_custom_field() presents a much simpler, more Django-y interface for doing this sort of thing -- compare with add_join() above it. * changed TKO custom fields code to use join_custom_field() * added some cases to AFE rpc_interface_unittest to ensure populate_relationships() usage didn't break * simplified _CustomQuery and got rid of _CustomSqlQ. _CustomQuery can do the work itself and it's cleaner this way. * added add_where(), an alternative to extra(where=...) that fits more into Django's normal representation of WHERE clauses, and therefore supports & and | operators later Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4155 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
4076c63186d725c67272bbac53960809e8ccad1c |
|
15-Jan-2010 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
In scheduler check for existence of results before trying to write the .archiver_failed file. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4130 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
c1a98d1e146080bd3e4f034cb13d740dfb1535f4 |
|
15-Jan-2010 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Support for job keyvals * can be passed as an argument to create_job, stored in AFE DB * scheduler reads them from the AFE DB and writes them to the job-level keyval file before the job starts * parser reads them from the keyval file and writes them to the TKO DB in a new table Since the field name "key" happens to be a MySQL keyword, I went ahead and made db.py support proper quoting of field names. Eventually it'd be really nice to deprecate db.py and use Django models exclusively, but that is a far-off dream. Still lacking support in the AFE and TKO web clients and CLIs; at least the TKO part will be coming soon Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4123 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
e1575b54ca176a0e4e93456a0938c356a33f4bb8 |
|
15-Jan-2010 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
When the archiver fails for any reason, write a .archiver_failed file to the results dir. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4119 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
948eb30f2125169817d89c6bd6363a0f787cc1c2 |
|
15-Jan-2010 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Construct an absolute path to the archiving control file when running the Archiving stage. Using a relative path was just silly and lazy and prone to breakage. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4115 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
64a9595406f2884fb3ece241190b10aa054439a9 |
|
13-Jan-2010 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
When using Django models from a script, make the current user default to an actual database user named "autotest_system". This allows for simpler, more consistent code. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4114 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
4608b005f15444d2ec4601b8274828ad52b5ea51 |
|
05-Jan-2010 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add a new Archiving stage to the scheduler, which runs after Parsing. This stage is responsible for copying results to the results server in a drone setup, a task currently performed directly by the scheduler, and allows for site-specific archiving functionality, replacing the site_parse functionality. It does this by running autoserv with a special control file (scheduler/archive_results.control.srv), which loads and runs code from the new scheduler.archive_results module. The implementation was mostly straightforward, as the archiving stage is fully analogous to the parser stage. I did make a couple of refactorings: * factored out the parser throttling code into a common superclass that the ArchiveResultsTask could share * added some generic flags to Autoserv to duplicate special-case functionality we'd added for the --collect-crashinfo option -- namely, specifying a different pidfile name and specifying that autoserv should allow (and even expect) an existing results directory. In the future, I think it'd be more elegant to make crashinfo collection run using a special control file (as archiving works), rather than a hard-coded command-line option. * moved the call to server_job.init_parser() out of the constructor, since this was an easy source of exceptions that wouldn't get logged. Note I believe some of the functional test changes slipped into my previous change there, which is why that looks smaller than you'd expect. Signed-off-by: Steve Howard <showard@google.com> ==== (deleted) //depot/google_vendor_src_branch/autotest/tko/site_parse.py ==== git-svn-id: http://test.kernel.org/svn/autotest/trunk@4070 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
eab66ce582bfe05076ff096c3a044d8f0497bbca |
|
23-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Rename the tables in the databases, by prefixing the app name. This is in preparation for merging the two databases and the two Django projects into one. Note that this renames *all* standard Autotest DB tables in both the autotest_web and tko databases. If you have scripts written directly against these databases, *they will break*. If your scripts access the RPC interfaces, they should continue to work. Another patch will be along within the next few weeks to actually move the TKO tables into the autotest_web database. From: James Ren <jamesren@google.com> Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4040 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
402934a6e974ab70f226d3a9c996c1df47a21017 |
|
21-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Clear the Django connection query log after each tick. This was a major memory leak. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4038 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
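The fix pattern, sketched against Django's public API (assumes a configured Django environment): with DEBUG on, every query is appended to the connection's query log, so a long-lived process must clear it periodically.

    from django.db import reset_queries

    def tick():
        # ... scheduler work that issues ORM queries ...
        # Drop the accumulated query log so it cannot grow without
        # bound across ticks.
        reset_queries()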
|
f13a9e2b856ae9e4e2f43ef6cbc6083c7435167b |
|
18-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add periodic CPython garbage collector statistics logging to aid in tracking down a memory leak and as a general health beacon for the long running process. The interval at which stats are logged is configurable. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4021 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
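A minimal version of this kind of health beacon using the standard gc module (the interval handling and stat sink in the real CL are configurable):

    import gc
    import logging

    def log_gc_stats():
        # Per-generation allocation counts plus any uncollectable
        # objects; a drifting baseline here hints at a leak.
        logging.info('gc counts: %s, uncollectable: %d',
                     gc.get_count(), len(gc.garbage))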
|
493beaab73a8a87a3ce8d7f47b7ce92417b04fbd |
|
18-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix a bug with pre-job keyvals, introduced in recent refactorings, and add a new test to check it. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4020 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
a9545c0ab3d8f3e36efadaefdcf37393708666d9 |
|
18-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
backend support for hostless jobs * support in rpc_interface.create_job() and models for creating a hostless job -- a job with one queue entry with no host, meta_host or atomic_group * support in scheduler for recognizing and executing such a job. the bulk of the work was in extracting an AbstractQueueTask class from QueueTask, containing all the logic not pertaining to hosts. I then added a simple HostlessQueueTask class also inheriting from it. Also got rid of HostQueueEntry.get_host() and added an extra log line when AgentTasks finish (used to be for QueueTasks only). Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4018 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
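The class extraction described above, sketched with assumed method bodies (only the class names come from the commit message):

    class AbstractQueueTask(object):
        """Job-execution logic that does not depend on any host."""
        def __init__(self, queue_entry):
            self.queue_entry = queue_entry

        def prolog(self):
            pass  # results dirs, keyvals, etc.

    class QueueTask(AbstractQueueTask):
        """Adds the host-specific pieces on top of the shared logic."""
        def prolog(self):
            super(QueueTask, self).prolog()
            self.queue_entry.host.set_status('Running')

    class HostlessQueueTask(AbstractQueueTask):
        """One queue entry with no host, meta_host or atomic_group."""
        pass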
|
2ca64c940277d6ee38a084dc71fa8d3003aedddf |
|
10-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
* add a couple simple test cases to the scheduler functional test for metahosts * augment one of the logging lines in the scheduler Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4009 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
d11956572cb7a5c8e9c588c9a6b4a0892de00384 |
|
08-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make drone_manager track running process counts using only the information passed in from the scheduler. Currently it also uses process counts derived from "ps", but that is an unreliable source of information. This improves accuracy and consistency and gives us full control over the process. This involves a few primary changes: * made the drone_manager track process counts with each PidfileId * added method declare_process_count() for the scheduler to indicate the process count of a pidfile ID during recovery (in other cases, the DroneManager gets that info in execute_process()) Doing this involved some extensive refactorings. Because the scheduler now needs to declare process counts during recovery, and because the AgentTasks are the entities that know about process counts, it made sense to move the bulk of the recovery process to the AgentTasks. Changes for this include: * converted a bunch of AgentTask instance variables to abstract methods, and added overriding implementations in subclasses as necessary * added methods register_necessary_pidfiles() and recover() to AgentTasks, allowing them to perform recovery for themselves. got rid of the recover_run_monitor() argument to AgentTasks as a result. * changed recovery code to delegate most of the work to the AgentTasks. The flow now looks like this: create all AgentTasks, call them to register pidfiles, call DroneManager to refresh pidfile contents, call AgentTasks to recover themselves, perform extra cleanup and error checking. This simplified the Dispatcher somewhat, in my opinion, though there's room for more simplification. Other changes include: * removed DroneManager.get_process_for(), which was unused, as well as related code (including the DroneManager._processes structure) * moved logic from HostQueueEntry.handle_host_failure to SpecialAgentTask._fail_queue_entry. That was the only call site. And some other bug fixes: * eliminated some extra state from QueueTask * fixed models.HostQueueEntry.execution_path(). It was returning the wrong value, but it was never used. * eliminated some big chunks from monitor_db_unittest. These broke from the refactorings described above and I deemed it not worthwhile to fix them up for the new code. I checked and the total coverage was unaffected by deleting these chunks. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4007 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
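The recovery flow from the commit above, as a sketch; the method names follow the message, but the single-function shape is a simplification:

    def recover(agent_tasks, drone_manager):
        # 1. Each task declares the pidfiles it expects to find
        #    (and, via declare_process_count(), its process count).
        for task in agent_tasks:
            task.register_necessary_pidfiles()
        # 2. One pass over the drones refreshes pidfile contents.
        drone_manager.refresh()
        # 3. Each task recovers its own process state; extra cleanup
        #    and error checking happen after this loop.
        for task in agent_tasks:
            task.recover()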
|
b21b8c8f58fc30ecf9b57160fad1017fb910de17 |
|
07-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix handling of database reconnects in the scheduler by enhancing the "django" database_connection backend and having the scheduler use it. This eliminates the duplicate connection that the scheduler was setting up -- now it uses only a single connection (the Django one). Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4000 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
d07a5f3bd8edb843da7f1568bd7be06c32761e11 |
|
07-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
The check for enough pending hosts after the delay to wait for others to become ready before moving from Pending -> Starting on an atomic group job was checking against the wrong value and requiring too many hosts. As a result some jobs never ran. Also, it was not aborting the job which left these HostQueueEntries and Hosts in limbo (until the job timeout would eventually hit a couple days later). Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3998 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
418785bf16a0cb72a5fe5519e8693d7546cd427d |
|
23-Nov-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Some improvements to process tracking in the scheduler. * have all AgentTasks declare how many processes they'll create (as an instance attribute). this is really where the information belongs. * have Agent read its num_processes from its AgentTask, rather than requiring clients to pass it into the constructor. * have AgentTasks pass this num_processes value into the DroneManager when executing commands, and have the DroneManager use this value rather than the hack of parsing it out of the command line. this required various changed to the DroneManager code which actually fix some small bugs and make the code cleaner in my opinion. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3971 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
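Sketched with assumed class bodies, the ownership change reads roughly like this: the count lives on the task, and the Agent just reads it.

    class AgentTask(object):
        num_processes = 1  # each subclass declares its own count

    class Agent(object):
        def __init__(self, task):
            self.task = task
            # Read from the AgentTask instead of a constructor argument.
            self.num_processes = task.num_processes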
|
9bb960b90d5102cce1c8a15314900035c6c4e69a |
|
19-Nov-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Support restricting access to drones by user. Administrators can put lines like <hostname>_users: showard,scottz in the global config, where <hostname> is a drone hostname. That drone will then be limited to use by those users (that is, by jobs launched by those users, and tasks launched due to those jobs). This required numerous changes: * added a requested_by field to SpecialTask (with corresponding migration). For tasks with queue_entries, we can infer this from the job, but for those without, we need this information explicitly declared. Note this can be null if the task was created by the system, not in response to any user action. The only place this occurs now is in scheduler recovery (Dispatcher._recover_hosts_where()), but there may be an upcoming feature to periodically reverify hosts, which would be another (much more common) case. * modified all SpecialTask creation sites to pass requested_by if necessary. * modified AgentTask to keep a username attribute, and modified its run() method to pass that to PidfileRunMonitor.run(), which passes it along to DroneManager.execute_command(). * modified Agent to always keep self.task around, there's no reason to throw it away and now that we're looking at it from other classes, it's problematic if it disappears. * modified Dispatcher throttling code to pass the username when requesting max runnable processes. * added an allowed_users property to _AbstractDrone, and made DroneManager load it from the global config. * made DroneManager's max_runnable_processes() and _choose_drone_for_execution() methods accept the username and obey user restrictions. * added extensive tests for everything. the modifications required to monitor_db_unittest were annoying but not too bad. but parts of that file may need to be removed as they'll be obsoleted by monitor_db_functional_test and they'll become increasingly annoying to maintain. couple other related changes: * got rid of CleanupHostsMixin. it was only actually needed by GatherLogsTask (since we made the change to have GatherLogsTask always run), so I inlined it there and simplified code accordingly. * changed a bunch of places in the scheduler that were constructing new instances of Django models for existing rows. they would do something like "models.Host(id=<id of existing host>)". that's correct for scheduler DBModels, but not for Django models. For Django models, you only instantiate new instances when you want to create a new row. for fetching existing rows you always use a manager -- Model.objects.get() or Model.objects.filter() etc. this was an existing bug but wasn't exposed until I made some of the changes involved in this feature. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3961 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
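A minimal sketch of the access check, assuming the global config has already been parsed into a dict; the helper names here are hypothetical:

    def allowed_users_for_drone(config, drone_hostname):
        """Parse '<hostname>_users: showard,scottz' into a set, or None if unrestricted."""
        raw = config.get('%s_users' % drone_hostname)
        if raw is None:
            return None
        return set(name.strip() for name in raw.split(','))

    def drone_usable_by(config, drone_hostname, username):
        allowed = allowed_users_for_drone(config, drone_hostname)
        return allowed is None or username in allowed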
|
e60e44ece1445d97977a77cb79f0896989b869d7 |
|
13-Nov-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Special tasks show "Failed" as their status instead of "Completed" if they failed Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3946 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
65db39368167dab1730703be3d347581527f70da |
|
28-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
* impose prioritization on SpecialTasks based on task type: Repair, then Cleanup, then Verify. remove prioritization of STs with queue entry over those without. this leads to more sane ordering of execution in certain unusual contexts -- the added functional test cases illustrate a few (in some cases, it's not just more sane, it eliminates bugs as well). * block STs from running on hosts with active HQEs, unless the ST is linked to the HQE. this is a good check in general but specifically prevents a bug where a requested reverify could run on a host in pending. there's a functional test case for that too. * block jobs from running on hosts with active agents, and let special tasks get scheduled before new jobs in each tick. this is necessary for some cases after removing the above-mentioned prioritization of STs with HQEs. otherwise, for example, a job could get scheduled before a previous post-job cleanup has run. (new test cases cover this as well.) Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3890 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
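The new ordering, sketched; the priority mapping is an assumption consistent with the message (Repair, then Cleanup, then Verify), and having a queue entry no longer acts as a tiebreak:

    TASK_TYPE_PRIORITY = {'Repair': 0, 'Cleanup': 1, 'Verify': 2}

    def prioritized(special_tasks):
        # Task type alone decides the order.
        return sorted(special_tasks,
                      key=lambda task: TASK_TYPE_PRIORITY[task.task])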
|
7b2d7cbcc28ea6a19554ecc3043b68103e7ab7e9 |
|
28-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
We never considered the handling of DO_NOT_VERIFY hosts in certain situations. This adds handling of those cases to the scheduler and adds tests to the scheduler functional test. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3885 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
37757f3c65961f81a3b0e37d45f1184a6e8e0e16 |
|
19-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Change "unrecovered active host queue entries" to be a more accurate "unrecovered verifying host queue entries" Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3867 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
8375ce0795fa95fcb4698790ed4db8827f190116 |
|
12-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix unindexable object error raised on the error path within _schedule_running_host_queue_entries. Also cleans up some docstrings. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3830 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b89004580c267ec12da4f181c76cbc3ec902037d |
|
12-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
In scheduler recovery, allow Running HQEs with no process. The tick code already handles them fine (by re-executing Autoserv), but the recovery code was explicitly disallowing them. With this change, it turns out there's only one status that's not allowed to go unrecovered -- Verifying -- so I changed the code to reflect that and I made the failure conditions more accurate. Tested this change with extensions to the new functional test. We could never really effectively test recovery code with the unit tests, but it's pretty easy and very effective (I believe) with the new test. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3824 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
5682407ea977eef43e82bf85c12f58acb8ca82ac |
|
12-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Added more logging, and fixed logging in HostQueueEntry.set_status() Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3823 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
0db3d436419b563208ffb6528db7ec6f56a761f6 |
|
12-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Recheck queue entry status in Dispatcher._get_unassigned_entries() Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3821 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
d20148295ae80208334474587277580ecacaed92 |
|
12-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
When a delayed call task finishes waiting for extra hosts to enter Pending state on an atomic group job, re-confirm that the job still has enough Pending hosts to run. It could have been Aborted either manually or due to a timeout meaning it should no longer be run. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3820 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
dae680a5d4ff80f540aadfb6f3687a9bceaf473c |
|
12-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Ignore microsecond differences in datetimes when checking existing in memory rows against database rows. datetime objects store microseconds but the database datetime fields do not. not doing this leads to unnecessary warnings. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3819 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
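The comparison fix amounts to truncating both sides before diffing, since the database DATETIME columns store no microseconds while Python datetime objects do; a sketch:

    import datetime

    def equal_ignoring_microseconds(a, b):
        return a.replace(microsecond=0) == b.replace(microsecond=0)

    in_memory = datetime.datetime(2009, 10, 12, 8, 30, 0, 123456)
    from_db = datetime.datetime(2009, 10, 12, 8, 30, 0)
    assert equal_ignoring_microseconds(in_memory, from_db)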
|
e55955fd3b7c8ea41794aeab492f62247f86fc94 |
|
07-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Rewrite a conditional that was very confusing to me. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3805 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
f85a0b7b456fc60605f09cd16e95167feeba9c5a |
|
07-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Explicitly release pidfiles after we're done with them. This does it in a kind of lazy way, but it should work just fine. Also extended the new scheduler functional test with a few more cases and added a test to check pidfile release under these various cases. In the process, I changed how some of the code works to allow the tests to more cleanly express their intentions. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3804 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
ec6a3b9b60f9e7e6ff26c1c7547f557043b9d52f |
|
25-Sep-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make the pidfile timeout in the scheduler configurable. Raise the default from 5 minutes to 5 hours (the value we're using on our server due to 5 minutes being short enough to cause issues when under heavy load). Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3767 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
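Reading the timeout from the global config might look like the sketch below; the section and key names are illustrative, not the real config entries:

    from autotest_lib.client.common_lib import global_config

    # Default raised from 5 minutes to 5 hours (300 minutes).
    PIDFILE_TIMEOUT_MINS = global_config.global_config.get_config_value(
        'SCHEDULER', 'max_pidfile_refresh_mins', type=int, default=300)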
|
0c5c18d1c951aaee395914a9e702f9e994807e47 |
|
25-Sep-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Changed error message to be more useful Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3766 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
828fc4cc7c7375f8bcf7df48b5b4c2386749278c |
|
14-Sep-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make assertion in _choose_group_to_run non-fatal and log an error message with sufficient details to debug and decide if another action should be taken in the future. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3713 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
db502763a2ece3f2aea7b1badca20a6e0b9d3ed7 |
|
09-Sep-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Write host keyvals for all verify/cleanup/repair tasks. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3677 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
775300bff7b1a7e064f9187f36c69424e8d715df |
|
09-Sep-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Cleanups on hosts marked DO_NOT_VERIFY should continue to run as if they had succeeded (instead of going to Repairing or Repair Failed). Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3676 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
8cc058f50a46976e0a446aa3054f7f2349d6291a |
|
08-Sep-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make scheduler more stateless. Agents are now scheduled only by the dispatcher, and agents no longer use in-memory state to remember multiple tasks. All state is managed by the database. Risk: high (large scheduler change) Visibility: medium (scheduler restarts are now more stable) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3664 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
8de3713899b7583b41504e5adac64ab5deaebfa1 |
|
31-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Renamed process_is_alive to program_is_alive. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3628 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
cdaeae86c156dece62e29afdd4a9976a922883aa |
|
31-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fixed bug where scheduler would crash if the autoserv process is lost during verify/cleanup/repair. Risk: low Visibility: medium Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3627 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
4ac47541c8bc26398cc0ed847fa18d033962f3ff |
|
31-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Don't mark HQEs as Failed before the GatherLogsTask and the FinalReparseTask complete. Risk: low Visibility: medium (scheduler bug fix) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3625 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
549afada66cc95a35fb0be76357f335016a1b122 |
|
21-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Added pid file checks to monitor_db and monitor_db_babysitter, so that only one of each process can exist at a time. Changed killing of monitor_db in monitor_db_babysitter to use these pid files. Risk: low Visibility: medium Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3575 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
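A hedged sketch of the mutual-exclusion pidfile check; the path and helper name are hypothetical:

    import os
    import sys

    PID_PATH = '/var/run/monitor_db.pid'  # illustrative location

    def die_if_already_running():
        if os.path.exists(PID_PATH):
            old_pid = int(open(PID_PATH).read().strip())
            try:
                os.kill(old_pid, 0)  # signal 0 only checks existence
            except OSError:
                pass  # stale pidfile; the old process is gone
            else:
                sys.exit('monitor_db already running as pid %d' % old_pid)
        with open(PID_PATH, 'w') as pidfile:
            pidfile.write(str(os.getpid()))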
|
70a294fa96fa8a2a26650be01165812611a100f2 |
|
21-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Don't expect aborted "Pending" entries to be recovered. They'll be immediately picked up by _find_aborting() so they don't need to be recovered. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3574 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
58721a8b8d9562579f2e45fdd80db2f67d58a6ac |
|
21-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
One-off fix to address the issue where a scheduler shutdown immediately after a special task leaves the HQE in a strange state. Specifically, we saw this when a cleanup fails, and the scheduler shuts down before the associated repair starts. HQEs are now requeued after a failed cleanup/verify. TODO: reimplement scheduler to maintain less state in memory by not relying on storing an array of AgentTasks. Risk: medium (scheduler change) Visibility: medium (scheduler bug fix) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3573 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
6bba3d1e6738fc31772859eb530637ed3ea50b3d |
|
21-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Don't assert if we were unable to load the pidfile in num_tests_failed. Return a numeric value indicating unknown. Our only use of the value is an == 0 test, which makes sense to be False when the answer is unknown. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3570 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
e8e370725197ddb059f70d1689bd97bc8954d187 |
|
21-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Treat unrecoverable host queue entries as a fatal error. Their existence means we've got a consistency problem that needs human intervention to clean up. This can happen when a previously running monitor_db dies within a race-condition window, such as a host_queue_entry being created without its corresponding special_task entry existing yet. (such races are actively being looked at and fixed) Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3569 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
6d1c143fb40a752a6d801cf91523f76e505f6054 |
|
21-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix scheduler's handling of jobs when the PID file can't be found. Risk: low Visibility: medium (scheduler bug fix) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3568 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
708b3523a34dc9e874fd9488f3d9e306cf0ebc4e |
|
21-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Do not go through a DelayedCallTask on atomic group jobs when all Hosts assigned to the job have entered Pending state. There are no more left to wait for. Adds a log message prior to the delay starting for easier debugging. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3564 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
9b6ec509708d53dcc73a94b8f5d32821cbf08acc |
|
21-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Turn an assertion into a more useful error message. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3563 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
5fa9e11c3040bec64270432f738456b5051e9ce2 |
|
03-Aug-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
By default, only warn when orphaned autoservs are found Signed-off-by: Rachel Kroll <rkroll@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3482 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
6fbdb803283146358f65ab81102e0a3537e9b97e |
|
03-Aug-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Change print msg to logging.error(msg) so that we actually get the error in the scheduler log about the scheduler not being enabled. Signed-off-by: Scott Zawalski <scottz@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3478 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
c6a5687f840932ea6ecfc79cbcb23fb422c50ea5 |
|
28-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Remove an assertion error that was preventing recovered atomic group jobs from ever running, leaving the scheduler looping through the host verify/pending/starting/running states before it crashed on the job in question. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3462 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
f4a2e5018d3b12551378ffa2aba9845d2b9b090c |
|
28-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
log aborts in the scheduler more explicitly Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3459 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
a5288b4bb2b09aafe914d0b7d5aab79a7e433eaf |
|
28-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Upgrade from Django 0.96 to Django 1.0.2. Risk: high (framework change) Visibility: medium Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3457 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b000a8d0e94bf47484b137aa07c73d718862fca2 |
|
28-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Added logging and email code to help track down a bug (asynchronous jobs are getting stuck in Pending). Risk: low Visibility: low Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3453 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
6af73ad07124862c7108ef570139b83b82e44a40 |
|
28-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
"Recover" HQEs in "Starting" status by requeuing them. This is what it used to do, but it was lost in the new recovery code. This restores legacy bahavior until we implement proper recovery. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3451 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
6878e8bff29955afe8088496263550002d7bd281 |
|
21-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Never kill processes in scheduler recovery. Instead, consider it an error if any unrecovered orphan process exists. Since we recover special tasks now, we should recover all processes, so if we find any extra, that means something went wrong and it's not safe to continue. Also fix up job recovery code. We took out the code to requeue remaining active entries, but the job recovery code was depending on that. In the future we could improve this to just rerun the job autoserv and not requeue the whole thing, more like the special task recovery code. But this keeps things working for now. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3427 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
a640b2d5cc00ca83f6c41a663225f9a41890f6bf |
|
21-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix scheduler bug with aborting a pre-job task. Scheduler was crashing when a job was aborted during the cleanup phase. Risk: medium (scheduler change) Visibility: high (critical bug fix) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3425 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
8ac6f2a349f88e28b551f394eb6f68e1922ed396 |
|
16-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
When a SpecialAgentTask is passed an existing SpecialTask, set the _working_directory upon object construction. It was previously set in prolog(), but recovery agents don't run prolog(), yet they still need _working_directory sometimes (e.g. when a RepairTask fails). Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3419 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
cfd4a7ecdbc9b7fc1ad4b3667c97f01496316c5e |
|
11-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
With the new SpecialTask recovery code, a RepairTask can be passed a queue entry that was previously requeued. So make sure the task leaves the HQE alone in that case. Also delete some dead code that called requeue(). Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3411 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b6681aa638dc774a296a3627bfc1198a6eb2a99e |
|
08-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
SpecialAgentTasks can be aborted if they're tied to a job that gets aborted while they're active. In that case, we still need to update the SpecialTask entry to mark it as complete. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3386 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
ed2afea4ca6e23a82d20d1f2ee1067d0c25a8cc2 |
|
07-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
make SpecialTasks recoverable. this involves quite a few changes. * run tasks in determined dirs instead of temp dirs. the dir paths look like hosts/<hostname>/<task id>-<task name>, for example, hosts/myhost/4-verify. the ID comes from the SpecialTask DB row. this allows us to find the pidfile when we go looking for it during recovery, and it makes it simple to find the logs for any given special task, much like for HostQueueEntries. added SpecialTask.execution_path() for this purpose, and added models_test to test it. * added execution_path() to HostQueueEntry to match the interface of SpecialTask, allowing for more polymorphism, and changed most call sites to use it. * since we're running in these dirs, copy the full results back in these dirs, instead of just copying a single log file. * move process recovery code up into AgentTask, so that all AgentTasks can share the same generic process recovery code. * change SpecialTask recovery code to do process recovery. * change VerifyTask handling of multiple pending verify requests for a machine. instead of updating all the requests, just delete all other tasks. they're not specially tracked in any way so it's simplest to just delete them. * made special tasks get marked is_active=False when they complete, to be consistent with HQEs. Other changes: * added null=True to SpecialTask.time_started definition * made EmailManager.enqueue_notify_email always log the message, and removed explicit logging calls from call sites * added feature to DroneManager.execute_command() to automatically substitute the working directory into the command. this avoids some duplicate information being passed around and simplifies the unit test. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3380 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
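Given the hosts/<hostname>/<task id>-<task name> layout above, execution_path() plausibly reduces to something like the sketch below; the attribute names are assumptions:

    def execution_path(special_task):
        # e.g. hosts/myhost/4-verify for SpecialTask id 4, task 'Verify'
        return 'hosts/%s/%s-%s' % (special_task.host.hostname,
                                   special_task.id,
                                   special_task.task.lower())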
|
6157c63947d2d628d187a084acb0a48473af1c79 |
|
06-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make the scheduler robust to finding a HostQueueEntry with more than one atomic group label. Log a detailed error message and continue rather than bailing out with a SchedulerError. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3373 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
2fe3f1df42f5fd1dc6296219df289851dcf77025 |
|
06-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Enter all Verify/Cleanup/Repair tasks into the special_tasks table. Also keep track of which Host Queue Entry (if any) each Verify/Cleanup/Repair task belongs to. Additionally, implement recovery for jobs in Verify/Cleanup/Repair (i.e., do not simply reverify the host and requeue the job). Risk: medium (scheduler changes) Visibility: medium (functionality change) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3372 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
e7d9c605bacd7b1816987994ae18a68c63306a16 |
|
02-Jul-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make the job execution tag available in both the server and client side job objects as job.tag. This is useful if your job would like to copy its data off directly to a results repository on its own from the client machine. Mostly small changes to pass the data down, though I did some docstring cleanup near code that I touched which makes the diff larger. The execution tag is taken from the autoserv -P parameter if supplied and no explicit --execution-tag parameter is supplied. This prevents the need to change monitor_db.py to pass yet another autoserv parameter. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3359 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
e9c6936b69cbf3fe5d292c880c81c5662231bd3d |
|
30-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Pass --verbose flag for verify/repair/cleanup. Since we currently log these via piped console output, we want verbose output. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3326 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b562645f954117559e1ad8e0e8e607e11d9794f7 |
|
30-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
ensure hosts get cleaned up even in the rare but possible case that a QueueTask finds no process at all Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3325 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
2924b0ac9e0ca35e2cd45a23b60ecfc204360c44 |
|
19-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Ensure one-time-hosts aren't in the Everyone ACL, and make the scheduler ignore this. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3299 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
cbe6f942e563d0bf49089ec53fb33b510c2827eb |
|
17-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
add a log message to the scheduler that's useful for debugging atomic groups Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3292 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
af8b4ca5837e8a8488ad80df75815bf320cb3da1 |
|
16-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix _atomic_and_has_started() to check *only* for states that are a direct result of Job.run() having been called. This was preventing atomic group jobs from running if any machine failed Verify before Job.run was called. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3289 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
08356c157fc61074af715d410bb60c96ccd91257 |
|
15-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Do not call .set_host() if the host is already set (instead, assert that it is already set properly). Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3268 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
043c62a3b9d943f0b3d3a4e6ac53b78101f5c06f |
|
10-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Ensure all entry points get the import-time logging logic executed before other autotest imports. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3253 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
136e6dcae63f726e2e46aaf2aaacd91ea7e39d17 |
|
10-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make scheduler and babysitter use the new logging_manager system. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3252 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
6d7b2ff05b2232b1b225a4cb3521d76c0152cad9 |
|
10-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Redesign the reverify hosts feature. Host status is no longer changed from the frontend; scheduler deals with all status changes. Risk: medium (scheduler behavior change) Visibility: low Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3238 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
77182562edaaeeffcb98f48a7236a727136aa8ec |
|
10-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Have the scheduler wait a configurable amount of time before starting an atomic group job once the minimum synch count of hosts is available in Pending state, up until AtomicGroup.max_number_of_hosts are available. Adds a DelayedCallTask class to monitor_db, along with logic in the Job class that uses it to delay the job becoming ready to run for a little while, and to make sure the job is run at the end of the delay without needing to wait for another host to change state. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3236 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
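An illustrative DelayedCallTask, polled by the dispatcher each tick; the real class also handles aborts and agent bookkeeping, so this is a sketch, not the actual implementation:

    import time

    class DelayedCallTask(object):
        def __init__(self, delay_seconds, callback):
            self.end_time = time.time() + delay_seconds
            self.callback = callback
            self.is_done = False

        def poll(self):
            if not self.is_done and time.time() >= self.end_time:
                # e.g. re-check that enough hosts are still Pending,
                # then start the job without waiting for a state change.
                self.callback()
                self.is_done = True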
|
5613c66976bb68724f9f88e0db3091916612c004 |
|
09-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add an option to global config to disable the scheduler so it isn't accidentally started on drones. Signed-off-by: Scott Zawalski <scottz@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3233 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
a64e52ab9d75729a698a1a19c33873dab378abaf |
|
09-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Change behavior of Force Reverify: no longer executes cleanup before. Risk: low Visibility: medium (feature behavior change) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3226 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
844960a5d0a96d6a03296c4267d6295e4d479919 |
|
29-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
make the readonly connection fall back to the regular Django connection when running in the scheduler. this is really important, because otherwise the readonly connection is not autocommit and bad, bad things could happen, though I'm not sure exactly what existing problems there might have been. we used to do this only for testing, but since we do it in another context here, I renamed the method to be more generic and appropriate. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3183 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
5add1c8f74eeec1631f3b0775fd1f420c74cae22 |
|
26-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make recovered tasks correctly handle being aborted before being started. Unlike other tasks, recovered tasks are effectively "started" as soon as they're created, since they're recovering a previously started task. So implement that properly so that when they're aborted, they do all the necessary killing and cleanup stuff. This should fix a bug where jobs aborted while the scheduler is down won't get properly aborted when the scheduler starts up. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3171 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
29caa4bf2508df180f2f09016cef90cefca59f73 |
|
26-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Explicitly catch SystemExit so we don't stack trace when we exit with sys.exit Signed-off-by: Scott Zawalski <scottz@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3170 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
54c1ea94793a2927fe76a876526a1fdd95cd1b58 |
|
20-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Sort hosts when choosing them for use in an atomic group and when actually assigning pending ones to run a job. Adds a Host.cmp_for_sort classmethod usable as a sort comparison function to sort Host objects by hostname in a sane manner. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3149 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
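Sane hostname ordering here plausibly means numeric-aware sorting, so host2 precedes host10; whether cmp_for_sort used exactly this key is an assumption:

    import re

    def _hostname_sort_key(hostname):
        # 'host10' -> [(1, 'host'), (0, 10)]; digit runs compare numerically
        return [(0, int(part)) if part.isdigit() else (1, part)
                for part in re.split(r'(\d+)', hostname) if part]

    def sort_hosts(hosts):
        return sorted(hosts, key=lambda host: _hostname_sort_key(host.hostname))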
|
1ff7b2e88ae7a382f85ab76e786a471134e8a6a0 |
|
16-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add ability to reverify a host from the Host List. Risk: low Visibility: medium (UI change) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3143 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
a9435c03c301faf6e2f4df9cd008b44887ecca8c |
|
13-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix recurring run code to reflect recent changes to rpc_utils.create_new_job(). Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3134 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
ebc0fb7543af140398e8546eea560762d1f0b395 |
|
13-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add an extra check for existence of Autoserv results in GatherLogsTask -- in certain recovery cases this can be false, previously leading to an exception. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3133 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
12f3e3212795a539d95973f893ac570e669e3a22 |
|
13-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add job maximum runtime, a new per-job timeout that counts time since the job actually started. * added started_on field to host_queue_entries, so that we could actually compute this timeout * added max_runtime_hrs to jobs, with default in global config, and added option to create_job() RPC * added the usual controls to AFE and the CLI for the new job option * added new max runtime timeout method to * added migration to add new fields and set a safe default max runtime for existing jobs Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3132 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
9e9364081e4c9938f90d2199ad9f42922d7f2da5 |
|
13-May-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add post-parse site hooks (parse -P to trigger, default = off) Make scheduler call tko/parse with -P Signed-off-by: Rachel Kroll <rkroll@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3123 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
a1e74b3e9d68792fae0c926f89b6de1736b1fe21 |
|
12-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add job option for whether or not to parse failed repair results as part of a job, with a default value in global_config. Since the number of options associated with a job is getting out of hand, I packaged them up into a dict in the RPC entry point and passed them around that way from then on. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3110 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
f1ae354808a2eeb95d706a669250b613765212a4 |
|
11-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Represent a group of machines with either the atomic group label name, if a specific label was used, or the atomic group name in the results database when parsing. Adds an optional host_group_name= to the server side group job keyval file. The scheduler chooses the most appropriate name for this and adds it to the group keyvals file. Changes the TKO results parser to use host_group_name= as the machine name instead of hostname= when hostname= is a comma-separated list of hostnames rather than a single name. Also fixes atomic group scheduling to be able to use up to the atomic group's max_number_of_machines when launching the job; this is still unlikely to happen as the code still launches the job as soon as at least sync-count hosts have exited Verify. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3103 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
597bfd3aa52f467942dc181d1dcb4223644c2f7f |
|
08-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Only run crashinfo collection when Autoserv exited due to some signal -- not just when it failed. Also make a minor fixup to some logging during process recovery. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3098 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
ef519214787d9c749849305f95d7ae6e7035171e |
|
08-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Pick hosts out of an atomic group in order rather than randomly so that a consistent set of hosts is used when possible. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3095 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
08a36413b0cd9939aa0090ce4ceaafb8dc43d002 |
|
05-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Change Agent.abort() again. This time, it runs through its queue of AgentTasks, aborting them until it reaches one that ignores the abort (or exhausts the queue). With the previous logic, we might have an Agent with a GatherLogsTask that should ignore the abort, but if the Agent got aborted before starting it would never run the task. I hope I've really got it right this time. To help simplify things, I reorganized the AgentTask logic a bit, making AgentTask.poll() call AgentTask.start() itself so that the Agent wouldn't have to explicitly call AgentTask.start(). I also got rid of Agent.start(), which was unused. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3089 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
83c1e9e8ae3446be28fb72ecc7d7686a46bb6733 |
|
02-May-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Call out to site_monitor_db: site_init_monitor_db Signed-off-by: Rachel Kroll <rkroll@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3080 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
29f7cd27e51add2648fb62ab2a0c588f9acb1ec4 |
|
29-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Here is a patch, which extends the autotest system with recurring job executions. When you create a new recurring job, you can specify: - start time (on server) - loop count (0 means infinite): how many times it will be executed - loop period: how long to wait between two executions Added features: - Create job: can create Template job. - View job: added "Create recurring job" - New tab "Recurring job" - list of recurring jobs - can click on it to view the executed job - selection support - Action -> remove - creation panel (accessible through "Create recurring job") - submit/create - reset - data validity check From: Zoltan Sogor <weth@inf.u-szeged.hu> Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3064 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
0bbfc2175f8d76647b9f6de7e1d5635d85ca5c00 |
|
29-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make autoserv --collect_crashinfo only run when Autoserv actually failed (exit status nonzero) or was aborted. I was being lazy and always running it, but it seems that introduced very annoying latency into job runs. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3063 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
20f9bddedda271bf486d4de20135160b6951b71d |
|
29-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
fix Agent.abort() when it's called before the agent has started (in that case, it should do nothing -- but the logic was making it basically ignore the abort). this should fix jobs being aborted in the "starting" phase (a phase that lasts one cycle before "running" starts). Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3060 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b82b1f2db7af9e767271f65c2c910392502acc4b |
|
28-Apr-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make a couple of errant files executable Signed-off-by: Martin J. Bligh <mbligh@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3046 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
d920518065a2b90fec5dd9e3a23d446254502ee3 |
|
27-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make RepairTask write job_queued and job_finished keyvals so they can be parsed into TKO when failed repair results are parsed. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3038 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
6b73341768f1cf0de210630142929047633658ff |
|
27-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix two bugs introduced in previous change to add collect_crashinfo support. * Some late modifications to the previous change prevented the FinalReparseTask from running when a job was aborted. Fixed that by allowing AgentTasks to really ignore an abort (which the PostJobTasks do). * The new abort logic caused hosts to not get cleaned up after an abort if the job was running and had the "reboot_after = Never" option set. This may or may not be preferable to users, but it's a change from the previous logic, so I'm changing it back to always run cleanup when a job is aborted. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3037 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
d3dc199703bfb8784a2f8f072d0514532c86c0a9 |
|
22-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add support to the scheduler to run autoserv --collect_crashinfo after a job finishes or is aborted. * added new state "Gathering" for when we're running collect_crashinfo and copying logs to the results repository * added new GatherLogsTask to the scheduler to perform these two tasks, and made it get run either after a job finishes or after a job is aborted. this task shares a lot with FinalReparseTask, so extracted common code into a new PostJobTask. * made changes to scheduler/drone code to support generic monitoring and recovery of processes via pidfiles, since we need to be able to recover the collect_crashinfo processes too. this will also make the scheduler recover parse processes instead of just killing them as it does now, which is nice. * changed abort logic significantly. since we now need to put aborted jobs through the gathering and parsing stages, but then know to put them into "aborted" afterwards, we can't depend on the old path of abort -> aborting -> aborted statuses. instead, we need to add an "aborted" flag to the HQE DB table and use that. this actually makes things generally cleaner in my opinion -- for one, we can get rid of the "Abort" and "Aborting" statuses altogether. added a migration to add this flag, edited model and relevant logic appropriately, including changing how job statuses are reported for aborted entries. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3031 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
915958db04ca97d3d5a011383e736a3e2b4e8db3 |
|
22-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix monitor_db_unittest, broken by previous change to refactor cleanup code. Two main things here: * 24hr cleanup was running upon object construction, which meant it was running inadvertently during unit testing. Fixed this with the usual trick of moving that action from the constructor to an initialize() function, which gets called separately in monitor_db and which the unit test avoids. * one of the scheduler unit tests was actually testing cleanup code; change that to call the newly located function. this test should maybe be moved to a separate unit test file for the monitor_db_cleanup module, but I just want to get things working again for now. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3029 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
87ba02a820301220076cccf34d34d9243f18da7a |
|
20-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
extract code for generating autoserv command lines to a common place, including support for -l and -u params, and make verify, repair and cleanup tasks pass those params. this should make failed repairs include the right user and job name when parsed into TKO. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3019 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
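The shared builder presumably centralizes flag handling; a sketch (the -u/-l flags come from the message, the remaining flags and the function shape are assumptions):

    def autoserv_command_line(machines, results_dir, user=None, job_name=None):
        command = ['autoserv', '-p', '-r', results_dir, '-m', ','.join(machines)]
        if user:
            command += ['-u', user]      # parsed into TKO as the job owner
        if job_name:
            command += ['-l', job_name]  # parsed into TKO as the job name
        return command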
|
76e29d112fbe5f5afc13cebad278fdcb4fde752f |
|
15-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix monitor_db.DBObject.save() to handle None values as NULL properly. Atomic group scheduling was causing host_queue_entries rows in the database to be created with meta_host=0 when it should have been NULL. This was causing problems elsewhere as code failed to find label id 0. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2988 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
f3294cce1590d9c79cb25dcaa18cec0ac08c9b73 |
|
08-Apr-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Move cleanup functions into a separate file/classes. Add a 24-hour cleanup run. Add django_session cleanup. Signed-off-by: Scott Zawalski <scottz@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2979 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
27f3387980a99304a9bae73d1647ce8af54d3dc0 |
|
07-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Ensure exception information from monitor_db goes to logs. * in EmailManager.log_exception(), change sys.stderr.write() to logging.exception(), ensuring output goes through the logging infrastructure (as it should) * add an extra top-level main() wrapper in monitor_db to catch any escaping exceptions and log them before reraising. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2968 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
50e463b0b61384f91065bd7407f71c689a38277c |
|
07-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add a check for AUTOTEST_SCHEDULER_LOG_DIR. Update monitor_db_babysitter to use subcommand and define the log path for the scheduler. Signed-off-by: Scott Zawalski <scottz@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2966 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
c9895aaf0614f76559d58b830046de34190cb085 |
|
01-Apr-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Move monitor_db_babysitter to using utils.run to start monitor_db with an environment variable for monitor_db's logs. Add an option to monitor_db.py to check if AUTOTEST_SCHEDULER_LOG_NAME is set; if it is, use that name for logs, otherwise use the default. Signed-off-by: Scott Zawalski <scottz@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2959 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
fb67603a3a1194f610528dea6b46ecea75bbccd6 |
|
01-Apr-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add write_pid to common code. Call write_pid from the scheduler and babysitter. Signed-off-by: Rachel Kroll <rkroll@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2953 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
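A hedged sketch of what such a helper might look like; the pid directory and file-naming convention here are assumptions, not the actual common-code API:

    import os

    def write_pid(program_name, pid_dir='/tmp'):
        # Record this process's pid so wrappers (e.g. the babysitter) and
        # init scripts can locate the scheduler later.
        path = os.path.join(pid_dir, '%s.pid' % program_name)
        with open(path, 'w') as pidfile:
            pidfile.write('%d\n' % os.getpid())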
|
7629f148feee19efa11bc041d49943aacc43c482 |
|
27-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Couple fixes for Lucas and Rodrigo's logging changes * fix client.bin.job_unittest * fix a couple prints that slipped into monitor_db during the review of the logging patch Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2945 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
205fd60f9c9d2f64ec2773f295de1cf5cfd3bc77 |
|
21-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix the AtomicGroup name display in the admin interface. Adds an invalid bool column and uses the existing invalid model to avoid problems when deleting from the django admin interface. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2918 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
ccbd6c5c6dfc10072f6ace2f528b9ed7c764a0b3 |
|
21-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Ensure RepairTasks aren't associated with the queue entries that spawned them, so that if the QE is aborted during repair the repair task will continue running (and just leave the QE alone from then on). Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2917 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b18134f8faa7c5b4623760bc88650a65e70b2cac |
|
20-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
As discussed on the mailing list, we implemented logging with a single root logger and logging configurations per entry point. The entry points affected are: * Autotest client * Autoserv * Scheduler We are sending patches for each one of those entry points. Now we don't need to 'grab' loggers anymore to log messages, we only need to use the utility functions: logging.info('msg') logging.debug('msg') logging.warning('msg') logging.error('msg') logging.critical('msg') This reduces the complexity of the log system and makes it easier for developers to log messages: just select the level, make sure the standard logging module is loaded, and profit! From: Lucas Meneghel Rodrigues <lmr@linux.vnet.ibm.com> Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2915 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
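The convention boils down to one configuration call per entry point, then the module-level helpers everywhere else; the format string below is illustrative:

    import logging

    # Done once per entry point (client, autoserv, scheduler); individual
    # modules never need to 'grab' a logger object.
    logging.basicConfig(
        level=logging.DEBUG,
        format='%(asctime)s %(levelname)-8s| %(message)s')

    # Anywhere else in the codebase:
    logging.info('scheduler tick')
    logging.debug('querying host queue entries')
    logging.warning('drone nearing process limit')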
|
89f84dbadf071ba430244356b57af395c79486e4 |
|
12-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add the concept of an Atomic Group to the scheduler and database. Scheduling a job on an atomic group means that all of the Ready machines (up to a maximum specified in the atomic group) in a single label associated with that atomic group will be used to run the job. The job synch_count becomes a minimum when scheduling on an atomic group. Both HostQueueEntrys and Labels may have an AtomicGroup associated with them: * A HostQueueEntry with an AtomicGroup acts to schedule a job on all Ready machines of a single Label associated with that AtomicGroup. * A Label with an AtomicGroup means that any Hosts bearing that Label may only be scheduled together as a group with other hosts of that Label to satisfy a Job's HostQueueEntry bearing the same AtomicGroup. Such Hosts will never be scheduled as normal metahosts. Future patches are coming that will add the ability to schedule jobs using this feature to the RPC interface, CLI and GUI. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2878 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
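An illustrative sketch of the selection rule described above; the names and data shapes are assumptions, not the scheduler's real API:

    def choose_atomic_group_hosts(ready_hosts_by_label, label, max_hosts,
                                  synch_count):
        # An atomic-group entry takes all Ready hosts in one label, capped
        # at the group's maximum; the job's synch_count acts as a minimum.
        hosts = ready_hosts_by_label.get(label, [])[:max_hosts]
        if len(hosts) < synch_count:
            return None  # not enough ready machines yet; try next tick
        return hosts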
|
cca334f4eae9bcaa31146bf138fa111f1f94be87 |
|
12-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
chdir when starting monitor_db to avoid issues such as "shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory." if monitor_db is started with an odd nfs pwd. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2877 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
a3c585726f4cbc9ab19f9a39844aecc5c9f8c9ce |
|
12-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
a) Reduce the number of instances of DBObject classes created for the same row in the database by caching existing live instances using weakref's. b) Raise an Exception object rather than a string on SELECT failure. c) Use itertools.izip() instead of zip() Risk: a) Medium b) Low c) Low Visibility: This alters the behavior of the scheduler but by default we always re-query the database when a particular row is instantiated even when reusing an existing cached object. Monitoring of the log file will show if this is even an issue. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2875 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
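A sketch of the weakref caching idea in (a), with simplified constructor arguments:

    import weakref

    class DBObject(object):
        _instances = weakref.WeakValueDictionary()

        def __new__(cls, row_id):
            # Reuse the live instance for this (class, id) if one exists;
            # the weak dict drops entries automatically once nothing else
            # references the object, so the cache never pins rows in memory.
            key = (cls, row_id)
            instance = cls._instances.get(key)
            if instance is None:
                instance = super(DBObject, cls).__new__(cls)
                cls._instances[key] = instance
            return instance

        def __init__(self, row_id):
            self.id = row_id
            # Per the message above, the row is still re-queried even when
            # an instance is reused, so cached objects stay fresh.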
|
35162b0a9d278a43d3fd90f3664ea09092ba9684 |
|
03-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Updated patch including unit test fixes. * remove the whole "dummy process" and "null drone" thing, and change QueueTask to be aware of when the process is lost. the old way was supposed to basically transparently handle that case, but it was just flawed. instead, QueueTask needs to check for the lost process case and handle it specially. this eliminated the need for the dummy process and the NullDrone class, so I removed them. QueueTask also writes a special file with an error message when the process is lost, so the user will be aware of it. (this should only happen when there are system errors on the main server.) * add an extra comment to a check in FinalReparseTask that I added recently, but that was confusing me now. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2844 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
6ae5ea9e1186f0887f625c28cc52a0dba1a732b2 |
|
25-Feb-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Internal cleanup - don't use a ._fields() method, just define a class attribute. Same for ._get_table(), just define a class attribute. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2819 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
25cbdbd2da6242296abe6b1342521b29993993f2 |
|
17-Feb-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Get rid of the host_queue_entries.priority field in the autotest_web DB. It's just duplicating information from the jobs table. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2813 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
a038235e60f5a141414d6a27da43f5c4978365b3 |
|
12-Feb-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Update two calls to _write_keyval to call the new _write_keyval_after_job instead. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2780 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
73ec044e546f6264b03f85c9357d0fdadf7db16b |
|
07-Feb-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
* Write job_queued keyval before running the job, so it's available to the continuous parser. This involved some general refactoring of the keyval writing code. * Make DroneManager keep the list of attached files per execution as a dict instead of a list. this allows us to easily catch duplicate attached files. they shouldn't happen and don't appear to now, but I felt it was a good safety check to add. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2765 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
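A sketch of the dict-based bookkeeping for attached files; the class and method names are illustrative:

    class ExecutionFiles(object):
        def __init__(self):
            self._attached = {}  # destination path -> file contents

        def attach_file(self, contents, dest_path):
            # Keying by destination path (instead of appending to a list)
            # makes a duplicate attachment an immediate, loud failure.
            if dest_path in self._attached:
                raise RuntimeError('duplicate attached file: %s' % dest_path)
            self._attached[dest_path] = contents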
|
d9ac445a60d6d11537f566503164344e09527917 |
|
07-Feb-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Remove the old acl_groups_users and acl_groups_hosts many2many pivot table column name hack. Renames acl_group_id -> aclgroup_id. Adds a migration script and updates the one piece of code that actually depended upon the old name. This is needed for my upcoming change that ports autotest to Django 1.0.x but seems worth cleaning up as a change of its own. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2764 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
678df4f19352060e24c1257bae28bf89e769e8bf |
|
04-Feb-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
The change to copy and parse failed repair results was broken for drone setups. The parser always expects to run on results in the normal results directory (determined by the queue entry's execution tag), but it runs on the drone. On the drone, repair results were placed in a temp dir, and copied over to the job dir only on the results repository. To fix this, the code now copies the repair results to the job dir *on the drone*, and then proceeds with the normal copy-and-parse procedure, just like for normal job results. I also added a check in FinalReparseTask to ensure the results to parse can actually be found, and email a warning if not. Previously, it would simply crash the scheduler. This shouldn't normally happen since the underlying bug has been fixed, but it's been seen in other situations so this should be safer in general. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2746 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
8bcd23af07a5420f35acfd0aa57c8c1cceb34368 |
|
03-Feb-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Move all MySQLdb imports after the 'import common' so that a MySQLdb package installed in our own site-packages directory will be found before the system installed one. Adds an empty site-packages directory with an explanation README. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2735 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
6bb7c294f777f722bfba54669287483e2ee0c887 |
|
30-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
when "stopping" a sync job, don't stop verifying entries, because they still have active agents holding HostQueueEntry objects, and messing with them creates data inconsistencies. this is a symptom of a larger problem with the DB models in the scheduler, but I'm not sure how to fix the larger problem right now. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2717 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
de634eed90554d881ecce30812051b63efb14efa |
|
30-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
* when repair fails in a situation that fails a queue entry (i.e. repair triggered by failed verify for a non-metahost queue entry), copy the full autoserv logs to the results repository and parse them, just like we do at the end of a job. * fix some local copying code in drone_utility to better handle copying directories. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2716 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
c9ae1787fd41101089b09551d8028aef1daae1d3 |
|
30-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
* add "Do not verify" host protection level and implement it in scheduler * make scheduler run pre-job cleanup even if "skip verify" was chosen. accomplished this by adding a SetEntryPendingTask to call on_pending, instead of having the VerifyTask do it. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2715 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
ade14e2081e4b934f4b2232acfbf4c1b2f3bece4 |
|
26-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Bypass only_if_needed labels for specifically-requested hosts. This is a workaround for generating proper job dependencies in an easier/more automated manner when selecting specific hosts with only_if_needed labels. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2692 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
324bf819a4ea0fb2fa2509c3b4462f208e225d8d |
|
21-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make maximum process throttling configurable on a per-drone basis. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2660 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
67831ae7a70ab35b48f98fad7f14d285b69d9bd9 |
|
16-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make sure the debug messages from models get printed while the scheduler is running. The log level got changed during the scalability work and this was disabling output of these important status change messages. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2657 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
2fa516935f896d49e7320411942b47db7212ecc8 |
|
14-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Don't execute any new autoserv processes when all drones have been disabled. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2641 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
c5afc46a78de6b881ea716fe5d39df48d7349664 |
|
13-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add capability to "phase out" drones. You can disable a drone in the global config using "<drone hostname>_disabled: 1". Then, from the scheduler web interface, you can reload the config, causing the scheduler to stop scheduling new jobs on the drone but to see all existing jobs through to completion. This allows us to safely remove drones from the system without any loss of work. Also moves some initialization lines in monitor_db.main() inside the try-except, so that exceptions there will be reported via email and the status server will still be shut down. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2624 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
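A sketch of the config convention, treating the global config as a plain dict for illustration:

    def enabled_drones(drones, global_config):
        # '<drone hostname>_disabled: 1' stops new work on that drone while
        # existing jobs run to completion; anything else leaves it enabled.
        enabled = []
        for hostname in drones:
            flag = str(global_config.get('%s_disabled' % hostname, '')).strip()
            if flag in ('', '0'):
                enabled.append(hostname)
        return enabled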
|
55b4b54185fa30f9e9412f6cc6275a964a0b0579 |
|
09-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make the status server smart enough to shut down cleanly when the scheduler exits. Some of this code is taken from tcpserver/tcpcommon.py. I'm not sure where we'd put the common code for these, or if it's too early for that. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2610 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
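A self-contained sketch of a status server that can be shut down deliberately rather than dying with the process; none of these names come from the autotest code:

    import threading
    try:
        from http.server import HTTPServer, BaseHTTPRequestHandler
    except ImportError:  # Python 2
        from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler

    class _StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'scheduler alive\n')

    class StatusServer(object):
        def __init__(self, port=0):
            self._httpd = HTTPServer(('', port), _StatusHandler)
            self._thread = threading.Thread(target=self._httpd.serve_forever)
            self._thread.daemon = True

        def start(self):
            self._thread.start()

        def shutdown(self):
            # Called when the scheduler exits: stop serve_forever(), close
            # the listening socket, and wait for the thread to finish.
            self._httpd.shutdown()
            self._httpd.server_close()
            self._thread.join()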
|
4f9e53713cc2f147ab8f20b0133b2eb56e3c0d44 |
|
07-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add some extra info to an assertion. This assertion has fired a few times since it was inserted and I believe it may be related to a bug where host queue entries are left in state "Pending" forever. I'm having trouble tracking this bug down so maybe this will help. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2609 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
d1ee1dd3f3e5ac44f00d7a96deb815dbe1beedad |
|
07-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
* move some scheduler config options into a separate module, scheduler_config * add a little embedded HTTP server to the scheduler, defined in status_server.py, running in a separate thread. this displays loaded config values and allows reloading of those config values at runtime. in the future we can extend this to do much more. * make global_config handle empty values as nonexistent values by default. otherwise, we would have to both pass a default= and check for value == '' separately. Now, we just pass default= and it's all taken care of. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2608 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
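The empty-value behavior might look like this sketch, with the config treated as a dict for illustration:

    def get_config_value(config, key, default=None):
        # An empty string in the config file is treated the same as a
        # missing key, so callers only ever pass default= instead of also
        # checking for value == '' themselves.
        value = config.get(key, '')
        if value == '':
            return default
        return value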
|
170873e8869cae8bb9499d6128cf626e8110bf56 |
|
07-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Attached is a very large patch that adds support for running a distributed Autotest service. Previously, the scheduler could only execute autoservs locally and all results were written directly to the local filesystem. This placed a limit on the number of machines that could be concurrently tested by a single Autotest service instance due to the strain of running many autoserv processes on a single machine. With this change, the scheduler can spread autoserv processes among a number of machines and gather all results to a single results repository machine. This allows vastly improved scalability for a single Autotest service instance. See http://autotest.kernel.org/wiki/DistributedServerSetup for more details. Note that the single-server setup is still supported and the global configuration defaults to this setup, so existing service instances should continue to run. Steve Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2596 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
37eceaa2d0640edd83c4df3fc71621022433d52a |
|
15-Dec-2008 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add entries to the config file to control which server is used rather than hardcoding the autotest hostname in the code. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2569 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
6355f6b9ecc400009ca9b93bac2b62ac72de19b8 |
|
05-Dec-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Don't assume there's a non-null host in job completion email code. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2546 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
ac9ce2279050866de6c697a5a27da2b4c731b129 |
|
03-Dec-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Only schedule jobs that are "Queued". Now that state "Parsing" is an active=complete=0 state, we need to explicitly check for this. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2542 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
ff059d7d24d07e094e31c9856c760ddc9024fe24 |
|
03-Dec-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Don't abort running entries from synch start timeout (only queued/starting/verifying/pending ones). git-svn-id: http://test.kernel.org/svn/autotest/trunk@2541 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
d876f459fff6cc4994cab329b1f80c99a86edcbd |
|
03-Dec-2008 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
gps pointed out that "== and != work in most cases but it's better to use is and is not as you'll never run into a case where someone's __eq__ or __ne__ method do the wrong thing." Signed-off-by: Martin J. Bligh <mbligh@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2533 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
c85c21bec23af3f805397f1f7e289b9a4b0bfd05 |
|
24-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
* allow scheduler email "from" address to be specified in global config * allow global config to specify statuses which should trigger immediate emails (in addition to email upon job completion) * make "Parsing" an active=complete=0 status, and modify Job.num_complete() appropriately * restructuring of scheduler email notification code to only email at the right time and to include a more informative message git-svn-id: http://test.kernel.org/svn/autotest/trunk@2506 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
e58e3f8d3ffb0144cf71ccefefd5bf253c0bb5fb |
|
20-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Set HQEs to "Verifying" instead of "Starting" when we're about to run verify on them. We need to set them to an active status, but if we use "Starting" then we can't tell which stage they're in, and we need that information to know when to "stop" synchronous jobs. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2482 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
cbd74616b62c28e186d41a3950e9a7ab1ba86da6 |
|
19-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
When aborting a running job, write an INFO line to the status.log. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2450 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
8fe93b5da16ebec51dfec50e1d810085c2af79e0 |
|
18-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make CleanupTask copy results to job dir on failure. Did this by extracting code from VerifyTask into a common superclass. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2435 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
e788ea65d121585d02073341ae63149cf4c43fa5 |
|
17-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-make get_group_entries() return a list instead of a generator, since all callers want it that way anyway -change assertion on existing results dir into warning (with email notification) -move some of the queue entry failure handling code into RepairTask so it doesn't have to be duplicated in VerifyTask and CleanupTask (and because CleanupTask wasn't handling it quite right) git-svn-id: http://test.kernel.org/svn/autotest/trunk@2429 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
e77ac67fca979c1703b706c3b0935c658bd6fcf1 |
|
14-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Set queue entries to "Starting" when the VerifyTask is created for them. This perennial source of problems cropped up again in the latest change to the job.run() code (as part of the synch_count changes). git-svn-id: http://test.kernel.org/svn/autotest/trunk@2421 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
2bab8f45adedeacbf2d62d37b90255581adc3c7d |
|
12-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Implement sync_count. The primary change here is replacing the job.synch_type field with a synch_count field. There is no longer just a distinction between synchronous and asynchronous jobs. Instead, every job has a synch_count, with synch_count = 1 corresponding to the old concept of synchronous jobs. This required: -changes to the job creation RPC and corresponding client code in AFE and the CLI -massive changes to the scheduler to schedule all jobs in groups based on synch_count (this unified the old synch and async code paths) -changed results directory structure to accommodate synchronous groups, as documented at http://autotest.kernel.org/wiki/SchedulerSpecification, including widespread changes to monitor_db and a change in AFE -changes to AFE abort code to handle synchronous groups instead of just synchronous jobs -also got rid of the "synchronizing" field in the jobs table, since I was changing the table anyway and it seems very likely now that that field will never be used. Other changes included: -add some logging to afe/models.py to match what the scheduler code does, since the scheduler is starting to use the models more -added checks for aborts of synchronous groups to abort_host_queue_entries RPC git-svn-id: http://test.kernel.org/svn/autotest/trunk@2402 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
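A sketch of the unified grouping rule; the Job shape here is a stand-in for the real model:

    import collections

    Job = collections.namedtuple('Job', ['id', 'synch_count'])

    def find_runnable_group(job, pending_entries):
        # An asynchronous job is simply synch_count == 1; nothing runs
        # until a full group of Pending entries of that size exists.
        if len(pending_entries) < job.synch_count:
            return None  # keep waiting for more hosts to reach Pending
        return pending_entries[:job.synch_count]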
|
9d9ffd51a4332d77eb4a9772f834fcc9d1304cae |
|
10-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
don't reboot hosts when aborting inactive jobs. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2393 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
6198f1d6cd03b4f2ffdf84febb4ddaa42f8831a8 |
|
06-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
When a synch job fails and we stop other entries, set the host back to "Ready" if it was "Pending". Otherwise it'll sit in state "Pending" forever. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2385 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
45ae819b2d2a67d0882edafaf1a8f7b95c3fb9d2 |
|
05-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add a formal cleanup phase to the scheduler flow. -add a --cleanup or -C option to autoserv, which runs a new control segment, cleanup. this option and control segment obsolete the old -b option and reboot_segment control segment. -change the RebootTask in the scheduler into a more generic CleanupTask, which calls autoserv --cleanup. -change the host status "Rebooting" to "Cleaning" git-svn-id: http://test.kernel.org/svn/autotest/trunk@2377 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
8ebca79709f6201a632b19394365803394932b45 |
|
04-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-fix running process accounting in scheduler. Dispatcher.num_running_processes() already excludes Agents that are done, so we don't need to subtract their processes off. -only set hosts to "Ready" after a job if there's no post-job reboot. if we set to "Ready" and schedule a post-job reboot, the host could get picked up by another job before the reboot executes. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2374 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
fa8629c3a28b0ccebbd339218883e5e6cbb1ce16 |
|
04-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-ensure Django connection is autocommit enabled, when used from monitor_db -fix HostScheduler to not crash when there are no ready hosts -change RebootTask to optionally take a queue entry and pass it to the RepairTask if reboot fails. This allows jobs to be failed if the pre-verify reboot fails, instead of being left hanging. -add unit test for RebootTask -add check for DB inconsistencies to cleanup step. Currently this just checks for HQEs with active=complete=1. -when unexpected existing results files are found, email a warning git-svn-id: http://test.kernel.org/svn/autotest/trunk@2368 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
97aed504f709270614ccbcef299e394333a76598 |
|
04-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Rewrite final reparse code in scheduler. the final reparse is now handled by a separate AgentTask, and there's a "Parsing" status for queue entries. This is a cleaner implementation that allows us to still implement parse throttling with ease and get proper recovery of reparses after a system crash fairly easily. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2367 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
a3ab0d56117c0d55f768d5817284c2c2a0b0305d |
|
03-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-change AFE abort code to always set to "Abort" status and never skip straight to "Aborted". Doing so is prone to a race condition with the scheduler. The scheduler handles non-active "Abort" entries perfectly already, setting them immediately to "Aborted" without trying to kill anything. -change scheduler timeout code to use the AFE models abort code instead of its own SQL version. unfortunately we need a fragment of SQL to do the timeout computation, which means no testability under SQLite. I manually tested it. -also extracted the scheduler periodic "cleanup" code into its own method. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2365 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
9886397ceb7c752db78a6acd9737992db891015b |
|
29-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add job start timeout for synchronous jobs. This timeout applies to synchronous jobs that are holding a public pool machine (i.e. in the Everyone ACL) as "Pending". This includes a new global config option, scheduler code to enforce the timeout and a unit test. Note that the new scheduler code uses the Django models instead of making DB queries directly. This is a first example of how the scheduler can use the models to simplify DB interaction and reuse code from the frontend. I'd like to move in this direction from now on, although I certainly won't be making any sweeping changes to rewrite existing code. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2358 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b2ccdda84c2c6bb34a2e5c58000a9dbf185fb691 |
|
28-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Change location of set_status('Starting') line. This just got put in the wrong place when I refactored the job.run() code, and it wasn't getting run at all for asynchronous jobs. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2350 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
e05654d90b166b04d8a00d037b5e3469776175a8 |
|
28-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Ensure results directories always get created for asynchronous multimachine jobs (previously they wouldn't for jobs with run_verify=False). git-svn-id: http://test.kernel.org/svn/autotest/trunk@2349 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
3dd6b88de09c14cf7f93ff188461876ec65afe55 |
|
27-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Two simple scheduler fixes: -treat empty pidfiles like nonexistent pidfiles. there seems to be a race condition where the scheduler reads a pidfile after autoserv creates it but before autoserv writes the pid to it. this should solve it. -prioritize host queue entries by job id instead of host queue entry id. when jobs are created exactly in parallel, their host queue entries can have interleaved IDs, which can lead to deadlock. ordering by job id should protect against that. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2342 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
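A sketch of the empty-pidfile handling, with the file format simplified to a single pid line:

    def read_pid(path):
        # autoserv creates the pidfile before writing the pid, so an empty
        # file means 'no pid yet', exactly like a missing file.
        try:
            with open(path) as pidfile:
                contents = pidfile.read().strip()
        except IOError:
            return None
        if not contents:
            return None  # created but not yet written: treat as absent
        return int(contents.splitlines()[0])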
|
0fc3830f17d644bab74bfe38556299f5e58bc0fa |
|
23-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add user preferences for reboot options, including simple user preferences tab which could later be expanded to include more options. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2330 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
21baa459ea14f96e06212f1f35fcddab9442b3fc |
|
21-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add options to control reboots before and after a job. -add reboot_before and reboot_after fields to Job, along with enums for each -add options to create_job RPC for reboot_before and reboot_after -add options to job create CLI for these fields, and made job stat -v display them -add widgets to job create page in AFE for these fields and made job detail view display them -add dirty field to Hosts, defaulting to True, and set to True when a host is locked -made scheduler set this field when a job runs and clear it when a host is rebooted -updated scheduler's PidfileRunMonitor to read a new three-line .autoserv_execute format, where the third line contains the number of tests that failed -made scheduler Job.run() include a RebootTask before the verify task according to the reboot_before option -made QueueTask.epilog() launch a RebootTask for each host according to the reboot_after option -updated autoserv to write out a third line to .autoserv_execute containing the number of failed tests. Other changes: -added support for displaying Job.run_verify in the CLI (job stat -v) and job detail page on AFE -updated ModelExtensions to convert BooleanField values to actual booleans. The MySQL Django backend just leaves them as ints (as they are represented in the DB), and it's stupid and annoying (Yes, bool is a subclass of int, so it's often not a problem. But yes, it can be.). -get rid of use of Job.synch_count since we don't actually support it. I think this was meant for inclusion in a previous change and got left out. -made the scheduler use the new setup_django_environment stuff to import and use the django models. It doesn't *really* use the models yet -- it just uses the Job.Reboot{Before,After} enum objects -- but this shows we could easily start using the models, and that's definitely the direction I want to go long term. -refactored PidfileRunMonitor generally and made it a bit more robust by having it email errors for corrupt pidfiles and continue gracefully, instead of just crashing the scheduler -changed the way Agent.tick() works. now, it basically runs through as much work as it can in a single call. for example, if there's a RebootTask and a VerifyTask, and the RebootTask has just finished, in a single call it will finish up the RebootTask and start the VerifyTask. this used to take two cycles and that was problematic for cases like this one -- the RebootTask would like to set host.status=Ready, but then the host could get snatched up on the next scheduling round, before the VerifyTask got started. This was sort of solved previously by keeping the HostQueueEntry active, and we could apply that approach here by making a new status for HostQueueEntries like "Rebooting". But I prefer this approach as I think it's more efficient, more powerful and easier to work with. Risk: extremely high Visibility: new reboot options for jobs, skip verify now displayed in AFE + CLI Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2308 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
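A sketch of parsing the three-line .autoserv_execute format the message describes; the handling of not-yet-written lines is an assumption:

    def parse_autoserv_execute(contents):
        # line 1: autoserv pid
        # line 2: exit status, once the process has finished
        # line 3: number of failed tests
        lines = contents.splitlines()
        pid = int(lines[0])
        exit_status = int(lines[1]) if len(lines) > 1 else None
        num_tests_failed = int(lines[2]) if len(lines) > 2 else None
        return pid, exit_status, num_tests_failed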
|
1be9743d7bf16ad21fbd70ec45c77f3907f10718 |
|
17-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-fix bug with handling abort on unassigned host queue entries -refactor abort code into abort() method on HostQueueEntry, which the dispatcher calls both during abort and during recovery of aborting tasks -don't special case aborting queue entries with no agents. We can just create an AbortTask in all cases, and when there are no agents, it'll iterate over an empty list. No need to complicate the code with a special case. -rewrite the dispatcher abort unit test to do less mocking and test the code more organically, and add a test for AbortTask git-svn-id: http://test.kernel.org/svn/autotest/trunk@2298 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
cfd66a35e8c939998ff354d9740bdf15c9bc3fda |
|
15-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make scheduler set host status to "Pending" when there's a pending queue entry against the host. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2292 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
9976ce9873867a397e448d358543a9dc1d33aa77 |
|
15-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-make monitor_db implement "skip verify" properly, and add unit tests for it -change order of a couple fields in AFE models to match DB order git-svn-id: http://test.kernel.org/svn/autotest/trunk@2291 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b2e2c325bc0b1d822690b6af07f920d5da398cb8 |
|
14-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-refactor Job.run in monitor_db, one of the most important and most confusing methods in the scheduler. it's now broken into separate synchronous and asynchronous paths with common methods extracted. -remove a bunch of dead code from the Job class -remove the one actual usage of the synch_count fields. We don't really support this yet, so there's no reason to pretend we do. -extract some code from VerifySynchronousTask into HostQueueEntry.on_pending(). it's better here and will be necessary in a near-future change (to implement skip_verify right). -add simple test for job.run() to scheduler unittest. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2282 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b1e5187f9aa303c4fc914f07312286d302b46a0e |
|
07-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Get the scheduler unittest to run against SQLite! * get rid of monitor_db.DatabaseConn, and make monitor_db use the new DatabaseConnection * modify some queries in monitor_db that weren't SQLite-compatible (SQLite doesn't support TRUE and FALSE literals) * add frontend/django_test_utils.py, which contains utilities to * setup a django environment (something manage.py normally does for you) * replace the configured DB with a SQLite one, either in-memory or on disk * run syncdb on the test DB * backup and restore the test DB, handy because then we can syncdb once, save the fresh DB, and quickly restore it between unittests without having to run syncdb again (syncdb is terribly slow for whatever reason) * modify monitor_db_unittest to use these methods to set up a temporary SQLite DB, run syncdb on it, and test against it * replace much of the data modification code in monitor_db_unittest with use of the django models. The INSERTs were very problematic with SQLite because syncdb doesn't set database defaults, but using the models solves that (django inserts the defaults itself). using the models is much cleaner anyway as you can see. it was just difficult to do before, but now that we've got the infrastructure to setup the environment anyway, it's easy. this is a good model for how we can make the scheduler use the django models eventually. * reorder fields of Label model to match actual DB ordering; this is necessary since monitor_db depends on field ordering * add defaults to some fields in AFE models that should've had them * make DatabaseConnection.get_test_database support SQLite in files, which gives us persistence that is necessary and handy in the scheduler unittest * add a fix to _SqliteBackend for pysqlite2 crappiness The following are extras that weren't strictly necessary to get things working: * add a debug feature to DatabaseConnection to print all queries * add an execute_script method to DatabaseConnection (it was duplicated in migrate and monitor_db_unittest) * rename "arguments" to "parameters" in _GenericBackend.execute, to match the DB-API names * get rid of some debug code that was left in monitor_db, and one unnecessary statement Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2252 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
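The backup/restore trick can be sketched like this, assuming a file-backed SQLite DB:

    import os
    import shutil
    import tempfile

    class SqliteTestDatabase(object):
        # Run syncdb once against a file-backed SQLite DB, snapshot the
        # file, then restore the snapshot between tests instead of paying
        # for syncdb every time.
        def __init__(self, db_path):
            self._db_path = db_path
            self._backup_path = None

        def backup(self):
            fd, self._backup_path = tempfile.mkstemp(suffix='.db')
            os.close(fd)
            shutil.copyfile(self._db_path, self._backup_path)

        def restore(self):
            shutil.copyfile(self._backup_path, self._db_path)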
|
c993bee9e210f75ecdb76042125f11895b1c02e2 |
|
03-Oct-2008 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
The scheduler has some overly verbose (debug) logging...kill it. Risk: High Visibility: High to sys admins Signed-off-by: Jeremy Orlow <jorlow@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2232 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
f7fa2cc6159fda3adecef2eeb5e5a016e866564c |
|
01-Oct-2008 |
jadmanski <jadmanski@592f7852-d20e-0410-864c-8624ca9c26a4> |
Update the scheduler and the parser to use the new aborted_* attributes that are written into the database on abort. Risk: Medium Visibility: Better error messages from the parser when a job is aborted. Signed-off-by: John Admanski <jadmanski@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2217 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
989f25dcbb6361218f0f84d1c8404761b4c39d96 |
|
01-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
two new major features: (1) added test and job dependencies -added M2M relationship between tests and labels and between jobs and labels, for tracking the labels on which a test/job depends -modified test_importer to read the DEPENDENCIES field and create the right M2M relationships -modified generate_control_file() RPC to compute and return the union of test dependencies. since generate_control_file now returns four pieces of information, I converted its return type from tuple to dict, and changed clients accordingly. -modified job creation clients (GWT and CLI) to pass this dependency list to the create_job() RPC -modified the create_job() RPC to check that hosts satisfy job dependencies, and to create M2M relationships -modified the scheduler to check dependencies when scheduling jobs -modified JobDetailView to show a job's dependencies (2) added "only_if_needed" bit to labels; if true, a machine with this label can only be used if the label is requested (either by job dependencies or by the metahost label) -added boolean field to Labels -modified CLI label creation/viewing to support this new field -made create_job() RPC and scheduler check for hosts with such a label that was not requested, and reject such hosts. Also did some slight refactoring of other code in create_job() to simplify it while I was changing things there. A couple notes: -an only_if_needed label can be used if either the job depends on the label or it's a metahost for that label. we assume that if the user specifically requests the label in a metahost, then it's OK, even if the job doesn't depend on that label. -one-time-hosts are assumed to satisfy job dependencies. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2215 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
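A sketch of the two host-eligibility rules; the Label shape is a stand-in for the real model:

    import collections

    Label = collections.namedtuple('Label', ['name', 'only_if_needed'])

    def host_satisfies_job(host_labels, job_dependencies, metahost=None):
        # Rule 1: the host must carry every label the job depends on.
        # Rule 2: an only_if_needed label on the host must be requested,
        # either as a dependency or as the metahost label itself.
        deps = set(job_dependencies)
        names = set(label.name for label in host_labels)
        if not deps.issubset(names):
            return False
        for label in host_labels:
            if (label.only_if_needed and label.name not in deps
                    and label.name != metahost):
                return False
        return True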
|
7d182aa9fa3e1f2d8c32b6f3160fafe98b9123ae |
|
22-Sep-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Handled exceptions caused by email sending functions. Prints log messages to standard out. Signed-off-by: Bryce Boe <bboe@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2178 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
542e840486b02b5025d26da16f98fed97898a601 |
|
19-Sep-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Added email_list field to the front end. On job completion, addresses on this list will be notified of the completion. The send_email function in manage_db.py also gains the ability to take a delimited email list in global_settings.py rather than just a single email. Signed-off-by: Bryce Boe <bboe@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2173 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
d8e548aefdaed529f8e36c38904366c1a2f519d8 |
|
09-Sep-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
make scheduler write host keyval files at the beginning of the job. presently the only keyval that's written is a list of host labels. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2118 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
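A sketch of what writing the host keyval might look like; the directory layout and keyval name are assumptions:

    import os

    def write_host_keyvals(results_dir, hostname, labels):
        # One keyval file per host, written before the job starts; the only
        # keyval so far is the comma-separated label list.
        keyval_dir = os.path.join(results_dir, 'host_keyvals')
        if not os.path.isdir(keyval_dir):
            os.makedirs(keyval_dir)
        with open(os.path.join(keyval_dir, hostname), 'w') as keyval_file:
            keyval_file.write('labels=%s\n' % ','.join(labels))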
|
4c5374f34ee4b31899c875c068ec6080ec8ce21c |
|
04-Sep-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-modify scheduler throttling code to track number of running processes rather than just number of running agents. note this is only an estimate of running processes - it counts all agents as one process unless the agent is a synchronous autoserv execution, in which case it uses the number of hosts being run. -add scheduler throttling test Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2105 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
970a6db6074aade35910f0e6194d58bf8b440182 |
|
03-Sep-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Rate limit the final parse of the scheduler. If more than 100 or so run at a time, it will bring mysql to its knees (for no good reason...all actions are on different jobs). Risk: High Visibility: Medium (things will work better on big jobs) Signed-off-by: Jeremy Orlow <jorlow@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2089 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
849a0f6ccc984232e662d800eef6add81b58dd00 |
|
28-Aug-2008 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Invalid SQL is created if you have one-time hosts but no 'real' hosts and you schedule a job. Add a check for this. Risk: Low Visibility: Low Signed-off-by: Jeremy Orlow <jorlow@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2074 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
ccb86d78539f6760265ca287a3b0e5b01227be31 |
|
22-Aug-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
revert an earlier change to when exactly we set the 'Starting' status. this was breaking synchronous jobs and I'm not sure what the reason for it was. it doesn't seem to have been necessary. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2029 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
63a3477a2d6502c10cc47a3022e8f8a257d91434 |
|
18-Aug-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-Refactor new monitor_db scheduling algorithm into its own class -Reorganize + clean up said code, make better use of existing methods -Change non-metahost code path to check ACLs (previously ACLs were only checked for metahosts) -Add some new unit tests -Change some one-line docstrings on tests to multi-line, since PyUnit is kind of annoying with one-line docstrings git-svn-id: http://test.kernel.org/svn/autotest/trunk@2006 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b95b1bda8fc4c91eb80af2d44004e5cf37a09905 |
|
15-Aug-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Rewrite the scheduling algorithm yet again. This time, we make separate DB queries to get all the queued host queue entries and all the ready hosts, and then match them up in Python. We could still do the non-metahosts the old way, but we might as well just do it all uniformly, so I've completely eliminated the old code. git-svn-id: http://test.kernel.org/svn/autotest/trunk@1995 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
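A sketch of the match-in-Python step for the non-metahost case; the attribute names are illustrative:

    def match_entries_to_hosts(queued_entries, ready_hosts):
        # Both inputs come from one bulk query each; all pairing happens
        # here instead of one DB query per idle host.
        ready_by_id = dict((host.id, host) for host in ready_hosts)
        assignments = []
        for entry in queued_entries:
            host = ready_by_id.pop(entry.host_id, None)
            if host is not None:
                assignments.append((entry, host))
        return assignments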
|
56193bb485d65716526079449f1df86ba5cb2df5 |
|
13-Aug-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-add basic abort functionality test to scheduler unit tests. this involved some serious refactoring of the existing tests. -various other enhancements to the scheduler unit tests. -extend mock.py comparator logic to be more powerful and generic. it can now handle comparators within data structures, and it handles keyword args properly. -enhance debug printing in mock.py. previously a test failure before calling check_playback would cause errors to be hidden even though they contained the cause of the failure. now, with debug mode enabled, errors will be printed as soon as they occur. -minor changes to monitor_db.py to increase testability. git-svn-id: http://test.kernel.org/svn/autotest/trunk@1981 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
3d9899a76a88cd4dea4cb819b8b9eb2165ede4e0 |
|
31-Jul-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Provides a mechanism in the UI to choose to skip the verification stage. Signed-off-by: Travis Miller <raphtee@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1934 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
7e26d62ec2d9398636a5f03c7f4198e0c3e81357 |
|
29-Jul-2008 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fixed the job timeouts. Jobs should no longer time out early. Risk: low Visibility: medium (scheduler bug fix) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1921 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
542537f8db0bfe38e7ff2c1409ca72862983603d |
|
24-Jul-2008 |
jadmanski <jadmanski@592f7852-d20e-0410-864c-8624ca9c26a4> |
Normalize the --host-protection name, since autoserv is somewhat picky on how it gets formatted on the cli. Risk: Low Visibility: The scheduler-launched host repair will work again. Signed-off-by: John Admanski <jadmanski@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1884 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
fb2a7fa621ddc91634dde6c56a47a1c8df2610ef |
|
17-Jul-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Adding new columns "locked_by_id" and "lock_time" to the hosts table, to indicate who locked a host and when. Risk: low Visibility: low Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1864 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
909c7a661e6739c110690e70e2f4421ffc4e5433 |
|
15-Jul-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Initial release of test auto importer. Update models.py to reflect database changes. Add the following columns to the autotests table: * author * dependencies * experimental * run_verify * test_time * test_category * sync_count Add run_verify to the jobs table. Update scheduler to assert with run_verify. Risk: Medium Visibility: High, people adding tests will now see more fields via the admin frontend Signed-off-by: Scott Zawalski <scottz@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1837 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
fb7cfb165bf5e98f76af9fd9644913fa67f8f567 |
|
09-Jul-2008 |
jadmanski <jadmanski@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add support to the scheduler to pass in the host.protection value as a --host-protection parameter when launching an autoserv repair job. Risk: Low Visibility: Makes autoserv actually obey the host protection value set in the frontend. Signed-off-by: John Admanski <jadmanski@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1792 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
df062569a6407ec084c4ee05b9390f8a0183d37b |
|
03-Jul-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Adding protection levels to hosts. Allows the user to specify how much the repair operation is allowed to do on the host (e.g., do not repair, repair filesystem only, allow reimaging). Risk: low Visibility: medium (adding a new input field) git-svn-id: http://test.kernel.org/svn/autotest/trunk@1771 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b8471e33ed22512001ec4cec8c33fdf01f32eb62 |
|
03-Jul-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Added a new input that allows users to specify a one-time host when creating a job. The job will be run against that host once, and the host will not appear in the "Available hosts" selector. Risk: medium (deleting records from database) Visibility: medium (adding an input field) git-svn-id: http://test.kernel.org/svn/autotest/trunk@1768 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
3bb499f74c04acec1f802a531cdfcba8f5ac0164 |
|
03-Jul-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Adding a timeout field to the "Create Job" tab, modified the create_job RPC to handle a "timeout" argument, and added a "timeout" column to the AUTOTEST_WEB database. Sets how long a job should run until it is automatically aborted. git-svn-id: http://test.kernel.org/svn/autotest/trunk@1765 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
f8c624dbc152fa42f4c836ff0b090917a497a504 |
|
03-Jul-2008 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
If a job is marked as Abort/Aborting/Aborted, do not change its status to something different. This fixes a pretty nasty race that showed up when aborting large jobs with items still verifying. Signed-off-by: Jeremy Orlow <jorlow@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1763 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
b376bc57f68507335bc387963c6807090c89dc90 |
|
13-Jun-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix a bug introduced into recovery code in my refactoring for testability. PidfileRunMonitor.run() was being called on a path when it shouldn't have been. git-svn-id: http://test.kernel.org/svn/autotest/trunk@1701 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
70feeee434c33ac2b5efbb56ddf66967259ba351 |
|
11-Jun-2008 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Needed to fix problems caused by the use of the old import style, which has been replaced with the newer absolute path imports. This fixes problems that were occurring when running our unittests. Signed-off-by: Travis Miller <raphtee@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1676 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
4eaaf5220537db8e09a202ac27121f52341693a9 |
|
07-Jun-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add distinct to query to cut time spent in half git-svn-id: http://test.kernel.org/svn/autotest/trunk@1660 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
0afbb6369aa5aa9a75ea67dd9e95ec4b21c0c181 |
|
06-Jun-2008 |
jadmanski <jadmanski@592f7852-d20e-0410-864c-8624ca9c26a4> |
Convert all python code to use four-space indents instead of eight-space tabs. Signed-off-by: John Admanski <jadmanski@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1658 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
3182b336d8f9531dd69043ee6fffab5afdd904b4 |
|
06-Jun-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
minor refactorings to scheduler to make it more testable. the corresponding unit test changes seem to have gone in with some other change. git-svn-id: http://test.kernel.org/svn/autotest/trunk@1652 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
20f47064be6855277e251cee7611d8336bcc9149 |
|
05-Jun-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-check ACLs directly in the scheduler (bypassing ineligible_host_queues) -rewrite scheduler queries to avoid all subqueries. they are just bad in mysql. -rip out all that code related to using ineligible_host_queues to enforce ACLs. good riddance! -update scheduler unit test to reflect this new policy (no ineligible_host_queue blocks for ACLs) -minor bugfixes to scheduler unit test. this sucks, but i did go back and ensure the old scheduler passed the fixed up unit test suite as well. -remove a blanket except: block from the scheduler. it wasn't necessary, it was inconsistent, and it was interfering with unit testing. git-svn-id: http://test.kernel.org/svn/autotest/trunk@1608 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
04c82c5dc70a6de95f9cd77371d0a99cbdcf0959 |
|
29-May-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Rewrite scheduling algorithm to use two queries + some data processing, rather than a separate query for each "idle" host. This should be considerably faster. It also gives us the opportunity to eliminate the whole ACL checking with ineligible_host_queues thing, which has been a nightmare. But one step at a time... git-svn-id: http://test.kernel.org/svn/autotest/trunk@1564 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
30eed1fccc346d44d9eca9d2125c84cdf416644d |
|
28-May-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
A bit of refactoring to monitor_db.py to clean up some code and make it more testable. git-svn-id: http://test.kernel.org/svn/autotest/trunk@1558 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|
93ff7ea968764e96f0aec0197e71e25a2a1f4e50 |
|
27-May-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Rename monitor_db to monitor_db.py. This makes it import-able, which is necessary for unit testing. git-svn-id: http://test.kernel.org/svn/autotest/trunk@1548 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db.py
|