114e17228efd62ab595690be30cb1e3f26fabebe |
|
11-Jan-2016 |
Dan Shi <dshi@google.com> |
[autotest] Support selecting drone in restricted subnet For an agent task whose host is in a restricted subnet, only use a drone in that subnet. For an agent task whose host is NOT in a restricted subnet, only use drones NOT in any restricted subnet. BUG=chromium:574872 TEST=local run, unittest Change-Id: I3492fe14660e7629f982937d428d230ca9dcf3dc Reviewed-on: https://chromium-review.googlesource.com/321116 Commit-Ready: Dan Shi <dshi@google.com> Tested-by: Dan Shi <dshi@chromium.org> Reviewed-by: Fang Deng <fdeng@chromium.org> Reviewed-by: Simran Basi <sbasi@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
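The subnet rule described in the commit above can be sketched as follows; this is an illustrative Python sketch, not the actual autotest drone-manager API, and all names and subnets are assumptions.

```python
import ipaddress

# Hypothetical restricted subnet for illustration only.
RESTRICTED_SUBNETS = [ipaddress.ip_network('192.168.0.0/24')]

def subnet_of(ip):
    """Return the restricted subnet containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for net in RESTRICTED_SUBNETS:
        if addr in net:
            return net
    return None

def eligible_drones(host_ip, drone_ips):
    """Pick drones for a task according to the restricted-subnet rule."""
    host_net = subnet_of(host_ip)
    if host_net is not None:
        # Host is restricted: only drones inside the same subnet qualify.
        return [d for d in drone_ips if ipaddress.ip_address(d) in host_net]
    # Host is unrestricted: exclude drones in any restricted subnet.
    return [d for d in drone_ips if subnet_of(d) is None]
```

A restricted host thus never gets a drone outside its subnet, and an unrestricted host never gets a drone from inside one.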
|
1bf60eb788365f083d0ee8045a6556f906149dec |
|
02-Dec-2015 |
Simran Basi <sbasi@google.com> |
[autotest] autoserv add --lab & --host_attributes arguments Added two new flags to autoserv. --lab indicates that autoserv is running in the lab and has the full Autotest infrastructure at its disposal. --host_attributes allows host attribute information that is usually in the database to be retrievable from the command line arguments. If --lab is pulled in, autoserv will request the host attributes from the database at test runtime. From here, this change updates the concept of the "machines" list that test control files receive to now be a list of dicts that contain the machine hostname and host attributes. This will enable identifying information the hosts library needs to create host objects to be available whether or not there is a database present. BUG=chromium:564343 TEST=local autoserv runs. Also verified scheduler changes work via MobLab. Waiting on trybot results. DEPLOY=scheduler Change-Id: I6021de11317e29e2e6c084d863405910c7d1a71d Reviewed-on: https://chromium-review.googlesource.com/315230 Commit-Ready: Simran Basi <sbasi@chromium.org> Tested-by: Simran Basi <sbasi@chromium.org> Reviewed-by: Simran Basi <sbasi@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
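The reshaped "machines" value the commit above describes can be sketched like this; the helper name and dict keys are assumptions for illustration, not the exact autotest schema.

```python
# Turn a plain hostname list into the list-of-dicts shape described in
# the commit: each entry carries the hostname plus its host attributes,
# sourced either from the database (--lab) or the command line.
def build_machines(hostnames, attribute_store):
    return [{'hostname': h,
             'host_attributes': attribute_store.get(h, {})}
            for h in hostnames]

# Illustrative attribute data; hosts absent from the store get an empty dict.
attrs = {'chromeos1-row1-host1': {'serial_number': 'ABC123'}}
machines = build_machines(['chromeos1-row1-host1', 'chromeos1-row1-host2'],
                          attrs)
```

Control files can then create host objects from each dict whether or not a database is present.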
|
80f7c5339175966a1fad1cb0d6b5fbbab46ba032 |
|
25-Aug-2015 |
Dan Shi <dshi@chromium.org> |
[autotest] Replace scheduler email alerts with stats and metadata logging. The emails are not manageable. Replace with logs in metadata and stats server. BUG=chromium:524243 TEST=unittest, local scheduler run Change-Id: Ib9854576b64edfb428195d5cb323b8f883e635d5 Reviewed-on: https://chromium-review.googlesource.com/295482 Tested-by: Dan Shi <dshi@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
b92af21b84c5e27af7f2023ea54409c124d0968e |
|
10-Apr-2015 |
Paul Hobbs <phobbs@google.com> |
[autotest] Remove per-tick process restriction. The per-tick process restriction was causing a performance problem when a tick took a long time, and there isn't a good reason to keep the per-tick process constraint as there is already a total process constraint. TEST=Ran the scheduler. The unit tests pass. BUG=chromium:471352 Change-Id: I2b669fb758fbcc898e1727da51bd6d4cd99cd5d2 Reviewed-on: https://chromium-review.googlesource.com/265072 Trybot-Ready: Paul Hobbs <phobbs@google.com> Tested-by: Paul Hobbs <phobbs@google.com> Commit-Queue: Paul Hobbs <phobbs@google.com> Reviewed-by: Fang Deng <fdeng@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
d615a1e8c37ff5af145d9ef6b5af219c20728321 |
|
04-Sep-2014 |
Jakob Juelich <jakobjuelich@chromium.org> |
[autotest] Fix missing execution_subdir for hostless jobs The status of hostless jobs is set to Starting in schedule_new_jobs. If the scheduler is interrupted after doing that, it will try to restore the agents after starting again. The execution_subdir is not set at that point though. Therefore an assertion will fail and an exception will be raised. Before this commit, the execution_subdir is set to 'hostless' in the prolog of hostless jobs. This commit moves this also to start_new_jobs, before setting the status, so when the status is set to Starting, the execution subdir will always be already set. In case the scheduler is interrupted after setting the execution_subdir but before setting the status, nothing bad will happen as the execution_subdir is never accessed if the status isn't Starting, Running, Gathering, Parsing or Archiving. BUG=chromium:334353 DEPLOY=scheduler TEST=Ran utils/unittest_suite.py and manually killed+restarted scheduler Change-Id: I048bf18883857d6ff5016ace64526729f631bc26 Reviewed-on: https://chromium-review.googlesource.com/215394 Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Jakob Jülich <jakobjuelich@chromium.org> Tested-by: Jakob Jülich <jakobjuelich@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
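The ordering fix in the commit above can be sketched as a two-write invariant; class and field names are illustrative, not the real scheduler models.

```python
# Sketch: the execution_subdir must be written before the status becomes
# Starting, so a scheduler killed between the two writes never recovers
# a Starting entry without a subdir.
class HostQueueEntry:
    def __init__(self):
        self.status = 'Queued'
        self.execution_subdir = ''

    def schedule_hostless(self):
        # Write the subdir first ...
        self.execution_subdir = 'hostless'
        # ... then the status; the invariant below now holds at every
        # intermediate point of this method.
        self.status = 'Starting'

def recover(entry):
    """Recovery-time assertion the scheduler relies on."""
    if entry.status == 'Starting':
        assert entry.execution_subdir, 'Starting entry lacks execution_subdir'
```

If the scheduler dies between the two assignments, the entry is still Queued and simply gets scheduled again.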
|
f47a6bbb9971efd228eaa22425431b91fa9f69bf |
|
29-Aug-2014 |
Prashanth B <beeps@chromium.org> |
Revert "[autotest] Restore from inconsistent state after the scheduler was interrupted." This reverts commit b7c842f8c8ba135bb03a0862ac0c880d3158bf07. Change-Id: I8d34329b8a2771eb4068ab50414c9eac6fd73d3f Reviewed-on: https://chromium-review.googlesource.com/215612 Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
b7c842f8c8ba135bb03a0862ac0c880d3158bf07 |
|
24-Jul-2014 |
Jakob Juelich <jakobjuelich@google.com> |
[autotest] Restore from inconsistent state after the scheduler was interrupted. If the scheduler assigns hosts to hqes but hasn't set an execution_subdir yet, an exception is thrown. With this, the database will be cleaned up once, when the scheduler starts. Jobs that are in an inconsistent state will just be reset so they can be scheduled again. BUG=chromium:334353 DEPLOY=scheduler TEST=Ran utils/unittest_suite.py and manually set db into inconsistent state. Change-Id: I96cc5634ae5120beab59b160e735245be736ea92 Reviewed-on: https://chromium-review.googlesource.com/209635 Tested-by: Jakob Jülich <jakobjuelich@chromium.org> Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Jakob Jülich <jakobjuelich@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
36accc6a2a572e9d502407b34701f535a169f524 |
|
23-Jul-2014 |
Jakob Jülich <jakobjuelich@google.com> |
[autotest] Fixing and re-enabling monitor_db_functional_test. The test was disabled and outdated. Database access and mocking of the drone manager changed. This fixes these issues, updates the unit tests to the current status and re-enables them. BUG=chromium:395756 DEPLOY=scheduler TEST=ran ./utils/unittest_suite.py Change-Id: I6a3eda5ddfaf07f06d6b403692b004b22939ffb6 Reviewed-on: https://chromium-review.googlesource.com/209567 Reviewed-by: Alex Miller <milleral@chromium.org> Tested-by: Jakob Jülich <jakobjuelich@google.com> Commit-Queue: Jakob Jülich <jakobjuelich@google.com>
/external/autotest/scheduler/monitor_db_unittest.py
|
4ec9867f46deb969c154bebf2e64729d56c3a1d3 |
|
15-May-2014 |
Prashanth B <beeps@google.com> |
[autotest] Split host acquisition and job scheduling II. This cl creates a stand-alone service capable of acquiring hosts for new jobs. The host scheduler will be responsible for assigning a host to a job and scheduling its first special tasks (to reset and provision the host). Thereafter, the special tasks will either change the state of a host or schedule more tasks against it (eg: repair), till the host is ready to run the job associated with the Host Queue Entry to which it was assigned. The job scheduler (monitor_db) will only run jobs, including the special tasks created by the host scheduler. Note that the host scheduler won't go live till we flip the inline_host_acquisition flag in the shadow config, and restart both services. The host scheduler is dead, long live the host scheduler. TEST=Ran the schedulers, created suites. Unittests. BUG=chromium:344613, chromium:366141, chromium:343945, chromium:343937 CQ-DEPEND=CL:199383 DEPLOY=scheduler, host-scheduler Change-Id: I59a1e0f0d59f369e00750abec627b772e0419e06 Reviewed-on: https://chromium-review.googlesource.com/200029 Reviewed-by: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
f66d51b5caa96995b91e7c155ff4378cdef4baaf |
|
06-May-2014 |
Prashanth B <beeps@google.com> |
[autotest] Split host acquisition and job scheduling. This is phase one of two in the plan to split host acquisition out of the scheduler's tick. The idea is to have the host scheduler use a job query manager to query the database for new jobs without hosts and assign hosts to them, while the main scheduler uses the same query managers to look for hostless jobs. Currently the main scheduler uses the class to acquire hosts inline, like it always has, and will continue to do so till the inline_host_acquisition feature flag is turned on via the shadow_config. TEST=Ran the scheduler, suites, unittests. BUG=chromium:344613 DEPLOY=Scheduler Change-Id: I542e4d1e509c16cac7354810416ee18ac940a7cf Reviewed-on: https://chromium-review.googlesource.com/199383 Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
0e960285b022fad77f0b087a2007867363bf6ab9 |
|
14-May-2014 |
Prashanth B <beeps@google.com> |
[autotest] Consolidate methods required to setup a scheduler. Move methods/classes that will be helpful in setting up another scheduler process into scheduler_lib: 1. Make a connection manager capable of managing connections. Create, access, close the database connection through this manager. 2. Cleanup setup_logging so it's usable by multiple schedulers if they just change the name of the logfile. TEST=Ran suites, unittests. BUG=chromium:344613 DEPLOY=Scheduler Change-Id: Id0031df96948d386416ce7cfc754f80456930b95 Reviewed-on: https://chromium-review.googlesource.com/199957 Reviewed-by: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
372613d54bdfd1b708a5d41ced9a80e209e6dc6a |
|
05-May-2014 |
Prashanth B <beeps@google.com> |
[autotest] Sanity check host assignments. Check that we haven't violated any correctness constraints by assigning the same host to 2 simultaneously active jobs. These changes are in preparation for eventually breaking host assignment out of the scheduler. The performance degradation should be negligible since we're only querying for the host_ids of currently active jobs, every 5 minutes. TEST=Ran suites, unittests. BUG=None DEPLOY=Scheduler Change-Id: Ie560a67861f9e4d1d59cda9828fb9d2ef433e5f4 Reviewed-on: https://chromium-review.googlesource.com/198196 Reviewed-by: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
cc9fc70587d37775673e47b3dcb4d6ded0c6dcb4 |
|
02-Dec-2013 |
beeps <beeps@chromium.org> |
[autotest] RDB Refactor II + Request/Response API. Scheduler Refactor: 1. Batched processing of jobs. 2. Rdb hits the database instead of going through host_scheduler. 3. Migration to add a leased column. The scheduler releases hosts back to the rdb every tick. 4. Client rdb host that queue_entries use to track a host, instead of a database model. Establishes a basic request/response api for the rdb: rdb_utils: 1. Requests: Assert the format and fields of some basic request types. 2. Helper client/server modules to communicate with the rdb. rdb_lib: 1. Request managers for rdb methods: a. Match request-response b. Abstract the batching of requests. 2. JobQueryManager: Regulates database access for job information. rdb: 1. QueryManagers: Regulate database access 2. RequestHandlers: Use query managers to get things done. 3. Dispatchers: Send incoming requests to the appropriate handlers. Ignores wire formats. TEST=unittests, functional verification. BUG=chromium:314081, chromium:314083, chromium:314084 DEPLOY=scheduler, migrate Change-Id: Id174c663c6e78295d365142751053eae4023116d Reviewed-on: https://chromium-review.googlesource.com/183385 Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
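The request/response shape the commit above establishes can be sketched minimally; this is an illustrative dispatcher under assumed names, not the actual rdb module.

```python
# Sketch of a type-keyed dispatcher in the spirit of the commit above:
# handlers are looked up per request type and batched requests are
# matched back to responses in order.
class Dispatcher:
    def __init__(self):
        self._handlers = {}

    def register(self, request_type, handler):
        self._handlers[request_type] = handler

    def dispatch(self, requests):
        # One response per request, preserving request order so callers
        # can match them up (the "request manager" role in the commit).
        return [self._handlers[r['type']](r) for r in requests]

dispatcher = Dispatcher()
dispatcher.register('acquire_hosts', lambda r: {'hosts': ['host1']})
responses = dispatcher.dispatch([{'type': 'acquire_hosts'}])
```

Keeping the dispatcher ignorant of wire formats means handlers can be tested with plain dicts.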
|
76af802bd80edf50fd34efae25205c3aeaf82f25 |
|
19-Oct-2013 |
Dan Shi <dshi@chromium.org> |
[autotest] abort Starting suite job leads to scheduler crash Aborting a Starting suite job leads to a scheduler crash with error: AssertionError: self.execution_subdir not found BUG=chromium:309207,276507 TEST=unittest, and manual test: 1. add |max_hostless_processes: 1| and |max_processes_per_drone: 3| to SCHEDULER section of shadow config. 2. restart scheduler 3. add three new suite jobs. When the third job shows status of Starting in afe, try to abort it in afe. 4. abort all other suite jobs, and scheduler should abort all suite jobs. DEPLOY=scheduler Change-Id: I763918c34569643edb5e0acd94a3ca54cc6e5949 Reviewed-on: https://chromium-review.googlesource.com/173770 Reviewed-by: Dan Shi <dshi@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
7d8273bad1318c13698a162a6e5910bea060d167 |
|
06-Nov-2013 |
beeps <beeps@chromium.org> |
[autotest] RDB refactor I Initial refactor for the rdb, implements part 1 in this schematic: https://x20web.corp.google.com/~beeps/rdb_v1_midway.jpg Also achieves the following: - Don't process an hqe more than once, after having assigned a host to it. - Don't assign a host to a queued, aborted hqe. - Drop the metahost concept. - Stop using labelmetahostscheduler to find hosts for non-metahost jobs. - Include a database migration script for jobs that were still queued during the scheduler restart, since they will now need a meta_host dependency. This cl also doesn't support the scheduler's ability to: - Schedule an atomic group * Consequently, also the ability to block a host even when the hqe using it is no longer active. - Schedule a metahost differently from a non-metahost * Both metahosts and non-metahosts are just labels now * Jobs which are already assigned hosts are still given precedence, though - Schedule based on only_if_needed. And fixes the unittests appropriately. TEST=Ran suites, unittests. Restarted scheduler after applying these changes and tested migration. Ran suite scheduler. BUG=chromium:314082,chromium:314219,chromium:313680,chromium:315824,chromium:312333 DEPLOY=scheduler, migrate Change-Id: I70c3c3c740e51581db88fe3ce5879c53d6e6511e Reviewed-on: https://chromium-review.googlesource.com/175957 Reviewed-by: Alex Miller <milleral@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
5e2bb4aa28611aaacaa8798fd07943ede1df46c6 |
|
28-Oct-2013 |
beeps <beeps@chromium.org> |
[autotest] Scheduler refactor. Break scheduler into simpler modules. This change also modifies run_pylint to check for undefined variables. BUG=chromium:312338 TEST=Ran smoke suite against multiple duts. Triggered agents like repair, verify etc. Pylint, Unittests. DEPLOY=scheduler Change-Id: Ibd685a27b5b50abd26cdf2976ac4189c3e9acc0a Reviewed-on: https://chromium-review.googlesource.com/174080 Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
7d8a1b139e6390f6dad06bf0cf2bfeb9a4a69304 |
|
30-Oct-2013 |
beeps <beeps@chromium.org> |
[autotest] De-prioritize hostless hqes in favor of tests. Currently, hostless hqes get precedence over tests. In situations when we're flooded with suites this is a problem, as it leads to a deadlock situation where many hostless jobs are waiting on tests that the drone doesn't have the capacity to run. Note that even after this change such scenarios are possible, just a little less likely. TEST=Started suites, set a low limit, checked that we run tests before hostless jobs. Checked that we prioritize (host+no metahost) over (no host+metahost). Ran unittests. BUG=chromium:312847 DEPLOY=scheduler Change-Id: Ibe66e8a0b6319561cc24e491ec7b9b370a840bad Reviewed-on: https://chromium-review.googlesource.com/175028 Tested-by: Prashanth B <beeps@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
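The prioritization described in the commit above can be sketched as a sort key; field names are illustrative, not the real HostQueueEntry columns.

```python
# Sketch: entries that already have a host sort first, metahost entries
# next, and hostless (suite) entries last, so tests run before the
# hostless jobs that wait on them.
def schedule_order(entry):
    if entry.get('host') is not None:
        return 0  # host + no metahost: highest priority
    if entry.get('meta_host') is not None:
        return 1  # no host + metahost
    return 2      # hostless: scheduled last

entries = [
    {'host': None,    'meta_host': None},       # hostless suite job
    {'host': 'host1', 'meta_host': None},       # test with a host
    {'host': None,    'meta_host': 'board:x'},  # metahost test
]
ordered = sorted(entries, key=schedule_order)
```

Because the sort is stable, entries within the same tier keep their original (e.g. FIFO) order.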
|
d0e09ab5697f48012bdf4b426d55cd0fb58f4926 |
|
10-Sep-2013 |
Dan Shi <dshi@chromium.org> |
[autotest] Fix SelfThrottledTask._num_running_processes when suite job is aborted When a suite job is aborted, the variable SelfThrottledTask._num_running_processes is not decremented. The cause is that the abort call, AbstractQueueTask.abort, bypasses the call to SelfThrottledTask.finished. The change is made in the Agent.abort method. When a task is aborted from an AgentTask, BaseAgentTask.finished(False) is called to allow the finished method in SelfThrottledTask to be called to update the counters properly. BUG=chromium:288175 TEST=unittest, add logging in SelfThrottleTask._increment_running_processes and _decrement_running_processes methods to print out value of _num_running_processes. Start scheduler (monitor_db) in local workstation, create several suite jobs via run_suite, cancel some of the suite jobs. After all jobs are finished or aborted, confirm value of _num_running_processes are all 0. Change-Id: I80545fc68a75db645c9b8b5330b05b64e7609a9d Reviewed-on: https://chromium-review.googlesource.com/168649 Reviewed-by: Alex Miller <milleral@chromium.org> Tested-by: Dan Shi <dshi@chromium.org> Commit-Queue: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
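The counter bug the commit above fixes can be sketched in a few lines; the class mirrors names from the commit text but the implementation is illustrative, not the real autotest code.

```python
# Sketch: abort must route through finished() so the class-level
# running-process counter is decremented even for aborted tasks.
class SelfThrottledTask:
    _num_running_processes = 0

    def start(self):
        type(self)._num_running_processes += 1
        self._started = True

    def finished(self, success):
        # Decrement exactly once per started task, success or not.
        if getattr(self, '_started', False):
            type(self)._num_running_processes -= 1
            self._started = False

    def abort(self):
        # The fix: an aborted task still calls finished(False) instead of
        # bypassing it, keeping the throttling counter accurate.
        self.finished(False)
```

Without the `abort` → `finished(False)` path, every aborted suite would leak one slot of the process quota forever.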
|
1f23b6918dd601f469eea2975f3e8dbda6659b58 |
|
14-May-2013 |
Aviv Keshet <akeshet@chromium.org> |
[autotest] reenable django or simplejson requiring unit tests For a refactor of control_type, a lot of the relevant unit test coverage is currently blacklisted in utils/unittest_suite.py due to requiring Django or simplejson. This CL un-blacklists those tests. It also therefore fixes a few tests which are failing or broken. In particular: tko/rpc_interface_unittest was throwing `DatabaseError: only a single result allowed for a SELECT that is part of an expression` all over the place. I couldn't blacklist just this test file, since it has the same filename as afe/rpc_interface_unittest.py which we do not want to blacklist, and with the way blacklists work in utils/unittest_suite I couldn't blacklist just one of them. Instead, I have renamed the broken test file to rpc_interface_unittest_fixme.py. tko/resources_test imports this failing module, so I had to rename that one too. One test in resources_test was throwing the same exception as above, so I commented it out (test_keyval_filtering). In monitor_db_unittest, the test test_HostScheduler_get_host_atomic_group_id throws KeyErrors that seem to be related to labels not being correctly set up or committed to the test database in the test setup phase. After trying to blindly track this down a bit, I realized I was in over my head and just commented out this specific test. monitor_db_functional_test was throwing `DatabaseError: near "TRUNCATE": syntax error` all over the place, so I blacklisted that test file in utils/unittest_suite.py BUG=chromium:240643 TEST=utils/unittest_suite.py # All tests pass Change-Id: I8fdbe048b04516548e96bd888ed74e9fc82a2d88 Reviewed-on: https://gerrit.chromium.org/gerrit/51190 Commit-Queue: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
72822020510c0bef3e242a00da492ce7a6ad55f1 |
|
12-Apr-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Do not write queue or .machines files. These files were causing us to not properly redirect to GS for job folders. However, these files don't really serve a purpose for us, so rather than fixing up and running gs_offloader to handle these files, it's better to just not write them in the first place. BUG=chromium:230838 DEPLOY=scheduler TEST=Ran a job, made sure queue and .machines files don't appear Change-Id: Ie1f0014b31f2ed274cad6ce03d98d7f6ce947f43 Reviewed-on: https://gerrit.chromium.org/gerrit/48182 Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Scott Zawalski <scottz@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db_unittest.py
|
aa5133608fb8ea153fb396f332121b617869dcb7 |
|
02-Mar-2011 |
Dale Curtis <dalecurtis@chromium.org> |
Host scheduler refactoring. Move HostScheduler out of monitor_db. In order to facilitate site extensibility of HostScheduler we need to factor out the dependence on global variables in monitor_db. I modeled this refactoring off of monitor_db_cleanup. The main changes I've made are as follows: 1. Move BaseHostScheduler, site import, and SchedulerError out of monitor_db. SchedulerError must be moved to prevent a cyclical dependency. 2. Convert staticmethod/classmethods in BaseHostScheduler, to normal methods. 3. Fix unit tests and monitor_db to import SchedulerError from host_scheduler. Change-Id: I0c10b79e70064b73121bbb347bb71ba15e0353d1 BUG=chromium-os:12654 TEST=Ran unit tests. Tested with private Autotest instance. Review URL: http://codereview.chromium.org/6597047
/external/autotest/scheduler/monitor_db_unittest.py
|
dd77e01701cef9e97f586294565f1fed41d0e7f8 |
|
28-Apr-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix an error in drone sets in monitor_db. Also added more unit tests. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4449 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
76fcf19ec42d5c7580d2e7891e4610e5fe725286 |
|
21-Apr-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add ability to associate drone sets with jobs. This restricts a job to running on a specified set of drones. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4439 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
47bd737d76b61b40f4f321a1e88919caf74dacc3 |
|
13-Mar-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Set hostless queue entries to STARTING upon scheduling the agent. This fixes an issue where the scheduler created multiple HostlessQueueTask objects for a single hostless queue entry, causing several autoserv processes to be launched when the agents are run. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4304 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
c44ae99354228290914326d42ef1e743b5b7e4b8 |
|
19-Feb-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Refactor scheduler models into a separate module, scheduler_models. This module doesn't depend on monitor_db, only the other way around. The separation and isolation of dependencies should help us organize the scheduler code a bit better. This was made possible largely by the many changes we made late last year to improve statelessness of the scheduler. It was motivated here by my work on pluggable metahost handlers, which will need to depend on scheduler models. Without this separation, we'd end up with circular dependencies. Also includes some fixes for metahost schedulers. Signed-off-by: Steve Howard <showard@google.com> Property changes on: scheduler/scheduler_models.py git-svn-id: http://test.kernel.org/svn/autotest/trunk@4252 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
883492a628bfe5a24bd281cfcac036d77a2acc4e |
|
12-Feb-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
First iteration of pluggable metahost handlers. This change adds the basic framework and moves the default, label-based metahost assignment code into a handler. It includes some refactorings to the basic scheduling code to make things a bit cleaner. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4232 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
64a9595406f2884fb3ece241190b10aa054439a9 |
|
13-Jan-2010 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
When using Django models from a script, make the current user default to an actual database user named "autotest_system". This allows for simpler, more consistent code. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4114 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
78f5b016b5367cb51b1f031b31e3afea6ebd2d74 |
|
23-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Update to Django 1.1.1. I want to use a new feature for my RESTful interface prototyping (direct inclusion of URL patterns in URLconfs). The one obstacle this presented was that Django 1.1.1 changes the DB connection object to accept DB config information in its constructor, rather than reading it from django.conf.settings on-demand. This was a problem because we change stuff in django.conf.settings on the fly to do our fancy test DB stuff -- basically, we initialize a SQLite DB once, copy it off, and then copy it between test cases, rather than clearing and reconstructing the initial DB. I did measurements and it turns out all that jazz wasn't really saving us much time at all, so I just got rid of it all. Django's testing stuff has improved and v1.1 even has some new tricks for using transactions to accomplish the above with a dramatic speedup, so we ought to look into using that in the future. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4041 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
eab66ce582bfe05076ff096c3a044d8f0497bbca |
|
23-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Rename the tables in the databases, by prefixing the app name. This is in preparation for merging the two databases and the two Django projects into one. Note that this renames *all* standard Autotest DB tables in both the autotest_web and tko databases. If you have scripts written directly against these databases, *they will break*. If your scripts access the RPC interfaces, they should continue to work. Another patch will be along within the next few weeks to actually move the TKO tables into the autotest_web database. From: James Ren <jamesren@google.com> Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4040 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
f13a9e2b856ae9e4e2f43ef6cbc6083c7435167b |
|
18-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add periodic CPython garbage collector statistics logging to aid in tracking down a memory leak and as a general health beacon for the long running process. The interval at which stats are logged is configurable. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4021 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
f65b7402e74e86e9b654fb6343800c212ac3bc05 |
|
18-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix a rather brittle scheduler unit test Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4019 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
d11956572cb7a5c8e9c588c9a6b4a0892de00384 |
|
08-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make drone_manager track running process counts using only the information passed in from the scheduler. Currently it also uses process counts derived from "ps", but that is an unreliable source of information. This improves accuracy and consistency and gives us full control over the process. This involves a few primary changes: * made the drone_manager track process counts with each PidfileId * added method declare_process_count() for the scheduler to indicate the process count of a pidfile ID during recovery (in other cases, the DroneManager gets that info in execute_process()) Doing this involved some extensive refactorings. Because the scheduler now needs to declare process counts during recovery, and because the AgentTasks are the entities that know about process counts, it made sense to move the bulk of the recovery process to the AgentTasks. Changes for this include: * converted a bunch of AgentTask instance variables to abstract methods, and added overriding implementations in subclasses as necessary * added methods register_necessary_pidfiles() and recover() to AgentTasks, allowing them to perform recovery for themselves. got rid of the recover_run_monitor() argument to AgentTasks as a result. * changed recovery code to delegate most of the work to the AgentTasks. The flow now looks like this: create all AgentTasks, call them to register pidfiles, call DroneManager to refresh pidfile contents, call AgentTasks to recover themselves, perform extra cleanup and error checking. This simplified the Dispatcher somewhat, in my opinion, though there's room for more simplification. Other changes include: * removed DroneManager.get_process_for(), which was unused, as well as related code (including the DroneManager._processes structure) * moved logic from HostQueueEntry.handle_host_failure to SpecialAgentTask._fail_queue_entry. That was the only call site. And some other bug fixes: * eliminated some extra state from QueueTask * fixed models.HostQueueEntry.execution_path(). It was returning the wrong value, but it was never used. * eliminated some big chunks from monitor_db_unittest. These broke from the refactorings described above and I deemed it not worthwhile to fix them up for the new code. I checked and the total coverage was unaffected by deleting these chunks. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4007 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
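The two bookkeeping paths the commit above describes, launch-time counting via execute_process() and recovery-time counting via declare_process_count(), can be sketched as follows; the class is illustrative, modeled on names from the commit text rather than the real DroneManager.

```python
# Sketch: process counts come only from the scheduler, never from "ps".
class DroneManager:
    def __init__(self):
        self._process_counts = {}

    def execute_process(self, pidfile_id, num_processes):
        # Normal path: the count is recorded when the command launches.
        self._process_counts[pidfile_id] = num_processes

    def declare_process_count(self, pidfile_id, num_processes):
        # Recovery path: AgentTasks re-declare counts for pidfiles that
        # already existed before the scheduler restarted.
        self._process_counts[pidfile_id] = num_processes

    def total_running_processes(self):
        return sum(self._process_counts.values())

manager = DroneManager()
manager.execute_process('pidfile-1', 1)        # freshly launched task
manager.declare_process_count('pidfile-2', 4)  # recovered multi-host job
```

Either path yields the same bookkeeping, which is what lets throttling decisions survive a restart.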
|
d07a5f3bd8edb843da7f1568bd7be06c32761e11 |
|
07-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
The check for enough pending hosts after the delay to wait for others to become ready before moving from Pending -> Starting on an atomic group job was checking against the wrong value and requiring too many hosts. As a result some jobs never ran. Also, it was not aborting the job which left these HostQueueEntries and Hosts in limbo (until the job timeout would eventually hit a couple days later). Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3998 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
418785bf16a0cb72a5fe5519e8693d7546cd427d |
|
23-Nov-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Some improvements to process tracking in the scheduler. * have all AgentTasks declare how many processes they'll create (as an instance attribute). this is really where the information belongs. * have Agent read its num_processes from its AgentTask, rather than requiring clients to pass it into the constructor. * have AgentTasks pass this num_processes value into the DroneManager when executing commands, and have the DroneManager use this value rather than the hack of parsing it out of the command line. this required various changes to the DroneManager code, which actually fix some small bugs and make the code cleaner in my opinion. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3971 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
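A minimal sketch of the num_processes change described above; the class names echo the scheduler's (Agent, AgentTask, QueueTask) but the bodies are illustrative only.

```python
# Sketch: each AgentTask declares its own process count as an instance
# attribute, and Agent reads it from the task instead of taking it as a
# constructor argument.
class AgentTask(object):
    num_processes = 1  # most tasks run a single autoserv process

class QueueTask(AgentTask):
    def __init__(self, num_hosts):
        # a job's autoserv counts one process per host for throttling
        self.num_processes = num_hosts

class Agent(object):
    def __init__(self, task):
        self.task = task
        # num_processes now comes from the task itself
        self.num_processes = task.num_processes

agent = Agent(QueueTask(num_hosts=2))
print(agent.num_processes)  # 2
```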
|
9bb960b90d5102cce1c8a15314900035c6c4e69a |
|
19-Nov-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Support restricting access to drones by user. Administrators can put lines like <hostname>_users: showard,scottz in the global config, where <hostname> is a drone hostname. That drone will then be limited to use by those users (that is, by jobs launched by those users, and tasks launched due to those jobs). This required numerous changes: * added a requested_by field to SpecialTask (with corresponding migration). For tasks with queue_entries, we can infer this from the job, but for those without, we need this information explicitly declared. Note this can be null if the task was created by the system, not in response to any user action. The only place this occurs now is in scheduler recovery (Dispatcher._recover_hosts_where()), but there may be an upcoming feature to periodically reverify hosts, which would be another (much more common) case. * modified all SpecialTask creation sites to pass requested_by if necessary. * modified AgentTask to keep a username attribute, and modified its run() method to pass that to PidfileRunMonitor.run(), which passes it along to DroneManager.execute_command(). * modified Agent to always keep self.task around, there's no reason to throw it away and now that we're looking at it from other classes, it's problematic if it disappears. * modified Dispatcher throttling code to pass the username when requesting max runnable processes. * added an allowed_users property to _AbstractDrone, and made DroneManager load it from the global config. * made DroneManager's max_runnable_processes() and _choose_drone_for_execution() methods accept the username and obey user restrictions. * added extensive tests for everything. the modifications required to monitor_db_unittest were annoying but not too bad. but parts of that file may need to be removed as they'll be obsoleted by monitor_db_functional_test and they'll become increasingly annoying to maintain. couple other related changes: * got rid of CleanupHostsMixin.
it was only actually needed by GatherLogsTasks (since we made the change to have GatherLogsTask always run), so I inlined it there and simplified code accordingly. * changed a bunch of places in the scheduler that were constructing new instances of Django models for existing rows. they would do something like "models.Host(id=<id of existing host>)". that's correct for scheduler DBModels, but not for Django models. For Django models, you only instantiate new instances when you want to create a new row. for fetching existing rows you always use a manager -- Model.objects.get() or Model.objects.filter() etc. this was an existing bug but wasn't exposed until I made some of the changes involved in this feature. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3961 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
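The user-restriction lookup might look roughly like this; the `<hostname>_users` config key is from the commit message, while the helper names and dict-based drone records are assumptions for illustration.

```python
# Sketch of per-drone user restrictions: a missing "<hostname>_users"
# config entry means the drone is unrestricted.
def parse_drone_users(global_config, hostname):
    value = global_config.get('%s_users' % hostname)
    if value is None:
        return None  # no restriction: any user may run on this drone
    return set(name.strip() for name in value.split(','))

def usable_drones(drones, username):
    # A drone with allowed_users=None accepts everyone; otherwise the
    # requesting job's owner must appear in the list.
    return [drone for drone in drones
            if drone['allowed_users'] is None
            or username in drone['allowed_users']]

config = {'drone1_users': 'showard,scottz'}
drones = [
    {'name': 'drone1', 'allowed_users': parse_drone_users(config, 'drone1')},
    {'name': 'drone2', 'allowed_users': parse_drone_users(config, 'drone2')},
]
print([d['name'] for d in usable_drones(drones, 'gps')])  # ['drone2']
```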
|
e60e44ece1445d97977a77cb79f0896989b869d7 |
|
13-Nov-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Special tasks show "Failed" as their status instead of "Completed" if they failed Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3946 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
7ca9e01f5ef84af6e4f0649d8291e05ee158e833 |
|
10-Nov-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Remove the synch_job_start_timeout_minutes scheduler "feature": it is pretty much broken by design, being based off the job create time rather than the time the job's hosts went into Pending. It's not being used, so it's easier to remove it. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3921 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
8375ce0795fa95fcb4698790ed4db8827f190116 |
|
12-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix unindexable object error raised on the error path within _schedule_running_host_queue_entries. Also cleans up some docstrings. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3830 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
d20148295ae80208334474587277580ecacaed92 |
|
12-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
When a delayed call task finishes waiting for extra hosts to enter Pending state on an atomic group job, re-confirm that the job still has enough Pending hosts to run. It could have been Aborted either manually or due to a timeout meaning it should no longer be run. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3820 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
dae680a5d4ff80f540aadfb6f3687a9bceaf473c |
|
12-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Ignore microsecond differences in datetimes when checking existing in-memory rows against database rows. datetime objects store microseconds but the database datetime fields do not; not doing this leads to unnecessary warnings. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3819 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
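The fix amounts to comparing at whole-second resolution, since database DATETIME columns drop microseconds. A sketch (the function name is made up, not the scheduler's):

```python
import datetime

# Compare two datetimes while ignoring microseconds, since a value read
# back from a DATETIME column has lost them.
def datetimes_equal_ignoring_microseconds(left, right):
    return left.replace(microsecond=0) == right.replace(microsecond=0)

in_memory = datetime.datetime(2009, 10, 12, 8, 30, 15, 123456)
from_db = datetime.datetime(2009, 10, 12, 8, 30, 15)
print(datetimes_equal_ignoring_microseconds(in_memory, from_db))  # True
```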
|
ec6a3b9b60f9e7e6ff26c1c7547f557043b9d52f |
|
25-Sep-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make the pidfile timeout in the scheduler configurable. Raise the default from 5 minutes to 5 hours (the value we're using on our server due to 5 minutes being short enough to cause issues when under heavy load). Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3767 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
db502763a2ece3f2aea7b1badca20a6e0b9d3ed7 |
|
09-Sep-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Write host keyvals for all verify/cleanup/repair tasks. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3677 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
8cc058f50a46976e0a446aa3054f7f2349d6291a |
|
08-Sep-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make scheduler more stateless. Agents are now scheduled only by the dispatcher, and agents no longer use in-memory state to remember multiple tasks. All state is managed by the database. Risk: high (large scheduler change) Visibility: medium (scheduler restarts are now more stable) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3664 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
cdaeae86c156dece62e29afdd4a9976a922883aa |
|
31-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fixed bug where scheduler would crash if the autoserv process is lost during verify/cleanup/repair. Risk: low Visibility: medium Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3627 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
6631273af8b88842cbd6202cc4615daf050cc957 |
|
27-Aug-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make a bunch of stuff executable git-svn-id: http://test.kernel.org/svn/autotest/trunk@3621 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
58721a8b8d9562579f2e45fdd80db2f67d58a6ac |
|
21-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
One-off fix to address the issue where a scheduler shutdown immediately after a special task leaves the HQE in a strange state. Specifically, we saw this when a cleanup fails, and the scheduler shuts down before the associated repair starts. HQEs are now requeued after a failed cleanup/verify. TODO: reimplement scheduler to maintain less state in memory by not relying on storing an array of AgentTasks. Risk: medium (scheduler change) Visibility: medium (scheduler bug fix) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3573 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
6d1c143fb40a752a6d801cf91523f76e505f6054 |
|
21-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix scheduler's handling of jobs when the PID file can't be found. Risk: low Visibility: medium (scheduler bug fix) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3568 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
708b3523a34dc9e874fd9488f3d9e306cf0ebc4e |
|
21-Aug-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Do not go through a DelayedCallTask on atomic group jobs when all Hosts assigned to the job have entered Pending state. There are no more left to wait for. Adds a log message prior to the delay starting for easier debugging. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3564 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
1ef218db76c473c28627377d8f50d6e6c6743289 |
|
03-Aug-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
This is the result of a batch reindent.py across our tree. As Martin pointed out, we ought to be more careful and create a pre-svn commit script to avoid inserting trash in the tree, meanwhile, this is a good start to cleanup things Signed-off-by: Lucas Meneghel Rodrigues <lmr@redhat.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3487 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
a5288b4bb2b09aafe914d0b7d5aab79a7e433eaf |
|
28-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Upgrade from Django 0.96 to Django 1.0.2. Risk: high (framework change) Visibility: medium Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3457 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
a640b2d5cc00ca83f6c41a663225f9a41890f6bf |
|
21-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix scheduler bug with aborting a pre-job task. Scheduler was crashing when a job was aborted during the cleanup phase. Risk: medium (scheduler change) Visibility: high (critical bug fix) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3425 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
8ac6f2a349f88e28b551f394eb6f68e1922ed396 |
|
16-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
When a SpecialAgentTask is passed an existing SpecialTask, set the _working_directory upon object construction. It was previously set in prolog(), but recovery agents don't run prolog, but they still need _working_directory sometimes (i.e. when a RepairTask fails). Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3419 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
381341a00cb7cccaf90d84ec466524c3b6376429 |
|
15-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Enter the mock objects created in AgentTasksTest of monitor_db_unittest into the database. This is for compatibility with Django 1.0, which is more strict with foreign key relations than Django 0.96. Risk: low Visibility: low Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3418 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
cfd4a7ecdbc9b7fc1ad4b3667c97f01496316c5e |
|
11-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
With the new SpecialTask recovery code, a RepairTask can be passed a queue entry that was previously requeued. So make sure the task leaves the HQE alone in that case. Also delete some dead code that called requeue(). Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3411 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
b6681aa638dc774a296a3627bfc1198a6eb2a99e |
|
08-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
SpecialAgentTasks can be aborted if they're tied to a job that gets aborted while they're active. In that case, we still need to update the SpecialTask entry to mark it as complete. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3386 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
ed2afea4ca6e23a82d20d1f2ee1067d0c25a8cc2 |
|
07-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
make SpecialTasks recoverable. this involves quite a few changes. * run tasks in determined dirs instead of temp dirs. the dir paths look like hosts/<hostname>/<task id>-<task name>, for example, hosts/myhost/4-verify. the ID comes from the SpecialTask DB row. this allows us to find the pidfile when we go looking for it during recovery, and it makes it simple to find the logs for any given special task, much like for HostQueueEntries. added SpecialTask.execution_path() for this purpose, and added models_test to test it. * added execution_path() to HostQueueEntry to match the interface of SpecialTask, allowing for more polymorphism, and changed most call sites to use it. * since we're running in these dirs, copy the full results back in these dirs, instead of just copying a single log file. * move process recovery code up into AgentTask, so that all AgentTasks can share the same generic process recovery code. * change SpecialTask recovery code to do process recovery. * change VerifyTask handling of multiple pending verify requests for a machine. instead of updating all the requests, just delete all other tasks. they're not specially tracked in any way so it's simplest to just delete them. * made special tasks get marked is_active=False when they complete, to be consistent with HQEs other changes: * added null=True to SpecialTask.time_started definition * made EmailManager.enqueue_notify_email always log the message, and removed explicit logging calls from call sites * added feature to DroneManager.execute_command() to automatically substitute the working directory into the command. this avoids some duplicate information being passed around and simplifies the unit test. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3380 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
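The directory layout described above can be illustrated with a small helper; the function name is hypothetical, standing in for what SpecialTask.execution_path() computes.

```python
# Sketch: special-task results live in a deterministic directory named
# after the host, the SpecialTask row ID, and the task type, so recovery
# code can find pidfiles and logs without temp-dir bookkeeping.
def special_task_execution_path(hostname, task_id, task_name):
    return 'hosts/%s/%s-%s' % (hostname, task_id, task_name.lower())

print(special_task_execution_path('myhost', 4, 'Verify'))  # hosts/myhost/4-verify
```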
|
6157c63947d2d628d187a084acb0a48473af1c79 |
|
06-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make the scheduler robust to finding a HostQueueEntry with more than one atomic group label. Log a detailed error message and continue rather than bailing out with a SchedulerError. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3373 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
2fe3f1df42f5fd1dc6296219df289851dcf77025 |
|
06-Jul-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Enter all Verify/Cleanup/Repair tasks into the special_tasks table. Also keep track of which Host Queue Entry (if any) each Verify/Cleanup/Repair task belongs to. Additionally, implement recovery for jobs in Verify/Cleanup/Repair (i.e., do not simply reverify the host and requeue the job). Risk: medium (scheduler changes) Visibility: medium (functionality change) Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3372 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
e7d9c605bacd7b1816987994ae18a68c63306a16 |
|
02-Jul-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make the job execution tag available in both the server and client side job objects as job.tag. This is useful if your job would like to copy its data off directly to a results repository on its own from the client machine. Mostly small changes to pass the data down, though I did some docstring cleanup near code that I touched which makes the diff larger. The execution tag is taken from the autoserv -P parameter if supplied and no explicit --execution-tag parameter is supplied. This prevents the need to change monitor_db.py to pass yet another autoserv parameter. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3359 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
e9c6936b69cbf3fe5d292c880c81c5662231bd3d |
|
30-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Pass --verbose flag for verify/repair/cleanup. Since we currently log these via piped console output, we want verbose output. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3326 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
b562645f954117559e1ad8e0e8e607e11d9794f7 |
|
30-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
ensure hosts get cleaned up even in the rare but possible case that a QueueTask finds no process at all Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3325 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
2924b0ac9e0ca35e2cd45a23b60ecfc204360c44 |
|
19-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Ensure one-time-hosts aren't in the Everyone ACL, and make the scheduler ignore this. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3299 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
af8b4ca5837e8a8488ad80df75815bf320cb3da1 |
|
16-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix _atomic_and_has_started() to check *only* for states that are a direct result of Job.run() having been called. This was preventing atomic group jobs from running if any machine failed Verify before Job.run was called. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3289 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
77182562edaaeeffcb98f48a7236a727136aa8ec |
|
10-Jun-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Have the scheduler wait a configurable amount of time before starting atomic group jobs: once the minimum synch count of hosts is available in Pending state, it keeps waiting for up to AtomicGroup.max_number_of_hosts hosts to become available. Adds a DelayedCallTask class to monitor_db along with logic in the Job class to use this to delay the job becoming ready to run for a little while, as well as making sure the job is run at the end of the delay without needing to wait for another host to change state. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3236 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
184a5e885d70ddb6c6e11159be90099c29a3256f |
|
29-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
make AgentTasksTest inherit from BaseSchedulerTest. it didn't use to, since it didn't have any DB dependencies, but the recent introduction of SpecialTasks has changed that, so we need AgentTasksTest to set up the DB now like everything else. It doesn't increase the unit test runtime too drastically. Signed-off-by: Steve Howard <showard@google.com> Autotest@test.kernel.org http://test.kernel.org/cgi-bin/mailman/listinfo/autotest git-svn-id: http://test.kernel.org/svn/autotest/trunk@3184 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
b6d1662e18d756483d5fd81f4057cae4ef62152c |
|
26-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
fix JobManager.get_status_counts, which was returning incorrect counts in some cases when jobs were aborted. the problem was that it's possible for a complete entry to have aborted set or not and have the same full status, which was violating an assumption of the method. to test it, instead of adding stuff to the doctests (which would be messy in this particular case, since we need to reach in and mess with HQE statuses), I instead started a new rpc_interface_unittest, which seems to be the way of the future. since it shared a bunch of logic with the scheduler unit test (which also depends on setting up a fake AFE database), I extracted common logic into frontend/afe/frontend_test_utils.py. I also fixed up some of the logic extracted from monitor_db_unittest for reusing an initial DB between tests. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3177 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
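The underlying fix, keying the count on both the status string and the aborted flag rather than on status alone, might be sketched like this (field and function names are simplified assumptions, not the real JobManager code):

```python
# Sketch: a complete entry's effective status depends on its aborted
# flag as well as its status string, so the two must be counted together.
def get_status_counts(entries):
    counts = {}
    for entry in entries:
        if entry['aborted'] and entry['complete']:
            status = 'Aborted'
        else:
            status = entry['status']
        counts[status] = counts.get(status, 0) + 1
    return counts

entries = [
    {'status': 'Completed', 'complete': True, 'aborted': False},
    {'status': 'Completed', 'complete': True, 'aborted': True},
]
print(get_status_counts(entries))  # {'Completed': 1, 'Aborted': 1}
```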
|
5add1c8f74eeec1631f3b0775fd1f420c74cae22 |
|
26-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make recovered tasks correctly handle being aborted before being started. Unlike other tasks, recovered tasks are effectively "started" as soon as they're created, since they're recovering a previously started task. So implement that properly so that when they're aborted, they do all the necessary killing and cleanup stuff. This should fix a bug where jobs aborted while the scheduler is down won't get properly aborted when the scheduler starts up. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3171 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
54c1ea94793a2927fe76a876526a1fdd95cd1b58 |
|
20-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Sort hosts when choosing them for use in an atomic group and when actually assigning pending ones to run a job. Adds a Host.cmp_for_sort classmethod, usable as a sort comparison function, to sort Host objects by hostname in a sane manner. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3149 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
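One common way to sort hostnames "in a sane manner" is a natural sort that compares embedded digit runs numerically, so host2 orders before host10. This is an assumed illustration of what Host.cmp_for_sort accomplishes, not the autotest implementation.

```python
import re

# Split a hostname into text and number chunks so embedded numbers
# compare numerically rather than lexicographically.
def natural_sort_key(hostname):
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', hostname)]

hosts = ['host10', 'host2', 'HOST1']
print(sorted(hosts, key=natural_sort_key))  # ['HOST1', 'host2', 'host10']
```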
|
ebc0fb7543af140398e8546eea560762d1f0b395 |
|
13-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add an extra check for existence of Autoserv results in GatherLogsTask -- in certain recovery cases this can be false, previously leading to an exception. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3133 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
12f3e3212795a539d95973f893ac570e669e3a22 |
|
13-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add job maximum runtime, a new per-job timeout that counts time since the job actually started. * added started_on field to host_queue_entries, so that we could actually compute this timeout * added max_runtime_hrs to jobs, with default in global config, and added option to create_job() RPC * added the usual controls to AFE and the CLI for the new job option * added new max runtime timeout method to * added migration to add new fields and set a safe default max runtime for existing jobs Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3132 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
2d7c8bde223f5e29483a72bf17d0c999e3c28c96 |
|
13-May-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix scheduler unittest for parser's new -P flag Signed-off-by: Rachel Kroll <rkroll@google.com> Autotest@test.kernel.org http://test.kernel.org/cgi-bin/mailman/listinfo/autotest git-svn-id: http://test.kernel.org/svn/autotest/trunk@3124 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
a1e74b3e9d68792fae0c926f89b6de1736b1fe21 |
|
12-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add job option for whether or not to parse failed repair results as part of a job, with a default value in global_config. Since the number of options associated with a job is getting out of hand, I packaged them up into a dict in the RPC entry point and passed them around that way from then on. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3110 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
f1ae354808a2eeb95d706a669250b613765212a4 |
|
11-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Represent a group of machines with either the atomic group label name, if a specific label was used, or the atomic group name in the results database when parsing. Adds an optional host_group_name= to the server side group job keyval file. The scheduler chooses the most appropriate name for this and adds it to the group keyvals file. Changes the TKO results parser to use host_group_name= as the machine name instead of hostname= when hostname= is a comma separated list of hostnames rather than a single name. Also fixes atomic group scheduling to be able to use up to the atomic group's max_number_of_machines when launching the job; this is still unlikely to happen as the code still launches the job as soon as at least the sync count have exited Verify. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3103 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
597bfd3aa52f467942dc181d1dcb4223644c2f7f |
|
08-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Only run crashinfo collection when Autoserv exited due to some signal -- not just when it failed. Also make a minor fixup to some logging during process recovery. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3098 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
08a36413b0cd9939aa0090ce4ceaafb8dc43d002 |
|
05-May-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Change Agent.abort() again. This time, it runs through its queue of AgentTasks, aborting them until it reaches one that ignores the abort (or exhausts the queue). With the previous logic, we might have an Agent with a GatherLogsTasks that should ignore the abort, but if the Agent got aborted before starting it would never run the task. I hope I've really got it right this time. To help simplify things, I reorganized the AgentTask logic a bit, making AgentTask.poll() call AgentTask.start() itself so that the Agent wouldn't have to explicitly call AgentTask.start(). I also got rid of Agent.start(), which was unused. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3089 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
0bbfc2175f8d76647b9f6de7e1d5635d85ca5c00 |
|
29-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make autoserv --collect_crashinfo only run when Autoserv actually failed (exit status nonzero) or was aborted. I was being lazy and always running it, but it seems that introduced very annoying latency into job runs. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3063 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
20f9bddedda271bf486d4de20135160b6951b71d |
|
29-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
fix Agent.abort() when it's called before the agent has started (in that case, it should do nothing -- but the logic was making it basically ignore the abort). this should fix jobs being aborted in the "starting" phase (a phase that lasts one cycle before "running" starts). Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3060 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
d920518065a2b90fec5dd9e3a23d446254502ee3 |
|
27-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make RepairTask write job_queued and job_finished keyvals so they can be parsed into TKO when failed repair results are parsed. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3038 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
6b73341768f1cf0de210630142929047633658ff |
|
27-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix two bugs introduced in previous change to add collect_crashinfo support. * Some late modifications to the previous change prevented the FinalReparseTask from running when a job was aborted. Fixed that by allowing AgentTasks to really ignore an abort (which the PostJobTasks do). * The new abort logic caused hosts to not get cleaned up after an abort if the job was running and had the "reboot_after = Never" option set. This may or may not be preferable to users, but it's a change from the previous logic, so I'm changing it back to always run cleanup when a job is aborted. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3037 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
d3dc199703bfb8784a2f8f072d0514532c86c0a9 |
|
22-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add support to the scheduler to run autoserv --collect_crashinfo after a job finishes or is aborted. * added new state "Gathering" for when we're running collect_crashinfo and copying logs to the results repository * added new GatherLogsTask to the scheduler to perform these two tasks, and made it get run either after a job finishes or after a job is aborted. this task shares a lot with FinalReparseTask, so extracted common code into a new PostJobTask. * made changes to scheduler/drone code to support generic monitoring and recovery of processes via pidfiles, since we need to be able to recover the collect_crashinfo processes too. this will also make the scheduler recover parse processes instead of just killing them as it does now, which is nice. * changed abort logic significantly. since we now need to put aborted jobs through the gathering and parsing stages, but then know to put them into "aborted" afterwards, we can't depend on the old path of abort -> aborting -> aborted statuses. instead, we need to add an "aborted" flag to the HQE DB table and use that. this actually makes things generally cleaner in my opinion -- for one, we can get rid of the "Abort" and "Aborting" statuses altogether. added a migration to add this flag, edited model and relevant logic appropriately, including changing how job statuses are reported for aborted entries. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3031 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
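The abort rework described in the commit above replaces dedicated "Abort"/"Aborting" statuses with a boolean flag on the queue entry, so an aborted entry can still flow through the Gathering and Parsing stages. A minimal sketch of that idea, with invented class and method names (the real `HostQueueEntry` logic lives in monitor_db):

```python
# Hypothetical sketch: track aborts with a boolean column instead of
# dedicated "Abort"/"Aborting" statuses, so an aborted entry still passes
# through the post-job stages before reaching a terminal status.

class HostQueueEntry:
    def __init__(self):
        self.status = 'Running'
        self.aborted = False   # the new DB flag from the migration

    def abort(self):
        # Only flip the flag; the entry keeps flowing through
        # Gathering -> Parsing before finalizing.
        self.aborted = True

    def finish_post_job_stages(self):
        # After gathering/parsing, the flag decides the terminal status.
        self.status = 'Aborted' if self.aborted else 'Completed'

entry = HostQueueEntry()
entry.abort()
entry.finish_post_job_stages()
```

The payoff is that abort no longer needs its own status path: the normal post-job pipeline runs for every entry, and only the final status differs.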
|
915958db04ca97d3d5a011383e736a3e2b4e8db3 |
|
22-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix monitor_db_unittest, broken by previous change to refactor cleanup code. Two main things here: * 24hr cleanup was running upon object construction, which meant it was running inadvertently during unit testing. Fixed this with the usual trick of moving that action from the constructor to an initialize() function, which gets called separately in monitor_db and which the unit test avoids. * one of the scheduler unit tests was actually testing cleanup code; change that to call the newly located function. this test should maybe be moved to a separate unit test file for the monitor_db_cleanup module, but I just want to get things working again for now. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3029 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
87ba02a820301220076cccf34d34d9243f18da7a |
|
20-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
extract code for generating autoserv command lines to a common place, including support for -l and -u params, and make verify, repair and cleanup tasks pass those params. this should make failed repairs include the right user and job name when parsed into tko. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3019 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
76e29d112fbe5f5afc13cebad278fdcb4fde752f |
|
15-Apr-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix monitor_db.DBObject.save() to handle None values as NULL properly. Atomic group scheduling was causing host_queue_entries rows in the database to be created with meta_host=0 when it should have been NULL. This was causing problems elsewhere as code failed to find label id 0. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2988 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
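The `DBObject.save()` fix above hinges on a detail of DB-API parameter binding: passing Python `None` through a placeholder stores SQL `NULL`, whereas naive formatting can turn it into `0` or the string `'None'`. A small illustration using the stdlib `sqlite3` module (the table name mirrors the commit; the helper is invented):

```python
# Sketch of the None -> NULL behavior the commit relies on. Parameter
# binding maps Python None to SQL NULL; string interpolation would not.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE host_queue_entries (id INTEGER PRIMARY KEY, meta_host INTEGER)')

def save_entry(meta_host):
    # Illustrative helper: the placeholder handles None correctly.
    cur = conn.execute(
        'INSERT INTO host_queue_entries (meta_host) VALUES (?)', (meta_host,))
    return cur.lastrowid

row_id = save_entry(None)
stored = conn.execute(
    'SELECT meta_host FROM host_queue_entries WHERE id = ?',
    (row_id,)).fetchone()[0]
```

Reading the row back gives `None`, not `0` — exactly the distinction that was breaking metahost lookups (code searching for label id 0).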
|
205fd60f9c9d2f64ec2773f295de1cf5cfd3bc77 |
|
21-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix the AtomicGroup name display in the admin interface. Adds an invalid bool column and uses the existing invalid model to avoid problems when deleting from the django admin interface. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2918 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
ccbd6c5c6dfc10072f6ace2f528b9ed7c764a0b3 |
|
21-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Ensure RepairTasks aren't associated with the queue entries that spawned them, so that if the QE is aborted during repair the repair task will continue running (and just leave the QE alone from then on). Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2917 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
89f84dbadf071ba430244356b57af395c79486e4 |
|
12-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add the concept of an Atomic Group to the scheduler and database. Scheduling a job on an atomic group means that all of the Ready machines (up to a maximum specified in the atomic group) in a single label associated with that atomic group will be used to run the job. The job synch_count becomes a minimum when scheduling on an atomic group. Both HostQueueEntrys and Labels may have an AtomicGroup associated with them: * A HostQueueEntry with an AtomicGroup acts to schedule a job on all Ready machines of a single Label associated with that AtomicGroup. * A Label with an AtomicGroup means that any Hosts bearing that Label may only be scheduled together as a group with other hosts of that Label to satisfy a Job's HostQueueEntry bearing the same AtomicGroup. Such Hosts will never be scheduled as normal metahosts. Future patches are coming that will add the ability to schedule jobs using this feature to the RPC interface, CLI and GUI. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2878 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
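The atomic-group rule above can be condensed to: pick a single label, require at least `synch_count` Ready hosts in it, and take at most the group's machine cap. A hypothetical sketch (function and data shapes invented, not the scheduler's real API):

```python
# Hypothetical sketch of the atomic-group scheduling rule: all Ready hosts
# from one label, capped at max_machines, with synch_count as a minimum.

def schedule_atomic_group(hosts_by_label, synch_count, max_machines):
    """hosts_by_label maps a label name to its list of Ready hostnames."""
    for label, ready_hosts in hosts_by_label.items():
        if len(ready_hosts) >= synch_count:   # synch_count is a minimum here
            return label, ready_hosts[:max_machines]
    return None, []   # no single label can satisfy the job yet

labels = {'rack1': ['h1', 'h2'], 'rack2': ['h3', 'h4', 'h5', 'h6']}
label, group = schedule_atomic_group(labels, synch_count=3, max_machines=4)
```

Note how `rack1` is skipped even though it has Ready hosts: hosts from different labels are never mixed into one atomic group.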
|
a3c585726f4cbc9ab19f9a39844aecc5c9f8c9ce |
|
12-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
a) Reduce the number of instances of DBObject classes created for the same row in the database by caching existing live instances using weakref's. b) Raise an Exception object rather than a string on SELECT failure. c) Use itertools.izip() instead of zip() Risk: a) Medium b) Low c) Low Visibility: This alters the behavior of the scheduler but by default we always re-query the database when a particular row is instantiated even when reusing an existing cached object. Monitoring of the log file will show if this is even an issue. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2875 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
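Item (a) in the commit above — caching live `DBObject` instances per row via weak references — is a standard pattern with `weakref.WeakValueDictionary`: the cache returns an existing wrapper while anyone still holds it, but never keeps an object alive by itself. A sketch under those assumptions (class shape invented):

```python
# Sketch of per-row instance caching with weak references: the same DB row
# maps to the same live wrapper object, without the cache pinning memory.
import weakref

class DBObject:
    _instances = weakref.WeakValueDictionary()

    def __new__(cls, row_id):
        key = (cls, row_id)
        obj = cls._instances.get(key)
        if obj is None:
            obj = super().__new__(cls)
            obj.row_id = row_id
            cls._instances[key] = obj
        # Per the commit, the row is still re-queried from the DB by default
        # even when a cached instance is reused.
        return obj

a = DBObject(42)
b = DBObject(42)   # same row -> same live instance
c = DBObject(7)    # different row -> different instance
```

Once all strong references to an instance drop, the weak-value entry vanishes and the next lookup builds a fresh wrapper.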
|
35162b0a9d278a43d3fd90f3664ea09092ba9684 |
|
03-Mar-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Updated patch including unit test fixes. * remove the whole "dummy process" and "null drone" thing, and change QueueTask to be aware of when the process is lost. the old way was supposed to basically transparently handle that case, but it was just flawed. instead, QueueTask needs to check for the lost process case and handle it specially. this eliminated the need for the dummy process and the NullDrone class, so I removed them. QueueTask also writes a special file with an error message when the process is lost, so the user will be aware of it. (this should only happen when there are system errors on the main server.) * add an extra comment to a check in FinalReparseTask that I added recently, but that was confusing me now. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2844 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
25cbdbd2da6242296abe6b1342521b29993993f2 |
|
17-Feb-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Get rid of the host_queue_entries.priority field in the autotest_web DB. It's just duplicating information from the jobs table. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2813 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
d9ac445a60d6d11537f566503164344e09527917 |
|
07-Feb-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Remove the old acl_groups_users and acl_groups_hosts many2many pivot table column name hack. Renames acl_group_id -> aclgroup_id. Adds a migration script and updates the one piece of code that actually depended upon the old name. This is needed for my upcoming change that ports autotest to Django 1.0.x but seems worth cleaning up as a change of its own. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2764 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
678df4f19352060e24c1257bae28bf89e769e8bf |
|
04-Feb-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
The change to copy and parse failed repair results was broken for drone setups. The parser always expects to run on results in the normal results directory (determined by the queue entry's execution tag), but it runs on the drone. On the drone, repair results were placed in a temp dir, and copied over to the job dir only on the results repository. To fix this, the code now copies the repair results to the job dir *on the drone*, and then proceeds with the normal copy-and-parse procedure, just like for normal job results. I also added a check in FinalReparseTask to ensure the results to parse can actually be found, and email a warning if not. Previously, it would simply crash the scheduler. This shouldn't normally happen since the underlying bug has been fixed, but it's been seen in other situations so this should be safer in general. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2746 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
8bcd23af07a5420f35acfd0aa57c8c1cceb34368 |
|
03-Feb-2009 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Move all MySQLdb imports after the 'import common' so that a MySQLdb package installed in our own site-packages directory will be found before the system installed one. Adds an empty site-packages directory with an explanation README. Signed-off-by: Gregory Smith <gps@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2735 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
de634eed90554d881ecce30812051b63efb14efa |
|
30-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
* when repair fails in a situation that fails a queue entry (i.e. repair triggered by failed verify for a non-metahost queue entry), copy the full autoserv logs to the results repository and parse them, just like we do at the end of a job. * fix some local copying code in drone_utility to better handle copying directories. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2716 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
c9ae1787fd41101089b09551d8028aef1daae1d3 |
|
30-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
* add "Do not verify" host protection level and implement it in scheduler * make scheduler run pre-job cleanup even if "skip verify" was chosen. accomplished this by adding a SetEntryPendingTask to call on_pending, instead of having the VerifyTask do it. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2715 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
ade14e2081e4b934f4b2232acfbf4c1b2f3bece4 |
|
26-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Bypass only_if_needed labels for specifically-requested hosts. This is a workaround for generating proper job dependencies in an easier/more automated manner when selecting specific hosts with only_if_needed labels. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2692 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
324bf819a4ea0fb2fa2509c3b4462f208e225d8d |
|
21-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make maximum process throttling configurable on a per-drone basis. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2660 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
2fa516935f896d49e7320411942b47db7212ecc8 |
|
14-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Don't execute any new autoserv processes when all drones have been disabled. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2641 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
d1ee1dd3f3e5ac44f00d7a96deb815dbe1beedad |
|
07-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
* move some scheduler config options into a separate module, scheduler_config * add a little embedded HTTP server to the scheduler, defined in status_server.py, running in a separate thread. this displays loaded config values and allows reloading of those config values at runtime. in the future we can extend this to do much more. * make global_config handle empty values as nonexistent values by default. otherwise, we would have to both pass a default= and check for value == '' separately. Now, we just pass default= and it's all taken care of. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2608 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
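The third bullet above (empty config values treated as nonexistent) is easy to picture in miniature. A sketch of that behavior, assuming a dict-backed config and an invented `get_config_value` signature loosely modeled on global_config:

```python
# Sketch: an empty string in the config is treated as if the key were
# missing, so one default= argument covers both cases.

_config = {'max_processes': '10', 'results_host': ''}

def get_config_value(key, default=None):
    value = _config.get(key, '')
    if value == '':          # empty counts as nonexistent
        return default
    return value

host = get_config_value('results_host', default='localhost')   # empty -> default
procs = get_config_value('max_processes', default='1')         # set -> real value
```

Without this rule, every caller would need both `default=` and a separate `value == ''` check, which is exactly the duplication the commit removes.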
|
170873e8869cae8bb9499d6128cf626e8110bf56 |
|
07-Jan-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Attached is a very large patch that adds support for running a distributed Autotest service. Previously, the scheduler could only execute autoservs locally and all results were written directly to the local filesystem. This placed a limit on the number of machines that could be concurrently tested by a single Autotest service instance due to the strain of running many autoserv processes on a single machine. With this change, the scheduler can spread autoserv processes among a number of machines and gather all results to a single results repository machine. This allows vastly improved scalability for a single Autotest service instance. See http://autotest.kernel.org/wiki/DistributedServerSetup for more details. Note that the single-server setup is still supported and the global configuration defaults to this setup, so existing service instances should continue to run. Steve Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2596 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
e58e3f8d3ffb0144cf71ccefefd5bf253c0bb5fb |
|
20-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Set HQEs to "Verifying" instead of "Starting" when we're about to run verify on them. We need to set them to an active status, but if we use "Starting" then we can't tell which stage they're in, and we need that information to know when to "stop" synchronous jobs. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2482 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
8fe93b5da16ebec51dfec50e1d810085c2af79e0 |
|
18-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make CleanupTask copy results to job dir on failure. Did this by extracting code from VerifyTask into a common superclass. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2435 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
e788ea65d121585d02073341ae63149cf4c43fa5 |
|
17-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-make get_group_entries() return a list instead of a generator, since all callers want it that way anyway -change assertion on existing results dir into warning (with email notification) -move some of the queue entry failure handling code into RepairTask so it doesn't have to be duplicated in VerifyTask and CleanupTask (and because CleanupTask wasn't handling it quite right) git-svn-id: http://test.kernel.org/svn/autotest/trunk@2429 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
e77ac67fca979c1703b706c3b0935c658bd6fcf1 |
|
14-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Set queue entries to "Starting" when the VerifyTask is created for them. This perennial source of problems cropped up again in the latest change to the job.run() code (as part of the synch_count changes). git-svn-id: http://test.kernel.org/svn/autotest/trunk@2421 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
2bab8f45adedeacbf2d62d37b90255581adc3c7d |
|
12-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Implement synch_count. The primary change here is replacing the job.synch_type field with a synch_count field. There is no longer just a distinction between synchronous and asynchronous jobs. Instead, every job has a synch_count, with synch_count = 1 corresponding to the old concept of synchronous jobs. This required: -changes to the job creation RPC and corresponding client code in AFE and the CLI -massive changes to the scheduler to schedule all jobs in groups based on synch_count (this unified the old synch and async code paths) -changed results directory structure to accommodate synchronous groups, as documented at http://autotest.kernel.org/wiki/SchedulerSpecification, including widespread changes to monitor_db and a change in AFE -changes to AFE abort code to handle synchronous groups instead of just synchronous jobs -also got rid of the "synchronizing" field in the jobs table, since I was changing the table anyway and it seems very likely now that that field will never be used other changes included: -add some logging to afe/models.py to match what the scheduler code does, since the scheduler is starting to use the models more -added checks for aborts of synchronous groups to abort_host_queue_entries RPC git-svn-id: http://test.kernel.org/svn/autotest/trunk@2402 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
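The unified scheduling described above boils down to one rule: a job's group launches only once at least `synch_count` of its queue entries are Pending. A hypothetical sketch of that gate (entry representation and function name invented):

```python
# Hypothetical sketch of synch_count grouping: wait until synch_count
# entries are Pending, then run that group together. synch_count = 1
# recovers the old asynchronous per-host behavior.

def ready_group(entries, synch_count):
    pending = [e for e in entries if e['status'] == 'Pending']
    if len(pending) < synch_count:
        return None              # keep waiting for more hosts
    return pending[:synch_count]

entries = [
    {'host': 'h1', 'status': 'Pending'},
    {'host': 'h2', 'status': 'Verifying'},
    {'host': 'h3', 'status': 'Pending'},
]
group = ready_group(entries, synch_count=2)
```

With `synch_count=2` the two Pending hosts form a group; with `synch_count=3` the job keeps waiting, which is why the start-timeout commit further down this log became necessary.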
|
9d9ffd51a4332d77eb4a9772f834fcc9d1304cae |
|
10-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
don't reboot hosts when aborting inactive jobs. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2393 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
45ae819b2d2a67d0882edafaf1a8f7b95c3fb9d2 |
|
05-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add a formal cleanup phase to the scheduler flow. -add a --cleanup or -C option to autoserv, which runs a new control segment, cleanup. this option and control segment obsolete the old -b option and reboot_segment control segment. -change the RebootTask in the scheduler into a more generic CleanupTask, which calls autoserv --cleanup. -change the host status "Rebooting" to "Cleaning" git-svn-id: http://test.kernel.org/svn/autotest/trunk@2377 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
fa8629c3a28b0ccebbd339218883e5e6cbb1ce16 |
|
04-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-ensure Django connection is autocommit enabled, when used from monitor_db -fix HostScheduler to not crash when there are no ready hosts -change RebootTask to optionally take a queue entry and pass it to the RepairTask if reboot fails. This allows jobs to be failed if the pre-verify reboot fails, instead of being left hanging. -add unit test for RebootTask -add check for DB inconsistencies to cleanup step. Currently this just checks for HQEs with active=complete=1. -when unexpected existing results files are found, email a warning git-svn-id: http://test.kernel.org/svn/autotest/trunk@2368 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
97aed504f709270614ccbcef299e394333a76598 |
|
04-Nov-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Rewrite final reparse code in scheduler. the final reparse is now handled by a separate AgentTask, and there's a "Parsing" status for queue entries. This is a cleaner implementation that allows us to still implement parse throttling with ease and get proper recovery of reparses after a system crash fairly easily. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2367 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
9886397ceb7c752db78a6acd9737992db891015b |
|
29-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add job start timeout for synchronous jobs. This timeout applies to synchronous jobs that are holding a public pool machine (i.e. in the Everyone ACL) as "Pending". This includes a new global config option, scheduler code to enforce the timeout and a unit test. Note that the new scheduler code uses the Django models instead of making DB queries directly. This is a first example of how the scheduler can use the models to simplify DB interaction and reuse code from the frontend. I'd like to move in this direction from now on, although I certainly won't be making any sweeping changes to rewrite existing code. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2358 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
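The timeout described above amounts to a cutoff query: find entries that have held a machine in "Pending" longer than the configured limit. A sketch using only the stdlib `datetime` module (field names and function shape are invented, not the real Django query):

```python
# Sketch of the start-timeout check: entries Pending longer than the
# configured timeout are flagged so their jobs can be aborted and the
# public-pool machines released.
import datetime

def entries_past_timeout(pending_entries, timeout_minutes, now):
    cutoff = now - datetime.timedelta(minutes=timeout_minutes)
    return [e for e in pending_entries if e['pending_since'] < cutoff]

now = datetime.datetime(2008, 10, 29, 12, 0)
entries = [
    {'id': 1, 'pending_since': datetime.datetime(2008, 10, 29, 9, 0)},
    {'id': 2, 'pending_since': datetime.datetime(2008, 10, 29, 11, 30)},
]
stale = entries_past_timeout(entries, timeout_minutes=120, now=now)
```

Passing `now` explicitly keeps the check deterministic under test, in the same spirit as the unit test the commit mentions.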
|
3dd6b88de09c14cf7f93ff188461876ec65afe55 |
|
27-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Two simple scheduler fixes: -treat empty pidfiles like nonexistent pidfiles. there seems to be a race condition where the scheduler reads a pidfile after autoserv creates it but before autoserv writes the pid to it. this should solve it. -prioritize host queue entries by job id instead of host queue entry id. when jobs are created exactly in parallel, their host queue entries can have interleaved IDs, which can lead to deadlock. ordering by job id should protect against that. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2342 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
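The first fix above closes a classic create-then-write race: the pidfile exists but is still empty when the scheduler reads it. A sketch of the tolerant read (the reader function is invented; only the empty-equals-missing rule comes from the commit):

```python
# Sketch: an empty pidfile is treated exactly like a missing one, covering
# the window after autoserv creates the file but before it writes the pid.
import os
import tempfile

def read_pidfile(path):
    if not os.path.exists(path):
        return None
    with open(path) as f:
        contents = f.read().strip()
    if not contents:         # empty file: same as nonexistent
        return None
    return int(contents.splitlines()[0])

tmpdir = tempfile.mkdtemp()
empty = os.path.join(tmpdir, 'empty_pidfile')
open(empty, 'w').close()                     # created but not yet written
missing = os.path.join(tmpdir, 'no_such_pidfile')
full = os.path.join(tmpdir, 'written_pidfile')
with open(full, 'w') as f:
    f.write('12345\n')
```

The scheduler simply retries on the next cycle when it gets `None`, by which time autoserv has finished writing.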
|
0fc3830f17d644bab74bfe38556299f5e58bc0fa |
|
23-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add user preferences for reboot options, including simple user preferences tab which could later be expanded to include more options. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2330 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
21baa459ea14f96e06212f1f35fcddab9442b3fc |
|
21-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add options to control reboots before and after a job. -add reboot_before and reboot_after fields to Job, along with enums for each -add options to create_job RPC for reboot_before and reboot_after -add options to job create CLI for these fields, and made job stat -v display them -add widgets to job create page in AFE for these fields and made job detail view display them -add dirty field to Hosts, defaulting to True, and set to True when a host is locked -made scheduler set this field when a job runs and clear it when a host is rebooted -updated scheduler's PidfileRunMonitor to read a new three-line .autoserv_execute format, where the third line contains the number of tests that failed -made scheduler Job.run() include a RebootTask before the verify task according to the reboot_before option -made QueueTask.epilog() launch a RebootTask for each host according to the reboot_after option -updated autoserv to write out a third line to .autoserv_execute containing the number of failed tests. Other changes: -added support for displaying Job.run_verify in the CLI (job stat -v) and job detail page on AFE -updated ModelExtensions to convert BooleanField values to actual booleans. The MySQL Django backend just leaves them as ints (as they are represented in the DB), and it's stupid and annoying (Yes, bool is a subclass of int, so it's often not a problem. But yes, it can be.). -get rid of use of Job.synch_count since we don't actually support it. I think this was meant for inclusion in a previous change and got left out. -made the scheduler use the new setup_django_environment stuff to import and use the django models. It doesn't *really* use the models yet -- it just uses the Job.Reboot{Before,After} enum objects -- but this shows we could easily start using the models, and that's definitely the direction I want to go long term. 
-refactored PidfileRunMonitor generally and made it a bit more robust by having it email errors for corrupt pidfiles and continue gracefully, instead of just crashing the scheduler -changed the way Agent.tick() works. now, it basically runs through as much work as it can in a single call. for example, if there's a RebootTask and a VerifyTask, and the RebootTask has just finished, in a single call it will finish up the RebootTask and start the VerifyTask. this used to take two cycles and that was problematic for cases like this one -- the RebootTask would like to set host.status=Ready, but then the host could get snatched up on the next scheduling round, before the VerifyTask got started. This was sort of solved previously by keeping the HostQueueEntry active, and we could apply that approach here by making a new status for HostQueueEntries like "Rebooting". But I prefer this approach as I think it's more efficient, more powerful and easier to work with. Risk: extremely high Visibility: new reboot options for jobs, skip verify now displayed in AFE + CLI Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2308 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
1be9743d7bf16ad21fbd70ec45c77f3907f10718 |
|
17-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-fix bug with handling abort on unassigned host queue entries -refactor abort code into abort() method on HostQueueEntry, which the dispatcher calls both during abort and during recovery of aborting tasks -don't special case aborting queue entries with no agents. We can just create an AbortTask in all cases, and when there are no agents, it'll iterate over an empty list. No need to complicate the code with a special case. -rewrite the dispatcher abort unit test to do less mocking and test the code more organically, and add a test for AbortTask git-svn-id: http://test.kernel.org/svn/autotest/trunk@2298 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
364fe862f88bcaaefaa40dd145a777bee840ec9b |
|
17-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Refactor the basic environment setup code out of django_test_utils.py into setup_django_environment.py, and rename django_test_utils.py to setup_test_environment.py. Also changed the environment setup code to run at import time. This makes it easy for scripts, both test and non-test, to use Django models without running through manage.py. The idea is that scripts will import setup_django_environment before importing Django code (somewhat akin to common.py), and test code will subsequently import setup_test_environment. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2297 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
9976ce9873867a397e448d358543a9dc1d33aa77 |
|
15-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-make monitor_db implement "skip verify" properly, and add unit tests for it -change order of a couple fields in AFE models to match DB order git-svn-id: http://test.kernel.org/svn/autotest/trunk@2291 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
b2e2c325bc0b1d822690b6af07f920d5da398cb8 |
|
14-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-refactor Job.run in monitor_db, one of the most important and most confusing methods in the scheduler. it's now broken into separate synchronous and asynchronous paths with common methods extracted. -remove a bunch of dead code from the Job class -remove the one actual usage of the synch_count fields. We don't really support this yet, so there's no reason to pretend we do. -extract some code from VerifySynchronousTask into HostQueueEntry.on_pending(). it's better here and will be necessary in a near-future change (to implement skip_verify right). -add simple test for job.run() to scheduler unittest. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2282 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
12bc8a8a235f2b2d0f712daf4d3683bf2a056e24 |
|
09-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
The scheduler unit test needs to pass in a created_on time. Risk: Low Visibility: Fixes broken test. Signed-off-by: John Admanski <jadmanski@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2259 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
b1e5187f9aa303c4fc914f07312286d302b46a0e |
|
07-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Get the scheduler unittest to run against SQLite! * get rid of monitor_db.DatabaseConn, and make monitor_db use the new DatabaseConnection * modify some queries in monitor_db that weren't SQLite-compatible (SQLite doesn't support TRUE and FALSE literals) * add frontend/django_test_utils.py, which contains utilities to * setup a django environment (something manage.py normally does for you) * replace the configured DB with a SQLite one, either in-memory or on disk * run syncdb on the test DB * backup and restore the test DB, handy because then we can syncdb once, save the fresh DB, and quickly restore it between unittests without having to run syncdb again (syncdb is terribly slow for whatever reason) * modify monitor_db_unittest to use these methods to set up a temporary SQLite DB, run syncdb on it, and test against it * replace much of the data modification code in monitor_db_unittest with use of the django models. The INSERTs were very problematic with SQLite because syncdb doesn't set database defaults, but using the models solves that (django inserts the defaults itself). using the models is much cleaner anyway as you can see. it was just difficult to do before, but now that we've got the infrastructure to setup the environment anyway, it's easy. this is a good model for how we can make the scheduler use the django models eventually. 
* reorder fields of Label model to match actual DB ordering; this is necessary since monitor_db depends on field ordering * add defaults to some fields in AFE models that should've had them * make DatabaseConnection.get_test_database support SQLite in files, which gives us persistence that is necessary and handy in the scheduler unittest * add a fix to _SqliteBackend for pysqlite2 crappiness The following are extras that weren't strictly necessary to get things working: * add a debug feature to DatabaseConnection to print all queries * add an execute_script method to DatabaseConnection (it was duplicated in migrate and monitor_db_unittest) * rename "arguments" to "parameters" in _GenericBackend.execute, to match the DB-API names * get rid of some debug code that was left in monitor_db, and one unnecessary statement Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2252 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
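The backup-and-restore trick in the commit above — run schema setup once, snapshot the fresh database, and restore the snapshot between tests instead of re-running the slow syncdb — can be sketched with plain `sqlite3` and `iterdump` (the schema here is a stand-in, not the real autotest_web schema):

```python
# Sketch of the test-DB snapshot idea: dump a pristine SQLite database
# once, then rebuild from the dump between tests rather than re-running
# the expensive schema setup.
import sqlite3

def make_fresh_db():
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT)')
    return conn

fresh = make_fresh_db()
snapshot = '\n'.join(fresh.iterdump())    # backup of the pristine schema

# ... a test dirties the database ...
fresh.execute("INSERT INTO jobs (name) VALUES ('test1')")

# restore: replay the snapshot into a clean connection for the next test
restored = sqlite3.connect(':memory:')
restored.executescript(snapshot)
count = restored.execute('SELECT COUNT(*) FROM jobs').fetchone()[0]
```

The restored database has the schema but none of the test's rows, which is the whole point: per-test setup drops from a full syncdb to a cheap script replay.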
|
442e71e957a8d16ba234193352f3ad1baffbd680 |
|
06-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Move migration system into database/ directory. git-svn-id: http://test.kernel.org/svn/autotest/trunk@2243 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
0e73c855576b97891ec38b6512bf040f3a1e1e40 |
|
03-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add a generic database wrapper, supporting different database backends, to be used by migrate, scheduler, parser (eventually), and maybe others. This will consolidate the multiple database wrappers we have throughout the code and allow us to swap in SQLite for MySQL for unit testing purposes. -add database/ directory for database libraries. migrate.py will move here soon. -add database_connection.py under server_common, a basic database wrapper supporting both MySQL and SQLite. PostgreSQL should be an easy future addition (any library supporting Python DB-API should be trivial to add). DatabaseConnection also supports graceful handling of dropped connections. -add unittest for DatabaseConnection -change migrate.py to use common DatabaseConnection. Scheduler will be changed to use it in a coming CL and in the future hopefully the TKO parser will be able to use it as well. -change migrate_unittest.py to use SQLite. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2234 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
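The backend-agnostic DB-API wrapper described in the entry above can be sketched roughly as follows. This is a minimal illustration, not the real DatabaseConnection API; the class and method names here are hypothetical, though the `parameters` naming follows the DB-API convention the log mentions.

```python
import sqlite3

class SimpleDatabaseConnection:
    """Minimal sketch of a swappable-backend DB wrapper (hypothetical names).

    Any module implementing Python DB-API 2.0 (PEP 249) can be registered
    as a backend, which is what makes SQLite-for-MySQL substitution in
    unit tests straightforward.
    """

    # Map of backend name -> DB-API 2.0 module.
    _BACKENDS = {'sqlite': sqlite3}

    def __init__(self, backend='sqlite', **connect_args):
        self._module = self._BACKENDS[backend]
        self._connection = self._module.connect(**connect_args)

    def execute(self, query, parameters=()):
        # "parameters" matches the DB-API name, per the rename noted above.
        cursor = self._connection.cursor()
        cursor.execute(query, parameters)
        return cursor.fetchall()

# An in-memory SQLite database gives tests a fast, throwaway schema.
db = SimpleDatabaseConnection(backend='sqlite', database=':memory:')
db.execute('CREATE TABLE hosts (hostname TEXT)')
db.execute('INSERT INTO hosts VALUES (?)', ('host1',))
rows = db.execute('SELECT hostname FROM hosts')
```

Adding another backend (e.g. MySQL) would only require registering another DB-API module in `_BACKENDS`, which is the design property the commit is after.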
|
989f25dcbb6361218f0f84d1c8404761b4c39d96 |
|
01-Oct-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
two new major features: (1) added test and job dependencies -added M2M relationship between tests and labels and between jobs and labels, for tracking the labels on which a test/job depends -modified test_importer to read the DEPENDENCIES field and create the right M2M relationships -modified generate_control_file() RPC to compute and return the union of test dependencies. since generate_control_file now returns four pieces of information, i converted its return type from tuple to dict, and changed clients accordingly. -modified job creation clients (GWT and CLI) to pass this dependency list to the create_job() RPC -modified the create_job() RPC to check that hosts satisfy job dependencies, and to create M2M relationships -modified the scheduler to check dependencies when scheduling jobs -modified JobDetailView to show a job's dependencies (2) added "only_if_needed" bit to labels; if true, a machine with this label can only be used if the label is requested (either by job dependencies or by the metahost label) -added boolean field to Labels -modified CLI label creation/viewing to support this new field -made create_job() RPC and scheduler check for hosts with such a label that was not requested, and reject such hosts also did some slight refactoring of other code in create_job() to simplify it while I was changing things there. a couple of notes: -an only_if_needed label can be used if either the job depends on the label or it's a metahost for that label. we assume that if the user specifically requests the label in a metahost, then it's OK, even if the job doesn't depend on that label. -one-time-hosts are assumed to satisfy job dependencies. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2215 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
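The "only_if_needed" rule stated in the entry above is a simple eligibility predicate: a host carrying such a label is usable only when that label was requested, either via the job's dependencies or as the metahost label. A sketch of that rule (function and parameter names are illustrative, not the scheduler's real code):

```python
def host_eligible(host_labels, job_dependencies, metahost_label=None,
                  only_if_needed_labels=frozenset()):
    """Return True if the host satisfies the only_if_needed rule.

    host_labels: labels on the host.
    job_dependencies: labels the job depends on.
    metahost_label: the metahost label the job was scheduled against, if any.
    only_if_needed_labels: the set of labels flagged only_if_needed.
    """
    requested = set(job_dependencies)
    if metahost_label is not None:
        # Requesting the label via a metahost counts as requesting it,
        # per the note in the commit message.
        requested.add(metahost_label)
    for label in host_labels:
        if label in only_if_needed_labels and label not in requested:
            return False
    return True

# A host with an unrequested only_if_needed label is rejected...
rejected = host_eligible({'gpu'}, [], only_if_needed_labels={'gpu'})
# ...but requesting it via dependencies or the metahost makes it eligible.
via_dep = host_eligible({'gpu'}, ['gpu'], only_if_needed_labels={'gpu'})
via_meta = host_eligible({'gpu'}, [], metahost_label='gpu',
                         only_if_needed_labels={'gpu'})
```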
|
50c0e71efd37c04c5f441c7e4dd095632b54c35f |
|
22-Sep-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-add --force option to migrations to disable user confirmation, since prompting makes migrations unscriptable -add default migration dirs to migrate.py because having to cd to the right place was getting really annoying! -made monitor_db_unittest.py use migrations to initialize the test DB schema instead of copying the schema from the real DB, since that was creating problems for testing. This slows down the test considerably, but it's better than no testing at all (and we can improve it in the future). -made global_config allow overriding options, which is useful for testing Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2180 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
4c5374f34ee4b31899c875c068ec6080ec8ce21c |
|
04-Sep-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-modify scheduler throttling code to track number of running processes rather than just number of running agents. note this is only an estimate of running processes - it counts all agents as one process unless the agent is a synchronous autoserv execution, in which case it uses the number of hosts being run. -add scheduler throttling test Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@2105 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
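The process-counting heuristic described in the entry above — every agent counts as one process, except a synchronous autoserv execution, which counts one per host — can be sketched like this. The attribute names (`is_synchronous`, `hosts`) are assumptions for illustration, not the scheduler's actual agent interface.

```python
def estimate_running_processes(agents):
    """Estimate the number of running processes across a list of agents.

    Each agent is counted as one process, unless it is a synchronous
    autoserv execution, in which case it contributes one process per host.
    This mirrors the estimate described in the commit message; it is an
    approximation, not an exact process count.
    """
    total = 0
    for agent in agents:
        if getattr(agent, 'is_synchronous', False):
            total += len(agent.hosts)
        else:
            total += 1
    return total

class FakeAgent:
    """Stand-in agent object for demonstration purposes only."""
    def __init__(self, is_synchronous=False, hosts=()):
        self.is_synchronous = is_synchronous
        self.hosts = list(hosts)

# One asynchronous agent plus one synchronous agent on two hosts.
agents = [FakeAgent(), FakeAgent(is_synchronous=True, hosts=['h1', 'h2'])]
count = estimate_running_processes(agents)
```

A throttling loop would compare `count` against a configured process limit before starting new work, which is what makes this a better throttle than counting agents alone.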
|
63a3477a2d6502c10cc47a3022e8f8a257d91434 |
|
18-Aug-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-Refactor new monitor_db scheduling algorithm into its own class -Reorganize + clean up said code, make better use of existing methods -Change non-metahost code path to check ACLs (previously ACLs were only checked for metahosts) -Add some new unit tests -Change some one-line docstrings on tests to multi-line, since PyUnit is kind of annoying with one-line docstrings git-svn-id: http://test.kernel.org/svn/autotest/trunk@2006 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
56193bb485d65716526079449f1df86ba5cb2df5 |
|
13-Aug-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-add basic abort functionality test to scheduler unit tests. this involved some serious refactoring of the existing tests. -various other enhancements to the scheduler unit tests. -extend mock.py comparator logic to be more powerful and generic. it can now handle comparators within data structures, and it handles keyword args properly. -enhance debug printing in mock.py. previously a test failure before calling check_playback would cause errors to be hidden even though they contained the cause of the failure. now, with debug mode enabled, errors will be printed as soon as they occur. -minor changes to monitor_db.py to increase testability. git-svn-id: http://test.kernel.org/svn/autotest/trunk@1981 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
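The comparator extension described in the entry above — matching values nested inside data structures and keyword args — follows a common pattern: a comparator object whose `__eq__` encodes the match, so it can be placed anywhere an expected value would go. This sketch is in the spirit of that mechanism; the class name and behavior are illustrative, not mock.py's actual comparators.

```python
class DictContaining:
    """Hypothetical comparator: matches any dict containing the expected
    key/value pairs, even when the dict appears nested inside recorded
    call arguments or keyword args.
    """
    def __init__(self, **expected):
        self._expected = expected

    def __eq__(self, other):
        return (isinstance(other, dict) and
                all(other.get(key) == value
                    for key, value in self._expected.items()))

    def __repr__(self):
        # A readable repr helps the "print errors as they occur" debug
        # mode the commit message describes.
        return 'DictContaining(%r)' % (self._expected,)

# Because matching happens via __eq__, the comparator works wherever an
# expected argument would: direct, inside a list, or as a kwarg value.
direct = (DictContaining(status='Repairing') ==
          {'status': 'Repairing', 'id': 3})
nested = ([1, DictContaining(a=1)] == [1, {'a': 1, 'b': 2}])
miss = (DictContaining(status='Ready') == {'status': 'Repairing'})
```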
|
dd7037160726851196510ae3ee56bf255ecb3732 |
|
28-Jul-2008 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
I left some debugging code in monitor_db_unittest.py. This goes with a patch I sent out a few minutes ago (also to monitor_db_unittest), so it should be applied after that patch. Signed-off-by: Travis Miller <raphtee@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1914 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
3e0f7e06ced63b5e0702812178d2185110304238 |
|
28-Jul-2008 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Need changes to fix the monitor_db unittest Signed-off-by: Travis Miller <raphtee@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1913 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
c16035226604e4945733bc8848506e64d209ef43 |
|
17-Jul-2008 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fixed the logic in the scheduler unit tests. Checks that the command line contains the expected set (but is not necessarily a strict superset). Risk: low Visibility: low Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1867 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
fb7cfb165bf5e98f76af9fd9644913fa67f8f567 |
|
09-Jul-2008 |
jadmanski <jadmanski@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add support to the scheduler to pass in the host.protection value as a --host-protection parameter when launching an autoserv repair job. Risk: Low Visibility: Makes autoserv actually obey the host protection value set in the frontend. Signed-off-by: John Admanski <jadmanski@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1792 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
5df2b19e0c379d0915fae11401935725ad47d423 |
|
03-Jul-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Updating the RPC interface and scheduler unit tests to match up with the recent changes to their respective subjects. Risk: low Visibility: low git-svn-id: http://test.kernel.org/svn/autotest/trunk@1769 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
f40cf53963083a1e8d292cf1a5eb1a921a025905 |
|
24-Jun-2008 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fixed the monitor_db_unittest to be more robust. When checking that the command line is correct, the test should do a set comparison, since the ordering of the arguments shouldn't affect the result, and it only needs to verify that the command arguments contain the given list, since additional ones can be added. Signed-off-by: Travis Miller <raphtee@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1736 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
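The order-insensitive containment check described in the two entries above reduces to a set-subset test: every expected argument must appear in the actual command line, while extra arguments are tolerated. A minimal sketch (the function name is illustrative):

```python
def command_contains(actual_args, expected_args):
    """Order-insensitive containment check for a command line.

    Passes when every expected argument appears somewhere in the actual
    argument list; additional arguments are allowed, per the commit
    messages above. Note this ignores duplicates and ordering entirely,
    which is exactly the robustness property the tests wanted.
    """
    return set(expected_args).issubset(set(actual_args))

# Extra flags and reordering don't break the check...
ok = command_contains(['autoserv', '-p', '-m', 'host1'], ['-m', 'host1'])
# ...but a genuinely missing argument still fails it.
missing = command_contains(['autoserv', '-p'], ['-m', 'host1'])
```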
|
0afbb6369aa5aa9a75ea67dd9e95ec4b21c0c181 |
|
06-Jun-2008 |
jadmanski <jadmanski@592f7852-d20e-0410-864c-8624ca9c26a4> |
Convert all python code to use four-space indents instead of eight-space tabs. Signed-off-by: John Admanski <jadmanski@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1658 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
3d161b028a4cd20c0a91ee5b1b86149a0821db4f |
|
06-Jun-2008 |
jadmanski <jadmanski@592f7852-d20e-0410-864c-8624ca9c26a4> |
Move the mock libraries from client/unittest into client/common_lib/test_utils. This is a better location, and a rename of the package dir from unittest to test_utils avoids some conflicts that were occurring between it and the stdlib unittest module. Signed-off-by: John Admanski <jadmanski@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@1637 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
20f47064be6855277e251cee7611d8336bcc9149 |
|
05-Jun-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
-check ACLs directly in the scheduler (bypassing ineligible_host_queues) -rewrite scheduler queries to avoid all subqueries. they are just bad in mysql. -rip out all that code related to using ineligible_host_queues to enforce ACLs. good riddance! -update scheduler unit test to reflect this new policy (no ineligible_host_queue blocks for ACLs) -minor bugfixes to scheduler unit test. this sucks, but I did go back and ensure the old scheduler passed the fixed up unit test suite as well. -remove a blanket except: block from the scheduler. it wasn't necessary, it was inconsistent, and it was interfering with unit testing. git-svn-id: http://test.kernel.org/svn/autotest/trunk@1608 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
04c82c5dc70a6de95f9cd77371d0a99cbdcf0959 |
|
29-May-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Rewrite scheduling algorithm to use two queries + some data processing, rather than a separate query for each "idle" host. This should be considerably faster. It also gives us the opportunity to eliminate the whole ACL checking with ineligible_host_queues thing, which has been a nightmare. But one step at a time... git-svn-id: http://test.kernel.org/svn/autotest/trunk@1564 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|
ce38e0c281b1046574b8112209944e9daf2c3641 |
|
29-May-2008 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
The beginning of a unit test for the scheduler. Right now it only tests the job scheduling algorithm (i.e. Dispatcher._find_more_work() and the methods it uses). git-svn-id: http://test.kernel.org/svn/autotest/trunk@1563 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_unittest.py
|