5cca8180477b25ab2b83861cbaaeb1fb35fd931d |
|
20-Nov-2017 |
Allen Li <ayatane@chromium.org> |
[autotest] Use JobHandoff for job_aborter tracking This handles all cases of job_reporter failure after spawning. The main loophole in the previous model is if job_reporter crashes before making the lease file, as job_aborter relied on expired leases to determine if cleanup is necessary. We now rely on JobHandoff to track ongoing jobs. The lease file is still kept to determine if a job still has an active job_reporter owning it. Previously we cared only about expired lease files, now we care only about active lease files combined with JobHandoff. BUG=chromium:748234 TEST=bin/test_lucifer TEST=bin/job_aborter without crashing Change-Id: I97e8b53f2fbd33d41a2b567e2797de8811a36e88 Reviewed-on: https://chromium-review.googlesource.com/780768 Commit-Ready: Allen Li <ayatane@chromium.org> Tested-by: Allen Li <ayatane@chromium.org> Reviewed-by: Allen Li <ayatane@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
6285198f32ff77aed80f78f488c30bfb18d24365 |
|
18-Nov-2017 |
Xixuan Wu <xixuan@chromium.org> |
autotest: schedule RESET task for DUTs that fail jobs. BUG=chromium:782501 TEST=Ran server & client job on local autotest. Ran unittest. Change-Id: Iaa87d24b8b1d873279b8fc127b8bcdad46c4059c Reviewed-on: https://chromium-review.googlesource.com/780394 Commit-Ready: Xixuan Wu <xixuan@chromium.org> Tested-by: Xixuan Wu <xixuan@chromium.org> Reviewed-by: Dan Shi <dshi@google.com>
/external/autotest/scheduler/monitor_db_functional_test.py
|
3df0b2bf7e77ee4aecc629b40365d489f9e4cdc9 |
|
17-May-2017 |
Allen Li <ayatane@chromium.org> |
[autotest] Skip archiving after parsing Normally, PARSING -> ARCHIVING -> COMPLETED/FAILED but all ARCHIVING does is set the status. So cut the middleman and set the status immediately. BUG=chromium:699275 TEST=None Change-Id: I6d3d69a1e5aa9167405d301fa1e0af382f9c2b4e Reviewed-on: https://chromium-review.googlesource.com/507020 Commit-Ready: Allen Li <ayatane@chromium.org> Tested-by: Allen Li <ayatane@chromium.org> Reviewed-by: Allen Li <ayatane@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
0f261debdbba7664ef7b0eacbac0e7daba89eebd |
|
02-Feb-2017 |
Allen Li <ayatane@chromium.org> |
[autotest] [atomic] Remove atomic groups from scheduler BUG=chromium:681906 TEST=Run unittest suite Change-Id: If9c144aae8d2a8df567a5a03b02bc3fec5d14c0d Reviewed-on: https://chromium-review.googlesource.com/435565 Commit-Ready: Allen Li <ayatane@chromium.org> Tested-by: Allen Li <ayatane@chromium.org> Reviewed-by: Richard Barnette <jrbarnette@google.com>
/external/autotest/scheduler/monitor_db_functional_test.py
|
c1f41e806039e541b152516058eea535687e07a4 |
|
06-Jan-2017 |
Prathmesh Prabhu <pprabhu@chromium.org> |
Move FakeGlobalConfig to the global_config module. This CL moves a fake object for global_config from a unittest module to the global_config module so that other unittests can use it too. BUG=None. TEST=python scheduler/monitor_db_functional_test.py Change-Id: If35ee9db0ef332ce890920b365da93075af4737a Reviewed-on: https://chromium-review.googlesource.com/425720 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
6a46f0bff029acbb98a11bde0d737fb78b5f8b77 |
|
21-Dec-2016 |
Prathmesh Prabhu <pprabhu@chromium.org> |
[scheduler] Mock out another config value for functional test. Currently, if a developer sets inline_host_acquisition=False in their local shadow_config, this functional test fails. Make the test more hermetic by mocking out this value. BUG=None TEST=Functional test finally passes with my modified shadow_config. Change-Id: Id57bd345ea5cb9e1d0c78c39b75f90102354b4aa Reviewed-on: https://chromium-review.googlesource.com/422655 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Allen Li <ayatane@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
303d2669b78eb5a2ebd6c41dce80ca18d06a19cf |
|
02-Jul-2016 |
Laurence Goodby <lgoodby@google.com> |
[autotest] Scheduler fix for SYNC_COUNT > 1. Details in go/autotest-sync-count-fix BUG=chromium:621257 TEST=Run included tests. Change-Id: If08df8fb04771a321dbdf2122b885935e7ef3b41 Reviewed-on: https://chromium-review.googlesource.com/358092 Commit-Ready: Laurence Goodby <lgoodby@chromium.org> Tested-by: Laurence Goodby <lgoodby@chromium.org> Reviewed-by: Richard Barnette <jrbarnette@chromium.org> Reviewed-by: Richard Barnette <jrbarnette@google.com> Reviewed-by: Dan Shi <dshi@google.com>
/external/autotest/scheduler/monitor_db_functional_test.py
|
f47a6bbb9971efd228eaa22425431b91fa9f69bf |
|
29-Aug-2014 |
Prashanth B <beeps@chromium.org> |
Revert "[autotest] Restore from inconsistent state after the scheduler was interrupted." This reverts commit b7c842f8c8ba135bb03a0862ac0c880d3158bf07. Change-Id: I8d34329b8a2771eb4068ab50414c9eac6fd73d3f Reviewed-on: https://chromium-review.googlesource.com/215612 Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
b7c842f8c8ba135bb03a0862ac0c880d3158bf07 |
|
24-Jul-2014 |
Jakob Juelich <jakobjuelich@google.com> |
[autotest] Restore from inconsistent state after the scheduler was interrupted. If the scheduler assigns hosts to hqes but hasn't set an execution_subdir yet, an exception is thrown. With this, the database will be cleaned up once, when the scheduler starts. Jobs that are in an inconsistent state will just be reset so they can be scheduled again. BUG=chromium:334353 DEPLOY=scheduler TEST=Ran utils/unittest_suite.py and manually set db into inconsistent state. Change-Id: I96cc5634ae5120beab59b160e735245be736ea92 Reviewed-on: https://chromium-review.googlesource.com/209635 Tested-by: Jakob Jülich <jakobjuelich@chromium.org> Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Jakob Jülich <jakobjuelich@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
36accc6a2a572e9d502407b34701f535a169f524 |
|
23-Jul-2014 |
Jakob Jülich <jakobjuelich@google.com> |
[autotest] Fixing and re-enabling monitor_db_functional_test. The test was disabled and outdated. Database access and mocking of the drone manager changed. This fixes these issues, updates the unit tests to the current status, and re-enables them. BUG=chromium:395756 DEPLOY=scheduler TEST=ran ./utils/unittest_suite.py Change-Id: I6a3eda5ddfaf07f06d6b403692b004b22939ffb6 Reviewed-on: https://chromium-review.googlesource.com/209567 Reviewed-by: Alex Miller <milleral@chromium.org> Tested-by: Jakob Jülich <jakobjuelich@google.com> Commit-Queue: Jakob Jülich <jakobjuelich@google.com>
/external/autotest/scheduler/monitor_db_functional_test.py
|
4ec9867f46deb969c154bebf2e64729d56c3a1d3 |
|
15-May-2014 |
Prashanth B <beeps@google.com> |
[autotest] Split host acquisition and job scheduling II. This cl creates a stand-alone service capable of acquiring hosts for new jobs. The host scheduler will be responsible for assigning a host to a job and scheduling its first special tasks (to reset and provision the host). There on after, the special tasks will either change the state of a host or schedule more tasks against it (eg: repair), till the host is ready to run the job associated with the Host Queue Entry to which it was assigned. The job scheduler (monitor_db) will only run jobs, including the special tasks created by the host scheduler. Note that the host scheduler won't go live till we flip the inline_host_acquisition flag in the shadow config, and restart both services. The host scheduler is dead, long live the host scheduler. TEST=Ran the schedulers, created suites. Unittests. BUG=chromium:344613, chromium:366141, chromium:343945, chromium:343937 CQ-DEPEND=CL:199383 DEPLOY=scheduler, host-scheduler Change-Id: I59a1e0f0d59f369e00750abec627b772e0419e06 Reviewed-on: https://chromium-review.googlesource.com/200029 Reviewed-by: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
04be2bd5e4666a5c253e9c30ab20555e04286032 |
|
08-May-2014 |
Ilja H. Friedel <ihf@chromium.org> |
Autotest: Change logging.warn() to logging.warning(). logging.warn() is deprecated. See http://bugs.python.org/issue13235 Substitution was performed via ~/cros/src/third_party/autotest/files$ find ./ -type f | xargs sed -i 's/logging.warn(/logging.warning(/' BUG=None. TEST=There should be one-- and preferably only one --obvious way to do it. Change-Id: Ie5665743121a49f7fbd5d1f47896a7c65e87e489 Reviewed-on: https://chromium-review.googlesource.com/198793 Commit-Queue: Ilja Friedel <ihf@chromium.org> Tested-by: Ilja Friedel <ihf@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
cc9fc70587d37775673e47b3dcb4d6ded0c6dcb4 |
|
02-Dec-2013 |
beeps <beeps@chromium.org> |
[autotest] RDB Refactor II + Request/Response API. Scheduler Refactor: 1. Batched processing of jobs. 2. Rdb hits the database instead of going through host_scheduler. 3. Migration to add a leased column. The scheduler released hosts every tick, back to the rdb. 4. Client rdb host that queue_entries use to track a host, instead of a database model. Establishes a basic request/response api for the rdb: rdb_utils: 1. Requests: Assert the format and fields of some basic request types. 2. Helper client/server modules to communicate with the rdb. rdb_lib: 1. Request managers for rdb methods: a. Match request-response b. Abstract the batching of requests. 2. JobQueryManager: Regulates database access for job information. rdb: 1. QueryManagers: Regulate database access 2. RequestHandlers: Use query managers to get things done. 3. Dispatchers: Send incoming requests to the appropriate handlers. Ignores wire formats. TEST=unittests, functional verification. BUG=chromium:314081, chromium:314083, chromium:314084 DEPLOY=scheduler, migrate Change-Id: Id174c663c6e78295d365142751053eae4023116d Reviewed-on: https://chromium-review.googlesource.com/183385 Reviewed-by: Prashanth B <beeps@chromium.org> Commit-Queue: Prashanth B <beeps@chromium.org> Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
1a18905776c0a53e2a169f61dbf5bdad3bd0cb74 |
|
28-Oct-2013 |
Dan Shi <dshi@chromium.org> |
[autotest] revert suite throttling Undo CL: https://chromium-review.googlesource.com/#/c/167175 keep stats call and _notify_process_limit_hit. BUG=chromium: TEST=suite run in local setup, unittest DEPLOY=scheduler Change-Id: I713b69651fabfb8cbb4f9c1ca3a8605900753bc9 Reviewed-on: https://chromium-review.googlesource.com/174896 Commit-Queue: Dan Shi <dshi@chromium.org> Reviewed-by: Dan Shi <dshi@chromium.org> Tested-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
6ee996fd9252ae7d0c56d2daa6cf085c3e02358c |
|
28-Feb-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Fix scheduler unittesting framework. They're alive! Most of the fixes here are to adapt to the fact that we no longer default to running a cleanup post-job, and that any verify+cleanup before a job gets merged into a reset. There's also some mocking out of global_config because extended use of the scheduler_config forces us to provide a mocked value for any setting touched in a test. BUG=chromium:305072 DEPLOY=scheduler TEST=this? Change-Id: Ie7ada591c31766f30647fd6c4ba151e5dd0d1003 Reviewed-on: https://chromium-review.googlesource.com/58380 Reviewed-by: Dan Shi <dshi@chromium.org> Tested-by: Alex Miller <milleral@chromium.org> Reviewed-by: Alex Miller <milleral@chromium.org> Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
dfff2fdc8477be3ba89fd915fde2afe8d3716624 |
|
28-May-2013 |
Alex Miller <milleral@chromium.org> |
[autotest] Add a provision special task. We now insert a special task which calls |autoserv --provision| with the host that the HQE is about to run on to provision the machine correctly before the test runs. If the provisioning fails, the HQE will also be marked as failed. No provisioning special task will be queued if no provisioning needs to be done to the host before the job can/will run. With *just* this CL, no provisioning tasks should actually get scheduled, because the part of the scheduler that maps HQEs to hosts hasn't been taught about provisioning yet. That will come in a later CL. Once this CL goes in, it should not be reverted. The scheduler will become very unhappy if it sees special tasks in its database, but can't find a corresponding AgentTask definition for them. One would need to do manual database cleanup to revert this CL. However, since one can disable provisioning by reverting the (future) scheduling change CL, this shouldn't be an issue. BUG=chromium:249437 DEPLOY=scheduler TEST=lots: * Ran a job on a host with a non-matching cros-version:* label, and a provision special task was correctly created. It ran after Reset, and correctly kicked off the HQE after it finished. * Ran a job on a host with a matching cros-version:* label, and no provision special task was created. * Ran a job on a host with a non-matching cros-version:* label, and modified Reset so that it would fail. When reset failed, it canceled the provision task, and the HQE was still rescheduled. * Ran a job on a host with a non-matching cros-version:* label, and modified the cros-version provisioning test to throw an exception. The provision special task aborted the HQE with the desired semantics (see comments in the ProvisionTask class in monitor_db), and scheduled a repair to run after its failure. The provision failures were all deduped against each other when bug filing was enabled. 
See https://code.google.com/p/autotest-bug-filing-test/issues/detail?id=1678 * Successfully debugged an autoupdate/devserver issue from provision logs, thus proving that sufficient information is collected for debug. Change-Id: I96dbfc7b001b90e7dc09e1196c0901adf35ba4d8 Reviewed-on: https://gerrit.chromium.org/gerrit/58385 Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org> Tested-by: Alex Miller <milleral@chromium.org> Commit-Queue: Prashanth Balasubramanian <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
52ce11d6291bbbd1bde435a62afcaf364db1b502 |
|
02-Aug-2012 |
Yu-Ju Hong <yjhong@google.com> |
Autotest: Make archiving step configurable and disable it by default. This change makes the archiving step configurable in global_config.ini. The variable "enable_archiving" is disabled by default. The Autotest scheduler performs the archive step after parsing. This step spawns an autoserv process, runs site_archive_results, and executes rsync to copy .archive.log back to cautotest. We do not need this step since all our test results are rsync'd back after running the tests. BUG=chromium-os:33061 TEST=run tests with local autotest setup Change-Id: I1f2aac8f92ebd2a4d10c4bd85be2d111063ad251 Reviewed-on: https://gerrit.chromium.org/gerrit/29056 Commit-Ready: Yu-Ju Hong <yjhong@chromium.org> Reviewed-by: Yu-Ju Hong <yjhong@chromium.org> Tested-by: Yu-Ju Hong <yjhong@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
|
aa5133608fb8ea153fb396f332121b617869dcb7 |
|
02-Mar-2011 |
Dale Curtis <dalecurtis@chromium.org> |
Host scheduler refactoring. Move HostScheduler out of monitor_db. In order to facilitate site extensibility of HostScheduler we need to factor out the dependence on global variables in monitor_db. I modeled this refactoring off of monitor_db_cleanup. The main changes I've made are as follows: 1. Move BaseHostScheduler, site import, and SchedulerError out of monitor_db. SchedulerError must be moved to prevent a cyclical dependency. 2. Convert staticmethod/classmethods in BaseHostScheduler, to normal methods. 3. Fix unit tests and monitor_db to import SchedulerError from host_scheduler. Change-Id: I0c10b79e70064b73121bbb347bb71ba15e0353d1 BUG=chromium-os:12654 TEST=Ran unit tests. Tested with private Autotest instance. Review URL: http://codereview.chromium.org/6597047
/external/autotest/scheduler/monitor_db_functional_test.py
|
b8f3f354dc07aea89a5301522c8a79d394cba79e |
|
10-Jun-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Set host status to RUNNING on QueueTask abort, since queue entry will be in GATHERING state. Also modify a logging string to be more precise. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4591 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
e7c65cbace24181c9bd364569de7e05742b8a162 |
|
08-Jun-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Don't try stopping the job on HQE abort, and have the dispatcher stop all necessary jobs in bulk. This avoids a scheduler crash on an assertion. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4585 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
76fcf19ec42d5c7580d2e7891e4610e5fe725286 |
|
21-Apr-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add ability to associate drone sets with jobs. This restricts a job to running on a specified set of drones. Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4439 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
dd855244f44b65d0508345c6fef74846652c8c26 |
|
02-Mar-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Abstract out common models used in the frontend's models.py so that Django is not required to interact with non-Django portions of the code. This includes the enums RebootBefore, RebootAfter and Test.Type git-svn-id: http://test.kernel.org/svn/autotest/trunk@4280 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
c44ae99354228290914326d42ef1e743b5b7e4b8 |
|
19-Feb-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
Refactor scheduler models into a separate module, scheduler_models. This module doesn't depend on monitor_db, only the other way around. The separation and isolation of dependencies should help us organize the scheduler code a bit better. This was made possible largely by the many changes we made late last year to improve statelessness of the scheduler. It was motivated here by my work on pluggable metahost handlers, which will need to depend on scheduler models. Without this separation, we'd end up with circular dependencies. Also includes some fixes for metahost schedulers. Signed-off-by: Steve Howard <showard@google.com> Property changes on: scheduler/scheduler_models.py git-svn-id: http://test.kernel.org/svn/autotest/trunk@4252 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
883492a628bfe5a24bd281cfcac036d77a2acc4e |
|
12-Feb-2010 |
jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> |
First iteration of pluggable metahost handlers. This change adds the basic framework and moves the default, label-based metahost assignment code into a handler. It includes some refactorings to the basic scheduling code to make things a bit cleaner. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4232 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
5c114c7ba4be43c88f9967e06a9aacdd3861264c |
|
25-Jan-2010 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Fix scheduler functional test for recent change to parse hostless jobs. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4163 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
fd8b89f0117366c9aeaad9b600a43238a84b4ab9 |
|
20-Jan-2010 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
don't set the current user to my_user in frontend_test_utils. let it default to the new autotest_system user. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4156 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
c1a98d1e146080bd3e4f034cb13d740dfb1535f4 |
|
15-Jan-2010 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Support for job keyvals * can be passed as an argument to create_job, stored in AFE DB * scheduler reads them from the AFE DB and writes them to the job-level keyval file before the job starts * parser reads them from the keyval file and writes them to the TKO DB in a new table Since the field name "key" happens to be a MySQL keyword, I went ahead and made db.py support proper quoting of field names. Eventually it'd be really nice to deprecate db.py and use Django models exclusively, but that is a far-off dream. Still lacking support in the AFE and TKO web clients and CLIs; at least the TKO part will be coming soon Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4123 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
12b45582c04b2417036a6f11afc843ac5fddea50 |
|
11-Jan-2010 |
lmr <lmr@592f7852-d20e-0410-864c-8624ca9c26a4> |
Massive permission fix Fix permissions for all the development tree Signed-off-by: Lucas Meneghel Rodrigues <lmr@redhat.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4094 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
4608b005f15444d2ec4601b8274828ad52b5ea51 |
|
05-Jan-2010 |
mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add a new Archiving stage to the scheduler, which runs after Parsing. This stage is responsible for copying results to the results server in a drone setup, a task currently performed directly by the scheduler, and allows for site-specific archiving functionality, replacing the site_parse functionality. It does this by running autoserv with a special control file (scheduler/archive_results.control.srv), which loads and runs code from the new scheduler.archive_results module. The implementation was mostly straightforward, as the archiving stage is fully analogous to the parser stage. I did make a couple of refactorings: * factored out the parser throttling code into a common superclass that the ArchiveResultsTask could share * added some generic flags to Autoserv to duplicate special-case functionality we'd added for the --collect-crashinfo option -- namely, specifying a different pidfile name and specifying that autoserv should allow (and even expect) an existing results directory. In the future, I think it'd be more elegant to make crashinfo collection run using a special control file (as archiving works), rather than a hard-coded command-line option. * moved call to server_job.init_parser() out of the constructor, since this was an easy source of exceptions that wouldn't get logged. Note I believe some of the functional test changes slipped into my previous change there, which is why that looks smaller than you'd expect. Signed-off-by: Steve Howard <showard@google.com> ==== (deleted) //depot/google_vendor_src_branch/autotest/tko/site_parse.py ==== git-svn-id: http://test.kernel.org/svn/autotest/trunk@4070 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
2b38f67cc7c52ea368514bc14f98eded2bc477dd |
|
23-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Add test case for aborting a synchronous job while it's throttled in the Starting state. Was trying to repro a bug. It doesn't repro, indicating that maybe the bug has already been fixed (or maybe this test case is missing something). Either way, it's good to have another test case around. Also fixing a little test bug where we need to mock out a new global config value. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4043 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
78f5b016b5367cb51b1f031b31e3afea6ebd2d74 |
|
23-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Update to Django 1.1.1. I want to use a new feature for my RESTful interface prototyping (direct inclusion of URL patterns in URLconfs). The one obstacle this presented was that Django 1.1.1 changes the DB connection object to accept DB config information in its constructor, rather than reading it from django.conf.settings on-demand. This was a problem because we change stuff in django.conf.settings on the fly to do our fancy test DB stuff -- basically, we initialize a SQLite DB once, copy it off, and then copy it between test cases, rather than clearing and reconstructing the initial DB. I did measurements and it turns out all that jazz wasn't really saving us much time at all, so I just got rid of it all. Django's testing stuff has improved and v1.1 even has some new tricks for using transactions to accomplish the above with a dramatic speedup, so we ought to look into using that in the future. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4041 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
493beaab73a8a87a3ce8d7f47b7ce92417b04fbd |
|
18-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
fix a bug with pre-job keyvals, introduced in recent refactorings, and added new test to check it Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4020 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
a9545c0ab3d8f3e36efadaefdcf37393708666d9 |
|
18-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
backend support for hostless jobs * support in rpc_interface.create_job() and models for creating a hostless job -- a job with one queue entry with no host, meta_host or atomic_group * support in scheduler for recognizing and executing such a job. the bulk of the work was in extracting an AbstractQueueTask class from QueueTask, containing all the logic not pertaining to hosts. I then added a simple HostlessQueueTask class also inheriting from it. Also got rid of HostQueueEntry.get_host() and added an extra log line when AgentTasks finish (used to be for QueueTasks only). Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4018 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
2ca64c940277d6ee38a084dc71fa8d3003aedddf |
|
10-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
* add a couple simple test cases to the scheduler functional test for metahosts * augment one of the logging lines in the scheduler Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4009 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
d11956572cb7a5c8e9c588c9a6b4a0892de00384 |
|
08-Dec-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Make drone_manager track running processes counts using only the information passed in from the scheduler. Currently it also uses process counts derived from "ps", but that is an unreliable source of information. This improves accuracy and consistency and gives us full control over the process. This involves a few primary changes: * made the drone_manager track process counts with each PidfileId * added method declare_process_count() for the scheduler to indicate the process count of a pidfile ID during recovery (in other cases, the DroneManager gets that info in execute_process()) Doing this involved some extensive refactorings. Because the scheduler now needs to declare process counts during recovery, and because the AgentTasks are the entities that know about process counts, it made sense to move the bulk of the recovery process to the AgentTasks. Changes for this include: * converted a bunch of AgentTask instance variables to abstract methods, and added overriding implementations in subclasses as necessary * added methods register_necessary_pidfiles() and recover() to AgentTasks, allowing them to perform recovery for themselves. got rid of the recover_run_monitor() argument to AgentTasks as a result. * changed recovery code to delegate most of the work to the AgentTasks. The flow now looks like this: create all AgentTasks, call them to register pidfiles, call DroneManager to refresh pidfile contents, call AgentTasks to recover themselves, perform extra cleanup and error checking. This simplified the Dispatcher somewhat, in my opinion, though there's room for more simplification. Other changes include: * removed DroneManager.get_process_for(), which was unused, as well as related code (include the DroneManager._processes structure) * moved logic from HostQueueEntry.handle_host_failure to SpecialAgentTask._fail_queue_entry. That was the only call site. 
And some other bug fixes: * eliminated some extra state from QueueTask * fixed models.HostQueueEntry.execution_path(). It was returning the wrong value, but it was never used. * eliminated some big chunks from monitor_db_unittest. These broke from the refactorings described above and I deemed it not worthwhile to fix them up for the new code. I checked and the total coverage was unaffected by deleting these chunks. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@4007 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
418785bf16a0cb72a5fe5519e8693d7546cd427d |
|
23-Nov-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Some improvements to process tracking in the scheduler. * have all AgentTasks declare how many processes they'll create (as an instance attribute). this is really where the information belongs. * have Agent read its num_processes from its AgentTask, rather than requiring clients to pass it into the constructor. * have AgentTasks pass this num_processes value into the DroneManager when executing commands, and have the DroneManager use this value rather than the hack of parsing it out of the command line. this required various changes to the DroneManager code which actually fix some small bugs and make the code cleaner in my opinion. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3971 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
9bb960b90d5102cce1c8a15314900035c6c4e69a |
|
19-Nov-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Support restricting access to drones by user. Administrators can put lines like <hostname>_users: showard,scottz in the global config, where <hostname> is a drone hostname. That drone will then be limited to use by those users (that is, by jobs launched by those users, and tasks launched due to those jobs). This required numerous changes: * added a requested_by field to SpecialTask (with corresponding migration). For tasks with queue_entries, we can infer this from the job, but for those without, we need this information explicitly declared. Note this can be null if the task was created by the system, not in response to any user action. The only place this occurs now is in scheduler recovery (Dispatcher._recover_hosts_where()), but there may be an upcoming feature to periodically reverify hosts, which would be another (much more common) case. * modified all SpecialTask creation sites to pass requested_by if necessary. * modified AgentTask to keep a username attribute, and modified its run() method to pass that to PidfileRunMonitor.run(), which passes it along to DroneManager.execute_command(). * modified Agent to always keep self.task around, there's no reason to throw it away and now that we're looking at it from other classes, it's problematic if it disappears. * modified Dispatcher throttling code to pass the username when requesting max runnable processes. * added an allowed_users property to _AbstractDrone, and made DroneManager load it from the global config. * made DroneManager's max_runnable_processes() and _choose_drone_for_execution() methods accept the username and obey user restrictions. * added extensive tests for everything. the modifications required to monitor_db_unittest were annoying but not too bad. but parts of that file may need to be removed as they'll be obsoleted by monitor_db_functional_test and they'll become increasingly annoying to maintain. couple other related changes: * got rid of CleanupHostsMixin. 
it was only acutally needed by GatherLogsTasks (since we made the change to have GatherLogsTask always run), so I inlined it there and simplified code accordingly. * changed a bunch of places in the scheduler that were constructing new instances of Django models for existing rows. they would do something like "models.Host(id=<id of existing host>)". that's correct for scheduler DBModels, but not for Django models. For Django models, you only instantiate new instances when you want to create a new row. for fetching existing rows you always use a manager -- Model.objects.get() or Model.objects.filter() etc. this was an existing bug but wasn't exposed until I made some of the changes involved in this feature. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3961 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
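The per-drone user restriction above reduces to a little config parsing plus a filter over candidate drones. A minimal sketch, assuming a plain dict stands in for the global config; parse_drone_users and usable_drones are illustrative names, not autotest's actual API:

```python
def parse_drone_users(global_config):
    """Map drone hostname -> set of allowed users, from '<hostname>_users' keys."""
    allowed = {}
    for key, value in global_config.items():
        if key.endswith('_users'):
            hostname = key[:-len('_users')]
            allowed[hostname] = {user.strip() for user in value.split(',')}
    return allowed


def usable_drones(drones, allowed_users, username):
    """A drone with no '<hostname>_users' entry remains open to everyone."""
    return [d for d in drones
            if d not in allowed_users or username in allowed_users[d]]
```

Restricted drones simply drop out of the candidate list DroneManager would choose from for that user's processes.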
|
a21b949e60c56bc854ff6a2cc373b7bbf63d7fec |
|
04-Nov-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Added functional test for recovering jobs with atomic hosts, with HQEs in Pending Signed-off-by: James Ren <jamesren@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3896 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
65db39368167dab1730703be3d347581527f70da |
|
28-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
* impose prioritization on SpecialTasks based on task type: Repair, then Cleanup, then Verify. Remove the prioritization of STs with a queue entry over those without. This leads to more sane ordering of execution in certain unusual contexts -- the added functional test cases illustrate a few (in some cases it's not just more sane, it eliminates bugs as well).
* block STs from running on hosts with active HQEs, unless the ST is linked to the HQE. This is a good check in general, but it specifically prevents a bug where a requested reverify could run on a host in Pending. There's a functional test case for that too.
* block jobs from running on hosts with active agents, and let special tasks get scheduled before new jobs in each tick. This is necessary for some cases after removing the above-mentioned prioritization of STs with HQEs; otherwise, for example, a job could get scheduled before a previous post-job cleanup has run. (New test cases cover this as well.)

Signed-off-by: Steve Howard <showard@google.com>

git-svn-id: http://test.kernel.org/svn/autotest/trunk@3890 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
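The two policies above -- task-type ordering and the active-HQE block -- can be sketched with a priority map and a linkage check. This illustrates the policy only; the real scheduler implements it through database queries and agents, and these helper names are hypothetical:

```python
# Repair outranks Cleanup, which outranks Verify.
TASK_PRIORITY = {'Repair': 0, 'Cleanup': 1, 'Verify': 2}


def order_special_tasks(tasks):
    """Sort pending SpecialTasks by type alone; queue-entry linkage no
    longer affects ordering."""
    return sorted(tasks, key=lambda task: TASK_PRIORITY[task['task']])


def can_run_on_host(task, active_hqe_id):
    """Block a task on a host with an active HQE unless the task is
    linked to that HQE."""
    if active_hqe_id is None:
        return True
    return task.get('queue_entry') == active_hqe_id
```

Under this check, a requested reverify (no linked queue entry) cannot fire on a host whose HQE is in Pending.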
|
7b2d7cbcc28ea6a19554ecc3043b68103e7ab7e9 |
|
28-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
We never considered the handling of DO_NOT_VERIFY hosts in certain situations. This adds handling of those cases to the scheduler and adds tests to the scheduler functional test. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3885 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
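One way to picture the DO_NOT_VERIFY handling: pre-job task selection has to skip Verify entirely when the host carries that protection level. A hypothetical sketch -- pre_job_tasks, its parameters, and the Cleanup-before-Verify ordering are assumptions for illustration, not the scheduler's actual code:

```python
# Label mirrors autotest's host-protection level for hosts that must
# never be verified.
DO_NOT_VERIFY = 'Do not verify'


def pre_job_tasks(host_protection, job_wants_cleanup):
    """Return the ordered pre-job SpecialTasks for a host."""
    tasks = []
    if job_wants_cleanup:
        tasks.append('Cleanup')
    if host_protection != DO_NOT_VERIFY:
        tasks.append('Verify')
    return tasks
```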
|
4a60479e3f9a1576e9212a50ad78c6d55be1b97f |
|
21-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
add a bunch of tests to the scheduler functional test to cover pre- and post-job cleanup, including failure cases Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3871 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
|
b89004580c267ec12da4f181c76cbc3ec902037d |
|
12-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
In scheduler recovery, allow Running HQEs with no process. The tick code already handles them fine (by re-executing autoserv), but the recovery code was explicitly disallowing them. With this change, it turns out there's only one status that's not allowed to go unrecovered -- Verifying -- so I changed the code to reflect that and made the failure conditions more accurate.

Tested this change with extensions to the new functional test. We could never really effectively test recovery code with the unit tests, but it's pretty easy and very effective (I believe) with the new test.

Signed-off-by: Steve Howard <showard@google.com>

git-svn-id: http://test.kernel.org/svn/autotest/trunk@3824 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
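The recovery rule described above boils down to: an entry with a recovered process is fine; without one, every status except Verifying can be re-executed on the next tick. A sketch under those assumptions (recover_entry and the return values are illustrative, not monitor_db's actual interface):

```python
# Only Verifying HQEs are required to have a recoverable process.
UNRECOVERABLE_WITHOUT_PROCESS = {'Verifying'}


def recover_entry(status, has_process):
    """Decide what scheduler recovery does with one HQE."""
    if has_process:
        return 'recovered'
    if status in UNRECOVERABLE_WITHOUT_PROCESS:
        raise RuntimeError(
            'HQE in %s with no process cannot be recovered' % status)
    # e.g. a Running HQE with no process: the tick code simply
    # re-executes autoserv for it.
    return 're-execute'
```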
|
f85a0b7b456fc60605f09cd16e95167feeba9c5a |
|
07-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Explicitly release pidfiles after we're done with them. This does it in a kind of lazy way, but it should work just fine. Also extended the new scheduler functional test with a few more cases and added a test to check pidfile release under these various cases. In the process, I changed how some of the code works to allow the tests to more cleanly express their intentions. Signed-off-by: Steve Howard <showard@google.com> git-svn-id: http://test.kernel.org/svn/autotest/trunk@3804 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
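A lazy release scheme like the one described can be pictured as a registry that queues finished pidfile ids and frees them in a batch on a later tick, rather than releasing each one at the moment it finishes. The class and method names here are hypothetical, a sketch of the idea only:

```python
class PidfileRegistry:
    """Tracks pidfile ids and releases finished ones lazily."""

    def __init__(self):
        self.active = set()
        self._to_release = []

    def register(self, pidfile_id):
        self.active.add(pidfile_id)

    def mark_done(self, pidfile_id):
        # Queue for release; nothing is freed yet.
        self._to_release.append(pidfile_id)

    def release_pending(self):
        """Called once per scheduler tick; returns how many were freed."""
        for pidfile_id in self._to_release:
            self.active.discard(pidfile_id)
        released, self._to_release = len(self._to_release), []
        return released
```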
|
34ab09918228040aa244f7f258280e59c6baca3a |
|
06-Oct-2009 |
showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> |
Beginnings of a new scheduler functional test. This aims to test the entire monitor_db.py file holistically, made possible by the fact that monitor_db.py is already isolated from all direct system access through drone_manager (this was a necessary separation for distributed scheduling). By mocking out the entire drone_manager, as well as other major dependencies (email manager, global config), and filling a test database, we can allow the dispatcher to execute normally and let it interact with all the other code in monitor_db. At the end, we can check the state of the database and the drone_manager, and (probably most importantly, given the usual failure mode of the scheduler) we can ensure no exceptions get raised from monitor_db.

Right now, the test is very minimal. It's able to walk through the process of executing a normal, default job with nothing unusual arising. It checks very little, other than ensuring that the scheduler doesn't die from any exceptions. Over time it can be extended to test many more cases and check many more things; this is just a start.

As a side note, I added a "django" backend to database_connection.py which just uses the Django connection wrapper. This is preferable because Django has lots of nice translations already in place. For example, SQLite returns datetime columns as strings, but Django's wrapper automatically translates them into real datetime objects. We could probably just use this in lots of places, such as the frontend and scheduler unit tests, but it was necessary here.

Signed-off-by: Steve Howard <showard@google.com>

git-svn-id: http://test.kernel.org/svn/autotest/trunk@3801 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
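The SQLite point in the side note is easy to demonstrate with the standard sqlite3 module: by default a datetime column comes back as a plain string, while opting into decltype-based converters gives the kind of automatic translation the Django wrapper performs. A small self-contained illustration (table and column names are invented for the demo):

```python
import datetime
import sqlite3

# By default, sqlite3 hands datetime columns back as plain strings.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE jobs (created DATETIME)')
conn.execute('INSERT INTO jobs VALUES (?)',
             (datetime.datetime(2009, 10, 6, 12, 0),))
raw = conn.execute('SELECT created FROM jobs').fetchone()[0]
# raw is the string '2009-10-06 12:00:00', not a datetime object

# Opting into decltype-based converters restores real datetime objects,
# which is roughly what Django's connection wrapper does for you.
conn2 = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
conn2.execute('CREATE TABLE jobs (created TIMESTAMP)')
conn2.execute('INSERT INTO jobs VALUES (?)',
              (datetime.datetime(2009, 10, 6, 12, 0),))
val = conn2.execute('SELECT created FROM jobs').fetchone()[0]
# val is a real datetime.datetime
```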
|