History log of /external/autotest/scheduler/monitor_db_functional_test.py
Revision    Date    Author    Comments
5cca8180477b25ab2b83861cbaaeb1fb35fd931d 20-Nov-2017 Allen Li <ayatane@chromium.org> [autotest] Use JobHandoff for job_aborter tracking

This handles all cases of job_reporter failure after spawning.

The main loophole in the previous model was that job_reporter could crash
before creating the lease file, since job_aborter relied on expired leases
to determine whether cleanup was necessary.

We now rely on JobHandoff to track ongoing jobs.

The lease file is still kept to determine if a job still has an active
job_reporter owning it.

Previously we cared only about expired lease files; now we care only
about active lease files combined with JobHandoff.
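
A minimal sketch of the resulting job_aborter check; only JobHandoff and
the lease files are named in this message, so the helper below is
hypothetical:

    # Sketch: a job needs cleanup when it has a JobHandoff row but no
    # active lease file, i.e. no live job_reporter owns it anymore.
    def find_abandoned_jobs(handoffs, jobs_with_active_leases):
        return [handoff for handoff in handoffs
                if handoff.job_id not in jobs_with_active_leases]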

BUG=chromium:748234
TEST=bin/test_lucifer
TEST=bin/job_aborter without crashing

Change-Id: I97e8b53f2fbd33d41a2b567e2797de8811a36e88
Reviewed-on: https://chromium-review.googlesource.com/780768
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
6285198f32ff77aed80f78f488c30bfb18d24365 18-Nov-2017 Xixuan Wu <xixuan@chromium.org> autotest: schedule RESET task for DUTs that fail jobs.

BUG=chromium:782501
TEST=Ran server & client job on local autotest.
Ran unittest.

Change-Id: Iaa87d24b8b1d873279b8fc127b8bcdad46c4059c
Reviewed-on: https://chromium-review.googlesource.com/780394
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>
/external/autotest/scheduler/monitor_db_functional_test.py
3df0b2bf7e77ee4aecc629b40365d489f9e4cdc9 17-May-2017 Allen Li <ayatane@chromium.org> [autotest] Skip archiving after parsing

Normally, PARSING -> ARCHIVING -> COMPLETED/FAILED

but all ARCHIVING does is set the status. So cut out the middleman and
set the final status immediately.
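
A hedged sketch of that shortcut (the status flow is quoted above;
set_status() as the setter is an assumption):

    # Sketch: after parsing completes, write the terminal status directly
    # instead of bouncing through an ARCHIVING step that only sets status.
    def finish_after_parse(queue_entry, success):
        queue_entry.set_status('Completed' if success else 'Failed')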

BUG=chromium:699275
TEST=None

Change-Id: I6d3d69a1e5aa9167405d301fa1e0af382f9c2b4e
Reviewed-on: https://chromium-review.googlesource.com/507020
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
0f261debdbba7664ef7b0eacbac0e7daba89eebd 02-Feb-2017 Allen Li <ayatane@chromium.org> [autotest] [atomic] Remove atomic groups from scheduler

BUG=chromium:681906
TEST=Run unittest suite

Change-Id: If9c144aae8d2a8df567a5a03b02bc3fec5d14c0d
Reviewed-on: https://chromium-review.googlesource.com/435565
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>
/external/autotest/scheduler/monitor_db_functional_test.py
c1f41e806039e541b152516058eea535687e07a4 06-Jan-2017 Prathmesh Prabhu <pprabhu@chromium.org> Move FakeGlobalConfig to the global_config module.

This CL moves a fake object for global_config from a unittest module to
the global_config module so that other unittests can use it too.

BUG=None.
TEST=python scheduler/monitor_db_functional_test.py

Change-Id: If35ee9db0ef332ce890920b365da93075af4737a
Reviewed-on: https://chromium-review.googlesource.com/425720
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
6a46f0bff029acbb98a11bde0d737fb78b5f8b77 21-Dec-2016 Prathmesh Prabhu <pprabhu@chromium.org> [scheduler] Mock out another config value for functional test.

Currently, if a developer sets inline_host_acquisition=False in their
local shadow_config, this functional test fails. Make the test more
hermetic by mocking out this value.
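
A minimal sketch of that mocking, assuming a fake-config helper named
set_config_value() (the helper name is not confirmed by this log):

    # Sketch: pin the value during test setup so a developer's local
    # shadow_config cannot change the scheduler's behavior under test.
    def mock_scheduler_config(fake_config):
        fake_config.set_config_value(
                'SCHEDULER', 'inline_host_acquisition', True)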

BUG=None
TEST=Functional test finally passes with my modified shadow_config.

Change-Id: Id57bd345ea5cb9e1d0c78c39b75f90102354b4aa
Reviewed-on: https://chromium-review.googlesource.com/422655
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
303d2669b78eb5a2ebd6c41dce80ca18d06a19cf 02-Jul-2016 Laurence Goodby <lgoodby@google.com> [autotest] Scheduler fix for SYNC_COUNT > 1.

Details in go/autotest-sync-count-fix

BUG=chromium:621257
TEST=Run included tests.

Change-Id: If08df8fb04771a321dbdf2122b885935e7ef3b41
Reviewed-on: https://chromium-review.googlesource.com/358092
Commit-Ready: Laurence Goodby <lgoodby@chromium.org>
Tested-by: Laurence Goodby <lgoodby@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>
Reviewed-by: Dan Shi <dshi@google.com>
/external/autotest/scheduler/monitor_db_functional_test.py
f47a6bbb9971efd228eaa22425431b91fa9f69bf 29-Aug-2014 Prashanth B <beeps@chromium.org> Revert "[autotest] Restore from inconsistent state after the scheduler was interrupted."

This reverts commit b7c842f8c8ba135bb03a0862ac0c880d3158bf07.

Change-Id: I8d34329b8a2771eb4068ab50414c9eac6fd73d3f
Reviewed-on: https://chromium-review.googlesource.com/215612
Reviewed-by: Prashanth B <beeps@chromium.org>
Commit-Queue: Prashanth B <beeps@chromium.org>
Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
b7c842f8c8ba135bb03a0862ac0c880d3158bf07 24-Jul-2014 Jakob Juelich <jakobjuelich@google.com> [autotest] Restore from inconsistent state after the scheduler was interrupted.

If the scheduler assigns hosts to HQEs but hasn't set an execution_subdir yet,
an exception is thrown. With this change, the database will be cleaned up once,
when the scheduler starts. Jobs that are in an inconsistent state will simply
be reset so they can be scheduled again.
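
A hedged sketch of that startup cleanup, with an assumed import path and
assumed status values:

    # Sketch: queue entries that got a host but never an execution_subdir
    # were interrupted mid-scheduling; reset them so they run again.
    from autotest_lib.frontend.afe import models

    def reset_inconsistent_entries():
        stuck = models.HostQueueEntry.objects.filter(
                status='Starting', execution_subdir='', host__isnull=False)
        stuck.update(status='Queued', host=None)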

BUG=chromium:334353
DEPLOY=scheduler
TEST=Ran utils/unittest_suite.py and manually set db into inconsistent state.

Change-Id: I96cc5634ae5120beab59b160e735245be736ea92
Reviewed-on: https://chromium-review.googlesource.com/209635
Tested-by: Jakob Jülich <jakobjuelich@chromium.org>
Reviewed-by: Prashanth B <beeps@chromium.org>
Commit-Queue: Jakob Jülich <jakobjuelich@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
36accc6a2a572e9d502407b34701f535a169f524 23-Jul-2014 Jakob Jülich <jakobjuelich@google.com> [autotest] Fixing and re-enabling monitor_db_functional_test.

The test was disabled and outdated; database access and mocking of the drone
manager had changed in the meantime. This fixes those issues, updates the unit
tests to the current state of the code, and re-enables them.

BUG=chromium:395756
DEPLOY=scheduler
TEST=ran ./utils/unittest_suite.py

Change-Id: I6a3eda5ddfaf07f06d6b403692b004b22939ffb6
Reviewed-on: https://chromium-review.googlesource.com/209567
Reviewed-by: Alex Miller <milleral@chromium.org>
Tested-by: Jakob Jülich <jakobjuelich@google.com>
Commit-Queue: Jakob Jülich <jakobjuelich@google.com>
/external/autotest/scheduler/monitor_db_functional_test.py
4ec9867f46deb969c154bebf2e64729d56c3a1d3 15-May-2014 Prashanth B <beeps@google.com> [autotest] Split host acquisition and job scheduling II.

This CL creates a stand-alone service capable of acquiring hosts for
new jobs. The host scheduler will be responsible for assigning a host to
a job and scheduling its first special tasks (to reset and provision the host).
Thereafter, the special tasks will either change the state of a host or
schedule more tasks against it (e.g. repair), until the host is ready to
run the job associated with the Host Queue Entry to which it was
assigned. The job scheduler (monitor_db) will only run jobs, including the
special tasks created by the host scheduler.

Note that the host scheduler won't go live until we flip the
inline_host_acquisition flag in the shadow config and restart both
services. The host scheduler is dead, long live the host scheduler.

TEST=Ran the schedulers, created suites. Unittests.
BUG=chromium:344613, chromium:366141, chromium:343945, chromium:343937
CQ-DEPEND=CL:199383
DEPLOY=scheduler, host-scheduler

Change-Id: I59a1e0f0d59f369e00750abec627b772e0419e06
Reviewed-on: https://chromium-review.googlesource.com/200029
Reviewed-by: Prashanth B <beeps@chromium.org>
Tested-by: Prashanth B <beeps@chromium.org>
Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
04be2bd5e4666a5c253e9c30ab20555e04286032 08-May-2014 Ilja H. Friedel <ihf@chromium.org> Autotest: Change logging.warn() to logging.warning().

logging.warn() is deprecated. See
http://bugs.python.org/issue13235

Substitution was performed via
~/cros/src/third_party/autotest/files$ find ./ -type f | xargs sed -i 's/logging.warn(/logging.warning(/'
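
For reference, the preferred spelling after the substitution:

    import logging

    # logging.warning() is the documented name; logging.warn() is a
    # deprecated alias of it (see the bug linked above).
    logging.warning('scheduler tick took %.1fs', 12.3)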

BUG=None.
TEST=There should be one-- and preferably only one --obvious way to do it.

Change-Id: Ie5665743121a49f7fbd5d1f47896a7c65e87e489
Reviewed-on: https://chromium-review.googlesource.com/198793
Commit-Queue: Ilja Friedel <ihf@chromium.org>
Tested-by: Ilja Friedel <ihf@chromium.org>
Reviewed-by: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
cc9fc70587d37775673e47b3dcb4d6ded0c6dcb4 02-Dec-2013 beeps <beeps@chromium.org> [autotest] RDB Refactor II + Request/Response API.

Scheduler Refactor:
1. Batched processing of jobs.
2. Rdb hits the database instead of going through host_scheduler.
3. Migration to add a leased column. The scheduler releases hosts back
   to the rdb every tick.
4. Client rdb host that queue_entries use to track a host, instead
   of a database model.

Establishes a basic request/response API for the rdb (a toy sketch
follows this list):
rdb_utils:
1. Requests: Assert the format and fields of some basic request types.
2. Helper client/server modules to communicate with the rdb.
rdb_lib:
1. Request managers for rdb methods:
   a. Match requests to responses.
   b. Abstract the batching of requests.
2. JobQueryManager: Regulates database access for job information.
rdb:
1. QueryManagers: Regulate database access.
2. RequestHandlers: Use query managers to get things done.
3. Dispatchers: Send incoming requests to the appropriate handlers,
   ignoring wire formats.
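
A toy sketch of the dispatcher shape described above (class and method
names assumed):

    # Sketch: route incoming rdb requests to handlers by request type,
    # independent of any wire format.
    class Dispatcher(object):
        def __init__(self, handlers):
            self._handlers = handlers  # e.g. {'acquire_hosts': handler}

        def dispatch(self, request_type, payload):
            return self._handlers[request_type](payload)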

TEST=unittests, functional verification.
BUG=chromium:314081, chromium:314083, chromium:314084
DEPLOY=scheduler, migrate

Change-Id: Id174c663c6e78295d365142751053eae4023116d
Reviewed-on: https://chromium-review.googlesource.com/183385
Reviewed-by: Prashanth B <beeps@chromium.org>
Commit-Queue: Prashanth B <beeps@chromium.org>
Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
1a18905776c0a53e2a169f61dbf5bdad3bd0cb74 28-Oct-2013 Dan Shi <dshi@chromium.org> [autotest] revert suite throttling

Undo CL:
https://chromium-review.googlesource.com/#/c/167175
but keep the stats call and _notify_process_limit_hit.

BUG=chromium:
TEST=suite run in local setup, unittest
DEPLOY=scheduler

Change-Id: I713b69651fabfb8cbb4f9c1ca3a8605900753bc9
Reviewed-on: https://chromium-review.googlesource.com/174896
Commit-Queue: Dan Shi <dshi@chromium.org>
Reviewed-by: Dan Shi <dshi@chromium.org>
Tested-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
6ee996fd9252ae7d0c56d2daa6cf085c3e02358c 28-Feb-2013 Alex Miller <milleral@chromium.org> [autotest] Fix scheduler unittesting framework.

They're alive! Most of the fixes here are to adapt to the fact that we
no longer default to running a cleanup post-job, and that any
verify+cleanup before a job gets merged into a reset.

There's also some mocking out of global_config because extended use of
the scheduler_config forces us to provide a mocked value for any setting
touched in a test.

BUG=chromium:305072
DEPLOY=scheduler
TEST=this?

Change-Id: Ie7ada591c31766f30647fd6c4ba151e5dd0d1003
Reviewed-on: https://chromium-review.googlesource.com/58380
Reviewed-by: Dan Shi <dshi@chromium.org>
Tested-by: Alex Miller <milleral@chromium.org>
Reviewed-by: Alex Miller <milleral@chromium.org>
Commit-Queue: Alex Miller <milleral@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
dfff2fdc8477be3ba89fd915fde2afe8d3716624 28-May-2013 Alex Miller <milleral@chromium.org> [autotest] Add a provision special task.

We now insert a special task which calls |autoserv --provision| with the
host that the HQE is about to run on to provision the machine correctly
before the test runs. If the provisioning fails, the HQE will also be
marked as failed. No provisioning special task will be queued if no
provisioning needs to be done to the host before the job can/will run.

With *just* this CL, no provisioning tasks should actually get
scheduled, because the part of the scheduler that maps HQEs to hosts
hasn't been taught about provisioning yet. That will come in a later
CL.

Once this CL goes in, it should not be reverted. The scheduler will
become very unhappy if it sees special tasks in its database, but can't
find a corresponding AgentTask definition for them. One would need to
do manual database cleanup to revert this CL. However, since one can
disable provisioning by reverting the (future) scheduling change CL,
this shouldn't be an issue.
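
As a hedged illustration of the rule that no task is queued when nothing
needs provisioning (helper and argument names are assumptions):

    # Sketch: a provision task is needed only when some provisionable
    # dependency (e.g. a cros-version:* label) is missing from the host.
    def needs_provision(host_labels, provisionable_deps):
        return not set(provisionable_deps).issubset(set(host_labels))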

BUG=chromium:249437
DEPLOY=scheduler
TEST=lots:
* Ran a job on a host with a non-matching cros-version:* label, and
a provision special task was correctly created. It ran after Reset,
and correctly kicked off the HQE after it finished.
* Ran a job on a host with a matching cros-version:* label, and no
provision special task was created.
* Ran a job on a host with a non-matching cros-version:* label, and
modified Reset so that it would fail. When reset failed, it canceled
the provision task, and the HQE was still rescheduled.
* Ran a job on a host with a non-matching cros-version:* label, and
modified the cros-version provisioning test to throw an exception.
The provision special task aborted the HQE with the desired semantics
(see comments in the ProvisionTask class in monitor_db), and scheduled
a repair to run after its failure.
The provision failures were all deduped against each other when bug
filing was enabled. See
https://code.google.com/p/autotest-bug-filing-test/issues/detail?id=1678
* Successfully debugged an autoupdate/devserver issue from provision
logs, thus proving that sufficient information is collected for debug.

Change-Id: I96dbfc7b001b90e7dc09e1196c0901adf35ba4d8
Reviewed-on: https://gerrit.chromium.org/gerrit/58385
Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org>
Tested-by: Alex Miller <milleral@chromium.org>
Commit-Queue: Prashanth Balasubramanian <beeps@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
52ce11d6291bbbd1bde435a62afcaf364db1b502 02-Aug-2012 Yu-Ju Hong <yjhong@google.com> Autotest: Make archiving step configurable and disable it by default.

This change makes the archiving step configurable in
global_config.ini. The variable "enable_archiving" is disabled by
default.

The Autotest scheduler performs the archive step after parsing. This
step spawns an autoserv process, runs site_archive_results, and executes
rsync to copy .archive.log back to cautotest. We do not need this step
since all our test results are rsync'd back after running the tests.
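
A sketch of the resulting gate; enable_archiving comes from the message
above, while the config section and accessor are assumptions:

    # Sketch: consult the global config before spawning the archive step.
    from autotest_lib.client.common_lib import global_config

    def archiving_enabled():
        return global_config.global_config.get_config_value(
                'SCHEDULER', 'enable_archiving', type=bool, default=False)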

BUG=chromium-os:33061
TEST=run tests with local autotest setup

Change-Id: I1f2aac8f92ebd2a4d10c4bd85be2d111063ad251
Reviewed-on: https://gerrit.chromium.org/gerrit/29056
Commit-Ready: Yu-Ju Hong <yjhong@chromium.org>
Reviewed-by: Yu-Ju Hong <yjhong@chromium.org>
Tested-by: Yu-Ju Hong <yjhong@chromium.org>
/external/autotest/scheduler/monitor_db_functional_test.py
aa5133608fb8ea153fb396f332121b617869dcb7 02-Mar-2011 Dale Curtis <dalecurtis@chromium.org> Host scheduler refactoring. Move HostScheduler out of monitor_db.

In order to facilitate site extensibility of HostScheduler we need to factor out the dependence on global variables in monitor_db. I modeled this refactoring off of monitor_db_cleanup.

The main changes I've made are as follows:
1. Move BaseHostScheduler, site import, and SchedulerError out of monitor_db. SchedulerError must be moved to prevent a cyclical dependency.
2. Convert staticmethod/classmethods in BaseHostScheduler, to normal methods.
3. Fix unit tests and monitor_db to import SchedulerError from host_scheduler.

Change-Id: I0c10b79e70064b73121bbb347bb71ba15e0353d1

BUG=chromium-os:12654
TEST=Ran unit tests. Tested with private Autotest instance.

Review URL: http://codereview.chromium.org/6597047
/external/autotest/scheduler/monitor_db_functional_test.py
b8f3f354dc07aea89a5301522c8a79d394cba79e 10-Jun-2010 jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> Set host status to RUNNING on QueueTask abort, since queue entry will be in
GATHERING state. Also modify a logging string to be more precise.

Signed-off-by: James Ren <jamesren@google.com>



git-svn-id: http://test.kernel.org/svn/autotest/trunk@4591 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
e7c65cbace24181c9bd364569de7e05742b8a162 08-Jun-2010 jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> Don't try stopping the job on HQE abort, and have the dispatcher stop all
necessary jobs in bulk. This avoids a scheduler crash on an assertion.

Signed-off-by: James Ren <jamesren@google.com>



git-svn-id: http://test.kernel.org/svn/autotest/trunk@4585 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
76fcf19ec42d5c7580d2e7891e4610e5fe725286 21-Apr-2010 jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> Add ability to associate drone sets with jobs. This restricts a job to
running on a specified set of drones.

Signed-off-by: James Ren <jamesren@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4439 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
dd855244f44b65d0508345c6fef74846652c8c26 02-Mar-2010 jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> Abstract out common models used in the frontend's models.py so that django is not required to interact with non Django portions of the code.

This includes the enums RebootBefore, RebootAfter and Test.Type


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4280 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
c44ae99354228290914326d42ef1e743b5b7e4b8 19-Feb-2010 jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> Refactor scheduler models into a separate module, scheduler_models. This module doesn't depend on monitor_db, only the other way around. The separation and isolation of dependencies should help us organize the scheduler code a bit better.

This was made possible largely by the many changes we made late last year to improve statelessness of the scheduler. It was motivated here by my work on pluggable metahost handlers, which will need to depend on scheduler models. Without this separation, we'd end up with circular dependencies.

Also includes some fixes for metahost schedulers.

Signed-off-by: Steve Howard <showard@google.com>




git-svn-id: http://test.kernel.org/svn/autotest/trunk@4252 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
883492a628bfe5a24bd281cfcac036d77a2acc4e 12-Feb-2010 jamesren <jamesren@592f7852-d20e-0410-864c-8624ca9c26a4> First iteration of pluggable metahost handlers. This change adds the basic framework and moves the default, label-based metahost assignment code into a handler. It includes some refactorings to the basic scheduling code to make things a bit cleaner.

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4232 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
5c114c7ba4be43c88f9967e06a9aacdd3861264c 25-Jan-2010 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> Fix scheduler functional test for recent change to parse hostless jobs.

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4163 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
fd8b89f0117366c9aeaad9b600a43238a84b4ab9 20-Jan-2010 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> don't set the current user to my_user in frontend_test_utils. let it default to the new autotest_system user.

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4156 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
c1a98d1e146080bd3e4f034cb13d740dfb1535f4 15-Jan-2010 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> Support for job keyvals
* can be passed as an argument to create_job, stored in AFE DB
* scheduler reads them from the AFE DB and writes them to the job-level keyval file before the job starts
* parser reads them from the keyval file and writes them to the TKO DB in a new table

Since the field name "key" happens to be a MySQL keyword, I went ahead and made db.py support proper quoting of field names. Eventually it'd be really nice to deprecate db.py and use Django models exclusively, but that is a far-off dream.
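
A minimal sketch of that quoting (MySQL backtick style; the helper name
is hypothetical):

    # Sketch: quote column names so reserved words like "key" are legal
    # in generated SQL.
    def quote_field(name):
        return '`%s`' % name.replace('`', '``')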

Still lacking support in the AFE and TKO web clients and CLIs; at least the TKO part will be coming soon.

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4123 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
12b45582c04b2417036a6f11afc843ac5fddea50 11-Jan-2010 lmr <lmr@592f7852-d20e-0410-864c-8624ca9c26a4> Massive permission fix

Fix permissions for all the development tree

Signed-off-by: Lucas Meneghel Rodrigues <lmr@redhat.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4094 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
4608b005f15444d2ec4601b8274828ad52b5ea51 05-Jan-2010 mbligh <mbligh@592f7852-d20e-0410-864c-8624ca9c26a4> Add a new Archiving stage to the scheduler, which runs after Parsing. This stage is responsible for copying results to the results server in a drone setup, a task currently performed directly by the scheduler, and allows for site-specific archiving functionality, replacing the site_parse functionality. It does this by running autoserv with a special control file (scheduler/archive_results.control.srv), which loads and runs code from the new scheduler.archive_results module. The implementation was mostly straightfoward, as the archiving stage is fully analogous to the parser stage. I did make a couple of refactorings:
* factored out the parser throttling code into a common superclass that the ArchiveResultsTask could share
* added some generic flags to Autoserv to duplicate special-case functionality we'd added for the --collect-crashinfo option -- namely, specifying a different pidfile name and specifying that autoserv should allow (and even expect) an existing results directory. In the future, I think it'd be more elegant to make crashinfo collection run using a special control file (as archiving works), rather than a hard-coded command-line option.
* moved call to server_job.init_parser() out of the constructor, since this was an easy source of exceptions that wouldn't get logged.

Note I believe some of the functional test changes slipped into my previous change there, which is why that looks smaller than you'd expect.

Signed-off-by: Steve Howard <showard@google.com>



git-svn-id: http://test.kernel.org/svn/autotest/trunk@4070 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
2b38f67cc7c52ea368514bc14f98eded2bc477dd 23-Dec-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> Add test case for aborting a synchronous job while it's throttled in the Starting state. Was trying to repro a bug. It doesn't repro, indicating that maybe the bug has already been fixed (or maybe this test case is missing something). Either way, it's good to have another test case around.

Also fixing a little test bug where we need to mock out a new global config value.

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4043 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
78f5b016b5367cb51b1f031b31e3afea6ebd2d74 23-Dec-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> Update to Django 1.1.1. I want to use a new feature for my RESTful interface prototyping (direct inclusion of URL patterns in URLconfs).

The one obstacle this presented was that Django 1.1.1 changes the DB connection object to accept DB config information in its constructor, rather than reading it from django.conf.settings on-demand. This was a problem because we change stuff in django.conf.settings on the fly to do our fancy test DB stuff -- basically, we initialize a SQLite DB once, copy it off, and then copy it between test cases, rather than clearing and reconstructing the initial DB. I did measurements and it turns out all that jazz wasn't really saving us much time at all, so I just got rid of it all. Django's testing stuff has improved and v1.1 even has some new tricks for using transactions to accomplish the above with a dramatic speedup, so we ought to look into using that in the future.

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4041 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
493beaab73a8a87a3ce8d7f47b7ce92417b04fbd 18-Dec-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> fix a bug with pre-job keyvals, introduced in recent refactorings, and added new test to check it

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4020 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
a9545c0ab3d8f3e36efadaefdcf37393708666d9 18-Dec-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> backend support for hostless jobs
* support in rpc_interface.create_job() and models for creating a hostless job -- a job with one queue entry with no host, meta_host or atomic_group
* support in scheduler for recognizing and executing such a job. the bulk of the work was in extracting an AbstractQueueTask class from QueueTask, containing all the logic not pertaining to hosts. I then added a simple HostlessQueueTask class also inheriting from it.
Also got rid of HostQueueEntry.get_host() and added an extra log line when AgentTasks finish (used to be for QueueTasks only).

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4018 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
2ca64c940277d6ee38a084dc71fa8d3003aedddf 10-Dec-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> * add a couple simple test cases to the scheduler functional test for metahosts
* augment one of the logging lines in the scheduler

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4009 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
d11956572cb7a5c8e9c588c9a6b4a0892de00384 08-Dec-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> Make drone_manager track running processes counts using only the information passed in from the scheduler. Currently it also uses process counts derived from "ps", but that is an unreliable source of information. This improves accuracy and consistency and gives us full control over the process.

This involves a few primary changes:
* made the drone_manager track process counts with each PidfileId
* added method declare_process_count() for the scheduler to indicate the process count of a pidfile ID during recovery (in other cases, the DroneManager gets that info in execute_process())

Doing this involved some extensive refactorings. Because the scheduler now needs to declare process counts during recovery, and because the AgentTasks are the entities that know about process counts, it made sense to move the bulk of the recovery process to the AgentTasks. Changes for this include:
* converted a bunch of AgentTask instance variables to abstract methods, and added overriding implementations in subclasses as necessary
* added methods register_necessary_pidfiles() and recover() to AgentTasks, allowing them to perform recovery for themselves. got rid of the recover_run_monitor() argument to AgentTasks as a result.
* changed recovery code to delegate most of the work to the AgentTasks. The flow now looks like this: create all AgentTasks, call them to register pidfiles, call DroneManager to refresh pidfile contents, call AgentTasks to recover themselves, perform extra cleanup and error checking. This simplified the Dispatcher somewhat, in my opinion, though there's room for more simplification.

Other changes include:
* removed DroneManager.get_process_for(), which was unused, as well as related code (including the DroneManager._processes structure)
* moved logic from HostQueueEntry.handle_host_failure to SpecialAgentTask._fail_queue_entry. That was the only call site.
And some other bug fixes:
* eliminated some extra state from QueueTask
* fixed models.HostQueueEntry.execution_path(). It was returning the wrong value, but it was never used.
* eliminated some big chunks from monitor_db_unittest. These broke from the refactorings described above and I deemed it not worthwhile to fix them up for the new code. I checked and the total coverage was unaffected by deleting these chunks.
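
A hedged sketch of the per-task recovery flow described above
(declare_process_count() is named in the message; the other names are
assumed):

    # Sketch: re-register the task's pidfile and declare how many
    # processes it represents, so the DroneManager never consults ps.
    def recover_task(task, drone_manager):
        pidfile_id = drone_manager.register_pidfile(task.pidfile_path)
        drone_manager.declare_process_count(pidfile_id, task.num_processes)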

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@4007 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
418785bf16a0cb72a5fe5519e8693d7546cd427d 23-Nov-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> Some improvements to process tracking in the scheduler.
* have all AgentTasks declare how many processes they'll create (as an instance attribute). this is really where the information belongs.
* have Agent read its num_processes from its AgentTask, rather than requiring clients to pass it into the constructor.
* have AgentTasks pass this num_processes value into the DroneManager when executing commands, and have the DroneManager use this value rather than the hack of parsing it out of the command line. this required various changes to the DroneManager code which actually fix some small bugs and make the code cleaner in my opinion.

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@3971 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
9bb960b90d5102cce1c8a15314900035c6c4e69a 19-Nov-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> Support restricting access to drones by user. Administrators can put lines like

<hostname>_users: showard,scottz

in the global config, where <hostname> is a drone hostname. That drone will then be limited to use by those users (that is, by jobs launched by those users, and tasks launched due to those jobs). This required numerous changes:
* added a requested_by field to SpecialTask (with corresponding migration). For tasks with queue_entries, we can infer this from the job, but for those without, we need this information explicitly declared. Note this can be null if the task was created by the system, not in response to any user action. The only place this occurs now is in scheduler recovery (Dispatcher._recover_hosts_where()), but there may be an upcoming feature to periodically reverify hosts, which would be another (much more common) case.
* modified all SpecialTask creation sites to pass requested_by if necessary.
* modified AgentTask to keep a username attribute, and modified its run() method to pass that to PidfileRunMonitor.run(), which passes it along to DroneManager.execute_command().
* modified Agent to always keep self.task around; there's no reason to throw it away, and now that we're looking at it from other classes, it's problematic if it disappears.
* modified Dispatcher throttling code to pass the username when requesting max runnable processes.
* added an allowed_users property to _AbstractDrone, and made DroneManager load it from the global config.
* made DroneManager's max_runnable_processes() and _choose_drone_for_execution() methods accept the username and obey user restrictions.
* added extensive tests for everything. the modifications required to monitor_db_unittest were annoying but not too bad. but parts of that file may need to be removed as they'll be obsoleted by monitor_db_functional_test and they'll become increasingly annoying to maintain.

couple other related changes:
* got rid of CleanupHostsMixin. it was only actually needed by GatherLogsTasks (since we made the change to have GatherLogsTask always run), so I inlined it there and simplified code accordingly.
* changed a bunch of places in the scheduler that were constructing new instances of Django models for existing rows. they would do something like "models.Host(id=<id of existing host>)". that's correct for scheduler DBModels, but not for Django models. For Django models, you only instantiate new instances when you want to create a new row. for fetching existing rows you always use a manager -- Model.objects.get() or Model.objects.filter() etc. this was an existing bug but wasn't exposed until I made some of the changes involved in this feature.
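
A minimal sketch of the drone-side check (the <hostname>_users key is
quoted above; everything else is assumed):

    # Sketch: a drone with no <hostname>_users entry is open to everyone;
    # otherwise only the listed users may run work on it.
    def drone_usable_by(allowed_users, username):
        return allowed_users is None or username in allowed_users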

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@3961 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
a21b949e60c56bc854ff6a2cc373b7bbf63d7fec 04-Nov-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> Added functional test for recovering jobs with atomic hosts, with HQEs
in Pending

Signed-off-by: James Ren <jamesren@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@3896 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
65db39368167dab1730703be3d347581527f70da 28-Oct-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> * impose prioritization on SpecialTasks based on task type: Repair, then Cleanup, then Verify. remove prioritization of STs with queue entry over those without. this leads to more sane ordering of execution in certain unusual contexts -- the added functional test cases illustrate a few (in some cases, it's not just more sane, it eliminates bugs as well).
* block STs from running on hosts with active HQEs, unless the ST is linked to the HQE. this is a good check in general but specifically prevents a bug where a requested reverify could run on a host in pending. there's a functional test case for that too.
* block jobs from running on hosts with active agents, and let special tasks get scheduled before new jobs in each tick. this is necessary for some cases after removing the above-mentioned prioritization of STs with HQEs. otherwise, for example, a job could get scheduled before a previous post-job cleanup has run. (new test cases cover this as well.)
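
A toy sketch of the ordering this imposes (task names from the message;
the sort-key shape is an assumption):

    # Sketch: sort special tasks so Repair runs before Cleanup, which
    # runs before Verify.
    TASK_PRIORITY = {'Repair': 0, 'Cleanup': 1, 'Verify': 2}

    def special_task_sort_key(task):
        return TASK_PRIORITY[task.task]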

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@3890 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
7b2d7cbcc28ea6a19554ecc3043b68103e7ab7e9 28-Oct-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> We never considered the handling of DO_NOT_VERIFY hosts in certain situations. This adds handling of those cases to the scheduler and adds tests to the scheduler functional test.

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@3885 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
4a60479e3f9a1576e9212a50ad78c6d55be1b97f 21-Oct-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> add a bunch of tests to the scheduler functional test to cover pre- and post-job cleanup, including failure cases

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@3871 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
b89004580c267ec12da4f181c76cbc3ec902037d 12-Oct-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> In scheduler recovery, allow Running HQEs with no process. The tick code already handles them fine (by re-executing Autoserv), but the recovery code was explicitly disallowing them. With this change, it turns out there's only one status that's not allowed to go unrecovered -- Verifying -- so I changed the code to reflect that and I made the failure conditions more accurate.

Tested this change with extensions to the new functional test. We could never really effectively test recovery code with the unit tests, but it's pretty easy and very effective (I believe) with the new test.

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@3824 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
f85a0b7b456fc60605f09cd16e95167feeba9c5a 07-Oct-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> Explicitly release pidfiles after we're done with them. This does it in a kind of lazy way, but it should work just fine. Also extended the new scheduler functional test with a few more cases and added a test to check pidfile release under these various cases. In the process, I changed how some of the code works to allow the tests to more cleanly express their intentions.

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@3804 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py
34ab09918228040aa244f7f258280e59c6baca3a 06-Oct-2009 showard <showard@592f7852-d20e-0410-864c-8624ca9c26a4> beginnings of a new scheduler functional test. this aims to test the entire monitor_db.py file holistically, made possible by the fact that monitor_db.py is already isolated from all direct system access through drone_manager (this was a necessary separation for distributed scheduling). by mocking out the entire drone_manager, as well as other major dependencies (email manager, global config), and filling a test database, we can allow the dispatcher to execute normally and allow it to interact with all the other code in monitor_db. at the end, we can check the state of the database and the drone_manager, and (probably most importantly, given the usual failure mode of the scheduler) we can ensure no exceptions get raised from monitor_db.

right now, the test is very minimal. it's able to walk through the process of executing a normal, default job with nothing unusual arising. it checks very little, other than ensuring that the scheduler doesn't die from any exceptions. over time it can be extended to test many more cases and check many more things. this is just a start.

as a side note, I added a "django" backend to database_connection.py which just uses the Django connection wrapper. this is preferable because Django has lots of nice translations already in place. for example, SQLite returns datetime columns as strings, but Django's wrapper automatically translates them into real datetime objects. we could probably just use this in lots of places, such as the frontend and scheduler unit tests. but it was necessary here.

Signed-off-by: Steve Howard <showard@google.com>


git-svn-id: http://test.kernel.org/svn/autotest/trunk@3801 592f7852-d20e-0410-864c-8624ca9c26a4
/external/autotest/scheduler/monitor_db_functional_test.py