History log of /external/autotest/scheduler/rdb_hosts.py
Revision Date Author Comments
5de10171860a2ef3a30aa2ba6a67431673120649 05-Jun-2015 Dan Shi <dshi@chromium.org> [autotest] Fix a couple of issues in bulk post metadata.

The metadb had an outage over the last couple of hours, which caused all
reported data to get the same timestamp. This CL fixes the bug and increases
the buffer size.

1. Increase the buffer size and limit the size of a single upload.
2. Add time_recorded in the host history metadata.
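
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the autotest code: the class name `MetadataBuffer`, its parameters, and the default sizes are all hypothetical. The point is that `time_recorded` is stamped when the event is recorded, not when it is uploaded, so an outage cannot collapse every entry onto one timestamp.

```python
import time
from collections import deque


class MetadataBuffer:
    """Hypothetical bounded buffer for metadata awaiting upload."""

    def __init__(self, max_entries=10000, max_upload=1000):
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._entries = deque(maxlen=max_entries)
        self._max_upload = max_upload

    def record(self, metadata, now=None):
        """Queue a copy of metadata, stamping time_recorded if absent."""
        entry = dict(metadata)
        entry.setdefault('time_recorded', time.time() if now is None else now)
        self._entries.append(entry)

    def next_batch(self):
        """Remove and return at most max_upload entries, oldest first."""
        size = min(self._max_upload, len(self._entries))
        return [self._entries.popleft() for _ in range(size)]
```

An uploader would call `next_batch()` repeatedly until it returns an empty list, sending each batch as one upload.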

BUG=None
TEST=local

Change-Id: I157eb25ab0b8aeb227080aca47e42d3834ae1337
Reviewed-on: https://chromium-review.googlesource.com/275641
Trybot-Ready: Dan Shi <dshi@chromium.org>
Tested-by: Dan Shi <dshi@chromium.org>
Reviewed-by: Fang Deng <fdeng@chromium.org>
Commit-Queue: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/rdb_hosts.py
cf2e8dd3f81d5eb4c9720db396ebbf64fd7b9ae4 08-May-2015 Dan Shi <dshi@chromium.org> [autotest] Add a new thread to upload metadata reported by scheduler

Currently, a host state change is reported to metadb before the change is
committed to the database. Each change makes an ES post call to send data. To
avoid performance overhead for the scheduler, UDP is used. UDP is prone to data
loss, especially now that the ES server lives in GCE while the scheduler runs
in a different network.

This CL attempts to fix the issue by reporting metadata in a separate thread
in bulk. The performance of ES bulk API is much better than individual calls.
For example, a single index request through HTTP might take 80ms. For bulk API,
1000 records can be indexed in less than 0.5 second.
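
A rough sketch of the approach described above, assuming a caller-supplied `send_bulk` callable that performs the actual bulk request. The function name `start_bulk_uploader` and its parameters are hypothetical and are not taken from the autotest source.

```python
import queue
import threading


def start_bulk_uploader(send_bulk, batch_size=1000, poll_seconds=0.1):
    """Start a worker thread that uploads queued records in batches.

    Returns (report, shutdown): report(record) queues one record and
    shutdown() flushes whatever is still queued, then stops the worker.
    """
    pending = queue.Queue()
    stop = threading.Event()

    def drain():
        # Collect up to batch_size records without blocking.
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(pending.get_nowait())
            except queue.Empty:
                break
        return batch

    def run():
        while True:
            batch = drain()
            if batch:
                send_bulk(batch)
            elif stop.is_set():
                return  # queue is empty and shutdown was requested
            else:
                stop.wait(poll_seconds)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()

    def shutdown(timeout=5.0):
        stop.set()
        worker.join(timeout)

    return pending.put, shutdown
```

The scheduler thread only pays for a queue insert per state change; the network cost is paid once per batch on the worker thread.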

BUG=chromium:471015
TEST=run local scheduler, make sure all metadata was uploaded. Also, confirm
scheduler can be properly shut down.

Change-Id: I38991b9e647bb7a6fcaade8e8ef9eea27d9aa035
Reviewed-on: https://chromium-review.googlesource.com/270074
Reviewed-by: Dan Shi <dshi@chromium.org>
Commit-Queue: Dan Shi <dshi@chromium.org>
Trybot-Ready: Dan Shi <dshi@chromium.org>
Tested-by: Dan Shi <dshi@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
/external/autotest/scheduler/rdb_hosts.py
b72f4fbcf1583da27f09f4abb9d8162530bf4559 21-Jan-2015 Gabe Black <gabeblack@chromium.org> graphite: Reorganize the elastic search code so we can put it in chromite.

This change reorganizes the elastic search integration code so that it's
separate from the code that, for instance, reads config information from the
autotest global config. That way, it can be moved into chromite without
breaking any dependencies.

BUG=chromium:446291
TEST=Ran stats_es_functionaltest.py. Ran unit tests. Ran a butterfly-paladin
tryjob with --hwtest.

Change-Id: I0dbf135c4f1732d633e5fc9d5edb9e1f4f7199d5
Reviewed-on: https://chromium-review.googlesource.com/242701
Reviewed-by: Dan Shi <dshi@chromium.org>
Tested-by: Gabe Black <gabeblack@chromium.org>
Commit-Queue: Gabe Black <gabeblack@chromium.org>
/external/autotest/scheduler/rdb_hosts.py
8c98ac10beaa08bfb975c412b0b3bda23178763a 23-Dec-2014 Prashanth Balasubramanian <beeps@google.com> [autotest] Send frontend jobs to shards.

Frontend jobs on hosts that are on a shard are currently disallowed,
because the host-scheduler on master ignores jobs based on
meta-host, but frontend jobs have no meta-host. This CL makes the
following changes:
- Make host-scheduler ignore frontend jobs that are supposed
to be picked by shard.
- Send such frontend jobs in heartbeat.
- Allow creation of frontend jobs in rpc.

TEST=Test the following:
- Create a job on a host on shard from AFE frontend.
Observe it runs on shards and completes on master.
- Create a job on two hosts (one host on shard, the other on master)
from AFE frontend. Make sure an exception is raised with the correct
message.
- Run a normal dummy suite on shard, make sure normal flow still
works. Heartbeat contains the right information.
- Run a normal dummy suite on master, make sure it works.
BUG=chromium:444790
DEPLOY=apache, host-scheduler

Change-Id: Ibca3d36cb59fed695233ffdc89506364c402cc37
Reviewed-on: https://chromium-review.googlesource.com/240396
Reviewed-by: Mungyung Ryu <mkryu@google.com>
Reviewed-by: Dan Shi <dshi@chromium.org>
Commit-Queue: Fang Deng <fdeng@chromium.org>
Tested-by: Fang Deng <fdeng@chromium.org>
/external/autotest/scheduler/rdb_hosts.py
0e96b046c053e8b9e85c6512e75b850ffbbd4358 30-Sep-2014 Dan Shi <dshi@chromium.org> [autotest] Record host's platform and pool info in host status metadata

BUG=chromium:419043
TEST=local run tests, and query test ES server:
http://172.25.61.45:9200/_plugin/elastic-hammer/
search for:
{"query": {"bool": {"minimum_should_match": 4,
"should": [{"term": {"_type": "host_history"}},
{"term": {"hostname": "172.27.213.193"}},
{"term": {"pools": "bvt"}},
{"range": {"time_recorded": {"gte": 1407196142.893756,
"lte": 1507282542.893756}}}]}},
"size": 10000,
"sort": [{"time_recorded": "asc"}]}

Confirm the results have metadata like:
platform: "peppy"
pools[]
"bvt"
"suites"
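
The search body above can also be generated in code. The helper below is a sketch that only reproduces the query shape from this commit message; the function name `host_history_query` and its parameters are hypothetical, and whether a given Elasticsearch version accepts this form is not verified here.

```python
import json


def host_history_query(hostname, start, end, pool=None, size=10000):
    """Build a host_history search body shaped like the one above."""
    should = [
        {"term": {"_type": "host_history"}},
        {"term": {"hostname": hostname}},
    ]
    if pool is not None:
        should.append({"term": {"pools": pool}})
    should.append({"range": {"time_recorded": {"gte": start, "lte": end}}})
    return {
        # Requiring every clause to match turns "should" into a strict filter.
        "query": {"bool": {"minimum_should_match": len(should),
                           "should": should}},
        "size": size,
        "sort": [{"time_recorded": "asc"}],
    }


if __name__ == "__main__":
    print(json.dumps(host_history_query(
        "172.27.213.193", 1407196142.893756, 1507282542.893756, pool="bvt")))
```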

Change-Id: I068427510fc983b9ef4cc448e8a5bb1dace71c52
Reviewed-on: https://chromium-review.googlesource.com/220551
Commit-Queue: Dan Shi <dshi@chromium.org>
Tested-by: Dan Shi <dshi@chromium.org>
Reviewed-by: Fang Deng <fdeng@chromium.org>
/external/autotest/scheduler/rdb_hosts.py
e4cb9e23709425fcbb5faec40bfe824b818bd106 29-Aug-2014 Dan Shi <dshi@chromium.org> [autotest] Change devserver stats call to log data only for staging artifacts.

File names differ because they include build information. Recording
stats for each file would create too many counters in graphite and lead to
disk space issues.

BUG=chromium:404475
TEST=local setup.
To verify metadata:
visit http://172.25.61.45:9200/_plugin/elastic-hammer/
update search url with index of local setup: dshi.mtv/_search
search for:
{
"query": {"bool": {"minimum_should_match": 1,
"should": [{"term": {"_type": "devserver"}}
]}},
"size": 10000,
"sort": [{"time_recorded": "asc"}]}

Confirm the data.

Change-Id: Ic11f76c3ef6fbf8cf5d25312d3285e1e9b87a178
Reviewed-on: https://chromium-review.googlesource.com/215640
Tested-by: Dan Shi <dshi@chromium.org>
Reviewed-by: Fang Deng <fdeng@chromium.org>
Commit-Queue: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/rdb_hosts.py
7cf3d84fda609f6402543bb7e0bf3e3b7f93d539 13-Aug-2014 Dan Shi <dshi@chromium.org> [autotest] Record more metadata to include more job related information.

For host history rpc to return more info like job name/owner, we added a new
attribute metadata_info to host object. metadata_info is a dictionary
containing information such as task_id, task_name, job_id, job_name,
parent_job_id.
When host status is changed, metadata_info will be reported to metaDB.

Examples of metadata_info are:

{"hostname": "192.96.48.88", "task_name": "Verify", "task_id": 6551}
{"job_name": "dummy_pass", "hostname": "192.96.48.88", "task_name": "Reset", "task_id": 6552, "job_id": 4133, "parent_job_id": 4132}
{"owner": "debug_user", "parent_job_id": null, "job_id": 4140, "job_name": "dummy_pass"}
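
As an illustration of how such a dictionary might be assembled, here is a small sketch. The helper `build_metadata_info` and the shape of its `task` and `job` arguments are assumptions made for this example; only the resulting key names come from the examples above.

```python
def build_metadata_info(hostname, task=None, job=None):
    """Merge host, task and job details into one metadata_info dict.

    task and job are plain dicts here (hypothetical shape); the real
    scheduler derives these values from its own model objects.
    """
    info = {"hostname": hostname}
    if task is not None:
        info["task_id"] = task["id"]
        info["task_name"] = task["name"]
    if job is not None:
        info["job_id"] = job["id"]
        info["job_name"] = job["name"]
        info["parent_job_id"] = job.get("parent_job_id")
    return info
```

Called with `task={"id": 6551, "name": "Verify"}` and no job, this yields the same keys as the first example above.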

BUG=chromium:394451
TEST=local setup,
site_utils/host_history.py --hosts 100.96.48.196
test output with visiting page: http://172.25.61.45:9200/_plugin/elastic-hammer/
Enter following in filter, then click search to see results.
{"query": {"bool": {"minimum_should_match": 3,
"should": [{"term": {"_type": "host_history"}},
{"term": {"hostname": "100.96.48.196"}},
{"range": {"time_recorded": {"gte": 1408121317,
"lte": 1409007215}}}]}},
"size": 10000,
"sort": [{"time_recorded": "asc"}]}

Change-Id: Icddc27fb39529924d0030dbec97176a2d8b683fc
Reviewed-on: https://chromium-review.googlesource.com/212304
Reviewed-by: Dan Shi <dshi@chromium.org>
Tested-by: Dan Shi <dshi@chromium.org>
Commit-Queue: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/rdb_hosts.py
0d7474640b6a6e4334d16921fe0df79418007af9 17-Jul-2014 Michael Liang <michaelliang@chromium.org> [autotest] Log host history and categorize metadata in es by _type

Log hostname, status, and timestamp, along with a debug string which
has job_id and hqe id, to metadata db under the _type='host_history'.
I also modified scheduler_models to log with _type='hqe_status'.
es_utils index now defaults to the autotest instance, i.e. 'cautotest'.

BUG=None
TEST=Ran generic_RebootTest locally and verified status logged in esdb.
TEST=python stats_es_functionaltest.py --all --es_port=prod
Change-Id: I64223ed12c45c5e2adeca7630cd9d2ffd28dd2c2
DEPLOY=scheduler
Reviewed-on: https://chromium-review.googlesource.com/208711
Reviewed-by: Michael Liang <michaelliang@chromium.org>
Tested-by: Michael Liang <michaelliang@chromium.org>
Reviewed-by: Dan Shi <dshi@chromium.org>
Commit-Queue: Michael Liang <michaelliang@chromium.org>
/external/autotest/scheduler/rdb_hosts.py
2d8047e8b2d901bec66d483664d8b6322501d245 28-Apr-2014 Prashanth B <beeps@google.com> [autotest] In process request/host caching for the rdb.

This cl implements an in process host cache manager for the rdb. The
following considerations were taken into account while designing it:
1. The number of requests outweighs the number of leased hosts
2. The number of net hosts outweighs the number of leased hosts
3. The 'same' request can consult the cache within the span of a single
batched request. These will only be same in terms of host labels/acls
required, not in terms of priority or parent_job_id.

Resulting ramifications:
1. We can't afford to consult the database for each request.
2. We can afford to refresh our in memory representation of a host
just before leasing it.
3. Leasing a host can fail, as we might be using a stale cached host.
4. We can't load a map of all hosts <-> labels each request.
5. Invalidation is hard for most sane, straight-forward choices of
keying hosts against requests.
6. Lower priority requests will starve if they try to lease the same
hosts taken by higher priority requests.

Main design tenets:
1. We can tolerate some staleness in the cache, since we're going
to make sure the host is unleased just before using it.
2. If a job hits a stale cache line it tries again next tick.
3. Trying to invalidate the cache within a single batched request will
be unnecessarily complicated and error prone. Instead, to prevent
starvation, each request only invalidates its cache line, by removing
the hosts it has just leased.
4. The same host may be present in 2 different cache lines but this won't
matter because each request will check the leased bit in real time before
acquiring it.
5. The entire cache is invalidated at the end of a batched request.
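
The design tenets above can be sketched as a small in-memory structure. This is an illustration only: the class `HostCache` and its method names are hypothetical and do not mirror the rdb cache manager's actual API. Cache lines are keyed on labels and acls only, matching the note that priority and parent_job_id do not distinguish requests.

```python
class HostCache:
    """Hypothetical per-batch host cache keyed on (labels, acls)."""

    def __init__(self):
        self._lines = {}

    @staticmethod
    def key(labels, acls):
        # Order of labels/acls must not matter, so use frozensets.
        return (frozenset(labels), frozenset(acls))

    def get(self, labels, acls):
        """Return the cached host list, or None on a cache miss."""
        return self._lines.get(self.key(labels, acls))

    def set(self, labels, acls, hosts):
        self._lines[self.key(labels, acls)] = list(hosts)

    def remove_leased(self, labels, acls, leased):
        """Drop just-leased hosts from this request's own cache line only."""
        k = self.key(labels, acls)
        if k in self._lines:
            self._lines[k] = [h for h in self._lines[k] if h not in leased]

    def clear(self):
        """Invalidate everything; call at the end of a batched request."""
        self._lines.clear()
```

Other cache lines may still list a host that was just leased; that staleness is tolerated because the leased bit is checked again before the host is acquired.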

TEST=Ran suites, unittests.
BUG=chromium:366141
DEPLOY=Scheduler

Change-Id: Iafc3ffa876537da628c52260ae692bc2d5d3d063
Reviewed-on: https://chromium-review.googlesource.com/197788
Reviewed-by: Dan Shi <dshi@chromium.org>
Tested-by: Prashanth B <beeps@chromium.org>
Commit-Queue: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/rdb_hosts.py
b474fdfd353cdb0888191f4b80e47e6b5343d891 04-Apr-2014 Prashanth B <beeps@google.com> [autotest] Lease hosts according to frontend job priorities.

This cl modifies the way we lease hosts by teaching the
RDBServerHostWrapper to handle host leasing. Though this involves
a separate query for each host, it leads to a design we can later
build atomicity into, because we can check the leased bit on a single
host before setting it. This model of leasing also has the following benefits:
1. It doesn't abuse the response map.
2. It gives us more clarity into which requests are acquiring
hosts by setting the leased bit in step with host validation.
3. It is more tolerant to db errors because exceptions raised while
leasing one host will not fail the entire batched request.
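
A minimal sketch of per-host leasing in priority order, under stated assumptions: `lease_hosts`, `find_hosts`, and `lease` are hypothetical names, requests are plain dicts, and a larger `priority` value is assumed to mean a more urgent request. `lease(host)` is expected to check the leased bit and set it, returning True only if it acquired the host.

```python
def lease_hosts(requests, find_hosts, lease):
    """Try to lease one host per request, highest priority first.

    Returns a dict mapping request id to the leased host, or to None
    when no candidate host could be leased.
    """
    response = {}
    ordered = sorted(requests, key=lambda r: r['priority'], reverse=True)
    for request in ordered:
        response[request['id']] = None
        for host in find_hosts(request):
            try:
                if lease(host):
                    response[request['id']] = host
                    break
            except Exception:
                # A failure on one host must not fail the whole batch.
                continue
    return response
```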

This cl also adds an rdb_unittest module.

TEST=Unittests, ran suites.
BUG=chromium:353183
DEPLOY=scheduler

Change-Id: I35c04bcb37eee0191a211c133a35824cc78b5d71
Reviewed-on: https://chromium-review.googlesource.com/193182
Reviewed-by: Prashanth B <beeps@chromium.org>
Commit-Queue: Prashanth B <beeps@chromium.org>
Tested-by: Prashanth B <beeps@chromium.org>
/external/autotest/scheduler/rdb_hosts.py
489b91d72cd225e902081dbd3f9e47448fe867f6 15-Mar-2014 Prashanth B <beeps@google.com> [autotest] Establish a common interface for host representation.

This cl has the work needed to ensure that schema changes made on
the server trickle down to the client. If the same changes aren't
reflected on the client, creating or saving a client host wrapper for
a given host will fail deterministically on the client side until
modules using the rdb_host are modified to reflect the changes.

1. rdb_hosts: A module containing the host hierarchy needed to
establish a dependence between the creation of the RDBServerHostWrapper
(which is serialized and returned to the client, which converts it
into an RDBClientHostWrapper) and the saving of the RDBClientHostWrapper
through an rdb update request.
2. rdb_requests: Contains the requests/request managers that were in
rdb_utils, because I plan to expand them in subsequent cls.
3. rdb_model_extensions: Contains model classes common
to both server and client that help in establishing
the common host model interface.
4. rdb integration tests.
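
The idea of failing deterministically on a schema mismatch can be illustrated with a toy model. Everything here is hypothetical (the field set `HOST_FIELDS`, `SchemaMismatch`, `ClientHost`, `serialize_server_host`); it is not the RDBServerHostWrapper or RDBClientHostWrapper code, only a sketch of a client that refuses any host whose fields differ from the shared definition.

```python
# Hypothetical shared field list that server and client both depend on.
HOST_FIELDS = frozenset(['id', 'hostname', 'status', 'leased'])


class SchemaMismatch(Exception):
    """Raised when a serialized host does not match the shared fields."""


def serialize_server_host(row, fields=HOST_FIELDS):
    """Server side: emit exactly the fields the server schema defines."""
    return {name: row[name] for name in fields}


class ClientHost:
    """Client side: reject any host whose fields differ from HOST_FIELDS."""

    def __init__(self, **fields):
        unexpected = set(fields) - HOST_FIELDS
        missing = HOST_FIELDS - set(fields)
        if unexpected or missing:
            raise SchemaMismatch('unexpected=%s missing=%s' % (
                sorted(unexpected), sorted(missing)))
        self.__dict__.update(fields)
```

If the server starts sending a new field before the client's definition is updated, every `ClientHost(...)` construction fails immediately instead of silently dropping the field.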

TEST=Ran suites, unittests
BUG=chromium:348176
DEPLOY=scheduler

Change-Id: I0bbab1dd184e505b1130ee73714e45ceb7bf4189
Reviewed-on: https://chromium-review.googlesource.com/191357
Commit-Queue: Prashanth B <beeps@chromium.org>
Tested-by: Prashanth B <beeps@chromium.org>
Reviewed-by: Dan Shi <dshi@chromium.org>
/external/autotest/scheduler/rdb_hosts.py