Full Record


Author
Title Towards A Workload-Driven Flow Scheduler For Modern Datacenters
URL
Publication Date
University/Publisher University of Waterloo
Abstract Modern datacenters run different applications with various communication requirements in terms of bandwidth and deadlines. Of particular interest are the deadlines that drive web-search workloads, e.g., when submitting requests to the Bing search engine or loading the Facebook home page. Serving the submitted requests in a timely fashion relies on meeting the deadlines of the scatter/gather flows generated for each request. Current flow-schedulers are deadline-unaware: they simply start flows as soon as they arrive, whenever bandwidth is available. In this thesis, we present Artemis: a workload-driven flow-scheduler at the end-hosts that learns via reinforcement how to schedule flows to meet their deadlines. The flow-scheduling policy in Artemis is not hard-coded; it is instead computed in real time by a reinforcement-learning control loop. In Artemis, we model flow-scheduling as a deep reinforcement learning problem, and we use the actor-critic architecture to solve it. Flows in Artemis do not start as soon as they arrive; instead, a source starts sending a particular flow upon requesting and acquiring a token from the destination node. The token-request is issued by the source node and exposes the flow's requirements to the destination. At the destination side, the Artemis flow-scheduler is a decision-making agent that learns how to serve the awaiting token-requests based on their embedded requirements, using the deep reinforcement learning actor-critic model. We use two gather workloads to demonstrate (1) Artemis's ability to learn how to schedule deadline flows on its own and (2) its effectiveness in meeting flow deadlines. We compare the performance of Artemis against Earliest Deadline First (EDF) and two other rule-based flow-scheduling policies that, unlike EDF, are aware of both the sizes and the deadlines of the flows: Largest Size Deadline ratio First (LSDF) and Smallest Size Deadline ratio First (SSDF).
LSDF schedules arrived flows with the largest size-to-deadline ratio first, while SSDF applies the inverse logic. Our experimental results show that the Artemis flow-scheduler captures the structure of the gather workloads, maps the requirements of arrived flows to the order in which they need to be served, and computes a flow-scheduling strategy accordingly. On the first gather workload, which has an equal distribution of flows with (size, deadline) pairs of (350KB, 40ms) and (250KB, 50ms), Artemis met +35.58% more deadlines than EDF, +24.93% more than SSDF, and performed marginally better than LSDF with +4.42%. On the second workload, where 60% of flows have a (size, deadline) pair of (350KB, 40ms) and 40% have (250KB, 50ms), Artemis outperformed all three flow-schedulers, meeting +16.34% more deadlines than the second-best, SSDF.
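The three rule-based baselines named in the abstract reduce to sort orders over the pending flows. A minimal sketch, assuming a hypothetical `Flow` record with the `size` and `deadline` fields described above (the class and function names are illustrative, not from the thesis):

```python
# Illustrative sketch of the EDF, LSDF, and SSDF baselines from the abstract.
# The Flow class and function names are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class Flow:
    size: int        # flow size in bytes
    deadline: float  # time budget in ms

def edf(pending):
    """Earliest Deadline First: serve the flow whose deadline expires soonest."""
    return sorted(pending, key=lambda f: f.deadline)

def lsdf(pending):
    """Largest Size/Deadline ratio First."""
    return sorted(pending, key=lambda f: f.size / f.deadline, reverse=True)

def ssdf(pending):
    """Smallest Size/Deadline ratio First: the inverse order of LSDF."""
    return sorted(pending, key=lambda f: f.size / f.deadline)

# With the workload's two flow classes, 350KB/40ms = 8750 B/ms and
# 250KB/50ms = 5000 B/ms, so LSDF serves the (350KB, 40ms) flow first
# while SSDF serves the (250KB, 50ms) flow first.
pending = [Flow(250_000, 50.0), Flow(350_000, 40.0)]
```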
Subjects/Keywords datacenters; flow scheduling; reinforcement learning
Language en
Country of Publication ca
Record ID handle:10012/13978
Repository waterloo
Date Indexed 2019-06-26

Sample Search Hits

…learning problem [44] … 3.3 Scheduling flows in Artemis using reinforcement learning … 3.4 Artemis flow-scheduler agent in one figure … 4.1…

…schedule the arrived flows using a reinforcement-learning loop. In this thesis, we present Artemis: a workload-driven flow-scheduling system for modern datacenters that learns how to schedule deadline-flows via reinforcement. Traffic flows in Artemis do not…

…the application-flow requirements to the destination node. At the destination side, Artemis flow-scheduler picks which flow-request to schedule first using the deep reinforcement actor-critic learning model. The applications’ communication-requirements…

…Contributions In Artemis, we model flow-scheduling as a deep reinforcement learning problem, and we solve it at the end-hosts using the actor-critic architecture. We evaluate Artemis’s flow-scheduling system using two specific deadline-driven workloads and we…

…show that: • Artemis is able to learn how to schedule deadline flows on its own via reinforcement, starting initially with no prior knowledge about the workload characteristics and using the deep reinforcement actor-critic learning model. • Artemis is…

…via reinforcement how to schedule deadline flows. We model the flow-scheduling problem as a deep reinforcement learning task and we solve it using the actor-critic architecture. The Artemis flow-scheduler initially adopts no particular flow-scheduling…

…policy following the deep reinforcement actor-critic learning model. When a scheduled flow meets its deadline, Artemis receives a reward of one; otherwise, it receives a reward of zero by default. During the course of serving latency sensitive…
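The binary reward signal described in this excerpt can be sketched in a few lines; the function name and millisecond units are illustrative assumptions, not taken from the thesis:

```python
# Sketch of the reward signal from the excerpt: 1 if a scheduled flow
# meets its deadline, 0 otherwise. Names/units are illustrative assumptions.
def reward(finish_time_ms: float, deadline_ms: float) -> int:
    return 1 if finish_time_ms <= deadline_ms else 0
```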

…workload, and computes a probabilistic flow-scheduling strategy following the deep reinforcement actor-critic learning model. 3.3 Flow-Scheduling as a Deep Reinforcement Learning Problem in Artemis In Artemis, we adopt a learning-based approach where we…
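One way a learned actor can yield the probabilistic flow-scheduling strategy this excerpt mentions is by mapping per-request scores to a softmax distribution and sampling from it. A minimal sketch under that assumption — the thesis does not specify this exact mechanism, and all names here are illustrative:

```python
# Hedged sketch: turning per-token-request scores into a probabilistic
# scheduling decision via softmax sampling. Not the thesis's actual network;
# it only illustrates acting stochastically from learned scores.
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pick_request(scores, rng=random.random):
    """Sample the index of the token-request to serve next."""
    probs = softmax(scores)
    r, acc = rng(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1
```

During training, the critic's value estimate and the binary deadline reward would adjust the scores the actor produces, shifting probability mass toward orderings that meet more deadlines.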
