8.9 KiB

Raw Permalink Blame History

Concepts - katalyst core concepts

Katalyst contains a lot of components, making it difficult to dive deep. This documentation will introduce the basic concepts of katalyst to help developers understand how the system works, how it abstracts the QoS model, and how you can dynamically configure the system.

Architecture

As shown in the architecture below, katalyst mainly contains three layers. For user-side API, katalyst defines a suit of QoS model along with multiple enhancements to match up with QoS requirements for different kinds of workload. Users can deploy their workload with different QoS requirements, and katalyst daemon will try to allocate proper resources and devices for those pods to satisfy their QoS requirements. This allocation process will work both at pod admission phase and runtime, taking into consideration the resource usage and QoS class of pods running on the same node. Besides, centralized components will cooperate with daemons to provide better resource adjustments for each workload with a cluster-level perspective.

Components

Katalyst contains centralized components that are deployed as deployments, and agents that run as deamonsets on each and every node.

Centralized Components

Katalyst Controllers/Webhooks

Katalyst controllers provide cluster-level abilities, including service profiling, elastic resource recommendation, core Custom Resource lifecycle management, and centralized eviction strategies run as a backstop. Katalyst webhooks are responsible for validating QoS configurations, and mutating resource requests according to service profiling.

Katalyst Scheduler

Katalyst scheduler is developed based on the scheduler v2 framework to provide the scheduling functionality for hybrid deployment and topology-aware scheduling scenarios

Custom Metrics API

Custom metrics API implements the standard custom-metrics-apiserver interface, and is responsible for collecting, storing, and inquiring metrics. It is mainly used by elastic resource recommendation and re-scheduling in the katalyst system.

Daemon Components

QoS Resource Manager

QoS Resource Manager (QRM for short) is designed as an extended framework in kubelet, and it works as a new hint provider similar to Device Manager. But unlike Device Manager, QRM aims at allocating nondiscrete resources (i.e. cpu/memory) rather than discrete devices, and it can adjust allocation results dynamically and periodically based on container running status. QRM is implemented in kubewahrf enhanced kubernetes, and if you want to get more information about QRM, please refer to qos-resource-manager.

Katalyst agent

Katalyst Agent is designed as the core daemon component to implement resource management according to QoS requirements and container running status. Katalyst agent contains several individual modules that are responsible for different functionalities. These modules can either be deployed as a monolithic container or separate ones.

Eviction Manager is a framework for eviction strategies. Users can implement their own eviction plugins to handle contention for each resource type. For more information about eviction manager, please refer to eviction-manager.
Resource Reporter is a framework for different CRDs or different fields in the same CRD. For instance, different fields in CNR may be collected through different sources, and this framework makes it possible for users to implement each resource reporter with a plugin. For more information about reporter manager, please refer to reporter-manager.
SysAdvisor is the core node-level resource recommendation module, and it uses statistical-based, indicator-based, and ml-based algorithms for different scenarios. For more information about sysadvisor, please refer to sys-advisor.
QRM Plugin works as a plugin for each resource with static or dynamic policies. Generally, QRM Plugins receive resource recommendations from SysAdvisor, and export controlling configs through CRI interface embedded in QRM Framework.

Malachite

Malachite is a unified metrics-collecting component. It is implemented out-of-tree, and serves node, numa, pod and container level metrics through an http endpoint from which katalyst will query real-time metrics data. In a real-world production environment, you can replace malachite with your own metric implementations.

QoS

To extend the ability of kubernetes' original QoS level, katalyst defines its own QoS level with CPU as the dominant resource. Other than memory, CPU is considered as a divisible resource and is easier to isolate. And for cloudnative workloads, CPU is usually the dominant resource that causes performance problems. So katalyst uses CPU to name different QoS classes, and other resources are implicitly accompanied by it.

Definition

Qos level	Feature	Target Workload	Mapped k8s QoS
dedicated_cores	Bind with a quantity of dedicated cpu cores Without sharing with any other pod	Workload that's very sensitive to latency such as online advertising, recommendation.	Guaranteed
shared_cores	Share a set of dedicated cpu cores with other shared_cores pods	Workload that can tolerate a little cpu throttle or neighbor spikes such as microservices for webservice.	Guaranteed/Burstable
reclaimed_cores	Over-committed resources that are squeezed from dedicated_cores or shared_cores Whenever dedicated_cores or shared_cores need to claim their resources back, reclaimed_cores will be suppressed or evicted	Workload that mainly cares about throughput rather than latency such as batch bigdata, offline training.	BestEffort
system_cores	Reserved for core system agents to ensure performance	Core system agents.	Burstable

Pool

As introduced above, katalyst uses the term pool to indicate a combination of resources that a batch of pods share with each other. For instance, pods with shared_cores may share a shared pool, meaning that they share the same cpusets, memory limits and so on; in the meantime, if cpuset_pool enhancement is enabled, the single shared pool will be separated into several pools based on the configurations.

Enhancement

Beside the core QoS level, katalyst also provides a mechanism to enhance the ability of standard QoS levels. The enhancement works as a flexible extensibility, and may be added continuously.

Enhancement	Feature
numa_binding	Indicates that the pod should be bound into a (or several) numa node(s) to gain further performance improvements Only supported by dedicated_cores
cpuset_pool	Allocate a separated cpuset in shared_cores pool to isolate scheduling domain for identical pods. Only supported by shared_cores
...

Configurations

To make the configuration more flexible, katalyst designs a new mechanism to set configs on the run, and it works as a supplement for static configs defined via command-line flags. In katalyst, the implementation of this mechanism is called KatalystCustomConfig (KCC for short). It enables each daemon component to dynamically adjust its working status without restarting or re-deploying. For more information about KCC, please refer to dynamic-configuration.

8.9 KiB Raw Permalink Blame History