Since metricsFetcher.GetContainerMetric and cpuResourceAdvisor.update run in separate goroutines,
the latter may execute before the former. It then reads empty metrics and
falls back to pod requests as the estimated resources, which causes a sharp drop in headroom.
Signed-off-by: linzhecheng <linzhecheng@bytedance.com>
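
One way to avoid the ordering problem described above is to have the consumer refuse to estimate until the first metrics sample has arrived. The following is a minimal sketch under stated assumptions, not the actual fix: `metricStore`, `estimate` and `errNotSynced` are hypothetical names, not katalyst APIs.

```go
package main

import (
	"errors"
	"sync"
)

// metricStore is a hypothetical stand-in for the metrics fetcher's cache.
type metricStore struct {
	mu      sync.RWMutex
	synced  bool
	metrics map[string]float64 // container name -> CPU usage in cores
}

func newMetricStore() *metricStore {
	return &metricStore{metrics: make(map[string]float64)}
}

// Set records a sample and marks the store as synced.
func (s *metricStore) Set(container string, usage float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.metrics[container] = usage
	s.synced = true
}

// HasSynced reports whether at least one sample has been collected.
func (s *metricStore) HasSynced() bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.synced
}

// Get returns the last recorded usage for a container, if any.
func (s *metricStore) Get(container string) (float64, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	usage, ok := s.metrics[container]
	return usage, ok
}

var errNotSynced = errors.New("metrics not synced yet")

// estimate returns a usage-based estimate. Before the first sync it returns
// an error instead of silently falling back to the pod request.
func estimate(s *metricStore, container string, request float64) (float64, error) {
	if !s.HasSynced() {
		return 0, errNotSynced
	}
	if usage, ok := s.Get(container); ok {
		return usage, nil
	}
	// Metrics are synced but this container has no sample: use the request.
	return request, nil
}
```

With this shape the advisor can skip an update cycle on `errNotSynced` rather than publishing a headroom value derived from requests alone.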
* refactor(sysadvisor): refine cpu advisor to improve clarity and fix several bugs
* refactor(sysadvisor): abstract provision and headroom assembler for extensibility
* test(sysadvisor): fix cpu advisor tests and bugs
We need to set the binding NUMAs for the policy; otherwise the total usage of a Pod
will be treated as its per-NUMA usage.
Signed-off-by: linzhecheng <linzhecheng@bytedance.com>
Modifying the checkpoint data struct causes a hash mismatch on restore;
we should create a new checkpoint in this case.
Signed-off-by: linzhecheng <linzhecheng@bytedance.com>
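
The hash-mismatch fallback described above can be sketched as follows. This is an illustration under assumptions, not the real checkpoint manager: `checkpointV2`, `stored`, `wrap`, `save` and `restore` are hypothetical names, and SHA-256 over the re-marshaled struct stands in for whatever hash the real code uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// checkpointV2 is a hypothetical current layout of the checkpoint data.
type checkpointV2 struct {
	Pods    map[string]string `json:"pods"`
	Regions map[string]string `json:"regions"`
}

// stored is the on-disk envelope: payload plus its checksum.
type stored struct {
	Data     json.RawMessage `json:"data"`
	Checksum string          `json:"checksum"`
}

func checksum(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// wrap builds the envelope for an already serialized payload.
func wrap(data []byte) ([]byte, error) {
	return json.Marshal(stored{Data: data, Checksum: checksum(data)})
}

// save serializes the checkpoint together with its checksum.
func save(cp checkpointV2) ([]byte, error) {
	data, err := json.Marshal(cp)
	if err != nil {
		return nil, err
	}
	return wrap(data)
}

// restore decodes into the current struct and recomputes the checksum from
// the decoded value. If the struct layout changed since the checkpoint was
// written, the recomputed hash differs and a fresh checkpoint is returned.
func restore(raw []byte) (cp checkpointV2, fresh bool) {
	empty := checkpointV2{Pods: map[string]string{}, Regions: map[string]string{}}
	var s stored
	if err := json.Unmarshal(raw, &s); err != nil {
		return empty, true
	}
	if err := json.Unmarshal(s.Data, &cp); err != nil {
		return empty, true
	}
	again, err := json.Marshal(cp)
	if err != nil || checksum(again) != s.Checksum {
		return empty, true
	}
	return cp, false
}
```

A caller would typically persist the fresh checkpoint immediately so that the next restart finds a matching hash.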
* feat(eviction): support dryrun plugin
* feat(eviction): support dry run plugin
* chore(eviction): licence format
* chore(eviction): change flag name
* chore(eviction): reuse general function
* chore(eviction): print dry run plugins
* chore(eviction): rename inner eviction plugin initializers
* fix(qos): fix QoSEnhancementAnnotationSelector parser
* feat(qrm): katalyst network qrm plugin supports nic affinitive allocation
* move the network detect logic to general util
* fix(qrm): fix network plugin bugs
* switch to the latest api main and fix bugs
---------
Co-authored-by: shaowei.wayne <shaowei.wayne@bytedance.com>
1. The region names of containers belonging to the same pod
should be the same, so we have to get region by podUID.
2. The regionNames in poolInfo should not be cleaned up when updatePoolInfo.
Signed-off-by: linzhecheng <linzhecheng@bytedance.com>
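
Point 1 above amounts to keying the region lookup by pod rather than by container. A minimal sketch, assuming a plain map as the index; `regionFor` and the `generate` callback are hypothetical and not part of sysadvisor:

```go
package main

// regionFor returns the region name assigned to a pod, creating one with
// generate only the first time the pod is seen. Every container of the pod
// therefore resolves to the same region name.
func regionFor(byPod map[string]string, podUID string, generate func() string) string {
	if name, ok := byPod[podUID]; ok {
		return name
	}
	name := generate()
	byPod[podUID] = name
	return name
}
```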
* support disable dynamic configuration
* merge topologyZone Attributes and Allocations in generateTopologyZoneStatus to make sure the final zone status is sorted
To keep region names stable after the regionMap of sysadvisor is rebuilt,
we need to persist regionEntries.
Checkpoint corruption should also be ignored if the MetaCacheCheckpoint struct changed after an upgrade.
Signed-off-by: linzhecheng <linzhecheng@bytedance.com>
* support adaptive cpu headroom policy
* fix network policy not registered because its path was not imported
* rename sysadvisor RegisterHealthzCheckRules to RegisterAdvisorPlugin
* change policy name adaptive to utilization
The `AddContainer` request may time out due to `storeState`,
causing container leaks in metaCache, so it is necessary to clean up any excess containers.
Signed-off-by: linzhecheng <linzhecheng@bytedance.com>
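
The cleanup described above can be thought of as reconciling the cache against the set of containers that actually exist. A minimal sketch, assuming both sides are available as nested maps; `containerSet` and `pruneLeaked` are hypothetical names:

```go
package main

// containerSet maps podUID -> container name -> placeholder for container info.
type containerSet map[string]map[string]struct{}

// pruneLeaked deletes every cached container that is not in running and
// returns how many entries were removed. Pods left without containers are
// dropped from the cache as well.
func pruneLeaked(cache, running containerSet) int {
	removed := 0
	for podUID, containers := range cache {
		for name := range containers {
			// Indexing a missing pod yields a nil map, for which lookups
			// simply report "not found".
			if _, ok := running[podUID][name]; !ok {
				delete(containers, name)
				removed++
			}
		}
		if len(containers) == 0 {
			delete(cache, podUID)
		}
	}
	return removed
}
```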
* 1. refine kubelet plugin to support reporting topology zones to cnr
2. refine eviction, scheduler, reporter to support the new cnr definition
3. cnr reporter supports merging cnr's TopologyZone field using Type and Name as the unique key
4. add conversion framework to reporter manager to support transformation from the old ReportField to the new one
5. support resetting cnr to default when getting cnr from remote fails with UnmarshalTypeError
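
Item 3 above, merging zones with Type and Name as the unique key, could look roughly like the sketch below. The `zone` struct is a simplified, hypothetical stand-in and not the CNR API type; the sort at the end only illustrates producing a stable order.

```go
package main

import "sort"

// zone is a simplified, hypothetical topology zone.
type zone struct {
	Type       string
	Name       string
	Attributes map[string]string
}

// mergeZones merges update into base using (Type, Name) as the unique key.
// Attributes from update win on conflict. The result is sorted by Type and
// then Name so repeated merges produce the same order.
func mergeZones(base, update []zone) []zone {
	type key struct{ Type, Name string }
	merged := make(map[key]zone)
	for _, list := range [][]zone{base, update} {
		for _, z := range list {
			k := key{z.Type, z.Name}
			cur, ok := merged[k]
			if !ok {
				cur = zone{Type: z.Type, Name: z.Name, Attributes: map[string]string{}}
			}
			for attr, val := range z.Attributes {
				cur.Attributes[attr] = val
			}
			merged[k] = cur
		}
	}
	out := make([]zone, 0, len(merged))
	for _, z := range merged {
		out = append(out, z)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Type != out[j].Type {
			return out[i].Type < out[j].Type
		}
		return out[i].Name < out[j].Name
	})
	return out
}
```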
* refactor(test): go test add -race flag
* fix pod resources server topology adapter restart
1. add new interface RangeAndDeleteContainer
2. break the mutex lock into finer-grained locks: podMutex, poolMutex and regionMutex,
so that poolEntries or regionEntries can be accessed during RangeContainer or RangeAndUpdateContainer
Signed-off-by: linzhecheng <linzhecheng@bytedance.com>
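
The lock split described above can be sketched as follows. This is a simplified illustration, not the real metaCache: the entry types are placeholders, the region lock is assumed to be named `regionMutex`, and only the pod and pool paths are exercised.

```go
package main

import "sync"

// metaCache is a hypothetical cache where each entry map has its own lock.
type metaCache struct {
	podMutex   sync.RWMutex
	podEntries map[string]map[string]struct{} // podUID -> container names

	poolMutex   sync.RWMutex
	poolEntries map[string]int // pool name -> pool size

	regionMutex   sync.RWMutex
	regionEntries map[string]string // region name -> region type
}

func newMetaCache() *metaCache {
	return &metaCache{
		podEntries:    make(map[string]map[string]struct{}),
		poolEntries:   make(map[string]int),
		regionEntries: make(map[string]string),
	}
}

func (c *metaCache) AddContainer(podUID, name string) {
	c.podMutex.Lock()
	defer c.podMutex.Unlock()
	if c.podEntries[podUID] == nil {
		c.podEntries[podUID] = make(map[string]struct{})
	}
	c.podEntries[podUID][name] = struct{}{}
}

func (c *metaCache) SetPoolSize(pool string, size int) {
	c.poolMutex.Lock()
	defer c.poolMutex.Unlock()
	c.poolEntries[pool] = size
}

func (c *metaCache) GetPoolSize(pool string) (int, bool) {
	c.poolMutex.RLock()
	defer c.poolMutex.RUnlock()
	size, ok := c.poolEntries[pool]
	return size, ok
}

// RangeAndDeleteContainer holds only podMutex, so the callback may read
// pool entries without deadlocking.
func (c *metaCache) RangeAndDeleteContainer(shouldDelete func(podUID, name string) bool) {
	c.podMutex.Lock()
	defer c.podMutex.Unlock()
	for podUID, containers := range c.podEntries {
		for name := range containers {
			if shouldDelete(podUID, name) {
				delete(containers, name)
			}
		}
	}
}

// ContainerCount returns the number of cached containers across all pods.
func (c *metaCache) ContainerCount() int {
	c.podMutex.RLock()
	defer c.podMutex.RUnlock()
	n := 0
	for _, containers := range c.podEntries {
		n += len(containers)
	}
	return n
}
```

With a single shared mutex, calling `GetPoolSize` from inside the range callback would block forever, because `sync.RWMutex` is not reentrant.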