
id: namespace_resource_quota
title: Namespace Resource Quota

In Kubernetes, users can set up a namespace with resource quotas to limit the aggregate resource consumption in that namespace. The api-server validates namespace resource quotas directly, so YuniKorn simply honors the quotas, just like the default scheduler.

Best practice

It is not mandatory to set up YuniKorn queues that mirror namespaces. In practice, however, it makes sense to do so. A namespace is often used to cap resource consumption per user group or team, and a YuniKorn queue is likewise meant to divide cluster resources among multiple groups. Let's go through an example.

1. Set up the namespace

Resource quota for the advertisement namespace, saved as advertisement.yaml:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: advertisement
spec:
  hard:
    requests.cpu: "200m"
    requests.memory: 2000Mi
    limits.cpu: "200m"
    limits.memory: 4000Mi

Create the namespace and apply the quota:

kubectl create namespace advertisement
kubectl create -f ./advertisement.yaml --namespace=advertisement
kubectl get quota --namespace=advertisement
kubectl describe quota advertisement --namespace=advertisement

# output
Name:            advertisement
Namespace:       advertisement
Resource         Used  Hard
--------         ----  ----
limits.cpu       0     200m
limits.memory    0     4000Mi
requests.cpu     0     200m
requests.memory  0     2000Mi

2. Set up YuniKorn queues

Queue: advertisement:

name: advertisement
resources:
  guaranteed:
    vcore: 100
    memory: 1000
  max:
    vcore: 200
    memory: 2000

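Ensure that QueueMaxResource <= NamespaceResourceQuotaRequests, that is, the queue's max resources do not exceed the requests values in the namespace resource quota.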
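The constraint above can be sketched as a small check. This is only an illustration: `queue_fits_quota` is a hypothetical helper, not part of YuniKorn or Kubernetes, and it assumes both sides have already been normalized to the same units (for example, CPU in millicores and memory in one shared unit).

```python
# Hypothetical helper, not a YuniKorn or Kubernetes API.
# Assumes queue and quota values are pre-normalized to the same units.

def queue_fits_quota(queue_max, quota_requests):
    """Return True if every queue max value is <= the matching quota value.

    A resource that the quota does not list is treated as unconstrained.
    """
    return all(
        value <= quota_requests.get(name, float("inf"))
        for name, value in queue_max.items()
    )

# Values taken from the example queue and the example quota's requests.
queue_max = {"vcore": 200, "memory": 2000}
quota_requests = {"vcore": 200, "memory": 2000}

print(queue_fits_quota(queue_max, quota_requests))  # True
```

If the queue max were larger than the quota, YuniKorn could consider resources available that the api-server would still refuse to admit.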

3. Map applications to queues and namespaces

In the pod spec:

apiVersion: v1
kind: Pod
metadata:
  namespace: advertisement
  labels:
    app: sleep
    applicationId: "application_2019_01_22_00001"
    queue: "root.advertisement"
  name: task0
spec:
  schedulerName: yunikorn
  containers:
    - name: sleep-5s
      image: "alpine:latest"
      command: ["/bin/ash", "-ec", "while :; do echo '.'; sleep 5 ; done"]
      resources:
        requests:
          cpu: "50m"
          memory: "800M"
        limits:
          cpu: "100m"
          memory: "1000M"

Check the quota:

kubectl describe quota advertisement --namespace=advertisement

Name:            advertisement
Namespace:       advertisement
Resource         Used  Hard
--------         ----  ----
limits.cpu       100m  200m
limits.memory    1G    4000Mi
requests.cpu     50m   200m
requests.memory  800M  2000Mi

Now submit another application:

apiVersion: v1
kind: Pod
metadata:
  namespace: advertisement
  labels:
    app: sleep
    applicationId: "application_2019_01_22_00002"
    queue: "root.advertisement"
  name: task1
spec:
  schedulerName: yunikorn
  containers:
    - name: sleep-5s
      image: "alpine:latest"
      command: ["/bin/ash", "-ec", "while :; do echo '.'; sleep 5 ; done"]
      resources:
        requests:
          cpu: "200m"
          memory: "800M"
        limits:
          cpu: "200m"
          memory: "1000M"

This pod cannot be submitted to the api-server: the requested limits.cpu of 200m plus the 100m already in use is 300m, and the requested requests.cpu of 200m plus the 50m already in use is 250m. Both exceed the 200m quota.

kubectl create -f pod_ns_adv_task1.yaml
Error from server (Forbidden): error when creating "pod_ns_adv_task1.yaml": pods "task1" is forbidden: exceeded quota: advertisement, requested: limits.cpu=200m,requests.cpu=200m, used: limits.cpu=100m,requests.cpu=50m, limited: limits.cpu=200m,requests.cpu=200m
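The arithmetic behind this rejection can be reproduced with a short sketch. It is not the api-server's implementation: `parse_quantity` and `exceeded_resources` are hypothetical helpers, and the parser only handles the quantity suffixes that appear in this example (`m`, `M`, `G`, `Mi`, `Gi`).

```python
# Illustrative only: a simplified "used + requested > hard" check.
# Quantities are converted to integer milli-units so CPU ("200m") and
# memory ("800M", "2000Mi") can share one parser.

SUFFIXES = {
    "Mi": 1000 * 2**20,
    "Gi": 1000 * 2**30,
    "M": 1000 * 10**6,
    "G": 1000 * 10**9,
    "m": 1,
}

def parse_quantity(text):
    """Parse a quantity string such as "200m" or "2000Mi" into milli-units."""
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text) * 1000  # plain number, e.g. "0" or "2"

def exceeded_resources(used, requested, hard):
    """Return the sorted names of resources where used + requested > hard."""
    return sorted(
        name
        for name in hard
        if parse_quantity(used[name]) + parse_quantity(requested[name])
        > parse_quantity(hard[name])
    )

used = {"limits.cpu": "100m", "limits.memory": "1G",
        "requests.cpu": "50m", "requests.memory": "800M"}
requested = {"limits.cpu": "200m", "limits.memory": "1000M",
             "requests.cpu": "200m", "requests.memory": "800M"}
hard = {"limits.cpu": "200m", "limits.memory": "4000Mi",
        "requests.cpu": "200m", "requests.memory": "2000Mi"}

print(exceeded_resources(used, requested, hard))
# ['limits.cpu', 'requests.cpu']
```

Only the two CPU entries exceed the quota, which matches the resources named in the error message; the memory requests and limits of the second pod would still fit.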

Future Work

For compatibility, we should respect namespaces and resource quotas. Resource quotas overlap with queue configuration in many ways; for example, the requests quota is just like a queue's max resource. However, resource quotas still support a few features that queues do not, such as:

1. Resource limits. The aggregated resource limits of all pods in a namespace cannot exceed this value.
2. Storage resource quotas, e.g. storage size, number of PVCs, etc.
3. Object count quotas, e.g. count of PVCs, services, configmaps, etc.
4. Resource quotas can map to a priority class.

We can probably build something similar to cover (3) in this list, but it would be hard to completely support all of these cases.

Currently, mapping an application to both a queue and a corresponding namespace is overly complex. Some future improvements might be:

1. Automatically detect namespaces in the k8s-shim and map them to queues. Behind the scenes, the queue configuration is generated automatically from the namespace definition. Generated queues are attached under the root queue.
2. When a namespace is added, updated, or removed, the queues are updated automatically, similarly to (1).
3. Users can add more configuration to the generated queues, e.g. add a queue ACL or add child queues.
4. Applications submitted to namespaces are transparently submitted to the corresponding queues.
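As a rough illustration of these proposed improvements, the mapping could look something like the sketch below. Everything here is hypothetical: none of these functions exist in YuniKorn or the k8s-shim, and the quota values are assumed to be pre-normalized to the queue's units.

```python
# Hypothetical sketch of the proposed namespace-to-queue mapping.
# None of these names exist in YuniKorn; they only illustrate the idea.

ROOT_QUEUE = "root"

def queue_path_for_namespace(namespace):
    """Generated queues are attached directly under the root queue."""
    return f"{ROOT_QUEUE}.{namespace}"

def queue_config_for_namespace(namespace, quota_requests):
    """Build a queue definition whose max mirrors the quota's requests.

    quota_requests is assumed to be normalized to the queue's units,
    e.g. {"vcore": 200, "memory": 2000}.
    """
    return {
        "name": namespace,
        "parent": ROOT_QUEUE,
        "resources": {"max": dict(quota_requests)},
    }

def sync_queues(queues, namespaces):
    """Add, update, or remove generated queues to match the namespaces.

    namespaces maps each namespace name to its normalized quota requests.
    """
    for name in list(queues):
        if name not in namespaces:
            del queues[name]  # namespace was removed
    for name, quota in namespaces.items():
        queues[name] = queue_config_for_namespace(name, quota)  # added or updated
    return queues

queues = sync_queues({}, {"advertisement": {"vcore": 200, "memory": 2000}})
print(queue_path_for_namespace("advertisement"))  # root.advertisement
```

This sketch regenerates each queue from scratch on every sync, so it would overwrite the user-added ACLs and child queues mentioned in (3); a real design would need to merge generated and user-supplied configuration.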