
Over the past few days we have been seeing intermittent deployment failures when deploying (via Helm) to Kubernetes v1.11.2.

When a deployment fails, kubectl describe <deployment> usually reports that container creation failed:

Events:
Type    Reason     Age   From                   Message
----    ------     ----  ----                   -------
Normal  Scheduled  1s    default-scheduler      Successfully assigned default/pod-fc5c8d4b8-99npr to fh1-node04
Normal  Pulling    0s    kubelet, fh1-node04    pulling image "docker-registry.internal/pod:0e5a0cb1c0e32b6d0e603333ebb81ade3427ccdd"
Error from server (BadRequest): container "pod" in pod "pod-fc5c8d4b8-99npr" is waiting to start: ContainerCreating

The only problem we can find in the kubelet log is:

58468 kubelet_pods.go:146] Mount cannot be satisfied for container "pod", because the volume is missing or the volume mounter is nil: {Name:default-token-q8k7w ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}
58468 kuberuntime_manager.go:733] container start failed: CreateContainerConfigError: cannot find volume "default-token-q8k7w" to mount container start failed: CreateContainerConfigError: cannot find volume "default-token-q8k7w" to mount into container "pod"
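
To see how often this error shows up and which volumes it names, the kubelet log could be scanned with something like the following. This is only a minimal sketch: the function name count_missing_volumes is hypothetical, and the pattern assumes the message wording stays exactly as in the log lines above.

```python
import re
from collections import Counter

# Pattern based on the kubelet message quoted above (an assumption about its
# exact wording); captures the volume name and the container name.
MISSING_VOLUME = re.compile(
    r'cannot find volume "([^"]+)" to mount into container "([^"]+)"'
)


def count_missing_volumes(lines):
    """Count (volume, container) pairs reported as missing in kubelet log lines."""
    counts = Counter()
    for line in lines:
        for volume, container in MISSING_VOLUME.findall(line):
            counts[(volume, container)] += 1
    return counts
```

Any iterable of log lines works as input, for example an open file handle on a saved copy of the kubelet log. The result maps each (volume, container) pair to the number of times it was reported.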

The failure is intermittent: roughly one in every 20 deployments fails. Re-running the deployment works as expected.

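Cluster and node health both look fine at deployment time, so we are at a loss as to where to start. We are looking for advice on where to begin diagnosing the problem.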

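EDIT: As requested, the deployment file is generated from a Helm template, and its output is shown below. For more context, many of our services use the same Helm template, but only this particular service has the intermittent problem: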

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: pod
  labels:
    app: pod
    chart: pod-0.1.0
    release: pod
    heritage: Tiller
    environment: integration
  annotations:
    kubernetes.io/change-cause: https://github.com/path_to_release
spec:
  replicas: 2
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: pod
      release: pod
      environment: integration
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: pod
        release: pod
        environment: integration
    spec:
      containers:
        - name: pod
          image: "docker-registry.internal/pod:0e5a0cb1c0e32b6d0e603333ebb81ade3427ccdd"
          env:
            - name: VAULT_USERNAME
              valueFrom:
                secretKeyRef:
                  name: "pod-integration"
                  key: username
            - name: VAULT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: "pod-integration"
                  key: password
          imagePullPolicy: IfNotPresent
          command: ['mix', 'phx.server']

          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          envFrom:
          - configMapRef:
              name: pod

          livenessProbe:
            httpGet:
              path: /api/health
              port: http
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /api/health
              port: http
            initialDelaySeconds: 10
          resources:
            limits:
              cpu: 750m
              memory: 200Mi
            requests:
              cpu: 500m
              memory: 150Mi