Concept 14: HPA (Horizontal Pod Autoscaling)

1. Core Idea (1-line)
HPA automatically increases or decreases the number of Pods based on load.
2. Why HPA Exists (VERY IMPORTANT)
Without autoscaling:
- Traffic spike → Pods get overloaded and the app can crash
- Low traffic → idle Pods waste resources
Manual scaling is not practical, because load changes faster than a person can react.
3. What HPA Does
Based on metrics (usually CPU):
- High load → increase Pods
- Low load → decrease Pods
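
The increase/decrease decision above comes down to one ratio. Here is a minimal Python sketch of the replica formula from the Kubernetes HPA documentation, desired = ceil(current replicas × current metric / target metric), with the default 10% tolerance. The function name and sample numbers are illustrative assumptions, not Kubernetes code:

```python
import math


def desired_replicas(current_replicas: int,
                     current_util: float,
                     target_util: float,
                     tolerance: float = 0.1) -> int:
    """Sketch of the HPA scaling rule for a single utilization metric."""
    ratio = current_util / target_util
    # If the ratio is close enough to 1.0, HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Otherwise scale proportionally and round up.
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    print(desired_replicas(4, 80, 50))  # load above target
    print(desired_replicas(4, 20, 50))  # load below target
    print(desired_replicas(4, 52, 50))  # within tolerance
```

With 4 Pods at 80% against a 50% target, this gives 7 Pods. At 20% it gives 2, and at 52% the count stays at 4. The real controller additionally clamps the result to minReplicas and maxReplicas.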
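4. How It Works (VERY IMPORTANT)
Flow:
Metrics Server → HPA → Deployment → ReplicaSet → Pods
Metrics Server reports Pod usage, the HPA controller compares it with the target and updates the Deployment's replica count, and the ReplicaSet then creates or removes Pods.
5. Example Logic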
Target CPU = 50%
If current = 80% → scale UP
If current = 20% → scale DOWN
6. Example YAML
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
7. Requirements (IMPORTANT)
HPA needs:
- Metrics Server installed
- Resource requests defined on the target Pods' containers (utilization is measured as a percentage of the request)
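
Why the requests matter: a Utilization target is a percentage of what the Pods requested, so without a request there is nothing to divide by. Here is a simplified Python sketch of that calculation. The function name, the millicore inputs, and the error message are assumptions for illustration, not the controller's actual code:

```python
from typing import List, Optional, Tuple


def cpu_utilization_percent(pods: List[Tuple[int, Optional[int]]]) -> int:
    """Each entry is (usage_millicores, request_millicores) for one Pod.

    Returns total usage as a percentage of total requests.
    """
    total_usage = 0
    total_request = 0
    for usage, request in pods:
        if request is None:
            # No request set: utilization cannot be computed for this Pod.
            raise ValueError("missing CPU request, utilization is undefined")
        total_usage += usage
        total_request += request
    return total_usage * 100 // total_request


if __name__ == "__main__":
    # Two Pods, each requesting 500m, using 200m and 300m.
    print(cpu_utilization_percent([(200, 500), (300, 500)]))
```

Two Pods requesting 500m each and using 200m and 300m come out at 50%, exactly on a 50% target. A Pod with no request raises an error here. This mirrors how a real HPA reports that it cannot compute the metric instead of scaling.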
8. Real-world Example (Your Domain)
ML API:
- Normal traffic → 2 Pods
- Traffic spike → 10 Pods
- Midnight → back to 2
Fully automatic.
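
The 2 → 10 → 2 pattern above is the effect of clamping to minReplicas and maxReplicas. Here is a rough Python sketch, assuming min=2, max=10, a 50% target, and made-up load numbers. The total demand is expressed as a percentage of one Pod's CPU request, which is a simplification:

```python
import math

MIN_REPLICAS = 2
MAX_REPLICAS = 10
TARGET_UTILIZATION = 50  # percent of each Pod's CPU request


def replicas_for_load(total_demand_percent: float) -> int:
    """Pods needed to keep average utilization at the target, clamped to min/max."""
    wanted = math.ceil(total_demand_percent / TARGET_UTILIZATION)
    return max(MIN_REPLICAS, min(wanted, MAX_REPLICAS))


# Hypothetical load over one day for the ML API.
day = {"normal": 80, "spike": 900, "midnight": 20}
plan = {phase: replicas_for_load(load) for phase, load in day.items()}

if __name__ == "__main__":
    print(plan)
```

The spike would need 18 Pods but is capped at 10. Midnight would need only 1 but is held at the minimum of 2.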
9. Types of Metrics
- CPU (most common)
- Memory
- Custom metrics (for example from Prometheus, exposed through a metrics adapter)
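10. Common Mistakes
- Not setting resource requests → HPA cannot compute utilization and does not scale
- No Metrics Server installed → HPA has no metrics to read
- Wrong thresholds → a target set too low over-scales, one set too high reacts too late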
11. Interview Answer
"HPA automatically scales the number of Pods in a Deployment based on observed metrics like CPU utilization to handle varying workloads efficiently."
12. Commands (CKA)
kubectl get hpa
kubectl describe hpa <name>
kubectl autoscale deployment my-dep --cpu-percent=50 --min=2 --max=10
13. Memory Trick
HPA = load-based scaling: more load, more Pods; less load, fewer Pods.
14. Pro Insight (Real-world)
- Combine HPA + Cluster Autoscaler for full automatic infrastructure scaling: HPA adds Pods, and Cluster Autoscaler adds nodes when those Pods cannot be scheduled.
Next Step
Say:
"next"
Then we go to:
Concept 15: Taints & Tolerations (Advanced Scheduling, VERY IMPORTANT FOR INTERVIEWS + REAL WORLD)