Once MetalLB is installed and configured, to expose a service
externally, simply create it with spec.type
set to LoadBalancer
,
and MetalLB will do the rest.
MetalLB attaches informational events to the services that it’s
controlling. If your LoadBalancer is misbehaving, run kubectl
describe service <service name>
and check the event log.
MetalLB respects the spec.loadBalancerIP
parameter, so if you want
your service to be set up with a specific address, you can request it
by setting that parameter. If MetalLB does not own the requested
address, or if the address is already in use by another service,
assignment will fail and MetalLB will log a warning event visible in
kubectl describe service <service name>
.
MetalLB also supports requesting a specific address pool, if you want
a certain kind of address but don’t care which one exactly. To request
assignment from a specific pool, add the
metallb.universe.tf/address-pool
annotation to your service, with the
name of the address pool as the annotation value. For example:
apiVersion: v1
kind: Service
metadata:
name: nginx
annotations:
metallb.universe.tf/address-pool: production-public-ips
spec:
ports:
- port: 80
targetPort: 80
selector:
app: nginx
type: LoadBalancer
MetalLB understands the service’s externalTrafficPolicy
option, and
implements different announcements modes depending on the policy and
announcement protocol you select.
When announcing in layer2 mode, MetalLB forces the traffic policy to
Cluster
. In this mode, the elected leader node receives all the
inbound traffic, and kube-proxy
load-balances from there to
individual pods.
This policy results in uniform traffic distribution across all pods in
the service. However, kube-proxy
will obscure the source IP address
of the connection when it does load-balancing, so your pod logs will
show that external traffic appears to be coming from the cluster’s
leader node.
When announcing over BGP, MetalLB respects the service’s
externalTrafficPolicy
option, and implements two different
announcement modes depending on what policy you select. If you’re
familiar with Google Cloud’s Kubernetes load balancers, you can
probably skip this section: MetalLB’s behaviors and tradeoffs are
identical.
With the default Cluster
traffic policy, every node in your cluster
will attract traffic for the service IP. On each node, the traffic is
subjected to a second layer of load-balancing (provided by
kube-proxy
), which directs the traffic to individual pods.
This policy results in uniform traffic distribution across all nodes
in your cluster, and across all pods in your service. However, it
results in two layers of load-balancing (one at the BGP router, one at
kube-proxy
on the nodes), which can cause inefficient traffic
flows. For example, a particular user’s connection might be sent to
node A by the BGP router, but then node A decides to send that
connection to a pod running on node B.
The other downside of the “Cluster” policy is that kube-proxy
will
obscure the source IP address of the connection when it does its
load-balancing, so your pod logs will show that external traffic
appears to be coming from your cluster’s nodes.
With the Local
traffic policy, nodes will only attract traffic if
they are running one or more of the service’s pods locally. The BGP
routers will load-balance incoming traffic only across those nodes
that are currently hosting the service. On each node, the traffic is
forwarded only to local pods by kube-proxy
, there is no “horizontal”
traffic flow between nodes.
This policy provides the most efficient flow of traffic to your
service. Furthermore, because kube-proxy
doesn’t need to send
traffic between cluster nodes, your pods can see the real source IP
address of incoming connections.
The downside of this policy is that it treats each cluster node as one “unit” of load-balancing, regardless of how many of the service’s pods are running on that node. This may result in traffic imbalances to your pods.
For example, if your service has 2 pods running on node A and one pod
running on node B, the Local
traffic policy will send 50% of the
service’s traffic to each node. Node A will split the traffic it
receives evenly between its two pods, so the final per-pod load
distribution is 25% for each of node A’s pods, and 50% for node B’s
pod. In contrast, if you used the Cluster
traffic policy, each pod
would receive 33% of the overall traffic.
In general, when using the Local
traffic policy, it’s recommended to
finely control the mapping of your pods to nodes, for example
using
node anti-affinity,
so that an even traffic split across nodes translates to an even
traffic split across pods.
In future, MetalLB might be able to overcome the downsides of the
Local
traffic policy, in which case it would be unconditionally the
best mode to use with BGP
announcements. See
issue 1 for more
information.