Flux Troubleshooting Tool
Problem Statement
When using FluxCD to deploy applications on Kubernetes clusters, users often need to control the deployment order of resources (e.g., ensuring that Secrets are created before Pods that consume them). In addition, Kubernetes and Flux resources naturally form hierarchical relationships (e.g., Deployment → ReplicaSet → Pod).
As a result, resources can:
- deploy other resources
- depend on other resources
This creates complex dependency chains. When a resource fails to deploy, identifying the root cause can be challenging. The error message may not be explicit, and the actual issue may come from another resource.
Troubleshooting often requires jumping between commands such as kubectl get, kubectl describe, flux trace, and flux reconcile across many objects.
This project aims to provide a global view of resource relationships and help users quickly identify the root cause of deployment failures.
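As an illustration of the hierarchical relationships described above, Kubernetes records ownership in each object's metadata.ownerReferences field. The sketch below rebuilds the Deployment → ReplicaSet → Pod chain from that field. The sample objects, UIDs, and the helper names build_owner_tree and render are hypothetical and do not come from the tool itself; a real implementation would read the objects from the cluster API.

```python
from collections import defaultdict

# Hypothetical, trimmed-down objects shaped like items from `kubectl get ... -o json`.
objects = [
    {"kind": "Deployment", "metadata": {"name": "web", "uid": "d1"}},
    {"kind": "ReplicaSet", "metadata": {
        "name": "web-5d9c", "uid": "r1",
        "ownerReferences": [{"kind": "Deployment", "name": "web", "uid": "d1"}]}},
    {"kind": "Pod", "metadata": {
        "name": "web-5d9c-abcde", "uid": "p1",
        "ownerReferences": [{"kind": "ReplicaSet", "name": "web-5d9c", "uid": "r1"}]}},
]


def build_owner_tree(items):
    """Return (roots, children); children maps an owner UID to the objects it owns."""
    known_uids = {obj["metadata"]["uid"] for obj in items}
    children = defaultdict(list)
    roots = []
    for obj in items:
        # Keep only owners that are part of the snapshot we were given.
        owners = [ref["uid"]
                  for ref in obj["metadata"].get("ownerReferences", [])
                  if ref["uid"] in known_uids]
        if not owners:
            roots.append(obj)
        for uid in owners:
            children[uid].append(obj)
    return roots, children


def render(obj, children, depth=0):
    """Return indented text lines for obj and everything it owns."""
    lines = ["  " * depth + f'{obj["kind"]}/{obj["metadata"]["name"]}']
    for child in children[obj["metadata"]["uid"]]:
        lines.extend(render(child, children, depth + 1))
    return lines


roots, children = build_owner_tree(objects)
tree_lines = [line for root in roots for line in render(root, children)]
print("\n".join(tree_lines))
```

Matching on UID rather than on name avoids confusing two objects that share a name but have different kinds.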
Description
The tool provides a unified view of Flux and Kubernetes resources, helping users understand dependencies and troubleshoot failures faster.
The main capabilities are:
- Visualize resources in a namespace, their status, and associated error messages
- Understand parent-child relationships without manually inspecting owner references
- Understand Kustomization dependency relationships without manually tracing dependencies
- Quickly identify unhealthy resources through status indicators and filtering
- Inspect manifests, events, traces, and trigger reconciliations from a single interface
- Generate a DAG of Kustomization dependency chains, with cycle detection and root cause identification
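The last capability can be sketched as follows. This is a minimal illustration, not the tool's actual algorithm: it assumes each Kustomization has already been reduced to a readiness flag and the list of names under spec.dependsOn, and it treats a root cause as an unhealthy Kustomization whose own dependencies are all healthy. The sample data and the function names find_cycle and root_causes are hypothetical.

```python
# Hypothetical snapshot: name -> (ready, names listed under spec.dependsOn).
kustomizations = {
    "infra-secrets": (False, []),
    "infra-db": (False, ["infra-secrets"]),
    "app-backend": (False, ["infra-db"]),
    "app-frontend": (True, []),
}


def find_cycle(graph):
    """Return one dependency cycle as a list of names, or None if the graph is a DAG."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {name: WHITE for name in graph}
    stack = []

    def visit(name):
        color[name] = GREY
        stack.append(name)
        for dep in graph[name][1]:
            if dep not in graph:
                continue  # dependency missing from the snapshot: ignored in this sketch
            if color[dep] == GREY:
                # dep is already on the current path, so the path loops back to it.
                return stack[stack.index(dep):] + [dep]
            if color[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[name] = BLACK
        return None

    for name in graph:
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


def root_causes(graph):
    """Return unhealthy nodes whose known dependencies are all healthy."""
    return sorted(
        name for name, (ready, deps) in graph.items()
        if not ready and all(graph[d][0] for d in deps if d in graph)
    )


cycle = find_cycle(kustomizations)
causes = root_causes(kustomizations)
print("cycle:", cycle)
print("root causes:", causes)
```

In this snapshot, infra-db and app-backend are unhealthy only because infra-secrets is, so infra-secrets is the only root cause reported. Cycle detection is worth running first: in a cycle where every member is unhealthy, no member has all-healthy dependencies, so the root cause heuristic alone would report nothing.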
Project Details
Leader: Orange (A. Pasquier, M. Richomme)
List of people/organizations interested in joining:
* *
Hackathon Objectives
- Test the tool against real-world Flux deployments and troubleshooting scenarios
- Gather feedback on missing features and usability improvements
- Improve the robustness, resilience, performance, and responsiveness of the tool
- Identify additional Flux/Kubernetes signals that could help improve troubleshooting
- Compare with existing tools that may already address the same needs