Flux Troubleshooting Tool

Problem Statement

When using FluxCD to deploy applications on Kubernetes clusters, users often need to control the deployment order of resources (e.g., ensuring that Secrets are created before Pods that consume them). In addition, Kubernetes and Flux resources naturally form hierarchical relationships (e.g., Deployment → ReplicaSet → Pod).

As a result, resources can:

  • deploy other resources
  • depend on other resources

This creates complex dependency chains. When a resource fails to deploy, identifying the root cause can be challenging: the reported error is often not explicit, and the actual fault may lie in a different resource elsewhere in the chain.

Troubleshooting often requires jumping between commands such as kubectl get, kubectl describe, flux trace, and flux reconcile across many objects.

This project aims to provide a global view of resource relationships and help users quickly identify the root cause of deployment failures.

Description

The tool provides a unified view of Flux and Kubernetes resources, helping users understand dependencies and troubleshoot failures faster.

The main capabilities are:

  • Visualize resources in a namespace, their status, and associated error messages
  • Understand parent-child relationships without manually inspecting owner references
  • Understand Kustomization dependency relationships without manually tracing dependencies
  • Quickly identify unhealthy resources through status indicators and filtering
  • Inspect manifests, events, and traces, and trigger reconciliations, all from a single interface
  • Generate a directed acyclic graph (DAG) of Kustomization dependency chains, with cycle detection and root cause identification
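
To make the last capability more concrete, here is a minimal Python sketch of cycle detection and root cause identification over a dependency graph. It is an illustration only, not the tool's actual implementation. The input shapes are assumptions: `deps` maps each Kustomization name to the names listed in its `spec.dependsOn`, and `ready` maps each name to the status of its Ready condition. The function names `find_cycle` and `root_causes` are hypothetical, and every dependency is assumed to appear as a key in both mappings.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None if acyclic.

    Depth-first search with three colours: a dependency that is still
    GREY (on the current path) when we reach it again closes a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {name: WHITE for name in deps}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for dep in deps[node]:
            if dep not in deps:
                continue  # unknown dependency: ignored in this sketch
            if color[dep] == GREY:
                return path[path.index(dep):] + [dep]
            if color[dep] == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for name in deps:
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


def root_causes(deps, ready):
    """Return not-ready nodes whose direct dependencies are all ready.

    These are the likely origins of a failure; not-ready nodes that
    depend on another not-ready node are treated as downstream effects.
    """
    return sorted(
        name for name in deps
        if not ready.get(name, False)
        and all(ready.get(dep, False) for dep in deps[name])
    )


# Hypothetical example data, not taken from a real cluster.
deps = {
    "infra": [],
    "secrets": ["infra"],
    "app": ["secrets"],
    "monitoring": ["infra"],
}
ready = {"infra": True, "secrets": False, "app": False, "monitoring": True}

print(find_cycle(deps))          # None: the graph is acyclic
print(root_causes(deps, ready))  # ['secrets']
```

In this example `app` is also not ready, but only `secrets` is reported because the failure of `app` can be explained by its failing dependency. A real implementation would also need to handle dependencies across namespaces and dependencies that are missing from the cluster.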

Project Details

Leader: Orange (A. Pasquier, M. Richomme)

People/organizations interested in joining:

* *

Hackathon Objectives

  • Test the tool against real-world Flux deployments and troubleshooting scenarios
  • Gather feedback on missing features and usability improvements
  • Improve the robustness, resilience, performance, and responsiveness of the tool
  • Identify additional Flux/Kubernetes signals that could help improve troubleshooting
  • Compare the tool with existing tools that may already address the same needs