Golang Graceful Shutdown

When building resilient Go applications, handling graceful shutdown is a must. The goal is to let our server finish processing in-flight requests and cleanly release any resources (like database connections or caches) before our application fully terminates.

Full code example:

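package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// Exit with a non-zero status if the server or its shutdown failed.
	if err := run(); err != nil {
		log.Fatal(err)
	}
}

func run() error {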
	// Initialize resources (e.g., database, cache, etc.)
	// For demonstration, we just set up a simple HTTP server.
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("Hello!"))
	})

	server := http.Server{
		Addr:    ":80",
		Handler: mux,
		// Configure timeouts according to our application needs.
		ReadTimeout:       5 * time.Second,
		WriteTimeout:      10 * time.Second,
		IdleTimeout:       30 * time.Second,
		ReadHeaderTimeout: 2 * time.Second,
	}

	// Create a context that listens for termination signals.
	// This context will be canceled when one of the specified signals is received.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT)
	defer stop()

	// Channel to capture server errors
	errCh := make(chan error, 1)

	// Start the server in a separate goroutine.
	go func() {
		log.Printf("HTTP server starting on %s", server.Addr)
		// ListenAndServe will return http.ErrServerClosed on graceful shutdown.
		errCh <- server.ListenAndServe()
	}()

	var err error

	// Wait for either a server error or a shutdown signal.
	select {
	case err = <-errCh:
		// An error occurred while running the server.
	case <-ctx.Done():
		// A termination signal was received.
		log.Printf("Shutdown signal was received")
	}

	// Create a context with a timeout for the graceful shutdown process.
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	log.Println("Initiating graceful shutdown...")
	// Shut down the server.
	if shutdownErr := server.Shutdown(shutdownCtx); shutdownErr != nil {
		// Merge the server error (e.g., a failed Accept on the listener) with the shutdown error.
		return errors.Join(err, shutdownErr)
	}

	// Clean up other resources (e.g., database, cache, etc.).
	// In an API, those resources are usually consumed by the HTTP handlers, so we
	// shut down the server first to stop accepting new requests and only then
	// clean up. This way, in-flight requests still have access to what they need.

	return err
}

Here’s a breakdown of the approach:

Catching Termination Signals

Using signal.NotifyContext, we create a context that is canceled as soon as one of the listed termination signals (such as SIGINT or SIGTERM) is received. This lets the application begin its graceful shutdown immediately.

ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT)
defer stop()
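To see the cancellation happen without pressing Ctrl+C, the process can send a signal to itself. The following is a minimal sketch, assuming a Unix-like system (syscall.Kill is not available on Windows); the function name signalCancelsContext and the two-second timeout are illustrative choices, not part of the original example.

```go
package main

import (
	"context"
	"os/signal"
	"syscall"
	"time"
)

// signalCancelsContext registers a signal context, sends SIGTERM to the
// current process, and reports whether the context was canceled in time.
// Assumption: Unix-like OS, because syscall.Kill does not exist on Windows.
func signalCancelsContext() bool {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop() // unregister the handler and restore default signal behavior

	// Simulate `kill <pid>` coming from a process manager.
	if err := syscall.Kill(syscall.Getpid(), syscall.SIGTERM); err != nil {
		return false
	}

	select {
	case <-ctx.Done():
		// The signal was intercepted and turned into a context cancellation.
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}
```

Because the signal is registered before it is sent, the default action (terminating the process) is suppressed and the context is canceled instead.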

Starting the Server in a Goroutine and Monitoring for Errors

The server runs concurrently, and any error (including http.ErrServerClosed on a graceful shutdown) is sent over a channel so that the main goroutine can react accordingly.

errCh := make(chan error, 1)
go func() {
    log.Printf("HTTP server starting on %s", server.Addr)
    errCh <- server.ListenAndServe()
}()
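If the channel is read after Shutdown has been called, the value it carries is the expected http.ErrServerClosed rather than a real failure, so it is worth filtering out with errors.Is. The sketch below illustrates this under a few assumptions: the helper names are hypothetical, and the loopback address with port 0 is chosen only so the example does not need port 80.

```go
package main

import (
	"context"
	"errors"
	"net/http"
	"time"
)

// serveUntilShutdown runs srv in a goroutine, shuts it down, and returns
// the error reported by ListenAndServe with the expected
// http.ErrServerClosed filtered out.
func serveUntilShutdown(srv *http.Server) error {
	errCh := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() {
		errCh <- srv.ListenAndServe()
	}()

	shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		return err
	}

	err := <-errCh
	if errors.Is(err, http.ErrServerClosed) {
		return nil // normal outcome of a graceful shutdown
	}
	return err
}

// demoServeUntilShutdown uses a hypothetical loopback address; port 0 asks
// the OS for any free port.
func demoServeUntilShutdown() error {
	return serveUntilShutdown(&http.Server{Addr: "127.0.0.1:0"})
}
```

The buffer of size one matters: without it, the goroutine would block forever on the send whenever the main goroutine leaves through the signal branch and never reads the channel.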

Waiting for an Error or a Signal

We wait for either an error from the server or a cancellation of the context (indicating that a termination signal was received). This select block is the heart of the graceful shutdown logic.

var err error
select {
case err = <-errCh:
    // An error occurred while running the server.
case <-ctx.Done():
    log.Printf("Shutdown signal was received")
}

Shutting Down the HTTP Server First

It is crucial to shut down the HTTP server before cleaning up other resources. By doing so, we prevent new requests from being accepted while still allowing in-flight requests to complete. This ensures that the cleanup of dependent resources (e.g., databases, caches) does not interrupt ongoing operations that still rely on them.

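Anti-pattern: closing the database first. In-flight handlers then fail with errors (or panic, if they do not check those errors) as soon as they touch the closed resource.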

shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

log.Println("Initiating graceful shutdown...")
if shutdownErr := server.Shutdown(shutdownCtx); shutdownErr != nil {
    return errors.Join(err, shutdownErr)
}

// Clean up other resources (e.g., database, cache, etc.)
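The claim that Shutdown lets in-flight requests finish can be checked with a small experiment. This is a sketch, not a benchmark: the /slow route, the 200 ms delay, and the function name are all assumptions made for illustration.

```go
package main

import (
	"context"
	"io"
	"net"
	"net/http"
	"time"
)

// inFlightSurvivesShutdown starts a server with a slow handler, calls
// Shutdown while a request is being processed, and returns the body the
// client received.
func inFlightSurvivesShutdown() (string, error) {
	started := make(chan struct{}, 1)
	mux := http.NewServeMux()
	mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		select {
		case started <- struct{}{}: // tell the caller the request is in flight
		default:
		}
		time.Sleep(200 * time.Millisecond) // simulated work that still needs the database
		io.WriteString(w, "done")
	})

	ln, err := net.Listen("tcp", "127.0.0.1:0") // any free loopback port
	if err != nil {
		return "", err
	}
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln) // returns http.ErrServerClosed once Shutdown is called

	type result struct {
		body string
		err  error
	}
	resCh := make(chan result, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/slow")
		if err != nil {
			resCh <- result{err: err}
			return
		}
		defer resp.Body.Close()
		b, err := io.ReadAll(resp.Body)
		resCh <- result{body: string(b), err: err}
	}()

	<-started // the handler is now running

	shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		return "", err
	}
	// Only at this point would dependent resources (database, cache) be closed.

	res := <-resCh
	return res.body, res.err
}
```

Shutdown returns only after the slow handler has written its response, so any cleanup placed after it cannot pull a dependency out from under a running request.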

On Context Management: Why Not to Use the Signal Context as the Base Context

While it might be tempting to use the context returned by signal.NotifyContext as our application context, doing so can cancel in-flight work prematurely. The signal context exists to trigger the shutdown sequence: it is canceled the moment a termination signal arrives. If the same context is used to initialize services and run business logic, that cancellation propagates to database calls and other operations that should keep running until the in-flight requests have drained.

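// Application context: independent of the signal context, so it is not canceled by SIGTERM.
ctx := context.Background()
// Avoid naming the variables after the packages, which would shadow them.
svc := services.New(ctx)
repo := repository.New(ctx)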

Using separate contexts—one for our application's core operations and one for handling termination signals—ensures that our application can manage shutdowns gracefully without sacrificing the integrity or completion of ongoing processes.
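The difference is easy to reproduce in isolation. In the sketch below, loadUser is a hypothetical stand-in for a context-aware database call, and a plain context.WithCancel stands in for the signal context so that no real signal has to be sent.

```go
package main

import (
	"context"
	"time"
)

// loadUser is a hypothetical stand-in for a handler's database call. Like
// most context-aware APIs, it fails as soon as its context is canceled.
func loadUser(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(50 * time.Millisecond): // simulated query latency
		return nil
	}
}

// compareContexts runs the same call on a context shared with the signal
// handler and on a separate application context, after a simulated signal.
func compareContexts() (sharedErr, separateErr error) {
	// Assumption: cancel() plays the role of SIGTERM arriving on a
	// context created with signal.NotifyContext.
	signalCtx, cancel := context.WithCancel(context.Background())
	appCtx := context.Background()

	cancel() // the "signal" arrives while requests are still in flight

	sharedErr = loadUser(signalCtx) // fails with context.Canceled
	separateErr = loadUser(appCtx)  // completes normally
	return sharedErr, separateErr
}
```

With the shared context, the call is aborted the instant the signal arrives; with the separate one, it finishes and the server's Shutdown timeout remains the only deadline in play.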

Logging Specific Signals with signal.Notify

In scenarios where it’s necessary to log or handle the specific signal received (for instance, tracking if Kubernetes is killing our application), using signal.Notify instead of signal.NotifyContext provides more granular control:

// sigCh receives every registered signal, not only SIGQUIT.
sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT)
// ...
select {
case err = <-errCh:
    // Handle server error
case sig := <-sigCh:
    log.Printf("Caught signal: %v", sig)
}

So prefer signal.Notify over signal.NotifyContext when we need to:

Log which signal arrived (e.g., SIGTERM vs SIGQUIT).
Handle some signals differently (e.g., SIGHUP for a config reload).

Note that, as of now, signal.NotifyContext does not provide the specific signal received—which is why this approach can be useful for detailed logging.
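Custom handling usually means a loop rather than a single select, because a signal such as SIGHUP should not end the program. The following is one possible shape for that loop; handleSignals and its reload callback are hypothetical names, and the demo feeds the channel directly instead of sending real signals.

```go
package main

import (
	"log"
	"os"
	"syscall"
)

// handleSignals drains sigCh until a terminating signal arrives. SIGHUP
// triggers the hypothetical reload callback instead of a shutdown.
// It returns the signal that ended the loop.
func handleSignals(sigCh <-chan os.Signal, reload func()) os.Signal {
	for sig := range sigCh {
		switch sig {
		case syscall.SIGHUP:
			log.Printf("caught %v: reloading configuration", sig)
			reload()
		default:
			log.Printf("caught %v: shutting down", sig)
			return sig
		}
	}
	return nil // channel closed without a terminating signal
}

// demoHandleSignals feeds the loop directly; in a real program the channel
// would be registered with
// signal.Notify(sigCh, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM).
func demoHandleSignals() (reloads int, terminated bool) {
	sigCh := make(chan os.Signal, 3)
	sigCh <- syscall.SIGHUP
	sigCh <- syscall.SIGHUP
	sigCh <- syscall.SIGTERM

	final := handleSignals(sigCh, func() { reloads++ })
	return reloads, final == syscall.SIGTERM
}
```

In the server from the full example, the returned signal would take the place of ctx.Done(): once handleSignals returns, the program proceeds to server.Shutdown as before.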