Hot & Cold Spectral GCNs: How Graph Fourier Transforms Connect Heat Flow and Cold-Start Recommendations


Introduction

I have always been obsessed with the Fourier Transform; it is, in my opinion, the single greatest invention in the history of mathematics. Check out this Veritasium video on it! Part of what makes the Fourier Transform so ubiquitous is that any function can be broken down into its component frequencies. What is less well known is that the definition of "frequency" is purely mathematical and applies to a broader class of mathematical objects than just functions! In this post I will try to provide some intuition and visualizations that extend the Fourier Transform to graphs, called the Graph Fourier Transform. Hopefully, once that is clear, we will apply the Graph Fourier Transform in a Spectral Graph Convolutional Network to model heat propagation on a toroidal surface.

Repo:
https://github.com/FranciscoRMendes/graph_networks/tree/main

Colab Notebook:
https://github.com/FranciscoRMendes/graph_networks/blob/main/GCN.ipynb

Classical Fourier Transform As A Special Case Of The Graph Fourier Transform

While there are many ways to view the Fourier Transform, the most revealing perspective is to regard it as multiplication of a discrete signal by a special matrix. This viewpoint is useful for several reasons.

  1. Once a signal is discretised, it becomes a vector, and any linear operation on it can be represented as multiplication by a matrix.

  2. A transform is therefore a change of basis: multiplying a vector by a matrix produces a new representation of the same data.

  3. However, only a very small number of matrices yield transformed coordinates that are interpretable. The Fourier matrix is special because its columns correspond to pure oscillations, which are the eigenvectors of every shift-invariant operator.

  4. A useful transform must also be invertible. After performing operations in the transformed domain, one should be able to recover the original signal exactly. The Fourier matrix satisfies $F^{-1} = F^{*}$ (it is unitary), which gives a simple inverse and perfect reconstruction.

Every transform follows the same general recipe:

  • choose a matrix whose columns represent meaningful basis vectors,

  • multiply the signal by this matrix,

  • interpret the transformed coefficients,

  • use the inverse matrix to return to the original domain.

DFT via the Discrete Laplacian Matrix

We start by deriving the DFT in matrix form for a discrete signal. We will use this as a basis to then derive the Graph Fourier Transform.
Consider a 1-D signal sampled at $N$ evenly spaced points:

$$x = (x_0, x_1, \dots, x_{N-1})^{\top}.$$

The continuous Laplacian operator (the second derivative in one dimension) is approximated on a uniform grid by the finite-difference stencil

$$(\Delta x)_n \approx x_{n+1} - 2x_n + x_{n-1}.$$

With periodic boundary conditions, the discrete Laplacian becomes the circulant matrix (keep this in mind when we go to the graph case; we shall see later that this is exactly the Laplacian of a cycle graph):

$$L = \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 & -1 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ -1 & 0 & 0 & \cdots & -1 & 2 \end{pmatrix}.$$

This matrix discretises (the negative of) the second derivative on a circle.

Eigenvectors of the Discrete Laplacian

The eigenvectors of $L$ are the complex exponentials

$$v_k[n] = \frac{1}{\sqrt{N}}\, e^{2\pi i k n / N}, \qquad k = 0, 1, \dots, N-1.$$

These form the DFT basis. Their corresponding eigenvalues are

$$\lambda_k = 2 - 2\cos\!\left(\frac{2\pi k}{N}\right) = 4\sin^2\!\left(\frac{\pi k}{N}\right).$$

Thus the discrete Laplacian admits the decomposition $L = F^{*} \Lambda F$, where $F$ is the DFT matrix and $\Lambda = \mathrm{diag}(\lambda_0, \dots, \lambda_{N-1})$.

Fourier Transform in Matrix Form

Define the DFT matrix

$$F_{kn} = \frac{1}{\sqrt{N}}\, e^{-2\pi i k n / N}, \qquad k, n = 0, 1, \dots, N-1.$$

The discrete Fourier transform of $x$ is the unitary matrix–vector product $\hat{x} = F x$, and the inverse transform is $x = F^{*} \hat{x}$.

Interpretation

The classical Fourier transform is therefore the spectral decomposition of the discrete Laplacian on a 1-D grid. Its eigenvectors (complex exponentials) play the role of “frequencies,” and its eigenvalues correspond to squared frequencies: $\lambda_k = 4\sin^2\!\left(\frac{\pi k}{N}\right) \approx \left(\frac{2\pi k}{N}\right)^2$ for small $k$.
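To make this concrete, here is a small NumPy sketch (not from the linked repo) that builds the circulant cycle Laplacian, diagonalises it with the DFT matrix, and checks that the eigenvalues are exactly $4\sin^2(\pi k / N)$.

```python
import numpy as np

N = 16
# Circulant Laplacian of a cycle: 2 on the diagonal, -1 on the (wrapped) off-diagonals.
L = 2 * np.eye(N) - np.roll(np.eye(N), 1, axis=0) - np.roll(np.eye(N), -1, axis=0)

# DFT basis: column k holds the complex exponential exp(2*pi*i*k*n/N) / sqrt(N).
n = np.arange(N)
F = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

# In this basis L becomes diagonal, with eigenvalues 4*sin^2(pi*k/N).
Lambda = F.conj().T @ L @ F
print(np.allclose(Lambda, np.diag(np.diag(Lambda))))                       # True: diagonal
print(np.allclose(np.diag(Lambda).real, 4 * np.sin(np.pi * n / N) ** 2))   # True: eigenvalues
```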

So what the heck was the convolution?

Convolution is a local, weighted-sum operation over neighbouring inputs. On a 1-D signal, you slide a window over the signal and take a weighted sum of the samples inside it.

However, by moving to the spectral domain using the graph Fourier transform, convolution reduces to a simple multiplication: $x * g = U\big((U^{\top} g) \odot (U^{\top} x)\big)$, where $U$ is the matrix of eigenvectors of the graph Laplacian and $x$ is the signal on the nodes.

This is crucial because it allows us to avoid explicitly defining a complicated convolution operator. Instead, we can learn filters in the spectral domain that act directly on the eigencomponents of the signal, greatly simplifying the operation while retaining expressive power.

On a graph, performing such a convolution directly is highly nontrivial because the neighbourhoods are irregular. But what if we could mathematically transform the graph to another domain where the operation is a simple multiplication?
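As a small illustration (a sketch on a random undirected graph, not code from the repo), the whole pipeline is: transform both signals with $U^{\top}$, multiply elementwise, and transform back with $U$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 30
A = (rng.random((N, N)) < 0.15).astype(float)
A = np.triu(A, 1)
A = A + A.T                       # random symmetric adjacency, no self-loops
L = np.diag(A.sum(axis=1)) - A    # combinatorial Laplacian

_, U = np.linalg.eigh(L)          # graph Fourier basis

x = rng.random(N)                 # a signal on the nodes
g = rng.random(N)                 # a "filter" signal on the nodes

# Graph convolution: elementwise multiplication of GFT coefficients, then inverse GFT.
y = U @ ((U.T @ g) * (U.T @ x))
```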

General Recipe For Transforms

Diagonalizing an operator of interest is all a transform really does. Thus, the general recipe for a transform is:

  • Choose an operator that captures the structure of your data

  • Compute its eigenvectors (under suitable conditions these form a basis)

  • Assemble them into a matrix

  • Project your data onto this basis

Computational Issues

In many cases, an operation becomes substantially cheaper once we move to an appropriate transform domain. Suppose an operator $A$ acting on data $x$ admits the decomposition $A = U \Lambda U^{-1}$, where $U$ contains the eigenvectors of $A$ and $\Lambda$ is diagonal. Then applying $A$ to $x$ can be written as

$$Ax = U \Lambda U^{-1} x.$$

This is advantageous because:

  • Multiplication by the diagonal matrix $\Lambda$ reduces to simple elementwise scaling.

  • Both $U$ and $U^{-1}$ correspond to structured transforms (see my post on the computational benefits of low-rank factorizations), which can often be carried out efficiently.

However, these gains come with an important caveat: computing the eigen-decomposition itself is expensive. For both dense and sparse matrices, a full eigen-decomposition typically costs $O(n^3)$. If the decomposition is computed once and reused, the transform offers real computational savings. But if the eigenvectors must be recomputed repeatedly, the cost of the decomposition can outweigh the benefits of faster multiplication in the transform domain.
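A tiny sketch of the trade-off: pay the $O(n^3)$ eigendecomposition once, and repeated applications of the operator reduce to elementwise scalings of the spectral coefficients (a symmetric random matrix stands in for the operator here).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                     # symmetric, so an orthonormal eigenbasis exists
lam, U = np.linalg.eigh(A)            # expensive O(n^3) step, done once
x = rng.standard_normal(n)

# Applying A five times directly vs. scaling the spectral coefficients by lam**5.
direct = np.linalg.matrix_power(A, 5) @ x
spectral = U @ (lam ** 5 * (U.T @ x))
print(np.allclose(direct, spectral))  # True
```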

Graph Fourier Transform

Using the general formulation of the transform, we can get a sense of what we need in order to create a recipe for a transform on a graph. As it turns out, we can define a Laplacian operator for the graph as well! And once we have that, we can use the general recipe for a transform and get to work.

The Laplacian

Take an undirected weighted graph $G = (V, E, W)$ with $N$ nodes. The normalised Laplacian is defined as:

$$L = I - D^{-1/2} A D^{-1/2},$$

where $A$ is the adjacency matrix and $D$ the degree matrix. This is a good choice of operator because it is symmetric and positive semi-definite, and its eigenvalues lie in $[0, 2]$, so the spectrum is comparable across graphs with very different sizes and degree distributions.

In our general framework of transforms, you could conceivably use any linear operator and transform it. What is important is that the operator means something in your use case. The Laplacian has a meaning (from the classical case above). There are two other operators you could think of using

  • The adjacency matrix - perfectly okay to use. But what would the eigenvalues and eigenvectors mean? (The matrix is also not PSD, which is important, but we won't go into that here.)

  • Degree matrix - this is already a diagonal matrix, so the eigendecomposition is trivial, i.e. $D = I\, D\, I^{-1}$. The transform would just be the identity, $\hat{x} = I^{\top} x = x$.

Two key facts:

  1. Laplacian eigenvectors are the “graph sinusoids” - They generalize the sine waves used in classical Fourier analysis.

  2. Laplacian eigenvalues represent graph frequencies - Small eigenvalues correspond to smooth variation across the graph; large eigenvalues correspond to high-frequency, rapidly changing signals across edges.

Connection to the 1D case:

The Laplacian of a cycle graph is identical to the 1-D discrete Laplacian with periodic boundary conditions derived above.
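This is easy to verify numerically; the sketch below assumes networkx is available and compares the cycle-graph Laplacian with the circulant matrix from the 1-D section.

```python
import numpy as np
import networkx as nx

N = 8
L_graph = nx.laplacian_matrix(nx.cycle_graph(N)).toarray()
L_grid = 2 * np.eye(N) - np.roll(np.eye(N), 1, axis=0) - np.roll(np.eye(N), -1, axis=0)
print(np.array_equal(L_graph, L_grid))   # True: same matrix
```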

In the graph setting, the vector $x$ is not part of the graph's structure but rather a signal defined on its vertices. Formally, it is a function $x : V \to \mathbb{R}$ assigning a real value to each node. Examples include the temperature at each location in a sensor network, the concentration of a diffusing substance, or any node-level feature such as degree, label, or an embedding. In all cases, the graph provides the geometric structure, while $x$ provides the data living on top of it.

The Graph Fourier Transform (GFT)

Given the eigendecomposition of the Laplacian:

$$L = U \Lambda U^{\top},$$

we can write the matrices in fully expanded form as

$$U = \begin{pmatrix} | & | & & | \\ u_1 & u_2 & \cdots & u_N \\ | & | & & | \end{pmatrix}, \qquad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_N).$$

Therefore,

$$L = \sum_{k=1}^{N} \lambda_k\, u_k u_k^{\top}.$$

Equivalently,

$$L u_k = \lambda_k u_k \quad \text{for each } k.$$

Each column $u_k$ is an eigenvector of $L$, and its entries give the value of the $k$-th graph frequency mode at every node of the graph.

The Graph Fourier Transform (GFT) of a graph signal $x$ is:

$$\hat{x} = U^{\top} x,$$

and the inverse transform is:

$$x = U \hat{x}.$$

Interpretation:

  • $x$ is an item signal (e.g., a rating vector, an embedding dimension, or item popularity).

  • $U$ is the graph Fourier basis (the eigenvectors of the Laplacian).

  • $\hat{x} = U^{\top} x$ decomposes the signal into frequencies over the item graph (a small numerical sketch follows).
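A minimal sketch (again on a random graph, purely for illustration) of what the GFT buys us: split a node signal into its smooth and oscillatory parts, and confirm the inverse transform reconstructs it exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 40
A = (rng.random((N, N)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)            # eigenvalues in ascending order

x = rng.random(N)
x_hat = U.T @ x                       # forward GFT
k = 10
x_low = U[:, :k] @ x_hat[:k]          # keep only the k smoothest modes (low-pass)
x_high = U[:, k:] @ x_hat[k:]         # the remaining high-frequency content
print(np.allclose(x, x_low + x_high)) # True: inverse GFT reconstructs the signal
```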

One-Layer Spectral GCN

Now that we understand the Graph Fourier Transform (GFT), we can place it in the context of learning on graphs. Recall the eigendecomposition of the (combinatorial or normalized) graph Laplacian: $L = U \Lambda U^{\top}$, where $U$ contains the eigenvectors and $\Lambda$ contains the corresponding eigenvalues. Since the columns of $U$ form the graph Fourier basis, the GFT of a signal $x$ is simply $\hat{x} = U^{\top} x$, and the inverse GFT is $x = U \hat{x}$.

The key observation behind spectral graph neural networks is that any linear, shift-invariant operator on the graph must commute with $L$, and hence can be written as a function of $L$. In the spectral domain this means:

$$g(L) = U\, g(\Lambda)\, U^{\top},$$

where

$$g(\Lambda) = \mathrm{diag}\big(g(\lambda_1), g(\lambda_2), \dots, g(\lambda_N)\big)$$

is a diagonal matrix whose entries are the spectral response $g(\lambda_k)$. This is the exact analogue of designing filters in classical Fourier analysis: multiplication by a diagonal spectral filter.

Applying this filter to a graph signal $x$ gives

$$y = U\, g(\Lambda)\, U^{\top} x,$$

which mirrors the familiar “transform–scale–inverse transform’’ pipeline.

A useful intuition comes from the spectral perspective: if we apply the trivial spectral filter $g(\Lambda) = I$, i.e., leave all spectral coefficients unchanged, then $y = U U^{\top} x = x$. In other words, doing nothing in the spectral domain reproduces the original signal exactly. The graph Fourier transform framework therefore generalises the idea of filtering: by modifying $g(\Lambda)$, we can amplify, attenuate, or smooth different frequency components of $x$.

This structure leads directly to the formulation of a one-layer spectral GCN. Suppose we have input features $X$ on the nodes and we want to learn output features. For each output channel, we learn a spectral filter $g_{\theta}(\Lambda)$ parameterised by a set of trainable weights $\theta$. The spectral GCN layer becomes:

$$Y = \sigma\!\big(U\, g_{\theta}(\Lambda)\, U^{\top} X\big),$$

where $Y$ is the output feature matrix.

In other words:

  • $U^{\top}$ transforms node features into the spectral domain (i.e., the GFT applied column-wise),

  • $g_{\theta}(\Lambda)$ performs learned, elementwise spectral filtering,

  • $U$ transforms the filtered signals back to the vertex domain.

It is always good to have a clear understanding of the exact matrix or vector that we need to "learn" so that we can represent it in PyTorch exactly! We start with the Laplacian eigendecomposition

$$L = U \Lambda U^{\top}.$$

To construct a spectral filter we introduce a learnable vector,

$$\theta = (\theta_1, \theta_2, \dots, \theta_N)^{\top}, \qquad g_{\theta}(\Lambda) = \mathrm{diag}(\theta).$$

Thus,

$$y = U\, \mathrm{diag}(\theta)\, U^{\top} x.$$

This makes it clear that each frequency component is scaled independently:

$$\hat{y}_k = \theta_k\, \hat{x}_k,$$

and the operation modifies the contribution of each eigencomponent individually before transforming the signal back to the graph domain. Additionally, it might be worthwhile to squash the learned filter values so that each frequency component is scaled by a number between 0 and 1. We can do this by introducing an activation function such as a sigmoid.

This is the original “spectral GCN’’ formulation of Bruna et al., and it explicitly relies on the GFT. Later work (e.g. Kipf & Welling) replaces $g_{\theta}(\Lambda)$ with a polynomial approximation in $\Lambda$ to avoid the eigen-decomposition, but the conceptual core remains the same: GCNs perform convolution by filtering in the GFT domain.
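Here is a minimal PyTorch sketch of such a layer: one learnable weight per eigenmode, applied as $U\,\mathrm{diag}(\sigma(\theta))\,U^{\top}x$. The class and variable names are illustrative, not taken from the repo.

```python
import torch
import torch.nn as nn

class SpectralFilter(nn.Module):
    """One-layer Bruna-style spectral filter on a fixed graph Fourier basis U."""

    def __init__(self, U: torch.Tensor):
        super().__init__()
        self.register_buffer("U", U)                        # fixed eigenvector matrix
        self.theta = nn.Parameter(torch.zeros(U.shape[1]))  # one weight per eigenmode

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_hat = self.U.T @ x                           # GFT
        x_hat = torch.sigmoid(self.theta) * x_hat      # learned, squashed spectral scaling
        return self.U @ x_hat                          # inverse GFT back to the nodes

# Usage sketch: the eigendecomposition is computed once, outside the training loop.
# L = torch.as_tensor(laplacian, dtype=torch.float32)
# lam, U = torch.linalg.eigh(L)
# model = SpectralFilter(U)
```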

Application of Spectral GCN: Heat Propagation

Heat Propagation on a uniform torus

In this section, we investigate a simple setting where a Spectral Graph Convolutional Network (GCN) performs surprisingly well: predicting heat diffusion across a toroidal mesh. Although the spectral approach is elegant and effective in the right circumstances, it also highlights several structural limitations inherent to spectral methods.

Graph Model of Heat Propagation

When we zoom into a small patch of the torus and add the connecting edges, the mesh suddenly looks like a familiar graph. This makes the role of the graph Laplacian immediately intuitive.

We zoom in on the hottest point on the mesh and plot it as a graph by explicitly showing edges.

We simulate heat diffusion on the graph using the discrete heat equation:

$$\frac{\partial x}{\partial t} = -L x,$$

where $x$ is the heat at each node and $L$ is the graph Laplacian. Starting from two random vertices with initial heat, we update the heat iteratively using a simple forward Euler scheme:

$$x_{t+1} = x_t - \epsilon\, L x_t,$$

storing the state at each timestep to visualize how heat spreads across the mesh. Low-frequency modes of $L$ correspond to smooth, global patterns of heat, while high-frequency modes produce rapid, local variations.
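A sketch of this simulation loop (the mesh Laplacian `L` is assumed to have been assembled elsewhere as a dense NumPy array):

```python
import numpy as np

def simulate_heat(L: np.ndarray, x0: np.ndarray, eps: float = 0.1, steps: int = 100) -> np.ndarray:
    """Forward Euler: iterate x_{t+1} = x_t - eps * L @ x_t and store every state."""
    states = [x0]
    x = x0
    for _ in range(steps):
        x = x - eps * (L @ x)
        states.append(x)
    return np.stack(states)

# Example: heat injected at two random vertices, then diffused across the mesh.
# x0 = np.zeros(L.shape[0])
# x0[np.random.choice(L.shape[0], size=2, replace=False)] = 1.0
# history = simulate_heat(L, x0)
```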

Graph Fourier Transform of Heat Propagation

In order to get intuition for how the Fourier transform behaves on a graph, consider the distribution of heat on the graph surface.

  • The heat on the graph is represented by a real number for each node (temperature or heat energy in joules), so the signal is a vector $x \in \mathbb{R}^{N}$, where $N$ is the number of nodes.

  • If there are $N$ nodes in the graph, the (combinatorial or normalized) Laplacian is an $N \times N$ matrix $L$.

We use the eigendecomposition of the Laplacian to move between the vertex domain and the spectral (frequency) domain: $L = U \Lambda U^{\top}$, with the eigenvalues ordered $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_N$. The graph Fourier transform (GFT) and inverse GFT are $\hat{x} = U^{\top} x$ and $x = U \hat{x}$.

To visualise single-frequency modes we simply pick individual eigenvectors $u_k$ and set $x = u_k$, where a natural choice is $k = 2$ (the first nontrivial eigenvector) and $k = N$ (one of the largest-eigenvalue modes). Each vector assigns one scalar value to every vertex; plotting those values on the torus surface gives the heat-colour visualisation.

Practical steps used to create the figure

  1. Build a uniform torus mesh and assemble the adjacency matrix $A$ and Laplacian $L$.

  2. Compute the eigendecomposition $L = U \Lambda U^{\top}$ (for small / moderate meshes) or compute a selection of eigenpairs (Lanczos) for large meshes (see the sketch after this list).

  3. Select a low-frequency eigenvector $u_{\text{low}}$ and a high-frequency eigenvector $u_{\text{high}}$.

  4. [Optional; not done here; to show smaller values in absolute terms] Normalize each eigenvector for display, e.g. $u_k \leftarrow u_k / \|u_k\|_{\infty}$, so colours are comparable across panels.

  5. Render the torus surface, colour each vertex $i$ by the value $u_k[i]$ using a diverging colormap (e.g. heat), and add a colourbar showing the mapping from value to colour.
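A sketch of steps 2 to 4 using SciPy's Lanczos solver; a small random graph stands in for the torus-mesh Laplacian so the snippet runs on its own.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigsh

# Placeholder Laplacian: a random graph stands in for the torus mesh from step 1.
rng = np.random.default_rng(0)
A = (rng.random((200, 200)) < 0.05).astype(float)
A = np.triu(A, 1)
A = A + A.T
L_sparse = csr_matrix(np.diag(A.sum(axis=1)) - A)

lam_low, U_low = eigsh(L_sparse, k=4, which="SM")    # a few smallest eigenvalues
lam_high, U_high = eigsh(L_sparse, k=4, which="LM")  # a few largest eigenvalues

u_low = U_low[:, np.argsort(lam_low)[1]]             # first nontrivial eigenvector
u_high = U_high[:, np.argmax(lam_high)]              # highest-frequency mode
u_low = u_low / np.abs(u_low).max()                  # optional display normalisation
```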

Visualizing the Graph Fourier Transform of heat on the torus

Interpreting the GFT on the torus

  • Low-frequency mode. The plotted heat corresponds to $x = u_2$ (small eigenvalue $\lambda_2$). The signal varies smoothly over the torus: neighbouring vertices have similar values, representing broad, global patterns of heat.

  • High-frequency mode. The plotted heat corresponds to $x = u_N$ (large eigenvalue $\lambda_N$). The signal alternates rapidly across nearby vertices, producing fine-scale oscillations around the torus that represent high-frequency, localised variations.

Spectral intuition

Recall, we expressed discrete heat propagation on a graph as

$$x_{t+1} = x_t - \epsilon\, L x_t = (I - \epsilon L)\, x_t,$$

where $L$ is the graph Laplacian and $\epsilon$ is a small step size.

Using the eigendecomposition of $L$,

$$L = U \Lambda U^{\top},$$

we can rewrite the propagation as

$$x_{t+1} = U (I - \epsilon \Lambda) U^{\top} x_t.$$

Comparing with the spectral graph filtering form,

$$y = U\, g(\Lambda)\, U^{\top} x,$$

we can identify the corresponding filter as

$$g(\lambda_k) = 1 - \epsilon \lambda_k.$$

Applying a spectral filter to a heat signal acts by scaling each mode:

$$\hat{y}_k = g(\lambda_k)\, \hat{x}_k,$$

so a low-pass filter suppresses the high-frequency panel patterns and produces smoother heat distributions, while a high-pass filter accentuates the oscillatory features visible in the high-frequency panel.
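A quick numerical check of this identity (a random graph standing in for the mesh): one Euler step $(I - \epsilon L)x$ matches filtering with $g(\lambda_k) = 1 - \epsilon\lambda_k$ in the spectral domain.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
A = (rng.random((N, N)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

x = rng.random(N)
eps = 0.05
euler_step = x - eps * (L @ x)                      # one step of heat diffusion
spectral_step = U @ ((1 - eps * lam) * (U.T @ x))   # the filter g(lambda) = 1 - eps*lambda
print(np.allclose(euler_step, spectral_step))       # True
```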

Neural Network To Learn

We can write a spectral graph convolution / filter with learnable parameters as

$$\hat{y} = U\, \mathrm{diag}(w)\, U^{\top} x,$$

where $U$ is the eigenvector matrix of the Laplacian, $\Lambda$ is the diagonal eigenvalue matrix, and $\mathrm{diag}(w)$ is a diagonal matrix of learnable weights $w_1, \dots, w_N$ acting on each eigenmode.

Fully expanding the diagonal $\mathrm{diag}(w)$:

$$\mathrm{diag}(w) = \begin{pmatrix} w_1 & & \\ & \ddots & \\ & & w_N \end{pmatrix},$$

and writing the Laplacian eigenvectors as column vectors $u_1, \dots, u_N$, we have

$$\hat{y} = \sum_{k=1}^{N} w_k\, (u_k^{\top} x)\, u_k.$$

This makes it explicit that each column vector $u_k$ (the $k$-th eigenvector) is scaled by the learnable weight $w_k$ in the spectral domain, and then transformed back to the original node space via $U$ to produce the predicted signal $\hat{y}$.

Why Use A Neural Network?

Two motivating examples illustrate the practical usefulness of such a model:

  • Partial Observations from Sensors
    In many real-world systems, heat or pressure sensors are only available at a small subset of points. We train the Spectral GCN using only these sparse observations, yet the learned model reconstructs and predicts the heat field across all vertices on the mesh. This effectively transforms a sparse set of measurements into a full-field prediction.

  • Generalization to a New Geometry
    One might hope that a model trained on one torus could be applied to a slightly different torus. Unfortunately, this is generally not possible in the GCN setting. The eigenvectors of the Laplacian form the coordinate system in which the model operates, and even small geometric changes produce different Laplacian spectra. As a result, the learned spectral filters are not transferable across meshes. This is a fundamental drawback of spectral GCNs. However, we shall see that the GCN framework inspires frameworks that do not suffer from this drawback.

Stability Issues And Normalization

While the Spectral GCN learns the qualitative behaviour of heat diffusion, raw training often leads to unstable predictions. After several steps, the overall temperature of the mesh may drift upward or downward, even though heat diffusion is energy-conserving. This is because the neural network makes predictions locally without obeying physical laws such as the conservation of energy, which is why our predictions are on average “hotter” than the actual values.

Two practical fixes alleviate this:

  • Eigenvalue Normalization. Applying a sigmoid or similar squashing function to the learned spectral filter ensures that each frequency component is damped in a physically plausible range. This prevents the model from amplifying high-frequency modes, which would otherwise cause heat values to explode.

  • Energy Conservation. After each predicted step, the total heat can be renormalized to match the physical energy of the system. This ensures that although the shape of the prediction is learned by the model, the magnitude remains consistent with diffusion dynamics. Empirically, this correction dramatically improves long-horizon stability (see the sketch after this list).
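A rollout sketch combining both fixes; `model` is the SpectralFilter sketch from earlier (which already squashes its weights with a sigmoid), and the renormalisation step below is an assumption about how one might enforce conservation, not code from the repo.

```python
import torch

def stable_rollout(model, x0: torch.Tensor, steps: int = 50) -> torch.Tensor:
    """Roll the learned filter forward while keeping total heat constant."""
    states, x = [x0], x0
    total_heat = x0.sum()                    # conserved quantity under pure diffusion
    for _ in range(steps):
        x = model(x)                         # learned spectral step (sigmoid-squashed filter)
        x = x * (total_heat / x.sum())       # renormalise so total heat is conserved
        states.append(x)
    return torch.stack(states)
```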

Overall, the Spectral GCN provides a compact and interpretable model for heat propagation on a fixed mesh and performs remarkably well given its simplicity. However, its reliance on the Laplacian eigenbasis also limits its ability to generalize across geometries, motivating the need for more flexible spatial or message-passing approaches in applications where the underlying mesh may change.

Cold Start: Recommender Systems

What does spectral graph theory have to do with recommender systems? Once we view user–item behaviour as a graph, the connection becomes natural. In the spectral domain, low-frequency Laplacian eigenvectors capture broad, mainstream purchasing patterns, while high-frequency components represent niche tastes and micro-segments. Matrix Factorisation (MF) implicitly applies a low-pass filter: embeddings vary smoothly across the item–item graph, meaning MF emphasises low-frequency structure. But MF breaks down for cold-start items because an isolated item contributes no collaborative signal.

In contrast, a spectral GCN applies a learned filter $g_{\theta}(\Lambda)$, so the balance between smooth, mainstream structure and high-frequency, niche structure is learned from the data rather than fixed in advance.

In general, we represent user–item interactions as a bipartite graph, i.e. edges do not exist between products. In this scenario, even the GCN cannot help a completely cold item: for a node to receive any signal, it must be connected to at least one other node. However, the graph formulation provides a very intuitive way to fix this issue: simply add edges between products that are similar to each other. Then low-frequency patterns are bound to filter into the node, even if high-frequency niche patterns will not.

Matrix factorization resolves this issue by using side information (such as product attributes), which asserts similarity from external data. In my previous post I argued that you can achieve something similar through an intuitive edge-addition approach—even though it amounts to inserting 1’s into a fairly unintuitive matrix and factorizing it.
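A sketch of that edge-addition idea (the `item_features` matrix of content attributes and the choice of `k` are assumptions for illustration): connect each item to its most similar items so that low-frequency collaborative signal can reach cold-start nodes.

```python
import numpy as np

def add_similarity_edges(A: np.ndarray, item_features: np.ndarray, k: int = 5) -> np.ndarray:
    """Augment an item-item adjacency A with kNN edges based on content similarity."""
    F = item_features / np.linalg.norm(item_features, axis=1, keepdims=True)
    sim = F @ F.T                               # cosine similarity between items
    np.fill_diagonal(sim, -np.inf)              # no self-edges
    A_aug = A.copy()
    for i in range(A.shape[0]):
        for j in np.argsort(sim[i])[-k:]:       # the k most similar items to item i
            A_aug[i, j] = A_aug[j, i] = 1.0     # add a symmetric edge
    return A_aug
```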

Conclusion

In this post, we’ve journeyed from classical Fourier transforms to the spectral domain of graphs, uncovering how eigenvectors of the graph Laplacian act as the “frequencies” of a network. We saw how spectral graph convolutional networks can learn filters in this domain, elegantly predicting heat diffusion on a toroidal mesh. Along the way, we connected these ideas to recommender systems, showing how spectral methods and graph propagation provide a principled way to tackle the cold-start problem by letting information flow from similar or popular items.

While spectral GCNs shine on fixed graphs and structured problems, they also come with caveats: eigen-decompositions can be expensive, and filters are not always transferable across different geometries. Nevertheless, the framework provides intuition and a foundation for more flexible spatial or message-passing approaches.

So, whether you’re modeling heat flowing across a mesh or figuring out what obscure sock a new customer might want next, spectral graph theory shows that Fourier Transforms can take you a long way.

In my next post, I will deal with the two main issues of the GCN:

  • Adding a new node / transferring information to a similar graph
  • Avoiding the computation of the eigenvalues of the graph Laplacian
