Hub Analysis for Obsidian
The conception of this feature can be attributed to my late-night Wikipedia bingeing (this chain of articles being relatively short: from AWS, to Google Cloud Platform, to Google Search, and finally to PageRank). I saw a graph, some math, and a handy Python implementation, so I figured I should take a shot at applying it.
The original Path Analysis finds connections between two specific notes. The shiny new Hub Analysis examines your entire vault to understand its global structure.
In a knowledge management context, Path Analysis helps you understand specific relationships – how your note on productivity connects to your thoughts on creativity. Hub Analysis, on the other hand, reveals the structural backbone of your entire thinking system – which notes serve as the conceptual anchors that hold your knowledge network together.
PageRank: Steal our data, we take your (defunct) algorithm!
At the heart of Hub Analysis is PageRank, the same algorithm that powered Google's early search dominance. Much like the internet, your Obsidian vault is essentially a graph composed of many nodes (notes) linking to each other—unless you forgo that feature for some reason, in which case, this algorithm is entirely useless to you.
A note is considered important if many important notes link to it. This creates a feedback loop where importance flows through your knowledge network like water finding its natural channels.
When you link from one note to another, you're essentially "voting" for the importance of the destination note. But not all votes are equal – a vote from a highly important note (one that receives many votes itself) counts more than a vote from a less important note.
For the mathematically curious, I've included links to the formal PageRank equations (thanks again, Wikipedia!) at the end of this post. But the implementation is surprisingly straightforward:
for (const [note, links] of Object.entries(graph)) {
  const outgoingLinks = links.size;
  if (outgoingLinks > 0) {
    // Distribute importance to linked notes
    const contributionPerLink = prevRank[note] / outgoingLinks;
    for (const linkedNote of links.keys()) {
      rank[linkedNote] += dampingFactor * contributionPerLink;
    }
  } else {
    // For notes with no outgoing links, distribute evenly
    const contribution = prevRank[note] / totalNotes;
    for (const otherNote of allNotes) {
      rank[otherNote] += dampingFactor * contribution;
    }
  }
}
The algorithm repeatedly:
- Takes each note's current importance score
- Distributes that importance to the notes it links to
- Repeats until the scores stabilize
My interpretation of the dampingFactor in knowledge management: it represents the balance between structured browsing (following explicit links) and random exploration (jumping to unrelated notes). This analogy isn't perfect, but in practical terms, it ensures that even isolated notes, the ones rotting away in the outskirts of your graph, get some minimal importance. Ideally, this prevents "importance sinkholes" in your graph.
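To put the fragment above into context, here's a minimal sketch of what the full iteration loop might look like, including the `(1 - dampingFactor) / totalNotes` "random jump" share that every note receives each round. The function signature, convergence threshold, and iteration cap are illustrative choices for this sketch, not the plugin's exact values:

```javascript
// Sketch: iterate the rank-distribution step until scores stabilize.
// `graph` maps each note name to a Map of its outgoing links.
function pageRank(graph, dampingFactor = 0.85, maxIterations = 100, tolerance = 1e-6) {
  const allNotes = Object.keys(graph);
  const totalNotes = allNotes.length;
  let rank = Object.fromEntries(allNotes.map(n => [n, 1 / totalNotes]));

  for (let i = 0; i < maxIterations; i++) {
    const prevRank = rank;
    // Every note starts each round with the "random jump" share
    rank = Object.fromEntries(allNotes.map(n => [n, (1 - dampingFactor) / totalNotes]));

    for (const [note, links] of Object.entries(graph)) {
      if (links.size > 0) {
        const contributionPerLink = prevRank[note] / links.size;
        for (const linkedNote of links.keys()) {
          rank[linkedNote] += dampingFactor * contributionPerLink;
        }
      } else {
        // Dangling note: spread its rank evenly across the vault
        const contribution = prevRank[note] / totalNotes;
        for (const otherNote of allNotes) {
          rank[otherNote] += dampingFactor * contribution;
        }
      }
    }

    // Stop once the scores stop moving between rounds
    const delta = allNotes.reduce((s, n) => s + Math.abs(rank[n] - prevRank[n]), 0);
    if (delta < tolerance) break;
  }
  return rank;
}
```

On a toy vault where A links to B and C, B links to C, and C links back to A, this ranks C highest (it receives two links, one from the well-linked A) and B lowest, and the scores still sum to 1.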
Hub Analysis reveals implicit patterns in your knowledge that may not be obvious through conscious reflection. The algorithm doesn't care about your intentions or which notes you think are important – it may not even care about your feelings – it objectively measures structural importance based solely on the connection patterns you've created over time.
Some other metrics. Do we have a metric count metric?
PageRank is the star of this update, but I implemented several complementary metrics that highlight different aspects of note importance:
Degree Centrality: The notes everyone wants to be friends with
Degree Centrality is the most intuitive measure – it simply counts connections:
- In-Degree: How many notes link TO this note (incoming links)
- Out-Degree: How many notes this note links TO (outgoing links)
- Total Degree: The sum of incoming and outgoing links
A high in-degree suggests that a note serves as a main reference point in your vault. A high out-degree is more like a "directory": it's a jumping-off point.
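As a rough sketch, all three degree counts can be read off the same note-to-Map-of-links graph shape used in the PageRank snippet (the function name here is mine, not the plugin's):

```javascript
// Sketch: count in-, out-, and total degree for every note.
// `graph` maps each note name to a Map of its outgoing links.
function degreeCentrality(graph) {
  const degrees = {};
  for (const note of Object.keys(graph)) {
    degrees[note] = { inDegree: 0, outDegree: 0, totalDegree: 0 };
  }
  for (const [note, links] of Object.entries(graph)) {
    degrees[note].outDegree = links.size;
    for (const linkedNote of links.keys()) {
      // Guard against links pointing outside the graph
      if (degrees[linkedNote]) degrees[linkedNote].inDegree += 1;
    }
  }
  for (const note of Object.keys(graph)) {
    const d = degrees[note];
    d.totalDegree = d.inDegree + d.outDegree;
  }
  return degrees;
}
```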
Eigenvector Centrality: The quality of your friends (?) (I have no idea how to continue this metaphor)
While PageRank considers the entire global structure, Eigenvector Centrality focuses more on your note's local neighborhood quality. It measures how connected a note is to other well-connected notes.
In knowledge management terms, this identifies notes that might not have many connections overall but are linked to other highly central notes. These are often the "crucial supporting concepts" that form the immediate conceptual neighborhood around your major hub notes.
The implementation uses an iterative approach that progressively identifies notes that are connected to other important notes:
// Update based on neighbors
for (const [note, links] of Object.entries(graph)) {
  for (const linkedNote of links.keys()) {
    nextCentrality[linkedNote] += centrality[note];
  }
}
A note with high eigenvector centrality might only have a few connections, but those connections are to other highly connected notes. These are often the "bridge concepts" that tie together major themes in your knowledge base.
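For a fuller picture, here's a sketch of how the update step above might sit inside the iterative loop, with a normalization step that keeps scores from growing without bound. The iteration count and the choice of L2 normalization are illustrative, not necessarily what the plugin uses:

```javascript
// Sketch: power-iteration-style eigenvector centrality.
// `graph` maps each note name to a Map of its outgoing links.
function eigenvectorCentrality(graph, iterations = 50) {
  const allNotes = Object.keys(graph);
  let centrality = Object.fromEntries(allNotes.map(n => [n, 1]));

  for (let i = 0; i < iterations; i++) {
    const nextCentrality = Object.fromEntries(allNotes.map(n => [n, 0]));
    // Update based on neighbors: each note passes its score along its links
    for (const [note, links] of Object.entries(graph)) {
      for (const linkedNote of links.keys()) {
        nextCentrality[linkedNote] += centrality[note];
      }
    }
    // Normalize so scores stay comparable between iterations
    const norm = Math.sqrt(allNotes.reduce((s, n) => s + nextCentrality[n] ** 2, 0)) || 1;
    for (const n of allNotes) nextCentrality[n] /= norm;
    centrality = nextCentrality;
  }
  return centrality;
}
```

On a small symmetric graph, the notes linked to the best-connected notes end up with the highest scores, as expected.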
Bridging Coefficient: The middlemen
The Bridging Coefficient identifies notes that connect otherwise separate parts of your vault:
// Calculate clustering coefficient (how connected the neighbors are)
const clustering = possibleConnections > 0 ?
  neighborConnections / possibleConnections : 0;

// Bridging coefficient is inversely related to clustering
bridgingCoef[note] = neighbors.length * (1 - clustering);
This metric finds notes whose neighbors aren't well-connected to each other. In knowledge management, these are the powerful cross-domain concepts that bridge different areas of your thinking.
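To show how the fragment above might be wired up end to end, here's a sketch that treats links as undirected neighbors and counts the edges among each note's neighborhood. Function and variable names are illustrative, not the plugin's exact internals:

```javascript
// Sketch: bridging coefficient for every note, treating links as undirected.
// `graph` maps each note name to a Map of its outgoing links.
function bridgingCoefficient(graph) {
  // Build an undirected neighbor set for each note
  const neighborsOf = {};
  for (const note of Object.keys(graph)) neighborsOf[note] = new Set();
  for (const [note, links] of Object.entries(graph)) {
    for (const linkedNote of links.keys()) {
      neighborsOf[note].add(linkedNote);
      if (neighborsOf[linkedNote]) neighborsOf[linkedNote].add(note);
    }
  }

  const bridgingCoef = {};
  for (const note of Object.keys(graph)) {
    const neighbors = [...neighborsOf[note]];
    // Count edges among the neighbors themselves
    let neighborConnections = 0;
    for (let i = 0; i < neighbors.length; i++) {
      for (let j = i + 1; j < neighbors.length; j++) {
        if (neighborsOf[neighbors[i]].has(neighbors[j])) neighborConnections++;
      }
    }
    const possibleConnections = (neighbors.length * (neighbors.length - 1)) / 2;
    const clustering = possibleConnections > 0 ?
      neighborConnections / possibleConnections : 0;
    // Poorly interconnected neighborhoods => high bridging score
    bridgingCoef[note] = neighbors.length * (1 - clustering);
  }
  return bridgingCoef;
}
```

On two triangles joined only through a middleman note X, X scores high (its two neighbors don't know each other), while a note buried inside a triangle scores zero.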
You're going to kill my PC!
Wait! Your PC should be fine (though just in case, maybe I should add a disclaimer "installing this plugin has a nonzero chance of causing your system to ignite"—it's never fully zero, is it?).
- The graph structure is cached and updated only when files change – same as Path Analysis; it actually uses the same internal graph.
- For metrics like the Bridging Coefficient that involve checking connections between many neighbors, we use statistical sampling, again like in Path Analysis:
const MAX_NEIGHBORS_TO_CHECK = 15;
const neighborsToCheck = neighbors.length > MAX_NEIGHBORS_TO_CHECK ?
  neighbors.slice(0, MAX_NEIGHBORS_TO_CHECK) : neighbors;

// Adjust result based on sampling ratio
if (neighbors.length > MAX_NEIGHBORS_TO_CHECK) {
  const samplingRatio = neighborsToCheck.length / neighbors.length;
  connections = Math.round(connections / (samplingRatio * samplingRatio));
}
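The squared ratio is there because connections are counted over *pairs* of neighbors: sampling a fraction r of neighbors sees roughly r² of the pairs. A self-contained sketch of the same idea (the helper name and `hasEdge` callback are mine, for illustration):

```javascript
// Sketch: estimate neighbor-to-neighbor connections from a sample.
// `hasEdge(a, b)` is an assumed callback answering whether a and b are linked.
function estimateNeighborConnections(neighbors, hasEdge) {
  const MAX_NEIGHBORS_TO_CHECK = 15;
  const neighborsToCheck = neighbors.length > MAX_NEIGHBORS_TO_CHECK
    ? neighbors.slice(0, MAX_NEIGHBORS_TO_CHECK)
    : neighbors;

  // Count connections among the sampled neighbors only
  let connections = 0;
  for (let i = 0; i < neighborsToCheck.length; i++) {
    for (let j = i + 1; j < neighborsToCheck.length; j++) {
      if (hasEdge(neighborsToCheck[i], neighborsToCheck[j])) connections++;
    }
  }

  // Scale up by the squared sampling ratio, since pairs scale quadratically
  if (neighbors.length > MAX_NEIGHBORS_TO_CHECK) {
    const samplingRatio = neighborsToCheck.length / neighbors.length;
    connections = Math.round(connections / (samplingRatio * samplingRatio));
  }
  return connections;
}
```

For 30 fully interconnected neighbors, the sample of 15 sees 105 pairs and scales that up to 420 – close to the true 435, at a quarter of the checks.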
So what's the difference between Hub and Path Analysis?
The original Path Analysis and new Hub Analysis serve complementary purposes in your knowledge workflow:
Path Analysis answers: "How are these two specific notes related?"
- Explore connections between specific ideas
- Understand how concepts relate to each other
- Discover unexpected paths between seemingly unrelated notes (very fun)
Hub Detection answers: "What are the most important notes in my entire vault?"
- Uncover the structural backbone of your knowledge
- Identify forgotten but important concept notes
- Find cross-domain bridging concepts that connect different areas
My hope is that combined, they can provide both microscopic and macroscopic views of your knowledge network.
Your new favorite Hub?
If you're interested in trying the Graph Metrics plugin, you can install it via BRAT in Obsidian. I welcome feedback and feature suggestions on the GitHub repository. Otherwise, send me a picture of your dog.
Math Appendix
For those interested in the mathematical details:
- PageRank Mathematical Formulation
- Eigenvector Centrality
- Clustering Coefficient
- Louvain Method (could be interesting for community detection!)