Binary partition hierarchy
|
Binary partition tree of the graph with a user provided cluster distance. |
|
Alias for |
Binary partition tree with complete linkage distance. |
|
|
Binary partition tree with average linkage distance. |
Binary partition tree with exponential linkage distance. |
|
|
Binary partition tree with the Ward linkage rule. |
Binary partition tree according to the Mumford-Shah energy with a constant piecewise model. |
- binary_partition_tree_single_linkage(graph, edge_weights)[source]
Alias for
bpt_canonical()
.Given a graph \(G=(V, E)\), with initial edge weights \(w\), the distance \(d(X,Y)\) between any two clusters \(X\) and \(Y\) is
\[d(X,Y) = \min \{w(\{x,y\}) | x \in X, y \in Y, \{x,y\} \in E \}\]Regions are then iteratively merged following the above distance (closest first) until a single region remains.
- Parameters:
edge_weights – edge weights of the input graph (Concept
CptEdgeWeightedGraph
)graph – input graph (deduced from
CptEdgeWeightedGraph
)
- Returns:
a tree (Concept
CptHierarchy
) and its node altitudes (ConceptCptValuedHierarchy
)
- binary_partition_tree_complete_linkage(graph, edge_weights)[source]
Binary partition tree with complete linkage distance.
Given a graph \(G=(V, E)\), with initial edge weights \(w\), the distance \(d(X,Y)\) between any two clusters \(X\) and \(Y\) is
\[d(X,Y) = \max \{w(\{x,y\}) | x \in X, y \in Y, \{x,y\} \in E \}\]Regions are then iteratively merged following the above distance (closest first) until a single region remains
- Parameters:
graph – input graph
edge_weights – edge weights of the input graph
- Returns:
a tree (Concept
CptHierarchy
) and its node altitudes
- binary_partition_tree_average_linkage(graph, edge_weights, edge_weight_weights=None)[source]
Binary partition tree with average linkage distance.
Given a graph \(G=(V, E)\), with initial edge weights \(w\) with associated weights \(w_2\), the distance \(d(X,Y)\) between any two clusters \(X\) and \(Y\) is
\[d(X,Y) = \frac{1}{Z} \sum_{x \in X, y \in Y, \{x,y\} \in E} w(\{x,y\}) \times w_2(\{x,y\})\]with \(Z = \sum_{x \in X, y \in Y, \{x,y\} \in E} w_2({x,y})\).
- Parameters:
graph – input graph
edge_weights – edge weights of the input graph
edge_weight_weights – weighting of edge weights of the input graph (default to an array of ones)
- Returns:
a tree (Concept
CptHierarchy
) and its node altitudes
- binary_partition_tree_exponential_linkage(graph, edge_weights, alpha, edge_weight_weights=None)[source]
Binary partition tree with exponential linkage distance.
Given a graph \(G=(V, E)\), with initial edge weights \(w\) with associated weights \(w_2\), the distance \(d(X,Y)\) between any two clusters \(X\) and \(Y\) is
\[d(X,Y) = \frac{1}{Z} \sum_{x \in X, y \in Y, \{x,y\} in E} w_2(\{x,y\}) \times \exp(\alpha * w(\{x,y\})) \times w(\{x,y\})\]with \(Z = \sum_{x \in X, y \in Y, \{x,y\} \in E} w_2(\{x,y\}) \times \exp(\alpha * w(\{x,y\}))\).
- See:
Nishant Yadav, Ari Kobren, Nicholas Monath, Andrew Mccallum. Supervised Hierarchical Clustering with Exponential Linkage Proceedings of the 36th International Conference on Machine Learning, PMLR 97:6973-6983, 2019.
- Parameters:
graph – input graph
edge_weights – edge weights of the input graph
alpha – exponential parameter
edge_weight_weights – weighting of edge weights of the input graph (default to an array of ones)
- Returns:
a tree (Concept
CptHierarchy
) and its node altitudes
- binary_partition_tree_ward_linkage(graph, vertex_centroids, vertex_sizes=None, altitude_correction='max')[source]
Binary partition tree with the Ward linkage rule.
Given a graph \(G=(V, E)\), with initial edge weights \(w\) with associated weights \(w'\), the distance \(d(X,Y)\) between any two clusters \(X\) and \(Y\) is
\[d(X,Y) = \frac{| X |\times| Y |}{| X |+| Y |} \| \vec{X} - \vec{Y} \|^2\]where \(\vec{X}\) and \(\vec{Y}\) are the centroids of \(X\) and \(Y\).
Regions are then iteratively merged following the above distance (closest first) until a single region remains
Note that the Ward distance is not necessarily strictly increasing when processing a non complete graph. This can be corrected afterward with an altitude correction strategy. Valid values for
altitude correction
are:"none"
: nothing is done and the altitude of a node is equal to the Ward distance between its 2 children; this may not be non-decreasing"max"
: the altitude of a node \(n\) is defined as the maximum of the the Ward distance associated to each node in the subtree rooted in \(n\).
- Parameters:
graph – input graph
vertex_centroids – Centroids of the graph vertices (must be a 2d array)
vertex_sizes – Size (number of elements) of the graph vertices (default to an array of ones)
altitude_correction – can be
"none"
or"max"
(default)
- Returns:
a tree (Concept
CptHierarchy
) and its node altitudes
- binary_partition_tree_MumfordShah_energy(graph, vertex_values, vertex_area=None, vertex_perimeter=None, edge_length=None, squared_vertex_values=None)[source]
Binary partition tree according to the Mumford-Shah energy with a constant piecewise model.
The distance between two regions is equal to the apparition scale of the merged region.
See:
Laurent Guigues, Jean Pierre Cocquerez, Hervé Le Men. Scale-sets Image Analysis. International Journal of Computer Vision, Springer Verlag, 2006, 68 (3), pp.289-317
- Parameters:
graph – input graph
vertex_values – Sum of values inside each vertex of the input graph (1d array for scalar value or 2d array for vectorial values, e.g. RGB pixel values)
vertex_area – area of the vertices of the input graph (provided by
attribute_vertex_area()
on graph)vertex_perimeter – perimeter of the vertices of the input graph (provided by
attribute_vertex_perimeter()
on graph)edge_length – length of the frontier represented by the edges of the input graph (provided by
attribute_edge_length()
on graph)squared_vertex_values – Sum of squared values inside each vertex of the input graph (1d array for scalar value or 2d array for vectorial values, e.g. RGB pixel values). If this argument is not provided, it will default to vertex_values * vertex_values which is only correct if a vertex contains a single value.
- Returns:
a tree (Concept
CptHierarchy
) and its node altitudes
- binary_partition_tree(graph, weight_function, edge_weights)[source]
Binary partition tree of the graph with a user provided cluster distance.
At each step:
the algorithm finds the edge of smallest weight;
the two vertices linked by this edge are merged: the new vertex is the parent of the two merged vertices and its altitude is equal to the weight of the fusion edge;
the weight of the edges linking the new vertex to the remaining vertices of the graph are updated according to the user provided function (
weight_function
);repeat until a single vertex remains
The initial weight of the edges (
edge_weights
) and the callback (weight_function
) determine the shape of the hierarchy. Note that the altitudes of the constructed hierarchy are not necessarily increasing; if needed this can be enforced as a post processing with the functiontree_monotonic_regression()
.Classical linkage functions are already implemented:
single/min linkage
binary_partition_tree_single_linkage()
complete/max linkage
binary_partition_tree_complete_linkage()
average linkage
binary_partition_tree_average_linkage()
Ward linkage
binary_partition_tree_ward_linkage()
.
Those functions are implemented in c++ and should be faster than a user defined weighting function written in Python.
- Weight function:
The
weight_function
callback can be anything defining the operator()
and should follow the following pattern:def weight_function(graph, # the current state of the graph fusion_edge_index, # the edge between the two vertices being merged new_region, # the new vertex in the graph merged_region1, # the first vertex merged merged_region2, # the second vertex merged new_neighbours): # list of edges to be weighted (see below) ... for n in new_neighbours: ... n.set_new_edge_weight(new_edge_value) # define the weight of this edge }
Each element in the parameter
new_neighbours
represent an edge between the new vertex and another vertex of the graph. For each element of the list, the following methods are available:neighbour_vertex()
: the other vertexnum_edges()
: returns 2 if both the two merged vertices had an edge linking themselves withneighbour_vertex()
and 1 otherwisefirst_edge_index()
: the index of the edge linking one of the merged region toneighbour_vertex()
second_edge_index()
: the index of the edge linking the other merged region toneighbour_vertex()
(only ifnum_edges()==2
)set_new_edge_weight(value)
: weight of the new edge. This has to be defined in the weighting function.new_edge_index()
: the index of the new edge as the weighting function will probably have to track new weight values
- Example:
The following example shows how to define a weighting function for average linkage assuming that:
edge_weights
is an array containing the weight of each edge, andedge_counts
is an array containing the number of edges present between two clusters: initially 1 for each edge but this will increase when clusters are merged.
When a merge happens, there are two possibilities. Consider that we have three clusters \(A\), \(B\), and \(C\) and we are merging \(A\) and \(B\) to obtain the new cluster \(D\).
If there is an edge of weight \(w_{AC}\) and multiplicity \(c_{AC}\) between \(A\) and \(C\) but there is no edge between \(B\) and \(C\). Then, the new graph will contain an edge between \(D\) and \(C\) such that \(w_{DC}=w_{AC}\) and \(c_{DC}=c_{AC}\). (The situation is similar if the edge were between \(B\) and \(C\)).
If there is an edge between \(A\) and \(C\) and between \(B\) and \(C\). Then, the new graph will contain an edge between \(D\) and \(C\) such that \(w_{DC} = \frac{w_{AC}*c_{AC} + w_{BC}*c_{BC}}{c_{AC} + c_{BC}}\) and \(c_{DC}=c_{AC} + c_{BC}\).
def weighting_function_average_linkage(graph, fusion_edge_index, new_region, merged_region1, merged_region2, new_neighbours): for n in new_neighbours: if n.num_edges() > 1: new_count = edge_counts[n.first_edge_index()] + edge_counts[n.second_edge_index()] new_weight = (edge_weights[n.first_edge_index()] * edge_counts[n.first_edge_index()] \ + edge_weights[n.second_edge_index()] * edge_counts[n.second_edge_index()]) \ / new_weight else: new_count = edge_counts[n.first_edge_index()] new_weight = edge_weights[n.first_edge_index()] n.set_new_edge_weight(new_weight) edge_weights[n.new_edge_index()] = new_weight edge_counts[n.new_edge_index()] = new_count
- Complexity:
The worst case time complexity is in \(\mathcal{O}(n^2\log(n))\) with \(n\) the number of vertices in the graph. The runtime complexity on a sparse graph with well structured data (far different from noise) can be much better in practice.
Warning
This function uses a Python callback (the
weight_function
) that is called frequently by the algorithm: performances will be far from optimal. Please consider a C++ implementation if it is too slow (see this helper project ).- Parameters:
graph – input graph
weight_function – see detailed description above
edge_weights – edge weights of the input graph
- Returns:
a tree (Concept
CptHierarchy
) and its node altitudes