# Tree attributes¶

 attribute_area(tree[, vertex_area, leaf_graph]) Area of each node the given tree. Given a node $$n$$ whose parent is $$p$$, the attribute value of $$n$$ is the rank of $$n$$ in the list of children of $$p$$. attribute_contour_length(tree[, …]) Length of the contour (perimeter) of each node of the given tree. attribute_contour_strength(tree, edge_weights) Strength of the contour of each node of the given tree. attribute_compactness(tree[, area, …]) The compactness of a node is defined as its area divided by the square of its perimeter length. The depth of a node $$n$$ of the tree $$T$$ is equal to the number of ancestors of $$n$$ in $$T$$. attribute_dynamics(tree, altitudes[, …]) Given a node $$n$$ of the tree $$T$$, the dynamics of $$n$$ is the difference between the altitude of the deepest minima of the subtree rooted in $$n$$ and the altitude of the closest ancestor of $$n$$ that has a deeper minima in its subtree. attribute_extinction_value(tree, altitudes, …) The extinction value of a node $$n$$ of the input tree $$T$$ with increasing altitudes $$alt$$ for the increasing attribute $$att$$ is the equal to the threshold $$k$$ such that the node $$n$$ is still in an minima of $$t$$ when all nodes having an attribute value smaller than $$k$$ are removed. attribute_extrema(tree, altitudes) Identify nodes in a hierarchy that represent extrema (minima or maxima). attribute_frontier_length(tree[, …]) Length of the frontier represented by each node the given partition tree. Mean edge weight along the frontier represented by each node the given partition tree. Estimates a gaussian model (mean, (co-)variance) for leaf weights inside a node. attribute_height(tree, altitudes[, …]) In a tree $$T$$, given that the altitudes of the nodes vary monotically from the leaves to the root, the height of a node $$n$$ of $$T$$ is equal to the difference between the altitude of the parent of $$n$$ and the altitude of the deepest non-leaf node in the subtree of $$T$$ rooted in $$n$$. attribute_lca_map(tree, leaf_graph) Lowest common ancestor of i and j for each edge $$(i, j)$$ of the leaf graph of the given tree. attribute_mean_vertex_weights(tree, …[, …]) Mean vertex weights of the leaf graph vertices inside each node of the given tree. attribute_moment_of_inertia(tree, leaf_graph) Moment of inertia (first Hu moment) of each node of the given tree. Given a tree $$T$$ with node weights $$w$$: the children pair sum product for a node $$n$$ sums for every pairs $$(c_i, c_j)$$ of children of $$n$$, the product of the node weights of $$c_i$$ and $$c_j$$. Piecewise constant Mumford-Shah energy of each node of the input tree. attribute_regular_altitudes(tree[, depth]) Regular altitudes is comprised between 0 and 1 and is inversely proportional to the depth of a node attribute_sibling(tree[, skip]) Sibling index of each node of the given tree. Given a node $$n$$ of tree, the topological height of $$n$$ is the number of edges on the longest path from the node $$n$$ to a leaf of tree. Given a tree $$T$$, estimate the probability that a node $$n$$ of the tree represents the smallest cluster containing a pair of vertices $$\{a, b\}$$ of the graph $$G=(V, E)$$ with edge weights $$w$$. attribute_volume(tree, altitudes[, area]) Volume of each node the given tree.
attribute_area(tree, vertex_area=None, leaf_graph=None)[source]

Area of each node the given tree. The area of a node is equal to the sum of the area of the leaves of the subtree rooted in the node.

Parameters
Returns

a 1d array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_child_number(tree)[source]

Given a node $$n$$ whose parent is $$p$$, the attribute value of $$n$$ is the rank of $$n$$ in the list of children of $$p$$. In other $$attribute(n)=i$$ means that $$n$$ is the $$i$$-th child of $$p$$.

The root of the tree, who has no parent, take the value -1.

Parameters

tree – Input tree

Returns

a 1d array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_contour_length(tree, vertex_perimeter=None, edge_length=None, leaf_graph=None)[source]

Length of the contour (perimeter) of each node of the given tree.

Parameters
Returns

a 1d array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_contour_strength(tree, edge_weights, vertex_perimeter=None, edge_length=None, leaf_graph=None)[source]

Strength of the contour of each node of the given tree. The strength of the contour of a node is defined as the mean edge weights on the contour.

Parameters
Returns

a 1d array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_compactness(tree, area=None, contour_length=None, normalize=True, leaf_graph=None)[source]

The compactness of a node is defined as its area divided by the square of its perimeter length.

Parameters
• tree – input tree (Concept CptHierarchy)

• area – node area of the input tree (provided by attribute_area() on tree)

• contour_length – node contour length of the input tree (provided by attribute_perimeter_length() on tree)

• normalize – if True the result is divided by the maximal compactness value in the tree

• leaf_graph – (deduced from CptHierarchy)

Returns

a 1d array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_depth(tree)[source]

The depth of a node $$n$$ of the tree $$T$$ is equal to the number of ancestors of $$n$$ in $$T$$.

The depth of the root node is equal to 0.

Parameters

tree – Input tree

Returns

a nd array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_dynamics(tree, altitudes, increasing_altitudes='auto')[source]

Given a node $$n$$ of the tree $$T$$, the dynamics of $$n$$ is the difference between the altitude of the deepest minima of the subtree rooted in $$n$$ and the altitude of the closest ancestor of $$n$$ that has a deeper minima in its subtree. If no such ancestor exists then, the dynamics of $$n$$ is equal to the difference between the altitude of the highest node of the tree (the root) and the depth of the deepest minima.

The dynamics is the extinction values (attribute_extinction_value()) for the attribute height (attribute_height()).

Possible values of increasing_altitude are:

• 'auto': the function will automatically determine if altitudes are increasing or decreasing (this has small computational cost but does not impact the runtime complexity).

• True or 'increasing': this means that altitudes are increasing from the leaves to the root (ie. for any node $$n$$, $$altitudes(n) \leq altitudes(parent(n))$$.

• False or 'decreasing': this means that altitudes are decreasing from the leaves to the root (ie. for any node $$n$$, $$altitude(n) \geq altitude(parent(n))$$.

Parameters
• tree – Input tree

• altitudes – Tree node altitudes

• increasing_altitudes – possible values ‘auto’, True, False, ‘increasing’, and ‘decreasing’

Returns

a 1d array like altitudes

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_extinction_value(tree, altitudes, attribute, increasing_altitudes='auto')[source]

The extinction value of a node $$n$$ of the input tree $$T$$ with increasing altitudes $$alt$$ for the increasing attribute $$att$$ is the equal to the threshold $$k$$ such that the node $$n$$ is still in an minima of $$t$$ when all nodes having an attribute value smaller than $$k$$ are removed.

Formally, let $$\{M_i\}$$ be the set of minima of the hierarchy $$T$$ with altitudes $$alt$$. Let $$prec$$ be a total ordering of $$\{M_i\}$$ such that $$M_i \prec M_j \Rightarrow alt(M_i) \leq alt(M_j)$$. Let $$r(M_i)$$ be the smallest node of $$t$$ containing $$M_i$$ and another minima $$M_j$$ such that $$M_j \prec M_i$$. The extinction value of $$M_i$$ is then defined as $$alt(r(M_i)) - alt(M_i)$$.

Extinction values of minima are then extended to other nodes in the tree with the following rules:

• the extinction value of a non-leaf node $$n$$ which is not a minimum is defined as the largest extinction values among all the minima contained in $$n$$ (and 0 if $$n$$ does not contain any minima); and

• the extinction value of a leaf node $$n$$ belonging to a minima $$M_i$$ is equal to the extinction value of $$M_i$$. I $$n$$ does not belong to any minima its extinction value is 0.

The function can also handle decreasing altitudes, in which case minima should be replaced by maxima in the description above. Possible values of increasing_altitude are:

• 'auto': the function will automatically determine if altitudes are increasing or decreasing (this has small computational cost but does not impact the runtime complexity).

• True or 'increasing': this means that altitudes are increasing from the leaves to the root (ie. for any node $$n$$, $$altitudes(n) \leq altitudes(parent(n))$$.

• False or 'decreasing': this means that altitudes are decreasing from the leaves to the root (ie. for any node $$n$$, $$altitude(n) \geq altitude(parent(n))$$.

Parameters
• tree – Input tree

• altitudes – Tree node altitudes

• attribute – Tree node attribute

• increasing_altitudes – possible values ‘auto’, True, False, ‘increasing’, and ‘decreasing’

Returns

a 1d array like attribute

attribute_extrema(tree, altitudes)[source]

Identify nodes in a hierarchy that represent extrema (minima or maxima).

An extremum (minimum or maximum) of the hierarchy $$T$$ with altitudes $$alt$$ is a node $$n$$ of $$T$$ such that the altitude of any non leaf node included in $$n$$ is equal to the altitude of $$n$$ and the altitude of the parent of $$n$$ is different from the altitude of $$n$$.

The result is a boolean array such that $$result(n)$$ is True if the node $$n$$ is an extremum and False otherwise.

Parameters
• tree – Input tree

• altitudes – Tree node altitudes

Returns

a 1d boolean array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_frontier_length(tree, edge_length=None, leaf_graph=None)[source]

Length of the frontier represented by each node the given partition tree.

In a partition tree, each node represent the merging of 2 or more regions. The frontier of a node is then defined as the common contour between the merged regions. This function compute the length of these common contours as the sum of the length of edges going from one of the merged region to the other one.

The result has the same dtype as the edge_length array.

Parameters
Returns

a 1d array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_frontier_strength(tree, edge_weights, leaf_graph)[source]

Mean edge weight along the frontier represented by each node the given partition tree.

In a partition tree, each node represent the merging of 2 or more regions. The frontier of a node is then defined as the common contour between the merged regions. This function compute the strength of a common contour as the sum of the weights of edges going from one of the merged region to the other one divided by the length of the contour.

The result has the same dtype as the edge_weights array.

Parameters
• tree – input tree

• edge_weights – weight of the edges of the leaf graph (if leaf_graph is a region adjacency graph, edge_weights might be weights on the edges of the pre-graph of the rag).

• leaf_graph – graph on the leaves of the input tree (deduced from CptHierarchy)

Returns

a 1d array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_gaussian_region_weights_model(tree, vertex_weights, leaf_graph=None)[source]

Estimates a gaussian model (mean, (co-)variance) for leaf weights inside a node.

The result is composed of two arrays:

• the first one contains the mean value inside each node, scalar if vertex weights are scalar and vectorial otherwise,

• the second one contains the variance of the values inside each node, scalar if vertex weights are scalar and a (biased) covariance matrix otherwise.

Vertex weights must be scalar or 1 dimensional.

Parameters
Returns

two arrays mean and variance

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_height(tree, altitudes, increasing_altitudes='auto')[source]

In a tree $$T$$, given that the altitudes of the nodes vary monotically from the leaves to the root, the height of a node $$n$$ of $$T$$ is equal to the difference between the altitude of the parent of $$n$$ and the altitude of the deepest non-leaf node in the subtree of $$T$$ rooted in $$n$$.

Possible values of increasing_altitude are:

• 'auto': the function will automatically determine if altitudes are increasing or decreasing (this has small computational cost but does not impact the runtime complexity).

• True or 'increasing': this means that altitudes are increasing from the leaves to the root (ie. for any node $$n$$, $$altitudes(n) \leq altitudes(parent(n))$$.

• False or 'decreasing': this means that altitudes are decreasing from the leaves to the root (ie. for any node $$n$$, $$altitude(n) \geq altitude(parent(n))$$.

Parameters
• tree – Input tree

• altitudes – Tree node altitudes

• increasing_altitudes – possible values ‘auto’, True, False, ‘increasing’, and ‘decreasing’

Returns

a 1d array like altitudes

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_lca_map(tree, leaf_graph)[source]

Lowest common ancestor of i and j for each edge $$(i, j)$$ of the leaf graph of the given tree.

Complexity: $$\mathcal{O}(n\log(n)) + \mathcal{O}(m)$$ where $$n$$ is the number of nodes in tree and $$m$$ is the number of edges in leaf_graph.

Parameters
Returns

a 1d array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_mean_vertex_weights(tree, vertex_weights, area=None, leaf_graph=None)[source]

Mean vertex weights of the leaf graph vertices inside each node of the given tree.

For any node $$n$$, the mean vertex weights $$a(n)$$ of $$n$$ is

$a(n) = \frac{\sum_{x\in n} vertex\_weights(x)}{area(n)}$
Parameters
Returns

a nd array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_moment_of_inertia(tree, leaf_graph)[source]

Moment of inertia (first Hu moment) of each node of the given tree. This function works only if leaf_graph is a 2D grid graph. The moment of inertia is a translation, scale and rotation invariant characterization of the shape of the nodes.

Given a node $$X$$ of tree, the raw moments $$M_{ij}$$ are defined as:

$M_{ij} = \sum_{x}\sum_{y} x^i y^j$

where $$(x,y)$$ are the coordinates of every vertex in $$X$$. Then, the centroid $$\{\overline{x},\overline{y}\}$$ of $$X$$ is given by

$\overline{x} = \frac{M_{10}}{M_{00}} \textrm{ and } \overline{y} = \frac{M_{01}}{M_{00}}$

Some central moments of $$X$$ are then:

• $$\mu_{00} = M_{00}$$

• $$\mu_{20} = M_{20} - \overline{x} \times M_{10}$$

• $$\mu_{02} = M_{02} - \overline{y} \times M_{01}$$

The moment of inertia $$I_1$$ of $$X$$ if finally defined as

$I_1 = \eta_{20} + \eta_{02}$

where $$\eta_{ij}$$ are given by:

$$\eta_{ij} = \frac{\mu_{ij}}{\mu_{00}^{1+\frac{i+j}{2}}}$$

Parameters
Returns

a 1d array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_children_pair_sum_product(tree, node_weights)[source]

Given a tree $$T$$ with node weights $$w$$: the children pair sum product for a node $$n$$ sums for every pairs $$(c_i, c_j)$$ of children of $$n$$, the product of the node weights of $$c_i$$ and $$c_j$$. Formally:

$res(n) = \sum_{i=0}^{i<numc(n)} \sum_{j=0}^{j<i} w(child(i, n)) * w(child(j, n))$

where $$numc(n)$$ is the number of children of $$n$$ and $$child(i, n)$$ is the $$i$$-th child of the node $$n$$.

The result is thus an array with the same shape as node_weights

Parameters
• tree – Input tree

• node_weights – node weights of the input tree

Returns

an array with the same shape as node_weights

attribute_piecewise_constant_Mumford_Shah_energy(tree, vertex_weights, gamma, leaf_graph)[source]

Piecewise constant Mumford-Shah energy of each node of the input tree. The energy of a node is equal to its data fidelity energy plus gamma times its regularization energy.

For the piecewise constant Mumford-Shah model:

Parameters
• tree – input tree (Concept CptHierarchy)

• vertex_weights – vertex weights of the leaf graph of the input tree

• gamma – weighting of the regularization term (should be a positive value)

• leaf_graph – leaf graph of the input tree (deduced from CptHierarchy)

Returns

a 1d array measuring the energy of each node the input tree

attribute_regular_altitudes(tree, depth=None)[source]

Regular altitudes is comprised between 0 and 1 and is inversely proportional to the depth of a node

Parameters
Returns

a nd array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_sibling(tree, skip=1)[source]

Sibling index of each node of the given tree.

For each node $$n$$ which is the $$k$$-th child of its parent node $$p$$ among $$N$$ children, the attribute sibling of $$n$$ is the index of the $$(k + skip) % N$$-th child of $$p$$.

The sibling of the root node is itself.

The sibling attribute enables to easily emulates a (doubly) linked list among brothers.

In a binary tree, the sibling attribute of a node is effectively its only brother (with skip equals to 1).

Parameters
• tree – Input tree

• skip – Number of skipped element in the children list (including yourself)

Returns

a nd array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_topological_height(tree)[source]

Given a node $$n$$ of tree, the topological height of $$n$$ is the number of edges on the longest path from the node $$n$$ to a leaf of tree.

The topological height of the leaves is equal to 0.

Parameters

tree – Input tree

Returns

a 1d array

Auto-cache: This function is decorated with the auto_cache() decorator.

attribute_tree_sampling_probability(tree, leaf_graph, leaf_graph_edge_weights, model='edge')[source]

Given a tree $$T$$, estimate the probability that a node $$n$$ of the tree represents the smallest cluster containing a pair of vertices $$\{a, b\}$$ of the graph $$G=(V, E)$$ with edge weights $$w$$.

This method is defined in 1.

We define the probability $$P(\{a,b\})$$ of a pair of vertices $$\{a,b\}$$ as $$w(\{a,b\}) / Z$$ with $$Z=\sum_{e\in E}w(E)$$ if $$\{a,b\}$$ is an edge of $$G$$ and 0 otherwise. Then the probability $$P(a)$$ of a vertex $$b$$ is defined as $$\sum_{b\in V}P(\{a, b\})$$

Two sampling strategies are proposed for sampling pairs of vertices to compute the probability of a node of the tree:

• edge: the probability of sampling the pair $$\{a, b\}$$ is given by $$P(\{a, b\})$$; and

• null: the probability of sampling the pair $$\{a, b\}$$ is given by the product of the probabilities of $$a$$ and $$b$$: $$P(a)*P(b)$$.

Assuming that the edge weights on the leaf graph of a hierarchy represents similarities:

We expect these distributions to differ significantly if the tree indeed represents the hierarchical structure of the graph. Specifically, we expect [the edge distribution] to be mostly concentrated on deep nodes of the tree (far from the root), as two nodes $$u$$, $$v$$ connected with high weight $$w(\{u, v\})$$ in the graph typically belong to a small cluster, representative of the clustering structure of the graph; on the contrary, we expect [the null distribution] to be concentrated over shallow nodes (close to the root) as two nodes $$w(\{u, v\})$$ sampled independently at random typically belong to large clusters, less representative of the clustering structure of the graph. 1

1(1,2)

Charpentier, B. & Bonald, T. (2019). “Tree Sampling Divergence: An Information-Theoretic Metric for Hierarchical Graph Clustering.” Proceedings of IJCAI.

Complexity

The tree sampling divergence runtime complexity depends of the sampling model:

• edge: $$\mathcal{O}(N\log(N) + M)$$ with $$N$$ the number of nodes in the tree and $$M$$ the number of edges in the leaf graph.

• null: $$\mathcal{O}(N\times C^2)$$ with $$N$$ the number of nodes in the tree and $$C$$ the maximal number of children of a node in the tree.

See

The tree_sampling_divergence() is a non supervised hierarchical cost function defined as the Kullback-Leibler divergence between the edge sampling model and the independent (null) sampling model.

Parameters
• tree – Input tree

• leaf_graph – Graph defined on the leaves of the input tree

• leaf_graph_edge_weights – Edge weights of the leaf graphs (similarities)

• model – defines the edge sampling strategy, either “edge” or “null”

Returns

a 1d array

attribute_volume(tree, altitudes, area=None)[source]

Volume of each node the given tree. The volume $$V(n)$$ of a node $$n$$ is defined recursively as:

$V(n) = area(n) * | altitude(n) - altitude(parent(n)) | + \sum_{c \in children(n)} V(c)$
Parameters
• tree – input tree

• altitudes – node altitudes of the input tree

• area – area of the nodes of the input hierarchy (provided by attribute_area() on tree)

Returns

a 1d array

Auto-cache: This function is decorated with the auto_cache() decorator.