# Algorithm for trees¶

 Given two binary markers $$o$$ (object) and $$b$$ (background) (given by their indicator functions) on the leaves of a tree $$T$$, the corresponding binary labelization of the leaves of $$T$$ is defined as the union of all the nodes intersecting $$o$$ but not $$b$$: Filter the given tree according to a functor telling if nodes are relevant or not. filter_small_nodes_from_tree(tree, …[, …]) Filter the given tree according to node size: Filter the given tree according to the frontier strength. Labelize the tree leaves into supervertices. reconstruct_leaf_data(tree, altitudes[, …]) Each leaf of the tree takes the altitude of its closest non deleted ancestor. sort_hierarchy_with_altitudes(tree, altitudes) Sort the nodes of a tree according to their altitudes. test_altitudes_increasingness(tree, altitudes) Test if the altitudes of the given tree are increasing; i.e. test_tree_isomorphism(tree1, tree2) Test if 2 trees $$t_1$$ and $$t_2$$ are isomorph assuming that they share the same leaves. tree_fusion_depth_map(*trees) Depth map associated to the fusion of the given list of trees. tree_monotonic_regression(tree, altitudes, mode) Monotonic regression on the given tree altitudes.
binary_labelisation_from_markers(tree, object_marker, background_marker, leaf_graph=None)[source]

Given two binary markers $$o$$ (object) and $$b$$ (background) (given by their indicator functions) on the leaves of a tree $$T$$, the corresponding binary labelization of the leaves of $$T$$ is defined as the union of all the nodes intersecting $$o$$ but not $$b$$:

$res = \bigcup \{R \in T \mid R \cap o \neq \emptyset, \textrm{ and } R \cap b = \emptyset\}$
Parameters:
• tree – input tree (Concept CptHierarchy)

• object_marker – indicator function of the object marker: 1d array of size tree.num_leaves() where non zero values correspond to the object marker

• background_marker – indicator function of the background marker: 1d array of size tree.num_leaves() where non zero values correspond to the background marker

• leaf_graph – graph on the leaves of the input tree (optional, deduced from CptHierarchy)

Returns:

Leaf labels

filter_non_relevant_node_from_tree(tree, altitudes, non_relevant_functor, leaf_graph, canonize_tree=True)[source]

Filter the given tree according to a functor telling if nodes are relevant or not.

In a binary a tree, each inner node (non leaf node) is associated to the frontier separating its two children. If the frontier associated to a node is considered as non-relevant (for example because on of the two children of the node is too small) then the corresponding frontier is removed effectively merging its two children.

This function returns a binary partition tree such that:

• the frontiers associated to nodes marked non-relevant do not exist anymore;

• the regions of the new tree are either regions of the initial tree or regions obtained by merging adjacent regions of the initial tree.

If tree does not satisfy the concept CptBinaryHierarchy, the given tree is first transformed into a binary tree (arbitrary choices are made).

non_relevant_functor must be a function that accepts two arguments, a binary tree and its node altitudes, and must return a boolean node attribute for the given tree (ie a 1d array of boolean-ish values of size tree.num_vertices(). A value of True is interpreted as this node is not relevant and its associated frontier must be removed.

See:
Parameters:
• tree – input tree (Concept CptHierarchy)

• altitudes – node altitudes of the input tree

• non_relevant_functor – a function that computes an attribute on a binary tree

• leaf_graph – graph of the tree leaves (deduced from CptHierarchy)

• canonize_tree – if True (default), the resulting hierarchy is canonized (see function canonize_hierarchy()), otherwise the returned hierarchy is a binary tree

Returns:

a tree (Concept CptHierarchy is True and CptBinaryHierarchy otherwise) and its node altitudes

filter_small_nodes_from_tree(tree, altitudes, size_threshold, leaf_graph, canonize_tree=True)[source]

Filter the given tree according to node size:

This function returns a binary partition tree such that:

• it does not contain any region whose size is below the given threshold;

• the regions of the new tree are either regions of the initial tree or regions obtained by merging adjacent regions of the initial tree.

See:

filter_non_relevant_node_from_tree()

Parameters:
Returns:

a tree (Concept CptHierarchy is True and CptBinaryHierarchy otherwise) and its node altitudes

filter_weak_frontier_nodes_from_tree(tree, altitudes, edge_weights, strength_threshold, leaf_graph, canonize_tree=True)[source]

Filter the given tree according to the frontier strength.

The strength of a frontier is defined as the mean weights of the edges crossing the frontier (see attribute_frontier_strength()).

This function returns a binary partition tree such that:

• it does not contain any contour whose strength is lower than the given threshold;

• the regions of the new tree are either regions of the initial tree or regions obtained by merging adjacent regions of the initial tree.

See:

filter_non_relevant_node_from_tree()

Parameters:
Returns:

a tree (Concept CptHierarchy is True and CptBinaryHierarchy otherwise) and its node altitudes

labelisation_hierarchy_supervertices(tree, altitudes, leaf_graph=None, handle_rag=True)[source]

Labelize the tree leaves into supervertices. The altitudes must be increasing, i.e. for any nodes $$i, j$$ such that $$j$$ is an ancestor of $$i$$, then $$altitudes[i] \leq altitudes[j]$$. Two leaves are in the same supervertex if they have a common ancestor at altitude 0.

If we consider that the pair $$(tree, altitudes)$$ represents a dendrogram, i.e. that it defines a pseudo-ultrametric on the set of leaves, a supervertex is a maximal cluster such that the distance between any pair of points in the cluster is equal to 0.

This functions guaranties that the labels are in the range $$[0, num\_supervertices-1]$$.

Parameters:
• tree – input tree (Concept CptHierarchy)

• altitudes – node altitudes of the input tree

• leaf_graph – graph of the tree leaves (optional, deduced from CptHierarchy)

• handle_rag – if True and the provided tree has been built on a region adjacency graph, then the labelisation corresponding to the rag regions is returned.

Returns:

Leaf labels

reconstruct_leaf_data(tree, altitudes, deleted_nodes=None, leaf_graph=None)[source]

Each leaf of the tree takes the altitude of its closest non deleted ancestor. The root node is never deleted. In a component tree, leaves are always deleted.

The result is an array of shape [tree.num_leaves()] + altitudes.shape[1:] such that $$result[i]$$ is the altitude of the smallest node $$n$$ containing $$i$$ such that $$deleted\_nodes[n]$$ is false and for all nodes $$j$$ in the branch from $$i$$ (included) to $$n$$ (excluded), $$deleted\_nodes[j]$$ is true.

If deleted_nodes is None then its default value is set to np.zeros((tree.numvertices(),) (no nodes are deleted).

Complexity:

This algorithms runs in linear time $$O(tree.num\_vertices())$$.

Example:
>>> tree = hg.Tree((5, 5, 6, 6, 6, 7, 7, 7))
>>> altitudes = np.asarray(((1, 8), (2, 7), (3, 6), (4, 5), (5, 4), (6, 3), (7, 2), (8, 1)), dtype=np.int32)
>>> condition = np.asarray((True, False, True, False, True, True, False, False), np.bool_)
>>> hg.reconstruct_leaf_data(tree, altitudes, condition)
array([[8, 1],
[2, 7],
[7, 2],
[4, 5],
[7, 2]])

Parameters:
• tree – input tree (Concept CptHierarchy)

• altitudes – node altitudes of the input tree

• deleted_nodes – binary node weights indicating which nodes are deleted (optional)

• leaf_graph – graph of the tree leaves (optional, deduced from CptHierarchy)

Returns:

Leaf weights: array of shape [tree.num_leaves()] + altitudes.shape[1:] and of dtype altitudes.dtype

sort_hierarchy_with_altitudes(tree, altitudes)[source]

Sort the nodes of a tree according to their altitudes. The altitudes must be increasing, i.e. for any nodes $$i, j$$ such that $$j$$ is an ancestor of $$i$$, then $$altitudes[i] \leq altitudes[j]$$.

The result is a new tree and a node map, isomorph to the input tree such that for any nodes $$i$$ and $$j$$, $$i<j \Rightarrow altitudes[node\_map[i]] \leq altitudes[node\_map[j]]$$

The latter condition is stronger than the original condition on the altitudes as $$j$$ is an ancestor of $$i$$ implies $$i<j$$ while the converse is not true.

The returned “node_map” is an array that maps any node index $$i$$ of the new tree, to the index of this node in the original tree.

Parameters:
• tree – input tree

• altitudes – node altitudes of the input tree

Returns:

the sorted tree (Concept CptHierarchy), its node altitudes, and the node map

test_altitudes_increasingness(tree, altitudes)[source]

Test if the altitudes of the given tree are increasing; i.e. if for any nodes $$i, j$$ such that $$j$$ is an ancestor of $$i$$, then $$altitudes[i] \leq altitudes[j]$$.

Parameters:
• tree – input tree

• altitudes – node altitudes of the input tree

Returns:

True if altitudes are increasing for tree, False otherwise.

test_tree_isomorphism(tree1: hg::tree_internal::tree, tree2: hg::tree_internal::tree) → bool

Test if 2 trees $$t_1$$ and $$t_2$$ are isomorph assuming that they share the same leaves.

By this definition $$t_1$$ is isomorph to $$t_2$$ if there exists a bijection $$f$$ from the nodes of $$t_1$$ to the nodes of $$t_2$$ such that:

1. for any leaf node $$n$$ of $$t_1$$, $$f(n) = n$$, and

2. for any node $$n$$ of $$t_1$$, $$f(t_1.parent(n)) = t_2.parent(f(n))$$

Note that the root node of $$r$$ a tree $$t$$ is defined by $$t.parent(r) = r$$, thus 2) becomes for the root node $$r_1$$ of $$t_1$$, $$f(r_1) = t_2.parent(f(r_1))$$, i.e. $$f(r_1)$$ is the root node of $$t_2$$

tree_fusion_depth_map(*trees)[source]

Depth map associated to the fusion of the given list of trees. This depth map can be used to compute a new tree representing the fusion of the input trees, eg. with component_tree_max_tree().

The method is described in:

E. Carlinet. A Tree of shapes for multivariate images. PhD Thesis, Université Paris-Est, 2015.

All trees must be defined over the same domain, i.e. have the same number of leaves. Given a set of trees $$(T_1, T_2, T_3,... T_n)$$ composed of the set of nodes $$(N_1, N_2, N_3, ... N_n)$$. We define the fusion graph as the graph induced the inclusion relation $$\subseteq$$ on the union of all the tree nodes $$\bigcup\{N_1, N_2, N_3, ... N_n\}$$.

The result is a directed acyclic graph with a single root (corresponding to the roots of the input trees). The depth of a node in this graph is defined as the length of the longest path from the root this node. This function returns the depth of the leaves of this graph (which are the same as the leaves of the input trees).

See:

A multivariate tree of shapes of a colour 2d can be computed with component_tree_multivariate_tree_of_shapes_image2d().

Complexity:

The worst case runtime complexity of this method is $$\mathcal{O}(N^2D^2)$$ with $$N$$ the number of leaves and $$D$$ the number of trees. If we denote by $$K$$ the depth of the deepest tree to merge, one can rewrite the runtime complexity as $$\mathcal{O}(NKD^2)$$.

Parameters:

trees – at least two trees defined over the same domain

Returns:

a depth map representing the fusion of the input trees

tree_monotonic_regression(tree, altitudes, mode, weights=None)[source]

Monotonic regression on the given tree altitudes. Computes new altitudes naltitudes that are close to the given altitudes and that are increasing for the given tree: i.e. for any nodes $$i, j$$ such that $$j$$ is an ancestor of $$i$$, then $$naltitudes[i] \leq naltitudes[j]$$.

The definition of close depends of the value of mode:

• If mode is equal to "min" then naltitudes is the largest increasing function below altitudes.

• If mode is equal to "max" then naltitudes is the smallest increasing function above altitudes.

• If mode is equal to "least_square" then naltitudes minizes the following minization problem:

$naltitudes = \arg \min_x \sum_i (weights[i] * (altitudes[i] - x[i])^2)$

such that naltitudes is increasing for tree.

Complexity:

With $$n$$ the number of nodes in the tree:

• For the modes "min" and "max", the runtime complexity is linear $$\mathcal{O}(n)$$.

• For the mode "least_square", the runtime complexity is linearithmic $$\mathcal{O}(n\log(n))$$ and the space complexity is linear $$\mathcal{O}(n)$$. The algorithm used is described in:

P. Pardalos and G. Xue ‘Algorithms for a Class of Isotonic Regression Problems.’ Algorithmica (1999) 23: 211. doi:10.1007/PL00009258

Parameters:
• tree – input tree

• altitudes – node altitudes of the input tree

• mode – the regression mode : "min", "max", or "least_square"

• weights – node weights of the input tree (default to an array of 1s). This parameter is ignored if mode is not "least_square".

Returns:

a 1d array