“Today, professionals get trained in using tools… there’s a lack of education of fundamentals
like modeling, architecture, methods, or concepts... Getting value out of data
needs professionalization based on education and practical experience.”
In my opinion, he’s spot on.
My post from March mentions why new AI languages aren’t exactly heavyweights on a CV in a mainstream business; in April, a figure (at the end of that post) also touched on forest structures in ML and eXplainable AI.
After a free-wind sail that took us from detail to architecture, we now go into some structural “forestry”: tackling the same domain from multiple viewpoints, instead of clinging to one.
1. Multiple trees in UML: Generalization sets
This post from 2015 (in Swedish) discusses the sets in more detail, so let’s just recap the diagrams, in English, and add «powertype» to the fourth one (a power set’s elements are subsets, so by the same token, a UML2 powertype’s instances are “subtypes” of a general classifier).
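As a rough programming analogy (my sketch, with hypothetical names, not from the 2015 post): a Python metaclass behaves much like a powertype, because the metaclass’s instances are themselves subclasses of the general class.

```python
# Hypothetical analogy to a UML2 «powertype»: instances of the metaclass
# TreeSpecies are classes, and each of them is also a subtype of Tree.
class TreeSpecies(type):
    """Plays the powertype role: its instances are classes."""

class Tree(metaclass=TreeSpecies):
    """The general classifier that the powertype partitions."""

class Oak(Tree): ...   # Oak is an instance of TreeSpecies...
class Pine(Tree): ...  # ...and so is Pine; both are also subtypes of Tree

assert isinstance(Oak, TreeSpecies) and issubclass(Oak, Tree)
```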
2. Multiple trees in Machine Learning: Random Decision Forests
Here, a designer is not the one who maps out subclasses. Rather, an ML algorithm generates, at training time and from (labeled) data, an ability to perform classification (i.e. an accurate “mapping-out” of “classes”). The decision nodes of a (classification) tree gradually subdivide the data into more and more fine-grained classes.
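A minimal sketch of that training step, assuming scikit-learn and its bundled Iris data (my choice of illustration, not from the post):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)            # labeled training data
tree = DecisionTreeClassifier(max_depth=3)   # each extra level refines the classes
tree.fit(X, y)                               # training "maps out" the classes
print(tree.predict(X[:2]))                   # classify two samples
```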
Why bother with decision trees when Deep Neural Networks are booming? Because explainability opens the door to acceptance in mission-critical apps: ML-generated logic has to be auditable. User enterprises are pushing for graphical clarity, conceptualization, traceability, and V&V. Those are the strengths of decision trees and the weaknesses of Deep NNs (we know those work, but hardly how); the same goes for learning time required, size of training-data sets, execution speed, and partitionability (a hint for IT architects: a tree works independently, whereas a neuron relies on many others). On top of that, decision trees offer a structural backbone for hybrid AI systems (see also the last paragraphs in this post).
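To make the auditability point concrete: a trained tree can be dumped as nested, human-readable rules. A minimal sketch, again assuming scikit-learn (its export_text function does this); a Deep NN offers no comparable view of its weights.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2).fit(data.data, data.target)

# Prints nested if/else-style threshold rules over the four petal/sepal
# features, which an auditor or domain expert can trace line by line.
print(export_text(tree, feature_names=list(data.feature_names)))
```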
You might remember that trees in woods sometimes fuse their roots and exchange materials. Unsurprisingly, we find some synergy in virtual forests too. Firstly, each of our trees is “grown” on its own random sample (hence some “biodiversity” too), drawn from one training-data set (hence fewer terabytes of training data). Secondly, within each sample, a tree’s decision nodes use a random subset of all available attributes.
This gives architects and other roles some room to tune the mix of efficiency and explainability; in forests, it’s near the level of genetic algorithms (GAs, too, have possible “mix-tuning points” in their “biodiversity steps”, Crossover and Mutation).
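In scikit-learn terms (my example, assuming its RandomForestClassifier), those two “biodiversity” knobs map directly onto constructor parameters:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,      # how many trees we "grow"
    max_samples=0.8,       # each tree's own random sample of the training data
    max_features="sqrt",   # random subset of attributes per decision node
    random_state=0,        # reproducibility for the sketch
).fit(X, y)
```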
The more trees and “biodiversity” our ML algorithm grows, the more accurate and robust the generated logic becomes, because the final step is vote counting. A forest’s output (a classification like here, or a forecast) is an aggregate of the outputs of all trees (a statistical mode in classification, or a mean in regression). This prevents the random decision forest from getting stuck in local optima; that is, we minimize error rates and overfitting to a given training-data set (which may be both incomplete and biased).
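The vote count itself can be spelled out in a few lines (a self-contained sketch, assuming scikit-learn; note that its RandomForestClassifier actually averages the trees’ class probabilities, a “soft” variant of the same majority idea):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

sample = X[:1]                                         # one observation to classify
votes = np.array([t.predict(sample)[0]                 # each tree votes independently
                  for t in forest.estimators_])
values, counts = np.unique(votes, return_counts=True)
print("majority vote:", values[counts.argmax()])       # the forest's aggregate answer
```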
Trainer at Informator, senior modeling and architecture consultant at Kiseldalens, main author of UML Extra Light (Cambridge University Press) and Growing Modular (Springer), Advanced UML2 Professional (OCUP cert level 3/3).
Milan and Informator have collaborated since 1996 on architecture, modeling, UML, requirements, rules, and design. You can meet him in September at public courses (in English or Swedish) on AI, Architecture, and ML (T1913), Architecture (T1101, T1430), or Modeling (T2715, T2716).