
MADMatch Documentation


Here, you will find instructions on 
	(1) how to generate input data for MADMatch, 
	(2) how to set its parameters,
	(3) and which outputs it generates.

Note that this online version is restricted to the differencing of class diagrams.
Also, only diagrams with fewer than 2500 entities are processed.

For other types of diagrams or bigger data, please contact us.

(1) Input *

In MADMatch, software diagrams are modeled as Entity-Relationship Diagrams.
Entities possess attributes such as
	a Name (e.g., class name), 
	a Type (e.g., class) 
	and, possibly, specific features. 
Relationships between entities are distinguished by their Type;
	a special Type ("9") is used to model containment relationships.
It is recommended that you use PADL to generate (from jars) appropriate input files for MADMatch.
Otherwise, please follow the instructions below.

An input file for MADMatch represents a diagram and must have
	(i) a first line indicating its number of entities (n) and relationships (m): n,m
	(ii) a list of the entities, one entity (and its attributes) per line
	(iii) a list of the relationships, one relationship per line
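The three-part layout above can be checked with a small reader. The following is only a minimal sketch, not part of MADMatch: the names Diagram, read_diagram and SAMPLE are hypothetical, and it assumes that every entity line has the form letter,#node,signature and every relationship line the form r,A,B,type, as described later in this section. The p and c lines in the sample are an assumption modeled on the m and a lines.

```python
from dataclasses import dataclass, field

ENTITY_TYPES = {"p", "c", "m", "a"}  # package, class, method, attribute


@dataclass
class Diagram:
    # node number -> (type letter, raw signature text)
    entities: dict = field(default_factory=dict)
    # list of (source node, destination node, relationship type)
    relationships: list = field(default_factory=list)


def read_diagram(text):
    """Parse the n,m header, then n entity lines, then m relationship lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n, m = (int(x) for x in lines[0].split(","))
    if len(lines) != 1 + n + m:
        raise ValueError(f"expected {1 + n + m} lines, found {len(lines)}")

    diagram = Diagram()
    for line in lines[1:1 + n]:
        # maxsplit=2 keeps any commas inside the signature intact
        kind, node, signature = line.split(",", 2)
        if kind not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {kind!r}")
        diagram.entities[int(node)] = (kind, signature)

    for line in lines[1 + n:]:
        tag, src, dst, rel = line.split(",")
        if tag != "r":
            raise ValueError(f"not a relationship line: {line!r}")
        diagram.relationships.append((int(src), int(dst), int(rel)))
    return diagram


# Hypothetical input: the layout of the p and c lines is assumed.
SAMPLE = """4,3
p,0,root
c,1,Shape
m,2,public@double@area@()
a,3,private@String@name
r,0,1,9
r,1,2,9
r,1,3,9
"""

diagram = read_diagram(SAMPLE)
print(len(diagram.entities), len(diagram.relationships))
```

Checking the line count against the header catches truncated files early, before they are submitted.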

For class diagrams, entities are numbered from 0 (the root package) to n-1 (n being the number of entities).
An entity line is expressed as follows:


Different letters are used to signal entity types:
p for package, c for class, m for method and a for attribute

m,#node,method_signature [visibility@return_type@name@(inputs)]
a,#node,attribute_signature [visibility@type@name]
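The @-separated signatures can be split into named fields as follows. This is a minimal sketch under stated assumptions: parse_signature is a hypothetical helper, not part of MADMatch, and the content of (inputs) is kept as raw text because its inner format is not specified here.

```python
def parse_signature(kind, signature):
    """Split a method ('m') or attribute ('a') signature on '@'."""
    if kind == "m":
        # visibility@return_type@name@(inputs)
        visibility, return_type, name, inputs = signature.split("@", 3)
        return {
            "visibility": visibility,
            "return_type": return_type,
            "name": name,
            "inputs": inputs,  # left unparsed, e.g. "()"
        }
    if kind == "a":
        # visibility@type@name
        visibility, type_, name = signature.split("@")
        return {"visibility": visibility, "type": type_, "name": name}
    raise ValueError(f"no signature layout known for entity type {kind!r}")


print(parse_signature("m", "public@void@run@()"))
print(parse_signature("a", "private@int@count"))
```

A signature with too few or too many fields raises a ValueError during unpacking, which is a convenient way to spot malformed lines.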

A relationship line is expressed as follows:

r,A,B,#relationship

with A (resp. B) being the number of the source (resp. destination) entity
and #relationship being the number of the relationship
	1: association between classes
	2: aggregation/composition between classes
	3: inheritance
	4: method A calls method B (r,A,B,4)
	5: method A uses attribute B (r,A,B,5)
	6: attribute A is of type B (r,A,B,6)
	7: method A has return type of class B (r,A,B,7)
	8: method A has an input type of class B (r,A,B,8)
	9: entity A contains an entity B (r,A,B,9)
		(Ex: node p which is a package contains class c 
			which in turn contains method m and attribute a:
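As a separate illustration of the relationship codes listed above (it is not the page's own example), the sketch below decodes relationship lines and follows containment links. The names RELATIONSHIP_TYPES, parse_relationship and children_of are hypothetical, and the node numbers are made up.

```python
# Short labels for the nine relationship types listed above.
RELATIONSHIP_TYPES = {
    1: "association",
    2: "aggregation/composition",
    3: "inheritance",
    4: "method calls method",
    5: "method uses attribute",
    6: "attribute is of type",
    7: "method has return type",
    8: "method has input type",
    9: "containment",
}


def parse_relationship(line):
    """Turn 'r,A,B,type' into (A, B, type), rejecting unknown types."""
    tag, src, dst, rel = line.strip().split(",")
    if tag != "r":
        raise ValueError(f"not a relationship line: {line!r}")
    rel = int(rel)
    if rel not in RELATIONSHIP_TYPES:
        raise ValueError(f"unknown relationship type: {rel}")
    return int(src), int(dst), rel


def children_of(relationships, parent):
    """Return the nodes directly contained in `parent` (type 9 only)."""
    return [dst for src, dst, rel in relationships if src == parent and rel == 9]


# Hypothetical node numbers: 0 = package, 1 = class, 2 = method, 3 = attribute.
rels = [parse_relationship(l) for l in ("r,0,1,9", "r,1,2,9", "r,1,3,9", "r,2,3,5")]
print(children_of(rels, 1))
```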

(2) Parameter Setting *

MADMatch is controlled by four main parameters grouped in three categories.

-- Category I: Error/Dissimilarity Tolerance
Two parameters, one for lexical information and another for structural information, are used
to calibrate tolerance to lexical dissimilarity and structural differences. The lower
those values, the less tolerant the matching (i.e., the less likely it is to match elements which
underwent significant distortion).
In particular, when both parameters have a value of 1 (the top of the range), MADMatch becomes
error-friendly, i.e., it always favors matches over deletions/insertions. Conversely, a zero value for
both parameters generates error-free solutions with only perfect matches.
		Range: [0, 1]		Recommended: 0.7

-- Category II: Relative Importance of different sources of information
MADMatch provides a parameter (relationshipWeight) that lets you give more or less weight to the 
structural information. For relationshipWeight = x, the information carried by one relationship is 
considered as important as that carried by x entities. The higher x, the more important the structural 
information. A value of 0 means that the structural information is not taken into account, while a high
value (--> +oo ) means that textual information is irrelevant.
		Range: [0, +oo]	Recommended: 0.2 

-- Category III: Matching Direction
Because software systems evolve in the direction of time (additions are more likely operations
from a version to its successor), we define the asymmetry parameter to take into account the
direction of a matching. Asymmetry means that edit operations in one direction may cost more than the
same operations in the other direction. An asymmetry value of [0] means that additions do not count,
of [1] that additions have the same weight as deletions, and [near +oo] that deletions do not count.
		Range: [0, +oo]	Recommended: 1 


d: tolerance of errors for node information (the higher, the more tolerant); range [0,1]
	0 means no tolerance
	1 means errors are cost-free

g: tolerance of errors for edge information (the higher, the more tolerant); range [0,1]
	0 means no tolerance
	1 means errors are cost-free

e: edge information weighting  range [0, +oo]
	0   --> edge info does not count at all
	1   --> edge info is as important as node info
	+oo --> node info does not count at all

y: asymmetry (matching direction), differentiates between adding and removing elements; range [0,+oo]
	0   --> adding elements is free
	0.5 --> adding elements is half as costly as removing elements
	1   --> no difference between adding and removing
	>1  --> removing is cheaper.
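The ranges and recommended values above can be captured in a small validation helper before a run is submitted. This is a minimal sketch, not MADMatch code: the class name MadMatchParams is hypothetical, and it assumes that d and g are the two Category I tolerances (so both default to 0.7), e is relationshipWeight, and y is the asymmetry parameter.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MadMatchParams:
    d: float = 0.7  # node (lexical) tolerance, range [0, 1]
    g: float = 0.7  # edge (structural) tolerance, range [0, 1]
    e: float = 0.2  # edge information weighting, range [0, +oo]
    y: float = 1.0  # asymmetry, range [0, +oo]

    def __post_init__(self):
        # Tolerances must stay within [0, 1]; NaN fails this test too.
        for name in ("d", "g"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        # Weights only need to be non-negative numbers.
        for name in ("e", "y"):
            value = getattr(self, name)
            if math.isnan(value) or value < 0.0:
                raise ValueError(f"{name} must be >= 0, got {value}")


defaults = MadMatchParams()          # recommended values
strict = MadMatchParams(d=0.0, g=0.0)  # only perfect matches tolerated
print(defaults, strict)
```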


(3) MADMatch Output *

The online version of MADMatch only produces two files:

1. madmatch_diagram1__diagram2__d__g__e__y__0.5__1.csv [THE result file]
	All node matches with cost information + additions + removals
		ns: node match
		nd: node deleted
		ni: node added
	Groups of entities are delimited by the string "--".

2. candidates.csv [an auxiliary file]
	A list of the pairs of entities considered for the matching 
		+ information on their "termal footprints" (see MADMatch paper)
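A quick tally of the result file can be obtained by counting the ns, nd and ni rows. The sketch below rests on assumptions that this page does not confirm: that the tag is the first comma-separated field of each row, and that group delimiters are lines starting with "--". The function name summarize_result is hypothetical.

```python
from collections import Counter

# Row tags documented for the result file.
ROW_TAGS = {"ns": "matched", "nd": "deleted", "ni": "added"}


def summarize_result(lines):
    """Count result rows per tag, skipping blank lines and '--' delimiters."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("--"):
            continue
        # Assumption: the tag is the first comma-separated field.
        tag = line.split(",", 1)[0].strip()
        if tag in ROW_TAGS:
            counts[ROW_TAGS[tag]] += 1
    return dict(counts)
```

To use it, pass the lines of the downloaded result file, for example summarize_result(open(path, encoding="utf-8")), where path points to your own copy of the file.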


Ecole polytechnique de Montreal
Copyright © 2011. Soccer Lab