A Metric Towards Evaluating Understandability of State Machines: An Empirical Study (foreign literature translation, Chinese-English)
Appendix 1:
A Metric Towards Evaluating Understandability of State Machines: An Empirical Study
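Understandability (also called comprehensibility or appropriateness recognizability) is considered one of the important quality factors for models. In ISO 25010, understandability is classified as an attribute of usability. It is also considered a crucial factor for maintenance: understandable models support the maintenance activities of analyzing, modifying, and extending a system in order to correct, adapt, and perfect it. It is well known that, in software development, 50% of maintenance tasks and 60% of the overall process are spent trying to understand the software, because misunderstanding is a major cause of implementation errors. Misunderstood interface specifications lead to communication errors, which result in implementation errors such as missing functions and malfunctions. Endres found that 46% of the errors he analyzed involved a misunderstanding.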
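A state machine (SM) is a popular behavioral model used to describe the dynamic behavior of a system, component, or object. SMs serve as a communication tool among developers such as designers, programmers, managers, and testers. The behavior of a SM is expressed as the sequences of events it accepts, the actions performed for those events, and the state changes the events cause. The model is also used in various software engineering fields, such as formal verification, automatic test data generation, and reverse engineering. The behavior of a single system can be described by multiple forms of SMs. In other words, states and transitions can be configured differently even when the SMs specify the same behavior. For example, the behavior of a bounded stack container can be described with two states, empty and nonempty. Alternatively, splitting the nonempty state into partially empty and full yields a SM with three states: empty, partially empty, and full.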
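Differently configured SMs can differ in understandability even when the behaviors they describe are equivalent. For example, the two SMs for the bounded stack container above may not be equally understandable. If the former SM, with two states, is used for development, the constraint on the full state may be overlooked because the full state is not presented explicitly in the SM. To improve the understandability of models, a large number of design refactoring rules and design patterns have been proposed and developed.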
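Quantitatively measuring understandability is the first step toward constructing and using highly understandable SMs. To the best of our knowledge, no study of metrics for evaluating the understandability of SMs has been reported. Despite the importance of SM quality, in-depth studies of SM metrics for quality evaluation or fault prediction have not been conducted; only elemental metrics, such as counts of states or transitions, have been presented instead. Many SM generation approaches have been proposed, but the quality of the generated SMs has not been validated. One reason is the lack of quantitative measures of SM quality.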
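Several elemental metrics of size and complexity have been proposed. However, using these metrics to evaluate understandability is problematic, because an understandable SM need not be small or simple. The behavior of any system can be described by a SM with a single state. Such a SM is extremely small and simple, yet it can be harder to understand, because the single state captures every situation of the system. For example, the behavior of the bounded stack container can be represented by a SM with only one state. That SM may have as many transitions as the stack has methods, e.g., two transitions for push() and pop(). Such a SM, however, is no different from a structural model such as a class diagram.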
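Cohesion and coupling are known to be significant factors influencing understandability. Cohesion measures the degree to which the elements of a module belong together. In a strongly cohesive module, all elements relate to a single function, which makes the module easier to understand. Coupling, on the other hand, indicates the relationship between the measured module and other modules. High coupling between two modules tends to make either of them harder to understand. By contrast, a module with low coupling is self-contained and is therefore easier to maintain and understand (the so-called KISS principle: "keep it simple, stupid").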
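Our hypothesis is that, in an understandable SM, each state distinctly represents a single situation. In other words, the important factor for understandability is the simplicity of the states, not of the SM. We therefore believe that states should be highly cohesive and as loosely coupled as possible. We also believe that a SM is understandable if its states are understandable.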
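Based on this hypothesis, we propose an understandability metric called the State machine Understandability Metric (SUM). We first define cohesion and coupling metrics. The cohesion metric measures the consistency between a state and its transitions. Specifically, it calculates the degree of relationship between the condition of the state and the constraints of the associated transitions. The coupling metric measures the independence between states. Specifically, it analyzes the similarity of the situations captured by states and counts the number of behaviorally dependent states. Finally, SUM is defined by combining the cohesion and coupling metrics.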
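To validate the efficacy of the proposed SUM, we conducted an experiment. We prepared five differently configured SMs for each of five systems, 25 SMs in total. Ten question templates based on the major attributes of SMs were then prepared to measure the actual understandability of the SMs objectively. For each of the five systems (note: not for each SM), a set of ten concrete questions was constructed from the question templates. To measure actual understandability, 25 fourth-year undergraduates and 15 graduate students took part in the experiment. The understandability of each SM was measured with the questions and a group of eight participants, with 40 participants in total. Two understandability indicators, understandability efficiency (UEff) and understandability correctness (UCor), were adopted for the quantitative measurement.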
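To validate the efficacy of SUM, we analyzed the correlation between the measured SUM values and the UEff and UCor measured for the 25 SMs, using Pearson's correlation coefficient. The results showed meaningful relationships between SUM and UEff and between SUM and UCor (p-values of approximately 0.003 and 0.027, respectively).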
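For metrics to be useful, they should also be able to evaluate the understandability of SMs for the same system. We therefore performed a consistency analysis to validate that the metric consistently evaluates the UEff or UCor of the SMs for the same system. The results indicated that SUM consistently has positive relationships with UEff and UCor in four and five systems, respectively.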
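The precondition of an event is placed to the left of the slash "/", and the postcondition to its right. The same notation applies to the transitions of SMs. The method entertime(T: int) executes correctly when Cook = off and the parameter T satisfies T > 0. After entertime completes, the value of lefttime must equal T. It follows that lefttime is greater than zero once entertime has run. In this case, variables not specified in the postcondition keep their values.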
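The contract for entertime can be illustrated with a minimal Python sketch. Only `cook`, `lefttime`, the parameter T, and the two conditions come from the text; the `Oven` class, the `door` field, and the use of an exception for a violated precondition are assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Oven:
    """Hypothetical oven state; only `cook` and `lefttime` appear in the text."""
    cook: str = "off"      # "on" or "off"
    lefttime: int = 0      # remaining cooking time
    door: str = "closed"   # assumed extra variable, not mentioned by entertime


def entertime(oven: Oven, t: int) -> Oven:
    # Precondition (left of the slash): Cook = off and T > 0.
    if not (oven.cook == "off" and t > 0):
        raise ValueError("precondition violated: need cook == 'off' and t > 0")
    # Postcondition (right of the slash): lefttime = T.
    after = replace(oven, lefttime=t)
    assert after.lefttime == t
    # Variables not named in the postcondition keep their previous values.
    assert (after.cook, after.door) == (oven.cook, oven.door)
    return after
```

Calling `entertime(Oven(), 30)` returns a state whose `lefttime` is 30 while `cook` and `door` are unchanged; calling it while `cook` is "on" raises `ValueError`.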
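Understandability can be defined as the degree to which the purpose of an object, component, or system is clear to the evaluator. Similarly, the understandability of a SM can be defined as a quality factor describing how quickly and correctly users can understand the behavior of the system from the SM. The dynamic behavior of a system is a set of executable event sequences. If the understandability of a SM is high, we can easily and accurately determine not only the sequence of events that has occurred but also the events that can occur next in a specific state.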
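It is easy to assume that smaller and less complex SMs are more understandable. However, this study holds that the cohesion and coupling of SMs, rather than merely their size or complexity, must be considered to obtain highly understandable SMs. For example, highly cohesive and loosely coupled SMs can be more understandable than SMs that are merely smaller and less complex. Moreover, SUM, which considers both cohesion and coupling, can be a major indicator of SM understandability because cohesion and coupling in SMs are complementary. In the next two sections, we give experimental results for this contention.
We defined two hypotheses regarding SUM. The first concerns the relationship between understandability efficiency (UEff) and SUM. It rests on the assumption that more highly cohesive and less coupled SMs can be understood faster. Suppose that two or more groups of subjects are assigned SMs with different SUM values and use those SMs to understand the behavior of the system, that is, to answer the given questions. We investigate whether subjects given SMs with higher SUM answer the questions correctly faster than subjects given SMs with lower SUM. In the second hypothesis, we investigate whether SUM also reflects UCor. In the same assignment scenario, if our assumption is correct, subjects assigned SMs with higher SUM will also give more correct answers than those assigned SMs with lower SUM. That is, the more highly cohesive and less coupled the SMs, the more correctly subjects should understand the systems from them.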
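The subjects were 25 fourth-year undergraduates and 15 graduate students in the Department of Computer Science Engineering at Pusan National University. The experiment was carried out in 2010. All 40 participants had completed more than one software engineering course. The courses covered the background needed to understand and analyze SMs, such as state-based models, UML, and formal languages such as OCL. To strengthen their understanding of SMs, the participants did exercises in constructing SMs from contract-based specifications and in generating test cases from SMs. In addition, to minimize deviation among participants, they practiced solving sample SM understandability questions following guidelines for answering the questions. For each system, the experiment was designed as one factor with more than two treatments using a completely randomized design: each subject received only one SM per system and was assigned to it at random. We had five different SMs for each of five different systems, so 25 SMs were evaluated by the 40 subjects. Ten questions were prepared for each SM, and the participants answered them. For each system, eight subjects were assigned to each SM; in other words, each SM was evaluated eight times, giving 200 evaluations in total. Each subject was given five SMs, one selected at random for each of the five systems, and received them in random order. Thus the same SM was examined by eight different subjects in different iterations. The solving time was limited to 15 minutes per SM. Participants who answered all the questions for a SM within the 15 minutes were asked to record their time and submit. The experiment took 90 minutes in total: 15 minutes of preparation and 75 minutes for the five iterations.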
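The method used is one widely applied to the analysis of continuous variables. To test the linear relationship between two continuous variables, Pearson's correlation analysis can be used as a parametric analysis under the assumption that at least one of the variables is normally distributed. Pearson's correlation coefficient r ranges from -1 to 1; a positive value indicates a positive correlation, and a negative value indicates a negative correlation between the variables. In addition, we analyzed partial correlations to examine the genuine relationship between SUM and understandability once the effect of the systems is removed. If r is close to 1 or -1, the relationship between the two variables is close to a straight line. If r is close to zero, the relationship between the two variables is not linear.
Significance (expressed as a p-value) can be derived from a coefficient. For example, with 25 data points, a coefficient of 0.396 corresponds to a p-value of 0.05. The higher the coefficient, the lower the p-value. In this study we used a significance level of 0.05, the level in general use, which means that the probability of an insignificant result is less than 5%. That is, if p is less than 0.05, we can statistically confirm the metric as a meaningful indicator for evaluating the understandability of SMs.
Like the relationship between SUM and UEff, the relationship between SUM and UCor was found to be positive and significant. Across the 25 SMs, the linear relationship between the SUM values and the measured UCor values was statistically significant (p = 0.027) at the 0.05 level (p ≤ 0.05).
The Pearson correlation results do not take the effect of the SMs' systems into account. A proper test of the bivariate correlation requires that the contextual factor, the system, be partialled out of the analysis. The independent variable is SUM, and the dependent variables are UEff and UCor; pr denotes the partial correlation coefficient. The partial correlation between SUM and UEff was positive and significant, with a coefficient of 0.833 (p = 0). The partial correlation with UCor was also significant (pr = 0.566, p = 0.004). In summary, all the Pearson and partial correlation coefficients were significant. We can therefore reject the two null hypotheses, H1,0 and H2,0. Moreover, the results show that SUM is a significant factor positively correlated with understandability. We therefore accept the alternative hypothesis (H1,1): SMs with higher SUM values significantly improve the efficiency, and in turn the correctness, of subjects' understanding.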
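Removing the effect of the system before correlating can be sketched in Python as follows. This is only an illustration under an assumption: it treats the system as a categorical factor and removes it by subtracting each system's mean from both variables, which is one standard way to partial out a grouping factor and is not confirmed to be the exact procedure used in the study. The function names and the sample numbers are hypothetical.

```python
import math
from collections import defaultdict


def center_within(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def partial_r(xs, ys, groups):
    """Correlation of xs and ys after removing group (system) means."""
    rx = center_within(xs, groups)
    ry = center_within(ys, groups)
    sxy = sum(a * b for a, b in zip(rx, ry))
    return sxy / math.sqrt(sum(a * a for a in rx) * sum(b * b for b in ry))


# Hypothetical data: within each system the two variables rise together,
# although the systems themselves sit at very different levels.
systems = ["A", "A", "B", "B"]
sum_values = [1.0, 2.0, 10.0, 11.0]
ueff_values = [5.0, 6.0, 0.0, 1.0]
print(partial_r(sum_values, ueff_values, systems))
```

The sample data show why the step matters: a plain correlation over the four points is negative because the two systems differ in level, while the relationship inside each system is positive.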
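A state machine (SM) is a dynamic behavioral model widely used in various fields. Understandability is an important factor of stability. For behavioral models, understandability plays a vital role in effective and correct communication. However, research on evaluating the understandability of SMs has been neglected, and no convincing results have been reported. In this paper, we therefore proposed an understandability metric based on cohesion and coupling, called the State machine Understandability Metric (SUM), and conducted an experiment to confirm its usefulness.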
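To validate the efficacy and generality of SUM, we performed correlation and consistency analyses. For these analyses, we prepared five different SMs from each of five formal specifications, 25 SMs in total. Their understandability (understandability efficiency, UEff, and understandability correctness, UCor) was measured with 40 participants. In the results, the p-values between SUM and the two understandability measures were 0.003 and 0.027, respectively. In addition, SUM-UEff and SUM-UCor showed positive relationships in four and five systems, respectively. These results confirm that SUM can be a useful indicator of SM understandability. We also found that simplicity does not positively affect the understandability of SMs. The experimental results indicate that the more systems there are, the higher the correctness and understandability of the SMs.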
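We acknowledge that the experimental results are not yet conclusive. We therefore plan to continue this research with more systems, SMs, and participants, and through further experiments to work toward highly understandable SMs. A refactoring method that transforms SMs with low understandability into highly understandable SMs is also under study.
Appendix 2: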
A metric towards evaluating understandability of state machines: An empirical study
Understandability (also called comprehensibility or appropriateness recognizability) is considered one of the important quality factors for models. In ISO 25010, understandability is classified as an attribute of usability. It is also considered a crucial factor in the maintenance aspect. That is, understandable models can support maintenance activities to analyze, modify, and extend a system for correction, adaptation, and perfection. It is well known that in the software development process, 50% of maintenance tasks and 60% of the overall process are consumed trying to understand the software. This is because misunderstanding can be a major cause of implementation errors. The misunderstanding of interface specifications leads to communication errors, and results in implementation errors such as missing functions and malfunctions. Endres found that of the errors he analyzed, 46% involved a misunderstanding.
A state machine (SM) is a popular behavioral model used to describe the dynamic behavior of a system, a component, or an object. SMs are utilized as a communication tool between developers such as designers, coders, managers, and testers. The behavior of a SM is represented by acceptable event sequences, actions corresponding to the events, and state changes according to the events. The model is also utilized in various software engineering fields such as formal validation, automatic test data generation, and reverse engineering.
The behavior of a single system can be described by multiple forms of SMs. In other words, states and transitions can be configured differently even if the SMs specify the same behavior. For example, the behavior of a bounded stack container can be described with two states (empty and nonempty). On the other hand, a SM with three states (empty, partially empty, and full) can be used by splitting the nonempty state into partially empty and full states.
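The two configurations of the bounded stack can be compared with a minimal Python sketch. It assumes a capacity of 2 and models each SM as a step function over a state name and the current size; the function names, the label `partial` (for the partially empty state), and the capacity are hypothetical and chosen only for illustration.

```python
from itertools import product

CAPACITY = 2  # assumed bound of the stack


def step_two(state, size, event):
    """Two states (empty, nonempty): fullness is hidden in a guard on push."""
    if event == "push":
        if state == "nonempty" and size == CAPACITY:
            return None  # push is rejected when the stack is full
        return "nonempty", size + 1
    if event == "pop":
        if state == "empty":
            return None  # pop is rejected on an empty stack
        size -= 1
        return ("empty" if size == 0 else "nonempty"), size
    return None


def step_three(state, size, event):
    """Three states (empty, partial, full): fullness is an explicit state."""
    if event == "push":
        if state == "full":
            return None
        size += 1
        return ("full" if size == CAPACITY else "partial"), size
    if event == "pop":
        if state == "empty":
            return None
        size -= 1
        return ("empty" if size == 0 else "partial"), size
    return None


def accepts(step, events):
    """Return True if the SM accepts the whole event sequence."""
    state, size = "empty", 0
    for event in events:
        result = step(state, size, event)
        if result is None:
            return False
        state, size = result
    return True


def equivalent(max_len=6):
    """Compare both SMs on every push/pop sequence up to max_len events."""
    return all(
        accepts(step_two, seq) == accepts(step_three, seq)
        for n in range(max_len + 1)
        for seq in product(("push", "pop"), repeat=n)
    )
```

Both step functions accept exactly the same event sequences, yet only the three-state version makes the "push is not allowed when full" constraint visible as a state.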
Differently configured SMs can have different understandabilities, even if the behaviors they describe are equivalent. For example, the two SMs for the bounded stack container above may have different understandabilities. If the former SM that has two states is used for development, the constraint for the full state can be overlooked because the full state is not explicitly presented in the SM. To improve the understandability of models, a large number of design refactoring rules and design patterns have been proposed and developed.
A quantitative measure for understandability is the first step in the construction and use of highly understandable SMs. To the best of our knowledge, no studies on metrics aimed at evaluating the understandability of SMs have ever been reported. Despite the importance of SM quality, in-depth studies of SM metrics for quality evaluation or fault prediction have not been conducted; elemental metrics such as counting the number of states or transitions are presented instead. Many SM generation approaches have in fact been proposed, but the quality of the generated SMs is not validated. One of the reasons can be attributed to the absence of quantitative measures for SM qualities.
Several elemental metrics of size and complexity have been proposed for SMs. However, actually using these metrics to evaluate understandability is burdensome because smallness or simplicity is not necessary for understandability of SMs. The behavior of any system can be described by a SM with a single state. That SM is extremely small and simple. However, this can be more difficult to understand, because the single state captures all situations associated with the system. For example, the behavior of the bounded stack container can be represented as a SM with only one state. The SM may have as many transitions as the number of methods of the stack, e.g., two transitions for push() and pop(). However, the SM does not differ from structural models such as class diagrams.
Cohesion and coupling are known as significantly influential factors in understandability. Cohesion is a measure of the degree to which the elements of a module belong together. In a strongly cohesive module, all elements are related to a single function. Such cohesive modules can be easier to understand. Coupling, on the other hand, is an indicator of the relationship between the measured module and others. High coupling between two modules can make it harder to understand either of them. By contrast, low coupling means that the module is self-contained. Thus, the module can be maintainable as well as understandable (the so-called KISS principle: "keep it simple, stupid!").
Our hypothesis is that states should distinctly show single situations for understandable SMs. In other words, the important factor for understandability is simplicity of states not the SM. We therefore believe that states should be highly cohesive and less coupled to be understandable. We also believe that the SMs can be understandable if their states are understandable.
We propose an understandability metric, called the State machine Understandability Metric (SUM), which is based on the above hypothesis. First, let us define cohesion and coupling metrics. The cohesion metric is used to measure consistency among the state and transitions. Specifically, it calculates the degree of relationship between the condition of state and the constraints of the associated transitions. The coupling metric is used to measure independence between states. Specifically, our coupling metric analyzes similarity of situations captured by states, and counts the number of behavioral dependent states. Finally, SUM is defined by mixing the cohesion and the coupling metrics.
To validate the efficacy of the proposed SUM, we conducted an experiment. In the experiment, we prepared five differently configured SMs for each of five systems, yielding 25 SMs in total. Ten question templates using major attributes of SMs were then prepared to objectively measure the actual understandabilities of the SMs. For the five systems (note: not for the SMs), individual sets of ten concrete questions were constructed from the question templates. To measure the actual understandability, 25 senior and 15 graduate students participated in the experiment. The understandabilities of each SM were then measured using the questions and groups of eight participants, yielding 40 participants in total. Two understandability indicators, understandability efficiency (UEff) and understandability correctness (UCor), were adopted for quantitative measurements.
To validate the efficacy of SUM, correlation analysis between the measured SUM values and the measured UEff and UCor of the 25 SMs was performed. We used Pearson's correlation coefficient as the analysis tool. The results showed that there are meaningful relationships between SUM and UEff/UCor (p-value = 0.003 and 0.027, respectively).
For metrics to be useful, they should also be able to evaluate the understandability of SMs for the same system. Therefore, we performed a consistency analysis to validate that the metrics can consistently evaluate the UEff or UCor of the SMs for the same system. The results indicated that SUM consistently has positive relationships with the UEff and UCor of four and five systems, respectively.
The remainder of this paper is organized as follows: Section 2 presents the definition of SM, a conceptual overview of understandability, and an outline of existing metrics. Section 3 defines SUM after defining the metrics of state cohesion and state coupling. Section 4 gives details of our experimental settings. Section 5 describes the results of our correlation and consistency analyses. Finally, Section 6 discusses threats to validity and Section 7 draws conclusions and proposes future work.
Similar to cohesion, the coupling of a SM can be determined from the couplings between its states. The coupling of a state can be determined from the number of other states sharing the same preceding and following states. If there are many other states sharing the same preceding and following states, the coupling of the state is strong. A state that is tightly coupled with other states is difficult to understand because additional understanding of behaviors in similar states is involved. Consequently, tightly coupled states make a SM hard to understand.
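The counting rule in this paragraph can be illustrated with a minimal Python sketch. It assumes that "sharing the same preceding and following states" means having identical sets of predecessor and successor states, which is one reading of the text and not necessarily the exact definition behind the paper's coupling metric. The function names and the example transitions (a three-state bounded stack) are hypothetical.

```python
from collections import defaultdict


def neighbours(transitions):
    """Collect the preceding and following states of every state."""
    pred, succ = defaultdict(set), defaultdict(set)
    for src, _event, dst in transitions:
        succ[src].add(dst)
        pred[dst].add(src)
    return pred, succ


def coupled_state_counts(transitions):
    """For each state, count the other states with the same neighbours."""
    pred, succ = neighbours(transitions)
    states = {s for src, _event, dst in transitions for s in (src, dst)}
    return {
        s: sum(
            1
            for other in states
            if other != s and pred[other] == pred[s] and succ[other] == succ[s]
        )
        for s in states
    }


# Hypothetical three-state SM for a bounded stack.
STACK_SM = [
    ("empty", "push", "partial"),
    ("partial", "push", "partial"),
    ("partial", "push", "full"),
    ("full", "pop", "partial"),
    ("partial", "pop", "partial"),
    ("partial", "pop", "empty"),
]
print(coupled_state_counts(STACK_SM))
```

Under this reading, `empty` and `full` each count one coupled state (each other), because both are entered from and left to `partial` only, while `partial` counts none.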
It shows a comparison of the proposed metrics to existing size and complexity metrics on the SMs for Oven. It is easy to think that smaller and less complex SMs such as Oven3 are more understandable. However, this study insists that cohesion and coupling of SMs have to be considered to obtain high understandability of SMs rather than size or complexity. For example, highly cohesive (high SMCOH) and lowly coupled (low SMCOUP) SMs such as Oven1 can be more understandable than just smaller and less complex SMs such as Oven3. Moreover, SUM which considers both cohesion and coupling can be a major indicator of SM understandability because cohesion and coupling in SMs are complementary. In the next two sections, we give experiment results for this contention.
We defined two hypotheses with regard to SUM. The first hypothesis concerns the relationship between UEff and SUM. This hypothesis is based on the assumption that the more highly cohesive and lower coupled SMs can be understood faster. Consider that two or more subject groups are assigned different SMs that have different SUM values and they understand the behavior of the system using the exposed SMs, i.e., they answer the given questions. We investigate whether the subjects in the groups exposed to the SMs having the higher SUM can correctly answer the question faster than those exposed to the SMs having the lower SUM. In the second hypothesis, we investigate whether the SUM can also represent UCor. Using the SM assigning example presented above, if our assumption is correct, the subjects assigned to the SMs with the higher SUM will also get more correct answers than those assigned to the SMs with the lower SUM. That is, if SMs are more highly cohesive and lower coupled, subjects should more correctly understand the systems using the SMs.
The human subjects who participated were 25 fourth-year and 15 graduate students in the Department of Computer Science Engineering at Pusan National University. The experiment was performed in May 2010. The 40 participants had already completed more than one software engineering course. The courses taught them base knowledge such as state-based models, UML, and formal languages such as OCL to understand and analyze SMs. To help them strengthen their understanding of SMs, they performed exercises to construct SMs from contract-based specifications and to generate test cases using SMs. Additionally, to minimize deviation between the participants, an exercise was given to solve sample SM understandability questions with guidelines for answering the questions. For each system, the experiment was designed as one factor with more than two treatments using a completely randomized design [48], i.e., each subject received only one SM per system and they were assigned randomly to each SM. In this experiment, we had five SMs for each of five different systems; therefore, 25 SMs were evaluated with 40 subjects. For each SM, 10 questions were prepared and participants answered the questions. For each system, eight subjects were assigned to each SM. In other words, each SM was evaluated eight times; in total we had 200 evaluations. Five SMs randomly selected for the five different systems were given to each subject. The SMs were given to subjects in random order. In other words, the same SM was examined by eight different subjects in different iterations.
The solving time was limited to 15 min for each SM. We asked that participants who solved all the questions pertaining to a SM within the 15 min record the time and submit. The experiment took a total of 90 min including 15 min for preparation and 75 min for the five iterations.
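The assignment scheme described above (40 subjects, five systems, five SMs per system, eight subjects per SM, random presentation order) can be sketched in Python. This is an illustrative reconstruction, not the authors' actual procedure; the function name, the seed, and every system name other than Oven are hypothetical.

```python
import random


def assign(subjects, systems, sms_per_system=5, seed=0):
    """Give each subject one randomly chosen SM per system, in random order."""
    rng = random.Random(seed)
    group_size = len(subjects) // sms_per_system  # 40 // 5 = 8 subjects per SM
    plan = {subject: [] for subject in subjects}
    for system in systems:
        shuffled = subjects[:]
        rng.shuffle(shuffled)
        # Consecutive blocks of eight shuffled subjects share the same SM.
        for position, subject in enumerate(shuffled):
            plan[subject].append((system, position // group_size))
    for subject in plan:
        rng.shuffle(plan[subject])  # random order of the five iterations
    return plan


subjects = list(range(40))
systems = ["Oven", "S2", "S3", "S4", "S5"]  # only Oven is named in the text
plan = assign(subjects, systems)
print(sum(len(sms) for sms in plan.values()))  # total number of evaluations
```

Each of the 25 (system, SM) pairs ends up with eight subjects, which gives the 200 evaluations reported, and five iterations of 15 minutes account for the 75 minutes of solving time.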
The method utilized is one that is widely used to analyze the relationship between continuous variables [50]. To test the linear relationship between two continuous variables, Pearson's correlation analysis can be considered for parametric analysis under the assumption that at least one of the variables is normally distributed. Pearson's correlation coefficient r can take any value between -1 and 1; a positive value signifies a positive correlation, while a negative value signifies a negative correlation between the variables. In addition, we analyzed the partial correlations to examine the genuine relationships between SUM and understandability when the effects of systems have been removed. The relationship between the two variables is close to a straight line if r is close to 1 or -1. On the other hand, if r has a value close to zero, the relationship between the two variables is not linear.
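Pearson's r and its significance test can be sketched in a few lines of Python. The sketch uses the standard test statistic t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom; the function names are hypothetical, and the critical value quoted in the comment (about 2.069 for 23 degrees of freedom, two-sided 5% level) is taken from standard t tables, not from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for the null hypothesis of zero correlation (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# With 25 data points, a coefficient of 0.396 sits almost exactly on the
# two-sided 5% critical value of t with 23 degrees of freedom (about 2.069).
print(round(t_statistic(0.396, 25), 3))
```

Larger absolute values of r give larger t and therefore smaller p-values, which is why a coefficient above roughly 0.396 is significant at the 0.05 level when 25 SMs are analyzed.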
Significance (expressed as a p-value) can be derived from a coefficient. For example, the p-value is 0.05 if the coefficient value is 0.396 when the number of data points is 25. The p-value is low if the coefficient value is high. In this research, we used significance level α = 0.05, which is the level generally used and which means that the probability of an insignificant result is less than 5%. That is, we can statistically confirm the metric as a meaningful indicator to evaluate the understandability of SMs if p is less than 0.05.
Similar to the relationship between SUM and UEff, the relationship between SUM and UCor was also found to be positive and significant. The second row in Table 10 shows the results of Pearson's correlation analysis between SUM and UCor. As shown in the table, in the 25 SMs the linear relationship between SUM values and measured UCor values with the subjects was statistically significant (p = 0.027) at a level of 0.05 (p ≤ 0.05).
The Pearson's correlation analysis results do not take into account the effects of the SMs' systems. Appropriate tests of the bivariate correlation require that the contextual factor, i.e., system, be partialled from the analysis. Table 11 shows the partial correlations when