Identification of refactoring opportunities for source code based on class association relationships
Source journal: Journal of Central South University (English Edition), 2020, Issue 12
Authors: LIU Wei, YANG Na, HUANG Xin-di, HU Wei, HU Zhi-gang
Pages: 3768-3778
Key words:identification of refactoring opportunities; abstract syntax tree; class association relationships; common association classes; source code
Abstract: To deal with the complex association relationships between classes in an object-oriented software system, a novel approach for identifying refactoring opportunities is proposed. The approach can be used to detect complex and duplicated many-to-many association relationships in source code, and to provide guidance for further refactoring. In the approach, source code is first transformed into an abstract syntax tree from which all data members of each class are extracted; each class is then characterized by the set of association classes to which its data members belong. Next, common association classes are obtained by pairwise comparison of the association classes sets. Finally, subject to a pre-defined threshold, all sets of classes that are candidates for refactoring, together with their common association classes, are saved and exported. The approach is tested on 4 projects. The results show that the precision is over 96% when the threshold is 3, and 100% when the threshold is 4. Meanwhile, the approach has good execution efficiency: the execution time for a project with more than 500 classes is less than 4 s, which also indicates that it can be applied to projects of different scales to identify their refactoring opportunities effectively.
Cite this article as: LIU Wei, YANG Na, HUANG Xin-di, HU Wei, HU Zhi-gang. Identification of refactoring opportunities for source code based on class association relationships [J]. Journal of Central South University, 2020, 27(12): 3768-3778. DOI: https://doi.org/10.1007/s11771-020-4576-7.
J. Cent. South Univ. (2020) 27: 3768-3778
DOI: https://doi.org/10.1007/s11771-020-4576-7
LIU Wei(刘伟)1, 2, YANG Na(杨娜)2, HUANG Xin-di(黄辛迪)1, HU Wei(胡为)1, HU Zhi-gang(胡志刚)2
1. School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China;
2. School of Computer Science and Engineering, Central South University, Changsha 410083, China
© Central South University Press and Springer-Verlag GmbH Germany, part of Springer Nature 2020
1 Introduction
Refactoring refers to improving the quality of source code without changing its external behavior, so that the costs of software development and maintenance can be effectively reduced. FOWLER [1] proposed 72 commonly used code refactoring methods, and KERIEVSKY [2] then proposed 27 pattern-directed refactoring methods that combine refactoring with design patterns. Refactoring can improve the code quality of software, including readability, reusability, maintainability [3, 4], and even security [5]. Refactoring usually includes two parts: identification of refactoring opportunities and code refactoring. The former decides, by some pre-defined judgment rules, whether or not the code needs to be refactored, while the latter applies some means to optimize and refactor the code, thus improving the quality of the program.
In recent years, refactoring has become an important research field in software engineering as a series of papers have been published in computer related journals and conferences covering the identification of refactoring opportunities and automatic refactoring realization. For the identification of refactoring opportunities, TOURWE et al [6] proposed the logic metaprogramming (LMP) to detect bad smells of code through a series of pre-defined logic rules. LIU et al [7] studied the identification of generalization refactoring opportunities and developed a tool called GenReferee according to conceptual relationship, implementation similarity, structural correspondence, and inheritance hierarchies. HIGO et al [8, 9] introduced a metric-based approach with a set of metrics calculated by a self-developed tool called Aries to identify refactoring opportunities and to merge code clones or duplicated fragments in a Java software system, and furthermore they proposed program dependence graph (PDG). DALLAL [10] used object-oriented quality metrics in modelling to predict extract subclasses refactoring opportunities with 25 metrics related to project size, cohesion and coupling. TSANTALIS et al [11-14] carried out substantial research on refactoring opportunities and published several high-level papers successively. For instance, they proposed a methodology for the identification of move method refactoring opportunities to solve many common Feature Envy bad smells, which employs the notion of the distance between system entities (properties/ methods) and classes to identify behavior-related refactoring opportunities [11]. They also proposed a technique introducing polymorphism that extracts refactoring suggestions as a solution to state-checking problems [12]. 
In addition, the identification of extract method refactoring opportunities is used for large and complex methods by extracting complete computation slice and object state slice from source code; however refactoring opportunities that could possibly cause a change in program behavior after slicing are excluded based on a set of rules [13]. FOKAEFS et al [14] introduced a clustering algorithm based on the Jaccard distance between class members to identify refactoring opportunities during research on identification of extract class refactorings in object-oriented systems. BAVOTA et al [15, 16] proposed an algorithm through game theory and another extract class refactoring opportunity identification algorithm with measures related to metric structure and semantic cohesion based on a series of metrics such as call-based dependency between methods (CDM), concept similarity between methods (CSM). KAYA et al [17] proposed a technique to decompose long methods into smaller, more comprehensible and readable ones, and this technique seeks refactoring opportunities based on variable declarations and uses confining fully extractable code regions without any user intervention. PAPPALARDO et al [18] defined a novel metric that is intended to show how closely connected the elements of a class are. This metric characterizes the strength of the coupling between methods of a class, based on invocations and the size of the parameters involved as well as attribute accesses. They used the computed metric and the assessment of system-wide relationships between classes to suggest Extract Class refactoring opportunities. DALLAL [19] introduced a measure and a corresponding model to precisely predict whether a class includes methods in need of move method refactoring (MMR). The measure is applicable once a class has entered the early development stages without waiting for other classes to be developed. 
LIU et al [20] proposed an approach to identify renaming opportunities by expanding conducted renamings. Once a rename refactoring is conducted manually or with tool support, the proposed approach recommends renaming closely related software entities whose names are similar to that of the renamed entity. CHARALAMPIDOU et al [21] introduced an approach (accompanied by a tool) that aims at identifying source code chunks that collaborate to provide a specific functionality, and proposed their extraction as separate methods. WANG et al [22] proposed a system-level multiple refactoring algorithm, which can identify the move method, move field, and extract class refactoring opportunities automatically according to the principle of “high cohesion and low coupling”. TERRA et al [23] proposed a recommendation approach that suggests MMRs using the static dependencies established by methods in possible target classes. In addition, they implemented the approach in a publicly available tool called JMove. Code smells are indicators for refactoring, so GUGGULOTHU et al [24] proposed a novel approach to suggest code smell order. They found relevant metrics for each code smell dataset with the help of a feature selection technique, analyzed the internal relation among the code smells with those relevant metrics, and suggested a code smell order for developers to save their effort in the refactoring stage. Recently, YOSHIDA et al [25] proposed a proactive clone recommendation system for Extract Method refactoring. Once the proposed system detects an Extract Method refactoring instance based on the analysis of code modifications, it recommends code clones of the refactored code as refactoring candidates. NYAMAWE et al [26] selected alternative refactoring solutions according to how they improve the traceability as well as source code design.
They used the entropy-based and traditional coupling and cohesion metrics respectively, applied some alternative refactoring solutions and measured their effect on the traceability and source code design. SHENEAMER [27] proposed a unique learning method that automatically extracts features from the detected code clones and trains models to advise developers on deciding which type needs to be refactored. He introduced a new method to convert refactoring clone type outliers into unknown clone set to improve classification results.
Although many methods for identifying refactoring opportunities exist, their low precision or long execution time can become a serious problem when the relationships between multiple classes must be considered at the same time, and the problem grows as class associations become more complex. The approach based on class associations proposed in this paper reduces not only duplicated code but also the coupling between classes, thus improving system scalability and maintainability. The approach first extracts the associations between classes, then analyzes the duplication in these associations to obtain the common association classes, and finally identifies extract class refactoring opportunities for further automatic refactoring. It is related to code-level duplication but studies the duplication of relationships between classes. The problem under study therefore has coarse granularity and relatively small time and space complexity, which results in short execution time and high efficiency.
2 Representation and extraction of class associations
Among multiple relationships between classes in object-oriented software systems, class association is the most common one. It is a structured relationship to represent the connections between two objects, also known as delegation. In the source code, if ClassA has a data member (also known as the attribute, domain, or field) belonging to ClassB, these two classes form an association which points from ClassA to ClassB while ClassA is referred to as a client class. As shown in Figure 1, ClassA possessing two object members objB and objC forms associations with ClassB and ClassC respectively, each of which is represented by a solid line with an arrow giving association directions in unified modelling language (UML).
According to the single responsibility principle (SRP) in object-oriented programming, an object should contain only a single responsibility, completely encapsulated in a class. The more responsibilities a class assumes, the less likely it is to be reused, because placing multiple responsibilities in one class is equivalent to coupling these responsibilities together. It is therefore important to separate and encapsulate them in different classes, so that a change in one responsibility does not affect the others. In other words, the granularity of the classes in the system should not be too large, and the function of each class should be relatively simple. However, the implementation of some complex functions requires the cooperation of a group of classes, so some “one-to-many” associations will appear in the system: a class needs to associate with many classes to implement a certain function. If this is not handled properly in the program, the system may be subject to the following problems.
1) Code duplication. Classes that are frequently associated often appear as a group, and the part of the code in a client class that creates their objects or calls their methods will often be repeated wherever a new client class needs to be created, in other words, wherever a new “one-to-many” association is added. This duplication in multiple places in the source code affects the maintainability and the performance of the program.
Figure 1 Sample of class association relationships
2) High coupling. In the scenario of “one-to-many” associations, the addition of a new client class will lead to multiple associations created with all of the related classes. Besides, more associations in the system will lead to more frequent mutual calls and higher coupling between classes, and further increase the difficulty in maintenance as changes in one class may affect a series of others.
3) Fine-grained reuse. When some fine-grained classes in the system frequently need to be associated and called, each of them has to be reused individually or associated one after another. It is therefore important to provide coarser-grained reuse that simplifies calls from the client side and supports more complex features.
A system with multiple “one-to-many” associations will evolve into complex “many-to-many” associations, which cause duplication and coupling in the system. These need to be identified and then reduced or resolved through refactoring so as to improve system maintainability.
2.1 Representation of class associations
The concept of association classes set is introduced to represent and store the relationships among the classes in the system.
Definition 1: Association classes set (ACS). The association classes set of a class is the set of classes associated with that class, that is, the set of classes to which its member objects belong. The associations between classes are directional. Since self-association may exist, the ACS of a class may include the class itself.
For example, in Figure 1, the association among ClassA, ClassB and ClassC is expressed as: ACS(ClassA)={ClassB, ClassC}; ClassB with no member object is expressed as ACS(ClassB)=Φ.
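As a minimal sketch of Definition 1, the ACS of each class can be held in a map from a class name to a set of class names. The dictionary `acs_map` and the helper `acs` below are illustrative names rather than part of the paper; the data mirrors the Figure 1 example.

```python
# Illustrative only: ACS values taken from the Figure 1 example.
acs_map = {
    "ClassA": {"ClassB", "ClassC"},  # ClassA holds the members objB and objC
    "ClassB": set(),                 # no member objects, so the ACS is empty
}


def acs(class_name: str) -> set[str]:
    """Return the association classes set of a class (empty if unknown)."""
    return acs_map.get(class_name, set())


print(sorted(acs("ClassA")))   # ['ClassB', 'ClassC']
print(acs("ClassB") == set())  # True
```

Using a set rather than a list reflects that only the presence of an association matters here, not its multiplicity.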
2.2 Extraction of class associations
This paper employs the Abstract Syntax Tree (AST) as an intermediate representation of the source code from which the relationships between classes are derived. The AST is then parsed to create a class association diagram. Java source code is taken as an example in the following demonstration using Eclipse. Each Java file is converted into an AST using Eclipse AST, a set of application programming interfaces (APIs) provided by Eclipse JDT [28] for accessing and manipulating source code. Eclipse AST, intended in particular for the creation and manipulation of ASTs, defines APIs for modifying, creating, reading, and deleting source code. Its design and implementation use the Factory Method Pattern and the Visitor Pattern, which help users to gain insight into the internal structure of code and to build and process ASTs.
Eclipse AST also provides a class named ASTNode, and each grammar structure in Java source code corresponds to one of its subclasses, whose names and meanings are mostly self-explanatory. For instance, class member variables are represented by the ASTNode subclass FieldDeclaration. During the analysis of member variables, the visibility, variable name, multiplicity, and default value are all ignored; only the variable type is kept, which is stored in the TYPE node in forms such as PrimitiveType, SimpleType and ArrayType. As different kinds of types require different processing, the basic principle is that an ArrayType is transformed to a SimpleType for analysis while a PrimitiveType is not considered. The following focuses on the analysis of SimpleType using the AST. Each AST is parsed individually by traversing its FieldDeclaration nodes to find all association classes, namely the classes of all member variables whose types are not PrimitiveTypes such as int, double and char. The association classes found for each class, held in a List object, form its ACS, and all ACSs are stored in a map with the class name as Key and the corresponding ACS as Value. Algorithm 1 for obtaining the ACS of all classes is shown in pseudo code.
As shown in Algorithm 1, line 1 declares an empty acsMap to store the names of all classes and their ACSs. From line 2, the files of source code are analyzed individually to form their corresponding ASTs where node FieldDeclaration of class member variables is then analyzed with their types extracted. If the TYPE node of the member variables satisfies certain conditions according to the pre-defined rules, the type will be stored in a set named acsList. Finally, the name of the class and its corresponding acsList are stored in acsMap as Key and Value, respectively. Line 22 returns the set acsMap that stores ACS of all classes.
Algorithm 1 Obtaining ACS of all classes
Clearly, the time complexity of this algorithm is O(nm), where n is the number of source code files, approximately equal to the number of classes, and m is the average number of member variables in each class. It is worth noting that the class names stored in the acsList are not necessarily the names of system classes, because some of the associated classes come from third-party libraries, such as classes in the JDK. This means that a system with 20 classes participating in associations may contain only 10 system classes, with the other 10 classes coming from external class libraries. For these third-party classes, no action is taken to analyze their source code, the relationships between these library classes, or their relationships with system classes. In other words, the ACS of each library class is assumed to be empty with regard to the source code under analysis. This may not be true, but since their ACSs do not belong to the current system, their refactoring is not considered.
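The extraction step described above can be sketched as follows. This is a simplified illustration in Python, not the paper's Algorithm 1: it assumes that the field types of each class have already been read from the FieldDeclaration nodes as plain strings, and the names `build_acs_map`, `element_type` and `PRIMITIVE_TYPES` are hypothetical.

```python
# Hypothetical stand-in for the AST step: field types are assumed to be
# already extracted from each class's FieldDeclaration nodes as strings.
PRIMITIVE_TYPES = {"byte", "short", "int", "long",
                   "float", "double", "char", "boolean"}


def element_type(type_name: str) -> str:
    """Reduce an array type such as 'ClassB[][]' to its element type."""
    while type_name.endswith("[]"):
        type_name = type_name[:-2]
    return type_name.strip()


def build_acs_map(fields_by_class: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each class name (Key) to its association classes set (Value)."""
    acs_map: dict[str, set[str]] = {}
    for class_name, field_types in fields_by_class.items():  # n classes
        acs = set()
        for field_type in field_types:                       # about m fields each
            base = element_type(field_type)
            if base not in PRIMITIVE_TYPES:                  # primitives are ignored
                acs.add(base)
        acs_map[class_name] = acs
    return acs_map


fields = {
    "ClassA": ["ClassB", "ClassC[]", "int"],
    "ClassB": ["double", "char"],
}
acs_map = build_acs_map(fields)
print(sorted(acs_map["ClassA"]))  # ['ClassB', 'ClassC']
print(acs_map["ClassB"])          # set()
```

The two nested loops correspond to the O(nm) cost stated above. Library types such as String are kept in the ACS, in line with the remark that associated classes may come from third-party libraries.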
3 Research on refactoring opportunities
The identification of refactoring opportunities based on our approach is to analyze the Map that stores the ACS of all classes. To do so, this paper introduces the concept of common association classes.
Definition 2: Common association classes (CAC). The common association classes of two or more classes are the set of classes with which all of them are associated. For two classes, it is the intersection of their ACSs. For example, the common association classes of ClassA and ClassB are expressed as CAC(ClassA, ClassB)=ACS(ClassA)∩ACS(ClassB).
Figure 2 Sample of Common Association Classes
In Figure 2, ClassE in the dashed box is a library class, possibly without source code, and hence its own associations are not analyzed. CAC(ClassA, ClassC)=ACS(ClassA)∩ACS(ClassC)={ClassB, ClassD, ClassE}∩{ClassB, ClassC, ClassD, ClassE}={ClassB, ClassD, ClassE}; CAC(ClassA, ClassB)=ACS(ClassA)∩ACS(ClassB)={ClassB, ClassD, ClassE}∩{ClassD}={ClassD}.
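The two intersections above can be checked with a few lines of Python. The ACS values are those given for Figure 2 in the text; the function name `cac` is an illustrative choice.

```python
# ACS values from the Figure 2 example in the text.
acs_map = {
    "ClassA": {"ClassB", "ClassD", "ClassE"},
    "ClassB": {"ClassD"},
    "ClassC": {"ClassB", "ClassC", "ClassD", "ClassE"},
}


def cac(a: str, b: str) -> set[str]:
    """CAC(a, b) = ACS(a) ∩ ACS(b)."""
    return acs_map[a] & acs_map[b]


print(sorted(cac("ClassA", "ClassC")))  # ['ClassB', 'ClassD', 'ClassE']
print(sorted(cac("ClassA", "ClassB")))  # ['ClassD']
```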
The CAC is obtained through pairwise analysis of the common associations between all system classes. During the comparison, if a set of common association classes appears for the first time, it is added as a new CAC with the associations recorded. If the identified set of common association classes already exists, that CAC is marked as associated with the new corresponding system class. In addition, the algorithm requires a pre-defined threshold that specifies the minimum size of a CAC for it to be reported, usually set to 3 or more. In other words, a group of system classes is a candidate for refactoring if its CAC contains at least that many classes. All the classes satisfying the threshold condition, together with their CAC, are stored in a newly created Map. As each group of classes that needs to be refactored has only one unique CAC, it is practical to use the names of these classes as Key with their corresponding CAC as Value. Algorithm 2 for obtaining the CAC is shown in pseudo code.
Algorithm 2 is explained as follows. First, the common association classes are obtained by two nested traversals of the Key-Value pairs in acsMap, comparing the ACS lists of any two classes, which are stored in acsListA and acsListB respectively. The time complexity of each comparison is O(k²), where k represents the total number of elements in acsListA and acsListB. Line 12 calculates the intersection of acsListA and acsListB with the result stored in cacList. Then, if the number of elements in cacList is greater than or equal to the threshold, the association relationship of the two classes under comparison is stored in resultMap, with the class names previously stored in tempList assigned as Key and their CAC assigned as Value. If the CAC of these two classes already exists in resultMap, the name of the system class is added to the corresponding Key of this CAC, which makes a Key containing more than two classes possible. Finally, resultMap is returned, holding the sets of classes that need to be refactored together with their CACs. It is evident that the time complexity of the algorithm is O(n²k²), where n is the number of Key-Value pairs in acsMap, that is, the total number of classes in the system. Some result examples of the identification of refactoring opportunities processed by the algorithm are shown in Table 1.
In Table 1, Classes’ name and CAC are drawn from the Key and Value of resultMap, respectively. Meanwhile, the size of CAC is the number of classes in a CAC set. Table 1 implies whether refactoring is needed given the associations between classes and the number of CAC.
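The pairwise comparison and threshold check can be sketched in Python as below. This is a hedged illustration rather than the paper's Algorithm 2: set intersection replaces the list comparison, results are grouped internally by CAC and returned as (class group, CAC) pairs, and the function name and the example class names are hypothetical.

```python
from itertools import combinations


def find_refactoring_candidates(acs_map, threshold=3):
    """Group classes that share a CAC of at least `threshold` classes.

    Returns a list of (client class names, common association classes)
    pairs, both given as sorted lists.
    """
    by_cac = {}  # frozenset(CAC) -> set of client classes sharing it
    for a, b in combinations(sorted(acs_map), 2):    # every unordered pair
        common = frozenset(acs_map[a] & acs_map[b])  # CAC(a, b)
        if len(common) >= threshold:
            # A new CAC creates an entry; a known CAC gains more classes.
            by_cac.setdefault(common, set()).update((a, b))
    return [(sorted(group), sorted(common)) for common, group in by_cac.items()]


# Hypothetical input: three view classes share the same three helpers.
acs_map = {
    "OrderView": {"Logger", "Cache", "Validator"},
    "InvoiceView": {"Logger", "Cache", "Validator", "Printer"},
    "ReportView": {"Logger", "Cache", "Validator"},
    "Helper": {"Logger"},
}
for group, common in find_refactoring_candidates(acs_map, threshold=3):
    print(group, "->", common)
```

With a threshold of 3, the three view classes are reported as one group sharing Cache, Logger and Validator; raising the threshold to 4 yields no candidates for this input.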
Algorithm 2 Obtaining CAC of all classes
Table 1 Result examples of identification of refactoring opportunities (threshold=3)
4 Experiments and results analysis
Four projects were selected for the experiment of identification of refactoring opportunities to evaluate the precision and performance of the approach. The results are analyzed in the following two aspects:
1) Correctness of the algorithm. Different thresholds are tried on these four projects under experiments to optimize the performance of the algorithm. The correctness is calculated through manual check based on whether there are false positives given different thresholds.
2) Execution efficiency. The time complexity of the entire algorithm is analyzed and evaluated through exploring the execution efficiency suggested by the relationships between project size and execution time.
This paper selects four real projects of different scales for the experiment, including two application projects and two open source projects. HappyChatroom is a real-time chat tool developed in the Java programming language. MyCircle is an Android-based mobile social application. JHotDraw is an open source Java framework for the rapid development of 2D graphical editors. JRefactory is an open source tool for refactoring Java source code. The basic information of these four projects is shown in Table 2.
In the experiment, the threshold is set to 3 and to 4, and the two settings yield different results. Table 3 shows the identification results of refactoring opportunities on JHotDraw, MyCircle and JRefactory with the threshold set to 4. HappyChatroom has 25 identified results, considerably more than the other three projects, so they are not listed in Table 3.
Precision is introduced to assess the quality of an algorithm via quantitative evaluation of the identification results. The precision is calculated as:
Precision=TP/(TP+FP) (1)
where TP (true positive) and FP (false positive) are the numbers of refactoring opportunities identified correctly and incorrectly, respectively. The numbers of TP and FP are determined by a thorough manual inspection of the results. Table 4 gives the statistics of the refactoring opportunities identified by our algorithm on the four experiment projects.
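Eq. (1) can be written as a small helper. The counts used below are hypothetical values chosen for illustration and are not taken from Table 4.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP), as in Eq. (1)."""
    if tp + fp == 0:
        raise ValueError("no identified refactoring opportunities")
    return tp / (tp + fp)


# Hypothetical counts for illustration; they are not taken from Table 4.
print(precision(tp=48, fp=2))  # 0.96
print(precision(tp=10, fp=0))  # 1.0
```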
Table 2 List of basic information of four projects to be tested
Table 3 Results of identification of refactoring opportunities (threshold=4)
Table 4 shows that the precision of our algorithm is very high: the identification results contain only two false positive cases when the threshold is 3 and none when the threshold is 4.
Table 4 Statistics from the identification results of refactoring opportunities
Through the comparison and analysis of the results of all four projects, two conclusions are drawn as follows:
1) Compared with the framework projects, the application projects have more complex many-to-many associations between classes, and therefore a relatively larger number of candidate refactoring opportunities. As shown in Table 4, regardless of whether the threshold is set at 3 or 4, HappyChatroom and MyCircle have more candidates for refactoring than JHotDraw and JRefactory.
2) Compared with the business layer classes, the presentation layer classes, also known as interface element classes, have more complex associations and account for a large number of the candidate refactoring opportunities. In-depth analysis of the results suggests that the complex interactions between presentation layer classes can be reduced through refactoring.
False Negative (FN) cases can only be identified by other tools or by manual review. Since no other relevant identification tool is available and manual review is very difficult for a large project, no False Negative case is counted in this work.
In addition, this paper records and analyzes the execution time of the identification algorithm on these four projects. The experiment is performed on a PC running Windows 10 with a dual-core 2.40 GHz processor and 8 GB of DDR2 RAM. For each project, the identification algorithm is executed 5 times, and the average time is calculated. The results are shown in Table 5.
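The measurement procedure of running the algorithm 5 times and averaging can be sketched as follows. The helper `average_runtime` and the placeholder `workload` are assumptions for illustration; the paper does not describe its timing code.

```python
import time


def average_runtime(func, runs: int = 5) -> float:
    """Run `func` `runs` times and return the mean wall-clock time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return sum(durations) / runs


# Placeholder workload standing in for the identification algorithm.
def workload():
    sum(i * i for i in range(100_000))


print(f"average over 5 runs: {average_runtime(workload):.4f} s")
```

Keeping the individual durations also makes it possible to check how stable the runs are, which is the property discussed alongside Table 5.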
Table 5 shows that the execution time increases with the system scale, but not significantly, and that for any particular project it remains stable across the five executions with only minor differences, so the algorithm presents high execution efficiency and good stability. The time complexity of the entire program is analyzed as follows. Combining Algorithms 1 and 2, the total time complexity is T(n)=O(nm)+O(n²k²), where n, an important measure of system size, is the number of source code files, usually slightly less than the number of classes (because of inner classes in programming languages such as Java); m is the average number of member variables per source code file, i.e., m=TotalofFields/n, where TotalofFields is the total number of member variables in the project; k is the average size of an association classes set, i.e., k=(Σ|ACS(c)|)/n with the sum taken over all classes c, so its value increases with more associated classes and more complicated associations. Substitution gives T=O(TotalofFields)+O((nk)²), which is determined overall by the total number of member variables, the number of source code files, and the size of the association classes sets. In Table 5, the execution time of JRefactory is the longest because its number of classes, that is, its system size, is larger than that of the other three. MyCircle has the second longest execution time: although the numbers of source files of MyCircle and JHotDraw are comparable, MyCircle has more attributes and more complicated class associations, with more refactoring opportunities identified. HappyChatroom has fewer source files than JHotDraw but more complicated class associations, so its execution time is still slightly longer than that of JHotDraw. The ordering of the actual execution times of the four projects is consistent with the analysis of time complexity.
Usually the ACS of a system class is not large, and some are even empty for classes with no member variables or with member variables that are all of PrimitiveType. Therefore, the algorithm shows good efficiency, as the execution time grows only slowly with the system size.
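To make the cost model concrete, the two terms O(nm) and O(n²k²) can be computed from per-file field counts and an ACS map. The sketch below assumes that k is the summed ACS size divided by n, following the description of k as an average; the function name and the input values are hypothetical.

```python
def complexity_terms(field_counts, acs_map):
    """Return (n, m, k, n*m, (n*k)**2) for the model O(nm) + O(n^2 k^2)."""
    n = len(field_counts)                              # number of source files
    total_fields = sum(field_counts.values())          # TotalofFields
    total_acs = sum(len(s) for s in acs_map.values())  # summed ACS sizes
    m = total_fields / n                               # average fields per file
    k = total_acs / n                                  # assumed: average ACS size
    return n, m, k, n * m, (n * k) ** 2


# Hypothetical project with three source files.
field_counts = {"A": 4, "B": 2, "C": 6}
acs_map = {"A": {"B", "C"}, "B": set(), "C": {"A"}}
print(complexity_terms(field_counts, acs_map))  # (3, 4.0, 1.0, 12.0, 9.0)
```

The first term equals TotalofFields and the second depends only on the summed ACS size, which is why projects with similar file counts can still differ in execution time.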
Table 5 Execution time of identification algorithm of refactoring opportunities
5 Conclusions
This paper proposes an approach for the identification of refactoring opportunities based on class association relationships, which can automatically identify the complex association relationships between classes in a system. The approach first transforms the source code into an abstract syntax tree, and then analyzes the member variables in it and extracts those satisfying pre-defined conditions to form the association classes set (ACS) of each class. Common association classes (CACs) are then extracted according to the set threshold, and finally the names of the CACs together with the names of their associated classes are exported to users as candidates for refactoring. These results provide guidance for further refactoring and help software developers to choose subsequent refactoring methods to improve code comprehensibility and maintainability.
The results of applying the approach to four projects show that it identifies refactoring opportunities in source code with high precision. Moreover, manual inspection found no false positives when the threshold is 4. The approach also shows good execution efficiency, as the execution time increases with the system scale but not significantly. The execution time for a project with more than 500 classes is less than 4 s on a PC with an ordinary configuration, so the approach is applicable to projects of different scales for the identification of refactoring opportunities.
In future work, based on the identification of refactoring opportunities, automatic refactoring will be studied, with algorithms designed and implemented. At the same time, design patterns such as the Facade Pattern and the Mediator Pattern [29] will be introduced to deal with complex relationships between classes and hence further improve the quality of the code.
Contributors
LIU Wei provided the concept and solution of the approach and wrote the first draft of the manuscript. YANG Na and HUANG Xin-di implemented the approach, collected the experimental projects and carried out the experiments. HU Wei conducted the literature review and checked the experimental results. HU Zhi-gang reviewed each draft of the manuscript, offered important advice and edited the manuscript. All authors replied to reviewers’ comments and revised the final version.
Conflict of interest
LIU Wei, YANG Na, HUANG Xin-di, HU Wei and HU Zhi-gang declare that they have no conflict of interest.
References
[1] FOWLER M. Refactoring: Improving the design of existing code [M]. Massachusetts: Addison-Wesley, 1999. DOI: 10.1007/3-540-45672-4_31.
[2] KERIEVSKY J. Refactoring to patterns [M]. Massachusetts: Addison-Wesley, 2004. DOI: 10.1007/978-1-4302-2728-1_15.
[3] KAUR S, SINGH P. How does object-oriented code refactoring influence software quality? Research landscape and challenges [J]. Journal of Systems and Software, 2019, 157: 110394. DOI: 10.1016/j.jss.2019.110394.
[4] FERNANDES E, CHAVEZ A, GARCIA A, FERREIRA I, CEDRIM D, SOUSA L, OIZUMI W. Refactoring effect on internal quality attributes: What haven’t they told you yet? [J]. Information and Software Technology, 2020, 126: 106347. DOI: 10.1016/j.infsof.2020.106347.
[5] MUMTAZ H, ALSHAYEB M, MAHMOOD S, NIAZI M. An empirical study to improve software security through the application of code refactoring [J]. Information and Software Technology, 2018, 96: 112-125. DOI: 10.1016/j.infsof.2017.11.010.
[6] TOURWE T, MENS T. Identifying refactoring opportunities using logic meta programming [C]// CSMR’03: Proceedings of the Seventh European Conference on Software Maintenance and Reengineering. 2003: 91-100. DOI: 10.1109/csmr.2003.1192416.
[7] LIU Hui, NIU Zhen-dong, MA Zhi-yi, SHAO Wei-zhong. Identification of generalization refactoring opportunities [J]. Automated Software Engineering, 2013, 20(1): 81-110. DOI: 10.1007/s10515-012-0100-0.
[8] HIGO Y, KUSUMOTO S, INOUE K. A metric-based approach to identifying refactoring opportunities for merging code clones in a Java software system [J]. Journal of Software Maintenance and Evolution: Research and Practice, 2008, 20(6): 435-461. DOI: 10.1002/smr.394.
[9] HOTTA K, HIGO Y, KUSUMOTO S. Identifying, tailoring, and suggesting form template method refactoring opportunities with program dependence graph [C]// CSMR’12: Proceedings of the 2012 16th European Conference on Software Maintenance and Reengineering. 2012: 53-62. DOI: 10.1109/csmr.2012.16.
[10] DALLAL J A. Constructing models for predicting extract subclass refactoring opportunities using object-oriented quality metrics [J]. Information and Software Technology, 2012, 54(10): 1125-1141. DOI: 10.1016/j.infsof.2012.04.004.
[11] TSANTALIS N, CHATZIGEORGIOU A. Identification of move method refactoring opportunities [J]. IEEE Transactions on Software Engineering, 2009, 35(3): 347-367. DOI: 10.1109/tse.2009.1.
[12] TSANTALIS N, CHATZIGEORGIOU A. Identification of refactoring opportunities introducing polymorphism [J]. Journal of Systems and Software, 2010, 83(3): 391-404. DOI: 10.1016/j.jss.2009.09.017.
[13] TSANTALIS N, CHATZIGEORGIOU A. Identification of extract method refactoring opportunities for the decomposition of methods [J]. Journal of Systems and Software, 2011, 84(10): 1757-1782. DOI: 10.1016/j.jss.2011.05.016.
[14] FOKAEFS M, TSANTALIS N, STROULIA E, CHATZIGEORGIOU A. Identification and application of extract class refactorings in object-oriented systems [J]. Journal of Systems and Software, 2012, 85(10): 2241-2260. DOI: 10.1016/j.jss.2012.04.013.
[15] BAVOTA G, OLIVETO R, DE LUCIA A, ANTONIOL G, GUEHENEUC Y G. Playing with refactoring: Identifying extract class opportunities through game theory [C]// ICSM ’10: Proceedings of the 2010 IEEE International Conference on Software Maintenance. 2010: 1-5. DOI: 10.1109/icsm.2010.5609739.
[16] BAVOTA G, DE LUCIA A, OLIVETO R. Identifying extract class refactoring opportunities using structural and semantic cohesion measures [J]. Journal of Systems and Software, 2011, 84(3): 397-414. DOI: 10.1016/j.jss.2010.11.918.
[17] KAYA M, FAWCETT J W. Identification of extract method refactoring opportunities through analysis of variable declarations and uses [J]. International Journal of Software Engineering and Knowledge Engineering, 2017, 27(1): 49-69. DOI: 10.1142/S0218194017500036.
[18] PAPPALARDO G, TRAMONTANA E. Suggesting extract class refactoring opportunities by measuring strength of method interactions [C]// IEEE 2013 20th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 2013: 105-110. DOI: 10.1109/apsec.2013.123.
[19] DALLAL J A. Predicting move method refactoring opportunities in object-oriented code [J]. Information and Software Technology, 2017, 92: 105-120. DOI: 10.1016/j.infsof.2017.07.013.
[20] LIU Hui, LIU Qiu-rong, LIU Yang, WANG Zhou-ding. Identifying renaming opportunities by expanding conducted rename refactorings [J]. IEEE Transactions on Software Engineering, 2015, 41(9): 887-900. DOI: 10.1109/tse.2015.2427831.
[21] CHARALAMPIDOU S, AMPATZOGLOU A, CHATZIGEORGIOU A, GKORTZIS A, AVGERIOU P. Identifying extract method refactoring opportunities based on functional relevance [J]. IEEE Transactions on Software Engineering, 2017, 43(10): 954-974. DOI: 10.1109/tse.2016.2645572.
[22] WANG Ying, YU Hai, ZHU Zhi-liang, ZHANG Wei, ZHAO Yu-li. Automatic software refactoring via weighted clustering in method-level networks [J]. IEEE Transactions on Software Engineering, 2018, 44(3): 202-236. DOI: 10.1109/tse.2017.2679752.
[23] TERRA R, VALENTE M T, MIRANDA S, SALES V. JMove: a novel heuristic and tool to detect move method refactoring opportunities [J]. Journal of Systems and Software, 2018, 138: 19-36. DOI: 10.1016/j.jss.2017.11.073.
[24] GUGGULOTHU T, MOIZ S A. An approach to suggest code smell order for refactoring [M]// Emerging Technologies in Computer Engineering: Microservices in Big Data Analytics. Singapore: Springer Singapore, 2019: 250-260. DOI: 10.1007/978-981-13-8300-7_21.
[25] YOSHIDA N, NUMATA S, CHOI E, INOUE K. Proactive clone recommendation system for extract method refactoring [C]// 2019 IEEE/ACM 3rd International Workshop on Refactoring (IWoR). ACM, 2019: 67-70. DOI: 10.1109/IWoR.2019.00020.
[26] NYAMAWE A S, LIU Hui, NIU Zhen-dong, WANG Wen-tao, NIU Nan. Recommending refactoring solutions based on traceability and code metrics [J]. IEEE Access, 2018, 6: 49460-49475. DOI: 10.1109/ACCESS.2018.2868990.
[27] SHENEAMER A M. An automatic advisor for refactoring software clones based on machine learning [J]. IEEE Access, 2020, 8: 124978-124988. DOI: 10.1109/ACCESS.2020.3006178.
[28] KUHN T, GMBH E.M, THOMANN O. Abstract syntax tree [EB/OL]. [2019-10-30]. http://www.eclipse.org/articles/Article-JavaCodeManipulation_AST/index.html.
[29] GAMMA E, HELM R, JOHNSON R, VLISSIDES J. Design patterns: abstraction and reuse of object-oriented design [M]// Software Pioneers. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002: 701-717. DOI: 10.1007/978-3-319-02192-8_2.
(Edited by ZHENG Yu-tong)
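Summary (translated from Chinese)
Identification of refactoring opportunities for source code based on class association relationships
Abstract: To address the complex association relationships between classes in object-oriented software systems, an approach for identifying refactoring opportunities based on class association relationships is proposed. It can detect duplicated many-to-many association relationships in code and provide guidance for subsequent refactoring. The approach first transforms the source code into an abstract syntax tree, then extracts the member variables of each class to obtain the association classes set of each class, and then searches for the common association classes between classes. Finally, according to the set threshold, it saves and exports the candidate class sets for refactoring that satisfy the pre-defined conditions, together with their common association classes. Experiments on four projects show that the precision exceeds 96% when the threshold is 3 and reaches 100% when the threshold is 4. The approach also has good execution efficiency: for a project with more than 500 classes, the execution time of the identification program is less than 4 s, which indicates that the approach can be applied effectively to identify refactoring opportunities in projects of different scales.
Key words: identification of refactoring opportunities; abstract syntax tree; class association relationships; common association classes; source code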
Received date: 2020-04-19; Accepted date: 2020-10-11
Corresponding author: LIU Wei, PhD, Associate Professor; Tel: +86-731-88458173; E-mail: weiliu@csu.edu.cn; ORCID: https://orcid.org/0000-0001-5615-3098