
Folder Structure Evolution in Open Source Software

Andrea Capiluppi
Dipartimento di Automatica e Informatica, Politecnico di Torino, Italy
& Computing Dept., The Open University, UK

Maurizio Morisio
Dipartimento di Automatica e Informatica, Politecnico di Torino, Italy

Juan F. Ramil
Computing Dept., The Open University, UK

AICA 2004, Benevento, 29 September 2004


Outline

Motivation
Definitions and Attributes
Selection of Projects
Identification and Analysis of Patterns
Conclusions and Future Work


Motivation


Motivation

[Diagram: Empirical Evidence, Theories and Models, Good Practice]


Motivation

Ongoing empirical investigation into the evolution of Open Source Systems (OSS)
Goal: understand the evolutionary behaviour of long-lived software systems and generate heuristics and guidelines
Similar approach to the one used by Lehman and collaborators in their study of commercial systems:
  - Generate empirical hypotheses
  - Observe commonalities and differences
  - Revise existing empirical hypotheses, generate new ones
Long-term goal: achieve theories and models of software evolution through longitudinal studies
Our approach: combine observations at different levels of granularity: system, subsystem, module, function


Motivation

Particular goal of this study: observe evolution at another level of granularity: the arrangement of source files in folders
Research questions:
  - Does the evolution of the folder structure provide interesting and useful information on the evolution of a system?
  - Is there any relationship between folder structure and other characteristics of the evolution, such as growth rate?
These questions were assessed using empirical data derived from a number of OSS systems


Definitions and Attributes


Definitions and Attributes

Source File: each file containing source code (before build)
Source Folder: each folder containing at least one source file
Folder Tree: graphical representation of the structure of source folders
Level: subset of nodes having the same distance from the root

[Figure: example folder tree showing the root, a level, and a Parent Folder with child folders F1 and F2]
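The definitions of source folder and level can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the study: the extension list, the function names, and the choice to count only folders that directly hold a source file are all assumptions.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: these extensions mark "source code"; the slides do not list any.
SOURCE_EXTENSIONS = {".c", ".h", ".cpp", ".java", ".py"}


def source_folders(file_paths):
    """Return the folders that directly contain at least one source file."""
    folders = set()
    for path in file_paths:
        p = PurePosixPath(path)
        if p.suffix.lower() in SOURCE_EXTENSIONS:
            folders.add(p.parent)
    return folders


def group_by_level(folders):
    """Group folders by their distance from the root (the root is level 0)."""
    by_level = defaultdict(set)
    for folder in folders:
        # PurePosixPath(".") has no parts, so the root lands on level 0.
        by_level[len(folder.parts)].add(folder)
    return dict(by_level)


files = ["main.c", "src/a.c", "src/core/b.h", "docs/readme.txt"]
for level, members in sorted(group_by_level(source_folders(files)).items()):
    print(level, sorted(str(m) for m in members))
```

Here `docs` is not a source folder because it holds no file with a listed extension.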


An Example of a Folder Tree

Maximum distance from the root = depth of the tree

Maximum number of folders in a level = width of the tree
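As a rough sketch of these two measures, the snippet below computes depth and width from a list of root-relative folder paths. The path format (`"src/core"`, with `""` for the root) and the function name are assumptions made for the example.

```python
from collections import Counter


def depth_and_width(folders):
    """Return (depth, width) for a non-empty list of root-relative folder paths.

    A folder's level is its number of path components, so "" (the root)
    is level 0 and "src/core" is level 2.
    """
    per_level = Counter(
        len([part for part in folder.split("/") if part]) for folder in folders
    )
    depth = max(per_level)           # maximum distance from the root
    width = max(per_level.values())  # most folders found on any single level
    return depth, width


tree = ["", "src", "lib", "src/core", "src/ui", "src/net"]
print(depth_and_width(tree))
```

In this tree the deepest folders sit two steps below the root, and that same level is the widest, holding three folders.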


Possible Evolution of a Folder Tree

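[Figure: a Parent Folder with child folders F1 and F2 evolves in one of two ways]

Vertical expansion: a new folder F3 is added one level below F1 and F2
Horizontal expansion: a new folder F3 is added next to F1 and F2, on the same level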
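One way to label the change between two releases is to compare the depth and width of their folder trees. The sketch below assumes that "vertical" means a change in depth and "horizontal" a change in width; the slides do not give a formal rule, and the function names are hypothetical.

```python
from collections import Counter


def depth_and_width(folders):
    """Return (depth, width) for a non-empty list of root-relative folder paths."""
    per_level = Counter(
        len([part for part in folder.split("/") if part]) for folder in folders
    )
    return max(per_level), max(per_level.values())


def trend(old, new):
    """Describe how a single measure moved between two releases."""
    if new > old:
        return "expanding"
    if new < old:
        return "shrinking"
    return "stable"


def classify_change(before, after):
    """Compare two folder trees; assumes vertical = depth, horizontal = width."""
    depth_before, width_before = depth_and_width(before)
    depth_after, width_after = depth_and_width(after)
    return {
        "vertical": trend(depth_before, depth_after),
        "horizontal": trend(width_before, width_after),
    }


base = ["parent", "parent/F1", "parent/F2"]
print(classify_change(base, base + ["parent/F1/F3"]))  # F3 one level deeper
print(classify_change(base, base + ["parent/F3"]))     # F3 next to F1 and F2
```

Comparing only the first and last release would hide intermediate changes, so a longitudinal analysis would apply this to each pair of consecutive releases.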


Attributes

Size: as an indicator of functional power (LOCs, Kbytes, number of files, number of folders)
Folder tree: structure (visualized with GraphViz, an OSS tool)
Activity rate: evolution speed, types of activity
  - Counting added and deleted elements is relatively easy
  - Counting modified elements: we focus on the elements changed over a period, not on the number of individual changes: number of files touched per release
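The activity counts described above can be illustrated as follows. This is a minimal sketch that assumes each release is available as a mapping from file path to a content digest; that representation and the function name are hypothetical, not taken from the study.

```python
def release_activity(previous, current):
    """Count files added, deleted and modified between two releases.

    previous, current: dict mapping file path -> content digest (assumed format).
    A file counts as modified once, however many individual changes it received.
    """
    added = current.keys() - previous.keys()
    deleted = previous.keys() - current.keys()
    modified = {
        path
        for path in current.keys() & previous.keys()
        if current[path] != previous[path]
    }
    return {"added": len(added), "deleted": len(deleted), "modified": len(modified)}


release_1 = {"a.c": "d1", "b.c": "d2", "c.c": "d3"}
release_2 = {"a.c": "d1", "b.c": "d9", "d.c": "d4"}
print(release_activity(release_1, release_2))
```

Comparing digests rather than diffs matches the focus on which elements changed over a period, not on how much each one changed.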


Selection of Projects


Selection of Projects

Large initial pool of software systems (400), used in a previous study for characterizing open source software
From that sample, we extracted the larger projects for the present study
The extraction resulted in 26 systems
  - In total, 992 release data points
These were characterized by analysing the evolution of their folder trees


Initial Characterization (example 1)

High correlation between different size measures (LOCs, files, Kbytes)
The majority of the projects lie below a threshold of source code size, when measured at the latest available RSN (release sequence number)


Initial Characterization (example 2)

The average size of the files in a folder generally stays below a threshold (20 Kb)
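A check of this kind could look like the sketch below, which averages file sizes per folder and lists the folders above the 20 Kb threshold. The input format, the function names, and the convention that 1 Kb = 1024 bytes are assumptions made for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath

THRESHOLD_KB = 20  # threshold quoted on the slide


def average_file_size_kb(file_sizes):
    """file_sizes: dict mapping file path -> size in bytes (assumed format).

    Returns a dict mapping each folder to the mean size of its files in Kb,
    assuming 1 Kb = 1024 bytes.
    """
    totals = defaultdict(lambda: [0, 0])  # folder -> [total bytes, file count]
    for path, size in file_sizes.items():
        entry = totals[str(PurePosixPath(path).parent)]
        entry[0] += size
        entry[1] += 1
    return {folder: total / count / 1024 for folder, (total, count) in totals.items()}


def folders_above_threshold(file_sizes, threshold_kb=THRESHOLD_KB):
    """List the folders whose average file size exceeds the threshold."""
    averages = average_file_size_kb(file_sizes)
    return sorted(folder for folder, avg in averages.items() if avg > threshold_kb)


sizes = {"src/a.c": 10240, "src/b.c": 30720, "lib/c.c": 51200}
print(average_file_size_kb(sizes))
print(folders_above_threshold(sizes))
```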


Identification and Analysis of Patterns


Identification of patterns

Horizontally expanding (10 out of 25)
Vertically shrinking (4 out of 25)
Vertically expanding (11 out of 25)
  - Generally associated with horizontal expansions
Study of the relation between patterns of folder tree evolution, functional growth and activity rate


Horizontally expanding

Smooth growth of source files
Peaks of activity rate around major releases


Vertically shrinking

Size of the project is stable throughout its evolution
Modifications are spread across the whole code base during the evolution


Vertically expanding

Periods of linear and super-linear growth
Peaks of activity rate consist mainly of additions, rather than modifications


Conclusions

The majority of systems display expansion in at least one of their structural dimensions, a sign of healthy evolution
Folder trees are useful for examining the evolution of software systems
Growth and activity rate trends are especially useful for recognizing stages in software evolution
Current and future work:
  - Relationship between the type of folder structure evolution (vertical, horizontal) and the type of application domain
  - Observations at different levels of granularity for identifying stages of evolution
  - Relationship between change and complexity
  - Qualitative abstraction and simulation


Conclusions

[Diagram: Empirical Evidence, Theories and Models, Good Practice]


Final Remarks

Some of the data and tools used in our present and past work are available at: http://mcs.open.ac.uk/ac5468
Need for collaboration between different groups in sharing data, tools and interpretation


Questions?