Replication evaluation, details and sources
Mining for Localization in Android
by Laura Arjona and Gregorio Robles
Submitted to the MSR 2012 Challenge
Universidad Rey Juan Carlos (Madrid, Spain)
Slides used for the presentation of this paper at the MSR 2012.
Based on the criteria proposed in "On the reproducibility of empirical software engineering studies based on data retrieved from development repositories" (Open Access - Empirical Software Engineering, Volume 17, Numbers 1-2, 75-89), the attributes of this study are given in the following table:
Details
Data Source
- Identification:
  - Android changes to the versioning system
  - Android bug reports
- Description:
  - Android git repository.
  - Android issue tracking system at Google Code. Detailed description at http://source.android.com/source/report-bugs.html
- Availability: Public.
- Persistence: Yes.
Retrieval Methodology
- Not available/unknown. We base our study on the raw dataset provided by the MSR 2012 Challenge organizers. They have not publicly specified any software or method for retrieving the data from the data source; they provide only the raw dataset.
Raw Dataset
- Identification:
  - Android changes
  - Android bug reports
- Description:
  - Android changes (git log): detailed description at http://git-scm.com. Updated Dec 6, 2012. Data provided by MSR Challenge 2012, in XML format.
  - Android bug reports (Google Code Issue Tracker): detailed description at http://source.android.com/source/report-bugs.html. Updated Dec 6, 2012. Data provided by MSR Challenge 2012, in XML format.
- Availability: Public.
- Persistence: Yes.
- Flexibility: Yes. Provided in XML format.
Extraction Methodology
- Identification: SQL scripts git_create_database.sql and bugs_create_database.sql; Python scripts xml2db.py and bugxml2db.py.
- Description:
  - SQL scripts that create the databases and tables holding the git log and bug tracking data.
  - Python scripts that parse the XML source data into SQL INSERT statements.
- Availability: Public. Available as xml2db.py and bugxml2db.py.
- Persistence: Yes.
- Flexibility: Yes. Both Python scripts have been released under the GPLv3.
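
The XML-to-SQL step described above can be illustrated with a minimal sketch. It is not the code of xml2db.py or bugxml2db.py: the element names (`bug`, `bugid`, `title`, `status`), the table name `bugs` and the helper names are assumptions made only for illustration, and the real MSR 2012 Challenge XML layout may differ. Quoting is done by doubling single quotes, which is enough for this example but is not a substitute for parameterized queries.

```python
import xml.etree.ElementTree as ET


def sql_quote(value):
    """Return value as a SQL string literal, doubling embedded single quotes."""
    if value is None:
        return "NULL"
    return "'" + value.replace("'", "''") + "'"


def bugs_to_inserts(xml_text, table="bugs"):
    """Return one INSERT statement per <bug> element (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    statements = []
    for bug in root.iter("bug"):
        bug_id = int(bug.findtext("bugid"))
        title = bug.findtext("title")    # None if the element is missing
        status = bug.findtext("status")  # None if the element is missing
        statements.append(
            f"INSERT INTO {table} (bug_id, title, status) VALUES "
            f"({bug_id}, {sql_quote(title)}, {sql_quote(status)});"
        )
    return statements


# Hypothetical sample input, not taken from the challenge dataset.
SAMPLE = """<bugs>
  <bug><bugid>1</bugid><title>Can't set locale</title><status>New</status></bug>
  <bug><bugid>2</bugid><title>Missing translation</title></bug>
</bugs>"""

if __name__ == "__main__":
    for statement in bugs_to_inserts(SAMPLE):
        print(statement)
```

Each statement is emitted on its own line, which matches the one-statement-per-line layout of the processed dataset.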
Study Parameters
- Identification: Date of data retrieval.
- Description: According to the MSR 2012 Challenge organizers, the raw data was updated on December 6th, 2012.
Processed Dataset
- Identification: Output of the xml2db.py and bugxml2db.py scripts.
- Description: Text files with SQL INSERT statements (one statement per line).
- Availability: Public. Available as git.log.sql.gz (340 MB) and android_platform_bugs.sql.gz (14 MB).
- Persistence: Yes.
- Flexibility: Yes. They are SQL text files.
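
Because the processed files are gzipped text with one statement per line, they can in principle be replayed into a database line by line. The sketch below assumes exactly that layout and uses Python's built-in sqlite3 module as a stand-in; the database engine used in the study is not stated here, so engine-specific syntax in the real files may need adapting. The function name `load_sql_dump` is hypothetical.

```python
import gzip
import sqlite3


def load_sql_dump(path, conn):
    """Execute each non-empty line of a gzipped SQL file as one statement.

    Returns the number of statements executed. Assumes no statement
    spans more than one line.
    """
    executed = 0
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        for line in dump:
            statement = line.strip()
            if not statement:
                continue  # skip blank lines
            conn.execute(statement)
            executed += 1
    conn.commit()
    return executed
```

A call such as `load_sql_dump("android_platform_bugs.sql.gz", conn)` would need the target tables to exist first, for example by running the corresponding create-database script against the same connection.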
Analysis Methodology
- Identification: msr2012.R R script.
- Description: R script that queries the database and reports the results.
- Availability: Public. Available as msr2012.R
- Persistence: Yes.
- Flexibility: Yes. The R script has been released under the GPLv3.
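
As a rough illustration of the query-and-report pattern, the following sketch shows the same idea in Python rather than R. It does not reproduce msr2012.R: the `commits` table, its columns, the sample rows and the per-year aggregation are all assumptions chosen for the example, and sqlite3 is again only a stand-in for the actual database.

```python
import sqlite3


def commits_per_year(conn):
    """Return (year, commit count) rows, ordered by year."""
    query = """
        SELECT substr(commit_date, 1, 4) AS year, COUNT(*) AS n
        FROM commits
        GROUP BY year
        ORDER BY year
    """
    return conn.execute(query).fetchall()


def demo_connection():
    """Build an in-memory database with a hypothetical commits table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (commit_id TEXT, commit_date TEXT)")
    conn.executemany(
        "INSERT INTO commits VALUES (?, ?)",
        [("a1", "2010-03-01"), ("b2", "2010-11-20"), ("c3", "2011-06-15")],
    )
    return conn


if __name__ == "__main__":
    for year, n in commits_per_year(demo_connection()):
        print(year, n)
```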
Results Dataset
- Identification: Results graphs and queries
- Description: Results obtained from running the analysis methodology scripts on the processed dataset.
- Availability: Public.
- Persistence: Yes.
- Flexibility: No. PDF charts.
Comments and suggestions: Gregorio Robles < grex at gsyc.urjc.es >.
Last modified: June 3rd 2012.