Fall 2018. CS6704: Software Engineering Analytics and Automation

Instructor: Francisco Servant [fservant@vt.edu]
Office hours: After class, or by email appointment
Lectures: Wed 4:00-6:30PM, McBryde Hall 655

Course Description

Software engineering is a highly complex endeavor. Software engineers today perform tasks that require creating and comprehending complex information, such as code structure, implementation rationale, dynamic software behavior, change implications, and development team dynamics. At the same time, software engineers also spend large amounts of time on mechanical work, such as operating complex software development tools, just to be able to perform their tasks.

The field of software engineering analytics and automation aims to improve the software engineering discipline in two ways: first, by analyzing the software development process to understand which software engineering tasks are problematic and where their efficiency bottlenecks lie; second, by developing automated techniques that leverage this understanding to help software developers perform their tasks more effectively and efficiently.

In this course, we will discuss state-of-the-art research in software engineering analytics and automation. Students will also work on research projects to develop novel techniques in this area.

Prerequisites

Graduate or senior standing in the Department of Computer Science and a prior course in software engineering, e.g., CS 4704.

Readings

The majority of the readings in the course will be papers available through the IEEE or ACM Digital Libraries. The instructor will provide all the necessary readings in Canvas.

Grading and Evaluation

Distribution of points:

There are three elements to your grade: a research project, short paper critiques (CTTC), and class attendance and participation. Each element contributes the following percentage of your grade:

Research Project 50%
CTTC 30%
Class Attendance and Participation (including paper presentations) 20%

Submission guidelines:

Assignments must be submitted as follows:

  • through Canvas, in the folder for the corresponding assignment
  • as a single PDF (Adobe Acrobat) document
  • by 11:59pm Eastern time on the due date

Late submission policy:

Assignments will be turned in electronically through Canvas. Unexcused late assignments will be penalized 25% per 24-hour period.
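One way to read the late policy is sketched below, under two assumptions the policy itself does not state: a partially elapsed 24-hour period counts as a full period, and each period removes 25% of the score the assignment would otherwise have earned, never going below zero. The function name late_adjusted_score is hypothetical and only illustrates the arithmetic.

```python
import math

# Fraction of the earned score removed per late 24-hour period (from the policy).
PENALTY_PER_PERIOD = 0.25


def late_adjusted_score(score: float, hours_late: float) -> float:
    """Hypothetical helper: apply the 25%-per-24-hour late penalty to a score.

    Assumptions (not stated in the policy): any started 24-hour period counts
    in full, and the penalty is a fraction of the earned score, floored at 0.
    """
    if hours_late <= 0:
        return score  # on time: no penalty
    periods = math.ceil(hours_late / 24)  # assumption: partial periods round up
    multiplier = max(0.0, 1.0 - PENALTY_PER_PERIOD * periods)
    return score * multiplier
```

For example, under these assumptions a submission that would have earned 90 points and arrives 30 hours late falls into the second period and keeps 45 points.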

Grading scale:

≥ 93% → A      ≥ 90% → A-
≥ 87% → B+     ≥ 83% → B      ≥ 80% → B-
≥ 77% → C+     ≥ 73% → C      ≥ 70% → C-
≥ 67% → D+     ≥ 63% → D      ≥ 60% → D-
< 60% → F
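As a hypothetical illustration of how the point distribution and the grading scale combine, the sketch below computes a weighted percentage from the three grade elements and maps it to a letter. It assumes each element is already scored as a 0-100 percentage; the function names and component keys are made up for this example.

```python
# Weights from the distribution of points (research project, CTTC, participation).
WEIGHTS = {
    "research_project": 0.50,
    "cttc": 0.30,
    "participation": 0.20,
}

# Minimum percentage for each letter grade, highest threshold first.
SCALE = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"),
    (60, "D-"), (0, "F"),
]


def final_percentage(scores: dict[str, float]) -> float:
    """Weighted average of the element scores, each given as a 0-100 percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage: float) -> str:
    """Return the letter of the first (highest) threshold the percentage meets."""
    for threshold, letter in SCALE:
        if percentage >= threshold:
            return letter
    raise ValueError("percentage must be non-negative")
```

For example, scores of 90 (project), 85 (CTTC), and 100 (participation) give 45 + 25.5 + 20 = 90.5, which maps to an A-.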

Weekly paper presentations

Every week, each paper that we read will first be presented by one of you. Your presentation should be very brief (max. 5 minutes) and should answer these questions:

  1. What problem is the paper addressing?
  2. Why is the problem important?
  3. What are the existing solutions to this problem?
  4. What is the proposed solution in this paper?
  5. Why is the proposed solution different or more promising than the existing solutions?
  6. How did the authors evaluate whether the proposed solution in fact improves the state of the art?
  7. To what extent does the proposed solution improve the state of the art?
  8. What is something on which you disagree with the authors, or that you would like to discuss? Ask the room what they think.

List of presenters:

Presenters are assigned by walking down the list below, which is in randomized order, in round-robin fashion (when we reach the end of the list, we start over at the top). To protect your privacy, I am only including the first 3 letters of your last name in the list:

  • Nac
  • Zha
  • Ste
  • Nad
  • Fu
  • Ibr
  • Ela
  • Kha
  • Jin
  • Dav
  • Mic
  • She
  • Ani
  • Lia
  • Yin
  • Pat

Cut-to-the-chase (CTTC) critiques

The short paper critiques will allow you to demonstrate that you’ve read and thought about the assigned readings. Readings are assigned each week, as shown in the schedule, and you will write a short critique and analysis of each week’s papers. Since multiple papers are assigned, you’ll have to learn how to present incisive, cut-to-the-chase (CTTC) analyses in few words.

I am not interested in reading a paraphrase of each paper’s abstract. In other words, your CTTC should not answer the questions that are covered in the weekly paper presentations.

I am interested in reading your own assessment of each paper: which points do you believe are the important ones? Do you believe those points? Why or why not? What points did the author(s) not address that they should have? Try to use mostly sentences like “I agree with…”, “I think that…”, “In my opinion…”, “I disagree that…”. More importantly, back up each comment with the reasoning behind it, i.e., add “because…” to those sentences. The best comments are those that inspire a whole new research project, so try hard to think about those as well: What’s next after this paper? How did it inspire you to take this work further?

Each CTTC critique covers the papers that will be discussed in class the following day.

Each week you will submit a single PDF file with your critique of every paper assigned for that week. Your comments should fill at least half of a letter-size page for each assigned paper. You can use any format you want, but keep in mind that your CTTC grade will be based on both the quality and the quantity of your comments.

Research project

You will work on a research project related to the topics covered in class. You may work individually or in groups of up to three people. Groups are expected to deliver a contribution multiple times as substantial as that delivered by an individual.

You are expected to design your own research project or select one of the projects proposed by the instructor. To help you design your project, you can look at the Mining Challenge of the 2019 International Conference on Mining Software Repositories (MSR 2019). If your project is based on analyzing software data, you are encouraged to use the dataset proposed for the Mining Challenge. If you define your own project, you should discuss it with the instructor during office hours before writing your proposal report.

You are encouraged to use Bitbucket for your source code and project report documents. Add the instructor (username fservant) as a member of your project to facilitate feedback.

All your reports should follow the new ACM formatting guidelines, using the sigconf variant of the template.

Research project deliverables (and project grade percentage):

  • Project proposal report (10%)

    • Your project proposal should be at least 2 pages, and it should include:
      • Introduction: describing the problem, why the problem is important, your proposed solution, and how you expect your solution to improve the state of the art.
      • Related work: describing other published research related to the project that you are proposing. Other research can be related because: it motivates the need for your project, it tries to solve the same problem in a different way, it uses your proposed technique for other problems, etc.
  • Project proposal presentation (10%)

    • Your project proposal presentation should include the same information as your project proposal report, but summarized in a 4-minute lightning talk.
  • Project milestone reports (x3, 10% each)

    • Each milestone report should be at least 1 full page, and it should include:
      • Progress made since the last project report
      • Progress commitment for the next project report
      • Potential roadblocks that you may encounter in the remaining weeks, and how you expect to overcome them
  • Mid-semester project presentation (10%)

    • Your project progress presentation will take 4 minutes, and it should include:
      • Problem description (updated from your project proposal)
      • Solution description (updated and extended from your project proposal)
      • Progress made so far (since your project proposal)
      • Planning for the remaining weeks
      • Potential roadblocks that you may encounter in the remaining weeks, and how you expect to overcome them
  • Final project report (30%)

    • Your final project report should be a minimum of 7 pages (including references) and a maximum of 10 pages (plus references). The report should include:
      • Introduction: describing the problem, why the problem is important, your proposed solution, and how you expect your solution to improve the state of the art.
      • Related work: describing other published research related to the project that you are proposing. Other research can be related because: it motivates the need for your project, it tries to solve the same problem in a different way, it uses your proposed technique for other problems, etc.
      • Approach: conceptual description of your solution
      • Implementation: technical description of the solution that you implemented
      • Evaluation: description of the experiments that you performed, and interpretation of your results.
      • Conclusion: Did you solve the problem? To what extent? For which cases?
      • References: Citations of all the research papers relevant to your project
  • Final project presentation (10%)

    • Your final project presentation should take 4 minutes and include:
      • Short problem description (updated from your project progress presentation)
      • Short solution description (updated from your project progress presentation)
      • Challenges that you resolved along the way
      • Evaluation description and interpretation of your results
      • What did you learn? What would you have done differently? What other future work could be performed, inspired by your project?

Schedule (likely to change)

Week Date Topic Assignment
1 2018–08–20 Introduction to course. Class canceled due to medical reasons.
2018–08–22 Introduction to course.
2 2018–08–26 Assignment due CTTC1 by 11:59pm
2018–08–27 Analytics: Almost There: A Study on Quasi-Contributors in Open Source Software Projects.
Automation: DeFlaker: Automatically Detecting Flaky Tests.
Presented by: Dav, Nac
2018–08–28 Assignment due CTTC2 by 11:59pm
2018–08–29 Analytics: Identifying Features in Forks.
Automation: Enlightened Debugging.
Presented by: Zha, Ste
3 2018–09–04 Assignment due CTTC3 by 11:59pm
2018–09–05 Continuous Integration: Studying the Impact of Adopting Continuous Integration on the Delivery Time of Pull Requests
Automation: I’m Leaving You, Travis: A Continuous Integration Breakup Story
Presented by: Nad, Fu
4 2018–09–11 Assignment due CTTC4 by 11:59pm
2018–09–12 Analytics on Regular Expressions
Exploring regular expression usage and context in Python
Exploring regular expression comprehension
How Well Are Regular Expressions Tested in the Wild?
Presented by: Ibr, Ela, Kha
5 2018–09–18 Assignment due CTTC5 by 11:59pm
2018–09–19 Mining Software Repositories, Mining Challenge
Mining Challenge Dataset 2019: SOTorrent: reconstructing and analyzing the evolution of stack overflow posts
Mining Challenge Paper 2018: A Study on the Use of IDE Features for Debugging
Mining Challenge Winner 2017: How Does Contributors’ Involvement Influence the Build Status of an Open-Source Software Project?
Mining Challenge Winner 2016: Judging a commit by its cover: Correlating commit message entropy with build status on Travis-CI
Presented by: Son, Mic, Jin, Cha
6 2018–09–25 Assignment due Project proposal report and slides by 11:59pm
2018–09–26 Project proposal presentations.
7 2018–10–02 Assignment due CTTC6 by 11:59pm
2018–10–03 Modern Code Review (see Files/Readings folder in Canvas) Presented by: She, Ani, Lia
8 2018–10–09 Assignment due CTTC7 and Project milestone report 1 by 11:59pm
2018–10–10 Software Development Expertise
Towards a Theory of Software Development Expertise
Automatically recommending code reviewers based on their expertise: an empirical comparison
Do you remember this source code?
Presented by: Yin, Pat, Nac
9 2018–10–16 Assignment due CTTC8 by 11:59pm
2018–10–17 Software Visualization
A Systematic Literature Review of Software Visualization Evaluation
Rethinking User Interfaces for Feature Location
RegViz: Visual Debugging of Regular Expressions
Visual Monitoring of Numeric Variables Embedded in Source Code
Presented by: Zha, Ste, Nad, Fu
10 2018–10–23 Assignment due Project milestone report 2 and slides by 11:59pm
2018–10–24 Mid-semester project presentations.
11 2018–10–30 Assignment due CTTC9 by 11:59pm
2018–10–31 Deep Learning in Software Engineering
code2vec: Learning Distributed Representations of Code
Deep Learning Similarities from Different Representations of Source Code
An Empirical Investigation into Learning Bug-Fixing Patches in the Wild via Neural Machine Translation
Presented by: Ibr, Ela, Kha
12 2018–11–06 Assignment due: none (CTTC10 canceled). No readings due this week.
Class canceled. Instructor on travel. Work on your research projects :)
13 2018–11–13 Assignment due CTTC11 and Project milestone report 3 by 11:59pm
2018–11–14 Test Driven Development
A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?
Analyzing the effects of test driven development in GitHub
What Do We (Really) Know about Test-Driven Development?
(Optional) On the effectiveness of unit tests in test-driven development
Presented by: Jin, Dav, Mic
14 2018–11–21 Thanksgiving. No class.

15 2018–11–27 Assignment due CTTC12 by 11:59pm
2018–11–28 Automated Program Repair
Repairing Programs with Semantic Code Search
Current challenges in automatic software repair
Is the Cure Worse Than the Disease? Overfitting in Automated Program Repair
(Optional) How to Design a Program Repair Bot? Insights from the Repairnator Project
Presented by: She, Ani, Lia
16 2018–12–04 Assignment due Final project report and slides by 11:59pm
2018–12–05 Final project presentations.


Virginia Tech Honor Code:

The work you turn in must be your own. The consequences of cheating in this class are a letter in your academic file and a lowered course grade, most likely an F. Material copied from books or Web pages must be quoted, and its source must be cited. If you plagiarize, you run a severe risk of failing the class, in a most disgraceful manner.

Accommodations

If you need special accommodations, please contact the instructor during the first week of classes.