Fall 2020. CS6704: Software Engineering Analytics and Automation

Lectures: Tuesdays, 9:30am – 12:00pm
Instructor: Francisco Servant [fservant@vt.edu]
Instructor Office hours: By email appointment
Force-add Process: Please contact the instructor on the first day of class.

Description

Software engineering is a highly complex endeavor. Software engineers today perform complex tasks that involve the creative production and comprehension of complex information, such as code structure, implementation rationale, dynamic software behavior, change implications, and development team dynamics. At the same time, software engineers also spend large amounts of time on mechanical work (dealing with complex software development tools) to be able to perform their tasks.

The field of software engineering analytics and automation aims to improve the software engineering discipline in two ways: first, by performing analytics on the software development process to achieve an understanding of the problematic software engineering tasks and their efficiency bottlenecks; second, by developing automatic techniques that leverage such understanding to support software developers in performing their tasks more effectively and efficiently.

In this course, we will discuss state-of-the-art research in software engineering analytics and automation. Students will also work on research projects to develop novel techniques in this area.

Prerequisites

Graduate or senior standing in the Department of Computer Science and a prior course in software engineering, e.g., CS 4704.

Readings

The majority of the readings in the course will be papers available through the IEEE or ACM Digital Libraries. The instructor will provide links to all the necessary readings.

Grading and Evaluation

Distribution of points:

There are four elements to your grade: class participation, paper presentations, short paper critiques (CTTC), and a research project. Each element contributes a different percentage of your grade:

Class Participation (only if you speak in class) 15%
Paper Presentations 15%
Cut-to-the-chase (CTTC) critiques 20%
Research Project 50%
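
As an illustration only, the distribution above can be read as a weighted sum. The sketch below assumes each element is scored on a 0 to 100 scale; the function and key names are hypothetical and not part of any course tooling.

```python
# Weights (in percent) copied from the distribution of points above.
# Assumption: each element is scored from 0 to 100 before weighting.
WEIGHTS = {
    "participation": 15,
    "presentations": 15,
    "cttc": 20,
    "project": 50,
}


def course_percentage(scores):
    """Return the overall course percentage for a dict of element scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Example with made-up scores.
example = {"participation": 0, "presentations": 80, "cttc": 90, "project": 85}
print(course_percentage(example))
```

Note how a student who never speaks in class gives up the full 15% for participation, regardless of the other scores.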

Submission guidelines:

Assignments must be submitted as follows:

  • through Canvas. Use the folder for the corresponding assignment
  • as a single PDF (Adobe Acrobat) document
  • by 11:59pm Eastern time on the due date

Late submission policy:

Assignments will be turned in electronically through Canvas. Unexcused late assignments will be penalized 1% per 1-hour period.
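
A rough sketch of how the late penalty could be computed follows. The policy above does not say whether a partially elapsed hour counts or whether "1%" means percentage points of the assignment grade, so the sketch assumes both: every started hour counts as a full period, and each period subtracts one percentage point. The function names are hypothetical.

```python
import math
from datetime import datetime, timedelta


def late_penalty_points(due, submitted):
    """Percentage points deducted for an unexcused late submission.

    Assumption: every started 1-hour period after the deadline costs 1 point,
    capped at 100 points.
    """
    late_by = submitted - due
    if late_by <= timedelta(0):
        return 0
    periods = math.ceil(late_by / timedelta(hours=1))
    return min(periods, 100)


def penalized_score(score, due, submitted):
    """Apply the penalty to a 0-100 score, never going below zero."""
    return max(score - late_penalty_points(due, submitted), 0)


# Example: a submission 90 minutes after an 11:59pm deadline.
due = datetime(2020, 9, 7, 23, 59)
print(penalized_score(90, due, due + timedelta(minutes=90)))
```

Under these assumptions, a submission that is 90 minutes late falls into its second 1-hour period and loses 2 points.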

Grading scale:

≥ 93% → A     ≥ 83% → B     ≥ 73% → C     ≥ 63% → D
≥ 90% → A-    ≥ 80% → B-    ≥ 70% → C-    ≥ 60% → D-
≥ 87% → B+    ≥ 77% → C+    ≥ 67% → D+    ≥ 0% → F
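
For illustration, the scale above can be expressed as a threshold lookup. This is a sketch that assumes the final percentage is compared against the thresholds without rounding; the syllabus does not state a rounding rule, and the names are hypothetical.

```python
# Thresholds copied from the grading scale, ordered from highest to lowest.
GRADE_SCALE = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
    (0, "F"),
]


def letter_grade(percentage):
    """Map a course percentage to a letter grade (assumes no rounding)."""
    for threshold, letter in GRADE_SCALE:
        if percentage >= threshold:
            return letter
    raise ValueError("percentage must be non-negative")


print(letter_grade(89.99))
```

Because no rounding is assumed, 89.99% falls in the B+ band rather than A-.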

Class meetings

At every class meeting, we will discuss 3 research papers. Each paper will be presented in depth by a student (~25 minutes), followed by a discussion led by the same student (~20 minutes), followed by a 5-minute break. To cover the most recent research work, the list of papers will be updated throughout the semester.

Class participation

You only get credit for class participation if you speak during the discussion.

Paper presentations

Each student will present a total of 2 papers. The first 33 papers will be presented and discussed in class. The remaining ones will be presented in videos for the other students to watch on their own time.

Each presentation will last 25 minutes and should include the sections listed below. In the week that you present, you will not write paper critiques for any of that week’s papers. Instead, you will submit your presentation slides through Canvas. You only submit presentation slides for the weeks in which you present.

After your presentation, you will lead a 20-minute discussion about the paper. You should prepare questions for the discussion in advance, e.g., points on which you disagree with the authors or on which you would like to hear others’ opinions.

We will have a 5-min break between papers.

Sections to include in your presentation

Explain each section in detail, so that we can discuss it.

  1. Problem statement. What problem is the paper addressing?
  2. Problem importance. Why is the problem important?
  3. Related Work. What are the existing solutions to this problem?
  4. Approach Insight. Why is the proposed solution different or more promising than the existing solutions?
  5. Approach. What is the proposed solution in this paper?
  6. Evaluation. How did they evaluate that the proposed solution in fact improves the state of the art?
  7. Conclusion. To what extent does the proposed solution improve the state of the art? What did we learn in this paper?
  8. Discussion points. What are the points that you would like to discuss?

Students will present in this order

Ta, Af

Al, Wa

Ha, Sk

Ch, La

Sr, Ra

Wa, Ha

Ch, Po

Zu, Om

Hu, Yu

Do, Mo

Kh, Re

Ch, Ti

Ch, Yi

Yo, Ra

Cut-to-the-chase (CTTC) critiques

You will write a short paper critique (CTTC) for each of the 33 papers that we will discuss in class (except for the weeks that you present a paper). The short paper critiques will allow you to demonstrate that you’ve read and thought about the assigned readings. Readings are assigned, as shown in the schedule, each week of the class. You will write a short critique and analysis of each week’s papers. Since multiple papers are assigned, you’ll have to learn how to present incisive, cut-to-the-chase (CTTC) analyses in few words.

I am not interested in reading a paraphrase of each paper’s abstract. In other words, you should not include in your CTTC the answer to the questions that will be answered in the weekly paper presentations.

I am interested in reading your own assessment of each paper: what points do you believe to be the important ones? Do you believe those points? Why or why not? What points did the author(s) not address that they should have? Try to mostly use sentences like “I agree with…”, “I think that…”, “In my opinion…”, “I disagree that…”. More importantly, back up your comments with the reasoning behind them, i.e., add “because…” to sentences like those above. Also, the best comments are those that inspire a whole new research project. Try hard to think about those as well. What’s next after this paper? How did it inspire you to take this work further?

Each CTTC critique will talk about the papers that will be discussed the following day in class.

Each week you will submit a single PDF file with your critique for every paper assigned for that week. Your comments should use at least half a page of a letter-sized document for each assigned paper. You can use any format that you want, but keep in mind that the grade for your CTTC will be awarded for the quality and quantity of your comments in your critique.

Research project

You will work on a research project related to the topics covered in class. You may work individually or in groups of up to three people. A group of N students is expected to deliver a contribution N times as substantial as that of an individual.

You are expected to design your own research project or select one of the projects proposed by the instructor (see the “Research Projects” file in Canvas). Another approach that I encourage every year is to design a project that fits the Mining Challenge of the 2021 International Conference on Mining Software Repositories. This may even allow you to get a publication out of your research project in this class. Many students of this class have published their projects at this conference in previous years. The details of this year’s competition may not be available yet, but you can look at last year’s details to get a sense: 2020 International Conference on Mining Software Repositories, Mining Challenge. Finally, if you define your own project, you should discuss it with the instructor during office hours before writing your proposal report.

You are encouraged to use GitHub for your source code and project report documents. Add the instructor (username fservant) as a project member to facilitate feedback.

All your reports should follow the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type; LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without including the compsoc or compsocconf options).

Research project deliverables (and project grade percentage):

  • Project proposal report (10%)

    • Your project proposal should be at least 2 pages, and it should include:
      • Introduction: describing the problem, why the problem is important, your proposed solution, and how you expect your solution to improve the state of the art.
      • Related work: describing other published research related to the project that you are proposing. Other research can be related because: it motivates the need for your project, it tries to solve the same problem in a different way, it uses your proposed technique for other problems, etc.
  • Project proposal presentation (10%)

    • Your project proposal presentation should include the same information as your project proposal report, but summarized in a lightning talk.
  • Project milestone reports (x2, 15% each)

    • Your project milestone report should be at least 1 full page, and it should include:
      • Progress made since the last project report
      • Progress commitment for the next project report
      • Potential roadblocks that you may encounter in the remaining weeks, and how you expect to overcome them
  • Mid-semester project presentation (10%)

    • Your project progress presentation should include:
      • Problem description (updated from your project proposal)
      • Solution description (updated and extended from your project proposal)
      • Progress made so far (since your project proposal)
      • Planning for the remaining weeks
      • Potential roadblocks that you may encounter in the remaining weeks, and how you expect to overcome them
  • Final project report (30%)

    • Your final project report should be a minimum of 8 pages (including references) and a maximum of 10 pages (plus references). The report should include:
      • Introduction: describing the problem, why the problem is important, your proposed solution, and how you expect your solution to improve the state of the art.
      • Related work: describing other published research related to the project that you are proposing. Other research can be related because: it motivates the need for your project, it tries to solve the same problem in a different way, it uses your proposed technique for other problems, etc.
      • Approach: conceptual description of your solution
      • Implementation: technical description of the solution that you implemented
      • Evaluation: description of the experiments that you performed, and interpretation of your results.
      • Conclusion: Did you solve the problem? To what extent? For which cases?
      • References: Citations of all the research papers relevant to your project
  • Final project presentation (10%)

    • Your final project presentation should include:
      • Short problem description (updated from your project progress presentation)
      • Short solution description (updated from your project progress presentation)
      • Challenges that you resolved along the way
      • Evaluation description and interpretation of your results
      • What did you learn? What would you have done differently? What other future work could be performed, inspired by your project?

Schedule

Tip: To access the papers behind a paywall from home, you can use VT’s VPN software, or you can use a browser plugin like EZProxy.

Week Date Topic Assignment
1 08-25 Introduction to Software Engineering Analytics and Automation

Take the class survey by Aug 25 11:59pm

08-27

No class meeting

Watch video on reviewing software engineering papers
2 08-31

Assignment due

CTTC1 or presentation slides by 11:59pm

09-01 Readings 1: Empirical Software Engineering.
1. Here We Go Again: Why Is It Difficult for Developers to Learn Another Programming Language? Presented by: Ta,Af
2. How to Not Get Rich: An Empirical Study of Donations in Open Source. Presented by: Al,Wa
3. Predicting Developers’ Negative Feelings about Code Review. Presented by: Ha,Sk
3 09-07

Assignment due

CTTC2 or presentation slides by 11:59pm

09-08 Readings 2: Machine Learning in Software Engineering.
1. A Neural Model for Generating Natural Language Summaries of Program Subroutines Presented by: Ch,La
2. On Learning Meaningful Code Changes via Neural Machine Translation Presented by: Sr,Ra
3. When deep learning met code search Presented by: Wa,Ha
4 09-14

Assignment due

CTTC3 or presentation slides by 11:59pm

09-15 List of projects released.
Readings 3: Software documentation.
1. Software Documentation Issues Unveiled Presented by: Ch,Po
2. Assessing the Quality of the Steps to Reproduce in Bug Reports Presented by: Zu,Om
3. Decomposing the Rationale of Code Commits: The Software Developers’ Perspective Presented by: Hu,Yu
5 09-21

Assignment due

CTTC4 or presentation slides by 11:59pm

09-22 Readings 4: Modern Code Review.
1. Primers or Reminders? The Effects of Existing Review Comments on Code Review Presented by: Do,Mo
2. CFar: A Tool to Increase Communication, Productivity, and Review Quality in Collaborative Code Review Presented by: Ku,Ru
3. Wait for It: Determinants of Pull Request Evaluation Latency on GitHub Presented by: Kh,Re
6 09-28

Assignment due

Project proposal report and slides by 11:59pm

09-29 Project proposal presentations.
7 10-05

Assignment due

CTTC5 or presentation slides by 11:59pm

10-06 Readings 5: Merge Conflicts.
1. Semistructured Merge in JavaScript Systems Presented by: Ch,Ti
2. Planning for Untangling: Predicting the Difficulty of Merge Conflicts Presented by: Ch,Yi
3. Understanding semi-structured merge conflict characteristics in open-source Java projects Presented by: Yo,Ra
8 10-12

Assignment due

CTTC6 or presentation slides by 11:59pm

10-13 Readings 6: Code-history Mining.
1. How different are different diff algorithms in Git? Presented by: Ta, Af
2. cregit: Token-level blame information in git version control repositories Presented by: Al, Wa
3. On tracking Java methods with Git mechanisms Presented by: Ha, Sk
9 10-19

Assignment due

Project milestone report 1 and slides by 11:59pm

10-20 Mid-semester project presentations.
10 10-26

Assignment due

CTTC7 or presentation slides by 11:59pm

10-27 Readings 7: Expertise.
1. Automatically Recommending Peer Reviewers In Modern Code Review Presented by: Ch, La
2. WhoseFault: Automatic Developer-to-Fault Assignment Through Fault-Localization Presented by: Sr, Ra
3. Continuous Incident Triage for Large-Scale Online Service Systems Presented by: Wa, Ha
11 11-02

Assignment due

CTTC8 or presentation slides by 11:59pm

11-03 Readings 8: Testing
1. Causal Testing: Understanding Defects’ Root Causes Presented by: Ch,Po
2. A Study on the Lifecycle of Flaky Tests Presented by: Zu,Om
3. DeFlaker: Automatically detecting flaky tests Presented by: Hu,Yu
12 11-09

Assignment due

(CTTC9 or presentation slides) and Project milestone report 2 by 11:59pm

11-10 Readings 9: Bugs
1. The secret life of bugs: Going past the errors and omissions in software repositories Presented by: Do,Mo
2. Why Are Some Bugs Non-Reproducible? An Empirical Investigation using Data Fusion Presented by: Kh,Re
3. Watch out for Extrinsic Bugs! A Case Study of their Impact in Just-In-Time Bug Prediction Models on the OpenStack project Presented by: Ch,Ti
13 11-16

Assignment due

CTTC10 or presentation slides by 11:59pm

11-17 Readings 10: Program synthesis
1. Interactive Program Synthesis by Augmented Examples Presented by: Ch,Yi
2. Multi-modal Synthesis of Regular Expressions Presented by: Yo,Ra
3. Automatic Repair of Regular Expressions Presented by: Ch,La
14 11-24

No class - Thanksgiving

15 11-30

Assignment due

CTTC11 or presentation slides by 11:59pm

12-01 Readings 11: Continuous integration
1. Bisecting commits and modeling commit risk during testing Presented by: Ta,Af
2. A Conceptual Replication of Continuous Integration Pain Points in the Context of Travis CI Presented by: Kh,Re
3. Understanding Build Issue Resolution in Practice: Symptoms and Fix Patterns Presented by: Al,Wa
16 12-07

Assignment due

Final project report and slides by 11:59pm

12-08 Final project presentations.

Policies

Virginia Tech Honor Code:

The work you turn in must be your own. Consequences of cheating in this class: a letter in your academic file, and the course grade is lowered, most likely to F. Material that is copied from books or Web pages needs to be quoted and the source must be given. If you plagiarize, you run the severe risk of failing the class, in a most disgraceful manner.

Accommodations:

If you need special accommodations, please contact the instructor during the first week of classes.