Everything You Need to Know About OpenJDK's Move to Git and GitHub
The move from Mercurial to Git provided an opportunity to consolidate the source code repositories.

by Ian Darwin
May 14, 2021

Have you ever built your own Java Development Kit from source? Most end users of the JDK will not need to build their own JDK from the Oracle source code. I’ve needed to do that only a few times when I was running on the OpenBSD UNIX-like system, which is not one of the three supported platforms.

Sure, you might want to build your own JDK to try out a new feature that you think should be added to Java. You might choose to build from source to be sure you are running a more trustworthy binary. Having the complete source code readily available, and now in a more commonly used download format, means it is easier than ever to build your own JDK. Yes, it’s a better-documented, easily configured process than in the past. But it’s still a bit confusing.

The source code for the OpenJDK recently moved from the Mercurial version control system (VCS) to the Git VCS and the GitHub repository system, and that’s probably a good thing. Here’s some history, why it matters, and what you need to know.

Blame or thank BitKeeper

If it weren’t for Larry McVoy, developers might not have Git, or GitHub, or Mercurial.

Larry McVoy worked at Sun Microsystems from 1988 to 1994. While there, he presciently suggested that Sun’s UNIX (Solaris) operating system and Novell should be merged and open sourced to head off the growth of Microsoft Windows NT (and Linux). Sun’s management didn’t listen. McVoy also developed a code repository and version control system called TeamWare.
When McVoy left Sun, he expanded that work into a new version control system called BitKeeper. It was much faster at dealing with large codebases and large changes than the two main open source version control systems then in use: Concurrent Versions System (CVS) and Subversion. BitKeeper was a commercial software product from McVoy’s company, BitMover. He gave a free license to the Linux kernel project, with certain conditions, one being that there would be no reverse engineering of the product.

There was consternation among the Linux community over Linus Torvalds’ willingness to use a closed source tool for maintaining the open source Linux kernel, but it was the best tool for the job, and the angst merely simmered on the back burner for a few years.

Eventually Andrew Tridgell, known as the creator of the Samba project (the Microsoft network file system compatibility for UNIX and Linux), tried to reverse-engineer the BitKeeper network protocols so that he could write an open source client for the server. This action, whether right or wrong, precipitated a rift in the space-time continuum: McVoy completely and abruptly rescinded the Linux project’s license for BitKeeper, bringing kernel development to a halt.

The loss of the license in turn precipitated the development of both Git and Mercurial, as you can read about in Zack Brown’s “A Git origin story” in Linux Journal. Here’s the short version.

Torvalds created his own VCS, Git, which he says he named after himself. For nonnative speakers of English, a “git” is defined as “an unpleasant, contemptible, or frustratingly obtuse person.” The Git VCS took over support of the Linux kernel in a matter of weeks.

At about the same time, Matt Mackall started writing another VCS, Mercurial. He claims to have named it Mercurial after McVoy’s temperament, given how abruptly the Linux project’s BitKeeper license was canceled. Git and Mercurial offer similar features and performance.
But over time, Git has largely won over the hearts and minds of developers. Mercurial remains in use but is far less popular. Git is preinstalled on many operating system distributions, and it can easily be added to most others, using rpm, dnf, pkg_add, or whatever standard tool you use for installing third-party software. On systems that don’t include such a tool (such as Windows), you can download the binary, or download Git’s source code and build it yourself.

Why not Mercurial?

One of the largest projects using Mercurial was Sun’s Java OpenJDK, spread over multiple repositories.

There is nothing inherently wrong with Mercurial. It’s just that Mercurial has been seen as an also-ran for years, and many developers simply skipped it unless they were really dedicated to some project that used it. There are still many projects using it, though; there’s a list at Mercurial’s website that is somewhat out of date (it still shows Java). Interestingly, that page ends with a link to a list of projects that moved from Mercurial to Git.

Why Git? Why GitHub?

Git is now almost the de facto standard in this area; it’s hard to get a job as a developer unless you are comfortable using Git. Git offers branching, which lets developers work on multiple sets of changes at the same time, and merging, which pulls the changes from one or more branches into another branch, typically the main one.

A related platform is GitHub, a cloud-based Git service with plenty of additions. You create an account and then create any number of Git repositories. Repositories can be public (for open source or open info) or private. GitHub is free for individuals and reasonably priced for organizations.

To see what GitHub looks like, check out one of my smaller repositories, which is for PdfShow, a tiny PDF-based presentation tool, or one a bit larger, javasrc, which has sample Java program fragments (including the code samples from my Java Cookbook).
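Before turning to GitHub’s extras, it may help to see Git’s branching and merging in a minimal sketch. Everything named here is a placeholder chosen for illustration and not taken from the OpenJDK project: the scratch repository demo, the file greeting.txt, the branch name feature, and the commit identity. It assumes only that the git command is installed.

```shell
#!/bin/sh
# Minimal branching-and-merging sketch in a throwaway repository.
# All names below (demo, greeting.txt, feature) are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

# Use a local identity so commits work on an unconfigured machine.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Name the initial branch "main" regardless of the installed Git's default.
git symbolic-ref HEAD refs/heads/main

# First commit on main.
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# Branch off, change the file, and commit on the new branch.
git checkout -q -b feature
echo "hello, world" > greeting.txt
git commit -q -am "Expand greeting"

# Return to main and merge the branch in, recording a merge commit.
git checkout -q main
git merge -q --no-ff -m "Merge feature into main" feature

# Show the resulting history as a graph.
git --no-pager log --oneline --graph
```

The --no-ff option is used only so that the merge appears as its own commit in the graph; without it, Git would simply fast-forward main in this simple case.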
There are several advantages of GitHub over plain or do-it-yourself Git hosting.

A comprehensive web interface. The GitHub web interface has been smoothed over by many talented hands, and it works well. The browser interface allows you to view repositories’ contents and history, view individual files, and edit and commit text files.

Issues processing. GitHub provides a tracking facility called Issues, so named because not all things that need attention are bugs. You can tag issues as bugs, enhancements, or any of about a dozen other tags, and you can even define your own tags. In a team environment, the repository’s administrator can assign responsibility for an issue to specific individuals.

Pull requests. A pull request is a set of changes (a diff) submitted to the owner of a repository, asking the owner to pull those changes into a branch of that repository, usually the main one. This is very powerful because, unlike mailing diff listings around, a pull request is aware of other activity on the repository and will tell the owner if the diff would still apply cleanly. The pull request allows other developers to comment, and it doesn’t get mangled by different email systems’ treatment of line endings, spaces, and tabs. A pull request’s changes do not have to come from the repository that receives them; in fact, they commonly do not. A person making changes would typically fork (that is, make their own copy of) a repository, work on that fork, test the changes, and then send a pull request to the admin of the original repository. That admin could accept the pull request to merge the code in, offer comments on how to improve the code, or reject the code outright.

Social coding. These features add up to one of GitHub’s mottoes: social coding. Teams or individual developers can work together without being in the same office or even the same organization. Social coding was big before the pandemic, and of course it’s even bigger now.

Actions. In GitHub, actions let you run one or more workflow-related tasks when commits are made.
An action can be as simple or complex as you need: from something as simple yet important as building and running tests up to building a Docker image and deploying it at scale with Kubernetes or some other technology. If you are logged in and on your own repository’s main page, GitHub lists two dozen or so ready-to-configure actions under the Actions tab. There is also a public list of more than 8,300 prebuilt actions.

As you can see, there are plenty of reasons why the OpenJDK project chose Git and GitHub.

Why the move, and why now?

Before the move from Mercurial, the OpenJDK lived in several repositories. It was decided that it would be best to consolidate into a single repository.

At the time this move was first planned, in the Java 9 era, the JDK consisted of eight different repositories. To build, a developer had to download all eight repositories into the correct directory structure. On Mercurial, commits that involved more than one repository were common, but there was no mechanism to enforce atomicity or consistency across them.

To quote from JEP 296: Consolidate the JDK forest into a single repository, created in 2016:

The multiplicity of repos presents a larger than necessary barrier to entry to new developers and has led to workarounds such as the “get source” script.

The following list of completed JEPs and project documents provides insight into the four major stages of the GitHub consolidation and migration:

2016: JEP 296: Consolidate the JDK forest into a single repository
2018: Project Skara: Evaluate alternatives to Mercurial
2019: JEP 357: Migrate from Mercurial to Git
2019: JEP 369: Migrate to GitHub

The team has succeeded admirably: The JDK is now built out of a single repository.