Continuous Integration (CI)

What Is a Continuous Integration System?

When developing software, we want to verify that our new features and bug fixes are safe and work as expected. We do this by running tests against our code. Developers sometimes run tests locally to verify their changes, but they may not have time to test their code on every system the software runs on. Further, as more tests are added, the time required to run them all, even locally, becomes impractical. Continuous integration systems were created to address this.

Continuous Integration (CI) systems are dedicated systems used to test new code. Upon a commit to the code repository, it is the responsibility of the continuous integration system to verify that this commit will not break any tests. To do this, the system must be able to fetch the new changes, run the tests and report its results. Like any other system, it should also be failure resistant. This means if any part of the system fails, it should be able to recover and continue from that point.

This test system should also handle load well, so that we can get test results in a reasonable amount of time in the event that commits are being made faster than the tests can be run. We can achieve this by distributing and parallelizing the testing effort. This project will demonstrate a small, bare-bones distributed continuous integration system that is designed for extensibility.

Project Limitations and Notes

Due to the limitations of code length and unittest, I simplified test discovery. We will only run tests that are in a directory named tests within the repository.

Continuous integration systems monitor a master repository, which is usually hosted on a web server rather than on the CI system’s local file system. For our example, we will use a local repository instead of a remote one.

Continuous integration systems need not run on a fixed, regular schedule. You can also have them run every few commits, or per-commit. For our example case, the CI system will run periodically. This means that if it is set up to check for changes every five seconds, then at the end of each five-second period it will run tests against the most recent commit made during that period. It won’t test every commit made within that period of time, only the most recent one.
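The periodic behaviour can be sketched as a small polling loop. This is only an illustration under stated assumptions: poll_for_changes, get_latest_commit, on_new_commit, and max_polls are hypothetical names, and the two callables stand in for whatever the observer really does to read the repository and to trigger a test run.

```python
import time


def poll_for_changes(get_latest_commit, on_new_commit, interval=5, max_polls=None):
    """Check for a new commit every `interval` seconds.

    get_latest_commit: callable returning the newest commit ID (assumed).
    on_new_commit: callable invoked with that ID when it has changed (assumed).
    max_polls: optional limit so the loop can stop; None means run forever.
    """
    last_seen = get_latest_commit()
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        latest = get_latest_commit()
        # Only the newest commit is compared, so any commits made in
        # between two polls are never tested individually.
        if latest != last_seen:
            on_new_commit(latest)
            last_seen = latest
        polls += 1
```

If commits "b" and "c" both land between two polls, only "c" is passed to on_new_commit, which matches the limitation described above.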

This CI system is designed to check periodically for changes in a repository. In real-world CI systems, you can also have the repository observer get notified by a hosted repository. GitHub, for example, provides “post-commit hooks” which send out notifications to a URL. Following this model, the repository observer would be called by the web server hosted at that URL to respond to that notification. Since this is complex to model locally, we’re using an observer model, where the repository observer checks for changes instead of being notified.

CI systems also have a reporter aspect, where the test runner reports its results to a component that makes them available for people to see, perhaps on a webpage. For simplicity, this project gathers the test results and stores them as files in the file system local to the dispatcher process.
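A minimal sketch of this kind of file-based reporting is shown below. The function name store_result, the one-file-per-commit layout, and the directory name used in the usage note are assumptions for illustration; the project may organize its result files differently.

```python
import os


def store_result(results_dir, commit_id, output):
    """Write the test output for one commit to <results_dir>/<commit_id>.

    Hypothetical helper: one plain-text file per commit, named after
    the commit ID.
    """
    os.makedirs(results_dir, exist_ok=True)
    path = os.path.join(results_dir, commit_id)
    with open(path, "w") as result_file:
        result_file.write(output)
    return path
```

For example, store_result("test_results", "abc123", "Ran 2 tests\n") would leave the output in test_results/abc123, where a person or another tool can read it later.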

Note that the architecture this CI system uses is just one possibility among many. This approach has been chosen to simplify our case study into three main components.

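This project uses Git for the repository that contains the code to be tested. Only standard source code management calls are used, so if you are unfamiliar with Git but familiar with another version control system (VCS) such as svn or Mercurial, you can still follow along.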

Introduction

The basic structure of a continuous integration system consists of three components: an observer, a test job dispatcher, and a test runner. The observer watches the repository. When it notices that a commit has been made, it notifies the job dispatcher. The job dispatcher then finds a test runner and gives it the commit number to test.
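The hand-off between these components can be sketched in a few lines. This is an in-process illustration only, not the project's dispatcher: the class name Dispatcher and the methods register_runner and notify_commit are hypothetical, and real components would exchange these messages between separate processes.

```python
from collections import deque


class Dispatcher:
    """Hypothetical in-process stand-in for the test job dispatcher."""

    def __init__(self):
        self.idle_runners = deque()     # runners waiting for work
        self.pending_commits = deque()  # commits waiting for a runner
        self.assignments = {}           # commit ID -> runner ID

    def register_runner(self, runner_id):
        """Called when a test runner becomes available."""
        self.idle_runners.append(runner_id)
        self._assign()

    def notify_commit(self, commit_id):
        """Called by the observer when it notices a new commit."""
        self.pending_commits.append(commit_id)
        self._assign()

    def _assign(self):
        # Pair waiting commits with idle runners, oldest first.
        while self.idle_runners and self.pending_commits:
            commit_id = self.pending_commits.popleft()
            self.assignments[commit_id] = self.idle_runners.popleft()
```

A commit that arrives while no runner is idle simply waits in pending_commits until a runner registers, which is the backlog situation that adding more runners is meant to relieve.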

There are many ways to architect a CI system. We could have the observer, dispatcher and runner be the same process on a single machine. This approach is very limited since there is no load handling, so if more changes are added to the repository than the CI system can handle, a large backlog will accrue. This approach is also not fault-tolerant at all; if the computer it is running on fails or there is a power outage, there are no fallback systems, so no tests will run. The ideal system would be one that can handle as many test jobs as requested, and will do its best to compensate when machines go down.

To build a CI system that is fault-tolerant and load-bearing, in this project, each of these components is its own process. This will let each process be independent of the others, and let us run multiple instances of each process. This is useful when you have more than one test job that needs to be run at the same time. We can then spawn multiple test runners in parallel, allowing us to run as many jobs as needed, and prevent us from accumulating a backlog of queued tests.

In this project, not only do these components run as separate processes, but they also communicate via sockets, which will let us run each process on a separate, networked machine. A unique host/port address is assigned to each component, and each process can communicate with the others by posting messages at the assigned addresses.

This design will let us handle hardware failures on the fly by enabling a distributed architecture. We can have the observer run on one machine, the test job dispatcher on another, and the test runners on another, and they can all communicate with each other over a network. If any of these machines go down, we can schedule a new machine to go up on the network, so the system becomes fail-safe.

This project does not include auto-recovery code, as that is dependent on your distributed system’s architecture, but in the real world, CI systems are run in a distributed environment like this so they can have failover redundancy (i.e., we can fall back to a standby machine if one of the machines a process was running on becomes defunct).

For the purposes of this project, each of these processes will be started locally and manually on distinct local ports.

Files in this Project

This project contains Python files for each of these components: the repository observer (repo_observer.py), the test job dispatcher (dispatcher.py), and the test runner (test_runner.py). These three processes communicate with each other using sockets, and since the code used to transmit information is shared by all of them, it lives in a helpers.py file. Each process imports the communicate function from there instead of duplicating it.
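To make the idea of a shared socket helper concrete, here is a minimal sketch of what a communicate function could look like, together with a tiny server to talk to. Only the name communicate comes from the text above; its body, the StatusHandler class, the start_server function, and the "status" message are assumptions made for this example and may differ from the project's helpers.py.

```python
import socket
import socketserver
import threading


def communicate(host, port, request):
    """Send one text request to (host, port) and return the text reply."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(request.encode())
        # A single recv is enough for the short messages in this sketch.
        return conn.recv(1024).decode()


class StatusHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: answers "status" with "OK"."""

    def handle(self):
        message = self.request.recv(1024).decode().strip()
        reply = "OK" if message == "status" else "unknown command"
        self.request.sendall(reply.encode())


def start_server(host="127.0.0.1", port=0):
    """Serve StatusHandler on a background thread; port 0 picks a free port."""
    server = socketserver.TCPServer((host, port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After server = start_server(), the chosen address is available as server.server_address, and communicate(host, port, "status") returns the handler's reply. Because every component would need the same few lines to send a message, keeping them in one shared module avoids three copies of the same code.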

There are also bash script files used by these processes. These script files are used to execute bash and git commands in an easier way than constantly using Python’s operating system-level modules like os and subprocess.
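As a sketch of how a Python process might hand work to such a script, the helper below runs a bash script with arguments and captures its exit code and output. The name run_script is hypothetical, and the example assumes bash is available on the PATH; the project's own processes may invoke their scripts differently.

```python
import subprocess


def run_script(script_path, *args):
    """Run a bash script and return (exit_code, combined_output).

    Hypothetical helper: stderr is merged into stdout so the caller
    gets a single block of text to log or report.
    """
    completed = subprocess.run(
        ["bash", script_path, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return completed.returncode, completed.stdout
```

Keeping the git commands in the script means the Python side only has to check the exit code, rather than building up each command with subprocess calls of its own.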

Lastly, there is a tests directory, which contains two example tests the CI system will run. One test will pass, and the other will fail.

Initial Setup

While this CI system is ready to work in a distributed system, let us start by running everything locally on one computer so we can get a grasp on how the CI system works without adding the risk of running into network-related issues. If you wish to run this in a distributed environment, you can run each component on its own machine.

Continuous integration systems run tests by detecting changes in a code repository, so to start, we will need to set up the repository our CI system will monitor.

Let’s call this test_repo:

$ mkdir test_repo
$ cd test_repo
$ git init

This will be our master repository. This is where developers check in their code, so our CI should pull this repository and check for commits, then run tests. The thing that checks for new commits is the repository observer.

The repository observer works by checking commits, so we need at least one commit in the master repository. Let’s commit our example tests so we have some tests to run.

Copy the tests folder from this code base to test_repo and commit it:

$ cp -r /this/directory/tests /path/to/test_repo/
$ cd /path/to/test_repo
$ git add tests/
$ git commit -m "add tests"

Now you have a commit in the master repository.

The repo observer component will need its own clone of the code, so it can detect when a new commit is made. Let’s create a clone of our master repository, and call it test_repo_clone_obs:

$ git clone /path/to/test_repo test_repo_clone_obs
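One way the observer could use its clone to notice a new commit is to record the commit that HEAD points to, pull, and compare. The sketch below does this with git rev-parse HEAD and git pull. The function names current_commit and pull_and_check are hypothetical, and the project's own scripts may detect changes differently.

```python
import subprocess


def current_commit(repo_path):
    """Return the commit ID that HEAD points to in the clone at repo_path."""
    out = subprocess.check_output(["git", "rev-parse", "HEAD"], cwd=repo_path)
    return out.decode().strip()


def pull_and_check(repo_path):
    """Pull the clone; return the new commit ID, or None if nothing changed."""
    before = current_commit(repo_path)
    subprocess.check_call(["git", "pull", "--quiet"], cwd=repo_path)
    after = current_commit(repo_path)
    return after if after != before else None
```

Calling pull_and_check("test_repo_clone_obs") right after cloning should return None. Once someone commits to the master repository, the next call returns the ID of the newest commit, which is what the observer would pass on for testing.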

The test runner will also need its own clone of the code, so it can check out the repository at a given commit and run the tests. Let’s create another clone of our master repository, and call it test_repo_clone_runner:

$ git clone /path/to/test_repo test_repo_clone_runner
