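Fixing GitLab Runner Failures Caused by a Storage Migration

Background

Our team uses gitlab-runner to run code style checks. Operations recently switched storage, and the checks started failing. The runner had been installed by someone else long ago and nobody had maintained it since, so I had to fix it myself; this post records the whole process. It assumes basic familiarity with gitlab-runner; if you are new to it, read the articles listed under References first.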

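Troubleshooting

Our code checks mainly rely on cppcheck and cpplint. The GitLab Runner job log showed that the runner could no longer find what it needed under the old Docker data path: the error below is for the cache volume directory, and the check image was missing as well. The first step was therefore to restore the image.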

Preparing the "docker" executor
00:48
Using Docker executor with image 10.1.107.12:5000/dy/cppcheck ...
ERROR: Preparation failed: adding cache volume: set volume permissions: running permission container "a192958483eb385c5a19432a82b4bfd20d54a7a7cd28a35e3c3f85938bc8ab31" for volume "runner-bcvnjjb9-project-11377-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70": starting permission container: Error response from daemon: error evaluating symlinks from mount source "/dy_video1/docker-data/volumes/runner-bcvnjjb9-project-11377-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70/_data": lstat /dy_video1/docker-data/volumes/runner-bcvnjjb9-project-11377-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70: no such file or directory (linux_set.go:105:5s)
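
The path in the error points at a Docker data directory that no longer exists after the storage switch. A minimal sketch for checking whether a named volume's directory is still present under Docker's current data root follows. The function name `check_volume_dir` is hypothetical, and it assumes the default layout in which named volumes live under `<data root>/volumes`.

```shell
# Hypothetical helper: report whether a named volume's directory still
# exists under Docker's current data root.
# Assumes named volumes are stored in <data root>/volumes/<name>.
check_volume_dir() {
    volume_name="$1"
    # `docker info` exposes the data root as the DockerRootDir field.
    root_dir="$(docker info --format '{{ .DockerRootDir }}')" || return 2
    volume_dir="$root_dir/volumes/$volume_name"
    if [ -d "$volume_dir" ]; then
        echo "present: $volume_dir"
    else
        echo "missing: $volume_dir"
        return 1
    fi
}
```

For the log above, the call would be `check_volume_dir runner-bcvnjjb9-project-11377-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70`.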

Docker registry mirrors

The official Docker registry is blocked from our network, so I switched to mirrors hosted in China.

sudo mkdir -p /etc/docker
### Put the following JSON into /etc/docker/daemon.json
sudo vim /etc/docker/daemon.json
{
    "registry-mirrors": [
        "https://do.nark.eu.org",
        "https://dc.j8.work",
        "https://docker.m.daocloud.io",
        "https://dockerproxy.com",
        "https://docker.mirrors.ustc.edu.cn",
        "https://docker.nju.edu.cn"
    ]
}
### Reload and restart Docker so the mirrors take effect
sudo systemctl daemon-reload
sudo systemctl restart docker
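
Editing the file by hand makes it easy to leave a JSON syntax error behind, and the Docker daemon will not start with an invalid `daemon.json`. A minimal sketch that writes the same mirror list non-interactively and validates it before any restart is shown here. The function name `write_mirror_config` is hypothetical, validation assumes `python3` happens to be installed (it is skipped otherwise), and the function overwrites whatever file already exists at the target path.

```shell
# Hypothetical helper: write the registry mirror list to daemon.json.
# WARNING (assumption): this overwrites any existing file at the target path.
write_mirror_config() {
    target="${1:-/etc/docker/daemon.json}"
    mkdir -p "$(dirname "$target")" || return 1
    cat > "$target" <<'EOF'
{
    "registry-mirrors": [
        "https://do.nark.eu.org",
        "https://dc.j8.work",
        "https://docker.m.daocloud.io",
        "https://dockerproxy.com",
        "https://docker.mirrors.ustc.edu.cn",
        "https://docker.nju.edu.cn"
    ]
}
EOF
    # Validate the JSON when python3 is available; skip the check otherwise.
    if command -v python3 > /dev/null 2>&1; then
        python3 -m json.tool "$target" > /dev/null || return 1
    fi
}
```

From a root shell, `write_mirror_config && systemctl daemon-reload && systemctl restart docker` would then apply the change only if the file was written and passed validation.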

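Restoring the image

I installed cppcheck and cpplint in a container started from the centos image, exported the container once the installation finished, then imported the archive again under the image name the jobs expect.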

### Pull the centos image
docker pull centos:7
### Start a centos container and install cppcheck and cpplint inside it
docker run -it centos:7 /bin/bash
### Switch to the Aliyun yum mirror
curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
### Install the basic build tools
yum install -y wget gcc gcc-c++ make cmake
### Download the cppcheck source and build it
wget -O cppcheck.tar.gz https://github.com/danmar/cppcheck/archive/refs/tags/2.12.1.tar.gz
tar -xvzf cppcheck.tar.gz
cd cppcheck-2.12.1 && \
    mkdir build && \
    cd build && \
    cmake .. && \
    cmake --build . && \
    make install SRCDIR={path to extracted cppcheck}/cppcheck-2.12.1/build CFGDIR={path to extracted cppcheck}/cppcheck-2.12.1/cfg FILESDIR=/usr/bin
ln -s /usr/local/bin/cppcheck /usr/bin/cppcheck
## Install cpplint; Python 3 (version 3.8 or later) has to be installed first
yum install -y zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gcc make
yum install libffi-devel -y
wget https://www.python.org/ftp/python/3.8.12/Python-3.8.12.tgz
tar -xzvf Python-3.8.12.tgz
cd Python-3.8.12/
./configure
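make && make install
### Note: yum on CentOS 7 runs on the system Python 2 at /usr/bin/python, so finish all yum installs before replacing it
rm -rf /usr/bin/python
ln -s /usr/local/bin/python3 /usr/bin/python
pip3 install cpplint -i https://pypi.tuna.tsinghua.edu.cn/simple
### Exit the container
exit
### Find the container ID
docker ps -a
### Export the container (79362966cbdc is the ID found in the previous step)
docker export 79362966cbdc > container.tar
### Import the tar as an image named 10.1.107.12:5000/dy/cppcheck
docker import container.tar 10.1.107.12:5000/dy/cppcheck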
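
An image created with `docker import` contains only the filesystem; metadata such as `CMD` and `ENTRYPOINT` is not carried over, so any command has to be given explicitly when running it. A minimal sketch for confirming that both tools are usable in the imported image is given here; the function name `verify_check_image` is hypothetical.

```shell
# Hypothetical helper: confirm that cppcheck and cpplint both run inside
# the imported image. The command is passed explicitly because an image
# created by `docker import` has no default CMD.
verify_check_image() {
    image="$1"
    docker run --rm "$image" cppcheck --version || return 1
    docker run --rm "$image" cpplint --version || return 1
}
```

For the image built above, the call would be `verify_check_image 10.1.107.12:5000/dy/cppcheck`.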

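Registering GitLab Runner

I took many detours while pulling the centos image. The biggest one was configuring a Docker proxy to speed up docker pull: a mistake on my part during that step caused the GitLab Runner image to be lost, so the runner had to be set up again.

Running GitLab Runner in Docker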

### Pull the gitlab-runner image; the alpine variant is smaller
docker pull gitlab/gitlab-runner:alpine
### Start the Runner container with local system volume mounts
docker run -d --name gitlab-runner --restart always   -v /srv/gitlab-runner/config:/etc/gitlab-runner   -v /var/run/docker.sock:/var/run/docker.sock   gitlab/gitlab-runner:alpine
### Enter the container to register
docker exec -it gitlab-runner bash
### Register again; the GitLab URL and REGISTRATION_TOKEN are shown in GitLab under Settings > CI/CD
sudo gitlab-runner register --url https://XXXX/ --registration-token $REGISTRATION_TOKEN
> Please enter the gitlab-ci coordinator URL (e.g. https://gitlab.com ) ### just press Enter
> Please enter the gitlab-ci token for this runner #### just press Enter
> Please enter the gitlab-ci description for this runner ### any description will do
> Please enter the gitlab-ci tags for this runner (comma separated): ### define a tag; the one I used is on the next line
ai-bus1
> Please enter the executor: ssh, docker+machine, docker-ssh+machine, kubernetes, docker, parallels, virtualbox, docker-ssh, shell:
docker
> Please enter the Docker image (eg. ruby:2.1):
alpine:latest
### Exit the container
exit
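
The interactive prompts above can also be answered on the command line. A minimal sketch using `gitlab-runner register --non-interactive` with the same answers follows. The function name `register_runner` and the description text are hypothetical, and it assumes the runner container is named `gitlab-runner` as in the `docker run` command above. Newer GitLab releases replace registration tokens with runner authentication tokens, so check which flow your instance expects before relying on this.

```shell
# Hypothetical helper: register the runner without interactive prompts.
# Assumes the runner container is named "gitlab-runner".
register_runner() {
    gitlab_url="$1"
    registration_token="$2"
    docker exec gitlab-runner gitlab-runner register \
        --non-interactive \
        --url "$gitlab_url" \
        --registration-token "$registration_token" \
        --description "code-check runner" \
        --tag-list "ai-bus1" \
        --executor "docker" \
        --docker-image "alpine:latest"
}
```

It would be called as `register_runner https://XXXX/ "$REGISTRATION_TOKEN"`, with the URL and token taken from the same Settings page.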

Another error

  1. Tick "Indicates whether this runner can pick jobs without tags"
    1. Run untagged jobs
  2. Pull a new branch in the project to test; the job reports an error
WARNING: Failed to pull image with policy "always": Error response from daemon: Get "http://10.1.107.12:5000/v2/": dial tcp 10.1.107.12:5000: connect: connection refused (manager.go:237:0s)
ERROR: Job failed: failed to pull image "10.1.107.12:5000/dy/cppcheck" with specified policies [always]: Error response from daemon: Get "http://10.1.107.12:5000/v2/": dial tcp 10.1.107.12:5000: connect: connection refused (manager.go:237:0s)

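The log shows that nothing is accepting connections at 10.1.107.12:5000, the registry address embedded in the image name. My guess was that this came from running gitlab-runner in Docker without binding any ports. The image field in .gitlab-ci.yml is 10.1.107.12:5000/dy/cppcheck, and the solution I chose was to change the image name.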

#### Change the image field in .gitlab-ci.yml
image: codecheck:v1
#### Remove the 10.1.107.12:5000/dy/cppcheck image
docker rmi 10.1.107.12:5000/dy/cppcheck
#### Import container.tar under the name codecheck:v1
docker import container.tar codecheck:v1

Yet another error

WARNING: Failed to pull image with policy "always": Error response from daemon: pull access denied for codecheck, repository does not exist or may require 'docker login': denied: requested access to the resource is denied (manager.go:237:2s)
ERROR: Job failed: failed to pull image "codecheck:v1" with specified policies [always]: Error response from daemon: pull access denied for codecheck, repository does not exist or may require 'docker login': denied: requested access to the resource is denied (manager.go:237:2s)

### Edit the gitlab-runner config on the host
vim /etc/gitlab-runner/config.toml  ### add pull_policy = "never"
#### Edit the gitlab-runner config inside the container
docker exec -it gitlab-runner bash
vim /etc/gitlab-runner/config.toml ### add pull_policy = "never"
#### Reference config after the change
[[runners]]
  name = "XXXX"
  url = "https://gitlab.com/"
  id = 412
  token = "XXXX"
  token_obtained_at = 2023-10-19T02:08:04Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    pull_policy = "never"
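### What each pull_policy value does
pull_policy = "never"           ### Only images already present on the Runner's Docker host can be used; nothing is pulled
pull_policy = "if-not-present"  ### The Runner first checks whether the image exists locally; if it does, the local copy is used, otherwise the image is pulled
pull_policy = "always"          ### Default policy (applies when pull_policy is not set); the image is always pulled from the registry, so an image that exists only locally fails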
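
The edit can also be scripted instead of done in vim. A minimal sketch that sets `pull_policy` under every `[runners.docker]` section of a `config.toml` is given here. The function name `set_pull_policy` is hypothetical, and it assumes the file only configures Docker executors, because it removes every existing `pull_policy` line before adding the new one.

```shell
# Hypothetical helper: set pull_policy under each [runners.docker] section.
# Assumption: the config only contains Docker executors, since every
# existing pull_policy line in the file is removed first.
set_pull_policy() {
    config_file="$1"
    policy="$2"
    tmp_file="$(mktemp)" || return 1
    awk -v policy="$policy" '
        # Drop any existing pull_policy line so the setting is not duplicated.
        /^[ \t]*pull_policy[ \t]*=/ { next }
        { print }
        # Add the new value right after the [runners.docker] header.
        /^[ \t]*\[runners\.docker\][ \t]*$/ {
            printf "    pull_policy = \"%s\"\n", policy
        }
    ' "$config_file" > "$tmp_file" && cat "$tmp_file" > "$config_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}
```

For example, `set_pull_policy /etc/gitlab-runner/config.toml never`. With `never`, the image named in .gitlab-ci.yml must already exist on the Docker host that runs the jobs.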

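I tested again; the job passed and the problem was fixed.

Follow-up

After finishing this post, I kept thinking about the root cause while showering at home after work. The error reported by the previously registered runner resembles the second-to-last error above, so I suspect the fix really came down to two things: the missing image, and the runner being started without bound ports. I stopped the gitlab-runner Docker container and disabled the ai-bus1 runner in GitLab, then ran the test again, and it passed. This shows that rebuilding the cpplint/cppcheck image and changing the image name in the image field are enough to fix the problem.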

References

Updated: 2024-11-13