Cross compiling with Bazel
This post is a short introduction to Bazel and how to cross-compile with it. I will explain how to build for the host platform, as well as for multiple different targets.
You can get the partial repositories used in this exercise here.
Installing Bazelisk
I highly recommend using Bazelisk for managing your Bazel installation. TLDR:
$ wget https://github.com/bazelbuild/bazelisk/releases/download/v1.6.1/bazelisk-linux-amd64
$ sudo mv bazelisk-linux-amd64 /usr/local/bin/bazel
$ sudo chmod +x /usr/local/bin/bazel
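Bazelisk decides which Bazel release to download from a .bazelversion file in the workspace root (or from the USE_BAZEL_VERSION environment variable). A minimal sketch for pinning it follows; the helper name pin_bazel_version is made up for illustration, and 3.4.1 is simply the version that shows up in the build log later in this post:

```shell
#!/bin/sh
# pin_bazel_version DIR VERSION
# Writes VERSION into DIR/.bazelversion, the file Bazelisk reads to decide
# which Bazel release to fetch. (Hypothetical helper name.)
pin_bazel_version() {
    dir="$1"
    version="$2"
    printf '%s\n' "$version" > "$dir/.bazelversion"
}

# Example: pin_bazel_version . 3.4.1
```

Committing this file alongside WORKSPACE keeps everyone on the same Bazel version.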
Compiling "Hello World!"
The simplest possible structure for a C++ project looks like this:
$ tree
.
├── BUILD
├── main.cpp
└── WORKSPACE
$ cat BUILD
cc_binary(
    name = "hello",
    srcs = ["main.cpp"],
)
$ cat main.cpp
#include <iostream>
int main() {
    std::cout << "Hello World!" << std::endl;
}
The WORKSPACE file defines the root of the source tree. In our case it is empty, but usually it contains various definitions needed for your project to build.
The BUILD file holds the definition of our single target called "hello", which is built from main.cpp.
Everything you need to know about Bazel's terminology is here.
We can now build:
$ bazel build //:hello
2020/09/02 17:21:34 Downloading https://releases.bazel.build/3.4.1/release/bazel-3.4.1-linux-x86_64...
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Analyzed target //:hello (14 packages loaded, 47 targets configured).
INFO: Found 1 target...
Target //:hello up-to-date:
bazel-bin/hello
INFO: Elapsed time: 8.451s, Critical Path: 0.62s
INFO: 2 processes: 2 linux-sandbox.
INFO: Build completed successfully, 6 total actions
And run:
$ bazel run //:hello
...
Hello World!
or directly:
$ ./bazel-bin/hello
Hello World!
Downloading dependencies
In order to set up our cross-compilation environment we need to download our toolchains. Doing this manually is tedious; fortunately, Bazel already includes the necessary facilities for that.
For each target we need to download two archives: the compiler and the sysroot.
$ tree
.
├── BUILD
├── main.cpp
├── third_party
│   ├── BUILD
│   ├── deps.bzl
│   └── toolchains
│       ├── aarch64-rpi3-linux-gnu-sysroot.BUILD
│       ├── aarch64-rpi3-linux-gnu.BUILD
│       ├── arm-cortex_a8-linux-gnueabihf-sysroot.BUILD
│       ├── arm-cortex_a8-linux-gnueabihf.BUILD
│       ├── BUILD
│       └── toolchains.bzl
└── WORKSPACE
All the *.BUILD files contain the specification of how to use the content of the downloaded artifacts. The *.bzl files contain Starlark code that defines the logic of downloading files and exposing them as targets.
The entry point that tells Bazel it needs to download external content is the WORKSPACE file:
$ cat WORKSPACE
load("//third_party:deps.bzl", "deps")
deps()
It says: load a function called deps from the file deps.bzl in the third_party package, and then call it.
The deps.bzl file follows the same pattern:
$ cat third_party/deps.bzl
load("//third_party/toolchains:toolchains.bzl", "toolchains")
def deps():
    toolchains()
It provides a level of indirection that hides the details of external packages from the WORKSPACE file.
The toolchains.bzl file, on the other hand, declares the actual downloads:
$ cat third_party/toolchains/toolchains.bzl
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
URL_TOOLCHAIN = "https://github.com/ltekieli/devboards-toolchains/releases/download/v2020.09.01/"
URL_SYSROOT = "https://github.com/ltekieli/buildroot/releases/download/v2020.09.01/"
def toolchains():
    if "aarch64-rpi3-linux-gnu" not in native.existing_rules():
        http_archive(
            name = "aarch64-rpi3-linux-gnu",
            build_file = Label("//third_party/toolchains:aarch64-rpi3-linux-gnu.BUILD"),
            url = URL_TOOLCHAIN + "aarch64-rpi3-linux-gnu.tar.gz",
            sha256 = "35a093524e35061d0f10e302b99d164255dc285898d00a2b6ab14bfb64af3a45",
        )
    if "aarch64-rpi3-linux-gnu-sysroot" not in native.existing_rules():
        http_archive(
            name = "aarch64-rpi3-linux-gnu-sysroot",
            build_file = Label("//third_party/toolchains:aarch64-rpi3-linux-gnu-sysroot.BUILD"),
            url = URL_SYSROOT + "aarch64-rpi3-linux-gnu-sysroot.tar.gz",
            sha256 = "56f3d84c9adf192981a243f27e6970afe360a60b72083ae06a8aa5c0161077a5",
            strip_prefix = "sysroot",
        )
    if "arm-cortex_a8-linux-gnueabihf" not in native.existing_rules():
        http_archive(
            name = "arm-cortex_a8-linux-gnueabihf",
            build_file = Label("//third_party/toolchains:arm-cortex_a8-linux-gnueabihf.BUILD"),
            url = URL_TOOLCHAIN + "arm-cortex_a8-linux-gnueabihf.tar.gz",
            sha256 = "6176e47be8fde68744d94ee9276473648e2e3d98d22578803d833d189ee3a6f0",
        )
    if "arm-cortex_a8-linux-gnueabihf-sysroot" not in native.existing_rules():
        http_archive(
            name = "arm-cortex_a8-linux-gnueabihf-sysroot",
            build_file = Label("//third_party/toolchains:arm-cortex_a8-linux-gnueabihf-sysroot.BUILD"),
            url = URL_SYSROOT + "arm-cortex_a8-linux-gnueabihf-sysroot.tar.gz",
            sha256 = "89a72cc874420ad06394e2333dcbb17f088c2245587f1147ff9da124bb60328f",
            strip_prefix = "sysroot",
        )
First it loads the http_archive rule, then it defines a function with a repeating pattern inside:
if "NAME_OF_THE_EXTERNAL_RESOURCE" not in native.existing_rules():
    http_archive(
        name = "NAME_OF_THE_EXTERNAL_RESOURCE",
        build_file = Label("//path/to:buildfile.BUILD"),
        url = SOME_URL,
        sha256 = SOME_SHA256,
    )
This reads: if no such rule is defined yet, define it using http_archive with the given name, build file, URL and checksum.
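The sha256 attribute pins the exact content of the archive, so Bazel rejects a download whose checksum differs. One way to obtain the value for a new archive is to hash a local copy of it. The sketch below assumes sha256sum from GNU coreutils is installed; archive_sha256 is a hypothetical helper name:

```shell
#!/bin/sh
# archive_sha256 FILE
# Prints only the hex digest of FILE, ready to paste into the sha256
# attribute of an http_archive rule. (Hypothetical helper name.)
archive_sha256() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# Example: archive_sha256 aarch64-rpi3-linux-gnu.tar.gz
```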
A particular *.BUILD file contains the definitions of targets coming from the downloaded artifact, for example:
$ cat third_party/toolchains/aarch64-rpi3-linux-gnu.BUILD
package(default_visibility = ['//visibility:public'])
filegroup(
    name = 'toolchain',
    srcs = glob([
        '**',
    ]),
)
It specifies the default visibility of all the targets in this package to be public, and creates a target "toolchain" which is a handle to all the files inside the artifact.
We can build such a target:
$ bazel build @aarch64-rpi3-linux-gnu//:toolchain
And peek at what Bazel did for us:
$ tree -L 1 bazel-02_deps/external/aarch64-rpi3-linux-gnu/
├── aarch64-rpi3-linux-gnu
├── bin
├── BUILD.bazel
├── build.log.bz2
├── include
├── lib
├── libexec
├── share
└── WORKSPACE
Bazel downloaded the artifact into its cache, copied our aarch64-rpi3-linux-gnu.BUILD file as BUILD.bazel, and added a WORKSPACE file indicating that this is another source tree. We can refer to any target inside such a repository by prefixing it with the repository name: @aarch64-rpi3-linux-gnu//:toolchain.
Setting up custom toolchains
Bazel supports two ways of setting up custom toolchains: the legacy approach with crosstool_top, and the new approach with platforms. We will construct our rules so that both approaches are available.
In order to cross-compile cc_* rules we need to run Bazel with additional arguments pointing to the cross-compilation toolchain definition:
bazel build \
    --crosstool_top=//bazel/toolchain/aarch64-rpi3-linux-gnu:gcc_toolchain \
    --cpu=aarch64 \
    //:hello
This instructs Bazel to look for the aarch64 toolchain in the cc_toolchain_suite rule named gcc_toolchain, located in the bazel/toolchain/aarch64-rpi3-linux-gnu package.
The BUILD file of that package contains the definitions of all the tools we want to use when cross-compiling:
$ cat bazel/toolchain/aarch64-rpi3-linux-gnu/BUILD
package(default_visibility = ["//visibility:public"])
load(":cc_toolchain_config.bzl", "cc_toolchain_config")
filegroup(name = "empty")
filegroup(
    name = 'wrappers',
    srcs = glob([
        'wrappers/**',
    ]),
)
filegroup(
    name = 'all_files',
    srcs = [
        '@aarch64-rpi3-linux-gnu-sysroot//:sysroot',
        '@aarch64-rpi3-linux-gnu//:toolchain',
        ':wrappers',
    ],
)
cc_toolchain_config(name = "aarch64_toolchain_config")
cc_toolchain(
    name = "aarch64_toolchain",
    toolchain_identifier = "aarch64-toolchain",
    toolchain_config = ":aarch64_toolchain_config",
    all_files = ":all_files",
    compiler_files = ":all_files",
    dwp_files = ":empty",
    linker_files = ":all_files",
    objcopy_files = ":empty",
    strip_files = ":empty",
)
cc_toolchain_suite(
    name = "gcc_toolchain",
    toolchains = {
        "aarch64": ":aarch64_toolchain",
    },
    tags = ["manual"]
)
The filegroups are convenience wrappers for files referenced from the externally downloaded compiler and sysroot artifacts. This can be made more granular, as the attributes of the cc_toolchain rule suggest, but to keep it simple we reference all files everywhere.
In order to hide from Bazel where the compiler actually comes from, we need to create some wrapper files:
$ tree bazel/toolchain/aarch64-rpi3-linux-gnu/wrappers/
bazel/toolchain/aarch64-rpi3-linux-gnu/wrappers/
├── aarch64-rpi3-linux-gnu-ar -> wrapper
├── aarch64-rpi3-linux-gnu-cpp -> wrapper
├── aarch64-rpi3-linux-gnu-gcc -> wrapper
├── aarch64-rpi3-linux-gnu-gcov -> wrapper
├── aarch64-rpi3-linux-gnu-ld -> wrapper
├── aarch64-rpi3-linux-gnu-nm -> wrapper
├── aarch64-rpi3-linux-gnu-objdump -> wrapper
├── aarch64-rpi3-linux-gnu-strip -> wrapper
└── wrapper
This is needed because the tool specifications cannot reference external repositories with the @ syntax; Bazel expects each tool to live at a path relative to the package of the cc_toolchain rule. Therefore we reference the wrapper script, which knows where the actual tool resides:
$ cat bazel/toolchain/aarch64-rpi3-linux-gnu/wrappers/wrapper
#!/bin/bash
NAME=$(basename "$0")
TOOLCHAIN_BINDIR=external/aarch64-rpi3-linux-gnu/bin
exec "${TOOLCHAIN_BINDIR}"/"${NAME}" "$@"
Bazel will call bazel/toolchain/aarch64-rpi3-linux-gnu/wrappers/aarch64-rpi3-linux-gnu-gcc, which will exec the actual gcc residing inside the sandbox in the external directory: external/aarch64-rpi3-linux-gnu/bin/aarch64-rpi3-linux-gnu-gcc.
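The wrapper directory can be generated instead of created by hand. The following is a rough sketch, not the layout script used for this post: make_wrappers is a hypothetical helper, and the generated wrapper reads TOOLCHAIN_BINDIR from the environment so the dispatch can be tried outside of Bazel (the script shown above hardcodes the external/... path instead):

```shell
#!/bin/sh
# make_wrappers DIR PREFIX
# Creates DIR/wrapper and one symlink per tool (PREFIX-ar, PREFIX-gcc, ...),
# each pointing at `wrapper`. The wrapper dispatches on the name it was
# invoked as, just like the script shown above.
make_wrappers() {
    dir="$1"
    prefix="$2"
    mkdir -p "$dir"
    cat > "$dir/wrapper" <<'EOF'
#!/bin/sh
# Assumption for this sketch: TOOLCHAIN_BINDIR comes from the environment.
NAME=$(basename "$0")
exec "${TOOLCHAIN_BINDIR}/${NAME}" "$@"
EOF
    chmod +x "$dir/wrapper"
    for tool in ar cpp gcc gcov ld nm objdump strip; do
        ln -sf wrapper "$dir/$prefix-$tool"
    done
}

# Example: make_wrappers bazel/toolchain/aarch64-rpi3-linux-gnu/wrappers aarch64-rpi3-linux-gnu
```

Because every symlink resolves to the same script, adding another tool only takes one more name in the loop.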
The lines:
load(":cc_toolchain_config.bzl", "cc_toolchain_config")
...
cc_toolchain_config(name = "aarch64_toolchain_config")
load the toolchain configuration from an additional file, which contains the paths of the individual toolchain tools, as well as the default flags for the compilation and linking steps:
$ cat bazel/toolchain/aarch64-rpi3-linux-gnu/cc_toolchain_config.bzl
load("@bazel_tools//tools/build_defs/cc:action_names.bzl", "ACTION_NAMES")
load("@bazel_tools//tools/cpp:cc_toolchain_config_lib.bzl",
    "feature",
    "flag_group",
    "flag_set",
    "tool_path",
)
all_link_actions = [
    ACTION_NAMES.cpp_link_executable,
    ACTION_NAMES.cpp_link_dynamic_library,
    ACTION_NAMES.cpp_link_nodeps_dynamic_library,
]
all_compile_actions = [
    ACTION_NAMES.assemble,
    ACTION_NAMES.c_compile,
    ACTION_NAMES.clif_match,
    ACTION_NAMES.cpp_compile,
    ACTION_NAMES.cpp_header_parsing,
    ACTION_NAMES.cpp_module_codegen,
    ACTION_NAMES.cpp_module_compile,
    ACTION_NAMES.linkstamp_compile,
    ACTION_NAMES.lto_backend,
    ACTION_NAMES.preprocess_assemble,
]
def _impl(ctx):
    tool_paths = [
        tool_path(
            name = "ar",
            path = "wrappers/aarch64-rpi3-linux-gnu-ar",
        ),
        tool_path(
            name = "cpp",
            path = "wrappers/aarch64-rpi3-linux-gnu-cpp",
        ),
        tool_path(
            name = "gcc",
            path = "wrappers/aarch64-rpi3-linux-gnu-gcc",
        ),
        tool_path(
            name = "gcov",
            path = "wrappers/aarch64-rpi3-linux-gnu-gcov",
        ),
        tool_path(
            name = "ld",
            path = "wrappers/aarch64-rpi3-linux-gnu-ld",
        ),
        tool_path(
            name = "nm",
            path = "wrappers/aarch64-rpi3-linux-gnu-nm",
        ),
        tool_path(
            name = "objdump",
            path = "wrappers/aarch64-rpi3-linux-gnu-objdump",
        ),
        tool_path(
            name = "strip",
            path = "wrappers/aarch64-rpi3-linux-gnu-strip",
        ),
    ]
    default_compiler_flags = feature(
        name = "default_compiler_flags",
        enabled = True,
        flag_sets = [
            flag_set(
                actions = all_compile_actions,
                flag_groups = [
                    flag_group(
                        flags = [
                            "--sysroot=external/aarch64-rpi3-linux-gnu-sysroot",
                            "-no-canonical-prefixes",
                            "-fno-canonical-system-headers",
                            "-Wno-builtin-macro-redefined",
                            "-D__DATE__=\"redacted\"",
                            "-D__TIMESTAMP__=\"redacted\"",
                            "-D__TIME__=\"redacted\"",
                        ],
                    ),
                ],
            ),
        ],
    )
    default_linker_flags = feature(
        name = "default_linker_flags",
        enabled = True,
        flag_sets = [
            flag_set(
                actions = all_link_actions,
                flag_groups = ([
                    flag_group(
                        flags = [
                            "--sysroot=external/aarch64-rpi3-linux-gnu-sysroot",
                            "-lstdc++",
                        ],
                    ),
                ]),
            ),
        ],
    )
    features = [
        default_compiler_flags,
        default_linker_flags,
    ]
    return cc_common.create_cc_toolchain_config_info(
        ctx = ctx,
        features = features,
        toolchain_identifier = "aarch64-toolchain",
        host_system_name = "local",
        target_system_name = "unknown",
        target_cpu = "unknown",
        target_libc = "unknown",
        compiler = "unknown",
        abi_version = "unknown",
        abi_libc_version = "unknown",
        tool_paths = tool_paths,
    )
cc_toolchain_config = rule(
    implementation = _impl,
    attrs = {},
    provides = [CcToolchainConfigInfo],
)
What happens here is that we create a new rule that returns a CcToolchainConfigInfo data structure containing all the tools Bazel needs for compiling. Additionally we set up features, which are a way of specifying the behavior of the toolchain; in our case they specify the default flags for compiling and linking C++ code.
With all the code above we can now build for aarch64:
$ bazel build --crosstool_top=//bazel/toolchain/aarch64-rpi3-linux-gnu:gcc_toolchain --cpu=aarch64 //:hello
INFO: Build options --cpu and --crosstool_top have changed, discarding analysis cache.
INFO: Analyzed target //:hello (0 packages loaded, 7129 targets configured).
INFO: Found 1 target...
Target //:hello up-to-date:
bazel-bin/hello
INFO: Elapsed time: 0.970s, Critical Path: 0.03s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
$ file bazel-bin/hello
bazel-bin/hello: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 5.5.5, not stripped
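The file output above can also be checked from a script, which is handy in CI where a silently host-built binary would otherwise go unnoticed. The sketch below reads the e_machine field straight from the ELF header (a 16-bit value at byte offset 18; 183 is EM_AARCH64 and 40 is EM_ARM). The helper name elf_machine is an assumption of this example, and it assumes a little-endian ELF file, which holds for the targets used here:

```shell
#!/bin/sh
# elf_machine FILE
# Prints the e_machine field of an ELF file as a decimal number.
# Reads the two bytes at offset 18 and combines them as little-endian.
elf_machine() {
    # od prints the two bytes as unsigned decimals, e.g. " 183   0";
    # word splitting turns them into $1 (low byte) and $2 (high byte).
    set -- $(od -An -tu1 -j18 -N2 "$1")
    echo $(( $1 + 256 * $2 ))
}

# Example: [ "$(elf_machine bazel-bin/hello)" = "183" ] && echo "aarch64 binary"
```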
Setting up custom platforms
A few additional steps are needed to use platforms with the above setup. First, we need to define our new platform:
$ cat bazel/platforms/BUILD
platform(
    name = "rpi",
    constraint_values = [
        "@platforms//cpu:aarch64",
        "@platforms//os:linux",
    ],
)
Second, we need to create a new platform-compatible toolchain target:
$ cat bazel/toolchain/aarch64-rpi3-linux-gnu/BUILD
...
toolchain(
    name = "aarch64_linux_toolchain",
    exec_compatible_with = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
    target_compatible_with = [
        "@platforms//os:linux",
        "@platforms//cpu:aarch64",
    ],
    toolchain = ":aarch64_toolchain",
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
)
Third, register the toolchain in the WORKSPACE file:
$ cat bazel/toolchain/toolchain.bzl
def register_all_toolchains():
    native.register_toolchains(
        "//bazel/toolchain/aarch64-rpi3-linux-gnu:aarch64_linux_toolchain",
    )
$ cat WORKSPACE
...
load("//bazel/toolchain:toolchain.bzl", "register_all_toolchains")
register_all_toolchains()
With that done, Bazel needs different command line arguments to use platforms:
$ bazel build \
--incompatible_enable_cc_toolchain_resolution \
--platforms=//bazel/platforms:rpi \
//:hello
Starting local Bazel server and connecting to it...
INFO: Analyzed target //:hello (19 packages loaded, 7123 targets configured).
INFO: Found 1 target...
Target //:hello up-to-date:
bazel-bin/hello
INFO: Elapsed time: 25.510s, Critical Path: 1.41s
INFO: 2 processes: 2 linux-sandbox.
INFO: Build completed successfully, 6 total actions
$ file bazel-bin/hello
bazel-bin/hello: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 5.5.5, not stripped
Setting up .bazelrc
For convenience, all of the additional command line arguments can be hidden in the .bazelrc file:
$ cat .bazelrc
build:rpi --crosstool_top=//bazel/toolchain/aarch64-rpi3-linux-gnu:gcc_toolchain --cpu=aarch64
build:bbb --crosstool_top=//bazel/toolchain/arm-cortex_a8-linux-gnueabihf:gcc_toolchain --cpu=armv7
build:platform_build --incompatible_enable_cc_toolchain_resolution
build:rpi-platform --config=platform_build --platforms=//bazel/platforms:rpi
build:bbb-platform --config=platform_build --platforms=//bazel/platforms:bbb
The invocation simplifies to:
$ bazel build --config=rpi //:hello
$ bazel build --config=rpi-platform //:hello
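With the configs in place, building one target for several boards becomes a simple loop. A small sketch, assuming the rpi and bbb configs from the .bazelrc above; build_all is a hypothetical helper name:

```shell
#!/bin/sh
# build_all TARGET CONFIG...
# Runs `bazel build --config=<config> TARGET` for each given config in turn
# and stops at the first failing build. (Hypothetical helper name.)
build_all() {
    target="$1"
    shift
    for config in "$@"; do
        bazel build --config="$config" "$target" || return 1
    done
}

# Example: build_all //:hello rpi bbb
```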
Summary
These steps should be valid for most C++ toolchains with slight modifications. Although setting it all up might be complicated, in the end the benefits are really worth it: you get a cached, distributed, one-command build out of the box.