Benchmarks
FluxArchitectures is included in the FluxBench suite, with results available at speed.fluxml.ai.
To run the benchmarks yourself, add FluxBench as a package (see the installation sketch below) and type

using FluxBench
FluxBench.submit()

The following sections give the timings for my own hardware, which is very limited compared to current state-of-the-art equipment.
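If FluxBench is not yet installed, adding it might look like the minimal sketch below. Whether the package is available through the General registry is an assumption here; if it is not, `Pkg.add` with the repository URL can be used instead.

```julia
using Pkg

# Assumption: FluxBench can be resolved through the configured package registry.
# If it is not registered, Pkg.add(url="<repository URL>") is the alternative.
Pkg.add("FluxBench")
```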
The setup does not ensure consistency in the number of parameters etc. between the models, so comparisons are only meaningful between GPU and CPU runs of the same model, or between runs of the same model for different versions of FluxArchitectures. In addition, as the benchmark only performs a single backward pass, differences between CPU and GPU are not necessarily indicative of actual training performance.
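As a rough illustration of what the "Forward" and "Backward" timings measure, the sketch below times a single model evaluation and a single gradient computation for a toy Flux model. The model, input shapes, and loss are stand-ins chosen here for brevity, not the actual FluxBench setup.

```julia
using Flux, BenchmarkTools

# Toy stand-in model and data; the real benchmark runs the FluxArchitectures
# models (DARNN, DSANet, LSTNet, TPA-LSTM) on their own input shapes.
model = Chain(Dense(10, 32, relu), Dense(32, 1))
x = rand(Float32, 10, 64)
y = rand(Float32, 1, 64)
loss(m, x, y) = sum(abs2, m(x) .- y)

# "Forward": a single evaluation of the model.
@btime $model($x)

# "Backward": a single gradient computation, i.e. one forward plus one
# backward pass. A full training step would add the optimiser update on top.
@btime Flux.gradient(m -> loss(m, $x, $y), $model)
```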
Currently, only DSANet sees an improvement in training speed on a GPU, whereas all other models train faster on the CPU. The deeper reason is a current limitation of Flux's implementation of RNNs (see this issue), which affects all models except DSANet, the only one that does not use recurrence.
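The following sketch illustrates why recurrence limits the GPU gains, assuming the stateful RNN interface of the Flux versions used for these benchmarks (around v0.12); the layer sizes and sequence length are arbitrary.

```julia
using Flux

# Assumption: Flux v0.12-style stateful recurrence, roughly matching the Flux
# versions used for these benchmarks. The LSTM layer keeps its hidden state
# internally and is applied to one timestep at a time.
rnn = LSTM(10, 20)                       # 10 input features -> 20 hidden units
xs = [rand(Float32, 10) for _ in 1:50]   # a sequence of 50 timesteps

Flux.reset!(rnn)            # clear the hidden state before a new sequence
# Every call below is a separate small matrix operation. On a GPU this turns
# into many small kernel launches instead of one large batched computation,
# which is why the recurrent models gain little from running on the GPU here.
ys = [rnn(x) for x in xs]
```

DSANet, by contrast, does not use recurrence and can process the whole sequence in batched operations, which is why it is the only model that benefits from the GPU in these runs.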
Jetson Nano
The benchmark is run on a Jetson Nano Developer Kit with 4 GB of memory.
| Model | CPU - Forward | GPU - Forward | CPU - Backward | GPU - Backward |
|---|---|---|---|---|
| DARNN | 91.290 ms | 13.808 s | 1.810 s | 39.654 s |
| DSANet | 222.675 ms | 556.588 ms | 796.776 ms | 1.409 s |
| LSTNet | 18.256 ms | 1.452 s | 428.595 ms | 7.799 s |
| TPA-LSTM | 23.069 ms | 5.632 s | 1.572 s | 18.750 s |
Intel Core i5 (2.9 GHz)
The benchmark is run on a 13" MacBook Pro (Late 2016) without GPU support.
| Model | CPU - Forward | CPU - Backward |
|---|---|---|
| DARNN | 29.191 ms | 528.621 ms |
| DSANet | 137.335 ms | 395.298 ms |
| LSTNet | 3.455 ms | 103.729 ms |
| TPA-LSTM | 7.885 ms | 435.726 ms |
JuliaHub
The benchmark is run on GPU-enabled cloud computing via JuliaHub.
| Model | CPU - Forward | GPU - Forward | CPU - Backward | GPU - Backward |
|---|---|---|---|---|
| DARNN | 57.445 ms | 3.939 s | 1.005 s | 11.189 s |
| DSANet | 120.017 ms | 57.983 s | 541.940 ms | 155.569 ms |
| LSTNet | 3.970 ms | 384.773 ms | 131.375 s | 2.192 s |
| TPA-LSTM | 7.812 ms | 1.435 s | 518.579 ms | 5.170 s |
Technical Details
For reference, the version information for Julia and CUDA on the different devices is given below.
Jetson Nano
julia> versioninfo(verbose=true)
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
OS: Linux (aarch64-unknown-linux-gnu)
Ubuntu 18.04.6 LTS
uname: Linux 4.9.140-tegra #1 SMP PREEMPT Tue Oct 27 21:02:37 PDT 2020 aarch64 aarch64
CPU: unknown:
speed user nice sys idle irq
#1 710 MHz 12735 s 0 s 562 s 12506 s 150 s
#2 518 MHz 2904 s 0 s 337 s 22803 s 31 s
#3 1479 MHz 2901 s 0 s 382 s 22706 s 35 s
#4 1479 MHz 2701 s 0 s 350 s 22992 s 22 s
Memory: 3.8712120056152344 GB (178.02734375 MB free)
Uptime: 2629.0 sec
Load Avg: 0.48 0.95 0.97
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, cortex-a57)
Environment:
HOME = /home/sdobber
TERM = xterm-256color
PATH = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/sdobber/Programme/julia-1.6.2/bin
julia> FluxBench.Flux.CUDA.versioninfo()
CUDA toolkit 10.2.89, local installation
CUDA driver 10.2.0
Libraries:
- CUBLAS: 10.2.2
- CURAND: 10.1.2
- CUFFT: 10.1.2
- CUSOLVER: 10.3.0
- CUSPARSE: 10.3.1
- CUPTI: 12.0.0
- NVML: missing
- CUDNN: 8.0.0 (for CUDA 10.2.0)
- CUTENSOR: missing
Toolchain:
- Julia: 1.6.2
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5
- Device capability support: sm_30, sm_32, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75
1 device:
0: NVIDIA Tegra X1 (sm_53, 66.707 MiB / 3.871 GiB available)
Intel Core i5 (2.9 GHz)
julia> versioninfo(verbose=true)
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
uname: Darwin 18.7.0 Darwin Kernel Version 18.7.0: Tue Jun 22 19:37:08 PDT 2021; root:xnu-4903.278.70~1/RELEASE_X86_64 x86_64 i386
CPU: Intel(R) Core(TM) i5-6267U CPU @ 2.90GHz:
speed user nice sys idle irq
#1 2900 MHz 206462 s 0 s 94219 s 732080 s 0 s
#2 2900 MHz 70845 s 0 s 29756 s 931955 s 0 s
#3 2900 MHz 210310 s 0 s 72773 s 749481 s 0 s
#4 2900 MHz 60839 s 0 s 25704 s 946013 s 0 s
Memory: 16.0 GB (5212.79296875 MB free)
Uptime: 1.118083e6 sec
Load Avg: 1.67041015625 1.7412109375 1.875
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS = 4
HOME = /Users/sdobber
PATH = /usr/local/lib/ruby/gems/2.6.0/bin:/Users/sdobber/.gem/ruby/X.X.0/bin:/usr/local/opt/ruby/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/Library/TeX/texbin:/opt/X11/bin
XPC_FLAGS = 0x0
TERM = xterm-256color
JuliaHub
julia> versioninfo(verbose=true)
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
uname: Linux 4.14.214-160.339.amzn2.x86_64 #1 SMP Sun Jan 10 05:53:05 UTC 2021 x86_64 x86_64
CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz:
speed user nice sys idle irq
#1 2701 MHz 7954 s 769 s 3085 s 521963 s 0 s
#2 2700 MHz 8465 s 911 s 3105 s 521471 s 0 s
#3 2699 MHz 8316 s 735 s 3098 s 521671 s 0 s
#4 2697 MHz 7440 s 632 s 3064 s 522231 s 0 s
Memory: 59.95986557006836 GB (51687.2734375 MB free)
Uptime: 53773.0 sec
Load Avg: 1.0 3.33 2.42
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)
Environment:
JULIAHUB_USEREMAIL =
JULIAHUB_HOME = /opt/juliahub
JULIA_WORKER_TIMEOUT = 360.0
JULIARUN_DATA_FOLDER = 378548374194874040
JULIARUN_JOB_ID = ooruudwbox
JULIA_GR_PROVIDER = BinaryBuilder
JULIAHUB_NAMESPACE = 378548374194874040
JULIA_NEW_PKG_SERVER = https://juliahub.com/
JULIA_DATASETS_PATH = /var/run/secrets/jr-ooruudwboxsecret/DATA_TOML:/opt/juliahub/JuliaHubDataDriver.toml:@:
JULIAHUB_USERNAME = Sören Dobberschütz
JULIA_DEPOT_PATH = /home/jrun/data/.julia:/home/jrun/.julia
JULIARUN_RUN_MODE = script
JULIA_HOME = /home/jrun/data/.julia
JULIATEAM_HOSTNAME = juliahub.com
JULIARUN_RESTART_POLICY = Never
JULIA_PKG_SERVER = juliahub.com
JULIA_NUM_THREADS = 4
JULIA_EDITOR = "/opt/codeserver/lib/code-server/lib/node"
JULIAHUB_HOME = /opt/juliahub
HOME = /home/jrun
JULIA_DATASETS_PATH = /var/run/secrets/jr-ooruudwboxsecret/DATA_TOML:/opt/juliahub/JuliaHubDataDriver.toml:@:
TERM = xterm-256color
JULIA_DEPOT_PATH = /home/jrun/data/.julia:/home/jrun/.julia
LD_LIBRARY_PATH = /usr/lib/x86_64-linux-gnu:/opt/codeserver/lib
JULIA_HOME = /home/jrun/data/.julia
JRUN_APP_BASE_PATH = /
PATH = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/opt/julia/bin:/usr/lib/x86_64-linux-gnu
julia> Flux.CUDA.versioninfo()
┌ Warning: The NVIDIA driver on this system only supports up to CUDA 11.0.0.
│ For performance reasons, it is recommended to upgrade to a driver that supports CUDA 11.2 or higher.
└ @ CUDA ~/data/.julia/packages/CUDA/DL5Zo/src/initialization.jl:42
CUDA toolkit 11.3.1, artifact installation
CUDA driver 11.0.0
NVIDIA driver 450.51.6
Libraries:
- CUBLAS: 11.5.1
- CURAND: 10.2.4
- CUFFT: 10.4.2
- CUSOLVER: 11.1.2
- CUSPARSE: 11.6.0
- CUPTI: 14.0.0
- NVML: 11.0.0+450.51.6
- CUDNN: 8.20.2 (for CUDA 11.4.0)
Downloaded artifact: CUTENSOR
- CUTENSOR: 1.3.0 (for CUDA 11.2.0)
Toolchain:
- Julia: 1.6.1
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
1 device:
0: Tesla K80 (sm_37, 11.170 GiB / 11.173 GiB available)