A guide to configuring and running benchmarks for SoC FPGAs running Linux
19 May 2016 - 18:43 | Version 22 | Findlay Shearer
Altera SoC Workshop Series | Altera, Angstrom, Arria 10, Arria V, Benchmarking, Coremark, Coremark Pro, Cyclone V, Dhrystone, LMBench, Linux, STREAM, SoC, Whetstone
Introduction
This guide supports developers who want to run their own benchmarks on SoC FPGA devices and verify that their development environment is optimally configured. Benchmarking is a complicated topic, and readers are encouraged to learn as much as possible beyond this guide.
Before trusting any benchmark result, make sure the system is configured to deliver optimum performance. Start by comparing your own CoreMark results with the CoreMark results measured by Altera. Check and reconfigure the setup until the two sets of results are similar, which indicates that your setup matches the configuration used by Altera. Once this is achieved, move on to other benchmarks to measure system performance. Select the benchmarks that most closely resemble your own application to get an idea of the achievable end performance. A configuration change that benefits one benchmark may be detrimental to the results of another.
This guide assumes that most Rocketboards users run the Angstrom Linux distribution from Altera's GitHub, which uses the Linaro GCC compiler. In Angstrom the compiler is named "arm-angstrom-linux-gnueabi-gcc", and you can install it with "opkg install gcc". You can build these benchmarks with any compiler you wish, but this article assumes GCC.
The optimization levels used at Altera range from -O0 through -Ofast, and -Ofast generally gives the best performance. Recent versions of GCC automatically include -mfloat-abi=hard and -mfpu=neon, so these are now omitted from the command line and only -mcpu=cortex-a9 is needed. Adding -lrt was also found to give a speed boost.
Major Processor/Memory Benchmarks
This section lists the benchmarks chosen by Altera to run on the SoC FPGA devices.
Coremark
Coremark was designed to replace Dhrystone and other earlier processor benchmarks on embedded systems, and it mainly targets microcontrollers. The application processors in the Altera SoCs are more capable than the processors Coremark was designed for, and as a result Coremark fits entirely in the L1 cache of the Cortex-A9. Coremark-Pro is better suited to application processors, but Coremark is currently far more popular (and easier to set up), so it is also provided here.
Source and Compilation
You can obtain the project/source code from EEMBC's website. You'll need to register with an e-mail address and then download the tar file. Once you have it, extract the tar file's contents into your working directory. Now follow these steps:
- Edit the linux/core_portme.mak file and change the CC entry to "CC=arm-angstrom-linux-gnueabi-gcc"
  - This is the GCC compiler you can download from the Angstrom repositories
- In the same file, change the PORT_CFLAGS entry to "PORT_CFLAGS= -Ofast -mcpu=cortex-a9 -lrt -lpthread"
  - These are the optimizations used at Altera, but you can also change or add your own optimizations with this entry
- Edit the linux/core_portme.h file and change the MULTITHREAD entry to "#define MULTITHREAD 2"
  - This allows multiple parallel threads to be launched
- In the same file, change the USE_PTHREAD entry to "#define USE_PTHREAD 1"
  - PTHREAD was found to have the best performance
- Type "make" in the working directory and wait for the compilation and the benchmark run to finish
- Open the run1.log file to see the performance scores
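If you run Coremark repeatedly while tuning your setup, it can be convenient to pull the score out of run1.log with a small script. The sketch below assumes the log contains a line of the form "Iterations/Sec   : <value>", which is how Coremark normally reports its score; the function name and the sample text are illustrative only, so check them against your own log.

```python
import re


def parse_coremark_score(log_text):
    """Return the Iterations/Sec value found in Coremark log text, or None."""
    # Assumed line format: "Iterations/Sec   : 9331.985"
    match = re.search(r"Iterations/Sec\s*:\s*([0-9]+(?:\.[0-9]+)?)", log_text)
    return float(match.group(1)) if match else None


# Hypothetical one-line excerpt standing in for the contents of run1.log.
sample_log = "Iterations/Sec   : 9331.985\n"

if __name__ == "__main__":
    print(parse_coremark_score(sample_log))
```

On a real run you would read the file first, for example with open("run1.log").read(), and pass the text to the function.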
Results
Multithread = 2, PTHREAD = 1, FORK = 0, SOCKET = 0
| Parameter | Arria 10 | Arria V | Cyclone V |
|---|---|---|---|
| Coremark Score | 9331.985 | 5641.75 | 4968.94 |
| Test Date | 12/17/15 | 3/6/14 | 4/23/14 |
| Benchmark | Coremark | Coremark | Coremark |
| Dev Kit | Arria 10 SoC Dev Kit | Arria V SoC Dev Kit | Cyclone V SoC Dev Kit |
| Dev Kit Rev | B | A | C |
| SoC Device | Arria 10 | Arria V | Cyclone V |
| Core Frequency | 1500 MHz | 1050 MHz | 925 MHz |
| L2 Cache ECC On/Off | Off | Off | Off |
| ACP Enabled/Disabled | Disabled | Disabled | Disabled |
| Memory Size | 1 GB DDR4 | 1 GB DDR3 | 1 GB DDR2 |
| Memory Frequency | 1066 MHz | 533 MHz | 400 MHz |
| Memory ECC On/Off | Off | Off | Off |
| FPGA Logic Contents | Empty | Empty | Empty |
| FPGA Logic Frequency | N/A | N/A | N/A |
| OS & Build | Angstrom v2014.12 - Kernel 3.10.31-ltsi | Angstrom v2012.12 - Kernel 3.13.0 | Angstrom v2012.12 - Kernel 3.13.0 |
| SW Compiler | Linaro GCC 4.9.3-2014.11 (Native) | Linaro GCC 2013.02 (GCC v4.7.3) (Cross) | Linaro GCC 4.8.3 (Cross) |
| Dual-core | Yes | Yes | Yes |
| Compiler Flags | -Ofast -mcpu=cortex-a9 -lrt -lpthread -DPERFORMANCE_RUN=1 | -O3 -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=hard -lpthread | -O3 -Ofast -mtune=cortex-a9 -mfpu=neon -lpthread -lrt |
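To compare your own run against the Altera results above, it helps to normalize the score by core frequency and to decide in advance what counts as "similar". The following is a minimal sketch using the scores and frequencies from the results table; the 5% tolerance and the function names are assumptions made for illustration, not values specified by Altera.

```python
# Scores and core frequencies (MHz) taken from the results table above.
ALTERA_RESULTS = {
    "Arria 10": (9331.985, 1500),
    "Arria V": (5641.75, 1050),
    "Cyclone V": (4968.94, 925),
}


def coremark_per_mhz(score, core_mhz):
    """Normalize a Coremark score by the core clock frequency in MHz."""
    return score / core_mhz


def within_tolerance(measured, reference, tolerance=0.05):
    """True if measured is within the given fraction of reference.

    The default of 5% is an arbitrary illustrative threshold.
    """
    return abs(measured - reference) <= tolerance * reference


if __name__ == "__main__":
    for device, (score, mhz) in ALTERA_RESULTS.items():
        print(f"{device}: {coremark_per_mhz(score, mhz):.2f} Coremark/MHz")
```

For example, within_tolerance(4800.0, 4968.94) is true under the assumed 5% threshold, while a score of 4000.0 on the same board would suggest that the setup still differs from the reference configuration.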
Coremark Pro
Coremark Pro is an upgraded Coremark that was recently released by EEMBC. It is designed to replace Coremark for application-level processors, such as the Cortex-A9 and Cortex-A53 of the Altera SoCs. It contains several smaller benchmark programs whose results are aggregated into a single system score. You can compare scores on the EEMBC website.
You can obtain the project/source code from EEMBC's website. You'll need to register with an e-mail address and then download the tar file. Once you have it, extract the tar file's contents into your working directory. Now follow these steps:
- Copy util/make/gcc.mak to util/make/arm-angstrom-linux-gnueabi-gcc.mak
- Edit the new .mak file and change CC to "CC=arm-angstrom-linux-gnueabi-gcc"
- Edit linux.mak so that "TOOLCHAIN=arm-angstrom-linux-gnueabi-gcc"
- Edit arm-angstrom-linux-gnueabi-gcc.mak so that the linker is also arm-angstrom-linux-gnueabi-gcc
- Type "make TARGET=linux" in the main directory
- Open builds/logs/linux-arm-angstrom-linux-gnueabi-gcc.log to get the scores
- Divide each score by its EEMBC reference number, then take the geometric mean of those ratios
  - You'll need to find the reference numbers on EEMBC's website
- Multiply the geometric mean by 1000 to get the final score
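The scoring steps at the end of this list can be sketched as follows. This simply follows the procedure described here (ratio to reference, geometric mean, times 1000); the workload names and all numbers are placeholders invented for illustration and are not EEMBC reference values.

```python
import math


def coremark_pro_score(measured, reference):
    """Combine per-workload scores into one figure as described above.

    measured, reference: dicts mapping workload name -> score.
    """
    ratios = [measured[name] / reference[name] for name in measured]
    # Geometric mean computed through logarithms to avoid overflow.
    geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return geomean * 1000


# Placeholder data only: look up the real reference numbers on EEMBC's website.
example_measured = {"workload_a": 20.0, "workload_b": 5.0}
example_reference = {"workload_a": 10.0, "workload_b": 10.0}

if __name__ == "__main__":
    print(coremark_pro_score(example_measured, example_reference))
```

With these placeholder values the two ratios are 2.0 and 0.5, so their geometric mean is 1.0 and the final score is approximately 1000.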
STREAM
STREAM is a memory benchmark that measures the sustained bandwidth between the processor and main memory. It reports four scores: Copy, Scale, Add, and Triad. Copy is a straight array→array transfer, Scale does (array * scalar)→array, Add does (array + array)→array, and Triad does (array + (array * scalar))→array.
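To make the four kernels concrete, the following plain-Python sketch mirrors what each one computes per element, following the definitions in the STREAM source: Copy is a = b, Scale is a = q*b, Add is a = b + c, and Triad is a = b + q*c. The function name and the list-based arrays are illustrative assumptions; the real benchmark is written in C and runs over large arrays of doubles.

```python
def stream_kernels(b, c, q):
    """Return the results of the four STREAM kernels for small input lists."""
    copy = [bi for bi in b]                          # a[i] = b[i]
    scale = [q * bi for bi in b]                     # a[i] = q * b[i]
    add = [bi + ci for bi, ci in zip(b, c)]          # a[i] = b[i] + c[i]
    triad = [bi + q * ci for bi, ci in zip(b, c)]    # a[i] = b[i] + q * c[i]
    return copy, scale, add, triad


if __name__ == "__main__":
    for name, result in zip(("Copy", "Scale", "Add", "Triad"),
                            stream_kernels([1.0, 2.0], [10.0, 20.0], 3.0)):
        print(name, result)
```

Copy and Scale each touch two arrays per element while Add and Triad touch three, which is why the four kernels move different amounts of data for the same array length.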
You can obtain the project/source code from the STREAM website. Then follow these steps:
- Install the OpenMP runtime library to enable dual-core/multi-threaded runs, with this command: "opkg install libgomp"
- Compile the source like this: "arm-angstrom-linux-gnueabi-gcc -Ofast -mcpu=cortex-a9 -lrt -fopenmp stream.c -o stream_test"
- Run the binary and record the output
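When reading the output, it helps to know how STREAM turns a timing into a bandwidth figure. It counts two arrays of traffic for Copy and Scale and three for Add and Triad, at 8 bytes per element for the default double type, and reports 1.0E-06 times the bytes moved divided by the best time. The sketch below reproduces that arithmetic; the helper name, the array size, and the timing in the example are hypothetical.

```python
BYTES_PER_ELEMENT = 8  # assumes STREAM's default element type, a C double

# Number of arrays read or written per element by each kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_bandwidth_mb_s(kernel, array_size, best_time_s):
    """Bandwidth in MB/s (10^6 bytes per second) for one STREAM kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * array_size
    return 1.0e-6 * bytes_moved / best_time_s


if __name__ == "__main__":
    # Hypothetical run: 10,000,000 elements, best Copy time of 0.1 s.
    print(stream_bandwidth_mb_s("Copy", 10_000_000, 0.1))
```

In this hypothetical case Copy moves 160,000,000 bytes in 0.1 s, which is about 1600 MB/s.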
LMBench
LMBench is a complete suite of many smaller benchmarking programs that attempt to measure a complete system. It contains bandwidth, latency, and miscellaneous processor/peripheral benchmark programs. The latency and bandwidth programs were found to be the most useful.
You can obtain the project/source code from the LMBench website. Then follow these steps:
- Untar the downloaded tarball
- LMBench currently has a bug; work around it with these commands in the main directory:
  - mkdir ./SCCS
  - touch ./SCCS/s.ChangeSet
- Go to the source directory
  - Change CC to "CC=arm-angstrom-linux-gnueabi-gcc"
  - Change OS to "OS=angstrom-linux"
  - Change CFLAGS to "-Ofast -mcpu=cortex-a9 -lrt" or something of your choosing
- Go back to the main directory
- Type "make results"
- Wait for the compilation and the text wizard to run
- Fill out the text wizard; LMBench then runs automatically
- Go to the results directory
- Type "make LIST=/*" for a display of the results
Minor Processor/Memory Benchmarks
Dhrystone
The Altera SoCs are built around ARM Cortex-A9 cores, for which ARM quotes a Dhrystone score of 2.5 DMIPS/MHz per core. When you compile and run Dhrystone yourself, you probably won't achieve this score because of Linux and GCC overhead.
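To relate your own Dhrystone run to the 2.5 DMIPS/MHz figure, convert the reported Dhrystones per second into DMIPS by dividing by 1757, the score of the VAX 11/780 reference machine, and then divide by the core clock in MHz. The sketch below shows that arithmetic; the function names are illustrative, and the 925 MHz example uses the Cyclone V core frequency from the Coremark results table.

```python
VAX_DHRYSTONES_PER_SECOND = 1757  # VAX 11/780 reference score (1 DMIPS)


def dmips(dhrystones_per_second):
    """Convert a raw Dhrystone result into DMIPS."""
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND


def dmips_per_mhz(dhrystones_per_second, core_mhz):
    """Normalize DMIPS by the core clock frequency in MHz."""
    return dmips(dhrystones_per_second) / core_mhz


def expected_dhrystones_per_second(core_mhz, rated_dmips_per_mhz=2.5):
    """Dhrystones per second one core would need to match the rated figure."""
    return rated_dmips_per_mhz * core_mhz * VAX_DHRYSTONES_PER_SECOND


if __name__ == "__main__":
    target = expected_dhrystones_per_second(925)
    print(f"925 MHz core needs about {target:.0f} Dhrystones/s for 2.5 DMIPS/MHz")
```

A measured result below this target is expected on Linux with GCC, as noted above; the calculation only gives you a number to compare against.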
Whetstone
Whetstone is a general floating-point benchmark that has largely been superseded by Coremark and Coremark-Pro. You can download and run it on your SoC, but make sure you have -mfloat-abi=hard and -mfpu=neon in your compiler options (if they are not already the default) to test your hardware floating-point unit. Otherwise you can test software floating-point emulation with -mfloat-abi=soft. Note that -mfloat-abi=softfp still generates hardware floating-point instructions and only switches to the soft-float calling convention.
Additional Material
For more information click here.