Introduction

This guide supports developers who want to run their own benchmarks on SoC FPGA devices and verify that their development environment is configured for best performance. Benchmarking is a complicated topic, and readers are encouraged to learn as much as possible beyond this guide. Before trusting any benchmark, make sure the system is correctly configured: start by comparing your own CoreMark results with the Altera-measured CoreMark results given below. Check and reconfigure the setup until the two sets of results are similar, which indicates that the setup matches Altera's configuration. Once this is achieved, move on to other benchmarks to measure system performance. Select the benchmarks that most closely resemble your own application to get an idea of its likely end performance. Additional configuration changes can benefit specific benchmarks but may at the same time hurt the results of others.

Toolchains/Optimizations and OS Environment

Most Rocketboards users are assumed to be running the Angstrom Linux distribution from Altera's GitHub, which ships with the Linaro GCC compiler. In Angstrom the compiler is named "arm-angstrom-linux-gnueabi-gcc", and you can install it with "opkg install gcc". You can use any compiler to run these benchmarks, but this article assumes GCC.

The optimization levels used at Altera range from -O0 through -Ofast, and -Ofast generally gives the best performance. Recent versions of GCC automatically include -mfloat-abi=hard and -mfpu=neon, so these are now omitted from the command line and only -mcpu=cortex-a9 is needed. Linking with -lrt was also found to give a speed boost.

Major Processor/Memory Benchmarks

The following benchmarks were chosen by Altera to run on SoC FPGA devices.

CoreMark

CoreMark was designed to replace Dhrystone and other earlier processor benchmarks on embedded systems, and it mainly targets microcontrollers. The application processors in the Altera SoCs are more advanced than CoreMark's intended targets: the benchmark fits entirely in the L1 cache of the Cortex-A9. CoreMark-Pro is better suited to application processors, but CoreMark is currently much more popular (and easier to set up), so it is also provided here.

Source and Compilation
You can obtain the source code from EEMBC's website. You'll need to register with an e-mail address before downloading the tar file. Extract its contents into your working directory, then follow these steps:
  • Edit the linux/core_portme.mak file and set the CC entry to "CC=arm-angstrom-linux-gnueabi-gcc"
    • This is the GCC compiler available from the Angstrom repositories
  • Set the PORT_CFLAGS entry to "PORT_CFLAGS= -Ofast -mcpu=cortex-a9 -lrt -lpthread"
    • These are the flags used at Altera, but you can change them or add your own optimizations in this entry
  • Edit the linux/core_portme.h file and set the MULTITHREAD entry to "#define MULTITHREAD 2"
    • This allows multiple parallel threads to be launched (two here, for the dual-core processor)
  • Set the USE_PTHREAD entry to "#define USE_PTHREAD 1"
    • Of the threading methods available (pthreads, fork, sockets), pthreads was found to give the best performance
  • Type "make" in the working directory and wait for the compilation and the benchmark run to finish
  • Open the run1.log file to see the performance scores

Results
Threading configuration: Multithread = 2, PTHREAD = 1, FORK = 0, SOCKET = 0

| SoC Device | Arria 10 | Arria V | Cyclone V |
|---|---|---|---|
| CoreMark Score | 9331.985 | 5641.75 | 4968.94 |
| Test Date | 12/17/15 | 3/6/14 | 4/23/14 |
| Benchmark | CoreMark | CoreMark | CoreMark |
| Dev Kit | Arria 10 SoC Dev Kit | Arria V SoC Dev Kit | Cyclone V SoC Dev Kit |
| Dev Kit Rev | B | A | C |
| Core Frequency | 1500 MHz | 1050 MHz | 925 MHz |
| L2 Cache ECC On/Off | Off | Off | Off |
| ACP Enabled/Disabled | Disabled | Disabled | Disabled |
| Memory Size | 1 GB DDR4 | 1 GB DDR3 | 1 GB DDR2 |
| Memory Frequency | 1066 MHz | 533 MHz | 400 MHz |
| Memory ECC On/Off | Off | Off | Off |
| FPGA Logic Contents | Empty | Empty | Empty |
| FPGA Logic Frequency | N/A | N/A | N/A |
| OS & Build | Angstrom v2014.12 - Kernel 3.10.31-ltsi | Angstrom v2012.12 - Kernel 3.13.0 | Angstrom v2012.12 - Kernel 3.13.0 |
| SW Compiler | Linaro GCC 4.9.3-2014.11 (Native) | Linaro GCC 2013.02 (GCC v4.7.3) (Cross) | Linaro GCC 4.8.3 (Cross) |
| Dual-core | Yes | Yes | Yes |
| Compiler Flags | -Ofast -mcpu=cortex-a9 -lrt -lpthread -DPERFORMANCE_RUN=1 | -O3 -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=hard -lpthread | -O3 -Ofast -mtune=cortex-a9 -mfpu=neon -lpthread -lrt |
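
When comparing your own CoreMark score with the Altera-measured results above, it can help to express the gap as a percentage and to normalize by core frequency. The following is a minimal sketch: the reference scores and frequencies are copied from the results above, while the function names and the 5% tolerance are assumptions made for illustration, not an Altera-defined threshold.

```python
# Altera-measured dual-core CoreMark scores and core frequencies (MHz),
# copied from the results above.
ALTERA_REFERENCE = {
    "Arria 10": (9331.985, 1500),
    "Arria V": (5641.75, 1050),
    "Cyclone V": (4968.94, 925),
}


def coremark_per_mhz(score, core_mhz):
    """Normalize a CoreMark score by core frequency."""
    return score / core_mhz


def percent_difference(measured, reference):
    """Signed difference of a measured score from the reference, in percent."""
    return 100.0 * (measured - reference) / reference


def matches_reference(device, measured, tolerance_pct=5.0):
    """True if the measured score is within tolerance_pct of Altera's score.

    tolerance_pct is a hypothetical threshold; choose one that suits you.
    """
    reference, _ = ALTERA_REFERENCE[device]
    return abs(percent_difference(measured, reference)) <= tolerance_pct


if __name__ == "__main__":
    for device, (score, mhz) in ALTERA_REFERENCE.items():
        print(f"{device}: {coremark_per_mhz(score, mhz):.2f} CoreMark/MHz")
```

With the values above, the normalized dual-core results come out at roughly 6.22, 5.37, and 5.37 CoreMark/MHz for Arria 10, Arria V, and Cyclone V respectively. A measured score that falls well outside your chosen tolerance suggests the setup still differs from Altera's.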

CoreMark-Pro

CoreMark-Pro is an upgraded version of CoreMark that was recently released by EEMBC. It is designed to replace CoreMark on application-level processors such as the Cortex-A9 and Cortex-A53 in the Altera SoCs. It consists of several smaller benchmark programs whose results are aggregated into a single system score. You can compare scores on the EEMBC website.

You can obtain the source code from EEMBC's website. You'll need to register with an e-mail address before downloading the tar file. Extract its contents into your working directory, then follow these steps:
  • Copy util/make/gcc.mak to util/make/arm-angstrom-linux-gnueabi-gcc.mak
  • Edit the new .mak file and change CC to "CC=arm-angstrom-linux-gnueabi-gcc"
  • Edit linux.mak so that "TOOLCHAIN=arm-angstrom-linux-gnueabi-gcc"
  • Edit arm-angstrom-linux-gnueabi-gcc.mak so that the linker is also arm-angstrom-linux-gnueabi-gcc
  • Type "make TARGET=linux" in the main directory
  • Open builds/logs/linux-arm-angstrom-linux-gnueabi-gcc.log to get the scores
  • Divide each score by its EEMBC reference number, then take the geometric mean of the resulting ratios
    • You'll need to find the reference numbers on EEMBC's website
  • Multiply the geometric mean by 1000 to get the final score
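
The score calculation in the last steps can be sketched as follows. This is only an illustration of the arithmetic described above (ratio to reference, geometric mean, times 1000); the function name, the workload names, and the numbers are hypothetical, and the real reference values must come from EEMBC.

```python
import math


def coremark_pro_score(measured, reference):
    """Geometric mean of measured/reference ratios, multiplied by 1000.

    measured and reference map workload name -> score and must contain
    the same workload names.
    """
    if measured.keys() != reference.keys():
        raise ValueError("measured and reference must cover the same workloads")
    ratios = [measured[name] / reference[name] for name in measured]
    # Geometric mean computed in log space to avoid overflow on long products.
    geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return 1000.0 * geomean


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not real EEMBC data.
    measured = {"workload_a": 20.0, "workload_b": 90.0}
    reference = {"workload_a": 10.0, "workload_b": 180.0}
    print(round(coremark_pro_score(measured, reference), 3))
```

In the hypothetical example one workload runs at twice the reference and the other at half, so the ratios cancel in the geometric mean. This is the reason a geometric mean is used: no single workload can dominate the aggregate.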

STREAM

STREAM is a memory benchmark that measures only the sustained bandwidth between the processor and main memory. It reports four scores: Copy, Scale, Add, and Triad. Copy is a straight array→array transfer (a→c), Scale multiplies an array by a scalar (scalar * c→b), Add sums two arrays (a + b→c), and Triad combines the two operations (b + scalar * c→a).
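
For reference, the four kernels as defined in the STREAM source (stream.c) each operate on whole arrays, and the reported bandwidth counts every array read or written. The sketch below only illustrates that arithmetic; Python is far too slow to measure memory bandwidth, and the function and constant names are assumptions made for this example.

```python
def stream_kernels(a, b, c, scalar):
    """Apply the four STREAM kernels in order, in place, on equal-length lists."""
    n = len(a)
    for j in range(n):      # Copy:  c = a
        c[j] = a[j]
    for j in range(n):      # Scale: b = scalar * c
        b[j] = scalar * c[j]
    for j in range(n):      # Add:   c = a + b
        c[j] = a[j] + b[j]
    for j in range(n):      # Triad: a = b + scalar * c
        a[j] = b[j] + scalar * c[j]


# Arrays touched per element by each kernel (reads plus writes).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def bandwidth_mb_per_s(kernel, n_elements, seconds, bytes_per_element=8):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one pass of a kernel."""
    total_bytes = ARRAYS_TOUCHED[kernel] * bytes_per_element * n_elements
    return total_bytes / seconds / 1e6
```

Because Add and Triad touch three arrays rather than two, they move 24 bytes per element with 8-byte doubles, compared with 16 bytes for Copy and Scale. Keep this in mind when comparing the four scores.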

You can obtain the source code from the STREAM website. Then follow these steps:
  • Install the OpenMP runtime for dual-core/multi-threaded runs with this command: "opkg install libgomp"
  • Compile the source like this: "arm-angstrom-linux-gnueabi-gcc -Ofast -mcpu=cortex-a9 -lrt -fopenmp stream.c -o stream_test"
  • Run the binary and record the output

LMBench

LMBench is a suite of many small benchmark programs that together attempt to measure a complete system. It contains bandwidth, latency, and miscellaneous processor/peripheral benchmarks. The latency and bandwidth programs were found to be the most useful.

You can obtain the source code from the LMBench website. Then follow these steps:
  • Untar the downloaded tarball
  • To work around a current bug in the LMBench build, run these commands in the main directory:
    • mkdir ./SCCS
    • touch ./SCCS/s.ChangeSet
  • Go to the source directory
  • Change CC to "CC=arm-angstrom-linux-gnueabi-gcc"
  • Change OS to "OS=angstrom-linux"
  • Change CFLAGS to "-Ofast -mcpu=cortex-a9 -lrt" or flags of your choosing
  • Go back to the main directory
  • Type "make results"
  • Wait for the compilation to finish and the text wizard to start
  • Fill out the text wizard; LMBench then runs automatically
  • Go to the results directory
  • Run "make LIST=/*" to display the results

Minor Processor/Memory Benchmarks

Dhrystone

The Altera SoCs are built around ARM Cortex-A9 cores, for which ARM quotes a Dhrystone score of 2.5 DMIPS/MHz per core. If you compile and run Dhrystone yourself, you probably won't reach this score because of Linux and GCC overhead.
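
To compare your own run against ARM's figure, convert the raw Dhrystones-per-second result to DMIPS by dividing by 1757, the score of the VAX 11/780 reference machine that defines 1 DMIPS, and then divide by the core frequency. A minimal sketch follows; the function names are illustrative only.

```python
VAX_11_780_DHRYSTONES_PER_SEC = 1757  # reference machine, defined as 1 DMIPS


def dmips(dhrystones_per_sec):
    """Convert a raw Dhrystones/second result to DMIPS."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


def dmips_per_mhz(dhrystones_per_sec, core_mhz):
    """DMIPS normalized by core frequency, comparable to ARM's 2.5 figure."""
    return dmips(dhrystones_per_sec) / core_mhz


def expected_dmips(core_mhz, rating=2.5):
    """DMIPS implied by ARM's quoted 2.5 DMIPS/MHz per-core rating."""
    return rating * core_mhz
```

For a 925 MHz core, ARM's figure implies 2.5 × 925 = 2312.5 DMIPS per core, which is the number to compare your single-core result against.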

Whetstone

Whetstone is a general floating-point benchmark that has largely been superseded by CoreMark and CoreMark-Pro. You can download and run it on your SoC, but make sure -mfloat-abi=hard and -mfpu=neon are among your compiler flags (if they are not already the default) so that you test the hardware floating-point unit. To test software floating-point emulation instead, use -mfloat-abi=soft; note that -mfloat-abi=softfp still uses the hardware unit and changes only the calling convention.


© 1999-2017 RocketBoards.org by the contributing authors. All material on this collaboration platform is the property of the contributing authors.