Device Wide AMP
Heterogeneous Asymmetric Multiprocessing Reference Design based on OpenMCAPI and involving communication between Nios running uCOS-II and ARM running Linux SMP

Board: Altera Cyclone V SoC Board
Tools Version: 13.1
State: running
Members: Radu Bacrau

Overview

This document describes a heterogeneous AMP (Asymmetric Multi Processing) example design that is composed of the following:
  • HPS domain: ARM Cortex-A9 running Linux SMP
  • FPGA domain: Nios-II running uC/OS-II

The HPS domain is intended for non-real-time processing, while the FPGA domain is intended to be used for real-time processing.

Communication between the HPS and FPGA domains goes through an MCAPI transport layer that uses shared HPS DDR memory as its underlying mechanism.

The FPGA domain has its own memories (OCRAM and DDR SDRAM) so that the impact of the HPS on real-time processing is minimized. With this dual-domain implementation, memory access latency can be better controlled and made more deterministic. There are no shared peripherals between the HPS and Nios-II.

The purpose of this design example is to provide a foundation on which a custom system can be built.

design-overview.png

A simple “hello” application demonstrates the inter-processor communication in this example design. The ARM Cortex-A9 continuously sends a “hello” message to Nios-II and vice versa. The message received by the HPS is displayed on the HPS UART, while the message received by Nios-II is displayed on the JTAG UART.

Deliverables

This section presents the set of deliverables that are part of the Heterogeneous AMP Example design release.

Item Format
Binaries Archive File
Hardware Design Archive File
Nios II Application Source Archive File
Linux Yocto Recipes Git Tree
OpenMCAPI Library Git Tree
Tools Archive File

Binaries

The reference design deliverables include binaries that can be used to run the reference design directly. The binaries are delivered as an archive file accessible at http://releases.rocketboards.org/release/2014.04/dwamp-rd/cv-dwamp-rd-bin.tar.gz

The binaries archive contains the following items:

File Description
cv-dwamp-rd/bin/fpga/amp_arm1nios_5csxfc6.sof FPGA Configuration SOF
cv-dwamp-rd/bin/fpga/amp_sof.flash FPGA Configuration Flash
cv-dwamp-rd/bin/linux/altera-amp-image-socfpga_cyclone5.tar.gz Root filesystem
cv-dwamp-rd/bin/linux/sd_image.bin SD Card Image
cv-dwamp-rd/bin/linux/amp_arm1nios_5csxfc6.dtb Device Tree Binary
cv-dwamp-rd/bin/linux/preloader-mkpimage.bin Preloader Image
cv-dwamp-rd/bin/linux/u-boot-socfpga_cyclone5.img U-Boot Image
cv-dwamp-rd/bin/linux/u-boot.scr U-Boot Script
cv-dwamp-rd/bin/linux/vmlinux Linux Kernel Executable
cv-dwamp-rd/bin/linux/zImage Linux Kernel Compressed Image
cv-dwamp-rd/bin/nios/amp_mcapi_app.elf Nios II Application Executable
cv-dwamp-rd/bin/nios/amp_mcapi_app.flash Nios II Application Flash Image

Hardware Design

The hardware design is delivered as an archive file accessible at http://releases.rocketboards.org/release/2014.04/dwamp-rd/cv-dwamp-rd-hw.tar.gz

Some of the relevant included files and folders are:

Item Description
cv-dwamp-rd/hw/ip/ Folder containing IP files
cv-dwamp-rd/hw/amp_arm1nios_5csxfc6.qpf Quartus Project File
cv-dwamp-rd/hw/amp_arm1nios_5csxfc6.qsys Qsys File
cv-dwamp-rd/hw/amp_arm1nios_5csxfc6.qsf Quartus Settings File
cv-dwamp-rd/hw/amp_top.v Top level Verilog File
cv-dwamp-rd/hw/soc_system_timing.sdc Timing File
cv-dwamp-rd/hw/amp_arm1nios_5csxfc6_board_info.xml Board XML for Device Tree Generator
cv-dwamp-rd/hw/hps_clock_info.xml Clock XML for Device Tree Generator

Nios II Application Source

The Nios II application source is delivered as an archive file accessible at http://releases.rocketboards.org/release/2014.04/dwamp-rd/cv-dwamp-rd-src.tar.gz

File Description
cv-dwamp-rd/sw/ucosii/app_mcapi_demo.c Nios II Application Source
cv-dwamp-rd/sw/ucosii/ucosii_init.c uC/OS-II Initialization Source

Linux Yocto Recipes

All the necessary files to build the Linux kernel, drivers, applications and root filesystem are delivered as a set of Yocto recipes accessible through the git trees at http://rocketboards.org/gitweb.

Component Git address Tag
Yocto Recipes poky-socfpga.git ACDS13.1_REL_AMP_PR

OpenMCAPI Library

The OpenMCAPI Library is delivered as a git tree available at http://rocketboards.org/gitweb.

Component Git address Tag
OpenMCAPI Library openmcapi.git ACDS13.1_REL_AMP_PR

Tools

The tools required for this reference design are delivered as an archive at http://releases.rocketboards.org/release/2014.04/dwamp-rd/cv-dwamp-rd-tools.tar.gz.

The archive contains the following items:
File Description
cv-dwamp-rd/tools/uboot-socfpga.tar.gz Patched Preloader Source
cv-dwamp-rd/tools/make_sdimage.sh Script for creating bootable SD card image

Getting Started

This section presents how to run the example reference design by using the precompiled binary deliverables.

Pre-requisites

The following are needed in order to be able to run the Heterogeneous AMP Design Example:
  • Cyclone V SoC Development Board - rev D recommended (rev C will also work)
  • Micro SD card and SD card writer
  • Host PC running Linux - for dd
  • Quartus II v13.1 - for nios2-flash-programmer

Obtain binaries

Download the example design archive, and save it to the user home folder.

Unzip the example design archive:
$ cd ~
$ tar xzf cv-dwamp-rd-bin.tar.gz 

The following files will be used in order to run the Reference Design:
File Description
cv-dwamp-rd/bin/fpga/amp_sof.flash FPGA Configuration Flash
cv-dwamp-rd/bin/linux/sd_image.bin SD Card Image
cv-dwamp-rd/bin/nios/amp_mcapi_app.flash Nios II Application Flash Image

Board Setup

This section presents the board setting required for running the Heterogeneous AMP example design.

Jumpers:
Jumper Setting
J5 open
J6 shorted
J7 shorted
J9 open
J13 shorted
J16 open
J26 right shorted
J27 right shorted
J28 right shorted
J29 right shorted
J30 left shorted
J31 open

Switches:
Switch Setting
SW1 All OFF
SW2 1:OFF 2:ON 3:ON 4:ON
SW3 1:ON 2:OFF 3:ON 4:ON 5:OFF 6:ON
SW4 1:OFF 2:OFF 3:ON 4:ON

Configuring Board to Use EPCQ

By default the board is configured to use the onboard FPGA configuration device as EPCS.

If this has not already been done, follow the instructions posted at http://www.altera.com/support/kdb/solutions/rd11192013_118.html to configure the board to use the configuration device as EPCQ.

Note that these steps only need to be done once. After that, the board remains configured correctly, with the configuration stored in flash.


Prepare SD Card

Write SD Card Image
$ cd ~/cv-dwamp-rd
$ sudo dd if=bin/linux/sd_image.bin of=/dev/sdx
$ sudo sync

Please replace '/dev/sdx' with the name of the SD card device on your host computer.

Flashing Board

Write FPGA Flash Image
$ ~/altera/13.1/nios2eds/nios2_command_shell.sh
$ cd ~/cv-dwamp-rd
$ nios2-flash-programmer --base=0x40000 --epcs bin/fpga/amp_sof.flash

Write Nios II Flash Image
$ ~/altera/13.1/nios2eds/nios2_command_shell.sh
$ cd ~/cv-dwamp-rd
$ nios2-flash-programmer --base=0x40000 --epcs bin/nios/amp_mcapi_app.flash

Running Example Design

1. Insert SD card into slot.

2. Power cycle the board. The FPGA is configured and Nios-II boots from flash, while Linux boots from the SD card.

3. Start Nios-II Terminal
$ ~/altera/13.1/nios2eds/nios2_command_shell.sh
$ nios2-terminal

This will show that Nios-II is running the demo application:
Start MCAPI demo for Nios-II uC/OS-II
Node 1: MCAPI Initializing
Node 1: MCAPI Initialized
Node 1: Creating tx port 1000
Node 1: Creating rx port 1001
Node 1: Get remote rx port 1001 from node 0

4. At the Linux console, log in with username 'root' and no password.

5. At Linux console, run the following commands to start the demo application:
# modprobe altera_hwmutex
# modprobe mailbox-altera
# modprobe mcomm
# modprobe mcomm_socfpga
# mcapi_test --loop 0 1

The Linux console will show the demo working:
Start MCAPI Demo in Linux
Node 0: MCAPI Initialized
Node 0: Creating tx port 1000
Node 0: Creating rx port 1001
Node 0: Get remote rx port 1001 from node 1
Node 0: Connecting 0:1000 to 1:1001
Node 0: Waiting for connection setup with Node 1001
Node 0: Connection complete
Node 0: Opening send endpoint
Node 0: Opening receive endpoint
Node 0: MCAPI negotiation complete!
received message 0: Hi from node 1 - 0
received message 1: Hi from node 1 - 1
received message 2: Hi from node 1 - 2
received message 3: Hi from node 1 - 3
received message 4: Hi from node 1 - 4
received message 5: Hi from node 1 - 5
received message 6: Hi from node 1 - 6
..

6. Nios-II terminal will show the FPGA side also communicating through MCAPI:
Node 1: Connecting 1:1000 to 0:1001
Node 1: Waiting for connection setup with Node 1001
Node 1: Connection complete
Node 1: Opening send endpoint
Node 1: Opening receive endpoint
Node 1: MCAPI negotiation complete!
Received msg 0: hi from node 0 - 0
Received msg 1: hi from node 0 - 1
Received msg 2: hi from node 0 - 2
Received msg 3: hi from node 0 - 3
Received msg 4: hi from node 0 - 4
Received msg 5: hi from node 0 - 5
Received msg 6: hi from node 0 - 6
..

7. Press CTRL-C at the Linux console to stop the demo application. The console will show the application being gracefully terminated:
cleanup
Node 0: Disconnecting ...
Node 0: Closing send endpoint
Node 0: Closing receive endpoint
Node 0: MCAPI disconnection complete!

8. The Nios-II terminal will also show the application being closed:
app_mcapi_demo.c:162 status -227
Node 1: Disconnecting ...
Node 1: Closing send endpoint
Node 1: Closing receive endpoint
Node 1: MCAPI disconnection complete!
cleanup
Press Enter to restart demo, any other keys to end the demo

Hardware Design

This section briefly describes the hardware system design.

hardware-overview.png

The AMP example design system requires having the following components:
  • FPGA
    • Nios II
    • Mutex IP
    • Mailbox IP
    • On Chip RAM
    • JTAG UART
    • Address span extender
    • SDRAM controller
    • EPCS flash controller
  • HPS (ARM Cortex-A9)

Components

Nios II

A single Nios-II core without MMU is instantiated in the FPGA for real-time processing. The instruction and data caches are turned on. The Nios-II processor is connected to soft IP such as the JTAG UART, mailbox and mutex. The CPU ID of the Nios-II core is defined by the user at hardware design time in Qsys. It is defined as 2 for this example design.

Mailbox

Mailbox soft IP is used for inter-processor interrupts and data passing. This is a new IP introduced in ACDS13.1. Two mailboxes are needed for inter-processor communication between the ARM Cortex-A9 and Nios-II, because the IP can only pass messages in a single direction. An interrupt is fired to the recipient when a message is written into the mailbox.
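The one-way behaviour described above can be illustrated with a small behavioural model. This is only a sketch: the `Mailbox` class and its `post` method are hypothetical names used for illustration, not the register interface of the mailbox IP or of its drivers. The callback stands in for the interrupt delivered to the recipient.

```python
from typing import Callable, List


class Mailbox:
    """Simplified model of a one-way mailbox: one writer, one recipient."""

    def __init__(self, on_message: Callable[[int], None]) -> None:
        # The callback stands in for the recipient's interrupt handler.
        self._on_message = on_message

    def post(self, word: int) -> None:
        """Writer side: writing a message 'interrupts' the recipient."""
        self._on_message(word)


received_by_nios: List[int] = []
received_by_arm: List[int] = []

# One instance per direction, mirroring mailbox_arm2nios and mailbox_nios2arm.
mailbox_arm2nios = Mailbox(received_by_nios.append)
mailbox_nios2arm = Mailbox(received_by_arm.append)

mailbox_arm2nios.post(0x1000)  # ARM Cortex-A9 -> Nios-II
mailbox_nios2arm.post(0x2000)  # Nios-II -> ARM Cortex-A9
```

Because each instance has a single recipient, traffic in the opposite direction needs its own mailbox, which is why the design instantiates two.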

Mutex

Multiple mutexes are instantiated to protect critical sections of the code. This is to prevent multiple accesses to the critical sections at the same time. Both ARM Cortex-A9 and Nios-II need to obtain the mutex lock before they can write to the protected shared memory region.

On Chip RAM (OCRAM)

The on-chip RAM in the FPGA domain is 256KB. The OCRAM is connected to Nios-II and serves as its instruction and data memory. The uC/OS-II OS code and application code can be loaded into OCRAM if the code footprint is less than 256KB.

SDRAM controller

An SDRAM controller is needed when the uC/OS-II code footprint grows beyond 256KB and no longer fits into OCRAM. In this example design, the DDR connected to the FPGA is used instead of OCRAM.

EPCS Flash Controller

EPCS flash controller is connected to Nios-II so that EPCS can be used as storage for FPGA bit stream and uC/OS-II software binary.

JTAG UART

The JTAG UART is used to display messages received by Nios-II for demo and validation purposes. It can be omitted if display is not needed.

Memory Maps

The L3 Interconnect within the HPS supports a remap feature. Please refer to the Interconnect documentation for details on the remap and on the memory map of each HPS peripheral. Software needs to set the remap bits correctly in order to access the H2F and LWH2F interfaces. The remap configuration is handled by the Preloader.

MPU View

The following table shows the memory map of the FPGA system peripherals as viewed by the MPU through the LWH2F bridge, which has a base address of 0xFF20_0000.

Peripheral Address offset Size (bytes) Attribute
Timer 0x0 32 Timer for general purpose
mailbox_arm2nios 0x20 16 Mailbox instance for communication direction of ARM to NiosII
mailbox_nios2arm 0x30 16 Mailbox instance for communication direction of NiosII to ARM
mutex_juart 0x40 8 Mutex instance for Jtag UART protection
mutex_0_ddr 0x50 8 Mutex instance for DDR region protection
mutex_1_ddr 0x58 8 Mutex instance for DDR region protection
mutex_2_ddr 0x60 8 Mutex instance for DDR region protection
mutex_3_ddr 0x68 8 Mutex instance for DDR region protection
sysid_qsys 0x10000 8 Unique system ID
jtag_uart 0x10008 8 JTAG UART
button_pio 0x10010 16 Push button input
led_pio_flow_ctrl 0x10020 32 LED output display, also used as flow control PIO
dipsw_pio 0x10040 16 DIP switch input
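
As a worked example of how to read this table, the sketch below adds a few of the listed offsets to the bridge base to obtain the absolute address seen by the MPU. It assumes an LWH2F base of 0xFF20_0000; the dictionary and the helper name `mpu_address` are illustrative only.

```python
LWH2F_BASE = 0xFF20_0000  # assumed lightweight HPS-to-FPGA bridge base

# A few offsets copied from the LWH2F table above.
LWH2F_OFFSETS = {
    "mailbox_arm2nios": 0x20,
    "mailbox_nios2arm": 0x30,
    "mutex_0_ddr": 0x50,
    "sysid_qsys": 0x10000,
    "jtag_uart": 0x10008,
}


def mpu_address(name: str) -> int:
    """Absolute address of a peripheral as seen by the MPU."""
    return LWH2F_BASE + LWH2F_OFFSETS[name]


for name in LWH2F_OFFSETS:
    print(f"{name:18s} 0x{mpu_address(name):08X}")
```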

The following table shows the memory map of the FPGA system peripherals as viewed by the MPU through the H2F bridge, which has a base address of 0xC000_0000.

Peripheral Address offset Size (bytes) Attribute
fpga_ddr 0x0 1G DDR3 on FPGA domain
niosii_ocm 0x2000_0000 256K Onchip memory for NiosII
epcs_x1 0x2004_0000 2K EPCS

Nios-II View

The memory map of the FPGA system peripherals as viewed by Nios-II is shown in the following tables.

Address of peripherals on NiosII Data Master Interface:

Peripheral Address offset Size (bytes) Attribute
niosii_ocm 0x0 256K Onchip memory for NiosII
epcs_x1 0x4_0000 2K EPCS
Timer 0x6_0000 32 Timer for general purpose
mailbox_arm2nios 0x6_0020 16 Mailbox instance for communication direction of ARM to NiosII
mailbox_nios2arm 0x6_0030 16 Mailbox instance for communication direction of NiosII to ARM
mutex_juart 0x6_0040 8 Mutex instance for Jtag UART protection
mutex_0_ddr 0x6_0050 8 Mutex instance for DDR region protection
mutex_1_ddr 0x6_0058 8 Mutex instance for DDR region protection
mutex_2_ddr 0x6_0060 8 Mutex instance for DDR region protection
mutex_3_ddr 0x6_0068 8 Mutex instance for DDR region protection
jtag_uart 0x7_0008 8 JTAG UART
led_pio_flow_ctrl 0x7_0020 32 LED output display, also used as flow control PIO
hps_sdram 0x3000_0000 256M HPS SDRAM of location 0x3000_0000 of HPS system memory
fpga_sdram 0x4000_0000 1G FPGA SDRAM as executable memory

Address of peripherals on NiosII Instruction Master Interface:

Peripheral Address offset Size (bytes) Attribute
niosii_ocm 0x0 256K Onchip memory for NiosII, accessible through TCM
epcs_x1 0x4_0000 2K EPCS
hps_sdram 0x3000_0000 256M HPS SDRAM at location 0x3000_0000 of HPS system memory. This memory is currently not executable as the HPS SDRAM MPFE port is not yet declared as executable memory
fpga_sdram 0x4000_0000 1G FPGA SDRAM as executable memory

Software Design

The AMP example design showcases a system that runs two different operating systems in one SoC. The HPS runs the Linux SMP operating system, and Nios II runs the uC/OS-II operating system. The example design software lays the foundation for a communication channel between the two processors, which belong to different domains and have different architectures. The communication channel is established via MCAPI. Each processor is treated as an MCAPI node. The transport layer for the communication channel is shared memory.

software-overview.png

Mutex soft IP is used to protect the critical sections of this shared memory region. It enforces mutually exclusive access to the shared memory region to prevent memory corruption due to concurrent access. The mutex soft IP provides a hardware-based atomic test-and-set operation, allowing software in a multiprocessor environment to determine which processor owns the mutex.
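
The test-and-set behaviour can be sketched with a simplified software model. The assumptions here are that the mutex stores an owner ID and a value, that a write only takes effect when the mutex is free or already held by the writer, and that software reads the state back to learn whether it won. The class name, method names and the ARM ID of 1 are hypothetical; only the Nios-II CPU ID of 2 comes from this design.

```python
class HardwareMutex:
    """Simplified model of an owner/value test-and-set mutex."""

    def __init__(self) -> None:
        self.owner = 0
        self.value = 0  # 0 means the mutex is free

    def write(self, owner: int, value: int) -> None:
        # The write is accepted only if the mutex is free or held by this owner.
        if self.value == 0 or self.owner == owner:
            self.owner, self.value = owner, value

    def try_lock(self, owner: int) -> bool:
        """Attempt to take the mutex, then read back to see who owns it."""
        self.write(owner, 1)
        return self.owner == owner and self.value == 1

    def unlock(self, owner: int) -> None:
        """Release the mutex; ignored if the caller is not the owner."""
        self.write(owner, 0)


ARM_ID = 1   # hypothetical ID for the ARM Cortex-A9
NIOS_ID = 2  # CPU ID given to Nios-II in this design

mutex = HardwareMutex()
arm_first = mutex.try_lock(ARM_ID)     # ARM takes the free mutex
nios_blocked = mutex.try_lock(NIOS_ID) # Nios-II loses while ARM holds it
mutex.unlock(ARM_ID)
nios_second = mutex.try_lock(NIOS_ID)  # now Nios-II succeeds
```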

Mailbox soft IP is used to notify and interrupt the processors when new messages arrive in the communication channel. A mailbox IP driver is required for both Linux SMP and uC/OS-II. The openMCAPI library uses this driver for inter-processor communication when a packet needs to be sent to the other processor. In this example design, “Hello” messages are exchanged between the processors; these messages are diverted to the UART port controlled by each processor for demonstration and validation purposes.

Nios-II fetches its instructions from DDR in this example design, which allows a bigger code footprint. A 256KB OCRAM is also connected to Nios-II. Users may choose to run Nios-II from OCRAM for faster performance, but the code footprint then cannot grow beyond 256KB. Slower performance is expected when Nios-II runs from SDRAM, and execution timing may not be deterministic because SDRAM access latency is not deterministic. A large OCRAM is expensive in an FPGA design because it takes up a large portion of the FPGA area. There is therefore a tradeoff between FPGA area and performance for different code footprint sizes.

The boot media for this example design is SD/MMC, but the design can be modified to use other boot media. Users may choose to boot from QSPI or NAND for their own customization. Software changes are not needed, but the boot image creation and the boot settings on the board will differ.

Boot Flow

boot-flow.png

The HPS is brought up together with the FPGA. When the HPS boots, the first boot component is the Boot ROM, which fetches the preloader from SD/MMC and boots into it. The preloader then fetches u-boot from SD/MMC and boots into u-boot. U-boot is responsible for fetching the Linux image from SD/MMC and booting into Linux.

The FPGA is configured from the EPCS connected to Nios-II. The reset vector of Nios-II points to the boot copier in the EPCS controller memory. Nios-II is part of the FPGA design, so it is only brought up after the FPGA is configured. The boot copier fetches the Nios-II ELF from EPCS, and Nios-II then boots into uC/OS-II.

In addition, u-boot is responsible for releasing the H2F, LWH2F and F2SDRAM bridges after the FPGA enters user mode. These bridges are controlled by the HPS; the FPGA domain has no write access to the control register groups that control them. Accesses from the FPGA domain to the HPS domain are backpressured if the bridges have not been released, and Nios-II may stall as a result.

Resource Partitioning

There are no shared peripherals in this example design; the ARM Cortex-A9 and Nios-II each control the dedicated peripherals in their own domains. The only resource shared between the ARM Cortex-A9 in the HPS and Nios-II in the FPGA is the SDRAM that is physically connected to the HPS. The MCAPI transport layer is shared memory, so the HPS SDRAM needs to be accessible from both processors. Nios-II accesses the HPS SDRAM via the F2SDRAM bridge. The SDRAM is split into two partitions through the u-boot boot arguments. Linux uses most of the SDRAM, and a small portion of the memory is used as a shared memory region between the HPS and Nios II.
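
A quick sanity check of this partitioning can be sketched as follows. It takes the shared region from the `reg` property of the DeviceTree example in this document (base 0x30000000, size 0x00100000) and the Nios-II window onto HPS SDRAM from the Nios-II memory map (base 0x3000_0000, 256M), and confirms that the shared region falls inside that window. The `Region` helper is a hypothetical illustration, not part of the delivered software.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """An address range described by base and size in bytes."""
    base: int
    size: int

    @property
    def end(self) -> int:
        # First address past the end of the region.
        return self.base + self.size

    def contains(self, other: "Region") -> bool:
        return self.base <= other.base and other.end <= self.end


# Nios-II window onto HPS SDRAM (hps_sdram entry in the Nios-II memory map).
NIOS_HPS_SDRAM_WINDOW = Region(0x3000_0000, 256 * 1024 * 1024)

# MCAPI shared memory region (reg property of the mcomm DeviceTree node).
SHARED_REGION = Region(0x3000_0000, 0x0010_0000)

shared_visible_to_nios = NIOS_HPS_SDRAM_WINDOW.contains(SHARED_REGION)
print(f"shared region ends at 0x{SHARED_REGION.end:08X}, "
      f"visible to Nios-II: {shared_visible_to_nios}")
```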

The shared memory must be marked as non-cacheable in both Linux and uC/OS-II. This is important because Nios-II and the ARM Cortex-A9 each have their own caches, and there is no cache coherency control between the two processors. If caching were allowed, neither processor could know whether the contents of its cache are up to date.

MCAPI Library

The MCAPI specification is produced by the Multicore Association to standardize the API for communication and synchronization between processing cores in embedded systems. The following figure shows where the MCAPI framework resides in a multicore system.

For more details about MCAPI please refer to http://www.multicore-association.org/workgroup/mcapi.php.

The version of MCAPI specification implemented in this example design is 1.0.63. The specification can be downloaded from http://www.multicore-association.org/request_mcapi.php?what=MCAPI (requires registration).

mcapi-overview.png

The MCAPI APIs are defined in the MCAPI specification for user space applications; they are OS agnostic and architecture agnostic. These characteristics make applications developed with the MCAPI API portable across platforms and OSes. However, the communication mechanism underlying the MCAPI API between different OSes and cores is OS dependent and architecture dependent. The MCAPI library has been ported to support ARM Linux and Nios-II uC/OS-II for the Altera SoCFPGA platform.

Features

This subsection gives a short description of the features supported by the MCAPI specifications:
  • Supports communication types
    • Connectionless message
    • Connection-oriented packet channels
    • Connection-oriented scalar channels (8, 16, 32 and 64-bit variants)
  • Supports blocking and non-blocking communication for packet channels and messages
  • Supports blocking communication only for scalar channels
  • Priority handling
    • per message basis for message communication
    • per-endpoint basis for packet channels communication
    • per-endpoint basis for scalar channels communication
  • Packet channels are uni-directional
  • All communication types deliver data in FIFO manner

Frame Formats


This subsection describes the frame format for the different types of communication. The maximum frame size is implementation specific; it defaults to 1024B.

Connectionless Message Communication


The following two figures show the frame format used for message communication. The first 12 bytes are the common openMCAPI header. Message communication is mainly used for control path messaging (such as setting up connections and creating endpoints) and for exchanging messages between cores. For control path messaging, the message type is defined in the Protocol Type field after the common header, and the payload size varies with the protocol type / request type. For the data path, the maximum payload size is implementation specific; it defaults to 1012B (maximum frame size minus the 12B header overhead).

Message Frame Format (Control Path):
msg-frame-control-path.png

Message Frame Format (Data Path):
msg-frame-data-path.png

Connection-oriented Packet Channel Communication


The following figure shows the frame format used for packet channel communication. The first 12 bytes are the common openMCAPI header. Packet channel communication is mainly used for data path messaging, to exchange information between two applications that reside on different cores. The application message carried in the payload field is application specific and is defined by the application. The maximum payload size is implementation specific; it defaults to 1012B (maximum frame size minus the 12B header overhead).

Packet Frame Format:
packet-frame.png

Connection-oriented Scalar Channel Communication


The following figure shows the frame format used for scalar channel communication. The first 12 bytes are the common openMCAPI header. Scalar channel communication is mainly used for data path messaging, to exchange scalar values between two applications that reside on different cores. The payload of a scalar frame has a fixed length of 8, 16, 32 or 64 bits. The payload is expected to be consumed by the application directly, without further decapsulation.

Scalar Frame Format:
scalar-frame.png

Maximum Message and Packet Size


The maximum message/packet size is the same in both directions, since both OSes communicate via the same shared memory transport layer. The maximum message/packet size is 1024B. The maximum packet size can be changed at build time via autotools configuration options.

Configuration Parameters


For this MCAPI library, the number of supported nodes is 2: one node for the ARM Cortex-A9 and one for Nios-II. Each node supports 4 endpoints: 2 endpoints are needed for control plane processing and 2 for data plane messaging, because the messaging channels are uni-directional.

The following table shows several important settings which can be easily specified at build time and are applicable to both Linux and uC/OS-II builds.

Configuration option Meaning Default Value
MCAPI_MAX_DATA_LEN Maximum size, in bytes, of data that can be sent through an endpoint. The transport-level buffer may be slightly larger, to accommodate transport-specific metadata. 1024
MCAPI_MAX_ENDPOINTS The maximum number of endpoints. 4
CONFIG_SHM_NR_NODES Number of nodes communicating via shared memory segment. 2
The following table shows configuration parameters that are applicable only to uC/OS-II build.

Configuration option Meaning Default Value
UCOSII_MCAPI_MCOMM_MODE_INT The running mode of MCAPI threads, either poll or interrupt mode. A value of 1 selects interrupt mode; any other value selects poll mode. 0
UCOSII_MCAPI_MCOMM_PHYS_BASE Physical base address for MCAPI shared memory region in uC/OS-II. User defined
UCOSII_MCAPI_MCOMM_INIT_LOCK Mutex instance name to be used as initialization lock in MCAPI library. The name can be retrieved from Qsys design. User defined
UCOSII_MCAPI_MCOMM_DESCQ_LOCK0 Mutex instance name to be used as descriptor queue lock in MCAPI library. The name can be retrieved from Qsys design. User defined
UCOSII_MCAPI_MCOMM_DESCQ_LOCK1 Mutex instance name to be used as descriptor queue lock in MCAPI library. The name can be retrieved from Qsys design. User defined
UCOSII_MCAPI_MCOMM_BUFQ_LOCK Mutex instance name to be used as buffer queue lock in MCAPI library. The name can be retrieved from Qsys design. User defined
UCOSII_MCAPI_MCOMM_MBOX_TX Mailbox instance name to be used as message sender in MCAPI library. The name can be retrieved from Qsys design. User defined
UCOSII_MCAPI_MCOMM_MBOX_RX Mailbox instance name to be used as message receiver in MCAPI library. The name can be retrieved from Qsys design. User defined
ADDRESS_SPAN_EXTENDER_NIOS2SDRAM1G_BASE Base address of the address span extender bridge in the hardware design for FPGA domain. The bridge is used to connect Nios-II to f2sdram bridge so that Nios-II can access HPS SDRAM. User defined
For Linux, the configuration parameters are applied by using the DeviceTree, as shown in the example below:
mcomm: mcomm@0x30000000 {
    compatible = "altr,mcomm";
    reg = < 0x30000000 0x00100000 >;
    int_mode = < 1 >;
    init-lock = < &mutex_0_shared_ddr >;
    bufq-lock = < &mutex_1_shared_ddr >;
    descq-lock-0 = < &mutex_2_shared_ddr >;
    descq-lock-1 = < &mutex_3_shared_ddr >;
    mailbox-tx = < &mbox_arm2nios >;
    mailbox-rx = < &mbox_nios2arm >;
};

The following table presents the DeviceTree parameters:

Parameter Description Default Value
reg Base address and size. < 0x30000000 0x00100000 >
int_mode Operation mode: 1-interrupt, 0-polling < 1 >
init-lock Mutex instance name to be used as initialization lock in MCAPI library. < &mutex_0_shared_ddr >
bufq-lock Mutex instance name to be used as buffer queue lock in MCAPI library. < &mutex_1_shared_ddr >
descq-lock-0 Mutex instance name to be used as descriptor queue lock in MCAPI library. < &mutex_2_shared_ddr >
descq-lock-1 Mutex instance name to be used as descriptor queue lock in MCAPI library. < &mutex_3_shared_ddr >
mailbox-tx Mailbox instance name to be used as message sender in MCAPI library. < &mbox_arm2nios >
mailbox-rx Mailbox instance name to be used as message receiver in MCAPI library. < &mbox_nios2arm >

Assumptions and Constraints

  • The code is based on the MCAPI specification version 1.0.63.
  • Based on the openMCAPI framework released by Mentor Graphics as open source at https://bitbucket.org/hollisb/openmcapi/wiki/Home.
  • Number of nodes and connectivity topology are known at design time, not run-time. Therefore MCAPI specification does not specify link configuration and link management.
  • The MCAPI specification does not address endianness; endianness is architecture and implementation specific. Nios-II soft cores and ARM Cortex-A9 cores are both little endian.
  • Some assumptions and constraints inherited from openMCAPI framework
    • Only supports unicast, not supporting multicast and broadcast.
    • For connection-oriented communication, it is assumed to be reliable. There is no acknowledgement mechanism like TCP connection.
    • For connectionless communication, sending a message to a non-existent endpoint is not treated as an invalid send request; such requests are reported as successful by the MCAPI API. Error handling for this case is not within the scope of this implementation.
    • Packets are discarded if no working link to the specified destination node can be found, and the MCAPI API reports no error. Error handling for this case is not within the scope of this implementation.
    • An error code is returned when the system runs out of buffers. It is up to the application layer to react to the failure and resend the message upon receiving the error code.
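
The last point leaves the retry policy to the application. A minimal sketch of such a policy is shown below. It is not MCAPI code: `OutOfBuffersError`, `send_with_retry` and the `send` callable are hypothetical placeholders for whatever send call and error code the application actually uses.

```python
import time
from typing import Callable


class OutOfBuffersError(Exception):
    """Hypothetical stand-in for the error code returned when no buffer is free."""


def send_with_retry(send: Callable[[bytes], None], message: bytes,
                    attempts: int = 5, delay_s: float = 0.01) -> int:
    """Call send(message), retrying on buffer exhaustion.

    Returns the number of attempts used; raises after the last failed attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            send(message)
            return attempt
        except OutOfBuffersError:
            # Give the receiver time to drain its queue and free buffers.
            time.sleep(delay_s)
    raise OutOfBuffersError(f"gave up after {attempts} attempts")
```

A bounded number of attempts keeps a real-time task from blocking indefinitely if the peer never frees buffers.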

OpenMCAPI Framework

This section gives an overview of the OpenMCAPI framework, which was used as the starting point for the MCAPI implementation on both Linux and Nios-II.

For more details about OpenMCAPI please refer to https://bitbucket.org/hollisb/openmcapi/wiki/Home.

The following figure shows an overview of the Mentor Graphics OpenMCAPI implementation for Linux on the PowerPC platform.

openmcapi-overview.png

The MCAPI API called by application resides in user space.

A transport layer sits between the kernel driver and the MCAPI API layer. This transport layer is agnostic to the target platform and OS. The OS specific implementation is abstracted by the OS abstraction layer (in this case known as the Linux layer). All platforms may use the transport layer to link to the platform specific and OS specific implementation of the physical layer in kernel space (kernel modules).

Shared memory (SHM) is used as the physical layer for the OpenMCAPI library.

Buffer management, queue management and route interface management are handled in the transport layer driver (a.k.a SHM management driver).

The MCAPI generic layer and transport layer are compiled into a static library (libmcapi.a) so that it can be linked to applications.

A master-slave model is used in the SHM management driver. All cores share the same SHM management block. However, only the first node becomes the master; the rest are slave nodes.

A pool of buffers is created from the shared memory region allocated in kernel module for MCAPI communication.

Each node on a core is allocated a route interface that enables messages/packets to be routed to the right core. In addition, a buffer descriptor queue is created for each node to hold the incoming messages/packets from other nodes.

The kernel module beneath the transport layer is responsible for memory allocation and mapping for the shared memory region. Callback functions from the kernel modules are registered with this transport layer.

Shared Memory (SHM) Buffer Management

For the communication between cores, a shared memory region is defined and allocated. The memory allocated must be a contiguous region. Both cores are able to read and write the shared memory region for intercore communication.

The shared memory region is initialized by the master node (first MCAPI node in the system) to create a pool of fixed size (1024B) buffers that are indexed.

The maximum SHM buffers allocated for the system during initialization is 128 (SHM_BUFF_COUNT). Therefore, the buffers are indexed with range of [0..127].

A buffer is taken from the pool and allocated to a node for message/packets communication when needed. The availability of buffers in the pool is tracked via a counter (shm_buf_count) and bit masks (buff_bit_mask) to derive which buffer is free for use.

A locking mechanism prevents race conditions when updating the shared memory buffer variables (such as the counter and bit masks), because the MCAPI library runs in a multi-threaded environment on a multi-core system.
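
The counter-plus-bit-mask scheme described in this section can be sketched as follows. This is an illustrative model under stated assumptions, not the OpenMCAPI source: the mask is assumed to be split into 32-bit words, `shm_buf_count` is assumed to count free buffers, and a `threading.Lock` stands in for the hardware mutex that protects the real shared state.

```python
import threading
from typing import List, Optional

SHM_BUFF_COUNT = 128   # buffers are indexed 0..127
BITS_PER_WORD = 32     # assumed width of each bit-mask word


class BufferPool:
    """Fixed pool of buffer indices tracked by a counter and bit masks."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stands in for the hardware mutex
        self.shm_buf_count = SHM_BUFF_COUNT  # free buffers (assumed meaning)
        self.buff_bit_mask: List[int] = [0] * (SHM_BUFF_COUNT // BITS_PER_WORD)

    def alloc(self) -> Optional[int]:
        """Return the lowest free buffer index, or None if the pool is empty."""
        with self._lock:
            if self.shm_buf_count == 0:
                return None
            for word_idx, word in enumerate(self.buff_bit_mask):
                if word == 0xFFFFFFFF:
                    continue  # every buffer in this word is in use
                bit = 0
                while word & (1 << bit):
                    bit += 1
                self.buff_bit_mask[word_idx] |= 1 << bit
                self.shm_buf_count -= 1
                return word_idx * BITS_PER_WORD + bit
            return None

    def free(self, index: int) -> None:
        """Return a buffer to the pool."""
        with self._lock:
            word_idx, bit = divmod(index, BITS_PER_WORD)
            if not self.buff_bit_mask[word_idx] & (1 << bit):
                raise ValueError(f"buffer {index} is not allocated")
            self.buff_bit_mask[word_idx] &= ~(1 << bit)
            self.shm_buf_count += 1
```

Both the counter and the mask change inside the same locked section, so another thread or core never observes one updated without the other.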

Shared Memory (SHM) Queue Management

MCAPI messages/packets are enqueued into the SHM queue to reach destination.

The queues are accessible by all nodes in the system.

The depth of the queue is 16 entries (SHM_BUFF_DESC_Q_SIZE).

The descriptor queue is implemented as a ring queue using the producer-consumer concept: the sender is the producer and the receiver is the consumer. The sender enqueues the message/packet to be sent into the ring queue, and the receiver dequeues it from the queue.

Each queue has a counter that indicates how many entries are currently enqueued. The counter is treated as a mailbox: when it is greater than 0, the mailbox is active, and it either triggers an interrupt to the destination node or wakes up the sleeping receiving thread to process the message/packet.
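
A minimal Python sketch of such a ring queue follows. The class, its field names (head, tail, count) and the mailbox_active helper are hypothetical and chosen for illustration; locking is left out to keep the producer-consumer logic visible.

```python
SHM_BUFF_DESC_Q_SIZE = 16  # queue depth quoted in the text


class DescriptorQueue:
    """Toy ring queue: the sender enqueues, the receiver dequeues."""

    def __init__(self, size=SHM_BUFF_DESC_Q_SIZE):
        self.slots = [None] * size
        self.head = 0    # next slot the consumer reads
        self.tail = 0    # next slot the producer writes
        self.count = 0   # number of enqueued entries (the "mailbox")

    def enqueue(self, desc):
        """Producer side; returns False when the ring is full."""
        if self.count == len(self.slots):
            return False
        self.slots[self.tail] = desc
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def dequeue(self):
        """Consumer side; returns None when the ring is empty."""
        if self.count == 0:
            return None
        desc = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return desc

    def mailbox_active(self):
        """True while at least one entry is waiting for the receiver."""
        return self.count > 0
```

Because both sides modify count, a real implementation has to perform enqueue and dequeue under the buffer descriptor queue lock.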

Shared Memory and Buffer Management

Buffer pools are shared between cores in the existing implementation.

The number of SHM buffers allocated (# of SHM buffer) is 128. The maximum message/packet size (MCAPI_MAX_DATA_LEN) supported is 1024B, and the header overhead for an MCAPI frame is 12 bytes.

In addition, each SHM buffer entry contains a buffer index, a next pointer, a buffer size, and other fields, which total 20 bytes per entry.

Each SHM buffer entry has a fixed size MCAPI buffer (MCAPI_MAX_DATA_LEN) allocated for MCAPI message/packet.

For the HHP MCAPI library, the number of nodes supported is 2, and the number of endpoints supported for each node is 4.

MCAPI_MAX_DATA_LEN is set to 1024B.

Therefore, the memory size for an SHM buffer entry is:
SHM buffer size = 20B + MCAPI_MAX_DATA_LEN = 1044B

Thus, the minimum size required for the SHM buffers is:
Total SHM buffer size = # of SHM buffer * SHM buffer size = 128 * 1044B = 133632B ≈ 130kB

In addition, the MCAPI buffer is aligned to a 4K address. Therefore, the total memory size (M) required for the SHM buffers (shared memory region) is at least 131kB.
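
The arithmetic above can be reproduced with a few lines of Python. The constant names ENTRY_OVERHEAD and PAGE are made up for this sketch, and rounding the whole pool up to a 4 KiB boundary is only one possible reading of the alignment requirement, so the last figure should be treated as an estimate.

```python
SHM_BUFF_COUNT = 128        # number of SHM buffers
MCAPI_MAX_DATA_LEN = 1024   # payload bytes per buffer
ENTRY_OVERHEAD = 20         # index, next pointer, size, ... (bytes)
PAGE = 4096                 # assumed alignment granularity

entry_size = ENTRY_OVERHEAD + MCAPI_MAX_DATA_LEN   # bytes per SHM buffer entry
pool_size = SHM_BUFF_COUNT * entry_size            # bytes for the whole pool
pool_kib = pool_size / 1024                        # same value in units of 1024 bytes
pages = -(-pool_size // PAGE)                      # ceiling division
rounded = pages * PAGE                             # pool rounded up to whole pages

print(entry_size, pool_size, pool_kib, rounded)
# prints: 1044 133632 130.5 135168
```

Under that rounding assumption the region occupies 33 pages (132 x 1024 bytes), which is consistent with the "at least 131kB" lower bound.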

The shared memory region for this buffer pool is located in SDRAM and is visible to both cores. To avoid data coherency problems, the shared memory region is mapped as a non-cacheable region in both OSes.

Locking Mechanism

There are three types of locks used to protect the critical sections of the shared memory region:
  • Initialization lock (_shm_drv_mgmt_struct_->shm_init_lock) A lock that prevents the shared memory region from being initialized more than once by cores that run different OSes. The shared memory region needs to be initialized only once and is then used by both cores.
  • Buffer descriptor queue lock (_shm_drv_mgmt_struct_->_shm_buff_desc_q_->lock) A lock that prevents concurrent access to, and race conditions on, the buffer descriptor queue. The buffer descriptor queue is a ring queue with a maximum of 16 entries. It uses the producer-consumer concept: the sender is the producer and the receiver is the consumer. The number of MCAPI messages/packets enqueued in the ring queue is tracked via a counter. Both the producer and the consumer modify the counter: one increments it after enqueuing an entry and the other decrements it after consuming one. Therefore, the counter must be protected by a lock to prevent race conditions.
  • Buffer pool lock (_shm_drv_mgmt_struct_->_shm_buff_mgmt_blk_->lock) A lock that prevents concurrent access to, and race conditions on, the buffer pool. The SHM buffer pool is the main structure for passing messages/packets between cores and is located in the shared memory region. Both cores can take free buffers from the pool and return unused buffers to it concurrently, because they run different OS instances that have no knowledge of each other. Variables such as the number of free buffers available and the buffer index that marks a buffer as used or unused need to be protected. Therefore, the buffer pool must be protected with a lock.

Data Handling Flow

For communication between endpoints, a sender creates a message and then sends the data over an MCAPI channel to the destination endpoint. The receiver then receives the message at its endpoint.

Both polling and interrupt mechanisms are supported through the Linux kernel module in the openMCAPI library. When the MCAPI library is initialized in Linux, two threads are created for each node to handle incoming messages to its endpoints. One thread (mcapi_receive_thread) processes all incoming messages/packets and enqueues them into the receive queue of their endpoint. This thread also enqueues control plane messages into the receive queue of the control plane endpoint. The other thread (mcapi_process_ctrl_msg) processes the control plane messages.

The control plane thread’s routine is an infinite loop that waits for incoming messages on the RX control endpoint of a particular node. The thread is suspended if there is no message in the RX control endpoint’s queue. It is signaled and resumed when a control message arrives in the queue.

The mcapi_receive_thread thread can run in either poll mode or interrupt mode. Its routine is also an infinite loop. In poll mode, the thread polls its SHM receive queue for message/packet entries from other endpoints. It keeps polling until entries are available, then dequeues each message/packet entry and enqueues it into the corresponding endpoint’s receive queue. Polling continues until the thread is killed.

When interrupts are enabled for the system/cores, mcapi_receive_thread is put to sleep in an interruptible wait queue (in the Linux environment). The thread is woken up either when an interrupt is received or when the waiting condition becomes true. An interrupt targeting the destination node is generated when a message/packet is sent to that node; the IRQ handler of the target node is then triggered and wakes up the sleeping thread. Alternatively, the thread wakes up if the waiting condition evaluates to true when the wait queue is woken up. The waiting condition is true when the mailbox is active, that is, when the counter of entries in the SHM queue is not zero.
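
The interrupt-mode behaviour can be imitated in user space with a condition variable. The sketch below is an analogy only: Mailbox, send, receive and receive_loop are hypothetical names, threading.Condition stands in for the kernel wait queue, and notify() stands in for the IRQ handler waking the thread.

```python
import threading
from collections import deque


class Mailbox:
    """Toy stand-in for one node's SHM receive queue and its entry counter."""

    def __init__(self):
        self.entries = deque()
        self.cond = threading.Condition()

    def send(self, msg):
        """Sender side: enqueue, then wake the destination's receive thread."""
        with self.cond:
            self.entries.append(msg)
            self.cond.notify()

    def receive(self, timeout=None):
        """Receiver side: sleep until the entry count is non-zero, then dequeue."""
        with self.cond:
            # wait_for re-checks the predicate after every wake-up.
            if not self.cond.wait_for(lambda: len(self.entries) > 0, timeout):
                return None
            return self.entries.popleft()


def receive_loop(mailbox, endpoint_queue, n_messages):
    """Move n_messages entries from the mailbox to an endpoint receive queue."""
    for _ in range(n_messages):
        msg = mailbox.receive(timeout=5.0)
        if msg is None:
            break
        endpoint_queue.append(msg)


if __name__ == "__main__":
    mb, received = Mailbox(), []
    worker = threading.Thread(target=receive_loop, args=(mb, received, 2))
    worker.start()
    mb.send("hello")
    mb.send("hello again")
    worker.join()
    print(received)
```

The predicate passed to wait_for plays the role of the waiting condition: the thread only proceeds when the entry count is not zero, regardless of what caused the wake-up.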

Implementation Details

OpenMCAPI Library Porting

This section briefly describes a few points that need to be taken into consideration when enabling openMCAPI for the Altera SoCFPGA platform, from both hardware design and software design perspectives.

Autotools Enablement

The openMCAPI library has been enabled to be configured and built with autotools. The original build method using the Python-based ‘waf’ tool has been disabled. The following files were added or modified to enable autotools:
  • Makefile.am
  • config.sub
  • configure.ac
  • libmcapi/Makefile.am
  • libmcapi/include/openmcapi.h
  • libmcapi/shm/linux/kmod/Makefile
  • util/Makefile.am

Altera SoCFPGA Platform Enablement

openMCAPI has been enabled on the Altera SoCFPGA platform for ARM Linux and Nios-II uC/OS-II. The following files were added or modified to enable this:
  • include/openmcapi_cfg.h
  • include/ucosii/mcapi_os.h
  • libmcapi/include/arch/arm/barrier.h
  • libmcapi/include/arch/nios2/barrier.h
  • libmcapi/include/lock.h
  • libmcapi/mcapi/ucosii/mcapi_os.c
  • libmcapi/shm/linux/kmod/Kbuild
  • libmcapi/shm/linux/kmod/common.c
  • libmcapi/shm/linux/kmod/loop.c
  • libmcapi/shm/linux/kmod/mcomm.h
  • libmcapi/shm/linux/kmod/socfpga.c
  • libmcapi/shm/linux/shm_os.c
  • libmcapi/shm/shm.c
  • libmcapi/shm/shm.h
  • libmcapi/shm/ucosii/shm_os.c
  • libmcapi/shm/ucosii/socfpga.c
  • libmcapi/shm/ucosii/ucosii_mcomm.h
  • util/memtool.c

Locking Mechanism

The physical layer of the MCAPI transport layer is shared memory. Both Nios II and the ARM Cortex-A9 have concurrent read and write access to the shared memory region. Therefore, a locking/mutex mechanism needs to be enforced to protect the critical sections of this shared memory region.

These locks are implemented with the mutex soft IP. The number of mutex soft IP instances required for this example design is 4 (LOCK_MAX_NUM = 2 + DESCQ_LOCK_NUM). Please note that the number of buffer descriptor queue locks needs to be increased when the number of MCAPI nodes supported is increased (DESCQ_LOCK_NUM == CONFIG_SHM_NR_NODES). Therefore, the hardware design needs to be modified to add mutexes if more MCAPI nodes are supported in the future.

If the number of mutexes in the design changes, the device tree for Linux and the SHM driver (socfpga.c) need to be updated accordingly. The configure.ac file also needs to be updated to map the additional mutexes.

Interrupt Mechanism

An interrupt mechanism is needed to notify the cores in the system of the arrival of new MCAPI packets/messages. The MCAPI library uses the Mailbox IP to interrupt the other processor when a packet needs to be sent. A processor interrupts the other processor by writing the CMD and DATA registers of the Mailbox IP.

Please note that the number of mailbox soft IP instances needs to be increased if more than one Nios-II core is supported in the future. The existing design requires only 2. Each pair of MCAPI nodes requires 2 mailboxes for the interrupt mechanism, because a mailbox is uni-directional.

If the number of mailboxes in the design changes, the device tree for Linux and the SHM driver (socfpga.c) need to be updated accordingly. The configure.ac file also needs to be updated to map the additional mailboxes.
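
As a rough planning aid, the hardware resource counts from the Locking Mechanism and Interrupt Mechanism sections can be written as two small functions. The function names are invented for this sketch, and the mailbox formula assumes that every pair of nodes communicates directly, which the text does not state for systems with more than two nodes.

```python
def mutexes_required(nr_nodes):
    """LOCK_MAX_NUM = 2 + DESCQ_LOCK_NUM, with one descriptor queue lock per node."""
    return 2 + nr_nodes


def mailboxes_required(nr_nodes):
    """Two uni-directional mailboxes per pair of nodes (assumes a full mesh)."""
    pairs = nr_nodes * (nr_nodes - 1) // 2
    return 2 * pairs


for nodes in (2, 3):
    print(nodes, mutexes_required(nodes), mailboxes_required(nodes))
# prints:
# 2 4 2
# 3 5 6
```

For the two-node reference design this reproduces the 4 mutexes and 2 mailboxes quoted above; the three-node row is an extrapolation.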

Memory Barrier Support

The ARM architecture provides memory barrier instructions for synchronizing memory operations; these instructions are used to keep memory operations in sync between the HPS and FPGA domains. Please refer to libmcapi/include/arch/arm/barrier.h for the implementation.

However, the Nios-II architecture has no memory barrier instruction. Memory barrier support is instead implemented by issuing a dummy read to any memory location, which ensures that preceding memory write operations are carried out before the read. This behavior is supported in hardware by the Qsys fabric. Please refer to libmcapi/include/arch/nios2/barrier.h for the implementation. This mechanism may need to be updated if the Nios-II architecture changes in the future.

Non-cache support in Nios-II

The Nios-II is configured with a data cache, and there is no MMU support to specify the cache policy of a memory region. By default, all data accesses are cached. However, accesses to the HPS SDRAM shared region must be uncached; this is achieved by setting bit 31 of the SDRAM address to bypass the cache. This is a supported feature of Nios-II. This mechanism may need to be updated if the Nios-II architecture changes in the future.
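
The address manipulation itself is a single bit operation. The following Python lines only illustrate the arithmetic; the helper names are hypothetical, and 0x30000000 is used as the example base because it is the shared memory base passed to configure later in this document.

```python
CACHE_BYPASS_BIT = 1 << 31   # bit 31 selects the cache-bypassing alias


def uncached(addr):
    """Return the 32-bit alias of addr that bypasses the data cache."""
    return (addr | CACHE_BYPASS_BIT) & 0xFFFFFFFF


def cached(addr):
    """Return the 32-bit alias of addr that goes through the data cache."""
    return addr & ~CACHE_BYPASS_BIT & 0xFFFFFFFF


print(hex(uncached(0x30000000)))  # prints: 0xb0000000
```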

Address Mapping of Shared Memory Region

The shared memory region resides in HPS SDRAM. A window bridge is used to map the HPS SDRAM into the Nios-II address space, at the base address specified in the Qsys design. The BSP support for the window bridge in ACDS13.1 is not yet complete; therefore, the window bridge mapping of the HPS SDRAM region visible to Nios-II is hard coded to 0x3000000 as ADDRESS_SPAN_EXTENDER_NIOS2SDRAM1G_RESET_MAP in this example design (socfpga.c for uC/OS-II). This needs to be replaced when the BSP editor exposes this definition in the BSP. Please note that the mapping value may change with the design, as it is a design-time parameter.

CPUID for ARM Processor and Nios-II Processor

A unique CPUID needs to be assigned to every processor in the system so that each processor can be identified separately. The CPUID field is used when acquiring the mutex lock; therefore, it must be unique, and the CPUID values for the ARM processor and the Nios-II processor must not overlap. The ARM Cortex-A9 is a dual-core processor: the CPUID for core 0 is 0 and for core 1 is 1. Linux SMP runs on the ARM processor, and the processor ID returned in SMP mode is 0. The CPUID for Nios-II is user defined; it is defined as 2 in this example design.
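
To see why overlapping CPUIDs are a problem, consider a simplified model of an owner/value style hardware mutex. The write rule below (a write is accepted only when the mutex is free or the writer already owns it) is an assumption about how such a core typically behaves; it is not taken from this design's mutex IP documentation, and the class and method names are hypothetical.

```python
ARM_CPUID = 0     # processor ID reported under Linux SMP, per the text
NIOS2_CPUID = 2   # user-defined ID for Nios-II in this design, per the text


class OwnerValueMutex:
    """Hypothetical model of a mutex register with OWNER and VALUE fields."""

    def __init__(self):
        self.owner = 0
        self.value = 0   # 0 means unlocked

    def _write(self, owner, value):
        # Assumed rule: accept the write if free or written by the owner.
        if self.value == 0 or self.owner == owner:
            self.owner, self.value = owner, value

    def try_lock(self, cpuid):
        """Write our ID, then read back to see whether the lock is ours."""
        self._write(cpuid, 1)
        return self.value == 1 and self.owner == cpuid

    def unlock(self, cpuid):
        """Release the mutex; ignored unless cpuid is the current owner."""
        self._write(cpuid, 0)


m = OwnerValueMutex()
print(m.try_lock(ARM_CPUID))     # prints: True
print(m.try_lock(NIOS2_CPUID))   # prints: False
```

In this model a second processor that reused ID 0 would read back its own ID and conclude that it holds the lock, which is exactly the failure that unique CPUIDs prevent.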

uC/OS-II Task Priority

uC/OS-II is a pre-emptive RTOS: a higher priority task will interrupt a lower priority task. Therefore, it is very important to set the task priorities correctly. Priorities 0 and 1 are reserved for system tasks, which have the highest priority. By default, task priorities in uC/OS-II range from 0 to 63. A task with priority 63 (OS_LOWEST_PRIO) has the lowest priority. Applications running in uC/OS-II can be assigned priorities from 3 (APP_CFG_TASK_START_PRIO) to 61.

Two tasks are spawned in the MCAPI implementation. One task (mcapi_receive_thread) is on the data plane: it continuously checks for messages/packets destined for a particular node and dispatches them to the intended endpoint queue. The other task (mcapi_process_ctrl_msg) is responsible for handling control plane messages. The control plane task should have a higher priority than the data plane task, because it handles management operations such as node creation, endpoint creation, and connection establishment. The data plane task is the main entry point through which the MCAPI library receives incoming messages/packets.

The following table shows the task priorities assigned to all tasks executed on the uC/OS-II core. N can be fine-tuned based on the system design. The default value for N in this reference design is 5.

Task Priority   Task Category
0 to 1          System tasks
2 to N – 1      Application tasks
N               MCAPI task (mcapi_process_ctrl_msg)
N + 1           MCAPI task (mcapi_receive_thread)
N + 2 to 61     Application tasks
62              Timer task
63              Idle task
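
The table can be expressed as a function of N, which makes it easy to check that every priority level is accounted for exactly once. The function and key names below are made up for this sketch; only the numeric ranges come from the table.

```python
OS_LOWEST_PRIO = 63


def priority_map(n=5):
    """Return the priority assignment from the table for a given N."""
    if not 2 <= n <= 60:
        raise ValueError("N must leave room for both MCAPI tasks below 62")
    return {
        "system": range(0, 2),
        "app_above_mcapi": range(2, n),
        "mcapi_process_ctrl_msg": n,        # control plane, higher priority
        "mcapi_receive_thread": n + 1,      # data plane
        "app_below_mcapi": range(n + 2, 62),
        "timer": 62,
        "idle": OS_LOWEST_PRIO,
    }


p = priority_map()
print(p["mcapi_process_ctrl_msg"], p["mcapi_receive_thread"])  # prints: 5 6
```

In uC/OS-II a smaller number means a higher priority, so placing the control plane task at N and the data plane task at N + 1 matches the ordering requested in the text.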

FPGA Configuration


There are multiple ways to configure the FPGA, such as via HPS software (bootloader and Linux driver) or from external flash. This example design deviates from the CV SoC GSRD in its FPGA configuration approach because tool support for AMP example designs is currently limited. There is not yet a complete solution for downloading the Nios-II ELF from the HPS domain. Therefore, this example design falls back to the traditional approach that uses external flash (EPCS) for the FPGA configuration and the Nios-II ELF download.

Boot Script for u-boot

A u-boot boot script is used to release the bridges once the FPGA goes into user mode (L1 – L6 in Table 2) and also to partition the HPS SDRAM (L8 in Table 2).

The shared memory region is partitioned in u-boot so that Linux cannot claim it as part of the kernel's free memory pool; Linux can only map it as an I/O region. In this example design the upper 256MB of HPS SDRAM is reserved for the shared memory region, although the memory size required is less than 256kB. Users may also use the shared memory region for sharing larger data. This can be achieved by further partitioning the shared memory region for data payloads, so that an MCAPI message only carries the address of the data payload to the recipients.

Users may change the buffer size via the autotools configuration, and the memory partition size via the u-boot memory parameter.

L1 setenv fpga_in_user_mode 0x94
L2 setenv fpgamgr_status 0xff706000
L3 while itest *$fpgamgr_status -ne $fpga_in_user_mode; do
L4 echo .
L5 done
L6 run bridge_enable_handoff;
L7 run mmcload;
L8 setenv mmcboot setenv bootargs console=ttyS0,115200 mem=768M root=${mmcroot} rw rootwait\;bootz ${loadaddr} - ${fdtaddr}
L9 run mmcboot;
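
The numbers in the boot script and elsewhere in this document can be cross-checked with a short calculation. The sketch assumes 1 GiB of HPS SDRAM, which is not stated explicitly here; the 0x30000000 base and the 0x800000 window size are the values used for the mcomm node and the configure command in this document.

```python
MIB = 1024 * 1024

HPS_SDRAM_SIZE = 1024 * MIB   # assumption: 1 GiB of HPS SDRAM
LINUX_MEM = 768 * MIB         # mem=768M in the boot script
SHM_BASE = 0x30000000         # shared memory base used in this document
MCOMM_WINDOW = 0x800000       # size of the mcomm "reg" window

reserved = HPS_SDRAM_SIZE - LINUX_MEM

print(hex(LINUX_MEM))            # prints: 0x30000000
print(reserved // MIB)           # prints: 256
print(MCOMM_WINDOW // MIB)       # prints: 8
print(MCOMM_WINDOW <= reserved)  # prints: True
```

Under that assumption, the end of the memory handed to Linux coincides with the shared memory base, and the mcomm window fits comfortably inside the reserved upper 256MB.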

Device Tree for ARM Linux

The XML files needed by the device tree generator (DTG) to generate the dts for this example design are similar to the XML files in the CV SoC GSRD. The mcomm node for the openMCAPI library must be added to the XML files to generate a workable dts. For example:
<DTAppend name="mcomm@0x30000000" type="node" parentlabel="sopc0" newlabel="mcomm"/>
<DTAppend name="compatible"  parentlabel="mcomm" >
<val type="string">altr,mcomm</val>
</DTAppend>
<DTAppend name="reg" parentlabel="mcomm">
<val type="hex">0x30000000</val>
<val type="hex">0x800000</val>
</DTAppend>
<DTAppend name="int_mode" type="number" parentlabel="mcomm" val="0"/>
<DTAppend name="init-lock" parentlabel="mcomm" >
        <val type="phandle">mutex_0_shared_ddr</val>
</DTAppend>
<DTAppend name="bufq-lock" parentlabel="mcomm" >
        <val type="phandle">mutex_1_shared_ddr</val>
</DTAppend>
<DTAppend name="descq-lock-0" parentlabel="mcomm" >
        <val type="phandle">mutex_2_shared_ddr</val>
</DTAppend>
<DTAppend name="descq-lock-1" parentlabel="mcomm" >
        <val type="phandle">mutex_3_shared_ddr</val>
</DTAppend>
<DTAppend name="mailbox-tx" parentlabel="mcomm" >
        <val type="phandle">mbox_arm2nios</val>
</DTAppend>
<DTAppend name="mailbox-rx" parentlabel="mcomm" >
        <val type="phandle">mbox_nios2arm</val>
</DTAppend>

In addition, <IRQMasterIgnore className="altera_nios2_qsys"/> must be added to the board info file so that the dts is generated correctly for the ARM view. This is because the interrupt lines are connected to multiple masters in this design.

Linux Kernel Configuration

The Mailbox soft IP driver and the Mutex soft IP driver need to be enabled in the Linux kernel configuration, in addition to the existing kernel configuration for the CV SoC GSRD.

Building the Reference Design

The reference design is delivered both in binary format (ready to run) and in source format. This section presents how to build the example design from sources.

build-flow.png

Prerequisites

Building the reference design requires a Linux host machine, because the Linux Yocto recipes can only be built on Linux. All the other steps can also be performed on a Windows machine.

The following tools are required in order to build the Reference Design:

Tool Usage
SoC EDS v13.1 Generate and build Preloader
Generate and compile Device Tree
Quartus II v13.1 Compile Hardware Design
Generate and compile Nios II BSP supporting uC/OS-II
Compile OpenMCAPI library for Nios II
Compile Nios II application
Create Flash Files

Building Hardware Design

1. Get the hardware design archive from http://releases.rocketboards.org/release/2014.04/dwamp-rd/cv-dwamp-rd-hw.tar.gz and save it in the home folder.

2. Unzip the hardware design archive:
$ cd ~
$ tar xzf cv-dwamp-rd-hw.tar.gz

The following folder will be created: ~/cv-dwamp-rd/hw.

3. Compile the design using the procedure detailed at Compiling Hardware Design.

The following files will be created in the folder ~/cv-dwamp-rd/hw:
Item Description
amp_arm1nios_5csxfc6.sof FPGA Configuration File
hps_isw_handoff Handoff folder for Preloader Generator
amp_arm1nios_5csxfc6.sopcinfo SopC Info file used by Device Tree Generator and Nios-II BSP Editor

4. Convert the sof file to flash format, suitable to be written to the board EPCQ configuration device:
$ ~/altera/13.1/nios2eds/nios2_command_shell.sh
$ cd ~/cv-dwamp-rd/hw/output_files
$ sof2flash --epcq amp_arm1nios_5csxfc6.sof --output=amp_sof.flash

This will create the FPGA configuration flash file: ~/cv-dwamp-rd/hw/output_files/amp_sof.flash.

Generating and Compiling the Device Tree

The following commands can be used to generate the Device Tree, after the hardware design is compiled:
$ ~/altera/13.1/embedded/embedded_command_shell.sh
$ cd ~/cv-dwamp-rd/hw
$ sopc2dts --input amp_arm1nios_5csxfc6.sopcinfo --output amp_arm1nios_5csxfc6.dts \
--board amp_arm1nios_5csxfc6_board_info.xml --board hps_clock_info.xml --bridge-removal all
$ dtc -I dts -O dtb -o amp_arm1nios_5csxfc6.dtb amp_arm1nios_5csxfc6.dts

The following files will be created in the folder ~/cv-dwamp-rd/hw:
File Description
amp_arm1nios_5csxfc6.dts Device Tree Source
amp_arm1nios_5csxfc6.dtb Device Tree Binary

Building Preloader

In order to build the Preloader the following steps are needed:

1. Build the Hardware design to obtain the Handoff folder.

2. Retrieve the tools archive from http://releases.rocketboards.org/release/2014.04/dwamp-rd/cv-dwamp-rd-tools.tar.gz and unzip it in the home folder. You will obtain the folder ~/cv-dwamp-rd/tools/.

3. Configure and Build the Preloader
$ ~/altera/13.1/embedded/embedded_command_shell.sh
$ mkdir -p ~/cv-dwamp-rd/sw/preloader
$ cd ~/cv-dwamp-rd/sw/preloader
$ bsp-create-settings \
--preloader-settings-dir=../../hw/hps_isw_handoff/amp_arm1nios_5csxfc6_hps_0/ \
--settings=settings.bsp --type=spl
$ bsp-update-settings \
--settings=settings.bsp \
--set spl.PRELOADER_TGZ ../../tools/uboot-socfpga.tar.gz
$ bsp-generate-files --settings=settings.bsp --bsp-dir .
$ make

This will create the Preloader image: ~/cv-dwamp-rd/sw/preloader/preloader-mkpimage.bin.

Building Linux

The following commands need to be run in order to build the Linux binaries. Note that the ~/cv-dwamp-rd/sw folder needs to be created first if it does not already exist:
$ mkdir -p ~/cv-dwamp-rd/sw 
$ cd ~/cv-dwamp-rd/sw
$ git clone http://git.rocketboards.org/poky-socfpga.git  
$ cd  poky-socfpga/
$ git checkout -b test_branch_name tags/ACDS13.1_REL_AMP_PR
$ source altera-init
$ bitbake altera-amp-image virtual/kernel virtual/bootloader 

This will create the following files in the folder ~/cv-dwamp-rd/sw/poky-socfpga/build/tmp/deploy/images:
File Name Description
altera-amp-image-socfpga_cyclone5.cpio Rootfs as cpio archive
altera-amp-image-socfpga_cyclone5.ext3 Rootfs as ext3 image
altera-amp-image-socfpga_cyclone5.jffs2 Rootfs as jffs2 image
altera-amp-image-socfpga_cyclone5.tar.gz Rootfs as tar.gz archive
u-boot-socfpga_cyclone5 U-boot elf executable
u-boot-socfpga_cyclone5.bin U-boot binary
u-boot-socfpga_cyclone5.img U-boot image
vmlinux Linux kernel elf executable
zImage Linux kernel compressed image

Creating SD Card Image

1. Generate and Compile Preloader

2. Build Linux Yocto Recipes

3. If you have not already done so, retrieve the tools archive from http://releases.rocketboards.org/release/2014.04/dwamp-rd/cv-dwamp-rd-tools.tar.gz and unzip it in the home folder. You will obtain the folder ~/cv-dwamp-rd/tools.

5. Create folder to contain the sd card image and intermediate files
$ mkdir -p ~/cv-dwamp-rd/sd_card 

6. Create the U-Boot script file ~/cv-dwamp-rd/sd_card/u-boot.txt with the following contents:
setenv fpga_in_user_mode 0x94
setenv fpgamgr_status 0xff706000
while itest *$fpgamgr_status -ne $fpga_in_user_mode; do
   echo .
done
mw.l 0x30000000 12348765
mw.l 0x30000004 12348765
run bridge_enable_handoff;
run mmcload;
setenv mmcboot setenv bootargs console=ttyS0,115200 mem=768M root=${mmcroot} rw rootwait\;bootz ${loadaddr} - ${fdtaddr}
run mmcboot;

Then wrap the script with the U-Boot required mkimage header:
$ cd ~/cv-dwamp-rd/sd_card
$ mkimage  -A arm -O linux -T script -C none -a 0 -e 0 -n "My script" -d u-boot.txt u-boot.scr

Note that mkimage may not be installed on some systems. In that case, you can use the executable that is built with the Preloader: ~/cv-dwamp-rd/sw/preloader/uboot-socfpga/tools/mkimage.

Alternatively, you can install uboot-tools using the method appropriate for your host Linux distribution. On CentOS, for example, you can use the following commands:
$ wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
$ sudo rpm -Uvh epel-release*rpm
$ rm epel-release*rpm
$ sudo yum install uboot-tools 

7. Run the following commands to create the SD card:
$ ~/altera/13.1/embedded/embedded_command_shell.sh
$ cd ~/cv-dwamp-rd/sd_card
$ mkdir rootfs
$ cd rootfs
$ sudo tar xzf ~/cv-dwamp-rd/sw/poky-socfpga/build/tmp/deploy/images/altera-amp-image-socfpga_cyclone5.tar.gz
$ cd ..
$ cp ~/cv-dwamp-rd/hw/amp_arm1nios_5csxfc6.dtb socfpga.dtb
$ cp ~/cv-dwamp-rd/sw/poky-socfpga/build/tmp/deploy/images/zImage .
$ cp ~/cv-dwamp-rd/sw/preloader/preloader-mkpimage.bin .
$ cp ~/cv-dwamp-rd/sw/poky-socfpga/build/tmp/deploy/images/u-boot-socfpga_cyclone5.img .
$ sudo ~/cv-dwamp-rd/tools/make_sdimage.sh  \
-k socfpga.dtb,u-boot.scr,zImage \
-p preloader-mkpimage.bin \
-b u-boot-socfpga_cyclone5.img \
-r rootfs/ \
-o sd_card_image.bin \
-g 1G
$ sudo rm -rf rootfs/ socfpga.dtb zImage preloader-mkpimage.bin u-boot-socfpga_cyclone5.img

Creating and Building uC/OS-II BSP Library

The hardware design needs to be compiled before running this step.

The following commands will create the Nios II uC/OS II based BSP:
$ ~/altera/13.1/nios2eds/nios2_command_shell.sh
$ mkdir -p ~/cv-dwamp-rd/sw/
$ cd ~/cv-dwamp-rd/sw/
$ nios2-bsp ucosii ucosii_bsp ../hw/amp_arm1nios_5csxfc6.sopcinfo

The following items are created:
Folder Description
~/cv-dwamp-rd/sw/ucosii_bsp Folder containing the BSP: source code, Makefile etc

The following commands will build the Nios II uC/OS-II BSP:
$ ~/altera/13.1/nios2eds/nios2_command_shell.sh
$ cd  ~/cv-dwamp-rd/sw/ucosii_bsp
$ make 

The following file is built:
File Description
~/cv-dwamp-rd/sw/ucosii_bsp/libucosii_bsp.a BSP Library File

Obtaining and Building uC/OS-II OpenMCAPI Library

This section describes how to get the OpenMCAPI library and compile it for Nios II.

The OpenMCAPI library code is obtained by cloning the git tree from rocketboards.org:
$ mkdir -p  ~/cv-dwamp-rd/sw/
$ cd ~/cv-dwamp-rd/sw/
$ git clone http://git.rocketboards.org/openmcapi.git  
$ cd  openmcapi/
$ git checkout -b test_branch_name ACDS13.1_REL_AMP_PR

The Nios II BSP Library needs to be compiled before the OpenMCAPI library can be compiled.

The steps for configuring and compiling the library are:
$ ~/altera/13.1/nios2eds/nios2_command_shell.sh
$ cd ~/cv-dwamp-rd/sw/openmcapi
$ autoreconf -i
$ mkdir build
$ cd build
$ ../configure --host=nios2-elf BAREMETAL_OS=ucosii KERNELDIR=<home_folder>/cv-dwamp-rd/sw/ucosii_bsp \
MCAPI_MCOMM_MODE_INT=-1 ADDRESS_SPAN_EXTENDER_NIOS2SDRAM1G_BASE=0x30000000 \
UCOSII_MCAPI_MCOMM_PHYS_BASE=0x30000000 UCOSII_MCAPI_INIT_LOCK="/dev/mutex_0_shared_ddr" \
UCOSII_MCAPI_MCOMM_BUFQ_LOCK="/dev/mutex_1_shared_ddr" \
UCOSII_MCAPI_MCOMM_DESQ_LOCK0="/dev/mutex_2_shared_ddr" \
UCOSII_MCAPI_MCOMM_DESQ_LOCK1="/dev/mutex_3_shared_ddr" \
UCOSII_MCAPI_MCOMM_MBOX_RX="/dev/mbox_arm2nios" UCOSII_MCAPI_MCOMM_MBOX_TX="/dev/mbox_nios2arm" \
CXXFLAGS=-g CFLAGS=-g
$ make

Note that the configure command requires an absolute path to the Nios II BSP to be provided: <home_folder>/cv-dwamp-rd/sw/ucosii_bsp.

This will create the following file:
File Description
~/cv-dwamp-rd/sw/openmcapi/build/libmcapi/libopenmcapi.a OpenMCAPI Library

Building uC/OS-II Application

The uC/OS-II Application uses the following items that need to be compiled first:
  • Nios II BSP Library
  • OpenMCAPI Library

Obtaining the Source Code

Retrieve the source code archive from http://releases.rocketboards.org/release/2014.04/dwamp-rd/cv-dwamp-rd-src-nios.tar.gz and unzip it into the home folder.

This will create the following:
Folder                   File              Description
~/cv-dwamp-rd/sw/ucosii  app_mcapi_demo.c  Nios II Application Source
~/cv-dwamp-rd/sw/ucosii  ucosii_init.c     uC/OS-II Initialization Source

Compiling the Application

The following steps are required in order to compile the Nios II Application:
$ ~/altera/13.1/nios2eds/nios2_command_shell.sh
$ cd ~/cv-dwamp-rd/sw/ucosii
$ nios2-app-generate-makefile --bsp-dir=../ucosii_bsp/ --no-src
$ nios2-app-update-makefile --app-dir . --add-src-rdir .
$ nios2-app-update-makefile --app-dir . --add-inc-dir ../openmcapi/include/
$ nios2-app-update-makefile --app-dir . --add-inc-dir ../openmcapi/include/ucosii/
$ nios2-app-update-makefile --app-dir . --add-lib-dir ../openmcapi/build/libmcapi/
$ nios2-app-update-makefile --app-dir . --add-lib-name openmcapi
$ nios2-app-update-makefile --app-dir . --set-elf-name amp_mcapi_app.elf
$ make

This will create the following file:
File Description
~/cv-dwamp-rd/sw/ucosii/amp_mcapi_app.elf Nios-II Elf executable

Converting to Flash File

After the Nios II application is compiled, it needs to be converted to flash format, so that it can be written to the EPCQ memory.

The following prerequisites are needed:
  • FPGA Configuration Flash File
  • Nios II ELF Executable

The following steps can be used to generate the flash file:
$ ~/altera/13.1/nios2eds/nios2_command_shell.sh
$ cd ~/cv-dwamp-rd/sw/ucosii/
$ elf2flash --after=../../hw/output_files/amp_sof.flash --input=amp_mcapi_app.elf \
--epcs --output=amp_mcapi_app.flash

The generated flash file is:
File Description
~/cv-dwamp-rd/sw/ucosii/amp_mcapi_app.flash Nios Flash File

Debugging the Reference Design

There are multiple tools that can be used to debug various pieces of the Reference Design:
  • ARM DS-5 Altera Edition
  • SignalTap II
  • System Console

debugging.png

Tool                     Feature                   Connection
ARM DS-5 Altera Edition  Debug ARM software        JTAG
ARM DS-5 Altera Edition  Trace ARM software        JTAG or ETM
ARM DS-5 Altera Edition  Debug Linux Applications  Ethernet
System Console           Debug FPGA Designs        JTAG
SignalTap                Debug FPGA Designs        JTAG
Nios II Eclipse          Debug Nios II software    JTAG

ARM DS-5 Altera Edition

The ARM DS-5 Altera Edition Toolkit is part of the Altera SoC Embedded Design Suite (SoC EDS).

Some of the features of the tool are:
  • Debugging ARM software by using a JTAG connection
  • Tracing ARM software by using a JTAG or ETM connection
  • Debugging Linux applications by using an Ethernet connection
  • FPGA adaptive debugging
    • Ability to display and edit Soft IP peripheral registers
    • Ability to cross-trigger with the FPGA fabric


SignalTap II

The SignalTap® II Logic Analyzer helps with the process of design debugging. This logic analyzer is a solution that allows you to examine the behavior of internal signals, without using extra I/O pins, while the design is running at full speed on an FPGA device.

The SignalTap II Logic Analyzer is scalable, easy to use, and is available as a stand-alone package or included with the Quartus® II software subscription. This logic analyzer helps debug an FPGA design by probing the state of the internal signals in the design without the use of external equipment. Defining custom trigger-condition logic provides greater accuracy and improves the ability to isolate problems.

The SignalTap II Logic Analyzer does not require external probes or changes to the design files to capture the state of the internal nodes or I/O pins in the design. All captured signal data is conveniently stored in device memory until you are ready to read and analyze the data.


System Console

System Console is a flexible system-level debugging tool that helps designers quickly and efficiently debug their design while the design is running at full speed in an FPGA.

System Console enables designers to send read and write system-level transactions into their Qsys system to help isolate and identify problems.

It also provides a quick and easy way to check system clocks and monitor reset states, which can be particularly helpful during board bring-up. In addition, System Console allows designers to create their own custom verification or demonstration tool using graphical elements, such as buttons, dials, and graphs, to represent many system-level transactions and monitor the processing of data.


Errata

  1. In some instances, some text (printf output) from u-boot is not printed during boot. If PuTTY is attached, this may additionally cause the boot to halt. With Tera Term and minicom, some text was missing but the boot did not halt.
  2. The switch --loop for mcapi_test doesn't work. Please use -l for loop related runs for mcapi_test application.

