<strong>Paper Title</strong><br>
High Frequency Trading System-on-Chip (HFT-SOC with FPGA/GPU Fabrics)<br>
<br>

<strong>Abstract</strong><br>
This paper discusses the design of a programmable System-on-Chip (SoC) that implements a High Frequency Trading (HFT)
system on a single chip using FPGA and GPU fabrics. The single-chip approach can be of particular value to HFT
system developers facing stringent time-to-market, cost, performance, design-reuse, and longevity requirements. The
proposed SoC design combines processors embedded in field-programmable gate array (FPGA) fabric with GPUs and CPUs
that run collaboratively for High Frequency Trading. The computational load of trading decisions is divided over multiple processing
FPGA nodes operating in parallel, which reduces the load on any single FPGA node. However, this requires a dedicated
Trade Processing Unit (TPU), a controller that can distribute market data to and from the nodes connected to the SoC at
wire speed. This paper concentrates on the system topologies and on the implementation aspects of the TPU and the allied
trade decision-making sub-modules: Complex Event Processing (CEP), Trading Engine (TE), Derivatives Analytics
Engine (DAE), Pre-Trade and Post-Trade Risk Engine (RE), and Order Processing Engine (OPE). All of these are placed on
FPGAs and GPUs in a single SoC because of its integrated-circuit-like performance and fine-grained programmability. This
HFT SoC has the potential to improve performance by a factor of 100 to 1000 compared with traditional multi-core
processor-based systems, which depend on operating-system (OS) scheduling. The single-chip HFT design, which does not
use an OS, shows that it is possible to combine the benefits of the specialization of FPGAs, the parallelism of GPUs,
and the scalability of processors, memory controllers, and peripherals with customizable FPGA fabric in a single SoC.
The TPU hardware implementation presented here is a modular, highly parameterizable design for 10-gigabit Ethernet. The
design can be verified by simulation and synthesizable test benches. It can also be synthesized for different FPGA
devices with varying parameters to analyze the achieved performance. High-end FPGA devices, such as the Altera Stratix
family, can meet the target processing speed of 10-gigabit Ethernet. The latency of the TPU can be comparable to that of
a typical switch.
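To make the wire-speed target concrete, the short calculation below estimates the per-frame time budget at 10-gigabit Ethernet. It is an illustrative sketch, not part of the proposed design: the function name `frame_budget` is hypothetical, and the 20 bytes of per-frame wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap) are taken from standard Ethernet framing rather than from this paper.

```python
# Illustrative arithmetic only: per-frame time budget at 10-gigabit Ethernet line rate.
# Assumption (standard Ethernet framing, not stated in the paper): each frame
# occupies 20 extra bytes on the wire (7 preamble + 1 SFD + 12 inter-frame gap).

LINE_RATE_BPS = 10_000_000_000  # 10 Gbit/s
WIRE_OVERHEAD_BYTES = 7 + 1 + 12


def frame_budget(frame_bytes: int, line_rate_bps: int = LINE_RATE_BPS):
    """Return (frames per second, nanoseconds per frame) at full line rate."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    frames_per_second = line_rate_bps / bits_on_wire
    ns_per_frame = bits_on_wire / line_rate_bps * 1e9
    return frames_per_second, ns_per_frame


if __name__ == "__main__":
    for size in (64, 512, 1518):  # minimum, mid-size, and maximum untagged frames
        fps, ns = frame_budget(size)
        print(f"{size:5d}-byte frames: {fps / 1e6:6.2f} Mframes/s, {ns:8.1f} ns per frame")
```

At the minimum frame size of 64 bytes this works out to roughly 67 ns per frame, which is the budget any wire-speed distributor has to sustain in the worst case.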
An advanced CEP Engine design operating at wire speed is proposed for processing market data to identify complex events
and patterns and to match them semantically with the right strategies: given a market event and a set of financial
strategies, find all strategies satisfied by the identified market event.
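The matching problem stated above (one market event, a set of strategies, return every strategy the event satisfies) can be sketched in software as follows. This is only a minimal illustration of the problem's input and output, not the proposed hardware design: the `Strategy` class, the `match_strategies` function, the event fields, and the three sample strategies are all hypothetical.

```python
# Minimal sketch of the matching step: given one market event and a set of
# strategies, return every strategy whose condition the event satisfies.
# All names, event fields, and sample strategies are hypothetical.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

MarketEvent = Dict[str, Any]


@dataclass(frozen=True)
class Strategy:
    name: str
    condition: Callable[[MarketEvent], bool]  # predicate over a single event


def match_strategies(event: MarketEvent, strategies: List[Strategy]) -> List[str]:
    """Return the names of all strategies satisfied by the event."""
    return [s.name for s in strategies if s.condition(event)]


STRATEGIES = [
    Strategy("buy_dip", lambda e: e["symbol"] == "XYZ" and e["price"] < 95.0),
    Strategy("wide_spread", lambda e: e["ask"] - e["bid"] > 0.10),
    Strategy("large_print", lambda e: e["size"] >= 10_000),
]

if __name__ == "__main__":
    event = {"symbol": "XYZ", "price": 94.5, "bid": 94.40, "ask": 94.55, "size": 12_000}
    print(match_strategies(event, STRATEGIES))
```

The loop here evaluates the predicates one after another. In hardware, each condition could in principle be evaluated by its own logic in parallel, which is what makes the problem a natural fit for an FPGA.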
The computationally intensive DAE module takes full advantage of the FPGA architecture: financial computing applications
that lend themselves to parallelism, such as option pricing, Value at Risk, and Credit Risk analysis, are mapped closely
to hardware. All DAE applications use computationally expensive Monte Carlo simulations with different workloads.
Because each simulation relies on the average result of thousands of independent stochastic paths, massive parallelism
can be adopted to accelerate the computation with FPGAs.
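The reason Monte Carlo workloads parallelize so well can be seen in a small software sketch: every path is simulated independently, and only the final sum is shared. The example below prices a European call option in independent batches and compares the estimate with the closed-form Black-Scholes price. It is an illustration under stated assumptions, not the DAE implementation: the geometric Brownian motion model, the choice of payoff, the parameter values, and all function names are assumptions introduced here.

```python
# Minimal sketch: Monte Carlo pricing of a European call option, split into
# independent batches to mirror how paths could be spread across parallel units.
# Assumptions (not from the paper): geometric Brownian motion, a European call
# payoff, and all parameter values and function names below.
import math
import random


def batch_payoff_sum(seed, n_paths, s0, strike, rate, vol, maturity):
    """Simulate n_paths independent terminal prices; return the summed call payoff."""
    rng = random.Random(seed)  # each batch uses its own separately seeded generator
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    total = 0.0
    for _ in range(n_paths):
        terminal = s0 * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    return total


def monte_carlo_call(s0, strike, rate, vol, maturity, n_batches=8, paths_per_batch=25_000):
    """Average the payoff over all batches and discount it to today."""
    sums = [
        batch_payoff_sum(seed, paths_per_batch, s0, strike, rate, vol, maturity)
        for seed in range(n_batches)  # batches share no state, so they could run concurrently
    ]
    mean_payoff = sum(sums) / (n_batches * paths_per_batch)
    return math.exp(-rate * maturity) * mean_payoff


def black_scholes_call(s0, strike, rate, vol, maturity):
    """Closed-form reference price, used only to sanity-check the estimate."""
    d1 = (math.log(s0 / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * math.sqrt(maturity))
    d2 = d1 - vol * math.sqrt(maturity)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


if __name__ == "__main__":
    args = (100.0, 100.0, 0.05, 0.2, 1.0)  # spot, strike, rate, volatility, maturity in years
    print(f"Monte Carlo estimate: {monte_carlo_call(*args):.3f}")
    print(f"Black-Scholes price:  {black_scholes_call(*args):.3f}")
```

Because the batches exchange nothing until the final reduction, the same structure maps onto many parallel simulation cores, with a single accumulation step at the end.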
A Monte Carlo Simulation Engine (MCE) is implemented on a separate, dedicated FPGA to meet the computational needs of
the DAE. All other trading sub-modules are somewhat easier to implement on FPGAs than the DAE module, and sub-microsecond
latency can be achieved.
Keywords - Programmable Syste, Carlo Simulation Engine