# High-Speed SPI Bus Host Controller for Embedded Systems

Andrii Yarmilko Department of Automated Systems Software Bohdan Khmelnytsky National University of Cherkasy Cherkasy, Ukraine a-ja@ukr.net

Abstract—This article considers the construction of two-level embedded control systems based on the SPI bus, in which a Wi-Fi module operating in master mode acts as the upper-level component and implements extended HMI functionality and CNC program buffering. The expediency and feasibility of such control systems are substantiated, and a practical solution is offered. A clock frequency of 40 MHz and a word length of 16 bits were determined to be optimal. The advantages of composite decoding of system signals and multiplexing of clock signals are considered. A software model of the information exchange is given. Hardware support for the DMA block transaction mode has been developed. The main differences between the proposed high-speed SPI bus and standard solutions, promising directions for implementing the host controller, and the technical and economic advantages of its use are noted.

Keywords—host controller, SPI bus, embedded system, human-machine interface

## I. INTRODUCTION

The current range of microcontrollers and SoCs is impressive. The products of STMicroelectronics alone exceed 60 items [1]. It would seem possible to find a chip that fully meets the requirements of even the most demanding project. However, the practice of developing control systems calls into question, in some cases, the feasibility of building a single-chip system. There are several reasons. First, choosing a more powerful microcontroller with richer peripherals usually leads to a noticeable increase in system cost, which can be critical in mass production. Second, reserving pin resources complicates the topology of the printed circuit board and raises the requirements for its manufacturing accuracy. It should be noted that a significant increase in the range of a microcontroller's peripheral resources is accompanied by a shortage of I/O lines, so these resources are difficult to use even with pin remapping.

A compromise between these limitations can be achieved by modifying the procedures that manage data movement in the microcontroller system. Such management should aim to use the potential of standard data exchange buses more fully. In particular, this concerns the SPI interface. The literature offers a large number of solutions concerning the use of SPI and its adaptations, including solutions for tasks with increased requirements for the speed of information processes [2]-[6]. In this study, similar solutions were considered in the context of using microcontroller units as part of embedded systems and HMI tools. The urgent task, therefore, is the structural, hardware and software adaptation of existing circuit design solutions based on modules with limited pin resources to the needs of modern control tools.

This paper proposes to solve this problem by developing an external host controller for a high-speed SPI bus and synthesizing a software control model, both at the level of the multichannel bridge and of the digital system as a whole. The advantages of this solution are improved scalability of embedded systems, integration of human-machine interface (HMI) components, and optimized peripheral loading through intelligent use of the serial I/O ports of modern microcontrollers and SoCs.

## II. BACKGROUND OF RESEARCH

Competition in the niche of medium- and low-budget CNC machines and technological and service equipment has shown that even low-price microcontrollers, such as the STM32F401RTC (\$2.5/pc.) or even the STM32F103C8T6 (\$2/pc.), can provide control of adequate quality. However, when the user must be provided with HMI means of interacting with the technological equipment, the cost of the control system increases by two orders of magnitude. Using operator graphics panels does not completely eliminate the problem, as the customer often insists on additional tactile controls in a special design. These requirements stem from the contamination of technological environments and from their aggressive and abrasive effect on the operator's gloves. Introducing such tools significantly complicates the control system and makes serial HMI panels less attractive. The insufficient unification of software development tools for human-machine interaction does not work in their favor either.

It should be noted that the use of HMI in process control is intermittent, and in some cases it is needed only at the setup stage. Implementing specific settings in interactive mode with a limited set of widgets and signal processing methods requires more powerful computers. Their interfaces should provide larger screen areas and a higher resolution. This applies to touch sensors as well. Portable computers are conveniently connected wirelessly through Wi-Fi modules, which also have to be integrated into the embedded system.

These circumstances lead to the layering of data acquisition modules, PLC, HMI and Wi-Fi. At the same time, their resources are redundant and are only partly used in controlling the technological process. The explanations below will help to clarify the specifics of the proposed hardware and software approaches for embedded systems. The essence of the proposals is to make it possible to expand the system with a minimum of modification.

The modernization of a 3D printer was considered as the object of applied implementation of the proposed software and hardware solutions. The main goals of this modernization were to increase printing speed and to optimize human-machine interaction.

## III. SPI BUS HOST CONTROLLER TOPOLOGY

The choice of the extension bus for embedded systems is one of the main points of the study. The choice was made according to the following criteria:

- level of integration into modern microcontrollers, SoCs, and data collection, storage and control devices;
- bandwidth (communication performance);
- simplicity of hardware implementation, ease of connection and scalability;
- ease of configuration and use, reliability;
- unambiguous addressing;
- absence of access arbitration mechanisms, including network transfer requests;
- the ability to select any packet length;
- full-duplex communication mode.

In terms of hardware implementation simplicity and ease of connection, the choice of bus type was limited to interfaces with serial data transmission. The remaining criteria led to a preference for the SPI bus [7], [8]. This interface is not standardized, but using it to connect slave devices has never been a problem.

Another point of the study was to determine the bus clock frequency. In practice, it is limited by the frequency characteristics of the SPI master and the slave devices, as well as by the total capacitance of the inputs and bus lines. Increasing the frequency reduces noise immunity, but it is attractive because it minimizes transaction times and queues in multichannel information exchange. The key factors in the decision were the choice of the ESP32-S2 Wi-Fi module as the bus master, and of the LCD and SDHC card as the fastest slave devices. This combination of hardware components and their layout (60 mm bus length) determined the optimal clock frequency of 40 MHz. Additional reasons for choosing this value are given below with reference to other components of the software and hardware complex.

Modeling and analysis of the embedded system's functionality produced the topology shown in Fig. 1. The SPI high-speed bus controller circuit is described as a "topology" because it is highly generalized, both in terms of scaling and in terms of supporting different operation modes. It is based on discrete low-integration digital chips. In terms of functional load, the D1 and D6 chips act as the router of the host controller, and the D2-D5 chips are slave devices of the SPI bus. This structure differs from the classical radial topology. The difference is due to the lack of CS selection lines in most slave devices and the need to latch the data of the shift registers. Whether universal shift registers should be used rather than GPIO expanders such as the MCP23S17 is a valid question [9]. The answer is simple: the clock frequency of such expanders cannot exceed 10 MHz, which would reduce the bandwidth of the whole bus. As a result, hierarchical queues for information exchange would arise, greatly complicating the software implementation of the multichannel SPI bus manager.

Fig. 1. Topology of high-speed multichannel SPI bus.

The minimum list of devices capable of providing the required functionality of the equipment, together with the chip parameters, is given in Table I. All time and frequency parameters are taken from the manufacturers' descriptions for a supply voltage of 3.3 V.

 TABLE I.
 PARAMETERS OF THE CONTROL SYSTEM COMPONENTS IN THE SPI BUS GROUP

| Notation | Model      | f<sub>max</sub> (MHz) | $\tau_d$ (ns) | Number (pcs.) | Manufacturer      | Notes          |
|----------|------------|-----------------------|---------------|---------------|-------------------|----------------|
| -        | ILI9486    | 20<sup>a</sup>        |               | -             | Ilitek            | LCD controller |
| -        | SSD1963    | 25<sup>a</sup>        |               | -             | Solomon Systech   |                |
| -        | ESP32-S2   | 80 (FSPI)             |               | -             | Espressif Systems | Wi-Fi module   |
| D1       | 74LVC139   |                       | 2.5           | 1             | Nexperia          |                |
| D2       | 74LV164    | 70                    |               | 2             | Nexperia          |                |
| D3       | 74LVC595   | 180                   |               | 2             | Nexperia          |                |
| D4       | 74LV165    | 115                   |               | 2             | Nexperia          |                |
| D5       | W25Q128    | 80                    |               | 1             | Winbond           | SPI-Flash      |
| D6       | 74LVC1G157 |                       | 2.7           | 1             | Nexperia          |                |
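The frequency argument can be checked with a few lines of arithmetic. The sketch below is only an illustration: it copies the f<sub>max</sub> values of the serially clocked parts from Table I and the 10 MHz expander limit from the text, and the helper names `bus_ceiling_mhz` and `supports` are hypothetical. It considers datasheet limits only, not line capacitance or layout.

```python
# f_max values (MHz) of the serially clocked parts, copied from Table I.
F_MAX_MHZ = {
    "ESP32-S2 (FSPI)": 80,
    "74LV164": 70,
    "74LVC595": 180,
    "74LV165": 115,
    "W25Q128": 80,
}

# Clock limit of the MCP23S17 GPIO expander as quoted in the text.
MCP23S17_MAX_MHZ = 10


def bus_ceiling_mhz(parts):
    """Return (name, f_max) of the slowest part on a shared clock."""
    name = min(parts, key=parts.get)
    return name, parts[name]


def supports(parts, clock_mhz):
    """True if every part tolerates the requested bus clock."""
    return all(f_max >= clock_mhz for f_max in parts.values())


if __name__ == "__main__":
    print(bus_ceiling_mhz(F_MAX_MHZ))
    print(supports(F_MAX_MHZ, 40))
    with_expander = dict(F_MAX_MHZ, MCP23S17=MCP23S17_MAX_MHZ)
    print(supports(with_expander, 40))
```

Under these figures the slowest serial part is the 74LV164, which still leaves headroom above 40 MHz, whereas adding a 10 MHz expander would cap the whole bus.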

Implementation of the embedded control system involves three components:

- Human-machine interaction at the local level, which is optional. In normal mode the system operates autonomously, and services can be accessed through the onboard HMI, which includes an LCD monitor, a tactile touch panel and buttons. In low-budget systems the HMI panel as such may be absent, but a minimal set of buttons and a communicator are always integrated into the system. Within the proposed host structure, the LCD monitor is accessed through the two D2 shift registers, which allows connection via the 8080 interface [5]. The 16-bit data (65K colors, RGB 5-6-5 format) is not buffered because each transaction is activated by a positive WRx gate. The D/C line is program-controlled and determines whether the generated word is data or a command.
- The ROM module (D5) is used to store configuration files, the control program, raster sprites and recipes.
- Discrete control lines and sensors are accessible via the D3 and D4 dual shift registers, respectively. The 74LVC595-based D3 register with serial input and parallel output loads data into its buffer stage on receipt of a positive STCP strobe. The D4 register, based on the 74LV165 parallel-in/serial-out chip, loads data into its shift register on a low level of the /PL signal. To ensure the integrity of the input data, the /PL strobe precedes the full-duplex data exchange with D3 and D4, which is timed by the Y1 channel clock.
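The RGB 5-6-5 word mentioned in the first item can be illustrated as follows. This is a minimal sketch: the function names are hypothetical, and the assignment of the high and low bytes to the two D2 registers is an assumption, since the text does not specify the wiring order.

```python
def pack_rgb565(r, g, b):
    """Pack 8-bit R, G, B components into one 16-bit RGB 5-6-5 word."""
    for component in (r, g, b):
        if not 0 <= component <= 255:
            raise ValueError("colour components must be in 0..255")
    # Keep the 5, 6 and 5 most significant bits of R, G and B.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def split_word(word):
    """Split a 16-bit word into (high byte, low byte).

    Which byte lands in which 8-bit shift register is an assumption
    made for illustration only.
    """
    return (word >> 8) & 0xFF, word & 0xFF


if __name__ == "__main__":
    red = pack_rgb565(255, 0, 0)
    print(hex(red), split_word(red))
```

One pixel therefore occupies exactly one 16-bit SPI word, which is why the word length and the display colour format match.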

A feature of the proposed SPI bus host controller is that the master's CLK signal is distributed over the channels of decoder D1.2, which operates in demultiplexer mode. Such routing is needed primarily to allow the SPI bus to operate at 40 MHz, since clocking 7 lines in parallel results in a total capacitance of more than 30 pF. Such a large capacitance could critically stretch and distort the edges of the CP and CLK signals.

<sup>a.</sup> In the 8080 interface mode: parallel transmission of 16-bit data (see Table I).

Without reducing the SPI bus bandwidth, the D1-D4 and D6 chips can be changed from the LVC (LV) series to AHC. The more widely used HC (HCT) series can be used in applications where the SPI bus clock can be reduced to 20 MHz. If necessary, the W25Q128 memory can be replaced with an SDHC card of speed class 6 or higher. If 74LVC138 decoders/demultiplexers are used and the multiplexer is changed to a 4-channel one, the number of slave devices can be expanded to 8. However, this approach may require refinement of the software model described below.
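The address decoding and clock routing can be modeled behaviourally. The following sketch is an idealized illustration, not the circuit itself: `decode` is a hypothetical helper, propagation delays are ignored, and the three enable inputs of a real 74x138 are collapsed into a single active-low enable.

```python
def decode(address, enable_n, width=2):
    """Active-low outputs of a 1-of-2**width decoder.

    width=2 models one half of a 74x139, width=3 a 74x138 with its
    enables reduced to a single active-low input (a simplification).
    """
    n_outputs = 1 << width
    if not 0 <= address < n_outputs:
        raise ValueError("address out of range")
    if enable_n:  # enable inactive: every output stays high
        return [1] * n_outputs
    return [0 if i == address else 1 for i in range(n_outputs)]


if __name__ == "__main__":
    # Chip-select use: A1:A0 = 0 with /E low drives Y0 low.
    print(decode(0, enable_n=0))
    # Demultiplexer use: the clock is fed to the enable input, so the
    # addressed output follows the clock while the others stay high.
    for clk in (0, 1):
        print(clk, decode(1, enable_n=clk))
```

In demultiplexer mode only the addressed channel toggles, so the clock source drives one load at a time instead of all clock inputs in parallel.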

## IV. PROVIDING THE DMA TRANSACTION MODE

In the previous section, the hardware structure of the SPI bus host controller was presented in simplified form for ease of understanding. In reality, the information exchange with the LCD monitor and SD card is so intensive that spending MCU time on forming system gates in software noticeably reduces the efficiency of the entire control system. Traditionally, block data exchange is optimized by using DMA mode. Implementing this mode for an LCD monitor with video buffering and sprite transmission over the SPI bus raises the problem of hardware synchronization of the system signals with the data stream. In this research, the problem was solved by flip-flop buffering of the 16-bit word assembled in the serial shift register and by processing the /TCU (terminal count up, carry strobe) signal of the 74AC161 4-bit synchronous binary counter. The need for data word buffering is due to two factors. First, at high clock speeds of the parallel data bus, the difference in signal delay between individual lines becomes noticeable. This requires increasing the time between putting the data on the bus and forming the ↑WR gate. However, the allowable delay is limited by the time it takes to reload the SPI register from the transmitter FIFO buffer. As a rule, when the FIFO buffer is filled in hardware by the DMA controller, the delay does not exceed one SCLK cycle. For high-speed SPI buses, this time is limited to 15 ns. In such a short time, data transmission can be guaranteed only within a single die. Second, the clock speed of the parallel bus is 16 times lower than the SPI bus f<sub>clk</sub>. Therefore, it is not rational to minimize the WR gate formation time too much. Fig. 2 shows the circuit of the cascade that synthesizes the system signals of the DMA data transmission mode.
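The factor of 16 between the serial and parallel clocks follows directly from the word length. As a short worked example, assuming the 40 MHz clock and 16-bit word stated earlier (the function names are illustrative only):

```python
F_SCLK_HZ = 40_000_000  # SPI clock chosen in the paper
WORD_BITS = 16          # SPI word length chosen in the paper


def sclk_period_ns(f_sclk_hz=F_SCLK_HZ):
    """Duration of one SCLK cycle in nanoseconds."""
    return 1e9 / f_sclk_hz


def word_time_ns(f_sclk_hz=F_SCLK_HZ, word_bits=WORD_BITS):
    """Time needed to shift one complete word onto the parallel bus."""
    return word_bits * sclk_period_ns(f_sclk_hz)


def parallel_rate_mhz(f_sclk_hz=F_SCLK_HZ, word_bits=WORD_BITS):
    """Update rate of the parallel (8080-style) bus in MHz."""
    return f_sclk_hz / word_bits / 1e6


if __name__ == "__main__":
    print(sclk_period_ns(), word_time_ns(), parallel_rate_mhz())
```

At 40 MHz one SCLK cycle lasts 25 ns and one 16-bit word takes 400 ns, so the parallel bus is updated at 2.5 MHz.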

For SPI mode 0, in which data is latched on the SCLK rising edge, the counter must be incremented by its CP signal (from CPU or CT) on the SCLK falling edge. Under these conditions, the terminal count line TC goes low when the transmission of the 16-bit word is completed. To buffer the received data, the TC signal must be inverted and applied to the STCP line of both 74AHC595 registers. In this research, the LCD /WR signal was formed by a 74LVC1G123 monostable multivibrator with a pulse duration of 100 ns. This pulse duration is a limiting value, i.e. one at which the input capacitance of $R_x$ still has an inconspicuous effect, so the external 10 pF capacitor is not connected to the $C_x$ and $R_x$ lines. At $V_{cc}$=3.3 V, a pulse duration of 100±20 ns is provided by a resistor $R_{EXT}$=1.5 k$\Omega$. It is important to delay the removal of the Y0=Low signal (similar to CS#0 in DMA mode) after the sprite transmission is completed until the formation of the LCD ↑WR gate has ended.
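The role of the 4-bit counter can be illustrated with a purely behavioural model. The sketch below is an assumption-laden simplification: it counts SCLK falling edges modulo 16 and reports the edges at which a word is complete, without modeling the TC polarity, the inverter, or the monostable pulse. The function name is hypothetical.

```python
def word_complete_edges(n_falling_edges):
    """Return the 1-based indices of SCLK falling edges that end a word.

    A 4-bit counter wraps every 16 increments; each wrap marks the end
    of one 16-bit word and is where the latch strobe would be derived.
    """
    counter = 0
    strobes = []
    for edge in range(1, n_falling_edges + 1):
        counter = (counter + 1) & 0xF  # 4-bit wrap-around
        if counter == 0:
            strobes.append(edge)
    return strobes


if __name__ == "__main__":
    print(word_complete_edges(48))
```

Because the strobe is derived from the clock itself, no software action is needed per word, which is the property that the DMA mode relies on.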

The sprite transmission control algorithm includes four stages:

- In stage 1, the LCD D/C line and the A0, A1 and /E1 lines of decoder D1.1 are set to the low logic state in software. This results in Y0=CS#0=Low, which activates the LCD and switches it to command read mode. The 16-bit word 0x002A of the Column Address Set command is loaded into the SPI transmitter and transmitted in hardware. After the transmitter register is emptied, no software delay is required to complete the hardware generation of the LCD ↑WR signal, as the interrupt processing time is longer. The LCD D/C line then changes to logic 1, and two address words, the start and end sprite columns, are transmitted by writing them sequentially to the SPI transmitter register.
- In stage 2, the LCD D/C line changes to logic 0 and the Page Address Set command code 0x002B is transmitted. The D/C signal then changes to logic 1, and two address words, the start and end sprite lines, are transmitted.
- In stage 3, the LCD D/C line changes to logic 0 and the Memory Write command word 0x002C is transmitted. The D/C signal then changes to logic 1, the DMA module is configured, and the controller's SPI module is switched to DMA mode.
- In stage 4, the N words of the sprite are transmitted sequentially. On completion of the transaction, a software delay of 100 ns is generated, after which the /E1 gate of decoder D1.1 is set to the high logic level.
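The four stages can be sketched as a sequence of (D/C level, 16-bit word) pairs. This is a minimal model that follows the description above, with one start word and one end word per address command; a particular display controller may expect a different parameter layout. The function names are hypothetical, and chip-select handling and the final 100 ns delay are left out.

```python
CMD_COLUMN_ADDRESS_SET = 0x002A
CMD_PAGE_ADDRESS_SET = 0x002B
CMD_MEMORY_WRITE = 0x002C


def sprite_header(x0, x1, y0, y1):
    """Stages 1-3: the program-clocked words as (d_c, word) pairs.

    d_c = 0 marks a command word, d_c = 1 a data word.
    """
    return [
        (0, CMD_COLUMN_ADDRESS_SET), (1, x0), (1, x1),
        (0, CMD_PAGE_ADDRESS_SET), (1, y0), (1, y1),
        (0, CMD_MEMORY_WRITE),
    ]


def sprite_transaction(x0, x1, y0, y1, pixels):
    """Stages 1-4: header followed by the pixel words sent in DMA mode."""
    expected = (x1 - x0 + 1) * (y1 - y0 + 1)
    if len(pixels) != expected:
        raise ValueError(f"expected {expected} pixel words, got {len(pixels)}")
    return sprite_header(x0, x1, y0, y1) + [(1, p) for p in pixels]


if __name__ == "__main__":
    for d_c, word in sprite_transaction(10, 11, 20, 21, [0xF800] * 4):
        print(d_c, hex(word))
```

Only the first seven words need software control of the D/C line; everything after the Memory Write command is a uniform data stream.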

Thus, because the LCD D/C line is controlled in software, what is actually implemented is a quasi-DMA mode, but this has little effect on the efficiency of packet data transmission to the LCD. For example, when only one current coordinate of the 3D printer extruder is displayed, which corresponds to 5 character positions and an area of 2160 pixels, only the first 7 words are transmitted under program control. The efficiency thus reaches 99.7%.
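The 99.7% figure can be reproduced as the share of words sent in DMA mode, assuming one 16-bit word per pixel and the 7 program-clocked words of the header. The function name `dma_share` is illustrative only.

```python
def dma_share(pixel_words, program_words=7):
    """Fraction of all transmitted words that are sent in DMA mode."""
    return pixel_words / (pixel_words + program_words)


if __name__ == "__main__":
    # 2160 pixel words plus 7 command and address words.
    print(f"{dma_share(2160) * 100:.1f}%")
```

For 2160 pixels this gives 2160 / 2167, which rounds to 99.7%; larger sprites push the share even closer to 100%.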

## V. INFORMATION EXCHANGE SOFTWARE MODEL

The synthesis of the multichannel SPI bus scheduling model took into account the optimal segmentation of the LCD monitor and EEPROM data blocks, the exchange priority, and the clocking of system processes. These criteria are not obvious and need further clarification. The lowest bus load is caused by transactions of actuator control words (Driver) and technological process monitoring (Key&Sensor): their repetition frequency is low (20 to 100 Hz), and each transaction takes only 0.5 µs. However, these signals must be transmitted with a timing accuracy of $\Delta \tau \leq 10^{-5}$ s. Note that the gating of the drive's micro-step mode is carried out on separate dedicated channels (f<sub>clk</sub>=40 to 100 kHz), while the Dir\_ signals of the Drive group are transmitted via the SPI bus. In contrast, the data packets of the LCD monitor and EEPROM are heavy and therefore place a significant load on the serial bus. Under these circumstances, it was considered appropriate to segment macro packets into blocks of 512 to 4000 bytes. The minimum block size is based on the page size of the EEPROM or SDHC card. Transmitting a block of this size takes 0.125 ms, which is equivalent to a frequency of 8 kHz. Therefore, at least 80 blocks can be sent between the Driver and Key&Sensor signals.

Fig. 2. Circuits of the DMA transaction signal synthesis cascade.
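The block budget can be checked numerically. The sketch below uses the 0.125 ms block time and the 20 to 100 Hz repetition rates quoted in the text; the raw serial time of a 512-byte block at 40 MHz is computed alongside for comparison. The text does not break down the difference between the two figures, so no attempt is made to explain it here. Function names are hypothetical.

```python
def raw_block_time_us(block_bytes, f_sclk_hz=40_000_000):
    """Pure shifting time of a block on the serial bus, in microseconds."""
    return block_bytes * 8 * 1e6 / f_sclk_hz


def blocks_per_period(rate_hz, block_time_ms=0.125):
    """Whole blocks that fit between two periodic transactions."""
    period_ms = 1000.0 / rate_hz
    return int(period_ms / block_time_ms)


if __name__ == "__main__":
    print(raw_block_time_us(512))   # serial time only
    print(blocks_per_period(100))   # fastest Driver/Key&Sensor rate
    print(blocks_per_period(20))    # slowest Driver/Key&Sensor rate
```

At the fastest repetition rate of 100 Hz the 10 ms period holds 80 blocks of 0.125 ms; at 20 Hz it holds 400.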

Structurally, the multichannel SPI bus manager consists of a scheduler and a loader. The processes are separated in order to optimize the use of the system clocking time intervals. As can be seen from the above, in most cases the current technological processes can be monitored using LCD monitors with video frame buffering (with\_GRAM\_Frame) [10]. Display controllers such as the ILI9486 or SSD1963 implement commands for writing to a limited sprite region: Column Address Set (0x2A), Page Address Set (0x2B) and Memory Write (0x2C). This approach significantly relieves the information exchange between the MCU and the LCD, as static areas of the visualization are excluded from processing. The geometric parameters of the dynamic display elements differ, but to minimize the amount of localization data it is desirable to send each sprite in one piece. The task of the scheduler is to optimize the sequence of graphic information blocks and EEPROM pages according to the current fill level of the cyclic buffers of process control frames, and to minimize the waiting time for the next Driver and Key&Sensor transaction. The loader module forms the information exchange stack directly from the queue of blocks defined by the scheduler.
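The interplay between the periodic transactions and the block queue can be illustrated with a deliberately simple loader step. This is a hypothetical sketch and not the scheduling model of the paper: it ignores priorities and buffer fill levels and only shows how queued blocks might be packed into the time window that remains before the next Driver or Key&Sensor transaction. The function name, the tuple layout and the durations are all assumptions.

```python
from collections import deque


def fill_window(queue, window_us):
    """Take blocks from the front of the queue while they fit the window.

    `queue` holds (name, duration_us) tuples in the order chosen by a
    scheduler. Returns the names of the blocks to send now and the bus
    time they occupy; blocks that do not fit stay queued.
    """
    batch = []
    used_us = 0.0
    while queue and used_us + queue[0][1] <= window_us:
        name, duration_us = queue.popleft()
        batch.append(name)
        used_us += duration_us
    return batch, used_us


if __name__ == "__main__":
    pending = deque([("lcd_sprite", 125.0), ("rom_page", 125.0),
                     ("lcd_frame", 1000.0)])
    print(fill_window(pending, window_us=300.0))
    print(list(pending))
```

Keeping the first block that does not fit at the head of the queue preserves the scheduler's ordering, at the cost of leaving part of the window unused.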

## CONCLUSIONS

The proposed implementation of the high-speed SPI bus differs from the typical one in the following ways:

- The CS# slave selection signals have been replaced by a binary address with which the system gates are combined.
- The clock signal is buffered and distributed by the demultiplexer.
- The MISO signal is multiplexed.

These differences give the SPI bus higher performance, improve the scalability of control systems, and reduce the requirements for MCU or SoC pin resources. The benefits of the proposed solutions are realized most fully in two-tier control systems:

- The lower level is implemented on the microcontroller and operates in hard real-time mode (i.e. even without an RTOS) based on direct cascading hierarchical interrupt handling. It controls the technological equipment and the technological process.
- The upper level must be implemented on the Wi-Fi controller, which provides HMI interaction on the SPI bus under limited pin resource.

This approach is optimal in terms of hardware. It makes it possible to perform corporate (group) control of technological equipment, such as a farm of 3D printers, from a single tablet or smartphone through a browser application. Moreover, with the proposed solutions, the diversity of equipment is not an obstacle to corporate control.

The high-speed SPI bus host controller can be implemented as a specialized chip, as a PLD, or on the basis of unified digital chips. Combining the latter two options, despite departing from the dominant trend towards increased integration, allows SPI peripherals to be spatially distributed, improves the scalability of the control system, and simplifies PCB design.

## REFERENCES

- [1] "STM32 32-bit Arm Cortex MCUs" [Online]. Available: https://www.st.com/en/microcontrollers-microprocessors/stm32-32bit-arm-cortex-mcus.html. Accessed on: January 24, 2022.
- [2] A. Purohit, M. R. Ahmed, and V. Reddy, "Area Optimization using Structural Modeling for Gate Level Implementation of SPI for Microcontroller," International Journal of Innovative Technology and Exploring Engineering (IJITEE), vol. 9, no. 1, pp. 4763-4768, Nov. 2019.
- [3] M.-C. Tuan, S. L. Chen, Y.-K. Lai, C.-C. Chen, and H.-Y. Lee, "A 3-wire SPI Protocol Chip Design with Application-Specific Integrated Circuit (ASIC) and FPGA Verification," in *Proceedings of the 3rd World Congress on Electrical Engineering and Computer Systems and Science (EECSS'17)*, Rome, Italy, June 4-6, 2017, Paper No. EEE 110.
- [4] I. N. Rodionov, I. V. Nesterenko, D. V. Telyshev, and I. A. Sapozhkov, "Display Interfaces for the Control Unit of an Implantable Cardiac Pump," in *13th Russian-German Conference on Biomedical Engineering (RGC)*, Aachen, Germany, May 23-25, 2018 [Online]. Available: https://publications.rwth-aachen.de/record/723572/files/723572.pdf. Accessed on: January 24, 2022.
- [5] W.-J. Wang, J.-B. Zhou, P. Fei, Y.-P. Li, H.-J. Qin, and J. Yu, "Implementation of High-speed Communication Based on SPI Bus Interface in Multichannel Energy Spectrum Analyzer," Hedianzixue Yu Tance Jishu/Nuclear Electronics and Detection Technology, vol. 37, no. 1, pp. 29-32 and 42, Jan. 2017.
- [6] M. Raj, "100 MHz High Speed SPI Master: Design, Implementation and Study on Limitations of using SPI at High Speed," IJRITCC, vol. 5, no. 7, pp. 697-700, Jul. 2017.
- [7] "Using SPI for embedded system debug," *Byte Paradigm*, Revision 1.02, 8-Mar-15 [Online]. Available: https://www.byteparadigm.com/files/documents/BP_UsingSPIForDebug_WP.pdf. Accessed on: January 24, 2022.
- [8] S. Saha, M. A. Rahman, and A. Thakur, "Design and implementation of SPI bus protocol with Built-in-self-test capability over FPGA," in *International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT 2014)*, Dhaka, Bangladesh, 2014, pp. 1-6, doi: 10.1109/ICEEICT.2014.6919076.
- [9] "MCP23017/MCP23S17. 16-Bit I/O Expander with Serial Interface," Microchip Technology Inc., DS20001952C, 2016 [Online]. Available: https://ww1.microchip.com/downloads/en/DeviceDoc/20001952C.pdf. Accessed on: January 24, 2022.
- [10] "Using FlexIO to Drive 8080 Bus Interface LCD Module: Using the FlexIO module to emulate 8080 bus interface," NXP Semiconductors, Document Number: AN5313, Application Note, Rev. 0, 07/2016 [Online]. Available: https://www.nxp.com/docs/en/application-note/AN5313.pdf. Accessed on: January 24, 2022.