How to significantly improve the efficiency of debugging high-speed memory

Source:中华龙

2023-04-11 14:47:11

Intermittent memory failures can be very difficult to track down. The root cause may be a single factor or a combination of several, including BIOS errors, protocol errors, signal integrity problems, hardware faults, and problems in the memory or other subsystems.

Some teams resolve memory problems quickly, but many more struggle with intermittent failures.

This article outlines a method for debugging intermittent memory failures and gives several examples showing how different causes of memory problems can be found. Engineers who regularly face system failures or memory test failures can also benefit from the methods described here.

For intermittent memory failures, finding the root cause takes three steps:

1) Determine whether the failure can be reproduced. Try to recreate the conditions that caused it. Reproducing a failure is often an effective way to characterize it.

2) Connect the memory bus to a logic analyzer, using a standard probe or a slot analysis probe, for a quick look at:

• Timing relationships across the entire DDRII bus
• Faulty signals that occur with parts-per-million probability
• Protocol errors
• Clock quality

3) Use a high-speed oscilloscope and a high-bandwidth probe to make parametric measurements at the receiving end of the signal:

• For data written to memory, probe at the SDRAM
• For data read from memory, probe at the memory controller

Step 1

When trying to reproduce a failure condition, remember that the root cause may lie in subsystems or applications that are not directly connected to memory. LAN access, subsystem power sequencing, entering or leaving sleep mode, and power cycling are all important factors to consider when evaluating a memory failure.

Crosstalk and resource conflicts among subsystems, operating modes, and multiple loops have been the root cause of many intermittent memory failures.

Isolating the problem to a particular test or setting makes it easier to solve. For example, a failure that occurs only during one test may point to the software being run or to signal integrity issues such as crosstalk or intersymbol interference. For repeatable failures, multiple measurements can be made under the failing condition.

Reproducing a failure condition is easier said than done. Details to consider include:

- Software

• Is there an error log?
• Which BIOS, operating system, and applications were running during the test?

- Environment

• What was the room temperature when the system failed?
• What was the airflow through the system under test during the failure?
• Was the system power supply within specification?

- Hardware

• Have other systems of the same design passed the validation test?
• Are other systems failing too, or is this the only system with this failure?
• What are the revisions of the circuit board, DIMMs, processor, and so on in the failing system?
• What is different between a failing system and a working system?
• Were any components changed in manufacturing?

If the conditions can be reproduced, test under those conditions. If they cannot, test the selected memory while varying the test conditions (such as temperature and power supply limits) one at a time.
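A sequential sweep of test conditions can be planned with a short script. The following is a minimal sketch in Python, not a prescribed procedure: the nominal values, the temperature and supply corners, and the function name `one_at_a_time_plan` are hypothetical placeholders that should be replaced with the limits from your own system specification.

```python
# Hypothetical nominal values and corners (assumptions for illustration);
# replace them with the limits from your own system specification.
NOMINAL = {"temperature_c": 25, "supply_v": 1.8}
CORNERS = {
    "temperature_c": [0, 25, 70],
    "supply_v": [1.7, 1.8, 1.9],
}


def one_at_a_time_plan(nominal, corners):
    """Return test conditions that vary one parameter at a time.

    The nominal condition comes first; each following entry changes
    exactly one parameter, so a new failure can be tied to that change.
    """
    plan = [dict(nominal)]
    for name, values in corners.items():
        for value in values:
            if value == nominal[name]:
                continue  # already covered by the nominal run
            condition = dict(nominal)
            condition[name] = value
            plan.append(condition)
    return plan


if __name__ == "__main__":
    for step, condition in enumerate(one_at_a_time_plan(NOMINAL, CORNERS), 1):
        print(step, condition)
```

Changing one parameter per run keeps the plan short and makes it easier to tie a failure to a single environmental change; a full cross product of corners can follow once a sensitive parameter is known.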

Step 2

Logic analysis effectively complements high-speed oscilloscopes when debugging DDR systems. Using a DDR probe or a slot analysis probe, a logic analyzer lets you quickly see the problem areas in the system. Engineers can save a great deal of time by first narrowing down the problem area with the logic analyzer and then checking the suspect signals with a high-performance oscilloscope.

The logic analyzer system provides:

• High-resolution timing analysis at 64K depth on all DDR bus signals through one simple connection. The 64K-deep trace can be positioned anywhere from 100% before the trigger to 100% after it.
• High-resolution eye diagrams that can reveal faulty signals occurring with a probability of one in a million.
• Global markers that can be placed automatically from the search function (up to 1024).
• Color filters that highlight patterns in the trace and help visualize memory accesses.
• Protocol decoding that translates bus activity into commands for functional validation.
• Global markers that track across the waveform and listing windows.
• Eye diagrams that show, at a glance, all signals referenced to the same clock.

Figure 1: High-resolution timing trace during DDRII activation.

The measurements of interest in Figure 1 include:

• Clock period. (The system in Figure 1 is DDRII 400, with a clock period of 5 ns.)
• Data valid window, measured with markers or by hovering the mouse over the trace, to determine the spread of the transition widths.
• RAS/CAS latency, measured from the rising edge of a valid command (command clock CK0 rising with CS low during a WRITE or READ command) to the rising edge of the data strobe during the data burst.
• RAS/CAS latency, measured from a valid Activate (command clock rising, S0 = 0, command = Activate) to a valid WRITE/CAS.
• Refresh rate.
• Precharge interval.
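The 5 ns clock period quoted for the DDRII 400 system follows directly from the data rate, because a double-data-rate bus transfers data on both clock edges. The short Python sketch below works through that arithmetic; the function names are illustrative only.

```python
def clock_period_ns(data_rate_mtps: float) -> float:
    """Clock period in ns for a double-data-rate bus.

    DDR transfers data on both clock edges, so the clock frequency
    in MHz is half the data rate in MT/s.
    """
    clock_mhz = data_rate_mtps / 2.0
    return 1000.0 / clock_mhz


def unit_interval_ns(data_rate_mtps: float) -> float:
    """Ideal width of one data bit (half a clock period) in ns."""
    return clock_period_ns(data_rate_mtps) / 2.0


print(clock_period_ns(400))   # 5.0
print(unit_interval_ns(400))  # 2.5
```

A measured clock period that differs noticeably from the computed value is worth investigating before any of the other timing measurements are interpreted.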

In Figure 1, the markers indicate an obvious problem area: S0 (chip select) occasionally transitions within 250 ps of the rising edge of CK0 (the command clock).

This may violate the DDRII 400 setup/hold time (Ts/Th) specification of more than 600 ps. To verify setup and hold times properly, we must probe CK0/CK0# and chip select at the SDRAM with a high-speed oscilloscope and probes. If Ts/Th is marginal on any signal, it can cause intermittent or persistent memory failures.
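A quick screen for this kind of marginal timing can be sketched in Python as follows. This is only an illustration: it assumes the edge-to-clock times have already been exported from the analyzer as a list of values in picoseconds, the function name `setup_violations` is hypothetical, and the sample data is invented apart from the 250 ps case and the 600 ps limit quoted in the text.

```python
SPEC_PS = 600.0  # Ts/Th limit quoted in the text for DDRII 400


def setup_violations(edge_to_clock_ps, spec_ps=SPEC_PS):
    """Return (index, margin) for cycles with less setup time than spec.

    edge_to_clock_ps holds, for each captured cycle, the time in ps from
    the S0 transition to the following CK0 rising edge. A negative margin
    means the cycle falls short of the limit by that many picoseconds.
    """
    return [
        (i, t - spec_ps)
        for i, t in enumerate(edge_to_clock_ps)
        if t < spec_ps
    ]


# Hypothetical measurements in ps; only the 250 ps case echoes the text.
measured = [1200.0, 950.0, 250.0, 1100.0]
print(setup_violations(measured))  # [(2, -350.0)]
```

A screen like this only flags candidate cycles. As the text notes, the definitive setup and hold measurement still has to be made with an oscilloscope at the SDRAM.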

Figure 2: Eye Scan of CK0 and S0.

Before connecting the oscilloscope probes and verifying Tsetup/Thold for S0, we can evaluate the marginal timing relationship further with the eye diagram measurement function on the logic analyzer. In the eye diagram shown in Figure 2:

- CK0 is a square wave.
- S0 is a triangular wave that forms an eye relative to the rising edge of CK0.
- The slow rise time of S0 may be the root cause of the intermittent failures in this system. Slow edges close the eye and reduce the setup time (Tsetup).
- The eye diagram can expose potential problems from faulty signals that occur only a few times per million. Such signals would appear as green spots inside the eye. In this case there is no evidence of faulty signals; the slow edges are the main problem.

For the system in Figure 2, an oscilloscope is needed to determine the setup time (Tsetup) of the chip select signal definitively. Oscilloscope measurement techniques are covered in Step 3.
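To put "a few parts per million" in perspective, the Python sketch below estimates how many samples an eye scan must accumulate before such a rare event is likely to appear at all. It assumes statistically independent samples, which is a simplification, and the 95% confidence level and both function names are illustrative choices rather than values from the original article.

```python
import math


def ppm(events: int, total: int) -> float:
    """Express a fraction of events in parts per million."""
    if total <= 0:
        raise ValueError("total must be positive")
    return events * 1_000_000 / total


def samples_needed(probability: float, confidence: float = 0.95) -> int:
    """Samples needed to see an event at least once with given confidence.

    Assumes independent samples, so 1 - (1 - p)**n >= confidence.
    """
    if not (0.0 < probability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("probability and confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-probability))


print(ppm(3, 1_000_000))     # 3.0
print(samples_needed(1e-6))  # roughly three million samples
```

Under these assumptions, a one-in-a-million fault needs on the order of millions of accumulated samples before an empty eye can be taken as meaningful evidence.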

The next example of taking a quick look at a memory system with a logic analyzer shows how color filters can quickly expose protocol errors by giving an overview of memory accesses through pattern recognition.

In our example, color filters are set up on the logic analyzer to help locate closed-page violations, in which the READ or WRITE commands to a bank are not properly paired with the command that activates that bank.

The filters give bank 0 (B0) red shades and bank 1 (B1) blue shades:

Pink = B0 Activation
Red = B0 Read

Turquoise = B1 Activation
Light blue = B1 Read

Color filters let engineers use pattern recognition while viewing waveforms to pick out areas that need further investigation.

Figure 3: Color filters let engineers quickly identify patterns that indicate memory access problems.

In Figure 3, a B0 Activation (pink) precedes a series of B0 READ (red) commands.

However, B1 is not activated (turquoise) before B1 is read (light blue) on the left side of the screen. If the B1 activation does not fall within the allowed time range, a problem has occurred.
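The same ordering check can be automated on a decoded command listing. The Python sketch below is a simplified illustration, not the analyzer's own algorithm: it assumes the trace has been exported as (command, bank) pairs, it checks command order only (no timing limits), and it does not model precharge-all or auto-precharge. The function name and the sample trace are hypothetical.

```python
from typing import Iterable, List, Tuple

Command = Tuple[str, int]  # (command name, bank number)


def closed_page_violations(trace: Iterable[Command]) -> List[Tuple[int, str, int]]:
    """Flag READ/WRITE commands sent to a bank that has not been activated.

    ACTIVATE opens a bank and PRECHARGE closes it. Only command ordering
    is checked; timing parameters are outside the scope of this sketch.
    """
    open_banks = set()
    violations = []
    for index, (name, bank) in enumerate(trace):
        if name == "ACTIVATE":
            open_banks.add(bank)
        elif name == "PRECHARGE":
            open_banks.discard(bank)
        elif name in ("READ", "WRITE") and bank not in open_banks:
            violations.append((index, name, bank))
    return violations


# Hypothetical trace loosely modeled on Figure 3: bank 1 is read
# before it is activated, while bank 0 is activated first.
trace = [
    ("READ", 1),
    ("ACTIVATE", 0),
    ("READ", 0),
    ("READ", 0),
    ("ACTIVATE", 1),
    ("READ", 1),
]
print(closed_page_violations(trace))  # [(0, 'READ', 1)]
```

A flagged command at the very start of a capture may simply mean the activation happened before the trace began, so the trigger position should be checked before concluding that the controller is at fault.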

Another logic analyzer example shows how to use the analyzer's eye measurement function. The eye measurement tool provides a single-voltage-threshold eye diagram of a signal from -5 ns to +5 ns relative to the clock edge reference point.

The eye measurement readily shows:

* Clock duty cycle
* Noise and signal integrity issues
* Data valid window and eye closure
* Inter-channel delay

Eye measurements are the fastest way to calibrate the sampling positions of a logic analyzer.

Figure 4: The eye measurement function on the logic analyzer gives a clear view of the signal relationships on the memory bus.

In Figure 4, the upper screen shows the Eye Finder results for a DDRII system with a clean differential clock. The Eye Finder results show that:

• The white areas (eyes) on either side of T=0 are the same size, so the duty cycle of the command clock is 50%.
• At T=0, the narrow transition region (yellow) of the command clock indicates a clean clock edge.

The lower screen is from a DDRI system with a noisy clock. The Eye Finder results show that the clock is not clean:

• The command clock has a wide transition region.
• The single-ended eyes sampled on CK0 and CK0# are asymmetric. (Asymmetric eyes may also indicate incorrect logic analyzer thresholds.)
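Equal eye widths on either side of T=0 mean the high and low phases of the clock last equally long. The Python sketch below shows the corresponding duty cycle calculation from three consecutive edge times; the function name and the edge times are hypothetical, chosen to match a 5 ns clock period.

```python
def duty_cycle(rise_ns: float, fall_ns: float, next_rise_ns: float) -> float:
    """Duty cycle of one clock cycle from three consecutive edge times.

    The high phase runs from the rising edge to the falling edge; the
    period runs from one rising edge to the next.
    """
    period = next_rise_ns - rise_ns
    if period <= 0 or not (rise_ns <= fall_ns <= next_rise_ns):
        raise ValueError("edges must be ordered: rise, fall, next rise")
    return (fall_ns - rise_ns) / period


# Hypothetical edge times in ns for a 5 ns clock period.
print(duty_cycle(0.0, 2.5, 5.0))            # 0.5
print(round(duty_cycle(0.0, 2.2, 5.0), 2))  # 0.44
```

The second call illustrates an asymmetric clock: a high phase of 2.2 ns in a 5 ns period gives a 44% duty cycle, which would appear as eyes of unequal width in the Eye Finder display.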

Step 3

Determining the root cause of a failure usually requires parametric measurements with a high-speed oscilloscope and probes. For DDRII measurements, an oscilloscope with 6 GHz bandwidth and a 20 GSa/s sample rate, used with a 7 GHz probe, provides the measurement capability needed for system characterization. The parameters to measure on the oscilloscope include:

- Setup and hold time (Ts/Th)
- Rise time
- Clock overshoot
- Frequency
- Jitter (with jitter analysis software)
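A rough sense of what a 6 GHz, 20 GSa/s oscilloscope can resolve comes from two simple calculations, sketched in Python below. The rise time formula t_r = k / BW is a common rule of thumb; the constant k = 0.35 assumes a single-pole response and is not taken from the original article (oscilloscopes with flatter responses are often quoted with a larger constant). The function names are illustrative.

```python
def rise_time_ps(bandwidth_ghz: float, k: float = 0.35) -> float:
    """Approximate 10-90% rise time of a measurement system in ps.

    Uses the rule of thumb t_r = k / BW. The default k = 0.35 fits a
    single-pole response; it is an assumption, not a value from the text.
    """
    return k / bandwidth_ghz * 1000.0


def sample_interval_ps(sample_rate_gsa: float) -> float:
    """Time between samples in ps for a sample rate given in GSa/s."""
    return 1000.0 / sample_rate_gsa


print(round(rise_time_ps(6.0), 1))  # 58.3
print(sample_interval_ps(20.0))     # 50.0
```

Under these assumptions the instrument's own rise time is roughly 58 ps and samples are 50 ps apart, both small compared with the 600 ps setup and hold limit discussed in Step 2.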

Probe location is critical for parametric measurements in signal characterization. Most important of all:

• Probe READ data and strobes at the memory controller
• Probe WRITE data and strobes at the SDRAM

Figure 5: READ and WRITE strobe eye diagrams depend on the probing location.

Figure 5 is a logic analyzer Eye Scan measurement at T=0 relative to the rising and falling edges of DQS5. The measurements were taken with a slot analysis probe in a DIMM slot.

In Figure 5, the WRITE strobe has a large, well-shaped eye. The probing point on the slot analysis probe is close enough to the SDRAM that the signal shows no reflections.

The READ strobe is degraded by reflections at the slot analysis probe. The eye diagram is adequate for relative measurements of strobe skew and pulse width. However, this location on the bus is not adequate for actually characterizing READ operations.

Figure 5 also illustrates the importance of probe location: viewed at the slot analysis probe, the amplitude of the READ signal is distorted and bears little resemblance to the actual eye at the memory controller.

To view READ data as the memory controller sees it, the oscilloscope probe must be placed at the memory controller. A micro probe tip makes this possible.

Many teams use the tools and techniques described in this article to validate and debug high-speed memory systems. Many engineers have adopted these time-saving tools, which let them debug faster and achieve better system performance.

Source:Xiang Xueqin
