In this assembly-line fashion, any one instruction still takes just as long to complete, but as soon as it finishes executing, the next instruction is right behind it, with most of the steps required for its execution already completed.
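The effect on throughput can be illustrated with a small cycle-count sketch. It assumes an idealized pipeline in which every stage takes one cycle and nothing ever stalls; the function names and the stage count used in the example are illustrative and do not describe any particular machine.

```python
def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction runs all stages before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed in an idealized pipeline (assumption: one cycle per stage, no stalls).

    The first instruction takes num_stages cycles to pass through the pipeline;
    every later instruction completes one cycle after its predecessor.
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    n, k = 100, 5  # hypothetical workload: 100 instructions, 5 stages
    print(sequential_cycles(n, k))  # 500
    print(pipelined_cycles(n, k))   # 104
```

Each individual instruction still spends five cycles in flight, but under these assumptions the stream as a whole finishes almost five times sooner.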

Vector processors use this technique with one additional trick. On receipt of a vector instruction, special hardware sets up the memory access for the arrays and feeds the data into the processor as fast as possible.

This referred to the way the machine gathered data: it set up its pipeline to read from and write to memory directly. This allowed the STAR to use vectors of any length, making it highly flexible.

Unfortunately, the pipeline had to be very long in order to allow it to have enough instructions in flight to make up for the slow memory.

That meant the machine incurred a high cost when switching from processing vectors to performing operations on individual randomly located operands.

Cray was able to look at the failure of the STAR and learn from it. He decided that in addition to fast vector processing, his design would also require excellent all-around scalar performance.

That way, when the machine switched modes, it would still provide superior performance. Additionally, he noticed that workloads could be dramatically improved in most cases through the use of registers.

Just as earlier machines had ignored the fact that most operations were being applied to many data points, the STAR ignored the fact that those same data points would be repeatedly operated on.

However, there were limitations with this approach. Registers were significantly more expensive in terms of circuitry, so only a limited number could be provided.

Instead of reading a vector of any size several times, as in the STAR, the Cray-1 would have to read only a portion of the vector at a time, but it could then run several operations on that data before writing the results back to memory.
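The segment-at-a-time approach can be sketched in a few lines. This is an illustration only, not Cray-1 code: the segment length of 64 is the Cray-1's vector register length (a figure not stated in the text above), and the function names and the arithmetic performed are hypothetical.

```python
VECTOR_REGISTER_LENGTH = 64  # assumption: elements held by one vector register


def strip_mine(length, segment=VECTOR_REGISTER_LENGTH):
    """Yield (start, stop) index pairs that cover range(length) in register-sized segments."""
    for start in range(0, length, segment):
        yield start, min(start + segment, length)


def scale_square_add(xs, scale):
    """Compute scale * x * x + x for every element, one segment at a time.

    Each segment is loaded once, several operations run on the loaded
    values, and the results are stored once.
    """
    out = []
    for start, stop in strip_mine(len(xs)):
        reg = xs[start:stop]                        # single load of the segment
        squared = [x * x for x in reg]              # operation 1 on register data
        scaled = [scale * v for v in squared]       # operation 2 on register data
        summed = [a + b for a, b in zip(scaled, reg)]  # operation 3 on register data
        out.extend(summed)                          # single store of the segment
    return out


if __name__ == "__main__":
    print(list(strip_mine(150)))  # [(0, 64), (64, 128), (128, 150)]
```

A memory-to-memory design would instead stream the whole vector through memory once per operation, which is the repeated traffic the register approach avoids.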

Given typical workloads, Cray felt that the small cost of breaking large sequential memory accesses into segments was well worth paying.

Since the typical vector operation would involve loading a small set of data into the vector registers and then running several operations on it, the vector system of the new design had its own separate pipeline.

Cray referred to this concept as chaining, as it allowed programmers to "chain together" several instructions and extract higher performance.
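A rough timing model shows why chaining pays off. The sketch below assumes that each functional unit has a fixed start-up latency and then delivers one result per cycle, and that a chained unit can begin as soon as the first result of the previous unit appears. The latencies in the example are hypothetical, and the model ignores memory traffic and instruction issue.

```python
def unchained_cycles(latencies, n):
    """Each operation waits until the previous one has produced all n results (n >= 1)."""
    return sum(latency + n for latency in latencies)


def chained_cycles(latencies, n):
    """Each operation consumes results as soon as the previous one produces them (n >= 1).

    Only the start-up latencies add up; the n elements then stream
    through all units at one result per cycle.
    """
    return sum(latencies) + n


if __name__ == "__main__":
    latencies = [6, 7]  # hypothetical start-up latencies of two functional units
    n = 64              # elements in the vector segment
    print(unchained_cycles(latencies, n))  # 141
    print(chained_cycles(latencies, n))    # 77
```

Under these assumptions the benefit grows with the number of operations chained together, since each extra operation adds only its start-up latency rather than a full pass over the data.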

The new machine was the first Cray design to use integrated circuits (ICs). Although ICs had been available since the 1960s, it was only in the early 1970s that they reached the performance necessary for high-speed applications.

In all, the Cray-1 contained about 200,000 gates. ICs were mounted on large five-layer printed circuit boards, with up to 144 ICs per board.

The typical module (distinct processing unit) required one or two boards. In all, the machine contained 1,662 modules in 113 varieties.

Each cable between the modules was a twisted pair, cut to a specific length in order to guarantee the signals arrived at precisely the right time and to minimize electrical reflection.

Each signal produced by the ECL circuitry was a differential pair, so the signals were balanced. This tended to make the demand on the power supply more constant and reduce switching noise.

The load on the power supply was so evenly balanced that Cray boasted that the power supply was unregulated.

To the power supply, the entire computer system looked like a simple resistor. For cooling, each circuit board was paired with a second, placed back to back with a sheet of copper between them.

The copper sheet conducted heat to the edges of the cage, where liquid Freon running in stainless steel pipes drew it away to the cooling unit below the machine.

The first Cray-1 was delayed six months due to problems in the cooling system: lubricant that is normally mixed with the Freon to keep the compressor running would leak through the seals and eventually coat the boards with oil until they shorted out.

New welding techniques had to be used to properly seal the tubing. The only patents issued for the Cray-1 computer concerned the cooling system design.

To get maximum speed out of the machine, the entire chassis was bent into a large C-shape. Speed-dependent portions of the system were placed on the "inside edge" of the chassis, where the wire lengths were shorter.

This allowed the cycle time to be decreased to 12.5 ns (80 MHz). NCAR estimated that the overall throughput on the system was 4.5 times that of the CDC 7600.

Addressing was 24-bit, with a maximum of 1,048,576 64-bit words (1 megaword) of main memory, where each word also had 8 parity bits for a total of 72 bits per word.
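The memory figures can be checked with a little arithmetic. The sketch below takes the 72 stored bits and 8 parity bits per word from the text, derives the data width as their difference, and reads "1 megaword" as 2^20 words; the constant and function names are illustrative.

```python
STORED_BITS_PER_WORD = 72  # from the text: total bits per word
PARITY_BITS_PER_WORD = 8   # from the text: parity bits per word
WORDS = 2 ** 20            # assumption: "1 megaword" read as 2**20 words


def data_bits_per_word() -> int:
    """Bits of each word that carry data rather than parity."""
    return STORED_BITS_PER_WORD - PARITY_BITS_PER_WORD


def data_capacity_bytes() -> int:
    """Usable main-memory capacity in 8-bit bytes, excluding parity."""
    return WORDS * data_bits_per_word() // 8


def total_stored_bits() -> int:
    """Bits physically stored, including parity."""
    return WORDS * STORED_BITS_PER_WORD


if __name__ == "__main__":
    print(WORDS)                  # 1048576
    print(data_bits_per_word())   # 64
    print(data_capacity_bytes())  # 8388608
    print(total_stored_bits())    # 75497472
```

In modern units, a fully populated memory therefore held 8 MiB of data, with parity adding one ninth of the stored bits on top.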

