Category Archives: Computing

Hardware and software aspects of computing.

Apollo 11 — when tech needed innovation and a bit of piloting

By today’s standards, the landing by humans on the Moon was technologically primitive.

Keep in mind, the Apollo 11 mission happened before the Internet; in fact, the first two nodes of the ARPAnet, from which the Internet sprang, wouldn’t be connected until several months later. Apollo is credited with pushing the micro-miniaturization of electronics. Without it, the Apollo Guidance Computer would not have been possible, or would at least have weighed many times more than it did. This machine, which aided the landing of the Eagle lunar module on the Moon, had 2048 words of memory, each word being 16 bits long. It had a clock speed of 2.048 MHz, about 1/500th to 1/1000th of current smartphones, which may have multiple processors running at 1 to 2 GHz.

In the end, the computer was overloaded, and pilot Neil Armstrong took over to make a landing under manual control with read-out assistance from astrodynamicist Edwin “Buzz” Aldrin. (The computer did not die; it was over-saturated with computation tasks, but continued to function.)

The landers that preceded Apollo to the Moon did not have digital computers. The Surveyor series of landers used servo loops, with sensors feeding back to various spacecraft systems, resulting in soft landings.

Apollo Guidance Computer and display/keypad

Engineering design was dominated by drafting boards; computer graphics was in its primitive developmental stages, and along with it, interactive CAD of mechanical parts was barely beginning. The NASA STRuctural ANalysis program (NASTRAN) was under development during this time, finally being released to NASA in 1968, after the Saturn V was designed.

On the other hand, some things haven’t changed much. There is no miniaturization of a human crew. They need a certain amount of consumables, which must be stored for the trip. Rocket engines still use chemical propulsion. LOX/RP-1 (liquid oxygen and refined kerosene), the propellant combination used by the Saturn V first stage, is still a mainstay of launch vehicle design. The efficiency of translating chemicals into thrust, F = ma (or really F = v·dm/dt), has not appreciably changed.

And yet, with all the technology constraints and unchanging laws of physics, that primitive American technology and ingenuity got humans to the surface of the Moon, and brought them safely back to Earth. … And yet, 45 years later …..

That first landing did not go completely according to plan. Armstrong had to take over, with Aldrin’s assistance. Armstrong was under pressure to pick a safe spot quickly (which the automatic systems had not done), and put the craft down. By the time it landed, the Eagle had about 15-20 seconds of fuel left. Mission Control in Houston very likely had a sinking feeling that this could end badly; hence the comment about “a bunch of guys about to turn blue. We’re breathing again. Thanks a lot.”

A re-enactment of the landing, based on radio transmissions, transcripts, and video, shows just how close they were to ending in disaster. (Kudos to Thamtech, LLC, for assembling the site a couple of years ago.)

A text-based countdown script in Python

Problem: You want to count down to an event (e.g., a rocket launch). But the web-based animated countdown consumes too much screen space and battery power (i.e., your laptop’s fan turns on when you go to that web page).

Solution: a text-based countdown in a small shell or console window. This one, down.py, is written in Python 3. The project on GitHub is called DownPy.

I’ve tested this on Linux (Ubuntu 12.04) and Mac OS X Mavericks (10.9). It has no fancy appearance, and all it does is count down.  But that’s also why it barely takes any power.

If you want to try this in the next few days, here is an example. This is the command for counting down to the currently scheduled launch time of NASA’s Orbiting Carbon Observatory (OCO-2).

./down.py 2014-07-01 02:56 -z -7:00

If you’re a git user, then you should know how to clone the DownPy project. And if you aren’t, there is really only a single file. Make sure you have Python 3, copy the file, make it executable, and go for it!  Of course, you can also open multiple shell/command windows, and have a countdown for each different event you are interested in. (There are some Mars spacecraft encounters coming up.)

So now you know how I spent one day of my weekend. I came up with the basic date/time queries in Python earlier in the week. Then on the weekend, I created a loop that adjusts to lag and other compute-load oddities. By the evening, I had it reporting and rewriting on a single, non-scrolling line.
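For the curious, here is a minimal sketch of the approach (not the actual DownPy code; see the GitHub project for that). It assumes a timezone-aware target datetime, rewrites one line with a carriage return, and sleeps to the next whole second so small lags don’t accumulate:

#!/usr/bin/env python3
# Minimal sketch of a down.py-style countdown loop (illustration only).
import sys
import time
from datetime import datetime, timedelta, timezone

def countdown(target):
    """Rewrite one line per second with the time remaining until `target`."""
    while True:
        remaining = target - datetime.now(timezone.utc)
        if remaining.total_seconds() <= 0:
            print("\nT-0")
            break
        # '\r' returns to the start of the line, so the display never scrolls.
        sys.stdout.write("\rT-minus {}   ".format(str(remaining).split(".")[0]))
        sys.stdout.flush()
        # Sleep to the next whole second so loop lag doesn't accumulate.
        time.sleep(1 - (time.time() % 1))

if __name__ == "__main__":
    # Example: the OCO-2 time used above, 2014-07-01 02:56 at UTC-7.
    countdown(datetime(2014, 7, 1, 2, 56, tzinfo=timezone(timedelta(hours=-7))))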

[More info on Python, including downloads.]

Falcon 9 soft landing video – work in progress

By releasing the raw MPEG stream of the April 18 water landing of the Falcon 9, SpaceX asked the computing and video-imaging hacker/expert community for assistance in restoring the stream. It had been plagued by choppy transmission through bad weather conditions.

Now, the hackers/experts have a partially restored result. The sequence begins just before ignition of the engine, and continues as the exhaust plume hits the water. The landing legs are already deployed throughout the sequence.

The next landing sequence test should occur after the next launch.  A Falcon 9 is expected to launch an ORBCOMM OG2 satellite into low Earth orbit.  This launch was originally scheduled for May 10, but was scrubbed due to problems with static fire tests on May 8 and 9.  The earliest launch opportunity at the busy Cape Canaveral is June 11.

More info:

The energy cost of computation

If you have a smartphone, and the battery is quickly being drained, you may have discovered that by quitting apps or removing them from your phone, the battery lasts longer.  Sensors, transmitters and receivers, and computers all take energy.  It is something that mobile device designers are concerned about.  It turns out app developers make important decisions which affect the battery life.

Of course, I consider aircraft and spacecraft to be very mobile devices.  In fact, in the last week, I’ve heard from parts of the aerospace community on this subject.  More specifically, they are concerned about how to minimize the energy consumption of computation.  Their interest ranges from mobile CPUs and sensors to high performance computing.

Aside from the work that hardware designers do, what can/should software developers and users be aware of?  I’m going to try to lay a foundation for the subject in this post.  It may initially seem that energy consumption is outside a software person’s control.  I will give some examples where it actually makes a difference.  Other examples are taken from hardware, but illustrate the implications of system design choices, e.g., using a simple embedded system vs. a multi-tasking system.

Before doing so, I need to confess something.  Between the computing hardware and software worlds, each side tends to believe that I belong to the other.  Neither side enthusiastically claims me as their own.  Really, I learned computing because I couldn’t get my hands on a wind tunnel.  My youth was spent with T-square, right triangles, and French curves rather than oscilloscopes, resistors, and diodes. But professionally, I am a computer scientist.  I happen to have worked with a lot of digital and computing systems designers, including a few years at the chip level.

All gates are created equal… to a first approximation…

For our purposes, we will think in terms of gates: fundamental blocks that have a 0 or 1 as their output. It is a major simplification. Chip designers frequently think in terms of NAND or NOR gates, switches, or transistors (going from most to least abstract). In fact, a NAND gate may be a 2-, 3-, or 4-input NAND gate. There are also nuances in the rise and fall times of propagated signals. I’m also going to assume a silicon technology like CMOS rather than nMOS, pMOS, or bipolar types. In practice, CMOS dominates the vast majority of digital devices. But for our purposes, all gates are created equal. (We’ll ignore whether or not some seem to be more equal than others…)

Fundamentally, changing the state of a gate takes energy. To a first-order approximation, it doesn’t matter if it moves from 0 to 1 or 1 to 0; the result is pretty much the same. The amount of energy needed for a computation is directly related to the number of gate state changes needed to complete the computation.

Program counters, page boundaries

One of the strange side effects is on the program counter.  Assume you have a short loop that runs from 0x0010 to 0x0013 and back again.  This takes less energy than a loop that runs from 0x0ffe to 0x1001.  Why?  Two major sets of gate state changes happen:

  • Going from 0x0fff to 0x1000, there is one change of 0->1 and there are 12 changes of 1->0, for a total of 13 state changes.
  • When a pass through the loop finishes, it jumps from 0x1001 to 0x0ffe, which has 11 changes of 0->1 and 2 changes of 1->0, again 13 state changes.

In the other case, running between 0x0010 and 0x0013, the state changes are confined to the bottom 2 bits.

Moral of the story:  frequent tight loops should avoid page boundaries.

The program counter is simply one part of the story.  There is the execution of instructions, the impact on registers and memory, etc.  But if those are the same between the two address sets, the code placement emerges as a variable that can be manipulated.
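Here is a small sketch (my own illustration, not output from any compiler or toolchain) that counts the program-counter bit flips for one pass through each of the two loops above:

# Count how many program-counter bits change during one pass through a loop.
def bit_flips(a, b):
    return bin(a ^ b).count("1")   # bits that differ between the two addresses

def flips_per_pass(start, end):
    stepping = sum(bit_flips(pc, pc + 1) for pc in range(start, end))
    return stepping + bit_flips(end, start)   # plus the jump back to the top

print(flips_per_pass(0x0010, 0x0013))   # 6  : loop away from the page boundary
print(flips_per_pass(0x0ffe, 0x1001))   # 28 : loop straddling 0x1000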

Algorithmic performance

In understanding the order of an algorithm, a sorting program that processes n inputs in n steps (or even 2n steps) is considered to be linear.  Its run-time is of order n, written O(n).  As the number of inputs increases, asymptotic trends emerge.

A sorting program whose run-time increases as O(n log n) ultimately performs better than a program that runs in O(n²). (Computer scientists and software engineers understand this as the difference between the “quicksort” and “bubblesort” algorithms.) If tricks can be played to let the sorting algorithm approach O(n), that algorithm has the potential to perform even better.
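To see how quickly the orders diverge, here is a trivial sketch that prints the step counts for each order as n grows (raw counts, ignoring constant factors):

# Compare the growth of n log n and n^2 step counts (no constant factors).
import math

print("{:>8} {:>12} {:>14}".format("n", "n log2 n", "n^2"))
for n in (1000, 10000, 100000):
    print("{:>8} {:>12} {:>14}".format(n, int(n * math.log2(n)), n * n))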

Varying processor performance

In the smartphone world, we’re beginning to see devices that have multiple high-performance cores and one low-power low-performance core on the same piece of silicon.  When performance demand drops off and only maintenance tasks are running, the low-power core continues running while the high-performance cores are sleeping.

This technique is exploited by processor designer ARM Holdings in its big.LITTLE™ processing design. (Yes, the word “big” is lower case, and “LITTLE” is all caps.) ARM claims this can reduce energy consumption by 70% or more for light workloads and by 50% for moderate workloads. In the case of big.LITTLE, the ARM Cortex-A15 is paired with the lower-power ARM Cortex-A7. [strange typos fixed 9/18]

Of course, the selection of whether to run just the low-power core as opposed to the high-performance cores is made by the operating system.

In some simpler systems, the user is able to select between higher performance or lower power, thus extending battery life. In this case, the system clock speed is set high or low accordingly. The user enters a choice via the device user interface, which is then interpreted by the operating system as a performance setting.
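On Linux, for instance, that choice typically shows up as the cpufreq “governor”. A minimal sketch, assuming the usual sysfs layout (paths vary by kernel and platform):

# Read the current cpufreq governor on a Linux system (assumes typical sysfs paths).
def read_governor(cpu=0):
    path = "/sys/devices/system/cpu/cpu{}/cpufreq/scaling_governor".format(cpu)
    with open(path) as f:
        return f.read().strip()

print(read_governor())   # e.g. "ondemand", "powersave", or "performance"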

Communications

Smartphones typically have at least three types of radio: Bluetooth, wi-fi, and a wireless telecom standard (3G, 4G, perhaps even more). Communications for an app are a trade-off between the necessary data rate and power consumption. When possible, the app developer should choose the communication mode that requires the least amount of power for the job. Often this means preferring wi-fi over the wireless telecom. (In fact, Bluetooth takes far less power than the others, but is not used as a multi-hop network protocol.)
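As a toy illustration of that trade-off, here is a sketch that picks the lowest-power link that still meets the data rate an app needs; the rate and power figures are made-up placeholders, not measurements of any real radio:

# Pick the lowest-power link that meets a required data rate (illustrative numbers only).
LINKS = [
    # (name, max rate in kbit/s, rough active power in mW) -- placeholder values
    ("bluetooth",   700,  10),
    ("wifi",      50000, 300),
    ("cellular",  10000, 800),
]

def pick_link(required_kbps):
    candidates = [link for link in LINKS if link[1] >= required_kbps]
    return min(candidates, key=lambda link: link[2])[0] if candidates else None

print(pick_link(500))    # bluetooth is enough
print(pick_link(5000))   # needs wi-fi, which is still cheaper than cellular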

DMA, buffers, interrupts

An operating system faces varying challenges in servicing I/O requests, particularly if it operates under real-time constraints. In modern computer architectures, block data can be transferred under DMA (direct memory access) control without interrupting the CPU. But once the DMA operation is finished, an interrupt has to be issued so that the data can be dealt with. This, of course, requires a context switch from a process to the kernel, and possibly on to another process.

Some operating systems are more efficient about switching between processes than others. Historically, UNIX has been better at this than Windows. However, when you introduce threads (a lighter-weight scheduling mechanism) into the picture, there is no clear advantage. Thus, Windows programs are often designed with many threads. In general, basic UNIX programs do not use multiple threads, but can easily be assembled as building blocks. On the other hand, UNIX database programs typically run as monolithic components, but make extensive use of threads. Java programs invariably run many threads.

Some I/O interfaces cannot run DMA and require more frequent OS attention.  Sometimes there is a buffer for 3 or 4 characters.  Before that overflows, the OS needs to copy the buffer content.  This can result in very high context switching overhead when certain apps are running.
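A rough back-of-the-envelope example (my numbers, not any particular device): a serial interface with a 4-character buffer at a common baud rate already forces thousands of interrupts per second.

# Interrupt rate for a small I/O buffer (illustrative numbers).
baud = 115200            # bits per second on the serial line
bits_per_char = 10       # 8 data bits plus start and stop bits
buffer_chars = 4         # characters the interface holds before it must be drained

chars_per_second = baud / bits_per_char              # 11520 characters per second
interrupts_per_second = chars_per_second / buffer_chars
print(interrupts_per_second)                         # 2880 interrupts (and context switches) per second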

Co-processors

The newest iPhone, the 5S, supplements the main A7 chip with an M7 chip to deal directly with data from the accelerometers, gyroscope, and compass. Details of the M7 chip have not yet been published. I suspect this vastly reduces the CPU load when motion-based apps are running. It certainly cuts out a lot of interrupts. What is not clear to me is whether or not the M7 also does low-power matrix computations. If a sequence of matrix operations is only done 50 or 100 times per second, a high-performance multiplier-accumulator may not be necessary, and a low-power version can be put in the co-processor, further relieving the CPU of certain real-time burdens.
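Some hedged arithmetic to back up that hunch (my assumptions: a 3x3 rotation applied to one 3-vector per sensor, three sensors, 100 updates per second):

# Rough multiply-accumulate budget for low-rate sensor fusion (assumed numbers).
macs_per_vector = 3 * 3        # 3x3 matrix times a 3-vector: 9 multiply-accumulates
sensors = 3                    # accelerometer, gyroscope, compass
updates_per_second = 100

macs_per_second = macs_per_vector * sensors * updates_per_second
print(macs_per_second)         # 2700 MACs/s, trivial enough for a slow, low-power multiplier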

Loop vs. ‘halt’

When there is no more computation to do, some operating systems put themselves into a tight idle loop, waiting for the next interrupt to come in. Others execute a ‘halt’ instruction and wait. If you measure the CPU temperature, the latter runs significantly cooler. The former would not be practical for a mobile device. Naturally, I consider aircraft and spacecraft to be mobile. So I dislike operating systems with tight idle loops.
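The same contrast shows up at the application level. A sketch (in Python, purely for illustration) of a busy-wait loop versus a blocking wait:

# Busy-waiting vs. blocking on an event (application-level analogue of loop vs. 'halt').
import threading

event = threading.Event()

def busy_wait():
    # Spins flat out, keeping a core busy (and hot) while doing nothing useful.
    while not event.is_set():
        pass

def blocking_wait():
    # The thread sleeps inside the kernel and is woken only when the event is set.
    event.wait()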

Memory management schemes

A key computing architecture feature that makes smartphones possible is demand paging, a memory management scheme invented in the 1960s. It makes multi-tasking and adaptability to new programs a fundamental reality. But the logic design behind a paging memory management unit (MMU) requires a LOT of gates, and thus consumes a lot of energy. Thus, for simple dedicated real-time systems, it may be best to avoid the need for a paging MMU.

Processors such as the ARM Cortex-M series use a segmentation scheme that loads registers with a segment base address and a segment length. The complexity and power costs of a demand-paging MMU are not there.

The PDP-11 used a hybrid between paging and segmentation. PDP-11 programs were limited to 64K bytes, but the processor had 8 segment registers to map 8K segments to the desired parts of memory. The processor could then switch application programs by switching the contents of the segment registers. As a result, many models of the PDP-11 were able to handle several users on a time-sharing system, the environment in which the UNIX operating system grew up.

Given the small address space and small set of segment registers, a PDP-11 would take considerably fewer gates than a VAX-11. If both were resurrected today, the PDP-11 would be more power efficient.
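Here is a rough sketch of that style of translation, simplified from the real PDP-11 memory management (which also tracks segment lengths and access modes):

# Segment-register address translation, loosely modeled on the PDP-11.
SEGMENT_BITS = 13                      # 8K-byte segments
SEGMENT_MASK = (1 << SEGMENT_BITS) - 1

def translate(virtual_addr, segment_bases):
    """Map a 16-bit virtual address through one of 8 segment base registers."""
    segment = virtual_addr >> SEGMENT_BITS      # top 3 bits pick the register
    offset = virtual_addr & SEGMENT_MASK        # low 13 bits are the offset
    return segment_bases[segment] + offset

# Switching programs only means reloading these eight registers.
program_a = [0x00000, 0x02000, 0x04000, 0x06000, 0x08000, 0x0A000, 0x0C000, 0x0E000]
print(hex(translate(0x2010, program_a)))        # segment 1, offset 0x10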

Virtual machines

It is worth noting that virtualization, the current practice of replacing several physical machines with virtual machines, has had an immense impact on the energy footprint of data centers. However, a context switch between virtual machines is even more heavyweight than switching processes on a single machine.

The virtualization kernel and hypervisor(s) are presumed to be reliable components that separate unreliable operating systems from each other. Thus, the failure of a single process in one operating system doesn’t affect key application processes on another virtual machine, simply because those other processes are not there.

In fact, the physical machine running the multiple virtual machines is probably consuming more energy than it would if all the processes were combined under one reliable operating system on the same hardware. But managers of data centers are constrained to the commercial choices available to them, which favor certain operating systems. Thus, the minimum energy consumption that can be achieved is not as low as it could be with a single operating system.

To be sure, there are good uses for virtualization.  When different software packages require different versions of the same operating system, virtualization provides a way to host all the packages on the same hardware. Virtualization has evolved to where an application can be migrated off one physical machine and onto another one, letting the former be brought down for maintenance without interrupting program operation.  These are just a couple of examples of legitimate uses.

The list goes on…

There are certainly other CPU design strategies that can considerably affect energy consumption.

I could start in on speculative execution and other CPU accelerators; these gain performance at the expense of additional power. A stark contrast to these is the SPARC chip-multithreading (CMT) architecture. The fundamental concept gets rid of the accelerators and replaces them with a massive set of simpler cores and threads, resulting in much lower power per CPU thread.

At this point, we’re clearly no longer in the realm that software or system designers can affect; little or nothing can be done through software.

So that’s my view of computation energy consumption from a software perspective, but also peering into the system hardware.