Note: This is an experimental planet, if you have a feed you would like to add or you would like your feed to be removed please contact me.

Planet 9

February 05, 2010

Quod Erat Dragon

More reading material

PL hackery:

http://www.cs.rutgers.edu/~ccshan/prepose/prepose.pdf -- Kiselyov, Oleg and Shan, Chung-chieh. Functional Pearl: Implicit Configurations —or, Type Classes Reflect the Values of Types.

http://www.cs.rutgers.edu/~ccshan/tagless/jfp.pdf -- Carette, Jacques and Kiselyov, Oleg and Shan, Chung-chieh. Finally Tagless, Partially Evaluated, Tagless Staged Interpreters for Simpler Typed Languages.

PL Theory:

http://www.cis.upenn.edu/~bcpierce/papers/lti-toplas.pdf -- Pierce, Benjamin C. and Turner, David N. Local Type Inference.

http://www.cis.upenn.edu/~bcpierce/FOOL/papers9/shields.pdf -- Shields, Mark and Jones, Simon Peyton. First-Class Modules for Haskell

Formally verified compilers:

http://www.cl.cam.ac.uk/~mom22/jit/jit.pdf -- Myreen, Magnus O. Verified Just-In-Time Compiler on x86.

Logic programming:

http://www.cs.sunysb.edu/~tswift/webpapers/tplp-submit.pdf -- Swift, Terrance and Warren, David S. XSB: Extending Prolog with Tabled Logic Programming

February 05, 2010 11:59 PM

February 04, 2010

newsham

McCain flip-flop

The daily show catches a McCain flip-flop. Four years ago:
The day that the leadership of the military comes to me and says "senator, we ought to change the policy" then I think we ought to consider seriously changing it.
But today when the leadership of the military asks to repeal don't ask/don't tell:
I'm deeply disappointed in your statement [...] In this moment of immense hardship for the armed services we should not be looking to overturn don't ask/don't tell.
There's also a good op-ed by John Oliver on why we should have a don't ask/don't tell policy on people approaching old age.

The Daily Show With Jon StewartMon - Thurs 11p / 10c
A Few Gay Men & Women
www.thedailyshow.com
<embed allowfullscreen="true" allownetworking="all" allowscriptaccess="always" bgcolor="#000000" flashvars="autoPlay=false" height="301" src="http://media.mtvnservices.com/mgid:cms:item:comedycentral.com:263464" style="display: block;" type="application/x-shockwave-flash" width="360" wmode="window"></embed>
Daily Show
Full Episodes
Political HumorHealth Care Crisis

by newsham (noreply@blogger.com) at February 04, 2010 06:08 PM

Western Union Man, and other great 419 heros

If you hate 419 scams and you love pranks, you'll probably be a big fan of 419eaters. They scam the scammers, tying up their time and ridiculing them. Here's some brilliance from their forums, it's Western Union Man!

That's a picture of an actual scammer that the scam baiter got him to take of himself.

If you're looking for something elaborate, check out http://www.419eater.com/html/SkeletonCoast/safari2.html.

by newsham (noreply@blogger.com) at February 04, 2010 05:48 PM

research!rsc

Names

Every programmer has a variable naming philosophy. This is mine:

A name's length should not exceed its information content. For a local variable, the name i conveys as much information as index or idx and is quicker to read. Similarly, i and j are a better pair of names for index variables than i1 and i2 (or, worse, index1 and index2), because they are easier to tell apart when skimming the program. Global names must convey relatively more information, because they appear in a larger variety of contexts. Even so, a short, precise name can say more than a long-winded one: compare acquire and take_ownership. Make every name tell.

The information content metric gives a quantitative argument against long-winded names: they're simply inefficient. I internalized this metric years ago but only realized this phrasing of it recently, perhaps because I have been looking at too much Java code.

by rsc (noreply@blogger.com) at February 04, 2010 05:00 PM

February 03, 2010

OS Inferno

Inferno на BeagleBoard

Собственно вот:

Подробности тут: http://inferno-hell.blogspot.com/2010/02/beagleboard-dviinfernogui.html (если, конечно, кто-нибудь сможет понять).

by j1m (noreply@blogger.com) at February 03, 2010 06:42 PM

newsham

A dare

I dare you to watch this Frontline episode all the way through without IRCing, texting, blogging, checking your email, watching a youtube video or calling anyone up. I wasn't able to.

by newsham (noreply@blogger.com) at February 03, 2010 03:34 AM

February 02, 2010

newsham

Firefox, Chinese root cert, DNS and trust

This article alerting us that Firefox trusts a Chinese root certificate has some people worried. There's probably good reason for this. But, the truth is that there are an outrageous number of untrustworthy root certificates in root caches of just about any browser. There is also a legitimate need for China to issue certificates. The more important issue here is that we all put our full trust in every one of these root certificates. Why? This violates the principle of least privilege. We would all be better off if SSL certificate authorities were bound to the domain system and limited in scope to a particular subdomain. For example, some certificates authorities should be able to issue certificates for dot-uk addresses, but a Chinese CA definitely shouldn't. But don't hold your breath waiting for this to happen. The certificate authorities make a metric buttload selling certificates for each and every  domain name you want to have an SSL certificate for. If CAs were tied closely to domains it would be possible to buy a single CA cert for your entire domain and you would be able to generate as many site certificates as you wanted to. End of revenue stream.

by newsham (noreply@blogger.com) at February 02, 2010 07:54 PM

research!rsc

Go's Package Name Space

Go organizes programs into individual pieces called packages. A package gets to pick a short name for itself, like vector, even if the import statement must use a longer path like "container/vector" to name the file where the compiled code is installed. The early Go compilers used the package name as a unique identifier during linking, so that vector's New function could be distinguished from list's New. In the final binary, one was vector.New and the other list.New. As we started to fill out the standard library, it became clear that we needed to do something about managing the package name space: if multiple packages tried to be package vector, their symbols would collide in the linker. For a while we considered segmenting the name space, reserving lower-case names for standard packages and upper-case names for local packages. (Since package names and object file names are conventionally the same, one reason not to do this is that it would require a case-sensitive file system.)

Other languages simply use longer names. Both Java and Python tie the name to the directory in which the package is found, as in com.java.google.WebServer for the code in com/java/google/WebServer.class. In practice this leads to unnecessarily long identifiers, something Go tries to avoid. It also ties the name to a particular mechanism for finding code: a file system. One of the reasons that import paths are string constants in Go is so that it is easy to substitute other notations, like URLs.

Last spring, during a long discussion about how to divide up the package name space, Robert Griesemer cut the Gordian knot by suggesting that we allow multiple packages to choose a single name and fix the tool chain to cope. The import statement already allows introducing a local alias for the package during the import, so there's no linguistic reason package names have to be unique. We all agreed that this was the right approach, but we weren't sure how to implement it. Other considerations, like the open source release, took priority during most of 2009, but we recently returned to the problem.

Ultimately, the linker needs some unique name for each symbol in the program; the fundamental problem caused by deciding that package names won't be unique is to find another source of uniqueness that fits into the tool chain well.

The best approach* seems to be to use the package's import path as the unique identifier, since it must uniquely identify the package in the import statement already. Then container/vector's New is container/vector.New. But! When you're compiling a package, how does the compiler know what the package's import path will be? The package statement just says vector, and while every compilation that imports "container/vector" knows the import path, the compilation of vector itself does not, because compilation is handled separately from installing the binary in its final, importable location.

Last week I changed the gc compiler suite to do this. My solution to the import path question was to introduce a special name syntax that refers to “this package's import path.” Because the import paths are string literals in the Go compiler metadata, I chose the empty string—""—as the self-reference name. Thus, in the object file for package vector, the local symbol New is written "".New. When the linker reads the object file, it knows what import path it used to find the file. It substitutes that path for the "", producing, in this case, the unique name container/vector.New.

Not embedding a package's final installed location in its object file makes the object files easy to move and duplicate. For example, consider this trivial package:

package seq
var n int
func Next() int {
    n++
    return n
}

It's valid for a Go program to import the same path multiple times using different local names, but all the names end up referring to the same package:

package main

import (
    "fmt"
    s "seq" // changed to "seq1" later
    t "seq"
)

func main() {
    fmt.Println(s.Next(), s.Next(), t.Next(), t.Next())
}

prints 1 2 3 4, because it all four calls are to the same Next function:

$ 6g seq.go
$ 6g -I. main.go
$ 6l -L. main.6
$ 6.out
1 2 3 4
$ 

But if we change one of the imports to say "seq1" and then merely copy the "seq" binary to "seq1", we've created a distinct package, using lowly cp instead of a compiler:

$ cp seq.6 seq1.6
$ ed main.go
120
/seq
 s "seq"
s/seq/seq1
 s "seq1"
wq
121
$ 6g -I. main.go
$ 6l -L. main.6
$ 6.out
1 2 1 2
$ 

Now the s.Next calls refer to seq1.6's Next, while the t.Next calls refer to seq.6's Next. Duplicating the object actually duplicated the code. This is very different from the behavior of a traditional C compiler and linker.

A digression: the explicit "". prefix is not strictly necessary. It would be cleaner if the linker treated every symbol as needing to be qualified by the import path, so that all the "". could be dropped. But occasionally it's important to be able to break the rules, for example to define a symbol that is logically in one package be implemented in another. For example, the implementation of unsafe.Reflect is actually in the binary for package runtime, because that's where all the interface manipulation code lives:

$ 6nm pkg/darwin_amd64/runtime.a|grep Reflect
iface.6: T unsafe.Reflect
$

Another reason to use an explicit prefix is to admit names with no prefix at all, as would be generated by legacy C code. Otherwise, what should C's printf be in? If the linker enforced a strict boundary between packages, both of these examples would be impossible. Most of the time that would be a good thing, but systems languages do not have the luxury of stopping at “most of the time.” Last October, a few weeks before the public release of Go, I changed the linker to insert import path qualifiers on all names during linking, but it was too disruptive a change to commit before the release. Last week's implementation, which allows for semipermeable package boundaries, is a much better fit for Go.

This week Ian Lance Taylor is working on eliminating the global package name space assumption in gccgo. He'd like to avoid making changes to the linker, which rules out introducing a “this package” notation like "". Gccgo must be able to write objects that know their own import paths, which means gccgo must know the import path at compile time. But how? There will be a new gccgo command line option, and the build system will simply tell the compiler what the import path is.

In retrospect, I wonder if the effort of "" in the gc tool chain was justified compared to adding an option. The gc implementation is easier to use, but it's not clear how important that will be. Time will tell.


* An alternative approach would be to generate a random identifier each time the compiler is invoked and to use it for the package compiled by that run. When other packages import the compiled package, they can read the identifier and use it to generate references to that package's symbols. The most glaring problem with this approach is that the symbol names you'd see while debugging would be ugly, like mangled C++ names but worse. Another problem is that it would break aggressive incremental compilation: if fmt gets recompiled, all packages that import it would have to be recompiled to pick up the new identifier, even if the external interface hadn't changed. It would be nice to avoid those recompilations, especially in large programs.

by rsc (noreply@blogger.com) at February 02, 2010 05:00 PM

January 28, 2010

Eric Van Hensbergen

9P Overview Slides

<object height="355" style="margin:0px" width="425"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=9pvirtfsoverview-12646254344022-phpapp01&amp;stripped_title=9p-overview"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=9pvirtfsoverview-12646254344022-phpapp01&amp;stripped_title=9p-overview" type="application/x-shockwave-flash" width="425"></embed></object>


MP4 Video Link or Livestream below:

<object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" height="340" id="preview-player1" width="340"><param name="movie" value="http://cdn.livestream.com/grid/LSPlayer.swf?channel=graverobbers&amp;clip=pla_5da534c4-5d1f-460e-adda-836aee4b514e&amp;autoPlay=false"><param name="allowScriptAccess" value="always"><param name="allowFullScreen" value="true"><embed allowfullscreen="true" allowscriptaccess="always" height="340" id="preview-player" src="http://cdn.livestream.com/grid/LSPlayer.swf?channel=graverobbers&amp;clip=pla_5da534c4-5d1f-460e-adda-836aee4b514e&amp;autoPlay=false" type="application/x-shockwave-flash" width="340"></embed></object>
Watch live streaming video from graverobbers at livestream.com

by Eric Van Hensbergen (noreply@blogger.com) at January 28, 2010 03:27 PM

9gridchan

Saturday, 3 January 2010 - coming soon, updated Qemu image with many new tools

Saturday, 3 January 2010 - coming soon, updated Qemu image with many new tools

Now that recent projects like writable /proc/pid/ns and rootless post kernel load startup have been announced on 9fans, its time to make a new version of the Qemu preinstalled image with all the new stuff. It will include the latest updates of the previous grid software as well as hubfs, grio, and the new startup process - all available optionally, along with the traditional default kernels and configs. However, I am going to try to make the customized environment as compelling as possible, because I'd like to make a good case for the idea that using multiple system nodes, virtualized or not, has practical benefits. Some goals:

  • multi-machine aware startup to bind in fileserver and cpu resources early if available

  • easier to make use of 9gridchan.org hosted resources - and 9gridchan registry needs more real services

  • easy to set up highly unconventional namespace structures coexisting on a single machine (rootless boot helps)

  • simple, scripted data replication and machine backup operations

  • easy bootstrapping from grid of VMs to physical machines or mixed grid

Release goal is by the end of January, with a rough spin available for interested testers a week before or so.

by www-data at January 28, 2010 04:50 AM

Wednesday 21 October 2009 - Overdue Hubfs announcement - improved "screen" utility

Wednesday 21 October 2009 - Overdue Hubfs announcement - improved "screen" utility

Since this has been up on sources for several months now, I really should have announced this here already. Hubfs is an improved screen-like tool for Plan 9, implemented as a 9p fs with client. It is much more resource efficient than iosrv and, as a 9p fs, much better integrated into the overall system. The user interface is also much more sane. It can be installed from the source tarball contrib/mycroftiv/hubfs.tgz, or with contrib/install mycroftiv/hubfs. Basic usage is just 'hub NAME' to either start a new shell/hub session or connect to an existing. The manpage explains most of the details. Following completion of a solid version of hubfs I hit the burnout threshold as a result of several months of 16 hours a day of Plan 9 and retreated to a remote corner of namespace for a couple months. With the arrival of the winter months, however, I have a good excuse to turn on all my computers in the service of home heating, and undoubtedly this will result in a renewed burst of development energy.

by www-data at January 28, 2010 04:50 AM

Wednesday 8 July 2009 - Announcing Iosrv - "screen without a screen"

Wednesday 8 July 2009 - Announcing Iosrv - "screen without a screen"

After several weeks of isolation in their underground lab, the /9/grid's craziest researchers are now unleashing upon the world our latest weapon in the struggle for an open collaborative decentralized worldwide grid:

Iosrv.tgz

Available on Bell Labs sources - contrib/install mycroftiv/iosrv or as a tarball contrib/mycroftiv/iosrv.tgz to mk install at your pleasure, or here - this is what happened when we decided to get serious about screen-like functionality for Plan 9 while also trying to pursue some tantalizing hints about some of Doug McIlroy's original ideas that led to the creation of UNIX pipes. Iosrv (which is controlled via a wrapper scrip called io for both the clients and server) does not emulate a terminal in the manner GNU screen does - instead, it follows the Plan 9 principle of staying close to file i/o fundamentals and provides multiplexed buffered pipes that can be attached to the three standard file descriptors used by textual environment programs such as the rc shell.

As a consequence of this, we have some new tricks, and here's the best one:

The clients attached to an iosrv can all share locally executing shells with each other in addition to using the shells on the iosrv host

The standard model of usage that corresponds to how GNU screen works is for clients to attach to an existing iosrv in an imported /srv (started with a command like io rcjul15 on the iosrv remote host ) using the io wrapper script:

io SRVNAME

This command connects to an existing /srv/srvname. Once connected, additional commands will be trapped by the client. To create and attach to new shells hosted by the iosrv remote machine with the command remote # where '#' is a multiple of 3. The first new shell might be created with:

remote 3

followed by subsequent shells created by replacing '3' with successively larger multiples of 3 as the number of backed shells increases. (Each shell uses 3 file descriptors) To move between shells after they have been created, attach # will connect to previously created rc shell Hubs.

attach 0

will move back to the set of file descriptors beginning at Hub 0. (More on Hubs in a moment). The new trick is to start a new shell like this:

local 6

which will start a new rc shell on the CLIENT machine, but connect its i/o file descriptors to the remote iosrv HOST. The consequence is that the locally executing rc becomes shared back to the iosrv, and other clients of the remote host can attach to it using

attach 6

and the shell running on your machine is available to them exactly as a shell running on the iosrv host. In other words, iosrv really does 'serve i/o' and doesn't care what's considered the client or server from the viewpoint of the user, it just pipes data back and forth to whatever its pipe Hubs are connected to.

Ok, there's that word again - Hub - the Hub is the low level data structure iosrv uses to organize connections. Following Pike's classic dictum that "data dominates", we are offering a new variant of an old abstraction, the pipe. A Hub is a pipe, but it has multiple nozzles on each end. It accepts multiple simultaneous readers and writers and allows them to connect and disconnect independently to and from the flow of data in real time. The core iosrv knows nothing of rc, or shells, it simply manages a set of file descriptors connected to Hubs. A standard shell has 3 open file descriptors, so 3 Hubs allow an arbitrary number of clients to all make use of the same shell. Adding a new shell simply means creating 3 new Hubs and then starting an rc using file descriptors managed by that iosrv. This why standard attach, remote, and local commands always take a multiple of 3 as their numeric parameter.

On the client side, a smaller program called ioshell navigates the forest of piped file descriptors for the user, acting to connect the file descriptors of the initiating client to the Hubs provided by iosrv. It also monitors user input for command strings and then takes actions or passes them to iosrvs ctl file as appropriate.

The iosrv is completely parallel, forking off independent processes for reading and writing each connected file descriptor. It is not uncommon for a busy iosrv to be coordinating 50-100 running processes and holding double that many file descriptors. This will not however clog the host system, because all processes use QLocks and/or blocking reads to control execution, yield execution frequently, and each Hub has individually tunable process cycle sleeptime. Testing shows that even a low resource Qemu based virtual machine can easily handle multiple simultaneous clients and servers to multiple local and remote rc shells.

The iosrv also provides fine-grained control over resources for the user, although the interface to this is a work in progress. Using iosrv for screen-like functionality is its main intended purpose, but the design is deliberately open-ended. It is already possible to use iosrv as a kind of greatly expanded 'tee' for data streams and we will be trying to prototype some multinode applications that pipe data around in constant loops between them.

We have been using iosrv heavily and testing it extensively. It is not perfectly polished but we have found it to be stable, reliable, and functional in its current state. Suggestions for features and code improvements always welcome.

Look for shared public shell exports as services in the 9gridchan.org 9p service registry soon, also.

by www-data at January 28, 2010 04:50 AM

Thursday 18 June 2009 - screen functionality in a few lines of rc

Thursday 18 June 2009 - screen functionality in a few lines of rc

This is an extension of the earlier post - this is a pair of scripts intended to be used on nodes setup similarly to those described in 'reversed remote execution' - it allows shared terminals to have persistent state in the background with the ability to be pushed onto the remote clients (which are providing rio as a service).

#! /bin/rc
# start up an rc with its i/o connected to pipes set
# this monitors and its window should be set to scroll if not backgrounded

mkdir /tmp/wpin
mkdir /tmp/wpinclone
mkdir /tmp/wpout1
mkdir /tmp/wpout2
mkdir /tmp/wpout3
bind '#|' /tmp/wpin
bind '#|' /tmp/wpinclone
bind '#|' /tmp/wpout1
bind '#|' /tmp/wpout2
bind '#|' /tmp/wpout3
window -m
rc <{tee /tmp/wpinclone/data < /tmp/wpin/data1} | tee /tmp/wpout1/data /tmp/wpout2/data /tmp/wpout3/data


#! /bin/rc
# attach to a copy of grc using wpout $1 and push to a second wsys also

mount $wsys /mnt/wsys new
bind -b /mnt/wsys /dev
read -m /tmp/wpout$1/data1 | tee /dev/cons &
read -m /dev/cons | tee /tmp/wpin/data &
cat >> /tmp/wpin/data

The usage model is probably not exactly transparent from these scripts - the first script is named grc and it creates a set of pipes and starts an rc with its input and output redirected to them. This is what creates the 'screen' like functionality - any number of processes can control the shell by writing to /tmp/wpin/data, and as many clients can read the output as there are teed output pipes. Window -m is started after binds to provide a seed window - you will probably want to start more than one additional window -m to preserve access to the pipes in the namespace.

The second script I name getgrc and it assumes that you have set $wsys to a traget window system, then starts a new window on it and connects the i/o of both the remote and the local window to the specified output pipe. It should be noted that the quasi-screen functionality is independent of the idea of mounting a remote window system and using i/o from both/either location. If you want a simple screen equivalent, simply making the pipes and starting the rc with i/o redirections allows multiple clients to use it by connecting to the piped files.

by www-data at January 28, 2010 04:50 AM

Tuesday 16 June 2009 - Reversed remote execution connections and shared sub rios

Tuesday 16 June 2009 - Reversed remote execution connections and shared sub rios

An experience I continually have with Plan 9 is discovering that everything I have ever wanted to do with computers but have never been able to effectively before can be done with about 3 lines of rc and the standard Plan 9 toolkit. Right now I am gleefully exploring what I find to be an incredibly rich and useful set of possibilities created by rio-as-fileserver. First, let me describe the setup I'm using so you can follow along - you want two plan 9 systems for this, with at least one setup as a CPU server. A pair of Qemu VMs will work fine. Let's just assume you have a pair of CPU servers running, machines A and B, each with a graphical display, rio running, and with a standard exportfs listener on port 17007.

You are working using cpu server A, and your friend is using cpu server B. For convenience, we will describe this as if you have full privileges on each other's machines, but these tricks can also work fine between parties with more limited trust with appropriate modifications to how the filesystem exports are set up. Your goal is to use Plan 9 for some lightweight interactive resource sharing for real time communication and collaboration. Each of you opens a subrio within your main rio session. To start off, you need access to some of machine B's resources. You are working using cpu server A, and you start with:

import -c cpuB /srv /n/cpuBsrv

Now you want to write a message to your friend that will appear on his screen in a new window briefly.

wsys = /n/cpuBsrv/rio.friend.68962

window -m rc -c 'echo hi fred && sleep 5'

And up on Fred's screen on machine B pops a window saying 'hi fred' that sticks around for 5 seconds, then vanishes. How and why? Because rio is a fileserver, and the window command mounts whatever rio is specified in the $wsys. The window command is actually an rc script that can show us the principles involved for a lot of nice tricks. How about just popping up a window to talk back and forth:

mount /n/cpuBsrv/rio.friend.68962 /mnt/wsys new

bind -b /mnt/wsys /dev

cat /dev/cons &

cat >/dev/cons

And you have created a window on your friends screen, with each of you able to see whatever the other types. This is a perfectly functional and usable way to communicate, and the ability to initiate the connection by making the window appear on demand in the 'shared' rio is an improvement over other simple equivalents like telnet or netcat based direct chat.

Another very simple and useful application is to allow your friend to run an interactive rc on your machine, with the display on his system, while you watch. In other words, he is using your machine in a manner similar to a telnet/ssh login, even though you 'pushed' the window onto his machine. The first two setup lines are the same as before:

mount /n/cpuBsrv/rio.friend.68962 /mnt/wsys new

bind -b /mnt/wsys /dev

exec rc -v < /dev/cons |tee /dev/cons

Now you can watch in real time as your friend uses your system. You can also easily invert this - and let your friend watch what you are doing. Just change the final line to:

exec rc <{tee /dev/cons} |tee /dev/cons

This changes the source of the controlling input to the shell from the remote /dev/cons to the local standard input. I believe both of these simple interaction models provide useful functionality in very direct ways. The real-time mirroring of the textual input and output in a shell window is provided without any of the usual overhead of a tool like VNC viewing and the use of a sub rio session as a container for the shared windows makes sandboxing via namespace manipulation very easy - and it should be noted that sharing a rio service is very different than providing full cpu server functionality, as only the functionality of rio is being exposed to the remote client. These simple filesystem and i/o based operations provided fine grained real time connectivity options with lower overhead and potentially greater security than the predominating methods.

The above commands are very simple and can certainly be improved upon - in particular the standard error is being abused and the prompt and errors will be suppressed for the remote user. I'm certain a lot of Plan 9 users out there have nifty improved versions of these tricks or similar. Using the fileserver properties of rio to create a sandboxed dynamic shared environment is a good fit for the goals of the /9/grid so we'll be posting an example of a scripted 'shared workspace' setup script and commands soon - its a pretty clear upgrade from the current 'grid graffiti wall' to a real-time 'grid graffiti rio!'

by www-data at January 28, 2010 04:50 AM

Sunday 7 June 2009 - Planning for demo grid setup

Sunday 7 June 2009 - Planning for demo grid setup

We are trying to determine the best way to have a larger set of public services - more nodes, and a demonstration of ideas like the High Availability Grid, an attempt to make a redundant and failproof environment that still maximizes the utility of the attached resources and can be flexibly reconfigured on the fly. Most of the following are things that we have already protyped and tested, but haven't been available through our public portals. Things we'd like to implement (perhaps not all at once):

  • A high availability configuration of resources using about 6-8 functional nodes divided between 2-3 physical machines.

  • On-demand temporary single VMs

  • "The proving grounds" - a combination of both of the above, but with the intention of stability and stress testing for various multimachine configurations.The only way to test 'failproofing' is to force failures and test the ability of the systems to recover.

We've been doing some testing of VMware server in addition to our normal Qemu/9vx/p9p based setups. We prefer free open source personally, but some people prefer to VMware to qemu so we may begin providing vmware .vmdk images as well as Qemu qcow2s. For anyone who wants to try using vmware now, it is possible to use qemu-img convert to make a vmdk from a .qcow2, but there were a few additional adjustments we needed to make for best results, notably recompiling the included pccpuf kernel.

by www-data at January 28, 2010 04:50 AM

Wednesday 3 June 2009 - Documentation added for new tools

Wednesday 3 June 2009 - Documentation added for new tools

Well, some documentation for some of the new tools at least, but its a start. Writing documentation is hard. The organization of knowledge is the tricky thing. The balance of conceptual explanation and task-oriented how-to is hard to get right. The docs in the docs directory still need to be supplemented with some specific task-oriented recipes for doing things like making a fully customized local grid, which requires a bit of coordination of action at the host os level and the VM interiors. As a warmup, I'll try to summarize the process here:

The base system image is designed to serve as the seed for a multisystem grid. It includes a variety of /cfg directories to use as base roles for new systems, and the confighelper script is designed to help do customization quickly. If you want to create a personalized local grid of machines from the base image, here is how to do it:

  • Test the base image in its default configuration to verify functionality before customizing.

  • login as bootes in cpu server mode and start up the confighelper

  • choose to (r)eset machine key and passwords, and answer the prompts. Your chosen authdom is particularly essential - it can be your domain if you have one, or just an arbitrary name for 'your grid turf'.

  • reboot the vm to cpu server mode and answer the prompts to create the new machine key. You only get once chance to set bootes' password here, be careful. Then run auth/changeuser bootes and enter the same password you just did, and then run auth/changeuser for your other users and set their new passwords. Run auth/changeuser -n for each user also to update the netkeys.

  • at this point your new auth setup should be complete and you should be able to drawterm/cpu in, 9fs if the port 564 listener is running, use netkey auth, etc. Please test whatever functions are important to you.

  • this is your new seed image. If you use it as a base for cloning your additional grid systems, they will all include copies of the auth information and keys and users.

  • now you can manufacture as many customized systems as you want using the confighelper tools. to make a new customized system, first create a new blank qemu qcow2 for it. qemu-img create -f qcow2 newsysname.qcow2.img 1G creates a qcow2 blank disc that can expand up to 1gb of storage. A venti backed fossil is very small so you probably wont use much of that space.

  • now boot qemu with the blank disk attached as -hdb. in windows for instance, you might do qemu -L . -hda g9.qcow2.img -hdb newcpu.qcow2.img -redir tcp:567::567 -redir tcp:17010::17010 -m 256, then drawterm in as bootes. Run the confighelper script and choose to (a)dd new system to grid. It will prompt you for a system name and other info and a config from /cfg to clone - it will then transform the disk on drive 2 into a customized clone of your seed system on drive 1.

  • continue this process to make as many customized VMs as you want. A very common customization to apply would be changing the venti server IP so you could boot additional nodes on other machines using your existing venti server.

  • these cloned VMs are bootable in exactly the same way as the parent images, and can make 'children' of their own.

  • Making a new blank qcow and booting and running the configscript takes about 1 minute once you are familiar with the process, so you can make two VMs for every computer on your LAN more or less instantaneously. Note that you might want to 'preload' your /lib/ndb/local with the new machines using confighelper's nbd updater prior to starting the cloning process, so all children have all their peers in their ndb.

by www-data at January 28, 2010 04:50 AM

Monday 1 June 2009 - New Release Cycle begins!

Monday 1 June 2009 - New Release Cycle begins!

Well, we've got a new version of the qemu-based grid tools up - it provides a lot of new functionality, flexibility, and hopefully ease-of-use. We haven't really started documenting everything that's going on with it, and we are going to be rolling out new services related to its design during the upcoming weeks. The image and tools itself should start getting more frequent updates and we are hoping to leverage Venti and other Plan 9 based tools to do so.

If you already have access to a linux machine, the new base version of the image provides what we hope is pretty neat out-of-the box functionality - you extract the .tgz, cd into your new gridtools directory, and run (as root) a simple iptables script to redirect some low ports to high ports for the qemu VM to access without root privileges. The gridlord script (run as normal user inside the extraced directory, no install required) then provides a simple menu-based interface to control a full distributed plan9 system all on your local machine. Fire up a VM and let it boot in cpu server mode, then drawterm in as bootes, glenda, or gridna with password: gridpass. The VM image itself is meta-pre-configured - check out the /cfg directory, which holds relatively self-descriptive configurations for multiple machine roles. /usr/bootes/bin/rc contains confighelper -- a wrapper for an set of configuration helper scripts designed to let you quickly and easily control some administrative variables. Please note that right now this is an ad-hoc set of recipes based on the base configuration we provide. We hope to develop it into a more general purpose Plan 9 configuration utility, but it is currently not recommended for use with anything other than the 9gridchan.org preconfigured qemu image.

There are several other customizations and extras included in the image - and source code to everything, of course. /usr/grid is a 'non-login' user whose directories contain some of the grid tools. /usr/grid/src and /usr/bootes/lib have most of the additonal materials, along with /usr/grid/bin/rc and /usr/grid/bin/386 - these directories are bound in during most user's profiles, except glenda. We left Glenda basically untouched from the Bell Labs setup in this image. Thanks to everyone who helped test and offered suggestions for this version of the tools, and we look forward to improving it further according to user suggestions. As of the time of this posting, the surrounding documentation and web resources are still mostly un-updated so things like the walkthrough screenshots gallery still apply to the old version of the image.

by www-data at January 28, 2010 04:50 AM

Wednesday 13 May 2009 - Self-assembling high availability grids

Wednesday 13 May 2009 - Self-assembling high availability grids

Currently under testing and development: the amazing "infinite rabbits from a single venti-hat" High Availability Grid. The past few weeks we've been exploring ideas triggered by the possibility of storing entire sets of machines in both virtual machine and fossil vacroot form in a set of arenas and isect data for a venti server. Before your eyes glaze over from the seeming over-complexity, lets cut to the chase, the goal for the end user.

The end product is a relatively small file distributed as a .tgz that contains a complete set of files for a venti server and a large collection of .vac files. The venti data includes the rootscores of many different machine fossils, and also entire preinstalled qemu virtual machine images that correspond to those rootscores. Using either a native or plan9port venti, the user can choose from a large number of differentlty configured machines to instantiate on either native hardware or as a virtual machine. The venti has data corresponding to machines ranging from an untouched default install to a large grid of dozens of machines. Due to the nature of venti, storing an arbitrarily large number of differently configured virtual machines and fossils has almost no overhead.

The system images included are configured to act as an integrated plan 9 network with a few tricks designed to provide a high availability 'cloud' environment out of the resources available. This is the system employed on the 9gridchan version of the grid which will be opened for public use soon. Our current setup uses one master and one backup venti, three fossil file servers, three cpu servers, and an arbitrary number of terminals. The creation of a single unified environment for the user along with failsafe reliability is accomplished via simple scripted filesystem imports and binds.

Any of the 3 fossils can boot backed by either of the two ventis, which use venti/rdarena and venti/wrarena to make scores available on both. Each CPU server boots with a TCP root from a different fossil. However, all the fossils are basically identical, and when the user logs into any cpu server, 9fs imports of the other fossils are imported, and the primary fossil's $home is bound over the boot $home by the cpu. A simple backgrounded script then mirrors the data from the active $user to the backup fossils.

The user interaction model is to dial each of the three cpu servers from a different window on their terminal. Each window will behave identically and binds $home from the same primary fossil, but is using a different CPU with a different tcp root and venti backup. Once the user has dialed in their three windows, they can work freely in any of them and see the same file data. If that fossil fails, the backup fossils, no more than 5 minutes out of date, can be bound to $home. A standard usage model is to run rio on one cpu, acme on another, and rc shell only on the third. The user experience should be identical to working with a single all-in-one machine with the difference of enhanced reliability and the increased performance of truly independent multiple CPUs.

A public release of a first version of these tools along with some updates to the g/scripts and preconfigured qemu images should be arriving soon. Feel free to stop in #plan9chan on irc.freenode.net if you want to explore this system in its development state. Standard public resources such as omni are unaffected by the current development environment but will become integrated at the time of an initial public distribution.

by www-data at January 28, 2010 04:50 AM

Apr 12 2009 - Just checking in

Apr 12 2009 - Just checking in

No big news on the grid past few weeks - had a service interruption for a few days, but it was resolved. Shockingly, the brief absence of 9gridchan services did not cause any significant rioting or result in the overthrow of any governments. We are greatly relieved.

by www-data at January 28, 2010 04:50 AM

Feb 19 2009 - Random update

Feb 19 2009 - Random update

Random blog update to make sure people know the project is healthy and ongoing. An actual public announcement of some kind on a known Plan 9 forum such as 9fans has been something we've been waiting a long time to do.

by www-data at January 28, 2010 04:50 AM

Feb 02 2009 - Explanatory captions for the walkthroughs.

Feb 02 2009 - Explanatory captions for the walkthroughs.

Thanks to the ceaseless flow of innovation and progress in free and open source software, the walkthrough screenshots tutorial galleries now feature captions and better navigation! Additionally, we are working to provide new services and content on the dynamic 9p registry system. Nothing too thrilling, but as an example, the omniguest user is now exporting /bin. After connecting to that service, binding the imported directory over the local /bin in an rc window allows testing of any of the installed software on the public cpu server without actually logging in.

by www-data at January 28, 2010 04:50 AM

Jan 31 2009 - Setup screenshots gallery

Jan 31 2009 - Setup screenshots gallery

Created a couple galleries with screenshots of starting up the image, connecting with drawterm, and running configscript. Hopefully having a visual reference will make it easier for people trying plan 9 for the first time to see if they are seeing what they are supposed to see. More screenshot galleries will be added to demonstrate making use of the g/script toolkit and 9gridchan.org resources.

by www-data at January 28, 2010 04:50 AM

Jan 30 2009 - post #2: more on 1.1

Jan 30 2009 - post #2: more on 1.1

The preinstalled qemu node image has several purposes. Most importantly is to provide an easy-to-use form of Plan 9 for every platform. Qemu runs on everything and a preinstalled image means that once the download is complete, the user can be inside a fully functional Plan 9 enivornment instantly. Another goal is to provide easy access to the key component of Plan 9: a CPU server. Another is is to provide a user-driven, decentralized platform for grid computing research and development. The latest version of the image makes bootup easier with a single menu for boot choices. There are more variant versions of plan 9 software and some customizations like the default font, taken from quanstro's subpixel fonts. A small sampler collection of .vac scores for downloading additional docs and collections of Plan 9 pictures from venti.9gridchan.org is also included. Thanks as always to the Plan 9 contributors whose work is included such as fgb (contrib and abaco), andrey (irc7 and links port), and others.

by www-data at January 28, 2010 04:50 AM

Jan 30 2009 - Development blog post #1

Jan 30 2009 - Development blog post #1

A big day of upgrades - along with release 1.1 of the /9/grid qemu image and toolkit, we've upgraded www.9gridchan.org with Uriel's latest Werc + Soul9's image gallery application plugin. Thanks to them for their development efforts. The new 1.1 image features more stuff and a smaller download and installed size, along with default user accounts for instant drawterm/cpu access for testing. (CHANGE THOSE PASSWORDS before allowing access from external networks, everyone!)

Ongoing projects include adding scripts for managing venti servers and databases of .vac files to the g/toolkit and a grid game based on namespace manipulation. Future blog posts will also provide some tips and tricks for using grid resources and tools.

by www-data at January 28, 2010 04:50 AM

Eric Van Hensbergen

Linux 9P Trace and Walkthrough

<object height="355" style="margin:0px" width="425"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=9pvirtfscode-12646254287829-phpapp02&amp;stripped_title=9p-virtfs-code-walkthrough"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=9pvirtfscode-12646254287829-phpapp02&amp;stripped_title=9p-virtfs-code-walkthrough" type="application/x-shockwave-flash" width="425"></embed></object>


Unfortunately video didn't come out.

by Eric Van Hensbergen (noreply@blogger.com) at January 28, 2010 03:35 AM

January 27, 2010

newsham

Dirty tricks, China censorship, High speed rail and other news...

2010 and the dirty trick committee is at it again!
Federal officials accused four men, including a conservative activist, of posing as telephone repairmen to tamper with phones at the New Orleans offices of Democratic Sen. Mary Landrieu. [...] Late Monday morning, according to an FBI affidavit, Messrs. Flanagan and Basel, dressed in blue work shirts, fluorescent green vests and construction hard hats, entered Sen. Landrieu's offices and told a staffer they had come to fix the phone lines. By then, Mr. O'Keefe already had arrived at the offices, according to the FBI. -- http://online.wsj.com/article/SB20001424052748703906204575027500584356116.html

Microsoft is defending China's internet censorship program proving that they're still pretty good at being evil. Take that, Google! Also thanks for all the bugs!

That rabble rouser Obama wants to give us high speed rail! That is just pure awesome. I bet there are lots of people who want his head. Oh, and in other news "2009 goes into the history books as the worst year the [airline] industry has ever seen." Yay! In my opinion no other industry treats its customers as poorly as the airline industry. They've received numerous bailouts in the past and seem to have decided that they do not need to earn their dollars. Oh, and they'd rather sequester hundreds of people on the tarmac for hours on end to save a few bucks. Please rot in hell.

I'm starting to get that "double-dip" feeling. Here's some new housing data.

Industry anti-piracy groups are still using heavy handed methods and harassing innocent people. Now laywers who were beaten up for their lunch money in school can get back at society as hired thugs for the music industry.

And going a little deeper on the headlines, check out Nate's wrapup on the PS3 hack.

by newsham (noreply@blogger.com) at January 27, 2010 06:37 PM

research!rsc

Generating Good Syntax Errors

One of the great religious debates in compiler writing is whether you should use parser generators like yacc and its many descendants or write parsers by hand, usually using recursive descent. On the one hand, using parser generators means you have a precise definition of the language that you are parsing, and a program does most of the grunt work for you. On the other hand, the proponents of hand-written recursive descent parsers argue that parser generators are overkill, that parsers are easy enough to write by hand, and that the result is easier to understand, more efficient, and can give better error messages when presented with syntactically invalid programs.

Like in most religious debates, the choice of side seems to be determined by familiarity more than anything else. I knew how to use yacc before I knew how to write a parser by hand, which put me solidly on the parser generator side of the fence. Now that I do know how to apply both techniques, I'd still rather use a parser generator. In fact, I've worked on projects where I've written parser generators rather than write a parser by hand. Good notation counts for a lot, as we shall see.

In Coders at Work, Ken Thompson talks to Peter Seibel about yacc and lex:

Seibel: And are there development tools that just make you happy to program?

Thompson: I love yacc. I just love yacc. It just does exactly what you want done. Its complement, lex, is horrible. It does nothing you want done.

Seibel: Do you use it anyway or do you write your lexers by hand?

Thompson: I write my lexers by hand. Much easier.

Many of the objections raised by the hand-written parser camp are similar to Thompson's objection to lex—the tool doesn't do what you want—but there's no fundamental reason a tool can't. For example, the spurious conflicts that an LALR(1) algorithm produces are definitely hard to understand, but if you replace it with LR(1) or GLR(1), you get a more useful tool. And if you do learn how to work within the LALR(1) algorithm, even yacc is very powerful.

The objection to parser generates that seems to resonate most is that generators like yacc produce inadequate error messages, little more than “syntax error.” Better error messages were one of the key benefits hoped for when g++ converted from a yacc-based C++ parser to a hand-written one (and to be fair, C++ syntax is nearly impossible to parse with any tool; the many special cases cry out for hand-written code). Here the objection seems harder to work around: the parser internally gets compiled into an automaton—usually a big table of numbers—that moves from state to state as it processes input tokens. If at some point it can't find a next state to go to, it reports an error. How could you possibly turn that into a good message?

Clinton Jeffery showed how in a paper published in ACM TOPLAS in 2003 titled “Generating LR Syntax Error Messages from Examples.” As you can guess from the title, the idea is to introduce a training stage after the parser is generated but before it is used for real. In that stage, you feed examples of syntax errors into the parser and look at what state it ends up in when it detects the error, and maybe also what the next input token is. If the automaton hits an error in that state (and with that input token) during real use, you can issue the message appropriate to the example. The important part is that the parser states are basically an abstract summary of the surrounding context, so it's reasonable to treat errors in a particular state with a single message. For example, suppose you find that this Go program

package main

import (
 "fmt",
 "math"
)

causes a syntax error at the comma, in state 76. In the parser, that state means that you're in the middle of an import block and expecting to see a string constant. The state abstracts that context, so you'd end up in the same state if you were parsing this erroneous program:

package fmt

import ( "strings"; "testing", "xgb" )

Having told the parser that the appropriate error message for the first program is “unexpected , during import block,” the parser will use the same message for the second program.

It's an elegant idea, and the implementation can be kept on the side, without adding any complexity to the grammar itself. I heard about this idea through the grapevine years ago (in 2000, I think) but had never gotten a chance to try it until this week.

There are three Go parsers: Ian Lance Taylor wrote a hand-written recursive descent parser in C++ for gccgo, Robert Griesemer wrote a hand-written recursive descent parser in Go (import "go/parser") for godoc and gofmt, and Ken Thompson wrote a yacc-based parser in C for the gc compiler suite.

On Monday night, I implemented Jeffery's idea in the gc compiler suite. The gc compilers use bison, the GNU implementation of yacc. Bison doesn't support this kind of error management natively, but it is not hard to adapt. Changing bison would require distributing a new version of bison; instead, my implementation post-processes bison's output.

The examples are in a new file go.errors. Because the lexer is written by hand, it's not available in the simulation, so the examples are token name sequences rather than actual program fragments. In token lists, the program above and its corresponding error message are:

% loadsys package LIMPORT '(' LLITERAL import_package import_there ','
"unexpected , during import block",

Understanding the tokens on the first line requires knowing what the various grammar token names mean, but they basically mimic the syntax, and this tool is targeted at the people working on the grammar anyway. An awk program loads bison's tables from its debugging dump y.output and then processes the go.errors file, replacing each line like the above with the number of the offending state and input token. That produces a table that can be linked into the compiler and consulted when a syntax error is encountered. If the state and input token match an entry in the table, the better error is used instead of a plain syntax error.

That was an outsized amount of work to close one bug, but now it's easy to add better messages for other common situations. For example, the fact that only the short := declaration form is allowed in for initializers sometimes trips up new Go programmers. When presented with this program:

package main

import "fmt"

func main() {
 fmt.Printf("hello, world\n")
 for var i = 0; i < 10; i++ {
  fmt.Println(i)
 }
}

the compiler used to just say there was a syntax error on line 7, but now it explains:

$ 6g x.go
x.go:7: syntax error: var declaration not allowed in for initializer

because of this stanza in go.errors:

% loadsys package imports LFUNC LNAME '(' ')' '{' LFOR LVAR LNAME '=' LNAME
"var declaration not allowed in for initializer",

Gccgo and gofmt give more traditional token-based errors:

$ gccgo x.go
x.go:7:6: error: expected operand
...

$ gofmt x.go
x.go:7:6: expected operand, found 'var'

What's interesting isn't that neither has bothered to handle this as a special case yet. What's interesting is to consider the work required to do so. Changing either would require understanding the corresponding hand-written parser well enough to find the right place to put the special case. With the example-based approach, you just write an example, and the tool figures out where in the parser it needs to go.

by rsc (noreply@blogger.com) at January 27, 2010 05:00 PM

January 25, 2010

newsham

Some news today

China is denying it had anything to do with the Google hack and pointing the finger back at the US saying that certain unnamed intelligence agencies have hacked a number of email accounts. I believe half of that statement.

The lady who was fined $2M for illegal file sharing of music has had her fine reduced to $52k. The judge simply thought the original fine was too "monstrous and shocking" and reduced it to the largest value he thought was no longer so. $52k sounds more appropriate for sharing 24 songs with no profit motive.

There's been progress on hacking the PS3 to open up the platform (and support piracy). The headline says "hacked" but it sounds like there's still a bit of work to be done.

Its looking more likely that the guy in charge of making sure our economy doesn't tank during the biggest recession of our lifetime is going to get to keep his job. It must be nice working in the financial industry where you can be popular and even make tons of money even if you destroy the wealth of your clients or constituents. Some people are happy at the job he did to get us out of the recession, and I think he deserves credit for that, but now that we're out of recession, he should still be shown the door for letting us get into the mess in the first place.

by newsham (noreply@blogger.com) at January 25, 2010 06:55 PM

January 23, 2010

newsham

Python marshalling, iterators and currying

I dig protocols and as a result I write marshalling code often. When I do, I often take the opportunity to rethink some ideas and try out something new. I'm going to use this post to talk about some marshalling ideas and code that I'm playing with now. The goal here is to write marshallers that are short and concise and easy to use. Performance is not considered.

Marshalling
Marshalling is the process of converting some structured data into a serial format. One way to think about this in python is a function that maps some structured datum into an iterable that yields characters. The simplest marshallers can be written directly using generators:
def encChar(ch):
yield ch
def encNum1(x) :
yield chr(x & 0xff)
More complex marshallers can be written by chaining together simpler encoders. The itertools module provides a chain function which does this. Here we marshall 16-bit and 32-bit numbers in big-endian format:
from itertools import chain
def encNum2(x) :
return chain(encNum1(x >> 8), encNum1(x))
def encNum4(x) :
return chain(encNum2(x >> 16), encNum2(x))
Marshalling a fixed list of values can also be accomplished by chaining together several encoders:
def encFixedList_(es, xs) :
return chain(*(e(x) for e,x in zip(es, xs)))
This function chains together the encoding of each item in xs using the matching encoder in es. In practice you'd want a special zip that results in error if the number of es and xs was not the same, but lets ignore that for now. This function will work just fine, but it has a problem in practice -- usually you'll want to use this function partially applied, providing the es but leaving the xs to be filled in later. Here's where currying comes in handy. We'll make a function that takes just the es and returns a function that takes the xs later:
def encFixedList(*es) :
return lambda xs : chain(*(e(x) for e,x in zip(es, xs)))
This is much easier to use:
encMyList = encFixedList(encNum4, encChar, encNum2)
You might notice that the encoding of fixed lists is basically an encoding of records. If we presuppose that all of our records have a toList method that returns all of the record's fields as a list, we can write a marshaller for these records:
def encRecord(*es) :
enc = encFixedList(*es)
return lambda rec : enc(rec.toList())
class MyRec(object) :
def __init__(self, a, b, c) :
self.a, self.b, self.c = a,b,c
def toList(self) :
return (self.a, self.b, self.c)
def __str__(self) :
return '[MyRec a=%s b=%s c=%s]' % self.toList()
encMyRec = encRecord(encChar, encNum1, encNum1)
To use these marshallers we pass the value we want to encode and then convert the resulting iterable to a string:
def encode(enc, x) :
return ''.join(enc(x))
x1 = encode(encNum4, 0x01020304)
print '%r' % x1
x2 = encode(encMyList, [31415927, 'x', 257])
print '%r' % x2
x3 = encode(encMyRec, MyRec('A', 5, 10))
print '%r' % x3

Unmarshalling
Unmarshalling is the process of turning a stream of bytes back into structured data. Instead of working with an iterable and converting it to structured data we'll be working with an iterator. The difference is subtle; we'll want to consume bytes and keep track of where we are in the stream. An iterator does just that. Each of our unmarshallers will take in an iterator of characters, consume some of the bytes and return a decoded value. As before our simplest unmarshallers are for characters and single-byte numbers:
def decChar(it) :
return it.next()
def decNum1(it) :
return ord(decChar(it))
Unmarshalling larger numbers is just a matter of unmarshalling several smaller numbers and composing them. Here are the unmarshallers for 16- and 32-bit big endian numbers. These definitions make use of the left-to-right order of evaluation:
def decNum2(it) :
return (decNum1(it) << 8) | decNum1(it)
def decNum4(it) :
return (decNum2(it) << 16) | decNum2(it)
And unmarshall fixed lists is just a matter of decoding each constituent in turn. Here again we use currying:
def decFixedList(*ds) :
return lambda it : [d(it) for d in ds]
decMyList = decFixedList(decNum4, decChar, decNum2)
To unmarshall a record we need a way to unmarshall the constituent fields and then create the record. The decFixedList function can be used to unmarshall the records if we have a way to construct the record from a list. We'll presuppose that our records have a constructor that take the field values in turn, then unmarshalling is simply:
def decRecord(klass, *ds) :
dec = decFixedList(*ds)
return lambda it : klass(*dec(it))
decMyRec = decRecord(MyRec, decChar, decNum1, decNum1)
To use these unmarshallers to decode a binary string we must first get an iterator and then pass it to the decoder:
def decode(dec, binary) :
return dec(iter(binary))
print decode(decNum4, x1)
print decode(decMyList, x2)
print decode(decMyRec, x3)

Records
One refinement I often make is to simplify the construction of records since I often have many records in the protocol:
class Record(object) :
def __init__(self, *args) :
for f,v in zip(self.__fields__, args) :
setattr(self, f, v)
def toList(self) :
return [getattr(self, f) for f in self.__fields__]
def __str__(self) :
fs = ', '.join('%s=%r' % (f, getattr(self, f)) for f in self.__fields__)
return '[%s %s]' % (self.__name__, fs)
Now defining a new record is done simply by filling in the __name__ and __fields__ values:
class MyRec2(Record) :
__name__ = 'MyRec2'
__fields__ = 'code', 'cost', 'quantity'
encMyRec2 = encRecord(encChar, encNum2, encNum1)
decMyRec2 = decRecord(MyRec2, decChar, decNum2, decNum1)
x4 = encode(encMyRec2, MyRec2('X', 2495, 3))
print '%r' % x4
print decode(decMyRec2, x4)

Discussion
In the past I had used a special buffer class to marshall data into and out of instead of using iterables and iterators. This approach seems to make better use of the mechanisms already existing in Python. This code makes writing marshallers and unmarshallers for records very simple. One downside is that it hurts debugging a little bit. The stack traces you get when something goes wrong are more complicated than they would be if record marshallers were manually written out by the programmer. Another downside is that marshallers and unmarshallers are written separately even though for correct operation they must be kept in synch and basically encode the same information. A more complicated library could define pairs of marshallers and unmarshallers together and in fact I've tried this approach in the past.
All in all, I am happy with this approach and find it concise and very convenient for prototyping. Look what we've accomplished in just a few lines of code!

from itertools import chain

class Record(object) :
def __init__(self, *args) :
for f,v in zip(self.__fields__, args) :
setattr(self, f, v)
def toList(self) :
return [getattr(self, f) for f in self.__fields__]
def __str__(self) :
fs = ', '.join('%s=%r' % (f, getattr(self, f)) for f in self.__fields__)
return '[%s %s]' % (self.__name__, fs)

def encode(enc, x) :
return ''.join(enc(x))
def encChar(ch):
yield ch
def encNum1(x) :
yield chr(x & 0xff)
def encNum2(x) :
return chain(encNum1(x >> 8), encNum1(x))
def encNum4(x) :
return chain(encNum2(x >> 16), encNum2(x))
def encFixedList(*es) :
return lambda xs : chain(*(e(x) for e,x in zip(es, xs)))
def encRecord(*es) :
enc = encFixedList(*es)
return lambda rec : enc(rec.toList())

def decode(dec, binary) :
return dec(iter(binary))
def decChar(it) :
return it.next()
def decNum1(it) :
return ord(decChar(it))
def decNum2(it) :
return (decNum1(it) << 8) | decNum1(it)
def decNum4(it) :
return (decNum2(it) << 16) | decNum2(it)
def decFixedList(*ds) :
return lambda it : [d(it) for d in ds]
def decRecord(klass, *ds) :
dec = decFixedList(*ds)
return lambda it : klass(*dec(it))

by newsham (noreply@blogger.com) at January 23, 2010 08:57 PM

newsham

Dear Microsoft

Dear Microsoft,
  I want to use the software I paid for. More accurately, my niece does. We bought her a laptop a few Christmases ago. Now her computer is unusable, which is partly her fault (in the same sense that having graffiti on your car is your fault if you parked in a bad neighborhood). Now I'm helping her reinstall her system to give her a fresh start. Here's the rub: the laptop manufacturer didn't give her an install or recovery DVD with a copy of Windows Vista that we paid for when we bought the computer. (There's even a nice proof-of-authenticity sticker on the bottom of the machine with a license key!) The manufacturer has informed me that they will send me the recovery DVD at no cost, unless you count $24.95 shipping and handling. So this is the situation I'm in: I have to pay $25 and wait a week or two to reinstall a copy of Windows Vista that we already have a legitimate license to use. I'm failing to see what the Windows Genuine Advantage is here. It would cost me less and be less effort to find a pirated copy of the software on the internet and install that instead. Do you see the perverse incentive structure that exists?
With Love, Tim.

by newsham (noreply@blogger.com) at January 23, 2010 06:19 PM

NineTimes - Inferno and Plan 9 News

New alpha driver for Marvell Yukon 2 Ethernet

Erik released a new alpha driver for Marvell Yukon 2 Ethernet NICs.

From: erik quanstrom <quanstro@qua...>
Subject: yukon 2 alpha driver
Date: Sat, 16 Jan 2010 18:15:50 -0500

is anyone interested in testing a marvell yukon 2
driver?  i am currently working with a 88e5057
(1186/4b00, dge-560t).  but most single-port yukon 2
parts should work fine.  if so. please contact me
off list.

- erik

You can install it from contrib with

term% contrib/install quanstro/yuk

This will install two files, etheryuk.c and yukdump.h, into your /sys/src/9/pc directory.

Edit /sys/src/9/pc/pcf and add the following line to the list of Ethernet devices:

    etheryuk        pci

After that compile your new kernel by running

mk 'CONF=pcf'

Then copy the new kernel into /n/9fat and boot with it.

On my ASUS A8R32-MVP with 88E8053 chip the following new device will now show up:

#l0: yuk: 1000Mbps port 0xDFEFC000 irq 4: 00173137b3d3

I tested it during the last several days and it works great, good work!

Please send your feedback and bug reports to Erik.

by www-data at January 23, 2010 01:51 AM

New alpha driver for Marvell Yukon 2 Ethernet

Erik released a new alpha driver for Marvell Yukon 2 Ethernet NICs. (See his post at 9fans)

You can install it from contrib with term% contrib/install quanstro/yuk

This will install two files, etheryuk.c and yukdump.h, into your /sys/src/9/pc directory.

Edit /sys/src/9/pc/pcf and add the following line to the list of Ethernet devices: etheryuk pci

After that compile your new kernel by running mk 'CONF=pcf'

Then copy the new kernel into /n/9fat and boot with it.

On my ASUS A8R32-MVP with 88E8053 chip the following new device will now show up: #l0: yuk: 1000Mbps port 0xDFEFC000 irq 4: 00173137b3d3

Please send your feedback and bug reports to Erik.

by www-data at January 23, 2010 01:51 AM

January 22, 2010

newsham

GOOG hack, MSFT vulnerability and disclosure

It turns out that Microsoft knew about the IE vulnerability used in the GOOG hack since last September. They were planning on rolling out a fix in February. If the fix had been rolled out earlier, this particular attack might never have happened (but keep in mind, there are many unfixed vulnerabilities in all software!). So why such a long lead time between report and fix? Its due to a lot of factors, but I'm going to harp on one: responsible disclosure.

The computer security field has been fiercely debating how disclosure should be handled for quite a long time. In the "good old days" there was a cabal of security professionals who shared new found vulnerabilities only with each other and vendors. This was nice if you were in the cabal (think of how easily you can impress your clients if you're a security auditor with access to a large number of unfixed vulnerabilities). Unfortunately it meant many bugs were not fixed for long periods of times. It also made it very hard for newcomers to learn about security and enter the field. In a backlash a large number of security researchers turned to full disclosure policies: when they found a bug they told everyone about it, often including details of how to perform an exploit. This means vendors, good guys and bad guys all got access to this information. This was great if you were a bad guy, or if you were a good guy without access to the cabal. It also put a lot of pressure on vendors to fix vulnerabilities ASAP. Vendors did not much care for this. They pushed for another model called responsible disclosure in which vendors were given advanced notice and only after a reasonable amount of time or when a fix was available would the full details be given to the public. So these are three prominent points on a continuum of disclosure policies, and where you stand on this line is often a matter of religion. So much so that the proponents come up with loaded terms such as "responsible disclosure" to imply that those who don't hold the same view are some how irresponsible.

So back to my point -- the vulnerability used in the google hack was reported to MSFT back in September. They realized this was a very important bug, but decided to release a fix in February, five months later! Clearly "responsible" disclosure gave MSFT a lot of flexibility in postponing the release of this fix. Once the details of the hack were released, MSFT scrambled to issue a release in a much shorter timeline. Did responsible disclosure work well in this instance?  I would argue "no". Would full disclosure have worked better? Possibly, but keep in mind that while a fix would have been available much quicker, many more attackers would have had access to this information during that shorter amount of time.

By the way, how did the attackers get access to this exploit? Have attackers penetrated MSFT? Do they have insiders working at MSFT who have access to the vulnerability database?  If either of these are true, then I would argue that responsible disclosure gives the advantage to these attackers. I also wonder if a country as powerful as China has insiders working at many of our top software companies. The tech industry employs many Chinese nationals.  I don't mean to imply that they are all agents of their government, but it would not surprise me if there were a few agents in the mix. I'm not sure there's a great way to defend against a threat like this. Software is so full of security holes, and a company usually has a large database full of software bugs found internally or reported by external customers. Often times the full impact of a bug is not known and important security bugs are left unpatched for years.

by newsham (noreply@blogger.com) at January 22, 2010 06:31 PM

January 21, 2010

newsham

Big news day

Wow, what a news day today.

In lighter news, this is pretty cool! Turning your clothes into batteries (and not the matrix way).

by newsham (noreply@blogger.com) at January 21, 2010 06:24 PM

OS Inferno

Среда acme на видео

Всем, кто хотел познать красоту acme, но не мог это сделать, посвящена целая серия видеоуроков, опубликованных в блоге http://thenewsh.blogspot.com. Ссылка.

by j1m (noreply@blogger.com) at January 21, 2010 05:52 AM

January 17, 2010

newsham

Scary Asteroid

The scariest asteroid you've never heard about:
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,0,0" height="264" width="400"><param name="flashvars" value="webhost=fora.tv&amp;clipid=8779&amp;cliptype=highlight"><param name="allowScriptAccess" value="always"><param name="allowFullScreen" value="true"><param name="movie" value="http://fora.tv/embedded_player"><embed allowfullscreen="true" allowscriptaccess="always" flashvars="webhost=fora.tv&amp;clipid=8779&amp;cliptype=highlight" height="264" pluginspage="http://www.macromedia.com/go/getflashplayer" src="http://fora.tv/embedded_player" type="application/x-shockwave-flash" width="400"></embed></object>

Have a nice day!

by newsham (noreply@blogger.com) at January 17, 2010 06:20 PM

January 15, 2010

newsham

The Acme environment in Plan9

Here's a three part introduction to the Acme environment in Plan9. Acme's an editor, a file browser, a paned text user interface a shell environment, and more. It's small, simple, elegant and extensible. A lot of people are put off by it when they first try it since it is so foreign and its not obvious how to use all of its features. This short video series shows some of the features of Acme and how it fits into the Plan9 environment.

<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/dopu3ZtdCsg&amp;hl=en_US&amp;fs=1&amp;"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/dopu3ZtdCsg&amp;hl=en_US&amp;fs=1&amp;" type="application/x-shockwave-flash" width="320"></embed></object>

<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/2vjD_B__SbQ&amp;hl=en_US&amp;fs=1&amp;"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/2vjD_B__SbQ&amp;hl=en_US&amp;fs=1&amp;" type="application/x-shockwave-flash" width="320"></embed></object>

<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/cR96WQ6OR00&amp;hl=en_US&amp;fs=1&amp;"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/cR96WQ6OR00&amp;hl=en_US&amp;fs=1&amp;" type="application/x-shockwave-flash" width="320"></embed></object>

by newsham (noreply@blogger.com) at January 15, 2010 07:05 PM

January 08, 2010

newsham

A protocol development trick

I'm working on a new protocol. To ease development I am splitting up the work into two parts. First I'll worry about the protocol semantics and content and then I'll figure out the encoding. (Leave aside for a moment that you should never totally ignore the encoding when developing a protocol; I have some experience, so its in the back of my mind as I go along). I still want to write code to test this thing out. Luckily I can do this pretty easily in python without nailing down a concrete encoding by using dictionaries and the pickling library:

def send(s, **kw) :
s.send(pickle.dumps(kw))
def recv(s) :
return pickle.loads(s.recv(64*1024))

I'm using UDP so I get natural message delimiters. If this was TCP I'd have to wrap the pickled data with a delimiter somehow (ie. prepend a length field). Now sending and receiving messages is straightforward:

send(s, foo=1, bar="test", priority="utmost!", flags=SYN|ACK)
m = recv(s)
print m['priority']

And to make field access a little easier I define a message object so I can use field access instead of dictionary indexing. It also gives me pretty printing:

class Msg(object) :
def __init__(self, *kw) :
self.__names__ = kw.keys()
for n,v in kw.items() :
setattr(self, n, v)
def __str__(self) :
attrs = ' '.join("%s=%s" % (n,getattr(self,n)) for n in self.__names__)
return '[Msg %s]' % attrs
__repr__ = __str__

def recvMsg(s) :
return Msg(**recv(s))

m = recvMsg(s)
print m.priority
print m

So now I can write my protocol in terms of message fields, leave the details of field encoding to later and still have an executable prototype to play with.

btw, anyone know good code formatting options that are compatible with blogger.com (besides http://code.google.com/p/syntaxhighlighter/ which looks pretty god awful ugly)?

by newsham (noreply@blogger.com) at January 08, 2010 08:49 PM

January 07, 2010

newsham

Just because you're a paranoid schizophrenic doesn't mean they're not out to get you

This story could have been a Hollywood suspense thriller: A housekeeper kidnaps, drugs, kills and burns the body of a rich mad scientist after getting access to his investments and power of attorney. But its not a movie, it actually happened. http://www.latimes.com/news/nation-and-world/la-na-scientist-murder7-2010jan07,0,1907606,full.story

by newsham (noreply@blogger.com) at January 07, 2010 06:15 PM

Factorization of a 768-bit RSA modulus

Factorization of a 768-bit RSA modulus by Zimmermann et al.
This paper reports on the factorization of the 768-bit number RSA-768 by the number field sieve factoring method and discusses the implications for RSA.

by newsham (noreply@blogger.com) at January 07, 2010 05:47 PM

January 01, 2010

Quod Erat Dragon

qedragon @ 2010-01-01T08:05:00

Happy New Gregorian Year, Livejournal.

My new year's resolution is 1280x800. What's yours?

January 01, 2010 08:06 AM

December 31, 2009

newsham

Side channel paper on Nate's blog

Check out Nate's paper on side channel attacks: http://rdist.root.org/2009/12/30/side-channel-attacks-on-cryptographic-software/. I expect these kind of weaknesses to pop up all over the place in web applications that use keyed hashes to tamper-proof values. As Nate points out, virtualization and cloud computing are going to result in more opportunities for these kinds of attacks.

PS: if you're into cryptography, security, copy protection or C64's, you should be following Nate's blog.

by newsham (noreply@blogger.com) at December 31, 2009 08:46 PM

Google books on the NewsHour

The PBS NewsHour did a story on Google Books the other day. The project's goal is to make all books available online for searching and browsing. This is quite a disruptive project and upsets a lot of people. What amazes me is just how many people it upsets. The project (and any project like this) seems to be such an obvious step forward for humanity. Part of the controversy is over all of this information being controlled by Google. Perhaps the controversy will spur others to compete.

<script src="http://www.pbs.org/wgbh/pages/frontline/js/pap/embed.js?news01n37f5qd53" type="text/javascript"></script>

by newsham (noreply@blogger.com) at December 31, 2009 07:07 PM

Another seL4 paper

I've been a keen follower of the seL4 project for a long time. As you may know, they announced the completion of their proven implementation this summer. Here's the latest in a long series of papers to come out of the project: http://ertos.nicta.com.au/publications/papers/Klein_09:login.pdf
This paper is written for a wider audience and explains the project's accomplishments in a very accessible manner. The paper has a refreshing discussion about the limitations of what they accomplished. It also puts it into context in comparison with other techniques such as testing, using type-safe languages and static code analyzers.

Here are a few interesting quotes:

Does this mean that we have proved seL4 is secure? Not yet, really. We proved that seL4 is functionally correct. Secure would first need a formal definition, which in itself is a wide field and depends on what you want to use the kernel for. Taken seriously, security is a whole-system question, including the system’s human components. 
After using the initial code, running static analyzers on them and finding and fixing 16 bugs:

Formal verification then found 144 more bugs in the C code, in total 160. This means even though the code appeared to be running just fine for normal application, there were an order of magnitude more bugs lurking in there than the 16 found initially. [...] As mentioned, we also did proofs on the design and the specification level. This was to 95% before the code was written, and we fixed about 150 issues in each. That means, in total we have discovered roughly 460 issues in kernel code, design and specification.

by newsham (noreply@blogger.com) at December 31, 2009 06:50 PM

December 29, 2009

newsham

gsm cracked...

http://news.bbc.co.uk/2/hi/technology/8429233.stm Someone at CCC announced a GSM crack. The GSMA is saying that the crack is "highly illegal." Go figure. The attack lowers the bar on how hard and expensive it is to crack GSM calls and of course the gov't was already capable:
According to Ian Meakin, of mobile encryption firm Cellcrypt, only government agencies and "well funded" criminals had access to the necessary technology.

by newsham (noreply@blogger.com) at December 29, 2009 06:26 PM

newsham

News...

Would a business treat its customers so bad that the president himself would have to step in to regulate?  Yes, indeed:
The Obama administration said Monday it would begin routinely penalizing U.S. air carriers for lengthy tarmac delays. [...] Airlines would be fined $27,500 per passenger for violations.
Those who don't like the government poking its head into private business can blame the airlines for bringing this on themselves. Twenty-seven thousand is a small price to pay for holding passengers against their will. I would suggest jail time is more appropriate. http://online.wsj.com/article/SB126140826775100235.html?mod=WSJ_hpp_MIDDLTopStories

Can you believe that twitter is profitable? I'm not sure I can.  http://news.cnet.com/8301-17939_109-10419569-2.html

Edit: oops, I botched the airline news story URL above.  It's fixed now.

by newsham (noreply@blogger.com) at December 29, 2009 03:05 AM

December 28, 2009

newsham

Good 'ole NLS

I'm pretty fascinated by computer history. There are so many great ideas out there. Here's a bit of history I did not know too much about: http://en.wikipedia.org/wiki/NLS_(computer_system) Its pretty amazing what these guys were doing way back in 1968 and 1970 and how much of it didn't make its way to us until so much later.

Here's a one-hour video demonstration of the system from 1968. It was so impressive that it was dubbed "the mother of all demos." See for yourself:
<embed allowfullscreen="true" allowscriptaccess="always" id="VideoPlayback" src="http://video.google.com/googleplayer.swf?docid=-8734787622017763097&amp;hl=en&amp;fs=true" style="height: 326px; width: 400px;" type="application/x-shockwave-flash"></embed>

One of the great things about computer history is that it comes alive in emulators. You can actually boot and run some of these great ancient systems such as 1st and 7th edition UNIX or the CADR lisp machine. The Computer History Museum built an emulator of NLS but, as is too often the case with historic computer software, is having trouble getting the rights to redistribute it.

Thanks to grey for turning me onto this info.

Edit: I'm told there's a good free book that covers this material. I haven't yet read it.

by newsham (noreply@blogger.com) at December 28, 2009 11:09 PM

PRO test

Very interesting and dangerous times in Iran lately. http://www.cnn.com/2009/WORLD/meast/12/28/iran.protests.streets/

It's sad that american media has been transformed from a powerful world wide power into a sideline youtube commentary. Also, thats my job!
Kudos to the brave men and women in Iran who are keeping the reports coming. I hope they know how many people around the world are rooting for them. We hope you get the right to freely express your opinion free of fear of retribution whether its through the current government or a new one. Here's to you:
<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/aS-gGYaA8F0&amp;hl=en_US&amp;fs=1&amp;"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/aS-gGYaA8F0&amp;hl=en_US&amp;fs=1&amp;" type="application/x-shockwave-flash" width="320"></embed></object>

by newsham (noreply@blogger.com) at December 28, 2009 07:00 PM

post-christmas news

"The U.S. Transportation Security Administration said it had stepped up pre-flight screening in the United States and Europe." according to reuters. Yes, the TSA knee can still jerk. Here I was under the impression that finding bombs was part of the normal TSA screening. What exactly is the extra screening going to be looking for that they don't currently look for? Or is this just more security theater.


AT&T decided that it would be smart to block NYC customers from signing up with iPhone. Hilarity ensues. Btw, am I the only one that finds their ad campaign responding to the verizon "maps" campaign to be completely inane? "Verizon likes to talk about our maps! But they don't even let you talk on the phone while surfing the web! And we have the fastest 3G network! [*] 3G coverage not available in all areas." WTF? Also, don't go talking about your limited 3G coverage in the same ad that is trying to poke at you for having lousy 3G coverage.


Guess who's giving more financial handouts to FNM? You, of course! Nice of them to sneak it in during a quiet vacation week. And the managers won't even accept stock as compensation. And you thought this whole financial crisis thing was over. In unrelated news, FNM is up 17% today.

by newsham (noreply@blogger.com) at December 28, 2009 06:05 PM

December 26, 2009

johnny

Intellectual property

Intellectual property: The new war

So the rich fuckers don't slack, eh. Be careful with the fuckers, they'll do anything that their money can permit them to keep you down, latest trend is closing down the doors on open and free intellectual property.

The G5 (or G7 or G20 or whatever was it?) is pushing it's own agenda: ACTA , and of course we shouldn't forget about Genome and software patents.

Honestly, i wouldn't be surprised if you had to pay a tax for having a blue-eyed blond kid in 20 years.

by lighttpd at December 26, 2009 07:15 PM

December 25, 2009

Quod Erat Dragon

Root-causing the droid bug

I did some more digging in my checkout of the Android tree. You'll be
entertained.

You will want to look at the Android tree's
dalvik/libcore/archive/src/main/java/java/util/zip/ZipFile.java
( For contrast, see ./external/zlib/contrib/minizip/unzip.c )
and in particular this comment:

/*
* Find the central directory and read the contents.
*
* The central directory can be followed by a variable-length comment
* field, so we have to scan through it backwards. The comment is at
* most 64K, plus we have 18 bytes for the end-of-central-dir stuff
* itself, plus apparently sometimes people throw random junk on the end
* just for the fun of it.
*
* This is all a little wobbly. If the wrong value ends up in the EOCD
* area, we're hosed. This appears to be the way that everbody handles
* it though, so we're in pretty good company if this fails.
*/

And there's your 64K limit, kinda, sorta mostly.... The code looks for
101010256L == 0x06054B50 ("PK\x05\x06" little endian) which apparently
signals the End Of Central Directory. Anyway... That sequence occurs at
0x0a33ef3 (exploit) and 0x0a2756f (legit). The ZipFile.java code scans
forward from 64K back from the end of the file; the unzip.c code starts at
the end and walks backwards.

How about a magic trick?
I'm going to make this signature validity... disappear!
*WHAM*
TA DA! It's ... gone!

Even more entertainingly, there is in fact some "junk" tossed on the end of
the file at this point, courtesy of Motorola and SignApk (it even tells us
so in the file)... the exploit zip doesn't begin until somewhere near
0x0a27c30 -- 1729 (or so) bytes of junk later, so there's your 63K limit for
this particular exploit.

December 25, 2009 08:01 PM

un-zipping the droid's jar

I was pointed at a very curious file today. Since I can't rehost it here,
I'll just give the MD5 and SHA1 sum so you can google for it -- indeed google
has already indexed the MD5 sum to a few likely mirrors.

94a0c30ea9104c2776d042e760bfd716 droid-root.zip
20ad50644644097eb9914cdc18a2527503291d45 droid-root.zip

Anyway, how curious is this file? Well, it's this curious:

% mkdir unzip unjar
% (cd unzip; unzip ../droid-root.zip)
% (cd unjar; jar -xf ../droid-root.zip)
% diff -rq unjar unzip
Only in unjar: bp.img
Only in unjar: mbm.bin
Only in unjar/META-INF: CERT.RSA
Only in unjar/META-INF: CERT.SF
Files unjar/META-INF/com/google/android/updater-script and unzip/META-INF/com/google/android/updater-script differ
Only in unjar/META-INF: MANIFEST.MF
Only in unjar: patch
Only in unjar: rdl.bin
Only in unjar: recovery
Only in unjar/system: app
Only in unzip/system: bin
Only in unjar/system: fonts
Only in unjar/system: lib

Some additional fun facts about this file:
0000000 -- first byte of file, appears to be legitimate META-INF/MANIFEST.MF header & blob
0a22020 or so -- index of legitimate update.zip (META-INF/MANIFEST.MF entry)
0a27586 -- "signed by SignApk"
0a27600 and following -- appears to be HTC Certificate
0a27c3d -- start of next ZIP file? ("PK" and system/ entry)
0a27cc3 or so -- header & blob for /system/bin/su file
0a33970 or so -- index of illegitimate update.zip (META-INF/ entry)
0a3400e -- last byte of file

Observe the following (10648637 is 0x0a27c3d) :
% dd if=droid-root.zip of=test.zip skip=10648637 bs=1
50130+0 records in
50130+0 records out
50130 bytes (50 kB) copied, 0.163549 s, 307 kB/s
% file test.zip
test.zip: Zip archive data, at least v1.0 to extract
% mkdir unzip.sfx
% (cd unzip.sfx; unzip ../test.zip)
Archive: ../test.zip
error [../test.zip]: missing 10648637 bytes in zipfile
(attempting to process anyway)
creating: system/
creating: system/bin/
inflating: system/bin/su
creating: META-INF/
creating: META-INF/com/
creating: META-INF/com/google/
creating: META-INF/com/google/android/
inflating: META-INF/com/google/android/updater-script
error [../test.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)

If you flip the dd around, the prefix is observed to be a perfectly sane .zip file,
handled by both unjar and unzip in identical ways.

So what we've got is a file which is secretly two zip files concatenated. The index
of the second is mangled so that some tool, namely unjar, blows by it in its search
for the index, but that another tool, namely unzip, does not.

And now for why this matters. Jar file signing is very clever: the list of files
and their hashes is generated and signed and the resulting signature detached and
itself stored in the jar. This allows multiple parties to attest to the contents
of the archive. However, what's happening here is that one tool -- the one verifying
the signature -- sees what it expects to see: the tail of the file (the second zip
file) is outside the archive and therefore doesn't alter the list of files inside
the archive or their contents. So the signature is still believed valid. The file
is then extracted again using a different tool which operates on the
second zip file. At this point, no signature or manifest checks are made, and
your phone is now correctly yours again.

--nwf;

P.S. The reason for the "error" message in the unzip command above is that the
second index has been told that there's this first zip file atop it and so it's
using "back-references" as a kind of compression to allow for more interesting
bits to fit in the exploitable window.

December 25, 2009 08:01 PM

December 24, 2009

newsham

OLPC3

OLPC is unveiling their vision for the next generation computer:



The machine will have no keyboard or connectors (using inductive charging). It will be a full paper-sized plastic display (plastic logic?), low-power (with ARM, I guess no more windows?), have a touch screen that supports a virtual keyboard and cost only $75. Of course OLPC won't be making the machines themselves, they will just inspire the design, but rest assured it will be ready by 2012!  Or maybe not..
Still, its an interesting vision. Lets see how much of it comes to pass.

http://www.wired.com/gadgetlab/2009/12/xo-3-concept-a-crazy-thin-tablet-olpc-for-just-75/
http://news.bbc.co.uk/2/hi/technology/8428147.stm
http://www.forbes.com/2009/12/22/tablet-computer-negroponte-technology-cio-network-olpc.html

by newsham (noreply@blogger.com) at December 24, 2009 06:16 PM

OS Inferno

VGA-драйвер

Brian L. Stuart продолжает выкладывать свои наработки, озвученные на IWP9. В этот раз он сделал доступным простой VGA-драйвер для нативной версии Inferno. Как и раньше исходный код доступен на странице https://code.google.com/p/inferno-bls/.

by j1m (noreply@blogger.com) at December 24, 2009 09:28 AM

December 21, 2009

newsham

Mighty morphing faces

Pretty cool stuff demonstrated in this video. They model a face using lots of input faces, computing an average face and a vector space of faces. They then fit the face to faces in 2D pictures and allow editing of characteristics like how feminine or masculine the face is, emotional gestures like smiling and frowning and body weight. I can't wait until the National Enquirer gets a hold of this stuff!

<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/nice6NYb_WA&amp;hl=en_US&amp;fs=1&amp;"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/nice6NYb_WA&amp;hl=en_US&amp;fs=1&amp;" type="application/x-shockwave-flash" width="320"></embed></object>

by newsham (noreply@blogger.com) at December 21, 2009 11:14 PM

Decade review for tech

From the register, a good review of the decade in tech: http://www.theregister.co.uk/2009/12/16/noughties_review/

  • Most disruptive device: iPhone
  • Biggest makeover: amazon and ec2
  • security shock: worms
  • biggest folly: SCO/linux
  • biggest blunder: MSFT stops IE dev
  • longest suicide note: sun micro
  • most wanted: jboss and mysql
  • survivor: winxp
  • linux distro: ubuntu
  • worth repeating: big media will stop illegal filesharing
  • mixed blessing: "free" (code, information, money)
  • most utility: usb
  • new microsoft: google
  • kleptomania: oracle
  • surprise: netbooks
  • needed correction: recession

by newsham (noreply@blogger.com) at December 21, 2009 05:59 PM

December 17, 2009

newsham

Drone hacking

http://www.reuters.com/article/idUSTRE5BG3RM20091217
Shi'ite fighters in Iraq used software that cost as little as $26 to intercept the video feeds, potentially allowing them to monitor U.S. military operations.
It was only a matter of time. When it comes to hacking, America's the country with everything to lose and little to gain. You'd think as a result we'd be more interested in good computer engineering and defensive computer security but that does not appear to be the case.
I wonder if the pentagon is taking their software vendors to task over the breach. Probably not, as procurement officers are probably looking to software vendors for future career opportunities for their post-government life.

by newsham (noreply@blogger.com) at December 17, 2009 06:29 PM

December 15, 2009

research!rsc

Gofmt

One of the more controversial aspects of the Go project is that Go has a canonical program format, implemented by a program (gofmt) that will pick up any Go program and reformat it in the canonical format. Every program in the Go source tree has been formatted with gofmt. The code review and revision control tools check that any new code is gofmt-formatted too.

(I am avoiding the word “style,” because gofmt doesn't enforce style, in the sense of The Elements of Programming Style. Running code through gofmt changes the layout but not whether it's a well-written, elegant program. Referring to these mundane formatting rules as style elevates them far beyond their importance. The only style that gofmt imposes is consistency.)

The most obvious benefit of using gofmt is that when you open an unfamiliar Go program, your brain doesn't get distracted, even subconsciously, about why that brace is in the wrong place; you can focus on the code, not the formatting. But there are many more interesting uses for gofmt. Gofmt can take any file in the Go source tree, parse it into an internal representation, and then write exactly the same bytes back out to the file. There are two reasons this works: the primary reason is that a lot of effort went into gofmt, but an important secondary reason is that gofmt only has to worry about one formatting convention, and we've agreed to accept that as the official one.

Once you have a tool that can parse and print a program losslessly, it's easy to insert mechanical processing in the middle, to transform the program between parsing and printing. Want to change something about the language? Change it in gofmt and reformat the source tree. This has already happened multiple times since the public launch.

The December 9 release of Go added a new built-in function copy(dst, src) that copies data from one slice to another. (It's a built-in both because it's such a common operation and because Go's lack of pointer arithmetic makes it hard to write the overlap check yourself.) The new copy replaced the library routine bytes.Copy, which was similar but only worked on byte slices.

Gofmt has an option, -r, to specify a rewrite rule in the form pattern -> replacememt. In the pattern and replacement, single-letter lower case names serve as wildcards matching arbitrary expressions. Want to convert from bytes.Copy to copy?

gofmt -w -r 'bytes.Copy(d, s) -> copy(d, s)' *.go

(The -w flag writes the new version of the program back to the input file.)

The December 9 release also added support for writing x[i:] instead of x[i:len(x)]. Want to convert programs to use the new syntax when possible?

gofmt -w -r 'x[i:len(x)] -> x[i:]' *.go

These two minor changes were upstaged by discussion of a larger syntactic change, the elimination of line-ending semicolons. Again, gofmt is crucial to making the switch. The syntax being considered is mostly, but not completely, backwards compatible with the current version of Go. Normally, that would mean having to make the compilers support both syntaxes for some time, during which the compiler might warn about programs depending on the old syntax, expecting users to fix their programs. Instead, having one tool whose job it is to rewrite programs means that all the other tools can focus on other things. Gofmt can fix programs mechanically, and the compilers and other tools can drop the old syntax instead of having to support both.

I'd like to see more tools built using gofmt and the libraries it uses (go/parser and go/printer). For example, there's no reason an IDE plugin couldn't sit on the same libraries to build interesting refactoring tools. Gofmt's -r is just the beginning. It's going to be exciting.

One final note. Being able to pick up a program and write it back out, preserving comments, is not an easy task in any language. Robert Griesemer deserves all the credit for the huge effort to make gofmt handle real programs so well.

by rsc (noreply@blogger.com) at December 15, 2009 05:00 PM

December 14, 2009

Anant Narayanan

Figuring out the Goo.gl API

UPDATE: ‘Fatalis’ has pointed out in the comments that the POST should be made to http://goo.gl/api/url with User-agent set to ‘toolbar’. The code now works, Yay!

Google just announced their own URL shortening service. Their service can only be used from the toolbar or FeedBurner, and I don’t particularly like adding extra toolbars to my browser. Maybe I can figure out a way to use their service from the command line?

I downloaded the toolbar XPI, unzipped it and peeked inside. Horribly indented JS awaited me. Nothing jsbeautifier couldn’t fix though. Few minutes later, I arrived at this readable JS function:

var getUrlShorteningRequestParams = function (b) {
    function c() {
        for (var l = 0, m = 0; m < arguments.length; m++)
          l = l + arguments[m] & 4294967295;
        return l
    }
    function d(l) {
        var m = String(l > 0 ? l : l + 4294967296);
        for (var o = 0, n = false, p = m.length - 1; p >= 0; --p) {
            var q = Number(m.charAt(p));
            if (n) {
                q *= 2;
                o += Math.floor(q / 10) + q % 10
            } else o += q;
            n = !n
        }
        m = m = o % 10;
        o = 0;
        if (m != 0) {
            o = 10 - m;
            if (l.length % 2 == 1) {
                if (o % 2 == 1) o += 9;
                o /= 2
            }
        }
        m = String(o);
        m += l;
        return l = m
    }
    function e(l) {
        for (var m = 5381, o = 0; o < l.length; o++) m = c(m << 5, m, l.charCodeAt(o));
        return m
    }
    function f(l) {
        for (var m = 0, o = 0; o < l.length; o++) m = c(l.charCodeAt(o), m << 6, m << 16, -m);
        return m
    }

    var i = e(b);
    i = i >> 2 & 1073741823;
    i = i >> 4 & 67108800 | i & 63;
    i = i >> 4 & 4193280 | i & 1023;
    i = i >> 4 & 245760 | i & 16383;

    var h = f(b);
    var k = (i >> 2 & 15) << 4 | h & 15;
    k |= (i >> 6 & 15) << 12 | (h >> 8 & 15) << 8;
    k |= (i >> 10 & 15) << 20 | (h >> 16 & 15) << 16;
    k |= (i >> 14 & 15) << 28 | (h >> 24 & 15) << 24;
    j = "7" + d(k);

    i = "user=toolbar@google.com&url=";
    i += encodeURIComponent(b);
    i += "&auth_token=";
    i += j;
    return i
};

So, I call getUrlShorteningRequestParams("http://www.kix.in/"); to get "user=toolbar@google.com&url=http%3A%2F%2Fwww.kix.in%2F&auth_token=78925814685". I see in their code that they do a POST request to the service to obtain a JSON return value that would contain the short URL. I punch it in using cURL:

$ curl -v -d "user=toolbar@google.com&url=http%3A%2F%2Fwww.kix.in%2F&auth_token=78925814685" http://goo.gl/
* About to connect() to goo.gl port 80 (#0)
*   Trying 74.125.19.102... connected
* Connected to goo.gl (74.125.19.102) port 80 (#0)
> POST / HTTP/1.1
> User-Agent: curl/7.19.7 (i386-apple-darwin10.2.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
> Host: goo.gl
> Accept: */*
> Content-Length: 77
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 405 HTTP method POST is not supported by this URL

Oops! Well, not really, the URL shortener from the toolbar doesn’t work either, I just get the full URL whenever I try to “share” something. Has anybody actually generated a real goo.gl short URL yet?

Their auth_token parameter seems completely superfluous to me as it is generated from the URL itself. Don’t we all know security by obscurity doesn’t work :)

by Anant at December 14, 2009 11:11 PM

newsham

C9 Lectures

This is very cool: http://channel9.msdn.com/tags/C9+Lectures/
Erik Meijer and Graham Hutton bring us a 13-part lecture on functional programming based on Hutton's
book.  I haven't viewed em yet, but if they're as good as the book, should be a hit.  I'm hoping to watch them asap.

Edit: btw, here's some motivation for why anyone would care about this stuff.. Interesting observation in the abstract: http://scyourway.supercomputing.org/conference/view/spost112_1

Edit: beware, Meijer wanders into many complex side topics in the lectures I've seen so far and does not focus on the material that may be difficult for a new user.  This lecture series might not be the best introduction for new users.

by newsham (noreply@blogger.com) at December 14, 2009 08:00 PM

December 12, 2009

OS Inferno

Visopsys + Inferno?

"Ailes развивается как форк Visopsys, наследуя все её основные особенности, однако в будущем мы планируем реализовать идеи, схожие с идеями системы Inferno и преобразовать текущее монолитное ядро в микроядро и набор системных служб". Ссылка.

by j1m (noreply@blogger.com) at December 12, 2009 07:30 PM

December 10, 2009

newsham

Crazy good surf

Today (thurs, 12/10) starts the pipeline masters.  Its been a crazy year for surf.  They had 4 waimea bay days this last week, including running the Eddie Aikau on tuesday (see http://www.flickr.com/photos/39795266@N06/4170227902/in/photostream/). Surf has finally gotten small enough to run the pipeline masters and it looks like good conditions and good size for the foreseeable future (see http://www.surfnewsnetwork.com/). You can watch the action live online: http://www.triplecrownofsurfing.com/pipelinemasters/live.php

by newsham (noreply@blogger.com) at December 10, 2009 06:02 PM

research!rsc

Regular Expression Article #2

In January 2007 I posted an article on my web site titled “Regular Expression Matching Can Be Simple And Fast.” I intended this to be the first of three; the second would explain how to do submatching using automata, and the third would explain how to make a really fast DFA. I got distracted for a few years but have finally finished the second article.

http://swtch.com/~rsc/regexp/regexp2.html

Simple (easy to follow, I hope) code demonstrating the techniques from the article can be found at http://code.google.com/p/re1/.

by rsc (noreply@blogger.com) at December 10, 2009 05:00 PM

December 08, 2009

OS Inferno

[tips] Софтверный RAID

Интересно, а многие ли в курсе, что нативная версия Inferno позволяет создавать софтверные RAID-массивы. Например, после выполнения следующих пяти команд мы получим две пары дисков, объединенных в RAID уровня 1, которые в свою очередь объединены RAID-0 массив с файловой системой KFS:

; bind -a '#b' /dev
; echo mirror m0 /dev/sdC0/data /dev/sdD0/data >/dev/ds/ctl
; echo mirror m1 /dev/sdC1/data /dev/sdD1/data >/dev/ds/ctl
; echo inter data /dev/ds/m0 /dev/ds/m1 >/dev/ds/ctl
; disk/kfs -f /dev/ds/data

И ведь любой из учавствующих в RAID дисков может быть импортирован с другой машины.

by j1m (noreply@blogger.com) at December 08, 2009 08:51 PM

research!rsc

Data Structures Go Programs

I've been writing about Go internals, which I'll keep doing, but I also want to spend some time on Go programs. Go's reflect package implements data reflection. Any value can be passed to reflect.NewValue (its argument type is interface{}, the empty interface) and then manipulated and examined by looking at the result, a reflect.Value. Typically, you use a type switch to determine what kind of value it is—a *IntValue, a *StructValue, a *PtrValue, and so on—and then walk it further. This is how fmt.Print prints arbitrary data passed to it.

Go allows programs to tag the fields in struct definitions with annotation strings. These annotations have no effect on the compilation but are recorded in the run-time type information and can be examined via reflection. The annotations can be hints for reflection-driven code traversing the data structures. This blog post shows two examples of such code: an XML parser that writes directly into data structures, and a template-based output generator that reads from them.

An XML Linked List

The Go XML package implements a function Unmarshal which parses XML directly into data structures. As a contrived but illustrative example, here is an XML encoding of a linked list:

var encoded = `
   <list><value>a</value>
       <list><value>b</value>
           <list><value>c</value>
           </list>
       </list>
   </list>
`

(That's a backquoted multi-line Go string literal.)

Given this Go data structure definition:

// http://gopaste.org/view/private:7v2CQ
type List struct {
   Value string;
   List *List;
}

func (l *List) String() string {
if l == nil {
 return "nil"
}
return l.Value + " :: " + l.List.String()
}

we can invoke xml.Unmarshal to turn the XML above into an instance of this data structure:

func main() {
var l *List;
if err := xml.Unmarshal(strings.NewReader(encoded), &l); err != nil {
 log.Exit("xml.Unmarshal: ", err)
}
fmt.Println(l)
}

This program prints a :: b :: c :: nil. The call to xml.Unmarshal walked the XML and the data structure at the same time, matching the XML tags <value> and <list> against the names of the structure fields and allocating pointers where necessary.

There are decisions to be made when matching XML against the data structure that aren't easily expressed with just the usual struct information. For example, we might want to make the value an attribute instead of a sub-element. Because attributes and sub-elements might use the same names, the XML package requires struct fields holding attributes to be tagged with "attr". This data structure and XML encoding would also work in the program above:

// http://gopaste.org/view/private:2yR7E
type List struct {
   Value string "attr";
   List *List;
}

var encoded = `
   <list value="a">
       <list value="b">
           <list value="c"/>
       </list>
   </list>
`

If a field has type string, it gets the character data from the element with that name, but sometimes you need to accumulate both character data and sub-elements. To do this, you can tag a field with "chardata". This data structure and XML encoding would also work:

// http://gopaste.org/view/private:U3ews
type List struct {
Value string "chardata";
List *List;
}

var encoded = `<list>a<list>b<list>c</list></list></list>`

The annotations are trivial but important: where the encoding and the data structures are not a perfect match (and they rarely are), the annotations provide the hints necessary for the XML package to make them work together anyway.

Code Search Client

Let's look at real XML data. Google's Code Search provides regular expression search over the world's public source code. The search [file:/usr/sys/ken/slp.c expected] returns a single, famous result. Instead of using the web interface, though, we can issue the same query using the Code Search GData API and get this XML back (simplified and highlighted for structure):

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
     xmlns:gcs="http://schemas.google.com/codesearch/2006"
     xml:base="http://www.google.com">
 <id>http://www.google.com/codesearch/feeds/search?q=file:/usr/sys/ken/slp.c+expected</id>
 <updated>2009-12-04T23:49:22Z</updated>
 <title type="text">Google Code Search</title>
  <generator version="1.0" uri="http://www.google.com/codesearch">Google Code Search</generator>
 <entry>
   <gcs:package name="http://www.tuhs.org" uri="http://www.tuhs.org"></gcs:package>
   <gcs:file name="Archive/PDP-11/Trees/V6/usr/sys/ken/slp.c"></gcs:file>
   <gcs:match lineNumber="325" type="text/html">&lt;pre&gt;  * You are not &lt;b&gt;expected&lt;/b&gt; to understand this.
&lt;/pre&gt;</gcs:match>
 </entry>
</feed>

From the API documentation or just by eyeballing, we can write data structures for the information we'd like to extract:

type Feed struct {
   XMLName    xml.Name "http://www.w3.org/2005/Atom feed";
   Entry []Entry;
}

type Entry struct {
   XMLName xml.Name "http://www.w3.org/2005/Atom entry";
   Package Package;
   File File;
   Match []Match;
}

type Package struct {
   XMLName xml.Name "http://schemas.google.com/codesearch/2006 package";
   Name string "attr";
   URI string "attr";
}

type File struct {
   XMLName xml.Name "http://schemas.google.com/codesearch/2006 file";
   Name string "attr";
}

type Match struct {
   XMLName xml.Name "http://schemas.google.com/codesearch/2006 match";
   LineNumber string "attr";
   Type string "attr";
   Snippet string "chardata";
}

The XMLName fields are not required. The XML package records the XML tag name in them. If there is a string annotation (as there is here), then the XML package requires that the XML element being considered has the given name space and tag name. That is, the annotation is both documentation and an assertion about what kind of XML goes into this structure.

A full Atom GData feed has many other fields, but these are the only ones that we care about. The XML parser will throw away data that doesn't fit into our structures.

We can fetch the feed via HTTP and unmarshal the HTTP response data directly into the data structures:

r, _, err := http.Get(`http://www.google.com/codesearch/feeds/search?q=` +
   http.URLEscape(flag.Arg(0)));
if err != nil {
   log.Exit(err);
}

var f Feed;
if err := xml.Unmarshal(r.Body, &f); err != nil {
   log.Exit("xml.Unmarshal: ", err);
}

and then print the results:

for _, e := range f.Entry {
   fmt.Printf("%s\n", e.Package.Name);
   for _, m := range e.Match {
       fmt.Printf("%s:%s:\n", e.File.Name, m.LineNumber);
       fmt.Printf("%s\n", cutHTML(m.Snippet));
   }
   fmt.Printf("\n");
}

(The cutHTML function returns its argument with HTML tags stripped: the m.Snippet is HTML but we just want to print text.)

Compared to the xml.Unmarshal, that loop with all the fmt.Printf calls sure looks clunky. We can use a reflection-driven process to print the data too. The template package is a Go implementation of json-template. We can write a template like:

var tmpl = `{.repeated section Entry}
{Package.Name}
{.repeated section Match}
{File.Name}:{LineNumber}:
{Snippet|nohtml}
{.end}

{.end}`

and then print the data using that template:

t := template.MustParse(tmpl, template.FormatterMap{"nohtml": nohtml});
t.Execute(&f, os.Stdout);

The formatter map definitions arrange that the {Snippet|nohtml} in the template run the data through the nohtml function when printing. Both variants of our program produce the same output:

http://www.tuhs.org
Archive/PDP-11/Trees/V6/usr/sys/ken/slp.c:325:
 * You are not expected to understand this.

Full source code is at http://gopaste.org/view/private:7ybCj

Parsing HTML

I still have three more tricks up my sleeve.

First, in a pinch and if the HTML is mostly well formatted or you're just lucky, the XML parser can handle HTML too. Second, when the XML parser walks into a struct field, if the field is a pointer and is nil, it allocates a new value, as we saw in the linked list example. But if the field is a non-nil pointer already, the parser just follows the pointer. The same goes for growing slices. Third, if the XML parser cannot find a struct field for a particular subelement, before giving up it looks for a field named Any and uses that field instead.

Combined, these three rules mean that this structure:

type Form struct {
   Input []Field;
   Any *Form;
}

type Field struct {
   Name string "attr";
   Value string "attr";
}

can be used to save a tree of form and other information from a web page. When you unmarshal into it, you get a tree showing the structure of the web page, with each element becoming a *Form because the parser used the Any field.

But who wants to walk that tree just to find the form fields? Instead, we can set the Any pointer to point back at the Form it is in. Then as the XML parser thinks it is walking the data structure, in fact it will be visiting the same Form struct over and over. Each time it finds an <input>, the parser will append it to the Input slice.

// http://gopaste.org/view/private:b8Ya2
func main() {
r, _, err := http.Get(`http://code.google.com/p/go/`);
if err != nil {
 log.Exit(err);
}

var f Form;
f.Any = &f;

p := xml.NewParser(r.Body);
p.Strict = false;
p.Entity = xml.HTMLEntity;
p.AutoClose = xml.HTMLAutoClose;
if err := p.Unmarshal(&f, nil); err != nil {
 log.Exit("xml.Unmarshal: ", err);
}

for _, fld := range f.Input {
 fmt.Printf("%s=%q\n", fld.Name, fld.Value);
}
}

This program prints:

q=""
projectsearch="Search projects"

Further Reading

There are a few more examples of this technique in the standard Go libraries. The net package's DNS client uses this technique to build generic packing and unpacking routines for DNS packets. The JSON package provides approximately the same functionality as the XML package, but with JSON as the wire format. In fact, JSON is arguably more suited to the task. (Remember: “the essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.”) The ASN1 package also works this way, though its generality is limited: it implements just enough to parse SSL/TLS certificates. There are also other ways to use reflection, which will have to wait for future posts.

It's actually a little surprising how often this technique gets used in the Go libraries. Part of the reason is that it's a shiny new hammer, but it seems to be a very useful shiny new hammer, because so much of modern programming is talking to other programs over a variety of wire encodings.

Unmarshalling directly into data structures via reflection, with just a few necessary hints as annotations, can be implemented in other languages, but the examples I've seen are not nearly as elegant or as concise as the Go versions.

Apologies to Jon Bentley.

by rsc (noreply@blogger.com) at December 08, 2009 05:00 PM

December 06, 2009

OS Inferno

[софт] laptopfs и персональный репозиторий Brian L. Stuart

Brian L. Stuart, автор книги "Principles of operating systems: design and applications" и интерпретатора языка scheme для inferno представил свой новый проект: laptopfs.

Laptopfs представляет собой совсем небольшой файловый сервер, реализующий представление автора о том, как должны синхронизироваться данные на настольном ПК и переносном устройстве (ноутбуке). Вместо того, чтобы делать основным хранилищем накопитель устройства и по мере надобности производить сброс важных данных на ПК, Brian L. Stuart предлагает диаметрально противоположный подход: использовать ПК в качестве основного хранилища данных и синхронизировать их с файлами устройства во время обращения.

Более детально технология описана в документе A File System for Laptops, представленном на конференции IWP9 2009. Получить исходный код можно на странице Brian L. Stuart на GoogleCode.

by j1m (noreply@blogger.com) at December 06, 2009 09:09 AM

December 03, 2009

newsham

Yes we can!

Daily show draws direct parallels between the new "change" president Obama and the much hated president Bush. Sadly the faces may have changed but a lot of the policy remains the same:

The Daily Show With Jon StewartMon - Thurs 11p / 10c
30,000
www.thedailyshow.com
<embed allowfullscreen="true" allownetworking="all" allowscriptaccess="always" bgcolor="#000000" flashvars="autoPlay=false" height="301" src="http://media.mtvnservices.com/mgid:cms:item:comedycentral.com:257663" style="display:block" type="application/x-shockwave-flash" width="360" wmode="window"></embed>
Daily Show
Full Episodes
Political HumorHealth Care Crisis

*sigh*

by newsham (noreply@blogger.com) at December 03, 2009 06:02 PM

Historic Loss

I asked the famous Dr. Kirk McKusick about the tape images for old BSD distributions recently. He put together a great archive that he sells that contains sources from the BSD distributions going back to the start. Anyway, the reply was a little sad.  I hope he doesn't mind me putting it on my blog:

Sorry to say we did not save images of those files. I had the original tapes until 2000 when I gave them to the Berkeley library archives. When some folks asked about them a couple of years ago, I pointed them to the Berkeley library who (apparently) discarded them as junk.

Oh well.. another sad loss of UNIX history. At least a lot of the material has been preserved.

by newsham (noreply@blogger.com) at December 03, 2009 05:30 PM

research!rsc

The Generic Dilemma

Generic data structures (vectors, queues, maps, trees, and so on) seem to be the hot topic if you are evaluating a new language. One of the most frequent questions we've had about Go is where the generics are. It seems like there are three basic approaches to generics:

  1. (The C approach.) Leave them out.
    This slows programmers.
    But it adds no complexity to the language.
  2. (The C++ approach.) Compile-time specialization or macro expansion.
    This slows compilation.
    It generates a lot of code, much of it redundant, and needs a good linker to eliminate duplicate copies. The individual specializations may be efficient but the program as a whole can suffer due to poor use of the instruction cache. I have heard of simple libraries having text segments shrink from megabytes to tens of kilobytes by revising or eliminating the use of templates.
  3. (The Java approach.) Box everything implicitly.
    This slows execution.
    Compared to the implementations the C programmer would have written or the C++ compiler would have generated, the Java code is smaller but less efficient in both time and space, because of all the implicit boxing and unboxing. A vector of bytes uses significantly more than one byte per byte. Trying to hide the boxing and unboxing may also complicate the type system. On the other hand, it probably makes better use of the instruction cache, and a vector of bytes can be written separately.

The generic dilemma is this: do you want slow programmers, slow compilers and bloated binaries, or slow execution times?

I would be happy to learn about implementations that somehow manage to avoid all three of these bad outcomes, especially if there were good written descriptions (papers, blog posts, etc.) about what was hard and why it's a good approach. I'd also be interested to see good written descriptions of trying one of those approaches and what went wrong (or right).

Please leave comments with pointers. Thanks.

by rsc (noreply@blogger.com) at December 03, 2009 05:00 PM

December 01, 2009

newsham

External costs of security advice

Thanks to Jesse for bringing this cool paper to my attention:
http://research.microsoft.com/en-us/um/people/cormac/papers/2009/SoLongAndNoThanks.pdf
In the model we set forward it is not users who need to be better educated on the risks of various attacks [...] but the security community. Security advice simply offers a bad cost-benefit tradeoff to users.
The paper describes how the cost of following security advice is often higher than the cost of the attacks they prevent.

by newsham (noreply@blogger.com) at December 01, 2009 07:55 PM

Overuse of surveillance

http://paranoia.dubfire.net/2009/12/8-million-reasons-for-real-surveillance.html
"Sprint Nextel provided law enforcement agencies with its customers' (GPS) location information over 8 million times between September 2008 and October 2009."
Do you feel violated yet?

by newsham (noreply@blogger.com) at December 01, 2009 06:18 PM

research!rsc

Go Data Structures: Interfaces

Go's interfaces—static, checked at compile time, dynamic when asked for—are, for me, the most exciting part of Go from a language design point of view. If I could export one feature of Go into other languages, it would be interfaces.

This post is my take on the implementation of interface values in the “gc” compilers: 6g, 8g, and 5g. Over at Airs, Ian Lance Taylor has written two posts about the implementation of interface values in gccgo. The implementations are more alike than different: the biggest difference is that this post has pictures.

Before looking at the implementation, let's get a sense of what it must support.

Usage

Go's interfaces let you use duck typing like you would in a purely dynamic language like Python but still have the compiler catch obvious mistakes like passing an int where an object with a Read method was expected, or like calling the Read method with the wrong number of arguments. To use interfaces, first define the interface type (say, ReadCloser):

type ReadCloser interface {
    Read(b []byte) (n int, err os.Error);
    Close();
}

and then define your new function as taking a ReadCloser. For example, this function calls Read repeatedly to get all the data that was requested and then calls Close:

func ReadAndClose(r ReadCloser, buf []byte) (n int, err os.Error) {
    for len(buf) > 0 && err == nil {
        var nr int;
        nr, err = r.Read(buf);
        n += nr;
        buf = buf[nr:len(buf)];
    }
    r.Close();
    return;
}

The code that calls ReadAndClose can pass a value of any type as long as it has Read and Close methods with the right signatures. And, unlike in languages like Python, if you pass a value with the wrong type, you get an error at compile time, not run time.

Interfaces aren't restricted to static checking, though. You can check dynamically whether a particular interface value has an additional method. For example:

type Stringer interface {
    String() string;
}

func ToString(any interface{}) string {
    if v, ok := any.(Stringer); ok {
        return v.String();
    }
    switch v := any.(type) {
    case int:
        return strconv.Itoa(v);
    case float:
        return strconv.Ftoa(v, 'g', -1);
    }
    return "???";
}

The value any has static type interface{}, meaning no guarantee of any methods at all: it could contain any type. The “comma ok” assignment inside the if statement asks whether it is possible to convert any to an interface value of type Stringer, which has the method String. If so, the body of that statement calls the method to obtain a string to return. Otherwise, the switch picks off a few basic types before giving up. This is basically a stripped down version of what the fmt package does. (The if could be replaced by adding case Stringer: at the top of the switch, but I used a separate statement to draw attention to the check.)

As a simple example, let's consider a 64-bit integer type with a String method that prints the value in binary and a trivial Get method:

type Binary uint64

func (i Binary) String() string {
    return strconv.Uitob64(i.Get(), 2)
}

func (i Binary) Get() uint64 {
    return uint64(i)
}

A value of type Binary can be passed to ToString, which will format it using the String method, even though the program never says that Binary intends to implement Stringer. There's no need: the runtime can see that Binary has a String method, so it implements Stringer, even if the author of Binary has never heard of Stringer.

These examples show that even though all the implicit conversions are checked at compile time, explicit interface-to-interface conversions can inquire about method sets at run time. “Effective Go” has more details about and examples of how interface values can be used.

Interface Values

Languages with methods typically fall into one of two camps: prepare tables for all the method calls statically (as in C++ and Java), or do a method lookup at each call (as in Smalltalk and its many imitators, JavaScript and Python included) and add fancy caching to make that call efficient. Go sits halfway between the two: it has method tables but computes them at run time. I don't know whether Go is the first language to use this technique, but it's certainly not a common one. (I'd be interested to hear about earlier examples; leave a comment below.)

As a warmup, a value of type Binary is just a 64-bit integer made up of two 32-bit words (like in the last post, we'll assume a 32-bit machine; this time memory grows down instead of to the right):

Interface values are represented as a two-word pair giving a pointer to information about the type stored in the interface and a pointer to the associated data. Assigning b to an interface value of type Stringer sets both words of the interface value.

(The pointers contained in the interface value are gray to emphasize that they are implicit, not directly exposed to Go programs.)

The first word in the interface value points at what I call an interface table or itable (pronounced i-table; in the runtime sources, the C implementation name is Itab). The itable begins with some metadata about the types involved and then becomes a list of function pointers. Note that the itable corresponds to the interface type, not the dynamic type. In terms of our example, the itable for Stringer holding type Binary lists the methods used to satisfy Stringer, which is just String: Binary's other methods (Get) make no appearance in the itable.

The second word in the interface value points at the actual data, in this case a copy of b. The assignment var s Stringer = b makes a copy of b rather than point at b for the same reason that var c uint64 = b makes a copy: if b later changes, s and c are supposed to have the original value, not the new one. Values stored in interfaces might be arbitrarily large, but only one word is dedicated to holding the value in the interface structure, so the assignment allocates a chunk of memory on the heap and records the pointer in the one-word slot. (There's an obvious optimization when the value does fit in the slot; we'll get to that later.)

To check whether an interface value holds a particular type, as in the type switch above, the Go compiler generates code equivalent to the C expression s.tab->type to obtain the type pointer and check it against the desired type. If the types match, the value can be copied by by dereferencing s.data.

To call s.String(), the Go compiler generates code that does the equivalent of the C expression s.tab->fun[0](s.data): it calls the appropriate function pointer from the itable, passing the interface value's data word as the function's first (in this example, only) argument. You can see this code if you run 8g -S x.go (details at the bottom of this post). Note that the function in the itable is being passed the 32-bit pointer from the second word of the interface value, not the 64-bit value it points at. In general, the interface call site doesn't know the meaning of this word nor how much data it points at. Instead, the interface code arranges that the function pointers in the itable expect the 32-bit representation stored in the interface values. Thus the function pointer in this example is (*Binary).String not Binary.String.

The example we're considering is an interface with just one method. An interface with more methods would have more entries in the fun list at the bottom of the itable.

Computing the Itable

Now we know what the itables look like, but where do they come from? Go's dynamic type conversions mean that it isn't reasonable for the compiler or linker to precompute all possible itables: there are too many (interface type, concrete type) pairs, and most won't be needed. Instead, the compiler generates a type description structure for each concrete type like Binary or int or func(map[int]string). Among other metadata, the type description structure contains a list of the methods implemented by that type. Similarly, the compiler generates a (different) type description structure for each interface type like Stringer; it too contains a method list. The interface runtime computes the itable by looking for each method listed in the interface type's method table in the concrete type's method table. The runtime caches the itable after generating it, so that this correspondence need only be computed once.

In our simple example, the method table for Stringer has one method, while the table for Binary has two methods. In general there might be ni methods for the interface type and nt methods for the concrete type. The obvious search to find the mapping from interface methods to concrete methods would take O(ni × nt) time, but we can do better. By sorting the two method tables and walking them simultaneously, we can build the mapping in O(ni + nt) time instead.

Memory Optimizations

The space used by the implementation described above can be optimized in two complementary ways.

First, if the interface type involved is empty—it has no methods—then the itable serves no purpose except to hold the pointer to the original type. In this case, the itable can be dropped and the value can point at the type directly:

Whether an interface type has methods is a static property—either the type in the source code says interface{} or it says interace{ methods... }—so the compiler knows which representation is in use at each point in the program.

Second, if the value associated with the interface value can fit in a single machine word, there's no need to introduce the indirection or the heap allocation. If we define Binary32 to be like Binary but implemented as a uint32, it could be stored in an interface value by keeping the actual value in the second word:

Whether the actual value is being pointed at or inlined depends on the size of the type. The compiler arranges for the functions listed in the type's method table (which get copied into the itables) to do the right thing with the word that gets passed in. If the receiver type fits in a word, it is used directly; if not, it is dereferenced. The diagrams show this: in the Binary version far above, the method in the itable is (*Binary).String, while in the Binary32 example, the method in the itable is Binary32.String not (*Binary32).String.

Of course, empty interfaces holding word-sized (or smaller) values can take advantage of both optimizations:

Method Lookup Performance

Smalltalk and the many dynamic systems that have followed it perform a method lookup every time a method gets called. For speed, many implementations use a simple one-entry cache at each call site, often in the instruction stream itself. In a multithreaded program, these caches must be managed carefully, since multiple threads could be at the same call site simultaneously. Even once the races have been avoided, the caches would end up being a source of memory contention.

Because Go has the hint of static typing to go along with the dynamic method lookups, it can move the lookups back from the call sites to the point when the value is stored in the interface. For example, consider this code snippet:

1   var any interface{};  // initialized elsewhere
2   s := any.(Stringer);  // dynamic conversion
3   for i := 0; i < 100; i++ {
4       fmt.Println(s.String());
5   }

In Go, the itable gets computed (or found in a cache) during the assignment on line 2; the dispatch for the s.String() call executed on line 4 is a couple of memory fetches and a single indirect call instruction.

In contrast, the implementation of this program in a dynamic language like Smalltalk (or JavaScript, or Python, or ...) would do the method lookup at line 4, which in a loop repeats needless work. The cache mentioned earlier makes this less expensive than it might be, but it's still more expensive than a single indirect call instruction.

Of course, this being a blog post, I don't have any numbers to back up this discussion, but it certainly seems like the lack of memory contention would be a big win in a heavily parallel program, as is being able to move the method lookup out of tight loops. Also, I'm talking about the general architecture, not the specifics o the implementation: the latter probably has a few constant factor optimizations still available.

More Information

The interface runtime support is in $GOROOT/src/pkg/runtime/iface.c. There's much more to say about interfaces (we haven't even seen an example of a pointer receiver yet) and the type descriptors (they power reflection in addition to the interface runtime) but those will have to wait for future posts.

Code

Supporting code (x.go):

package main

import (
 "fmt";
 "strconv";
)

type Stringer interface {
 String() string
}

type Binary uint64

func (i Binary) String() string {
 return strconv.Uitob64(i.Get(), 2)
}

func (i Binary) Get() uint64 {
 return uint64(i)
}

func main() {
 b := Binary(200);
 s := Stringer(b);
 fmt.Println(s.String());
}

Selected output of 8g -S x.go:

0045 (x.go:25) LEAL    s+-24(SP),BX
0046 (x.go:25) MOVL    4(BX),BP
0047 (x.go:25) MOVL    BP,(SP)
0048 (x.go:25) MOVL    (BX),BX
0049 (x.go:25) MOVL    20(BX),BX
0050 (x.go:25) CALL    ,BX

The LEAL loads the address of s into the register BX. (The notation n(SP) describes the word in memory at SP+n. 0(SP) can be shortened to (SP).) The next two MOVL instructions fetch the value from the second word in the interface and store it as the first function call argument, 0(SP). The final two MOVL instructions fetch the itable and then the function pointer from the itable, in preparation for calling that function.

by rsc (noreply@blogger.com) at December 01, 2009 05:00 PM

November 30, 2009

research!rsc

Go Data Structures

When explaining Go to new programmers, I've found that it often helps to explain what Go values look like in memory, to build the right intuition about which operations are expensive and which are not. This post is about basic types, structs, arrays, and slices.

Basic types

Let's start with some simple examples:

The variable i has type int, represented in memory as a single 32-bit word. (All these pictures show a 32-bit memory layout; in the current implementations, only the pointer gets bigger on a 64-bit machine—int is still 32 bits—though an implementation could choose to use 64 bits instead.)

The variable j has type int32, because of the explicit conversion. Even though i and j have the same memory layout, they have different types: the assignment i = j is a type error and must be written with an explicit conversion: i = int(j).

The variable f has type float, which the current implementations represent as a 32-bit floating-point value. It has the same memory footprint as the int32 but a different internal layout.

Structs and pointers

Now things start to pick up. The variable bytes has type [5]byte, an array of 5 bytes. Its memory representation is just those 5 bytes, one after the other, like a C array. Similarly, primes is an array of 4 ints.

Go, like C but unlike Java, gives the programmer control over what is and is not a pointer. For example, this type definition:

type Point struct { X, Y int }

defines a simple struct type named Point, represented as two adjacent ints in memory.

The composite literal syntax Point{10, 20} denotes an initialized Point. Taking the address of a composite literal denotes a pointer to a freshly allocated and initialized Point. The former is two words in memory; the latter is a pointer to two words in memory.

Fields in a struct are laid out side by side in memory.

type Rect1 struct { Min, Max Point }
type Rect2 struct { Min, Max *Point }

Rect1, a struct with two Point fields, is represented by two Points—four ints—in a row. Rect2, a struct with two *Point fields, is represented by two *Points.

Programmers who have used C probably won't be surprised by the distinction between Point fields and *Point fields, while programmers who have only used Java or Python (or ...) may be surprised by having to make the decision. By giving the programmer control over basic memory layout, Go provides the ability to control the total size of a given collection of data structures, the number of allocations, and the memory access patterns, all of which are important for building systems that perform well.

Strings

With those preliminaries, we can move on to more interesting data types.

(The gray arrows denote pointers that are present in the implementation but not directly visible in programs.)

A string is represented in memory as a 2-word structure containing a pointer to the string data and a length. Because the string is immutable, it is safe for multiple strings to share the same storage, so slicing s results in a new 2-word structure with a potentially different pointer and length that still refers to the same byte sequence. This means that slicing can be done without allocation or copying, making string slices as efficient as passing around explicit indexes.

(As an aside, there is a well-known gotcha in Java and other languages that when you slice a string to save a small piece, the reference to the original keeps the entire original string in memory even though only a small amount is still needed. Go has this gotcha too. The alternative, which we tried and rejected, is to make string slicing so expensive—an allocation and a copy—that most programs avoid it.)

Slices

A slice is a reference to a section of an array. In memory, it is a 3-word structure contaning a pointer to the first element, the length of the slice, and the capacity. The length is the upper bound for indexing operations like x[i] while the capacity is the upper bound for slice operations like x[i:j].

Like slicing a string, slicing an array does not make a copy: it only creates a new structure holding a different pointer, length, and capacity. In the example, evaluating the composite literal []int{2, 3, 5, 7, 11} creates a new array containing the five values and then sets the fields of the slice x to describe that array. The slice expression x[1:3] does not allocate more data: it just writes the fields of a new slice structure to refer to the same backing store. In the example, the length is 2—y[0] and y[1] are the only valid indexes—but the capacity is 4—y[0:4] is a valid slice expression. (See Effective Go for more about length and capacity and how slices are used.)

Because slices are multiword structures, not pointers, the slicing operation does not need to allocate memory, not even for the slice header, which can usually be kept on the stack. This representation makes slices about as cheap to use as passing around explicit pointer and length pairs in C. Go originally represented a slice as a pointer to the structure shown above, but doing so meant that every slice operation allocated a new memory object. Even with a fast allocator, that creates a lot of unnecessary work for the garbage collector, and we found that, as was the case with strings above, programs avoided slicing operations in favor of passing explicit indices. Removing the indirection and the allocation made slices cheap enough to avoid passing explicit indices in most cases.

New and Make

Go has two data structure creation functions: new and make. The distinction is a common early point of confusion but seems to quickly become natural. The basic distinction is that new(T) returns a *T, a pointer that Go programs can dereference implicitly (the black pointers in the diagrams), while make(T, args) returns an ordinary T, not a pointer. Often that T has inside it some implicit pointers (the gray pointers in the diagrams). New returns a pointer to zeroed memory, while make returns a complex structure.

There is a way to unify these two, but it would be a significant break from the C and C++ tradition: define make(*T) to return a pointer to a newly allocated T, so that the current new(Point) would be written make(*Point). We tried this for a few days but decided it was too different from what people expected of an allocation function.

Coming soon...

This has already gotten a bit long. Interface values, maps, and channels will have to wait for future posts.

by rsc (noreply@blogger.com) at November 30, 2009 06:04 AM

November 28, 2009

yiyus

Going Forth

Going Forth

I have spent the last months discovering Forth: learning about it, reading about it, trying 4th, porting pForth to Plan9,... It is a language that I love, since I spent many hours programming my HP49G calculator and I always was a fan of the RPN syntax and simplicity in general.

My last discovery has been retroForth, a minimalistic Forth which runs on a portable virtual machine, Ngaro. When some days ago Google the Go-team presented their new language I was watching Rob's talk and could not stop thinking about launching Ngaro virtual machines in goroutines communicated through channels. So... that's what I did:

With the help of crcx (the original Ngaro author, many thanks!) and after experimenting with a brainfuck virtual machine the porting process was pretty straightforward. The retroForth image is booting without problems and you can already do some fancy things like launching children VMs and communicate their I/O ports through channels. It is a WIP yet. More to come...

Have fun!

by http at November 28, 2009 12:23 AM

November 24, 2009

research!rsc

Is This Thing On?

I stopped updating my blog eighteen months ago, because I needed to focus on finishing my thesis. Once I did that, I graduated and moved to California and started working on and with Go, which I couldn't write about. Now that Go has been released, I have a variety of things I'd like to say involving Go so I'm reviving the blog. Enjoy.

by rsc (noreply@blogger.com) at November 24, 2009 05:51 PM

November 23, 2009

newsham

plan9+olpc - usb and root filesystem

With some hackery (had to hack up the boot process a little) I am now able to boot plan9 with a root filesystem on my olpc. The USB disk stuff is still a little rough around the edges, so I had to do some work to get the filesystem partitioned and installed on a thumb drive, then patch up the boot process to reread in the partition tables. After that, its just a normal boot, enter the root filesystem path, and drop to a shell. The system is complete enough to build kernels on it, now, however its really really slow. I definitely need networking on this thing so I can make use of a cpu server and possibly a network filesystem instead of a flash based one.

by newsham (noreply@blogger.com) at November 23, 2009 04:30 AM

November 22, 2009

newsham

Too big to fail

Choice stuff.  "today what we are doing is modernizing the financial services industry, tearing down these antiquated laws and granting banks significant new authority. This will save consumers billions of dollars a year through enhanced competition" - Bill Clinton, 10 years ago, as he repealed the Glass-Steagall act. *sigh* http://www.rollingstone.com/nationalaffairs/index.php/2009/11/12/a-decade-without-guardrails-live-without-glass-steagall/

by newsham (noreply@blogger.com) at November 22, 2009 04:37 PM

Joshix

Sp v9

The ninth incarnation of the sp theme for Habari blogs is available with a git(1) clone or pull from git://labs.utopian.net/sp.git. It works atop the latest developer revisions of Habari. An irregularly-attended gzipped tar archive exists.

v9 scrapes away the "Gesso" CSS framework in favor of using the latest (r18) Boilerplate sources directly. Print styles developed in the fork survive in print.css. Larger fonts in the sidebar are the most noticeable of visible changes that also include minute differences in vertical spacing.

by Josh at November 22, 2009 10:28 AM

newsham

USB for OLPC+plan9

Thanks to some hints from fgb I was able to get USB working in plan9 on my OLPC.  The problem was that the OLPC uses the "Geode companion" chip which provides a bunch of devices, including USB, and this chip pretends to have a PCI interface, but doesnt have the PCI configuration registers. The simple solution is to just fake it with a table of configuration register values. A small hack to the pci code and now the devices show up in the pci driver and USB just works. I think this means I can put a filesystem on a USB thumb drive, boot from the thumb drive right into a full plan9 system (albeit without networking or frame buffer).

by newsham (noreply@blogger.com) at November 22, 2009 04:01 AM

November 21, 2009

Rob Pike

WD-40

<object height="344" width="425"><param name="movie" value="http://www.youtube.com/v/dQHXQ7ztla8&amp;hl=en_US&amp;fs=1&amp;rel=0"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="344" src="http://www.youtube.com/v/dQHXQ7ztla8&amp;hl=en_US&amp;fs=1&amp;rel=0" type="application/x-shockwave-flash" width="425"></embed></object>

by rob (noreply@blogger.com) at November 21, 2009 10:38 PM

November 20, 2009

newsham

This is just bizarre

Its hard to believe that this could even be true:  http://news.bbc.co.uk/2/hi/americas/8369674.stm
"Four people have been arrested in Peru on suspicion of killing dozens of people in order to sell their fat and tissue for cosmetic uses in Europe." "The liquidised product fetched $15,000 (£9,000) a litre and police suspect it was sold on to companies in Europe" "Gen Felix Burga, head of Peru's police criminal division, said there were indications that "an international network trafficking human fat" was operating from Peru."

by newsham (noreply@blogger.com) at November 20, 2009 09:09 PM

November 18, 2009

newsham

Best headline ever


You couldn't make this stuff up if you tried:

Rabbi 'offered cocaine for sex'

http://news.bbc.co.uk/2/hi/uk_news/england/manchester/8367085.stm

So lets recap the last few years in religeous extra-curricular activities:  catholic priests like wine and altar boys. evangalist pastors prefer meth and gay hookers and rabbis coke and women.  I think the chosen people may have something worth looking into, here.
(I could make an offensive, stereotypical and culturally insensitive remark about how he cut the drugs to increase his profits, but I won't bother ;-)


by newsham (noreply@blogger.com) at November 18, 2009 06:22 PM

more on MSFT XBOX bricking

I briefly mentioned earlier that MSFT disabled a bunch of XBOXs. I didn't know the scale of it. It turns out that an estimated ONE MILLION units were disabled. Since the entire population of XBOX live users is just 20M, thats a significant chunk of paying customers. I'm quite frankly amazed that MSFT thinks that pissing off this many of its customers is a good idea. I wonder how many of them will just give up and move on to competing products. While some of these users were definitely playing illegally obtained games, the users still purchase consoles, occasionally purchase games and contribute to the XBOX community. If I was Sony I'd be running commercials inviting former XBOX users over. Who wouldn't want 1M of their competitor's customers?

by newsham (noreply@blogger.com) at November 18, 2009 05:54 PM

November 17, 2009

Inferno Programmer's Notebook

lab 104 - ducts: bi-directional mount channels

NAME

lab 104 - ducts: bi-directional mount channels

NOTES

Hi there, I want to talk to you about ducts. Are your ducts old fashion, out of date?

In this lab, I'll walk through a little facility I built to allow me to have bi-directional mounts over a single file descriptor channel. The nature of the way I access my Blue Gene environment requires me to do an awful lot over a single communications channel. I wanted the ability to both access the target node's file system while simultaneously exporting mine, while only using a single file descriptor.

To facilitate this, I wrote a little 9P-knowledgeable multiplexor which directs T-msgs to an export thread and R-msgs to a mount thread while multiplexing the responses from those threads back onto the same file descriptor.

The initial implementation of the base mechanism was relatively simple until I had to start considering how to clean things up when I started unmounting directories on one side or both. The liberal use of sys->pipe throughout the code meant it was somewhat difficult to detect when things were finished. I ended up adding a signaling packet to the protocol to notify the remote Duct when I had closed my mount, and then implemented a channel based monitor infrastructure so I could track what was still running and appropriately shut down the various pieces as things unmounted (it is possible to unmount one side without unmounting the other).

The interface is fairly simple, if using the module interface you spawn an uplink, specifying an input file descriptor and an output file descriptor for the connection (which may be the same FD), the path you wish to export and the mount point for where you want to place the imported file hierarchy.

If you are running it from the command line, it uses stdin and stdout for the input and output FDs and you only need specify the export path and mount point.

EXAMPLES

host1% listen -A tcp!*!1127 duct /usr/ericvh /n/foo

host2% dial -A tcp!host1!1127 duct /usr/inferno /n/bar

FILES

  • 104/duct.m
  • 104/duct.b

by Eric Van Hensbergen (noreply@blogger.com) at November 17, 2009 02:37 AM

lab 101 - limbo B+ tree

NAME

lab 101 - limbo B+ tree

NOTES

This lab is an implementation of a B+ tree. It is code from a much older lab that was overly complicated and buggy. I pulled the B+ tree code out, tried to fix the bugs and clean it up. The interface is roughly the same as dbm(2). Here is a synopsis.

include "btree.m";
btreem := load Btreem Btreem->PATH;
Datum, Btree: import Btreem;

Btree: adt {
 create: fn(file: string, perm: int): ref Btree;
 open: fn(file: string, flags: int): ref Btree;
 
 fetch: fn(b: self ref Btree, key: Datum): Datum;
 delete: fn(b: self ref Btree, key: Datum): int;
 store: fn(b: self ref Btree, key: Datum, val: Datum):int;

 firstkey: fn(b: self ref Btree): Datum;
 nextkey: fn(b: self ref Btree, key: Datum): Datum;

 flush: fn(b: self ref Btree);
 close: fn(b: self ref Btree);
};

init: fn();

Like Dbm the keys and values are stored as arrays of bytes, and being a B+tree the values are stored only in the leaf nodes. The maximum key and val size is 255 each. The block size is 8192, so the maximum leaf node size is 515 making the minimum branching factor 15. The maximum number of records in an internal nodes assuming an int as key is 630.

The delete is not fully implemented; it doesn't merge nodes.

Here is some example code listing the full contents of a btree.

sys := load Sys Sys->PATH;
btreem := load Btreem Btreem->PATH;
Datum, Btree: import btreem;

btreem->init();
bt := Btree.open("index.bt", Sys->ORDWR);
for(key := bt.firstkey(); key != nil; key = bt.nextkey(key)){
 v := bt.fetch(key);
 sys->print("%s %s\n", string key, string v);
}

Here is a rough comparison of btree vs. dbm. This simple test is more of a sanity check that the btree doesn't do anything horribly wrong in its implementation (which it did originally, by making a syscall to get the daytime for every block it tried to get).

% awk '{print $1, NR}' < /lib/words | tr -d 'cr' > t1

% >index.bt
% >dbm.pag
% >dbm.dir

% time sh -c 'btree/store < t1' 
0l 4.172r 4.172t

% time sh -c 'dbm/store dbm < t1'
0l 35.25r 35.25t

% ls -l dbm.* index.bt
--rw-rw---- M 6 xcs0998 XCS0998    8192 Jun 04 12:38 dbm.dir
--rw-rw---- M 6 xcs0998 XCS0998 1048576 Jun 04 12:38 dbm.pag
--rw-rw---- M 6 xcs0998 XCS0998 770048 Jun 04 12:36 index.bt

% time sh -c 'btree/list > /dev/null'
0l 5.187r 5.187t

% time sh -c 'dbm/list dbm > /dev/null'
0l 9.438r 9.438t

FILES

inferno-lab/101

by caerwyn (noreply@blogger.com) at November 17, 2009 02:35 AM

November 15, 2009

newsham

making stairs fun

This is pretty cool (thanks candra!).

<object height="295" width="480"><param name="movie" value="http://www.youtube.com/v/2lXh2n0aPyw&amp;hl=en_US&amp;fs=1&amp;rel=0"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="295" src="http://www.youtube.com/v/2lXh2n0aPyw&amp;hl=en_US&amp;fs=1&amp;rel=0" type="application/x-shockwave-flash" width="480"></embed></object>

by newsham (noreply@blogger.com) at November 15, 2009 07:29 PM

sunday news

"Some critics say a civilian trial -- instead of a military tribunal -- for self-proclaimed Sept. 11 mastermind Khalid Sheikh Mohammed and his accomplices could end up targeting the Bush administration and its anti-terror policies."  God, that would rule.  Win Win!  (http://www.foxnews.com/politics/2009/11/14/view-pending-trial-attempt-prosecute-bush-administration/)


Some company tried to make a mac clone a few years ago and got sued.  Well it looks like the trial is over and the court is saying that AAPL is allowed to keep its OS from working on competitors hardware.  (http://blogs.zdnet.com/hardware/?p=6147)  Thats kind of sad, the competition would have done consumers and OS X some good (although probably hurt AAPL's hardware sales).  The ruling basically says that Apple can enforce its EULA.  *sigh* those EULAs need to die. Its sad seeing the court uphold these things. Apple also made their latest version of OS X fail on some of the "hackintoshes" that people have been building lately (when people buy a non-apple computer and purchase a copy of OS X and install it on that computer).  Way to treat your consumers, apple.

by newsham (noreply@blogger.com) at November 15, 2009 05:58 PM