by renee (noreply@blogger.com) at February 09, 2010 11:53 AM
sausage skin kingThe day that the leadership of the military comes to me and says "senator, we ought to change the policy" then I think we ought to consider seriously changing it.But today when the leadership of the military asks to repeal don't ask/don't tell:
I'm deeply disappointed in your statement [...] In this moment of immense hardship for the armed services we should not be looking to overturn don't ask/don't tell.There's also a good op-ed by John Oliver on why we should have a don't ask/don't tell policy on people approaching old age.
| The Daily Show With Jon Stewart | Mon - Thurs 11p / 10c | |||
| A Few Gay Men & Women | ||||
| <embed allowfullscreen="true" allownetworking="all" allowscriptaccess="always" bgcolor="#000000" flashvars="autoPlay=false" height="301" src="http://media.mtvnservices.com/mgid:cms:item:comedycentral.com:263464" style="display: block;" type="application/x-shockwave-flash" width="360" wmode="window"></embed> | ||||
| ||||
by newsham (noreply@blogger.com) at February 04, 2010 06:08 PM

by newsham (noreply@blogger.com) at February 04, 2010 05:48 PM
Every programmer has a variable naming philosophy. This is mine:
A name's length should not exceed its information content. For a local variable, the name
iconveys as much information asindexoridxand is quicker to read. Similarly,iandjare a better pair of names for index variables thani1andi2(or, worse,index1andindex2), because they are easier to tell apart when skimming the program. Global names must convey relatively more information, because they appear in a larger variety of contexts. Even so, a short, precise name can say more than a long-winded one: compareacquireandtake_ownership. Make every name tell.
The information content metric gives a quantitative argument against long-winded names: they're simply inefficient. I internalized this metric years ago but only realized this phrasing of it recently, perhaps because I have been looking at too much Java code.
by newsham (noreply@blogger.com) at February 03, 2010 03:34 AM
by newsham (noreply@blogger.com) at February 02, 2010 07:54 PM
Go organizes programs into individual pieces called packages.
A package gets to pick a short name for itself, like vector,
even if the import statement must use a longer path like "container/vector"
to name the file where the compiled code is installed.
The early Go compilers used the package name as a unique identifier
during linking, so that vector's New function
could be distinguished from list's New.
In the final binary, one was vector.New and the other list.New.
As we started to fill out the standard library, it became clear that we needed
to do something about managing the package name space:
if multiple packages tried to be package vector,
their symbols would collide in the linker.
For a while we considered segmenting the name space,
reserving lower-case names for standard packages
and upper-case names for local packages.
(Since package names and object file names are
conventionally the same, one reason not to do this
is that it would require a case-sensitive file system.)
Other languages simply use longer names.
Both Java and Python tie the name to the directory in which
the package is found, as in com.java.google.WebServer
for the code in com/java/google/WebServer.class.
In practice this leads to unnecessarily long identifiers, something
Go tries to avoid.
It also ties the name to a particular mechanism for finding code: a file system.
One of the reasons that import paths are string constants in Go is
so that it is easy to substitute other notations, like URLs.
Last spring, during a long discussion about how to divide up the package name space, Robert Griesemer cut the Gordian knot by suggesting that we allow multiple packages to choose a single name and fix the tool chain to cope. The import statement already allows introducing a local alias for the package during the import, so there's no linguistic reason package names have to be unique. We all agreed that this was the right approach, but we weren't sure how to implement it. Other considerations, like the open source release, took priority during most of 2009, but we recently returned to the problem.
Ultimately, the linker needs some unique name for each symbol in the program; the fundamental problem caused by deciding that package names won't be unique is to find another source of uniqueness that fits into the tool chain well.
The best approach* seems to be to use
the package's import path as the unique identifier,
since it must uniquely identify the package in the
import statement already.
Then container/vector's New is
container/vector.New.
But! When you're compiling a package,
how does the compiler know what the package's import path will be?
The package statement just says vector, and while
every compilation that imports "container/vector" knows the import path,
the compilation of vector itself does not,
because compilation is handled separately from
installing the binary in its final, importable location.
Last week I changed the gc compiler suite to do this.
My solution to the import path question
was to introduce a special
name syntax that refers to “this package's import path.”
Because the import paths
are string literals in the Go compiler metadata, I chose the empty string—""—as the
self-reference name.
Thus, in the object file for package vector, the local symbol
New is written "".New.
When the linker reads the object file, it knows what
import path it used to find the file. It substitutes that path
for the "", producing, in this case, the unique name
container/vector.New.
Not embedding a package's final installed location in its object file makes the object files easy to move and duplicate. For example, consider this trivial package:
package seq
var n int
func Next() int {
n++
return n
}
It's valid for a Go program to import the same path multiple times using different local names, but all the names end up referring to the same package:
package main
import (
"fmt"
s "seq" // changed to "seq1" later
t "seq"
)
func main() {
fmt.Println(s.Next(), s.Next(), t.Next(), t.Next())
}
prints 1 2 3 4, because it all four calls are to the same Next function:
$ 6g seq.go $ 6g -I. main.go $ 6l -L. main.6 $ 6.out 1 2 3 4 $
But if we change one of the imports to say "seq1"
and then merely copy the "seq" binary to "seq1",
we've created a distinct package, using lowly cp instead of a compiler:
$ cp seq.6 seq1.6 $ ed main.go 120 /seq s "seq" s/seq/seq1 s "seq1" wq 121 $ 6g -I. main.go $ 6l -L. main.6 $ 6.out 1 2 1 2 $
Now the s.Next calls refer to seq1.6's Next,
while the t.Next calls refer to seq.6's Next.
Duplicating the object actually duplicated the code.
This is very different from the behavior of a traditional C compiler and linker.
A digression: the explicit "". prefix is not strictly necessary.
It would be cleaner
if the linker treated every symbol as needing to be qualified by the
import path, so that all the "". could be dropped.
But occasionally it's important to be able to break the rules,
for example to define a symbol that is logically in one package
be implemented in another. For example,
the implementation of unsafe.Reflect is actually
in the binary for package runtime,
because that's where all the interface manipulation code lives:
$ 6nm pkg/darwin_amd64/runtime.a|grep Reflect iface.6: T unsafe.Reflect $
Another reason to use an explicit prefix is to admit
names with no prefix at all, as would be generated by legacy C code.
Otherwise, what should C's printf be in?
If the linker enforced a strict boundary between packages,
both of these examples would be impossible.
Most of the time that would be a good thing, but
systems languages do not have the luxury of stopping
at “most of the time.”
Last October, a few weeks before the public release of Go,
I changed the linker to insert import path qualifiers on all names
during linking, but it was too disruptive a change to commit
before the release.
Last week's implementation, which allows for semipermeable package boundaries,
is a much better fit for Go.
This week Ian Lance Taylor is working on eliminating the
global package name space assumption in gccgo.
He'd like to avoid making changes to the linker,
which rules out introducing a “this package” notation like "".
Gccgo must be able to write objects that know their own import paths,
which means gccgo must know the import path at compile time.
But how? There will be a new gccgo command line option, and the build system
will simply tell the compiler what the import path is.
In retrospect, I wonder if the effort of "" in
the gc tool chain was justified compared to adding an option.
The gc implementation is easier to use,
but it's not clear how important that will be.
Time will tell.
* An alternative approach would be to generate a
random identifier each time the compiler
is invoked and to use it for the package compiled by that run.
When other packages import the compiled package, they can
read the identifier and use it to generate references to that
package's symbols.
The most glaring problem with this approach is that
the symbol names you'd see while debugging would
be ugly, like mangled C++ names but worse.
Another problem is that it would break aggressive
incremental compilation: if fmt gets recompiled,
all packages that import it would have to be recompiled to pick
up the new identifier, even if the external interface hadn't changed.
It would be nice to avoid those recompilations, especially in large programs.
by Eric Van Hensbergen (noreply@blogger.com) at January 28, 2010 03:27 PM
Now that recent projects like writable /proc/pid/ns and rootless post kernel load startup have been announced on 9fans, its time to make a new version of the Qemu preinstalled image with all the new stuff. It will include the latest updates of the previous grid software as well as hubfs, grio, and the new startup process - all available optionally, along with the traditional default kernels and configs. However, I am going to try to make the customized environment as compelling as possible, because I'd like to make a good case for the idea that using multiple system nodes, virtualized or not, has practical benefits. Some goals:
multi-machine aware startup to bind in fileserver and cpu resources early if available
easier to make use of 9gridchan.org hosted resources - and 9gridchan registry needs more real services
easy to set up highly unconventional namespace structures coexisting on a single machine (rootless boot helps)
simple, scripted data replication and machine backup operations
easy bootstrapping from grid of VMs to physical machines or mixed grid
Release goal is by the end of January, with a rough spin available for interested testers a week before or so.
Since this has been up on sources for several months now, I really should have announced this here already. Hubfs is an improved screen-like tool for Plan 9, implemented as a 9p fs with client. It is much more resource efficient than iosrv and, as a 9p fs, much better integrated into the overall system. The user interface is also much more sane. It can be installed from the source tarball contrib/mycroftiv/hubfs.tgz, or with contrib/install mycroftiv/hubfs. Basic usage is just 'hub NAME' to either start a new shell/hub session or connect to an existing. The manpage explains most of the details. Following completion of a solid version of hubfs I hit the burnout threshold as a result of several months of 16 hours a day of Plan 9 and retreated to a remote corner of namespace for a couple months. With the arrival of the winter months, however, I have a good excuse to turn on all my computers in the service of home heating, and undoubtedly this will result in a renewed burst of development energy.
After several weeks of isolation in their underground lab, the /9/grid's craziest researchers are now unleashing upon the world our latest weapon in the struggle for an open collaborative decentralized worldwide grid:
Iosrv.tgz
Available on Bell Labs sources - contrib/install mycroftiv/iosrv or as a tarball contrib/mycroftiv/iosrv.tgz to mk install at your pleasure, or here - this is what happened when we decided to get serious about screen-like functionality for Plan 9 while also trying to pursue some tantalizing hints about some of Doug McIlroy's original ideas that led to the creation of UNIX pipes. Iosrv (which is controlled via a wrapper scrip called io for both the clients and server) does not emulate a terminal in the manner GNU screen does - instead, it follows the Plan 9 principle of staying close to file i/o fundamentals and provides multiplexed buffered pipes that can be attached to the three standard file descriptors used by textual environment programs such as the rc shell.
As a consequence of this, we have some new tricks, and here's the best one:
The clients attached to an iosrv can all share locally executing shells with each other in addition to using the shells on the iosrv host
The standard model of usage that corresponds to how GNU screen works is for clients to attach to an existing iosrv in an imported /srv (started with a command like io rcjul15 on the iosrv remote host ) using the io wrapper script:
io SRVNAME
This command connects to an existing /srv/srvname. Once connected, additional commands will be trapped by the client. To create and attach to new shells hosted by the iosrv remote machine with the command remote # where '#' is a multiple of 3. The first new shell might be created with:
remote 3
followed by subsequent shells created by replacing '3' with successively larger multiples of 3 as the number of backed shells increases. (Each shell uses 3 file descriptors) To move between shells after they have been created, attach # will connect to previously created rc shell Hubs.
attach 0
will move back to the set of file descriptors beginning at Hub 0. (More on Hubs in a moment). The new trick is to start a new shell like this:
local 6
which will start a new rc shell on the CLIENT machine, but connect its i/o file descriptors to the remote iosrv HOST. The consequence is that the locally executing rc becomes shared back to the iosrv, and other clients of the remote host can attach to it using
attach 6
and the shell running on your machine is available to them exactly as a shell running on the iosrv host. In other words, iosrv really does 'serve i/o' and doesn't care what's considered the client or server from the viewpoint of the user, it just pipes data back and forth to whatever its pipe Hubs are connected to.
Ok, there's that word again - Hub - the Hub is the low level data structure iosrv uses to organize connections. Following Pike's classic dictum that "data dominates", we are offering a new variant of an old abstraction, the pipe. A Hub is a pipe, but it has multiple nozzles on each end. It accepts multiple simultaneous readers and writers and allows them to connect and disconnect independently to and from the flow of data in real time. The core iosrv knows nothing of rc, or shells, it simply manages a set of file descriptors connected to Hubs. A standard shell has 3 open file descriptors, so 3 Hubs allow an arbitrary number of clients to all make use of the same shell. Adding a new shell simply means creating 3 new Hubs and then starting an rc using file descriptors managed by that iosrv. This why standard attach, remote, and local commands always take a multiple of 3 as their numeric parameter.
On the client side, a smaller program called ioshell navigates the forest of piped file descriptors for the user, acting to connect the file descriptors of the initiating client to the Hubs provided by iosrv. It also monitors user input for command strings and then takes actions or passes them to iosrvs ctl file as appropriate.
The iosrv is completely parallel, forking off independent processes for reading and writing each connected file descriptor. It is not uncommon for a busy iosrv to be coordinating 50-100 running processes and holding double that many file descriptors. This will not however clog the host system, because all processes use QLocks and/or blocking reads to control execution, yield execution frequently, and each Hub has individually tunable process cycle sleeptime. Testing shows that even a low resource Qemu based virtual machine can easily handle multiple simultaneous clients and servers to multiple local and remote rc shells.
The iosrv also provides fine-grained control over resources for the user, although the interface to this is a work in progress. Using iosrv for screen-like functionality is its main intended purpose, but the design is deliberately open-ended. It is already possible to use iosrv as a kind of greatly expanded 'tee' for data streams and we will be trying to prototype some multinode applications that pipe data around in constant loops between them.
We have been using iosrv heavily and testing it extensively. It is not perfectly polished but we have found it to be stable, reliable, and functional in its current state. Suggestions for features and code improvements always welcome.
Look for shared public shell exports as services in the 9gridchan.org 9p service registry soon, also.
This is an extension of the earlier post - this is a pair of scripts intended to be used on nodes setup similarly to those described in 'reversed remote execution' - it allows shared terminals to have persistent state in the background with the ability to be pushed onto the remote clients (which are providing rio as a service).
#! /bin/rc
# start up an rc with its i/o connected to pipes set
# this monitors and its window should be set to scroll if not backgrounded
mkdir /tmp/wpin
mkdir /tmp/wpinclone
mkdir /tmp/wpout1
mkdir /tmp/wpout2
mkdir /tmp/wpout3
bind '#|' /tmp/wpin
bind '#|' /tmp/wpinclone
bind '#|' /tmp/wpout1
bind '#|' /tmp/wpout2
bind '#|' /tmp/wpout3
window -m
rc <{tee /tmp/wpinclone/data < /tmp/wpin/data1} | tee /tmp/wpout1/data /tmp/wpout2/data /tmp/wpout3/data
#! /bin/rc
# attach to a copy of grc using wpout $1 and push to a second wsys also
mount $wsys /mnt/wsys new
bind -b /mnt/wsys /dev
read -m /tmp/wpout$1/data1 | tee /dev/cons &
read -m /dev/cons | tee /tmp/wpin/data &
cat >> /tmp/wpin/data
The usage model is probably not exactly transparent from these scripts - the first script is named grc and it creates a set of pipes and starts an rc with its input and output redirected to them. This is what creates the 'screen' like functionality - any number of processes can control the shell by writing to /tmp/wpin/data, and as many clients can read the output as there are teed output pipes. Window -m is started after binds to provide a seed window - you will probably want to start more than one additional window -m to preserve access to the pipes in the namespace.
The second script I name getgrc and it assumes that you have set $wsys to a traget window system, then starts a new window on it and connects the i/o of both the remote and the local window to the specified output pipe. It should be noted that the quasi-screen functionality is independent of the idea of mounting a remote window system and using i/o from both/either location. If you want a simple screen equivalent, simply making the pipes and starting the rc with i/o redirections allows multiple clients to use it by connecting to the piped files.
An experience I continually have with Plan 9 is discovering that everything I have ever wanted to do with computers but have never been able to effectively before can be done with about 3 lines of rc and the standard Plan 9 toolkit. Right now I am gleefully exploring what I find to be an incredibly rich and useful set of possibilities created by rio-as-fileserver. First, let me describe the setup I'm using so you can follow along - you want two plan 9 systems for this, with at least one setup as a CPU server. A pair of Qemu VMs will work fine. Let's just assume you have a pair of CPU servers running, machines A and B, each with a graphical display, rio running, and with a standard exportfs listener on port 17007.
You are working using cpu server A, and your friend is using cpu server B. For convenience, we will describe this as if you have full privileges on each other's machines, but these tricks can also work fine between parties with more limited trust with appropriate modifications to how the filesystem exports are set up. Your goal is to use Plan 9 for some lightweight interactive resource sharing for real time communication and collaboration. Each of you opens a subrio within your main rio session. To start off, you need access to some of machine B's resources. You are working using cpu server A, and you start with:
import -c cpuB /srv /n/cpuBsrv
Now you want to write a message to your friend that will appear on his screen in a new window briefly.
wsys = /n/cpuBsrv/rio.friend.68962
window -m rc -c 'echo hi fred && sleep 5'
And up on Fred's screen on machine B pops a window saying 'hi fred' that sticks around for 5 seconds, then vanishes. How and why? Because rio is a fileserver, and the window command mounts whatever rio is specified in the $wsys. The window command is actually an rc script that can show us the principles involved for a lot of nice tricks. How about just popping up a window to talk back and forth:
mount /n/cpuBsrv/rio.friend.68962 /mnt/wsys new
bind -b /mnt/wsys /dev
cat /dev/cons &
cat >/dev/cons
And you have created a window on your friends screen, with each of you able to see whatever the other types. This is a perfectly functional and usable way to communicate, and the ability to initiate the connection by making the window appear on demand in the 'shared' rio is an improvement over other simple equivalents like telnet or netcat based direct chat.
Another very simple and useful application is to allow your friend to run an interactive rc on your machine, with the display on his system, while you watch. In other words, he is using your machine in a manner similar to a telnet/ssh login, even though you 'pushed' the window onto his machine. The first two setup lines are the same as before:
mount /n/cpuBsrv/rio.friend.68962 /mnt/wsys new
bind -b /mnt/wsys /dev
exec rc -v < /dev/cons |tee /dev/cons
Now you can watch in real time as your friend uses your system. You can also easily invert this - and let your friend watch what you are doing. Just change the final line to:
exec rc <{tee /dev/cons} |tee /dev/cons
This changes the source of the controlling input to the shell from the remote /dev/cons to the local standard input. I believe both of these simple interaction models provide useful functionality in very direct ways. The real-time mirroring of the textual input and output in a shell window is provided without any of the usual overhead of a tool like VNC viewing and the use of a sub rio session as a container for the shared windows makes sandboxing via namespace manipulation very easy - and it should be noted that sharing a rio service is very different than providing full cpu server functionality, as only the functionality of rio is being exposed to the remote client. These simple filesystem and i/o based operations provided fine grained real time connectivity options with lower overhead and potentially greater security than the predominating methods.
The above commands are very simple and can certainly be improved upon - in particular the standard error is being abused and the prompt and errors will be suppressed for the remote user. I'm certain a lot of Plan 9 users out there have nifty improved versions of these tricks or similar. Using the fileserver properties of rio to create a sandboxed dynamic shared environment is a good fit for the goals of the /9/grid so we'll be posting an example of a scripted 'shared workspace' setup script and commands soon - its a pretty clear upgrade from the current 'grid graffiti wall' to a real-time 'grid graffiti rio!'
We are trying to determine the best way to have a larger set of public services - more nodes, and a demonstration of ideas like the High Availability Grid, an attempt to make a redundant and failproof environment that still maximizes the utility of the attached resources and can be flexibly reconfigured on the fly. Most of the following are things that we have already protyped and tested, but haven't been available through our public portals. Things we'd like to implement (perhaps not all at once):
A high availability configuration of resources using about 6-8 functional nodes divided between 2-3 physical machines.
On-demand temporary single VMs
"The proving grounds" - a combination of both of the above, but with the intention of stability and stress testing for various multimachine configurations.The only way to test 'failproofing' is to force failures and test the ability of the systems to recover.
We've been doing some testing of VMware server in addition to our normal Qemu/9vx/p9p based setups. We prefer free open source personally, but some people prefer to VMware to qemu so we may begin providing vmware .vmdk images as well as Qemu qcow2s. For anyone who wants to try using vmware now, it is possible to use qemu-img convert to make a vmdk from a .qcow2, but there were a few additional adjustments we needed to make for best results, notably recompiling the included pccpuf kernel.
Well, some documentation for some of the new tools at least, but its a start. Writing documentation is hard. The organization of knowledge is the tricky thing. The balance of conceptual explanation and task-oriented how-to is hard to get right. The docs in the docs directory still need to be supplemented with some specific task-oriented recipes for doing things like making a fully customized local grid, which requires a bit of coordination of action at the host os level and the VM interiors. As a warmup, I'll try to summarize the process here:
The base system image is designed to serve as the seed for a multisystem grid. It includes a variety of /cfg directories to use as base roles for new systems, and the confighelper script is designed to help do customization quickly. If you want to create a personalized local grid of machines from the base image, here is how to do it:
Test the base image in its default configuration to verify functionality before customizing.
login as bootes in cpu server mode and start up the confighelper
choose to (r)eset machine key and passwords, and answer the prompts. Your chosen authdom is particularly essential - it can be your domain if you have one, or just an arbitrary name for 'your grid turf'.
reboot the vm to cpu server mode and answer the prompts to create the new machine key. You only get once chance to set bootes' password here, be careful. Then run auth/changeuser bootes and enter the same password you just did, and then run auth/changeuser for your other users and set their new passwords. Run auth/changeuser -n for each user also to update the netkeys.
at this point your new auth setup should be complete and you should be able to drawterm/cpu in, 9fs if the port 564 listener is running, use netkey auth, etc. Please test whatever functions are important to you.
this is your new seed image. If you use it as a base for cloning your additional grid systems, they will all include copies of the auth information and keys and users.
now you can manufacture as many customized systems as you want using the confighelper tools. to make a new customized system, first create a new blank qemu qcow2 for it. qemu-img create -f qcow2 newsysname.qcow2.img 1G creates a qcow2 blank disc that can expand up to 1gb of storage. A venti backed fossil is very small so you probably wont use much of that space.
now boot qemu with the blank disk attached as -hdb. in windows for instance, you might do qemu -L . -hda g9.qcow2.img -hdb newcpu.qcow2.img -redir tcp:567::567 -redir tcp:17010::17010 -m 256, then drawterm in as bootes. Run the confighelper script and choose to (a)dd new system to grid. It will prompt you for a system name and other info and a config from /cfg to clone - it will then transform the disk on drive 2 into a customized clone of your seed system on drive 1.
continue this process to make as many customized VMs as you want. A very common customization to apply would be changing the venti server IP so you could boot additional nodes on other machines using your existing venti server.
these cloned VMs are bootable in exactly the same way as the parent images, and can make 'children' of their own.
Making a new blank qcow and booting and running the configscript takes about 1 minute once you are familiar with the process, so you can make two VMs for every computer on your LAN more or less instantaneously. Note that you might want to 'preload' your /lib/ndb/local with the new machines using confighelper's nbd updater prior to starting the cloning process, so all children have all their peers in their ndb.
Well, we've got a new version of the qemu-based grid tools up - it provides a lot of new functionality, flexibility, and hopefully ease-of-use. We haven't really started documenting everything that's going on with it, and we are going to be rolling out new services related to its design during the upcoming weeks. The image and tools itself should start getting more frequent updates and we are hoping to leverage Venti and other Plan 9 based tools to do so.
If you already have access to a linux machine, the new base version of the image provides what we hope is pretty neat out-of-the box functionality - you extract the .tgz, cd into your new gridtools directory, and run (as root) a simple iptables script to redirect some low ports to high ports for the qemu VM to access without root privileges. The gridlord script (run as normal user inside the extraced directory, no install required) then provides a simple menu-based interface to control a full distributed plan9 system all on your local machine. Fire up a VM and let it boot in cpu server mode, then drawterm in as bootes, glenda, or gridna with password: gridpass. The VM image itself is meta-pre-configured - check out the /cfg directory, which holds relatively self-descriptive configurations for multiple machine roles. /usr/bootes/bin/rc contains confighelper -- a wrapper for an set of configuration helper scripts designed to let you quickly and easily control some administrative variables. Please note that right now this is an ad-hoc set of recipes based on the base configuration we provide. We hope to develop it into a more general purpose Plan 9 configuration utility, but it is currently not recommended for use with anything other than the 9gridchan.org preconfigured qemu image.
There are several other customizations and extras included in the image - and source code to everything, of course. /usr/grid is a 'non-login' user whose directories contain some of the grid tools. /usr/grid/src and /usr/bootes/lib have most of the additonal materials, along with /usr/grid/bin/rc and /usr/grid/bin/386 - these directories are bound in during most user's profiles, except glenda. We left Glenda basically untouched from the Bell Labs setup in this image. Thanks to everyone who helped test and offered suggestions for this version of the tools, and we look forward to improving it further according to user suggestions. As of the time of this posting, the surrounding documentation and web resources are still mostly un-updated so things like the walkthrough screenshots gallery still apply to the old version of the image.
Currently under testing and development: the amazing "infinite rabbits from a single venti-hat" High Availability Grid. The past few weeks we've been exploring ideas triggered by the possibility of storing entire sets of machines in both virtual machine and fossil vacroot form in a set of arenas and isect data for a venti server. Before your eyes glaze over from the seeming over-complexity, lets cut to the chase, the goal for the end user.
The end product is a relatively small file distributed as a .tgz that contains a complete set of files for a venti server and a large collection of .vac files. The venti data includes the rootscores of many different machine fossils, and also entire preinstalled qemu virtual machine images that correspond to those rootscores. Using either a native or plan9port venti, the user can choose from a large number of differentlty configured machines to instantiate on either native hardware or as a virtual machine. The venti has data corresponding to machines ranging from an untouched default install to a large grid of dozens of machines. Due to the nature of venti, storing an arbitrarily large number of differently configured virtual machines and fossils has almost no overhead.
The system images included are configured to act as an integrated plan 9 network with a few tricks designed to provide a high availability 'cloud' environment out of the resources available. This is the system employed on the 9gridchan version of the grid which will be opened for public use soon. Our current setup uses one master and one backup venti, three fossil file servers, three cpu servers, and an arbitrary number of terminals. The creation of a single unified environment for the user along with failsafe reliability is accomplished via simple scripted filesystem imports and binds.
Any of the 3 fossils can boot backed by either of the two ventis, which use venti/rdarena and venti/wrarena to make scores available on both. Each CPU server boots with a TCP root from a different fossil. However, all the fossils are basically identical, and when the user logs into any cpu server, 9fs imports of the other fossils are imported, and the primary fossil's $home is bound over the boot $home by the cpu. A simple backgrounded script then mirrors the data from the active $user to the backup fossils.
The user interaction model is to dial each of the three cpu servers from a different window on their terminal. Each window will behave identically and binds $home from the same primary fossil, but is using a different CPU with a different tcp root and venti backup. Once the user has dialed in their three windows, they can work freely in any of them and see the same file data. If that fossil fails, the backup fossils, no more than 5 minutes out of date, can be bound to $home. A standard usage model is to run rio on one cpu, acme on another, and rc shell only on the third. The user experience should be identical to working with a single all-in-one machine with the difference of enhanced reliability and the increased performance of truly independent multiple CPUs.
A public release of a first version of these tools along with some updates to the g/scripts and preconfigured qemu images should be arriving soon. Feel free to stop in #plan9chan on irc.freenode.net if you want to explore this system in its development state. Standard public resources such as omni are unaffected by the current development environment but will become integrated at the time of an initial public distribution.
No big news on the grid past few weeks - had a service interruption for a few days, but it was resolved. Shockingly, the brief absence of 9gridchan services did not cause any significant rioting or result in the overthrow of any governments. We are greatly relieved.
Random blog update to make sure people know the project is healthy and ongoing. An actual public announcement of some kind on a known Plan 9 forum such as 9fans has been something we've been waiting a long time to do.
Thanks to the ceaseless flow of innovation and progress in free and open source software, the walkthrough screenshots tutorial galleries now feature captions and better navigation! Additionally, we are working to provide new services and content on the dynamic 9p registry system. Nothing too thrilling, but as an example, the omniguest user is now exporting /bin. After connecting to that service, binding the imported directory over the local /bin in an rc window allows testing of any of the installed software on the public cpu server without actually logging in.
Created a couple galleries with screenshots of starting up the image, connecting with drawterm, and running configscript. Hopefully having a visual reference will make it easier for people trying plan 9 for the first time to see if they are seeing what they are supposed to see. More screenshot galleries will be added to demonstrate making use of the g/script toolkit and 9gridchan.org resources.
The preinstalled qemu node image has several purposes. Most importantly is to provide an easy-to-use form of Plan 9 for every platform. Qemu runs on everything and a preinstalled image means that once the download is complete, the user can be inside a fully functional Plan 9 enivornment instantly. Another goal is to provide easy access to the key component of Plan 9: a CPU server. Another is is to provide a user-driven, decentralized platform for grid computing research and development. The latest version of the image makes bootup easier with a single menu for boot choices. There are more variant versions of plan 9 software and some customizations like the default font, taken from quanstro's subpixel fonts. A small sampler collection of .vac scores for downloading additional docs and collections of Plan 9 pictures from venti.9gridchan.org is also included. Thanks as always to the Plan 9 contributors whose work is included such as fgb (contrib and abaco), andrey (irc7 and links port), and others.
A big day of upgrades - along with release 1.1 of the /9/grid qemu image and toolkit, we've upgraded www.9gridchan.org with Uriel's latest Werc + Soul9's image gallery application plugin. Thanks to them for their development efforts. The new 1.1 image features more stuff and a smaller download and installed size, along with default user accounts for instant drawterm/cpu access for testing. (CHANGE THOSE PASSWORDS before allowing access from external networks, everyone!)
Ongoing projects include adding scripts for managing venti servers and databases of .vac files to the g/toolkit and a grid game based on namespace manipulation. Future blog posts will also provide some tips and tricks for using grid resources and tools.
by Eric Van Hensbergen (noreply@blogger.com) at January 28, 2010 03:35 AM
Federal officials accused four men, including a conservative activist, of posing as telephone repairmen to tamper with phones at the New Orleans offices of Democratic Sen. Mary Landrieu. [...] Late Monday morning, according to an FBI affidavit, Messrs. Flanagan and Basel, dressed in blue work shirts, fluorescent green vests and construction hard hats, entered Sen. Landrieu's offices and told a staffer they had come to fix the phone lines. By then, Mr. O'Keefe already had arrived at the offices, according to the FBI. -- http://online.wsj.com/article/SB20001424052748703906204575027500584356116.html
by newsham (noreply@blogger.com) at January 27, 2010 06:37 PM
One of the great religious debates in compiler writing is whether you should use parser generators like yacc and its many descendants or write parsers by hand, usually using recursive descent. On the one hand, using parser generators means you have a precise definition of the language that you are parsing, and a program does most of the grunt work for you. On the other hand, the proponents of hand-written recursive descent parsers argue that parser generators are overkill, that parsers are easy enough to write by hand, and that the result is easier to understand, more efficient, and can give better error messages when presented with syntactically invalid programs.
Like in most religious debates, the choice of side seems to be determined by familiarity more than anything else. I knew how to use yacc before I knew how to write a parser by hand, which put me solidly on the parser generator side of the fence. Now that I do know how to apply both techniques, I'd still rather use a parser generator. In fact, I've worked on projects where I've written parser generators rather than write a parser by hand. Good notation counts for a lot, as we shall see.
In Coders at Work, Ken Thompson talks to Peter Seibel about yacc and lex:
Seibel: And are there development tools that just make you happy to program?
Thompson: I love yacc. I just love yacc. It just does exactly what you want done. Its complement, lex, is horrible. It does nothing you want done.
Seibel: Do you use it anyway or do you write your lexers by hand?
Thompson: I write my lexers by hand. Much easier.
Many of the objections raised by the hand-written parser camp are similar to Thompson's objection to lex—the tool doesn't do what you want—but there's no fundamental reason a tool can't. For example, the spurious conflicts that an LALR(1) algorithm produces are definitely hard to understand, but if you replace it with LR(1) or GLR(1), you get a more useful tool. And if you do learn how to work within the LALR(1) algorithm, even yacc is very powerful.
The objection to parser generates that seems to resonate most is that generators like yacc produce inadequate error messages, little more than “syntax error.” Better error messages were one of the key benefits hoped for when g++ converted from a yacc-based C++ parser to a hand-written one (and to be fair, C++ syntax is nearly impossible to parse with any tool; the many special cases cry out for hand-written code). Here the objection seems harder to work around: the parser internally gets compiled into an automaton—usually a big table of numbers—that moves from state to state as it processes input tokens. If at some point it can't find a next state to go to, it reports an error. How could you possibly turn that into a good message?
Clinton Jeffery showed how in a paper published in ACM TOPLAS in 2003 titled “Generating LR Syntax Error Messages from Examples.” As you can guess from the title, the idea is to introduce a training stage after the parser is generated but before it is used for real. In that stage, you feed examples of syntax errors into the parser and look at what state it ends up in when it detects the error, and maybe also what the next input token is. If the automaton hits an error in that state (and with that input token) during real use, you can issue the message appropriate to the example. The important part is that the parser states are basically an abstract summary of the surrounding context, so it's reasonable to treat errors in a particular state with a single message. For example, suppose you find that this Go program
package main import ( "fmt", "math" )
causes a syntax error at the comma, in state 76. In the parser, that state means that you're in the middle of an import block and expecting to see a string constant. The state abstracts that context, so you'd end up in the same state if you were parsing this erroneous program:
package fmt import ( "strings"; "testing", "xgb" )
Having told the parser that the appropriate error message for the first program is “unexpected , during import block,” the parser will use the same message for the second program.
It's an elegant idea, and the implementation can be kept on the side, without adding any complexity to the grammar itself. I heard about this idea through the grapevine years ago (in 2000, I think) but had never gotten a chance to try it until this week.
There are three Go parsers: Ian Lance Taylor wrote a hand-written
recursive descent parser in C++ for gccgo,
Robert Griesemer wrote a hand-written recursive descent parser
in Go (import "go/parser")
for godoc and
gofmt,
and Ken Thompson wrote a yacc-based parser in C for the
gc compiler suite.
On Monday night, I implemented Jeffery's idea in the gc compiler suite. The gc compilers use bison, the GNU implementation of yacc. Bison doesn't support this kind of error management natively, but it is not hard to adapt. Changing bison would require distributing a new version of bison; instead, my implementation post-processes bison's output.
The examples are in a new file go.errors.
Because the lexer is written by hand, it's not available in the
simulation, so the examples are token name sequences
rather than actual program fragments. In token lists, the
program above and its corresponding error message are:
% loadsys package LIMPORT '(' LLITERAL import_package import_there ','
"unexpected , during import block",
Understanding the tokens on the first line requires knowing what the various
grammar token names mean, but they basically mimic the
syntax, and this tool is targeted at the people working on
the grammar anyway.
An awk program loads bison's tables
from its debugging dump y.output and
then processes the go.errors file, replacing
each line like the above with the number of the offending
state and input token.
That produces a table that can be linked into the compiler
and consulted when a syntax error is encountered.
If the state and input token match an entry in the table,
the better error is used instead of a plain syntax error.
That was an outsized amount of work to close one bug, but now
it's easy to add better messages for other common situations.
For example, the fact that only the short := declaration
form is allowed in for initializers sometimes trips up
new Go programmers. When presented with this program:
package main
import "fmt"
func main() {
fmt.Printf("hello, world\n")
for var i = 0; i < 10; i++ {
fmt.Println(i)
}
}
the compiler used to just say there was a syntax error on line 7, but now it explains:
$ 6g x.go x.go:7: syntax error: var declaration not allowed in for initializer
because of this stanza in go.errors:
% loadsys package imports LFUNC LNAME '(' ')' '{' LFOR LVAR LNAME '=' LNAME
"var declaration not allowed in for initializer",
Gccgo and gofmt give more traditional token-based errors:
$ gccgo x.go x.go:7:6: error: expected operand ... $ gofmt x.go x.go:7:6: expected operand, found 'var'
What's interesting isn't that neither has bothered to handle this as a special case yet. What's interesting is to consider the work required to do so. Changing either would require understanding the corresponding hand-written parser well enough to find the right place to put the special case. With the example-based approach, you just write an example, and the tool figures out where in the parser it needs to go.
by newsham (noreply@blogger.com) at January 25, 2010 06:55 PM
def encChar(ch):More complex marshallers can be written by chaining together simpler encoders. The itertools module provides a chain function which does this. Here we marshall 16-bit and 32-bit numbers in big-endian format:
yield ch
def encNum1(x) :
yield chr(x & 0xff)
from itertools import chainMarshalling a fixed list of values can also be accomplished by chaining together several encoders:
def encNum2(x) :
return chain(encNum1(x >> 8), encNum1(x))
def encNum4(x) :
return chain(encNum2(x >> 16), encNum2(x))
def encFixedList_(es, xs) :This function chains together the encoding of each item in xs using the matching encoder in es. In practice you'd want a special zip that results in error if the number of es and xs was not the same, but lets ignore that for now. This function will work just fine, but it has a problem in practice -- usually you'll want to use this function partially applied, providing the es but leaving the xs to be filled in later. Here's where currying comes in handy. We'll make a function that takes just the es and returns a function that takes the xs later:
return chain(*(e(x) for e,x in zip(es, xs)))
def encFixedList(*es) :This is much easier to use:
return lambda xs : chain(*(e(x) for e,x in zip(es, xs)))
encMyList = encFixedList(encNum4, encChar, encNum2)You might notice that the encoding of fixed lists is basically an encoding of records. If we presuppose that all of our records have a toList method that returns all of the record's fields as a list, we can write a marshaller for these records:
def encRecord(*es) :To use these marshallers we pass the value we want to encode and then convert the resulting iterable to a string:
enc = encFixedList(*es)
return lambda rec : enc(rec.toList())
class MyRec(object) :
def __init__(self, a, b, c) :
self.a, self.b, self.c = a,b,c
def toList(self) :
return (self.a, self.b, self.c)
def __str__(self) :
return '[MyRec a=%s b=%s c=%s]' % self.toList()
encMyRec = encRecord(encChar, encNum1, encNum1)
def encode(enc, x) :
return ''.join(enc(x))
x1 = encode(encNum4, 0x01020304)
print '%r' % x1
x2 = encode(encMyList, [31415927, 'x', 257])
print '%r' % x2
x3 = encode(encMyRec, MyRec('A', 5, 10))
print '%r' % x3
def decChar(it) :Unmarshalling larger numbers is just a matter of unmarshalling several smaller numbers and composing them. Here are the unmarshallers for 16- and 32-bit big endian numbers. These definitions make use of the left-to-right order of evaluation:
return it.next()
def decNum1(it) :
return ord(decChar(it))
def decNum2(it) :And unmarshall fixed lists is just a matter of decoding each constituent in turn. Here again we use currying:
return (decNum1(it) << 8) | decNum1(it)
def decNum4(it) :
return (decNum2(it) << 16) | decNum2(it)
def decFixedList(*ds) :To unmarshall a record we need a way to unmarshall the constituent fields and then create the record. The decFixedList function can be used to unmarshall the records if we have a way to construct the record from a list. We'll presuppose that our records have a constructor that take the field values in turn, then unmarshalling is simply:
return lambda it : [d(it) for d in ds]
decMyList = decFixedList(decNum4, decChar, decNum2)
def decRecord(klass, *ds) :To use these unmarshallers to decode a binary string we must first get an iterator and then pass it to the decoder:
dec = decFixedList(*ds)
return lambda it : klass(*dec(it))
decMyRec = decRecord(MyRec, decChar, decNum1, decNum1)
def decode(dec, binary) :
return dec(iter(binary))
print decode(decNum4, x1)
print decode(decMyList, x2)
print decode(decMyRec, x3)
class Record(object) :Now defining a new record is done simply by filling in the __name__ and __fields__ values:
def __init__(self, *args) :
for f,v in zip(self.__fields__, args) :
setattr(self, f, v)
def toList(self) :
return [getattr(self, f) for f in self.__fields__]
def __str__(self) :
fs = ', '.join('%s=%r' % (f, getattr(self, f)) for f in self.__fields__)
return '[%s %s]' % (self.__name__, fs)
class MyRec2(Record) :
__name__ = 'MyRec2'
__fields__ = 'code', 'cost', 'quantity'
encMyRec2 = encRecord(encChar, encNum2, encNum1)
decMyRec2 = decRecord(MyRec2, decChar, decNum2, decNum1)
x4 = encode(encMyRec2, MyRec2('X', 2495, 3))
print '%r' % x4
print decode(decMyRec2, x4)
from itertools import chain
class Record(object) :
def __init__(self, *args) :
for f,v in zip(self.__fields__, args) :
setattr(self, f, v)
def toList(self) :
return [getattr(self, f) for f in self.__fields__]
def __str__(self) :
fs = ', '.join('%s=%r' % (f, getattr(self, f)) for f in self.__fields__)
return '[%s %s]' % (self.__name__, fs)
def encode(enc, x) :
return ''.join(enc(x))
def encChar(ch):
yield ch
def encNum1(x) :
yield chr(x & 0xff)
def encNum2(x) :
return chain(encNum1(x >> 8), encNum1(x))
def encNum4(x) :
return chain(encNum2(x >> 16), encNum2(x))
def encFixedList(*es) :
return lambda xs : chain(*(e(x) for e,x in zip(es, xs)))
def encRecord(*es) :
enc = encFixedList(*es)
return lambda rec : enc(rec.toList())
def decode(dec, binary) :
return dec(iter(binary))
def decChar(it) :
return it.next()
def decNum1(it) :
return ord(decChar(it))
def decNum2(it) :
return (decNum1(it) << 8) | decNum1(it)
def decNum4(it) :
return (decNum2(it) << 16) | decNum2(it)
def decFixedList(*ds) :
return lambda it : [d(it) for d in ds]
def decRecord(klass, *ds) :
dec = decFixedList(*ds)
return lambda it : klass(*dec(it))
by newsham (noreply@blogger.com) at January 23, 2010 08:57 PM
by newsham (noreply@blogger.com) at January 23, 2010 06:19 PM
Erik released a new alpha driver for Marvell Yukon 2 Ethernet NICs.
From: erik quanstrom <quanstro@qua...>
Subject: yukon 2 alpha driver
Date: Sat, 16 Jan 2010 18:15:50 -0500
is anyone interested in testing a marvell yukon 2
driver? i am currently working with a 88e5057
(1186/4b00, dge-560t). but most single-port yukon 2
parts should work fine. if so. please contact me
off list.
- erik
You can install it from contrib with
term% contrib/install quanstro/yuk
This will install two files, etheryuk.c and yukdump.h, into your /sys/src/9/pc directory.
Edit /sys/src/9/pc/pcf and add the following line to the list of Ethernet devices:
etheryuk pci
After that compile your new kernel by running
mk 'CONF=pcf'
Then copy the new kernel into /n/9fat and boot with it.
On my ASUS A8R32-MVP with 88E8053 chip the following new device will now show up:
#l0: yuk: 1000Mbps port 0xDFEFC000 irq 4: 00173137b3d3
I tested it during the last several days and it works great, good work!
Please send your feedback and bug reports to Erik.
Erik released a new alpha driver for Marvell Yukon 2 Ethernet NICs. (See his post at 9fans)
You can install it from contrib with term% contrib/install quanstro/yuk
This will install two files, etheryuk.c and yukdump.h, into your /sys/src/9/pc directory.
Edit /sys/src/9/pc/pcf and add the following line to the list of Ethernet devices: etheryuk pci
After that compile your new kernel by running mk 'CONF=pcf'
Then copy the new kernel into /n/9fat and boot with it.
On my ASUS A8R32-MVP with 88E8053 chip the following new device will now show up: #l0: yuk: 1000Mbps port 0xDFEFC000 irq 4: 00173137b3d3
Please send your feedback and bug reports to Erik.
by newsham (noreply@blogger.com) at January 22, 2010 06:31 PM
by newsham (noreply@blogger.com) at January 21, 2010 06:24 PM
by newsham (noreply@blogger.com) at January 17, 2010 06:20 PM
by newsham (noreply@blogger.com) at January 15, 2010 07:05 PM
by newsham (noreply@blogger.com) at January 13, 2010 12:03 AM
def send(s, **kw) :
s.send(pickle.dumps(kw))
def recv(s) :
return pickle.loads(s.recv(64*1024))
send(s, foo=1, bar="test", priority="utmost!", flags=SYN|ACK)
m = recv(s)
print m['priority']
class Msg(object) :
def __init__(self, *kw) :
self.__names__ = kw.keys()
for n,v in kw.items() :
setattr(self, n, v)
def __str__(self) :
attrs = ' '.join("%s=%s" % (n,getattr(self,n)) for n in self.__names__)
return '[Msg %s]' % attrs
__repr__ = __str__
def recvMsg(s) :
return Msg(**recv(s))
m = recvMsg(s)
print m.priority
print m
by newsham (noreply@blogger.com) at January 08, 2010 08:49 PM
by newsham (noreply@blogger.com) at January 07, 2010 06:15 PM
This paper reports on the factorization of the 768-bit number RSA-768 by the number field sieve factoring method and discusses the implications for RSA.
by newsham (noreply@blogger.com) at January 07, 2010 05:47 PM
by newsham (noreply@blogger.com) at December 31, 2009 08:46 PM
by newsham (noreply@blogger.com) at December 31, 2009 07:07 PM
Does this mean that we have proved seL4 is secure? Not yet, really. We proved that seL4 is functionally correct. Secure would first need a formal definition, which in itself is a wide field and depends on what you want to use the kernel for. Taken seriously, security is a whole-system question, including the system’s human components.After using the initial code, running static analyzers on them and finding and fixing 16 bugs:
Formal verification then found 144 more bugs in the C code, in total 160. This means even though the code appeared to be running just fine for normal application, there were an order of magnitude more bugs lurking in there than the 16 found initially. [...] As mentioned, we also did proofs on the design and the specification level. This was to 95% before the code was written, and we fixed about 150 issues in each. That means, in total we have discovered roughly 460 issues in kernel code, design and specification.
by newsham (noreply@blogger.com) at December 31, 2009 06:50 PM
According to Ian Meakin, of mobile encryption firm Cellcrypt, only government agencies and "well funded" criminals had access to the necessary technology.
by newsham (noreply@blogger.com) at December 29, 2009 06:26 PM
The Obama administration said Monday it would begin routinely penalizing U.S. air carriers for lengthy tarmac delays. [...] Airlines would be fined $27,500 per passenger for violations.Those who don't like the government poking its head into private business can blame the airlines for bringing this on themselves. Twenty-seven thousand is a small price to pay for holding passengers against their will. I would suggest jail time is more appropriate. http://online.wsj.com/article/SB126140826775100235.html?mod=WSJ_hpp_MIDDLTopStories
by newsham (noreply@blogger.com) at December 29, 2009 03:05 AM
by newsham (noreply@blogger.com) at December 28, 2009 11:09 PM
by newsham (noreply@blogger.com) at December 28, 2009 07:00 PM
by newsham (noreply@blogger.com) at December 28, 2009 06:05 PM
So the rich fuckers don't slack, eh. Be careful with the fuckers, they'll do anything that their money can permit them to keep you down, latest trend is closing down the doors on open and free intellectual property.
The G5 (or G7 or G20 or whatever was it?) is pushing it's own agenda: ACTA , and of course we shouldn't forget about Genome and software patents.
Honestly, i wouldn't be surprised if you had to pay a tax for having a blue-eyed blond kid in 20 years.

by newsham (noreply@blogger.com) at December 24, 2009 06:16 PM
by newsham (noreply@blogger.com) at December 21, 2009 11:14 PM
by newsham (noreply@blogger.com) at December 21, 2009 05:59 PM
Shi'ite fighters in Iraq used software that cost as little as $26 to intercept the video feeds, potentially allowing them to monitor U.S. military operations.It was only a matter of time. When it comes to hacking, America's the country with everything to lose and little to gain. You'd think as a result we'd be more interested in good computer engineering and defensive computer security but that does not appear to be the case.
by newsham (noreply@blogger.com) at December 17, 2009 06:29 PM
One of the more controversial aspects of the Go project
is that Go has a canonical program format, implemented
by a program (gofmt) that will pick up any
Go program and reformat it in the canonical format.
Every program in the Go source tree has been formatted
with gofmt. The code review and revision control tools
check that any new code is gofmt-formatted too.
(I am avoiding the word “style,” because gofmt doesn't enforce style, in the sense of The Elements of Programming Style. Running code through gofmt changes the layout but not whether it's a well-written, elegant program. Referring to these mundane formatting rules as style elevates them far beyond their importance. The only style that gofmt imposes is consistency.)
The most obvious benefit of using gofmt is that when you open an unfamiliar Go program, your brain doesn't get distracted, even subconsciously, about why that brace is in the wrong place; you can focus on the code, not the formatting. But there are many more interesting uses for gofmt. Gofmt can take any file in the Go source tree, parse it into an internal representation, and then write exactly the same bytes back out to the file. There are two reasons this works: the primary reason is that a lot of effort went into gofmt, but an important secondary reason is that gofmt only has to worry about one formatting convention, and we've agreed to accept that as the official one.
Once you have a tool that can parse and print a program losslessly, it's easy to insert mechanical processing in the middle, to transform the program between parsing and printing. Want to change something about the language? Change it in gofmt and reformat the source tree. This has already happened multiple times since the public launch.
The December 9 release of Go added a new built-in function
copy(dst, src) that copies data from one
slice to another. (It's a built-in both because it's such a
common operation and because Go's lack of pointer
arithmetic makes it hard to write the overlap check yourself.)
The new copy replaced the library routine
bytes.Copy, which was similar but
only worked on byte slices.
Gofmt has an option, -r, to specify a rewrite
rule in the form
pattern -> replacememt.
In the pattern and replacement, single-letter lower case names
serve as wildcards matching arbitrary expressions.
Want to convert from bytes.Copy to copy?
gofmt -w -r 'bytes.Copy(d, s) -> copy(d, s)' *.go
(The -w flag writes the new version of the program back to the input file.)
The December 9 release also added support for writing
x[i:] instead of x[i:len(x)].
Want to convert programs to use the new syntax when possible?
gofmt -w -r 'x[i:len(x)] -> x[i:]' *.go
These two minor changes were upstaged by discussion of a larger syntactic change, the elimination of line-ending semicolons. Again, gofmt is crucial to making the switch. The syntax being considered is mostly, but not completely, backwards compatible with the current version of Go. Normally, that would mean having to make the compilers support both syntaxes for some time, during which the compiler might warn about programs depending on the old syntax, expecting users to fix their programs. Instead, having one tool whose job it is to rewrite programs means that all the other tools can focus on other things. Gofmt can fix programs mechanically, and the compilers and other tools can drop the old syntax instead of having to support both.
I'd like to see more tools built using gofmt and the
libraries it uses (go/parser and go/printer).
For example, there's no reason an IDE plugin couldn't sit on the same
libraries to build interesting refactoring tools.
Gofmt's -r is just the beginning.
It's going to be exciting.
One final note. Being able to pick up a program and write it back out, preserving comments, is not an easy task in any language. Robert Griesemer deserves all the credit for the huge effort to make gofmt handle real programs so well.
UPDATE: ‘Fatalis’ has pointed out in the comments that the POST should be made to http://goo.gl/api/url with User-agent set to ‘toolbar’. The code now works, Yay!
Google just announced their own URL shortening service. Their service can only be used from the toolbar or FeedBurner, and I don’t particularly like adding extra toolbars to my browser. Maybe I can figure out a way to use their service from the command line?
I downloaded the toolbar XPI, unzipped it and peeked inside. Horribly indented JS awaited me. Nothing jsbeautifier couldn’t fix though. Few minutes later, I arrived at this readable JS function:
var getUrlShorteningRequestParams = function (b) {
function c() {
for (var l = 0, m = 0; m < arguments.length; m++)
l = l + arguments[m] & 4294967295;
return l
}
function d(l) {
var m = String(l > 0 ? l : l + 4294967296);
for (var o = 0, n = false, p = m.length - 1; p >= 0; --p) {
var q = Number(m.charAt(p));
if (n) {
q *= 2;
o += Math.floor(q / 10) + q % 10
} else o += q;
n = !n
}
m = m = o % 10;
o = 0;
if (m != 0) {
o = 10 - m;
if (l.length % 2 == 1) {
if (o % 2 == 1) o += 9;
o /= 2
}
}
m = String(o);
m += l;
return l = m
}
function e(l) {
for (var m = 5381, o = 0; o < l.length; o++) m = c(m << 5, m, l.charCodeAt(o));
return m
}
function f(l) {
for (var m = 0, o = 0; o < l.length; o++) m = c(l.charCodeAt(o), m << 6, m << 16, -m);
return m
}
var i = e(b);
i = i >> 2 & 1073741823;
i = i >> 4 & 67108800 | i & 63;
i = i >> 4 & 4193280 | i & 1023;
i = i >> 4 & 245760 | i & 16383;
var h = f(b);
var k = (i >> 2 & 15) << 4 | h & 15;
k |= (i >> 6 & 15) << 12 | (h >> 8 & 15) << 8;
k |= (i >> 10 & 15) << 20 | (h >> 16 & 15) << 16;
k |= (i >> 14 & 15) << 28 | (h >> 24 & 15) << 24;
j = "7" + d(k);
i = "user=toolbar@google.com&url=";
i += encodeURIComponent(b);
i += "&auth_token=";
i += j;
return i
};
So, I call getUrlShorteningRequestParams("http://www.kix.in/"); to get "user=toolbar@google.com&url=http%3A%2F%2Fwww.kix.in%2F&auth_token=78925814685". I see in their code that they do a POST request to the service to obtain a JSON return value that would contain the short URL. I punch it in using cURL:
$ curl -v -d "user=toolbar@google.com&url=http%3A%2F%2Fwww.kix.in%2F&auth_token=78925814685" http://goo.gl/ * About to connect() to goo.gl port 80 (#0) * Trying 74.125.19.102... connected * Connected to goo.gl (74.125.19.102) port 80 (#0) > POST / HTTP/1.1 > User-Agent: curl/7.19.7 (i386-apple-darwin10.2.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3 > Host: goo.gl > Accept: */* > Content-Length: 77 > Content-Type: application/x-www-form-urlencoded > < HTTP/1.1 405 HTTP method POST is not supported by this URL
Oops! Well, not really, the URL shortener from the toolbar doesn’t work either, I just get the full URL whenever I try to “share” something. Has anybody actually generated a real goo.gl short URL yet?
Their auth_token parameter seems completely superfluous to me as it is generated from the URL itself. Don’t we all know security by obscurity doesn’t work
by newsham (noreply@blogger.com) at December 14, 2009 08:00 PM
by newsham (noreply@blogger.com) at December 10, 2009 06:02 PM
In January 2007 I posted an article on my web site titled “Regular Expression Matching Can Be Simple And Fast.” I intended this to be the first of three; the second would explain how to do submatching using automata, and the third would explain how to make a really fast DFA. I got distracted for a few years but have finally finished the second article.
http://swtch.com/~rsc/regexp/regexp2.html
Simple (easy to follow, I hope) code demonstrating the techniques from the article can be found at http://code.google.com/p/re1/.
I've been writing about Go internals, which I'll keep doing,
but I also want to spend some time on Go programs.
Go's reflect package implements data reflection.
Any value can be passed to reflect.NewValue (its argument type is
interface{}, the empty interface) and then manipulated and
examined by looking at the result, a reflect.Value.
Typically, you use a type switch to determine what kind
of value it is—a *IntValue, a *StructValue, a *PtrValue, and so on—and then walk it further.
This is how fmt.Print
prints arbitrary data passed to it.
Go allows programs to tag the fields in struct definitions with annotation strings. These annotations have no effect on the compilation but are recorded in the run-time type information and can be examined via reflection. The annotations can be hints for reflection-driven code traversing the data structures. This blog post shows two examples of such code: an XML parser that writes directly into data structures, and a template-based output generator that reads from them.
The Go XML package implements a
function Unmarshal which parses XML directly into data structures.
As a contrived but illustrative example, here is an XML encoding of a linked list:
var encoded = `
<list><value>a</value>
<list><value>b</value>
<list><value>c</value>
</list>
</list>
</list>
`
(That's a backquoted multi-line Go string literal.)
Given this Go data structure definition:
// http://gopaste.org/view/private:7v2CQ type List struct { Value string; List *List; } func (l *List) String() string { if l == nil { return "nil" } return l.Value + " :: " + l.List.String() }
we can invoke xml.Unmarshal to
turn the XML above into an instance of this data structure:
func main() {
var l *List;
if err := xml.Unmarshal(strings.NewReader(encoded), &l); err != nil {
log.Exit("xml.Unmarshal: ", err)
}
fmt.Println(l)
}
This program prints a :: b :: c :: nil.
The call to xml.Unmarshal walked the XML
and the data structure at the same time, matching
the XML tags <value> and <list>
against the names of the structure fields and allocating
pointers where necessary.
There are decisions to be made when matching
XML against the data structure that aren't easily expressed
with just the usual struct information.
For example, we might want to make the value an
attribute instead of a sub-element. Because attributes
and sub-elements might use the same names, the XML
package requires struct fields holding attributes to be tagged
with "attr".
This data structure and XML encoding would also work
in the program above:
// http://gopaste.org/view/private:2yR7E type List struct { Value string "attr"; List *List; } var encoded = ` <list value="a"> <list value="b"> <list value="c"/> </list> </list> `
If a field has type string, it gets the
character data from the element with that name,
but sometimes you need to accumulate both
character data and sub-elements.
To do this, you can tag a field with "chardata".
This data structure and XML encoding would also work:
// http://gopaste.org/view/private:U3ews type List struct { Value string "chardata"; List *List; } var encoded = `<list>a<list>b<list>c</list></list></list>`
The annotations are trivial but important: where the encoding and the data structures are not a perfect match (and they rarely are), the annotations provide the hints necessary for the XML package to make them work together anyway.
Let's look at real XML data. Google's Code Search provides regular expression search over the world's public source code. The search [file:/usr/sys/ken/slp.c expected] returns a single, famous result. Instead of using the web interface, though, we can issue the same query using the Code Search GData API and get this XML back (simplified and highlighted for structure):
From the API documentation or just by eyeballing, we can write data structures for the information we'd like to extract:
type Feed struct {
XMLName xml.Name "http://www.w3.org/2005/Atom feed";
Entry []Entry;
}
type Entry struct {
XMLName xml.Name "http://www.w3.org/2005/Atom entry";
Package Package;
File File;
Match []Match;
}
type Package struct {
XMLName xml.Name "http://schemas.google.com/codesearch/2006 package";
Name string "attr";
URI string "attr";
}
type File struct {
XMLName xml.Name "http://schemas.google.com/codesearch/2006 file";
Name string "attr";
}
type Match struct {
XMLName xml.Name "http://schemas.google.com/codesearch/2006 match";
LineNumber string "attr";
Type string "attr";
Snippet string "chardata";
}
The XMLName fields are not required.
The XML package records the XML tag name in them.
If there is a string annotation (as there is here),
then the XML package requires that the XML element being
considered has the given name space and tag name.
That is, the annotation is both documentation and
an assertion about what kind
of XML goes into this structure.
A full Atom GData feed has many other fields, but these are the only ones that we care about. The XML parser will throw away data that doesn't fit into our structures.
We can fetch the feed via HTTP and unmarshal the HTTP response data directly into the data structures:
r, _, err := http.Get(`http://www.google.com/codesearch/feeds/search?q=` +
http.URLEscape(flag.Arg(0)));
if err != nil {
log.Exit(err);
}
var f Feed;
if err := xml.Unmarshal(r.Body, &f); err != nil {
log.Exit("xml.Unmarshal: ", err);
}
and then print the results:
for _, e := range f.Entry {
fmt.Printf("%s\n", e.Package.Name);
for _, m := range e.Match {
fmt.Printf("%s:%s:\n", e.File.Name, m.LineNumber);
fmt.Printf("%s\n", cutHTML(m.Snippet));
}
fmt.Printf("\n");
}
(The cutHTML function returns its argument
with HTML tags stripped: the
m.Snippet is HTML but we just want to print text.)
Compared to the xml.Unmarshal, that loop
with all the fmt.Printf calls sure looks clunky.
We can use a reflection-driven process to print the data too.
The template package is a Go implementation
of json-template.
We can write a template like:
var tmpl = `{.repeated section Entry}
{Package.Name}
{.repeated section Match}
{File.Name}:{LineNumber}:
{Snippet|nohtml}
{.end}
{.end}`
and then print the data using that template:
t := template.MustParse(tmpl, template.FormatterMap{"nohtml": nohtml});
t.Execute(&f, os.Stdout);
The formatter map definitions arrange that the
{Snippet|nohtml} in the template run the
data through the nohtml function when printing.
Both variants of our program produce the same output:
http://www.tuhs.org Archive/PDP-11/Trees/V6/usr/sys/ken/slp.c:325: * You are not expected to understand this.
Full source code is at http://gopaste.org/view/private:7ybCj
I still have three more tricks up my sleeve.
First, in a pinch and if the HTML is
mostly well formatted or you're just lucky, the XML parser
can handle HTML too.
Second, when the XML parser walks into a struct field, if the field
is a pointer and is nil, it allocates a new value, as we
saw in the linked list example. But if the field is a non-nil
pointer already, the parser just follows the pointer.
The same goes for growing slices.
Third, if the XML parser cannot find a struct field for a particular
subelement, before giving up it looks for a field named Any
and uses that field instead.
Combined, these three rules mean that this structure:
type Form struct {
Input []Field;
Any *Form;
}
type Field struct {
Name string "attr";
Value string "attr";
}
can be used to save a tree of form and other
information from a web page. When you unmarshal into it,
you get a tree showing the structure of the web page,
with each element becoming a *Form
because the parser used the Any field.
But who wants to walk that tree just to
find the form fields?
Instead, we can set the Any pointer to
point back at the Form it is in.
Then as the XML parser thinks it is walking the data structure,
in fact it will be visiting the same Form
struct over and over.
Each time it finds an <input>, the parser
will append it to the Input slice.
// http://gopaste.org/view/private:b8Ya2 func main() { r, _, err := http.Get(`http://code.google.com/p/go/`); if err != nil { log.Exit(err); } var f Form; f.Any = &f; p := xml.NewParser(r.Body); p.Strict = false; p.Entity = xml.HTMLEntity; p.AutoClose = xml.HTMLAutoClose; if err := p.Unmarshal(&f, nil); err != nil { log.Exit("xml.Unmarshal: ", err); } for _, fld := range f.Input { fmt.Printf("%s=%q\n", fld.Name, fld.Value); } }
This program prints:
q="" projectsearch="Search projects"
There are a few more examples of this technique in the standard Go libraries. The net package's DNS client uses this technique to build generic packing and unpacking routines for DNS packets. The JSON package provides approximately the same functionality as the XML package, but with JSON as the wire format. In fact, JSON is arguably more suited to the task. (Remember: “the essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.”) The ASN1 package also works this way, though its generality is limited: it implements just enough to parse SSL/TLS certificates. There are also other ways to use reflection, which will have to wait for future posts.
It's actually a little surprising how often this technique gets used in the Go libraries. Part of the reason is that it's a shiny new hammer, but it seems to be a very useful shiny new hammer, because so much of modern programming is talking to other programs over a variety of wire encodings.
Unmarshalling directly into data structures via reflection, with just a few necessary hints as annotations, can be implemented in other languages, but the examples I've seen are not nearly as elegant or as concise as the Go versions.
| The Daily Show With Jon Stewart | Mon - Thurs 11p / 10c | |||
| 30,000 | ||||
| <embed allowfullscreen="true" allownetworking="all" allowscriptaccess="always" bgcolor="#000000" flashvars="autoPlay=false" height="301" src="http://media.mtvnservices.com/mgid:cms:item:comedycentral.com:257663" style="display:block" type="application/x-shockwave-flash" width="360" wmode="window"></embed> | ||||
| ||||
by newsham (noreply@blogger.com) at December 03, 2009 06:02 PM
Sorry to say we did not save images of those files. I had the original tapes until 2000 when I gave them to the Berkeley library archives. When some folks asked about them a couple of years ago, I pointed them to the Berkeley library who (apparently) discarded them as junk.
by newsham (noreply@blogger.com) at December 03, 2009 05:30 PM
Generic data structures (vectors, queues, maps, trees, and so on) seem to be the hot topic if you are evaluating a new language. One of the most frequent questions we've had about Go is where the generics are. It seems like there are three basic approaches to generics:
The generic dilemma is this: do you want slow programmers, slow compilers and bloated binaries, or slow execution times?
I would be happy to learn about implementations that somehow manage to avoid all three of these bad outcomes, especially if there were good written descriptions (papers, blog posts, etc.) about what was hard and why it's a good approach. I'd also be interested to see good written descriptions of trying one of those approaches and what went wrong (or right).
Please leave comments with pointers. Thanks.
In the model we set forward it is not users who need to be better educated on the risks of various attacks [...] but the security community. Security advice simply offers a bad cost-benefit tradeoff to users.
by newsham (noreply@blogger.com) at December 01, 2009 07:55 PM
by newsham (noreply@blogger.com) at December 01, 2009 06:18 PM
Go's interfaces—static, checked at compile time, dynamic when asked for—are, for me, the most exciting part of Go from a language design point of view. If I could export one feature of Go into other languages, it would be interfaces.
This post is my take on the implementation of interface values in the
“gc” compilers: 6g, 8g, and 5g.
Over at Airs, Ian Lance Taylor has written
two
posts about
the implementation of interface values in gccgo.
The implementations are more alike than different:
the biggest difference is that this post has pictures.
Before looking at the implementation, let's get a sense of what it must support.
Go's interfaces let you use
duck typing like you would
in a purely dynamic language like Python but still have the
compiler catch obvious mistakes like passing an int where an object
with a Read method was expected, or
like calling the Read method with the wrong number of arguments.
To use interfaces, first define the interface type (say, ReadCloser):
type ReadCloser interface {
Read(b []byte) (n int, err os.Error);
Close();
}
and then define your new function as taking a ReadCloser.
For example, this function calls Read repeatedly to get all the
data that was requested and then calls Close:
func ReadAndClose(r ReadCloser, buf []byte) (n int, err os.Error) {
for len(buf) > 0 && err == nil {
var nr int;
nr, err = r.Read(buf);
n += nr;
buf = buf[nr:len(buf)];
}
r.Close();
return;
}
The code that calls ReadAndClose can pass
a value of any type as long as it has Read and Close methods
with the right signatures.
And, unlike in languages like Python, if you pass a value with the
wrong type, you get an error at compile time, not run time.
Interfaces aren't restricted to static checking, though. You can check dynamically whether a particular interface value has an additional method. For example:
type Stringer interface {
String() string;
}
func ToString(any interface{}) string {
if v, ok := any.(Stringer); ok {
return v.String();
}
switch v := any.(type) {
case int:
return strconv.Itoa(v);
case float:
return strconv.Ftoa(v, 'g', -1);
}
return "???";
}
The value any has static type
interface{}, meaning no guarantee of any methods at all:
it could contain any type.
The “comma ok” assignment inside the if
statement asks whether it is possible to convert any to
an interface value of type Stringer, which has the
method String. If so, the body of that statement
calls the method to obtain a string to return.
Otherwise, the switch picks off a few basic types before
giving up.
This is basically a stripped down version of what the
fmt package does.
(The if could be replaced by adding
case Stringer: at the top of the switch,
but I used a separate statement to draw attention to the check.)
As a simple example, let's consider a 64-bit integer type
with a String method that prints the value
in binary and a trivial Get method:
type Binary uint64
func (i Binary) String() string {
return strconv.Uitob64(i.Get(), 2)
}
func (i Binary) Get() uint64 {
return uint64(i)
}
A value of type Binary can be passed
to ToString, which will format it using the String
method, even though the program never says that
Binary intends to implement Stringer.
There's no need: the runtime can see that Binary
has a String method, so it implements Stringer,
even if the author of Binary has never heard of
Stringer.
These examples show that even though all the implicit conversions are checked at compile time, explicit interface-to-interface conversions can inquire about method sets at run time. “Effective Go” has more details about and examples of how interface values can be used.
Languages with methods typically fall into one of two camps: prepare tables for all the method calls statically (as in C++ and Java), or do a method lookup at each call (as in Smalltalk and its many imitators, JavaScript and Python included) and add fancy caching to make that call efficient. Go sits halfway between the two: it has method tables but computes them at run time. I don't know whether Go is the first language to use this technique, but it's certainly not a common one. (I'd be interested to hear about earlier examples; leave a comment below.)
As a warmup, a value of type Binary is just a 64-bit integer
made up of two 32-bit words (like in the last post, we'll assume a 32-bit machine; this time memory
grows down instead of to the right):
Interface values are represented as a two-word pair giving
a pointer to information about the type stored in the interface
and a pointer to the associated data.
Assigning b to an interface value of type Stringer
sets both words of the interface value.
(The pointers contained in the interface value are gray to emphasize that they are implicit, not directly exposed to Go programs.)
The first word in the interface value points at
what I call an interface table or itable (pronounced i-table; in the runtime sources,
the C implementation name is Itab).
The itable begins with some metadata about the types
involved and then becomes a list of function pointers.
Note that the itable corresponds to the interface type,
not the dynamic type. In terms of our example, the itable
for Stringer holding type Binary
lists the methods used to satisfy Stringer,
which is just String: Binary's
other methods (Get) make no appearance
in the itable.
The second word in the interface value points
at the actual data, in this case a copy of b.
The assignment var s Stringer = b makes
a copy of b rather than point at b
for the same reason that var c uint64 = b
makes a copy: if b later changes,
s and c are supposed
to have the original value, not the new one.
Values stored in interfaces might be arbitrarily large,
but only one word is dedicated to holding the value
in the interface structure, so the assignment allocates a
chunk of memory
on the heap and records the pointer in the one-word slot.
(There's an obvious optimization when the value does fit
in the slot; we'll get to that later.)
To check whether an interface value holds a
particular type, as in the type switch above,
the Go compiler generates code equivalent to the
C expression s.tab->type to obtain
the type pointer and check it against the desired type.
If the types match, the value can be copied by by
dereferencing s.data.
To call s.String(), the Go compiler generates
code that does the equivalent of the C expression
s.tab->fun[0](s.data): it calls the appropriate
function pointer from the itable, passing the interface value's
data word as the function's first (in this example, only) argument.
You can see this code if you run 8g -S x.go (details at the bottom of this post).
Note that the function in the itable is being passed the
32-bit pointer from the second word of the interface value, not the 64-bit
value it points at.
In general, the interface call site doesn't know the
meaning of this word nor how much data it points at.
Instead, the interface code arranges that the function
pointers in the itable expect the 32-bit representation
stored in the interface values.
Thus the function pointer in this example is (*Binary).String
not Binary.String.
The example we're considering is an interface with just one method. An interface with more methods would have more entries in the fun list at the bottom of the itable.
Now we know what the itables look like, but where
do they come from? Go's dynamic type conversions mean that
it isn't reasonable for the compiler or linker to precompute all
possible itables: there are too many (interface type, concrete type) pairs,
and most won't be needed.
Instead, the compiler generates a type description structure for each
concrete type like Binary or int or func(map[int]string).
Among other metadata, the type description structure contains a list
of the methods implemented by that type.
Similarly, the compiler generates a (different) type description structure
for each interface type like Stringer; it too contains a
method list.
The interface runtime computes the itable by looking for each method listed
in the interface type's method table in
the concrete type's method table.
The runtime caches the itable after generating it,
so that this correspondence need only be computed once.
In our simple example, the method table for
Stringer has one method, while the table for
Binary has two methods.
In general there might be ni methods
for the interface type and nt methods for the concrete type.
The obvious search to find the mapping from interface
methods to concrete methods would take O(ni × nt) time,
but we can do better.
By sorting the two method tables and walking them
simultaneously, we can build the mapping in O(ni + nt) time instead.
The space used by the implementation described above can be optimized in two complementary ways.
First, if the interface type involved is empty—it has no methods—then the itable serves no purpose except to hold the pointer to the original type. In this case, the itable can be dropped and the value can point at the type directly:
Whether an interface type has methods
is a static property—either the type in the source code says
interface{} or it says interace{ methods... }—so
the compiler knows which representation is in use at each point
in the program.
Second, if the value associated with the interface value
can fit in a single machine word, there's no need to introduce
the indirection or the heap allocation. If we define
Binary32 to be like Binary
but implemented as a uint32,
it could be stored in an interface value
by keeping the actual value in the second word:
Whether the actual value is being pointed at or
inlined depends on the size of the type. The compiler
arranges for the functions listed in the type's method table
(which get copied into the itables) to do the right thing with
the word that gets passed in. If the receiver type fits in a word,
it is used directly; if not, it is dereferenced.
The diagrams show this: in the Binary version
far above, the method in the itable is
(*Binary).String, while in the
Binary32 example, the method
in the itable is Binary32.String
not (*Binary32).String.
Of course, empty interfaces holding word-sized (or smaller) values can take advantage of both optimizations:
Smalltalk and the many dynamic systems that have followed it perform a method lookup every time a method gets called. For speed, many implementations use a simple one-entry cache at each call site, often in the instruction stream itself. In a multithreaded program, these caches must be managed carefully, since multiple threads could be at the same call site simultaneously. Even once the races have been avoided, the caches would end up being a source of memory contention.
Because Go has the hint of static typing to go along with the dynamic method lookups, it can move the lookups back from the call sites to the point when the value is stored in the interface. For example, consider this code snippet:
1 var any interface{}; // initialized elsewhere
2 s := any.(Stringer); // dynamic conversion
3 for i := 0; i < 100; i++ {
4 fmt.Println(s.String());
5 }
In Go, the itable gets computed (or found in a cache)
during the assignment on line 2; the dispatch for the
s.String() call executed on line 4 is a
couple of memory fetches and a single indirect call
instruction.
In contrast, the implementation of this program in a dynamic language like Smalltalk (or JavaScript, or Python, or ...) would do the method lookup at line 4, which in a loop repeats needless work. The cache mentioned earlier makes this less expensive than it might be, but it's still more expensive than a single indirect call instruction.
Of course, this being a blog post, I don't have any numbers to back up this discussion, but it certainly seems like the lack of memory contention would be a big win in a heavily parallel program, as is being able to move the method lookup out of tight loops. Also, I'm talking about the general architecture, not the specifics o the implementation: the latter probably has a few constant factor optimizations still available.
The interface runtime support is in
$GOROOT/src/pkg/runtime/iface.c.
There's much more to say about interfaces (we haven't
even seen an example of a pointer receiver yet) and the
type descriptors (they power reflection in addition
to the interface runtime) but those will have to wait for future posts.
Supporting code (x.go):
package main
import (
"fmt";
"strconv";
)
type Stringer interface {
String() string
}
type Binary uint64
func (i Binary) String() string {
return strconv.Uitob64(i.Get(), 2)
}
func (i Binary) Get() uint64 {
return uint64(i)
}
func main() {
b := Binary(200);
s := Stringer(b);
fmt.Println(s.String());
}
Selected output of 8g -S x.go:
0045 (x.go:25) LEAL s+-24(SP),BX 0046 (x.go:25) MOVL 4(BX),BP 0047 (x.go:25) MOVL BP,(SP) 0048 (x.go:25) MOVL (BX),BX 0049 (x.go:25) MOVL 20(BX),BX 0050 (x.go:25) CALL ,BX
The LEAL loads the address of s
into the register BX.
(The notation n(SP) describes the
word in memory at SP+n. 0(SP)
can be shortened to (SP).)
The next two MOVL instructions fetch the
value from the second word in the interface and store
it as the first function call argument, 0(SP).
The final two MOVL instructions fetch
the itable and then the function pointer from the itable,
in preparation for calling that function.
When explaining Go to new programmers, I've found that it often helps to explain what Go values look like in memory, to build the right intuition about which operations are expensive and which are not. This post is about basic types, structs, arrays, and slices.
Let's start with some simple examples:

The variable i has type int, represented in memory as a single 32-bit word. (All these pictures show a 32-bit memory layout; in the current implementations, only the pointer gets bigger on a 64-bit machine—int is still 32 bits—though an implementation could choose to use 64 bits instead.)
The variable j has type int32, because of the explicit conversion. Even though i and j have the same memory layout, they have different types: the assignment i = j is a type error and must be written with an explicit conversion: i = int(j).
The variable f has type float, which the current implementations represent as a 32-bit floating-point value. It has the same memory footprint as the int32 but a different internal layout.
Now things start to pick up. The variable bytes has type [5]byte, an array of 5 bytes. Its memory representation is just those 5 bytes, one after the other, like a C array. Similarly, primes is an array of 4 ints.
Go, like C but unlike Java, gives the programmer control over what is and is not a pointer. For example, this type definition:
type Point struct { X, Y int }
defines a simple struct type named Point, represented as two adjacent ints in memory.

The composite literal syntax Point{10, 20} denotes an initialized Point. Taking the address of a composite literal denotes a pointer to a freshly allocated and initialized Point. The former is two words in memory; the latter is a pointer to two words in memory.
Fields in a struct are laid out side by side in memory.
type Rect1 struct { Min, Max Point }
type Rect2 struct { Min, Max *Point }
Rect1, a struct with two Point fields, is represented by two Points—four ints—in a row. Rect2, a struct with two *Point fields, is represented by two *Points.
Programmers who have used C probably won't be surprised by the distinction between Point fields and *Point fields, while programmers who have only used Java or Python (or ...) may be surprised by having to make the decision. By giving the programmer control over basic memory layout, Go provides the ability to control the total size of a given collection of data structures, the number of allocations, and the memory access patterns, all of which are important for building systems that perform well.
With those preliminaries, we can move on to more interesting data types.

(The gray arrows denote pointers that are present in the implementation but not directly visible in programs.)
A string is represented in memory as a 2-word structure containing a pointer to the string data and a length. Because the string is immutable, it is safe for multiple strings to share the same storage, so slicing s results in a new 2-word structure with a potentially different pointer and length that still refers to the same byte sequence. This means that slicing can be done without allocation or copying, making string slices as efficient as passing around explicit indexes.
(As an aside, there is a well-known gotcha in Java and other languages that when you slice a string to save a small piece, the reference to the original keeps the entire original string in memory even though only a small amount is still needed. Go has this gotcha too. The alternative, which we tried and rejected, is to make string slicing so expensive—an allocation and a copy—that most programs avoid it.)

A slice is a reference to a section of an array. In memory, it is a 3-word structure contaning a pointer to the first element, the length of the slice, and the capacity. The length is the upper bound for indexing operations like x[i] while the capacity is the upper bound for slice operations like x[i:j].
Like slicing a string, slicing an array does not make a copy: it only creates a new structure holding a different pointer, length, and capacity. In the example, evaluating the composite literal []int{2, 3, 5, 7, 11} creates a new array containing the five values and then sets the fields of the slice x to describe that array. The slice expression x[1:3] does not allocate more data: it just writes the fields of a new slice structure to refer to the same backing store. In the example, the length is 2—y[0] and y[1] are the only valid indexes—but the capacity is 4—y[0:4] is a valid slice expression. (See Effective Go for more about length and capacity and how slices are used.)
Because slices are multiword structures, not pointers, the slicing operation does not need to allocate memory, not even for the slice header, which can usually be kept on the stack. This representation makes slices about as cheap to use as passing around explicit pointer and length pairs in C. Go originally represented a slice as a pointer to the structure shown above, but doing so meant that every slice operation allocated a new memory object. Even with a fast allocator, that creates a lot of unnecessary work for the garbage collector, and we found that, as was the case with strings above, programs avoided slicing operations in favor of passing explicit indices. Removing the indirection and the allocation made slices cheap enough to avoid passing explicit indices in most cases.
Go has two data structure creation functions: new and make. The distinction is a common early point of confusion but seems to quickly become natural. The basic distinction is that new(T) returns a *T, a pointer that Go programs can dereference implicitly (the black pointers in the diagrams), while make(T, args) returns an ordinary T, not a pointer. Often that T has inside it some implicit pointers (the gray pointers in the diagrams). New returns a pointer to zeroed memory, while make returns a complex structure.

There is a way to unify these two, but it would be a significant break from the C and C++ tradition: define make(*T) to return a pointer to a newly allocated T, so that the current new(Point) would be written make(*Point). We tried this for a few days but decided it was too different from what people expected of an allocation function.
This has already gotten a bit long. Interface values, maps, and channels will have to wait for future posts.
I have spent the last months discovering Forth: learning about it, reading about it, trying 4th, porting pForth to Plan9,... It is a language that I love, since I spent many hours programming my HP49G calculator and I always was a fan of the RPN syntax and simplicity in general.
My last discovery has been retroForth, a minimalistic Forth which runs on a portable virtual machine, Ngaro. When some days ago Google the Go-team presented their new language I was watching Rob's talk and could not stop thinking about launching Ngaro virtual machines in goroutines communicated through channels. So... that's what I did:
With the help of crcx (the original Ngaro author, many thanks!) and after experimenting with a brainfuck virtual machine the porting process was pretty straightforward. The retroForth image is booting without problems and you can already do some fancy things like launching children VMs and communicate their I/O ports through channels. It is a WIP yet. More to come...
Have fun!
I stopped updating my blog eighteen months ago, because I needed to focus on finishing my thesis. Once I did that, I graduated and moved to California and started working on and with Go, which I couldn't write about. Now that Go has been released, I have a variety of things I'd like to say involving Go so I'm reviving the blog. Enjoy.
by newsham (noreply@blogger.com) at November 23, 2009 04:30 AM
by newsham (noreply@blogger.com) at November 22, 2009 04:37 PM
The ninth incarnation of the sp theme for Habari blogs is available with a git(1) clone or pull from git://labs.utopian.net/sp.git. It works atop the latest developer revisions of Habari. An irregularly-attended gzipped tar archive exists.
v9 scrapes away the "Gesso" CSS framework in favor of using the latest (r18) Boilerplate sources directly. Print styles developed in the fork survive in print.css. Larger fonts in the sidebar are the most noticeable of visible changes that also include minute differences in vertical spacing.
by newsham (noreply@blogger.com) at November 22, 2009 04:01 AM
by newsham (noreply@blogger.com) at November 20, 2009 09:09 PM
by newsham (noreply@blogger.com) at November 18, 2009 06:22 PM
by newsham (noreply@blogger.com) at November 18, 2009 05:54 PM
lab 104 - ducts: bi-directional mount channels
Hi there, I want to talk to you about ducts. Are your ducts old fashion, out of date?
In this lab, I'll walk through a little facility I built to allow me to have bi-directional mounts over a single file descriptor channel. The nature of the way I access my Blue Gene environment requires me to do an awful lot over a single communications channel. I wanted the ability to both access the target node's file system while simultaneously exporting mine, while only using a single file descriptor.
To facilitate this, I wrote a little 9P-knowledgeable multiplexor which directs T-msgs to an export thread and R-msgs to a mount thread while multiplexing the responses from those threads back onto the same file descriptor.
The initial implementation of the base mechanism was relatively simple until I had to start considering how to clean things up when I started unmounting directories on one side or both. The liberal use of sys->pipe throughout the code meant it was somewhat difficult to detect when things were finished. I ended up adding a signaling packet to the protocol to notify the remote Duct when I had closed my mount, and then implemented a channel based monitor infrastructure so I could track what was still running and appropriately shut down the various pieces as things unmounted (it is possible to unmount one side without unmounting the other).
The interface is fairly simple, if using the module interface you spawn an uplink, specifying an input file descriptor and an output file descriptor for the connection (which may be the same FD), the path you wish to export and the mount point for where you want to place the imported file hierarchy.
If you are running it from the command line, it uses stdin and stdout for the input and output FDs and you only need specify the export path and mount point.
host1% listen -A tcp!*!1127 duct /usr/ericvh /n/foo
host2% dial -A tcp!host1!1127 duct /usr/inferno /n/bar
by Eric Van Hensbergen (noreply@blogger.com) at November 17, 2009 02:37 AM
lab 101 - limbo B+ tree
This lab is an implementation of a B+ tree. It is code from a much older lab that was overly complicated and buggy. I pulled the B+ tree code out, tried to fix the bugs and clean it up. The interface is roughly the same as dbm(2). Here is a synopsis.
include "btree.m";
btreem := load Btreem Btreem->PATH;
Datum, Btree: import Btreem;
Btree: adt {
create: fn(file: string, perm: int): ref Btree;
open: fn(file: string, flags: int): ref Btree;
fetch: fn(b: self ref Btree, key: Datum): Datum;
delete: fn(b: self ref Btree, key: Datum): int;
store: fn(b: self ref Btree, key: Datum, val: Datum):int;
firstkey: fn(b: self ref Btree): Datum;
nextkey: fn(b: self ref Btree, key: Datum): Datum;
flush: fn(b: self ref Btree);
close: fn(b: self ref Btree);
};
init: fn();
Like Dbm the keys and values are stored as arrays of bytes, and being a B+tree the values are stored only in the leaf nodes. The maximum key and val size is 255 each. The block size is 8192, so the maximum leaf node size is 515 making the minimum branching factor 15. The maximum number of records in an internal nodes assuming an int as key is 630.
The delete is not fully implemented; it doesn't merge nodes.
Here is some example code listing the full contents of a btree.
sys := load Sys Sys->PATH;
btreem := load Btreem Btreem->PATH;
Datum, Btree: import btreem;
btreem->init();
bt := Btree.open("index.bt", Sys->ORDWR);
for(key := bt.firstkey(); key != nil; key = bt.nextkey(key)){
v := bt.fetch(key);
sys->print("%s %s\n", string key, string v);
}
Here is a rough comparison of btree vs. dbm. This simple test is more of a sanity check that the btree doesn't do anything horribly wrong in its implementation (which it did originally, by making a syscall to get the daytime for every block it tried to get).
% awk '{print $1, NR}' < /lib/words | tr -d 'cr' > t1
% >index.bt
% >dbm.pag
% >dbm.dir
% time sh -c 'btree/store < t1'
0l 4.172r 4.172t
% time sh -c 'dbm/store dbm < t1'
0l 35.25r 35.25t
% ls -l dbm.* index.bt
--rw-rw---- M 6 xcs0998 XCS0998 8192 Jun 04 12:38 dbm.dir
--rw-rw---- M 6 xcs0998 XCS0998 1048576 Jun 04 12:38 dbm.pag
--rw-rw---- M 6 xcs0998 XCS0998 770048 Jun 04 12:36 index.bt
% time sh -c 'btree/list > /dev/null'
0l 5.187r 5.187t
% time sh -c 'dbm/list dbm > /dev/null'
0l 9.438r 9.438t
by caerwyn (noreply@blogger.com) at November 17, 2009 02:35 AM
by newsham (noreply@blogger.com) at November 15, 2009 07:29 PM
by newsham (noreply@blogger.com) at November 15, 2009 05:58 PM