Note: This is an experimental planet, if you have a feed you would like to add or you would like your feed to be removed please contact me.
Planet 9
March 10, 2010
March 09, 2010
newsham
Lifelock a scam (and insecure, to boot)
Apple's iron grip on developers
March 08, 2010
research!rsc
Formal Logic Club
I've been laughing at this for weeks. It's too good not to share.
The first rule of Formal Logic Club is you do not prove the consistency of Formal Logic Club.
The second rule of Formal Logic Club is you do not prove the inconsistency of Formal Logic Club.
(Yes, this looks like http://xkcd.com/703/, but it beat xkcd to the punchline by ten years, and it's funnier.)
March 06, 2010
March 05, 2010
catena
nb—search and index notes in files by keyword
[Download a UTF-8 version of this file.]
nb name search index keyword
nb—search and index notes in files by keyword
nb keyword
nb description search keyword path file line acme
Nb searches for the given keyword in each nbindex file listed in
$HOME/nbindexes. If it finds a match, nb prints the path, filename,
and line number of the indexed file in the format acme uses to refer to
lines in a file.
nb file format store index line path file line acme nbindexes
This file is an example. Nb searches all files in the current directory
for lines which begin “^nb ”, and copies them into the file nbindex.
Each match is listed with its filename, and line number, e.g. for acme
to show the line on a right-click of the mouse. It then adds the
path and filename of the current directory’s nbindex file to the file
$HOME/nbindexes.
By convention I use 1 blank line to separate paragraphs after an nb
line, and 2 blank lines to separate nb lines, but nb doesn’t care about
this.
nb nbindex example search keyword
This is the nbindex file for this file.
nb.1:1 name search index keyword
nb.1:5 keyword
nb.1:8 description search keyword path file line acme
nb.1:15 file format store index line path file line acme nbindexes
nb.1:28 nbindex example search keyword
nb.1:50 source plan9port rc script
nb.1:70 port plan9port rc shell script grëp
nb.1:75 author jason.catena@gmail.com
nb.1:78 bugs
This is the output of the command “nb keyword”. It lists the full
path since it may refer to files anywhere in the filesystem.
/usr/local/plan9/bin/nb.1:1 name search index keyword
/usr/local/plan9/bin/nb.1:5 keyword
/usr/local/plan9/bin/nb.1:8 description search keyword path file line acme
/usr/local/plan9/bin/nb.1:28 nbindex example search keyword
nb source plan9port rc script
#!/usr/local/plan9/bin/rc
flag e +
catalog=$HOME^'/nbindexes'
if(test -r $catalog && ! ~ $1 ''){
list=`{cat $catalog}
grëp $* $list | sed 's,nbindex:[0-9]+: ,,'
}
grep -n '^nb ' * >[2]/dev/null | sed 's,: ?nb +, ,' > nbindex
echo `{pwd}^'/nbindex' >> $catalog
if(~ $TMPDIR ''){
TMPDIR=/var/tmp
}
tmp=$TMPDIR^'/nbindexes.'^$USER^'.'^$pid
sort -u $catalog > $tmp
mv $tmp $catalog
nb port plan9port rc shell script grëp
Nb in written in plan9port’s rc, but should be straightforward to port
to any shell. Use grep instead of grëp.
nb author jason.catena@gmail.com
nb bugs
Nb generates the index after it searches, to present results immediately,
so it does not show changes since its last run. If you want to see
the latest changes, then run nb with no parameters before you supply a
keyword (e.g. in acme, middle-click on nb after you edit an nb line or
following text).

research!rsc
UTF-8: Bits, Bytes, and Benefits
UTF-8 is a way to encode Unicode code points—integer values from 0 through 10FFFF—into a byte stream, and it is far simpler than many people realize. The easiest way to make it confusing or complicated is to treat it as a black box, never looking inside. So let's start by looking inside. Here it is:
| Unicode code points | UTF-8 encoding (binary) | ||
|---|---|---|---|
| 00-7F | (7 bits) | 0tuvwxyz | |
| 0080-07FF | (11 bits) | 110pqrst 10uvwxyz | |
| 0800-FFFF | (16 bits) | 1110jklm 10npqrst 10uvwxyz | |
| 010000-10FFFF | (21 bits) | 11110efg 10hijklm 10npqrst 10uvwxyz | |
The convenient properties of UTF-8 are all consequences of the choice of encoding.
- All ASCII files are already UTF-8 files.
The first 128 Unicode code points are the 7-bit ASCII character set, and UTF-8 preserves their one-byte encoding. - ASCII bytes always represent themselves in UTF-8 files. They never appear as part of other UTF-8 sequences.
All the non-ASCII UTF-8 sequences consist of bytes with the high bit set, so if you see the byte 0x7A in a UTF-8 file, you can be sure it represents the characterz. - ASCII bytes are always represented as themselves in UTF-8 files. They cannot be hidden inside multibyte UTF-8 sequences.
The ASCIIz01111010 cannot be encoded as a two-byte UTF-8 sequence 11000001 10111010. Code points must be encoded using the shortest possible sequence. A corollary is that decoders must detect long-winded sequences as invalid. In practice, it is useful for a decoder to use the Unicode replacement character, code point FFFD, as the decoding of an invalid UTF-8 sequence rather than stop processing the text. - UTF-8 is self-synchronizing.
Let's call a byte of the form 10xxxxxx a continuation byte. Every UTF-8 sequence is a byte that is not a continuation byte followed by zero or more continuation bytes. If you start processing a UTF-8 file at an arbitrary point, you might not be at the beginning of a UTF-8 encoding, but you can easily find one: skip over continuation bytes until you find a non-continuation byte. (The same applies to scanning backward.) - Substring search is just byte string search.
Properties 2, 3, and 4 imply that given a string of correctly encoded UTF-8, the only way those bytes can appear in a larger UTF-8 text is when they represent the same code points. So you can use any 8-bit safe byte at a time search function, likestrchrorstrstr, to run the search. - Most programs that handle 8-bit files safely can handle UTF-8 safely.
This also follows from Properties 2, 3, and 4. I say “most” programs, because programs that take apart a byte sequence expecting one character per byte will not behave correctly, but very few programs do that. It is far more common to split input at newline characters, or split whitespace-separated fields, or do other similar parsing around specific ASCII characters. For example, Unix tools like cat, cmp, cp, diff, echo, head, tail, and tee can process UTF-8 files as if they were plain ASCII files. Most operating system kernels should also be able to handle UTF-8 file names without any special arrangement, since the only operations done on file names are comparisons and splitting at/. In contrast, tools like grep, sed, and wc, which inspect arbitrary individual characters, do need modification. - UTF-8 sequences sort in code point order.
You can verify this by inspecting the encodings in the table above. This means that Unix tools like join, ls, and sort (without options) don't need to handle UTF-8 specially. - UTF-8 has no “byte order.”
UTF-8 is a byte encoding. It is not little endian or big endian. Unicode defines a byte order mark (BOM) code point FFFE, which are used to determine the byte order of a stream of raw 16-bit values, like UCS-2 or UTF-16. It has no place in a UTF-8 file. Some programs like to write a UTF-8-encoded BOM at the beginning of UTF-8 files, but this is unnecessary (and annoying to programs that don't expect it).
UTF-8 does give up the ability to do random access using code point indices. Programs that need to jump to the nth Unicode code point in a file or on a line—text editors are the canonical example—will typically convert incoming UTF-8 to an internal representation like an array of code points and then convert back to UTF-8 for output, but most programs are simpler when written to manipulate UTF-8 directly.
Programs that make UTF-8 more complicated than it needs to be are typically trying to be too general, not wanting to make assumptions that might not be true of other encodings. But there are good tools to convert other encodings to UTF-8, and it is slowly becoming the standard encoding: even the fraction of web pages written in UTF-8 is nearing 50%. UTF-8 was explicitly designed to have these nice properties. Take advantage of them.
For more on UTF-8, see “Hello World or Καλημέρα κόσμε or こんにちは 世界,” by Rob Pike and Ken Thompson, and also this history.
Notes: Property 6 assumes the tools do not strip the high bit from each byte.
Such mangling was common years ago but is very uncommon now.
Property 7 assumes the comparison is done treating
the bytes as unsigned, but such behavior is mandated
by the ANSI C standard for memcmp,
strcmp, and strncmp.
March 04, 2010
newsham
Youtube education
<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/Kj9-kRv0e6s&hl=en_US&fs=1&"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/Kj9-kRv0e6s&hl=en_US&fs=1&" type="application/x-shockwave-flash" width="320"></embed></object>
<object height="295" width="480"><param name="movie" value="http://www.youtube.com/v/7SFywA_LQuU&hl=en_US&fs=1&"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="295" src="http://www.youtube.com/v/7SFywA_LQuU&hl=en_US&fs=1&" type="application/x-shockwave-flash" width="480"></embed></object>
OS Inferno
[софт] mySQL-клиент
March 02, 2010
March 01, 2010
February 27, 2010
newsham
Flash Sucks
by newsham (noreply@blogger.com) at February 27, 2010 08:43 AM
February 26, 2010
February 25, 2010
newsham
Model SU-27 jet
<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/6IKnZ_tkecU&hl=en_US&fs=1&"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/6IKnZ_tkecU&hl=en_US&fs=1&" type="application/x-shockwave-flash" width="320"></embed></object>
But as impressive as the video is, it doesn't come close to the build log he has on his website. It's just incredibly amazing. It's 83 pages long and every time you think he's getting close to being done, or can't possibly make it more detailed, he does. The process includes him making a metal model of the skin (complete with fake rivet holes), making carbon graphite forms, building a frame, putting the skin on the frame, making plastic models for the various weapons and antennas and cockpit parts, putting together landing gear systems and installing jet engines and fuel tanks. Just completely amazing!
by newsham (noreply@blogger.com) at February 25, 2010 08:08 PM
February 24, 2010
newsham
Hacker v. Bankers, Round 1
The public is eating it up, despite the legality of the hacks.Data leaked so far includes pay details of managers from a Latvian bank that received a bail-out. It reveals that many did not take the salary cuts they promised. Other data shows that state-owned companies secretly awarded bonuses while publicly asking the government for help.
by newsham (noreply@blogger.com) at February 24, 2010 08:40 PM
February 23, 2010
newsham
Chip and Pin
<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/B80SyRmtbdI&hl=en_US&fs=1&"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/B80SyRmtbdI&hl=en_US&fs=1&" type="application/x-shockwave-flash" width="320"></embed></object>
by newsham (noreply@blogger.com) at February 23, 2010 08:07 PM
Security and the Corporation
We are laser-focused on increasing sales while cutting costs. You seem to be focused on other things, like security. You're not laser-focused on what matters to us, and that's OK.and
One final quote before I wrap: "[T]op information security specialists have been saying for years that our current infrastructure is at grave risk." For the sake of argument, I'll accept that the folks saying that really are top experts. And you're right. They've been crying wolf for years. No one's used a cross-site scripting attack to take down the world financial system. No one's blown themselves up in Twitter's headquarters. That failure to throw money at security didn't lead to I-35 collapsing. Millions of Americans lack health care. Your unemployment money is running out. So I'm just going to assume that you, and they, are right. Our information infrastructure is one of the many things we could invest in.Computer security is but one of many things that a corporation thinks about, and honestly, it's not often very high on their list. Our collective inability to "fix" the computer security problem is not a technical problem (although there are definitely technical aspects to it). It is a social challenge. It's a challenge of allocating resources between competing investments... ie. an economics problem. We can't really discuss the problem and its solutions properly until we accept that.
If we, as a society, really think that the computer security of our national infrastructure is really that important (do we?) then we can provide economic incentives to make sure that corporations treat it with the same importance that we think it deserves. For example, we could allow consumers to externalize their costs due to a computer security breach by making the corporations liable for damages.
Heartland is still here. TJX had their best quarters ever after they got hacked.We could make software vendors liable for defects in their software that left corporations vulnerable to attack. You might not agree with these particular examples, but I hope you can at least see that the solution must be economic in nature. After all, computer science is really a social science.
by newsham (noreply@blogger.com) at February 23, 2010 06:24 PM
research!rsc
Off to the Races
Go is defined to be a safe language. Indices into array or string references must be in bounds; there is no way to reinterpret the bits of one type as another, no way to conjure a pointer out of thin air; and there is no way to release memory, so no chance of “dangling pointer” errors and the associated memory corruption and instability.
In the current Go implementations, though, there are two ways to break through these safety mechanisms. The first and more direct way is to use package unsafe, specifically unsafe.Pointer. The second, less direct way is to use a data race in a multithreaded program.
If you were going to build an environment that ran untrusted Go code, you'd probably want to change the available packages to restrict or delete certain routines, like os.RemoveAll, and you'd want to disallow access to package unsafe. Those kinds of restrictions are straightforward.
The data races that can be used to break through the usual memory safety of Go are less straightforward. This post describes the races and how to rearrange the data structures involved to avoid them. Until the Go implementations have been tuned more, we won't be able to measure whether there is a significant performance difference between the current representation and the race-free implementation.
Package Unsafe
Here's a simple packaging of a type that lets you edit arbitrary memory locations, built using the standard package unsafe:
import "unsafe"
type Mem struct {
addr *uintptr // actually == &m.data!
data *uintptr
}
// Peek reads and returns the word at address addr.
func (m *Mem) Peek(addr uintptr) uintptr {
*m.addr = addr
return *m.data
}
// Poke sets the word at address addr to val.
func (m *Mem) Poke(addr, val uintptr) {
*m.addr = addr
*m.data = val
}
func NewMem() *Mem {
m := new(Mem)
m.addr = (*uintptr)(unsafe.Pointer(&m.data))
return m
}
(The Go type uintptr is an unsigned integer
the size of a pointer, like uint32 or uint64
depending on the underlying machine architecture.)
The key line is near the bottom, the use of
the special type unsafe.Pointer to convert a
**uintptr into a *uintptr.
Dereferencing that value gives us m.data (actually a *uintptr)
interpreted as a uintptr.
We can assign an arbitrary integer to *m.addr, that changes m.data,
and then we can dereference the integer as *m.data.
In other words, the Mem struct
gives us a way to convert between integers and pointers, just like in C.
There are no races here: this is just something you
can do by importing unsafe.
The Mem wrapper is a bit convoluted—normally you'd just use unsafe directly—but
we're going to drop in a different implementation of NewMem
that doesn't rely on unsafe.
A Race
The current Go representation of slices and interface values admits a data race: because they are multiword values, if one goroutine reads the value while another goroutine writes it, the reader might see half of the old value and half of the new value.
Let's provoke the race using interface values. In Go, an interface value is represented as two words, a type and a value of that type. After these declarations:
var x *uintptr
var i interface{} = &x
var j interface{} = (*uintptr)(nil)
The data structures for i and j look like:
Suppose we kick off a goroutine that alternately assigns i and j
to a new interface value k:
var k interface{}
func hammer() {
for {
k = i
k = j
}
}
After each statement executes, k will look like either i or j, but during the assignment, there will be a moment when k is half i and half j, one of these:
The top case gives us a **uintptr nil, which
we could obtain more easily via legitimate means,
but the bottom case gives us the value &x (actually
a **uintptr) interpreted as a *uintptr.
If we can catch the interface when it looks like the case on the right,
we'll have rederived the conversion we used above via unsafe.
Based on that insight, we can rewrite NewMem without unsafe:
func NewMem() *Mem {
fmt.Println("here we go!")
m := new(Mem)
var i, j, k interface{}
i = &m.data
j = (*uintptr)(nil)
// Try over and over again until we win the race.
done := false
go func(){
for !done {
k = i
k = j
}
}()
for {
// Is k a non-nil *uintptr? If so, we got it.
if p, ok := k.(*uintptr); ok && p != nil {
m.addr = p
done = true
break
}
}
return m
}
The same kind of race happens in all of Go's mutable multiword structures: slices, interfaces, and strings. In the case of slices, the trick is to get a pointer from one slice and a cap from a different one. In the case of strings, the trick is to get a pointer from one string and the len from a different one. (The string race isn't as interesting, because strings cannot be written to, so it would only let you read memory, not write it.)
The Fix
The race is fundamentally caused by being able to observe partial updates to Go's multiword values (slices, interfaces, and strings): the updates are not atomic.
The fix is to make the updates atomic. In Go, the easiest way to do that is to make the representation a single pointer that points at an immutable structure. When the value needs to be updated, you allocate a new structure, fill it in completely, and only then change the pointer to point at it. This makes the assignment atomic: another goroutine reading the pointer at the same time sees either the new data or the old data, but not a mix, assuming the compiler is careful to read the pointer just once and then access both fields using the same pointer value.
(The red border indicates immutable data.)
For slices and strings, it makes sense to keep the multiword representation but put an immutable ”pointer and cap” stub structure between the slice and the underlying array. This keeps the same basic efficiency properties of slices at the cost of a few extra instructions on each indexing operation.
The idea here is to keep a structure with a mutable offset and length to support efficient slicing but replace the pointer with an immutable base+length pair. Any access to the underlying data must check the final offset against the immutable cap. Copying slice values is still not an atomic operation, but an invalid len will not keep an out-of-bounds index from being caught.
This representation requires a couple more assembly instructions, because each index must be checked against two bounds, first the relative len and then the absolute cap:
Compute x[i] in AX | ||||
|---|---|---|---|---|
| Racy | Race-free | |||
LEAL x, SI MOVL i, CX CMPL CX, 4(SI) JGE panic MOVL 0(SI), DI MOVL (4*CX)(DI), AX | LEAL x, BX MOVL i, CX CMPL CX, 4(BX) JGE panic ADDL 8(BX), CX MOVL 0(BX), SI CMPL CX, 4(SI) JGE panic MOVL 0(SI), DI MOVL (4*CX)(DI), AX | // i >= len? // i+off >= cap? // &x[0] -> SI // x[i] (or x[i+off]) -> DI | ||
With suitable analysis, an optimizing compiler
could cache 0(BX), 4(BX), 8(BX),
and 4(SI), so in a loop, it is possible that the
new representation would run at the same speed
as the original.
An ambitious implementation might continue to use the current data structures for slices, interfaces, and strings stored on the stack, because data on the stack can only be accessed by the goroutine running on that stack. (Local variables whose addresses might escape to other goroutines are already allocated on the heap automatically, to avoid dangling pointer bugs after a function returns.)
Garbage Collection
This fix is feasible only because Go is a garbage-collected language: we can treat the red stub structures as immutable and trust that the garbage collector will recycle the memory only when nothing points to them anymore. It's much harder to build a safe language without a garbage collector to fall back on.
Security Implications
It is important to note that these races do not make the current implementations any less secure than they already are. The races allow clever programmers to subvert Go's memory safety, but a less clever programmer can still use the aptly-named package unsafe.
These races only matter if you are trying to build a Go service that can safely run arbitrary code supplied by untrusted programmers (and to the best of my knowledge, there are no such services yet). In that situation, you'd already need to change the implementations to disable access to the unsafe package and remove or restrict functions like os.Remove or net.Dial. Changing the data representations to be race free is just one more change you'd have to make. Now you know, not just that a change is needed but also what the change is.
The races exist because the data representations were chosen for performance: the race-free versions introduce an extra pointer, which carries with it the cost of extra indirection and extra allocation. Once the Go implementations are more mature, we'll be able to evaluate the precise performance impact of using the race-free data structures and whether to use them always or only in situations running untrusted code.
February 22, 2010
newsham
GOOG hacker
Edit: The link above seems to be a pay link. I got to it for free by clicking from the ft.com home page, but I think you have a limited number of free articles on that site. You can check out the story from other news sites such as
http://www.lbc.co.uk/us-tracks-down-chinese-google-hacker-20658
by newsham (noreply@blogger.com) at February 22, 2010 11:03 PM
Fuel cells for your home?
<embed allowfullscreen="true" flashvars="linkUrl=http://www.cbsnews.com/video/watch/?id=6228923n&releaseURL=http://cnettv.cnet.com/av/video/cbsnews/atlantis2/player-dest.swf&videoId=50083943&partner=news&vert=News&si=254&autoPlayVid=false&name=cbsPlayer&allowScriptAccess=always&wmode=transparent&embedded=y&scale=noscale&rv=n&salign=tl" height="324" pluginspage="http://www.macromedia.com/go/getflashplayer" src="http://cnettv.cnet.com/av/video/cbsnews/atlantis2/player-dest.swf" type="application/x-shockwave-flash" width="425"></embed>
Watch CBS News Videos Online
by newsham (noreply@blogger.com) at February 22, 2010 06:50 PM
February 21, 2010
February 20, 2010
February 19, 2010
newsham
More on the GOOG hack
by newsham (noreply@blogger.com) at February 19, 2010 08:25 PM
February 18, 2010
newsham
PocketCube

by newsham (noreply@blogger.com) at February 18, 2010 06:53 PM
February 17, 2010
February 16, 2010
February 15, 2010
newsham
Tracer Tee
<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/SXmv8quf_xM&hl=en_US&fs=1&"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/SXmv8quf_xM&hl=en_US&fs=1&" type="application/x-shockwave-flash" width="320"></embed></object>
You can see a more realistic depiction of the attack here: http://www.youtube.com/watch?v=oDtvuCzDZgk
by newsham (noreply@blogger.com) at February 15, 2010 07:31 AM
February 14, 2010
February 13, 2010
February 12, 2010
February 11, 2010
February 10, 2010
catena
Update Go scanner to accept non-ASCII operators
Read the discussion of this change on the golang-nuts mailing list. See this notice in UTF-8 format.
For each line, the scanner accepts, in place of the first operator, any of the remaining operators, and outputs a token whose string (set in $GOROOT/src/go/token/token.go) is the first operator.
! ¬
!= ≠
& ∧
&& ⋀
&= ∧=
&^ ∧¬
&^= ∧¬=
… …
:= ≔
<- ←
<< ≪
<<= ≪=
<= ≤
== =? ≟
>= ≥
>> ≫
>>= ≫=
^ ⊻
^= ⊻=
| ∨
|= ∨=
|| ⋁
To install, copy $GOROOT/src/pkg/go/scanner/scanner.go to another file.
Replace scanner.go with scanner.go
Run $GOROOT/src/all.bash and check for 0 unexpected errors.
Changes to scanner.go update gofmt, which accepts UTF-8 operators and outputs their ASCII equivalents. This mkfile production rule uses gofmt as a preprocessor to create a sharable and compilable file.
%.go: %.ℊℴ
cat $stem.ℊℴ | gofmt > $stem.go
See these files for an example of each new operator form in the context of a simple Go program.
Please mail me privately if something doesn’t work with this code, to avoid noise on the golang-nuts list, since we’re no longer discussing officially-released code.

February 09, 2010
February 07, 2010
February 06, 2010
February 05, 2010
Quod Erat Dragon
More reading material
http://www.cs.rutgers.edu/~ccshan/prepose/prepose.pdf -- Kiselyov, Oleg and Shan, Chung-chieh. Functional Pearl: Implicit Configurations —or, Type Classes Reflect the Values of Types.
http://www.cs.rutgers.edu/~ccshan/tagless/jfp.pdf -- Carette, Jacques and Kiselyov, Oleg and Shan, Chung-chieh. Finally Tagless, Partially Evaluated, Tagless Staged Interpreters for Simpler Typed Languages.
PL Theory:
http://www.cis.upenn.edu/~bcpierce/papers/lti-toplas.pdf -- Pierce, Benjamin C. and Turner, David N. Local Type Inference.
http://www.cis.upenn.edu/~bcpierce/FOOL/papers9/shields.pdf -- Shields, Mark and Jones, Simon Peyton. First-Class Modules for Haskell
Formally verified compilers:
http://www.cl.cam.ac.uk/~mom22/jit/jit.pdf -- Myreen, Magnus O. Verified Just-In-Time Compiler on x86.
Logic programming:
http://www.cs.sunysb.edu/~tswift/webpapers/tplp-submit.pdf -- Swift, Terrance and Warren, David S. XSB: Extending Prolog with Tabled Logic Programming
February 04, 2010
newsham
McCain flip-flop
The day that the leadership of the military comes to me and says "senator, we ought to change the policy" then I think we ought to consider seriously changing it.But today when the leadership of the military asks to repeal don't ask/don't tell:
I'm deeply disappointed in your statement [...] In this moment of immense hardship for the armed services we should not be looking to overturn don't ask/don't tell.There's also a good op-ed by John Oliver on why we should have a don't ask/don't tell policy on people approaching old age.
| The Daily Show With Jon Stewart | Mon - Thurs 11p / 10c | |||
| A Few Gay Men & Women | ||||
| <embed allowfullscreen="true" allownetworking="all" allowscriptaccess="always" bgcolor="#000000" flashvars="autoPlay=false" height="301" src="http://media.mtvnservices.com/mgid:cms:item:comedycentral.com:263464" style="display: block;" type="application/x-shockwave-flash" width="360" wmode="window"></embed> | ||||
| ||||
by newsham (noreply@blogger.com) at February 04, 2010 06:08 PM
Western Union Man, and other great 419 heros

That's a picture of an actual scammer that the scam baiter got him to take of himself.
If you're looking for something elaborate, check out http://www.419eater.com/html/SkeletonCoast/safari2.html.
by newsham (noreply@blogger.com) at February 04, 2010 05:48 PM
research!rsc
Names
Every programmer has a variable naming philosophy. This is mine:
A name's length should not exceed its information content. For a local variable, the name
iconveys as much information asindexoridxand is quicker to read. Similarly,iandjare a better pair of names for index variables thani1andi2(or, worse,index1andindex2), because they are easier to tell apart when skimming the program. Global names must convey relatively more information, because they appear in a larger variety of contexts. Even so, a short, precise name can say more than a long-winded one: compareacquireandtake_ownership. Make every name tell.
The information content metric gives a quantitative argument against long-winded names: they're simply inefficient. I internalized this metric years ago but only realized this phrasing of it recently, perhaps because I have been looking at too much Java code.
February 03, 2010
OS Inferno
Inferno на BeagleBoard
newsham
A dare
by newsham (noreply@blogger.com) at February 03, 2010 03:34 AM
February 02, 2010
newsham
Firefox, Chinese root cert, DNS and trust
by newsham (noreply@blogger.com) at February 02, 2010 07:54 PM
research!rsc
Go's Package Name Space
Go organizes programs into individual pieces called packages.
A package gets to pick a short name for itself, like vector,
even if the import statement must use a longer path like "container/vector"
to name the file where the compiled code is installed.
The early Go compilers used the package name as a unique identifier
during linking, so that vector's New function
could be distinguished from list's New.
In the final binary, one was vector.New and the other list.New.
As we started to fill out the standard library, it became clear that we needed
to do something about managing the package name space:
if multiple packages tried to be package vector,
their symbols would collide in the linker.
For a while we considered segmenting the name space,
reserving lower-case names for standard packages
and upper-case names for local packages.
(Since package names and object file names are
conventionally the same, one reason not to do this
is that it would require a case-sensitive file system.)
Other languages simply use longer names.
Both Java and Python tie the name to the directory in which
the package is found, as in com.java.google.WebServer
for the code in com/java/google/WebServer.class.
In practice this leads to unnecessarily long identifiers, something
Go tries to avoid.
It also ties the name to a particular mechanism for finding code: a file system.
One of the reasons that import paths are string constants in Go is
so that it is easy to substitute other notations, like URLs.
Last spring, during a long discussion about how to divide up the package name space, Robert Griesemer cut the Gordian knot by suggesting that we allow multiple packages to choose a single name and fix the tool chain to cope. The import statement already allows introducing a local alias for the package during the import, so there's no linguistic reason package names have to be unique. We all agreed that this was the right approach, but we weren't sure how to implement it. Other considerations, like the open source release, took priority during most of 2009, but we recently returned to the problem.
Ultimately, the linker needs some unique name for each symbol in the program; the fundamental problem caused by deciding that package names won't be unique is to find another source of uniqueness that fits into the tool chain well.
The best approach* seems to be to use
the package's import path as the unique identifier,
since it must uniquely identify the package in the
import statement already.
Then container/vector's New is
container/vector.New.
But! When you're compiling a package,
how does the compiler know what the package's import path will be?
The package statement just says vector, and while
every compilation that imports "container/vector" knows the import path,
the compilation of vector itself does not,
because compilation is handled separately from
installing the binary in its final, importable location.
Last week I changed the gc compiler suite to do this.
My solution to the import path question
was to introduce a special
name syntax that refers to “this package's import path.”
Because the import paths
are string literals in the Go compiler metadata, I chose the empty string—""—as the
self-reference name.
Thus, in the object file for package vector, the local symbol
New is written "".New.
When the linker reads the object file, it knows what
import path it used to find the file. It substitutes that path
for the "", producing, in this case, the unique name
container/vector.New.
Not embedding a package's final installed location in its object file makes the object files easy to move and duplicate. For example, consider this trivial package:
package seq
var n int
func Next() int {
n++
return n
}
It's valid for a Go program to import the same path multiple times using different local names, but all the names end up referring to the same package:
package main
import (
"fmt"
s "seq" // changed to "seq1" later
t "seq"
)
func main() {
fmt.Println(s.Next(), s.Next(), t.Next(), t.Next())
}
prints 1 2 3 4, because it all four calls are to the same Next function:
$ 6g seq.go $ 6g -I. main.go $ 6l -L. main.6 $ 6.out 1 2 3 4 $
But if we change one of the imports to say "seq1"
and then merely copy the "seq" binary to "seq1",
we've created a distinct package, using lowly cp instead of a compiler:
$ cp seq.6 seq1.6 $ ed main.go 120 /seq s "seq" s/seq/seq1 s "seq1" wq 121 $ 6g -I. main.go $ 6l -L. main.6 $ 6.out 1 2 1 2 $
Now the s.Next calls refer to seq1.6's Next,
while the t.Next calls refer to seq.6's Next.
Duplicating the object actually duplicated the code.
This is very different from the behavior of a traditional C compiler and linker.
A digression: the explicit "". prefix is not strictly necessary.
It would be cleaner
if the linker treated every symbol as needing to be qualified by the
import path, so that all the "". could be dropped.
But occasionally it's important to be able to break the rules,
for example to define a symbol that is logically in one package
be implemented in another. For example,
the implementation of unsafe.Reflect is actually
in the binary for package runtime,
because that's where all the interface manipulation code lives:
$ 6nm pkg/darwin_amd64/runtime.a|grep Reflect iface.6: T unsafe.Reflect $
Another reason to use an explicit prefix is to admit
names with no prefix at all, as would be generated by legacy C code.
Otherwise, what should C's printf be in?
If the linker enforced a strict boundary between packages,
both of these examples would be impossible.
Most of the time that would be a good thing, but
systems languages do not have the luxury of stopping
at “most of the time.”
Last October, a few weeks before the public release of Go,
I changed the linker to insert import path qualifiers on all names
during linking, but it was too disruptive a change to commit
before the release.
Last week's implementation, which allows for semipermeable package boundaries,
is a much better fit for Go.
This week Ian Lance Taylor is working on eliminating the
global package name space assumption in gccgo.
He'd like to avoid making changes to the linker,
which rules out introducing a “this package” notation like "".
Gccgo must be able to write objects that know their own import paths,
which means gccgo must know the import path at compile time.
But how? There will be a new gccgo command line option, and the build system
will simply tell the compiler what the import path is.
In retrospect, I wonder if the effort of "" in
the gc tool chain was justified compared to adding an option.
The gc implementation is easier to use,
but it's not clear how important that will be.
Time will tell.
* An alternative approach would be to generate a
random identifier each time the compiler
is invoked and to use it for the package compiled by that run.
When other packages import the compiled package, they can
read the identifier and use it to generate references to that
package's symbols.
The most glaring problem with this approach is that
the symbol names you'd see while debugging would
be ugly, like mangled C++ names but worse.
Another problem is that it would break aggressive
incremental compilation: if fmt gets recompiled,
all packages that import it would have to be recompiled to pick
up the new identifier, even if the external interface hadn't changed.
It would be nice to avoid those recompilations, especially in large programs.
February 01, 2010
January 31, 2010
January 29, 2010
January 28, 2010
Eric Van Hensbergen
9P Overview Slides
MP4 Video Link or Livestream below:
<object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" height="340" id="preview-player1" width="340"><param name="movie" value="http://cdn.livestream.com/grid/LSPlayer.swf?channel=graverobbers&clip=pla_5da534c4-5d1f-460e-adda-836aee4b514e&autoPlay=false"><param name="allowScriptAccess" value="always"><param name="allowFullScreen" value="true"><embed allowfullscreen="true" allowscriptaccess="always" height="340" id="preview-player" src="http://cdn.livestream.com/grid/LSPlayer.swf?channel=graverobbers&clip=pla_5da534c4-5d1f-460e-adda-836aee4b514e&autoPlay=false" type="application/x-shockwave-flash" width="340"></embed></object>
by Eric Van Hensbergen (noreply@blogger.com) at January 28, 2010 03:27 PM
9gridchan
Saturday, 3 January 2010 - coming soon, updated Qemu image with many new tools
Saturday, 3 January 2010 - coming soon, updated Qemu image with many new tools
Now that recent projects like writable /proc/pid/ns and rootless post kernel load startup have been announced on 9fans, its time to make a new version of the Qemu preinstalled image with all the new stuff. It will include the latest updates of the previous grid software as well as hubfs, grio, and the new startup process - all available optionally, along with the traditional default kernels and configs. However, I am going to try to make the customized environment as compelling as possible, because I'd like to make a good case for the idea that using multiple system nodes, virtualized or not, has practical benefits. Some goals:
multi-machine aware startup to bind in fileserver and cpu resources early if available
easier to make use of 9gridchan.org hosted resources - and 9gridchan registry needs more real services
easy to set up highly unconventional namespace structures coexisting on a single machine (rootless boot helps)
simple, scripted data replication and machine backup operations
easy bootstrapping from grid of VMs to physical machines or mixed grid
Release goal is by the end of January, with a rough spin available for interested testers a week before or so.
Wednesday 21 October 2009 - Overdue Hubfs announcement - improved "screen" utility
Wednesday 21 October 2009 - Overdue Hubfs announcement - improved "screen" utility
Since this has been up on sources for several months now, I really should have announced this here already. Hubfs is an improved screen-like tool for Plan 9, implemented as a 9p fs with client. It is much more resource efficient than iosrv and, as a 9p fs, much better integrated into the overall system. The user interface is also much more sane. It can be installed from the source tarball contrib/mycroftiv/hubfs.tgz, or with contrib/install mycroftiv/hubfs. Basic usage is just 'hub NAME' to either start a new shell/hub session or connect to an existing. The manpage explains most of the details. Following completion of a solid version of hubfs I hit the burnout threshold as a result of several months of 16 hours a day of Plan 9 and retreated to a remote corner of namespace for a couple months. With the arrival of the winter months, however, I have a good excuse to turn on all my computers in the service of home heating, and undoubtedly this will result in a renewed burst of development energy.
Wednesday 8 July 2009 - Announcing Iosrv - "screen without a screen"
Wednesday 8 July 2009 - Announcing Iosrv - "screen without a screen"
After several weeks of isolation in their underground lab, the /9/grid's craziest researchers are now unleashing upon the world our latest weapon in the struggle for an open collaborative decentralized worldwide grid:
Iosrv.tgz
Available on Bell Labs sources - contrib/install mycroftiv/iosrv or as a tarball contrib/mycroftiv/iosrv.tgz to mk install at your pleasure, or here - this is what happened when we decided to get serious about screen-like functionality for Plan 9 while also trying to pursue some tantalizing hints about some of Doug McIlroy's original ideas that led to the creation of UNIX pipes. Iosrv (which is controlled via a wrapper scrip called io for both the clients and server) does not emulate a terminal in the manner GNU screen does - instead, it follows the Plan 9 principle of staying close to file i/o fundamentals and provides multiplexed buffered pipes that can be attached to the three standard file descriptors used by textual environment programs such as the rc shell.
As a consequence of this, we have some new tricks, and here's the best one:
The clients attached to an iosrv can all share locally executing shells with each other in addition to using the shells on the iosrv host
The standard model of usage that corresponds to how GNU screen works is for clients to attach to an existing iosrv in an imported /srv (started with a command like io rcjul15 on the iosrv remote host ) using the io wrapper script:
io SRVNAME
This command connects to an existing /srv/srvname. Once connected, additional commands will be trapped by the client. To create and attach to new shells hosted by the iosrv remote machine with the command remote # where '#' is a multiple of 3. The first new shell might be created with:
remote 3
followed by subsequent shells created by replacing '3' with successively larger multiples of 3 as the number of backed shells increases. (Each shell uses 3 file descriptors) To move between shells after they have been created, attach # will connect to previously created rc shell Hubs.
attach 0
will move back to the set of file descriptors beginning at Hub 0. (More on Hubs in a moment). The new trick is to start a new shell like this:
local 6
which will start a new rc shell on the CLIENT machine, but connect its i/o file descriptors to the remote iosrv HOST. The consequence is that the locally executing rc becomes shared back to the iosrv, and other clients of the remote host can attach to it using
attach 6
and the shell running on your machine is available to them exactly as a shell running on the iosrv host. In other words, iosrv really does 'serve i/o' and doesn't care what's considered the client or server from the viewpoint of the user, it just pipes data back and forth to whatever its pipe Hubs are connected to.
Ok, there's that word again - Hub - the Hub is the low level data structure iosrv uses to organize connections. Following Pike's classic dictum that "data dominates", we are offering a new variant of an old abstraction, the pipe. A Hub is a pipe, but it has multiple nozzles on each end. It accepts multiple simultaneous readers and writers and allows them to connect and disconnect independently to and from the flow of data in real time. The core iosrv knows nothing of rc, or shells, it simply manages a set of file descriptors connected to Hubs. A standard shell has 3 open file descriptors, so 3 Hubs allow an arbitrary number of clients to all make use of the same shell. Adding a new shell simply means creating 3 new Hubs and then starting an rc using file descriptors managed by that iosrv. This why standard attach, remote, and local commands always take a multiple of 3 as their numeric parameter.
On the client side, a smaller program called ioshell navigates the forest of piped file descriptors for the user, acting to connect the file descriptors of the initiating client to the Hubs provided by iosrv. It also monitors user input for command strings and then takes actions or passes them to iosrvs ctl file as appropriate.
The iosrv is completely parallel, forking off independent processes for reading and writing each connected file descriptor. It is not uncommon for a busy iosrv to be coordinating 50-100 running processes and holding double that many file descriptors. This will not however clog the host system, because all processes use QLocks and/or blocking reads to control execution, yield execution frequently, and each Hub has individually tunable process cycle sleeptime. Testing shows that even a low resource Qemu based virtual machine can easily handle multiple simultaneous clients and servers to multiple local and remote rc shells.
The iosrv also provides fine-grained control over resources for the user, although the interface to this is a work in progress. Using iosrv for screen-like functionality is its main intended purpose, but the design is deliberately open-ended. It is already possible to use iosrv as a kind of greatly expanded 'tee' for data streams and we will be trying to prototype some multinode applications that pipe data around in constant loops between them.
We have been using iosrv heavily and testing it extensively. It is not perfectly polished but we have found it to be stable, reliable, and functional in its current state. Suggestions for features and code improvements always welcome.
Look for shared public shell exports as services in the 9gridchan.org 9p service registry soon, also.
Thursday 18 June 2009 - screen functionality in a few lines of rc
Thursday 18 June 2009 - screen functionality in a few lines of rc
This is an extension of the earlier post - this is a pair of scripts intended to be used on nodes setup similarly to those described in 'reversed remote execution' - it allows shared terminals to have persistent state in the background with the ability to be pushed onto the remote clients (which are providing rio as a service).
#! /bin/rc
# start up an rc with its i/o connected to pipes set
# this monitors and its window should be set to scroll if not backgrounded
mkdir /tmp/wpin
mkdir /tmp/wpinclone
mkdir /tmp/wpout1
mkdir /tmp/wpout2
mkdir /tmp/wpout3
bind '#|' /tmp/wpin
bind '#|' /tmp/wpinclone
bind '#|' /tmp/wpout1
bind '#|' /tmp/wpout2
bind '#|' /tmp/wpout3
window -m
rc <{tee /tmp/wpinclone/data < /tmp/wpin/data1} | tee /tmp/wpout1/data /tmp/wpout2/data /tmp/wpout3/data
#! /bin/rc
# attach to a copy of grc using wpout $1 and push to a second wsys also
mount $wsys /mnt/wsys new
bind -b /mnt/wsys /dev
read -m /tmp/wpout$1/data1 | tee /dev/cons &
read -m /dev/cons | tee /tmp/wpin/data &
cat >> /tmp/wpin/data
The usage model is probably not exactly transparent from these scripts - the first script is named grc and it creates a set of pipes and starts an rc with its input and output redirected to them. This is what creates the 'screen' like functionality - any number of processes can control the shell by writing to /tmp/wpin/data, and as many clients can read the output as there are teed output pipes. Window -m is started after binds to provide a seed window - you will probably want to start more than one additional window -m to preserve access to the pipes in the namespace.
The second script I name getgrc and it assumes that you have set $wsys to a traget window system, then starts a new window on it and connects the i/o of both the remote and the local window to the specified output pipe. It should be noted that the quasi-screen functionality is independent of the idea of mounting a remote window system and using i/o from both/either location. If you want a simple screen equivalent, simply making the pipes and starting the rc with i/o redirections allows multiple clients to use it by connecting to the piped files.
Tuesday 16 June 2009 - Reversed remote execution connections and shared sub rios
Tuesday 16 June 2009 - Reversed remote execution connections and shared sub rios
An experience I continually have with Plan 9 is discovering that everything I have ever wanted to do with computers but have never been able to effectively before can be done with about 3 lines of rc and the standard Plan 9 toolkit. Right now I am gleefully exploring what I find to be an incredibly rich and useful set of possibilities created by rio-as-fileserver. First, let me describe the setup I'm using so you can follow along - you want two plan 9 systems for this, with at least one setup as a CPU server. A pair of Qemu VMs will work fine. Let's just assume you have a pair of CPU servers running, machines A and B, each with a graphical display, rio running, and with a standard exportfs listener on port 17007.
You are working using cpu server A, and your friend is using cpu server B. For convenience, we will describe this as if you have full privileges on each other's machines, but these tricks can also work fine between parties with more limited trust with appropriate modifications to how the filesystem exports are set up. Your goal is to use Plan 9 for some lightweight interactive resource sharing for real time communication and collaboration. Each of you opens a subrio within your main rio session. To start off, you need access to some of machine B's resources. You are working using cpu server A, and you start with:
import -c cpuB /srv /n/cpuBsrv
Now you want to write a message to your friend that will appear on his screen in a new window briefly.
wsys = /n/cpuBsrv/rio.friend.68962
window -m rc -c 'echo hi fred && sleep 5'
And up on Fred's screen on machine B pops a window saying 'hi fred' that sticks around for 5 seconds, then vanishes. How and why? Because rio is a fileserver, and the window command mounts whatever rio is specified in the $wsys. The window command is actually an rc script that can show us the principles involved for a lot of nice tricks. How about just popping up a window to talk back and forth:
mount /n/cpuBsrv/rio.friend.68962 /mnt/wsys new
bind -b /mnt/wsys /dev
cat /dev/cons &
cat >/dev/cons
And you have created a window on your friends screen, with each of you able to see whatever the other types. This is a perfectly functional and usable way to communicate, and the ability to initiate the connection by making the window appear on demand in the 'shared' rio is an improvement over other simple equivalents like telnet or netcat based direct chat.
Another very simple and useful application is to allow your friend to run an interactive rc on your machine, with the display on his system, while you watch. In other words, he is using your machine in a manner similar to a telnet/ssh login, even though you 'pushed' the window onto his machine. The first two setup lines are the same as before:
mount /n/cpuBsrv/rio.friend.68962 /mnt/wsys new
bind -b /mnt/wsys /dev
exec rc -v < /dev/cons |tee /dev/cons
Now you can watch in real time as your friend uses your system. You can also easily invert this - and let your friend watch what you are doing. Just change the final line to:
exec rc <{tee /dev/cons} |tee /dev/cons
This changes the source of the controlling input to the shell from the remote /dev/cons to the local standard input. I believe both of these simple interaction models provide useful functionality in very direct ways. The real-time mirroring of the textual input and output in a shell window is provided without any of the usual overhead of a tool like VNC viewing and the use of a sub rio session as a container for the shared windows makes sandboxing via namespace manipulation very easy - and it should be noted that sharing a rio service is very different than providing full cpu server functionality, as only the functionality of rio is being exposed to the remote client. These simple filesystem and i/o based operations provided fine grained real time connectivity options with lower overhead and potentially greater security than the predominating methods.
The above commands are very simple and can certainly be improved upon - in particular the standard error is being abused and the prompt and errors will be suppressed for the remote user. I'm certain a lot of Plan 9 users out there have nifty improved versions of these tricks or similar. Using the fileserver properties of rio to create a sandboxed dynamic shared environment is a good fit for the goals of the /9/grid so we'll be posting an example of a scripted 'shared workspace' setup script and commands soon - its a pretty clear upgrade from the current 'grid graffiti wall' to a real-time 'grid graffiti rio!'
Sunday 7 June 2009 - Planning for demo grid setup
Sunday 7 June 2009 - Planning for demo grid setup
We are trying to determine the best way to have a larger set of public services - more nodes, and a demonstration of ideas like the High Availability Grid, an attempt to make a redundant and failproof environment that still maximizes the utility of the attached resources and can be flexibly reconfigured on the fly. Most of the following are things that we have already protyped and tested, but haven't been available through our public portals. Things we'd like to implement (perhaps not all at once):
A high availability configuration of resources using about 6-8 functional nodes divided between 2-3 physical machines.
On-demand temporary single VMs
"The proving grounds" - a combination of both of the above, but with the intention of stability and stress testing for various multimachine configurations.The only way to test 'failproofing' is to force failures and test the ability of the systems to recover.
We've been doing some testing of VMware server in addition to our normal Qemu/9vx/p9p based setups. We prefer free open source personally, but some people prefer to VMware to qemu so we may begin providing vmware .vmdk images as well as Qemu qcow2s. For anyone who wants to try using vmware now, it is possible to use qemu-img convert to make a vmdk from a .qcow2, but there were a few additional adjustments we needed to make for best results, notably recompiling the included pccpuf kernel.
Wednesday 3 June 2009 - Documentation added for new tools
Wednesday 3 June 2009 - Documentation added for new tools
Well, some documentation for some of the new tools at least, but its a start. Writing documentation is hard. The organization of knowledge is the tricky thing. The balance of conceptual explanation and task-oriented how-to is hard to get right. The docs in the docs directory still need to be supplemented with some specific task-oriented recipes for doing things like making a fully customized local grid, which requires a bit of coordination of action at the host os level and the VM interiors. As a warmup, I'll try to summarize the process here:
The base system image is designed to serve as the seed for a multisystem grid. It includes a variety of /cfg directories to use as base roles for new systems, and the confighelper script is designed to help do customization quickly. If you want to create a personalized local grid of machines from the base image, here is how to do it:
Test the base image in its default configuration to verify functionality before customizing.
login as bootes in cpu server mode and start up the confighelper
choose to (r)eset machine key and passwords, and answer the prompts. Your chosen authdom is particularly essential - it can be your domain if you have one, or just an arbitrary name for 'your grid turf'.
reboot the vm to cpu server mode and answer the prompts to create the new machine key. You only get once chance to set bootes' password here, be careful. Then run auth/changeuser bootes and enter the same password you just did, and then run auth/changeuser for your other users and set their new passwords. Run auth/changeuser -n for each user also to update the netkeys.
at this point your new auth setup should be complete and you should be able to drawterm/cpu in, 9fs if the port 564 listener is running, use netkey auth, etc. Please test whatever functions are important to you.
this is your new seed image. If you use it as a base for cloning your additional grid systems, they will all include copies of the auth information and keys and users.
now you can manufacture as many customized systems as you want using the confighelper tools. to make a new customized system, first create a new blank qemu qcow2 for it. qemu-img create -f qcow2 newsysname.qcow2.img 1G creates a qcow2 blank disc that can expand up to 1gb of storage. A venti backed fossil is very small so you probably wont use much of that space.
now boot qemu with the blank disk attached as -hdb. in windows for instance, you might do qemu -L . -hda g9.qcow2.img -hdb newcpu.qcow2.img -redir tcp:567::567 -redir tcp:17010::17010 -m 256, then drawterm in as bootes. Run the confighelper script and choose to (a)dd new system to grid. It will prompt you for a system name and other info and a config from /cfg to clone - it will then transform the disk on drive 2 into a customized clone of your seed system on drive 1.
continue this process to make as many customized VMs as you want. A very common customization to apply would be changing the venti server IP so you could boot additional nodes on other machines using your existing venti server.
these cloned VMs are bootable in exactly the same way as the parent images, and can make 'children' of their own.
Making a new blank qcow and booting and running the configscript takes about 1 minute once you are familiar with the process, so you can make two VMs for every computer on your LAN more or less instantaneously. Note that you might want to 'preload' your /lib/ndb/local with the new machines using confighelper's nbd updater prior to starting the cloning process, so all children have all their peers in their ndb.
Monday 1 June 2009 - New Release Cycle begins!
Monday 1 June 2009 - New Release Cycle begins!
Well, we've got a new version of the qemu-based grid tools up - it provides a lot of new functionality, flexibility, and hopefully ease-of-use. We haven't really started documenting everything that's going on with it, and we are going to be rolling out new services related to its design during the upcoming weeks. The image and tools itself should start getting more frequent updates and we are hoping to leverage Venti and other Plan 9 based tools to do so.
If you already have access to a linux machine, the new base version of the image provides what we hope is pretty neat out-of-the box functionality - you extract the .tgz, cd into your new gridtools directory, and run (as root) a simple iptables script to redirect some low ports to high ports for the qemu VM to access without root privileges. The gridlord script (run as normal user inside the extraced directory, no install required) then provides a simple menu-based interface to control a full distributed plan9 system all on your local machine. Fire up a VM and let it boot in cpu server mode, then drawterm in as bootes, glenda, or gridna with password: gridpass. The VM image itself is meta-pre-configured - check out the /cfg directory, which holds relatively self-descriptive configurations for multiple machine roles. /usr/bootes/bin/rc contains confighelper -- a wrapper for an set of configuration helper scripts designed to let you quickly and easily control some administrative variables. Please note that right now this is an ad-hoc set of recipes based on the base configuration we provide. We hope to develop it into a more general purpose Plan 9 configuration utility, but it is currently not recommended for use with anything other than the 9gridchan.org preconfigured qemu image.
There are several other customizations and extras included in the image - and source code to everything, of course. /usr/grid is a 'non-login' user whose directories contain some of the grid tools. /usr/grid/src and /usr/bootes/lib have most of the additonal materials, along with /usr/grid/bin/rc and /usr/grid/bin/386 - these directories are bound in during most user's profiles, except glenda. We left Glenda basically untouched from the Bell Labs setup in this image. Thanks to everyone who helped test and offered suggestions for this version of the tools, and we look forward to improving it further according to user suggestions. As of the time of this posting, the surrounding documentation and web resources are still mostly un-updated so things like the walkthrough screenshots gallery still apply to the old version of the image.
Wednesday 13 May 2009 - Self-assembling high availability grids
Wednesday 13 May 2009 - Self-assembling high availability grids
Currently under testing and development: the amazing "infinite rabbits from a single venti-hat" High Availability Grid. The past few weeks we've been exploring ideas triggered by the possibility of storing entire sets of machines in both virtual machine and fossil vacroot form in a set of arenas and isect data for a venti server. Before your eyes glaze over from the seeming over-complexity, lets cut to the chase, the goal for the end user.
The end product is a relatively small file distributed as a .tgz that contains a complete set of files for a venti server and a large collection of .vac files. The venti data includes the rootscores of many different machine fossils, and also entire preinstalled qemu virtual machine images that correspond to those rootscores. Using either a native or plan9port venti, the user can choose from a large number of differentlty configured machines to instantiate on either native hardware or as a virtual machine. The venti has data corresponding to machines ranging from an untouched default install to a large grid of dozens of machines. Due to the nature of venti, storing an arbitrarily large number of differently configured virtual machines and fossils has almost no overhead.
The system images included are configured to act as an integrated plan 9 network with a few tricks designed to provide a high availability 'cloud' environment out of the resources available. This is the system employed on the 9gridchan version of the grid which will be opened for public use soon. Our current setup uses one master and one backup venti, three fossil file servers, three cpu servers, and an arbitrary number of terminals. The creation of a single unified environment for the user along with failsafe reliability is accomplished via simple scripted filesystem imports and binds.
Any of the 3 fossils can boot backed by either of the two ventis, which use venti/rdarena and venti/wrarena to make scores available on both. Each CPU server boots with a TCP root from a different fossil. However, all the fossils are basically identical, and when the user logs into any cpu server, 9fs imports of the other fossils are imported, and the primary fossil's $home is bound over the boot $home by the cpu. A simple backgrounded script then mirrors the data from the active $user to the backup fossils.
The user interaction model is to dial each of the three cpu servers from a different window on their terminal. Each window will behave identically and binds $home from the same primary fossil, but is using a different CPU with a different tcp root and venti backup. Once the user has dialed in their three windows, they can work freely in any of them and see the same file data. If that fossil fails, the backup fossils, no more than 5 minutes out of date, can be bound to $home. A standard usage model is to run rio on one cpu, acme on another, and rc shell only on the third. The user experience should be identical to working with a single all-in-one machine with the difference of enhanced reliability and the increased performance of truly independent multiple CPUs.
A public release of a first version of these tools along with some updates to the g/scripts and preconfigured qemu images should be arriving soon. Feel free to stop in #plan9chan on irc.freenode.net if you want to explore this system in its development state. Standard public resources such as omni are unaffected by the current development environment but will become integrated at the time of an initial public distribution.
Apr 12 2009 - Just checking in
Apr 12 2009 - Just checking in
No big news on the grid past few weeks - had a service interruption for a few days, but it was resolved. Shockingly, the brief absence of 9gridchan services did not cause any significant rioting or result in the overthrow of any governments. We are greatly relieved.
Feb 19 2009 - Random update
Feb 19 2009 - Random update
Random blog update to make sure people know the project is healthy and ongoing. An actual public announcement of some kind on a known Plan 9 forum such as 9fans has been something we've been waiting a long time to do.
Feb 02 2009 - Explanatory captions for the walkthroughs.
Feb 02 2009 - Explanatory captions for the walkthroughs.
Thanks to the ceaseless flow of innovation and progress in free and open source software, the walkthrough screenshots tutorial galleries now feature captions and better navigation! Additionally, we are working to provide new services and content on the dynamic 9p registry system. Nothing too thrilling, but as an example, the omniguest user is now exporting /bin. After connecting to that service, binding the imported directory over the local /bin in an rc window allows testing of any of the installed software on the public cpu server without actually logging in.
Jan 31 2009 - Setup screenshots gallery
Jan 31 2009 - Setup screenshots gallery
Created a couple galleries with screenshots of starting up the image, connecting with drawterm, and running configscript. Hopefully having a visual reference will make it easier for people trying plan 9 for the first time to see if they are seeing what they are supposed to see. More screenshot galleries will be added to demonstrate making use of the g/script toolkit and 9gridchan.org resources.
Jan 30 2009 - post #2: more on 1.1
Jan 30 2009 - post #2: more on 1.1
The preinstalled qemu node image has several purposes. Most importantly is to provide an easy-to-use form of Plan 9 for every platform. Qemu runs on everything and a preinstalled image means that once the download is complete, the user can be inside a fully functional Plan 9 enivornment instantly. Another goal is to provide easy access to the key component of Plan 9: a CPU server. Another is is to provide a user-driven, decentralized platform for grid computing research and development. The latest version of the image makes bootup easier with a single menu for boot choices. There are more variant versions of plan 9 software and some customizations like the default font, taken from quanstro's subpixel fonts. A small sampler collection of .vac scores for downloading additional docs and collections of Plan 9 pictures from venti.9gridchan.org is also included. Thanks as always to the Plan 9 contributors whose work is included such as fgb (contrib and abaco), andrey (irc7 and links port), and others.
Jan 30 2009 - Development blog post #1
Jan 30 2009 - Development blog post #1
A big day of upgrades - along with release 1.1 of the /9/grid qemu image and toolkit, we've upgraded www.9gridchan.org with Uriel's latest Werc + Soul9's image gallery application plugin. Thanks to them for their development efforts. The new 1.1 image features more stuff and a smaller download and installed size, along with default user accounts for instant drawterm/cpu access for testing. (CHANGE THOSE PASSWORDS before allowing access from external networks, everyone!)
Ongoing projects include adding scripts for managing venti servers and databases of .vac files to the g/toolkit and a grid game based on namespace manipulation. Future blog posts will also provide some tips and tricks for using grid resources and tools.
Eric Van Hensbergen
Linux 9P Trace and Walkthrough
Unfortunately video didn't come out.
by Eric Van Hensbergen (noreply@blogger.com) at January 28, 2010 03:35 AM
January 27, 2010
newsham
Dirty tricks, China censorship, High speed rail and other news...
Federal officials accused four men, including a conservative activist, of posing as telephone repairmen to tamper with phones at the New Orleans offices of Democratic Sen. Mary Landrieu. [...] Late Monday morning, according to an FBI affidavit, Messrs. Flanagan and Basel, dressed in blue work shirts, fluorescent green vests and construction hard hats, entered Sen. Landrieu's offices and told a staffer they had come to fix the phone lines. By then, Mr. O'Keefe already had arrived at the offices, according to the FBI. -- http://online.wsj.com/article/SB20001424052748703906204575027500584356116.html
Microsoft is defending China's internet censorship program proving that they're still pretty good at being evil. Take that, Google! Also thanks for all the bugs!
That rabble rouser Obama wants to give us high speed rail! That is just pure awesome. I bet there are lots of people who want his head. Oh, and in other news "2009 goes into the history books as the worst year the [airline] industry has ever seen." Yay! In my opinion no other industry treats its customers as poorly as the airline industry. They've received numerous bailouts in the past and seem to have decided that they do not need to earn their dollars. Oh, and they'd rather sequester hundreds of people on the tarmac for hours on end to save a few bucks. Please rot in hell.
I'm starting to get that "double-dip" feeling. Here's some new housing data.
Industry anti-piracy groups are still using heavy handed methods and harassing innocent people. Now laywers who were beaten up for their lunch money in school can get back at society as hired thugs for the music industry.
And going a little deeper on the headlines, check out Nate's wrapup on the PS3 hack.
by newsham (noreply@blogger.com) at January 27, 2010 06:37 PM
research!rsc
Generating Good Syntax Errors
One of the great religious debates in compiler writing is whether you should use parser generators like yacc and its many descendants or write parsers by hand, usually using recursive descent. On the one hand, using parser generators means you have a precise definition of the language that you are parsing, and a program does most of the grunt work for you. On the other hand, the proponents of hand-written recursive descent parsers argue that parser generators are overkill, that parsers are easy enough to write by hand, and that the result is easier to understand, more efficient, and can give better error messages when presented with syntactically invalid programs.
Like in most religious debates, the choice of side seems to be determined by familiarity more than anything else. I knew how to use yacc before I knew how to write a parser by hand, which put me solidly on the parser generator side of the fence. Now that I do know how to apply both techniques, I'd still rather use a parser generator. In fact, I've worked on projects where I've written parser generators rather than write a parser by hand. Good notation counts for a lot, as we shall see.
In Coders at Work, Ken Thompson talks to Peter Seibel about yacc and lex:
Seibel: And are there development tools that just make you happy to program?
Thompson: I love yacc. I just love yacc. It just does exactly what you want done. Its complement, lex, is horrible. It does nothing you want done.
Seibel: Do you use it anyway or do you write your lexers by hand?
Thompson: I write my lexers by hand. Much easier.
Many of the objections raised by the hand-written parser camp are similar to Thompson's objection to lex—the tool doesn't do what you want—but there's no fundamental reason a tool can't. For example, the spurious conflicts that an LALR(1) algorithm produces are definitely hard to understand, but if you replace it with LR(1) or GLR(1), you get a more useful tool. And if you do learn how to work within the LALR(1) algorithm, even yacc is very powerful.
The objection to parser generates that seems to resonate most is that generators like yacc produce inadequate error messages, little more than “syntax error.” Better error messages were one of the key benefits hoped for when g++ converted from a yacc-based C++ parser to a hand-written one (and to be fair, C++ syntax is nearly impossible to parse with any tool; the many special cases cry out for hand-written code). Here the objection seems harder to work around: the parser internally gets compiled into an automaton—usually a big table of numbers—that moves from state to state as it processes input tokens. If at some point it can't find a next state to go to, it reports an error. How could you possibly turn that into a good message?
Clinton Jeffery showed how in a paper published in ACM TOPLAS in 2003 titled “Generating LR Syntax Error Messages from Examples.” As you can guess from the title, the idea is to introduce a training stage after the parser is generated but before it is used for real. In that stage, you feed examples of syntax errors into the parser and look at what state it ends up in when it detects the error, and maybe also what the next input token is. If the automaton hits an error in that state (and with that input token) during real use, you can issue the message appropriate to the example. The important part is that the parser states are basically an abstract summary of the surrounding context, so it's reasonable to treat errors in a particular state with a single message. For example, suppose you find that this Go program
package main import ( "fmt", "math" )
causes a syntax error at the comma, in state 76. In the parser, that state means that you're in the middle of an import block and expecting to see a string constant. The state abstracts that context, so you'd end up in the same state if you were parsing this erroneous program:
package fmt import ( "strings"; "testing", "xgb" )
Having told the parser that the appropriate error message for the first program is “unexpected , during import block,” the parser will use the same message for the second program.
It's an elegant idea, and the implementation can be kept on the side, without adding any complexity to the grammar itself. I heard about this idea through the grapevine years ago (in 2000, I think) but had never gotten a chance to try it until this week.
There are three Go parsers: Ian Lance Taylor wrote a hand-written
recursive descent parser in C++ for gccgo,
Robert Griesemer wrote a hand-written recursive descent parser
in Go (import "go/parser")
for godoc and
gofmt,
and Ken Thompson wrote a yacc-based parser in C for the
gc compiler suite.
On Monday night, I implemented Jeffery's idea in the gc compiler suite. The gc compilers use bison, the GNU implementation of yacc. Bison doesn't support this kind of error management natively, but it is not hard to adapt. Changing bison would require distributing a new version of bison; instead, my implementation post-processes bison's output.
The examples are in a new file go.errors.
Because the lexer is written by hand, it's not available in the
simulation, so the examples are token name sequences
rather than actual program fragments. In token lists, the
program above and its corresponding error message are:
% loadsys package LIMPORT '(' LLITERAL import_package import_there ','
"unexpected , during import block",
Understanding the tokens on the first line requires knowing what the various
grammar token names mean, but they basically mimic the
syntax, and this tool is targeted at the people working on
the grammar anyway.
An awk program loads bison's tables
from its debugging dump y.output and
then processes the go.errors file, replacing
each line like the above with the number of the offending
state and input token.
That produces a table that can be linked into the compiler
and consulted when a syntax error is encountered.
If the state and input token match an entry in the table,
the better error is used instead of a plain syntax error.
That was an outsized amount of work to close one bug, but now
it's easy to add better messages for other common situations.
For example, the fact that only the short := declaration
form is allowed in for initializers sometimes trips up
new Go programmers. When presented with this program:
package main
import "fmt"
func main() {
fmt.Printf("hello, world\n")
for var i = 0; i < 10; i++ {
fmt.Println(i)
}
}
the compiler used to just say there was a syntax error on line 7, but now it explains:
$ 6g x.go x.go:7: syntax error: var declaration not allowed in for initializer
because of this stanza in go.errors:
% loadsys package imports LFUNC LNAME '(' ')' '{' LFOR LVAR LNAME '=' LNAME
"var declaration not allowed in for initializer",
Gccgo and gofmt give more traditional token-based errors:
$ gccgo x.go x.go:7:6: error: expected operand ... $ gofmt x.go x.go:7:6: expected operand, found 'var'
What's interesting isn't that neither has bothered to handle this as a special case yet. What's interesting is to consider the work required to do so. Changing either would require understanding the corresponding hand-written parser well enough to find the right place to put the special case. With the example-based approach, you just write an example, and the tool figures out where in the parser it needs to go.
January 25, 2010
newsham
Some news today
The lady who was fined $2M for illegal file sharing of music has had her fine reduced to $52k. The judge simply thought the original fine was too "monstrous and shocking" and reduced it to the largest value he thought was no longer so. $52k sounds more appropriate for sharing 24 songs with no profit motive.
There's been progress on hacking the PS3 to open up the platform (and support piracy). The headline says "hacked" but it sounds like there's still a bit of work to be done.
Its looking more likely that the guy in charge of making sure our economy doesn't tank during the biggest recession of our lifetime is going to get to keep his job. It must be nice working in the financial industry where you can be popular and even make tons of money even if you destroy the wealth of your clients or constituents. Some people are happy at the job he did to get us out of the recession, and I think he deserves credit for that, but now that we're out of recession, he should still be shown the door for letting us get into the mess in the first place.
by newsham (noreply@blogger.com) at January 25, 2010 06:55 PM
January 24, 2010
January 23, 2010
newsham
Python marshalling, iterators and currying
Marshalling
Marshalling is the process of converting some structured data into a serial format. One way to think about this in python is a function that maps some structured datum into an iterable that yields characters. The simplest marshallers can be written directly using generators:
def encChar(ch):More complex marshallers can be written by chaining together simpler encoders. The itertools module provides a chain function which does this. Here we marshall 16-bit and 32-bit numbers in big-endian format:
yield ch
def encNum1(x) :
yield chr(x & 0xff)
from itertools import chainMarshalling a fixed list of values can also be accomplished by chaining together several encoders:
def encNum2(x) :
return chain(encNum1(x >> 8), encNum1(x))
def encNum4(x) :
return chain(encNum2(x >> 16), encNum2(x))
def encFixedList_(es, xs) :This function chains together the encoding of each item in xs using the matching encoder in es. In practice you'd want a special zip that results in error if the number of es and xs was not the same, but lets ignore that for now. This function will work just fine, but it has a problem in practice -- usually you'll want to use this function partially applied, providing the es but leaving the xs to be filled in later. Here's where currying comes in handy. We'll make a function that takes just the es and returns a function that takes the xs later:
return chain(*(e(x) for e,x in zip(es, xs)))
def encFixedList(*es) :This is much easier to use:
return lambda xs : chain(*(e(x) for e,x in zip(es, xs)))
encMyList = encFixedList(encNum4, encChar, encNum2)You might notice that the encoding of fixed lists is basically an encoding of records. If we presuppose that all of our records have a toList method that returns all of the record's fields as a list, we can write a marshaller for these records:
def encRecord(*es) :To use these marshallers we pass the value we want to encode and then convert the resulting iterable to a string:
enc = encFixedList(*es)
return lambda rec : enc(rec.toList())
class MyRec(object) :
def __init__(self, a, b, c) :
self.a, self.b, self.c = a,b,c
def toList(self) :
return (self.a, self.b, self.c)
def __str__(self) :
return '[MyRec a=%s b=%s c=%s]' % self.toList()
encMyRec = encRecord(encChar, encNum1, encNum1)
def encode(enc, x) :
return ''.join(enc(x))
x1 = encode(encNum4, 0x01020304)
print '%r' % x1
x2 = encode(encMyList, [31415927, 'x', 257])
print '%r' % x2
x3 = encode(encMyRec, MyRec('A', 5, 10))
print '%r' % x3
Unmarshalling
Unmarshalling is the process of turning a stream of bytes back into structured data. Instead of working with an iterable and converting it to structured data we'll be working with an iterator. The difference is subtle; we'll want to consume bytes and keep track of where we are in the stream. An iterator does just that. Each of our unmarshallers will take in an iterator of characters, consume some of the bytes and return a decoded value. As before our simplest unmarshallers are for characters and single-byte numbers:
def decChar(it) :Unmarshalling larger numbers is just a matter of unmarshalling several smaller numbers and composing them. Here are the unmarshallers for 16- and 32-bit big endian numbers. These definitions make use of the left-to-right order of evaluation:
return it.next()
def decNum1(it) :
return ord(decChar(it))
def decNum2(it) :And unmarshall fixed lists is just a matter of decoding each constituent in turn. Here again we use currying:
return (decNum1(it) << 8) | decNum1(it)
def decNum4(it) :
return (decNum2(it) << 16) | decNum2(it)
def decFixedList(*ds) :To unmarshall a record we need a way to unmarshall the constituent fields and then create the record. The decFixedList function can be used to unmarshall the records if we have a way to construct the record from a list. We'll presuppose that our records have a constructor that take the field values in turn, then unmarshalling is simply:
return lambda it : [d(it) for d in ds]
decMyList = decFixedList(decNum4, decChar, decNum2)
def decRecord(klass, *ds) :To use these unmarshallers to decode a binary string we must first get an iterator and then pass it to the decoder:
dec = decFixedList(*ds)
return lambda it : klass(*dec(it))
decMyRec = decRecord(MyRec, decChar, decNum1, decNum1)
def decode(dec, binary) :
return dec(iter(binary))
print decode(decNum4, x1)
print decode(decMyList, x2)
print decode(decMyRec, x3)
Records
One refinement I often make is to simplify the construction of records since I often have many records in the protocol:
class Record(object) :Now defining a new record is done simply by filling in the __name__ and __fields__ values:
def __init__(self, *args) :
for f,v in zip(self.__fields__, args) :
setattr(self, f, v)
def toList(self) :
return [getattr(self, f) for f in self.__fields__]
def __str__(self) :
fs = ', '.join('%s=%r' % (f, getattr(self, f)) for f in self.__fields__)
return '[%s %s]' % (self.__name__, fs)
class MyRec2(Record) :
__name__ = 'MyRec2'
__fields__ = 'code', 'cost', 'quantity'
encMyRec2 = encRecord(encChar, encNum2, encNum1)
decMyRec2 = decRecord(MyRec2, decChar, decNum2, decNum1)
x4 = encode(encMyRec2, MyRec2('X', 2495, 3))
print '%r' % x4
print decode(decMyRec2, x4)
Discussion
In the past I had used a special buffer class to marshall data into and out of instead of using iterables and iterators. This approach seems to make better use of the mechanisms already existing in Python. This code makes writing marshallers and unmarshallers for records very simple. One downside is that it hurts debugging a little bit. The stack traces you get when something goes wrong are more complicated than they would be if record marshallers were manually written out by the programmer. Another downside is that marshallers and unmarshallers are written separately even though for correct operation they must be kept in synch and basically encode the same information. A more complicated library could define pairs of marshallers and unmarshallers together and in fact I've tried this approach in the past.
All in all, I am happy with this approach and find it concise and very convenient for prototyping. Look what we've accomplished in just a few lines of code!
from itertools import chain
class Record(object) :
def __init__(self, *args) :
for f,v in zip(self.__fields__, args) :
setattr(self, f, v)
def toList(self) :
return [getattr(self, f) for f in self.__fields__]
def __str__(self) :
fs = ', '.join('%s=%r' % (f, getattr(self, f)) for f in self.__fields__)
return '[%s %s]' % (self.__name__, fs)
def encode(enc, x) :
return ''.join(enc(x))
def encChar(ch):
yield ch
def encNum1(x) :
yield chr(x & 0xff)
def encNum2(x) :
return chain(encNum1(x >> 8), encNum1(x))
def encNum4(x) :
return chain(encNum2(x >> 16), encNum2(x))
def encFixedList(*es) :
return lambda xs : chain(*(e(x) for e,x in zip(es, xs)))
def encRecord(*es) :
enc = encFixedList(*es)
return lambda rec : enc(rec.toList())
def decode(dec, binary) :
return dec(iter(binary))
def decChar(it) :
return it.next()
def decNum1(it) :
return ord(decChar(it))
def decNum2(it) :
return (decNum1(it) << 8) | decNum1(it)
def decNum4(it) :
return (decNum2(it) << 16) | decNum2(it)
def decFixedList(*ds) :
return lambda it : [d(it) for d in ds]
def decRecord(klass, *ds) :
dec = decFixedList(*ds)
return lambda it : klass(*dec(it))
by newsham (noreply@blogger.com) at January 23, 2010 08:57 PM
newsham
Dear Microsoft
I want to use the software I paid for. More accurately, my niece does. We bought her a laptop a few Christmases ago. Now her computer is unusable, which is partly her fault (in the same sense that having graffiti on your car is your fault if you parked in a bad neighborhood). Now I'm helping her reinstall her system to give her a fresh start. Here's the rub: the laptop manufacturer didn't give her an install or recovery DVD with a copy of Windows Vista that we paid for when we bought the computer. (There's even a nice proof-of-authenticity sticker on the bottom of the machine with a license key!) The manufacturer has informed me that they will send me the recovery DVD at no cost, unless you count $24.95 shipping and handling. So this is the situation I'm in: I have to pay $25 and wait a week or two to reinstall a copy of Windows Vista that we already have a legitimate license to use. I'm failing to see what the Windows Genuine Advantage is here. It would cost me less and be less effort to find a pirated copy of the software on the internet and install that instead. Do you see the perverse incentive structure that exists?
With Love, Tim.
by newsham (noreply@blogger.com) at January 23, 2010 06:19 PM
NineTimes - Inferno and Plan 9 News
New alpha driver for Marvell Yukon 2 Ethernet
Erik released a new alpha driver for Marvell Yukon 2 Ethernet NICs.
From: erik quanstrom <quanstro@qua...>
Subject: yukon 2 alpha driver
Date: Sat, 16 Jan 2010 18:15:50 -0500
is anyone interested in testing a marvell yukon 2
driver? i am currently working with a 88e5057
(1186/4b00, dge-560t). but most single-port yukon 2
parts should work fine. if so. please contact me
off list.
- erik
You can install it from contrib with
term% contrib/install quanstro/yuk
This will install two files, etheryuk.c and yukdump.h, into your /sys/src/9/pc directory.
Edit /sys/src/9/pc/pcf and add the following line to the list of Ethernet devices:
etheryuk pci
After that compile your new kernel by running
mk 'CONF=pcf'
Then copy the new kernel into /n/9fat and boot with it.
On my ASUS A8R32-MVP with 88E8053 chip the following new device will now show up:
#l0: yuk: 1000Mbps port 0xDFEFC000 irq 4: 00173137b3d3
I tested it during the last several days and it works great, good work!
Please send your feedback and bug reports to Erik.
New alpha driver for Marvell Yukon 2 Ethernet
Erik released a new alpha driver for Marvell Yukon 2 Ethernet NICs. (See his post at 9fans)
You can install it from contrib with term% contrib/install quanstro/yuk
This will install two files, etheryuk.c and yukdump.h, into your /sys/src/9/pc directory.
Edit /sys/src/9/pc/pcf and add the following line to the list of Ethernet devices: etheryuk pci
After that compile your new kernel by running mk 'CONF=pcf'
Then copy the new kernel into /n/9fat and boot with it.
On my ASUS A8R32-MVP with 88E8053 chip the following new device will now show up: #l0: yuk: 1000Mbps port 0xDFEFC000 irq 4: 00173137b3d3
Please send your feedback and bug reports to Erik.
January 22, 2010
newsham
GOOG hack, MSFT vulnerability and disclosure
The computer security field has been fiercely debating how disclosure should be handled for quite a long time. In the "good old days" there was a cabal of security professionals who shared new found vulnerabilities only with each other and vendors. This was nice if you were in the cabal (think of how easily you can impress your clients if you're a security auditor with access to a large number of unfixed vulnerabilities). Unfortunately it meant many bugs were not fixed for long periods of times. It also made it very hard for newcomers to learn about security and enter the field. In a backlash a large number of security researchers turned to full disclosure policies: when they found a bug they told everyone about it, often including details of how to perform an exploit. This means vendors, good guys and bad guys all got access to this information. This was great if you were a bad guy, or if you were a good guy without access to the cabal. It also put a lot of pressure on vendors to fix vulnerabilities ASAP. Vendors did not much care for this. They pushed for another model called responsible disclosure in which vendors were given advanced notice and only after a reasonable amount of time or when a fix was available would the full details be given to the public. So these are three prominent points on a continuum of disclosure policies, and where you stand on this line is often a matter of religion. So much so that the proponents come up with loaded terms such as "responsible disclosure" to imply that those who don't hold the same view are some how irresponsible.
So back to my point -- the vulnerability used in the google hack was reported to MSFT back in September. They realized this was a very important bug, but decided to release a fix in February, five months later! Clearly "responsible" disclosure gave MSFT a lot of flexibility in postponing the release of this fix. Once the details of the hack were released, MSFT scrambled to issue a release in a much shorter timeline. Did responsible disclosure work well in this instance? I would argue "no". Would full disclosure have worked better? Possibly, but keep in mind that while a fix would have been available much quicker, many more attackers would have had access to this information during that shorter amount of time.
By the way, how did the attackers get access to this exploit? Have attackers penetrated MSFT? Do they have insiders working at MSFT who have access to the vulnerability database? If either of these are true, then I would argue that responsible disclosure gives the advantage to these attackers. I also wonder if a country as powerful as China has insiders working at many of our top software companies. The tech industry employs many Chinese nationals. I don't mean to imply that they are all agents of their government, but it would not surprise me if there were a few agents in the mix. I'm not sure there's a great way to defend against a threat like this. Software is so full of security holes, and a company usually has a large database full of software bugs found internally or reported by external customers. Often times the full impact of a bug is not known and important security bugs are left unpatched for years.
by newsham (noreply@blogger.com) at January 22, 2010 06:31 PM
January 21, 2010
newsham
Big news day
- Obama pushes new bank regulation to limit risk taking by banks preventing commercial banks from engaging in proprietary trading
- Pakistan says it won't cooperate with us.
- A commercial flight is diverted when passengers freak out at the sight of jewish prayer!
- Hillary Clinton asks China to investigate the google hacking incident, upping the pressure on China.
- The music industry is trying to blame piracy for the death of culture. Go figure.
- The supreme court overturns campaign spending limits. Apparently banning companies from making large donation is some sort of illegal government censorship. Its amazing what sort of constitutional rights you have when you're a corporation. You thought we were a corporatocracy before; you ain't seen nothin' yet. On the other hand, think of how much more advertising revenue media will get!
- The campaign to bring the bible to heathens at the point of a gun has been brought to an end.
- The wife and daughter of John McCain have come out in favor of gay marriage.
by newsham (noreply@blogger.com) at January 21, 2010 06:24 PM
OS Inferno
Среда acme на видео
January 20, 2010
January 18, 2010
January 17, 2010
newsham
Scary Asteroid
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,0,0" height="264" width="400"><param name="flashvars" value="webhost=fora.tv&clipid=8779&cliptype=highlight"><param name="allowScriptAccess" value="always"><param name="allowFullScreen" value="true"><param name="movie" value="http://fora.tv/embedded_player"><embed allowfullscreen="true" allowscriptaccess="always" flashvars="webhost=fora.tv&clipid=8779&cliptype=highlight" height="264" pluginspage="http://www.macromedia.com/go/getflashplayer" src="http://fora.tv/embedded_player" type="application/x-shockwave-flash" width="400"></embed></object>
Have a nice day!
by newsham (noreply@blogger.com) at January 17, 2010 06:20 PM
January 16, 2010
January 15, 2010
newsham
The Acme environment in Plan9
<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/dopu3ZtdCsg&hl=en_US&fs=1&"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/dopu3ZtdCsg&hl=en_US&fs=1&" type="application/x-shockwave-flash" width="320"></embed></object>
<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/2vjD_B__SbQ&hl=en_US&fs=1&"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/2vjD_B__SbQ&hl=en_US&fs=1&" type="application/x-shockwave-flash" width="320"></embed></object>
<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/cR96WQ6OR00&hl=en_US&fs=1&"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/cR96WQ6OR00&hl=en_US&fs=1&" type="application/x-shockwave-flash" width="320"></embed></object>
by newsham (noreply@blogger.com) at January 15, 2010 07:05 PM
January 14, 2010
January 13, 2010
newsham
Tectonic movements
by newsham (noreply@blogger.com) at January 13, 2010 12:03 AM
January 11, 2010
January 09, 2010
January 08, 2010
newsham
A protocol development trick
def send(s, **kw) :
s.send(pickle.dumps(kw))
def recv(s) :
return pickle.loads(s.recv(64*1024))
I'm using UDP so I get natural message delimiters. If this was TCP I'd have to wrap the pickled data with a delimiter somehow (ie. prepend a length field). Now sending and receiving messages is straightforward:
send(s, foo=1, bar="test", priority="utmost!", flags=SYN|ACK)
m = recv(s)
print m['priority']
And to make field access a little easier I define a message object so I can use field access instead of dictionary indexing. It also gives me pretty printing:
class Msg(object) :
def __init__(self, *kw) :
self.__names__ = kw.keys()
for n,v in kw.items() :
setattr(self, n, v)
def __str__(self) :
attrs = ' '.join("%s=%s" % (n,getattr(self,n)) for n in self.__names__)
return '[Msg %s]' % attrs
__repr__ = __str__
def recvMsg(s) :
return Msg(**recv(s))
m = recvMsg(s)
print m.priority
print m
So now I can write my protocol in terms of message fields, leave the details of field encoding to later and still have an executable prototype to play with.
btw, anyone know good code formatting options that are compatible with blogger.com (besides http://code.google.com/p/syntaxhighlighter/ which looks pretty god awful ugly)?
by newsham (noreply@blogger.com) at January 08, 2010 08:49 PM
January 07, 2010
newsham
Just because you're a paranoid schizophrenic doesn't mean they're not out to get you
by newsham (noreply@blogger.com) at January 07, 2010 06:15 PM
Factorization of a 768-bit RSA modulus
This paper reports on the factorization of the 768-bit number RSA-768 by the number field sieve factoring method and discusses the implications for RSA.
by newsham (noreply@blogger.com) at January 07, 2010 05:47 PM
January 05, 2010
January 04, 2010
January 03, 2010
January 02, 2010
January 01, 2010
Quod Erat Dragon
qedragon @ 2010-01-01T08:05:00
My new year's resolution is 1280x800. What's yours?
December 31, 2009
newsham
Side channel paper on Nate's blog
PS: if you're into cryptography, security, copy protection or C64's, you should be following Nate's blog.
by newsham (noreply@blogger.com) at December 31, 2009 08:46 PM
Google books on the NewsHour
<script src="http://www.pbs.org/wgbh/pages/frontline/js/pap/embed.js?news01n37f5qd53" type="text/javascript"></script>
by newsham (noreply@blogger.com) at December 31, 2009 07:07 PM
Another seL4 paper
This paper is written for a wider audience and explains the project's accomplishments in a very accessible manner. The paper has a refreshing discussion about the limitations of what they accomplished. It also puts it into context in comparison with other techniques such as testing, using type-safe languages and static code analyzers.
Here are a few interesting quotes:
Does this mean that we have proved seL4 is secure? Not yet, really. We proved that seL4 is functionally correct. Secure would first need a formal definition, which in itself is a wide field and depends on what you want to use the kernel for. Taken seriously, security is a whole-system question, including the system’s human components.After using the initial code, running static analyzers on them and finding and fixing 16 bugs:
Formal verification then found 144 more bugs in the C code, in total 160. This means even though the code appeared to be running just fine for normal application, there were an order of magnitude more bugs lurking in there than the 16 found initially. [...] As mentioned, we also did proofs on the design and the specification level. This was to 95% before the code was written, and we fixed about 150 issues in each. That means, in total we have discovered roughly 460 issues in kernel code, design and specification.
by newsham (noreply@blogger.com) at December 31, 2009 06:50 PM
December 29, 2009
newsham
gsm cracked...
According to Ian Meakin, of mobile encryption firm Cellcrypt, only government agencies and "well funded" criminals had access to the necessary technology.
by newsham (noreply@blogger.com) at December 29, 2009 06:26 PM
newsham
News...
The Obama administration said Monday it would begin routinely penalizing U.S. air carriers for lengthy tarmac delays. [...] Airlines would be fined $27,500 per passenger for violations.Those who don't like the government poking its head into private business can blame the airlines for bringing this on themselves. Twenty-seven thousand is a small price to pay for holding passengers against their will. I would suggest jail time is more appropriate. http://online.wsj.com/article/SB126140826775100235.html?mod=WSJ_hpp_MIDDLTopStories
Can you believe that twitter is profitable? I'm not sure I can. http://news.cnet.com/8301-17939_109-10419569-2.html
Edit: oops, I botched the airline news story URL above. It's fixed now.
by newsham (noreply@blogger.com) at December 29, 2009 03:05 AM
December 28, 2009
newsham
Good 'ole NLS
Here's a one-hour video demonstration of the system from 1968. It was so impressive that it was dubbed "the mother of all demos." See for yourself:
<embed allowfullscreen="true" allowscriptaccess="always" id="VideoPlayback" src="http://video.google.com/googleplayer.swf?docid=-8734787622017763097&hl=en&fs=true" style="height: 326px; width: 400px;" type="application/x-shockwave-flash"></embed>
One of the great things about computer history is that it comes alive in emulators. You can actually boot and run some of these great ancient systems such as 1st and 7th edition UNIX or the CADR lisp machine. The Computer History Museum built an emulator of NLS but, as is too often the case with historic computer software, is having trouble getting the rights to redistribute it.
Thanks to grey for turning me onto this info.
Edit: I'm told there's a good free book that covers this material. I haven't yet read it.
by newsham (noreply@blogger.com) at December 28, 2009 11:09 PM
PRO test
It's sad that american media has been transformed from a powerful world wide power into a sideline youtube commentary. Also, thats my job!
Kudos to the brave men and women in Iran who are keeping the reports coming. I hope they know how many people around the world are rooting for them. We hope you get the right to freely express your opinion free of fear of retribution whether its through the current government or a new one. Here's to you:
<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/aS-gGYaA8F0&hl=en_US&fs=1&"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/aS-gGYaA8F0&hl=en_US&fs=1&" type="application/x-shockwave-flash" width="320"></embed></object>
by newsham (noreply@blogger.com) at December 28, 2009 07:00 PM
post-christmas news
AT&T decided that it would be smart to block NYC customers from signing up with iPhone. Hilarity ensues. Btw, am I the only one that finds their ad campaign responding to the verizon "maps" campaign to be completely inane? "Verizon likes to talk about our maps! But they don't even let you talk on the phone while surfing the web! And we have the fastest 3G network! [*] 3G coverage not available in all areas." WTF? Also, don't go talking about your limited 3G coverage in the same ad that is trying to poke at you for having lousy 3G coverage.
Guess who's giving more financial handouts to FNM? You, of course! Nice of them to sneak it in during a quiet vacation week. And the managers won't even accept stock as compensation. And you thought this whole financial crisis thing was over. In unrelated news, FNM is up 17% today.
by newsham (noreply@blogger.com) at December 28, 2009 06:05 PM
December 27, 2009
December 26, 2009
johnny
Intellectual property
Intellectual property: The new war
So the rich fuckers don't slack, eh. Be careful with the fuckers, they'll do anything that their money can permit them to keep you down, latest trend is closing down the doors on open and free intellectual property.
The G5 (or G7 or G20 or whatever was it?) is pushing it's own agenda: ACTA , and of course we shouldn't forget about Genome and software patents.
Honestly, i wouldn't be surprised if you had to pay a tax for having a blue-eyed blond kid in 20 years.
December 25, 2009
Quod Erat Dragon
Root-causing the droid bug
entertained.
You will want to look at the Android tree's
dalvik/libcore/archive/src/main/java/java/util/zip/ZipFile.java
( For contrast, see ./external/zlib/contrib/minizip/unzip.c )
and in particular this comment:
/*
* Find the central directory and read the contents.
*
* The central directory can be followed by a variable-length comment
* field, so we have to scan through it backwards. The comment is at
* most 64K, plus we have 18 bytes for the end-of-central-dir stuff
* itself, plus apparently sometimes people throw random junk on the end
* just for the fun of it.
*
* This is all a little wobbly. If the wrong value ends up in the EOCD
* area, we're hosed. This appears to be the way that everbody handles
* it though, so we're in pretty good company if this fails.
*/
And there's your 64K limit, kinda, sorta mostly.... The code looks for
101010256L == 0x06054B50 ("PK\x05\x06" little endian) which apparently
signals the End Of Central Directory. Anyway... That sequence occurs at
0x0a33ef3 (exploit) and 0x0a2756f (legit). The ZipFile.java code scans
forward from 64K back from the end of the file; the unzip.c code starts at
the end and walks backwards.
How about a magic trick?
I'm going to make this signature validity... disappear!
*WHAM*
TA DA! It's ... gone!
Even more entertainingly, there is in fact some "junk" tossed on the end of
the file at this point, courtesy of Motorola and SignApk (it even tells us
so in the file)... the exploit zip doesn't begin until somewhere near
0x0a27c30 -- 1729 (or so) bytes of junk later, so there's your 63K limit for
this particular exploit.
un-zipping the droid's jar
I'll just give the MD5 and SHA1 sum so you can google for it -- indeed google
has already indexed the MD5 sum to a few likely mirrors.
94a0c30ea9104c2776d042e760bfd716 droid-root.zip
20ad50644644097eb9914cdc18a2527503291d45 droid-root.zip
Anyway, how curious is this file? Well, it's this curious:
% mkdir unzip unjar
% (cd unzip; unzip ../droid-root.zip)
% (cd unjar; jar -xf ../droid-root.zip)
% diff -rq unjar unzip
Only in unjar: bp.img
Only in unjar: mbm.bin
Only in unjar/META-INF: CERT.RSA
Only in unjar/META-INF: CERT.SF
Files unjar/META-INF/com/google/android/updater-script and unzip/META-INF/com/google/android/updater-script differ
Only in unjar/META-INF: MANIFEST.MF
Only in unjar: patch
Only in unjar: rdl.bin
Only in unjar: recovery
Only in unjar/system: app
Only in unzip/system: bin
Only in unjar/system: fonts
Only in unjar/system: lib
Some additional fun facts about this file:
0000000 -- first byte of file, appears to be legitimate META-INF/MANIFEST.MF header & blob
0a22020 or so -- index of legitimate update.zip (META-INF/MANIFEST.MF entry)
0a27586 -- "signed by SignApk"
0a27600 and following -- appears to be HTC Certificate
0a27c3d -- start of next ZIP file? ("PK" and system/ entry)
0a27cc3 or so -- header & blob for /system/bin/su file
0a33970 or so -- index of illegitimate update.zip (META-INF/ entry)
0a3400e -- last byte of file
Observe the following (10648637 is 0x0a27c3d) :
% dd if=droid-root.zip of=test.zip skip=10648637 bs=1
50130+0 records in
50130+0 records out
50130 bytes (50 kB) copied, 0.163549 s, 307 kB/s
% file test.zip
test.zip: Zip archive data, at least v1.0 to extract
% mkdir unzip.sfx
% (cd unzip.sfx; unzip ../test.zip)
Archive: ../test.zip
error [../test.zip]: missing 10648637 bytes in zipfile
(attempting to process anyway)
creating: system/
creating: system/bin/
inflating: system/bin/su
creating: META-INF/
creating: META-INF/com/
creating: META-INF/com/google/
creating: META-INF/com/google/android/
inflating: META-INF/com/google/android/updater-script
error [../test.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
If you flip the dd around, the prefix is observed to be a perfectly sane .zip file,
handled by both unjar and unzip in identical ways.
So what we've got is a file which is secretly two zip files concatenated. The index
of the second is mangled so that some tool, namely unjar, blows by it in its search
for the index, but that another tool, namely unzip, does not.
And now for why this matters. Jar file signing is very clever: the list of files
and their hashes is generated and signed and the resulting signature detached and
itself stored in the jar. This allows multiple parties to attest to the contents
of the archive. However, what's happening here is that one tool -- the one verifying
the signature -- sees what it expects to see: the tail of the file (the second zip
file) is outside the archive and therefore doesn't alter the list of files inside
the archive or their contents. So the signature is still believed valid. The file
is then extracted again using a different tool which operates on the
second zip file. At this point, no signature or manifest checks are made, and
your phone is now correctly yours again.
--nwf;
P.S. The reason for the "error" message in the unzip command above is that the
second index has been told that there's this first zip file atop it and so it's
using "back-references" as a kind of compression to allow for more interesting
bits to fit in the exploitable window.
December 24, 2009
newsham
OLPC3

The machine will have no keyboard or connectors (using inductive charging). It will be a full paper-sized plastic display (plastic logic?), low-power (with ARM, I guess no more windows?), have a touch screen that supports a virtual keyboard and cost only $75. Of course OLPC won't be making the machines themselves, they will just inspire the design, but rest assured it will be ready by 2012! Or maybe not..
Still, its an interesting vision. Lets see how much of it comes to pass.
http://www.wired.com/gadgetlab/2009/12/xo-3-concept-a-crazy-thin-tablet-olpc-for-just-75/
http://news.bbc.co.uk/2/hi/technology/8428147.stm
http://www.forbes.com/2009/12/22/tablet-computer-negroponte-technology-cio-network-olpc.html
by newsham (noreply@blogger.com) at December 24, 2009 06:16 PM
OS Inferno
VGA-драйвер
December 23, 2009
December 22, 2009
December 21, 2009
newsham
Mighty morphing faces
<object height="265" width="320"><param name="movie" value="http://www.youtube.com/v/nice6NYb_WA&hl=en_US&fs=1&"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed allowfullscreen="true" allowscriptaccess="always" height="265" src="http://www.youtube.com/v/nice6NYb_WA&hl=en_US&fs=1&" type="application/x-shockwave-flash" width="320"></embed></object>
by newsham (noreply@blogger.com) at December 21, 2009 11:14 PM
Decade review for tech
- Most disruptive device: iPhone
- Biggest makeover: amazon and ec2
- security shock: worms
- biggest folly: SCO/linux
- biggest blunder: MSFT stops IE dev
- longest suicide note: sun micro
- most wanted: jboss and mysql
- survivor: winxp
- linux distro: ubuntu
- worth repeating: big media will stop illegal filesharing
- mixed blessing: "free" (code, information, money)
- most utility: usb
- new microsoft: google
- kleptomania: oracle
- surprise: netbooks
- needed correction: recession
by newsham (noreply@blogger.com) at December 21, 2009 05:59 PM
December 20, 2009
December 19, 2009
December 18, 2009
December 17, 2009
newsham
Drone hacking
Shi'ite fighters in Iraq used software that cost as little as $26 to intercept the video feeds, potentially allowing them to monitor U.S. military operations.It was only a matter of time. When it comes to hacking, America's the country with everything to lose and little to gain. You'd think as a result we'd be more interested in good computer engineering and defensive computer security but that does not appear to be the case.
I wonder if the pentagon is taking their software vendors to task over the breach. Probably not, as procurement officers are probably looking to software vendors for future career opportunities for their post-government life.
by newsham (noreply@blogger.com) at December 17, 2009 06:29 PM
December 16, 2009
December 15, 2009
research!rsc
Gofmt
One of the more controversial aspects of the Go project
is that Go has a canonical program format, implemented
by a program (gofmt) that will pick up any
Go program and reformat it in the canonical format.
Every program in the Go source tree has been formatted
with gofmt. The code review and revision control tools
check that any new code is gofmt-formatted too.
(I am avoiding the word “style,” because gofmt doesn't enforce style, in the sense of The Elements of Programming Style. Running code through gofmt changes the layout but not whether it's a well-written, elegant program. Referring to these mundane formatting rules as style elevates them far beyond their importance. The only style that gofmt imposes is consistency.)
The most obvious benefit of using gofmt is that when you open an unfamiliar Go program, your brain doesn't get distracted, even subconsciously, about why that brace is in the wrong place; you can focus on the code, not the formatting. But there are many more interesting uses for gofmt. Gofmt can take any file in the Go source tree, parse it into an internal representation, and then write exactly the same bytes back out to the file. There are two reasons this works: the primary reason is that a lot of effort went into gofmt, but an important secondary reason is that gofmt only has to worry about one formatting convention, and we've agreed to accept that as the official one.
Once you have a tool that can parse and print a program losslessly, it's easy to insert mechanical processing in the middle, to transform the program between parsing and printing. Want to change something about the language? Change it in gofmt and reformat the source tree. This has already happened multiple times since the public launch.
The December 9 release of Go added a new built-in function
copy(dst, src) that copies data from one
slice to another. (It's a built-in both because it's such a
common operation and because Go's lack of pointer
arithmetic makes it hard to write the overlap check yourself.)
The new copy replaced the library routine
bytes.Copy, which was similar but
only worked on byte slices.
Gofmt has an option, -r, to specify a rewrite
rule in the form
pattern -> replacememt.
In the pattern and replacement, single-letter lower case names
serve as wildcards matching arbitrary expressions.
Want to convert from bytes.Copy to copy?
gofmt -w -r 'bytes.Copy(d, s) -> copy(d, s)' *.go
(The -w flag writes the new version of the program back to the input file.)
The December 9 release also added support for writing
x[i:] instead of x[i:len(x)].
Want to convert programs to use the new syntax when possible?
gofmt -w -r 'x[i:len(x)] -> x[i:]' *.go
These two minor changes were upstaged by discussion of a larger syntactic change, the elimination of line-ending semicolons. Again, gofmt is crucial to making the switch. The syntax being considered is mostly, but not completely, backwards compatible with the current version of Go. Normally, that would mean having to make the compilers support both syntaxes for some time, during which the compiler might warn about programs depending on the old syntax, expecting users to fix their programs. Instead, having one tool whose job it is to rewrite programs means that all the other tools can focus on other things. Gofmt can fix programs mechanically, and the compilers and other tools can drop the old syntax instead of having to support both.
I'd like to see more tools built using gofmt and the
libraries it uses (go/parser and go/printer).
For example, there's no reason an IDE plugin couldn't sit on the same
libraries to build interesting refactoring tools.
Gofmt's -r is just the beginning.
It's going to be exciting.
One final note. Being able to pick up a program and write it back out, preserving comments, is not an easy task in any language. Robert Griesemer deserves all the credit for the huge effort to make gofmt handle real programs so well.
December 14, 2009
Anant Narayanan
Figuring out the Goo.gl API
UPDATE: ‘Fatalis’ has pointed out in the comments that the POST should be made to http://goo.gl/api/url with User-agent set to ‘toolbar’. The code now works, Yay!
Google just announced their own URL shortening service. Their service can only be used from the toolbar or FeedBurner, and I don’t particularly like adding extra toolbars to my browser. Maybe I can figure out a way to use their service from the command line?
I downloaded the toolbar XPI, unzipped it and peeked inside. Horribly indented JS awaited me. Nothing jsbeautifier couldn’t fix though. Few minutes later, I arrived at this readable JS function:
var getUrlShorteningRequestParams = function (b) {
function c() {
for (var l = 0, m = 0; m < arguments.length; m++)
l = l + arguments[m] & 4294967295;
return l
}
function d(l) {
var m = String(l > 0 ? l : l + 4294967296);
for (var o = 0, n = false, p = m.length - 1; p >= 0; --p) {
var q = Number(m.charAt(p));
if (n) {
q *= 2;
o += Math.floor(q / 10) + q % 10
} else o += q;
n = !n
}
m = m = o % 10;
o = 0;
if (m != 0) {
o = 10 - m;
if (l.length % 2 == 1) {
if (o % 2 == 1) o += 9;
o /= 2
}
}
m = String(o);
m += l;
return l = m
}
function e(l) {
for (var m = 5381, o = 0; o < l.length; o++) m = c(m << 5, m, l.charCodeAt(o));
return m
}
function f(l) {
for (var m = 0, o = 0; o < l.length; o++) m = c(l.charCodeAt(o), m << 6, m << 16, -m);
return m
}
var i = e(b);
i = i >> 2 & 1073741823;
i = i >> 4 & 67108800 | i & 63;
i = i >> 4 & 4193280 | i & 1023;
i = i >> 4 & 245760 | i & 16383;
var h = f(b);
var k = (i >> 2 & 15) << 4 | h & 15;
k |= (i >> 6 & 15) << 12 | (h >> 8 & 15) << 8;
k |= (i >> 10 & 15) << 20 | (h >> 16 & 15) << 16;
k |= (i >> 14 & 15) << 28 | (h >> 24 & 15) << 24;
j = "7" + d(k);
i = "user=toolbar@google.com&url=";
i += encodeURIComponent(b);
i += "&auth_token=";
i += j;
return i
};
So, I call getUrlShorteningRequestParams("http://www.kix.in/"); to get "user=toolbar@google.com&url=http%3A%2F%2Fwww.kix.in%2F&auth_token=78925814685". I see in their code that they do a POST request to the service to obtain a JSON return value that would contain the short URL. I punch it in using cURL:
$ curl -v -d "user=toolbar@google.com&url=http%3A%2F%2Fwww.kix.in%2F&auth_token=78925814685" http://goo.gl/ * About to connect() to goo.gl port 80 (#0) * Trying 74.125.19.102... connected * Connected to goo.gl (74.125.19.102) port 80 (#0) > POST / HTTP/1.1 > User-Agent: curl/7.19.7 (i386-apple-darwin10.2.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3 > Host: goo.gl > Accept: */* > Content-Length: 77 > Content-Type: application/x-www-form-urlencoded > < HTTP/1.1 405 HTTP method POST is not supported by this URL
Oops! Well, not really, the URL shortener from the toolbar doesn’t work either, I just get the full URL whenever I try to “share” something. Has anybody actually generated a real goo.gl short URL yet?
Their auth_token parameter seems completely superfluous to me as it is generated from the URL itself. Don’t we all know security by obscurity doesn’t work
newsham
C9 Lectures
Erik Meijer and Graham Hutton bring us a 13-part lecture on functional programming based on Hutton's
book. I haven't viewed em yet, but if they're as good as the book, should be a hit. I'm hoping to watch them asap.
Edit: btw, here's some motivation for why anyone would care about this stuff.. Interesting observation in the abstract: http://scyourway.supercomputing.org/conference/view/spost112_1
Edit: beware, Meijer wanders into many complex side topics in the lectures I've seen so far and does not focus on the material that may be difficult for a new user. This lecture series might not be the best introduction for new users.
by newsham (noreply@blogger.com) at December 14, 2009 08:00 PM
December 13, 2009
December 12, 2009
OS Inferno
Visopsys + Inferno?
December 11, 2009
December 10, 2009
newsham
Crazy good surf
by newsham (noreply@blogger.com) at December 10, 2009 06:02 PM
research!rsc
Regular Expression Article #2
In January 2007 I posted an article on my web site titled “Regular Expression Matching Can Be Simple And Fast.” I intended this to be the first of three; the second would explain how to do submatching using automata, and the third would explain how to make a really fast DFA. I got distracted for a few years but have finally finished the second article.
http://swtch.com/~rsc/regexp/regexp2.html
Simple (easy to follow, I hope) code demonstrating the techniques from the article can be found at http://code.google.com/p/re1/.
December 09, 2009
newsham
OS Inferno
[софт] Styx-браузер для Android
[tips] Сниффинг
; cat /net/ether0/0/data > /usr/inferno/sniff_net
December 08, 2009
OS Inferno
[tips] Софтверный RAID
; bind -a '#b' /dev
; echo mirror m0 /dev/sdC0/data /dev/sdD0/data >/dev/ds/ctl
; echo mirror m1 /dev/sdC1/data /dev/sdD1/data >/dev/ds/ctl
; echo inter data /dev/ds/m0 /dev/ds/m1 >/dev/ds/ctl
; disk/kfs -f /dev/ds/data
И ведь любой из учавствующих в RAID дисков может быть импортирован с другой машины.
research!rsc
Data Structures Go Programs
I've been writing about Go internals, which I'll keep doing,
but I also want to spend some time on Go programs.
Go's reflect package implements data reflection.
Any value can be passed to reflect.NewValue (its argument type is
interface{}, the empty interface) and then manipulated and
examined by looking at the result, a reflect.Value.
Typically, you use a type switch to determine what kind
of value it is—a *IntValue, a *StructValue, a *PtrValue, and so on—and then walk it further.
This is how fmt.Print
prints arbitrary data passed to it.
Go allows programs to tag the fields in struct definitions with annotation strings. These annotations have no effect on the compilation but are recorded in the run-time type information and can be examined via reflection. The annotations can be hints for reflection-driven code traversing the data structures. This blog post shows two examples of such code: an XML parser that writes directly into data structures, and a template-based output generator that reads from them.
An XML Linked List
The Go XML package implements a
function Unmarshal which parses XML directly into data structures.
As a contrived but illustrative example, here is an XML encoding of a linked list:
var encoded = `
<list><value>a</value>
<list><value>b</value>
<list><value>c</value>
</list>
</list>
</list>
`
(That's a backquoted multi-line Go string literal.)
Given this Go data structure definition:
// http://gopaste.org/view/private:7v2CQ type List struct { Value string; List *List; } func (l *List) String() string { if l == nil { return "nil" } return l.Value + " :: " + l.List.String() }
we can invoke xml.Unmarshal to
turn the XML above into an instance of this data structure:
func main() {
var l *List;
if err := xml.Unmarshal(strings.NewReader(encoded), &l); err != nil {
log.Exit("xml.Unmarshal: ", err)
}
fmt.Println(l)
}
This program prints a :: b :: c :: nil.
The call to xml.Unmarshal walked the XML
and the data structure at the same time, matching
the XML tags <value> and <list>
against the names of the structure fields and allocating
pointers where necessary.
There are decisions to be made when matching
XML against the data structure that aren't easily expressed
with just the usual struct information.
For example, we might want to make the value an
attribute instead of a sub-element. Because attributes
and sub-elements might use the same names, the XML
package requires struct fields holding attributes to be tagged
with "attr".
This data structure and XML encoding would also work
in the program above:
// http://gopaste.org/view/private:2yR7E type List struct { Value string "attr"; List *List; } var encoded = ` <list value="a"> <list value="b"> <list value="c"/> </list> </list> `
If a field has type string, it gets the
character data from the element with that name,
but sometimes you need to accumulate both
character data and sub-elements.
To do this, you can tag a field with "chardata".
This data structure and XML encoding would also work:
// http://gopaste.org/view/private:U3ews type List struct { Value string "chardata"; List *List; } var encoded = `<list>a<list>b<list>c</list></list></list>`
The annotations are trivial but important: where the encoding and the data structures are not a perfect match (and they rarely are), the annotations provide the hints necessary for the XML package to make them work together anyway.
Code Search Client
Let's look at real XML data. Google's Code Search provides regular expression search over the world's public source code. The search [file:/usr/sys/ken/slp.c expected] returns a single, famous result. Instead of using the web interface, though, we can issue the same query using the Code Search GData API and get this XML back (simplified and highlighted for structure):
From the API documentation or just by eyeballing, we can write data structures for the information we'd like to extract:
type Feed struct {
XMLName xml.Name "http://www.w3.org/2005/Atom feed";
Entry []Entry;
}
type Entry struct {
XMLName xml.Name "http://www.w3.org/2005/Atom entry";
Package Package;
File File;
Match []Match;
}
type Package struct {
XMLName xml.Name "http://schemas.google.com/codesearch/2006 package";
Name string "attr";
URI string "attr";
}
type File struct {
XMLName xml.Name "http://schemas.google.com/codesearch/2006 file";
Name string "attr";
}
type Match struct {
XMLName xml.Name "http://schemas.google.com/codesearch/2006 match";
LineNumber string "attr";
Type string "attr";
Snippet string "chardata";
}
The XMLName fields are not required.
The XML package records the XML tag name in them.
If there is a string annotation (as there is here),
then the XML package requires that the XML element being
considered has the given name space and tag name.
That is, the annotation is both documentation and
an assertion about what kind
of XML goes into this structure.
A full Atom GData feed has many other fields, but these are the only ones that we care about. The XML parser will throw away data that doesn't fit into our structures.
We can fetch the feed via HTTP and unmarshal the HTTP response data directly into the data structures:
r, _, err := http.Get(`http://www.google.com/codesearch/feeds/search?q=` +
http.URLEscape(flag.Arg(0)));
if err != nil {
log.Exit(err);
}
var f Feed;
if err := xml.Unmarshal(r.Body, &f); err != nil {
log.Exit("xml.Unmarshal: ", err);
}
and then print the results:
for _, e := range f.Entry {
fmt.Printf("%s\n", e.Package.Name);
for _, m := range e.Match {
fmt.Printf("%s:%s:\n", e.File.Name, m.LineNumber);
fmt.Printf("%s\n", cutHTML(m.Snippet));
}
fmt.Printf("\n");
}
(The cutHTML function returns its argument
with HTML tags stripped: the
m.Snippet is HTML but we just want to print text.)
Compared to the xml.Unmarshal, that loop
with all the fmt.Printf calls sure looks clunky.
We can use a reflection-driven process to print the data too.
The template package is a Go implementation
of json-template.
We can write a template like:
var tmpl = `{.repeated section Entry}
{Package.Name}
{.repeated section Match}
{File.Name}:{LineNumber}:
{Snippet|nohtml}
{.end}
{.end}`
and then print the data using that template:
t := template.MustParse(tmpl, template.FormatterMap{"nohtml": nohtml});
t.Execute(&f, os.Stdout);
The formatter map definitions arrange that the
{Snippet|nohtml} in the template run the
data through the nohtml function when printing.
Both variants of our program produce the same output:
http://www.tuhs.org Archive/PDP-11/Trees/V6/usr/sys/ken/slp.c:325: * You are not expected to understand this.
Full source code is at http://gopaste.org/view/private:7ybCj
Parsing HTML
I still have three more tricks up my sleeve.
First, in a pinch and if the HTML is
mostly well formatted or you're just lucky, the XML parser
can handle HTML too.
Second, when the XML parser walks into a struct field, if the field
is a pointer and is nil, it allocates a new value, as we
saw in the linked list example. But if the field is a non-nil
pointer already, the parser just follows the pointer.
The same goes for growing slices.
Third, if the XML parser cannot find a struct field for a particular
subelement, before giving up it looks for a field named Any
and uses that field instead.
Combined, these three rules mean that this structure:
type Form struct {
Input []Field;
Any *Form;
}
type Field struct {
Name string "attr";
Value string "attr";
}
can be used to save a tree of form and other
information from a web page. When you unmarshal into it,
you get a tree showing the structure of the web page,
with each element becoming a *Form
because the parser used the Any field.
But who wants to walk that tree just to
find the form fields?
Instead, we can set the Any pointer to
point back at the Form it is in.
Then as the XML parser thinks it is walking the data structure,
in fact it will be visiting the same Form
struct over and over.
Each time it finds an <input>, the parser
will append it to the Input slice.
// http://gopaste.org/view/private:b8Ya2 func main() { r, _, err := http.Get(`http://code.google.com/p/go/`); if err != nil { log.Exit(err); } var f Form; f.Any = &f; p := xml.NewParser(r.Body); p.Strict = false; p.Entity = xml.HTMLEntity; p.AutoClose = xml.HTMLAutoClose; if err := p.Unmarshal(&f, nil); err != nil { log.Exit("xml.Unmarshal: ", err); } for _, fld := range f.Input { fmt.Printf("%s=%q\n", fld.Name, fld.Value); } }
This program prints:
q="" projectsearch="Search projects"
Further Reading
There are a few more examples of this technique in the standard Go libraries. The net package's DNS client uses this technique to build generic packing and unpacking routines for DNS packets. The JSON package provides approximately the same functionality as the XML package, but with JSON as the wire format. In fact, JSON is arguably more suited to the task. (Remember: “the essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.”) The ASN1 package also works this way, though its generality is limited: it implements just enough to parse SSL/TLS certificates. There are also other ways to use reflection, which will have to wait for future posts.
It's actually a little surprising how often this technique gets used in the Go libraries. Part of the reason is that it's a shiny new hammer, but it seems to be a very useful shiny new hammer, because so much of modern programming is talking to other programs over a variety of wire encodings.
Unmarshalling directly into data structures via reflection, with just a few necessary hints as annotations, can be implemented in other languages, but the examples I've seen are not nearly as elegant or as concise as the Go versions.










































































































