Ex-goat herder. Living on a planet in space. Once and future thing. And this is my blog. It has my name on it, and there are dates, the blog-spoor are everywhere.

the year of poop on the desktop



So last month an idea surfaced from @j4mie for an alternative data format: poop separated values (PSV). Here’s the complete spec.

PSV Spec

At first, I laughed. Then, maybe I cried a little – 2016 went in a kind of 💩y direction at the end there. Finally I started thinking. I realized that PSV is a brilliant idea, and here’s why.

What we’re talking about here is the :poop: emoji, standardized as U+1F4A9 PILE OF POO in Unicode 6.0. Look at that code point for a minute: U+1F4A9. That’s a big number. That’s outside the Basic Multilingual Plane my friends, and into the Astral Planes of the Unicode standard. You have to have your act together to deal with this. For example, Mathias Bynens covers all the muddles JavaScript gets into with poop in JavaScript has a Unicode problem. This is not the 128-odd ASCII characters your grandmother grew up with.
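To make "outside the Basic Multilingual Plane" concrete, here is a small Python sketch (not from the PSV spec, just standard Unicode arithmetic) showing what U+1F4A9 turns into once it is encoded. The variable names are only for illustration.

```python
# U+1F4A9 PILE OF POO, written with an escape so the source file stays ASCII.
poop = "\U0001F4A9"

code_point = ord(poop)

# Anything above 0xFFFF lives outside the Basic Multilingual Plane.
is_astral = code_point > 0xFFFF

# UTF-8 needs four bytes for this character.
utf8 = poop.encode("utf-8")

# UTF-16 cannot hold it in one 16-bit unit, so it uses a surrogate pair.
utf16 = poop.encode("utf-16-be")
high = int.from_bytes(utf16[:2], "big")
low = int.from_bytes(utf16[2:], "big")

print(hex(code_point), is_astral)
print(" ".join(f"{b:02x}" for b in utf8))
print(hex(high), hex(low))
```

The surrogate pair is the root of the JavaScript muddles mentioned above: a language that counts 16-bit units sees two "characters" where there is only one.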

So isn’t that an argument not to use an astral symbol as a separator? Let’s reflect on the failings of CSV. Jesse Donat has a great list in a piece called Falsehoods-Programmers-Believe-About-CSVs, mirroring similar lists about names, geography and so on. The first 8 falsehoods are all encoding related. With CSV, if your data is a table of numbers, you don’t really have to think about encoding at all. That’s nice right up until the moment a non-ASCII character sneaks in there and it all goes pear-shaped. But if we lead with a mandatory poop symbol, from an astral plane no less, no-one is going to be able to punt on the encoding issue. You have to get it right up front.
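Here is a rough Python illustration of that argument. The sample data is made up, and the only assumption about PSV is that 💩 sits between fields. A numbers-only CSV decodes happily as ASCII, so nothing forces the encoding question; a PSV file fails or turns to mush on its very first line.

```python
POOP = "\U0001F4A9"

# A table of numbers: any ASCII-compatible guess about the encoding "works".
csv_bytes = b"1,2\n3,4\n"
csv_text = csv_bytes.decode("ascii")

# The same idea as PSV (made-up sample data), stored as UTF-8.
psv_bytes = f"goal{POOP}status\nlearn haxe{POOP}done\n".encode("utf-8")

# Punting on the encoding fails immediately...
try:
    psv_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# ...or, with a permissive 8-bit guess, produces visibly wrong text.
mojibake = psv_bytes.decode("latin-1")

# Only the right answer gives back the fields.
header = psv_bytes.decode("utf-8").splitlines()[0].split(POOP)

print(ascii_ok)
print(POOP in mojibake)
print(header)
```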

The logical conclusion of this idea would be to borrow a string like Iñtërnâtiônàlizætiøn☃💩 from unit tests and use that as the delimiter. But 💩 alone gets us a good way there, and looks cleaner. Err.

Here’s an example of PSV in action. I’m editing a PSV file called checklist.psv that lists my current goals in life. I’m using emacs to edit the file, and git to view differences against a previous version.

PSV and daff

I’ve configured git here to use daff to view tabular differences cleanly. I’m doing this on my phone because phones currently excel at showing emoji – the same thing on a laptop also works fine but the poop is less cheerful looking.

One danger with PSV is that people could get sloppy about quoting rules. With CSV, you’ve a good chance of seeing a comma in your data, so you deal with quoting sooner rather than later to disambiguate that. The business data I’ve seen in CSV form has never had 💩 in it, so I could imagine someone skimping on quoting. One solution for that is for more of us to put poop in our names and transactions, Little Bobby Tables style (xkcd, @mopman’s company).
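A hedged sketch of what not skimping looks like, in Python. The PSV spec is not reproduced in this post, so this assumes PSV borrows CSV's double-quote rules with 💩 swapped in as the delimiter; `write_psv` and `read_psv` are hypothetical helper names, built on the standard `csv` module.

```python
import csv
import io

POOP = "\U0001F4A9"

def write_psv(rows):
    """Serialize rows with the poop delimiter, quoting only where needed."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=POOP, lineterminator="\n").writerows(rows)
    return buf.getvalue()

def read_psv(text):
    """Parse PSV text back into rows, honoring double-quoted fields."""
    return list(csv.reader(io.StringIO(text), delimiter=POOP))

rows = [
    ["name", "note"],
    [f"Bobby {POOP} Tables", "likes, commas"],  # delimiter inside the data
]

text = write_psv(rows)
round_trip = read_psv(text)

# The sloppy approach: split on the delimiter and hope for the best.
naive = text.splitlines()[1].split(POOP)

print(text)
print(round_trip == rows)
print(len(naive))
```

The naive split sees three fields in a two-column row, which is exactly the bug that commas teach CSV users to avoid early on.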

There aren’t a lot of programs supporting PSV yet. So far as I know, daff is the first. The purpose of daff is making tabular diffs and helping with version control of data, but until format converters crop up you can use it to convert to and from PSV as follows:

pip install daff           # or npm install daff -g
daff copy foo.csv foo.psv  # convert csv -> psv
daff copy foo.psv foo.csv  # convert psv -> csv

Or you can write into the text boxes at the start of this post :-).

releasing a library for many languages using Haxe

npm/gem/pypi/php packages

Every programming language is a special snowflake with its own idiosyncratic beauty. Porting code from one language to another is an art, requiring dodging and weaving to give idiomatic results. If a library you’d like to use hasn’t been ported to your language, one option is to use a foreign function interface (FFI). A lot of reference implementations get written in C for this reason. The result is definitely not a thing of beauty, but it works.

Haxe gives another option to a library writer who needs to support communities using different languages. We can write the bulk of the library in Haxe, have that automatically transpiled to the languages we care about, and maybe add a little hand-written code in each language to make the API feel comfortable. This scales the effort involved way down.

For example, I wrote the daff library in Haxe and publish it to:

It turns out that a bunch of PHP users showed up, giving great feedback. That’s the target I personally know least about and would never have gotten around to supporting without Haxe.

The Ruby language was the one I personally cared most about at the time I started this. Ruby isn’t supported by Haxe, but it turns out to be surprisingly easy to add a target to Haxe that is “good enough” at least to translate code that is just a bunch of logic and algorithms (I did this for ruby here).

There’s an important downside to this approach though: you may not get as many pull requests. Users are not likely to be familiar with Haxe (yet), so working with the source will be a challenge for them. Haxe is a very straightforward, “common denominator” language to read and write – but it is a new language.

How square am I

Let me count the ways:

  • The icon my RSS client uses is of a newspaper printed on paper. RSS, RSS client, newspaper - that counts for three ways right there.
  • My t-shirts say nothing witty. I’m currently wearing one emblazoned with an enigmatic logo for a local bike/walk event and more importantly (if typography is any guide) THE LOGOS AND NAMES OF ALL ITS SPONSORS. There’s a white stain from some goop my daughter spilled on me that I half-heartedly and ineffectually tried to rub off with my nails. Update: I changed t-shirt.
  • My laptop has no funky stickers from events or projects. All there is, is a fading Intel Centrino Inside sticker it came with, that has been gradually rotating under my hand. It is now at a 30 degree angle, with traces of exposed glue on the upper side. I don’t plan to take any action about this anytime soon. Update: the sticker fell off.
  • I don’t have an interesting phone. My phone is as dumb as they come, and not in a hip way, more in a total-neglect and out-of-touchy kind of way. 2016 Update: gave in, now editing this in emacs on a new phone so I don’t even have the retro thing going for me any more. argh how do i type ctrl x ctrl s oh right volume down.
  • I’ve never changed my name on twitter. Why are people doing that? Update: I get this now, thanks to pikesley Update: I mean Only Zuul Update: I mean Galactus of the Left BIG PIPES Nun Of The Above Taylor's Wift Human Blockchain Unicode Batman Lord of the Flues Head of Snorkelling 6MillionBitcoinMan Sharing Economist Lol Cream Peter Gunn Jolly Local Swagman Dandy Highwayman Cubic's Rube Cyber with Rosie Cyber with Roadies Institute of Codeine Santos L. Halper Hugh Jif True Bill Clay Jeremy Kylo Ren New Year's Steve Triangle Man Cognitive Sourpuss Manic Bitcoin Miner Null of Kintyre Safety Third Phineas Gage Thousand-yard Stairs London Supercloud register.register WinstonNilesRumfoord Sabre Wulf Cable Select ISA Bus Sides of March Click Dummy Fatuous Pauper Ringlefinch Alonzo Mosely Ian Bothan R Dweeno Horse it's June When Devs Cry Roko's Obelisk Riemann's Zero Penn Rose-Tile Belouis None Rush Goalie 2-Tractor Auth All Is Aflame Jason Bourne Shell SCSI Terminator X bonsaikitten Kid Charlemagne Fronk-en-shteen Metric Martyr Billy Yum-Yum 2x2 Full-Sack Developer Konix Speedking Metropolitan Liberal Glidd of Glood Sexy Cartoon Fox Elite: Liberal Activist Grudge Gee Suswept Big Dope Shill Britain’s Best Tree Rogue Nun Deus X-Wing Ice-Cream Fan Man of Leisure SCMODS R Dweeno Alternative Fax Auntie Kithera Straight Banana iso8601 or GTFO Fake Gnus.
  • The things I’m putting on this list and not putting on it no doubt reveal assumptions I’m making about what is hip that confirm further my squaritude in ways I can’t even imagine.
  • I totally get grumpy about the dumb things that kids are doing these days.
  • I totally get grumpy about how much sense the dumb things that kids are doing these days end up making in the end.

Diff and merge CSV files in your git client

Sometimes, you want to version-control your data. As programmers, many of us are used to putting everything in git. For large datasets, that is currently a recipe for sadness, but smaller ones can work just fine.

There are a few hurdles. CSV files have a special tabular structure that git knows nothing about. This means that diffs will be noisier than they need be, and that git may see conflicts when merging where there are none.
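To see why line-based diffs get noisy, here is a toy Python comparison. It is not daff's algorithm: rows are matched purely by position, which a real tool would not rely on, and the sample tables and the function names `line_changes` and `cell_changes` are invented for illustration. The two versions differ by a column reorder plus one corrected cell.

```python
import csv
import difflib
import io

old = "name,digit\none,1\ntwo,2\nthre,3\n"
new = "digit,name\n1,one\n2,two\n3,three\n"

def line_changes(a, b):
    """Added/removed lines as a line-based diff sees them."""
    diff = difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]

def cell_changes(a, b):
    """Changed cells, matching columns by header name and rows by position."""
    rows_a = list(csv.DictReader(io.StringIO(a)))
    rows_b = list(csv.DictReader(io.StringIO(b)))
    changes = []
    for i, (ra, rb) in enumerate(zip(rows_a, rows_b)):
        for col in ra:
            if ra[col] != rb.get(col):
                changes.append((i, col, ra[col], rb.get(col)))
    return changes

print(len(line_changes(old, new)))
print(cell_changes(old, new))
```

The line diff reports every line as removed and re-added, while the table-aware view reports a single changed cell.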

For diffs, James Smith has a great explanation and a good start at a solution. In the git client, he proposes a custom git diff driver that understands CSV structure. On the server side, he shows how to tweak gitlab or (via a plugin called CSVHub) github to get pretty diffs using the daff library.

For merges, on the client side, the coopy library has for some time provided a similar merge driver to let git understand and use CSV structure. As of today, daff can do the same, and it is much easier to install.

$ npm install daff -g
$ daff git csv

Once that is installed, you’ll get nice diffs produced by the same library James used for his github plugin:

Random diff

And you’ll get nice merges too. Let’s look at an example.

Suppose we have this table stored in digi.csv:


And in one branch we correct thre to three, and in another branch we correct 33 to 3:

one,1	   	  one,1	   
two,2	   	  two,2	   
three,33   	  thre,3	   
four,4	   	  four,4	   
five,5       	  five,5     

If we try to merge these files in vanilla git, we’ll get an ugly conflict:

Auto-merging digi.csv
CONFLICT (content): Merge conflict in digi.csv
Automatic merge failed; fix conflicts and then commit the result.

And the CSV file is no longer a valid CSV file, which is unfortunate if we’re using a CSV-aware editor for it:

<<<<<<< HEAD
>>>>>>> 634275495ecd86c287e292e2719e89a9c1188ed1

With a CSV-aware merge driver, we get:

Auto-merging digi.csv
Merge made by recursive.
 digi.csv |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

and the changes are correctly merged:


What if there was a real conflict? Suppose we replaced thre with three in one branch but thirty-three in another? We are told:

1 conflict
Auto-merging digi.csv
CONFLICT (content): Merge conflict in digi.csv
Automatic merge failed; fix conflicts and then commit the result.

And the conflicted file looks like this:

"((( thre ))) thirty three /// three",33

This remains a valid CSV file, and so can be edited in a CSV-aware editor - we aren’t suddenly kicked out into needing a text editor.
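The idea behind a table-aware merge can be sketched as a three-way comparison done per cell instead of per line. The Python below is a minimal illustration only, not how daff or coopy actually work: it assumes all three versions have the same shape, the base table is reconstructed from the description above, and the conflict marker merely imitates the one shown (which side comes first in it is a guess). `merge_cell` and `merge_tables` are hypothetical names.

```python
import csv
import io

def parse(text):
    return list(csv.reader(io.StringIO(text)))

def merge_cell(base, ours, theirs):
    """Three-way merge of one cell; returns (value, is_conflict)."""
    if ours == theirs:
        return ours, False
    if ours == base:
        return theirs, False   # only their side changed
    if theirs == base:
        return ours, False     # only our side changed
    # Both sides changed differently: keep everything in a single cell.
    return f"((( {base} ))) {ours} /// {theirs}", True

def merge_tables(base, ours, theirs):
    """Merge same-shaped tables cell by cell; returns (rows, conflict count)."""
    merged, conflicts = [], 0
    for row_b, row_o, row_t in zip(base, ours, theirs):
        row = []
        for cell_b, cell_o, cell_t in zip(row_b, row_o, row_t):
            value, clash = merge_cell(cell_b, cell_o, cell_t)
            conflicts += clash
            row.append(value)
        merged.append(row)
    return merged, conflicts

base = parse("one,1\ntwo,2\nthre,33\nfour,4\nfive,5\n")
ours = parse("one,1\ntwo,2\nthree,33\nfour,4\nfive,5\n")
theirs = parse("one,1\ntwo,2\nthre,3\nfour,4\nfive,5\n")

clean, clean_conflicts = merge_tables(base, ours, theirs)

# A real conflict: both branches rewrite the same cell differently.
theirs_clash = parse("one,1\ntwo,2\nthirty-three,33\nfour,4\nfive,5\n")
clashed, clash_conflicts = merge_tables(base, ours, theirs_clash)

print(clean[2], clean_conflicts)
print(clashed[2], clash_conflicts)
```

Because the two corrections touch different cells of the same row, the first merge goes through cleanly even though a line-based merge sees one contested line. In the second case the conflict stays inside one cell, so the file still parses as a table.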


Many years ago, I had a list of hobby projects I worked on from time to time, each with a little summary that began: “Until Google solves this problem nicely, …” Most of these problems have now been solved, except this one:

I’d like to be able to communicate with aliens over great distances. Until Google solves this problem nicely, I’m working on a cosmic OS.

So Google hasn’t yet sorted this out, but Hans Freudenthal made a great start back in 1960 with Lincos, a “Language for Cosmic Intercourse.” Lincos starts out in a by-now conventional way (though it was inventing the conventions) with 35 pages describing a message for teaching basic math from first principles. Then it moves on to 12 pages on time. Then (and this is where things get very interesting) a whopping 79 pages on behavior, with imaginary conversations between imaginary personalities called Ha and Hb.

Chapter III: Behavior

Ha and Hb discuss mathematics, since that’s about the only topic for conversation, but that is arbitrary. In their discussions, they introduce useful ideas such as good and bad (in the sense of constructive versus non-constructive). Now we are getting somewhere.

I was struck by the value Freudenthal was able to get from descriptions of extremely basic conversations, and wondered, what could we communicate through richer interactions? What if we described simulated environments that could actually be evaluated, and played forward or reversed, to see full simulated encounters take place? That was the seed for CosmicOS.

The idea with CosmicOS is to start with math, as Freudenthal did, and then build from there to a basic programming language, and then from there to programs and simulations. CosmicOS compiles down to a series of four arbitrary symbols that could be encoded and transmitted any way we like:

CosmicOS as digits

In human-readable form, it looks kind of Lisp-y, since that happened to be the syntax that introduced the fewest complications. CosmicOS is communicated as a long series of definitions and demonstrations:

Human-readable CosmicOS

The initial language isn’t super important, because we quickly bootstrap to any language we want. At the time I was writing this part, I was keen on Java, so I wrote a translator for it, targeting what has to be the least efficient JVM ever written.

   (field q ((int) new))
   (method <init>-V
     (lambda () /
      let ((vars / cell new / make-hash / vector
                    (pair 0 (self)))
           (stack / cell new / vector)) /
      state-machine (vars) (stack) / ? jvm / ? x / cond
         ((= (x) 0) (jvm aload 0))
         ((= (x) 1) (jvm invokespecial <init>-V 0 0))
         ((= (x) 2) (jvm aload 0))
         ((= (x) 3) (jvm iconst 0))
         ((= (x) 4) (jvm putfield q (int)))
         ((= (x) 5) (jvm return))
         (jvm return))

Then I wrote a little maze game in Java, shoved it in the message, and promptly dropped the whole project for several years :-). But now I’m back and fiddling with it again, mostly at my son’s goading. I’ve brought the project up-to-date enough to be able to get pull requests. You should contribute! You know you want to.

And just so it’s clear: I don’t have any particular belief in extraterrestrials or any special reason to be interested in contacting them. It is an interesting puzzle though, figuring out all the different ways we might try to do so. You may also want to check out the recent Archaeology, Anthropology, and Interstellar Communication book.

The Data Commons Co-op

The Data Commons Co-op is a quirky start-up that I’ve been helping out with. Its job is to maximize the impact of the data held by its members, and reduce costs in managing it. Its members are “alternative economy” organizations of all types. Dan Nordley calls it “perhaps the geekiest of all cooperative organizations on the planet!”

DCC Retreat

The infrastructure for collaborative data projects could be a lot more fun than it is now. Open Data initiatives are pushing things forward quite a bit, primarily with government data in mind. That is sort of a top-down direction of data flow. We’re looking at bottom-up, grass-roots economic organizing. Worker co-ops, buying clubs, community gardens, time banks, and so on. There’s a lot of overlap in communities, and potential for network effects. The Data Commons Co-op is a way to pay for the infrastructure that everyone needs and no-one can make happen alone. So far we’ve produced a simple diff format for tables documented on the Data Protocols site (some background in an Open Knowledge Labs post), along with two programs called daff and coopy for comparing and merging table versions. Beyond the technology, we’re also figuring out how a culture of sharing can work in the economy. There’s a lot of reflexive data-hoarding and hiding that goes on, which is totally understandable. For individual organizations, the cost of thinking about all the issues around sharing data can outweigh by far any potential benefit. Hopefully the DCC can tilt that equation!


I wrote daff to better visualize diffs between tables (daff = data diff). You don’t need this if you work with append-only data, for example a stream of events churned out by a sensor or bureaucracy. But if you have a collection of assertions that can change with time or need correcting, then data diffs are handy.

bridge diff

daff can be used from the command line, as a library, or on github, using James Smith’s CSVHub. CSVHub can convert a diff like this:

bus diff line based

to something like this:

bus diff

Fragments of identity

Hello good evening and welcome. I'm so totally going to make a website one of these days. In the meantime, here are some fragments of my identity.