June 23, 2009
Jason Parker-Burlingham — Where's that toad when you need it?
Ugh, houseflies. I would have preferred a more interesting subject to explore the borage, but at least it did not fly away when the flash went off.
June 22, 2009
Anthony Towns — Side project #1: Pageant
So as per my post from a week ago, here comes the description of my first little side project. But first a quick reiteration of the aim: I’m trying to get a feel for what it’s like actually doing a tech startup; so not charging for my time, but rather making something once that I can then sell repeatedly without having to do a lot more work. This is intended to make me more experienced rather than wealthy, so “success” means learning something, rather than making much money. As a consequence I’m aiming for business ideas that are in the bad-to-mediocre range, that will nevertheless involve some interesting/useful technology. That way if the business part goes badly, I don’t feel like I’ve screwed up a chance to make a bazillion dollars, or wasted my time doing something pointless.
So the first interesting-tech/mediocre-business idea I have is related to popcon. I like to think a comment I made once helped inspire popcon’s existence back in the day:
I think It’d be interesting to have a debian-survey style package that when installed, informs the `project’ (stats@debian.org?) who’s using which packages. This would allow us to get a *much* better indication on which packages’s are in fact moderately stable and tested, and which are just gathering dust; and give us a better idea of what’s appropriate for inclusion in stable and/or unreleased.
Sadly that mail disappeared from the web (it was in the archives mentioned at the bottom of one of my initial posts to debian-devel regarding (what became) the testing suite, but disappeared after an upgrade/reinstall of www.debian.org) — but it was nominally in the public domain as of late July 1998, and lo and behold, popularity-contest appeared some three months later, doing everything I’d thought of and more. (For all I know, my comment played absolutely no part in Avery’s implementation, but I still like to think it did :)
Anyway, cool as popcon (and my original idea!) is, there’re interesting ways you could extend it, getting more information, and doing more with it. You could, for instance, survey more information about packages — what version’s installed would give you hints about how many people are pulling from backports, or mixing stable and unstable, or Debian and Ubuntu; or checking conffiles against their original md5sum might give you useful information about how often the default configuration is sufficient. Or you could analyse the information more thoroughly — eg, seeing if there are any unexpected correlations between people who use particular combinations of packages, or doing a netflix-like “I see you use package foo, many other people who use it also use bar, maybe that might be worth investigating.” (I once tried to do that sort of analysis on the popcon data, but all I ended up with was a pretty animated gif, that apparently crashed some people’s browsers… Red dots were systems, blue dots packages, with a package being installed on a system implying attraction, and uninstalled implying repulsion)
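To make the “people who use foo also use bar” idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the survey data, the package names and the function name `also_installed` are all made up, and a plain co-occurrence count stands in for whatever analysis a real system would use.

```python
from collections import Counter

# Hypothetical survey data: one set of installed packages per reporting system.
SYSTEMS = [
    {"apache2", "php5", "mysql-server"},
    {"apache2", "php5"},
    {"exim4", "mutt"},
    {"apache2", "exim4"},
]


def also_installed(systems, package, top=3):
    """Rank the packages that most often share a system with `package`."""
    counts = Counter()
    for installed in systems:
        if package in installed:
            # Count every other package found alongside the one we asked about.
            counts.update(p for p in installed if p != package)
    # Highest count first; ties are broken by name so the result is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


if __name__ == "__main__":
    for name, count in also_installed(SYSTEMS, "apache2"):
        print(f"{name}: also on {count} system(s) that have apache2")
```

Raw counts like these favour packages that are installed nearly everywhere, so a real analysis would probably want to normalise by each package's overall popularity.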
You could also gather completely different data — like information about the hardware, or things like the default language or timezone, or potentially even things from logs. That would let you answer questions like “do many people run Debian on HP hardware?” or “which IBM hardware is popular with Linux users?” which might influence future hardware development or purchases; or tell you surprising things about where Linux is actually being used; or give you some feedback on questions like “is the OOM killer a common occurrence?” or “is IPv6 adoption actually going anywhere?”
As well as just gathering data from otherwise passive users, you could also use the data collection as an opportunity to make introductions between users — having established you’re running Debian and have a particular Intel graphics card, you could be automatically given the address of a section of the Debian wiki that’s dedicated to issues with that card; with the idea being that you can see any helpful solutions other users have already come up with to problems you’re having, or leave your own tips for future users. The same principle potentially applies to other sorts of data: if you have an old version of wordpress installed, it might be reasonable to point you at some security alerts that apply to it, or having determined you’re running Debian on some HP server, you might get directed at some updated management software that enables some extra features.
Another interesting improvement I think you could make is to provide ways users can aggregate and anonymise their own data. Even in the age of social networks and ubiquitous transparency, managing privacy of this sort of data is important: it would be spectacularly bad to provide a website that told people exactly which machines were vulnerable to which security exploit, but that’s exactly what a list of which machines have which versions of which packages installed would provide. The popularity-contest software goes to some lengths to avoid that, by identifying data against a randomly generated UUID rather than an internet address, email or username; by not storing detailed information about package versions; and by restricting who has the ability to run any detailed analysis on the data. But you can go further than that by aggregating and filtering the data even before it makes its way to a centralised server — eg, rather than have each individual machine on a network report its statistics to Debian, you could have the information sent to a proxy server that aggregates all the packages into a single report (30 computers, 10 of which have apache, 15 of which have exim, …), thus removing certain correlations (do all the machines running apache also run exim? or do none of them?), and potentially filtering things like the UUID (which might reveal something about the random number generator, particularly given Debian’s recent issue with randomness…), the popcon version (which gives an indication what version of Debian is in use, and in some cases how recently it’s been updated) or the timestamp (that may give away that the machine has been down). And if you’re running a network that’s intended to be somewhat locked down, it might be more reasonable to have computers reporting to a machine that you control, rather than one just out there in the wild.
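The proxy's aggregation step can be sketched in a few lines of Python. This is a rough illustration, not a design: the report layout (a dict with `uuid`, `timestamp` and `packages` keys) and the function name `aggregate_reports` are assumptions invented for the example and are not popcon's actual format.

```python
from collections import Counter


def aggregate_reports(reports):
    """Collapse per-machine reports into one summary for the whole network.

    Each report is assumed (hypothetically) to be a dict with "uuid",
    "timestamp" and "packages" keys. Only the number of machines and the
    per-package totals are kept, so identifiers, timestamps and the
    question of which packages share a machine never leave the proxy.
    """
    counts = Counter()
    for report in reports:
        # A set guards against a machine listing the same package twice.
        counts.update(set(report["packages"]))
    return {"machines": len(reports), "packages": dict(counts)}


if __name__ == "__main__":
    reports = [
        {"uuid": "aaa", "timestamp": 1245628800, "packages": ["apache", "exim"]},
        {"uuid": "bbb", "timestamp": 1245632400, "packages": ["apache"]},
        {"uuid": "ccc", "timestamp": 1245636000, "packages": ["exim", "mutt"]},
    ]
    print(aggregate_reports(reports))
```

On a very small network the totals alone can still give a lot away (a count equal to the number of machines says every machine has that package), so a real proxy might also refuse to report below some minimum number of machines.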
So that, in very rough terms, is the spec for this project, which is currently going by the name “pageant” (ie, a popularity contest that takes itself a bit more seriously…) The technical goal is to provide a pageant client that people can run on their systems, which can report potentially arbitrary information to a central server and can receive and present relevant snippets of advice related to that information; a pageant proxy that can intermediate and filter pageant clients to provide a slightly higher level of anonymity/privacy; and a pageant server that can collect the data, provide relevant advice to clients, and analyse the data. I think it’s feasible to do an interesting job of that, that should go a little further than existing programs, and be usable by actual people, though I suspect the server side will have to be a bit beta-ish to be finished within a week or so.
The business goal, obviously, is to turn some of the hypothetical benefits touched on above into actual income, ideally without turning it into a vast NSA-like data hoarding corporate conspiracy. I figure there’s a few reasonable ways to approach that:
- First, I figure that providing the same information other systems currently do at no charge makes sense: so getting basic stats on how many Debian users have nickle installed, or Ubuntu users have network-manager, or Fedora users have a Synaptics touchpad should be free.
- Second, I figure providing further analysis for companies and researchers should probably be possible, and cost something: probably more depending on how complicated the analysis is. Possibly there could be an extra fee for the analysis to not be also made available to the public; that could be entertaining.
- Third, I figure that it probably should be possible for companies to at least provide advice to users of their hardware through the system, and that at least in some cases, that probably should be for a fee. I’m not sure if there’s a line in there somewhere between necessary advice (security updates?), helpful tips (here’s some non-free drivers for that hardware?), or outright advertising (buying our hard drives will give you 200% better performance!) that might mean “advice” should vary between free, paid and blocked. An approach might be to say distros’ advice is free, other people pay.
- Fourth, I think it would be interesting to allow users to optionally pay a fee to register their hardware. This could have a couple of benefits: it provides a low-maintenance way to discourage ballot stuffing — it’s not at all difficult to hack up popcon to pretend you have thousands of servers running your favourite package to try to bias the statistics, but it’s somewhat harder to come up with even a few dollars thousands of times; and possibly more interestingly, it provides an easy means to link a small payment for “using Linux” with the software that’s being used — so distributing 80%-90% of those fees to the authors of the software that’s actually being used might be an efficient way of helping support free software development.
Anyway, that’s the project! My notes have a few other things in them worth mentioning — there’s a couple of not entirely little complications in a few of the above ideas, for one — but this is already long enough, and it’s not like I can’t blog again later. Even though there’s a few similar projects around (popcon and smolt in particular) I’m planning on taking a NIH approach and starting from scratch, on the basis that current stuff is mostly pretty basic to reimplement, and getting an architecture I’m comfortable with is pretty important in making it appropriately generic. As always, helpful tips, questions and/or any general encouragement appreciated, either by email or the comment link…
Jason Parker-Burlingham — Ann has the setup shot for this
Nothing more complex than the macro lens on a tripod with a remote shutter release. Later I got a little more fancy and used a focus trap, but this is manually focused using a small twig as a guide. The camera was in shutter priority to try to freeze motion, but this was a mistake: shutter priority keeps the shutter speed constant by varying the aperture, which changes the depth of field and so affects focus. Later pictures with lower ISO, slower shutter and wider aperture worked just as well, but had less compliant subjects.
I could have gotten closer but that makes focusing even more chancy (there are lots of pictures of very blurry goldfinches in my collection now) and the birds seemed to not appreciate a camera closer than a few feet away.
Lessons learned: go full manual next time, and don't bother with the focus trap; I think it just made trying for a shot more difficult. An actual cable release instead of an IR remote makes taking a shot less chancy. An off-camera flash may be useful. A sandbag and string to keep the feeder still could be useful too.
Interestingly only a few male finches ever came anywhere near the setup: a shame since they're a vastly more colorful bird.
Adrian Sutton — I Love Parser Generators, I Hate Parser Generators
I was reminded on the weekend of how much I like working with parser generators – they’re just so pure and clean. You really feel like you’re working with a grammar and all those CS lectures come flooding back. Writing code to parse the same content by hand just never has that feel. Plus they create incredibly accurate parsers in very little time at all.
I was also reminded of how much I hate parser generators. They generate very accurate code which is great when you have very accurate input. In the real world, it just means the parser craps out an awful lot on very minor syntax problems. So then you try to make the grammar more flexible to accept that input and the generator just complains that the language is no longer LL(1).
Off we go into the deep dark depths of those CS lectures. Now all of a sudden you find yourself with pen and paper out drawing states and the paths between them. Pretty soon you want to migrate to A3 paper and then on to butchers paper. Eventually you find yourself writing on the wall.
Real world content just isn’t sane. You can’t tokenize it first and then just use those tokens – characters in different places have all kinds of different meanings. You don’t really want to validate the content as you read it [1], you just want to do the most brain dead simple thing to get that content in and in a form that you can work with [2].
I run into this every time I work with parser generators and wind up spending so much time making the grammar fully tolerant that it winds up being easier to just write the entire thing by hand. I just can’t help but think that there should be a better way though.
1 – beyond making sure you’re avoiding buffer overflows and that the resulting model isn’t dangerous etc but often those kind of checks are best done in the code that actually does the work (i.e. assume every value is user supplied rather than assuming that the content is all nice and safe).
2 – this of course is situation dependent. I happened to be parsing CSS where tolerance is the key to success. Parsing configuration files on the other hand should be strict and fail fast.
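As a small illustration of the “brain dead simple” hand-written approach for tolerant input, here is a minimal Python sketch that reads the declarations inside a CSS rule and skips anything it cannot make sense of instead of failing. It is a deliberately simplified assumption-laden example (the function name `parse_declarations` is made up, and splitting on `;` ignores semicolons inside strings or `url(...)`), not a description of the parser discussed above.

```python
def parse_declarations(block):
    """Parse the body of a CSS rule, skipping malformed declarations."""
    declarations = {}
    for chunk in block.split(";"):
        # Split on the first colon only, so values may contain colons.
        name, colon, value = chunk.partition(":")
        name = name.strip().lower()
        value = value.strip()
        if not colon or not name or not value:
            # Tolerate junk: drop this declaration and carry on with the rest.
            continue
        declarations[name] = value
    return declarations


if __name__ == "__main__":
    print(parse_declarations("color: red; ; font-weight bold; margin:0"))
```

A strict, fail-fast parser for configuration files would raise an error at the `continue` instead.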
June 21, 2009
Ben Fowler — Daily Mail features JET and MAST!
'This is save-the-world science. This is the energy holy grail'
This Daily Mail article is pretty awesome, with some great photos (far better than mine), and the cool JET video. If we want fusion to be taken seriously as a potential energy source, we need more of this!
Ben Fowler — Visit to UKAEA Culham: my photos
I've posted all the pictures I took on my Culham trip, up on my SmugMug page. Check them out here.
June 18, 2009
Ben Fowler — Visit to UKAEA Culham: is public outreach a waste of time?
When I was 17 and working as a printer in Quickfast Photos in Surfers Paradise, I had the opportunity to catch a bit of the IndyCarnival. My boss, Andy, besides being a raging pederast, was extremely cheap and wouldn't buy tickets the legal way, so he got transit passes usually reserved for the locals, and spent the entire carnival watching the race while I held the fort at the shop. However, I
Ben Fowler — Visit to UKAEA Culham
Yesterday, I realized an ambition I've been harbouring for quite a few years -- to visit the world's biggest tokamak fusion reactor experiment at Culham, near Oxford.
When I was a kid, I remember marvelling at pictures of the Joint European Torus, and since then, I've closely followed the construction and commissioning of the UK's own fusion experiment, MAST. Mankind's future will be made or
Sarah Smith — The Answer Man

- stockdraw - releasing strategic reserves
- surge production - increase of supply from internal oil sources
- demand restraint - reducing oil consumption
- upfitting vehicles to CNG, and other alternative energy sources
- using sugar cane and other sustainable ethanol programs
- stopping corn ethanol production - it uses more petroleum products than it produces
- hydrogen and electric vehicles
- moving off oil fired heating
A 2008 report from the National Research Council estimated it would take $200 million from government and industry over the next 15 years to commercialize hydrogen fuel cell vehicles to the point they could be competitive with gas vehicles. (Reported by Reuters.)
June 17, 2009
Jason Parker-Burlingham — Summer's first cookout
On Saturday night we were having trouble deciding where to eat, so we settled on buying hot dogs and the makings of s'mores and lighting a small fire in the driveway. I got to light the fire and Ann was exceedingly patient while I did (we both like lighting fires, what's not to like?). After a time I decided it might be fun to have some pictures of the flames and Ann suggested a long shutter speed would be good. Henry helpfully ran a long stick over the length of a burning log to throw up some sparks.
Sunset at nearly 9pm: check. Hot dogs: check! 70—80F heat: check! I do so enjoy summer—it's part of the reason I adored the Bay Area, where summer is a season 6 months long.
Andrae Muys — Looking for JSON, ReST, (and in-memory RDF) frameworks
Currently writing a number of small web-services to do various informatics tasks (more detailed post to come). Fortunately I'm not the one having to deal with 3rd-party SOAP APIs! Still, I do need to do various XML and JSON parsing, and not having addressed the latter before, I've gone looking for libraries.
Currently I am about to start using Jackson, but was wondering if anyone had any warnings, advice, or recommended alternatives? In the course of looking at what was out there I have also come across Restlet, a ReST framework that seems like it is well worth the time to figure out and deploy, so I will probably be doing that soon as well; any warnings or advice on this will be welcome.
One of the nice things about Restlet is its support for RDF. Granted it doesn't support querying, and the terminology in the interface is a bit confused, but it does use its native Resource interface for URIRefs, so it should integrate well. OTOH, if it does prove useful as a ReST framework, I can see myself writing a quick Sesame or Mulgara extension, as there is only so much you can do with RDF before you need a query and/or data-binding interface.
David Starkoff — The inventive step
When Justice French, then of the Federal Court of Australia, was nominated to be Chief Justice of Australia, the quality of his extra-judicial speeches was noted. It is a rare judicial officer who can segue from declarations to Homer Simpson, or use Judge Dredd to illustrate arguments about judicial activism.
As a highlight from his recent speeches, I thought a part of his 4 April speech to the Licensing Executives Society of Australia and New Zealand bore special mention.
Despite these adventurous extensions of intellectual property law most of the words it uses remain boring. … Terms such as “inventive step”, which sounds like a dance created by a nerd … are not terms to stand your hairs on end.



