July 04, 2008

Adrian Sutton: Backups Of The Cloud

Mike Gunderloy provides an overview of the terms and conditions of three of the popular online office applications and asks who owns your documents. The more important point that comes out of it, though, is: who is backing up your documents? When people move data into “the cloud” they often forget that ultimately backups are their problem, and they should only trust themselves to do it.

One thing that’s clearly missing is any sort of backup guarantee. While you may feel more secure storing your documents on Google’s or Zoho’s or Adobe’s servers than your own, that security is not something that you’re promised. Any of the three can lose your documents or terminate your ability to get to them at any time for pretty much any reason, and you’re out of luck.

That’s precisely why I ensure that any data in a hosted solution of any kind is also backed up locally. I’ve already been through the experience of hosts going broke, or just plain stuffing up their backups, and having to restore from my local copy; in time, everyone will.

Vendors providing these services need to start thinking this through better too: how are your users going to get a full backup of all their data? In most systems it means going in and manually exporting each individual document, and in many systems it’s just not possible at all.

July 04, 2008 11:36 AM

July 03, 2008

Arjen Lentz: FORCE INDEX

SELECT ... FROM
tblProduct as prd FORCE INDEX(vProductName,iCategoryId,eProductType,vProductImage,fPrice,eProductStatus,dProductDateAdded)
LEFT OUTER JOIN (SELECT bidjoin.* FROM tblProduct as prdjoin, tblBid as bidjoin WHERE bidjoin.iProductId = prdjoin.iProductId) bid ON prd.iProductId = bid.iProductId
, tblCategory as cat FORCE INDEX(vCategoryName,eCategoryStatus,dCategoryDateAdded) 
, tblAuctionProduct as aucprd FORCE INDEX(iRequiredBids,iProductId,eAuctionType,iFreeBidLevel,fAdminFee,dDateClosed)
, tblPurchaseProduct as purprd FORCE INDEX(dProductCloseDate,iProductId,iTotalInventory,dLastUpdated)
I know, unreadable. But the main issue today is the FORCE INDEX. This is just one query (real, used with permission; I just adjusted some identifiers) from an app that is literally full of queries using FORCE INDEX. So what does FORCE INDEX do? It forces the MySQL optimiser to use the specified index (or to choose among the specified ones, if several are listed), even if it reckons that's not the best choice. Likewise there's an IGNORE INDEX modifier that denies the optimiser the choice of using a specified index in a particular query.
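The MySQL hints can't be demonstrated here without a MySQL server. As a rough, hedged illustration, the sketch below uses SQLite through Python's standard sqlite3 module. SQLite's INDEXED BY and NOT INDEXED clauses are loose analogues of FORCE INDEX and IGNORE INDEX; the table and index names are made up.

Two differences are worth knowing:
- SQLite's INDEXED BY makes the statement fail if the named index can't be used. MySQL's FORCE INDEX still falls back to a table scan when none of the named indexes is usable.
- NOT INDEXED rules out every index. IGNORE INDEX rules out only the named ones.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail text is the last column of each plan row.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, status TEXT, price REAL)")
conn.execute("CREATE INDEX idx_status ON product(status)")

# INDEXED BY: roughly FORCE INDEX, except SQLite raises an error
# if it cannot use the named index.
forced = query_plan(conn, "SELECT * FROM product INDEXED BY idx_status WHERE status = 'active'")

# NOT INDEXED: no index at all may be used for this table.
ignored = query_plan(conn, "SELECT * FROM product NOT INDEXED WHERE status = 'active'")

print(forced)   # a SEARCH step that uses idx_status
print(ignored)  # a SCAN step over product
```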

I generally hold that these modifiers should only be used for testing things and tracking down optimiser issues (not that common these days, but 3.23 had plenty that 4.0 fixed). This is because when you hardcode them into an app, perhaps fixing a real problem you see *now*, your data will still change over time, so the optimiser needs the freedom to change its choices too. It might be OK for a week, a month, or even a year, but at some point it's going to cost you.
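To make the "data changes over time" argument concrete, here is a toy cost model in Python. The constants and the cost formula are invented for illustration and are not MySQL's actual optimiser costs. The only point is that a cost-based choice can flip as the share of matching rows grows, while a hardcoded hint keeps returning the same plan.

```python
# Toy cost model: the constants and formulas are invented for illustration
# and are NOT MySQL's real optimiser costs.
RANDOM_READ_COST = 4.0  # cost per row fetched through an index
SEQ_READ_COST = 1.0     # cost per row read by a full table scan


def best_plan(total_rows, matching_rows):
    """Cost-based choice: re-evaluated against the current statistics."""
    index_cost = matching_rows * RANDOM_READ_COST
    scan_cost = total_rows * SEQ_READ_COST
    return "index" if index_cost < scan_cost else "scan"


def forced_plan(total_rows, matching_rows):
    """A hardcoded hint: ignores the statistics entirely."""
    return "index"


# The same query as the table grows and the filtered value becomes common.
history = [(10_000, 100), (100_000, 20_000), (1_000_000, 400_000)]
for total, matching in history:
    print(f"{total:>9} rows, {matching:>7} match: "
          f"optimiser={best_plan(total, matching)}, forced={forced_plan(total, matching)}")
```

With these made-up constants the index wins only while fewer than a quarter of the rows match. The forced plan never notices when that stops being true.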

The above query takes about 5 times as long as the same query without the FORCE modifiers. Considering the app is chockers with them, I'm guessing it might have been coded by someone who didn't know that MySQL actually has an optimiser at all. Who can say... Anyway, it serves here as an extreme example.

The take-away message is: use these modifiers with care, and be very hesitant about ever adding them to production code. There's almost always a better way to resolve whatever problem you might be observing.

July 03, 2008 11:36 AM

Jason Parker-Burlingham: Initial Pittsburgh thoughts

I came to Pittsburgh via some tunnel or other, possibly the Fort Pitt Tunnel, meaning that I came upon the downtown area all at once; it was very reminiscent of driving the small freeway that runs along the Brisbane River.

Afterwards, I had a time driving around Shadyside, a nice suburb that seems to consist entirely of apartment complexes, not terribly nice but not awful either. A fair bit of foot traffic, and lights that don't allow right-turn-on-red (and more than a few left turns that are pretty dodgy given the oncoming traffic).

I arrived somewhat unexpectedly; the last two days of my drive east were devoted to just chewing up miles as quickly as possible, so I drove to Shadyside hoping that I could score a room at the hotel I'm staying in next week while I try to arrange a lease. Turns out they were full, and the swanky downtown hotel was ridiculously expensive (I'd surmise that they were just trying to get rid of the unshaven and poorly shod gent in their lobby, except that the clerk went above and beyond her duties by finding me a hotel with a room and writing out directions). The valets in the lobby agreed with me about the ridiculous prices upstairs.

I'm staying kind of a ways outside where I want to be. Tomorrow I'll be opening Craigslist pages of houses for rent and recording even more realtors to call. If I can find the perfect place tomorrow, I'll certainly put a deposit down and sign a lease; the sooner I can have my stuff unpacked the better (especially since I'll have to arrange storage if I have nowhere to unpack before this coming Monday).

Lovely city. Really very nice. But in terms of driving, it's hell on wheels, especially by the water. I suspect there's a gravity anomaly of some kind lurking nearby.

July 03, 2008 02:47 AM

July 02, 2008

James McPherson: And if you squint.....



All the bits are apparently normal and the risk factors for Trisomies 21, 13 and 18 are really, really low.

July 02, 2008 09:51 AM

July 01, 2008

Benjamin Carlyle: REST's GET pattern

I have started to mentally collect a few common REST patterns of late. I have hacked up a little alternate feed where I plan to place some of these patterns.

The very phrase "pattern" might be a little strong for some of what I have in mind. Often we are talking about very simple structures of client and server, or more of an organisational construct to support a REST architecture. However, I will be attempting to document important trade-offs in any given pattern.

The first pattern I have attempted to document is GET. I do not cover all possible permutations of GET (I see some of these permutations as separate patterns). Instead, I have focused on the usage where a client wants to retrieve a defined, small-ish set of data from its server.

Once all of the possible failure cases and the possibility of differently-aged software components in the architecture are considered, GET is actually a fairly subtle set of optimisations. I have attempted to show the fundamental communication that is going on between client and server, then map it onto specific HTTP protocol elements.
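One reason failure cases around GET stay manageable is that HTTP defines GET as safe and idempotent. A client that isn't sure whether a request got through can therefore simply send it again.

Below is a minimal Python sketch of that idea. It assumes a hypothetical fetch callable that performs the GET and raises ConnectionError or TimeoutError on a transport failure. No real HTTP library is involved, and this is an illustration, not the pattern as documented in the post.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def get_with_retry(fetch: Callable[[], T], attempts: int = 3) -> T:
    """Call fetch() (a hypothetical function that performs an HTTP GET),
    retrying on transport failures.

    Blind retries are acceptable only because GET is safe and idempotent:
    repeating it should not change anything on the server.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return fetch()
        except (ConnectionError, TimeoutError) as err:
            last_error = err  # remember the failure and try again
    assert last_error is not None
    raise last_error


# Usage: a stand-in fetch that fails twice before succeeding.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("connection reset")
    return "200 OK"


result = get_with_retry(flaky_fetch)
print(result, "after", calls["count"], "attempts")
```

The same blind retry would not be safe for a non-idempotent method such as POST.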

I have been trying to follow the style of the Gang of Four's Design Patterns. However, I would generally expect most of the patterns I describe to be implemented in protocol libraries rather than requiring application-level code to participate in significant ways. Comments are welcome.

Benjamin

July 01, 2008 09:47 PM

David Starkoff: Counsel's time

Justice Hayne, saying what many litigation solicitors yearn to, in Amoonguna Community Inc v Northern Territory [2008] HCATrans 254:

MR P V SLATTERY QC: We are content to be ordered to file [an amended statement of claim] within seven days, your Honour. Thank you.

HIS HONOUR: I am always delighted when counsel say that things will be done within seven days. It shows commendable application, but I want it to be a real seven not an almost seven. Is seven realistic, Mr Slattery? If it is, by all means. I am delighted to say do it within seven, but I want it a real seven not—forgive me if I say this—counsel’s seven days.

July 01, 2008 10:16 AM

Arjen Lentz: OSS-based appliances: Cybersource/datasafe

I'm pretty happy with my 1TB Apple Time Capsule. Bought it while at the MySQL Conf in April, and it does the right thing for my situation.

Con Zymaris and his crowd at Cybersource in Melbourne made something like this ages ago, aimed at small businesses: Cybersource/DATASAFE. That's a pretty neat solution, and an excellent example of how an OSS-based solution can be deployed in a business, regardless of what other technologies might already exist on the premises. The box will work just fine in a Windows environment.

Always focus on the solution (and what practical needs it solves for the client), not the technology (or the philosophy) - with a happy customer, you'll get plenty of opportunity (over time) to discuss what OSS is really about, and you're likely to find a very willing ear at that point.

July 01, 2008 12:47 AM

June 30, 2008

Arjen Lentz: Training schedule readability

A tricky meta-question. I've been experimenting with ways to display an overview of training days (or days grouped by audience/topics, like Developers or DBAs), locations, and schedule.

Product Audience | BNE SYD CBR MEL ADL AKL
MySQL Dev | Sep Aug Sep Aug
MySQL DBA | Sep Jul Aug Aug Aug Jul
MySQL HA | Aug
PostgreSQL DBA | Jul Jul
PHP QA Dev | Aug Aug
With the number of locations I have now, it's easy to run out of space horizontally. Vertically, I can't make the list too long either, otherwise the nice simple overview is lost. It's a difficult problem and I haven't fully resolved it. Displaying the locations vertically and the topics horizontally doesn't seem to work either. Different pages on the Open Query web site now display things differently depending on context, and I hope to learn more from that experiment; the above is one example. It's compact, but I'm not sure it conveys sufficient information. Within each group, people are able to sign up for individual course days. It's a work in progress, I suppose. Suggestions welcome!

June 30, 2008 11:36 PM

Adrian Sutton: CMS and Mac

Some time ago now, James Robertson blogged about the poor state of Mac support in CMS products. Quite rightly, he identified the WYSIWYG editor as the most common problem area, which of course got my attention. It’s now over six years since Ephox switched from ActiveX to Java to get support for Mac, and it’s probably the smartest thing we’ve ever done. Not because we have vast numbers of Mac users, but because it only takes one Mac user to sink a deal.

It’s taken me so long to post because just talking about your Mac support has no credibility, so I wanted to show copy and paste on Mac, the precise task that James found so many problems with. So I present for your entertainment: copy and paste from Word on a Mac, the 30-second demo, complete with cheesy music. Naturally it’s in QuickTime, with iPhone-optimized versions built in.


(Video: Copy Paste On Mac, /copyPasteOnMac/Copy%20Paste%20On%20Mac.mov)

I had wanted to go over the top and do it all in the style of an old silent movie but there’s only so much time I can justify on this…

June 30, 2008 01:22 PM

Adrian Sutton: Just Take The Money!

It’s really amazing how many web sites have broken shopping carts in one form or another. It’s the ultimate form of stealing defeat from the jaws of victory. The favorite is always shopping carts that time out. Nothing like throwing your customers out of the store after they’ve decided to purchase from you.

British Airways seem to have perfected the art of displaying an error page just when you were pulling out your credit card. Bonus points for reporting that their systems aren’t responding, as if that actually means something to the user who just got a response from their systems: the error page.

Recently, though, I’ve come across a few more creative ways to not make money. It turns out that Google Checkout, at least as implemented for Google Site Search, is just a tad buggy: instead of taking you to a page to fill in your details, it refreshes the current page. So you select your product, click checkout and are asked to select your product again. Pure genius!

Oh, and reporting it to Google won’t help: they don’t answer emails, even when you can find the Contact Us form.

June 30, 2008 10:37 AM

Arjen Lentz: Finding useless indexes

I'll say beforehand that the following is not very clean, for a number of reasons; I'll discuss some issues after the query. The query returns all non-unique indexes in a database whose cardinality is lower than about a third (roughly 30%) of the table's rows, making it unlikely that the server will ever use that index.

SELECT s.table_name,
       concat(s.index_name,'(',group_concat(s.column_name order by s.seq_in_index),')') as idx,
       GROUP_CONCAT(s.cardinality ORDER BY s.seq_in_index) AS card,
       t.table_rows
  FROM information_schema.tables t
  JOIN information_schema.statistics s USING (table_schema,table_name)
 WHERE t.table_schema='dbname'
   AND t.table_rows > 1000
   AND s.non_unique
 GROUP BY s.table_name,s.index_name
HAVING (card + 0) < (t.table_rows / 3);
Let's discuss...
The number of rows in a table used here will be accurate for MyISAM; for InnoDB it's essentially a rough guess.
Cardinality in this context is the number of distinct values in a column. This statistic needs to be updated using ANALYZE TABLE, otherwise it might either not be available or just outdated.
Since there can be composite indexes (an index on two or more columns, such as (a,b), rather than just one column), the cardinality is per column, and the output shows it comma-separated; of course, for anything but the first column, it's relative to the previous columns. The HAVING clause ends up using just the cardinality of the first column (card + 0 only takes the leading number of the comma-separated string), which statistically is not really correct. If someone has a bright idea on how to grab or calculate a sensible cardinality figure for composite indexes, please do comment.
And yes, some selected columns (like t.table_rows) fall outside the grouping, however they remain the same so that's "ok" for this quick hack and MySQL allows this unless you have sql_mode=ONLY_FULL_GROUP_BY enabled.
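The query's filtering logic can be reproduced in a rough Python re-implementation, which also shows how card + 0 ends up comparing only the first column's cardinality. It works over a hypothetical, hand-written list of statistics rows; the tuple layout, table names and numbers are all invented. This only mirrors the query's WHERE and HAVING filters and does not talk to MySQL.

```python
import re
from collections import defaultdict


def leading_number(text):
    """Approximate MySQL's implicit string-to-number cast in (card + 0):
    only the leading numeric prefix counts, so "12,400" becomes 12."""
    match = re.match(r"\s*[-+]?\d+(?:\.\d+)?", text)
    return float(match.group()) if match else 0.0


def low_cardinality_indexes(stats, table_rows, min_rows=1000):
    """stats holds hypothetical (table, index, seq_in_index, column, cardinality,
    non_unique) tuples standing in for information_schema.statistics rows."""
    grouped = defaultdict(list)
    for table, index, seq, column, card, non_unique in stats:
        # WHERE t.table_rows > 1000 AND s.non_unique
        if non_unique and table_rows[table] > min_rows:
            grouped[(table, index)].append((seq, column, card))
    results = []
    for (table, index), cols in sorted(grouped.items()):
        cols.sort()  # ORDER BY s.seq_in_index
        idx = f"{index}({','.join(col for _, col, _ in cols)})"
        card = ",".join(str(c) for _, _, c in cols)  # GROUP_CONCAT(s.cardinality ...)
        if leading_number(card) < table_rows[table] / 3:  # HAVING (card + 0) < rows / 3
            results.append((table, idx, card, table_rows[table]))
    return results


table_rows = {"product": 50_000, "tag": 200}
stats = [
    ("product", "idx_status", 1, "status", 3, 1),
    ("product", "idx_cat_price", 1, "category_id", 40, 1),
    ("product", "idx_cat_price", 2, "price", 30_000, 1),
    ("product", "idx_name", 1, "name", 48_000, 1),
    ("tag", "idx_label", 1, "label", 2, 1),  # skipped: table too small
]
for row in low_cardinality_indexes(stats, table_rows):
    print(row)
```

With this made-up data, idx_cat_price is flagged on its first column alone (40), even though the full composite reaches a cardinality of 30000. This is the composite-index weakness described above.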

The fact that the calculated cardinality figure (for composite indexes) is dodgy may not actually matter for the purpose of this query. The point of the query is that, generally, the server will not use an index if it needs to look at more than 30% of the rows anyway. That's not *actually* how it works inside the server, as Peter Zaitsev can explain in great detail ;-) but as an easy rule of thumb it's close enough to reality. It's also not the whole story: if all the columns referenced in a query are in the index, you have a so-called covering index, and the server knows that scanning an index is quicker than scanning the table. In that case an index might still make sense, but this query can't know about that.
Also, for very small tables, the server will prefer a table scan anyway; to weed out the worst of this, only tables with >1000 rows are shown in the output. Again, a rather crude filter with plenty of flaws. But again, it may suffice for this purpose.
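The two rules of thumb above can be written down as a small Python function:
- skip tiny tables;
- otherwise, an index is only worthwhile if it avoids looking at more than about 30% of the rows, unless it's a covering index.

This is a hypothetical simplification for illustration. The function name is made up, the 1000-row and 30% thresholds come from the text, and it is not how the server's optimiser actually decides.

```python
def index_worth_using(rows_examined, table_rows, covering=False,
                      small_table=1000, threshold=0.30):
    """Rough rule of thumb from the text, not the server's real cost model."""
    if table_rows <= small_table:
        return False  # tiny table: a full scan is cheap anyway
    if covering:
        return True   # every referenced column is in the index: scan the index, not the table
    return rows_examined / table_rows <= threshold


print(index_worth_using(500, 100_000))                    # True: selective lookup
print(index_worth_using(40_000, 100_000))                 # False: 40% of the rows
print(index_worth_using(40_000, 100_000, covering=True))  # True: covering index
print(index_worth_using(10, 800))                         # False: small table
```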

Normally, what you'd do is pick out the obvious indexes on yes/no, male/female, active/deleted style columns. They're easy to spot. But if you run the above query, much more may show up. Do unused indexes hurt? Well, they slow down INSERT/UPDATE/DELETE and ALTER TABLE statements. So this query can be useful as a rough filter, then human brains can be used to decide which indexes to keep and which ones to get rid of.

June 30, 2008 02:47 AM

June 28, 2008

David Starkoff: WWJD?

Truly awesome: Judge Jesus (sitting on the International Tribunal for the Law of the Sea).

WWJD, indeed.

(Via Opinio Juris.)

June 28, 2008 03:12 AM