New BlackBerry

July 24th, 2008

I bought a new cell phone on Friday, and went with the new-ish BlackBerry Curve 8330 from Telus.

I like it a lot so far. (I’m typing on it right now.). But I’m unhappy with Telus.

First, as others have reported, they’ve done something to hobble the GPS. GPS works in Telus Navigator or when Navigator is running in the background, but not in Google Maps or Blackberry Maps. Navigator is an extra $10/month value added service. After my free trial of navigator runs out, I will have to call up Telus and bully them into giving it to me for free, and that’s no fun.

Second, Telus had a data outage last Saturday which seemed to affect only 3rd party apps like MidpSSH. When I phoned them three times to ask about it, nobody knew anything. It magically fixed itself Sunday morning. But since ssh is the key reason I have a full-keyboard handheld, this is almost worth returning the phone if it happens again.

I like the overall experience though.

Art of War - Moral Law

July 20th, 2008

At the very beginning of The Art of War, Sun Tsu writes:

The MORAL LAW causes the people to be in complete accord with their ruler, so that they will follow him regardless of their lives, undismayed by any danger.

This is something that the US lacks at present. Even without ther 2000 election mess in Florida, a substantial fraction of the American people would have been unwilling to follow George W Bush to war. It’s been going on for decades, the division of America and the dissolution of our common goals and ideals.

I’m also reading Churchill’s history of the second world war, and I’m struck by how similarly divided Britain was in the pacifist period of the 20s and 30s was.

Good news? Maybe. The allies won the war, after all. But divided Britain came together under a wise respected statesman, and had the help of a wealthy, productive and nearly invulnerable ally in the US. Today we seem short of both.

Sorry, you actually need COM to get WMI data

July 19th, 2008

If you want to run on Windows 2000, you can’t use wmic.exe. That’s too bad, because my solution posted the other day relies on wmic. And I have to support W2K.

djbdns and DHCP server

July 19th, 2008

There is a script for extracting DNS data from dhcpd.leases that I’ve been using for a few years.  It generates a tinydns data file which can be merged with the static file.

But I am not content with only one DHCP server.  I have implemented a backup/failover server.  Now I have twice the problems!

The ISC DHCP server only records the hostname in the dhcpd.leases file if it granted the lease. Although most leases will be written by the primary server, it would be a mistake to assume that the failover peer will never write a lease.

So the solution appears to be to run a dhcp-serving copy of axfrdns on each machine, and then pull that data together, collate it, and serve a single unified copy.
A picture might help, but I’m on a blackberry right now and can’t draw one.

I hope to implement this next week and report how it came out.

DHCP and djbdns

July 17th, 2008

DHCP assigns IPs to hosts; DNS reports IPs by hostname. The popular dhcp server is even written by the folks who brought you BIND.

But in our shop, we run djbdns.

I am not a full convert to the djb way. (We run postfix, for example.). I have never met the man in person. I would probably find him infuriating. But I find his claims about BIND and DNS convincing. And I’ve been using his software for the last eight years without a major glitch.

I selected djbdns when I was setting up DNS for my first real website. I had bought the DNS and BIND O’Reilly book and was puzzling through the man pages and just feeling stupider and stupider the more time I spent on it. I figured there must be a better way, so I looked around and found djbdns. The setup instructions were clear and simple. I followed them and everything worked. I felt smart and competent.

Users like software that makes them feel good. There’s a lesson for us all there, I suspect.

Anyway, one of the quirks of djbdns is it doesn’t honor UPDATE requests. Unsecure, you see. Which makes it hard to do DHCP….

Reporting Your Own CPU Time on Windows

July 12th, 2008

Unix has the ‘time’ command to report the running time of a process. Actually there’s both the time command, /bin/time or /usr/bin/time, provided by time-1.7 on my CentOS server, and the bash builtin command time. They have slightly different output formats but they both report the run time (aka wall clock time) and CPU time in both user mode and kernel mode:


[smikes@arthur ~]$ time ls >/dev/null

real    0m0.055s
user    0m0.000s
sys     0m0.004s
[smikes@arthur ~]$ /usr/bin/time ls >/dev/null
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+253minor)pagefaults 0swaps
[smikes@arthur ~]$

Windows? No native equivalent. If you install Cygwin (which you should) then you get /usr/bin/time and bash-time as above. That’s handy for you, but unless you can guarantee your users will have Cygwin installed too, not something you can rely on.

Ages ago I had put in some trivial timing code into $CLIENT’s main application. It was just based on recording time() at program start and successful exit and reporting the run time in hours, minutes, and seconds from that. Some time later while my brain was thoroughly disengaged, I added another counter based on clock_t and, to distinguish it from the first one, I reported that as “CPU time”.

Now it’s several years later and I get the following exceedingly polite bug report:

Consider these two successive builds of ___, one without the error files and one with. The error files are significant, but not that significant.

ST:Total CPU Time: 1:39:12.42
ST:Total Wall Time: 1:39:14.

Second run, when I got the computer much busier doing test builds.

ST:Total CPU Time: 2:16:23.08
ST:Total Wall Time: 2:16:25.

The reported CPU time should not be significantly higher than the first run, but it is suspiciously close to 100% of the Wall Time. As I used a significant amount of CPU time on other processes, I know the reported CPU tim [sic] cannot be correct.

Busted! I must have been thinking that clock_t was magically going to give me a CPU time number since I associate clock_t with timing loops. But over the entire run time of the application, of course it will record all the time, both execution time and wait time.

After poking around for a while I rediscovered WMI, the Windows Management and Instrumentation service. Among other things it keeps per-process counters for kernel mode and user mode execution time. It comes with a wealth of baroque interfaces, including COM (of course), an interactive/batch command-line client (wmic.exe), and others. I didn’t want to add any more COM to this C application, so decided to try the command-line client first.

Here’s my first spike solution:


#include <stdio.h>
#include <process.h>

int main()
{
  FILE * pf = NULL;

  pf = _popen( "wmic process where Handle=5612 list FULL", "r" );

  while( !feof( pf ) )
  {
    char acLine[512];
    fgets( acLine, 512, pf );
    if( strncmp( acLine, “KernelModeTime”, 14 ) == 0 )
    {
       printf( “Kernel mode time: %.2f s\n”, atof(acLine+15)/1.0e6 );
    }
  }
}

Note that I’m just hardcoding a process ID in (that happened to be my MSDEV.exe process ID at that time, actually) and reading into a static fixed-size buffer. The wmic command line was developed by repeated hacking around with wmic /? and some judicious googling.

When I compiled this with MSVC (cl test.c => test.exe) and ran it under cygwin, I got a hard hang of my rxvt. But running it under cmd.exe worked fine, yielding a reasonable-enough looking number. Encouraged, I proceeded to integrate it into the application.

My first version was quite similar to the spike above, but with a longer buffer. This worked fine but gave nonsense results, reporting 222 seconds in user time when the total execution time was only 85 seconds. Apparently the WMI data is reported in tenths of microseconds instead of microseconds. The older documentation I initially found gave milliseconds as the unit for the script object interface, but more recent documentation says 100-ns units, as I found. Apparently the reasonable-seeming result I saw in my spike solution was ten times too high.

Then I decided to do away with the fgets(), static buffer, and strncmp and switched to reading into malloc’ed storage and using strtok(). That was all good and it ran beautifully in the debugger. But when I started the program from a cygwin command-line, I again got a hard hang, uninterruptible by Ctrl-C, leading to an rxvt crash.

It turns out that Windows _popen() and Cygwin just do not play nicely together. Furthermore, any moderately advanced Cygwin console (e.g., an rxvt) uses a pty device instead of the raw Windows console device, and ptys are confusing to the wmic command-line client. You can find the source material yourself on Google: http://www.google.com/search?q=wmic+cygwin and http://www.google.com/search?q=_popen+cygwin.

So I went to the age-old ugly solution of writing to a temporary file. My application runs in a dedicated directory and assumes it has exclusive control of the current directory, so I don’t protect against race conditions. But then I discovered that WMIC writes UTF-16 output which Windows had been recoding to my local code when I used _popen(). When dumped to a file, Windows left it alone and I became responsible for recoding it myself (or using it as-is). So I switched to the ‘wcs’ versions of the string functions I was using, and used our utility function read_lineW to read a line from the file into malloc’ed storage.

I think you need administrator access to read WMI performance counters (or maybe just to run WMIC?), so admin access is necessary. Since it depends on hardcoded localizable constants (UserModeTime etc.) it probably won’t run on a localized Windows where wmic.exe displays BenutzerModeTime or whatever. But it works for my purposes, and maybe for yours too:

Here’s the final code:



char * format_hmsd( double dSeconds )
{
	static char acBuffer[32] = {0};

	int nSeconds = (int)dSeconds;
	int nMinutes = nSeconds / 60;
	double dFrac = dSeconds - (nMinutes * 60);

	int nHours = nMinutes / 60;
	int nMinRemainder = nMinutes % 60;

	sprintf( acBuffer, “%2d:%02d:%05.2f”, nHours, nMinRemainder, dFrac );

	return acBuffer;
}

void wmi_time( double * pdKernel, double * pdUser )
{
	FILE * pf = NULL;
	char acFile[32] = {0};
	char acCmd[128] = {0};
	wchar_t * psLine = NULL;
	DWORD nID = GetCurrentProcessId();

	*pdKernel = *pdUser = 0.0;

	sprintf( acFile, “wmi%04d.out”, nID );
	sprintf( acCmd, “wmic process where handle=%d list FULL %s”, nID, acFile );
	system( acCmd );

	pf = fopen( acFile, “rb” );

	for( psLine = read_lineW( pf ); psLine != NULL; free( psLine ), psLine = read_lineW( pf ) )
	{
		wchar_t * psKey = wcstok( psLine, L”=” );
		wchar_t * psValue = wcstok( NULL, L”" );

		if( psKey != NULL )
		{
			if( wcscmp( psKey, L”KernelModeTime” ) == 0 )
				*pdKernel = wcstod( psValue, NULL ) / 1.0e7;
			else if( wcscmp( psKey, L”UserModeTime” ) == 0 )
				*pdUser = wcstod( psValue, NULL ) / 1.0e7;
		}

	}

	fclose( pf );
	remove( acFile );
}

/* Used like this:
  double dKernel, dUser;
  double dTotal = (clock() - clkStart+0.0)/CLOCKS_PER_SEC;
  wmi_time( &dKernel, &dUser );

  logmsg( “ST:Total Wall Time  : %s.\n”, format_hmsd( dSeconds ) );
  if( dUser != 0.0 || dKernel != 0.0 )
  {
	logmsg( “ST:Total I/O Time   : %s.\n”, format_hmsd( dSeconds - dUser - dKernel ) );
	logmsg( “ST:Total CPU Time   : %s.\n”, format_hmsd( dKernel+dUser ) );
	logmsg( “ST:Total Kernel Time: %s.\n”, format_hmsd( dKernel ) );
	logmsg( “ST:Total User Time  : %s.\n”, format_hmsd( dUser ) );
}
 */

Does this matrix make me look sheared?

July 9th, 2008

A customer recently had a problem with the way we were drawing his circles. They were coming out looking like weird wankel rotator things.

The problem turned out to be a spline thing: stress management and periodicity. But before sorting that out I went down this long rabbit hole about transformation matrices, and figuring out whether a given matrix contains a shear component.

You can go down the rabbit hole yourself if you want to: just pull out your linear algebra textbook, and spend some quality time at Wikipedia’s extensive math pages, but it all comes back to a very simple question: if I take two vectors that are perpendicular before I transform them, will they still be perpendicular after I transform them? Check for each pair of vectors in a basis set, and you’re done.

So I think the check for shear is (T * x) dot (T * y) != 0 => implies shear. Seem sensible enough?

Why does this matter? Tune in another time.

Why you shouldn’t get an accountant

July 6th, 2008

Often people who are starting self-employment are advised to get an accountant. This is usually overkill.

First let’s look at what an accountant is. An accountant is a professional or quasi-professional who is skilled at auditing, determining the costs of things, and assessing the value of different business strategies. Given enough data and time, an accountant can tell you how much it costs per day to keep a pallet of Cascade on a shelf in a warehouse, whether you’re selling enough Cascade to justify the inventory you’re keeping, and how much high you’d need to price off-brand dishwashing soap in order to get an acceptable return on capital. Accounting is not particularly dry or boring, in my very limited experience with it.

But since as a software solo practitioner, you’re not managing inventories, have very low captial requirements, and relatively simple expenses, you don’t have much need for the higher-level abilities of an accountant. All you really need is to keep track of your income and expenses, generate a couple of reports at your year-end, and file minimal tax forms with the state/provincial and federal governments. Hiring an accountant to do this is like hiring a Jamie Zawinski to ‘program HTML’.

All you really need is a bookkeeper. So let’s look at the typical transactions after a year of software self-employment: you’ll write a dozen salary checks, deposit checks from at most 2-3 clients per month, and reimburse yourself for some expenses. You will depreciate a computer and maybe your desk and chair. That’s at most 100 transactions per year. For many people it will be more like fifty. Even if you save up all your bookkeeping and do it at the end of the year, it shouldn’t take you more than an hour. (I just looked at my last year’s books and we did 111 transactions, including maintaining bank accounts in two currencies and two sets of monthly payroll checks.)

So: why should you do your own bookkeeping? I’m sure it feels nice when you hand over your mess of papers and receipts and check stubs to somebody else and get back a neatly printed tax return. It feels less nice when you reflect that you’re paying at least $1,000 for work that you could have done in an hour or two — and if you spread the work out over the year, as I recommend, you don’t even notice the minute here and minute there that you spend on it.

Incidentally, $1,000 for a simple business tax return might be obsolete. After the Enron/Andersen mess, accountants got slapped with a pile of new regulation which, as far as I can tell, makes them more liable for failing to detect and document your mistakes, evasions, etc. This has the valuable effect of making accountants more careful and meticulous when they prepare books for large publicly traded companies. It has the irritating effect of making accountants treat everybody as a potential Enron, and also inspiring accountants to carry professional liability insurance, which of course increases their rates.

The other thing you can get from an accountant — auditing — is not particularly useful to you as a solo practitioner. If you’re the only employee and the boss and the shareholders, you can only steal from yourself. (Arguably you could be trying to steal money from the government by cheating on your taxes, but it’s not really the accountant’s job to prevent that.)

Directions this will eventually go: how to set up your bookkeeping, some more discussion of the accountability (or lack thereof) including responding to some of the points raised by Ben in his comments about startups vs. solo practice.

Use STL lower_bound even though it’s slower

July 3rd, 2008

Summarizing the last few posts on STL lower_bound: instead of falling back to a naive algorithm when it’s not given a random-access iterator, STL’s lower_bound algorithm still performs binary search using bidirectional or forward-only iterators. This causes it to perform 2N list traversals, which is four times as many as the naive algorithm’s average case of N/2 list traversals. The STL algorithm guarantees that there will only be O( log N ) comparisons, but we’ve seen that’s not much of an advantage unless the comparison requires an extra memory access.

You might be thinking: why would anybody ever do lower_bound on a list anyway? Just maintain the data in a vector: that has O(1) amortized append, and the STL lower_bound wil run in O( log n ). Remember that the disadvantage of a vector is not merely the O(n) cost of inserting/deleting in the middle: adding an element to a vector invalidates all of its iterators because the addition can, in principle, realloc.

So given that, for lists, naive_lower_bound is clearer, shorter, and faster for large N even when comparisons are costly, when should you use it instead of STL lower_bound? Never. Always use STL lower_bound. The maintenance costs of naive_lower_bound are too high. If you put it in your code, even if you comment it clearly and provide a link to this discussion, in six months or two or ten years, some maintenance programmer is going to be looking at that line of code and wondering “What was wrong with STL lower_bound? Is it still true that naive_lower_bound is fast? Is it even correct?” And that maintenance programmer might well be you. It’s like the old adage: Nobody ever got fired for buying IBM. Or Microsoft. Or today, STL.

Of course, like all generalizations, this one is partially false. If your program is the client for a popular MMORPG and you offer your users a discount if they let you run numerical algorithms on their machine while the client is running, and you need to maintain a linked list of all other available clients, sorted by latency or some other simple metric of speed/quality, and frequently partition that list using lower_bound periodically, then perhaps it makes sense for you to avoid STL lower_bound. But maybe you could use a vector instead. Or maybe instead of maintaining a single sorted list, you could decide a priori to divide your clients into slow, medium, and fast links based on latency < 50ms, < 200 ms, over 200 ms. There are many available ways to approach the problem, and improving the algorithm’s running time — although it’s in some ways the easiest to measure and implement — carries with it maintenance costs that you won’t have to pay.

STL lower_bound is better for complex keys, but still loses on large N

July 2nd, 2008

Testing a list<int> is pretty much the worst possible case for STL’s lower bound algorithm. Recall from the analysis that STL’s lower_bound uses a binary search algorithm even when the cost of advancing an iterator N steps is O(N). The advantage of this algorithm is that it only performs O( log N ) comparisons, as opposed to O( N ) comparisons for the naive algorithm.

Testing STL lower_bound against naive_lower_bound using integers definitely plays to the strengths of naive_lower_bound: an integer compare is a single machine instruction. To test the effects of a more-costly comparison, I used the simplest costly key I could think of: a string consisting of a prefix of underscores, followed by the sequence number formatted in decimal. I ran with one, ten, and 64 underscores; 64 is the cache line length on my machine, so this length forces a comparison to make four memory accesses (two per argument) instead of only two.

For these tests, the STL lower_bound is initially twice as fast as the naive_lower_bound, and I must say it’s a bit of a relief to see the standard library algorithm outperforming the dumb algorithm. But as the cache effects start to be felt (for me, around 30,000 elements) the sheer cost of traversing the whole list twice wipes out the savings from reduced comparisons, at least for 1 and 10 underscores.

Recall from the earlier analysis that the naive algorithm runs in N * (c + t) / 2 for the average case, where c is the average cost of a comparison and t is the average cost of a list traversal. So the overall slope of the naive algorithm, for large N, should be (c + t) / 2. In contrast, the STL algorithm runs in 2 * t * N + c * lg N. Just for coarse analysis, we can throw away the c * lg N term (lg N is effectively a constant, varying between 10 and 16 when N goes from 1,000 to 60,000), which gives 2 * t for the slope of the STL algorithm. The difference between the expected slopes is (3t - c)/2.

With this in mind, you can see that even when there’s a substantial constant prefix on the key (ten underscores) this doesn’t appreciably raise the cost of a comparison, at least not compared to the cost of three list traversals. So let’s try forcing two more memory accesses per comparion by extending the prefix length to 64 bytes, the full length of a cache line.

64 bytes of underscores cause each comparison to make 4 memory accesses, slowing naive substantially

Now the cost of a comparison is much closer to the cost of three list traversals, but still slightly less: the graph shows that the execution time of STL is still increasing more quickly with increasing N. The second memory access of the comparison doesn’t suffer the full random-access penalty of a true cache miss because the key is contiguous and the cache reads ahead, while for the list traversal it’s quite unlikely that any two adjacent list elements would be within 64 bytes of each other.

Overall, though, for most small-to-medium list sizes the STL algorithm is winning when the key is more complex than a builtin type.

Next time: why this matters, when to use the STL lower_bound, and more.