Just checked out Erik’s weblog. I like the groovy icons. Very nice. Oh, and the content is good too 🙂
Bloggi. Webliki?
Wikify…. I like wiki. But… it’s too hard. I want to be able to wikify my entire website, so that if… [development]
I believe that SnipSnap attempts to meld blogging with wiki. Might be worth a look.
Why yes, I did one just last week…
Am I alone in thinking ‘Huh?’
Why I’m not a corporate investor
-Russ [Russell Beattie Notebook]
1993 was also when I first heard the word ‘Linux’, which, because we were British, was pronounced ‘Line-ux’. One of my Uni acquaintances showed it to me. “Look, free Unix on a PC.” “That won’t go anywhere”, I thought… This rates alongside my other great predictions, such as ‘Netscape are having an IPO, should I invest? Naa, they probably won’t be worth much’. And Yahoo, and Red Hat… I console myself that I was a poor student and didn’t have much to invest at the time anyway. I wouldn’t have had enough money to get rich, just enough to have had one whale of a time at Uni.
Note to Russ
I couldn’t unplug…. But I’m going to be brief tonight. It’s 11:19 p.m. I want to be off the computer by 11:30 (maybe 11:45’s more realistic).
-Russ [Russell Beattie Notebook]
Russ, if you’re reading this and it’s NOT just after you got up on Thursday morning, switch off. Now. 🙂
BEEP
Fed my book-buying habit today with the acquisition of the O’Reilly BEEP book. Build your own network protocol. Cool. BEEP looks very interesting as an alternative to all the contortions distributed application developers have to go through to make their applications work over HTTP. It provides a framework where most of the complex low-level stuff is done for you, and you just have to build your application-specific stuff on top of it. So the developer gets to decide whether the connection should be pull/push or both, stateless or stateful, pipelined or multiplexed etc. And security appears to be pluggable too.
I seem to remember Paul Hammant mentioning something about writing a BEEP module for AltRMI, which sounds like a great idea, especially for doing asynchronous callbacks. Must read more in case I’m totally wrong…
You’re not a *nix geek unless…
…you’ve replaced sendmail as your default MTA (in my case with postfix). Oh my word. Talk about stressful. At one point I thought I’d just obliterated a whole day’s worth of incoming mail because I kicked off fetchmail (thinking I was ready when I wasn’t), and postfix threw a wobbly. Thankfully it kept all the undelivered messages so after a few frantic minutes skimming the docs, hacking the config and one ‘postfix flush’ later, all my email reappeared. Phew.
I flatter myself that I can usually puzzle my way through most techie things, but email delivery systems are way more complex than I ever imagined. I had no idea what I was getting into when I started. It’s still not working as I expected, but I appear to be able to send email, so I think I’ll leave it until my palms stop sweating.
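For anyone attempting the same switch, a handful of main.cf settings cover a basic single-machine setup. This is only a sketch — the hostname and domain values below are placeholders, not my actual config:

```
# /etc/postfix/main.cf -- minimal single-machine setup (example values)
myhostname = mail.example.com
mydomain = example.com
myorigin = $mydomain
mydestination = $myhostname, localhost.$mydomain, localhost, $mydomain
home_mailbox = Maildir/    # deliver to ~/Maildir rather than mbox
```

And ‘postfix flush’ (or equivalently ‘postqueue -f’) is what retries everything sitting in the deferred queue — the command that brought my mail back.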
Distributed Lucene
Interesting article by Mark Harwood here regarding distributed Lucene indexes. Using distributed indexes is how Google achieves its scalability, I believe, but they are a fairly special case.
If scalability in the sense of concurrent users is the issue, I tend to favour multiple identical boxes with a load balancer and an RPC frontend. This can be as simple as a servlet, or you can use SOAP or XML-RPC etc. (Possibly RMI, although I’ve never tried that across a load balancer.) Doing things this way is probably a lot simpler to manage than splitting your indexes across boxes, and means that even if your queries are asymmetric (i.e. 85% of the queries are for the same thing), the load can be fairly balanced. Reliability is achieved for free as well – if a box dies, just stop sending requests there. Given Lucene’s performance (it has been used to index collections of more than 10 million documents), it’s pretty unlikely that your dataset will get so large that sheer size starts to affect your query times. Unless of course, you are Google 🙂
Lucene hints
Lucene is great, but some of the default settings are heavily biased towards interactive indexing and searching. If you’re building an index in a batch process style, set the IndexWriter.mergeFactor value to something big. I use 10,000, which makes it burn about 500 meg of RAM while indexing, but speeds it up a lot over the default value of 10. YMMV as ever.
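The trade-off is easy to see in a toy model — nothing Lucene-specific, just an illustration of why a big merge factor helps batch indexing: with merge factor m, small segments are only merged once m of them pile up, so a huge m defers nearly all merge work at the cost of holding far more in RAM:

```python
def index(docs, merge_factor):
    """Toy segment-merging model: count merges and peak in-memory segments."""
    segments, merges, peak = [], 0, 0
    for doc in docs:
        segments.append([doc])             # each new doc starts a tiny segment
        peak = max(peak, len(segments))
        if len(segments) >= merge_factor:  # merge once merge_factor pile up
            merged = [d for seg in segments for d in seg]
            segments, merges = [merged], merges + 1
    return merges, peak

print(index(range(1000), 10))      # many small merges, little memory
print(index(range(1000), 10_000))  # no merges until the end, big footprint
```

With the default-ish factor of 10 the indexer merges constantly but never holds many segments; with 10,000 it does almost no merging during the run, which is exactly the batch-indexing speed-up (and the ~500 MB of RAM) described above.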
Trials of Mutt
More for personal reference than anything else: Things I’ve learned about mutt