October 2005 Archives

October 3, 2005

Word Frequency

I need to know how often each word is used. I'd like to answer the question, "How big is the vocabulary of the Internet?" And I'd like the database I'd generate in answering that to help answer a host of other interesting questions. (Related domain names and spelling suggestions come to mind immediately.)

So, would this work?

Link a crawler to a database that stores the words found on each page crawled. Feed the crawler a seed set of pages, make sure it hits the Gutenberg project and some AP archives and DMOZ. Increment the count of each word as we run across it again.
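Here's a rough sketch of just the counting step, assuming the crawler has already handed us the text of each page. The crawler and the database are left out, and the names (count_words, WORD_RE) and the two sample "pages" are made up for illustration. The word pattern only handles plain English letters, which is a big simplification.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of ASCII letters and apostrophes.
# Real pages would need something much smarter than this.
WORD_RE = re.compile(r"[a-z']+")


def count_words(pages, counts=None):
    """Add the words found in each page's text to a running tally."""
    counts = Counter() if counts is None else counts
    for text in pages:
        # Lowercase first so "The" and "the" share one counter.
        counts.update(WORD_RE.findall(text.lower()))
    return counts


# Stand-ins for crawled pages (invented sample text).
pages = ["The cat sat on the mat.", "The dog sat too."]
counts = count_words(pages)
print(counts.most_common(2))
```

Passing an existing Counter back in as counts lets the tally keep growing as more pages arrive, which is the "increment the count of each word as we run across it again" part.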

If there are 750,000 words (as suggested by the Oxford Dictionary people), I should be able to store this in something like 23 megs, as long as I cap individual word frequencies at the roughly four billion limit imposed by an unsigned MySQL integer type. (Of course that could be increased.)
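A back-of-the-envelope check on that size, as a sketch only. The 750,000 figure and the four-byte counter come from the paragraph above; the 26 bytes of word storage per row is purely an assumption picked to show how a number near 23 megs could come out, not a record of how the estimate was actually made. The function names are hypothetical.

```python
VOCAB_SIZE = 750_000       # word count quoted above
COUNTER_BYTES = 4          # a 4-byte integer counter
UINT32_MAX = 2**32 - 1     # 4,294,967,295: largest unsigned 4-byte value


def table_megabytes(avg_word_bytes, overhead_bytes=0):
    """Approximate table size in MB (10**6 bytes) for the whole vocabulary."""
    row_bytes = avg_word_bytes + COUNTER_BYTES + overhead_bytes
    return VOCAB_SIZE * row_bytes / 1_000_000


def capped_increment(count):
    """Bump a frequency, but never past what the 4-byte counter can hold."""
    return min(count + 1, UINT32_MAX)


# ASSUMPTION: about 26 bytes of storage per word, including any padding.
print(table_megabytes(26))
```

Change the assumed bytes per word and the total moves in proportion, so the real answer depends heavily on how the word column is stored.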

I could use content-type strings to identify languages, couldn't I? Or is that too unreliable? Would I end up indexing Japanese and French pages right along with it?
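To poke at the reliability question, here's a small sketch of what a Content-Type string can actually tell us. The header carries a media type and, sometimes, a charset; it does not name a language (that's the job of the separate Content-Language header). So the best we can do from Content-Type alone is guess from the charset. The CHARSET_HINTS table and guess_language are hypothetical, and the mapping is deliberately tiny.

```python
from email.message import Message

# ASSUMPTION: a handful of charsets that strongly suggest one language.
# Charsets like ISO-8859-1 or UTF-8 are shared by many languages,
# so they are deliberately absent and yield no guess.
CHARSET_HINTS = {
    "shift_jis": "ja",
    "euc-jp": "ja",
    "iso-2022-jp": "ja",
    "euc-kr": "ko",
    "big5": "zh",
}


def guess_language(content_type):
    """Return a language code hinted at by the charset, or None if unknown."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lowercased, or None.
    charset = msg.get_content_charset()
    return CHARSET_HINTS.get(charset)


print(guess_language("text/html; charset=Shift_JIS"))
print(guess_language("text/html; charset=ISO-8859-1"))
```

The second call is the interesting one: English and French pages commonly share a charset, so this approach would happily index them together.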

And actually... That would be *cool*. With a way to store space-data with each word, I wonder if with some simple language seeding I could even automatically generate language-specific indexes...?

What is a language? What defines a language?

Wow wow. I can't believe Google hasn't come out with more stuff than they have. With all that data -- I wonder.

October 4, 2005

Fascinating Lies: Poorly-Written Articles

The difference between a typo and a glaring factual error is hard to tell. An article from eWeek's Steven Vaughan-Nichols, who really ought to proofread better, contains a number of subtle errors that end up totally contradicting and confusing the point of the entire piece. Some examples:

The Streamlined Sales Tax Project may sound to some like the states are getting ready to start charging state sales tax on all e-commerce purchases, but the reality is that simple.

That's the first sentence of the article. Was the author deliberately being funny? Or did he forget a "not"?

What the SSTA (Streamlined Sales Tax Project), a group of U.S. states united in trying to simplify state and local tax collection, is doing is setting up a system by which Internet e-commerce companies can voluntarily pay state taxes to the states in which their customers reside.

If you made it through that sentence, did you get the meaning? Basically, he says that this new project helps companies pay state taxes.

In addition, "the states that are in compliance with SSUTA (Member States) will offer advantages to those sellers who use a CSP.

No idea where the quote ends, and the article gives no clue who we're quoting anyway. Also, is the SSUTA related to the SSTA defined above? Again, no info provided. Maybe a typo.

More minor advantages are that such companies will receive free tax collection and remittance software.

Is that... even more minor than the other advantages? Or additional, minor advantages?

Since 1992, a still valid Supreme Court decision ruled that companies do not have to pay sales taxes in states where they do have a physical presence.

The linked Supreme Court decision (which I read, in surprise at this article) actually says pretty clearly that companies do not have to pay sales taxes in states where they do not have a physical presence. An important word to forget.

This is not say that state sales tax on Internet purchases is a done deal. "There's still a lot of opposition," said Logan.

This is not say that this article was machine translated from Japanese.

Thus, in the end while the current move is purely voluntarily, Logan said that "if a critical mass of retailers buy into this, a lot more will follow, and it will snowball into almost all Internet retailers."

Thus, in the end while I used to trust eWeek, I now feel that "if this is all the attention and care they give to their writing, I should be reading from other sources."

This is not say that anything on the Internet is really trustworthy.

October 10, 2005

Thesis Ideas 3: Provo City PR & the Internet

Provo City has championed the installation of a city-wide fiber-optic network for Internet, TV, telephone, and whatever other services they can think to run over it. The system is almost completely deployed, and early estimates seem to show the city failing to meet revenue projections. But there is a lot of positive 'buzz' about the project on the Internet, across the country, as people would like to see similar projects attempted in their cities.

I'd like to study the Internet PR effect this venture has had on Provo. For typical, business-related searches, for instance, does Provo come up more often because of iProvo and the related publicity? Are more new businesses coming to Provo? What about home-buyers? Anyone possibly swayed by positive or negative coverage of the project?

I know I'm excited to have broadband! But I'm in the second to last zone receiving the system. Boo.

What other benefits might Provo be receiving as an early entrant into the civic-broadband space? How is the Internet responding to Provo's efforts to become an ISP? (Yeah, I know they're not *technically* an ISP...)

October 11, 2005

Breathtaking: Great Design

Stopdesign.com -- a beautiful site. I knew I'd enjoy reading as soon as I stumbled across it, and then I ran into this collection of sites:

http://www.stopdesign.com/examples/css/vault/

It's 144 astonishing sites, a great resource for budding designers or anyone looking for some ideas.

I'd also highly recommend poking around Stopdesign. The portfolio section is probably my favorite portfolio I've ever seen, and he offers a free set of templates for a photo gallery. (You might even see those templates here, one of these days!)

Search Engine Rankings

Oh. Somebody did my scariest idea for a search engine study, and they did it much more cleverly. I wanted to know -- what percent of people click on the #1 ranking, versus #2, etc. and if that varies by search engine. Which search engine is the most relevant? (That was the key question, in my mind.)

The horrible plan I was formulating was to get access to Omniture's SiteCatalyst tracking data and measure clicks to various companies' landing pages. Then I'd correlate that data with historical records of the landing pages' rankings in various search engines. That would give me a broad, real-life snapshot. (For example: Company A got 71 clicks to their landing page for Keyword B last month. Keyword B was ranked #2 on MSN. Keyword B was searched for 1100 times on Overture.)
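The arithmetic behind that plan would look something like this sketch. It pools clicks and searches by ranking position and divides. The only real numbers are the ones from the example above (rank 2, 71 clicks, 1100 searches); the function name and the tuple layout are invented, and it quietly assumes that search counts from one engine can stand in for another, which is exactly the kind of assumption that makes the plan shaky.

```python
from collections import defaultdict


def ctr_by_rank(observations):
    """Pool (rank, clicks, searches) records into a click rate per rank."""
    clicks = defaultdict(int)
    searches = defaultdict(int)
    for rank, clicked, searched in observations:
        clicks[rank] += clicked
        searches[rank] += searched
    # Skip ranks with no recorded searches to avoid dividing by zero.
    return {r: clicks[r] / searches[r] for r in clicks if searches[r] > 0}


# The single example record from the paragraph above.
observations = [(2, 71, 1100)]
rates = ctr_by_rank(observations)
print(f"rank 2: {rates[2]:.1%} of searches led to a click")
```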

There is so much error in that -- from all the assumptions, from uncategorizable traffic sources, and who knows what else -- but it could be a very, very large scale study.

Professor Thorsten Joachims and colleagues at Cornell University did a much more elegant study: they tracked a sample of users on Google. They showed that 40% of people click on the first result -- and even if you switch the results, 34% of people still click on the first result.

The implication: People are strongly conditioned to click on the first result.

My response: Let me replicate that study, for MSN and Yahoo, and let's see which one gets the best score! Or which one has the most sheep-like users. Hee hee.

Read more for the full paper reference. >>


October 12, 2005

Responsibility and Fault

Remember Senator Hatch's "inducement" act, that would have held P2P companies liable for the copyright infringement of their users? (No? What are you, normal?)

The issue is coming up again in the form of lobbyists and proposals for laws to hold software developers liable for security vulnerabilities in their products. Under some versions of this concept, if I bought a Microsoft operating system and my computer got hacked, Microsoft would have to pay me.

In a lively discussion about this on Slashdot, the following comments emerged:

=============
Person 1: Whatever happened to holding the people who exploit vulnerabilities responsible?

Person 2: That's crazy talk! What are you thinking, man? Next you'll suggest that when I walk down the street with my entire head completely exposed and vulnerable, that somehow the mugger than hits me over the head with a baseball bat may somehow be responsible for the outcome! See how crazy you are?

Or, when I lock my door and leave my house for the day, and a guy comes along with a sledgehammer and just breaks in anyway - I suppose you think that the person with the sledgehammer is somehow responsible for that? Totally twisted, man.
=============

I need say nothing more.

October 14, 2005

Satellite Radio and Licensing

Satellite radio is the wave of the future. Higher-quality broadcasts, digital feeds, no commercials, and hundreds of channels. Even the equipment is cooler -- most interfaces display the names of the songs that are playing and can provide other information. It's just awesome.

Of course, if you want to listen to it, you have to buy a satellite receiver. Which is understandable. But then you have to pay an additional monthly subscription fee. Which is not understandable.

What if I were to walk down the street, screaming at the top of my lungs? It would irritate my neighbors, sure. But they'd get even more irritated when I told them that they had to pay me for the privilege of listening.

That's exactly what satellite radio is doing! A satellite broadcasts across a huge chunk of the country. The satellite broadcasts through my air, and through your air. The radio waves it emits might even now be causing cancerous mutations in our cells.

The government intervenes to ensure that companies don't step on each other's frequencies. Otherwise we would end up with an entire spectrum of unusable space, as companies block each other out. But there is no clear reason for the government to enforce the companies' decisions to lock their content and charge for it.

If I build my own receiver and write some software to decode the signal, why should that be illegal? If I broke into Sirius' corporate headquarters and stole the specs for their encryption, fine. Are they arguing that the encryption method is patented? I bet they never filed for the patent on it.

Hrm.

Being something which is written in the sight

Machine translation never ceases to amuse me! (I'm pretty easily amused.)

I did a Google search for [utada hikaru single collection] and Amazon.jp came up first. Following the "translate this page" link, I was presented with the following:

Price
Price: XXYEN 2,753 (including tax) as for this commodity being domestic delivery charge free, we report! As for details this way It can utilize also payment on delivery.

Reviews
Being something which is written by the your other customer it does the customer review * pickup & the customer review. The case of purchase please utilize with customer himself last judgement.
Your review is recorded before the sight.

Recommendations
The person who buys this CD has bought also such a CD:
# Wish You The Best warehouse wooden flax robe
# Someone's request Kanai う time space/large house Tada ヒカル
# A BEST beach promontory ayu seeing

User Review
5つ星のうち5Waiting, it increased, (HBAR - * HBAR), 2005/10/08
レビュアー: ミキ northeast
Variegated favorite me taking the tune of the space/large house Tada ヒカル, 1 it is highest!
Sometimes the favorite, the tune which the space/large house Tada ヒカル does not like it being, you did not buy the album easily.
This album comes off and is not well being to be the cousin taking, how. It is possible to be the feeling which the lyric you speak in daily conversation don't you think? is. It was the lyric which you can make think.
Also voice has done simply, you say, or make sticky clearly and increase.
If listening it is not, loss! 1 is.

The whole thing is hilarious. But I must admit -- I can understand the gist of most of it. Certainly more than I got from looking at the Japanese text!

October 17, 2005

Marketing Public Transportation

A fascinating piece on NPR today introduced me to the problem many public transportation systems in the US will soon face: how to retain the massive influx of riders once the price of gas drops back to 'acceptable'?

They've had a surge of riders because of the rapid increase in gas prices. However, several experts on the show suggested that, historically, people have always been quick to snap back to previous behaviors as soon as the prices fall back down.

So, this is the situation. What would I do, if I were the director of marketing for one of these companies? (I'm sure that's the question burning in everyone's minds...)

===============
What I Would Do
===============
Money got the people onto the buses in the first place. But it won't keep them there. Especially as gas prices fall again, the financial reasons will fade. But we have a great opportunity to highlight to everyone the other benefits they are enjoying by using public transit:

1. Reduced stress from dealing with 'road rage' and stupid drivers and traffic jams
2. Able to work while they travel
3. Able to read while they travel

I think numbers 1 and 3 are probably the most important -- emotional reasons to choose the bus instead of the car. (Because it will mostly be emotion that pulls them back to their cars.)

So, here are a few programs I would implement:

1. I-Pod Discounts
To encourage people to buy year-long passes (get them to commit now, before gas prices fall!) I'd put an incentive in place. Anyone who buys one gets a coupon for half-off an I-Pod. This is a natural fit, as people can listen to music while they ride the bus. This could lead to additional partnerships -- kiosks at bus stops for people to download podcasts of the news each day, or featured music, while they wait for the bus.

Speaking of kiosks at bus stations -- in Korea, every bus stop had vending machines with hot chocolate. I'd love to see those here, but I don't know if that would be practical. (Snif.)

2. Commuter Papers
Provide copies of the newspaper or professional magazines on the buses, as is done at medical offices. This would be a perfect channel for distributing any free, community papers or even better magazines. Airlines used to do this (still do this?) with their in-house magazines, but they lose money on that. Distributing other people's content will be much more effective.

3. Read a Book
I think one of my favorite things about this idea is the enormously powerful creative imagery it allows for advertising. Promote transit time as a time for entertaining reading! Again, airlines are an interesting example -- every airport sells books, but at best they are presented as a way to pass unpleasant time in a less painful way.

Let's remind people of how fun reading is!

An example TV spot:

>> A guy walks into work, frazzled and stressed, complaining about the traffic on I-45. His grumpy coworker pipes up about the accident on State Street. Another guy walks in, smiling and happy. "How did you get to work?" they ask him. "I flew on the back of a dragon, fighting off gryphons with my magic sword." They watch, confused, as he walks into his office and sets down his jacket. Then we see his book -- a fantasy, with a dragon on the cover -- and his bus pass.

If the I-Pod partnership is too expensive, a deal for gift certificates at Barnes and Noble would play into this as well. Or even a partnership with the local library, in some places, might be a great approach. Give a $5 discount to people who show their library card when they buy a pass?

I think all three programs would work well together. And everyone would choose to ride the buses. As long as buses are viewed simply as an alternative to paying for gas, ticket prices will never be able to rise to even the level of self-sufficiency.

If we found that the idea of working on the bus really drew people (surveys would be well employed here), sponsoring wi-fi all along the bus routes would be a great move, too.

And that's the end! That'll be $20,000 consulting fees, please. Payable to Dalton Creative Solutions.

October 19, 2005

Earn thousands with no effort and no risk!

So, here's a business model for you:

Sign up with a couple of drop-shippers who will ship product for you and take a large percentage of your sales. Sign up with a company that will make a website for you and take a small percentage of the sales. Sign up with a marketing company that will promote your website for a small percentage of the sales. Sit back and sip lemonade while the money pours in.

What's the problem with this scenario?

1. The "large" percentage the drop-shippers take will be very large.
2. That large percentage plus the two other percentages will approach 100%

"But," you insist, "that's okay! Even if it's 99.9%, I'm making money without working, and all I have to do is scale up my volume!"

This leads us to the real problem:

3. Nobody will buy from you.

If you're working with a drop-shipper, you are competing with anyone else who wants to work with that same drop-shipper. You are all selling the exact same product, with the exact same level of service, and exact same fulfillment cycle. What can you compete on?

Price.

But you can't compete on price, because you have no margins! And because you are drop-shipping, and buying everything at single-unit prices, anyone who has invested in buying bulk and warehousing their own product will undersell you.
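To put numbers on the margin problem, here's a quick sketch. Every figure in it is invented for illustration: an 85% cut for the drop-shipper, 7% each for the website and marketing companies, and a $100 sale. Amounts are kept in whole cents and whole percentage points so the arithmetic stays exact.

```python
def margin_cents(price_cents, drop_shipper_pct, site_pct, marketing_pct):
    """What the seller keeps from one sale after all three partners take a cut."""
    kept_pct = 100 - (drop_shipper_pct + site_pct + marketing_pct)
    return price_cents * kept_pct // 100


def sales_needed(target_cents, per_sale_cents):
    """Number of sales needed to reach a target, rounded up."""
    if per_sale_cents <= 0:
        raise ValueError("no margin left, so no volume will ever be enough")
    return -(-target_cents // per_sale_cents)  # ceiling division


# ASSUMPTION: made-up percentages and price, not real quotes from anyone.
per_sale = margin_cents(10_000, 85, 7, 7)   # a $100.00 sale
print(per_sale, "cents kept per sale")
print(sales_needed(100_000, per_sale), "sales to clear $1,000")
```

Under those made-up terms there is only a dollar per sale to play with, so any price cut to beat a competitor wipes out the margin entirely.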

October 21, 2005

Omniture and 2o7.net

Omniture is one of the leading web analytics vendors in the world. Their program, SiteCatalyst, uses cookies to track visitors for websites such as eBay, Novell, and Ameritrade. For these websites, the data SiteCatalyst provides is incredibly useful and important.

Wonder if you've visited a site tracked by Omniture? Of course, you can "view source" on whatever page you question, and look for a set of tags that say "Omniture" in a commented header. You should see something like the following:

var s_server=""
var s_channel=""
var s_pageType=""
var s_prop1=""

(I won't put more, because the code is copyrighted...)

More sneakily, though, you can check to see if you have a cookie from 2o7.net. That's the domain Omniture uses for their tracking cookies. If you've got one, congratulations! You're helping some smart web-marketer make more money.
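Both checks are easy to automate. This is only a sketch: it looks for the strings mentioned above (the word "Omniture", the 2o7.net domain, and two of the variable names from the snippet) in a page's source, and checks a list of cookie domains for 2o7.net. The function names are made up, the sample page is invented, and how you get the page source or your browser's cookie domains is left to you.

```python
# Markers taken from the post: the vendor name, the cookie domain,
# and two variable names from the snippet shown earlier.
OMNITURE_MARKERS = ("omniture", "2o7.net", "s_server", "s_channel")


def omniture_markers_found(page_source):
    """Return which known markers appear in the page source (case-insensitive)."""
    lowered = page_source.lower()
    return [marker for marker in OMNITURE_MARKERS if marker in lowered]


def has_2o7_cookie(cookie_domains):
    """True if any cookie domain is 2o7.net or one of its subdomains."""
    for domain in cookie_domains:
        cleaned = domain.lower().lstrip(".")
        if cleaned == "2o7.net" or cleaned.endswith(".2o7.net"):
            return True
    return False


# Invented sample input, standing in for a real "view source".
sample = '<!-- Omniture -->\nvar s_server=""\nvar s_channel=""'
print(omniture_markers_found(sample))
print(has_2o7_cookie([".example.2o7.net", "example.com"]))
```

A plain substring match will produce false positives on any page that merely mentions these strings (this one, for instance), so treat a hit as a hint rather than proof.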

Layers of sarcasm aside, there is no actual danger presented by the 2o7 cookies. I actually rely on the information we get from Omniture to help our clients, and I've tested to make sure it's not leaking information.

If you are concerned, you can set your browser to allow cookies "for originating site only." Your personal information won't be any more secure, but you will have thwarted (or at least hindered slightly) the efforts of those who would cater the Internet to your tastes.

October 28, 2005

Radio Free Capitalism!

"Take the meter maid who hands out parking tickets if you're two minutes late. Is she going to give you a break? No! But your local Volvo retailer -- they'll give you a break!"

"The Inn on the Hill. The perfect place to enjoy a romantic evening! Each room is as unique as the Utah destination for which it is named."

"You know, the last thing you want this Halloween holiday is to run out of candy."

"I'm Robert Dudd, a real Sandy resident. I look forward to having an eight acre park with ballfields so near our house!"

"I'm a Ford truck man. That's all I drive. I ain't got no boundaries... During truck month, get behind the wheel of a Ford F-150!"

"Tracy Aviary changes into Hogwarts, Saturdays this month, with exciting readings of the book!"

"Burt Brothers wants to send you on a race for the chase!"

"Find out how easy it is to enjoy the comfort of a gym in your own home."

"Service goes up while prices go down! That makes Nate Wade Subaru the best in town."

"Lend me your ear, or both of your ears. And come with me to an unsurpassed place. The new Delta.com site is better than ever! Where good goes around."

"Smith's saves you more... Smith's saves you more everyday!"

"I'm a Ford truck man. That's all I drive. I ain't got no boundaries. I don't compromise. Truck month bonus cash only available with financing through your Ford dealer."

-- Two commercial breaks, KBZN Smooth Jazz

About October 2005

This page contains all entries posted to Tom Dalton :: Doer of Good in October 2005. They are listed from oldest to newest.
