internet librarian 05: google debate


Rich Wiggins squares off against Roy Tennant in a debate over "Google: Catalyst for Digitization or Library Destruction?"

Rich starts off, and is utterly charming. Some funny starting slides, hard to capture in print because of their visual impact.

Starts by talking about a similar debate they had 4 years ago. (The slides are dense with bullet points now, and I'm sitting where it's hard for me to see the screen, so I'm not going to try to transcribe them. Later I'll look for a pointer to the presentation online.)

How many bytes are in the Library of Congress? This is a non-trivial question, with lots of technical aspects. You can't gloss over those aspects (resolution, color, etc.) because you'll end up wasting effort. Rich cites Brewster Kahle's estimate of 20 terabytes.

Rich says it's becoming so inexpensive to capture full text and images that complete digitization is becoming realistic. Disk space is cheap, and scanning technology has improved. He asked Google what they're using, and they wouldn't answer. (Color me shocked...) I wonder whether Microsoft will be more forthcoming, considering their partnership with OCA. I hope so. [add musing on Google's secrecy here]

Refers to the comment last night by Stephen Abram that we spend more money getting a book through ILL than we do to buy it. (That's a really interesting thing to think about.)

There are a bunch of straw man arguments here. He dismisses the preservation argument--we have better access, since you can still get the stuff online after a fire. (But what happens when the power goes out? That happens a lot more often...) Doesn't address the question of what happens when data is stored in proprietary formats--do we know what format Google will store this information in?

His bottom line, "Google Print has taught us to 'think big.'" (hmmm. does the period go before or between the single and double quotes there?)

Argues that this vision of digitization will have to be done by a forward-thinking company -- not by government. It has to be a company. (He claims that Google invented Ajax!!!!) Mocks Microsoft, saying they're playing catch-up, and not very well. "Hmmm...Google's going to digitize millions of books? We'll digitize 150,000!"

Now it's Roy's turn. Starts out by saying that his bottom line is "more access is better." He thinks it's great that Google's digitizing stuff, that OCA is doing it, that libraries have been doing it for decades. There's a lot of room for everyone to be involved. Says he's going to try to be provocative, and starts out with a Halloween-themed slide that reads "Google: Devil? or Merely Evil?" (I didn't get a photo of this, but would love to get the slide from him.) Says he's going to talk about the scary monsters that he sees lurking in this project.

The first monster: the fair use problem. He's concerned about Google trying to shield themselves with fair use. Because this has pulled the issue into the courts, it has the potential to result in restriction of fair use rights for everyone, including libraries.

The second monster: Closed access to open material. For example, there are many copies of Call of the Wild that are freely available. But when you go to Google Print, you won't know that--you'll see the reprinted, proprietary version from a publisher, without an indication that it's in the public domain and can be found from other sources. "And to add insult to injury, they give you links to buy the book, but no links to libraries." He's been assured this will change, but it hasn't happened yet, and there's no guarantee that it will.

The third monster: Blind, wholesale digitization. He's not so sure this is a good thing. Large collections in research libraries are choked with out-of-date crap, kept so that their collection numbers stay high enough to keep them in their "tier." Also, because copyrighted information is more difficult to get to, people will rely on old, out-of-date information because it's free and easy to get to. Is this a good thing? (This is a great point that I haven't heard mentioned before.) OCA is more focused on selective digitization--for example, American literature.

The fourth monster: advertising. How long before we see ads for antidepressant medication next to Hamlet? Google's window of opportunity to do "good things" will be constricted by their responsibility to stockholders.

The fifth monster: secrecy. The agreements between Google and libraries have been largely kept secret. Before the announcement, the Google libraries could not even talk to each other. Michigan revealed theirs (but not until a Freedom of Information Act request forced it, and months after the project was announced). Rumor has it that UM has the best agreement from the library perspective, and that other libraries are agreeing to much less favorable terms. This is a hot button for me. One of the things that I really like about Microsoft is the extent to which its researchers regularly collaborate, publish, and present outside of the company. If Google's intent is purely philanthropic, why does the commitment to "provide access to the world's information" stop at their front door?

The sixth monster: longevity.

  • What do Google, Enron, and WorldCom all have in common? Answer: They are or were publicly traded companies motivated by profit. Two are now gone.
  • What does Google have in common with libraries? Answer: They're both on planet Earth. (much laughter)
  • How old is the Harvard library? Answer: 400+ years. How old is Google? Answer: 7. So, which of these organizations do you want to trust with your intellectual heritage?

Now Adam Smith gets a chance to respond. Flashes a charming grin, and says "I'm not that dangerous, am I?" :) (This is what scares me most about Google. Their people and their products are indeed so seductively charming, it's easy to take their claims of purely philanthropic motivation seriously.)

He encourages feedback and criticism--says that's how they make their products better. They launch things quickly so they can get feedback quickly. They walk a difficult path in trying to make many parties happy. Their goal is to make information more accessible, not hidden in library stacks. Says he'll be here to answer questions.

He's asked about the scanning process--they've developed a proprietary non-destructive scanning process, but are not at liberty to disclose that. Someone asks about privacy, and Adam refers them to Google's privacy policy. Someone else asks if it's true that one of the libraries requested that only manual page turning be part of the scanning, and he again invokes "no comment."

I ask about the disconnect between the stated policy of helping the world by making information accessible and the veil of secrecy surrounding everything they do, and he's unable to respond--says he's only been there two years, and isn't really familiar with the reasoning behind their policies on disclosure. I express surprise that he hasn't asked for clarification, since I would think he's asked this fairly often, and he says he's never been challenged on this in a public forum before. I'd love to think that's not true, but I suspect that the Google mystique, which they cultivate so very well, has a lot to do with that.

Lots of discussion, not all of which I capture mentally (let alone here on the screen).

1 TrackBack

Gisle om Google Print from Eiriks forfatterblogg on October 29, 2005 7:03 AM

Hardly has one managed to travel more or less out of reach of the net before big things really start happening on the net front....

1 Comment

Thanks for the valuable reporting.

Creative Commons License
This blog is licensed under a Creative Commons License.

About this Entry

This page contains a single entry by Liz Lawley published on October 26, 2005 10:31 AM.
