Where Did That Quote Come From, Again?

Perhaps the best new bit of net technology I’ve seen in some time (okay, right after the iPod/iTunes/iTunes Music Store troika) has come to us today from, of all places, Amazon.com: full-text content searching of over a hundred thousand of the books in the database. The results give you not simply a list of the books that contain your search string within their text, but also the page citation, the context of the quote, and a link to an image of the page on which the string appears.

Should I be surprised that this technology has been developed for a mass marketer of books, rather than, say, a library?

4 thoughts on “Where Did That Quote Come From, Again?”

  1. Libraries don’t tend to have a lot of capital to invest in such technologies.

    And publishers are a lot less likely to *allow* the provision of full-text information through libraries (which like to give things away) than through bookstores (which sell them).

  2. It should also be noted that various projects in libraries have done this with smaller groups of books for a very long time…

  3. This is kind of an amazing leap for Amazon, not so much technologically (I’ve read that they did it with no new technological breakthroughs, and are apparently serving it all up from surplus or idle backup machines) as commercial-politically. Having gotten a significant chunk of the publishing industry to go along is the feat; and my guess is that, if they can show a reliable conversion from deep-text search result into sale, the rest of the publishers will soon get in line, with the possible exception of the ones (like O’Reilly) that are trying a model that depends on subscriptions to full e-texts. And even they might do it, if Amazon limits the search results well enough.

    I confess to being concerned for my company. I’m not sure they get how much this might differentiate Amazon from everyone else. They’ve made much of being focused on the book business, while A-zon branches out into being a massive e-tailer portal-type thing. But this is a books-specific utility that could further cement customer loyalty, presuming, of course, that they can get more publishers to sign up.

  4. Liz — exactly my point. On the other hand, I wonder if Amazon’s likely to run into some unexpected kinds of copyright trouble. Someone on a listserv I read discovered an easy way to read a full text online (search for the last line on the last page that the site has already given you; repeat). While such a database of text-images might constitute a valid research tool (and thus fall under fair use through some library-licensing program like Lexis-Nexis), it’s hard to imagine a world in which the use of such a database for commercial purposes will continue unabated. (But I’ve clearly got my head screwed on backward when it comes to fair use and copyright enforcement. Just ask the Supreme Court.)

    Jeremy — of course such projects have existed on smaller scales for some time. But the sheer size of this database is what makes it revolutionary, I think. It’s potentially a very powerful research tool, and it’s undoubtedly deeply naive of me to have any hope that such a tool would be created for purposes other than the commercial…

    BT — I worry for your company, too, and have worried for it roughly since the days it was paying my contractor’s fees, way back when. The guys at Amazon may have over-diversified themselves out of any sense that they’re as devoted to the book as your guys are, but your guys (the top guys, not the ones actually working on the site) seem never to have fully understood how that whole internet thing was going to work, how it was going to be different from and yet connected to the stores, or what would keep folks coming back. (Then again, who did? Big talk from academic girl.) I still think, though (see my response to Liz, above), that the exploitability of this search engine is likely to create problems for Amazon that they’re not yet foreseeing.
