Managing Gigabytes: Compressing and Indexing Documents and Images, Second Edition (The Morgan Kaufmann Series in Multimedia Information and Systems)

Description:

In this fully updated second edition of the highly acclaimed Managing Gigabytes, authors Witten, Moffat, and Bell continue to provide unparalleled coverage of state-of-the-art techniques for compressing and indexing data. Whatever your field, if you work with large quantities of information, this book is essential reading: an authoritative theoretical resource and a practical guide to meeting the toughest storage and access challenges. It covers the latest developments in compression and indexing and their application on the Web and in digital libraries. It also details dozens of powerful techniques supported by mg, the authors' own system for compressing, storing, and retrieving text, images, and textual images. mg's source code is freely available on the Web.
* Up-to-date coverage of new text compression algorithms such as block sorting, approximate arithmetic coding, and fat Huffman coding
* New sections on content-based index compression and distributed querying, with 2 new data structures for fast indexing
* New coverage of image coding, including descriptions of de facto standards in use on the Web (GIF and PNG), information on CALIC, the new proposed JPEG Lossless standard, and JBIG2
* New information on the Internet and WWW, digital libraries, web search engines, and agent-based retrieval
* Accompanied by a public domain system called MG which is a fully worked-out operational example of the advanced techniques developed and explained in the book
* New appendix on an existing digital library system that uses the MG software

Product Details:

Author: Ian H. Witten
Hardcover: 550 pages
Publisher: Morgan Kaufmann
Publication Date: May 17, 1999
Language: English
ISBN: 1558605703
Product Length: 9.54 inches
Product Width: 7.67 inches
Product Height: 1.34 inches
Product Weight: 2.49 pounds
Package Length: 9.21 inches
Package Width: 7.64 inches
Package Height: 1.34 inches
Package Weight: 2.34 pounds
Average Customer Rating: based on 11 reviews

Customer Reviews:

Average Customer Review: (11 customer reviews)
Most Helpful Customer Reviews
59 of 61 found the following review helpful:
The Wonderful Thing Is: It's the Only One
Dec 20, 2001
By Peter Norvig
This is the only book there is that will actually teach you how to build an information retrieval system (aka search engine). It discusses all the algorithms and tradeoffs, and comes with free downloadable source code to experiment with. Some of the material is standard, but covered in more implementation detail here than anywhere else. Some of the material is novel: you won't find better coverage of compression unless you hand-assemble twenty research papers and reverse-engineer them to figure out how they're implemented. But with "Managing Gigabytes", it's all here. (Although, after a particularly invigorating discussion of how to string together a bunch of techniques to compress their corpus and save a couple hundred MB, I did a check and found you could buy 512MB of RAM for less than the cost of the book. Knowledge is Power, but sometimes a little cash is more powerful.) The only negative is that this book is not called "Managing Terabytes", as the first edition promised/threatened it might be. RAM and disk are cheap, but not that cheap, and for now terabytes (and sometimes petabytes) are managed only by NASA, Google, and a few others. I can't wait to see the third edition!
15 of 15 found the following review helpful:
Good introduction to searching/indexing in data.
Dec 29, 1999
By Amund Tveit
MG gave a good introduction to the components of practical Information Retrieval (IR). You can clearly see that the authors have a genuine interest in the field! But I would like some more theoretical analysis of the algorithms used (i.e., O-notation), and more focus on parallel implementations of IR systems. Another book related to the same area worth mentioning is "Modern Information Retrieval".
15 of 16 found the following review helpful:
Very clear, but misses some key real-world issues
Aug 14, 2001
By Edwin Young
As others have said, MG is a good introductory text for Information Retrieval. However, I think it spends a little too much time on compression techniques and lacks a good discussion of incremental or on-line indexing. The book tends to assume that the set of texts to be searched is static. If new documents can be added or old ones deleted, the whole problem becomes much harder and many of MG's techniques are no longer relevant. That said, I strongly look forward to Managing Terabytes (if it ever appears).
18 of 20 found the following review helpful:
Compression, Algorithms, Full Text Retrieval
Sep 15, 1999
Managing Gigabytes is a must-read for anyone interested in how to transmit, access, store, and search large amounts of data. I'm the President and CTO of Aladdin Systems, Inc, the creators of the StuffIt compression product line for Mac and Windows, and I find it an invaluable addition to my reference library. The authors take complex information and present it in an organized, easy-to-read format, suitable for novices to experts. I highly recommend this book.
11 of 11 found the following review helpful:
Best text available. Has no competition.
Sep 11, 1999
This text sets the standard for future information retrieval texts and has replaced the Salton books as the canonical academic text. The second edition is highly readable and contains a thorough updating of the algorithms and data structures in the field. I like the text because of its readability, conciseness, thoroughness, and attention to detail. The comparisons of algorithms on realistically sized collections are unparalleled in other texts. I have used this text for the past 5 years in a graduate-level information storage and retrieval class, but I believe it has a much wider audience due to the quality of writing. Additionally, the free availability of the mg system, which implements many of the best algorithms of the text, allows the reader/student to take advantage of the technology without having to start from scratch. Highly recommended.