January 09, 2007
Posted by Mark Reichel / 6:41 AM
Google Inc. recently obtained a patent directed to the calculation of similarity metrics for objects, including web pages. U.S. Patent No. 7,158,961, entitled “Methods and apparatus for estimating similarity,” issued on January 2, 2007, and includes 24 claims on systems and computer-implemented methods for generating a compact representation of objects.

The method of claim 1 comprises identifying a set of features corresponding to a first object, generating a hashing vector having n coordinates for each feature, “summing the hashing vectors to obtain a summed vector,” and creating “an nx-bit representation of the summed vector by calculating an x-bit value for each coordinate of the summed vector, the nx-bit representation of the summed vector defining the compact representation of the first object.”

According to the patent, and from the search engine's perspective, “one problem in cataloging the large number of available web pages is that multiple ones of the web documents are often identical or nearly identical,” and “[s]eparately cataloging similar documents is inefficient and can be frustrating for the user if, in response to a request, a list of nearly identical documents is returned.” The patent further states that “it is desirable for the search engine to identify documents that are similar or ‘roughly the same’ so that this type of redundancy in search results can be avoided,” and that “there is a need in the art for improved techniques for determining similarity between documents.”

According to the DailyTech news article (link below), several companies, including IBM and Hitachi, have also filed patent applications for “similarity-engines” over the past decade.
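For readers who want a concrete picture of the steps recited in claim 1, the following Python sketch walks through them under several assumptions that do not come from the patent text quoted above: features are plain strings (for example, the words of a page), each hashing vector is derived from a SHA-256 digest and has coordinates of +1 or -1, and x = 1, so each coordinate of the summed vector is reduced to a single bit by its sign. The function names are hypothetical, and this is an illustration of the claim language, not the implementation described in the patent.

```python
import hashlib


def hashing_vector(feature: str, n: int) -> list[int]:
    """Return an n-coordinate vector of +1/-1 values for one feature.

    Assumption: coordinates come from the low n bits of a SHA-256 digest.
    """
    if not 1 <= n <= 256:
        raise ValueError("n must be between 1 and 256 for a SHA-256 digest")
    bits = int.from_bytes(hashlib.sha256(feature.encode("utf-8")).digest(), "big")
    return [1 if (bits >> i) & 1 else -1 for i in range(n)]


def compact_representation(features, n: int = 64) -> int:
    """Sum the hashing vectors, then keep one bit per coordinate (x = 1)."""
    summed = [0] * n
    for feature in features:
        for i, value in enumerate(hashing_vector(feature, n)):
            summed[i] += value
    representation = 0
    for i, total in enumerate(summed):
        if total > 0:  # assumed rule: positive sum -> bit 1, otherwise bit 0
            representation |= 1 << i
    return representation


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two representations differ."""
    return bin(a ^ b).count("1")


# Hypothetical example pages, split into word features.
page_a = "the quick brown fox jumps over the lazy dog".split()
page_b = "the quick brown fox jumped over the lazy dog".split()
rep_a = compact_representation(page_a)
rep_b = compact_representation(page_b)
print(f"{rep_a:016x} {rep_b:016x} differ in {hamming_distance(rep_a, rep_b)} bits")
```

In this sketch, two objects that share most of their features tend to produce summed vectors with the same signs in most coordinates, so a small number of differing bits can serve as a rough signal that two pages are nearly identical. Other choices of x, feature weighting, or hash function are possible and are not covered here.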
U.S. Patent No. 7,158,961: LINK
DailyTech News Article: LINK