/* * Copyright (C) 2011 The Guava Authors * * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except * in compliance with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software distributed under the License * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express * or implied. See the License for the specific language governing permissions and limitations under * the License. */ package com.google.common.hash; import com.google.common.annotations.Beta; import com.google.common.primitives.Ints; import java.nio.charset.Charset; /** * A hash function is a collision-averse pure function that maps an arbitrary block of * data to a number called a hash code. * *
Unpacking this definition: * *
Summarizing the last two points: "equal yield equal always; unequal yield * unequal often." This is the most important characteristic of all hash functions. * *
A high-quality hash function strives for some subset of the following virtues: * *
The primary way to provide the data that your hash function should act on is via a * {@link Hasher}. Obtain a new hasher from the hash function using {@link #newHasher}, * "push" the relevant data into it using methods like {@link Hasher#putBytes(byte[])}, * and finally ask for the {@code HashCode} when finished using {@link Hasher#hash}. (See * an {@linkplain #newHasher example} of this.) * *
If all you want to hash is a single byte array, string or {@code long} value, there * are convenient shortcut methods defined directly on {@link HashFunction} to make this * easier. * *
Hasher accepts primitive data types, but can also accept any Object of type {@code
* T} provided that you implement a {@link Funnel Funnel Compatibility note: Throughout this API, multibyte values are always
* interpreted in little-endian order. That is, hashing the byte array {@code
* {0x01, 0x02, 0x03, 0x04}} is equivalent to hashing the {@code int} value {@code
* 0x04030201}. If this isn't what you need, methods such as {@link Integer#reverseBytes}
* and {@link Ints#toByteArray} will help.
*
* Java's baked-in concept of hash codes is constrained to 32 bits, and provides no
* separation between hash algorithms and the data they act on, so alternate hash
* algorithms can't be easily substituted. Also, implementations of {@code hashCode} tend
* to be poor-quality, in part because they end up depending on other existing
* poor-quality {@code hashCode} implementations, including those in many JDK classes.
*
* {@code Object.hashCode} implementations tend to be very fast, but have weak
* collision prevention and no expectation of bit dispersion. This leaves them
* perfectly suitable for use in hash tables, because extra collisions cause only a slight
* performance hit, while poor bit dispersion is easily corrected using a secondary hash
* function (which all reasonable hash table implementations in Java use). For the many
* uses of hash functions beyond data structures, however, {@code Object.hashCode} almost
* always falls short -- hence this library.
*
* @author Kevin Bourrillion
* @since 11.0
*/
@Beta
public interface HashFunction {
/**
* Begins a new hash code computation by returning an initialized, stateful {@code
* Hasher} instance that is ready to receive data. Example: Relationship to {@link Object#hashCode}
*
* {@code
*
* HashFunction hf = Hashing.md5();
* HashCode hc = hf.newHasher()
* .putLong(id)
* .putString(name)
* .hash();}
*/
Hasher newHasher();
/**
* Begins a new hash code computation as {@link #newHasher()}, but provides a hint of the
* expected size of the input (in bytes). This is only important for non-streaming hash
* functions (hash functions that need to buffer their whole input before processing any
* of it).
*/
Hasher newHasher(int expectedInputSize);
/**
* Shortcut for {@code newHasher().putLong(input).hash()}; returns the hash code for the
* given {@code long} value, interpreted in little-endian byte order. The implementation
* might perform better than its longhand equivalent, but should not perform worse.
*/
HashCode hashLong(long input);
/**
* Shortcut for {@code newHasher().putBytes(input).hash()}. The implementation
* might perform better than its longhand equivalent, but should not perform
* worse.
*/
HashCode hashBytes(byte[] input);
/**
* Shortcut for {@code newHasher().putBytes(input, off, len).hash()}. The implementation
* might perform better than its longhand equivalent, but should not perform
* worse.
*
* @throws IndexOutOfBoundsException if {@code off < 0} or {@code off + len > bytes.length}
* or {@code len < 0}
*/
HashCode hashBytes(byte[] input, int off, int len);
/**
* Shortcut for {@code newHasher().putString(input).hash()}. The implementation might
* perform better than its longhand equivalent, but should not perform worse. Note that no
* character encoding is performed; the low byte and high byte of each character are hashed
* directly (in that order). This is equivalent to using
* {@code hashString(input, Charsets.UTF_16LE)}.
*/
HashCode hashString(CharSequence input);
/**
* Shortcut for {@code newHasher().putString(input, charset).hash()}. Characters are encoded
* using the given {@link Charset}. The implementation might perform better than its
* longhand equivalent, but should not perform worse.
*/
HashCode hashString(CharSequence input, Charset charset);
/**
* Returns the number of bits (a multiple of 32) that each hash code produced by this
* hash function has.
*/
int bits();
}