answersLogoWhite
C Programming

Hashing methods related to data structure in C?


Top Answer
User Avatar
Wiki User
2011-08-12 04:26:13
2011-08-12 04:26:13

While binary search trees provide logarithmic search times on average, we would like to do even better. For

integer keyed items, we can trivially get constant search time by using an array, with one position for each

possible key. However, with 232 possible keys, there's no way we could create an array large enough.

Instead, we could get almost constant search time if we create an array twice as large as our data set,

and store a value k at position kmodm, where m is the size of the array. To search for a value k, we again

compute k mod m and then check the corresponding array position. We would expect an even distribution

of values, so we expect one or two values per array position. To retain this expectation, we may need to

resize the array as we insert values. We want to keep the load factor, k

m, at about 0:5. Resizing potentially

requires us to move all values, but the total amortized cost of insertion is still constant. Searching is also

constant in this scheme.

There is the problem of multiple values mapping to the same array position, or bucket. This situation

is called a collision. There are multiple ways of resolving collisions. One way is to place a value that maps

to an occupied bucket to the next empty bucket. Another is to have linked lists at each bucket, and store

values in the lists. This is the solution we'll use.

This is ne for integer values, but what about arbitrary objects? We rst compute an integer corresponding

to the object, called a hash code, and use that to map the objects to positions. There are two

conditions on a hash code. Equal objects (as dened by the equals() method) must have equal hash codes,

and the hash code for an object cannot change while it is in a hash table. Then, assuming the hash code

can be computed quickly and distributes objects evenly in a table, this gives us constant-time searching for

arbitrary objects.

How should we compute a hash code on an object? Many objects can be represented as k-tuples of

integers. For example, an object corresponding to a point in two-dimensional space can be represented by a

double of integers, converting the

oating point x and y values bitwise to integers. A good hash code for a

k-tuple (x0; x1; :::; xk􀀀1) is

x0 an􀀀1 + x1 an􀀀2 + ::: + xn􀀀2 a + xn􀀀1

where a is a constant. Choosing a prime number minimizes the chance of a collision.

Our hash code formula works even if k is unbounded, such as in strings. Java's String class uses the

following formula:

public int hashCode() {

int code = 0;

for (int i = 0; i < length(); i++) {

code = 31 * code + charAt(i);

}

}

This is our formula, with a = 31 and xi =charAt(i).

Related Questions

User Avatar

Hashing provides a method to search for data. Hashing provides a method to search for data. Hashing provides a method to search for data. Hashing provides a method to search for data.

User Avatar

If you read up on hashing, why hashing is done, what are its uses. Then you will be able to answer your own question. More to the point you will have studied the material that your homework question is intended to make you study. It is educational.

User Avatar

Hashing is performed on arbitrary data by a hash function. A hash function is any function that can convert data to either a number or an alphanumeric code. There are possibly as many types of hashing as there are data. How precisely the hash function works depends on what data it is meant to generate a hash code from. Hashing is used for a variety of things. For example, a hash table is a data structure used for storing data in memory. Instead of iterating through the structure to find a specific item, we associate a key (hash code) to a particular item (data). A hash code can be generated from a file or disk image. If the data does not match the code, then the data is assumed to be corrupted. Hashing has the advantage of taking a larger amount of data and representing it as a smaller amount of data (hash code). The code generated is unique to the data it came from. Generating a hash code can take time however, depending on the function and the data. Some hash functions include Bernstein hash, Fowler-Noll-Vo hash, Jenkins hash, MurmurHash, Pearson hashing and Zobrist hashing.

User Avatar

Homomorphic Hashing is a algorithm technique used for verifying data.

User Avatar

if collision is occurred in hash function then we can solve this problem by using double hash function


Copyright © 2020 Multiply Media, LLC. All Rights Reserved. The material on this site can not be reproduced, distributed, transmitted, cached or otherwise used, except with prior written permission of Multiply.