Posts Topics Forums Images
Search videos from message boards Videos Search messages from microblogs Microblogs Search messages from imdb.com Imdb Search messages from yuku.com Yuku Search messages from lefora.com (free forums) Lefora
My account: Login | Sign Up
Loading... 

Thread: Very Large Non-Dynamic Table for Lookups

Started 2 months, 1 week ago by Eidetic Hans
I have a very simple table: Field A, Field B No more data will be written to the table, only lookups are needed. The Table will consist of at least 500 million records but could possibly hit around 1 billion. What would be the most efficient method in MySQL for performing these lookups? Multiple tables? One large table? Indexing strategies? Field A would be unique. This...
Site: MySQL AB - MySQL Forums  MySQL AB - MySQL Forums - site profile
Forum: Optimizer  Optimizer - forum profile
Total authors: 2 authors
Total thread posts: 8 posts
Thread activity: no new posts during last week
Domain info for: mysql.com

Other posts in this thread:

Rick James replied 2 months, 1 week ago
You will be looking up by A, not B, correct? InnoDB, with PRIMARY KEY(A). No advantage for splitting the table. Please do SHOW CREATE TABLE so I can check for other issues.

Eidetic Hans replied 2 months, 1 week ago
CREATE TABLE `EMPLOYEE`.`IDENT` ( `SSN` varchar(11) NOT NULL, `USERID` varchar(5) NOT NULL, PRIMARY KEY (`SSN`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; The USERID was a company identifier. It's not necessarily unique because USERIDs were recycled (e.g. a new employee might have used an ex-employee's USERID).

Rick James replied 2 months, 1 week ago
At 1B rows, plan on about 100GB of disk space needed. Because of the way BTrees work in InnoDB, and assuming innodb_buffer_pool_size is big enough, plan on 1 disk hit per random fetch. innodb_buffer_pool_size should be about 70% of available RAM. The equivalent MyISAM table would take about 35GB; about evenly split between data and index. Even so, it might be slower because it would ...

Eidetic Hans replied 2 months, 1 week ago
You're right, not all SSNs have been allocated. I actually only have around 10,000,000 but the integrity can't be vouched for. The reason it may go up is because I can generate SSNs and randomly assign UIDs for future projects. What I'm mainly interested in is using MySQL as a basic retrieval system for large data sets (not a full transactional or even relational system). The company or ...

Rick James replied 2 months, 1 week ago
Look at it this way. Let's say you have 1GB for innodb_buffer_pool_size (a practical limit on a 2GB machine) and the data is 100GB. Now, let's say you want to fetch a _random_ SSN. I do mean random; you can't predict what value it will have. What are the odds of that one being in cache at the moment? 1GB/100GB = 1%. That is, fetch 100 _random_ SSNs, 99 will hit the disk, 1 will be in ram....

Eidetic Hans replied 2 months, 1 week ago
No names. Not even certain whether the SSNs are all valid. It doesn't matter, though. Let's just assume that it's a randomly generated anonymous SSN death index, which is probably closest to the truth at this point. The memory analysis for a random fetch definitely made that clear. It just seems that no matter whether the SSN is random that if the SSN is sequentially ordered as a 9-...

Rick James replied 2 months, 1 week ago
All the 11122xxxx numbers were issued over the last century from Bangor Maine. 11123xxxx came from Portland, Maine over the same time period. (I am making up the specifics; but maybe you get the idea.) If your searches are by geographic area, then they will nicely cluster. If they are by age (eg those born in 1899), they will be scattered "randomly". If you need to look up by userid,...

 

Top contributing authors

Name
Posts
Eidetic Hans
4
user's latest post:
Re: Very Large Non-Dynamic Table...
Published (2009-10-26 12:19:00)
No names. Not even certain whether the SSNs are all valid. It doesn't matter, though. Let's just assume that it's a randomly generated anonymous SSN death index, which is probably closest to the truth at this point. The memory analysis for a random fetch definitely made that clear. It just seems that no matter whether the SSN is random that if the SSN is sequentially ordered as a 9-digit number such that 111223333 <...
Rick James
4
user's latest post:
Re: Very Large Non-Dynamic Table...
Published (2009-10-26 20:03:00)
All the 11122xxxx numbers were issued over the last century from Bangor Maine. 11123xxxx came from Portland, Maine over the same time period. (I am making up the specifics; but maybe you get the idea.) If your searches are by geographic area, then they will nicely cluster. If they are by age (eg those born in 1899), they will be scattered "randomly". If you need to look up by userid, you need another index. That would add...

Related threads on "MySQL AB - MySQL Forums":

Related threads on other sites:

Thread profile page for "Very Large Non-Dynamic Table for Lookups" on http://www.mysql.com. This report page is a snippet summary view from a single thread "Very Large Non-Dynamic Table for Lookups", located on the Message Board at http://www.mysql.com. This thread profile page shows the thread statistics for: Total Authors, Total Thread Posts, and Thread Activity