Had an idea - looking for a math buff to tell me if it's possible with today's technology.
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end. The real question here is how long would it take for a regular computer to do this kind of math? Just a weird idea I had. If it's a good idea then please consider this intellectual property. LOL -- Landon Stewart <LStewart@SUPERB.NET> SuperbHosting.Net by Superb Internet Corp. Toll Free (US/Canada): 888-354-6128 x 4199 Direct: 206-438-5879 Web hosting and more "Ahead of the Rest": http://www.superbhosting.net
We call that "Compression." -j On Wed, May 18, 2011 at 1:07 PM, Landon Stewart <lstewart@superb.net> wrote:
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end.
The real question here is how long would it take for a regular computer to do this kind of math?
Just a weird idea I had. If it's a good idea then please consider this intellectual property. LOL
-- Landon Stewart <LStewart@SUPERB.NET> SuperbHosting.Net by Superb Internet Corp. Toll Free (US/Canada): 888-354-6128 x 4199 Direct: 206-438-5879 Web hosting and more "Ahead of the Rest": http://www.superbhosting.net
That's basically what compression is. Except rarely (read: never) does your Real Data (tm) fit just one equation, hence the various compression algorithms that look for patterns etc etc. -J On Wed, May 18, 2011 at 4:07 PM, Landon Stewart <lstewart@superb.net> wrote:
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end.
The real question here is how long would it take for a regular computer to do this kind of math?
Just a weird idea I had. If it's a good idea then please consider this intellectual property. LOL
-- Landon Stewart <LStewart@SUPERB.NET> SuperbHosting.Net by Superb Internet Corp. Toll Free (US/Canada): 888-354-6128 x 4199 Direct: 206-438-5879 Web hosting and more "Ahead of the Rest": http://www.superbhosting.net
"Compression" is one result. But this is sometimes referred to as the "inverse problem": Given a set of data tell me a function which fits it (or fits it to some tolerance.) It's important in statistics and all kinds of data analyses. Another area is fourier transforms which basically sums sine waves of different amp/freq until you reach the desired fit. This is also the basis of a lot of noise filtering algorithms, throw out the frequencies you don't want, such as 60HZ or 50HZ, or all those smaller than you consider interesting, high-freq "noise", or low freq noise, whatever. Another buzz term is "data entropy", randomness. If the data were perfectly random then there exists no such function which can be represented in less bits than the original data, which is why you can't compress a compressed file indefinitely and also why it's recommended you compress files before encrypting them, it's hard to begin cracking a file which is pretty close to random. And this is what you do when you give something like a MARC or ISBN or Dewey Decimal index to a librarian and s/he brings you the book you want. Effectively you've represented the entire book as that small "number". Imagine if you had to recite the entire text of a book to find it unambiguously! See: Transfinite Number Systems. -- -Barry Shein The World | bzs@TheWorld.com | http://www.TheWorld.com Purveyors to the Trade | Voice: 800-THE-WRLD | Dial-Up: US, PR, Canada Software Tool & Die | Public Access Internet | SINCE 1989 *oo*
Just a weird idea I had. If it's a good idea then please consider this intellectual property.
It's easy .. the zeros are fatter than the ones. http://dilbert.com/strips/comic/2004-12-09/ ~Mike.
On Wed, May 18, 2011 at 4:18 PM, Michael Holstein <michael.holstein@csuohio.edu> wrote:
Just a weird idea I had. If it's a good idea then please consider this intellectual property.
It's easy .. the zeros are fatter than the ones.
no no no.. it's simply, since the OP posited a math solution, md5. ship the size of file + hash, compute file on the other side. All files can be moved anywhere regardless of the size of the file in a single packet. The solution is left as an exercise for the reader.
In a message written on Wed, May 18, 2011 at 04:33:34PM -0400, Christopher Morrow wrote:
no no no.. it's simply, since the OP posited a math solution, md5. ship the size of file + hash, compute file on the other side. All files can be moved anywhere regardless of the size of the file in a single packet.
The solution is left as an exercise for the reader.
Bah, you should include the solution, it's so trivial. Generate all possible files and then do an index lookup on the MD5. It's a little CPU heavy, but darn simple to code. You can even stop when you get a match, which turns out to be a HUGE optimization. :) -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
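[For anyone tempted to take the joke literally, here is a sketch in Python, shrunk to 2-byte "files" so the search space is 65,536 candidates rather than 2^8,000,000,000. Purely illustrative; names are made up.]

    import hashlib
    from itertools import product

    def md5_decompress(target_hash, size):
        # Generate all possible files of the given size, stopping at the
        # first whose MD5 matches -- the "HUGE optimization".
        for candidate in product(range(256), repeat=size):
            data = bytes(candidate)
            if hashlib.md5(data).hexdigest() == target_hash:
                return data
        raise ValueError("no preimage found")

    original = b"hi"
    print(md5_decompress(hashlib.md5(original).hexdigest(), len(original)))  # b'hi'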
On May 18, 2011, at 4:03 PM, Leo Bicknell wrote:
Bah, you should include the solution, it's so trivial.
Generate all possible files and then do an index lookup on the MD5. It's a little CPU heavy, but darn simple to code.
Isn't this essentially what Dropbox has been doing in many cases? Chris -- Chris Owen, President, Hubris Communications Inc (www.hubris.net) - Garden City (620) 275-1900 - Wichita (316) 858-3000. Lottery (noun): a stupidity tax.
I know you're having fun with him, but I think what the original poster had in mind was more like thinking of a file as just a string of numbers. Create an equation that generates that string of numbers, send the equation, regenerate the string on the other end. Of course, if it were that easy, someone would already have done it (or who knows, IBM might have done this decades ago, put it on a virtual shelf in their IP closet, and forgotten about it...apparently they do that sort of thing all the time). Compression is mathematically akin to cryptography, with the compressed file being a huge seed with a standard algorithm (and a very weak one by modern cryptography standards, sure, but imagine someone trying to figure out a .zip file in the 50s). Jamie -----Original Message----- From: Leo Bicknell [mailto:bicknell@ufp.org] Sent: Wednesday, May 18, 2011 5:03 PM To: nanog Subject: Re: Had an idea - looking for a math buff to tell me if it's possible with today's technology. In a message written on Wed, May 18, 2011 at 04:33:34PM -0400, Christopher Morrow wrote:
no no no.. it's simply, since the OP posited a math solution, md5. ship the size of file + hash, compute file on the other side. All files can be moved anywhere regardless of the size of the file in a single packet.
The solution is left as an exercise for the reader.
Bah, you should include the solution, it's so trivial. Generate all possible files and then do an index lookup on the MD5. It's a little CPU heavy, but darn simple to code. You can even stop when you get a match, which turns out to be a HUGE optimization. :) -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
On May 19, 2011, at 9:48:35 AM, Jamie Bowden wrote:
I know you're having fun with him, but I think what the original poster had in mind was more like thinking of a file as just a string of numbers. Create an equation that generates that string of numbers, send equation, regenerate string on other end. Of course, if it was that easy, someone would already have done it
Yes. I guess I was too terse with my answer, but this is known as Kolmogorov complexity. It's a well-known concept, and in general you can't construct such equations/programs/what-have-yous. Wikipedia even gives a proof of that... --Steve Bellovin, https://www.cs.columbia.edu/~smb
no no no.. it's simply, since the OP posited a math solution, md5. ship the size of file + hash, compute file on the other side. All files can be moved anywhere regardless of the size of the file in a single packet.
MD5 compression is lossy in this context. Given big enough files you're going to start seeing hash collisions.
MD5 compression is lossy in this context. Given big enough files you're going to start seeing hash collisions.
Actually, for an n-bit hash, I can guarantee to find collisions in the universe of files just n+1 bits in size :)
On Wed, May 18, 2011 at 3:33 PM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Wed, May 18, 2011 at 4:18 PM, Michael Holstein <michael.holstein@csuohio.edu> wrote:
Just a weird idea I had. If it's a good idea then please consider this intellectual property.
It's easy .. the zeros are fatter than the ones.
no no no.. it's simply, since the OP posited a math solution, md5. ship the size of file + hash, compute file on the other side. All files can be moved anywhere regardless of the size of the file in a single packet.
The solution is left as an exercise for the reader.
You would need a lot of computing power to generate a file of any decent size. If you want to be evil then you could send just an MD5 hash and a SHA-512 hash (or some other pair of hashes that would not collide simultaneously except on the correct data).
On Wed, May 18, 2011 at 9:33 PM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
no no no.. it's simply, since the OP posited a math solution, md5. ship the size of file + hash, compute file on the other side. All files can be moved anywhere regardless of the size of the file in a single packet.
The only problem then is hash collisions.
On Wed, May 18, 2011 at 4:07 PM, Landon Stewart <lstewart@superb.net> wrote:
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end.
The real question here is how long would it take for a regular computer to do this kind of math?
The real question is whether this is possible. And the short answer is No, at least not in general.
Now if your file has patterns that make it compressible, you can make it smaller, but not all files can be compressed this way, at least not in a way that makes them smaller. To understand why, consider the case of a file of one byte, or 8 bits. There are 256 possible files of this size: 00000000, 00000001, 00000010, ..., 11111101, 11111110, 11111111. Since each code we send must generate a unique file (or what's the point?), we need 256 different codes to represent each possible file, and the shortest general way to write 256 different codes is still 8 bits long. Now, we can use coding schemes and say that the one-bit value 1 represents 11111111 because that file happens a lot. Then we could use 01 to represent something else, but we can't use 1 at the beginning again because we couldn't tell that from the file named by 1. Bottom line, for some codes to be shorter than the file they represent, others must be longer... So if files have a lot of repetition, you can get a win, but for "random" data, not so much :(
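[The counting argument above can be checked directly: there are 2^8 = 256 one-byte files, but only 255 distinct codes of fewer than 8 bits (even counting the empty code), so at least one file must get a code at least as long as itself. A few lines of Python:]

    n = 8
    files = 2 ** n                                  # 256 possible one-byte files
    shorter_codes = sum(2 ** k for k in range(n))   # 1 + 2 + 4 + ... + 128 = 255 codes under n bits
    print(files - shorter_codes)                    # 1 -> at least one file can't shrink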
On Wednesday, May 18, 2011 at 2:18 PM, Dorn Hetzel wrote:
On Wed, May 18, 2011 at 4:07 PM, Landon Stewart <lstewart@superb.net> wrote:
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end.
The real question here is how long would it take for a regular computer to do this kind of math?
The real question is whether this is possible. And the short answer is No, at least not in general. Exactly: What you run up against is that you can reduce extraneous information, and compress redundant information, but if you actually have dense information, you're not gonna get any better.
So easy to compress a billion bytes of JSON or XML significantly; not so much a billion bytes of already tightly coded movie. ---- Aria Stewart
-----Original Message----- From: Landon Stewart [mailto:lstewart@superb.net] Sent: Wednesday, May 18, 2011 4:08 PM To: nanog Subject: Had an idea - looking for a math buff to tell me if it's possible with today's technology.
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end.
Not exactly the same thing, but application acceleration of this sort has been around for some time - http://www.riverbed.com/us/ http://www.juniper.net/us/en/products-services/application-acceleration/wxc-series/ http://www.cisco.com/en/US/products/ps5680/Products_Sub_Category_Home.html Stefan Fouant
"Stefan Fouant" <sfouant@shortestpathfirst.net> wrote on 05/18/2011 04:19:26 PM:
Let's say you had a file that was 1,000,000,000 characters consisting of
http://www.juniper.net/us/en/products-services/application-acceleration/wxc-series/
http://www.cisco.com/en/US/products/ps5680/Products_Sub_Category_Home.html You also need to include Silver Peak. http://www.silver-peak.com/ Saw a very interesting presentation on their techniques. Joe
Wildly off-topic for the NANOG mailing-list, as it has -zero- relevance to 'network operations'
Date: Wed, 18 May 2011 13:07:32 -0700 Subject: Had an idea - looking for a math buff to tell me if it's possible with today's technology. From: Landon Stewart <lstewart@superb.net> To: nanog <nanog@nanog.org>
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end.
The real question here is how long would it take for a regular computer to do this kind of math?
I have, on my computer, an encoder/decoder that does _exactly_ that. Both the encoder and decoder are _amazingly_ fast -- as fast as a file copy, in fact. The average size of the transmitted files, across all possible input files, is exactly 100% of the size of the input files. (One *cannot* do better than that, across all possible inputs -- see the 'counting' problem in data-compression theory.)
Just a weird idea I had. If it's a good idea then please consider this intellectual property. LOL
'Weird' is one word for it. You might want to read up on the subject of 'data compression', to get an idea of how things work. See also "polynomial curve-fitting", for the real-world limits of your theory.
On May 18, 2011, at 4:07:32 PM, Landon Stewart wrote:
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end.
The real question here is how long would it take for a regular computer to do this kind of math?
Just a weird idea I had. If it's a good idea then please consider this intellectual property. LOL
http://en.wikipedia.org/wiki/Kolmogorov_complexity --Steve Bellovin, https://www.cs.columbia.edu/~smb
The concept is called fractal compression: you compress the image, send the values, and recreate the image from them. There was a body of work on the subject, I would say in the mid to late eighties, when two Georgia Tech professors started a company doing it. John (ISDN) Lee On Wed, May 18, 2011 at 4:07 PM, Landon Stewart <lstewart@superb.net> wrote:
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end.
The real question here is how long would it take for a regular computer to do this kind of math?
Just a weird idea I had. If it's a good idea then please consider this intellectual property. LOL
-- Landon Stewart <LStewart@SUPERB.NET> SuperbHosting.Net by Superb Internet Corp. Toll Free (US/Canada): 888-354-6128 x 4199 Direct: 206-438-5879 Web hosting and more "Ahead of the Rest": http://www.superbhosting.net
On Wed, 18 May 2011 13:07:32 -0700 Landon Stewart <lstewart@superb.net> wrote:
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end.
The real question here is how long would it take for a regular computer to do this kind of math?
Just a weird idea I had. If it's a good idea then please consider this intellectual property. LOL
I believe they call this 'compression'. William
-----Original Message----- From: Landon Stewart [mailto:lstewart@superb.net] Sent: Wednesday, May 18, 2011 1:08 PM To: nanog Subject: Had an idea - looking for a math buff to tell me if it's possiblewith today's technology.
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file.
Congratulations. You have just invented compression.
On Wed, May 18, 2011 at 1:44 PM, George Bonser <gbonser@seven.com> wrote:
Congratulations. You have just invented compression.
Woot. -- Landon Stewart <LStewart@SUPERB.NET> SuperbHosting.Net by Superb Internet Corp. Toll Free (US/Canada): 888-354-6128 x 4199 Direct: 206-438-5879 Web hosting and more "Ahead of the Rest": http://www.superbhosting.net
I wonder if this is possible:

- Take a hash of the original file. Keep a counter.
- Generate data in some sequential method on sender side (for example simply starting at 0 and iterating until you generate the same as the original data)
- Each time you iterate, take the hash of the generated data. If it matches the hash of the original file, increment counter.
- Send the hash and the counter value to recipient.
- Recipient performs same sequential generation method, stopping when counter reached.

Any thoughts? Heath On 18 May 2011 21:07, Landon Stewart <lstewart@superb.net> wrote:
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end.
The real question here is how long would it take for a regular computer to do this kind of math?
Just a weird idea I had. If it's a good idea then please consider this intellectual property. LOL
-- Landon Stewart <LStewart@SUPERB.NET> SuperbHosting.Net by Superb Internet Corp. Toll Free (US/Canada): 888-354-6128 x 4199 Direct: 206-438-5879 Web hosting and more "Ahead of the Rest": http://www.superbhosting.net
On Thu, 19 May 2011 00:26:26 BST, Heath Jones said:
I wonder if this is possible:
- Take a hash of the original file. Keep a counter.
- Generate data in some sequential method on sender side (for example simply starting at 0 and iterating until you generate the same as the original data)
- Each time you iterate, take the hash of the generated data. If it matches the hash of the original file, increment counter.
- Send the hash and the counter value to recipient.
- Recipient performs same sequential generation method, stopping when counter reached.
MD5 is a 128 bit hash. 2^128 is 340,282,366,920,938,463,463,374,607,431,768,211,456 - you're welcome to iterate that many times to find a duplicate. You may get lucky and get a hit in the first trillion or so attempts - but you may get unlucky and not get a hit until the *last* few trillion attempts. On average you'll have to iterate about half that huge number before you get a hit.

And it's lossy - if you hash all the possible 4K blocks with MD5, you'll find that each of those 2^128 hashes has been hit about 2^32640 times (there are 2^32768 possible 4K blocks but only 2^128 hash values) - and no indication in the hash of *which* of the colliding 4K blocks you have on this iteration. (The only reason that companies can do block-level de-duplication by saving a hash as an index to one copy shared by all blocks with the same hash value is because you have a *very small* fraction of the possibilities covered, so if you saved a 4K block of data from somebody's system32 folder under a given MD5 hash, it's *far* more likely that another block with that same hash is from another copy of another identical system32 folder, than it is an actual accidental collision.)

Protip: A good hash function is by definition one-way - given the data, it's easy to generate the hash - but reversing it to find the "pre-image" (the data that *generated* the hash) is massively difficult.
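[The block-level arithmetic above, worked out in Python: a 4 KB block is 32,768 bits, so the possible inputs outnumber the 128-bit hash values by a factor of 2^32640.]

    blocks = 2 ** (4096 * 8)       # every possible 4 KB block
    hashes = 2 ** 128              # every possible MD5 output
    per_hash = blocks // hashes    # average number of blocks sharing one hash
    print(per_hash == 2 ** 32640)  # True -- each hash value has ~2^32640 preimages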
My point here is it IS possible to transfer just a hash and counter value and effectively generate identical data at the remote end. The limit that will be hit is the difficulty of generating and comparing hash values with current processing power. I'm proposing iterating through generated data up until the actual data. It's not even a storage issue, as once you have incremented the data you don't need to store old data or hash values - just the counter. No massive hash tables. It's a CPU issue. Heath On 19 May 2011 00:42, <Valdis.Kletnieks@vt.edu> wrote:
On Thu, 19 May 2011 00:26:26 BST, Heath Jones said:
I wonder if this is possible:
- Take a hash of the original file. Keep a counter.
- Generate data in some sequential method on sender side (for example simply starting at 0 and iterating until you generate the same as the original data)
- Each time you iterate, take the hash of the generated data. If it matches the hash of the original file, increment counter.
- Send the hash and the counter value to recipient.
- Recipient performs same sequential generation method, stopping when counter reached.
MD5 is a 128 bit hash.
2^128 is 340,282,366,920,938,463,463,374,607,431,768,211,456 - you're welcome to iterate that many times to find a duplicate. You may get lucky and get a hit in the first trillion or so attempts - but you may get unlucky and not get a hit until the *last* few trillion attempts. On average you'll have to iterate about half that huge number before you get a hit.
And it's lossy - if you hash all the possible 4K blocks with MD5, you'll find that each of those 2^128 hashes has been hit about 2^32640 times (there are 2^32768 possible 4K blocks but only 2^128 hash values) - and no indication in the hash of *which* of the colliding 4K blocks you have on this iteration. (The only reason that companies can do block-level de-duplication by saving a hash as an index to one copy shared by all blocks with the same hash value is because you have a *very small* fraction of the possibilities covered, so if you saved a 4K block of data from somebody's system32 folder under a given MD5 hash, it's *far* more likely that another block with that same hash is from another copy of another identical system32 folder, than it is an actual accidental collision.)
Protip: A good hash function is by definition one-way - given the data, it's easy to generate the hash - but reversing it to find the "pre-image" (the data that *generated* the hash) is massively difficult.
On Wednesday, May 18, 2011 at 6:01 PM, Heath Jones wrote:
My point here is it IS possible to transfer just a hash and counter value and effectively generate identical data at the remote end. The limit that will be hit is the difficulty of generating and comparing hash values with current processing power.
I'm proposing iterating through generated data up until the actual data. It's not even a storage issue, as once you have incremented the data you don't need to store old data or hash values - just the counter. No massive hash tables.
It's a CPU issue.
Google "Birthday paradox" and "hash collision" ---- Aria Stewart
My point here is it IS possible to transfer just a hash and counter value and effectively generate identical data at the remote end. The limit that will be hit is the difficulty of generating and comparing hash values with current processing power.
I'm proposing iterating through generated data up until the actual data. It's not even a storage issue, as once you have incremented the data you don't need to store old data or hash values - just the counter. No massive hash tables.
It's a CPU issue.
Heath
On 19 May 2011 00:42, <Valdis.Kletnieks@vt.edu> wrote:
On Thu, 19 May 2011 00:26:26 BST, Heath Jones said:
I wonder if this is possible:
- Take a hash of the original file. Keep a counter.
- Generate data in some sequential method on sender side (for example simply starting at 0 and iterating until you generate the same as the original data)
- Each time you iterate, take the hash of the generated data. If it matches the hash of the original file, increment counter.
- Send the hash and the counter value to recipient.
- Recipient performs same sequential generation method, stopping when counter reached.
MD5 is a 128 bit hash.
2^128 is 340,282,366,920,938,463,463,374,607,431,768,211,456 - you're welcome to iterate that many times to find a duplicate. You may get lucky and get a hit in the first trillion or so attempts - but you may get unlucky and not get a hit until the *last* few trillion attempts. On average you'll have to iterate about half that huge number before you get a hit.
And it's lossy - if you hash all the possible 4K blocks with MD5, you'll find that each of those 2^128 hashes has been hit about 2^32640 times (there are 2^32768 possible 4K blocks but only 2^128 hash values) - and no indication in the hash of *which* of the colliding 4K blocks you have on this iteration. (The only reason that companies can do block-level de-duplication by saving a hash as an index to one copy shared by all blocks with the same hash value is because you have a *very small* fraction of the possibilities covered, so if you saved a 4K block of data from somebody's system32 folder under a given MD5 hash, it's *far* more likely that another block with that same hash is from another copy of another identical system32 folder, than it is an actual accidental collision.)
Protip: A good hash function is by definition one-way - given the data, it's easy to generate the hash - but reversing it to find the "pre-image" (the data that *generated* the hash) is massively difficult.
Why is this on nanog? Yes, it is "possible", but the CPU use and time will be absurd compared to just sending the data across the network. I would say attempting this with even a small file will end up laughable. Passwords are just several bytes and have significant lifetimes. -- Justin Cook On 19 May 2011 01:03, "Heath Jones" <hj1980@gmail.com> wrote:
On Thu, 19 May 2011 01:01:43 BST, Heath Jones said:
My point here is it IS possible to transfer just a hash and counter value and effectively generate identical data at the remote end.
Nope. Let's use phone numbers as an example. I want to send you the phone number 540-231-6000. The hash function is "number mod 17 plus 5". So 5402316000 mod 17 plus 5 is '7'. Yes, it's a poor hash function, except it has two nice features - it can be worked with pencil and paper or a calculator, and it has similar output distributions to really good hash functions (math geeks would say it's an "onto function", but not a "one-to-one" function). http://www.regentsprep.org/Regents/math/algtrig/ATP5/OntoFunctions.htm Go read that, and get your mind wrapped around onto and one-to-one. Almost all good hashes are onto, and almost none are one-to-one.

OK. counter = 0. Hash that, we get 5. Increment and hash, we get 6. Increment and hash, we get 7. If we keep incrementing and hashing, we'll also get 7 for 19, 36, 53, 70, and roughly 317,783,289 other numbers before you get to my phone number. Now if I send you 2 and 7, how do you get that phone number back out, and be sure you wanted *that* phone number and not 212-555-3488, which *also* ends up with a hash of 7, so you'd send a counter of 2? Or a number in Karubadam, Tajikistan that starts with +992 3772 but also hashes to 7?

The problem is that if the number of input values is longer than the hash output, there *will* be collisions. The hash function above generates 17 numbers from 5 to 22 - if you try to hash an 18th number, it *has* to collide with a number already used. Think of a game of musical chairs, which is interesting only because it's an "onto" function (every chair gets a butt mapped to it), but it's not one-to-one (not all chairs have *exactly one* butt aimed at them).

(And defining the hash function so that it's one-to-one and every possible input value generates a different output value doesn't work either - because at that point, the only counter that generates the same hash as the number you're trying to send *is that number*. So if 5552316000 generates a hash value of 8834253743, you'll hash 0, 1, 2, 3, ... and only get that same hash again when you get to the phone number. Then you send me "5552316000,8834253743" and I hash some 5,552,315,999 other numbers till I reach the phone number... which you sent me already as the counter value.)

tl;dr: If the hash function is onto but not one-to-one, you get collisions that you can't resolve. And if the hash function *is* one-to-one, you end up sending a counter that's equal to the data.
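[The toy hash above is easy to run; here it is in Python, showing the pile-up of colliding inputs that makes the (hash, counter) pair ambiguous.]

    def toy_hash(n):
        return n % 17 + 5

    print(toy_hash(5402316000))                         # 7
    print(toy_hash(2125553488))                         # 7 as well -- a collision
    print([n for n in range(100) if toy_hash(n) == 7])  # [2, 19, 36, 53, 70, 87]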
My point here is it IS possible to transfer just a hash and counter value and effectively generate identical data at the remote end.
Nope. Let's use phone numbers as an example. I want to send you the phone
number 540-231-6000. The hash function is "number mod 17 plus 5". So 5402316000 mod 17 plus 5 is '7'.
OK. counter = 0. Hash that, we got 5. increment and hash, we get 6. Increment and hash, we got 7. If we keep incrementing and hashing, we'll also get 7 for 19, 36, 53, 70, and roughly 317,783,289 other numbers before you get to my phone number.
Now if I send you 2 and 7, how do you get that phone number back out, and be sure you wanted *that* phone number and not 212-555-3488, which *also* ends up with a hash of 7, so you'd send a counter of 2?
The correct values I would send for that hash function are 7 and approximately 317,783,289: the counter is incremented each time a generated value's hash matches the hash of the data to be communicated - I'm *not* hashing the counter itself. Example: I want to send you the number 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000. The MD5 hash of this is f59a3651eafa7c4dbbb547dd7d6b41d7. I generate data 0, 1, 2, 3, 4, 5... all the way up to 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000, observing the hash value of the data just generated each time. Whenever the hash matches f59a3651eafa7c4dbbb547dd7d6b41d7, I increment a counter. Once I have reached the number I want to send you, I send the hash value and the counter value. You perform the same function starting at 0 and working your way up until you have a matching counter value. The number of collisions in the range 0 -> target is represented by the counter value, and as long as both sides are performing the same sequence this will work. Obviously this is completely crazy and would never happen with current processing power... It's just theoretical nonsense, but it answers the OP's question.
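[The scheme does work mechanically. Here is a runnable miniature in Python, with the search space shrunk to small integers and MD5 truncated to 8 bits so collisions actually occur; the counter records how many earlier values collide with the target's hash. All names and sizes here are illustrative choices.]

    import hashlib

    def h(n):
        return hashlib.md5(str(n).encode()).hexdigest()[:2]  # truncated 8-bit hash

    def encode(target):
        want = h(target)
        count = sum(1 for n in range(target) if h(n) == want)  # collisions before target
        return want, count

    def decode(want, count):
        seen, n = 0, 0
        while True:
            if h(n) == want:
                if seen == count:
                    return n
                seen += 1
            n += 1

    digest, counter = encode(1000)
    print(decode(digest, counter))  # 1000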
On Wed, May 18, 2011 at 10:42 PM, Heath Jones <hj1980@gmail.com> wrote:
Example: I want to send you the number 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000. The MD5 hash of this is f59a3651eafa7c4dbbb547dd7d6b41d7. I generate data 0, 1, 2, 3, 4, 5... all the way up to 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000, observing the hash value of the data just generated each time. Whenever the hash matches f59a3651eafa7c4dbbb547dd7d6b41d7, I increment a counter. Once I have reached the number I want to send you, I send the hash value and the counter value.
You perform the same function starting at 0 and working your way up until you have a matching counter value. The number of collisions in the range 0 -> target is represented by the counter value, and as long as both sides are performing the same sequence this will work.
Obviously this is completely crazy and would never happen with current processing power... It's just theoretical nonsense, but answers the OP's question.
The point here, however, is that for most cases, the hash f59a3651eafa7c4dbbb547dd7d6b41d7 plus the length of the counter will amount to sending more data than just transmitting the number. For example, MD5 is a 16-byte hash. For a 20-byte value, you'll need to transmit a 16-byte hash plus a counter. Across all 20-byte values, with 2^128 hash outputs, each hash output will have on average 2^32 possible input values, and a counter to store that will need to be 32 bits long. So you're back where you started. Sometimes this might even take more space: say there are 0 strings which hash to 0000...0, 2^33 which hash to 0000...1, and 2^32 that hash to each other value; if you're trying to transmit the last item that hashes to 0000...1, you'll actually need more space. Your compression algorithm just became an inflation algorithm. --Dan
My point here is it IS possible to transfer just a hash and counter value and effectively generate identical data at the remote end. The limit that will be hit is the difficulty of generating and comparing hash values with current processing power. I'm proposing iterating through generated data up until the actual data. It's not even a storage issue, as once you have incremented the data you don't need to store old data or hash values - just the counter. No massive hash tables. It's a CPU issue. On 19 May 2011 00:42, <Valdis.Kletnieks@vt.edu> wrote:
On Thu, 19 May 2011 00:26:26 BST, Heath Jones said:
I wonder if this is possible:
- Take a hash of the original file. Keep a counter.
- Generate data in some sequential method on sender side (for example simply starting at 0 and iterating until you generate the same as the original data)
- Each time you iterate, take the hash of the generated data. If it matches the hash of the original file, increment counter.
- Send the hash and the counter value to recipient.
- Recipient performs same sequential generation method, stopping when counter reached.
MD5 is a 128 bit hash.
2^128 is 340,282,366,920,938,463,463,374,607,431,768,211,456 - you're welcome to iterate that many times to find a duplicate. You may get lucky and get a hit in the first trillion or so attempts - but you may get unlucky and not get a hit until the *last* few trillion attempts. On average you'll have to iterate about half that huge number before you get a hit.
And it's lossy - if you hash all the possible 4K blocks with MD5, you'll find that each of those 2^128 hashes has been hit about 2^32640 times (there are 2^32768 possible 4K blocks but only 2^128 hash values) - and no indication in the hash of *which* of the colliding 4K blocks you have on this iteration. (The only reason that companies can do block-level de-duplication by saving a hash as an index to one copy shared by all blocks with the same hash value is because you have a *very small* fraction of the possibilities covered, so if you saved a 4K block of data from somebody's system32 folder under a given MD5 hash, it's *far* more likely that another block with that same hash is from another copy of another identical system32 folder, than it is an actual accidental collision.)
Protip: A good hash function is by definition one-way - given the data, it's easy to generate the hash - but reversing it to find the "pre-image" (the data that *generated* the hash) is massively difficult.
On Wed, May 18, 2011 at 8:03 PM, Heath Jones <hj1980@gmail.com> wrote:
My point here is it IS possible to transfer just a hash and counter value and effectively generate identical data at the remote end. The limit that will be hit is the difficulty of generating and comparing hash values with current processing power.
I'm proposing iterating through generated data up until the actual data. It's not even a storage issue, as once you have incremented the data you don't need to store old data or hash values - just the counter. No massive hash tables.
It's a CPU issue.
I'd note it took you many more packets than my example of roughly the same thing. If you really want to save bandwidth, my 1-packet answer is the best answer.
On Thu, May 19, 2011 at 12:26:26AM +0100, Heath Jones wrote:
I wonder if this is possible:
- Take a hash of the original file. Keep a counter.
- Generate data in some sequential method on sender side (for example simply starting at 0 and iterating until you generate the same as the original data)
- Each time you iterate, take the hash of the generated data. If it matches the hash of the original file, increment counter.
- Send the hash and the counter value to recipient.
- Recipient performs same sequential generation method, stopping when counter reached.
Any thoughts?
That will work. Of course, the CPU usage will be overwhelming -- longer than the age of the universe to do a large file -- but, theoretically, with enough CPU power, it will work. For an 8,000,000,000-bit file and a 128-bit hash, you will need a counter of at least 7,999,999,872 bits to cover the number of possible collisions. So you will need at least 7,999,999,872 + 128 = 8,000,000,000 bits to send your 8,000,000,000-bit file. If your goal is to reduce the number of bits you send, this wouldn't be a good choice. -- Brett
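[Brett's accounting, as a few lines of arithmetic: with an N-bit file and a k-bit hash, about 2^(N-k) files share each hash value, so the collision counter needs roughly N-k bits, and counter plus hash add right back up to N.]

    N, k = 8_000_000_000, 128      # file size in bits, MD5 output in bits
    counter_bits = N - k           # bits needed to index ~2^(N-k) collisions
    print(counter_bits)            # 7999999872
    print(counter_bits + k == N)   # True -- no bits saved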
Ha! I was wondering this the whole time - whether the size of the counter would make it a zero-sum game. That sux! :) On 19 May 2011 03:52, Brett Frankenberger <rbf+nanog@panix.com> wrote:
On Thu, May 19, 2011 at 12:26:26AM +0100, Heath Jones wrote:
I wonder if this is possible:
- Take a hash of the original file. Keep a counter.
- Generate data in some sequential method on sender side (for example simply starting at 0 and iterating until you generate the same as the original data)
- Each time you iterate, take the hash of the generated data. If it matches the hash of the original file, increment counter.
- Send the hash and the counter value to recipient.
- Recipient performs same sequential generation method, stopping when counter reached.
Any thoughts?
That will work. Of course, the CPU usage will be overwhelming -- longer than the age of the universe to do a large file -- but, theoretically, with enough CPU power, it will work.
For an 8,000,000,000-bit file and a 128-bit hash, you will need a counter of at least 7,999,999,872 bits to cover the number of possible collisions.
So you will need at least 7,999,999,872 + 128 = 8,000,000,000 bits to send your 8,000,000,000-bit file. If your goal is to reduce the number of bits you send, this wouldn't be a good choice.
-- Brett
In a message written on Wed, May 18, 2011 at 09:52:22PM -0500, Brett Frankenberger wrote:
That will work. Of course, the CPU usage will be overwhelming -- longer than the age of the universe to do a large file -- but, theoretically, with enough CPU power, it will work.
You have a different definition of "work" than I do. If it can't finish before the universe ends I don't think it works. :) -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
-----Original Message----- From: Leo Bicknell [mailto:bicknell@ufp.org] Sent: 19 May 2011 14:10 To: nanog Subject: Re: Had an idea - looking for a math buff to tell me if it's possible with today's technology.
In a message written on Wed, May 18, 2011 at 09:52:22PM -0500, Brett Frankenberger wrote:
That will work. Of course, the CPU usage will be overwhelming -- longer than the age of the universe to do a large file -- but, theoretically, with enough CPU power, it will work.
You have a different definition of "work" than I do. If it can't finish before the universe ends I don't think it works. :)
You obviously do not read enough SciFi. By then (whenever then is) sub-picosecond optical quantum computers will be able to solve such problems before you knew they were problems ;-) -- Leigh Porter
Try ITU V.42bis. Iridescent iPhone +1 972 757 8894 On May 18, 2011, at 15:07, Landon Stewart <lstewart@superb.net> wrote:
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end.
The real question here is how long would it take for a regular computer to do this kind of math?
Just a weird idea I had. If it's a good idea then please consider this intellectual property. LOL
-- Landon Stewart <LStewart@SUPERB.NET> SuperbHosting.Net by Superb Internet Corp. Toll Free (US/Canada): 888-354-6128 x 4199 Direct: 206-438-5879 Web hosting and more "Ahead of the Rest": http://www.superbhosting.net
To do this, you only need 2 numbers: the nth digit of pi and the number of digits. Simply convert your message into a single extremely long integer. Somewhere, in the digits of pi, you will find a matching series of digits the same as your integer! Decompressing the number is relatively easy after some sort-of recent advances in our understanding of pi. Finding out what those 2 numbers are--- well, we still have a ways to go on that.

Despite the ridiculousness of this example, it does illustrate to the author that there are extremes of compression. The "single mathematical formula" compression method is possible, even trivial. However, the computation time for compression may be unreasonably large.

Here is another ridiculous way to compress data: convert your data into a series of coordinates in a Mandelbrot fractal set. The final picture is a fixed size, regardless of the size of your starting data set. From that final picture, you should be able to retrieve your original starting data set. Good luck!
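[A toy of the pi-index idea is actually runnable for short messages, assuming the third-party mpmath library for arbitrary-precision digits (the library and the 100,000-digit window are just one possible choice). Note how the printed index tends to be about as many digits as the message itself, which is the whole problem the follow-ups point out.]

    from mpmath import mp

    mp.dps = 100_000               # work with 100,000 decimal digits of pi
    pi_digits = str(mp.pi)[2:]     # drop the leading "3."

    message = "2011"               # the 4-digit "file" to compress
    pos = pi_digits.find(message)  # -1 if absent from this window
    print(pos, len(str(pos)))      # the index is roughly as long as the message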
On Fri, May 20, 2011 at 06:46:45PM +0000, Eu-Ming Lee wrote:
To do this, you only need 2 numbers: the nth digit of pi and the number of digits.
Simply convert your message into a single extremely long integer. Somewhere, in the digits of pi, you will find a matching series of digits the same as your integer!
Decompressing the number is relatively easy after some sort-of recent advances in our understanding of pi.
Finding out what those 2 numbers are--- well, we still have a ways to go on that.
Even if those problems were solved, you'd need (on average) just as many bits to represent which digit of pi to start with as you'd need to represent the original message. -- Brett
On 05/20/2011 08:53 AM, Brett Frankenberger wrote:
On Fri, May 20, 2011 at 06:46:45PM +0000, Eu-Ming Lee wrote:
To do this, you only need 2 numbers: the nth digit of pi and the number of digits.
Simply convert your message into a single extremely long integer. Somewhere, in the digits of pi, you will find a matching series of digits the same as your integer!
Decompressing the number is relatively easy after some sort-of recent advances in our understanding of pi.
Finding out what those 2 numbers are--- well, we still have a ways to go on that. Even if those problems were solved, you'd need (on average) just as many bits to represent which digit of pi to start with as you'd need to represent the original message.
-- Brett Not quite sure I follow that. "Start at position xyz, carry on for 10000 bits" shouldn't be as long as telling it all 10000 bits?
Paul
On Fri, May 20, 2011 at 09:34:59AM -1000, Paul Graydon wrote:
On 05/20/2011 08:53 AM, Brett Frankenberger wrote:
On Fri, May 20, 2011 at 06:46:45PM +0000, Eu-Ming Lee wrote:
To do this, you only need 2 numbers: the nth digit of pi and the number of digits.
Simply convert your message into a single extremely long integer. Somewhere, in the digits of pi, you will find a matching series of digits the same as your integer!
Decompressing the number is relatively easy after some sort-of recent advances in our understanding of pi.
Finding out what those 2 numbers are--- well, we still have a ways to go on that. Even if those problems were solved, you'd need (on average) just as many bits to represent which digit of pi to start with as you'd need to represent the original message.
Not quite sure I follow that. "Start at position xyz, carry on for 10000 bits" shouldn't be as long as telling it all 10000 bits?
I don't know about "should", but it *will* be when "xyz" is greater than 2^10000 (or about 10^3000). Your intuition is probably telling you that "xyz" won't likely be a 3000 digit (or longer) number, but if so, your intuition is wrong. -- Brett
On Fri, May 20, 2011 at 09:34:59AM -1000, Paul Graydon said:
Not quite sure I follow that. "Start at position xyz, carry on for 10000 bits" shouldn't be as long as telling it all 10000 bits?
what position # do you think your exact 10000 bits will appear at? (In fact, mathies, what's the probability density function for a string of digits of length N appearing in pi's digits per M digits?) Find M/N and there's your answer - it might well be cheaper to express the 10000 bits themselves than a 100,000-bit-long position # in pi. You can't exabyte-attack all possible integers, ya know. /kc -- Ken Chase - ken@heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.
On 5/20/2011 12:44 PM, Ken Chase wrote:
On Fri, May 20, 2011 at 09:34:59AM -1000, Paul Graydon said:
Not quite sure I follow that. "Start at position xyz, carry on for 10000 bits" shouldn't be as long as telling it all 10000 bits?
what position # do you think your exact 10000 bits will appear at?
(In fact, mathies, what's the probability density function for a string of digits of length N appearing in pi's digits per M digits?)
Find M/N and there's your answer - it might well be cheaper to express the 10000 bits themselves than a 100,000-bit-long position # in pi.
Blah. I seriously hate extending this silliness but I can't resist pointing out something that might be useful to someone to solve a real problem someday. Who in their right mind would represent a string of 10**3000 numbers as the full string in what's supposed to be a compression algorithm? And yes, I'm pretty sure I just suggested the proper solution, as did the reference to a 255 bit array, but just in case ... Assume that your string starts at precisely digit number 18,000,000. Advance to position 2**24, advance 1,222,784 digits further, begin recording. Obviously better/more interesting models could be developed by someone who actually cared. :) Doug (you're welcome) -- Nothin' ever doesn't change, but nothin' changes much. -- OK Go Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
On Fri, May 20, 2011 at 09:34:59AM -1000, Paul Graydon wrote:
On 05/20/2011 08:53 AM, Brett Frankenberger wrote:
On Fri, May 20, 2011 at 06:46:45PM +0000, Eu-Ming Lee wrote:
To do this, you only need 2 numbers: the nth digit of pi and the number of digits.
Simply convert your message into a single extremely long integer. Somewhere, in the digits of pi, you will find a matching series of digits the same as your integer!
Decompressing the number is relatively easy after some sort-of recent advances in our understanding of pi.
Finding out what those 2 numbers are--- well, we still have a ways to go on that. Even if those problems were solved, you'd need (on average) just as many bits to represent which digit of pi to start with as you'd need to represent the original message.
-- Brett Not quite sure I follow that. "Start at position xyz, carry on for 10000 bits" shouldn't be as long as telling it all 10000 bits?
This depends strongly on the size of the number expressing "position xyz". Pi is infinitely long, so there is no guarantee that for some random string which can be found starting at "position xyz" in, say, the binary, decimal, or hexadecimal expansion of pi, xyz can be expressed in fewer than 10000 (or indeed any fixed number N) bits. -- Mike Andrews, W5EGO mikea@mikea.ath.cx Tired old sysadmin
On Fri, 20 May 2011 09:34:59 -1000, Paul Graydon said:
Not quite sure I follow that. "Start at position xyz, carry on for 10000 bits" shouldn't be as long as telling it all 10000 bits?
The problem is that the length of 'xyz' will probably be on the same order of magnitude as the length of your message - if it's only 2-3 digits long, you'll probably find a matching location in the first 100 or so digits. But if you need a run that's an exact match for an entire 20K email, you're going to have to go down a long ways to find a match. (protip - compressing the email text first is a big win here)
On 05/20/2011 03:34 PM, Paul Graydon wrote:
On 05/20/2011 08:53 AM, Brett Frankenberger wrote:
Even if those problems were solved, you'd need (on average) just as many bits to represent which digit of pi to start with as you'd need to represent the original message.
-- Brett Not quite sure I follow that. "Start at position xyz, carry on for 10000 bits" shouldn't be as long as telling it all 10000 bits?
Currently we have a compression algorithm for doing this already in widespread use. We create a list of numbers ranging from 0 to 255 and then provide an index into that array. We save space by assuming it's a single character.
I could not help but admire nanog in its full form ;) and I cannot resist anymore. Allow me to suggest the EPR paradox machine. The cost of regenerating unpredictable information is inefficient by orders of magnitude, but wait... isn't that what we are trying to solve? On May 20, 2011, at 1:32 PM, Paul Timmins wrote: On 05/20/2011 03:34 PM, Paul Graydon wrote: On 05/20/2011 08:53 AM, Brett Frankenberger wrote: Even if those problems were solved, you'd need (on average) just as many bits to represent which digit of pi to start with as you'd need to represent the original message. -- Brett Not quite sure I follow that. "Start at position xyz, carry on for 10000 bits" shouldn't be as long as telling it all 10000 bits? Currently we have a compression algorithm for doing this already in widespread use. We create a list of numbers ranging from 0 to 255 and then provide an index into that array. We save space by assuming it's a single character. ____________________________________________ Sudeep Khuraijam | Netops | liveops | Office 408 8442511 | Mobile 408 666 9987 | skhuraijam@liveops.com | aim: skhuraijam
On May 20, 2011, at 5:16 PM, Sudeep Khuraijam wrote:
I could not help but admire nanog in its full form ;) and I cannot resist anymore. Allow me to suggest the EPR paradox machine.
The cost of regenerating unpredictable information is inefficient by orders of magnitude, but wait... isn't that what we are trying to solve?
On May 20, 2011, at 1:32 PM, Paul Timmins wrote:
On 05/20/2011 03:34 PM, Paul Graydon wrote: On 05/20/2011 08:53 AM, Brett Frankenberger wrote: Even if those problems were solved, you'd need (on average) just as many bits to represent which digit of pi to start with as you'd need to represent the original message.
-- Brett Not quite sure I follow that. "Start at position xyz, carry on for 10000 bits" shouldn't be as long as telling it all 10000 bits?
Yes, it will be as long or longer (on average), because you have to represent position XYZ in some fashion, and send that representation to the decoder, and it can easily be longer than the original message. Suppose that it takes 20,000 bits to represent XYZ. How have you saved bits? Having XYZ be longer than the original message is just as likely as having it be shorter. The same problem applies to the original suggestion: you will not (on average) save bits. If typical messages are not totally random, you can compress by considering the nature of that non-randomness and tailoring your compression accordingly. These schemes are using random strings / hashes for their compression, and thus will (on average) not save bits even if a message is highly non-random. Regards Marshall
Currently we have a compression algorithm for doing this already in widespread use. We create a list of numbers ranging from 0 to 255 and then provide an index into that array. We save space by assuming it's a single character.
____________________________________________ Sudeep Khuraijam | Netops | liveops | Office 408 8442511 | Mobile 408 666 9987 | skhuraijam@liveops.com | aim: skhuraijam
On Wed, 18 May 2011 13:07:32 -0700 Landon Stewart <lstewart@superb.net> wrote:
Let's say you had a file that was 1,000,000,000 characters consisting of 8,000,000,000 bits. What if instead of transferring that file through the interwebs you transmitted a mathematical equation to tell a computer on the other end how to *construct* that file. First you'd feed the file into a cruncher of some type to reduce the pattern of 8,000,000,000 bits into an equation somehow. Sure this would take time, I realize that. The equation would then be transmitted to the other computer where it would use its mad-math-skillz to *figure out the answer* which would theoretically be the same pattern of bits. Thus the same file would emerge on the other end.
The real question here is how long would it take for a regular computer to do this kind of math?
Just a weird idea I had. If it's a good idea then please consider this intellectual property. LOL
42 :-)
participants (37)
- Andrew Mulholland
- Aria Stewart
- Barry Shein
- Brett Frankenberger
- Chris Owen
- Chrisjfenton
- Christopher Morrow
- Dan Collins
- Dorn Hetzel
- Doug Barton
- Eu-Ming Lee
- George Bonser
- Gregory Edigarov
- Heath Jones
- Jack Carrozzo
- Jamie Bowden
- Joe Loiacono
- John Adams
- John Lee
- Justin Cook
- Ken Chase
- Landon Stewart
- Leigh Porter
- Leo Bicknell
- Lyndon Nerenberg (VE6BBM/VE7TFX)
- Marshall Eubanks
- Michael Holstein
- mikea
- Paul Graydon
- Paul Timmins
- Philip Dorr
- Robert Bonomi
- Stefan Fouant
- Steven Bellovin
- Sudeep Khuraijam
- Valdis.Kletnieks@vt.edu
- William Pitcock