Revamp md5.py #8065

Merged
merged 46 commits into from
Apr 1, 2023
Changes from 2 commits
Commits
7fcce6e
Add type hints to md5.py
tianyizheng02 Dec 31, 2022
a20ccf8
Rename some vars to snake case
tianyizheng02 Dec 31, 2022
ce2d51f
Specify functions imported from math
tianyizheng02 Dec 31, 2022
d243bf3
Rename vars and functions to be more descriptive
tianyizheng02 Jan 1, 2023
b29ab66
Make tests from test function into doctests
tianyizheng02 Jan 1, 2023
0b5e6a4
Clarify more var names
tianyizheng02 Jan 1, 2023
18d891c
Refactor some MD5 code into preprocess function
tianyizheng02 Jan 1, 2023
27a8b29
Simplify loop indices in get_block_words
tianyizheng02 Jan 1, 2023
885f116
Add more detailed comments, docs, and doctests
tianyizheng02 Jan 1, 2023
20085e2
Merge branch 'TheAlgorithms:master' into md5
tianyizheng02 Jan 2, 2023
f7ba9da
updating DIRECTORY.md
Jan 2, 2023
d289ade
updating DIRECTORY.md
Jan 2, 2023
8b44f10
Merge branch 'TheAlgorithms:master' into master
tianyizheng02 Jan 2, 2023
200fc0d
Merge branch 'TheAlgorithms:master' into master
tianyizheng02 Jan 7, 2023
0d64972
Merge branch 'TheAlgorithms:master' into md5
tianyizheng02 Jan 7, 2023
3332400
Merge branch 'TheAlgorithms:master' into md5
tianyizheng02 Jan 12, 2023
2aea9a2
updating DIRECTORY.md
Jan 12, 2023
3ff65ba
Merge branch 'TheAlgorithms:master' into master
tianyizheng02 Jan 12, 2023
9d1971b
updating DIRECTORY.md
Jan 12, 2023
f2e8fbd
Merge branch 'TheAlgorithms:master' into master
tianyizheng02 Jan 26, 2023
5f404b4
updating DIRECTORY.md
Jan 26, 2023
1b93899
Merge branch 'TheAlgorithms:master' into master
tianyizheng02 Feb 5, 2023
e133a3b
Merge branch 'TheAlgorithms:master' into md5
tianyizheng02 Feb 5, 2023
30ee318
Merge branch 'TheAlgorithms:master' into master
tianyizheng02 Mar 26, 2023
27f57c2
Merge branch 'TheAlgorithms:master' into md5
tianyizheng02 Mar 26, 2023
3fbe643
Add type hints to md5.py
tianyizheng02 Dec 31, 2022
adfe215
Rename some vars to snake case
tianyizheng02 Dec 31, 2022
17fc171
Specify functions imported from math
tianyizheng02 Dec 31, 2022
2400676
Rename vars and functions to be more descriptive
tianyizheng02 Jan 1, 2023
cd501ba
Make tests from test function into doctests
tianyizheng02 Jan 1, 2023
feefe88
Clarify more var names
tianyizheng02 Jan 1, 2023
2b7a465
Refactor some MD5 code into preprocess function
tianyizheng02 Jan 1, 2023
84f7ac3
Simplify loop indices in get_block_words
tianyizheng02 Jan 1, 2023
5fadb6e
Add more detailed comments, docs, and doctests
tianyizheng02 Jan 1, 2023
83bcabc
Merge branch 'md5' of github.com:tianyizheng02/Python into md5
tianyizheng02 Mar 26, 2023
24457b9
updating DIRECTORY.md
Jan 2, 2023
e73d826
Merge branch 'md5' of github.com:tianyizheng02/Python into md5
tianyizheng02 Mar 26, 2023
61d7761
updating DIRECTORY.md
Mar 26, 2023
4959857
Merge branch 'TheAlgorithms:master' into md5
tianyizheng02 Mar 26, 2023
c69bda1
Merge branch 'TheAlgorithms:master' into md5
tianyizheng02 Apr 1, 2023
aa1a18f
updating DIRECTORY.md
Apr 1, 2023
4bec95e
Merge branch 'TheAlgorithms:master' into md5
tianyizheng02 Apr 1, 2023
c71f64a
updating DIRECTORY.md
Apr 1, 2023
17c76ba
Convert str types to bytes
tianyizheng02 Apr 1, 2023
c775f15
Add tests comparing md5_me to hashlib's md5
tianyizheng02 Apr 1, 2023
1f3842e
Replace line-break backslashes with parentheses
tianyizheng02 Apr 1, 2023
75 changes: 42 additions & 33 deletions hashes/md5.py
@@ -15,7 +15,7 @@
 from math import sin
Member

Please use https://docs.python.org/3/library/struct.html to do the endian conversions, or at least use struct in the doctests for to_little_endian().

Contributor Author

Admittedly I'm not that familiar with the struct module, but I'm not sure how to make it work with the to_little_endian() function.

To me, the problem is that the original function rearrange() doesn't actually convert strings to little-endian. Instead, it treats 32-char string inputs as if they were 32-bit bit strings, with each char as a single "bit". It then restructures the input in a "little-endian fashion": the 8 least significant "bits" come first, followed by the 8 next least significant "bits", etc. Thus it looks little-endian if you squint hard enough.

Since the inputs to rearrange()/to_little_endian() are being restructured in units far larger than a byte, I'm not sure if the struct module would work here, unless I'm misunderstanding how the module works.
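
One way to reconcile the two comments above, as a minimal sketch: when the 32 chars really are `0`/`1` digits, reversing the four 8-char groups is the same as reversing the four bytes of the 32-bit integer they spell out, which `struct` can do. `to_little_endian_struct` is a hypothetical helper and not part of this PR; it would not work on arbitrary chars such as the doctest input `'1234567890abcdfghijklmnopqrstuvw'`.

```python
import struct


def to_little_endian(string_32: bytes) -> bytes:
    # Same reordering as in the diff: emit the four 8-char groups in reverse order.
    if len(string_32) != 32:
        raise ValueError("Input must be of length 32")
    return b"".join(string_32[8 * i : 8 * i + 8] for i in (3, 2, 1, 0))


def to_little_endian_struct(bit_string_32: bytes) -> bytes:
    # Hypothetical helper (assumption): only valid for inputs made of b"0"/b"1".
    if len(bit_string_32) != 32:
        raise ValueError("Input must be of length 32")
    value = int(bit_string_32, 2)  # the 32-bit integer the chars spell out
    # Pack big-endian, unpack little-endian: this reverses the four bytes.
    (swapped,) = struct.unpack("<I", struct.pack(">I", value))
    return format(swapped, "032b").encode("utf-8")


bits = format(0x12345678, "032b").encode("utf-8")
print(to_little_endian_struct(bits) == to_little_endian(bits))
```

For genuine bit strings the two functions agree, so `struct` could at least back the doctests, as the reviewer suggests.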



-def to_little_endian(string_32: str) -> str:
+def to_little_endian(string_32: bytes) -> bytes:
     """
     Converts the given string to little-endian in groups of 8 chars.

@@ -27,23 +27,23 @@ def to_little_endian(string_32: str) -> str:

     Returns:
         32-char little-endian string
-    >>> to_little_endian('1234567890abcdfghijklmnopqrstuvw')
-    'pqrstuvwhijklmno90abcdfg12345678'
-    >>> to_little_endian('1234567890')
+    >>> to_little_endian(b'1234567890abcdfghijklmnopqrstuvw')
+    b'pqrstuvwhijklmno90abcdfg12345678'
+    >>> to_little_endian(b'1234567890')
     Traceback (most recent call last):
     ...
     ValueError: Input must be of length 32
     """
     if len(string_32) != 32:
         raise ValueError("Input must be of length 32")

-    little_endian = ""
+    little_endian = b""
     for i in [3, 2, 1, 0]:
         little_endian += string_32[8 * i : 8 * i + 8]
     return little_endian


-def reformat_hex(i: int) -> str:
+def reformat_hex(i: int) -> bytes:
     """
     Converts the given non-negative integer to hex string.

@@ -63,15 +63,15 @@ def reformat_hex(i: int) -> str:
         8-char little-endian hex string

     >>> reformat_hex(1234)
-    'd2040000'
+    b'd2040000'
     >>> reformat_hex(666)
-    '9a020000'
+    b'9a020000'
     >>> reformat_hex(0)
-    '00000000'
+    b'00000000'
     >>> reformat_hex(1234567890)
-    'd2029649'
+    b'd2029649'
     >>> reformat_hex(1234567890987654321)
-    'b11c6cb1'
+    b'b11c6cb1'
     >>> reformat_hex(-1)
     Traceback (most recent call last):
     ...
@@ -81,13 +81,13 @@ def reformat_hex(i: int) -> str:
         raise ValueError("Input must be non-negative")

     hex_rep = format(i, "08x")[-8:]
-    little_endian_hex = ""
+    little_endian_hex = b""
     for i in [3, 2, 1, 0]:
-        little_endian_hex += hex_rep[2 * i : 2 * i + 2]
+        little_endian_hex += hex_rep[2 * i : 2 * i + 2].encode("utf-8")
     return little_endian_hex
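
Because `reformat_hex()` works on an integer rather than on a bit string, it is the more natural place for the `struct` module. A minimal sketch follows, assuming the same contract as the doctests above (non-negative input, only the low 32 bits kept); `reformat_hex_struct` is a hypothetical name and not the PR's implementation.

```python
import struct


def reformat_hex_struct(i: int) -> bytes:
    # Hypothetical alternative to reformat_hex(); not part of the PR.
    if i < 0:
        raise ValueError("Input must be non-negative")
    # "<I" packs an unsigned 32-bit integer in little-endian byte order.
    # The modulo keeps the low 32 bits, like the [-8:] slice in the diff.
    return struct.pack("<I", i % 2**32).hex().encode("utf-8")


print(reformat_hex_struct(1234))
```

The expected values for this sketch can be taken directly from the `reformat_hex()` doctests in the diff.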


-def preprocess(message: str) -> str:
+def preprocess(message: bytes) -> bytes:
     """
     Preprocesses the message string:
     - Convert message to bit string
@@ -111,26 +111,27 @@ def preprocess(message: str) -> str:
     Returns:
         processed bit string padded to a multiple of 512 chars

-    >>> preprocess("a") == "01100001" + "1" + ("0" * 439) + "00001000" + ("0" * 56)
+    >>> preprocess(b"a") == b"01100001" + b"1" + (b"0" * 439) + b"00001000" + \
+        (b"0" * 56)
     True
-    >>> preprocess("") == "1" + ("0" * 447) + ("0" * 64)
+    >>> preprocess(b"") == b"1" + (b"0" * 447) + (b"0" * 64)
     True
     """
-    bit_string = ""
+    bit_string = b""
     for char in message:
-        bit_string += format(ord(char), "08b")
-    start_len = format(len(bit_string), "064b")
+        bit_string += format(char, "08b").encode("utf-8")
+    start_len = format(len(bit_string), "064b").encode("utf-8")

     # Pad bit_string to a multiple of 512 chars
-    bit_string += "1"
+    bit_string += b"1"
     while len(bit_string) % 512 != 448:
-        bit_string += "0"
+        bit_string += b"0"
     bit_string += to_little_endian(start_len[32:]) + to_little_endian(start_len[:32])

     return bit_string
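
The same padding can be sketched on raw bytes instead of a bit string of `0`/`1` chars. This is an illustration only; `pad_message` and `as_bit_string` are hypothetical helpers, not functions from the PR. The check at the end reuses the expected value from the `preprocess(b"a")` doctest.

```python
def pad_message(message: bytes) -> bytes:
    # Hypothetical byte-level counterpart of preprocess(); not part of the PR.
    bit_length = (8 * len(message)) % 2**64
    padded = message + b"\x80"  # the "1" bit followed by seven "0" bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zero-fill to 56 bytes mod 64
    return padded + bit_length.to_bytes(8, "little")  # 64-bit little-endian length


def as_bit_string(data: bytes) -> bytes:
    # Render each byte as eight "0"/"1" chars, like the PR's bit strings.
    return "".join(format(byte, "08b") for byte in data).encode("utf-8")


expected = b"01100001" + b"1" + (b"0" * 439) + b"00001000" + (b"0" * 56)
print(as_bit_string(pad_message(b"a")) == expected)
```

Working on bytes keeps the padded message at 64 bytes per block rather than 512 chars, at the cost of diverging from the bit-string representation the module uses throughout.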


-def get_block_words(bit_string: str) -> Generator[list[int], None, None]:
+def get_block_words(bit_string: bytes) -> Generator[list[int], None, None]:
     """
     Splits bit string into blocks of 512 chars and yields each block as a list
     of 32-bit words
@@ -160,16 +161,17 @@ def get_block_words(bit_string: str) -> Generator[list[int], None, None]:
     Yields:
         a list of 16 32-bit words

-    >>> test_string = "".join(format(n << 24, "032b") for n in range(16))
+    >>> test_string = "".join(format(n << 24, "032b") for n in range(16)) \
+        .encode("utf-8")
     >>> list(get_block_words(test_string))
     [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]]
     >>> list(get_block_words(test_string * 4)) == [list(range(16))] * 4
     True
-    >>> list(get_block_words("1" * 512)) == [[4294967295] * 16]
+    >>> list(get_block_words(b"1" * 512)) == [[4294967295] * 16]
     True
-    >>> list(get_block_words(""))
+    >>> list(get_block_words(b""))
     []
-    >>> list(get_block_words("1111"))
+    >>> list(get_block_words(b"1111"))
     Traceback (most recent call last):
     ...
     ValueError: Input must have length that's a multiple of 512
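
For comparison, the block splitting exercised by these doctests can be sketched on raw bytes with `struct`. `block_words_from_bytes` is a hypothetical name and not part of the PR; it assumes 64-byte blocks instead of 512-char bit strings.

```python
import struct
from collections.abc import Iterator


def block_words_from_bytes(padded: bytes) -> Iterator[list[int]]:
    # Hypothetical byte-level counterpart of get_block_words(); not part of the PR.
    if len(padded) % 64 != 0:
        raise ValueError("Input must have length that's a multiple of 64")
    for start in range(0, len(padded), 64):
        # "<16I": sixteen unsigned 32-bit little-endian words per 64-byte block.
        yield list(struct.unpack("<16I", padded[start : start + 64]))


# Byte equivalent of the doctest's test_string: word n is stored as bytes n, 0, 0, 0.
test_bytes = b"".join(bytes([n, 0, 0, 0]) for n in range(16))
print(list(block_words_from_bytes(test_bytes)) == [list(range(16))])
```

The `n << 24` in the doctest and the `n, 0, 0, 0` byte layout here describe the same thing: the lowest-order byte of each word comes first.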
@@ -292,7 +294,7 @@ def left_rotate_32(i: int, shift: int) -> int:
     return ((i << shift) ^ (i >> (32 - shift))) % 2**32
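
The XOR in this return statement behaves like an OR because, for a 32-bit input, the two shifted halves never share a set bit. A small sketch comparing the two forms follows, assuming `0 <= i < 2**32` and `0 <= shift <= 32`; both function names are hypothetical.

```python
def left_rotate_32_xor(i: int, shift: int) -> int:
    # The expression shown in the diff's context line.
    return ((i << shift) ^ (i >> (32 - shift))) % 2**32


def left_rotate_32_or(i: int, shift: int) -> int:
    # Hypothetical variant using OR and a mask; same result for 32-bit inputs.
    return ((i << shift) | (i >> (32 - shift))) & 0xFFFFFFFF


print(hex(left_rotate_32_or(0x12345678, 8)))
```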


-def md5_me(message: str) -> str:
+def md5_me(message: bytes) -> bytes:
     """
     Returns the 32-char MD5 hash of a given message.

@@ -304,12 +306,19 @@ def md5_me(message: str) -> str:
     Returns:
         32-char MD5 hash string

-    >>> md5_me("")
-    'd41d8cd98f00b204e9800998ecf8427e'
-    >>> md5_me("The quick brown fox jumps over the lazy dog")
-    '9e107d9d372bb6826bd81d3542a419d6'
-    >>> md5_me("The quick brown fox jumps over the lazy dog.")
-    'e4d909c290d0fb1ca068ffaddf22cbd0'
+    >>> md5_me(b"")
+    b'd41d8cd98f00b204e9800998ecf8427e'
+    >>> md5_me(b"The quick brown fox jumps over the lazy dog")
+    b'9e107d9d372bb6826bd81d3542a419d6'
+    >>> md5_me(b"The quick brown fox jumps over the lazy dog.")
+    b'e4d909c290d0fb1ca068ffaddf22cbd0'
+
+    >>> import hashlib
+    >>> from string import ascii_letters
+    >>> msgs = [b"", ascii_letters.encode("utf-8"), "Üñîçø∂é".encode("utf-8"),
+    ...         b"The quick brown fox jumps over the lazy dog."]
+    >>> all(md5_me(msg) == hashlib.md5(msg).hexdigest().encode("utf-8") for msg in msgs)
+    True
     """

     # Convert to bit string, add padding and append message length
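
The rest of `md5_me()` is collapsed in this view. To make the `hashlib` comparison from the new doctest reproducible on its own, here is a compact byte-level MD5 sketch following the standard algorithm (RFC 1321). It is not the PR's bit-string implementation; `md5_bytes_sketch` and the constant names are assumptions introduced for illustration.

```python
import hashlib
import struct
from math import sin

# Per-round shift amounts and sine-derived constants of standard MD5.
SHIFTS = (
    [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
)
CONSTANTS = [int(abs(sin(i + 1)) * 2**32) for i in range(64)]
MASK = 0xFFFFFFFF


def md5_bytes_sketch(message: bytes) -> bytes:
    # Hypothetical helper, not md5_me(): returns the hex digest as bytes.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack("<Q", (8 * len(message)) % 2**64)

    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    for start in range(0, len(padded), 64):
        words = struct.unpack("<16I", padded[start : start + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + CONSTANTS[i] + words[g]) & MASK
            a, d, c = d, c, b
            rotated = ((f << SHIFTS[i]) | (f >> (32 - SHIFTS[i]))) & MASK
            b = (b + rotated) & MASK
        # Add this block's result to the running state, modulo 2**32.
        a0, b0 = (a0 + a) & MASK, (b0 + b) & MASK
        c0, d0 = (c0 + c) & MASK, (d0 + d) & MASK

    # The digest is the four state words serialized little-endian.
    return struct.pack("<4I", a0, b0, c0, d0).hex().encode("utf-8")


msgs = [b"", b"The quick brown fox jumps over the lazy dog."]
print(all(md5_bytes_sketch(m) == hashlib.md5(m).hexdigest().encode("utf-8") for m in msgs))
```

Comparing against `hashlib.md5` over inputs of different lengths, including ones that straddle the 55/56-byte padding boundary, is a cheap way to catch padding and endianness mistakes in either implementation.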