comments | difficulty | edit_url | tags | |
---|---|---|---|---|
true |
Hard |
|
Table: logs
+-------------+---------+ | Column Name | Type | +-------------+---------+ | log_id | int | | ip | varchar | | status_code | int | +-------------+---------+ log_id is the unique key for this table. Each row contains server access log information including IP address and HTTP status code.
Write a solution to find invalid IP addresses. An IPv4 address is invalid if it meets any of these conditions:
- Contains numbers greater than
255
in any octet - Has leading zeros in any octet (like
01.02.03.04
) - Has less or more than
4
octets
Return the result table ordered by invalid_count
, ip
in descending order respectively.
The result format is in the following example.
Example:
Input:
logs table:
+--------+---------------+-------------+ | log_id | ip | status_code | +--------+---------------+-------------+ | 1 | 192.168.1.1 | 200 | | 2 | 256.1.2.3 | 404 | | 3 | 192.168.001.1 | 200 | | 4 | 192.168.1.1 | 200 | | 5 | 192.168.1 | 500 | | 6 | 256.1.2.3 | 404 | | 7 | 192.168.001.1 | 200 | +--------+---------------+-------------+
Output:
+---------------+--------------+ | ip | invalid_count| +---------------+--------------+ | 256.1.2.3 | 2 | | 192.168.001.1 | 2 | | 192.168.1 | 1 | +---------------+--------------+
Explanation:
- 256.1.2.3 is invalid because 256 > 255
- 192.168.001.1 is invalid because of leading zeros
- 192.168.1 is invalid because it has only 3 octets
The output table is ordered by invalid_count, ip in descending order respectively.
We can determine if an IP address is invalid based on the following conditions:
- The number of
.
in the IP address is not equal to$3$ ; - Any octet in the IP address starts with
0
; - Any octet in the IP address is greater than
$255$ .
Then we group the invalid IP addresses and count the occurrences of each invalid IP address invalid_count
, and finally sort by invalid_count
and ip
in descending order.
SELECT
ip,
COUNT(*) AS invalid_count
FROM logs
WHERE
LENGTH(ip) - LENGTH(REPLACE(ip, '.', '')) != 3
OR SUBSTRING_INDEX(ip, '.', 1) REGEXP '^0[0-9]'
OR SUBSTRING_INDEX(SUBSTRING_INDEX(ip, '.', 2), '.', -1) REGEXP '^0[0-9]'
OR SUBSTRING_INDEX(SUBSTRING_INDEX(ip, '.', 3), '.', -1) REGEXP '^0[0-9]'
OR SUBSTRING_INDEX(ip, '.', -1) REGEXP '^0[0-9]'
OR SUBSTRING_INDEX(ip, '.', 1) > 255
OR SUBSTRING_INDEX(SUBSTRING_INDEX(ip, '.', 2), '.', -1) > 255
OR SUBSTRING_INDEX(SUBSTRING_INDEX(ip, '.', 3), '.', -1) > 255
OR SUBSTRING_INDEX(ip, '.', -1) > 255
GROUP BY 1
ORDER BY 2 DESC, 1 DESC;
import pandas as pd
def find_invalid_ips(logs: pd.DataFrame) -> pd.DataFrame:
def is_valid_ip(ip: str) -> bool:
octets = ip.split(".")
if len(octets) != 4:
return False
for octet in octets:
if not octet.isdigit():
return False
value = int(octet)
if not 0 <= value <= 255 or octet != str(value):
return False
return True
logs["is_valid"] = logs["ip"].apply(is_valid_ip)
invalid_ips = logs[~logs["is_valid"]]
invalid_count = invalid_ips["ip"].value_counts().reset_index()
invalid_count.columns = ["ip", "invalid_count"]
result = invalid_count.sort_values(
by=["invalid_count", "ip"], ascending=[False, False]
)
return result