
Commit 171a8c8

feat: Improve validation
This commit introduces two different validation modes:

- Strict (default): Only allows letters, digits, hyphens
- Lax: Allows any octets and just checks for the max lengths

This allows domains to have an underscore character.

Closes #134

BREAKING CHANGE: Introduces a dependency on the global `TextEncoder` constructor which should be available in all modern engines (see https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder). The strict validation mode (which is the default) will also be a little bit more strict since it will now also check for hyphens at the beginning or end of a domain label. It also requires top-level domain names not to be all-numeric.
1 parent 4985cc7 commit 171a8c8
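
The two modes described in the commit message can be illustrated with a minimal sketch. This is not the code in `src/sanitize.ts`: the names `ValidationMode`, `octetLength` and `isValidHostname` are hypothetical, and the sketch deliberately ignores IP addresses, trailing dots and detailed error reporting. It only assumes the rules stated above (63 octets per label, 253 octets overall, LDH characters in strict mode).

```typescript
// Hypothetical sketch of the rules in the commit message, not the library's implementation.
type ValidationMode = "LAX" | "STRICT";

const LABEL_MAX_OCTETS = 63;
const DOMAIN_MAX_OCTETS = 253;

// UTF-8 length in octets, using the global TextEncoder the commit now depends on.
const octetLength = (value: string): number =>
  new TextEncoder().encode(value).length;

const isValidHostname = (hostname: string, mode: ValidationMode): boolean => {
  if (octetLength(hostname) > DOMAIN_MAX_OCTETS) {
    return false;
  }

  const labels = hostname.split(".");

  // Both modes: every label must be between 1 and 63 octets long.
  const lengthsOk = labels.every((label) => {
    const length = octetLength(label);

    return length >= 1 && length <= LABEL_MAX_OCTETS;
  });

  if (!lengthsOk || mode === "LAX") {
    return lengthsOk;
  }

  // Strict only: letters, digits and hyphens, with no hyphen at either end.
  const ldhOk = labels.every(
    (label) =>
      /^[a-z0-9-]+$/i.test(label) &&
      label.charAt(0) !== "-" &&
      label.charAt(label.length - 1) !== "-"
  );
  // Strict only: the last label (the top-level domain) must not be all-numeric.
  const lastLabel = labels[labels.length - 1];

  return ldhOk && !/^[0-9]+$/.test(lastLabel);
};
```

Under these assumptions, `isValidHostname("_jabber._tcp.gmail.com", "LAX")` is `true` while the same hostname fails in `"STRICT"` mode because of the underscores. `"-example.com"` and `"example.123"` are rejected in strict mode as well.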


6 files changed

+387
-80
lines changed


README.md

Lines changed: 78 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -15,7 +15,7 @@ Since domain name registrars organize their namespaces in different ways, it's n
1515
import { parseDomain, ParseResultType } from "parse-domain";
1616

1717
const parseResult = parseDomain(
18-
// This should be a string with basic latin characters only.
18+
// This should be a string with basic latin letters only.
1919
// More information below.
2020
"www.some.example.co.uk"
2121
);
@@ -32,7 +32,7 @@ if (parseResult.type === ParseResultType.Listed) {
3232
}
3333
```
3434

35-
This package has been designed for modern Node and browser environments, supporting both CommonJS and ECMAScript modules. It assumes an ES2015 environment with [`Symbol()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol) and [`URL()`](https://developer.mozilla.org/en-US/docs/Web/API/URL) globally available. You need to transpile it down to ES5 (e.g. by using [Babel](https://babeljs.io/)) if you need to support older environments.
35+
This package has been designed for modern Node and browser environments, supporting both CommonJS and ECMAScript modules. It assumes an ES2015 environment with [`Symbol()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol), [`URL()`](https://developer.mozilla.org/en-US/docs/Web/API/URL) and [`TextEncoder()`](https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder) globally available. You need to transpile it down to ES5 (e.g. by using [Babel](https://babeljs.io/)) if you need to support older environments.
3636

3737
The list of top-level domains is stored in a [trie](https://en.wikipedia.org/wiki/Trie) data structure and serialization format to ensure the fastest lookup and the smallest possible library size. The library is side-effect free (this is important for proper [tree-shaking](https://webpack.js.org/guides/tree-shaking/)).
3838

@@ -96,7 +96,7 @@ When parsing a hostname there are 5 possible results:
9696

9797
### 👉 Invalid domains
9898

99-
The given input is first validated against [RFC 1034](https://tools.ietf.org/html/rfc1034). If the validation fails, `parseResult.type` will be `ParseResultType.Invalid`:
99+
The given input is first validated against [RFC 3696](https://datatracker.ietf.org/doc/html/rfc3696#section-2) (the domain labels are limited to basic latin letters, numbers and hyphens). If the validation fails, `parseResult.type` will be `ParseResultType.Invalid`:
100100

101101
```javascript
102102
import { parseDomain, ParseResultType } from "parse-domain";
@@ -108,6 +108,20 @@ console.log(parseResult.type === ParseResultType.Invalid); // true
108108

109109
Check out the [API](#api-ts-ValidationError) if you need more information about the validation error.
110110

111+
If you don't want the characters to be validated (e.g. because you need to allow underscores in hostnames), there's also a more relaxed validation mode (according to [RFC 2181](https://www.rfc-editor.org/rfc/rfc2181#section-11)).
112+
113+
```javascript
114+
import { parseDomain, ParseResultType, Validation } from "parse-domain";
115+
116+
const parseResult = parseDomain("_jabber._tcp.gmail.com", {
117+
validation: Validation.Lax,
118+
});
119+
120+
console.log(parseResult.type === ParseResultType.Listed); // true
121+
```
122+
123+
See also [#134](https://github.com/peerigon/parse-domain/issues/134) for the discussion.
124+
111125
### 👉 IP addresses
112126

113127
If the given input is an IP address, `parseResult.type` will be `ParseResultType.Ip`:
@@ -273,17 +287,27 @@ console.log(topLevelDomains); // []
273287
🧬 = TypeScript export
274288

275289
<h3 id="api-js-parseDomain">
276-
🧩 <code>export parseDomain(hostname: string | typeof <a href="#api-js-NO_HOSTNAME">NO_HOSTNAME</a>): <a href="#api-ts-ParseResult">ParseResult</a></code>
290+
🧩 <code>export parseDomain(hostname: string | typeof <a href="#api-js-NO_HOSTNAME">NO_HOSTNAME</a>, options?: <a href="#api-ts-ParseDomainOptions">ParseDomainOptions</a>): <a href="#api-ts-ParseResult">ParseResult</a></code>
277291
</h3>
278292

279-
Takes a hostname (e.g. `"www.example.com"`) and returns a [`ParseResult`](#api-ts-ParseResult). The hostname must only contain basic latin characters, digits, hyphens and dots. International hostnames must be puny-encoded. Does not throw an error, even with invalid input.
293+
Takes a hostname (e.g. `"www.example.com"`) and returns a [`ParseResult`](#api-ts-ParseResult). The hostname must only contain basic latin letters, digits, hyphens and dots. International hostnames must be puny-encoded. Does not throw an error, even with invalid input.
280294

281295
```javascript
282296
import { parseDomain } from "parse-domain";
283297

284298
const parseResult = parseDomain("www.example.com");
285299
```
286300

301+
Use `Validation.Lax` if you want to allow all characters:
302+
303+
```javascript
304+
import { parseDomain, Validation } from "parse-domain";
305+
306+
const parseResult = parseDomain("_jabber._tcp.gmail.com", {
307+
validation: Validation.Lax,
308+
});
309+
```
310+
287311
<h3 id="api-js-fromUrl">
288312
🧩 <code>export fromUrl(input: string): string | typeof <a href="#api-js-NO_HOSTNAME">NO_HOSTNAME</a></code>
289313
</h3>
@@ -296,6 +320,54 @@ Takes a URL-like string and tries to extract the hostname. Requires the global [
296320

297321
`NO_HOSTNAME` is a symbol that is returned by [`fromUrl`](#api-js-fromUrl) when it was not able to extract a hostname from the given string. When passed to [`parseDomain`](#api-js-parseDomain), it will always yield a [`ParseResultInvalid`](#api-ts-ParseResultInvalid).
298322

323+
<h3 id="api-ts-ParseDomainOptions">
324+
🧬 <code>export type ParseDomainOptions</code>
325+
</h3>
326+
327+
```ts
328+
export type ParseDomainOptions = {
329+
/**
330+
* If no validation is specified, Validation.Strict will be used.
331+
**/
332+
validation?: Validation;
333+
};
334+
```
335+
336+
<h3 id="api-js-Validation">
337+
🧩 <code>export Validation</code>
338+
</h3>
339+
340+
An object that holds all possible [Validation](#api-ts-Validation) `validation` values:
341+
342+
```javascript
343+
export const Validation = {
344+
/**
345+
* Allows any octets as labels
346+
* but still restricts the length of labels and the overall domain.
347+
*
348+
* @see https://www.rfc-editor.org/rfc/rfc2181#section-11
349+
**/
350+
Lax: "LAX",
351+
352+
/**
353+
* Only allows ASCII letters, digits and hyphens (aka LDH),
354+
* forbids hyphens at the beginning or end of a label
355+
* and requires top-level domain names not to be all-numeric.
356+
*
357+
* This is the default if no validation is configured.
358+
*
359+
* @see https://datatracker.ietf.org/doc/html/rfc3696#section-2
360+
*/
361+
Strict: "STRICT",
362+
};
363+
```
364+
365+
<h3 id="api-ts-Validation">
366+
🧬 <code>export Validation</code>
367+
</h3>
368+
369+
This type represents all possible `validation` values.
370+
299371
<h3 id="api-ts-ParseResult">
300372
🧬 <code>export ParseResult</code>
301373
</h3>
@@ -391,6 +463,7 @@ const ValidationErrorType = {
391463
LabelMinLength: "LABEL_MIN_LENGTH",
392464
LabelMaxLength: "LABEL_MAX_LENGTH",
393465
LabelInvalidCharacter: "LABEL_INVALID_CHARACTER",
466+
LastLabelInvalid: "LAST_LABEL_INVALID",
394467
};
395468
```
396469

package.json

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,8 @@
2020
"import": "./build-esm/src/main.js"
2121
},
2222
"scripts": {
23-
"test": "jest",
23+
"test": "run-p test:*",
24+
"test:suite": "jest",
2425
"posttest": "run-s build posttest:*",
2526
"posttest:lint": "eslint --cache --ext js,ts *.js src bin",
2627
"build": "run-s build:*",

src/main.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,4 +10,4 @@ export {
1010
ParseResultListed,
1111
} from "./parse-domain";
1212
export { fromUrl, NO_HOSTNAME } from "./from-url";
13-
export { ValidationError, ValidationErrorType } from "./sanitize";
13+
export { Validation, ValidationError, ValidationErrorType } from "./sanitize";

src/parse-domain.test.ts

Lines changed: 160 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
import { parseDomain, ParseResultType } from "./parse-domain";
2-
import { ValidationErrorType } from "./sanitize";
2+
import { Validation, ValidationErrorType } from "./sanitize";
33
import { fromUrl } from "./from-url";
44

55
const ipV6Samples = [
@@ -244,25 +244,27 @@ describe(parseDomain.name, () => {
244244
});
245245
});
246246

247-
test("returns type ParseResultType.Invalid and error information for a hostname with an empty label", () => {
248-
expect(parseDomain(".example.com")).toMatchObject({
249-
type: ParseResultType.Invalid,
250-
errors: expect.arrayContaining([
251-
expect.objectContaining({
252-
type: ValidationErrorType.LabelMinLength,
253-
message:
254-
'Label "" is too short. Label is 0 octets long but should be at least 1.',
255-
column: 1,
256-
}),
257-
]),
258-
});
259-
expect(parseDomain("www..example.com")).toMatchObject({
260-
type: ParseResultType.Invalid,
261-
errors: expect.arrayContaining([
262-
expect.objectContaining({
263-
column: 5,
264-
}),
265-
]),
247+
test("returns type ParseResultType.Invalid and error information for a hostname with an empty label (both validation modes)", () => {
248+
[Validation.Lax, Validation.Strict].forEach((validation) => {
249+
expect(parseDomain(".example.com", { validation })).toMatchObject({
250+
type: ParseResultType.Invalid,
251+
errors: expect.arrayContaining([
252+
expect.objectContaining({
253+
type: ValidationErrorType.LabelMinLength,
254+
message:
255+
'Label "" is too short. Label is 0 octets long but should be at least 1.',
256+
column: 1,
257+
}),
258+
]),
259+
});
260+
expect(parseDomain("www..example.com", { validation })).toMatchObject({
261+
type: ParseResultType.Invalid,
262+
errors: expect.arrayContaining([
263+
expect.objectContaining({
264+
column: 5,
265+
}),
266+
]),
267+
});
266268
});
267269
});
268270

@@ -277,31 +279,72 @@ describe(parseDomain.name, () => {
277279
});
278280
});
279281

280-
test("returns type ParseResultType.Invalid and error information for a hostname with a label that is too long", () => {
282+
test("returns type ParseResultType.Invalid and error information for a hostname with a label that is too long (both validation modes)", () => {
281283
const labelToLong = new Array(64).fill("x").join("");
282284

283-
expect(parseDomain(`${labelToLong}.example.com`)).toMatchObject({
285+
[Validation.Lax, Validation.Strict].forEach((validation) => {
286+
expect(parseDomain(labelToLong, { validation })).toMatchObject({
287+
type: ParseResultType.Invalid,
288+
errors: expect.arrayContaining([
289+
expect.objectContaining({
290+
type: ValidationErrorType.LabelMaxLength,
291+
message: `Label "${labelToLong}" is too long. Label is 64 octets long but should not be longer than 63.`,
292+
column: 1,
293+
}),
294+
]),
295+
});
296+
expect(
297+
parseDomain(`www.${labelToLong}.example.com`, { validation })
298+
).toMatchObject({
299+
type: ParseResultType.Invalid,
300+
errors: expect.arrayContaining([
301+
expect.objectContaining({
302+
column: 5,
303+
}),
304+
]),
305+
});
306+
});
307+
// Should work with 63 octets
308+
expect(parseDomain(new Array(63).fill("x").join(""))).toMatchObject({
309+
type: ParseResultType.NotListed,
310+
});
311+
});
312+
313+
test("returns type ParseResultType.Invalid and error information for a hostname that is too long", () => {
314+
const domain = new Array(254).fill("x").join("");
315+
316+
// A single long label
317+
expect(parseDomain(new Array(254).fill("x").join(""))).toMatchObject({
284318
type: ParseResultType.Invalid,
285319
errors: expect.arrayContaining([
286320
expect.objectContaining({
287-
type: ValidationErrorType.LabelMaxLength,
288-
message: `Label "${labelToLong}" is too long. Label is 64 octets long but should not be longer than 63.`,
289-
column: 1,
321+
type: ValidationErrorType.DomainMaxLength,
322+
message: `Domain "${domain}" is too long. Domain is 254 octets long but should not be longer than 253.`,
323+
column: 254,
290324
}),
291325
]),
292326
});
293-
expect(parseDomain(`www.${labelToLong}.example.com`)).toMatchObject({
327+
328+
// Multiple labels
329+
expect(parseDomain(new Array(128).fill("x").join("."))).toMatchObject({
294330
type: ParseResultType.Invalid,
295331
errors: expect.arrayContaining([
296332
expect.objectContaining({
297-
column: 5,
333+
type: ValidationErrorType.DomainMaxLength,
298334
}),
299335
]),
300336
});
337+
338+
// Should work with 253 octets
339+
expect(parseDomain(new Array(127).fill("x").join("."))).toMatchObject({
340+
type: ParseResultType.NotListed,
341+
});
301342
});
302343

303-
test("returns type ParseResultType.Invalid and error information for a hostname that is too long", () => {
304-
const domain = new Array(127).fill("x").join(".") + "x";
344+
test("interprets the hostname as octets", () => {
345+
// The "ä" character is 2 octets long which is why we only need
346+
// 127 of them to exceed the limit
347+
const domain = new Array(127).fill("ä").join("");
305348

306349
expect(parseDomain(domain)).toMatchObject({
307350
type: ParseResultType.Invalid,
@@ -362,6 +405,90 @@ describe(parseDomain.name, () => {
362405
});
363406
});
364407

408+
test("accepts any character as labels with Validation.Lax", () => {
409+
// Trying out 2^10 characters
410+
getCharCodesUntil(2 ** 10)
411+
.map((octet) => String.fromCharCode(octet))
412+
.filter((hostname) => hostname !== ".")
413+
.forEach((hostname) => {
414+
const result = parseDomain(hostname, { validation: Validation.Lax });
415+
416+
expect(result).toMatchObject({
417+
type: ParseResultType.NotListed,
418+
});
419+
});
420+
});
421+
422+
test("returns type ParseResultType.Invalid and error information for a hostname where some labels start or end with a -", () => {
423+
expect(parseDomain("-example")).toMatchObject({
424+
type: ParseResultType.Invalid,
425+
errors: expect.arrayContaining([
426+
expect.objectContaining({
427+
type: ValidationErrorType.LabelInvalidCharacter,
428+
message:
429+
'Label "-example" contains invalid character "-" at column 1.',
430+
column: 1,
431+
}),
432+
]),
433+
});
434+
expect(parseDomain("-example.com")).toMatchObject({
435+
type: ParseResultType.Invalid,
436+
errors: expect.arrayContaining([
437+
expect.objectContaining({
438+
type: ValidationErrorType.LabelInvalidCharacter,
439+
message:
440+
'Label "-example" contains invalid character "-" at column 1.',
441+
column: 1,
442+
}),
443+
]),
444+
});
445+
expect(parseDomain("example-")).toMatchObject({
446+
type: ParseResultType.Invalid,
447+
errors: expect.arrayContaining([
448+
expect.objectContaining({
449+
type: ValidationErrorType.LabelInvalidCharacter,
450+
message:
451+
'Label "example-" contains invalid character "-" at column 8.',
452+
column: 8,
453+
}),
454+
]),
455+
});
456+
expect(parseDomain("example-.com")).toMatchObject({
457+
type: ParseResultType.Invalid,
458+
errors: expect.arrayContaining([
459+
expect.objectContaining({
460+
type: ValidationErrorType.LabelInvalidCharacter,
461+
message:
462+
'Label "example-" contains invalid character "-" at column 8.',
463+
column: 8,
464+
}),
465+
]),
466+
});
467+
});
468+
469+
test("returns type ParseResultType.Invalid and error information for a hostname where the last label just contains numbers", () => {
470+
expect(parseDomain("123")).toMatchObject({
471+
type: ParseResultType.Invalid,
472+
errors: expect.arrayContaining([
473+
expect.objectContaining({
474+
type: ValidationErrorType.LabelInvalidCharacter,
475+
message: 'Last label "123" must not be all-numeric.',
476+
column: 1,
477+
}),
478+
]),
479+
});
480+
expect(parseDomain("example.123")).toMatchObject({
481+
type: ParseResultType.Invalid,
482+
errors: expect.arrayContaining([
483+
expect.objectContaining({
484+
type: ValidationErrorType.LabelInvalidCharacter,
485+
message: 'Last label "123" must not be all-numeric.',
486+
column: 9,
487+
}),
488+
]),
489+
});
490+
});
491+
365492
test("returns type ParseResultType.Invalid and error information if the input was not domain like", () => {
366493
// @ts-expect-error This is a deliberate error for the test
367494
expect(parseDomain(undefined)).toMatchObject({
@@ -465,3 +592,7 @@ describe(parseDomain.name, () => {
465592
});
466593
});
467594
});
595+
596+
const getCharCodesUntil = (length: number) => {
597+
return Array.from({ length }, (_, i) => i);
598+
};
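
The length limits exercised by the tests above are counted in octets, not in characters. The following sketch works through the arithmetic behind the test inputs. The helpers `utf8Length` and `repeat` are hypothetical and only serve this illustration; the sketch assumes the hostname is measured as UTF-8 via the global `TextEncoder`, as the commit message suggests.

```typescript
// Hypothetical helpers for illustration; they are not part of parse-domain.
const utf8Length = (value: string): number =>
  new TextEncoder().encode(value).length;

const repeat = (value: string, count: number, separator = ""): string => {
  const parts: string[] = [];

  for (let i = 0; i < count; i++) {
    parts.push(value);
  }

  return parts.join(separator);
};

// 127 one-octet labels plus 126 dots: exactly at the 253 octet limit.
console.log(utf8Length(repeat("x", 127, "."))); // 253

// 128 one-octet labels plus 127 dots: over the limit.
console.log(utf8Length(repeat("x", 128, "."))); // 255

// "ä" (U+00E4) is one UTF-16 code unit but two UTF-8 octets,
// so 127 of them already exceed the 253 octet limit.
console.log(repeat("\u00e4", 127).length); // 127
console.log(utf8Length(repeat("\u00e4", 127))); // 254
```

This is why a check based on `String.prototype.length` alone would let the `"ä"` hostname through, while an octet-based check rejects it.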
