URL Standard

Source: https://url.spec.whatwg.org/

Abstract

The URL Standard defines URLs, domains, IP addresses, the application/x-www-form-urlencoded format, and their API.

Goals

The URL standard takes the following approach towards making URLs fully interoperable:

  • Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete the RFCs in the process. (E.g., spaces, other "illegal" code points, query encoding, equality, canonicalization, are all concepts not entirely shared, or defined.) URL parsing needs to become as solid as HTML parsing. [RFC3986] [RFC3987]

  • Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest.

  • Supplanting Origin of a URI [sic]. [RFC6454]

  • Define URL’s existing JavaScript API in full detail and add enhancements to make it easier to work with. Add a new URL object as well for URL manipulation without usage of HTML elements. (Useful for JavaScript worker environments.)

  • Ensure the combination of parser, serializer, and API guarantee idempotence. For example, a non-failure result of a parse-then-serialize operation will not change with any further parse-then-serialize operations applied to it. Similarly, manipulating a non-failure result through the API will not change from applying any number of serialize-then-parse operations to it.

As the editors learn more about the subject matter the goals might increase in scope somewhat.

1. Infrastructure

This specification depends on Infra. [INFRA]

Some terms used in this specification are defined in the following standards and specifications:


To serialize an integer, represent it as the shortest possible decimal number.

1.1. Writing

A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers, are encouraged to report them somewhere.

A validation error does not mean that the parser terminates. Termination of a parser is always stated explicitly, e.g., through a return statement.

It is useful to signal validation errors as error-handling can be non-intuitive, legacy user agents might not implement correct error-handling, and the intent of what is written might be unclear to other developers.

Validation errors by category (error type, error description, and whether the error results in failure):

IDNA

  • domain-to-ASCII
    Description: Unicode ToASCII records an error or returns the empty string. [UTS46] If details about Unicode ToASCII errors are recorded, user agents are encouraged to pass those along.
    Failure: Yes

  • domain-to-Unicode
    Description: Unicode ToUnicode records an error. [UTS46] The same considerations as with domain-to-ASCII apply.
    Failure: No
Host parsing

  • domain-invalid-code-point
    Description: The input’s host contains a forbidden domain code point.
    Example: "https://exa%23mple.org". Hosts are percent-decoded before being processed when the URL is special, so the host portion becomes "exa#mple.org", which triggers this error.
    Failure: Yes

  • host-invalid-code-point
    Description: An opaque host (in a URL that is not special) contains a forbidden host code point.
    Example: "foo://exa[mple.org"
    Failure: Yes

  • IPv4-empty-part
    Description: An IPv4 address ends with a U+002E (.).
    Example: "https://127.0.0.1./"
    Failure: No

  • IPv4-too-many-parts
    Description: An IPv4 address does not consist of exactly 4 parts.
    Example: "https://1.2.3.4.5/"
    Failure: Yes

  • IPv4-non-numeric-part
    Description: An IPv4 address part is not numeric.
    Example: "https://test.42"
    Failure: Yes

  • IPv4-non-decimal-part
    Description: The IPv4 address contains numbers expressed using hexadecimal or octal digits.
    Example: "https://127.0.0x0.1"
    Failure: No

  • IPv4-out-of-range-part
    Description: An IPv4 address part exceeds 255.
    Example: "https://255.255.4000.1"
    Failure: Yes (only if applicable to the last part)

  • IPv6-unclosed
    Description: An IPv6 address is missing the closing U+005D (]).
    Example: "https://[::1"
    Failure: Yes

  • IPv6-invalid-compression
    Description: An IPv6 address begins with improper compression.
    Example: "https://[:1]"
    Failure: Yes

  • IPv6-too-many-pieces
    Description: An IPv6 address contains more than 8 pieces.
    Example: "https://[1:2:3:4:5:6:7:8:9]"
    Failure: Yes

  • IPv6-multiple-compression
    Description: An IPv6 address is compressed in more than one spot.
    Example: "https://[1::1::1]"
    Failure: Yes

  • IPv6-invalid-code-point
    Description: An IPv6 address contains a code point that is neither an ASCII hex digit nor a U+003A (:). Or it unexpectedly ends.
    Examples: "https://[1:2:3!:4]", "https://[1:2:3:]"
    Failure: Yes

  • IPv6-too-few-pieces
    Description: An uncompressed IPv6 address contains fewer than 8 pieces.
    Example: "https://[1:2:3]"
    Failure: Yes

  • IPv4-in-IPv6-too-many-pieces
    Description: An IPv6 address with IPv4 address syntax: the IPv6 address has more than 6 pieces.
    Example: "https://[1:1:1:1:1:1:1:127.0.0.1]"
    Failure: Yes

  • IPv4-in-IPv6-invalid-code-point
    Description: An IPv6 address with IPv4 address syntax, where an IPv4 part is empty or contains a non-ASCII digit, an IPv4 part contains a leading 0, or there are too many IPv4 parts.
    Examples: "https://[ffff::.0.0.1]", "https://[ffff::127.0.xyz.1]", "https://[ffff::127.0xyz]", "https://[ffff::127.00.0.1]", "https://[ffff::127.0.0.1.2]"
    Failure: Yes

  • IPv4-in-IPv6-out-of-range-part
    Description: An IPv6 address with IPv4 address syntax: an IPv4 part exceeds 255.
    Example: "https://[ffff::127.0.0.4000]"
    Failure: Yes

  • IPv4-in-IPv6-too-few-parts
    Description: An IPv6 address with IPv4 address syntax: an IPv4 address contains too few parts.
    Example: "https://[ffff::127.0.0]"
    Failure: Yes
URL parsing

  • invalid-URL-unit
    Description: A code point is found that is not a URL unit.
    Examples: "https://example.org/>", " https://example.org ", "ht[line break]tps://example.org[line break]" (the input contains literal line breaks at the marked positions), "https://example.org/%s"
    Failure: No

  • special-scheme-missing-following-solidus
    Description: The input’s scheme is not followed by "//".
    Examples: "file:c:/my-secret-folder", "https:example.org"
    Failure: No

  • missing-scheme-non-relative-URL
    Description: The input is missing a scheme, because it does not begin with an ASCII alpha, and either no base URL was provided or the base URL cannot be used as a base URL because it has an opaque path.
    Cases: Input’s scheme is missing and no base URL is given. Input’s scheme is missing, but the base URL has an opaque path.
    Failure: Yes

  • invalid-reverse-solidus
    Description: The URL has a special scheme and it uses U+005C (\) instead of U+002F (/).
    Example: "https://example.org\path\to\file"
    Failure: No

  • invalid-credentials
    Description: The input includes credentials.
    Examples: "https://user@example.org", "https://user:pass@"
    Failure: Yes (only if there is no host)

  • host-missing
    Description: The input has a special scheme, but does not contain a host.
    Examples: "https://#fragment", "https://:443"
    Failure: Yes

  • port-out-of-range
    Description: The input’s port is too big.
    Example: "https://example.org:70000"
    Failure: Yes

  • port-invalid
    Description: The input’s port is invalid.
    Example: "https://example.org:7z"
    Failure: Yes

  • file-invalid-Windows-drive-letter
    Description: The input is a relative-URL string that starts with a Windows drive letter and the base URL’s scheme is "file".
    Failure: No

  • file-invalid-Windows-drive-letter-host
    Description: A file: URL’s host is a Windows drive letter.
    Example: "file://c:"
    Failure: No

1.2. Parsers

The EOF code point is a conceptual code point that signifies the end of a string or code point stream.

A pointer for a string input is an integer that points to a code point within input. Initially it points to the start of input. If it is −1 it points nowhere. If it is greater than or equal to input’s code point length, it points to the EOF code point.

When a pointer is used, c references the code point the pointer points to as long as it does not point nowhere. When the pointer points to nowhere c cannot be used.

When a pointer is used, remaining references the code point substring from the pointer + 1 to the end of the string, as long as c is not the EOF code point. When c is the EOF code point remaining cannot be used.

If "mailto:username@example" is a string being processed and a pointer points to @, c is U+0040 (@) and remaining is "example".

If the empty string is being processed and a pointer points to the start and is then decreased by 1, using c or remaining would be an error.
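The pointer, c, and remaining definitions above can be illustrated with a small Python sketch. The `Pointer` class and the `EOF` sentinel are hypothetical names chosen for this illustration and are not part of the standard; Python strings are indexed by code point, which matches the definition.

```python
EOF = object()  # hypothetical stand-in for the conceptual EOF code point


class Pointer:
    """Illustrative pointer over a string, indexed by code point."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.index = 0  # initially points to the start of the input

    @property
    def c(self):
        # c cannot be used when the pointer points nowhere (index below 0).
        if self.index < 0:
            raise ValueError("pointer points nowhere; c cannot be used")
        if self.index >= len(self.text):
            return EOF
        return self.text[self.index]

    @property
    def remaining(self) -> str:
        # remaining cannot be used when c is the EOF code point.
        if self.c is EOF:
            raise ValueError("c is the EOF code point; remaining cannot be used")
        return self.text[self.index + 1:]
```

With the "mailto:username@example" input from the example above, moving the pointer to the "@" gives `c == "@"` and `remaining == "example"`.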

1.3. Percent-encoded bytes

A percent-encoded byte is U+0025 (%), followed by two ASCII hex digits.

It is generally a good idea for sequences of percent-encoded bytes to be such that, when percent-decoded and then passed to UTF-8 decode without BOM or fail, they do not end up as failure. How important this is depends on where the percent-encoded bytes are used. E.g., for the host parser not following this advice is fatal, whereas for URL rendering the percent-encoded bytes would not be rendered percent-decoded.

To percent-encode a byte byte, return a string consisting of U+0025 (%), followed by two ASCII upper hex digits representing byte.

To percent-decode a byte sequence input, run these steps:

Using anything but UTF-8 decode without BOM when input contains bytes that are not ASCII bytes might be insecure and is not recommended.

  1. Let output be an empty byte sequence.

  2. For each byte byte in input:

    1. If byte is not 0x25 (%), then append byte to output.

    2. Otherwise, if byte is 0x25 (%) and the next two bytes after byte in input are not in the ranges 0x30 (0) to 0x39 (9), 0x41 (A) to 0x46 (F), and 0x61 (a) to 0x66 (f), all inclusive, append byte to output.

    3. Otherwise:

      1. Let bytePoint be the two bytes after byte in input, decoded, and then interpreted as hexadecimal number.

      2. Append a byte whose value is bytePoint to output.

      3. Skip the next two bytes in input.

  3. Return output.

To percent-decode a scalar value string input:

  1. Let bytes be the UTF-8 encoding of input.

  2. Return the percent-decoding of bytes.

In general, percent-encoding results in a string with more U+0025 (%) code points than the input, and percent-decoding results in a byte sequence with fewer 0x25 (%) bytes than the input.
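The percent-encode and percent-decode operations above can be sketched in Python as follows. The function names are illustrative only; the logic follows the steps as written, leaving a 0x25 (%) byte untouched unless it is followed by two ASCII hex digits.

```python
HEX_DIGITS = b"0123456789ABCDEFabcdef"


def percent_encode_byte(byte: int) -> str:
    # U+0025 (%) followed by two ASCII upper hex digits.
    return f"%{byte:02X}"


def percent_decode(data: bytes) -> bytes:
    output = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        following = data[i + 1:i + 3]
        is_escape = (
            byte == 0x25
            and len(following) == 2
            and all(b in HEX_DIGITS for b in following)
        )
        if is_escape:
            # Interpret the two bytes as a hexadecimal number, then skip them.
            output.append(int(following, 16))
            i += 3
        else:
            output.append(byte)
            i += 1
    return bytes(output)


def percent_decode_string(text: str) -> bytes:
    # Percent-decode the UTF-8 encoding of a scalar value string.
    return percent_decode(text.encode("utf-8"))
```

Note that the result is a byte sequence; decoding it to text is a separate step.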


The C0 control percent-encode set are the C0 controls and all code points greater than U+007E (~).

The fragment percent-encode set is the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+003C (<), U+003E (>), and U+0060 (`).

The query percent-encode set is the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).

The query percent-encode set cannot be defined in terms of the fragment percent-encode set due to the omission of U+0060 (`).

The special-query percent-encode set is the query percent-encode set and U+0027 (').

The path percent-encode set is the query percent-encode set and U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).

The userinfo percent-encode set is the path percent-encode set and U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+0040 (@), U+005B ([) to U+005E (^), inclusive, and U+007C (|).

The component percent-encode set is the userinfo percent-encode set and U+0024 ($) to U+0026 (&), inclusive, U+002B (+), and U+002C (,).

This is used by HTML for registerProtocolHandler(), and could also be used by other standards to percent-encode data that can then be embedded in a URL’s path, query, or fragment; or in an opaque host. Using it with UTF-8 percent-encode gives identical results to JavaScript’s encodeURIComponent() [sic]. [HTML] [ECMA-262]

The application/x-www-form-urlencoded percent-encode set is the component percent-encode set and U+0021 (!), U+0027 (') to U+0029 RIGHT PARENTHESIS, inclusive, and U+007E (~).

The application/x-www-form-urlencoded percent-encode set contains all code points, except the ASCII alphanumeric, U+002A (*), U+002D (-), U+002E (.), and U+005F (_).
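One way to see how the sets build on each other is to model each as a predicate over code points. The following Python sketch does that; the function and variable names are assumptions made for this illustration. It also checks the statement above about which ASCII code points are left out of the application/x-www-form-urlencoded percent-encode set.

```python
import string


def c0_control_set(cp: int) -> bool:
    # C0 controls (U+0000 to U+001F) and all code points greater than U+007E.
    return cp <= 0x1F or cp > 0x7E


def extend(base, extra: str):
    """Return a predicate for `base` plus the code points in `extra`."""
    extra_cps = {ord(ch) for ch in extra}
    return lambda cp: base(cp) or cp in extra_cps


fragment_set = extend(c0_control_set, ' "<>`')
query_set = extend(c0_control_set, ' "#<>')
special_query_set = extend(query_set, "'")
path_set = extend(query_set, "?`{}")
userinfo_set = extend(path_set, "/:;=@[\\]^|")
component_set = extend(userinfo_set, "$%&+,")
form_urlencoded_set = extend(component_set, "!'()~")


def unencoded_ascii(in_set) -> set:
    """ASCII code points (as characters) that `in_set` does not include."""
    return {chr(cp) for cp in range(0x80) if not in_set(cp)}


# Expected by the text above: ASCII alphanumerics plus "*", "-", ".", "_".
EXPECTED_FORM_UNENCODED = set(string.ascii_letters + string.digits + "*-._")
```

Comparing `unencoded_ascii(form_urlencoded_set)` with `EXPECTED_FORM_UNENCODED` confirms the summary sentence for the ASCII range.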

To percent-encode after encoding, given an encoding encoding, scalar value string input, a percentEncodeSet, and an optional boolean spaceAsPlus (default false):

  1. Let encoder be the result of getting an encoder from encoding.

  2. Let inputQueue be input converted to an I/O queue.

  3. Let output be the empty string.

  4. Let potentialError be 0.

    This needs to be a non-null value to initiate the subsequent while loop.

  5. While potentialError is non-null:

    1. Let encodeOutput be an empty I/O queue.

    2. Set potentialError to the result of running encode or fail with inputQueue, encoder, and encodeOutput.

    3. For each byte of encodeOutput converted to a byte sequence:

      1. If spaceAsPlus is true and byte is 0x20 (SP), then append U+002B (+) to output and continue.

      2. Let isomorph be a code point whose value is byte’s value.

      3. Assert: percentEncodeSet includes all non-ASCII code points.

      4. If isomorph is not in percentEncodeSet, then append isomorph to output.

      5. Otherwise, percent-encode byte and append the result to output.

    4. If potentialError is non-null, then append "%26%23", followed by the shortest sequence of ASCII digits representing potentialError in base ten, followed by "%3B", to output.

      This can happen when encoding is not UTF-8.

  6. Return output.

Of the possible values for the percentEncodeSet argument only two end up encoding U+0025 (%) and thus give “roundtripable data”: component percent-encode set and application/x-www-form-urlencoded percent-encode set. The other values for the percentEncodeSet argument — which happen to be used by the URL parser — leave U+0025 (%) untouched and as such it needs to be percent-encoded first in order to be properly represented.

To UTF-8 percent-encode a scalar value scalarValue using a percentEncodeSet, return the result of running percent-encode after encoding with UTF-8, scalarValue as a string, and percentEncodeSet.

To UTF-8 percent-encode a scalar value string input using a percentEncodeSet, return the result of running percent-encode after encoding with UTF-8, input, and percentEncodeSet.
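The percent-encode after encoding steps can be approximated in Python as shown below. This is a simplified sketch under stated assumptions, not the algorithm as specified: it encodes one code point at a time with Python's built-in codecs instead of the Encoding Standard's "encode or fail" over an I/O queue, so stateful encodings such as ISO-2022-JP are not handled faithfully, and Python's codecs are not guaranteed to match the Encoding Standard's tables. All names are illustrative.

```python
from typing import Callable

# ASCII members of the userinfo percent-encode set beyond the C0 control set.
_USERINFO_EXTRA = set(' "#<>?`{}/:;=@[\\]^|')


def in_userinfo_set(cp: int) -> bool:
    return cp <= 0x1F or cp > 0x7E or chr(cp) in _USERINFO_EXTRA


def percent_encode_after_encoding(
    encoding: str,
    text: str,
    in_set: Callable[[int], bool],
    space_as_plus: bool = False,
) -> str:
    output = []
    for ch in text:
        try:
            data = ch.encode(encoding)  # assumption: per-code-point encoding
        except UnicodeEncodeError:
            # Unencodable code point: emit a percent-encoded "&#N;" reference.
            output.append(f"%26%23{ord(ch)}%3B")
            continue
        for byte in data:
            if space_as_plus and byte == 0x20:
                output.append("+")
            elif not in_set(byte):
                output.append(chr(byte))
            else:
                output.append(f"%{byte:02X}")
    return "".join(output)


def utf8_percent_encode(text: str, in_set: Callable[[int], bool]) -> str:
    return percent_encode_after_encoding("utf-8", text, in_set)
```

Because the userinfo percent-encode set does not include U+0025 (%), an existing "%20" in the input passes through unchanged, which is the round-tripping caveat described above.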


Here is a summary, by way of example, of the operations defined above:

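| Operation | Input | Output |
| --- | --- | --- |
| Percent-encode input | 0x23 | "%23" |
| Percent-encode input | 0x7F | "%7F" |
| Percent-decode input | `%25%s%1G` | `%%s%1G` |
| Percent-decode input | "‽%25%2E" | 0xE2 0x80 0xBD 0x25 0x2E |
| Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set | " " | "%20" |
| Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set | "≡" | "%81%DF" |
| Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set | "‽" | "%26%238253%3B" |
| Percent-encode after encoding with ISO-2022-JP, input, and the userinfo percent-encode set | "¥" | "%1B(J\%1B(B" |
| Percent-encode after encoding with Shift_JIS, input, the userinfo percent-encode set, and true | "1+1 ≡ 2%20‽" | "1+1+%81%DF+2%20%26%238253%3B" |
| UTF-8 percent-encode input using the userinfo percent-encode set | U+2261 (≡) | "%E2%89%A1" |
| UTF-8 percent-encode input using the userinfo percent-encode set | U+203D (‽) | "%E2%80%BD" |
| UTF-8 percent-encode input using the userinfo percent-encode set | "Say what‽" | "Say%20what%E2%80%BD" |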

2. Security considerations

The security of a URL is a function of its environment. Care is to be taken when rendering, interpreting, and passing URLs around.

When rendering and allocating new URLs, "spoofing" needs to be considered: an attack whereby one host or URL can be confused for another. For instance, consider how 1/l/I, m/rn/rri, 0/O, and а/a can all appear eerily similar. Or worse, consider how U+202A LEFT-TO-RIGHT EMBEDDING and similar code points are invisible. [UTR36]

When passing a URL from party A to B, both need to carefully consider what is happening. A might end up leaking data it does not want to leak. B might receive input it did not expect and take an action that harms the user. In particular, B should never trust A, as at some point URLs from A can come from untrusted sources.

3. Hosts (domains and IP addresses)

At a high level, a host, valid host string, host parser, and host serializer relate as follows:

A parse-serialize roundtrip gives the following results, depending on the isNotSpecial argument to the host parser:

| Input | Output (isNotSpecial = false) | Output (isNotSpecial = true) |
| --- | --- | --- |
| EXAMPLE.COM | example.com (domain) | EXAMPLE.COM (opaque host) |
| example%2Ecom | example.com (domain) | example%2Ecom (opaque host) |
| faß.example | xn--fa-hia.example (domain) | fa%C3%9F.example (opaque host) |
| 0 | 0.0.0.0 (IPv4) | 0 (opaque host) |
| %30 | 0.0.0.0 (IPv4) | %30 (opaque host) |
| 0x | 0.0.0.0 (IPv4) | 0x (opaque host) |
| 0xffffffff | 255.255.255.255 (IPv4) | 0xffffffff (opaque host) |
| [0:0::1] | [::1] (IPv6) | [::1] (IPv6) |
| [0:0::1%5D | Failure | Failure |
| [0:0::%31] | Failure | Failure |
| 09 | Failure | 09 (opaque host) |
| example.255 | Failure | example.255 (opaque host) |
| example^example | Failure | Failure |

3.1. Host representation

A host is a domain, an IP address, an opaque host, or an empty host. Typically a host serves as a network address, but it is sometimes used as an opaque identifier in URLs where a network address is not necessary.

A typical URL whose host is an opaque host is git://github.com/whatwg/url.git.

The RFCs referenced in the paragraphs below are for informative purposes only. They have no influence on host writing, parsing, and serialization, unless stated otherwise in the sections that follow.

A domain is a non-empty ASCII string that identifies a realm within a network. [RFC1034]

The domain labels of a domain domain are the result of strictly splitting domain on U+002E (.).

The example.com and example.com. domains are not equivalent and are typically treated as distinct.

An IP address is an IPv4 address or an IPv6 address.

An IPv4 address is a 32-bit unsigned integer that identifies a network address. [RFC791]

An IPv6 address is a 128-bit unsigned integer that identifies a network address. For the purposes of this standard it is represented as a list of eight 16-bit unsigned integers, also known as IPv6 pieces. [RFC4291]

Support for <zone_id> is intentionally omitted.

An opaque host is a non-empty ASCII string that can be used for further processing.

An empty host is the empty string.

3.2. Host miscellaneous

A forbidden host code point is U+0000 NULL, U+0009 TAB, U+000A LF, U+000D CR, U+0020 SPACE, U+0023 (#), U+002F (/), U+003A (:), U+003C (<), U+003E (>), U+003F (?), U+0040 (@), U+005B ([), U+005C (\), U+005D (]), U+005E (^), or U+007C (|).

A forbidden domain code point is a forbidden host code point, a C0 control, U+0025 (%), or U+007F DELETE.

To obtain the public suffix of a host host, run these steps. They return null or a domain representing a portion of host that is included on the Public Suffix List. [PSL]

  1. If host is not a domain, then return null.

  2. Let trailingDot be "." if host ends with "."; otherwise the empty string.

  3. Let publicSuffix be the public suffix determined by running the Public Suffix List algorithm with host as domain. [PSL]

  4. Assert: publicSuffix is an ASCII string that does not end with ".".

  5. Return publicSuffix and trailingDot concatenated.

To obtain the registrable domain of a host host, run these steps. They return null or a domain formed by host’s public suffix and the domain label preceding it, if any.

  1. If host’s public suffix is null or host’s public suffix equals host, then return null.

  2. Let trailingDot be "." if host ends with "."; otherwise the empty string.

  3. Let registrableDomain be the registrable domain determined by running the Public Suffix List algorithm with host as domain. [PSL]

  4. Assert: registrableDomain is an ASCII string that does not end with ".".

  5. Return registrableDomain and trailingDot concatenated.

| Host input | Public suffix | Registrable domain |
| --- | --- | --- |
| com | com | null |
| example.com | com | example.com |
| www.example.com | com | example.com |
| sub.www.example.com | com | example.com |
| EXAMPLE.COM | com | example.com |
| example.com. | com. | example.com. |
| github.io | github.io | null |
| whatwg.github.io | github.io | whatwg.github.io |
| إختبار | xn--kgbechtv | null |
| example.إختبار | xn--kgbechtv | example.xn--kgbechtv |
| sub.example.إختبار | xn--kgbechtv | example.xn--kgbechtv |
| [2001:0db8:85a3:0000:0000:8a2e:0370:7334] | null | null |
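The trailing-dot handling in the public suffix and registrable domain steps can be sketched in Python. This sketch makes several assumptions: the input is already a parsed domain (ASCII, lowercased, not an IP address), and `TOY_SUFFIX_RULES` is a tiny hypothetical stand-in for the real Public Suffix List, matched by longest suffix with the list's default "*" rule as a fallback.

```python
from typing import List, Optional

# Hypothetical miniature rule set; the real Public Suffix List is much larger.
TOY_SUFFIX_RULES = {"com", "io", "github.io", "xn--kgbechtv"}


def _split(domain: str) -> "tuple[List[str], str]":
    """Return (labels without the trailing dot, trailing dot or '')."""
    trailing_dot = "." if domain.endswith(".") else ""
    bare = domain[:-1] if trailing_dot else domain
    return bare.split("."), trailing_dot


def public_suffix(domain: str) -> str:
    labels, trailing_dot = _split(domain)
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in TOY_SUFFIX_RULES:
            return candidate + trailing_dot
    # Default "*" rule: the last label is the public suffix.
    return labels[-1] + trailing_dot


def registrable_domain(domain: str) -> Optional[str]:
    suffix = public_suffix(domain)
    if suffix == domain:
        return None
    labels, trailing_dot = _split(domain)
    suffix_labels, _ = _split(suffix)
    # Public suffix plus the one label preceding it.
    return ".".join(labels[-(len(suffix_labels) + 1):]) + trailing_dot
```

With these toy rules, "example.com." yields "com." and "example.com.", matching the corresponding table row.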

Specifications should prefer the origin concept for security decisions. The notion of "public suffix" and "registrable domain" cannot be relied upon to provide a hard security boundary, as the public suffix list will diverge from client to client. Specifications which ignore this advice are encouraged to carefully consider whether URLs' schemes ought to be incorporated into any decisions made, i.e. whether to use the same site or schemelessly same site concepts.

3.3. IDNA

The domain to ASCII algorithm, given a string domain and a boolean beStrict, runs these steps:

  1. Let result be the result of running Unicode ToASCII with domain_name set to domain, UseSTD3ASCIIRules set to beStrict, CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, Transitional_Processing set to false, and VerifyDnsLength set to beStrict. [UTS46]

    If beStrict is false, domain is an ASCII string, and strictly splitting domain on U+002E (.) does not produce any item that starts with an ASCII case-insensitive match for "xn--", this step is equivalent to ASCII lowercasing domain.

  2. If result is a failure value, domain-to-ASCII validation error, return failure.

  3. If result is the empty string, domain-to-ASCII validation error, return failure.

  4. Return result.
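The note under step 1 describes a case where Unicode ToASCII reduces to ASCII lowercasing. A minimal Python sketch of that shortcut follows; the function name is hypothetical, it only covers the beStrict = false case, and any input it declines (by returning None) would still need full UTS46 processing, which is not shown here. The caller would also still apply the empty-string check in step 3.

```python
from typing import Optional


def domain_to_ascii_fast_path(domain: str) -> Optional[str]:
    """Return the lowercased domain when the shortcut applies, else None."""
    if not domain.isascii():
        return None
    # Strictly split on U+002E (.) and look for labels starting with "xn--"
    # (ASCII case-insensitively); those need real IDNA processing.
    if any(label.lower().startswith("xn--") for label in domain.split(".")):
        return None
    return domain.lower()
```

For instance, "EXAMPLE.COM" takes the shortcut, while "faß.example" and "XN--fa-hia.example" do not.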

This document and the web platform at large use Unicode IDNA Compatibility Processing and not IDNA2008. For instance, ☕.example becomes xn--53h.example and not failure. [UTS46] [RFC5890]

The domain to Unicode algorithm, given a domain domain and a boolean beStrict, runs these steps:

  1. Let result be the result of running Unicode ToUnicode with domain_name set to domain, CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, UseSTD3ASCIIRules set to beStrict, and Transitional_Processing set to false. [UTS46]

  2. Signify domain-to-Unicode validation errors for any returned errors, and then, return result.

3.4. Host writing

A valid host string must be a valid domain string, a valid IPv4-address string, or: U+005B ([), followed by a valid IPv6-address string, followed by U+005D (]).

A domain is a valid domain if these steps return success:

  1. Let result be the result of running domain to ASCII with domain and true.

  2. If result is failure, then return failure.

  3. Set result to the result of running domain to Unicode with result and true.

  4. If result contains any errors, return failure.

  5. Return success.

Ideally we define this in terms of a sequence of code points that make up a valid domain rather than through a whack-a-mole: issue 245.

A valid domain string must be a string that is a valid domain.

A valid IPv4-address string must be four shortest possible strings of ASCII digits, representing a decimal number in the range 0 to 255, inclusive, separated from each other by U+002E (.).
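As an illustration of the valid IPv4-address string definition, here is a small Python check. The function name is an assumption for this sketch. Note that this is the strict writing rule; the IPv4 parser described later is more lenient (it also accepts hexadecimal, octal, and fewer than four parts).

```python
ASCII_DIGITS = set("0123456789")


def is_valid_ipv4_address_string(text: str) -> bool:
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each part must be a non-empty run of ASCII digits.
        if not part or any(ch not in ASCII_DIGITS for ch in part):
            return False
        # "Shortest possible" rules out leading zeros such as "00" or "01".
        if len(part) > 1 and part[0] == "0":
            return False
        if int(part) > 255:
            return False
    return True
```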

A valid IPv6-address string is defined in the "Text Representation of Addresses" chapter of IP Version 6 Addressing Architecture. [RFC4291]

A valid opaque-host string must be one of the following:

This is not part of the definition of valid host string as it requires context to be distinguished.

3.5. Host parsing

The host parser takes a scalar value string input with an optional boolean isNotSpecial (default false), and then runs these steps. They return failure or a host.

  1. If input starts with U+005B ([), then:

    1. If input does not end with U+005D (]), IPv6-unclosed validation error, return failure.

    2. Return the result of IPv6 parsing input with its leading U+005B ([) and trailing U+005D (]) removed.

  2. If isNotSpecial is true, then return the result of opaque-host parsing input.

  3. Assert: input is not the empty string.

  4. Let domain be the result of running UTF-8 decode without BOM on the percent-decoding of input.

    Alternatively UTF-8 decode without BOM or fail can be used, coupled with an early return for failure, as domain to ASCII fails on U+FFFD (�).

  5. Let asciiDomain be the result of running domain to ASCII with domain and false.

  6. If asciiDomain is failure, then return failure.

  7. If asciiDomain contains a forbidden domain code point, domain-invalid-code-point validation error, return failure.

  8. If asciiDomain ends in a number, then return the result of IPv4 parsing asciiDomain.

  9. Return asciiDomain.


The ends in a number checker takes an ASCII string input and then runs these steps. They return a boolean.

  1. Let parts be the result of strictly splitting input on U+002E (.).

  2. If the last item in parts is the empty string, then:

    1. If parts’s size is 1, then return false.

    2. Remove the last item from parts.

  3. Let last be the last item in parts.

  4. If last is non-empty and contains only ASCII digits, then return true.

    The erroneous input "09" will be caught by the IPv4 parser at a later stage.

  5. If parsing last as an IPv4 number does not return failure, then return true.

    This is equivalent to checking that last is "0X" or "0x", followed by zero or more ASCII hex digits.

  6. Return false.
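The ends in a number checker can be sketched in Python as below. The function name is illustrative. Instead of calling an IPv4 number parser in step 5, this sketch relies on the equivalence stated in the note: the last part is "0X" or "0x" followed by zero or more ASCII hex digits.

```python
ASCII_DIGITS = set("0123456789")
ASCII_HEX_DIGITS = set("0123456789abcdefABCDEF")


def ends_in_a_number(host: str) -> bool:
    parts = host.split(".")  # str.split keeps empty items, i.e. a strict split
    if parts[-1] == "":
        if len(parts) == 1:
            return False
        parts.pop()
    last = parts[-1]
    if last and all(ch in ASCII_DIGITS for ch in last):
        return True
    # Step 5, using the equivalence from the note above.
    return last[:2] in ("0x", "0X") and all(
        ch in ASCII_HEX_DIGITS for ch in last[2:]
    )
```

A true result only means the host is routed to the IPv4 parser; inputs such as "09" or "example.255" still fail there.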

The IPv4 parser takes an ASCII string input and then runs these steps. They return failure or an IPv4 address.

The IPv4 parser is not to be invoked directly. Instead check that the return value of the host parser is an IPv4 address.

  1. Let parts be the result of strictly splitting input on U+002E (.).

  2. If the last item in parts is the empty string, then:

    1. IPv4-empty-part validation error.

    2. If parts’s size is greater than 1, then remove the last item from parts.

  3. If parts’s size is greater than 4, IPv4-too-many-parts validation error, return failure.

  4. Let numbers be an empty list.

  5. For each part of parts:

    1. Let result be the result of parsing part with the IPv4 number parser.

    2. If result is failure, IPv4-non-numeric-part validation error, return failure.

    3. If result[1] is true, IPv4-non-decimal-part validation error.

    4. Append result[0] to numbers.

  6. If any item in numbers is greater than 255, IPv4-out-of-range-part validation error.

  7. If any but the last item in numbers is greater than 255, then return failure.

  8. If the last item in numbers is greater than or equal to 256^(5 − numbers’s size), then return failure.

  9. Let ipv4 be the last item in numbers.

  10. Remove the last item from numbers.

  11. Let counter be 0.

  12. For each n of numbers:

    1. Increment ipv4 by n × 256^(3 − counter).

    2. Increment counter by 1.

  13. Return ipv4.

The IPv4 number parser takes an ASCII string input and then runs these steps. They return failure or a tuple of a number and a boolean.

  1. If input is the empty string, then return failure.

  2. Let validationError be false.

  3. Let R be 10.

  4. If input contains at least two code points and the first two code points are either "0X" or "0x", then:

    1. Set validationError to true.

    2. Remove the first two code points from input.

    3. Set R to 16.

  5. Otherwise, if input contains at least two code points and the first code point is U+0030 (0), then:

    1. Set validationError to true.

    2. Remove the first code point from input.

    3. Set R to 8.

  6. If input is the empty string, then return (0, true).

  7. If input contains a code point that is not a radix-R digit, then return failure.

  8. Let output be the mathematical integer value that is represented by input in radix-R notation, using ASCII hex digits for digits with values 0 through 15.

  9. Return (output, validationError).
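The IPv4 parser and IPv4 number parser steps translate fairly directly into Python. In this sketch, failure is represented by None and validation errors are only noted in comments rather than reported; the function names are illustrative.

```python
from typing import Optional, Tuple


def parse_ipv4_number(text: str) -> Optional[Tuple[int, bool]]:
    if text == "":
        return None
    validation_error = False
    radix = 10
    if len(text) >= 2 and text[:2] in ("0x", "0X"):
        validation_error, text, radix = True, text[2:], 16
    elif len(text) >= 2 and text[0] == "0":
        validation_error, text, radix = True, text[1:], 8
    if text == "":
        return (0, True)
    digits = "0123456789abcdef"[:radix]
    if any(ch.lower() not in digits for ch in text):
        return None
    return (int(text, radix), validation_error)


def parse_ipv4(text: str) -> Optional[int]:
    parts = text.split(".")
    if parts[-1] == "":
        # IPv4-empty-part validation error (does not cause failure).
        if len(parts) > 1:
            parts.pop()
    if len(parts) > 4:
        return None  # IPv4-too-many-parts
    numbers = []
    for part in parts:
        result = parse_ipv4_number(part)
        if result is None:
            return None  # IPv4-non-numeric-part
        numbers.append(result[0])
    if any(n > 255 for n in numbers[:-1]):
        return None  # IPv4-out-of-range-part in a non-last part
    if numbers[-1] >= 256 ** (5 - len(numbers)):
        return None
    ipv4 = numbers.pop()
    for counter, n in enumerate(numbers):
        ipv4 += n * 256 ** (3 - counter)
    return ipv4
```

The last part may exceed 255 because it fills all remaining bytes, which is why "0xffffffff" parses to the same address as "255.255.255.255".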


The IPv6 parser takes a scalar value string input and then runs these steps. They return failure or an IPv6 address.

The IPv6 parser could in theory be invoked directly, but please discuss actually doing that with the editors of this document first.

  1. Let address be a new IPv6 address whose IPv6 pieces are all 0.

  2. Let pieceIndex be 0.

  3. Let compress be null.

  4. Let pointer be a pointer for input.

  5. If c is U+003A (:), then:

    1. If remaining does not start with U+003A (:), IPv6-invalid-compression validation error, return failure.

    2. Increase pointer by 2.

    3. Increase pieceIndex by 1 and then set compress to pieceIndex.

  6. While c is not the EOF code point:

    1. If pieceIndex is 8, IPv6-too-many-pieces validation error, return failure.

    2. If c is U+003A (:), then:

      1. If compress is non-null, IPv6-multiple-compression validation error, return failure.

      2. Increase pointer and pieceIndex by 1, set compress to pieceIndex, and then continue.

    3. Let value and length be 0.

    4. While length is less than 4 and c is an ASCII hex digit, set value to value × 0x10 + c interpreted as hexadecimal number, and increase pointer and length by 1.

    5. If c is U+002E (.), then:

      1. If length is 0, IPv4-in-IPv6-invalid-code-point validation error, return failure.

      2. Decrease pointer by length.

      3. If pieceIndex is greater than 6, IPv4-in-IPv6-too-many-pieces validation error, return failure.

      4. Let numbersSeen be 0.

      5. While c is not the EOF code point:

        1. Let ipv4Piece be null.

        2. If numbersSeen is greater than 0, then:

          1. If c is a U+002E (.) and numbersSeen is less than 4, then increase pointer by 1.

          2. Otherwise, IPv4-in-IPv6-invalid-code-point validation error, return failure.

        3. If c is not an ASCII digit, IPv4-in-IPv6-invalid-code-point validation error, return failure.

        4. While c is an ASCII digit:

          1. Let number be c interpreted as decimal number.

          2. If ipv4Piece is null, then set ipv4Piece to number.

            Otherwise, if ipv4Piece is 0, IPv4-in-IPv6-invalid-code-point validation error, return failure.

            Otherwise, set ipv4Piece to ipv4Piece × 10 + number.

          3. If ipv4Piece is greater than 255, IPv4-in-IPv6-out-of-range-part validation error, return failure.

          4. Increase pointer by 1.

        5. Set address[pieceIndex] to address[pieceIndex] × 0x100 + ipv4Piece.

        6. Increase numbersSeen by 1.

        7. If numbersSeen is 2 or 4, then increase pieceIndex by 1.

      6. If numbersSeen is not 4, IPv4-in-IPv6-too-few-parts validation error, return failure.

      7. Break.

    6. Otherwise, if c is U+003A (:):

    7. Otherwise, if c is not the EOF code point, IPv6-invalid-code-point validation error, return failure.

    8. Set address[pieceIndex] to value.

    9. Increase pieceIndex by 1.

  7. If compress is non-null, then:

    1. Let swaps be pieceIndex − compress.

    2. Set pieceIndex to 7.

    3. While pieceIndex is not 0 and swaps is greater than 0, swap address[pieceIndex] with address[compress + swaps − 1], and then decrease both pieceIndex and swaps by 1.

  8. Otherwise, if compress is null and pieceIndex is not 8, IPv6-too-few-pieces validation error, return failure.

  9. Return address.
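The steps above can be sketched in Python as follows. This is an illustrative sketch, not a normative implementation. The initial values of address, pieceIndex, and compress, and the two substeps of step 6 (advance the pointer, then fail on end of input), are not shown in the text above and are assumptions here. The name `parse_ipv6`, the use of `None` for both failure and the EOF code point, and the omission of validation-error reporting are likewise choices made for this sketch.

```python
import string

def parse_ipv6(text):
    """Return a list of eight 16-bit pieces, or None on failure."""
    address = [0] * 8      # assumed initial state (not shown above)
    piece_index = 0        # assumed initial state
    compress = None        # assumed initial state
    pointer = 0

    def c(offset=0):
        # Code point at pointer + offset; None stands in for EOF.
        i = pointer + offset
        return text[i] if i < len(text) else None

    if c() == ":":
        if c(1) != ":":
            return None
        pointer += 2
        piece_index += 1
        compress = piece_index
    while c() is not None:
        if piece_index == 8:
            return None
        if c() == ":":
            if compress is not None:
                return None
            pointer += 1
            piece_index += 1
            compress = piece_index
            continue
        value = length = 0
        while length < 4 and c() is not None and c() in string.hexdigits:
            value = value * 0x10 + int(c(), 16)
            pointer += 1
            length += 1
        if c() == ".":
            # Embedded IPv4 address: rewind and re-read the digits as decimal.
            if length == 0:
                return None
            pointer -= length
            if piece_index > 6:
                return None
            numbers_seen = 0
            while c() is not None:
                ipv4_piece = None
                if numbers_seen > 0:
                    if c() == "." and numbers_seen < 4:
                        pointer += 1
                    else:
                        return None
                if c() is None or c() not in string.digits:
                    return None
                while c() is not None and c() in string.digits:
                    number = int(c())
                    if ipv4_piece is None:
                        ipv4_piece = number
                    elif ipv4_piece == 0:
                        return None  # leading zero
                    else:
                        ipv4_piece = ipv4_piece * 10 + number
                    if ipv4_piece > 255:
                        return None
                    pointer += 1
                address[piece_index] = address[piece_index] * 0x100 + ipv4_piece
                numbers_seen += 1
                if numbers_seen in (2, 4):
                    piece_index += 1
            if numbers_seen != 4:
                return None
            break
        elif c() == ":":
            # Assumed substeps of step 6: skip the colon, fail at end of input.
            pointer += 1
            if c() is None:
                return None
        elif c() is not None:
            return None
        address[piece_index] = value
        piece_index += 1
    if compress is not None:
        # Move the pieces parsed after "::" to the end of the address.
        swaps = piece_index - compress
        piece_index = 7
        while piece_index != 0 and swaps > 0:
            j = compress + swaps - 1
            address[piece_index], address[j] = address[j], address[piece_index]
            piece_index -= 1
            swaps -= 1
    elif piece_index != 8:
        return None
    return address
```

Under these assumptions, `parse_ipv6("::1")` yields seven zero pieces followed by 1, and an input with two "::" sequences yields failure.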


3.6. Host serializing

The host serializer takes a host host and then runs these steps. They return an ASCII string.

  1. If host is an IPv4 address, return the result of running the IPv4 serializer on host.

  2. Otherwise, if host is an IPv6 address, return U+005B ([), followed by the result of running the IPv6 serializer on host, followed by U+005D (]).

  3. Otherwise, host is a domain, opaque host, or empty host, return host.

The IPv4 serializer takes an IPv4 address address and then runs these steps. They return an ASCII string.

  1. Let output be the empty string.

  2. Let n be the value of address.

  3. For each i in the range 1 to 4, inclusive:

    1. Prepend n % 256, serialized, to output.

    2. If i is not 4, then prepend U+002E (.) to output.

    3. Set n to floor(n / 256).

  4. Return output.
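As a minimal sketch of the IPv4 serializer steps above, assuming the IPv4 address is held as a plain 32-bit Python integer (the function name `serialize_ipv4` is made up for this example):

```python
def serialize_ipv4(address):
    """Serialize a 32-bit integer as dotted-decimal, lowest byte first."""
    output = ""
    n = address
    for i in range(1, 5):
        # Prepend the lowest remaining byte, then the separator.
        output = str(n % 256) + output
        if i != 4:
            output = "." + output
        n //= 256  # floor(n / 256)
    return output

print(serialize_ipv4(0xC0A80001))  # 192.168.0.1
```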

The IPv6 serializer takes an IPv6 address address and then runs these steps. They return an ASCII string.

  1. Let output be the empty string.

  2. Let compress be an index to the first IPv6 piece in the first longest sequence of address’s IPv6 pieces that are 0.

    In 0:f:0:0:f:f:0:0 it would point to the second 0.

  3. If there is no sequence of address’s IPv6 pieces that are 0 that is longer than 1, then set compress to null.

  4. Let ignore0 be false.

  5. For each pieceIndex in the range 0 to 7, inclusive:

    1. If ignore0 is true and address[pieceIndex] is 0, then continue.

    2. Otherwise, if ignore0 is true, set ignore0 to false.

    3. If compress is pieceIndex, then:

      1. Let separator be "::" if pieceIndex is 0, and U+003A (:) otherwise.

      2. Append separator to output.

      3. Set ignore0 to true and continue.

    4. Append address[pieceIndex], represented as the shortest possible lowercase hexadecimal number, to output.

    5. If pieceIndex is not 7, then append U+003A (:) to output.

  6. Return output.

This algorithm requires the recommendation from A Recommendation for IPv6 Address Text Representation. [RFC5952]
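The IPv6 serializer steps can be sketched as follows, assuming the address is a list of eight integers. The helper `find_compress` and the function name `serialize_ipv6` are hypothetical names introduced for this sketch; the helper covers steps 2 and 3 by returning the start of the first longest run of zero pieces, or `None` if no run is longer than 1.

```python
def find_compress(pieces):
    """Index of the first piece in the first longest run of zeros, or None."""
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for i, piece in enumerate(pieces):
        if piece == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            # Strictly greater, so the first of several equal runs wins.
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start if best_len > 1 else None

def serialize_ipv6(pieces):
    output = ""
    compress = find_compress(pieces)
    ignore0 = False
    for piece_index in range(8):
        if ignore0 and pieces[piece_index] == 0:
            continue
        elif ignore0:
            ignore0 = False
        if compress == piece_index:
            output += "::" if piece_index == 0 else ":"
            ignore0 = True
            continue
        # Shortest possible lowercase hexadecimal representation.
        output += format(pieces[piece_index], "x")
        if piece_index != 7:
            output += ":"
    return output

print(serialize_ipv6([0, 0xF, 0, 0, 0xF, 0xF, 0, 0]))  # 0:f::f:f:0:0
```

For the address 0:f:0:0:f:f:0:0 mentioned above, `find_compress` returns 2, the index of the second 0.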

3.7. Host equivalence

To determine whether a host A equals host B, return true if A is B, and false otherwise.

Certificate comparison requires a host equivalence check that ignores the trailing dot of a domain (if any). However, those hosts also have various other facets enforced, such as DNS length, that are not enforced here, as URLs do not enforce them. If anyone has a good suggestion for how to bring these two closer together, or what a good unified model would be, please file an issue.

4. URLs

At a high level, a URL, valid URL string, URL parser, and URL serializer relate as follows:

Input | Base | Valid | Output
https:example.org | | | https://example.org/
https://////example.com/// | | | https://example.com///
https://example.com/././foo | | | https://example.com/foo
hello:world | https://example.com/ | | hello:world
https:example.org | https://example.com/ | | https://example.com/example.org
\example\..\demo/.\ | https://example.com/ | | https://example.com/demo/
example | https://example.com/demo | | https://example.com/example
file:///C|/demo | | | file:///C:/demo
.. | file:///C:/demo | | file:///C:/
file://loc%61lhost/ | | | file:///
https://user:password@example.org/ | | | https://user:password@example.org/
https://example.org/foo bar | | | https://example.org/foo%20bar
https://EXAMPLE.com/../x | | | https://example.com/x
https://ex ample.org/ | | | Failure
example | | ❌, due to lack of base | Failure
https://example.com:demo | | | Failure
http://[www.example.com]/ | | | Failure
https://example.org// | | | https://example.org//
https://example.com/[]?[]#[] | | | https://example.com/[]?[]#[]
https://example/%?%#% | | | https://example/%?%#%
https://example/%25?%25#%25 | | | https://example/%25?%25#%25

The base and output URL are represented in serialized form for brevity.

4.1. URL representation

A URL is a struct that represents a universal identifier. To disambiguate from a valid URL string it can also be referred to as a URL record.

A URL’s scheme is an ASCII string that identifies the type of URL and can be used to dispatch a URL for further processing after parsing. It is initially the empty string.

A URL’s username is an ASCII string identifying a username. It is initially the empty string.

A URL’s password is an ASCII string identifying a password. It is initially the empty string.

A URL’s host is null or a host. It is initially null.

The following table lists allowed URL’s scheme / host combinations.

A URL’s port is either null or a 16-bit unsigned integer that identifies a networking port. It is initially null.

A URL’s path is either a URL path segment or a list of zero or more URL path segments, usually identifying a location. It is initially « ».

A special URL’s path is always a list, i.e., it is never opaque.

A URL’s query is either null or an ASCII string. It is initially null.

A URL’s fragment is either null or an ASCII string that can be used for further processing on the resource the URL’s other components identify. It is initially null.

A URL also has an associated blob URL entry that is either null or a blob URL entry. It is initially null.

This is used to support caching the object a "blob" URL refers to as well as its origin. It is important that these are cached as the URL might be removed from the blob URL store between parsing and fetching, while fetching will still need to succeed.

The following table lists how valid URL strings, when parsed, map to a URL’s components. Username, password, and blob URL entry are omitted; in the examples below they are the empty string, the empty string, and null, respectively.

Input | Scheme | Host | Port | Path | Query | Fragment
https://example.com/ | "https" | "example.com" | null | « the empty string » | null | null
https://localhost:8000/search?q=text#hello | "https" | "localhost" | 8000 | « "search" » | "q=text" | "hello"
urn:isbn:9780307476463 | "urn" | null | null | "isbn:9780307476463" | null | null
file:///ada/Analytical%20Engine/README.md | "file" | null | null | « "ada", "Analytical%20Engine", "README.md" » | null | null

A URL path segment is an ASCII string. It commonly refers to a directory or a file, but has no predefined meaning.

A single-dot URL path segment is a URL path segment that is "." or an ASCII case-insensitive match for "%2e".

A double-dot URL path segment is a URL path segment that is ".." or an ASCII case-insensitive match for ".%2e", "%2e.", or "%2e%2e".
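The two definitions above could be checked along these lines. This is a small sketch; the function names and the `ascii_lower` helper are introduced here for illustration only.

```python
def ascii_lower(s):
    # Lowercase only A-Z so the comparison is ASCII case-insensitive.
    return "".join(chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in s)

def is_single_dot_segment(segment):
    return segment == "." or ascii_lower(segment) == "%2e"

def is_double_dot_segment(segment):
    return ascii_lower(segment) in ("..", ".%2e", "%2e.", "%2e%2e")

print(is_single_dot_segment("%2E"))     # True
print(is_double_dot_segment(".%2E"))    # True
print(is_double_dot_segment("..."))     # False
```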

4.2. URL miscellaneous

A special scheme is an ASCII string that is listed in the first column of the following table. The default port for a special scheme is listed in the second column on the same row. The default port for any other ASCII string is null.

Special scheme Default port
"ftp" 21
"file" null
"http" 80
"https" 443
"ws" 80
"wss" 443

A URL is special if its scheme is a special scheme. A URL is not special if its scheme is not a special scheme.
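One way to model the special scheme table is a plain mapping from scheme to default port, with `None` standing in for null. The names `SPECIAL_SCHEMES`, `is_special_scheme`, and `default_port` are assumptions of this sketch.

```python
# Special schemes and their default ports, as listed in the table above.
SPECIAL_SCHEMES = {
    "ftp": 21,
    "file": None,
    "http": 80,
    "https": 443,
    "ws": 80,
    "wss": 443,
}

def is_special_scheme(scheme):
    return scheme in SPECIAL_SCHEMES

def default_port(scheme):
    # Any scheme outside the table has a null default port.
    return SPECIAL_SCHEMES.get(scheme)

print(is_special_scheme("https"), default_port("https"))  # True 443
print(is_special_scheme("urn"), default_port("urn"))      # False None
```

Note that "file" is special even though its default port is null, so membership and port lookup have to be separate checks.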

A URL includes credentials if its username or password is not the empty string.

A URL has an opaque path if its path is a URL path segment.

A URL cannot have a username/password/port if its host is null or the empty string, or its scheme is "file".

A URL can be designated as base URL.

A base URL is useful for the URL parser when the input might be a relative-URL string.


A Windows drive letter is two code points, of which the first is an ASCII alpha and the second is either U+003A (:) or U+007C (|).

A normalized Windows drive letter is a Windows drive letter of which the second code point is U+003A (:).

As per the URL writing section, only a normalized Windows drive letter is conforming.

A string starts with a Windows drive letter if all of the following are true:

  • its length is greater than or equal to 2
  • its first two code points are a Windows drive letter
  • its length is 2 or its third code point is U+002F (/), U+005C (\), U+003F (?), or U+0023 (#).

String | Starts with a Windows drive letter
"c:" | ✅
"c:/" | ✅
"c:a" | ❌

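The Windows drive letter definitions above translate fairly directly into predicates. The following is a sketch with function names chosen for this example.

```python
import string

def is_windows_drive_letter(s):
    # Two code points: an ASCII alpha followed by ":" or "|".
    return len(s) == 2 and s[0] in string.ascii_letters and s[1] in ":|"

def is_normalized_windows_drive_letter(s):
    return is_windows_drive_letter(s) and s[1] == ":"

def starts_with_windows_drive_letter(s):
    return (
        len(s) >= 2
        and is_windows_drive_letter(s[:2])
        and (len(s) == 2 or s[2] in "/\\?#")
    )

for candidate in ("c:", "c:/", "c:a"):
    print(candidate, starts_with_windows_drive_letter(candidate))
# c: True
# c:/ True
# c:a False
```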
To shorten a url’s path:

  1. Assert: url does not have an opaque path.

  2. Let path be url’s path.

  3. If url’s scheme is "file", path’s size is 1, and path[0] is a normalized Windows drive letter, then return.

  4. Remove path’s last item, if any.
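A sketch of the shorten steps, assuming the path is a mutable Python list of segments and the scheme is passed separately (both are simplifications for this example, as is the function name `shorten_path`):

```python
import string

def is_normalized_windows_drive_letter(segment):
    return (
        len(segment) == 2
        and segment[0] in string.ascii_letters
        and segment[1] == ":"
    )

def shorten_path(scheme, path):
    """Remove the last segment in place, keeping a lone file drive letter."""
    assert isinstance(path, list)  # url does not have an opaque path
    if (
        scheme == "file"
        and len(path) == 1
        and is_normalized_windows_drive_letter(path[0])
    ):
        return
    if path:
        path.pop()

path = ["C:", "demo"]
shorten_path("file", path)
print(path)  # ['C:']
shorten_path("file", path)
print(path)  # ['C:']
```

The second call leaves the path unchanged, which is why ".." against the base file:///C:/demo cannot climb above the drive letter.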

4.3. URL writing

A valid URL string must be either a relative-URL-with-fragment string or an absolute-URL-with-fragment string.

An absolute-URL-with-fragment string must be an absolute-URL string, optionally followed by U+0023 (#) and a URL-fragment string.

An absolute-URL string must be one of the following:

any optionally followed by U+003F (?) and a URL-query string.

A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.). Schemes should be registered in the IANA URI [sic] Schemes registry. [IANA-URI-SCHEMES] [RFC7595]
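The URL-scheme string grammar can be expressed as a regular expression. This sketch only checks the syntax. It does not consult the IANA registry, and the names `URL_SCHEME` and `is_url_scheme_string` are made up for illustration.

```python
import re

# One ASCII alpha, then zero or more of ASCII alphanumeric, "+", "-", ".".
URL_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")

def is_url_scheme_string(s):
    return URL_SCHEME.fullmatch(s) is not None

print(is_url_scheme_string("web+demo"))  # True
print(is_url_scheme_string("1http"))     # False
```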

A relative-URL-with-fragment string must be a relative-URL string, optionally followed by U+0023 (#) and a URL-fragment string.

A relative-URL string must be one of the following, switching on base URL’s scheme:

A special scheme that is not "file"

a scheme-relative-special-URL string

a path-absolute-URL string

a path-relative-scheme-less-URL string

"file"

a scheme-relative-file-URL string

a path-absolute-URL string if base URL’s host is an empty host

a path-absolute-non-Windows-file-URL string if base URL’s host is not an empty host

a path-relative-scheme-less-URL string

Otherwise

a scheme-relative-URL string

a path-absolute-URL string

a path-relative-scheme-less-URL string

any optionally followed by U+003F (?) and a URL-query string.

A non-null base URL is necessary when parsing a relative-URL string.

A scheme-relative-special-URL string must be "//", followed by a valid host string, optionally followed by U+003A (:) and a URL-port string, optionally followed by a path-absolute-URL string.

A URL-port string must be one of the following:

  • the empty string

  • one or more ASCII digits representing a decimal number no greater than 2^16 − 1.
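A small validity check for URL-port strings might look like this. It is a sketch, and the function name is an assumption.

```python
def is_url_port_string(s):
    if s == "":
        return True
    # isascii() guards against non-ASCII digits that str.isdigit() accepts.
    return s.isascii() and s.isdigit() and int(s) <= 2**16 - 1

print(is_url_port_string("65535"))  # True
print(is_url_port_string("65536"))  # False
```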

A scheme-relative-URL string must be "//", followed by an opaque-host-and-port string, optionally followed by a path-absolute-URL string.

An opaque-host-and-port string must be either the empty string or: a valid opaque-host string, optionally followed by U+003A (:) and a URL-port string.

A scheme-relative-file-URL string must be "//", followed by one of the following:

A path-absolute-URL string must be U+002F (/) followed by a path-relative-URL string.

A path-absolute-non-Windows-file-URL string must be a path-absolute-URL string that does not start with: U+002F (/), followed by a Windows drive letter, followed by U+002F (/).

A path-relative-URL string must be zero or more URL-path-segment strings, separated from each other by U+002F (/), and not start with U+002F (/).

A path-relative-scheme-less-URL string must be a path-relative-URL string that does not start with: a URL-scheme string, followed by U+003A (:).

A URL-path-segment string must be one of the following:

A URL-query string must be zero or more URL units.

A URL-fragment string must be zero or more URL units.

The URL code points are ASCII alphanumeric, U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('), U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*), U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_), U+007E (~), and code points in the range U+00A0 to U+10FFFD, inclusive, excluding surrogates and noncharacters.

Code points greater than U+007F DELETE will be converted to percent-encoded bytes by the URL parser.

In HTML, when the document encoding is a legacy encoding, code points in the URL-query string that are higher than U+007F DELETE will be converted to percent-encoded bytes using the document’s encoding. This can cause problems if a URL that works in one document is copied to another document that uses a different document encoding. Using the UTF-8 encoding everywhere solves this problem.

For example, consider this HTML document:

Since the document encoding is windows-1252, the link’s URL’s query will be "sm%F6rg%E5sbord". If the document encoding had been UTF-8, it would instead be "sm%C3%B6rg%C3%A5sbord".
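The difference between the two encodings can be reproduced with Python's `urllib.parse.quote`. That function is not the percent-encoding algorithm of this standard and is used here only to show how the encoded bytes differ; the query value is taken from the example above.

```python
from urllib.parse import quote

word = "sm\u00f6rg\u00e5sbord"  # "smörgåsbord"

# Encode to bytes with the given encoding, then percent-encode the bytes.
legacy = quote(word, encoding="windows-1252")
utf8 = quote(word, encoding="utf-8")

print(legacy)  # sm%F6rg%E5sbord
print(utf8)    # sm%C3%B6rg%C3%A5sbord
```

Each non-ASCII code point becomes one byte in windows-1252 and two bytes in UTF-8, which is why the same link can produce different queries in different documents.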

The URL units are URL code points and percent-encoded bytes.

Percent-encoded bytes can be used to encode code points that are not URL code points or are excluded from being written.


There is no way to express a username or password of a URL record within a valid URL string.

4.4. URL parsing

The URL parser takes a scalar value string input, with an optional null or base URL base (default null) and an optional encoding encoding (default UTF-8), and then runs these steps:

Non-web-browser implementations only need to implement the basic URL parser.

How user input in the web browser’s address bar is converted to a URL record is out of scope for this standard. This standard does include URL rendering requirements as they pertain to trust decisions.

  1. Let url be the result of running the basic URL parser on input with base and encoding.

  2. If url is failure, return failure.

  3. If url’s scheme is not "blob", return url.

  4. Set url’s blob URL entry to the result of resolving the blob URL url, if that did not return failure, and null otherwise.

  5. Return url.


The basic URL parser takes a scalar value string input, with an optional null or base URL base (default null), an optional encoding encoding (default UTF-8), an optional URL url, and an optional state override state override, and then runs these steps:

The encoding argument is a legacy concept only relevant for HTML. The url and state override arguments are only for use by various APIs. [HTML]

When the url and state override arguments are not passed, the basic URL parser returns either a new URL or failure. If they are passed, the algorithm modifies the passed url and can terminate without returning anything.

  1. If url is not given:

    1. Set url to a new URL.

    2. If input contains any leading or trailing C0 control or space, invalid-URL-unit validation error.

    3. Remove any leading and trailing C0 control or space from input.

  2. If input contains any ASCII tab or newline, invalid-URL-unit validation error.

  3. Remove all ASCII tab or newline from input.

  4. Let state be state override if given, or scheme start state otherwise.

  5. Set encoding to the result of getting an output encoding from encoding.

  6. Let buffer be the empty string.

  7. Let atSignSeen, insideBrackets, and passwordTokenSeen be false.

  8. Let pointer be a pointer for input.

  9. Keep running the following state machine by switching on state. If after a run pointer points to the EOF code point, go to the next step. Otherwise, increase pointer by 1 and continue with the state machine.

    scheme start state
    1. If c is an ASCII alpha, append c, lowercased, to buffer, and set state to scheme state.

    2. Otherwise, if state override is not given, set state to no scheme state and decrease pointer by 1.

    3. Otherwise, return failure.

      This indication of failure is used exclusively by the Location object’s protocol setter.

    scheme state
    1. If c is an ASCII alphanumeric, U+002B (+), U+002D (-), or U+002E (.), append c, lowercased, to buffer.

    2. Otherwise, if c is U+003A (:), then:

      1. If state override is given, then:

        1. If url’s scheme is a special scheme and buffer is not a special scheme, then return.

        2. If url’s scheme is not a special scheme and buffer is a special scheme, then return.

        3. If url includes credentials or has a non-null port, and buffer is "file", then return.

        4. If url’s scheme is "file" and its host is an empty host, then return.

      2. Set url’s scheme to buffer.

      3. If state override is given, then:

        1. If url’s port is url’s scheme’s default port, then set url’s port to null.

        2. Return.

      4. Set buffer to the empty string.

      5. If url’s scheme is "file", then:

      6. Otherwise, if url is special, base is non-null, and base’s scheme is url’s scheme:

      7. Otherwise, if url is special, set state to special authority slashes state.

      8. Otherwise, if remaining starts with an U+002F (/), set state to path or authority state and increase pointer by 1.

      9. Otherwise, set url’s path to the empty string and set state to opaque path state.

    3. Otherwise, if state override is not given, set buffer to the empty string, state to no scheme state, and start over (from the first code point in input).

    4. Otherwise, return failure.

      This indication of failure is used exclusively by the Location object’s protocol setter. Furthermore, the non-failure termination earlier in this state is an intentional difference for defining that setter.

    no scheme state
    1. If base is null, or base has an opaque path and c is not U+0023 (#), missing-scheme-non-relative-URL validation error, return failure.

    2. Otherwise, if base has an opaque path and c is U+0023 (#), set url’s scheme to base’s scheme, url’s path to base’s path, url’s query to base’s query, url’s fragment to the empty string, and set state to fragment state.

    3. Otherwise, if base’s scheme is not "file", set state to relative state and decrease pointer by 1.

    4. Otherwise, set state to file state and decrease pointer by 1.

    special relative or authority state
    1. If c is U+002F (/) and remaining starts with U+002F (/), then set state to special authority ignore slashes state and increase pointer by 1.

    2. Otherwise, special-scheme-missing-following-solidus validation error, set state to relative state and decrease pointer by 1.

    path or authority state
    1. If c is U+002F (/), then set state to authority state.

    2. Otherwise, set state to path state, and decrease pointer by 1.

    relative state
    1. Assert: base’s scheme is not "file".

    2. Set url’s scheme to base’s scheme.

    3. If c is U+002F (/), then set state to relative slash state.

    4. Otherwise, if url is special and c is U+005C (\), invalid-reverse-solidus validation error, set state to relative slash state.

    5. Otherwise:

      1. Set url’s username to base’s username, url’s password to base’s password, url’s host to base’s host, url’s port to base’s port, url’s path to a clone of base’s path, and url’s query to base’s query.

      2. If c is U+003F (?), then set url’s query to the empty string, and state to query state.

      3. Otherwise, if c is U+0023 (#), set url’s fragment to the empty string and state to fragment state.

      4. Otherwise, if c is not the EOF code point:

        1. Set url’s query to null.

        2. Shorten url’s path.

        3. Set state to path state and decrease pointer by 1.

    relative slash state
    1. If url is special and c is U+002F (/) or U+005C (\), then:

    2. Otherwise, if c is U+002F (/), then set state to authority state.

    3. Otherwise, set url’s username to base’s username, url’s password to base’s password, url’s host to base’s host, url’s port to base’s port, state to path state, and then, decrease pointer by 1.

    special authority slashes state

    special authority ignore slashes state
    1. If c is neither U+002F (/) nor U+005C (\), then set state to authority state and decrease pointer by 1.

    2. Otherwise, special-scheme-missing-following-solidus validation error.

    authority state
    1. If c is U+0040 (@), then:

      1. Invalid-credentials validation error.

      2. If atSignSeen is true, then prepend "%40" to buffer.

      3. Set atSignSeen to true.

      4. For each codePoint in buffer:

        1. If codePoint is U+003A (:) and passwordTokenSeen is false, then set passwordTokenSeen to true and continue.

        2. Let encodedCodePoints be the result of running UTF-8 percent-encode codePoint using the userinfo percent-encode set.

        3. If passwordTokenSeen is true, then append encodedCodePoints to url’s password.

        4. Otherwise, append encodedCodePoints to url’s username.

      5. Set buffer to the empty string.

    2. Otherwise, if one of the following is true:

      then:

      1. If atSignSeen is true and buffer is the empty string, invalid-credentials validation error, return failure.

      2. Decrease pointer by buffer’s code point length + 1, set buffer to the empty string, and set state to host state.

    3. Otherwise, append c to buffer.

    host state
    hostname state
    1. If state override is given and url’s scheme is "file", then decrease pointer by 1 and set state to file host state.

    2. Otherwise, if c is U+003A (:) and insideBrackets is false, then:

      1. If buffer is the empty string, host-missing validation error, return failure.

      2. If state override is given and state override is hostname state, then return.

      3. Let host be the result of host parsing buffer with url is not special.

      4. If host is failure, then return failure.

      5. Set url’s host to host, buffer to the empty string, and state to port state.

    3. Otherwise, if one of the following is true:

      then decrease pointer by 1, and then:

      1. If url is special and buffer is the empty string, host-missing validation error, return failure.

      2. Otherwise, if state override is given, buffer is the empty string, and either url includes credentials or url’s port is non-null, return.

      3. Let host be the result of host parsing buffer with url is not special.

      4. If host is failure, then return failure.

      5. Set url’s host to host, buffer to the empty string, and state to path start state.

      6. If state override is given, then return.

    4. Otherwise:

      1. If c is U+005B ([), then set insideBrackets to true.

      2. If c is U+005D (]), then set insideBrackets to false.

      3. Append c to buffer.

    port state
    1. If c is an ASCII digit, append c to buffer.

    2. Otherwise, if one of the following is true:

      then:

      1. If buffer is not the empty string, then:

        1. Let port be the mathematical integer value that is represented by buffer in radix-10 using ASCII digits for digits with values 0 through 9.

        2. If port is greater than 2^16 − 1, port-out-of-range validation error, return failure.

        3. Set url’s port to null, if port is url’s scheme’s default port; otherwise to port.

        4. Set buffer to the empty string.

      2. If state override is given, then return.

      3. Set state to path start state and decrease pointer by 1.

    3. Otherwise, port-invalid validation error, return failure.

    file state
    1. Set url’s scheme to "file".

    2. Set url’s host to the empty string.

    3. If c is U+002F (/) or U+005C (\), then:

    4. Otherwise, if base is non-null and base’s scheme is "file":

      1. Set url’s host to base’s host, url’s path to a clone of base’s path, and url’s query to base’s query.

      2. If c is U+003F (?), then set url’s query to the empty string and state to query state.

      3. Otherwise, if c is U+0023 (#), set url’s fragment to the empty string and state to fragment state.

      4. Otherwise, if c is not the EOF code point:

        1. Set url’s query to null.

        2. If the code point substring from pointer to the end of input does not start with a Windows drive letter, then shorten url’s path.

        3. Otherwise:

          This is a (platform-independent) Windows drive letter quirk.

        4. Set state to path state and decrease pointer by 1.

    5. Otherwise, set state to path state, and decrease pointer by 1.

    file slash state
    1. If c is U+002F (/) or U+005C (\), then:

    2. Otherwise:

      1. If base is non-null and base’s scheme is "file", then:

        1. Set url’s host to base’s host.

        2. If the code point substring from pointer to the end of input does not start with a Windows drive letter and base’s path[0] is a normalized Windows drive letter, then append base’s path[0] to url’s path.

          This is a (platform-independent) Windows drive letter quirk.

      2. Set state to path state, and decrease pointer by 1.

    file host state
    1. If c is the EOF code point, U+002F (/), U+005C (\), U+003F (?), or U+0023 (#), then decrease pointer by 1 and then:

      1. If state override is not given and buffer is a Windows drive letter, file-invalid-Windows-drive-letter-host validation error, set state to path state.

        This is a (platform-independent) Windows drive letter quirk. buffer is not reset here and instead used in the path state.

      2. Otherwise, if buffer is the empty string, then:

        1. Set url’s host to the empty string.

        2. If state override is given, then return.

        3. Set state to path start state.

      3. Otherwise, run these steps:

        1. Let host be the result of host parsing buffer with url is not special.

        2. If host is failure, then return failure.

        3. If host is "localhost", then set host to the empty string.

        4. Set url’s host to host.

        5. If state override is given, then return.

        6. Set buffer to the empty string and state to path start state.

    2. Otherwise, append c to buffer.

    path start state
    1. If url is special, then:

      1. If c is U+005C (\), invalid-reverse-solidus validation error.

      2. Set state to path state.

      3. If c is neither U+002F (/) nor U+005C (\), then decrease pointer by 1.

    2. Otherwise, if state override is not given and c is U+003F (?), set url’s query to the empty string and state to query state.

    3. Otherwise, if state override is not given and c is U+0023 (#), set url’s fragment to the empty string and state to fragment state.

    4. Otherwise, if c is not the EOF code point:

      1. Set state to path state.

      2. If c is not U+002F (/), then decrease pointer by 1.

    5. Otherwise, if state override is given and url’s host is null, append the empty string to url’s path.

    path state
    1. If one of the following is true:

      • c is the EOF code point or U+002F (/)

      • url is special and c is U+005C (\)

      • state override is not given and c is U+003F (?) or U+0023 (#)

      then:

      1. If url is special and c is U+005C (\), invalid-reverse-solidus validation error.

      2. If buffer is a double-dot URL path segment, then:

        1. Shorten url’s path.

        2. If neither c is U+002F (/), nor url is special and c is U+005C (\), append the empty string to url’s path.

          This means that for input /usr/.. the result is / and not a lack of a path.

      3. Otherwise, if buffer is a single-dot URL path segment and if neither c is U+002F (/), nor url is special and c is U+005C (\), append the empty string to url’s path.

      4. Otherwise, if buffer is not a single-dot URL path segment, then:

        1. If url’s scheme is "file", url’s path is empty, and buffer is a Windows drive letter, then replace the second code point in buffer with U+003A (:).

          This is a (platform-independent) Windows drive letter quirk.

        2. Append buffer to url’s path.

      5. Set buffer to the empty string.

      6. If c is U+003F (?), then set url’s query to the empty string and state to query state.

      7. If c is U+0023 (#), then set url’s fragment to the empty string and state to fragment state.

    2. Otherwise, run these steps:

    opaque path state
    1. If c is U+003F (?), then set url’s query to the empty string and state to query state.

    2. Otherwise, if c is U+0023 (#), then set url’s fragment to the empty string and state to fragment state.

    3. Otherwise:

    query state
    1. If encoding is not UTF-8 and one of the following is true:

      then set encoding to UTF-8.

    2. If one of the following is true:

      then:

      1. Let queryPercentEncodeSet be the special-query percent-encode set if url is special; otherwise the query percent-encode set.

      2. Percent-encode after encoding, with encoding, buffer, and queryPercentEncodeSet, and append the result to url’s query.

        This operation cannot be invoked code-point-for-code-point due to the stateful ISO-2022-JP encoder.

      3. Set buffer to the empty string.

      4. If c is U+0023 (#), then set url’s fragment to the empty string and state to fragment state.

    3. Otherwise, if c is not the EOF code point:

    fragment state
  10. Return url.


To set the username given a url and username, set url’s username to the result of running UTF-8 percent-encode on username using the userinfo percent-encode set.

To set the password given a url and password, set url’s password to the result of running UTF-8 percent-encode on password using the userinfo percent-encode set.
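As a hedged illustration, the username and password setters of the URL class are one way these two algorithms are reached from JavaScript. The sketch below assumes a runtime with a conforming global URL; the host example.com and the credential strings are invented for the example.

```javascript
// Illustrative only: the host and credentials are made up.
const credentialed = new URL("https://example.com/");

// Code points in the userinfo percent-encode set (here space, "@", ":"
// and "/") are percent-encoded when the components are set.
credentialed.username = "first last@corp";
credentialed.password = "p:ss/word";

console.log(credentialed.username); // "first%20last%40corp"
console.log(credentialed.password); // "p%3Ass%2Fword"
console.log(credentialed.href);
// "https://first%20last%40corp:p%3Ass%2Fword@example.com/"
```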

4.5. URL serializing

The URL serializer takes a URL url, with an optional boolean exclude fragment (default false), and then runs these steps. They return an ASCII string.

  1. Let output be url’s scheme and U+003A (:) concatenated.

  2. If url’s host is non-null:

    1. Append "//" to output.

    2. If url includes credentials, then:

      1. Append url’s username to output.

      2. If url’s password is not the empty string, then append U+003A (:), followed by url’s password, to output.

      3. Append U+0040 (@) to output.

    3. Append url’s host, serialized, to output.

    4. If url’s port is non-null, append U+003A (:) followed by url’s port, serialized, to output.

  3. If url’s host is null, url does not have an opaque path, url’s path’s size is greater than 1, and url’s path[0] is the empty string, then append U+002F (/) followed by U+002E (.) to output.

    This prevents web+demo:/.//not-a-host/ or web+demo:/path/..//not-a-host/, when parsed and then serialized, from ending up as web+demo://not-a-host/ (they end up as web+demo:/.//not-a-host/).

  4. Append the result of URL path serializing url to output.

  5. If url’s query is non-null, append U+003F (?), followed by url’s query, to output.

  6. If exclude fragment is false and url’s fragment is non-null, then append U+0023 (#), followed by url’s fragment, to output.

  7. Return output.

The URL path serializer takes a URL url and then runs these steps. They return an ASCII string.

  • If url has an opaque path, then return url’s path.

  • Let output be the empty string.

  • For each segment of url’s path: append U+002F (/) followed by segment to output.

  • Return output.
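The URL serializer and URL path serializer steps can be sketched roughly as follows. This is a minimal illustration, not the normative algorithm: the plain-object record shape (scheme, username, password, host, port, path, query, fragment) and the function names are assumptions made for the example, and the host is assumed to be already serialized to a string.

```javascript
// Hypothetical URL record shape used only for this sketch:
//   host, port, query and fragment are null when absent;
//   path is an array of segments, or a string for an opaque path.

function serializePath(url) {
  if (typeof url.path === "string") return url.path; // opaque path
  let output = "";
  for (const segment of url.path) output += "/" + segment;
  return output;
}

function serializeURL(url, excludeFragment = false) {
  let output = url.scheme + ":";
  if (url.host !== null) {
    output += "//";
    // "Includes credentials": username or password is not the empty string.
    if (url.username !== "" || url.password !== "") {
      output += url.username;
      if (url.password !== "") output += ":" + url.password;
      output += "@";
    }
    output += url.host;
    if (url.port !== null) output += ":" + String(url.port);
  }
  // Guard against a path that would otherwise be re-parsed as a host.
  if (url.host === null && Array.isArray(url.path) &&
      url.path.length > 1 && url.path[0] === "") {
    output += "/.";
  }
  output += serializePath(url);
  if (url.query !== null) output += "?" + url.query;
  if (!excludeFragment && url.fragment !== null) output += "#" + url.fragment;
  return output;
}

const demoRecord = {
  scheme: "web+demo", username: "", password: "", host: null, port: null,
  path: ["", "not-a-host", ""], query: null, fragment: null,
};
const httpsRecord = {
  scheme: "https", username: "user", password: "", host: "example.com",
  port: 8080, path: ["a", "b"], query: "q=1", fragment: "top",
};

console.log(serializeURL(demoRecord));        // "web+demo:/.//not-a-host/"
console.log(serializeURL(httpsRecord));       // "https://user@example.com:8080/a/b?q=1#top"
console.log(serializeURL(httpsRecord, true)); // "https://user@example.com:8080/a/b?q=1"
```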

4.6. URL equivalence

To determine whether a URL A equals URL B, with an optional boolean exclude fragments (default false), run these steps:

  1. Let serializedA be the result of serializing A, with exclude fragment set to exclude fragments.

  2. Let serializedB be the result of serializing B, with exclude fragment set to exclude fragments.

  3. Return true if serializedA is serializedB; otherwise false.
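A rough sketch of this comparison using URL objects, assuming a runtime with a conforming global URL. The helper names serializeForComparison and urlEquals are hypothetical. Dropping the fragment is approximated by clearing hash on a copy, which for URLs with an opaque path might also strip trailing spaces, so treat this as an approximation rather than an exact rendering of the steps.

```javascript
// Serializes a URL object, optionally without its fragment.
// Works on a copy so the caller's object is never mutated.
function serializeForComparison(url, excludeFragments) {
  if (!excludeFragments) return url.href;
  const copy = new URL(url.href);
  copy.hash = ""; // sets the fragment to null
  return copy.href;
}

function urlEquals(a, b, excludeFragments = false) {
  return serializeForComparison(a, excludeFragments) ===
         serializeForComparison(b, excludeFragments);
}

const first = new URL("https://EXAMPLE.com:443/a/../b#one");
const second = new URL("https://example.com/b#two");

console.log(urlEquals(first, second));       // false (fragments differ)
console.log(urlEquals(first, second, true)); // true
```

Because equality is defined on serializations, differences that the parser normalizes away (host case, default port, dot segments) do not affect the result.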

4.7. Origin

See origin’s definition in HTML for the necessary background information. [HTML]

The origin of a URL url is the origin returned by running these steps, switching on url’s scheme:

"blob"

  1. If url’s blob URL entry is non-null, then return url’s blob URL entry’s environment’s origin.

  2. Let pathURL be the result of parsing the result of URL path serializing url.

  3. If pathURL is failure, then return a new opaque origin.

  4. If pathURL’s scheme is not "http" and not "https", then return a new opaque origin.

  5. Return pathURL’s origin.

The origin of blob:https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f is the tuple origin ("https", "whatwg.org", null, null).

"ftp" "http" "https" "ws" "wss"

Return the tuple origin (url’s scheme, url’s host, url’s port, null).

"file"

Unfortunate as it is, this is left as an exercise to the reader. When in doubt, return a new opaque origin.

Otherwise

Return a new opaque origin.

This does indeed mean that these URLs cannot be same origin with themselves.
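The cases above can be observed through the origin getter of a URL object, which returns the serialization of the origin (an opaque origin serializes as "null"). This is an illustrative sketch that assumes a conforming global URL; the example.com inputs are invented, and "file" is deliberately left out because its origin is implementation-defined.

```javascript
const blobOrigin = new URL(
  "blob:https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f"
).origin;
const explicitPort = new URL("https://example.com:8443/path?query").origin;
const defaultPort = new URL("https://example.com:443/").origin;
const opaque = new URL("mailto:user@example.com").origin;

console.log(blobOrigin);   // "https://whatwg.org"
console.log(explicitPort); // "https://example.com:8443"
console.log(defaultPort);  // "https://example.com" (default port parses to null)
console.log(opaque);       // "null"
```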

4.8. URL rendering

A URL should be rendered in its serialized form, with modifications described below, when the primary purpose of displaying a URL is to have the user make a security or trust decision. For example, users are expected to make trust decisions based on a URL rendered in the browser address bar.

4.8.1. Simplify non-human-readable or irrelevant components

Remove components that can provide opportunities for spoofing or distract from security-relevant information:

  • Browsers may render only a URL’s host in places where it is important for end users to distinguish between the host and other parts of the URL such as the path. Browsers may consider simplifying the host further to draw attention to its registrable domain. For example, browsers may omit a leading www or m domain label to simplify the host, or display its registrable domain only to remove spoofing opportunities posed by subdomains (e.g., https://examplecorp.attacker.com/).

  • Browsers should not render a URL’s username and password, as they can be mistaken for a URL’s host (e.g., https://[email protected]/).

  • Browsers may render a URL without its scheme if the display surface only ever permits a single scheme (such as a browser feature that omits https:// because it is only enabled for secure origins). Otherwise, the scheme may be replaced or supplemented with a human-readable string (e.g., "Not secure"), a security indicator icon, or both.

4.8.2. Elision

In a space-constrained display, URLs should be elided carefully to avoid misleading the user when making a security decision:

  • Browsers should ensure that at least the registrable domain can be shown when the URL is rendered (to avoid showing, e.g., ...examplecorp.com when loading https://not-really-examplecorp.com/).

  • When the full host cannot be rendered, browsers should elide domain labels starting from the lowest-level domain label. For example, examplecorp.com.evil.com should be elided as ...com.evil.com, not examplecorp.com.... (Note that bidirectional text means that the lowest-level domain label may not appear on the left.)

4.8.3. Internationalization and special characters

Internationalized domain names (IDNs), special characters, and bidirectional text should be handled with care to prevent spoofing:

  • Browsers should render a URL’s host by running domain to Unicode with the URL’s host and false.

    Various characters can be used in homograph spoofing attacks. Consider detecting confusable characters and warning when they are in use. [IDNFAQ] [UTS39]

  • URLs are particularly prone to confusion between host and path when they contain bidirectional text, so in this case it is particularly advisable to only render a URL’s host. For readability, other parts of the URL, if rendered, should have their sequences of percent-encoded bytes replaced with code points resulting from running UTF-8 decode without BOM on the percent-decoding of those sequences, unless that renders those sequences invisible. Browsers may choose to not decode certain sequences that present spoofing risks (e.g., U+1F512 (🔒)).

  • Browsers should render bidirectional text as if it were in a left-to-right embedding. [BIDI]

    Unfortunately, as rendered URLs are strings and can appear anywhere, a specific bidirectional algorithm for rendered URLs would not see wide adoption. Bidirectional text interacts with the parts of a URL in ways that can cause the rendering to be different from the model. Users of bidirectional languages can come to expect this, particularly in plain text environments.

5. application/x-www-form-urlencoded

The application/x-www-form-urlencoded format provides a way to encode a list of tuples, each consisting of a name and a value.

The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms. [HTML]

5.1. application/x-www-form-urlencoded parsing

A legacy server-oriented implementation might have to support encodings other than UTF-8 as well as have special logic for tuples of which the name is `_charset`. Such logic is not described here as only UTF-8 is conforming.

The application/x-www-form-urlencoded parser takes a byte sequence input, and then runs these steps:

  1. Let sequences be the result of splitting input on 0x26 (&).

  2. Let output be an initially empty list of name-value tuples where both name and value hold a string.

  3. For each byte sequence bytes in sequences:

    1. If bytes is the empty byte sequence, then continue.

    2. If bytes contains a 0x3D (=), then let name be the bytes from the start of bytes up to but excluding its first 0x3D (=), and let value be the bytes, if any, after the first 0x3D (=) up to the end of bytes. If 0x3D (=) is the first byte, then name will be the empty byte sequence. If it is the last, then value will be the empty byte sequence.

    3. Otherwise, let name have the value of bytes and let value be the empty byte sequence.

    4. Replace any 0x2B (+) in name and value with 0x20 (SP).

    5. Let nameString and valueString be the result of running UTF-8 decode without BOM on the percent-decoding of name and value, respectively.

    6. Append (nameString, valueString) to output.

  4. Return output.
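The parser steps above can be sketched in JavaScript as follows. This is a minimal illustration that assumes the global TextEncoder and TextDecoder are available; percentDecode and parseUrlencoded are hypothetical names, and the input string is invented.

```javascript
// Percent-decode a byte sequence (Uint8Array) into a new Uint8Array.
function percentDecode(bytes) {
  const isHex = (b) =>
    (b >= 0x30 && b <= 0x39) || (b >= 0x41 && b <= 0x46) || (b >= 0x61 && b <= 0x66);
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b === 0x25 && i + 2 < bytes.length && isHex(bytes[i + 1]) && isHex(bytes[i + 2])) {
      out.push(parseInt(String.fromCharCode(bytes[i + 1], bytes[i + 2]), 16));
      i += 2;
    } else {
      out.push(b);
    }
  }
  return Uint8Array.from(out);
}

function parseUrlencoded(input) {
  // ignoreBOM: true keeps a leading BOM, matching "UTF-8 decode without BOM".
  const decoder = new TextDecoder("utf-8", { ignoreBOM: true });
  const decode = (part) =>
    decoder.decode(percentDecode(part.map((b) => (b === 0x2b ? 0x20 : b))));
  const output = [];
  let start = 0;
  for (let i = 0; i <= input.length; i++) {
    if (i < input.length && input[i] !== 0x26) continue; // not at "&" or the end
    const bytes = input.slice(start, i);
    start = i + 1;
    if (bytes.length === 0) continue; // skip empty sequences
    const eq = bytes.indexOf(0x3d); // first "="
    const name = eq === -1 ? bytes : bytes.slice(0, eq);
    const value = eq === -1 ? new Uint8Array(0) : bytes.slice(eq + 1);
    output.push([decode(name), decode(value)]);
  }
  return output;
}

const parsedTuples = parseUrlencoded(
  new TextEncoder().encode("a=1&b=hello+world&&c&=d&e=%E2%9C%93")
);
console.log(parsedTuples);
// [["a","1"], ["b","hello world"], ["c",""], ["","d"], ["e","✓"]]
```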

5.2. application/x-www-form-urlencoded serializing

The application/x-www-form-urlencoded serializer takes a list of name-value tuples tuples, with an optional encoding encoding (default UTF-8), and then runs these steps. They return an ASCII string.

  1. Set encoding to the result of getting an output encoding from encoding.

  2. Let output be the empty string.

  3. For each tuple of tuples:

    1. Assert: tuple’s name and tuple’s value are scalar value strings.

    2. Let name be the result of running percent-encode after encoding with encoding, tuple’s name, the application/x-www-form-urlencoded percent-encode set, and true.

    3. Let value be the result of running percent-encode after encoding with encoding, tuple’s value, the application/x-www-form-urlencoded percent-encode set, and true.

    4. If output is not the empty string, then append U+0026 (&) to output.

    5. Append name, followed by U+003D (=), followed by value, to output.

  4. Return output.
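A hedged sketch of the serializer for the default case only: it assumes encoding is UTF-8 and skips the "getting an output encoding" step. The function names are hypothetical. It relies on the application/x-www-form-urlencoded percent-encode set leaving only ASCII alphanumerics, "*", "-", "." and "_" unencoded, with U+0020 SPACE written as "+".

```javascript
// Bytes left as-is: ASCII alphanumerics plus "*", "-", "." and "_".
const URLENCODED_SAFE = /[A-Za-z0-9*\-._]/;

function encodeUrlencodedComponent(text) {
  let out = "";
  for (const byte of new TextEncoder().encode(text)) {
    const ch = String.fromCharCode(byte);
    if (byte === 0x20) {
      out += "+"; // space-as-plus is true for this format
    } else if (byte < 0x80 && URLENCODED_SAFE.test(ch)) {
      out += ch;
    } else {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

function serializeUrlencoded(tuples) {
  let output = "";
  for (const [name, value] of tuples) {
    if (output !== "") output += "&";
    output += encodeUrlencodedComponent(name) + "=" + encodeUrlencodedComponent(value);
  }
  return output;
}

const serializedTuples = serializeUrlencoded([
  ["q", "a b&c"],
  ["sym", "~*'()"],
  ["u", "✓"],
]);
console.log(serializedTuples);
// "q=a+b%26c&sym=%7E*%27%28%29&u=%E2%9C%93"
```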

5.3. Hooks

The application/x-www-form-urlencoded string parser takes a scalar value string input, UTF-8 encodes it, and then returns the result of application/x-www-form-urlencoded parsing it.

6. API

This section uses terminology from Web IDL. Browser user agents must support this API. JavaScript implementations should support this API. Other user agents or programming languages are encouraged to use an API suitable to their needs, which might not be this one. [WEBIDL]

6.1. URL class

A URL object has an associated:

To potentially strip trailing spaces from an opaque path given a URL object url:

  1. If url’s URL does not have an opaque path, then return.

  2. If url’s URL’s fragment is non-null, then return.

  3. If url’s URL’s query is non-null, then return.

  4. Remove all trailing U+0020 SPACE code points from url’s URL’s path.

The API URL parser takes a scalar value string url and an optional null-or-scalar value string base (default null), and then runs these steps:

  1. Let parsedBase be null.

  2. If base is non-null:

    1. Set parsedBase to the result of running the basic URL parser on base.

    2. If parsedBase is failure, then return failure.

  3. Return the result of running the basic URL parser on url with parsedBase.


The new URL(url, base) constructor steps are:

  1. Let parsedURL be the result of running the API URL parser on url with base, if given.

  2. If parsedURL is failure, then throw a TypeError.

  3. Let query be parsedURL’s query, if that is non-null, and the empty string otherwise.

  4. Set this’s URL to parsedURL.

  5. Set this’s query object to a new URLSearchParams object.

  6. Initialize this’s query object with query.

  7. Set this’s query object’s URL object to this.

To parse a string into a URL without using a base URL, invoke the URL constructor with a single argument:

This throws an exception if the input is a relative-URL string:

For those cases a base URL is necessary:

A URL object can be used as a base URL (as the IDL requires a string as argument, a URL object stringifies to its href getter return value):
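The four constructor usages described above might look like the following. This is a sketch rather than the standard's own examples: it assumes a conforming global URL, and the example.org inputs and variable names are chosen for illustration.

```javascript
// 1. Absolute input, no base URL needed.
const absolute = new URL("https://example.org/💩");
console.log(absolute.pathname); // "/%F0%9F%92%A9"

// 2. A relative-URL string without a base throws a TypeError.
let relativeError = null;
try {
  new URL("/💩");
} catch (error) {
  relativeError = error;
}
console.log(relativeError instanceof TypeError); // true

// 3. Supplying a base URL string makes the same input parseable.
const resolved = new URL("/💩", "https://example.org/docs/");
console.log(resolved.href); // "https://example.org/%F0%9F%92%A9"

// 4. A URL object can serve as the base; it stringifies to its href.
const child = new URL("guide?x=1", resolved);
console.log(child.href); // "https://example.org/guide?x=1"
```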


The static canParse(url, base) method steps are:

  1. Let parsedURL be the result of running the API URL parser on url with base, if given.

  2. If parsedURL is failure, then return false.

  3. Return true.
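In runtimes that do not yet ship the static method, the same result can be approximated with the constructor. The helper name canParseSketch is hypothetical, and the sketch assumes a conforming global URL.

```javascript
// Mirrors the canParse steps by attempting construction and
// mapping a thrown TypeError to false.
function canParseSketch(url, base) {
  try {
    new URL(url, base); // an undefined base is treated as "not given"
    return true;
  } catch {
    return false;
  }
}

console.log(canParseSketch("https://example.com/"));         // true
console.log(canParseSketch("/path"));                        // false
console.log(canParseSketch("/path", "https://example.com")); // true
console.log(canParseSketch("/path", "not a base"));          // false
```

Unlike the constructor, this form lets callers validate input without wrapping every call site in try/catch.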


The href getter steps and the toJSON() method steps are to return the serialization of this’s URL.

The href setter steps are:

  1. Let parsedURL be the result of running the basic URL parser on the given value.

  2. If parsedURL is failure, then throw a TypeError.

  3. Set this’s URL to parsedURL.

  4. Empty this’s query object’s list.

  5. Let query be this’s URL’s query.

  6. If query is non-null, then set this’s query object’s list to the result of parsing query.

The origin getter steps are to return the serialization of this’s URL’s origin. [HTML]

The protocol getter steps are to return this’s URL’s scheme, followed by U+003A (:).

The protocol setter steps are to basic URL parse the given value, followed by U+003A (:), with this’s URL as url and scheme start state as state override.

The username getter steps are to return this’s URL’s username.

The username setter steps are:

The password getter steps are to return this’s URL’s password.

The password setter steps are:

The host getter steps are:

  1. Let url be this’s URL.

  2. If url’s host is null, then return the empty string.

  3. If url’s port is null, return url’s host, serialized.

  4. Return url’s host, serialized, followed by U+003A (:) and url’s port, serialized.

The host setter steps are:

If the given value for the host setter lacks a port, this’s URL’s port will not change. This can be unexpected, as the host getter does return a URL-port string, so one might have assumed the setter to always "reset" both.
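A short illustration of that behavior, assuming a conforming global URL; the hosts and ports are invented for the example.

```javascript
const endpoint = new URL("https://example.com:8080/api");

endpoint.host = "example.org"; // no port in the new value
const hostAfterBareName = endpoint.host;
console.log(hostAfterBareName); // "example.org:8080" (port kept)

endpoint.host = "example.net:9090"; // host and port both given
const hostAfterFullValue = endpoint.host;
console.log(hostAfterFullValue); // "example.net:9090"

endpoint.hostname = "example.edu"; // hostname never touches the port
console.log(endpoint.href); // "https://example.edu:9090/api"
```

To drop the port explicitly, set port to the empty string.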

The hostname getter steps are:

  1. If this’s URL’s host is null, then return the empty string.

  2. Return this’s URL’s host, serialized.

The hostname setter steps are:

The port getter steps are:

  1. If this’s URL’s port is null, then return the empty string.

  2. Return this’s URL’s port, serialized.

The port setter steps are:

  1. If this’s URL cannot have a username/password/port, then return.

  2. If the given value is the empty string, then set this’s URL’s port to null.

  3. Otherwise, basic URL parse the given value with this’s URL as url and port state as state override.

The pathname getter steps are to return the result of URL path serializing this’s URL.

The pathname setter steps are:

The search getter steps are:

  1. If this’s URL’s query is either null or the empty string, then return the empty string.

  2. Return U+003F (?), followed by this’s URL’s query.

The search setter steps are:

  1. Let url be this’s URL.

  2. If the given value is the empty string:

  3. Let input be the given value with a single leading U+003F (?) removed, if any.

  4. Set url’s query to the empty string.

  5. Basic URL parse input with url as url and query state as state override.

  6. Set this’s query object’s list to the result of parsing input.

The search setter has the potential to remove trailing U+0020 SPACE code points from this’s URL’s path. It does this so that running the URL parser on the output of running the URL serializer on this’s URL yields a URL that is equal to this’s URL.

The searchParams getter steps are to return this’s query object.

The hash getter steps are:

  1. If this’s URL’s fragment is either null or the empty string, then return the empty string.

  2. Return U+0023 (#), followed by this’s URL’s fragment.

The hash setter steps are:

  1. If the given value is the empty string:

  2. Let input be the given value with a single leading U+0023 (#) removed, if any.

  3. Set this’s URL’s fragment to the empty string.

  4. Basic URL parse input with this’s URL as url and fragment state as state override.

The hash setter has the potential to change this’s URL’s path in a manner equivalent to the search setter.
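The search and hash getters and setters can be exercised as follows. This is an illustrative sketch assuming a conforming global URL; the URL and values are invented.

```javascript
const page = new URL("https://example.com/list?#");

// An empty (but non-null) query or fragment still serializes its delimiter,
// while the getters return the empty string for both null and empty.
const hrefWithEmptyParts = page.href;
console.log(hrefWithEmptyParts); // "https://example.com/list?#"
console.log(page.search);        // ""
console.log(page.hash);          // ""

// A single leading "?" or "#" in the given value is optional.
page.search = "q=url spec";
page.hash = "#results";
const hrefAfterSet = page.href;
console.log(hrefAfterSet); // "https://example.com/list?q=url%20spec#results"
console.log(page.searchParams.get("q")); // "url spec"

// Setting the empty string sets the component to null.
page.search = "";
page.hash = "";
console.log(page.href); // "https://example.com/list"
```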

6.2. URLSearchParams class

Constructing and stringifying a URLSearchParams object is fairly straightforward:
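For instance, a minimal sketch assuming a conforming global URLSearchParams (the key and values are arbitrary):

```javascript
const params = new URLSearchParams({ key: "730d67" });
const initialString = params.toString();
console.log(initialString); // "key=730d67"

params.append("tag", "a b");
console.log(String(params)); // "key=730d67&tag=a+b"
```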

As a URLSearchParams object uses the application/x-www-form-urlencoded format underneath, there are some differences in how it encodes certain code points compared to a URL object (including href and search). This can be especially surprising when using searchParams to operate on a URL’s query.

URLSearchParams objects will percent-encode anything in the application/x-www-form-urlencoded percent-encode set, and will encode U+0020 SPACE as U+002B (+).

Ignoring encodings (use UTF-8), search will percent-encode anything in the query percent-encode set or the special-query percent-encode set (depending on whether or not the URL is special).
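The difference becomes visible as soon as the query is touched through searchParams. The sketch below assumes a conforming global URL; sort() is used only because it triggers an update without changing the single tuple.

```javascript
const mixed = new URL("https://example.com/?a=b ~");
const beforeTouch = mixed.href;
console.log(beforeTouch); // "https://example.com/?a=b%20~"

// Any update through searchParams re-serializes the whole query
// using the application/x-www-form-urlencoded rules.
mixed.searchParams.sort();
const afterTouch = mixed.href;
console.log(afterTouch); // "https://example.com/?a=b+%7E"
```

Both serializations parse back to the same name-value tuple; only the byte-level form of the query differs.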

A URLSearchParams object has an associated:

A URLSearchParams object with a non-null URL object has the potential to change that object’s path in a manner equivalent to the URL object’s search and hash setters.

To initialize a URLSearchParams object query with init:

  1. If init is a sequence, then for each innerSequence of init:

    1. If innerSequence’s size is not 2, then throw a TypeError.

    2. Append (innerSequence[0], innerSequence[1]) to query’s list.

  2. Otherwise, if init is a record, then for each name → value of init, append (name, value) to query’s list.

  3. Otherwise:

    1. Assert: init is a string.

    2. Set query’s list to the result of parsing init.

To update a URLSearchParams object query:

  1. If query’s URL object is null, then return.

  2. Let serializedQuery be the serialization of query’s list.

  3. If serializedQuery is the empty string, then set serializedQuery to null.

  4. Set query’s URL object’s URL’s query to serializedQuery.

  5. If serializedQuery is null, then potentially strip trailing spaces from an opaque path with query’s URL object.

The new URLSearchParams(init) constructor steps are:

  1. If init is a string and starts with U+003F (?), then remove the first code point from init.

  2. Initialize this with init.
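The three accepted shapes of init (sequence, record, string) can be illustrated as follows, assuming a conforming global URLSearchParams; all names and values are invented.

```javascript
const fromSequence = new URLSearchParams([["a", "1"], ["a", "2"]]);
const fromRecord = new URLSearchParams({ b: "3", c: "4" });
const fromString = new URLSearchParams("?d=5&e=6"); // leading "?" removed

console.log(fromSequence.toString()); // "a=1&a=2"
console.log(fromRecord.toString());   // "b=3&c=4"
console.log(fromString.toString());   // "d=5&e=6"

// An inner sequence whose size is not 2 is rejected.
let sizeError = null;
try {
  new URLSearchParams([["only-one"]]);
} catch (error) {
  sizeError = error;
}
console.log(sizeError instanceof TypeError); // true
```

Only the sequence form can express repeated names directly, since a record holds one value per key.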

The size getter steps are to return this’s list’s size.

The append(name, value) method steps are:

The delete(name, value) method steps are:

  1. If value is given, then remove all tuples whose name is name and value is value from this’s list.

  2. Otherwise, remove all tuples whose name is name from this’s list.

  3. Update this.

The get(name) method steps are to return the value of the first tuple whose name is name in this’s list, if there is such a tuple; otherwise null.

The getAll(name) method steps are to return the values of all tuples whose name is name in this’s list, in list order; otherwise the empty sequence.

The has(name, value) method steps are:

  1. If value is given and there is a tuple whose name is name and value is value in this’s list, then return true.

  2. If value is not given and there is a tuple whose name is name in this’s list, then return true.

  3. Return false.

The set(name, value) method steps are:

  1. If this’s list contains any tuples whose name is name, then set the value of the first such tuple to value and remove the others.

  2. Otherwise, append (name, value) to this’s list.

  3. Update this.
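The list manipulation behind delete(), has() and set() can be sketched on a plain array of [name, value] tuples. The helpers below are hypothetical and differ from the steps above in two stated ways: they return new arrays instead of mutating a list in place, and they omit the final "update" step.

```javascript
// Remove tuples by name, or by name and value when value is given.
function deleteTuples(list, name, value) {
  return list.filter(([n, v]) =>
    value === undefined ? n !== name : !(n === name && v === value));
}

// Test for a tuple by name, or by name and value when value is given.
function hasTuple(list, name, value) {
  return list.some(([n, v]) => n === name && (value === undefined || v === value));
}

// Overwrite the first tuple named name and drop later ones, or append.
function setTuple(list, name, value) {
  const index = list.findIndex(([n]) => n === name);
  if (index === -1) return [...list, [name, value]];
  return [
    ...list.slice(0, index),
    [name, value],
    ...list.slice(index + 1).filter(([n]) => n !== name),
  ];
}

const tuples = [["a", "1"], ["b", "2"], ["a", "3"]];
console.log(deleteTuples(tuples, "a"));      // [["b","2"]]
console.log(deleteTuples(tuples, "a", "3")); // [["a","1"],["b","2"]]
console.log(hasTuple(tuples, "a", "2"));     // false
console.log(setTuple(tuples, "a", "9"));     // [["a","9"],["b","2"]]
console.log(setTuple(tuples, "c", "4"));     // original list plus ["c","4"]
```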


It can be useful to sort the name-value tuples in a URLSearchParams object, in particular to increase cache hits. This can be accomplished through invoking the sort() method:

To avoid altering the original input, e.g., for comparison purposes, construct a new URLSearchParams object:
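Both usages, sorting in place and sorting a copy, might look like this. The sketch assumes a conforming global URL and URLSearchParams; the URL is invented, and the repeated key name shows that tuples with equal names keep their relative order.

```javascript
const cached = new URL("https://example.org/?q=b&key=z&key=a");

// Sort a copy, leaving the original URL untouched.
const sortedCopy = new URLSearchParams(cached.search);
sortedCopy.sort();
const searchAfterCopySort = cached.search;
console.log(sortedCopy.toString()); // "key=z&key=a&q=b"
console.log(searchAfterCopySort);   // "?q=b&key=z&key=a"

// Sort in place; the URL's query is updated as a side effect.
cached.searchParams.sort();
console.log(cached.search); // "?key=z&key=a&q=b"
```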

The sort() method steps are:

  1. Sort all tuples in this’s list, if any, by their names. Sorting must be done by comparison of code units. The relative order between tuples with equal names must be preserved.

  2. Update this.


The value pairs to iterate over are this’s list’s tuples with the key being the name and the value being the value.

The stringification behavior steps are to return the serialization of this’s list.

6.3. URL APIs elsewhere

A standard that exposes URLs should expose the URL as a string (by serializing an internal URL). A standard should not expose a URL using a URL object. URL objects are meant for URL manipulation. In IDL the USVString type should be used.

The higher-level notion here is that values are to be exposed as immutable data structures.

If a standard decides to use a variant of the name "URL" for a feature it defines, it should name such a feature "url" (i.e., lowercase and with an "l" at the end). Names such as "URL", "URI", and "IRI" should not be used. However, if the name is a compound, "URL" (i.e., uppercase) is preferred, e.g., "newURL" and "oldURL".

The EventSource and HashChangeEvent interfaces in HTML are examples of proper naming. [HTML]

Acknowledgments

Many people have helped make URLs more interoperable over the years and thereby furthered the goals of this standard. Likewise, many people have helped make this standard what it is today.

With that, many thanks to 100の人, Adam Barth, Addison Phillips, Adrián Chaves, Albert Wiersch, Alex Christensen, Alexandre Morgaut, Alexis Hunt, Alwin Blok, Andrew Sullivan, Arkadiusz Michalski, Behnam Esfahbod, Bobby Holley, Boris Zbarsky, Brad Hill, Brandon Ross, Cailyn Hansen, Chris Dumez, Chris Rebert, Corey Farwell, Dan Appelquist, Daniel Bratell, Daniel Stenberg, David Burns, David Håsäther, David Sheets, David Singer, David Walp, Domenic Denicola, Emily Schechter, Emily Stark, Eric Lawrence, Erik Arvidsson, Gavin Carothers, Geoff Richards, Glenn Maynard, Gordon P. Hemsley, hemanth, Henri Sivonen, Ian Hickson, Ilya Grigorik, Italo A. Casas, Jakub Gieryluk, James Graham, James Manger, James Ross, Jeff Hodges, Jeffrey Posnick, Jeffrey Yasskin, Joe Duarte, Joshua Bell, Jxck, Karl Wagner, 田村健人 (Kent TAMURA), Kevin Grandon, Kornel Lesiński, Larry Masinter, Leif Halvard Silli, Mark Amery, Mark Davis, Marcos Cáceres, Marijn Kruisselbrink, Martin Dürst, Mathias Bynens, Matt Falkenhagen, Matt Giuca, Michael Peick, Michael™ Smith, Michal Bukovský, Michel Suignard, Mikaël Geljić, Noah Levitt, Peter Occil, Philip Jägenstedt, Philippe Ombredanne, Prayag Verma, Rimas Misevičius, Robert Kieffer, Rodney Rehm, Roy Fielding, Ryan Sleevi, Sam Ruby, Sam Sneddon, Santiago M. Mola, Sebastian Mayr, Simon Pieters, Simon Sapin, Steven Vachon, Stuart Cook, Sven Uhlig, Tab Atkins, 吉野剛史 (Takeshi Yoshino), Tantek Çelik, Tiancheng "Timothy" Gu, Tim Berners-Lee, 簡冠庭 (Tim Guan-tin Chien), Titi_Alone, Tomek Wytrębowicz, Trevor Rowbotham, Tristan Seligmann, Valentin Gosu, Vyacheslav Matva, Wei Wang, Wolf Lammen, 山岸和利 (Yamagishi Kazutoshi), Yongsheng Zhang, 成瀬ゆい (Yui Naruse), and zealousidealroll for being awesome!

This standard is written by Anne van Kesteren (Apple, [email protected]).

Intellectual property rights

Copyright © WHATWG (Apple, Google, Mozilla, Microsoft). This work is licensed under a Creative Commons Attribution 4.0 International License. To the extent portions of it are incorporated into source code, such portions in the source code are licensed under the BSD 3-Clause License instead.

This is the Living Standard. Those interested in the patent-review version should view the Living Standard Review Draft.

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

[BIDI] Mark Davis; Ken Whistler. Unicode Bidirectional Algorithm. 16 August 2022. Unicode Standard Annex #9. URL: https://www.unicode.org/reports/tr9/tr9-46.html

[ENCODING] Anne van Kesteren. Encoding Standard. Living Standard. URL: https://encoding.spec.whatwg.org/

[FILEAPI] Marijn Kruisselbrink. File API. URL: https://w3c.github.io/FileAPI/

[HTML] Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/

[IANA-URI-SCHEMES] Uniform Resource Identifier (URI) Schemes. URL: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

[INFRA] Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/

[PSL] Public Suffix List. Mozilla Foundation.

[RFC4291] R. Hinden; S. Deering. IP Version 6 Addressing Architecture. February 2006. Draft Standard. URL: https://www.rfc-editor.org/rfc/rfc4291

[UTS46] Mark Davis; Michel Suignard. Unicode IDNA Compatibility Processing. 26 August 2022. Unicode Technical Standard #46. URL: https://www.unicode.org/reports/tr46/tr46-29.html

[WEBIDL] Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

Informative References

[ECMA-262] ECMAScript Language Specification. URL: https://tc39.es/ecma262/multipage/

[IDNFAQ] Internationalized Domain Names (IDN) FAQ. URL: https://unicode.org/faq/idn.html

[RFC1034] P. Mockapetris. Domain names - concepts and facilities. November 1987. Internet Standard. URL: https://www.rfc-editor.org/rfc/rfc1034

[RFC3986] T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet Standard. URL: https://www.rfc-editor.org/rfc/rfc3986

[RFC3987] M. Duerst; M. Suignard. Internationalized Resource Identifiers (IRIs). January 2005. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc3987

[RFC5890] J. Klensin. Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework. August 2010. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc5890

[RFC5952] S. Kawamura; M. Kawashima. A Recommendation for IPv6 Address Text Representation. August 2010. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc5952

[RFC6454] A. Barth. The Web Origin Concept. December 2011. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc6454

[RFC7595] D. Thaler, Ed.; T. Hansen; T. Hardie. Guidelines and Registration Procedures for URI Schemes. June 2015. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc7595

[RFC791] J. Postel. Internet Protocol. September 1981. Internet Standard. URL: https://www.rfc-editor.org/rfc/rfc791

[UTR36] Mark Davis; Michel Suignard. Unicode Security Considerations. 19 September 2014. Unicode Technical Report #36. URL: https://www.unicode.org/reports/tr36/tr36-15.html

[UTS39] Mark Davis; Michel Suignard. Unicode Security Mechanisms. 26 August 2022. Unicode Technical Standard #39. URL: https://www.unicode.org/reports/tr39/tr39-26.html

IDL Index

