[Docs] [txt|pdf|xml|html] [Tracker] [Email] [Nits]

Versions: 00 01

iri                                                             A. Barth
Internet-Draft                                              Google, Inc.
Intended status: Informational                          November 9, 2010
Expires: May 13, 2011


                       How Browsers Process URLs
                          draft-abarth-url-00

Abstract

   This document contains a precise specification of how browsers
   process URLs.  The behavior specified in this document might or might
   not match any particular browser, but browsers might be well-served
   by adopting the behavior defined herein.




































Barth                     Expires May 13, 2011                  [Page 1]


Internet-Draft          How Browsers Process URLs          November 2010


Editorial Note (To be removed by RFC Editor)

   If you have suggestions for improving this document, please send
   email to <mailto:public-iri@w3.org>.  Further Working Group
   information is available from <https://tools.ietf.org/wg/iri/>.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 13, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.








Barth                     Expires May 13, 2011                  [Page 2]


Internet-Draft          How Browsers Process URLs          November 2010


Table of Contents

   1.  Open Issues  . . . . . . . . . . . . . . . . . . . . . . . . .  4
   2.  Definitions  . . . . . . . . . . . . . . . . . . . . . . . . .  5
   3.  Parsing a URL  . . . . . . . . . . . . . . . . . . . . . . . .  6
     3.1.  Finding the scheme . . . . . . . . . . . . . . . . . . . .  6
     3.2.  Finding the authority, path, query, and fragment . . . . .  7
     3.3.  Finding the user information, host, and port . . . . . . .  8
     3.4.  Find the user name and password  . . . . . . . . . . . . .  8
   4.  Resolving a string relative to a base URL  . . . . . . . . . .  9
     4.1.  Resolving a string as a relative URL . . . . . . . . . . .  9
     4.2.  Resolving a string as a scheme-relative URL  . . . . . . . 10
     4.3.  Resolving a string as an authority-relative URL  . . . . . 10
     4.4.  Resolving a string as a query-relative URL . . . . . . . . 11
     4.5.  Resolving a string as a fragment-relative URL  . . . . . . 11
     4.6.  Resolving a string as a path-relative URL  . . . . . . . . 11
   Appendix A.  Acknowledgements  . . . . . . . . . . . . . . . . . . 12
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 13

































Barth                     Expires May 13, 2011                  [Page 3]


Internet-Draft          How Browsers Process URLs          November 2010


1.  Open Issues

   Browsers parse URLs differently depending on which operating system
   they're running on.  The problem is that they want to do sensible
   things for file paths, but file paths look different on Windows and
   Unix systems.

   How should we handle cases where browsers disaggree with the regular
   expression in RFC 3986?  Currently, this document aims to describe
   how browsers behave, but we'll likely need to compare that to RFC
   3986 at some point.  Some specific differences that have been brought
   up on the mailing list:

   o  http:///example.com/

   o  http://example.com;



































Barth                     Expires May 13, 2011                  [Page 4]


Internet-Draft          How Browsers Process URLs          November 2010


2.  Definitions

   A /control character/ is a character whose value is less than or
   equal to U+0020 (" ").

   A /slash character/ is either U+???? ("/") or U+???? ("\").

   An /authority terminating characters/ is either a slash charcter,
   U+???? ("?"), U+???? ("#"), or U+???? (";").

   During a parsing algorithm, the /remaining string/ are the characters
   of the input that have not yet been consumed.







































Barth                     Expires May 13, 2011                  [Page 5]


Internet-Draft          How Browsers Process URLs          November 2010


3.  Parsing a URL

   Given a string of characters, find the scheme, as described in
   Section ??.

   If the URL is invalid:

      -> Abort these steps.

   If the scheme is a single upper or lower case ASCII character:

      -> TODO: Windows drive specs!

   If the scheme is a ASCII case-insensitive match for "file":

      -> TODO: File URLs!

   If the scheme is a ASCII case-insensitive match for "mailto":

      -> TODO: I think mailto URLs are special, but more testing is
      required.

   If the scheme is hierarchical:

      -> In the after-scheme, if any, find the authority, path, query,
      and fragment, as decribed in Section ??.

      -> In the authority, if any, find the user information, host, and
      port, as described in Section ??.

      -> In the user-info, if any, find the user name and password, as
      described in Section ??.

      -> Abort these steps.

   The remaining string is the /path/.

3.1.  Finding the scheme

   Consume all leading and trailing control characters.

   If the remaining string does not contain a ":" character:

      -> The URL is invalid.

      -> Abort these steps.

   Consume characters up to, but not including, the first ":" character.



Barth                     Expires May 13, 2011                  [Page 6]


Internet-Draft          How Browsers Process URLs          November 2010


   These characters are the /scheme/.

   Consume the ":" character.

   The reamining characters are the /after-scheme/.

3.2.  Finding the authority, path, query, and fragment

   Consume any number of slash characters.

   If the remaining string does not contain any authority terminating
   characters:

      -> The remaining string is the /authority/.

      -> Abort these steps.

   Consume characters up to, but not including, the first authority
   terminating character.  The consumed characters are /authority/.

   If the remaining string does not contain a "?" character or a "#
   character:

      -> The remaining string is the /path/.

      -> Abort these steps.

   Consume characters up to, but not including, the first "?" or "#"
   charcter.  The consumed characters are the /path/.

   If the first character of the remaining string is a "?" character:

      -> Consume the "?" character.

      -> If the remaining string does not contain a "#" character:

         -> The remaining string is the /query/.

         -> Abort these steps.

      -> Consume characters up to, but not including, the first "#"
      charcter.  The consumed characters are the /query/.

   Consume the "#" character.

   The remaining string is the /fragment/.





Barth                     Expires May 13, 2011                  [Page 7]


Internet-Draft          How Browsers Process URLs          November 2010


3.3.  Finding the user information, host, and port

   If the remaining string contains an "@" character:

      -> Consume characters up to, but not including the *last* "@"
      character.  The consumed characters are the /user-info/.

      -> Consume the "@" character.

   If the remaining string does not contain an ":" character:

      -> The remaining string is the /host/.

      -> Abort these steps.

   If the first character of the remaining string is a "[" character,
   the remaining string contains a "]" character, and the last ":"
   character in the remaining string occurs before the last "]"
   character in the remaining string:

      -> The remaining string is the /host/.

      -> Abort these steps.

   Consume characters up to, but not including, the last ":" character.
   The consumed characters are the /host/.

   Consume the ":" character.

   The remaining string is the /port/.

3.4.  Find the user name and password

   If the remaining string does not contain a ":" character:

      -> The remaining string is the /user/.

      -> Abort these steps.

   Consume characters up to, but not including, the first ":" character.
   The consumed characters are the /user/.

   Consume the ":" character.

   The remaining string is the /password/.






Barth                     Expires May 13, 2011                  [Page 8]


Internet-Draft          How Browsers Process URLs          November 2010


4.  Resolving a string relative to a base URL

   Given a string /spec/ and a ParsedURL /base-url/, find the scheme of
   spec.

   TODO: We probably need to trim leading and trailing control
   characters.

   If spec is an invalid URL:

      -> The resolved URL is spec resolved as relative URL.

      -> Abort these steps.

   If spec's scheme contains any characters which are not "valid scheme
   characters" (TODO: Define valid scheme characters):

      -> The resolved URL is spec resolved as relative URL.

      -> Abort these steps.

   If base-url's scheme is an ASCII case insensitive match for spec's
   scheme and the shared scheme is hierarchical:

      -> The resolved URL is spec's after-scheme resolved as a relative
      URL.

      -> Abort these steps.

   The resolved URL is spec parsed as an absolute URL.

4.1.  Resolving a string as a relative URL

   Given a string /spec/ and a ParsedURL /base-url/...

   TODO: If base-url's scheme is not hierarchical, we can't resolve as a
   relative URL.  We'll probably want to return an invalid URL.  Check
   what happens when resolving an empty string as a relative URL with a
   non-hierarchical base.

   If spec is empty:

      -> The resolved URL is identical to base-url, with the fragment,
      if any, removed.

      -> Abort these steps.

   If the first character of spec is a slash character:



Barth                     Expires May 13, 2011                  [Page 9]


Internet-Draft          How Browsers Process URLs          November 2010


      -> If spec has at least two characters and the second character is
      also a slash character:

         -> The resolved URL is spec resolved as a scheme-relative URL.

      Otherwise:

         -> The resolved URL is spec resolved as an authority-relative
         URL.

      -> Abort these steps.

   If the first character of spec is a "?" character:

      -> The resolved URL is spec resolved as a query-relative URL.

      -> Abort these steps.

   If the first character of spec is a "#" character:

      -> The resolved URL is spec resolved as a fragment-relative URL.

      -> Abort these steps.

   The resolved URL is spec resolved as a path-relative URL.

4.2.  Resolving a string as a scheme-relative URL

   Given a string /spec/ and a ParsedURL /base-url/, let resolved-url be

   o  base-url's scheme

   o  concatenated with ":",

   o  concatenated with spec.

   The resolved URL is resolved-url parsed as an absolute URL.

4.3.  Resolving a string as an authority-relative URL

   Given a string /spec/ and a ParsedURL /base-url/, let resolved-url be

   o  base-url's scheme

   o  concatenated with "://",

   o  concatenated with base-url's authority,




Barth                     Expires May 13, 2011                 [Page 10]


Internet-Draft          How Browsers Process URLs          November 2010


   o  concatenated with spec.

   The resolved URL is resolved-url parsed as an absolute URL.

4.4.  Resolving a string as a query-relative URL

   Given a string /spec/ and a ParsedURL /base-url/, let resolved-url be

   o  base-url's scheme

   o  concatenated with "://",

   o  concatenated with base-url's authority,

   o  concatenated with base-url's path,

   o  concatenated with spec.

   The resolved URL is resolved-url parsed as an absolute URL.

4.5.  Resolving a string as a fragment-relative URL

   Given a string /spec/ and a ParsedURL /base-url/, let resolved-url be

   o  base-url's scheme

   o  concatenated with "://",

   o  concatenated with base-url's authority,

   o  concatenated with base-url's path,

   o  concatenated with "?",

   o  concatenated with base-url's query,

   o  concatenated with spec.

   The resolved URL is resolved-url parsed as an absolute URL.

4.6.  Resolving a string as a path-relative URL

   TODO: Handle path-relative URLs.  This requires a bunch of path and
   dot semantics.







Barth                     Expires May 13, 2011                 [Page 11]


Internet-Draft          How Browsers Process URLs          November 2010


Appendix A.  Acknowledgements

   TODO
















































Barth                     Expires May 13, 2011                 [Page 12]


Internet-Draft          How Browsers Process URLs          November 2010


Author's Address

   Adam Barth
   Google, Inc.

   Email: ietf@adambarth.com
   URI:   http://www.adambarth.com/












































Barth                     Expires May 13, 2011                 [Page 13]


Html markup produced by rfcmarkup 1.123, available from https://tools.ietf.org/tools/rfcmarkup/