polip: The end of us-ascii on the Internet?

Author: Andrzej P. Wozniak <uszer_at_poczta.onet.pl.invalid>
Date: Tue 01 Apr 2008 - 20:55:40 MET DST
Message-ID: <fsu0rn$uku$1@mx1.internetia.pl>
Content-Type: text/plain; charset="iso-8859-2"

I don't know whether anyone has noticed the publication of new RFCs about
abandoning us-ascii on the Internet, so here is a summary.

1. RFC 5198
Unicode Format for Network Interchange
Category: Standards Track

   This document proposes to establish "Net-Unicode" as a new
   standardized text transmission form for the Internet, to serve as an
   internationalized alternative for NVT ASCII when specified in new --
   and, where appropriate, updated -- protocols. UTF-8 [RFC3629] is
   chosen for the coding because it has good compatibility properties
   with ASCII and for other reasons discussed in the existing IETF
   character set policy [RFC2277].

In other words: we replace us-ascii with utf-8, use only CRLF as the line
ending, and throw out tab, backspace, etc. Boring and obvious stuff. The full
text is available here: http://www.rfc-editor.org/rfc/rfc5198.txt
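
A minimal Python sketch of the three points summarized above. It follows
only this summary, not the full text of RFC 5198; the function name
to_net_text and the exact set of rejected control characters are assumptions
made for illustration.

```python
import re

# Assumption: every C0 control character except CR and LF, plus DEL,
# counts as "thrown out" (this covers tab and backspace from the summary).
_DISALLOWED = re.compile(r"[\x00-\x09\x0b\x0c\x0e-\x1f\x7f]")


def to_net_text(text: str) -> bytes:
    """Encode text as UTF-8 with CRLF line endings, rejecting control characters."""
    # Normalize CRLF, bare CR, and bare LF to CRLF.
    text = re.sub(r"\r\n|\r|\n", "\r\n", text)
    match = _DISALLOWED.search(text)
    if match:
        raise ValueError(
            f"disallowed control character U+{ord(match.group()):04X}"
        )
    return text.encode("utf-8")
```

Rejecting the input instead of silently stripping control characters is a
design choice of this sketch; a real sender might prefer to replace them.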

The second RFC is more interesting. It concerns only domain names, but it also
means abandoning utf-8.

2. RFC 5242
A Generalized Unified Character Code: Western European and CJK Sections
Category: Informational

   Many issues have been identified with the use of general-purpose
   character sets for internationalized domain names and similar
   purposes. This memo specifies a fully unified coded character set
   for scripts based on Latin, Greek, Cyrillic, and Chinese characters.

   There are four important principles in this work:

   1. If it looks alike, it is alike. The number of base characters
       and marks should be minimized. Glyphs are more important than
       character abstractions.

   2. If it is the same thing, it is the same thing. Two symbols that
       have the same semantic meaning in all contexts should be encoded
       in a way that allows their identity to be discovered by removing
       modifiers, rather than having to resort to external equivalence
       tables.

   3. For simplicity, when a character form can be evaluated on the
       basis of either serif or sanserif fonts, the sanserif font is
       always preferred.

   4. The use of combining characters and modifiers is preferred to
       adding more base characters.

   Based on these principles, it becomes obvious that:

   o Ligatures, digraphs, and final forms are constructed with special
      modifiers so that relationships to basic forms are obvious.

   o Symbols consisting of multiple marks are always constructed from
      combining characters and positional modifiers; thus, the "i"
      character is constructed from the vertical line symbol followed by
      a combining dot above. Similarly "f" is composed of a centered
      vertical line, a right hook in the top position, and an
      appropriately-positioned composing hyphen.

   This document draws strongly from the design and terminology of
   Unicode [Unicode] but represents a radically different approach.
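
For comparison, here is how far today's Unicode goes in the direction of
principle 4. This is a small Python sketch using the standard unicodedata
module; the helper name decompose is an assumption made for illustration, and
the example shows Unicode's canonical decomposition (NFD), not the scheme
proposed in RFC 5242.

```python
import unicodedata


def decompose(text: str) -> list[str]:
    """Return the Unicode names of the code points in the NFD form of text."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]


# An accented letter splits into a base character and a combining mark.
print(decompose("ź"))  # ['LATIN SMALL LETTER Z', 'COMBINING ACUTE ACCENT']

# Plain "i" stays a single base character in Unicode; it is not built
# from a vertical line and a combining dot above.
print(decompose("i"))  # ['LATIN SMALL LETTER I']
```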

I can't be bothered to elaborate, so I recommend reading it yourself:
http://www.rfc-editor.org/rfc/rfc5242.txt

-- 
Andrzej P. Woźniak  uszer@pochta.onet.pl  (swap z<->h in the address)
  ...an accidental admin often doesn't know/isn't familiar with/can't do/
  isn't in the habit of certain things. I know this from my own experience.
  And later it takes its revenge.
  On him as well.                -- Mariusz Kruk on pl.comp.os.advocacy

Received on Tue Apr 1 21:05:08 2008

This archive was generated by hypermail 2.1.8 : Tue 01 Apr 2008 - 21:40:01 MET DST