hg/bck2brwsr: emul/mini/src/main/java/java/lang/Character.java@5e13b1ac2886

     1 /*

     2  * Copyright (c) 2002, 2010, Oracle and/or its affiliates. All rights reserved.

     3  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.

     4  *

     5  * This code is free software; you can redistribute it and/or modify it

     6  * under the terms of the GNU General Public License version 2 only, as

     7  * published by the Free Software Foundation.  Oracle designates this

     8  * particular file as subject to the "Classpath" exception as provided

     9  * by Oracle in the LICENSE file that accompanied this code.

    10  *

    11  * This code is distributed in the hope that it will be useful, but WITHOUT

    12  * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or

    13  * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License

    14  * version 2 for more details (a copy is included in the LICENSE file that

    15  * accompanied this code).

    16  *

    17  * You should have received a copy of the GNU General Public License version

    18  * 2 along with this work; if not, write to the Free Software Foundation,

    19  * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.

    20  *

    21  * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA

    22  * or visit www.oracle.com if you need additional information or have any

    23  * questions.

    24  */

    26 package java.lang;

    28 import org.apidesign.bck2brwsr.core.JavaScriptBody;

    30 /**

    31  * The {@code Character} class wraps a value of the primitive

    32  * type {@code char} in an object. An object of type

    33  * {@code Character} contains a single field whose type is

    34  * {@code char}.

    35  * <p>

    36  * In addition, this class provides several methods for determining

    37  * a character's category (lowercase letter, digit, etc.) and for converting

    38  * characters from uppercase to lowercase and vice versa.

    39  * <p>

    40  * Character information is based on the Unicode Standard, version 6.0.0.

    41  * <p>

    42  * The methods and data of class {@code Character} are defined by

    43  * the information in the <i>UnicodeData</i> file that is part of the

    44  * Unicode Character Database maintained by the Unicode

    45  * Consortium. This file specifies various properties including name

    46  * and general category for every defined Unicode code point or

    47  * character range.

    48  * <p>

    49  * The file and its description are available from the Unicode Consortium at:

    50  * <ul>

    51  * <li><a href="http://www.unicode.org">http://www.unicode.org</a>

    52  * </ul>

    53  *

    54  * <h4><a name="unicode">Unicode Character Representations</a></h4>

    55  *

    56  * <p>The {@code char} data type (and therefore the value that a

    57  * {@code Character} object encapsulates) are based on the

    58  * original Unicode specification, which defined characters as

    59  * fixed-width 16-bit entities. The Unicode Standard has since been

    60  * changed to allow for characters whose representation requires more

    61  * than 16 bits.  The range of legal <em>code point</em>s is now

    62  * U+0000 to U+10FFFF, known as <em>Unicode scalar value</em>.

    63  * (Refer to the <a

    64  * href="http://www.unicode.org/reports/tr27/#notation"><i>

    65  * definition</i></a> of the U+<i>n</i> notation in the Unicode

    66  * Standard.)

    67  *

    68  * <p><a name="BMP">The set of characters from U+0000 to U+FFFF is

    69  * sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>.

    70  * <a name="supplementary">Characters</a> whose code points are greater

    71  * than U+FFFF are called <em>supplementary character</em>s.  The Java

    72  * platform uses the UTF-16 representation in {@code char} arrays and

    73  * in the {@code String} and {@code StringBuffer} classes. In

    74  * this representation, supplementary characters are represented as a pair

    75  * of {@code char} values, the first from the <em>high-surrogates</em>

    76  * range, (&#92;uD800-&#92;uDBFF), the second from the

    77  * <em>low-surrogates</em> range (&#92;uDC00-&#92;uDFFF).

    78  *

    79  * <p>A {@code char} value, therefore, represents Basic

    80  * Multilingual Plane (BMP) code points, including the surrogate

    81  * code points, or code units of the UTF-16 encoding. An

    82  * {@code int} value represents all Unicode code points,

    83  * including supplementary code points. The lower (least significant)

    84  * 21 bits of {@code int} are used to represent Unicode code

    85  * points and the upper (most significant) 11 bits must be zero.

    86  * Unless otherwise specified, the behavior with respect to

    87  * supplementary characters and surrogate {@code char} values is

    88  * as follows:

    89  *

    90  * <ul>

    91  * <li>The methods that only accept a {@code char} value cannot support

    92  * supplementary characters. They treat {@code char} values from the

    93  * surrogate ranges as undefined characters. For example,

    94  * {@code Character.isLetter('\u005CuD840')} returns {@code false}, even though

    95  * this specific value if followed by any low-surrogate value in a string

    96  * would represent a letter.

    97  *

    98  * <li>The methods that accept an {@code int} value support all

    99  * Unicode characters, including supplementary characters. For

   100  * example, {@code Character.isLetter(0x2F81A)} returns

   101  * {@code true} because the code point value represents a letter

   102  * (a CJK ideograph).

   103  * </ul>

   104  *

   105  * <p>In the Java SE API documentation, <em>Unicode code point</em> is

   106  * used for character values in the range between U+0000 and U+10FFFF,

   107  * and <em>Unicode code unit</em> is used for 16-bit

   108  * {@code char} values that are code units of the <em>UTF-16</em>

   109  * encoding. For more information on Unicode terminology, refer to the

   110  * <a href="http://www.unicode.org/glossary/">Unicode Glossary</a>.

   111  *

   112  * @author  Lee Boynton

   113  * @author  Guy Steele

   114  * @author  Akira Tanaka

   115  * @author  Martin Buchholz

   116  * @author  Ulf Zibis

   117  * @since   1.0

   118  */

   119 public final

   120 class Character implements java.io.Serializable, Comparable<Character> {

   121     /**

   122      * The minimum radix available for conversion to and from strings.

   123      * The constant value of this field is the smallest value permitted

   124      * for the radix argument in radix-conversion methods such as the

   125      * {@code digit} method, the {@code forDigit} method, and the

   126      * {@code toString} method of class {@code Integer}.

   127      *

   128      * @see     Character#digit(char, int)

   129      * @see     Character#forDigit(int, int)

   130      * @see     Integer#toString(int, int)

   131      * @see     Integer#valueOf(String)

   132      */

   133     public static final int MIN_RADIX = 2;

   135     /**

   136      * The maximum radix available for conversion to and from strings.

   137      * The constant value of this field is the largest value permitted

   138      * for the radix argument in radix-conversion methods such as the

   139      * {@code digit} method, the {@code forDigit} method, and the

   140      * {@code toString} method of class {@code Integer}.

   141      *

   142      * @see     Character#digit(char, int)

   143      * @see     Character#forDigit(int, int)

   144      * @see     Integer#toString(int, int)

   145      * @see     Integer#valueOf(String)

   146      */

   147     public static final int MAX_RADIX = 36;

   149     /**

   150      * The constant value of this field is the smallest value of type

   151      * {@code char}, {@code '\u005Cu0000'}.

   152      *

   153      * @since   1.0.2

   154      */

   155     public static final char MIN_VALUE = '\u0000';

   157     /**

   158      * The constant value of this field is the largest value of type

   159      * {@code char}, {@code '\u005CuFFFF'}.

   160      *

   161      * @since   1.0.2

   162      */

   163     public static final char MAX_VALUE = '\uFFFF';

   165     /**

   166      * The {@code Class} instance representing the primitive type

   167      * {@code char}.

   168      *

   169      * @since   1.1

   170      */

   171     public static final Class<Character> TYPE = Class.getPrimitiveClass("char");

   173     /*

   174      * Normative general types

   175      */

   177     /*

   178      * General character types

   179      */

   181     /**

   182      * General category "Cn" in the Unicode specification.

   183      * @since   1.1

   184      */

   185     public static final byte UNASSIGNED = 0;

   187     /**

   188      * General category "Lu" in the Unicode specification.

   189      * @since   1.1

   190      */

   191     public static final byte UPPERCASE_LETTER = 1;

   193     /**

   194      * General category "Ll" in the Unicode specification.

   195      * @since   1.1

   196      */

   197     public static final byte LOWERCASE_LETTER = 2;

   199     /**

   200      * General category "Lt" in the Unicode specification.

   201      * @since   1.1

   202      */

   203     public static final byte TITLECASE_LETTER = 3;

   205     /**

   206      * General category "Lm" in the Unicode specification.

   207      * @since   1.1

   208      */

   209     public static final byte MODIFIER_LETTER = 4;

   211     /**

   212      * General category "Lo" in the Unicode specification.

   213      * @since   1.1

   214      */

   215     public static final byte OTHER_LETTER = 5;

   217     /**

   218      * General category "Mn" in the Unicode specification.

   219      * @since   1.1

   220      */

   221     public static final byte NON_SPACING_MARK = 6;

   223     /**

   224      * General category "Me" in the Unicode specification.

   225      * @since   1.1

   226      */

   227     public static final byte ENCLOSING_MARK = 7;

   229     /**

   230      * General category "Mc" in the Unicode specification.

   231      * @since   1.1

   232      */

   233     public static final byte COMBINING_SPACING_MARK = 8;

   235     /**

   236      * General category "Nd" in the Unicode specification.

   237      * @since   1.1

   238      */

   239     public static final byte DECIMAL_DIGIT_NUMBER        = 9;

   241     /**

   242      * General category "Nl" in the Unicode specification.

   243      * @since   1.1

   244      */

   245     public static final byte LETTER_NUMBER = 10;

   247     /**

   248      * General category "No" in the Unicode specification.

   249      * @since   1.1

   250      */

   251     public static final byte OTHER_NUMBER = 11;

   253     /**

   254      * General category "Zs" in the Unicode specification.

   255      * @since   1.1

   256      */

   257     public static final byte SPACE_SEPARATOR = 12;

   259     /**

   260      * General category "Zl" in the Unicode specification.

   261      * @since   1.1

   262      */

   263     public static final byte LINE_SEPARATOR = 13;

   265     /**

   266      * General category "Zp" in the Unicode specification.

   267      * @since   1.1

   268      */

   269     public static final byte PARAGRAPH_SEPARATOR = 14;

   271     /**

   272      * General category "Cc" in the Unicode specification.

   273      * @since   1.1

   274      */

   275     public static final byte CONTROL = 15;

   277     /**

   278      * General category "Cf" in the Unicode specification.

   279      * @since   1.1

   280      */

   281     public static final byte FORMAT = 16;

   283     /**

   284      * General category "Co" in the Unicode specification.

   285      * @since   1.1

   286      */

   287     public static final byte PRIVATE_USE = 18;

   289     /**

   290      * General category "Cs" in the Unicode specification.

   291      * @since   1.1

   292      */

   293     public static final byte SURROGATE = 19;

   295     /**

   296      * General category "Pd" in the Unicode specification.

   297      * @since   1.1

   298      */

   299     public static final byte DASH_PUNCTUATION = 20;

   301     /**

   302      * General category "Ps" in the Unicode specification.

   303      * @since   1.1

   304      */

   305     public static final byte START_PUNCTUATION = 21;

   307     /**

   308      * General category "Pe" in the Unicode specification.

   309      * @since   1.1

   310      */

   311     public static final byte END_PUNCTUATION = 22;

   313     /**

   314      * General category "Pc" in the Unicode specification.

   315      * @since   1.1

   316      */

   317     public static final byte CONNECTOR_PUNCTUATION = 23;

   319     /**

   320      * General category "Po" in the Unicode specification.

   321      * @since   1.1

   322      */

   323     public static final byte OTHER_PUNCTUATION = 24;

   325     /**

   326      * General category "Sm" in the Unicode specification.

   327      * @since   1.1

   328      */

   329     public static final byte MATH_SYMBOL = 25;

   331     /**

   332      * General category "Sc" in the Unicode specification.

   333      * @since   1.1

   334      */

   335     public static final byte CURRENCY_SYMBOL = 26;

   337     /**

   338      * General category "Sk" in the Unicode specification.

   339      * @since   1.1

   340      */

   341     public static final byte MODIFIER_SYMBOL = 27;

   343     /**

   344      * General category "So" in the Unicode specification.

   345      * @since   1.1

   346      */

   347     public static final byte OTHER_SYMBOL = 28;

   349     /**

   350      * General category "Pi" in the Unicode specification.

   351      * @since   1.4

   352      */

   353     public static final byte INITIAL_QUOTE_PUNCTUATION = 29;

   355     /**

   356      * General category "Pf" in the Unicode specification.

   357      * @since   1.4

   358      */

   359     public static final byte FINAL_QUOTE_PUNCTUATION = 30;

   361     /**

   362      * Error flag. Use int (code point) to avoid confusion with U+FFFF.

   363      */

   364     static final int ERROR = 0xFFFFFFFF;

   367     /**

   368      * Undefined bidirectional character type. Undefined {@code char}

   369      * values have undefined directionality in the Unicode specification.

   370      * @since 1.4

   371      */

   372     public static final byte DIRECTIONALITY_UNDEFINED = -1;

   374     /**

   375      * Strong bidirectional character type "L" in the Unicode specification.

   376      * @since 1.4

   377      */

   378     public static final byte DIRECTIONALITY_LEFT_TO_RIGHT = 0;

   380     /**

   381      * Strong bidirectional character type "R" in the Unicode specification.

   382      * @since 1.4

   383      */

   384     public static final byte DIRECTIONALITY_RIGHT_TO_LEFT = 1;

   386     /**

   387     * Strong bidirectional character type "AL" in the Unicode specification.

   388      * @since 1.4

   389      */

   390     public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC = 2;

   392     /**

   393      * Weak bidirectional character type "EN" in the Unicode specification.

   394      * @since 1.4

   395      */

   396     public static final byte DIRECTIONALITY_EUROPEAN_NUMBER = 3;

   398     /**

   399      * Weak bidirectional character type "ES" in the Unicode specification.

   400      * @since 1.4

   401      */

   402     public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR = 4;

   404     /**

   405      * Weak bidirectional character type "ET" in the Unicode specification.

   406      * @since 1.4

   407      */

   408     public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR = 5;

   410     /**

   411      * Weak bidirectional character type "AN" in the Unicode specification.

   412      * @since 1.4

   413      */

   414     public static final byte DIRECTIONALITY_ARABIC_NUMBER = 6;

   416     /**

   417      * Weak bidirectional character type "CS" in the Unicode specification.

   418      * @since 1.4

   419      */

   420     public static final byte DIRECTIONALITY_COMMON_NUMBER_SEPARATOR = 7;

   422     /**

   423      * Weak bidirectional character type "NSM" in the Unicode specification.

   424      * @since 1.4

   425      */

   426     public static final byte DIRECTIONALITY_NONSPACING_MARK = 8;

   428     /**

   429      * Weak bidirectional character type "BN" in the Unicode specification.

   430      * @since 1.4

   431      */

   432     public static final byte DIRECTIONALITY_BOUNDARY_NEUTRAL = 9;

   434     /**

   435      * Neutral bidirectional character type "B" in the Unicode specification.

   436      * @since 1.4

   437      */

   438     public static final byte DIRECTIONALITY_PARAGRAPH_SEPARATOR = 10;

   440     /**

   441      * Neutral bidirectional character type "S" in the Unicode specification.

   442      * @since 1.4

   443      */

   444     public static final byte DIRECTIONALITY_SEGMENT_SEPARATOR = 11;

   446     /**

   447      * Neutral bidirectional character type "WS" in the Unicode specification.

   448      * @since 1.4

   449      */

   450     public static final byte DIRECTIONALITY_WHITESPACE = 12;

   452     /**

   453      * Neutral bidirectional character type "ON" in the Unicode specification.

   454      * @since 1.4

   455      */

   456     public static final byte DIRECTIONALITY_OTHER_NEUTRALS = 13;

   458     /**

   459      * Strong bidirectional character type "LRE" in the Unicode specification.

   460      * @since 1.4

   461      */

   462     public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING = 14;

   464     /**

   465      * Strong bidirectional character type "LRO" in the Unicode specification.

   466      * @since 1.4

   467      */

   468     public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE = 15;

   470     /**

   471      * Strong bidirectional character type "RLE" in the Unicode specification.

   472      * @since 1.4

   473      */

   474     public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING = 16;

   476     /**

   477      * Strong bidirectional character type "RLO" in the Unicode specification.

   478      * @since 1.4

   479      */

   480     public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE = 17;

   482     /**

   483      * Weak bidirectional character type "PDF" in the Unicode specification.

   484      * @since 1.4

   485      */

   486     public static final byte DIRECTIONALITY_POP_DIRECTIONAL_FORMAT = 18;

   488     /**

   489      * The minimum value of a

   490      * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">

   491      * Unicode high-surrogate code unit</a>

   492      * in the UTF-16 encoding, constant {@code '\u005CuD800'}.

   493      * A high-surrogate is also known as a <i>leading-surrogate</i>.

   494      *

   495      * @since 1.5

   496      */

   497     public static final char MIN_HIGH_SURROGATE = '\uD800';

   499     /**

   500      * The maximum value of a

   501      * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">

   502      * Unicode high-surrogate code unit</a>

   503      * in the UTF-16 encoding, constant {@code '\u005CuDBFF'}.

   504      * A high-surrogate is also known as a <i>leading-surrogate</i>.

   505      *

   506      * @since 1.5

   507      */

   508     public static final char MAX_HIGH_SURROGATE = '\uDBFF';

   510     /**

   511      * The minimum value of a

   512      * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">

   513      * Unicode low-surrogate code unit</a>

   514      * in the UTF-16 encoding, constant {@code '\u005CuDC00'}.

   515      * A low-surrogate is also known as a <i>trailing-surrogate</i>.

   516      *

   517      * @since 1.5

   518      */

   519     public static final char MIN_LOW_SURROGATE  = '\uDC00';

   521     /**

   522      * The maximum value of a

   523      * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">

   524      * Unicode low-surrogate code unit</a>

   525      * in the UTF-16 encoding, constant {@code '\u005CuDFFF'}.

   526      * A low-surrogate is also known as a <i>trailing-surrogate</i>.

   527      *

   528      * @since 1.5

   529      */

   530     public static final char MAX_LOW_SURROGATE  = '\uDFFF';

   532     /**

   533      * The minimum value of a Unicode surrogate code unit in the

   534      * UTF-16 encoding, constant {@code '\u005CuD800'}.

   535      *

   536      * @since 1.5

   537      */

   538     public static final char MIN_SURROGATE = MIN_HIGH_SURROGATE;

   540     /**

   541      * The maximum value of a Unicode surrogate code unit in the

   542      * UTF-16 encoding, constant {@code '\u005CuDFFF'}.

   543      *

   544      * @since 1.5

   545      */

   546     public static final char MAX_SURROGATE = MAX_LOW_SURROGATE;

   548     /**

   549      * The minimum value of a

   550      * <a href="http://www.unicode.org/glossary/#supplementary_code_point">

   551      * Unicode supplementary code point</a>, constant {@code U+10000}.

   552      *

   553      * @since 1.5

   554      */

   555     public static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;

   557     /**

   558      * The minimum value of a

   559      * <a href="http://www.unicode.org/glossary/#code_point">

   560      * Unicode code point</a>, constant {@code U+0000}.

   561      *

   562      * @since 1.5

   563      */

   564     public static final int MIN_CODE_POINT = 0x000000;

   566     /**

   567      * The maximum value of a

   568      * <a href="http://www.unicode.org/glossary/#code_point">

   569      * Unicode code point</a>, constant {@code U+10FFFF}.

   570      *

   571      * @since 1.5

   572      */

   573     public static final int MAX_CODE_POINT = 0X10FFFF;

   576     /**

   577      * Instances of this class represent particular subsets of the Unicode

   578      * character set.  The only family of subsets defined in the

   579      * {@code Character} class is {@link Character.UnicodeBlock}.

   580      * Other portions of the Java API may define other subsets for their

   581      * own purposes.

   582      *

   583      * @since 1.2

   584      */

   585     public static class Subset  {

   587         private String name;

   589         /**

   590          * Constructs a new {@code Subset} instance.

   591          *

   592          * @param  name  The name of this subset

   593          * @exception NullPointerException if name is {@code null}

   594          */

   595         protected Subset(String name) {

   596             if (name == null) {

   597                 throw new NullPointerException("name");

   598             }

   599             this.name = name;

   600         }

   602         /**

   603          * Compares two {@code Subset} objects for equality.

   604          * This method returns {@code true} if and only if

   605          * {@code this} and the argument refer to the same

   606          * object; since this method is {@code final}, this

   607          * guarantee holds for all subclasses.

   608          */

   609         public final boolean equals(Object obj) {

   610             return (this == obj);

   611         }

   613         /**

   614          * Returns the standard hash code as defined by the

   615          * {@link Object#hashCode} method.  This method

   616          * is {@code final} in order to ensure that the

   617          * {@code equals} and {@code hashCode} methods will

   618          * be consistent in all subclasses.

   619          */

   620         public final int hashCode() {

   621             return super.hashCode();

   622         }

   624         /**

   625          * Returns the name of this subset.

   626          */

   627         public final String toString() {

   628             return name;

   629         }

   630     }

   632     // See http://www.unicode.org/Public/UNIDATA/Blocks.txt

   633     // for the latest specification of Unicode Blocks.

   636     /**

   637      * The value of the {@code Character}.

   638      *

   639      * @serial

   640      */

   641     private final char value;

   643     /** use serialVersionUID from JDK 1.0.2 for interoperability */

   644     private static final long serialVersionUID = 3786198910865385080L;

   646     /**

   647      * Constructs a newly allocated {@code Character} object that

   648      * represents the specified {@code char} value.

   649      *

   650      * @param  value   the value to be represented by the

   651      *                  {@code Character} object.

   652      */

   653     public Character(char value) {

   654         this.value = value;

   655     }

   657     private static class CharacterCache {

   658         private CharacterCache(){}

   660         static final Character cache[] = new Character[127 + 1];

   662         static {

   663             for (int i = 0; i < cache.length; i++)

   664                 cache[i] = new Character((char)i);

   665         }

   666     }

   668     /**

   669      * Returns a <tt>Character</tt> instance representing the specified

   670      * <tt>char</tt> value.

   671      * If a new <tt>Character</tt> instance is not required, this method

   672      * should generally be used in preference to the constructor

   673      * {@link #Character(char)}, as this method is likely to yield

   674      * significantly better space and time performance by caching

   675      * frequently requested values.

   676      *

   677      * This method will always cache values in the range {@code

   678      * '\u005Cu0000'} to {@code '\u005Cu007F'}, inclusive, and may

   679      * cache other values outside of this range.

   680      *

   681      * @param  c a char value.

   682      * @return a <tt>Character</tt> instance representing <tt>c</tt>.

   683      * @since  1.5

   684      */

   685     public static Character valueOf(char c) {

   686         if (c <= 127) { // must cache

   687             return CharacterCache.cache[(int)c];

   688         }

   689         return new Character(c);

   690     }

   692     /**

   693      * Returns the value of this {@code Character} object.

   694      * @return  the primitive {@code char} value represented by

   695      *          this object.

   696      */

   697     public char charValue() {

   698         return value;

   699     }

   701     /**

   702      * Returns a hash code for this {@code Character}; equal to the result

   703      * of invoking {@code charValue()}.

   704      *

   705      * @return a hash code value for this {@code Character}

   706      */

   707     public int hashCode() {

   708         return (int)value;

   709     }

   711     /**

   712      * Compares this object against the specified object.

   713      * The result is {@code true} if and only if the argument is not

   714      * {@code null} and is a {@code Character} object that

   715      * represents the same {@code char} value as this object.

   716      *

   717      * @param   obj   the object to compare with.

   718      * @return  {@code true} if the objects are the same;

   719      *          {@code false} otherwise.

   720      */

   721     public boolean equals(Object obj) {

   722         if (obj instanceof Character) {

   723             return value == ((Character)obj).charValue();

   724         }

   725         return false;

   726     }

   728     /**

   729      * Returns a {@code String} object representing this

   730      * {@code Character}'s value.  The result is a string of

   731      * length 1 whose sole component is the primitive

   732      * {@code char} value represented by this

   733      * {@code Character} object.

   734      *

   735      * @return  a string representation of this object.

   736      */

   737     public String toString() {

   738         char buf[] = {value};

   739         return String.valueOf(buf);

   740     }

   742     /**

   743      * Returns a {@code String} object representing the

   744      * specified {@code char}.  The result is a string of length

   745      * 1 consisting solely of the specified {@code char}.

   746      *

   747      * @param c the {@code char} to be converted

   748      * @return the string representation of the specified {@code char}

   749      * @since 1.4

   750      */

   751     public static String toString(char c) {

   752         return String.valueOf(c);

   753     }

   755     /**

   756      * Determines whether the specified code point is a valid

   757      * <a href="http://www.unicode.org/glossary/#code_point">

   758      * Unicode code point value</a>.

   759      *

   760      * @param  codePoint the Unicode code point to be tested

   761      * @return {@code true} if the specified code point value is between

   762      *         {@link #MIN_CODE_POINT} and

   763      *         {@link #MAX_CODE_POINT} inclusive;

   764      *         {@code false} otherwise.

   765      * @since  1.5

   766      */

   767     public static boolean isValidCodePoint(int codePoint) {

   768         // Optimized form of:

   769         //     codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT

   770         int plane = codePoint >>> 16;

   771         return plane < ((MAX_CODE_POINT + 1) >>> 16);

   772     }

   774     /**

   775      * Determines whether the specified character (Unicode code point)

   776      * is in the <a href="#BMP">Basic Multilingual Plane (BMP)</a>.

   777      * Such code points can be represented using a single {@code char}.

   778      *

   779      * @param  codePoint the character (Unicode code point) to be tested

   780      * @return {@code true} if the specified code point is between

   781      *         {@link #MIN_VALUE} and {@link #MAX_VALUE} inclusive;

   782      *         {@code false} otherwise.

   783      * @since  1.7

   784      */

   785     public static boolean isBmpCodePoint(int codePoint) {

   786         return codePoint >>> 16 == 0;

   787         // Optimized form of:

   788         //     codePoint >= MIN_VALUE && codePoint <= MAX_VALUE

   789         // We consistently use logical shift (>>>) to facilitate

   790         // additional runtime optimizations.

   791     }

   793     /**

   794      * Determines whether the specified character (Unicode code point)

   795      * is in the <a href="#supplementary">supplementary character</a> range.

   796      *

   797      * @param  codePoint the character (Unicode code point) to be tested

   798      * @return {@code true} if the specified code point is between

   799      *         {@link #MIN_SUPPLEMENTARY_CODE_POINT} and

   800      *         {@link #MAX_CODE_POINT} inclusive;

   801      *         {@code false} otherwise.

   802      * @since  1.5

   803      */

   804     public static boolean isSupplementaryCodePoint(int codePoint) {

   805         return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT

   806             && codePoint <  MAX_CODE_POINT + 1;

   807     }

   809     /**

   810      * Determines if the given {@code char} value is a

   811      * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">

   812      * Unicode high-surrogate code unit</a>

   813      * (also known as <i>leading-surrogate code unit</i>).

   814      *

   815      * <p>Such values do not represent characters by themselves,

   816      * but are used in the representation of

   817      * <a href="#supplementary">supplementary characters</a>

   818      * in the UTF-16 encoding.

   819      *

   820      * @param  ch the {@code char} value to be tested.

   821      * @return {@code true} if the {@code char} value is between

   822      *         {@link #MIN_HIGH_SURROGATE} and

   823      *         {@link #MAX_HIGH_SURROGATE} inclusive;

   824      *         {@code false} otherwise.

   825      * @see    Character#isLowSurrogate(char)

   826      * @see    Character.UnicodeBlock#of(int)

   827      * @since  1.5

   828      */

   829     public static boolean isHighSurrogate(char ch) {

   830         // Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE

   831         return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);

   832     }

   834     /**

   835      * Determines if the given {@code char} value is a

   836      * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">

   837      * Unicode low-surrogate code unit</a>

   838      * (also known as <i>trailing-surrogate code unit</i>).

   839      *

   840      * <p>Such values do not represent characters by themselves,

   841      * but are used in the representation of

   842      * <a href="#supplementary">supplementary characters</a>

   843      * in the UTF-16 encoding.

   844      *

   845      * @param  ch the {@code char} value to be tested.

   846      * @return {@code true} if the {@code char} value is between

   847      *         {@link #MIN_LOW_SURROGATE} and

   848      *         {@link #MAX_LOW_SURROGATE} inclusive;

   849      *         {@code false} otherwise.

   850      * @see    Character#isHighSurrogate(char)

   851      * @since  1.5

   852      */

   853     public static boolean isLowSurrogate(char ch) {

   854         return ch >= MIN_LOW_SURROGATE && ch < (MAX_LOW_SURROGATE + 1);

   855     }

   857     /**

   858      * Determines if the given {@code char} value is a Unicode

   859      * <i>surrogate code unit</i>.

   860      *

   861      * <p>Such values do not represent characters by themselves,

   862      * but are used in the representation of

   863      * <a href="#supplementary">supplementary characters</a>

   864      * in the UTF-16 encoding.

   865      *

   866      * <p>A char value is a surrogate code unit if and only if it is either

   867      * a {@linkplain #isLowSurrogate(char) low-surrogate code unit} or

   868      * a {@linkplain #isHighSurrogate(char) high-surrogate code unit}.

   869      *

   870      * @param  ch the {@code char} value to be tested.

   871      * @return {@code true} if the {@code char} value is between

   872      *         {@link #MIN_SURROGATE} and

   873      *         {@link #MAX_SURROGATE} inclusive;

   874      *         {@code false} otherwise.

   875      * @since  1.7

   876      */

   877     public static boolean isSurrogate(char ch) {

   878         return ch >= MIN_SURROGATE && ch < (MAX_SURROGATE + 1);

   879     }

   881     /**

   882      * Determines whether the specified pair of {@code char}

   883      * values is a valid

   884      * <a href="http://www.unicode.org/glossary/#surrogate_pair">

   885      * Unicode surrogate pair</a>.

   887      * <p>This method is equivalent to the expression:

   888      * <blockquote><pre>

   889      * isHighSurrogate(high) && isLowSurrogate(low)

   890      * </pre></blockquote>

   891      *

   892      * @param  high the high-surrogate code value to be tested

   893      * @param  low the low-surrogate code value to be tested

   894      * @return {@code true} if the specified high and

   895      * low-surrogate code values represent a valid surrogate pair;

   896      * {@code false} otherwise.

   897      * @since  1.5

   898      */

   899     public static boolean isSurrogatePair(char high, char low) {

   900         return isHighSurrogate(high) && isLowSurrogate(low);

   901     }

   903     /**

   904      * Determines the number of {@code char} values needed to

   905      * represent the specified character (Unicode code point). If the

   906      * specified character is equal to or greater than 0x10000, then

   907      * the method returns 2. Otherwise, the method returns 1.

   908      *

   909      * <p>This method doesn't validate the specified character to be a

   910      * valid Unicode code point. The caller must validate the

   911      * character value using {@link #isValidCodePoint(int) isValidCodePoint}

   912      * if necessary.

   913      *

   914      * @param   codePoint the character (Unicode code point) to be tested.

   915      * @return  2 if the character is a valid supplementary character; 1 otherwise.

   916      * @see     Character#isSupplementaryCodePoint(int)

   917      * @since   1.5

   918      */

   919     public static int charCount(int codePoint) {

   920         return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT ? 2 : 1;

   921     }

   923     /**

   924      * Converts the specified surrogate pair to its supplementary code

   925      * point value. This method does not validate the specified

   926      * surrogate pair. The caller must validate it using {@link

   927      * #isSurrogatePair(char, char) isSurrogatePair} if necessary.

   928      *

   929      * @param  high the high-surrogate code unit

   930      * @param  low the low-surrogate code unit

   931      * @return the supplementary code point composed from the

   932      *         specified surrogate pair.

   933      * @since  1.5

   934      */

   935     public static int toCodePoint(char high, char low) {

   936         // Optimized form of:

   937         // return ((high - MIN_HIGH_SURROGATE) << 10)

   938         //         + (low - MIN_LOW_SURROGATE)

   939         //         + MIN_SUPPLEMENTARY_CODE_POINT;

   940         return ((high << 10) + low) + (MIN_SUPPLEMENTARY_CODE_POINT

   941                                        - (MIN_HIGH_SURROGATE << 10)

   942                                        - MIN_LOW_SURROGATE);

   943     }

   945     /**

   946      * Returns the code point at the given index of the

   947      * {@code CharSequence}. If the {@code char} value at

   948      * the given index in the {@code CharSequence} is in the

   949      * high-surrogate range, the following index is less than the

   950      * length of the {@code CharSequence}, and the

   951      * {@code char} value at the following index is in the

   952      * low-surrogate range, then the supplementary code point

   953      * corresponding to this surrogate pair is returned. Otherwise,

   954      * the {@code char} value at the given index is returned.

   955      *

   956      * @param seq a sequence of {@code char} values (Unicode code

   957      * units)

   958      * @param index the index to the {@code char} values (Unicode

   959      * code units) in {@code seq} to be converted

   960      * @return the Unicode code point at the given index

   961      * @exception NullPointerException if {@code seq} is null.

   962      * @exception IndexOutOfBoundsException if the value

   963      * {@code index} is negative or not less than

   964      * {@link CharSequence#length() seq.length()}.

   965      * @since  1.5

   966      */

   967     public static int codePointAt(CharSequence seq, int index) {

   968         char c1 = seq.charAt(index++);

   969         if (isHighSurrogate(c1)) {

   970             if (index < seq.length()) {

   971                 char c2 = seq.charAt(index);

   972                 if (isLowSurrogate(c2)) {

   973                     return toCodePoint(c1, c2);

   974                 }

   975             }

   976         }

   977         return c1;

   978     }

   980     /**

   981      * Returns the code point at the given index of the

   982      * {@code char} array. If the {@code char} value at

   983      * the given index in the {@code char} array is in the

   984      * high-surrogate range, the following index is less than the

   985      * length of the {@code char} array, and the

   986      * {@code char} value at the following index is in the

   987      * low-surrogate range, then the supplementary code point

   988      * corresponding to this surrogate pair is returned. Otherwise,

   989      * the {@code char} value at the given index is returned.

   990      *

   991      * @param a the {@code char} array

   992      * @param index the index to the {@code char} values (Unicode

   993      * code units) in the {@code char} array to be converted

   994      * @return the Unicode code point at the given index

   995      * @exception NullPointerException if {@code a} is null.

   996      * @exception IndexOutOfBoundsException if the value

   997      * {@code index} is negative or not less than

   998      * the length of the {@code char} array.

   999      * @since  1.5

  1000      */

  1001     public static int codePointAt(char[] a, int index) {

  1002         return codePointAtImpl(a, index, a.length);

  1003     }

  1005     /**

  1006      * Returns the code point at the given index of the

  1007      * {@code char} array, where only array elements with

  1008      * {@code index} less than {@code limit} can be used. If

  1009      * the {@code char} value at the given index in the

  1010      * {@code char} array is in the high-surrogate range, the

  1011      * following index is less than the {@code limit}, and the

  1012      * {@code char} value at the following index is in the

  1013      * low-surrogate range, then the supplementary code point

  1014      * corresponding to this surrogate pair is returned. Otherwise,

  1015      * the {@code char} value at the given index is returned.

  1016      *

  1017      * @param a the {@code char} array

  1018      * @param index the index to the {@code char} values (Unicode

  1019      * code units) in the {@code char} array to be converted

  1020      * @param limit the index after the last array element that

  1021      * can be used in the {@code char} array

  1022      * @return the Unicode code point at the given index

  1023      * @exception NullPointerException if {@code a} is null.

  1024      * @exception IndexOutOfBoundsException if the {@code index}

  1025      * argument is negative or not less than the {@code limit}

  1026      * argument, or if the {@code limit} argument is negative or

  1027      * greater than the length of the {@code char} array.

  1028      * @since  1.5

  1029      */

  1030     public static int codePointAt(char[] a, int index, int limit) {

  1031         if (index >= limit || limit < 0 || limit > a.length) {

  1032             throw new IndexOutOfBoundsException();

  1033         }

  1034         return codePointAtImpl(a, index, limit);

  1035     }

  1037     // throws ArrayIndexOutofBoundsException if index out of bounds

  1038     static int codePointAtImpl(char[] a, int index, int limit) {

  1039         char c1 = a[index++];

  1040         if (isHighSurrogate(c1)) {

  1041             if (index < limit) {

  1042                 char c2 = a[index];

  1043                 if (isLowSurrogate(c2)) {

  1044                     return toCodePoint(c1, c2);

  1045                 }

  1046             }

  1047         }

  1048         return c1;

  1049     }

  1051     /**

  1052      * Returns the code point preceding the given index of the

  1053      * {@code CharSequence}. If the {@code char} value at

  1054      * {@code (index - 1)} in the {@code CharSequence} is in

  1055      * the low-surrogate range, {@code (index - 2)} is not

  1056      * negative, and the {@code char} value at {@code (index - 2)}

  1057      * in the {@code CharSequence} is in the

  1058      * high-surrogate range, then the supplementary code point

  1059      * corresponding to this surrogate pair is returned. Otherwise,

  1060      * the {@code char} value at {@code (index - 1)} is

  1061      * returned.

  1062      *

  1063      * @param seq the {@code CharSequence} instance

  1064      * @param index the index following the code point that should be returned

  1065      * @return the Unicode code point value before the given index.

  1066      * @exception NullPointerException if {@code seq} is null.

  1067      * @exception IndexOutOfBoundsException if the {@code index}

  1068      * argument is less than 1 or greater than {@link

  1069      * CharSequence#length() seq.length()}.

  1070      * @since  1.5

  1071      */

  1072     public static int codePointBefore(CharSequence seq, int index) {

  1073         char c2 = seq.charAt(--index);

  1074         if (isLowSurrogate(c2)) {

  1075             if (index > 0) {

  1076                 char c1 = seq.charAt(--index);

  1077                 if (isHighSurrogate(c1)) {

  1078                     return toCodePoint(c1, c2);

  1079                 }

  1080             }

  1081         }

  1082         return c2;

  1083     }

  1085     /**

  1086      * Returns the code point preceding the given index of the

  1087      * {@code char} array. If the {@code char} value at

  1088      * {@code (index - 1)} in the {@code char} array is in

  1089      * the low-surrogate range, {@code (index - 2)} is not

  1090      * negative, and the {@code char} value at {@code (index - 2)}

  1091      * in the {@code char} array is in the

  1092      * high-surrogate range, then the supplementary code point

  1093      * corresponding to this surrogate pair is returned. Otherwise,

  1094      * the {@code char} value at {@code (index - 1)} is

  1095      * returned.

  1096      *

  1097      * @param a the {@code char} array

  1098      * @param index the index following the code point that should be returned

  1099      * @return the Unicode code point value before the given index.

  1100      * @exception NullPointerException if {@code a} is null.

  1101      * @exception IndexOutOfBoundsException if the {@code index}

  1102      * argument is less than 1 or greater than the length of the

  1103      * {@code char} array

  1104      * @since  1.5

  1105      */

  1106     public static int codePointBefore(char[] a, int index) {

  1107         return codePointBeforeImpl(a, index, 0);

  1108     }

  1110     /**

  1111      * Returns the code point preceding the given index of the

  1112      * {@code char} array, where only array elements with

  1113      * {@code index} greater than or equal to {@code start}

  1114      * can be used. If the {@code char} value at {@code (index - 1)}

  1115      * in the {@code char} array is in the

  1116      * low-surrogate range, {@code (index - 2)} is not less than

  1117      * {@code start}, and the {@code char} value at

  1118      * {@code (index - 2)} in the {@code char} array is in

  1119      * the high-surrogate range, then the supplementary code point

  1120      * corresponding to this surrogate pair is returned. Otherwise,

  1121      * the {@code char} value at {@code (index - 1)} is

  1122      * returned.

  1123      *

  1124      * @param a the {@code char} array

  1125      * @param index the index following the code point that should be returned

  1126      * @param start the index of the first array element in the

  1127      * {@code char} array

  1128      * @return the Unicode code point value before the given index.

  1129      * @exception NullPointerException if {@code a} is null.

  1130      * @exception IndexOutOfBoundsException if the {@code index}

  1131      * argument is not greater than the {@code start} argument or

  1132      * is greater than the length of the {@code char} array, or

  1133      * if the {@code start} argument is negative or not less than

  1134      * the length of the {@code char} array.

  1135      * @since  1.5

  1136      */

  1137     public static int codePointBefore(char[] a, int index, int start) {

  1138         if (index <= start || start < 0 || start >= a.length) {

  1139             throw new IndexOutOfBoundsException();

  1140         }

  1141         return codePointBeforeImpl(a, index, start);

  1142     }

  1144     // throws ArrayIndexOutofBoundsException if index-1 out of bounds

  1145     static int codePointBeforeImpl(char[] a, int index, int start) {

  1146         char c2 = a[--index];

  1147         if (isLowSurrogate(c2)) {

  1148             if (index > start) {

  1149                 char c1 = a[--index];

  1150                 if (isHighSurrogate(c1)) {

  1151                     return toCodePoint(c1, c2);

  1152                 }

  1153             }

  1154         }

  1155         return c2;

  1156     }

  1158     /**

  1159      * Returns the leading surrogate (a

  1160      * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">

  1161      * high surrogate code unit</a>) of the

  1162      * <a href="http://www.unicode.org/glossary/#surrogate_pair">

  1163      * surrogate pair</a>

  1164      * representing the specified supplementary character (Unicode

  1165      * code point) in the UTF-16 encoding.  If the specified character

  1166      * is not a

  1167      * <a href="Character.html#supplementary">supplementary character</a>,

  1168      * an unspecified {@code char} is returned.

  1169      *

  1170      * <p>If

  1171      * {@link #isSupplementaryCodePoint isSupplementaryCodePoint(x)}

  1172      * is {@code true}, then

  1173      * {@link #isHighSurrogate isHighSurrogate}{@code (highSurrogate(x))} and

  1174      * {@link #toCodePoint toCodePoint}{@code (highSurrogate(x), }{@link #lowSurrogate lowSurrogate}{@code (x)) == x}

  1175      * are also always {@code true}.

  1176      *

  1177      * @param   codePoint a supplementary character (Unicode code point)

  1178      * @return  the leading surrogate code unit used to represent the

  1179      *          character in the UTF-16 encoding

  1180      * @since   1.7

  1181      */

  1182     public static char highSurrogate(int codePoint) {

  1183         return (char) ((codePoint >>> 10)

  1184             + (MIN_HIGH_SURROGATE - (MIN_SUPPLEMENTARY_CODE_POINT >>> 10)));

  1185     }

  1187     /**

  1188      * Returns the trailing surrogate (a

  1189      * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">

  1190      * low surrogate code unit</a>) of the

  1191      * <a href="http://www.unicode.org/glossary/#surrogate_pair">

  1192      * surrogate pair</a>

  1193      * representing the specified supplementary character (Unicode

  1194      * code point) in the UTF-16 encoding.  If the specified character

  1195      * is not a

  1196      * <a href="Character.html#supplementary">supplementary character</a>,

  1197      * an unspecified {@code char} is returned.

  1198      *

  1199      * <p>If

  1200      * {@link #isSupplementaryCodePoint isSupplementaryCodePoint(x)}

  1201      * is {@code true}, then

  1202      * {@link #isLowSurrogate isLowSurrogate}{@code (lowSurrogate(x))} and

  1203      * {@link #toCodePoint toCodePoint}{@code (}{@link #highSurrogate highSurrogate}{@code (x), lowSurrogate(x)) == x}

  1204      * are also always {@code true}.

  1205      *

  1206      * @param   codePoint a supplementary character (Unicode code point)

  1207      * @return  the trailing surrogate code unit used to represent the

  1208      *          character in the UTF-16 encoding

  1209      * @since   1.7

  1210      */

  1211     public static char lowSurrogate(int codePoint) {

  1212         return (char) ((codePoint & 0x3ff) + MIN_LOW_SURROGATE);

  1213     }

  1215     /**

  1216      * Converts the specified character (Unicode code point) to its

  1217      * UTF-16 representation. If the specified code point is a BMP

  1218      * (Basic Multilingual Plane or Plane 0) value, the same value is

  1219      * stored in {@code dst[dstIndex]}, and 1 is returned. If the

  1220      * specified code point is a supplementary character, its

  1221      * surrogate values are stored in {@code dst[dstIndex]}

  1222      * (high-surrogate) and {@code dst[dstIndex+1]}

  1223      * (low-surrogate), and 2 is returned.

  1224      *

  1225      * @param  codePoint the character (Unicode code point) to be converted.

  1226      * @param  dst an array of {@code char} in which the

  1227      * {@code codePoint}'s UTF-16 value is stored.

  1228      * @param dstIndex the start index into the {@code dst}

  1229      * array where the converted value is stored.

  1230      * @return 1 if the code point is a BMP code point, 2 if the

  1231      * code point is a supplementary code point.

  1232      * @exception IllegalArgumentException if the specified

  1233      * {@code codePoint} is not a valid Unicode code point.

  1234      * @exception NullPointerException if the specified {@code dst} is null.

  1235      * @exception IndexOutOfBoundsException if {@code dstIndex}

  1236      * is negative or not less than {@code dst.length}, or if

  1237      * {@code dst} at {@code dstIndex} doesn't have enough

  1238      * array element(s) to store the resulting {@code char}

  1239      * value(s). (If {@code dstIndex} is equal to

  1240      * {@code dst.length-1} and the specified

  1241      * {@code codePoint} is a supplementary character, the

  1242      * high-surrogate value is not stored in

  1243      * {@code dst[dstIndex]}.)

  1244      * @since  1.5

  1245      */

  1246     public static int toChars(int codePoint, char[] dst, int dstIndex) {

  1247         if (isBmpCodePoint(codePoint)) {

  1248             dst[dstIndex] = (char) codePoint;

  1249             return 1;

  1250         } else if (isValidCodePoint(codePoint)) {

  1251             toSurrogates(codePoint, dst, dstIndex);

  1252             return 2;

  1253         } else {

  1254             throw new IllegalArgumentException();

  1255         }

  1256     }

  1258     /**

  1259      * Converts the specified character (Unicode code point) to its

  1260      * UTF-16 representation stored in a {@code char} array. If

  1261      * the specified code point is a BMP (Basic Multilingual Plane or

  1262      * Plane 0) value, the resulting {@code char} array has

  1263      * the same value as {@code codePoint}. If the specified code

  1264      * point is a supplementary code point, the resulting

  1265      * {@code char} array has the corresponding surrogate pair.

  1266      *

  1267      * @param  codePoint a Unicode code point

  1268      * @return a {@code char} array having

  1269      *         {@code codePoint}'s UTF-16 representation.

  1270      * @exception IllegalArgumentException if the specified

  1271      * {@code codePoint} is not a valid Unicode code point.

  1272      * @since  1.5

  1273      */

  1274     public static char[] toChars(int codePoint) {

  1275         if (isBmpCodePoint(codePoint)) {

  1276             return new char[] { (char) codePoint };

  1277         } else if (isValidCodePoint(codePoint)) {

  1278             char[] result = new char[2];

  1279             toSurrogates(codePoint, result, 0);

  1280             return result;

  1281         } else {

  1282             throw new IllegalArgumentException();

  1283         }

  1284     }

  1286     static void toSurrogates(int codePoint, char[] dst, int index) {

  1287         // We write elements "backwards" to guarantee all-or-nothing

  1288         dst[index+1] = lowSurrogate(codePoint);

  1289         dst[index] = highSurrogate(codePoint);

  1290     }

  1292     /**

  1293      * Returns the number of Unicode code points in the text range of

  1294      * the specified char sequence. The text range begins at the

  1295      * specified {@code beginIndex} and extends to the

  1296      * {@code char} at index {@code endIndex - 1}. Thus the

  1297      * length (in {@code char}s) of the text range is

  1298      * {@code endIndex-beginIndex}. Unpaired surrogates within

  1299      * the text range count as one code point each.

  1300      *

  1301      * @param seq the char sequence

  1302      * @param beginIndex the index to the first {@code char} of

  1303      * the text range.

  1304      * @param endIndex the index after the last {@code char} of

  1305      * the text range.

  1306      * @return the number of Unicode code points in the specified text

  1307      * range

  1308      * @exception NullPointerException if {@code seq} is null.

  1309      * @exception IndexOutOfBoundsException if the

  1310      * {@code beginIndex} is negative, or {@code endIndex}

  1311      * is larger than the length of the given sequence, or

  1312      * {@code beginIndex} is larger than {@code endIndex}.

  1313      * @since  1.5

  1314      */

  1315     public static int codePointCount(CharSequence seq, int beginIndex, int endIndex) {

  1316         int length = seq.length();

  1317         if (beginIndex < 0 || endIndex > length || beginIndex > endIndex) {

  1318             throw new IndexOutOfBoundsException();

  1319         }

  1320         int n = endIndex - beginIndex;

  1321         for (int i = beginIndex; i < endIndex; ) {

  1322             if (isHighSurrogate(seq.charAt(i++)) && i < endIndex &&

  1323                 isLowSurrogate(seq.charAt(i))) {

  1324                 n--;

  1325                 i++;

  1326             }

  1327         }

  1328         return n;

  1329     }

  1331     /**

  1332      * Returns the number of Unicode code points in a subarray of the

  1333      * {@code char} array argument. The {@code offset}

  1334      * argument is the index of the first {@code char} of the

  1335      * subarray and the {@code count} argument specifies the

  1336      * length of the subarray in {@code char}s. Unpaired

  1337      * surrogates within the subarray count as one code point each.

  1338      *

  1339      * @param a the {@code char} array

  1340      * @param offset the index of the first {@code char} in the

  1341      * given {@code char} array

  1342      * @param count the length of the subarray in {@code char}s

  1343      * @return the number of Unicode code points in the specified subarray

  1344      * @exception NullPointerException if {@code a} is null.

  1345      * @exception IndexOutOfBoundsException if {@code offset} or

  1346      * {@code count} is negative, or if {@code offset +

  1347      * count} is larger than the length of the given array.

  1348      * @since  1.5

  1349      */

  1350     public static int codePointCount(char[] a, int offset, int count) {

  1351         if (count > a.length - offset || offset < 0 || count < 0) {

  1352             throw new IndexOutOfBoundsException();

  1353         }

  1354         return codePointCountImpl(a, offset, count);

  1355     }

  1357     static int codePointCountImpl(char[] a, int offset, int count) {

  1358         int endIndex = offset + count;

  1359         int n = count;

  1360         for (int i = offset; i < endIndex; ) {

  1361             if (isHighSurrogate(a[i++]) && i < endIndex &&

  1362                 isLowSurrogate(a[i])) {

  1363                 n--;

  1364                 i++;

  1365             }

  1366         }

  1367         return n;

  1368     }

  1370     /**

  1371      * Returns the index within the given char sequence that is offset

  1372      * from the given {@code index} by {@code codePointOffset}

  1373      * code points. Unpaired surrogates within the text range given by

  1374      * {@code index} and {@code codePointOffset} count as

  1375      * one code point each.

  1376      *

  1377      * @param seq the char sequence

  1378      * @param index the index to be offset

  1379      * @param codePointOffset the offset in code points

  1380      * @return the index within the char sequence

  1381      * @exception NullPointerException if {@code seq} is null.

  1382      * @exception IndexOutOfBoundsException if {@code index}

  1383      *   is negative or larger then the length of the char sequence,

  1384      *   or if {@code codePointOffset} is positive and the

  1385      *   subsequence starting with {@code index} has fewer than

  1386      *   {@code codePointOffset} code points, or if

  1387      *   {@code codePointOffset} is negative and the subsequence

  1388      *   before {@code index} has fewer than the absolute value

  1389      *   of {@code codePointOffset} code points.

  1390      * @since 1.5

  1391      */

  1392     public static int offsetByCodePoints(CharSequence seq, int index,

  1393                                          int codePointOffset) {

  1394         int length = seq.length();

  1395         if (index < 0 || index > length) {

  1396             throw new IndexOutOfBoundsException();

  1397         }

  1399         int x = index;

  1400         if (codePointOffset >= 0) {

  1401             int i;

  1402             for (i = 0; x < length && i < codePointOffset; i++) {

  1403                 if (isHighSurrogate(seq.charAt(x++)) && x < length &&

  1404                     isLowSurrogate(seq.charAt(x))) {

  1405                     x++;

  1406                 }

  1407             }

  1408             if (i < codePointOffset) {

  1409                 throw new IndexOutOfBoundsException();

  1410             }

  1411         } else {

  1412             int i;

  1413             for (i = codePointOffset; x > 0 && i < 0; i++) {

  1414                 if (isLowSurrogate(seq.charAt(--x)) && x > 0 &&

  1415                     isHighSurrogate(seq.charAt(x-1))) {

  1416                     x--;

  1417                 }

  1418             }

  1419             if (i < 0) {

  1420                 throw new IndexOutOfBoundsException();

  1421             }

  1422         }

  1423         return x;

  1424     }

  1426     /**

  1427      * Returns the index within the given {@code char} subarray

  1428      * that is offset from the given {@code index} by

  1429      * {@code codePointOffset} code points. The

  1430      * {@code start} and {@code count} arguments specify a

  1431      * subarray of the {@code char} array. Unpaired surrogates

  1432      * within the text range given by {@code index} and

  1433      * {@code codePointOffset} count as one code point each.

  1434      *

  1435      * @param a the {@code char} array

  1436      * @param start the index of the first {@code char} of the

  1437      * subarray

  1438      * @param count the length of the subarray in {@code char}s

  1439      * @param index the index to be offset

  1440      * @param codePointOffset the offset in code points

  1441      * @return the index within the subarray

  1442      * @exception NullPointerException if {@code a} is null.

  1443      * @exception IndexOutOfBoundsException

  1444      *   if {@code start} or {@code count} is negative,

  1445      *   or if {@code start + count} is larger than the length of

  1446      *   the given array,

  1447      *   or if {@code index} is less than {@code start} or

  1448      *   larger then {@code start + count},

  1449      *   or if {@code codePointOffset} is positive and the text range

  1450      *   starting with {@code index} and ending with {@code start + count - 1}

  1451      *   has fewer than {@code codePointOffset} code

  1452      *   points,

  1453      *   or if {@code codePointOffset} is negative and the text range

  1454      *   starting with {@code start} and ending with {@code index - 1}

  1455      *   has fewer than the absolute value of

  1456      *   {@code codePointOffset} code points.

  1457      * @since 1.5

  1458      */

  1459     public static int offsetByCodePoints(char[] a, int start, int count,

  1460                                          int index, int codePointOffset) {

  1461         if (count > a.length-start || start < 0 || count < 0

  1462             || index < start || index > start+count) {

  1463             throw new IndexOutOfBoundsException();

  1464         }

  1465         return offsetByCodePointsImpl(a, start, count, index, codePointOffset);

  1466     }

  1468     static int offsetByCodePointsImpl(char[]a, int start, int count,

  1469                                       int index, int codePointOffset) {

  1470         int x = index;

  1471         if (codePointOffset >= 0) {

  1472             int limit = start + count;

  1473             int i;

  1474             for (i = 0; x < limit && i < codePointOffset; i++) {

  1475                 if (isHighSurrogate(a[x++]) && x < limit &&

  1476                     isLowSurrogate(a[x])) {

  1477                     x++;

  1478                 }

  1479             }

  1480             if (i < codePointOffset) {

  1481                 throw new IndexOutOfBoundsException();

  1482             }

  1483         } else {

  1484             int i;

  1485             for (i = codePointOffset; x > start && i < 0; i++) {

  1486                 if (isLowSurrogate(a[--x]) && x > start &&

  1487                     isHighSurrogate(a[x-1])) {

  1488                     x--;

  1489                 }

  1490             }

  1491             if (i < 0) {

  1492                 throw new IndexOutOfBoundsException();

  1493             }

  1494         }

  1495         return x;

  1496     }

  1498     /**

  1499      * Determines if the specified character is a lowercase character.

  1500      * <p>

  1501      * A character is lowercase if its general category type, provided

  1502      * by {@code Character.getType(ch)}, is

  1503      * {@code LOWERCASE_LETTER}, or it has contributory property

  1504      * Other_Lowercase as defined by the Unicode Standard.

  1505      * <p>

  1506      * The following are examples of lowercase characters:

  1507      * <p><blockquote><pre>

  1508      * a b c d e f g h i j k l m n o p q r s t u v w x y z

  1509      * '&#92;u00DF' '&#92;u00E0' '&#92;u00E1' '&#92;u00E2' '&#92;u00E3' '&#92;u00E4' '&#92;u00E5' '&#92;u00E6'

  1510      * '&#92;u00E7' '&#92;u00E8' '&#92;u00E9' '&#92;u00EA' '&#92;u00EB' '&#92;u00EC' '&#92;u00ED' '&#92;u00EE'

  1511      * '&#92;u00EF' '&#92;u00F0' '&#92;u00F1' '&#92;u00F2' '&#92;u00F3' '&#92;u00F4' '&#92;u00F5' '&#92;u00F6'

  1512      * '&#92;u00F8' '&#92;u00F9' '&#92;u00FA' '&#92;u00FB' '&#92;u00FC' '&#92;u00FD' '&#92;u00FE' '&#92;u00FF'

  1513      * </pre></blockquote>

  1514      * <p> Many other Unicode characters are lowercase too.

  1515      *

  1516      * <p><b>Note:</b> This method cannot handle <a

  1517      * href="#supplementary"> supplementary characters</a>. To support

  1518      * all Unicode characters, including supplementary characters, use

  1519      * the {@link #isLowerCase(int)} method.

  1520      *

  1521      * @param   ch   the character to be tested.

  1522      * @return  {@code true} if the character is lowercase;

  1523      *          {@code false} otherwise.

  1524      * @see     Character#isLowerCase(char)

  1525      * @see     Character#isTitleCase(char)

  1526      * @see     Character#toLowerCase(char)

  1527      * @see     Character#getType(char)

  1528      */

  1529     public static boolean isLowerCase(char ch) {

  1530         return ch == toLowerCase(ch);

  1531     }

  1533     /**

  1534      * Determines if the specified character is an uppercase character.

  1535      * <p>

  1536      * A character is uppercase if its general category type, provided by

  1537      * {@code Character.getType(ch)}, is {@code UPPERCASE_LETTER}.

  1538      * or it has contributory property Other_Uppercase as defined by the Unicode Standard.

  1539      * <p>

  1540      * The following are examples of uppercase characters:

  1541      * <p><blockquote><pre>

  1542      * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

  1543      * '&#92;u00C0' '&#92;u00C1' '&#92;u00C2' '&#92;u00C3' '&#92;u00C4' '&#92;u00C5' '&#92;u00C6' '&#92;u00C7'

  1544      * '&#92;u00C8' '&#92;u00C9' '&#92;u00CA' '&#92;u00CB' '&#92;u00CC' '&#92;u00CD' '&#92;u00CE' '&#92;u00CF'

  1545      * '&#92;u00D0' '&#92;u00D1' '&#92;u00D2' '&#92;u00D3' '&#92;u00D4' '&#92;u00D5' '&#92;u00D6' '&#92;u00D8'

  1546      * '&#92;u00D9' '&#92;u00DA' '&#92;u00DB' '&#92;u00DC' '&#92;u00DD' '&#92;u00DE'

  1547      * </pre></blockquote>

  1548      * <p> Many other Unicode characters are uppercase too.<p>

  1549      *

  1550      * <p><b>Note:</b> This method cannot handle <a

  1551      * href="#supplementary"> supplementary characters</a>. To support

  1552      * all Unicode characters, including supplementary characters, use

  1553      * the {@link #isUpperCase(int)} method.

  1554      *

  1555      * @param   ch   the character to be tested.

  1556      * @return  {@code true} if the character is uppercase;

  1557      *          {@code false} otherwise.

  1558      * @see     Character#isLowerCase(char)

  1559      * @see     Character#isTitleCase(char)

  1560      * @see     Character#toUpperCase(char)

  1561      * @see     Character#getType(char)

  1562      * @since   1.0

  1563      */

  1564     public static boolean isUpperCase(char ch) {

  1565         return ch == toUpperCase(ch);

  1566     }

  1568     /**

  1569      * Determines if the specified character is a titlecase character.

  1570      * <p>

  1571      * A character is a titlecase character if its general

  1572      * category type, provided by {@code Character.getType(ch)},

  1573      * is {@code TITLECASE_LETTER}.

  1574      * <p>

  1575      * Some characters look like pairs of Latin letters. For example, there

  1576      * is an uppercase letter that looks like "LJ" and has a corresponding

  1577      * lowercase letter that looks like "lj". A third form, which looks like "Lj",

  1578      * is the appropriate form to use when rendering a word in lowercase

  1579      * with initial capitals, as for a book title.

  1580      * <p>

  1581      * These are some of the Unicode characters for which this method returns

  1582      * {@code true}:

  1583      * <ul>

  1584      * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON}

  1585      * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J}

  1586      * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J}

  1587      * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z}

  1588      * </ul>

  1589      * <p> Many other Unicode characters are titlecase too.<p>

  1590      *

  1591      * <p><b>Note:</b> This method cannot handle <a

  1592      * href="#supplementary"> supplementary characters</a>. To support

  1593      * all Unicode characters, including supplementary characters, use

  1594      * the {@link #isTitleCase(int)} method.

  1595      *

  1596      * @param   ch   the character to be tested.

  1597      * @return  {@code true} if the character is titlecase;

  1598      *          {@code false} otherwise.

  1599      * @see     Character#isLowerCase(char)

  1600      * @see     Character#isUpperCase(char)

  1601      * @see     Character#toTitleCase(char)

  1602      * @see     Character#getType(char)

  1603      * @since   1.0.2

  1604      */

  1605     public static boolean isTitleCase(char ch) {

  1606         return isTitleCase((int)ch);

  1607     }

  1609     /**

  1610      * Determines if the specified character (Unicode code point) is a titlecase character.

  1611      * <p>

  1612      * A character is a titlecase character if its general

  1613      * category type, provided by {@link Character#getType(int) getType(codePoint)},

  1614      * is {@code TITLECASE_LETTER}.

  1615      * <p>

  1616      * Some characters look like pairs of Latin letters. For example, there

  1617      * is an uppercase letter that looks like "LJ" and has a corresponding

  1618      * lowercase letter that looks like "lj". A third form, which looks like "Lj",

  1619      * is the appropriate form to use when rendering a word in lowercase

  1620      * with initial capitals, as for a book title.

  1621      * <p>

  1622      * These are some of the Unicode characters for which this method returns

  1623      * {@code true}:

  1624      * <ul>

  1625      * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON}

  1626      * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J}

  1627      * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J}

  1628      * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z}

  1629      * </ul>

  1630      * <p> Many other Unicode characters are titlecase too.<p>

  1631      *

  1632      * @param   codePoint the character (Unicode code point) to be tested.

  1633      * @return  {@code true} if the character is titlecase;

  1634      *          {@code false} otherwise.

  1635      * @see     Character#isLowerCase(int)

  1636      * @see     Character#isUpperCase(int)

  1637      * @see     Character#toTitleCase(int)

  1638      * @see     Character#getType(int)

  1639      * @since   1.5

  1640      */

  1641     public static boolean isTitleCase(int codePoint) {

  1642         return getType(codePoint) == Character.TITLECASE_LETTER;

  1643     }

  1645     /**

  1646      * Determines if the specified character is a digit.

  1647      * <p>

  1648      * A character is a digit if its general category type, provided

  1649      * by {@code Character.getType(ch)}, is

  1650      * {@code DECIMAL_DIGIT_NUMBER}.

  1651      * <p>

  1652      * Some Unicode character ranges that contain digits:

  1653      * <ul>

  1654      * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'},

  1655      *     ISO-LATIN-1 digits ({@code '0'} through {@code '9'})

  1656      * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'},

  1657      *     Arabic-Indic digits

  1658      * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'},

  1659      *     Extended Arabic-Indic digits

  1660      * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'},

  1661      *     Devanagari digits

  1662      * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'},

  1663      *     Fullwidth digits

  1664      * </ul>

  1665      *

  1666      * Many other character ranges contain digits as well.

  1667      *

  1668      * <p><b>Note:</b> This method cannot handle <a

  1669      * href="#supplementary"> supplementary characters</a>. To support

  1670      * all Unicode characters, including supplementary characters, use

  1671      * the {@link #isDigit(int)} method.

  1672      *

  1673      * @param   ch   the character to be tested.

  1674      * @return  {@code true} if the character is a digit;

  1675      *          {@code false} otherwise.

  1676      * @see     Character#digit(char, int)

  1677      * @see     Character#forDigit(int, int)

  1678      * @see     Character#getType(char)

  1679      */

  1680     public static boolean isDigit(char ch) {

  1681         return String.valueOf(ch).matches("\\d");

  1682     }

  1684     /**

  1685      * Determines if the specified character (Unicode code point) is a digit.

  1686      * <p>

  1687      * A character is a digit if its general category type, provided

  1688      * by {@link Character#getType(int) getType(codePoint)}, is

  1689      * {@code DECIMAL_DIGIT_NUMBER}.

  1690      * <p>

  1691      * Some Unicode character ranges that contain digits:

  1692      * <ul>

  1693      * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'},

  1694      *     ISO-LATIN-1 digits ({@code '0'} through {@code '9'})

  1695      * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'},

  1696      *     Arabic-Indic digits

  1697      * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'},

  1698      *     Extended Arabic-Indic digits

  1699      * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'},

  1700      *     Devanagari digits

  1701      * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'},

  1702      *     Fullwidth digits

  1703      * </ul>

  1704      *

  1705      * Many other character ranges contain digits as well.

  1706      *

  1707      * @param   codePoint the character (Unicode code point) to be tested.

  1708      * @return  {@code true} if the character is a digit;

  1709      *          {@code false} otherwise.

  1710      * @see     Character#forDigit(int, int)

  1711      * @see     Character#getType(int)

  1712      * @since   1.5

  1713      */

  1714     public static boolean isDigit(int codePoint) {

  1715         return fromCodeChars(codePoint).matches("\\d");

  1716     }

  1718     @JavaScriptBody(args = "c", body = "return String.fromCharCode(c);")

  1719     private native static String fromCodeChars(int codePoint);

  1721     /**

  1722      * Determines if a character is defined in Unicode.

  1723      * <p>

  1724      * A character is defined if at least one of the following is true:

  1725      * <ul>

  1726      * <li>It has an entry in the UnicodeData file.

  1727      * <li>It has a value in a range defined by the UnicodeData file.

  1728      * </ul>

  1729      *

  1730      * <p><b>Note:</b> This method cannot handle <a

  1731      * href="#supplementary"> supplementary characters</a>. To support

  1732      * all Unicode characters, including supplementary characters, use

  1733      * the {@link #isDefined(int)} method.

  1734      *

  1735      * @param   ch   the character to be tested

  1736      * @return  {@code true} if the character has a defined meaning

  1737      *          in Unicode; {@code false} otherwise.

  1738      * @see     Character#isDigit(char)

  1739      * @see     Character#isLetter(char)

  1740      * @see     Character#isLetterOrDigit(char)

  1741      * @see     Character#isLowerCase(char)

  1742      * @see     Character#isTitleCase(char)

  1743      * @see     Character#isUpperCase(char)

  1744      * @since   1.0.2

  1745      */

  1746     public static boolean isDefined(char ch) {

  1747         return isDefined((int)ch);

  1748     }

  1750     /**

  1751      * Determines if a character (Unicode code point) is defined in Unicode.

  1752      * <p>

  1753      * A character is defined if at least one of the following is true:

  1754      * <ul>

  1755      * <li>It has an entry in the UnicodeData file.

  1756      * <li>It has a value in a range defined by the UnicodeData file.

  1757      * </ul>

  1758      *

  1759      * @param   codePoint the character (Unicode code point) to be tested.

  1760      * @return  {@code true} if the character has a defined meaning

  1761      *          in Unicode; {@code false} otherwise.

  1762      * @see     Character#isDigit(int)

  1763      * @see     Character#isLetter(int)

  1764      * @see     Character#isLetterOrDigit(int)

  1765      * @see     Character#isLowerCase(int)

  1766      * @see     Character#isTitleCase(int)

  1767      * @see     Character#isUpperCase(int)

  1768      * @since   1.5

  1769      */

  1770     public static boolean isDefined(int codePoint) {

  1771         return getType(codePoint) != Character.UNASSIGNED;

  1772     }

  1774     /**

  1775      * Determines if the specified character is a letter.

  1776      * <p>

  1777      * A character is considered to be a letter if its general

  1778      * category type, provided by {@code Character.getType(ch)},

  1779      * is any of the following:

  1780      * <ul>

  1781      * <li> {@code UPPERCASE_LETTER}

  1782      * <li> {@code LOWERCASE_LETTER}

  1783      * <li> {@code TITLECASE_LETTER}

  1784      * <li> {@code MODIFIER_LETTER}

  1785      * <li> {@code OTHER_LETTER}

  1786      * </ul>

  1787      *

  1788      * Not all letters have case. Many characters are

  1789      * letters but are neither uppercase nor lowercase nor titlecase.

  1790      *

  1791      * <p><b>Note:</b> This method cannot handle <a

  1792      * href="#supplementary"> supplementary characters</a>. To support

  1793      * all Unicode characters, including supplementary characters, use

  1794      * the {@link #isLetter(int)} method.

  1795      *

  1796      * @param   ch   the character to be tested.

  1797      * @return  {@code true} if the character is a letter;

  1798      *          {@code false} otherwise.

  1799      * @see     Character#isDigit(char)

  1800      * @see     Character#isJavaIdentifierStart(char)

  1801      * @see     Character#isJavaLetter(char)

  1802      * @see     Character#isJavaLetterOrDigit(char)

  1803      * @see     Character#isLetterOrDigit(char)

  1804      * @see     Character#isLowerCase(char)

  1805      * @see     Character#isTitleCase(char)

  1806      * @see     Character#isUnicodeIdentifierStart(char)

  1807      * @see     Character#isUpperCase(char)

  1808      */

  1809     public static boolean isLetter(char ch) {

  1810         return String.valueOf(ch).matches("\\w") && !isDigit(ch);

  1811     }

  1813     /**

  1814      * Determines if the specified character (Unicode code point) is a letter.

  1815      * <p>

  1816      * A character is considered to be a letter if its general

  1817      * category type, provided by {@link Character#getType(int) getType(codePoint)},

  1818      * is any of the following:

  1819      * <ul>

  1820      * <li> {@code UPPERCASE_LETTER}

  1821      * <li> {@code LOWERCASE_LETTER}

  1822      * <li> {@code TITLECASE_LETTER}

  1823      * <li> {@code MODIFIER_LETTER}

  1824      * <li> {@code OTHER_LETTER}

  1825      * </ul>

  1826      *

  1827      * Not all letters have case. Many characters are

  1828      * letters but are neither uppercase nor lowercase nor titlecase.

  1829      *

  1830      * @param   codePoint the character (Unicode code point) to be tested.

  1831      * @return  {@code true} if the character is a letter;

  1832      *          {@code false} otherwise.

  1833      * @see     Character#isDigit(int)

  1834      * @see     Character#isJavaIdentifierStart(int)

  1835      * @see     Character#isLetterOrDigit(int)

  1836      * @see     Character#isLowerCase(int)

  1837      * @see     Character#isTitleCase(int)

  1838      * @see     Character#isUnicodeIdentifierStart(int)

  1839      * @see     Character#isUpperCase(int)

  1840      * @since   1.5

  1841      */

  1842     public static boolean isLetter(int codePoint) {

  1843         return fromCodeChars(codePoint).matches("\\w") && !isDigit(codePoint);

  1844     }

  1846     /**

  1847      * Determines if the specified character is a letter or digit.

  1848      * <p>

  1849      * A character is considered to be a letter or digit if either

  1850      * {@code Character.isLetter(char ch)} or

  1851      * {@code Character.isDigit(char ch)} returns

  1852      * {@code true} for the character.

  1853      *

  1854      * <p><b>Note:</b> This method cannot handle <a

  1855      * href="#supplementary"> supplementary characters</a>. To support

  1856      * all Unicode characters, including supplementary characters, use

  1857      * the {@link #isLetterOrDigit(int)} method.

  1858      *

  1859      * @param   ch   the character to be tested.

  1860      * @return  {@code true} if the character is a letter or digit;

  1861      *          {@code false} otherwise.

  1862      * @see     Character#isDigit(char)

  1863      * @see     Character#isJavaIdentifierPart(char)

  1864      * @see     Character#isJavaLetter(char)

  1865      * @see     Character#isJavaLetterOrDigit(char)

  1866      * @see     Character#isLetter(char)

  1867      * @see     Character#isUnicodeIdentifierPart(char)

  1868      * @since   1.0.2

  1869      */

  1870     public static boolean isLetterOrDigit(char ch) {

  1871         return String.valueOf(ch).matches("\\w");

  1872     }

  1874     /**

  1875      * Determines if the specified character (Unicode code point) is a letter or digit.

  1876      * <p>

  1877      * A character is considered to be a letter or digit if either

  1878      * {@link #isLetter(int) isLetter(codePoint)} or

  1879      * {@link #isDigit(int) isDigit(codePoint)} returns

  1880      * {@code true} for the character.

  1881      *

  1882      * @param   codePoint the character (Unicode code point) to be tested.

  1883      * @return  {@code true} if the character is a letter or digit;

  1884      *          {@code false} otherwise.

  1885      * @see     Character#isDigit(int)

  1886      * @see     Character#isJavaIdentifierPart(int)

  1887      * @see     Character#isLetter(int)

  1888      * @see     Character#isUnicodeIdentifierPart(int)

  1889      * @since   1.5

  1890      */

  1891     public static boolean isLetterOrDigit(int codePoint) {

  1892         return fromCodeChars(codePoint).matches("\\w");

  1893     }

  1895     static int getType(int x) {

  1896         throw new UnsupportedOperationException();

  1897     }

  1899     /**

  1900      * Determines if the specified character is

  1901      * permissible as the first character in a Java identifier.

  1902      * <p>

  1903      * A character may start a Java identifier if and only if

  1904      * one of the following conditions is true:

  1905      * <ul>

  1906      * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}

  1907      * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}

  1908      * <li> {@code ch} is a currency symbol (such as {@code '$'})

  1909      * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).

  1910      * </ul>

  1911      *

  1912      * <p><b>Note:</b> This method cannot handle <a

  1913      * href="#supplementary"> supplementary characters</a>. To support

  1914      * all Unicode characters, including supplementary characters, use

  1915      * the {@link #isJavaIdentifierStart(int)} method.

  1916      *

  1917      * @param   ch the character to be tested.

  1918      * @return  {@code true} if the character may start a Java identifier;

  1919      *          {@code false} otherwise.

  1920      * @see     Character#isJavaIdentifierPart(char)

  1921      * @see     Character#isLetter(char)

  1922      * @see     Character#isUnicodeIdentifierStart(char)

  1923      * @see     javax.lang.model.SourceVersion#isIdentifier(CharSequence)

  1924      * @since   1.1

  1925      */

  1926     public static boolean isJavaIdentifierStart(char ch) {

  1927         return isJavaIdentifierStart((int)ch);

  1928     }

  1930     /**

  1931      * Determines if the character (Unicode code point) is

  1932      * permissible as the first character in a Java identifier.

  1933      * <p>

  1934      * A character may start a Java identifier if and only if

  1935      * one of the following conditions is true:

  1936      * <ul>

  1937      * <li> {@link #isLetter(int) isLetter(codePoint)}

  1938      *      returns {@code true}

  1939      * <li> {@link #getType(int) getType(codePoint)}

  1940      *      returns {@code LETTER_NUMBER}

  1941      * <li> the referenced character is a currency symbol (such as {@code '$'})

  1942      * <li> the referenced character is a connecting punctuation character

  1943      *      (such as {@code '_'}).

  1944      * </ul>

  1945      *

  1946      * @param   codePoint the character (Unicode code point) to be tested.

  1947      * @return  {@code true} if the character may start a Java identifier;

  1948      *          {@code false} otherwise.

  1949      * @see     Character#isJavaIdentifierPart(int)

  1950      * @see     Character#isLetter(int)

  1951      * @see     Character#isUnicodeIdentifierStart(int)

  1952      * @see     javax.lang.model.SourceVersion#isIdentifier(CharSequence)

  1953      * @since   1.5

  1954      */

  1955     public static boolean isJavaIdentifierStart(int codePoint) {

  1956         return

  1957             ('A' <= codePoint && codePoint <= 'Z') ||

  1958             ('a' <= codePoint && codePoint <= 'z');

  1959     }

  1961     /**

  1962      * Determines if the specified character may be part of a Java

  1963      * identifier as other than the first character.

  1964      * <p>

  1965      * A character may be part of a Java identifier if any of the following

  1966      * are true:

  1967      * <ul>

  1968      * <li>  it is a letter

  1969      * <li>  it is a currency symbol (such as {@code '$'})

  1970      * <li>  it is a connecting punctuation character (such as {@code '_'})

  1971      * <li>  it is a digit

  1972      * <li>  it is a numeric letter (such as a Roman numeral character)

  1973      * <li>  it is a combining mark

  1974      * <li>  it is a non-spacing mark

  1975      * <li> {@code isIdentifierIgnorable} returns

  1976      * {@code true} for the character

  1977      * </ul>

  1978      *

  1979      * <p><b>Note:</b> This method cannot handle <a

  1980      * href="#supplementary"> supplementary characters</a>. To support

  1981      * all Unicode characters, including supplementary characters, use

  1982      * the {@link #isJavaIdentifierPart(int)} method.

  1983      *

  1984      * @param   ch      the character to be tested.

  1985      * @return {@code true} if the character may be part of a

  1986      *          Java identifier; {@code false} otherwise.

  1987      * @see     Character#isIdentifierIgnorable(char)

  1988      * @see     Character#isJavaIdentifierStart(char)

  1989      * @see     Character#isLetterOrDigit(char)

  1990      * @see     Character#isUnicodeIdentifierPart(char)

  1991      * @see     javax.lang.model.SourceVersion#isIdentifier(CharSequence)

  1992      * @since   1.1

  1993      */

  1994     public static boolean isJavaIdentifierPart(char ch) {

  1995         return isJavaIdentifierPart((int)ch);

  1996     }

  1998     /**

  1999      * Determines if the character (Unicode code point) may be part of a Java

  2000      * identifier as other than the first character.

  2001      * <p>

  2002      * A character may be part of a Java identifier if any of the following

  2003      * are true:

  2004      * <ul>

  2005      * <li>  it is a letter

  2006      * <li>  it is a currency symbol (such as {@code '$'})

  2007      * <li>  it is a connecting punctuation character (such as {@code '_'})

  2008      * <li>  it is a digit

  2009      * <li>  it is a numeric letter (such as a Roman numeral character)

  2010      * <li>  it is a combining mark

  2011      * <li>  it is a non-spacing mark

  2012      * <li> {@link #isIdentifierIgnorable(int)

  2013      * isIdentifierIgnorable(codePoint)} returns {@code true} for

  2014      * the character

  2015      * </ul>

  2016      *

  2017      * @param   codePoint the character (Unicode code point) to be tested.

  2018      * @return {@code true} if the character may be part of a

  2019      *          Java identifier; {@code false} otherwise.

  2020      * @see     Character#isIdentifierIgnorable(int)

  2021      * @see     Character#isJavaIdentifierStart(int)

  2022      * @see     Character#isLetterOrDigit(int)

  2023      * @see     Character#isUnicodeIdentifierPart(int)

  2024      * @see     javax.lang.model.SourceVersion#isIdentifier(CharSequence)

  2025      * @since   1.5

  2026      */

  2027     public static boolean isJavaIdentifierPart(int codePoint) {

  2028         return isJavaIdentifierStart(codePoint) ||

  2029             ('0' <= codePoint && codePoint <= '9') || codePoint == '$';

  2030     }

  2032     /**

  2033      * Converts the character argument to lowercase using case

  2034      * mapping information from the UnicodeData file.

  2035      * <p>

  2036      * Note that

  2037      * {@code Character.isLowerCase(Character.toLowerCase(ch))}

  2038      * does not always return {@code true} for some ranges of

  2039      * characters, particularly those that are symbols or ideographs.

  2040      *

  2041      * <p>In general, {@link String#toLowerCase()} should be used to map

  2042      * characters to lowercase. {@code String} case mapping methods

  2043      * have several benefits over {@code Character} case mapping methods.

  2044      * {@code String} case mapping methods can perform locale-sensitive

  2045      * mappings, context-sensitive mappings, and 1:M character mappings, whereas

  2046      * the {@code Character} case mapping methods cannot.

  2047      *

  2048      * <p><b>Note:</b> This method cannot handle <a

  2049      * href="#supplementary"> supplementary characters</a>. To support

  2050      * all Unicode characters, including supplementary characters, use

  2051      * the {@link #toLowerCase(int)} method.

  2052      *

  2053      * @param   ch   the character to be converted.

  2054      * @return  the lowercase equivalent of the character, if any;

  2055      *          otherwise, the character itself.

  2056      * @see     Character#isLowerCase(char)

  2057      * @see     String#toLowerCase()

  2058      */

  2059     public static char toLowerCase(char ch) {

  2060         return String.valueOf(ch).toLowerCase().charAt(0);

  2061     }

  2063     /**

  2064      * Converts the character argument to uppercase using case mapping

  2065      * information from the UnicodeData file.

  2066      * <p>

  2067      * Note that

  2068      * {@code Character.isUpperCase(Character.toUpperCase(ch))}

  2069      * does not always return {@code true} for some ranges of

  2070      * characters, particularly those that are symbols or ideographs.

  2071      *

  2072      * <p>In general, {@link String#toUpperCase()} should be used to map

  2073      * characters to uppercase. {@code String} case mapping methods

  2074      * have several benefits over {@code Character} case mapping methods.

  2075      * {@code String} case mapping methods can perform locale-sensitive

  2076      * mappings, context-sensitive mappings, and 1:M character mappings, whereas

  2077      * the {@code Character} case mapping methods cannot.

  2078      *

  2079      * <p><b>Note:</b> This method cannot handle <a

  2080      * href="#supplementary"> supplementary characters</a>. To support

  2081      * all Unicode characters, including supplementary characters, use

  2082      * the {@link #toUpperCase(int)} method.

  2083      *

  2084      * @param   ch   the character to be converted.

  2085      * @return  the uppercase equivalent of the character, if any;

  2086      *          otherwise, the character itself.

  2087      * @see     Character#isUpperCase(char)

  2088      * @see     String#toUpperCase()

  2089      */

  2090     public static char toUpperCase(char ch) {

  2091         return String.valueOf(ch).toUpperCase().charAt(0);

  2092     }

  2094     /**

  2095      * Returns the numeric value of the character {@code ch} in the

  2096      * specified radix.

  2097      * <p>

  2098      * If the radix is not in the range {@code MIN_RADIX} &le;

  2099      * {@code radix} &le; {@code MAX_RADIX} or if the

  2100      * value of {@code ch} is not a valid digit in the specified

  2101      * radix, {@code -1} is returned. A character is a valid digit

  2102      * if at least one of the following is true:

  2103      * <ul>

  2104      * <li>The method {@code isDigit} is {@code true} of the character

  2105      *     and the Unicode decimal digit value of the character (or its

  2106      *     single-character decomposition) is less than the specified radix.

  2107      *     In this case the decimal digit value is returned.

  2108      * <li>The character is one of the uppercase Latin letters

  2109      *     {@code 'A'} through {@code 'Z'} and its code is less than

  2110      *     {@code radix + 'A' - 10}.

  2111      *     In this case, {@code ch - 'A' + 10}

  2112      *     is returned.

  2113      * <li>The character is one of the lowercase Latin letters

  2114      *     {@code 'a'} through {@code 'z'} and its code is less than

  2115      *     {@code radix + 'a' - 10}.

  2116      *     In this case, {@code ch - 'a' + 10}

  2117      *     is returned.

  2118      * <li>The character is one of the fullwidth uppercase Latin letters A

  2119      *     ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'})

  2120      *     and its code is less than

  2121      *     {@code radix + '\u005CuFF21' - 10}.

  2122      *     In this case, {@code ch - '\u005CuFF21' + 10}

  2123      *     is returned.

  2124      * <li>The character is one of the fullwidth lowercase Latin letters a

  2125      *     ({@code '\u005CuFF41'}) through z ({@code '\u005CuFF5A'})

  2126      *     and its code is less than

  2127      *     {@code radix + '\u005CuFF41' - 10}.

  2128      *     In this case, {@code ch - '\u005CuFF41' + 10}

  2129      *     is returned.

  2130      * </ul>

  2131      *

  2132      * <p><b>Note:</b> This method cannot handle <a

  2133      * href="#supplementary"> supplementary characters</a>. To support

  2134      * all Unicode characters, including supplementary characters, use

  2135      * the {@link #digit(int, int)} method.

  2136      *

  2137      * @param   ch      the character to be converted.

  2138      * @param   radix   the radix.

  2139      * @return  the numeric value represented by the character in the

  2140      *          specified radix.

  2141      * @see     Character#forDigit(int, int)

  2142      * @see     Character#isDigit(char)

  2143      */

  2144     public static int digit(char ch, int radix) {

  2145         return digit((int)ch, radix);

  2146     }

  2148     /**

  2149      * Returns the numeric value of the specified character (Unicode

  2150      * code point) in the specified radix.

  2151      *

  2152      * <p>If the radix is not in the range {@code MIN_RADIX} &le;

  2153      * {@code radix} &le; {@code MAX_RADIX} or if the

  2154      * character is not a valid digit in the specified

  2155      * radix, {@code -1} is returned. A character is a valid digit

  2156      * if at least one of the following is true:

  2157      * <ul>

  2158      * <li>The method {@link #isDigit(int) isDigit(codePoint)} is {@code true} of the character

  2159      *     and the Unicode decimal digit value of the character (or its

  2160      *     single-character decomposition) is less than the specified radix.

  2161      *     In this case the decimal digit value is returned.

  2162      * <li>The character is one of the uppercase Latin letters

  2163      *     {@code 'A'} through {@code 'Z'} and its code is less than

  2164      *     {@code radix + 'A' - 10}.

  2165      *     In this case, {@code codePoint - 'A' + 10}

  2166      *     is returned.

  2167      * <li>The character is one of the lowercase Latin letters

  2168      *     {@code 'a'} through {@code 'z'} and its code is less than

  2169      *     {@code radix + 'a' - 10}.

  2170      *     In this case, {@code codePoint - 'a' + 10}

  2171      *     is returned.

  2172      * <li>The character is one of the fullwidth uppercase Latin letters A

  2173      *     ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'})

  2174      *     and its code is less than

  2175      *     {@code radix + '\u005CuFF21' - 10}.

  2176      *     In this case,

  2177      *     {@code codePoint - '\u005CuFF21' + 10}

  2178      *     is returned.

  2179      * <li>The character is one of the fullwidth lowercase Latin letters a

  2180      *     ({@code '\u005CuFF41'}) through z ({@code '\u005CuFF5A'})

  2181      *     and its code is less than

  2182      *     {@code radix + '\u005CuFF41'- 10}.

  2183      *     In this case,

  2184      *     {@code codePoint - '\u005CuFF41' + 10}

  2185      *     is returned.

  2186      * </ul>

  2187      *

  2188      * @param   codePoint the character (Unicode code point) to be converted.

  2189      * @param   radix   the radix.

  2190      * @return  the numeric value represented by the character in the

  2191      *          specified radix.

  2192      * @see     Character#forDigit(int, int)

  2193      * @see     Character#isDigit(int)

  2194      * @since   1.5

  2195      */

  2196     public static int digit(int codePoint, int radix) {

  2197         throw new UnsupportedOperationException();

  2198     }

  2200     /**

  2201      * Returns the {@code int} value that the specified Unicode

  2202      * character represents. For example, the character

  2203      * {@code '\u005Cu216C'} (the roman numeral fifty) will return

  2204      * an int with a value of 50.

  2205      * <p>

  2206      * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through

  2207      * {@code '\u005Cu005A'}), lowercase

  2208      * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and

  2209      * full width variant ({@code '\u005CuFF21'} through

  2210      * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through

  2211      * {@code '\u005CuFF5A'}) forms have numeric values from 10

  2212      * through 35. This is independent of the Unicode specification,

  2213      * which does not assign numeric values to these {@code char}

  2214      * values.

  2215      * <p>

  2216      * If the character does not have a numeric value, then -1 is returned.

  2217      * If the character has a numeric value that cannot be represented as a

  2218      * nonnegative integer (for example, a fractional value), then -2

  2219      * is returned.

  2220      *

  2221      * <p><b>Note:</b> This method cannot handle <a

  2222      * href="#supplementary"> supplementary characters</a>. To support

  2223      * all Unicode characters, including supplementary characters, use

  2224      * the {@link #getNumericValue(int)} method.

  2225      *

  2226      * @param   ch      the character to be converted.

  2227      * @return  the numeric value of the character, as a nonnegative {@code int}

  2228      *           value; -2 if the character has a numeric value that is not a

  2229      *          nonnegative integer; -1 if the character has no numeric value.

  2230      * @see     Character#forDigit(int, int)

  2231      * @see     Character#isDigit(char)

  2232      * @since   1.1

  2233      */

  2234     public static int getNumericValue(char ch) {

  2235         return getNumericValue((int)ch);

  2236     }

  2238     /**

  2239      * Returns the {@code int} value that the specified

  2240      * character (Unicode code point) represents. For example, the character

  2241      * {@code '\u005Cu216C'} (the Roman numeral fifty) will return

  2242      * an {@code int} with a value of 50.

  2243      * <p>

  2244      * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through

  2245      * {@code '\u005Cu005A'}), lowercase

  2246      * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and

  2247      * full width variant ({@code '\u005CuFF21'} through

  2248      * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through

  2249      * {@code '\u005CuFF5A'}) forms have numeric values from 10

  2250      * through 35. This is independent of the Unicode specification,

  2251      * which does not assign numeric values to these {@code char}

  2252      * values.

  2253      * <p>

  2254      * If the character does not have a numeric value, then -1 is returned.

  2255      * If the character has a numeric value that cannot be represented as a

  2256      * nonnegative integer (for example, a fractional value), then -2

  2257      * is returned.

  2258      *

  2259      * @param   codePoint the character (Unicode code point) to be converted.

  2260      * @return  the numeric value of the character, as a nonnegative {@code int}

  2261      *          value; -2 if the character has a numeric value that is not a

  2262      *          nonnegative integer; -1 if the character has no numeric value.

  2263      * @see     Character#forDigit(int, int)

  2264      * @see     Character#isDigit(int)

  2265      * @since   1.5

  2266      */

  2267     public static int getNumericValue(int codePoint) {

  2268         throw new UnsupportedOperationException();

  2269     }

  2271     /**

  2272      * Determines if the specified character is ISO-LATIN-1 white space.

  2273      * This method returns {@code true} for the following five

  2274      * characters only:

  2275      * <table>

  2276      * <tr><td>{@code '\t'}</td>            <td>{@code U+0009}</td>

  2277      *     <td>{@code HORIZONTAL TABULATION}</td></tr>

  2278      * <tr><td>{@code '\n'}</td>            <td>{@code U+000A}</td>

  2279      *     <td>{@code NEW LINE}</td></tr>

  2280      * <tr><td>{@code '\f'}</td>            <td>{@code U+000C}</td>

  2281      *     <td>{@code FORM FEED}</td></tr>

  2282      * <tr><td>{@code '\r'}</td>            <td>{@code U+000D}</td>

  2283      *     <td>{@code CARRIAGE RETURN}</td></tr>

  2284      * <tr><td>{@code '&nbsp;'}</td>  <td>{@code U+0020}</td>

  2285      *     <td>{@code SPACE}</td></tr>

  2286      * </table>

  2287      *

  2288      * @param      ch   the character to be tested.

  2289      * @return     {@code true} if the character is ISO-LATIN-1 white

  2290      *             space; {@code false} otherwise.

  2291      * @see        Character#isSpaceChar(char)

  2292      * @see        Character#isWhitespace(char)

  2293      * @deprecated Replaced by isWhitespace(char).

  2294      */

  2295     @Deprecated

  2296     public static boolean isSpace(char ch) {

  2297         return (ch <= 0x0020) &&

  2298             (((((1L << 0x0009) |

  2299             (1L << 0x000A) |

  2300             (1L << 0x000C) |

  2301             (1L << 0x000D) |

  2302             (1L << 0x0020)) >> ch) & 1L) != 0);

  2303     }

  2307     /**

  2308      * Determines if the specified character is white space according to Java.

  2309      * A character is a Java whitespace character if and only if it satisfies

  2310      * one of the following criteria:

  2311      * <ul>

  2312      * <li> It is a Unicode space character ({@code SPACE_SEPARATOR},

  2313      *      {@code LINE_SEPARATOR}, or {@code PARAGRAPH_SEPARATOR})

  2314      *      but is not also a non-breaking space ({@code '\u005Cu00A0'},

  2315      *      {@code '\u005Cu2007'}, {@code '\u005Cu202F'}).

  2316      * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION.

  2317      * <li> It is {@code '\u005Cn'}, U+000A LINE FEED.

  2318      * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION.

  2319      * <li> It is {@code '\u005Cf'}, U+000C FORM FEED.

  2320      * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN.

  2321      * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR.

  2322      * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR.

  2323      * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR.

  2324      * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR.

  2325      * </ul>

  2326      *

  2327      * <p><b>Note:</b> This method cannot handle <a

  2328      * href="#supplementary"> supplementary characters</a>. To support

  2329      * all Unicode characters, including supplementary characters, use

  2330      * the {@link #isWhitespace(int)} method.

  2331      *

  2332      * @param   ch the character to be tested.

  2333      * @return  {@code true} if the character is a Java whitespace

  2334      *          character; {@code false} otherwise.

  2335      * @see     Character#isSpaceChar(char)

  2336      * @since   1.1

  2337      */

  2338     public static boolean isWhitespace(char ch) {

  2339         return isWhitespace((int)ch);

  2340     }

  2342     /**

  2343      * Determines if the specified character (Unicode code point) is

  2344      * white space according to Java.  A character is a Java

  2345      * whitespace character if and only if it satisfies one of the

  2346      * following criteria:

  2347      * <ul>

  2348      * <li> It is a Unicode space character ({@link #SPACE_SEPARATOR},

  2349      *      {@link #LINE_SEPARATOR}, or {@link #PARAGRAPH_SEPARATOR})

  2350      *      but is not also a non-breaking space ({@code '\u005Cu00A0'},

  2351      *      {@code '\u005Cu2007'}, {@code '\u005Cu202F'}).

  2352      * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION.

  2353      * <li> It is {@code '\u005Cn'}, U+000A LINE FEED.

  2354      * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION.

  2355      * <li> It is {@code '\u005Cf'}, U+000C FORM FEED.

  2356      * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN.

  2357      * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR.

  2358      * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR.

  2359      * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR.

  2360      * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR.

  2361      * </ul>

  2362      * <p>

  2363      *

  2364      * @param   codePoint the character (Unicode code point) to be tested.

  2365      * @return  {@code true} if the character is a Java whitespace

  2366      *          character; {@code false} otherwise.

  2367      * @see     Character#isSpaceChar(int)

  2368      * @since   1.5

  2369      */

  2370     public static boolean isWhitespace(int codePoint) {

  2371         throw new UnsupportedOperationException();

  2372     }

  2374     /**

  2375      * Determines if the specified character is an ISO control

  2376      * character.  A character is considered to be an ISO control

  2377      * character if its code is in the range {@code '\u005Cu0000'}

  2378      * through {@code '\u005Cu001F'} or in the range

  2379      * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}.

  2380      *

  2381      * <p><b>Note:</b> This method cannot handle <a

  2382      * href="#supplementary"> supplementary characters</a>. To support

  2383      * all Unicode characters, including supplementary characters, use

  2384      * the {@link #isISOControl(int)} method.

  2385      *

  2386      * @param   ch      the character to be tested.

  2387      * @return  {@code true} if the character is an ISO control character;

  2388      *          {@code false} otherwise.

  2389      *

  2390      * @see     Character#isSpaceChar(char)

  2391      * @see     Character#isWhitespace(char)

  2392      * @since   1.1

  2393      */

  2394     public static boolean isISOControl(char ch) {

  2395         return isISOControl((int)ch);

  2396     }

  2398     /**

  2399      * Determines if the referenced character (Unicode code point) is an ISO control

  2400      * character.  A character is considered to be an ISO control

  2401      * character if its code is in the range {@code '\u005Cu0000'}

  2402      * through {@code '\u005Cu001F'} or in the range

  2403      * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}.

  2404      *

  2405      * @param   codePoint the character (Unicode code point) to be tested.

  2406      * @return  {@code true} if the character is an ISO control character;

  2407      *          {@code false} otherwise.

  2408      * @see     Character#isSpaceChar(int)

  2409      * @see     Character#isWhitespace(int)

  2410      * @since   1.5

  2411      */

  2412     public static boolean isISOControl(int codePoint) {

  2413         // Optimized form of:

  2414         //     (codePoint >= 0x00 && codePoint <= 0x1F) ||

  2415         //     (codePoint >= 0x7F && codePoint <= 0x9F);

  2416         return codePoint <= 0x9F &&

  2417             (codePoint >= 0x7F || (codePoint >>> 5 == 0));

  2418     }

  2420     /**

  2421      * Determines the character representation for a specific digit in

  2422      * the specified radix. If the value of {@code radix} is not a

  2423      * valid radix, or the value of {@code digit} is not a valid

  2424      * digit in the specified radix, the null character

  2425      * ({@code '\u005Cu0000'}) is returned.

  2426      * <p>

  2427      * The {@code radix} argument is valid if it is greater than or

  2428      * equal to {@code MIN_RADIX} and less than or equal to

  2429      * {@code MAX_RADIX}. The {@code digit} argument is valid if

  2430      * {@code 0 <= digit < radix}.

  2431      * <p>

  2432      * If the digit is less than 10, then

  2433      * {@code '0' + digit} is returned. Otherwise, the value

  2434      * {@code 'a' + digit - 10} is returned.

  2435      *

  2436      * @param   digit   the number to convert to a character.

  2437      * @param   radix   the radix.

  2438      * @return  the {@code char} representation of the specified digit

  2439      *          in the specified radix.

  2440      * @see     Character#MIN_RADIX

  2441      * @see     Character#MAX_RADIX

  2442      * @see     Character#digit(char, int)

  2443      */

  2444     public static char forDigit(int digit, int radix) {

  2445         if ((digit >= radix) || (digit < 0)) {

  2446             return '\0';

  2447         }

  2448         if ((radix < Character.MIN_RADIX) || (radix > Character.MAX_RADIX)) {

  2449             return '\0';

  2450         }

  2451         if (digit < 10) {

  2452             return (char)('0' + digit);

  2453         }

  2454         return (char)('a' - 10 + digit);

  2455     }

  2457     /**

  2458      * Compares two {@code Character} objects numerically.

  2459      *

  2460      * @param   anotherCharacter   the {@code Character} to be compared.

  2462      * @return  the value {@code 0} if the argument {@code Character}

  2463      *          is equal to this {@code Character}; a value less than

  2464      *          {@code 0} if this {@code Character} is numerically less

  2465      *          than the {@code Character} argument; and a value greater than

  2466      *          {@code 0} if this {@code Character} is numerically greater

  2467      *          than the {@code Character} argument (unsigned comparison).

  2468      *          Note that this is strictly a numerical comparison; it is not

  2469      *          locale-dependent.

  2470      * @since   1.2

  2471      */

  2472     public int compareTo(Character anotherCharacter) {

  2473         return compare(this.value, anotherCharacter.value);

  2474     }

  2476     /**

  2477      * Compares two {@code char} values numerically.

  2478      * The value returned is identical to what would be returned by:

  2479      * <pre>

  2480      *    Character.valueOf(x).compareTo(Character.valueOf(y))

  2481      * </pre>

  2482      *

  2483      * @param  x the first {@code char} to compare

  2484      * @param  y the second {@code char} to compare

  2485      * @return the value {@code 0} if {@code x == y};

  2486      *         a value less than {@code 0} if {@code x < y}; and

  2487      *         a value greater than {@code 0} if {@code x > y}

  2488      * @since 1.7

  2489      */

  2490     public static int compare(char x, char y) {

  2491         return x - y;

  2492     }

  2495     /**

  2496      * The number of bits used to represent a <tt>char</tt> value in unsigned

  2497      * binary form, constant {@code 16}.

  2498      *

  2499      * @since 1.5

  2500      */

  2501     public static final int SIZE = 16;

  2503     /**

  2504      * Returns the value obtained by reversing the order of the bytes in the

  2505      * specified <tt>char</tt> value.

  2506      *

  2507      * @return the value obtained by reversing (or, equivalently, swapping)

  2508      *     the bytes in the specified <tt>char</tt> value.

  2509      * @since 1.5

  2510      */

  2511     public static char reverseBytes(char ch) {

  2512         return (char) (((ch & 0xFF00) >> 8) | (ch << 8));

  2513     }

  2515 }

author	Jaroslav Tulach <jaroslav.tulach@apidesign.org>
	Sat, 26 Jan 2013 08:47:05 +0100
changeset 592	5e13b1ac2886
parent 563	6bfc15870186
child 594	035fcbd7a33c
permissions	-rw-r--r--