To support fields of the same name in subclasses, fields are now prefixed with the name of the class that defines them. To provide a convenient way to access them from generated bytecode and directly from JavaScript, there is a getter/setter function for each field. Its name starts with _ followed by the field name. If called with a parameter, it sets the field; without a parameter, it just returns the field's value.
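The accessor convention described above can be modelled in plain Java as a rough sketch. Everything here is an assumption made for illustration: the `Parent`/`Child` classes, the `slots` map, and the `Parent_value`/`Child_value` key format are hypothetical and are not taken from the generated code, which is JavaScript rather than Java. The sketch only shows the two ideas in the description: storage prefixed with the defining class, and a single `_field` function that writes when given an argument and reads otherwise.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldAccessorSketch {
    static class Parent {
        // Hypothetical storage: one slot per "<DefiningClass>_<field>" key.
        protected final Map<String, Object> slots = new HashMap<>();

        // Accessor for Parent's field "value": "_" followed by the field name.
        Object _value(Object... arg) {
            return access("Parent_value", arg);
        }

        // With an argument the slot is written; in both cases the current value is returned
        // (returning the value after a write is an assumption of this sketch).
        protected final Object access(String slot, Object[] arg) {
            if (arg.length > 0) {
                slots.put(slot, arg[0]);
            }
            return slots.get(slot);
        }
    }

    static class Child extends Parent {
        // Child declares its own field "value"; the class prefix keeps it apart from Parent's.
        @Override
        Object _value(Object... arg) {
            return access("Child_value", arg);
        }

        // Reaches the field of the same name defined by Parent.
        Object parentValue(Object... arg) {
            return super._value(arg);
        }
    }

    public static void main(String[] args) {
        Child c = new Child();
        c.parentValue(1);                    // sets Parent's "value"
        c._value(2);                         // sets Child's "value"
        System.out.println(c.parentValue()); // prints 1
        System.out.println(c._value());      // prints 2
    }
}
```

Both fields live in the same object without clashing because their storage keys differ only in the class prefix.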
/*
 * Copyright (c) 2002, 2010, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

package java.lang;

import org.apidesign.bck2brwsr.core.JavaScriptBody;

/**
 * The {@code Character} class wraps a value of the primitive
 * type {@code char} in an object. An object of type
 * {@code Character} contains a single field whose type is
 * {@code char}.
 * <p>
 * In addition, this class provides several methods for determining
 * a character's category (lowercase letter, digit, etc.) and for converting
 * characters from uppercase to lowercase and vice versa.
 * <p>
 * Character information is based on the Unicode Standard, version 6.0.0.
 * <p>
 * The methods and data of class {@code Character} are defined by
 * the information in the <i>UnicodeData</i> file that is part of the
 * Unicode Character Database maintained by the Unicode
 * Consortium. This file specifies various properties including name
 * and general category for every defined Unicode code point or
 * character range.
 * <p>
 * The file and its description are available from the Unicode Consortium at:
 * <ul>
 * <li><a href="http://www.unicode.org">http://www.unicode.org</a>
 * </ul>
 *
 * <h4><a name="unicode">Unicode Character Representations</a></h4>
 *
 * <p>The {@code char} data type (and therefore the value that a
 * {@code Character} object encapsulates) are based on the
 * original Unicode specification, which defined characters as
 * fixed-width 16-bit entities. The Unicode Standard has since been
 * changed to allow for characters whose representation requires more
 * than 16 bits. The range of legal <em>code point</em>s is now
 * U+0000 to U+10FFFF, known as <em>Unicode scalar value</em>.
 * (Refer to the <a
 * href="http://www.unicode.org/reports/tr27/#notation"><i>
 * definition</i></a> of the U+<i>n</i> notation in the Unicode
 * Standard.)
 *
 * <p><a name="BMP">The set of characters from U+0000 to U+FFFF is
 * sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>.
 * <a name="supplementary">Characters</a> whose code points are greater
 * than U+FFFF are called <em>supplementary character</em>s. The Java
 * platform uses the UTF-16 representation in {@code char} arrays and
 * in the {@code String} and {@code StringBuffer} classes. In
 * this representation, supplementary characters are represented as a pair
 * of {@code char} values, the first from the <em>high-surrogates</em>
 * range, (\uD800-\uDBFF), the second from the
 * <em>low-surrogates</em> range (\uDC00-\uDFFF).
 *
 * <p>A {@code char} value, therefore, represents Basic
 * Multilingual Plane (BMP) code points, including the surrogate
 * code points, or code units of the UTF-16 encoding. An
 * {@code int} value represents all Unicode code points,
 * including supplementary code points. The lower (least significant)
 * 21 bits of {@code int} are used to represent Unicode code
 * points and the upper (most significant) 11 bits must be zero.
 * Unless otherwise specified, the behavior with respect to
 * supplementary characters and surrogate {@code char} values is
 * as follows:
 *
 * <ul>
 * <li>The methods that only accept a {@code char} value cannot support
 * supplementary characters. They treat {@code char} values from the
 * surrogate ranges as undefined characters. For example,
 * {@code Character.isLetter('\u005CuD840')} returns {@code false}, even though
 * this specific value if followed by any low-surrogate value in a string
 * would represent a letter.
 *
 * <li>The methods that accept an {@code int} value support all
 * Unicode characters, including supplementary characters. For
 * example, {@code Character.isLetter(0x2F81A)} returns
 * {@code true} because the code point value represents a letter
 * (a CJK ideograph).
 * </ul>
 *
 * <p>In the Java SE API documentation, <em>Unicode code point</em> is
 * used for character values in the range between U+0000 and U+10FFFF,
 * and <em>Unicode code unit</em> is used for 16-bit
 * {@code char} values that are code units of the <em>UTF-16</em>
 * encoding. For more information on Unicode terminology, refer to the
 * <a href="http://www.unicode.org/glossary/">Unicode Glossary</a>.
 *
 * @author Lee Boynton
 * @author Akira Tanaka
 * @author Martin Buchholz
 */
public final
class Character implements java.io.Serializable, Comparable<Character> {
    /**
     * The minimum radix available for conversion to and from strings.
     * The constant value of this field is the smallest value permitted
     * for the radix argument in radix-conversion methods such as the
     * {@code digit} method, the {@code forDigit} method, and the
     * {@code toString} method of class {@code Integer}.
     *
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Integer#toString(int, int)
     * @see Integer#valueOf(String)
     */
    public static final int MIN_RADIX = 2;

    /**
     * The maximum radix available for conversion to and from strings.
     * The constant value of this field is the largest value permitted
     * for the radix argument in radix-conversion methods such as the
     * {@code digit} method, the {@code forDigit} method, and the
     * {@code toString} method of class {@code Integer}.
     *
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Integer#toString(int, int)
     * @see Integer#valueOf(String)
     */
    public static final int MAX_RADIX = 36;

    /**
     * The constant value of this field is the smallest value of type
     * {@code char}, {@code '\u005Cu0000'}.
     */
    public static final char MIN_VALUE = '\u0000';

    /**
     * The constant value of this field is the largest value of type
     * {@code char}, {@code '\u005CuFFFF'}.
     */
    public static final char MAX_VALUE = '\uFFFF';

    /**
     * The {@code Class} instance representing the primitive type
     * {@code char}.
     */
    public static final Class<Character> TYPE = Class.getPrimitiveClass("char");

    /*
     * Normative general types
     */

    /*
     * General character types
     */

    /** General category "Cn" in the Unicode specification. */
    public static final byte UNASSIGNED = 0;

    /** General category "Lu" in the Unicode specification. */
    public static final byte UPPERCASE_LETTER = 1;

    /** General category "Ll" in the Unicode specification. */
    public static final byte LOWERCASE_LETTER = 2;

    /** General category "Lt" in the Unicode specification. */
    public static final byte TITLECASE_LETTER = 3;

    /** General category "Lm" in the Unicode specification. */
    public static final byte MODIFIER_LETTER = 4;

    /** General category "Lo" in the Unicode specification. */
    public static final byte OTHER_LETTER = 5;

    /** General category "Mn" in the Unicode specification. */
    public static final byte NON_SPACING_MARK = 6;

    /** General category "Me" in the Unicode specification. */
    public static final byte ENCLOSING_MARK = 7;

    /** General category "Mc" in the Unicode specification. */
    public static final byte COMBINING_SPACING_MARK = 8;

    /** General category "Nd" in the Unicode specification. */
    public static final byte DECIMAL_DIGIT_NUMBER = 9;

    /** General category "Nl" in the Unicode specification. */
    public static final byte LETTER_NUMBER = 10;

    /** General category "No" in the Unicode specification. */
    public static final byte OTHER_NUMBER = 11;

    /** General category "Zs" in the Unicode specification. */
    public static final byte SPACE_SEPARATOR = 12;

    /** General category "Zl" in the Unicode specification. */
    public static final byte LINE_SEPARATOR = 13;

    /** General category "Zp" in the Unicode specification. */
    public static final byte PARAGRAPH_SEPARATOR = 14;

    /** General category "Cc" in the Unicode specification. */
    public static final byte CONTROL = 15;

    /** General category "Cf" in the Unicode specification. */
    public static final byte FORMAT = 16;

    /** General category "Co" in the Unicode specification. */
    public static final byte PRIVATE_USE = 18;

    /** General category "Cs" in the Unicode specification. */
    public static final byte SURROGATE = 19;

    /** General category "Pd" in the Unicode specification. */
    public static final byte DASH_PUNCTUATION = 20;

    /** General category "Ps" in the Unicode specification. */
    public static final byte START_PUNCTUATION = 21;

    /** General category "Pe" in the Unicode specification. */
    public static final byte END_PUNCTUATION = 22;

    /** General category "Pc" in the Unicode specification. */
    public static final byte CONNECTOR_PUNCTUATION = 23;

    /** General category "Po" in the Unicode specification. */
    public static final byte OTHER_PUNCTUATION = 24;

    /** General category "Sm" in the Unicode specification. */
    public static final byte MATH_SYMBOL = 25;

    /** General category "Sc" in the Unicode specification. */
    public static final byte CURRENCY_SYMBOL = 26;

    /** General category "Sk" in the Unicode specification. */
    public static final byte MODIFIER_SYMBOL = 27;

    /** General category "So" in the Unicode specification. */
    public static final byte OTHER_SYMBOL = 28;

    /** General category "Pi" in the Unicode specification. */
    public static final byte INITIAL_QUOTE_PUNCTUATION = 29;

    /** General category "Pf" in the Unicode specification. */
    public static final byte FINAL_QUOTE_PUNCTUATION = 30;

    /** Error flag. Use int (code point) to avoid confusion with U+FFFF. */
    static final int ERROR = 0xFFFFFFFF;

    /**
     * Undefined bidirectional character type. Undefined {@code char}
     * values have undefined directionality in the Unicode specification.
     */
    public static final byte DIRECTIONALITY_UNDEFINED = -1;

    /** Strong bidirectional character type "L" in the Unicode specification. */
    public static final byte DIRECTIONALITY_LEFT_TO_RIGHT = 0;

    /** Strong bidirectional character type "R" in the Unicode specification. */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT = 1;

    /** Strong bidirectional character type "AL" in the Unicode specification. */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC = 2;

    /** Weak bidirectional character type "EN" in the Unicode specification. */
    public static final byte DIRECTIONALITY_EUROPEAN_NUMBER = 3;

    /** Weak bidirectional character type "ES" in the Unicode specification. */
    public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR = 4;

    /** Weak bidirectional character type "ET" in the Unicode specification. */
    public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR = 5;

    /** Weak bidirectional character type "AN" in the Unicode specification. */
    public static final byte DIRECTIONALITY_ARABIC_NUMBER = 6;

    /** Weak bidirectional character type "CS" in the Unicode specification. */
    public static final byte DIRECTIONALITY_COMMON_NUMBER_SEPARATOR = 7;

    /** Weak bidirectional character type "NSM" in the Unicode specification. */
    public static final byte DIRECTIONALITY_NONSPACING_MARK = 8;

    /** Weak bidirectional character type "BN" in the Unicode specification. */
    public static final byte DIRECTIONALITY_BOUNDARY_NEUTRAL = 9;

    /** Neutral bidirectional character type "B" in the Unicode specification. */
    public static final byte DIRECTIONALITY_PARAGRAPH_SEPARATOR = 10;

    /** Neutral bidirectional character type "S" in the Unicode specification. */
    public static final byte DIRECTIONALITY_SEGMENT_SEPARATOR = 11;

    /** Neutral bidirectional character type "WS" in the Unicode specification. */
    public static final byte DIRECTIONALITY_WHITESPACE = 12;

    /** Neutral bidirectional character type "ON" in the Unicode specification. */
    public static final byte DIRECTIONALITY_OTHER_NEUTRALS = 13;

    /** Strong bidirectional character type "LRE" in the Unicode specification. */
    public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING = 14;

    /** Strong bidirectional character type "LRO" in the Unicode specification. */
    public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE = 15;

    /** Strong bidirectional character type "RLE" in the Unicode specification. */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING = 16;

    /** Strong bidirectional character type "RLO" in the Unicode specification. */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE = 17;

    /** Weak bidirectional character type "PDF" in the Unicode specification. */
    public static final byte DIRECTIONALITY_POP_DIRECTIONAL_FORMAT = 18;

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * Unicode high-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuD800'}.
     * A high-surrogate is also known as a <i>leading-surrogate</i>.
     */
    public static final char MIN_HIGH_SURROGATE = '\uD800';

    /**
     * The maximum value of a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * Unicode high-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuDBFF'}.
     * A high-surrogate is also known as a <i>leading-surrogate</i>.
     */
    public static final char MAX_HIGH_SURROGATE = '\uDBFF';

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * Unicode low-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuDC00'}.
     * A low-surrogate is also known as a <i>trailing-surrogate</i>.
     */
    public static final char MIN_LOW_SURROGATE = '\uDC00';

    /**
     * The maximum value of a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * Unicode low-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuDFFF'}.
     * A low-surrogate is also known as a <i>trailing-surrogate</i>.
     */
    public static final char MAX_LOW_SURROGATE = '\uDFFF';

    /**
     * The minimum value of a Unicode surrogate code unit in the
     * UTF-16 encoding, constant {@code '\u005CuD800'}.
     */
    public static final char MIN_SURROGATE = MIN_HIGH_SURROGATE;

    /**
     * The maximum value of a Unicode surrogate code unit in the
     * UTF-16 encoding, constant {@code '\u005CuDFFF'}.
     */
    public static final char MAX_SURROGATE = MAX_LOW_SURROGATE;

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#supplementary_code_point">
     * Unicode supplementary code point</a>, constant {@code U+10000}.
     */
    public static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#code_point">
     * Unicode code point</a>, constant {@code U+0000}.
     */
    public static final int MIN_CODE_POINT = 0x000000;

    /**
     * The maximum value of a
     * <a href="http://www.unicode.org/glossary/#code_point">
     * Unicode code point</a>, constant {@code U+10FFFF}.
     */
    public static final int MAX_CODE_POINT = 0X10FFFF;

    /**
     * Instances of this class represent particular subsets of the Unicode
     * character set. The only family of subsets defined in the
     * {@code Character} class is {@link Character.UnicodeBlock}.
     * Other portions of the Java API may define other subsets for their
     * own purposes.
     */
    public static class Subset {

        private String name;

        /**
         * Constructs a new {@code Subset} instance.
         *
         * @param name The name of this subset
         * @exception NullPointerException if name is {@code null}
         */
        protected Subset(String name) {
            if (name == null)
                throw new NullPointerException("name");
            this.name = name;
        }

        /**
         * Compares two {@code Subset} objects for equality.
         * This method returns {@code true} if and only if
         * {@code this} and the argument refer to the same
         * object; since this method is {@code final}, this
         * guarantee holds for all subclasses.
         */
        public final boolean equals(Object obj) {
            return (this == obj);
        }

        /**
         * Returns the standard hash code as defined by the
         * {@link Object#hashCode} method. This method
         * is {@code final} in order to ensure that the
         * {@code equals} and {@code hashCode} methods will
         * be consistent in all subclasses.
         */
        public final int hashCode() {
            return super.hashCode();
        }

        /**
         * Returns the name of this subset.
         */
        public final String toString() {
            return name;
        }
    }

    // See http://www.unicode.org/Public/UNIDATA/Blocks.txt
    // for the latest specification of Unicode Blocks.

    /**
     * The value of the {@code Character}.
     */
    private final char value;

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = 3786198910865385080L;

    /**
     * Constructs a newly allocated {@code Character} object that
     * represents the specified {@code char} value.
     *
     * @param value the value to be represented by the
     *              {@code Character} object.
     */
    public Character(char value) {
        this.value = value;
    }

    private static class CharacterCache {
        private CharacterCache(){}

        static final Character cache[] = new Character[127 + 1];

        static {
            for (int i = 0; i < cache.length; i++)
                cache[i] = new Character((char)i);
        }
    }

    /**
     * Returns a <tt>Character</tt> instance representing the specified
     * <tt>char</tt> value.
     * If a new <tt>Character</tt> instance is not required, this method
     * should generally be used in preference to the constructor
     * {@link #Character(char)}, as this method is likely to yield
     * significantly better space and time performance by caching
     * frequently requested values.
     *
     * This method will always cache values in the range {@code
     * '\u005Cu0000'} to {@code '\u005Cu007F'}, inclusive, and may
     * cache other values outside of this range.
     *
     * @param c a char value.
     * @return a <tt>Character</tt> instance representing <tt>c</tt>.
     */
    public static Character valueOf(char c) {
        if (c <= 127) { // must cache
            return CharacterCache.cache[(int)c];
        }
        return new Character(c);
    }

    /**
     * Returns the value of this {@code Character} object.
     * @return the primitive {@code char} value represented by
     *         this object.
     */
    public char charValue() {
        return value;
    }

    /**
     * Returns a hash code for this {@code Character}; equal to the result
     * of invoking {@code charValue()}.
     *
     * @return a hash code value for this {@code Character}
     */
    public int hashCode() {
        return value;
    }

    /**
     * Compares this object against the specified object.
     * The result is {@code true} if and only if the argument is not
     * {@code null} and is a {@code Character} object that
     * represents the same {@code char} value as this object.
     *
     * @param obj the object to compare with.
     * @return {@code true} if the objects are the same;
     *         {@code false} otherwise.
     */
    public boolean equals(Object obj) {
        if (obj instanceof Character) {
            return value == ((Character)obj).charValue();
        }
        return false;
    }

    /**
     * Returns a {@code String} object representing this
     * {@code Character}'s value. The result is a string of
     * length 1 whose sole component is the primitive
     * {@code char} value represented by this
     * {@code Character} object.
     *
     * @return a string representation of this object.
     */
    public String toString() {
        char buf[] = {value};
        return String.valueOf(buf);
    }

    /**
     * Returns a {@code String} object representing the
     * specified {@code char}. The result is a string of length
     * 1 consisting solely of the specified {@code char}.
     *
     * @param c the {@code char} to be converted
     * @return the string representation of the specified {@code char}
     */
    public static String toString(char c) {
        return String.valueOf(c);
    }

    /**
     * Determines whether the specified code point is a valid
     * <a href="http://www.unicode.org/glossary/#code_point">
     * Unicode code point value</a>.
     *
     * @param codePoint the Unicode code point to be tested
     * @return {@code true} if the specified code point value is between
     *         {@link #MIN_CODE_POINT} and
     *         {@link #MAX_CODE_POINT} inclusive;
     *         {@code false} otherwise.
     */
    public static boolean isValidCodePoint(int codePoint) {
        // Optimized form of:
        //     codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT
        int plane = codePoint >>> 16;
        return plane < ((MAX_CODE_POINT + 1) >>> 16);
    }
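
The "optimized form" comment in `isValidCodePoint` can be checked with a small standalone sketch. The class name `ValidCodePointCheck` and the helper names `rangeCheck` and `planeCheck` are made up for this illustration; only the two expressions come from the listing. The unsigned shift turns a negative input into a large plane number, so a single comparison against 17 (0x110000 >>> 16) covers both bounds.

```java
public class ValidCodePointCheck {
    static final int MIN_CODE_POINT = 0x000000;
    static final int MAX_CODE_POINT = 0x10FFFF;

    // The straightforward two-sided range test quoted in the comment.
    static boolean rangeCheck(int codePoint) {
        return codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT;
    }

    // The form used by the method: compare the 16-bit "plane" number only.
    static boolean planeCheck(int codePoint) {
        int plane = codePoint >>> 16;   // negative values become planes >= 0x8000
        return plane < ((MAX_CODE_POINT + 1) >>> 16);
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 0xFFFF, 0x10000,
                         0x10FFFF, 0x110000, Integer.MAX_VALUE};
        for (int cp : samples) {
            System.out.println(Integer.toHexString(cp) + " range=" + rangeCheck(cp)
                    + " plane=" + planeCheck(cp));
        }
    }
}
```

For every sample the two columns agree, including the boundary values 0x10FFFF (valid) and 0x110000 (invalid).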

    /**
     * Determines whether the specified character (Unicode code point)
     * is in the <a href="#BMP">Basic Multilingual Plane (BMP)</a>.
     * Such code points can be represented using a single {@code char}.
     *
     * @param codePoint the character (Unicode code point) to be tested
     * @return {@code true} if the specified code point is between
     *         {@link #MIN_VALUE} and {@link #MAX_VALUE} inclusive;
     *         {@code false} otherwise.
     */
    public static boolean isBmpCodePoint(int codePoint) {
        return codePoint >>> 16 == 0;
        // Optimized form of:
        //     codePoint >= MIN_VALUE && codePoint <= MAX_VALUE
        // We consistently use logical shift (>>>) to facilitate
        // additional runtime optimizations.
    }

    /**
     * Determines whether the specified character (Unicode code point)
     * is in the <a href="#supplementary">supplementary character</a> range.
     *
     * @param codePoint the character (Unicode code point) to be tested
     * @return {@code true} if the specified code point is between
     *         {@link #MIN_SUPPLEMENTARY_CODE_POINT} and
     *         {@link #MAX_CODE_POINT} inclusive;
     *         {@code false} otherwise.
     */
    public static boolean isSupplementaryCodePoint(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
            && codePoint < MAX_CODE_POINT + 1;
    }

    /**
     * Determines if the given {@code char} value is a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * Unicode high-surrogate code unit</a>
     * (also known as <i>leading-surrogate code unit</i>).
     *
     * <p>Such values do not represent characters by themselves,
     * but are used in the representation of
     * <a href="#supplementary">supplementary characters</a>
     * in the UTF-16 encoding.
     *
     * @param ch the {@code char} value to be tested.
     * @return {@code true} if the {@code char} value is between
     *         {@link #MIN_HIGH_SURROGATE} and
     *         {@link #MAX_HIGH_SURROGATE} inclusive;
     *         {@code false} otherwise.
     * @see Character#isLowSurrogate(char)
     * @see Character.UnicodeBlock#of(int)
     */
    public static boolean isHighSurrogate(char ch) {
        // Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE
        return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);
    }

    /**
     * Determines if the given {@code char} value is a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * Unicode low-surrogate code unit</a>
     * (also known as <i>trailing-surrogate code unit</i>).
     *
     * <p>Such values do not represent characters by themselves,
     * but are used in the representation of
     * <a href="#supplementary">supplementary characters</a>
     * in the UTF-16 encoding.
     *
     * @param ch the {@code char} value to be tested.
     * @return {@code true} if the {@code char} value is between
     *         {@link #MIN_LOW_SURROGATE} and
     *         {@link #MAX_LOW_SURROGATE} inclusive;
     *         {@code false} otherwise.
     * @see Character#isHighSurrogate(char)
     */
    public static boolean isLowSurrogate(char ch) {
        return ch >= MIN_LOW_SURROGATE && ch < (MAX_LOW_SURROGATE + 1);
    }

    /**
     * Determines if the given {@code char} value is a Unicode
     * <i>surrogate code unit</i>.
     *
     * <p>Such values do not represent characters by themselves,
     * but are used in the representation of
     * <a href="#supplementary">supplementary characters</a>
     * in the UTF-16 encoding.
     *
     * <p>A char value is a surrogate code unit if and only if it is either
     * a {@linkplain #isLowSurrogate(char) low-surrogate code unit} or
     * a {@linkplain #isHighSurrogate(char) high-surrogate code unit}.
     *
     * @param ch the {@code char} value to be tested.
     * @return {@code true} if the {@code char} value is between
     *         {@link #MIN_SURROGATE} and
     *         {@link #MAX_SURROGATE} inclusive;
     *         {@code false} otherwise.
     */
    public static boolean isSurrogate(char ch) {
        return ch >= MIN_SURROGATE && ch < (MAX_SURROGATE + 1);
    }

    /**
     * Determines whether the specified pair of {@code char}
     * values is a valid
     * <a href="http://www.unicode.org/glossary/#surrogate_pair">
     * Unicode surrogate pair</a>.
     *
     * <p>This method is equivalent to the expression:
     * <blockquote><pre>
     * isHighSurrogate(high) && isLowSurrogate(low)
     * </pre></blockquote>
     *
     * @param high the high-surrogate code value to be tested
     * @param low the low-surrogate code value to be tested
     * @return {@code true} if the specified high and
     *         low-surrogate code values represent a valid surrogate pair;
     *         {@code false} otherwise.
     */
    public static boolean isSurrogatePair(char high, char low) {
        return isHighSurrogate(high) && isLowSurrogate(low);
    }

    /**
     * Determines the number of {@code char} values needed to
     * represent the specified character (Unicode code point). If the
     * specified character is equal to or greater than 0x10000, then
     * the method returns 2. Otherwise, the method returns 1.
     *
     * <p>This method doesn't validate the specified character to be a
     * valid Unicode code point. The caller must validate the
     * character value using {@link #isValidCodePoint(int) isValidCodePoint}
     * if necessary.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return 2 if the character is a valid supplementary character; 1 otherwise.
     * @see Character#isSupplementaryCodePoint(int)
     */
    public static int charCount(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT ? 2 : 1;
    }

    /**
     * Converts the specified surrogate pair to its supplementary code
     * point value. This method does not validate the specified
     * surrogate pair. The caller must validate it using {@link
     * #isSurrogatePair(char, char) isSurrogatePair} if necessary.
     *
     * @param high the high-surrogate code unit
     * @param low the low-surrogate code unit
     * @return the supplementary code point composed from the
     *         specified surrogate pair.
     */
    public static int toCodePoint(char high, char low) {
        // Optimized form of:
        // return ((high - MIN_HIGH_SURROGATE) << 10)
        //         + (low - MIN_LOW_SURROGATE)
        //         + MIN_SUPPLEMENTARY_CODE_POINT;
        return ((high << 10) + low) + (MIN_SUPPLEMENTARY_CODE_POINT
                                       - (MIN_HIGH_SURROGATE << 10)
                                       - MIN_LOW_SURROGATE);
    }
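
The rearrangement in `toCodePoint` is ordinary algebra: the three constant terms are grouped so the compiler can fold them into one constant. The following sketch, with the hypothetical class name `SurrogateMath` and helper names `plain` and `optimized`, evaluates both forms for the surrogate pair of U+2F81A, the code point used as an example in the class documentation.

```java
public class SurrogateMath {
    static final char MIN_HIGH_SURROGATE = '\uD800';
    static final char MIN_LOW_SURROGATE = '\uDC00';
    static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;

    // The readable form from the comment: 10 bits from each surrogate, plus the offset.
    static int plain(char high, char low) {
        return ((high - MIN_HIGH_SURROGATE) << 10)
                + (low - MIN_LOW_SURROGATE)
                + MIN_SUPPLEMENTARY_CODE_POINT;
    }

    // The form used in the method body: all constants collected into one term.
    static int optimized(char high, char low) {
        return ((high << 10) + low) + (MIN_SUPPLEMENTARY_CODE_POINT
                                       - (MIN_HIGH_SURROGATE << 10)
                                       - MIN_LOW_SURROGATE);
    }

    public static void main(String[] args) {
        char high = '\uD87E';
        char low = '\uDC1A';
        // (0x7E << 10) + 0x1A + 0x10000 = 0x1F800 + 0x1A + 0x10000 = 0x2F81A
        System.out.println(Integer.toHexString(plain(high, low)));     // prints 2f81a
        System.out.println(Integer.toHexString(optimized(high, low))); // prints 2f81a
    }
}
```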

    /**
     * Returns the code point at the given index of the
     * {@code CharSequence}. If the {@code char} value at
     * the given index in the {@code CharSequence} is in the
     * high-surrogate range, the following index is less than the
     * length of the {@code CharSequence}, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at the given index is returned.
     *
     * @param seq a sequence of {@code char} values (Unicode code
     *            units)
     * @param index the index to the {@code char} values (Unicode
     *              code units) in {@code seq} to be converted
     * @return the Unicode code point at the given index
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if the value
     *            {@code index} is negative or not less than
     *            {@link CharSequence#length() seq.length()}.
     */
    public static int codePointAt(CharSequence seq, int index) {
        char c1 = seq.charAt(index++);
        if (isHighSurrogate(c1)) {
            if (index < seq.length()) {
                char c2 = seq.charAt(index);
                if (isLowSurrogate(c2)) {
                    return toCodePoint(c1, c2);
                }
            }
        }
        return c1;
    }
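
A typical way to use `codePointAt` together with `charCount` is to walk a sequence one code point at a time. This is a usage sketch against the standard `java.lang.Character` API; the class name `CodePointWalk` and the method `countCodePoints` are invented for the example.

```java
public class CodePointWalk {
    // Counts code points by advancing 1 or 2 chars per step.
    static int countCodePoints(CharSequence seq) {
        int count = 0;
        for (int i = 0; i < seq.length(); ) {
            int cp = Character.codePointAt(seq, i);
            i += Character.charCount(cp);   // 2 for a surrogate pair, otherwise 1
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "a", the supplementary character U+2F81A, then "b"
        String s = "a" + new String(Character.toChars(0x2F81A)) + "b";
        System.out.println(s.length());          // prints 4 (char values)
        System.out.println(countCodePoints(s));  // prints 3 (code points)
    }
}
```

An unpaired surrogate is returned as-is by `codePointAt`, so the loop still advances by one `char` in that case.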

    /**
     * Returns the code point at the given index of the
     * {@code char} array. If the {@code char} value at
     * the given index in the {@code char} array is in the
     * high-surrogate range, the following index is less than the
     * length of the {@code char} array, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at the given index is returned.
     *
     * @param a the {@code char} array
     * @param index the index to the {@code char} values (Unicode
     *              code units) in the {@code char} array to be converted
     * @return the Unicode code point at the given index
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the value
     *            {@code index} is negative or not less than
     *            the length of the {@code char} array.
     */
    public static int codePointAt(char[] a, int index) {
        return codePointAtImpl(a, index, a.length);
    }

    /**
     * Returns the code point at the given index of the
     * {@code char} array, where only array elements with
     * {@code index} less than {@code limit} can be used. If
     * the {@code char} value at the given index in the
     * {@code char} array is in the high-surrogate range, the
     * following index is less than the {@code limit}, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at the given index is returned.
     *
     * @param a the {@code char} array
     * @param index the index to the {@code char} values (Unicode
     *              code units) in the {@code char} array to be converted
     * @param limit the index after the last array element that
     *              can be used in the {@code char} array
     * @return the Unicode code point at the given index
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     *            argument is negative or not less than the {@code limit}
     *            argument, or if the {@code limit} argument is negative or
     *            greater than the length of the {@code char} array.
     */
    public static int codePointAt(char[] a, int index, int limit) {
        if (index >= limit || limit < 0 || limit > a.length) {
            throw new IndexOutOfBoundsException();
        }
        return codePointAtImpl(a, index, limit);
    }

    // throws ArrayIndexOutofBoundsException if index out of bounds
    static int codePointAtImpl(char[] a, int index, int limit) {
        char c1 = a[index++];
        if (isHighSurrogate(c1)) {
            if (index < limit) {
                char c2 = a[index];
                if (isLowSurrogate(c2)) {
                    return toCodePoint(c1, c2);
                }
            }
        }
        return c1;
    }


    /**
     * Returns the code point preceding the given index of the
     * {@code CharSequence}. If the {@code char} value at
     * {@code (index - 1)} in the {@code CharSequence} is in
     * the low-surrogate range, {@code (index - 2)} is not
     * negative, and the {@code char} value at {@code (index - 2)}
     * in the {@code CharSequence} is in the
     * high-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at {@code (index - 1)} is
     * returned.
     *
     * @param seq the {@code CharSequence} instance
     * @param index the index following the code point that should be returned
     * @return the Unicode code point value before the given index.
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     * argument is less than 1 or greater than {@link
     * CharSequence#length() seq.length()}.
     */
    public static int codePointBefore(CharSequence seq, int index) {
        char c2 = seq.charAt(--index);
        if (isLowSurrogate(c2)) {
            if (index > 0) {
                char c1 = seq.charAt(--index);
                if (isHighSurrogate(c1)) {
                    return toCodePoint(c1, c2);
                }
            }
        }
        return c2;
    }

    /**
     * Returns the code point preceding the given index of the
     * {@code char} array. If the {@code char} value at
     * {@code (index - 1)} in the {@code char} array is in
     * the low-surrogate range, {@code (index - 2)} is not
     * negative, and the {@code char} value at {@code (index - 2)}
     * in the {@code char} array is in the
     * high-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at {@code (index - 1)} is
     * returned.
     *
     * @param a the {@code char} array
     * @param index the index following the code point that should be returned
     * @return the Unicode code point value before the given index.
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     * argument is less than 1 or greater than the length of the
     * {@code char} array
     */
    public static int codePointBefore(char[] a, int index) {
        return codePointBeforeImpl(a, index, 0);
    }

    /**
     * Returns the code point preceding the given index of the
     * {@code char} array, where only array elements with
     * {@code index} greater than or equal to {@code start}
     * can be used. If the {@code char} value at {@code (index - 1)}
     * in the {@code char} array is in the
     * low-surrogate range, {@code (index - 2)} is not less than
     * {@code start}, and the {@code char} value at
     * {@code (index - 2)} in the {@code char} array is in
     * the high-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at {@code (index - 1)} is
     * returned.
     *
     * @param a the {@code char} array
     * @param index the index following the code point that should be returned
     * @param start the index of the first array element in the
     * {@code char} array
     * @return the Unicode code point value before the given index.
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     * argument is not greater than the {@code start} argument or
     * is greater than the length of the {@code char} array, or
     * if the {@code start} argument is negative or not less than
     * the length of the {@code char} array.
     */
    public static int codePointBefore(char[] a, int index, int start) {
        if (index <= start || start < 0 || start >= a.length) {
            throw new IndexOutOfBoundsException();
        }
        return codePointBeforeImpl(a, index, start);
    }

    // throws ArrayIndexOutOfBoundsException if index-1 out of bounds
    static int codePointBeforeImpl(char[] a, int index, int start) {
        char c2 = a[--index];
        if (isLowSurrogate(c2)) {
            if (index > start) {
                char c1 = a[--index];
                if (isHighSurrogate(c1)) {
                    return toCodePoint(c1, c2);
                }
            }
        }
        return c2;
    }

    /**
     * Returns the leading surrogate (a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * high surrogate code unit</a>) of the
     * <a href="http://www.unicode.org/glossary/#surrogate_pair">
     * surrogate pair</a>
     * representing the specified supplementary character (Unicode
     * code point) in the UTF-16 encoding. If the specified character
     * is not a
     * <a href="Character.html#supplementary">supplementary character</a>,
     * an unspecified {@code char} is returned.
     *
     * <p>If
     * {@link #isSupplementaryCodePoint isSupplementaryCodePoint(x)}
     * is {@code true}, then
     * {@link #isHighSurrogate isHighSurrogate}{@code (highSurrogate(x))} and
     * {@link #toCodePoint toCodePoint}{@code (highSurrogate(x), }{@link #lowSurrogate lowSurrogate}{@code (x)) == x}
     * are also always {@code true}.
     *
     * @param codePoint a supplementary character (Unicode code point)
     * @return the leading surrogate code unit used to represent the
     * character in the UTF-16 encoding
     */
    public static char highSurrogate(int codePoint) {
        return (char) ((codePoint >>> 10)
            + (MIN_HIGH_SURROGATE - (MIN_SUPPLEMENTARY_CODE_POINT >>> 10)));
    }

    /**
     * Returns the trailing surrogate (a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * low surrogate code unit</a>) of the
     * <a href="http://www.unicode.org/glossary/#surrogate_pair">
     * surrogate pair</a>
     * representing the specified supplementary character (Unicode
     * code point) in the UTF-16 encoding. If the specified character
     * is not a
     * <a href="Character.html#supplementary">supplementary character</a>,
     * an unspecified {@code char} is returned.
     *
     * <p>If
     * {@link #isSupplementaryCodePoint isSupplementaryCodePoint(x)}
     * is {@code true}, then
     * {@link #isLowSurrogate isLowSurrogate}{@code (lowSurrogate(x))} and
     * {@link #toCodePoint toCodePoint}{@code (}{@link #highSurrogate highSurrogate}{@code (x), lowSurrogate(x)) == x}
     * are also always {@code true}.
     *
     * @param codePoint a supplementary character (Unicode code point)
     * @return the trailing surrogate code unit used to represent the
     * character in the UTF-16 encoding
     */
    public static char lowSurrogate(int codePoint) {
        return (char) ((codePoint & 0x3ff) + MIN_LOW_SURROGATE);
    }
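
The arithmetic in `highSurrogate` and `lowSurrogate` can be checked by hand on a single code point. The sketch below is a standalone illustration with the constants written out as literals (0xD800, 0xDC00 and 0x10000 are the UTF-16 values of `MIN_HIGH_SURROGATE`, `MIN_LOW_SURROGATE` and `MIN_SUPPLEMENTARY_CODE_POINT`). The class name and the `join` helper are hypothetical and only there to show the round trip.

```java
final class SurrogateMathSketch {
    // Same expression as highSurrogate above, with the constants inlined.
    static char high(int cp) {
        return (char) ((cp >>> 10) + (0xD800 - (0x10000 >>> 10)));
    }

    // Same expression as lowSurrogate above: the low ten bits select the trailing unit.
    static char low(int cp) {
        return (char) ((cp & 0x3FF) + 0xDC00);
    }

    // Inverse operation: rebuild the code point from the two code units.
    static int join(char hi, char lo) {
        return ((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        char hi = high(cp);
        char lo = low(cp);
        System.out.println(Integer.toHexString(hi));           // d83d
        System.out.println(Integer.toHexString(lo));           // de00
        System.out.println(Integer.toHexString(join(hi, lo))); // 1f600
    }
}
```

For U+1F600, `cp >>> 10` is 0x7D, subtracting `0x10000 >>> 10` (0x40) leaves 0x3D, and adding 0xD800 gives 0xD83D. The low ten bits are 0x200, which gives 0xDE00.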

    /**
     * Converts the specified character (Unicode code point) to its
     * UTF-16 representation. If the specified code point is a BMP
     * (Basic Multilingual Plane or Plane 0) value, the same value is
     * stored in {@code dst[dstIndex]}, and 1 is returned. If the
     * specified code point is a supplementary character, its
     * surrogate values are stored in {@code dst[dstIndex]}
     * (high-surrogate) and {@code dst[dstIndex+1]}
     * (low-surrogate), and 2 is returned.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @param dst an array of {@code char} in which the
     * {@code codePoint}'s UTF-16 value is stored.
     * @param dstIndex the start index into the {@code dst}
     * array where the converted value is stored.
     * @return 1 if the code point is a BMP code point, 2 if the
     * code point is a supplementary code point.
     * @exception IllegalArgumentException if the specified
     * {@code codePoint} is not a valid Unicode code point.
     * @exception NullPointerException if the specified {@code dst} is null.
     * @exception IndexOutOfBoundsException if {@code dstIndex}
     * is negative or not less than {@code dst.length}, or if
     * {@code dst} at {@code dstIndex} doesn't have enough
     * array element(s) to store the resulting {@code char}
     * value(s). (If {@code dstIndex} is equal to
     * {@code dst.length-1} and the specified
     * {@code codePoint} is a supplementary character, the
     * high-surrogate value is not stored in
     * {@code dst[dstIndex]}.)
     */
    public static int toChars(int codePoint, char[] dst, int dstIndex) {
        if (isBmpCodePoint(codePoint)) {
            dst[dstIndex] = (char) codePoint;
            return 1;
        } else if (isValidCodePoint(codePoint)) {
            toSurrogates(codePoint, dst, dstIndex);
            return 2;
        } else {
            throw new IllegalArgumentException();
        }
    }

    /**
     * Converts the specified character (Unicode code point) to its
     * UTF-16 representation stored in a {@code char} array. If
     * the specified code point is a BMP (Basic Multilingual Plane or
     * Plane 0) value, the resulting {@code char} array has
     * the same value as {@code codePoint}. If the specified code
     * point is a supplementary code point, the resulting
     * {@code char} array has the corresponding surrogate pair.
     *
     * @param codePoint a Unicode code point
     * @return a {@code char} array having
     * {@code codePoint}'s UTF-16 representation.
     * @exception IllegalArgumentException if the specified
     * {@code codePoint} is not a valid Unicode code point.
     */
    public static char[] toChars(int codePoint) {
        if (isBmpCodePoint(codePoint)) {
            return new char[] { (char) codePoint };
        } else if (isValidCodePoint(codePoint)) {
            char[] result = new char[2];
            toSurrogates(codePoint, result, 0);
            return result;
        } else {
            throw new IllegalArgumentException();
        }
    }

    static void toSurrogates(int codePoint, char[] dst, int index) {
        // We write elements "backwards" to guarantee all-or-nothing
        dst[index+1] = lowSurrogate(codePoint);
        dst[index] = highSurrogate(codePoint);
    }
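
The "backwards" write order in `toSurrogates` is what keeps the destination untouched when there is room for only one of the two code units. A small sketch of that behaviour follows. `ToCharsSketch` and `encode` are hypothetical names, and the body mirrors the two methods above using the standard `java.lang.Character` helpers.

```java
final class ToCharsSketch {
    // Stores the UTF-16 form of codePoint at dst[dstIndex..] and returns the unit count.
    static int encode(int codePoint, char[] dst, int dstIndex) {
        if (Character.isBmpCodePoint(codePoint)) {
            dst[dstIndex] = (char) codePoint;
            return 1;
        } else if (Character.isValidCodePoint(codePoint)) {
            // Write the low surrogate first: if the index is out of range the
            // exception is thrown before anything has been stored.
            dst[dstIndex + 1] = Character.lowSurrogate(codePoint);
            dst[dstIndex] = Character.highSurrogate(codePoint);
            return 2;
        }
        throw new IllegalArgumentException();
    }

    public static void main(String[] args) {
        char[] buf = { 'x', 'y' };
        try {
            encode(0x1F600, buf, 1);   // needs two slots, only one is left
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("buf[1] is still '" + buf[1] + "'");
        }
        System.out.println(encode(0x1F600, buf, 0)); // fits: two units written
    }
}
```

Had the high surrogate been written first, the failed call would have left an unpaired surrogate in `buf[1]`.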

    /**
     * Returns the number of Unicode code points in the text range of
     * the specified char sequence. The text range begins at the
     * specified {@code beginIndex} and extends to the
     * {@code char} at index {@code endIndex - 1}. Thus the
     * length (in {@code char}s) of the text range is
     * {@code endIndex-beginIndex}. Unpaired surrogates within
     * the text range count as one code point each.
     *
     * @param seq the char sequence
     * @param beginIndex the index to the first {@code char} of
     * the text range.
     * @param endIndex the index after the last {@code char} of
     * the text range.
     * @return the number of Unicode code points in the specified text
     * range
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if the
     * {@code beginIndex} is negative, or {@code endIndex}
     * is larger than the length of the given sequence, or
     * {@code beginIndex} is larger than {@code endIndex}.
     */
    public static int codePointCount(CharSequence seq, int beginIndex, int endIndex) {
        int length = seq.length();
        if (beginIndex < 0 || endIndex > length || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException();
        }
        int n = endIndex - beginIndex;
        for (int i = beginIndex; i < endIndex; ) {
            if (isHighSurrogate(seq.charAt(i++)) && i < endIndex &&
                isLowSurrogate(seq.charAt(i))) {
                n--;
                i++;
            }
        }
        return n;
    }

    /**
     * Returns the number of Unicode code points in a subarray of the
     * {@code char} array argument. The {@code offset}
     * argument is the index of the first {@code char} of the
     * subarray and the {@code count} argument specifies the
     * length of the subarray in {@code char}s. Unpaired
     * surrogates within the subarray count as one code point each.
     *
     * @param a the {@code char} array
     * @param offset the index of the first {@code char} in the
     * given {@code char} array
     * @param count the length of the subarray in {@code char}s
     * @return the number of Unicode code points in the specified subarray
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if {@code offset} or
     * {@code count} is negative, or if {@code offset +
     * count} is larger than the length of the given array.
     */
    public static int codePointCount(char[] a, int offset, int count) {
        if (count > a.length - offset || offset < 0 || count < 0) {
            throw new IndexOutOfBoundsException();
        }
        return codePointCountImpl(a, offset, count);
    }

    static int codePointCountImpl(char[] a, int offset, int count) {
        int endIndex = offset + count;
        int n = count;
        for (int i = offset; i < endIndex; ) {
            if (isHighSurrogate(a[i++]) && i < endIndex &&
                isLowSurrogate(a[i])) {
                n--;
                i++;
            }
        }
        return n;
    }
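
Both counting loops start from the number of `char`s and subtract one for every well-formed surrogate pair. The sketch below applies the same idea to a whole `CharSequence`. `CodePointCountSketch` and `count` are hypothetical names, and the standard `java.lang.Character` predicates stand in for the methods of this class.

```java
final class CodePointCountSketch {
    // Counts code points in seq: every char counts once, every surrogate pair counts once.
    static int count(CharSequence seq) {
        int end = seq.length();
        int n = end;
        for (int i = 0; i < end; ) {
            if (Character.isHighSurrogate(seq.charAt(i++)) && i < end
                    && Character.isLowSurrogate(seq.charAt(i))) {
                n--;   // the pair was counted as two chars, but is one code point
                i++;   // skip the low surrogate
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("a\uD83D\uDE00b")); // 4 chars, 3 code points
        System.out.println(count("\uD83Dx"));        // unpaired surrogate counts as 1
    }
}
```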

    /**
     * Returns the index within the given char sequence that is offset
     * from the given {@code index} by {@code codePointOffset}
     * code points. Unpaired surrogates within the text range given by
     * {@code index} and {@code codePointOffset} count as
     * one code point each.
     *
     * @param seq the char sequence
     * @param index the index to be offset
     * @param codePointOffset the offset in code points
     * @return the index within the char sequence
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if {@code index}
     * is negative or larger than the length of the char sequence,
     * or if {@code codePointOffset} is positive and the
     * subsequence starting with {@code index} has fewer than
     * {@code codePointOffset} code points, or if
     * {@code codePointOffset} is negative and the subsequence
     * before {@code index} has fewer than the absolute value
     * of {@code codePointOffset} code points.
     */
    public static int offsetByCodePoints(CharSequence seq, int index,
                                         int codePointOffset) {
        int length = seq.length();
        if (index < 0 || index > length) {
            throw new IndexOutOfBoundsException();
        }

        int x = index;
        if (codePointOffset >= 0) {
            int i;
            for (i = 0; x < length && i < codePointOffset; i++) {
                if (isHighSurrogate(seq.charAt(x++)) && x < length &&
                    isLowSurrogate(seq.charAt(x))) {
                    x++;
                }
            }
            if (i < codePointOffset) {
                throw new IndexOutOfBoundsException();
            }
        } else {
            int i;
            for (i = codePointOffset; x > 0 && i < 0; i++) {
                if (isLowSurrogate(seq.charAt(--x)) && x > 0 &&
                    isHighSurrogate(seq.charAt(x-1))) {
                    x--;
                }
            }
            if (i < 0) {
                throw new IndexOutOfBoundsException();
            }
        }
        return x;
    }

    /**
     * Returns the index within the given {@code char} subarray
     * that is offset from the given {@code index} by
     * {@code codePointOffset} code points. The
     * {@code start} and {@code count} arguments specify a
     * subarray of the {@code char} array. Unpaired surrogates
     * within the text range given by {@code index} and
     * {@code codePointOffset} count as one code point each.
     *
     * @param a the {@code char} array
     * @param start the index of the first {@code char} of the
     * subarray
     * @param count the length of the subarray in {@code char}s
     * @param index the index to be offset
     * @param codePointOffset the offset in code points
     * @return the index within the subarray
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException
     * if {@code start} or {@code count} is negative,
     * or if {@code start + count} is larger than the length of
     * the given array,
     * or if {@code index} is less than {@code start} or
     * larger than {@code start + count},
     * or if {@code codePointOffset} is positive and the text range
     * starting with {@code index} and ending with {@code start + count - 1}
     * has fewer than {@code codePointOffset} code
     * points,
     * or if {@code codePointOffset} is negative and the text range
     * starting with {@code start} and ending with {@code index - 1}
     * has fewer than the absolute value of
     * {@code codePointOffset} code points.
     */
    public static int offsetByCodePoints(char[] a, int start, int count,
                                         int index, int codePointOffset) {
        if (count > a.length-start || start < 0 || count < 0
            || index < start || index > start+count) {
            throw new IndexOutOfBoundsException();
        }
        return offsetByCodePointsImpl(a, start, count, index, codePointOffset);
    }

    static int offsetByCodePointsImpl(char[] a, int start, int count,
                                      int index, int codePointOffset) {
        int x = index;
        if (codePointOffset >= 0) {
            int limit = start + count;
            int i;
            for (i = 0; x < limit && i < codePointOffset; i++) {
                if (isHighSurrogate(a[x++]) && x < limit &&
                    isLowSurrogate(a[x])) {
                    x++;
                }
            }
            if (i < codePointOffset) {
                throw new IndexOutOfBoundsException();
            }
        } else {
            int i;
            for (i = codePointOffset; x > start && i < 0; i++) {
                if (isLowSurrogate(a[--x]) && x > start &&
                    isHighSurrogate(a[x-1])) {
                    x--;
                }
            }
            if (i < 0) {
                throw new IndexOutOfBoundsException();
            }
        }
        return x;
    }
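
A typical use of `offsetByCodePoints` is to cut a string after a number of code points without splitting a surrogate pair. The following sketch assumes a standard JVM and calls `java.lang.Character.offsetByCodePoints`. The class and the `firstCodePoints` helper are hypothetical names introduced for the example.

```java
final class OffsetSketch {
    // Returns the prefix of s that holds its first n code points.
    static String firstCodePoints(String s, int n) {
        int end = Character.offsetByCodePoints(s, 0, n);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";   // 'a', the pair for U+1F600, 'b'
        // Two code points forward from 0 cover three chars.
        System.out.println(firstCodePoints(s, 2).length());
        // Two code points back from the end land just after 'a'.
        System.out.println(Character.offsetByCodePoints(s, s.length(), -2));
        try {
            firstCodePoints(s, 4);     // the string has only three code points
        } catch (IndexOutOfBoundsException e) {
            System.out.println("offset past the end is rejected");
        }
    }
}
```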

    /**
     * Determines if the specified character is a lowercase character.
     * <p>
     * A character is lowercase if its general category type, provided
     * by {@code Character.getType(ch)}, is
     * {@code LOWERCASE_LETTER}, or it has contributory property
     * Other_Lowercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of lowercase characters:
     * <p><blockquote><pre>
     * a b c d e f g h i j k l m n o p q r s t u v w x y z
     * '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6'
     * '\u00E7' '\u00E8' '\u00E9' '\u00EA' '\u00EB' '\u00EC' '\u00ED' '\u00EE'
     * '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6'
     * '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF'
     * </pre></blockquote>
     * <p> Many other Unicode characters are lowercase too.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isLowerCase(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is lowercase;
     * {@code false} otherwise.
     * @see Character#isLowerCase(char)
     * @see Character#isTitleCase(char)
     * @see Character#toLowerCase(char)
     * @see Character#getType(char)
     */
    public static boolean isLowerCase(char ch) {
        // Approximation: true for every char that toLowerCase leaves unchanged,
        // so caseless chars such as digits and punctuation pass as well.
        return ch == toLowerCase(ch);
    }

    /**
     * Determines if the specified character is an uppercase character.
     * <p>
     * A character is uppercase if its general category type, provided by
     * {@code Character.getType(ch)}, is {@code UPPERCASE_LETTER},
     * or it has contributory property Other_Uppercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of uppercase characters:
     * <p><blockquote><pre>
     * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
     * '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7'
     * '\u00C8' '\u00C9' '\u00CA' '\u00CB' '\u00CC' '\u00CD' '\u00CE' '\u00CF'
     * '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8'
     * '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE'
     * </pre></blockquote>
     * <p> Many other Unicode characters are uppercase too.<p>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isUpperCase(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is uppercase;
     * {@code false} otherwise.
     * @see Character#isLowerCase(char)
     * @see Character#isTitleCase(char)
     * @see Character#toUpperCase(char)
     * @see Character#getType(char)
     */
    public static boolean isUpperCase(char ch) {
        // Approximation: true for every char that toUpperCase leaves unchanged,
        // so caseless chars such as digits and punctuation pass as well.
        return ch == toUpperCase(ch);
    }
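
The one-line bodies of `isLowerCase(char)` and `isUpperCase(char)` compare a character with its case-converted form instead of consulting the general category that the Javadoc describes. The sketch below shows where the two notions diverge. `CaseCheckSketch` and its methods are hypothetical names, and the conversions used are those of a standard JVM, which may differ from the ones available in the browser runtime.

```java
final class CaseCheckSketch {
    // Same test as the body of isLowerCase(char) above.
    static boolean approxIsLowerCase(char ch) {
        return ch == Character.toLowerCase(ch);
    }

    // Same test as the body of isUpperCase(char) above.
    static boolean approxIsUpperCase(char ch) {
        return ch == Character.toUpperCase(ch);
    }

    public static void main(String[] args) {
        char[] samples = { 'a', 'A', '7', '-' };
        for (char ch : samples) {
            System.out.println("'" + ch + "'"
                + " approxLower=" + approxIsLowerCase(ch)
                + " jdkLower=" + Character.isLowerCase(ch)
                + " approxUpper=" + approxIsUpperCase(ch)
                + " jdkUpper=" + Character.isUpperCase(ch));
        }
    }
}
```

For letters the two agree. For caseless characters such as `'7'` the comparison reports both "lowercase" and "uppercase", while the category-based JDK methods report neither.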

    /**
     * Determines if the specified character is a titlecase character.
     * <p>
     * A character is a titlecase character if its general
     * category type, provided by {@code Character.getType(ch)},
     * is {@code TITLECASE_LETTER}.
     * <p>
     * Some characters look like pairs of Latin letters. For example, there
     * is an uppercase letter that looks like "LJ" and has a corresponding
     * lowercase letter that looks like "lj". A third form, which looks like "Lj",
     * is the appropriate form to use when rendering a word in lowercase
     * with initial capitals, as for a book title.
     * <p>
     * These are some of the Unicode characters for which this method returns
     * {@code true}:
     * <ul>
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON}
     * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z}
     * </ul>
     * <p> Many other Unicode characters are titlecase too.<p>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isTitleCase(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is titlecase;
     * {@code false} otherwise.
     * @see Character#isLowerCase(char)
     * @see Character#isUpperCase(char)
     * @see Character#toTitleCase(char)
     * @see Character#getType(char)
     */
    public static boolean isTitleCase(char ch) {
        return isTitleCase((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a titlecase character.
     * <p>
     * A character is a titlecase character if its general
     * category type, provided by {@link Character#getType(int) getType(codePoint)},
     * is {@code TITLECASE_LETTER}.
     * <p>
     * Some characters look like pairs of Latin letters. For example, there
     * is an uppercase letter that looks like "LJ" and has a corresponding
     * lowercase letter that looks like "lj". A third form, which looks like "Lj",
     * is the appropriate form to use when rendering a word in lowercase
     * with initial capitals, as for a book title.
     * <p>
     * These are some of the Unicode characters for which this method returns
     * {@code true}:
     * <ul>
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON}
     * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z}
     * </ul>
     * <p> Many other Unicode characters are titlecase too.<p>
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is titlecase;
     * {@code false} otherwise.
     * @see Character#isLowerCase(int)
     * @see Character#isUpperCase(int)
     * @see Character#toTitleCase(int)
     * @see Character#getType(int)
     */
    public static boolean isTitleCase(int codePoint) {
        return getType(codePoint) == Character.TITLECASE_LETTER;
    }

    /**
     * Determines if the specified character is a digit.
     * <p>
     * A character is a digit if its general category type, provided
     * by {@code Character.getType(ch)}, is
     * {@code DECIMAL_DIGIT_NUMBER}.
     * <p>
     * Some Unicode character ranges that contain digits:
     * <ul>
     * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'},
     * ISO-LATIN-1 digits ({@code '0'} through {@code '9'})
     * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'},
     * Arabic-Indic digits
     * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'},
     * Extended Arabic-Indic digits
     * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'},
     * Devanagari digits
     * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'},
     * Fullwidth digits
     * </ul>
     *
     * Many other character ranges contain digits as well.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isDigit(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a digit;
     * {@code false} otherwise.
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Character#getType(char)
     */
    public static boolean isDigit(char ch) {
        return String.valueOf(ch).matches("\\d");
    }

    /**
     * Determines if the specified character (Unicode code point) is a digit.
     * <p>
     * A character is a digit if its general category type, provided
     * by {@link Character#getType(int) getType(codePoint)}, is
     * {@code DECIMAL_DIGIT_NUMBER}.
     * <p>
     * Some Unicode character ranges that contain digits:
     * <ul>
     * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'},
     * ISO-LATIN-1 digits ({@code '0'} through {@code '9'})
     * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'},
     * Arabic-Indic digits
     * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'},
     * Extended Arabic-Indic digits
     * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'},
     * Devanagari digits
     * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'},
     * Fullwidth digits
     * </ul>
     *
     * Many other character ranges contain digits as well.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a digit;
     * {@code false} otherwise.
     * @see Character#forDigit(int, int)
     * @see Character#getType(int)
     */
    public static boolean isDigit(int codePoint) {
        return fromCodeChars(codePoint).matches("\\d");
    }

    // String.fromCharCode works on UTF-16 code units, so values above
    // 0xFFFF are truncated rather than turned into a surrogate pair.
    @JavaScriptBody(args = "c", body = "return String.fromCharCode(c);")
    private native static String fromCodeChars(int codePoint);

    /**
     * Determines if a character is defined in Unicode.
     * <p>
     * A character is defined if at least one of the following is true:
     * <ul>
     * <li>It has an entry in the UnicodeData file.
     * <li>It has a value in a range defined by the UnicodeData file.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isDefined(int)} method.
     *
     * @param ch the character to be tested
     * @return {@code true} if the character has a defined meaning
     * in Unicode; {@code false} otherwise.
     * @see Character#isDigit(char)
     * @see Character#isLetter(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isLowerCase(char)
     * @see Character#isTitleCase(char)
     * @see Character#isUpperCase(char)
     */
    public static boolean isDefined(char ch) {
        return isDefined((int)ch);
    }

    /**
     * Determines if a character (Unicode code point) is defined in Unicode.
     * <p>
     * A character is defined if at least one of the following is true:
     * <ul>
     * <li>It has an entry in the UnicodeData file.
     * <li>It has a value in a range defined by the UnicodeData file.
     * </ul>
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character has a defined meaning
     * in Unicode; {@code false} otherwise.
     * @see Character#isDigit(int)
     * @see Character#isLetter(int)
     * @see Character#isLetterOrDigit(int)
     * @see Character#isLowerCase(int)
     * @see Character#isTitleCase(int)
     * @see Character#isUpperCase(int)
     */
    public static boolean isDefined(int codePoint) {
        return getType(codePoint) != Character.UNASSIGNED;
    }

    /**
     * Determines if the specified character is a letter.
     * <p>
     * A character is considered to be a letter if its general
     * category type, provided by {@code Character.getType(ch)},
     * is any of the following:
     * <ul>
     * <li> {@code UPPERCASE_LETTER}
     * <li> {@code LOWERCASE_LETTER}
     * <li> {@code TITLECASE_LETTER}
     * <li> {@code MODIFIER_LETTER}
     * <li> {@code OTHER_LETTER}
     * </ul>
     *
     * Not all letters have case. Many characters are
     * letters but are neither uppercase nor lowercase nor titlecase.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isLetter(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a letter;
     * {@code false} otherwise.
     * @see Character#isDigit(char)
     * @see Character#isJavaIdentifierStart(char)
     * @see Character#isJavaLetter(char)
     * @see Character#isJavaLetterOrDigit(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isLowerCase(char)
     * @see Character#isTitleCase(char)
     * @see Character#isUnicodeIdentifierStart(char)
     * @see Character#isUpperCase(char)
     */
    public static boolean isLetter(char ch) {
        return String.valueOf(ch).matches("\\w") && !isDigit(ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a letter.
     * <p>
     * A character is considered to be a letter if its general
     * category type, provided by {@link Character#getType(int) getType(codePoint)},
     * is any of the following:
     * <ul>
     * <li> {@code UPPERCASE_LETTER}
     * <li> {@code LOWERCASE_LETTER}
     * <li> {@code TITLECASE_LETTER}
     * <li> {@code MODIFIER_LETTER}
     * <li> {@code OTHER_LETTER}
     * </ul>
     *
     * Not all letters have case. Many characters are
     * letters but are neither uppercase nor lowercase nor titlecase.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a letter;
     * {@code false} otherwise.
     * @see Character#isDigit(int)
     * @see Character#isJavaIdentifierStart(int)
     * @see Character#isLetterOrDigit(int)
     * @see Character#isLowerCase(int)
     * @see Character#isTitleCase(int)
     * @see Character#isUnicodeIdentifierStart(int)
     * @see Character#isUpperCase(int)
     */
    public static boolean isLetter(int codePoint) {
        return fromCodeChars(codePoint).matches("\\w") && !isDigit(codePoint);
    }

    /**
     * Determines if the specified character is a letter or digit.
     * <p>
     * A character is considered to be a letter or digit if either
     * {@code Character.isLetter(char ch)} or
     * {@code Character.isDigit(char ch)} returns
     * {@code true} for the character.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isLetterOrDigit(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a letter or digit;
     * {@code false} otherwise.
     * @see Character#isDigit(char)
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isJavaLetter(char)
     * @see Character#isJavaLetterOrDigit(char)
     * @see Character#isLetter(char)
     * @see Character#isUnicodeIdentifierPart(char)
     */
    public static boolean isLetterOrDigit(char ch) {
        return String.valueOf(ch).matches("\\w");
    }

    /**
     * Determines if the specified character (Unicode code point) is a letter or digit.
     * <p>
     * A character is considered to be a letter or digit if either
     * {@link #isLetter(int) isLetter(codePoint)} or
     * {@link #isDigit(int) isDigit(codePoint)} returns
     * {@code true} for the character.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a letter or digit;
     * {@code false} otherwise.
     * @see Character#isDigit(int)
     * @see Character#isJavaIdentifierPart(int)
     * @see Character#isLetter(int)
     * @see Character#isUnicodeIdentifierPart(int)
     */
    public static boolean isLetterOrDigit(int codePoint) {
        return fromCodeChars(codePoint).matches("\\w");
    }
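
The digit and letter tests above are expressed through the regular-expression classes `\d` and `\w`, which are narrower than the Unicode categories named in the Javadoc. The sketch below evaluates the same patterns with `java.util.regex` on a standard JVM, which is an assumption, since in the browser the matching may be delegated to the JavaScript engine. `RegexClassSketch` and its methods are hypothetical names.

```java
final class RegexClassSketch {
    // Same pattern as isLetterOrDigit(char) above.
    static boolean wordChar(char ch) {
        return String.valueOf(ch).matches("\\w");
    }

    // Same pattern as isDigit(char) above.
    static boolean digitChar(char ch) {
        return String.valueOf(ch).matches("\\d");
    }

    // Same combination as isLetter(char) above.
    static boolean letterByRegex(char ch) {
        return wordChar(ch) && !digitChar(ch);
    }

    public static void main(String[] args) {
        // Underscore, e with acute accent, Arabic-Indic digit zero, and a plain letter.
        char[] samples = { '_', '\u00E9', '\u0660', 'q' };
        for (char ch : samples) {
            System.out.println(Integer.toHexString(ch)
                + " regexLetter=" + letterByRegex(ch)
                + " jdkLetter=" + Character.isLetter(ch)
                + " regexDigit=" + digitChar(ch)
                + " jdkDigit=" + Character.isDigit(ch));
        }
    }
}
```

Under the default `java.util.regex` semantics `\w` is `[a-zA-Z_0-9]` and `\d` is `[0-9]`, so the underscore is reported as a letter while accented letters and non-ASCII digits are not recognised.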
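
    // Not implemented in this port: callers such as isTitleCase(int) and
    // isDefined(int) therefore throw UnsupportedOperationException as well.
    static int getType(int x) {
        throw new UnsupportedOperationException();
    }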

    /**
     * Determines if the specified character is
     * permissible as the first character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
     * <li> {@code ch} is a currency symbol (such as {@code '$'})
     * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isJavaIdentifierStart(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may start a Java identifier;
     * {@code false} otherwise.
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isLetter(char)
     * @see Character#isUnicodeIdentifierStart(char)
     * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     */
    public static boolean isJavaIdentifierStart(char ch) {
        return isJavaIdentifierStart((int)ch);
    }

    /**
     * Determines if the character (Unicode code point) is
     * permissible as the first character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(int) isLetter(codePoint)}
     * returns {@code true}
     * <li> {@link #getType(int) getType(codePoint)}
     * returns {@code LETTER_NUMBER}
     * <li> the referenced character is a currency symbol (such as {@code '$'})
     * <li> the referenced character is a connecting punctuation character
     * (such as {@code '_'}).
     * </ul>
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character may start a Java identifier;
     * {@code false} otherwise.
     * @see Character#isJavaIdentifierPart(int)
     * @see Character#isLetter(int)
     * @see Character#isUnicodeIdentifierStart(int)
     * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     */
    public static boolean isJavaIdentifierStart(int codePoint) {
        return
            ('A' <= codePoint && codePoint <= 'Z') ||
            ('a' <= codePoint && codePoint <= 'z');
    }

    /**
     * Determines if the specified character may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if any of the following
     * are true:
     * <ul>
     * <li> it is a letter
     * <li> it is a currency symbol (such as {@code '$'})
     * <li> it is a connecting punctuation character (such as {@code '_'})
     * <li> it is a digit
     * <li> it is a numeric letter (such as a Roman numeral character)
     * <li> it is a combining mark
     * <li> it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     *      {@code true} for the character
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isJavaIdentifierPart(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may be part of a
     *         Java identifier; {@code false} otherwise.
     * @see Character#isIdentifierIgnorable(char)
     * @see Character#isJavaIdentifierStart(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isUnicodeIdentifierPart(char)
     * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     */
    public static boolean isJavaIdentifierPart(char ch) {
        return isJavaIdentifierPart((int)ch);
    }

    /**
     * Determines if the character (Unicode code point) may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if any of the following
     * are true:
     * <ul>
     * <li> it is a letter
     * <li> it is a currency symbol (such as {@code '$'})
     * <li> it is a connecting punctuation character (such as {@code '_'})
     * <li> it is a digit
     * <li> it is a numeric letter (such as a Roman numeral character)
     * <li> it is a combining mark
     * <li> it is a non-spacing mark
     * <li> {@link #isIdentifierIgnorable(int)
     *      isIdentifierIgnorable(codePoint)} returns {@code true} for
     *      the character
     * </ul>
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character may be part of a
     *         Java identifier; {@code false} otherwise.
     * @see Character#isIdentifierIgnorable(int)
     * @see Character#isJavaIdentifierStart(int)
     * @see Character#isLetterOrDigit(int)
     * @see Character#isUnicodeIdentifierPart(int)
     * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     */
    public static boolean isJavaIdentifierPart(int codePoint) {
        // Simplified check: ASCII letters, ASCII digits, and '$'.
        return isJavaIdentifierStart(codePoint) ||
            ('0' <= codePoint && codePoint <= '9') || codePoint == '$';
    }
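
/*
Sketch, not part of the original source: one way the two code point checks
above could be combined to validate a whole identifier. The class and method
names are hypothetical, and the ranges simply mirror the ASCII-only checks in
this file, so an identifier that starts with '_' is rejected here even though
the documented contract allows it.

```java
final class AsciiIdentifierSketch {
    // Mirrors the start check in this file: A-Z or a-z only.
    static boolean isStart(int cp) {
        return ('A' <= cp && cp <= 'Z') || ('a' <= cp && cp <= 'z');
    }

    // Mirrors the part check in this file: a start character, 0-9, or '$'.
    static boolean isPart(int cp) {
        return isStart(cp) || ('0' <= cp && cp <= '9') || cp == '$';
    }

    // Hypothetical helper: true if the first char passes isStart and the rest pass isPart.
    static boolean isIdentifier(String s) {
        if (s.isEmpty() || !isStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("total$1")); // true
        System.out.println(isIdentifier("_hidden")); // false
    }
}
```
*/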

    /**
     * Converts the character argument to lowercase using case
     * mapping information from the UnicodeData file.
     * <p>
     * Note that
     * {@code Character.isLowerCase(Character.toLowerCase(ch))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toLowerCase()} should be used to map
     * characters to lowercase. {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #toLowerCase(int)} method.
     *
     * @param ch the character to be converted.
     * @return the lowercase equivalent of the character, if any;
     *         otherwise, the character itself.
     * @see Character#isLowerCase(char)
     * @see String#toLowerCase()
     */
    public static char toLowerCase(char ch) {
        // Delegates to the String case mapping and keeps the first char of the result.
        return String.valueOf(ch).toLowerCase().charAt(0);
    }

    /**
     * Converts the character argument to uppercase using case mapping
     * information from the UnicodeData file.
     * <p>
     * Note that
     * {@code Character.isUpperCase(Character.toUpperCase(ch))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toUpperCase()} should be used to map
     * characters to uppercase. {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #toUpperCase(int)} method.
     *
     * @param ch the character to be converted.
     * @return the uppercase equivalent of the character, if any;
     *         otherwise, the character itself.
     * @see Character#isUpperCase(char)
     * @see String#toUpperCase()
     */
    public static char toUpperCase(char ch) {
        // Delegates to the String case mapping and keeps the first char of the result.
        return String.valueOf(ch).toUpperCase().charAt(0);
    }
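
/*
Sketch, not part of the original source: the 1:M mapping mentioned above can
be observed with U+00DF (LATIN SMALL LETTER SHARP S) on a standard JDK. The
class and method names are hypothetical, and Locale.ROOT is an assumption made
to keep the result independent of the default locale. The last helper imitates
the delegate-to-String approach used in this file, which keeps only the first
char of an expanded result.

```java
final class UpperCaseMappingSketch {
    // Per-char mapping on a standard JDK: a single char cannot hold a two-char result.
    static char perChar(char ch) {
        return Character.toUpperCase(ch);
    }

    // String mapping: may expand one char into several.
    static String perString(char ch) {
        return String.valueOf(ch).toUpperCase(java.util.Locale.ROOT);
    }

    // Delegation in the style of this file: first char of the String mapping.
    static char firstOfString(char ch) {
        return perString(ch).charAt(0);
    }

    public static void main(String[] args) {
        char sharpS = (char) 0x00DF;
        System.out.println((int) perChar(sharpS) == 0x00DF); // true
        System.out.println(perString(sharpS));               // SS
        System.out.println(firstOfString(sharpS));           // S
    }
}
```
*/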

    /**
     * Returns the numeric value of the character {@code ch} in the
     * specified radix.
     * <p>
     * If the radix is not in the range {@code MIN_RADIX} ≤
     * {@code radix} ≤ {@code MAX_RADIX} or if the
     * value of {@code ch} is not a valid digit in the specified
     * radix, {@code -1} is returned. A character is a valid digit
     * if at least one of the following is true:
     * <ul>
     * <li>The method {@code isDigit} is {@code true} of the character
     *     and the Unicode decimal digit value of the character (or its
     *     single-character decomposition) is less than the specified radix.
     *     In this case the decimal digit value is returned.
     * <li>The character is one of the uppercase Latin letters
     *     {@code 'A'} through {@code 'Z'} and its code is less than
     *     {@code radix + 'A' - 10}.
     *     In this case, {@code ch - 'A' + 10}
     *     is returned.
     * <li>The character is one of the lowercase Latin letters
     *     {@code 'a'} through {@code 'z'} and its code is less than
     *     {@code radix + 'a' - 10}.
     *     In this case, {@code ch - 'a' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth uppercase Latin letters A
     *     ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF21' - 10}.
     *     In this case, {@code ch - '\u005CuFF21' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth lowercase Latin letters a
     *     ({@code '\u005CuFF41'}) through z ({@code '\u005CuFF5A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF41' - 10}.
     *     In this case, {@code ch - '\u005CuFF41' + 10}
     *     is returned.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #digit(int, int)} method.
     *
     * @param ch the character to be converted.
     * @param radix the radix.
     * @return the numeric value represented by the character in the
     *         specified radix.
     * @see Character#forDigit(int, int)
     * @see Character#isDigit(char)
     */
    public static int digit(char ch, int radix) {
        return digit((int)ch, radix);
    }

    /**
     * Returns the numeric value of the specified character (Unicode
     * code point) in the specified radix.
     *
     * <p>If the radix is not in the range {@code MIN_RADIX} ≤
     * {@code radix} ≤ {@code MAX_RADIX} or if the
     * character is not a valid digit in the specified
     * radix, {@code -1} is returned. A character is a valid digit
     * if at least one of the following is true:
     * <ul>
     * <li>The method {@link #isDigit(int) isDigit(codePoint)} is {@code true} of the character
     *     and the Unicode decimal digit value of the character (or its
     *     single-character decomposition) is less than the specified radix.
     *     In this case the decimal digit value is returned.
     * <li>The character is one of the uppercase Latin letters
     *     {@code 'A'} through {@code 'Z'} and its code is less than
     *     {@code radix + 'A' - 10}.
     *     In this case, {@code codePoint - 'A' + 10}
     *     is returned.
     * <li>The character is one of the lowercase Latin letters
     *     {@code 'a'} through {@code 'z'} and its code is less than
     *     {@code radix + 'a' - 10}.
     *     In this case, {@code codePoint - 'a' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth uppercase Latin letters A
     *     ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF21' - 10}.
     *     In this case,
     *     {@code codePoint - '\u005CuFF21' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth lowercase Latin letters a
     *     ({@code '\u005CuFF41'}) through z ({@code '\u005CuFF5A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF41' - 10}.
     *     In this case,
     *     {@code codePoint - '\u005CuFF41' + 10}
     *     is returned.
     * </ul>
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @param radix the radix.
     * @return the numeric value represented by the character in the
     *         specified radix.
     * @see Character#forDigit(int, int)
     * @see Character#isDigit(int)
     */
    public static int digit(int codePoint, int radix) {
        throw new UnsupportedOperationException();
    }
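
/*
Sketch, not part of the original source: the method above is unsupported in
this file, so the following only illustrates the documented rules for a
limited set of inputs. It assumes ASCII digits stand in for the isDigit case
and covers the four Latin letter ranges from the list; the class name and the
local radix constants are hypothetical.

```java
final class DigitSketch {
    static final int MIN_RADIX = 2;
    static final int MAX_RADIX = 36;

    // Returns the digit value of cp in the given radix, or -1 if it is not a valid digit.
    static int digit(int cp, int radix) {
        if (radix < MIN_RADIX || radix > MAX_RADIX) {
            return -1;
        }
        int value;
        if ('0' <= cp && cp <= '9') {
            value = cp - '0';                 // ASCII decimal digit
        } else if ('A' <= cp && cp <= 'Z') {
            value = cp - 'A' + 10;            // uppercase Latin letter
        } else if ('a' <= cp && cp <= 'z') {
            value = cp - 'a' + 10;            // lowercase Latin letter
        } else if (0xFF21 <= cp && cp <= 0xFF3A) {
            value = cp - 0xFF21 + 10;         // fullwidth uppercase Latin letter
        } else if (0xFF41 <= cp && cp <= 0xFF5A) {
            value = cp - 0xFF41 + 10;         // fullwidth lowercase Latin letter
        } else {
            return -1;
        }
        // A digit is valid only if its value is below the radix.
        return value < radix ? value : -1;
    }

    public static void main(String[] args) {
        System.out.println(digit('f', 16)); // 15
        System.out.println(digit('9', 8));  // -1
    }
}
```
*/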

    /**
     * Returns the {@code int} value that the specified Unicode
     * character represents. For example, the character
     * {@code '\u005Cu216C'} (the Roman numeral fifty) will return
     * an int with a value of 50.
     * <p>
     * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through
     * {@code '\u005Cu005A'}), lowercase
     * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and
     * full width variant ({@code '\u005CuFF21'} through
     * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through
     * {@code '\u005CuFF5A'}) forms have numeric values from 10
     * through 35. This is independent of the Unicode specification,
     * which does not assign numeric values to these {@code char}
     * values.
     * <p>
     * If the character does not have a numeric value, then -1 is returned.
     * If the character has a numeric value that cannot be represented as a
     * nonnegative integer (for example, a fractional value), then -2
     * is returned.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #getNumericValue(int)} method.
     *
     * @param ch the character to be converted.
     * @return the numeric value of the character, as a nonnegative {@code int}
     *         value; -2 if the character has a numeric value that is not a
     *         nonnegative integer; -1 if the character has no numeric value.
     * @see Character#forDigit(int, int)
     * @see Character#isDigit(char)
     */
    public static int getNumericValue(char ch) {
        return getNumericValue((int)ch);
    }

    /**
     * Returns the {@code int} value that the specified
     * character (Unicode code point) represents. For example, the character
     * {@code '\u005Cu216C'} (the Roman numeral fifty) will return
     * an {@code int} with a value of 50.
     * <p>
     * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through
     * {@code '\u005Cu005A'}), lowercase
     * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and
     * full width variant ({@code '\u005CuFF21'} through
     * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through
     * {@code '\u005CuFF5A'}) forms have numeric values from 10
     * through 35. This is independent of the Unicode specification,
     * which does not assign numeric values to these {@code char}
     * values.
     * <p>
     * If the character does not have a numeric value, then -1 is returned.
     * If the character has a numeric value that cannot be represented as a
     * nonnegative integer (for example, a fractional value), then -2
     * is returned.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return the numeric value of the character, as a nonnegative {@code int}
     *         value; -2 if the character has a numeric value that is not a
     *         nonnegative integer; -1 if the character has no numeric value.
     * @see Character#forDigit(int, int)
     * @see Character#isDigit(int)
     */
    public static int getNumericValue(int codePoint) {
        throw new UnsupportedOperationException();
    }

    /**
     * Determines if the specified character is ISO-LATIN-1 white space.
     * This method returns {@code true} for the following five
     * characters only:
     * <table>
     * <tr><td>{@code '\t'}</td> <td>{@code U+0009}</td>
     *     <td>{@code HORIZONTAL TABULATION}</td></tr>
     * <tr><td>{@code '\n'}</td> <td>{@code U+000A}</td>
     *     <td>{@code NEW LINE}</td></tr>
     * <tr><td>{@code '\f'}</td> <td>{@code U+000C}</td>
     *     <td>{@code FORM FEED}</td></tr>
     * <tr><td>{@code '\r'}</td> <td>{@code U+000D}</td>
     *     <td>{@code CARRIAGE RETURN}</td></tr>
     * <tr><td>{@code ' '}</td> <td>{@code U+0020}</td>
     *     <td>{@code SPACE}</td></tr>
     * </table>
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is ISO-LATIN-1 white
     *         space; {@code false} otherwise.
     * @see Character#isSpaceChar(char)
     * @see Character#isWhitespace(char)
     * @deprecated Replaced by isWhitespace(char).
     */
    @Deprecated
    public static boolean isSpace(char ch) {
        // Bit n of the mask is set when code n is one of the five characters above.
        return (ch <= 0x0020) &&
            (((((1L << 0x0009) |
            (1L << 0x000A) |
            (1L << 0x000C) |
            (1L << 0x000D) |
            (1L << 0x0020)) >> ch) & 1L) != 0);
    }
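
/*
Sketch, not part of the original source: the bit mask test above packs the
five characters listed in the documentation (U+0009, U+000A, U+000C, U+000D,
U+0020) into one long and tests a single bit. The hypothetical class below
compares that form against plain equality checks for every char value.

```java
final class SpaceMaskSketch {
    // One bit per accepted character; bit n stands for code n.
    static final long MASK =
        (1L << 0x09) | (1L << 0x0A) | (1L << 0x0C) | (1L << 0x0D) | (1L << 0x20);

    // Range guard first, so the shift distance never exceeds 0x20.
    static boolean viaMask(char ch) {
        return ch <= 0x20 && ((MASK >> ch) & 1L) != 0;
    }

    // Straightforward form of the same test.
    static boolean viaCompare(char ch) {
        return ch == '\t' || ch == '\n' || ch == '\f' || ch == '\r' || ch == ' ';
    }

    // Checks that both forms agree for all 65536 char values.
    static boolean agree() {
        for (int c = 0; c <= 0xFFFF; c++) {
            if (viaMask((char) c) != viaCompare((char) c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agree()); // true
    }
}
```
*/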

    /**
     * Determines if the specified character is white space according to Java.
     * A character is a Java whitespace character if and only if it satisfies
     * one of the following criteria:
     * <ul>
     * <li> It is a Unicode space character ({@code SPACE_SEPARATOR},
     *      {@code LINE_SEPARATOR}, or {@code PARAGRAPH_SEPARATOR})
     *      but is not also a non-breaking space ({@code '\u005Cu00A0'},
     *      {@code '\u005Cu2007'}, {@code '\u005Cu202F'}).
     * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION.
     * <li> It is {@code '\u005Cn'}, U+000A LINE FEED.
     * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION.
     * <li> It is {@code '\u005Cf'}, U+000C FORM FEED.
     * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN.
     * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR.
     * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR.
     * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR.
     * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isWhitespace(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a Java whitespace
     *         character; {@code false} otherwise.
     * @see Character#isSpaceChar(char)
     */
    public static boolean isWhitespace(char ch) {
        return isWhitespace((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is
     * white space according to Java. A character is a Java
     * whitespace character if and only if it satisfies one of the
     * following criteria:
     * <ul>
     * <li> It is a Unicode space character ({@link #SPACE_SEPARATOR},
     *      {@link #LINE_SEPARATOR}, or {@link #PARAGRAPH_SEPARATOR})
     *      but is not also a non-breaking space ({@code '\u005Cu00A0'},
     *      {@code '\u005Cu2007'}, {@code '\u005Cu202F'}).
     * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION.
     * <li> It is {@code '\u005Cn'}, U+000A LINE FEED.
     * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION.
     * <li> It is {@code '\u005Cf'}, U+000C FORM FEED.
     * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN.
     * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR.
     * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR.
     * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR.
     * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR.
     * </ul>
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a Java whitespace
     *         character; {@code false} otherwise.
     * @see Character#isSpaceChar(int)
     */
    public static boolean isWhitespace(int codePoint) {
        throw new UnsupportedOperationException();
    }

    /**
     * Determines if the specified character is an ISO control
     * character. A character is considered to be an ISO control
     * character if its code is in the range {@code '\u005Cu0000'}
     * through {@code '\u005Cu001F'} or in the range
     * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isISOControl(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is an ISO control character;
     *         {@code false} otherwise.
     *
     * @see Character#isSpaceChar(char)
     * @see Character#isWhitespace(char)
     */
    public static boolean isISOControl(char ch) {
        return isISOControl((int)ch);
    }

    /**
     * Determines if the referenced character (Unicode code point) is an ISO control
     * character. A character is considered to be an ISO control
     * character if its code is in the range {@code '\u005Cu0000'}
     * through {@code '\u005Cu001F'} or in the range
     * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is an ISO control character;
     *         {@code false} otherwise.
     * @see Character#isSpaceChar(int)
     * @see Character#isWhitespace(int)
     */
    public static boolean isISOControl(int codePoint) {
        // Optimized form of:
        //     (codePoint >= 0x00 && codePoint <= 0x1F) ||
        //     (codePoint >= 0x7F && codePoint <= 0x9F);
        return codePoint <= 0x9F &&
            (codePoint >= 0x7F || (codePoint >>> 5 == 0));
    }
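
/*
Sketch, not part of the original source: a brute-force check that the
optimized expression above matches the two-range form given in its comment.
The unsigned shift by 5 is zero exactly for values 0 through 0x1F, and it is
non-zero for negative inputs, so those are rejected as well. The class and
method names are hypothetical.

```java
final class IsoControlSketch {
    // The optimized expression used in the method above.
    static boolean optimized(int cp) {
        return cp <= 0x9F && (cp >= 0x7F || (cp >>> 5 == 0));
    }

    // The readable two-range form from the comment.
    static boolean ranges(int cp) {
        return (cp >= 0x00 && cp <= 0x1F) || (cp >= 0x7F && cp <= 0x9F);
    }

    // Compares both forms for -1 and every Unicode code point.
    static boolean agreeOnAllCodePoints() {
        for (int cp = -1; cp <= 0x10FFFF; cp++) {
            if (optimized(cp) != ranges(cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnAllCodePoints()); // true
    }
}
```
*/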

    /**
     * Determines the character representation for a specific digit in
     * the specified radix. If the value of {@code radix} is not a
     * valid radix, or the value of {@code digit} is not a valid
     * digit in the specified radix, the null character
     * ({@code '\u005Cu0000'}) is returned.
     * <p>
     * The {@code radix} argument is valid if it is greater than or
     * equal to {@code MIN_RADIX} and less than or equal to
     * {@code MAX_RADIX}. The {@code digit} argument is valid if
     * {@code 0 <= digit < radix}.
     * <p>
     * If the digit is less than 10, then
     * {@code '0' + digit} is returned. Otherwise, the value
     * {@code 'a' + digit - 10} is returned.
     *
     * @param digit the number to convert to a character.
     * @param radix the radix.
     * @return the {@code char} representation of the specified digit
     *         in the specified radix.
     * @see Character#MIN_RADIX
     * @see Character#MAX_RADIX
     * @see Character#digit(char, int)
     */
    public static char forDigit(int digit, int radix) {
        // An invalid digit or radix yields the null character.
        if ((digit >= radix) || (digit < 0)) {
            return '\0';
        }
        if ((radix < Character.MIN_RADIX) || (radix > Character.MAX_RADIX)) {
            return '\0';
        }
        if (digit < 10) {
            return (char)('0' + digit);
        }
        return (char)('a' - 10 + digit);
    }
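
/*
Sketch, not part of the original source: a typical use of forDigit is to
render a non-negative int in a given radix one digit at a time. The helper
below is hypothetical and calls the standard JDK Character.forDigit, which
follows the same documented contract as the method above.

```java
final class ForDigitSketch {
    // Hypothetical helper: renders a non-negative value in the given radix.
    static String toRadixString(int value, int radix) {
        if (value < 0 || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new IllegalArgumentException("unsupported value or radix");
        }
        if (value == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            // The remainder is always a valid digit for this radix.
            sb.append(Character.forDigit(value % radix, radix));
            value /= radix;
        }
        // Digits were produced least significant first.
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(toRadixString(255, 16)); // ff
        System.out.println(toRadixString(5, 2));    // 101
    }
}
```
*/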

    /**
     * Compares two {@code Character} objects numerically.
     *
     * @param anotherCharacter the {@code Character} to be compared.
     *
     * @return the value {@code 0} if the argument {@code Character}
     *         is equal to this {@code Character}; a value less than
     *         {@code 0} if this {@code Character} is numerically less
     *         than the {@code Character} argument; and a value greater than
     *         {@code 0} if this {@code Character} is numerically greater
     *         than the {@code Character} argument (unsigned comparison).
     *         Note that this is strictly a numerical comparison; it is not
     *         locale-dependent.
     */
    public int compareTo(Character anotherCharacter) {
        return compare(this.value, anotherCharacter.value);
    }

    /**
     * Compares two {@code char} values numerically.
     * The value returned is identical to what would be returned by:
     * <pre>
     *    Character.valueOf(x).compareTo(Character.valueOf(y))
     * </pre>
     *
     * @param x the first {@code char} to compare
     * @param y the second {@code char} to compare
     * @return the value {@code 0} if {@code x == y};
     *         a value less than {@code 0} if {@code x < y}; and
     *         a value greater than {@code 0} if {@code x > y}
     */
    public static int compare(char x, char y) {
        // Both operands are at most 0xFFFF, so the int subtraction cannot overflow.
        return x - y;
    }

    /**
     * The number of bits used to represent a <tt>char</tt> value in unsigned
     * binary form, constant {@code 16}.
     */
    public static final int SIZE = 16;

    /**
     * Returns the value obtained by reversing the order of the bytes in the
     * specified <tt>char</tt> value.
     *
     * @return the value obtained by reversing (or, equivalently, swapping)
     *         the bytes in the specified <tt>char</tt> value.
     */
    public static char reverseBytes(char ch) {
        return (char) (((ch & 0xFF00) >> 8) | (ch << 8));
    }
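
/*
Sketch, not part of the original source: the byte swap above moves the high
byte down and the low byte up, and the cast to char discards the bits that
the left shift pushes past bit 15. The hypothetical class below copies the
expression and checks that applying it twice restores the input.

```java
final class ReverseBytesSketch {
    // Same expression as in the method above.
    static char swap(char ch) {
        return (char) (((ch & 0xFF00) >> 8) | (ch << 8));
    }

    // Swapping twice should give back the original value for every char.
    static boolean isInvolution() {
        for (int c = 0; c <= 0xFFFF; c++) {
            if (swap(swap((char) c)) != (char) c) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(swap((char) 0x1234))); // 3412
        System.out.println(isInvolution());                           // true
    }
}
```
*/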