Package org.omegat.util
Class StringUtil
java.lang.Object
    org.omegat.util.StringUtil
public final class StringUtil extends java.lang.Object
Utilities for string processing.
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
static char TRUNCATE_CHAR
-
Method Summary
Methods (Modifier and Type, Method, Description):
static java.lang.String capitalizeFirst(java.lang.String text, java.util.Locale locale)
static <T extends java.lang.Comparable<T>> int compareToWithNulls(T v1, T v2)
    Compare two values, which could be null.
static java.lang.String compressSpaces(java.lang.String str)
    Compresses spaces in case of non-preformatting paragraph.
static java.util.List<java.lang.String> convertToList(java.lang.String str)
    For a string containing a space-separated list of items, convert that string into an ArrayList.
static java.lang.String decodeBase64(java.lang.String b64data, java.nio.charset.Charset charset)
    Decode the Base64-encoded charset bytes back to a String.
static java.lang.String encodeBase64(char[] chars, java.nio.charset.Charset charset)
    Convert a char array's charset bytes into a Base64-encoded String.
static java.lang.String encodeBase64(java.lang.String string, java.nio.charset.Charset charset)
    Convert a string's charset bytes into a Base64-encoded String.
static boolean equal(java.lang.String one, java.lang.String two)
    Compares two strings for equality.
static java.lang.String escapeXMLChars(int cp)
    Converts a single code point into valid XML.
static java.lang.String firstN(java.lang.String str, int len)
    Extracts the first N codepoints from a string.
static java.lang.String format(java.lang.String str, java.lang.Object... arguments)
    Formats UI strings.
static int getFirstLetterLowercase(java.lang.String s)
    Returns the first letter in lowercase.
static java.lang.String getTailSegments(java.lang.String str, int separator, int segments)
    For a string delimited by some separator, retrieve the last N segments, where N is given by the segments parameter.
static boolean isCJK(java.lang.String input)
static boolean isEmpty(java.lang.String str)
    Checks whether the string is empty, i.e. null or of length 0.
static boolean isLowerCase(java.lang.String input)
    Returns true if the input has at least one letter and all letters are lower case.
static boolean isMixedCase(java.lang.String input)
    Returns true if the input has both upper case and lower case letters, but is not title case.
static boolean isSubstringAfter(java.lang.String text, int pos, java.lang.String substring)
    Checks if text contains substring after specified position.
static boolean isSubstringBefore(java.lang.String text, int pos, java.lang.String substring)
    Checks if text contains substring before specified position.
static boolean isTitleCase(int codePoint)
static boolean isTitleCase(java.lang.String input)
    Returns true if the input is title case, meaning the first character is UpperCase or TitleCase* and the rest of the string (if present) is LowerCase.
static boolean isUpperCase(java.lang.String input)
    Returns true if the input is upper case.
static boolean isValidXMLChar(int codePoint)
static boolean isWhiteSpace(int codePoint)
    Returns true if the input is a whitespace character (including non-breaking characters that are false according to Character.isWhitespace(int)).
static boolean isWhiteSpace(java.lang.String input)
    Returns true if the input consists only of whitespace characters (including non-breaking characters that are false according to Character.isWhitespace(int)).
static java.lang.String makeValidXML(java.lang.String plaintext)
    Converts a stream of plaintext into valid XML.
static java.lang.String matchCapitalization(java.lang.String text, java.lang.String matchTo, java.util.Locale locale)
static java.lang.String normalizeUnicode(java.lang.CharSequence text)
    Apply Unicode NFC normalization to a string.
static java.lang.String normalizeWidth(java.lang.String text)
    Normalize the width of characters in the supplied text.
static <T> T nvl(T... values)
    Returns the first non-null object from the list, or null if all values are null.
static long nvlLong(long... values)
    Returns the first non-zero value from the list, or zero if all values are zero.
static java.lang.String removeXMLInvalidChars(java.lang.String str)
    Replace invalid XML chars with spaces.
static java.lang.String replaceCase(java.lang.String txt, java.util.Locale lang)
    Interpret the case replacement language used in regular expressions:
    - backslash u = uppercase next letter
    - backslash l = lowercase next letter
    - backslash U = uppercase next letters until backslash E
    - backslash L = lowercase next letters until backslash E
    - backslash u + backslash L = uppercase next letter then lowercase all until backslash E
    - backslash l + backslash U = lowercase next letter then uppercase all until backslash E
    Warning: this method works on the string exactly as given. Any other substitutions, such as variable conversions, must be done before the call to replaceCase; otherwise this method will not apply to the not-yet-converted parts.
static java.lang.String rstrip(java.lang.String text)
    Strip whitespace from the end of a string.
static java.lang.String stripFromEnd(java.lang.String string, java.lang.String... toStrip)
static java.lang.String toTitleCase(java.lang.String text, java.util.Locale locale)
    Convert text to title case according to the supplied locale.
static java.lang.String truncate(java.lang.String text, int len)
    Truncate the supplied text to a maximum of len codepoints.
static java.lang.String unescapeXMLEntities(java.lang.String text)
    Converts XML entities to characters.
static java.lang.String wrap(java.lang.String text, int length)
    Wrap line by length.
-
-
-
Field Detail
-
TRUNCATE_CHAR
public static final char TRUNCATE_CHAR
- See Also:
- Constant Field Values
-
-
Method Detail
-
isEmpty
public static boolean isEmpty(java.lang.String str)
Checks whether the string is empty, i.e. null or of length 0.
-
isLowerCase
public static boolean isLowerCase(java.lang.String input)
Returns true if the input has at least one letter and all letters are lower case.
-
isUpperCase
public static boolean isUpperCase(java.lang.String input)
Returns true if the input is upper case.
-
isMixedCase
public static boolean isMixedCase(java.lang.String input)
Returns true if the input has both upper case and lower case letters, but is not title case.
-
isTitleCase
public static boolean isTitleCase(java.lang.String input)
Returns true if the input is title case, meaning the first character is UpperCase or TitleCase* and the rest of the string (if present) is LowerCase.
*There are exotic characters that are neither UpperCase nor LowerCase, but are TitleCase, e.g. LATIN CAPITAL LETTER L WITH SMALL LETTER J (U+01C8). These are handled correctly.
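The rule described above can be sketched with JDK calls only. This is a hypothetical re-implementation for illustration (the class name TitleCaseSketch is an assumption), not OmegaT's code. It assumes that any non-lowercase code point after the first, including spaces and digits, disqualifies the input, which the real method may treat differently.

```java
public class TitleCaseSketch {
    /** Hypothetical sketch of the title-case rule; not the OmegaT implementation. */
    static boolean isTitleCase(String input) {
        if (input == null || input.isEmpty()) {
            return false; // assumption: empty input is not title case
        }
        int first = input.codePointAt(0);
        // The first code point may be either uppercase or titlecase (e.g. U+01C8).
        if (!Character.isUpperCase(first) && !Character.isTitleCase(first)) {
            return false;
        }
        // Every remaining code point must be lowercase.
        return input.codePoints().skip(1).allMatch(Character::isLowerCase);
    }

    public static void main(String[] args) {
        System.out.println(isTitleCase("Hello"));      // true
        System.out.println(isTitleCase("\u01C8ubav")); // true: first code point is titlecase
        System.out.println(isTitleCase("HELLO"));      // false
    }
}
```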
-
isTitleCase
public static boolean isTitleCase(int codePoint)
-
isWhiteSpace
public static boolean isWhiteSpace(java.lang.String input)
Returns true if the input consists only of whitespace characters (including non-breaking characters that are false according to Character.isWhitespace(int)).
-
isWhiteSpace
public static boolean isWhiteSpace(int codePoint)
Returns true if the input is a whitespace character (including non-breaking characters that are false according to Character.isWhitespace(int)).
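The difference from Character.isWhitespace(int) can be illustrated with a short sketch. It assumes the "non-breaking characters" are the three that the JDK documentation excludes from Character.isWhitespace (U+00A0, U+2007, U+202F). The class name WhiteSpaceSketch and the handling of null or empty input are assumptions, not taken from OmegaT.

```java
public class WhiteSpaceSketch {
    /** Hypothetical sketch: JDK whitespace plus the three non-breaking spaces. */
    static boolean isWhiteSpace(int codePoint) {
        return Character.isWhitespace(codePoint)
                || codePoint == 0x00A0   // NO-BREAK SPACE
                || codePoint == 0x2007   // FIGURE SPACE
                || codePoint == 0x202F;  // NARROW NO-BREAK SPACE
    }

    /** Assumption: null or empty input is reported as not whitespace. */
    static boolean isWhiteSpace(String input) {
        if (input == null || input.isEmpty()) {
            return false;
        }
        return input.codePoints().allMatch(WhiteSpaceSketch::isWhiteSpace);
    }

    public static void main(String[] args) {
        System.out.println(Character.isWhitespace(0x00A0)); // false
        System.out.println(isWhiteSpace(0x00A0));           // true
    }
}
```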
-
isCJK
public static boolean isCJK(java.lang.String input)
-
capitalizeFirst
public static java.lang.String capitalizeFirst(java.lang.String text, java.util.Locale locale)
-
replaceCase
public static java.lang.String replaceCase(java.lang.String txt, java.util.Locale lang)
Interpret the case replacement language used in regular expressions:
- backslash u = uppercase next letter
- backslash l = lowercase next letter
- backslash U = uppercase next letters until backslash E
- backslash L = lowercase next letters until backslash E
- backslash u + backslash L = uppercase next letter then lowercase all until backslash E
- backslash l + backslash U = lowercase next letter then uppercase all until backslash E
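The rules listed above can be read as a small state machine. The following is a hypothetical sketch (class name ReplaceCaseSketch is an assumption), not OmegaT's implementation. For simplicity it applies a one-shot switch to the next code point whether or not it is a letter, and it leaves any other backslash sequence untouched.

```java
import java.util.Locale;

public class ReplaceCaseSketch {
    /** Hypothetical sketch of the case replacement language. */
    static String replaceCase(String txt, Locale locale) {
        StringBuilder out = new StringBuilder();
        char span = 0; // 'U' or 'L' while a span is open, 0 after backslash-E
        char next = 0; // 'u' or 'l' while a one-shot switch is pending, else 0
        for (int i = 0; i < txt.length(); ) {
            int cp = txt.codePointAt(i);
            i += Character.charCount(cp);
            // A backslash followed by one of u, l, U, L, E is a control sequence.
            if (cp == '\\' && i < txt.length() && "ulULE".indexOf(txt.charAt(i)) >= 0) {
                char c = txt.charAt(i++);
                if (c == 'u' || c == 'l') {
                    next = c;
                } else if (c == 'E') {
                    span = 0;
                } else {
                    span = c;
                }
                continue;
            }
            String s = new String(Character.toChars(cp));
            // A pending one-shot switch takes priority over the open span.
            char mode = next != 0 ? next : span;
            if (mode == 'u' || mode == 'U') {
                s = s.toUpperCase(locale);
            } else if (mode == 'l' || mode == 'L') {
                s = s.toLowerCase(locale);
            }
            next = 0;
            out.append(s);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceCase("\\uhello", Locale.ENGLISH));          // Hello
        System.out.println(replaceCase("\\Uhello\\E world", Locale.ENGLISH)); // HELLO world
        System.out.println(replaceCase("\\u\\LhELLO\\E", Locale.ENGLISH));    // Hello
    }
}
```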
-
matchCapitalization
public static java.lang.String matchCapitalization(java.lang.String text, java.lang.String matchTo, java.util.Locale locale)
-
toTitleCase
public static java.lang.String toTitleCase(java.lang.String text, java.util.Locale locale)
Convert text to title case according to the supplied locale.
-
nvl
@SafeVarargs public static <T> T nvl(T... values)
Returns the first non-null object from the list, or null if all values are null.
-
nvlLong
public static long nvlLong(long... values)
Returns the first non-zero value from the list, or zero if all values are zero.
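Both nvl and nvlLong follow the same "first usable value" pattern. A minimal sketch, with the hypothetical class name NvlSketch, written from the descriptions above rather than from OmegaT's source:

```java
public class NvlSketch {
    /** Hypothetical sketch: first non-null argument, or null if there is none. */
    @SafeVarargs
    static <T> T nvl(T... values) {
        for (T value : values) {
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Hypothetical sketch: first non-zero argument, or zero if there is none. */
    static long nvlLong(long... values) {
        for (long value : values) {
            if (value != 0) {
                return value;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(nvl(null, "fallback", "unused")); // fallback
        System.out.println(nvlLong(0, 0, 7, 9));             // 7
    }
}
```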
-
compareToWithNulls
public static <T extends java.lang.Comparable<T>> int compareToWithNulls(T v1, T v2)
Compare two values, which could be null.
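The description does not say how nulls are ordered. The sketch below assumes null sorts before any non-null value and that two nulls compare as equal. That ordering, like the class name NullCompareSketch, is an assumption for illustration.

```java
public class NullCompareSketch {
    /** Hypothetical sketch: null-tolerant comparison, nulls ordered first (assumed). */
    static <T extends Comparable<T>> int compareToWithNulls(T v1, T v2) {
        if (v1 == null) {
            return v2 == null ? 0 : -1;
        }
        if (v2 == null) {
            return 1;
        }
        return v1.compareTo(v2);
    }

    public static void main(String[] args) {
        System.out.println(compareToWithNulls(null, "a") < 0);  // true
        System.out.println(compareToWithNulls("b", "a") > 0);   // true
    }
}
```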
-
firstN
public static java.lang.String firstN(java.lang.String str, int len)
Extracts the first N codepoints from a string.
-
truncate
public static java.lang.String truncate(java.lang.String text, int len)
Truncate the supplied text to a maximum of len codepoints. If truncated, the result will be the first (len - 1) codepoints plus a trailing ellipsis.
- Parameters:
- text - The text to truncate
- len - The desired length (in codepoints) of the result
- Returns:
- The truncated string
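Counting in codepoints rather than chars matters for text outside the Basic Multilingual Plane. The sketch below shows one way firstN and truncate could behave. It assumes TRUNCATE_CHAR is the horizontal ellipsis U+2026 and that len is at least 1. The class name TruncateSketch is hypothetical.

```java
public class TruncateSketch {
    // Assumption: the truncation marker is HORIZONTAL ELLIPSIS (U+2026).
    static final char TRUNCATE_CHAR = '\u2026';

    /** Hypothetical sketch: first len codepoints, or the whole string if shorter. */
    static String firstN(String str, int len) {
        if (str.codePointCount(0, str.length()) <= len) {
            return str;
        }
        return str.substring(0, str.offsetByCodePoints(0, len));
    }

    /** Hypothetical sketch: at most len codepoints, the last being the ellipsis. */
    static String truncate(String text, int len) {
        if (text.codePointCount(0, text.length()) <= len) {
            return text;
        }
        return firstN(text, len - 1) + TRUNCATE_CHAR;
    }

    public static void main(String[] args) {
        System.out.println(truncate("abcdef", 4).length()); // 4
        System.out.println(truncate("abc", 5));             // abc
    }
}
```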
-
getFirstLetterLowercase
public static int getFirstLetterLowercase(java.lang.String s)
Returns the first letter in lowercase. Usually used to create tag shortcuts.
-
isSubstringAfter
public static boolean isSubstringAfter(java.lang.String text, int pos, java.lang.String substring)
Checks if text contains substring after specified position.
-
isSubstringBefore
public static boolean isSubstringBefore(java.lang.String text, int pos, java.lang.String substring)
Checks if text contains substring before specified position.
-
stripFromEnd
public static java.lang.String stripFromEnd(java.lang.String string, java.lang.String... toStrip)
-
normalizeUnicode
public static java.lang.String normalizeUnicode(java.lang.CharSequence text)
Apply Unicode NFC normalization to a string.
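NFC normalization is available in the JDK through java.text.Normalizer. A minimal sketch, assuming the method simply delegates to it (the class name NfcSketch is hypothetical):

```java
import java.text.Normalizer;

public class NfcSketch {
    /** Hypothetical sketch: compose characters into NFC form using the JDK. */
    static String normalizeUnicode(CharSequence text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "e" followed by COMBINING ACUTE ACCENT composes into one code point.
        String decomposed = "e\u0301";
        System.out.println(decomposed.length());                   // 2
        System.out.println(normalizeUnicode(decomposed).length()); // 1
    }
}
```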
-
removeXMLInvalidChars
public static java.lang.String removeXMLInvalidChars(java.lang.String str)
Replace invalid XML chars with spaces.
- Parameters:
- str - input stream
- Returns:
- result stream
- See Also:
- Supported chars
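As an illustration, the sketch below assumes that "valid" means the Char production of XML 1.0 (tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). The class name XmlCharSketch is hypothetical, and the real method may use a different definition.

```java
public class XmlCharSketch {
    /** Hypothetical sketch: XML 1.0 Char production (assumed definition of valid). */
    static boolean isValidXMLChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Hypothetical sketch: replace each invalid code point with a space. */
    static String removeXMLInvalidChars(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        str.codePoints().forEach(cp -> sb.appendCodePoint(isValidXMLChar(cp) ? cp : ' '));
        return sb.toString();
    }

    public static void main(String[] args) {
        String dirty = "a" + (char) 1 + "b";
        System.out.println(removeXMLInvalidChars(dirty)); // a b
    }
}
```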
-
isValidXMLChar
public static boolean isValidXMLChar(int codePoint)
-
makeValidXML
public static java.lang.String makeValidXML(java.lang.String plaintext)
Converts a stream of plaintext into valid XML. Output stream must convert stream to UTF-8 when saving to disk.
-
compressSpaces
public static java.lang.String compressSpaces(java.lang.String str)
Compresses spaces in case of non-preformatting paragraph.
-
escapeXMLChars
public static java.lang.String escapeXMLChars(int cp)
Converts a single code point into valid XML. Output stream must convert stream to UTF-8 when saving to disk.
-
unescapeXMLEntities
public static java.lang.String unescapeXMLEntities(java.lang.String text)
Converts XML entities to characters.
-
equal
public static boolean equal(java.lang.String one, java.lang.String two)
Compares two strings for equality. Handles nulls: if both strings are null they are considered equal.
-
format
public static java.lang.String format(java.lang.String str, java.lang.Object... arguments)
Formats UI strings. Note: This is only a first attempt at putting right what goes wrong in MessageFormat. Currently it only duplicates single quotes; it doesn't test whether the string contains parameters (numbers in curly braces), and it doesn't allow for strings containing already escaped quotes.
- Parameters:
- str - The string to format
- arguments - Arguments to use in formatting the string
- Returns:
- The formatted string
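The problem being worked around is that java.text.MessageFormat treats a single quote as the start of a quoted section, so an apostrophe in a UI string swallows the placeholders that follow it. A minimal sketch of the quote-doubling approach described above, with the hypothetical class name FormatSketch:

```java
import java.text.MessageFormat;

public class FormatSketch {
    /** Hypothetical sketch: double every single quote, then delegate to MessageFormat. */
    static String format(String str, Object... arguments) {
        return MessageFormat.format(str.replace("'", "''"), arguments);
    }

    public static void main(String[] args) {
        // Without doubling, the apostrophe would open a quoted section.
        System.out.println(format("Can't open {0}", "a.txt")); // Can't open a.txt
    }
}
```

As the note above says, this breaks strings whose quotes are already escaped: an input containing '' would be turned into four quotes and printed as two.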
-
normalizeWidth
public static java.lang.String normalizeWidth(java.lang.String text)
Normalize the width of characters in the supplied text. Specifically:
- ASCII characters will become halfwidth
- Katakana characters will become fullwidth
- Hangul will become fullwidth
- Letter-like symbols and squared Latin abbreviations will be decomposed to ASCII
- Parameters:
- text - The text to normalize
- Returns:
- Normalized-width text
-
rstrip
public static java.lang.String rstrip(java.lang.String text)
Strip whitespace from the end of a string. Uses Character.isWhitespace(int), so it does not strip the extra non-breaking whitespace included in isWhiteSpace(int).
- Parameters:
- text - The text to strip
- Returns:
- text with trailing whitespace removed
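A sketch of the behaviour described above, scanning backwards by code point and stopping at the first character that Character.isWhitespace(int) rejects. The class name RstripSketch is hypothetical, and a non-null argument is assumed.

```java
public class RstripSketch {
    /** Hypothetical sketch: drop trailing JDK whitespace, keep non-breaking spaces. */
    static String rstrip(String text) {
        int end = text.length();
        while (end > 0) {
            int cp = text.codePointBefore(end);
            if (!Character.isWhitespace(cp)) {
                break;
            }
            end -= Character.charCount(cp);
        }
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + rstrip("abc \t\n") + "]");         // [abc]
        System.out.println(rstrip("abc\u00A0").length());           // 4: NO-BREAK SPACE is kept
    }
}
```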
-
encodeBase64
public static java.lang.String encodeBase64(java.lang.String string, java.nio.charset.Charset charset)
Convert a string's charset bytes into a Base64-encoded String.
- Parameters:
- string - a string
- charset - the charset with which to obtain the bytes
- Returns:
- Base64-encoded String
-
encodeBase64
public static java.lang.String encodeBase64(char[] chars, java.nio.charset.Charset charset)
Convert a char array's charset bytes into a Base64-encoded String. Useful for handling passwords. Intermediate buffers are cleared after use.
- Parameters:
- chars - a char array
- charset - the charset with which to obtain the bytes
- Returns:
- Base64-encoded String
-
decodeBase64
public static java.lang.String decodeBase64(java.lang.String b64data, java.nio.charset.Charset charset)
Decode the Base64-encoded charset bytes back to a String.
- Parameters:
- b64data - Base64-encoded String
- charset - charset of decoded bytes
- Returns:
- String
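The three Base64 methods can be sketched with java.util.Base64. The sketch assumes the basic (RFC 4648) alphabet with padding, which the description does not specify. The class name Base64Sketch is hypothetical. The char[] variant wipes its intermediate byte buffers, in the spirit of the password-handling note above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64Sketch {
    static String encodeBase64(String string, Charset charset) {
        return Base64.getEncoder().encodeToString(string.getBytes(charset));
    }

    /** Encodes without creating an intermediate String; clears temporary buffers. */
    static String encodeBase64(char[] chars, Charset charset) {
        ByteBuffer buf = charset.encode(CharBuffer.wrap(chars));
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        try {
            return Base64.getEncoder().encodeToString(bytes);
        } finally {
            // Wipe the copies of the raw bytes once the encoded form exists.
            Arrays.fill(bytes, (byte) 0);
            if (buf.hasArray()) {
                Arrays.fill(buf.array(), (byte) 0);
            }
        }
    }

    static String decodeBase64(String b64data, Charset charset) {
        return new String(Base64.getDecoder().decode(b64data), charset);
    }

    public static void main(String[] args) {
        String encoded = encodeBase64("hello", StandardCharsets.UTF_8);
        System.out.println(encoded);                                       // aGVsbG8=
        System.out.println(decodeBase64(encoded, StandardCharsets.UTF_8)); // hello
    }
}
```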
-
getTailSegments
public static java.lang.String getTailSegments(java.lang.String str, int separator, int segments)
For a string delimited by some separator, retrieve the last N segments, where N is given by the segments parameter.
- Parameters:
- str - The string
- separator - The separator delimiting the string's segments
- segments - The number of segments to return, starting at the end
- Returns:
- The trailing segments, or, if segments is greater than the number of segments contained in str, then str itself.
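One way to read this contract is to walk backwards over the separators. The sketch below is hypothetical (class name TailSegmentsSketch). Returning an empty string for a non-positive segments value is an assumption, since the description does not cover that case.

```java
public class TailSegmentsSketch {
    /** Hypothetical sketch: keep the last 'segments' separator-delimited parts. */
    static String getTailSegments(String str, int separator, int segments) {
        if (segments <= 0) {
            return ""; // assumption: nothing requested, nothing returned
        }
        int cut = str.length();
        for (int i = 0; i < segments; i++) {
            cut = str.lastIndexOf(separator, cut - 1);
            if (cut < 0) {
                return str; // fewer separators than requested segments
            }
        }
        // Skip the separator itself, which may be a supplementary code point.
        return str.substring(cut + Character.charCount(separator));
    }

    public static void main(String[] args) {
        System.out.println(getTailSegments("a/b/c/d", '/', 2)); // c/d
        System.out.println(getTailSegments("a/b/c/d", '/', 9)); // a/b/c/d
    }
}
```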
-
convertToList
public static java.util.List<java.lang.String> convertToList(java.lang.String str)
For a string containing a space-separated list of items, convert that string into an ArrayList.
- Parameters:
- str - The string, with items separated by whitespace
- Returns:
- An ArrayList of the items in the original space-separated list
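A minimal sketch of such a conversion, splitting on runs of whitespace and skipping empty items. The class name ConvertToListSketch is hypothetical, and returning an empty list for blank input is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class ConvertToListSketch {
    /** Hypothetical sketch: split on whitespace runs into a new ArrayList. */
    static List<String> convertToList(String str) {
        List<String> result = new ArrayList<>();
        for (String item : str.trim().split("\\s+")) {
            // Splitting a blank string yields one empty item; skip it.
            if (!item.isEmpty()) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(convertToList("  a b\tc ")); // [a, b, c]
    }
}
```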
-
wrap
public static java.lang.String wrap(java.lang.String text, int length)
Wrap line by length.
- Parameters:
- text - string to process.
- length - wrap length.
- Returns:
- string wrapped.
-
-