/*
 * Copyright (C) 2008 The Guava Authors
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.google.common.escape;

import com.google.common.annotations.Beta;
import com.google.common.annotations.GwtCompatible;
import com.google.common.base.Function;

/**
 * An object that converts literal text into a format safe for inclusion in a particular context
 * (such as an XML document). Typically (but not always), the inverse process of "unescaping" the
 * text is performed automatically by the relevant parser.
 *
 * <p>For example, an XML escaper would convert the literal string {@code "Foo<Bar>"} into {@code
 * "Foo&lt;Bar&gt;"} to prevent {@code "<Bar>"} from being confused with an XML tag. When the
 * resulting XML document is parsed, the parser API will return this text as the original literal
 * string {@code "Foo<Bar>"}.
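 *
 * <p>A minimal sketch of this behavior, assuming {@code com.google.common.xml.XmlEscapers} is
 * available on the classpath:
 *
 * <pre>{@code
 * Escaper xmlEscaper = XmlEscapers.xmlContentEscaper();
 * // '<' and '>' are replaced by entity references; all other characters pass through unchanged.
 * String escaped = xmlEscaper.escape("Foo<Bar>"); // "Foo&lt;Bar&gt;"
 * }</pre>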
 *
 * <p>An {@code Escaper} instance must be stateless and safe for concurrent use by multiple
 * threads.
 *
 * <p>Because, in general, escaping operates on the code points of a string and not on its
 * individual {@code char} values, it is not safe to assume that {@code escape(s)} is equivalent to
 * {@code escape(s.substring(0, n)) + escape(s.substring(n))} for arbitrary {@code n}. This is
 * because of the possibility of splitting a surrogate pair. The only case in which it is safe to
 * escape strings and concatenate the results is if you can rule out this possibility, either by
 * splitting an existing long string into short strings adaptively around {@linkplain
 * Character#isHighSurrogate surrogate} {@linkplain Character#isLowSurrogate pairs}, or by starting
 * with short strings already known to be free of unpaired surrogates.
 *
 * <p>The two primary subclasses of this class are {@link CharEscaper} and {@link
 * UnicodeEscaper}. They are heavily optimized for performance and greatly simplify the task of
 * implementing new escapers.
 * It is strongly recommended that new escapers extend one of these classes. If you find that
 * you are unable to achieve the desired behavior using either of these classes, please contact
 * the Java libraries team for advice.
 *
 * <p>Several popular escapers are defined as constants in classes like {@link
 * com.google.common.html.HtmlEscapers}, {@link com.google.common.xml.XmlEscapers}, and {@link
 * SourceCodeEscapers}. To create your own escapers, use {@link CharEscaperBuilder}, or extend
 * {@code CharEscaper} or {@code UnicodeEscaper}.
 *
 * @author David Beaumont
 * @since 15.0
 */
@Beta
@GwtCompatible
public abstract class Escaper {
  // TODO(user): evaluate custom implementations, considering package private constructor.
  /** Constructor for use by subclasses. */
  protected Escaper() {}

  /**
   * Returns the escaped form of a given literal string.
   *
   * <p>Note that this method may treat input characters differently depending on the specific
   * escaper implementation.
   *
   * <ul>
   *   <li>{@link UnicodeEscaper} handles <a href="http://en.wikipedia.org/wiki/UTF-16">UTF-16</a>
   *       correctly, including surrogate character pairs. If the input is badly formed, the
   *       escaper should throw {@link IllegalArgumentException}.
   *   <li>{@link CharEscaper} handles Java characters independently and does not verify that the
   *       input is well-formed. A {@code CharEscaper} should not be used in situations where
   *       input is not guaranteed to be restricted to the Basic Multilingual Plane (BMP).
   * </ul>
   *
   * @param string the literal string to be escaped
   * @return the escaped form of {@code string}
   * @throws NullPointerException if {@code string} is null
   * @throws IllegalArgumentException if {@code string} contains badly formed UTF-16 or cannot be
   *     escaped for any other reason
   */
  public abstract String escape(String string);

  // Created once per escaper so that asFunction() always returns the same instance.
  private final Function<String, String> asFunction =
      new Function<String, String>() {
        @Override
        public String apply(String from) {
          return escape(from);
        }
      };

  /**
   * Returns a {@link Function} that invokes {@link #escape(String)} on this escaper.
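   *
   * <p>This is convenient with APIs that accept a {@code Function}. A minimal sketch, assuming
   * {@code names} is a {@code List<String>}, {@code escaper} is any {@code Escaper}, and
   * {@code com.google.common.collect.Lists} is imported:
   *
   * <pre>{@code
   * // Lists.transform returns a lazy view: escape() runs each time an element is read.
   * List<String> escapedNames = Lists.transform(names, escaper.asFunction());
   * }</pre>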
   */
  public final Function<String, String> asFunction() {
    return asFunction;
  }
}