/*
 * Copyright (C) 2008 The Guava Authors
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.google.common.escape;

import com.google.common.annotations.Beta;
import com.google.common.annotations.GwtCompatible;
import com.google.common.base.Function;

/**
 * An object that converts literal text into a format safe for inclusion in a particular context
 * (such as an XML document). Typically (but not always), the inverse process of "unescaping" the
 * text is performed automatically by the relevant parser.
 *
 * <p>For example, an XML escaper would convert the literal string {@code "Foo<Bar>"} into {@code
 * "Foo&lt;Bar&gt;"} to prevent {@code "<Bar>"} from being confused with an XML tag. When the
 * resulting XML document is parsed, the parser API will return this text as the original literal
 * string {@code "Foo<Bar>"}.
 *
 * <p>An {@code Escaper} instance is required to be stateless, and safe when used concurrently by
 * multiple threads.
 *
 * <p>Because, in general, escaping operates on the code points of a string and not on its
 * individual {@code char} values, it is not safe to assume that {@code escape(s)} is equivalent to
 * {@code escape(s.substring(0, n)) + escape(s.substring(n))} for arbitrary {@code n}. This is
 * because of the possibility of splitting a surrogate pair. The only case in which it is safe to
 * escape strings and concatenate the results is if you can rule out this possibility, either by
 * splitting an existing long string into short strings adaptively around {@linkplain
 * Character#isHighSurrogate surrogate} {@linkplain Character#isLowSurrogate pairs}, or by starting
 * with short strings already known to be free of unpaired surrogates.
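 *
 * <p>As an illustrative sketch only (the helper name {@code escapeInTwoParts} is hypothetical and
 * not part of this library), a caller that has to split a string before escaping could first move
 * the split index so that it does not fall inside a surrogate pair:
 *
 * <pre>{@code
 * static String escapeInTwoParts(Escaper escaper, String s, int n) {
 *   // Assumes 0 <= n <= s.length(). If index n falls between a high and a low surrogate,
 *   // step back one char so that the whole pair stays in the second part.
 *   if (n > 0 && n < s.length()
 *       && Character.isHighSurrogate(s.charAt(n - 1))
 *       && Character.isLowSurrogate(s.charAt(n))) {
 *     n--;
 *   }
 *   return escaper.escape(s.substring(0, n)) + escaper.escape(s.substring(n));
 * }
 * }</pre>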
 *
 * <p>The two primary implementations of this interface are {@link CharEscaper} and {@link
 * UnicodeEscaper}. They are heavily optimized for performance and greatly simplify the task of
 * implementing new escapers. It is strongly recommended that when implementing a new escaper you
 * extend one of these classes. If you find that you are unable to achieve the desired behavior
 * using either of these classes, please contact the Java libraries team for advice.
 *
 * <p>Several popular escapers are defined as constants in classes like {@link
 * com.google.common.html.HtmlEscapers}, {@link com.google.common.xml.XmlEscapers}, and {@link
 * SourceCodeEscapers}. To create your own escapers, use {@link CharEscaperBuilder}, or extend
 * {@code CharEscaper} or {@code UnicodeEscaper}.
 *
 * @author David Beaumont
 * @since 15.0
 */
@Beta
@GwtCompatible
public abstract class Escaper {
  // TODO(user): evaluate custom implementations, considering package private constructor.
  /** Constructor for use by subclasses. */
  protected Escaper() {}

  /**
   * Returns the escaped form of a given literal string.
   *
   * <p>Note that this method may treat input characters differently depending on the specific
   * escaper implementation.
   *
   * <ul>
   * <li>{@link UnicodeEscaper} handles <a href="http://en.wikipedia.org/wiki/UTF-16">UTF-16</a>
   *     correctly, including surrogate character pairs. If the input is badly formed, the escaper
   *     should throw {@link IllegalArgumentException}.
   * <li>{@link CharEscaper} handles Java characters independently and does not verify the input for
   *     well-formed characters. A {@code CharEscaper} should not be used in situations where input
   *     is not guaranteed to be restricted to the Basic Multilingual Plane (BMP).
   * </ul>
   *
   * @param string the literal string to be escaped
   * @return the escaped form of {@code string}
   * @throws NullPointerException if {@code string} is null
   * @throws IllegalArgumentException if {@code string} contains badly formed UTF-16 or cannot be
   *         escaped for any other reason
   */
  public abstract String escape(String string);

  private final Function<String, String> asFunction =
      new Function<String, String>() {
        @Override
        public String apply(String from) {
          return escape(from);
        }
      };

  /**
   * Returns a {@link Function} that invokes {@link #escape(String)} on this escaper.
   */
  public final Function<String, String> asFunction() {
    return asFunction;
  }
}
