/* fts2 has a design flaw which can lead to database corruption (see
** below).  It is recommended not to use it any longer; use fts3 (or
** higher) instead.  If you believe that your use of fts2 is safe,
** add -DSQLITE_ENABLE_BROKEN_FTS2=1 to your CFLAGS.
*/
#if (!defined(SQLITE_CORE) || defined(SQLITE_ENABLE_FTS2)) \
        && !defined(SQLITE_ENABLE_BROKEN_FTS2)
#error fts2 has a design flaw and has been deprecated.
#endif
/* The flaw is that fts2 uses the content table's unaliased rowid as
** the unique docid.  fts2 embeds the rowid in the index it builds,
** and expects the rowid to not change.  The SQLite VACUUM operation
** will renumber such rowids, thereby breaking fts2.  If you are using
** fts2 in a system which has disabled VACUUM, then you can continue
** to use it safely.  Note that PRAGMA auto_vacuum does NOT disable
** VACUUM, though systems using auto_vacuum are unlikely to invoke
** VACUUM.
**
** Unlike fts1, which is safe across VACUUM if you never delete
** documents, fts2 has a second exposure to this flaw, in the segments
** table.  So fts2 should be considered unsafe across VACUUM in all
** cases.
*/

/*
** 2006 Oct 10
**
** The author disclaims copyright to this source code.  In place of
** a legal notice, here is a blessing:
**
**    May you do good and not evil.
**    May you find forgiveness for yourself and forgive others.
**    May you share freely, never taking more than you give.
**
******************************************************************************
**
** This is an SQLite module implementing full-text search.
*/

/* TODO(shess): To make it easier to spot changes without groveling
** through changelogs, I've defined GEARS_FTS2_CHANGES to call them
** out, and I will document them here.  On imports, these changes
** should be reviewed to make sure they are still present, or are
** dropped as appropriate.
**
** SQLite core adds the custom function fts2_tokenizer() to be used
** for defining new tokenizers.  The second parameter is a vtable
** pointer encoded as a blob.  Obviously this cannot be exposed to
** Gears callers for security reasons.  It could be suppressed in the
** authorizer, but for now I have simply commented the definition out.
*/
#define GEARS_FTS2_CHANGES 1

/*
** The code in this file is only compiled if:
**
**     * The FTS2 module is being built as an extension
**       (in which case SQLITE_CORE is not defined), or
**
**     * The FTS2 module is being built into the core of
**       SQLite (in which case SQLITE_ENABLE_FTS2 is defined).
*/

/* TODO(shess) Consider exporting this comment to an HTML file or the
** wiki.
*/
/* The full-text index is stored in a series of b+tree (-like)
** structures called segments which map terms to doclists.  The
** structures are like b+trees in layout, but are constructed from the
** bottom up in optimal fashion and are not updatable.  Since trees
** are built from the bottom up, things will be described from the
** bottom up.
**
**
**** Varints ****
** The basic unit of encoding is a variable-length integer called a
** varint.  We encode variable-length integers in little-endian order
** using seven bits per byte as follows:
**
** KEY:
**         A = 0xxxxxxx    7 bits of data, flag bit clear (final byte)
**         B = 1xxxxxxx    7 bits of data, flag bit set (more follow)
**
**  7 bits - A
** 14 bits - BA
** 21 bits - BBA
** and so on.
**
** This is similar in concept to how sqlite encodes varints (see
** util.c), but not identical: sqlite's varints are big-endian and
** at most 9 bytes long.
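**
** A minimal C sketch of this encoding follows.  sketchPutVarint(),
** sketchGetVarint() and SKETCH_VARINT_MAX are hypothetical names used
** only for illustration, not helpers defined in this file, and the
** 10-byte bound assumes 64-bit values (ceil(64/7) = 10):
```c
#include <assert.h>
#include <stdint.h>

// Worst case for a 64-bit value: ceil(64 / 7) = 10 bytes.
enum { SKETCH_VARINT_MAX = 10 };

// Write v into buf, least-significant 7-bit group first.  Every byte
// except the last has its flag (high) bit set.  Returns bytes written.
static int sketchPutVarint(unsigned char *buf, uint64_t v){
  int n = 0;
  do{
    unsigned char b = (unsigned char)(v & 0x7f);
    v >>= 7;
    if( v ) b |= 0x80;          // more bytes follow
    buf[n++] = b;
  }while( v );
  return n;
}

// Read a varint from buf into *pv.  Returns bytes consumed.
static int sketchGetVarint(const unsigned char *buf, uint64_t *pv){
  uint64_t v = 0;
  int n = 0;
  unsigned char b;
  do{
    assert( n<SKETCH_VARINT_MAX );
    b = buf[n];
    v |= (uint64_t)(b & 0x7f) << (7*n);   // n-th group lands at bit 7*n
    n++;
  }while( b & 0x80 );
  *pv = v;
  return n;
}
```
** For example, 300 is written as the two bytes 0xac 0x02: the low
** seven bits (0x2c) with the flag bit set, then the remaining bits
** (0x02) with the flag bit clear.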
**
**
**** Document lists ****
** A doclist (document list) holds a docid-sorted list of hits for a
** given term.  Doclists hold docids, and can optionally associate
** token positions and offsets with docids.
**
** A DL_POSITIONS_OFFSETS doclist is stored like this:
**
** array {
**   varint docid;
**   array {                (position list for column 0)
**     varint position;     (delta from previous position plus POS_BASE)
**     varint startOffset;  (delta from previous startOffset)
**     varint endOffset;    (delta from startOffset)
**   }
**   array {
**     varint POS_COLUMN;   (marks start of position list for new column)
**     varint column;       (index of new column)
**     array {
**       varint position;   (delta from previous position plus POS_BASE)
**       varint startOffset;(delta from previous startOffset)
**       varint endOffset;  (delta from startOffset)
**     }
**   }
**   varint POS_END;        (marks end of positions for this document)
** }
**
** Here, array { X } means zero or more occurrences of X, adjacent in
** memory.  A "position" is an index of a token in the token stream
** generated by the tokenizer, while an "offset" is a byte offset,
** both based at 0.  Note that POS_END and POS_COLUMN occur in the
** same logical place as the position element, and act as sentinels
** ending a position list array.
**
** A DL_POSITIONS doclist omits the startOffset and endOffset
** information.  A DL_DOCIDS doclist omits both the position and
** offset information, becoming an array of varint-encoded docids.
**
** On-disk data is stored as type DL_DEFAULT, so we don't serialize
** the type.  Due to how deletion is implemented in the segmentation
** system, on-disk doclists MUST store at least positions.
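**
** A minimal C sketch of encoding one document's entry in a
** DL_POSITIONS_OFFSETS doclist follows.  The sentinel values
** (POS_END 0, POS_COLUMN 1, POS_BASE 2) are assumptions standing in
** for constants defined elsewhere, as is the choice to restart the
** position and offset deltas at 0 when a new column begins.  The
** docid is written as a plain varint, as in the layout above.  All
** Sketch/sk-prefixed names are hypothetical:
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumed sentinel values; the real ones are defined elsewhere.
enum { SK_POS_END = 0, SK_POS_COLUMN = 1, SK_POS_BASE = 2 };

typedef struct SketchHit {
  int iColumn;       // column the token was found in
  int iPos;          // token index within that column
  int iStartOffset;  // byte offset where the token starts
  int iEndOffset;    // byte offset where the token ends
} SketchHit;

// Little-endian 7-bit varint writer; returns bytes written.
static int skPutVarint(unsigned char *p, uint64_t v){
  int n = 0;
  do{
    p[n] = (unsigned char)(v & 0x7f);
    v >>= 7;
    if( v ) p[n] |= 0x80;
    n++;
  }while( v );
  return n;
}

// Encode one document's entry: docid, per-column position lists, and
// the POS_END terminator.  aHit must be sorted by column, then position.
// Returns the number of bytes written to out.
static int skEncodeDocEntry(unsigned char *out, int64_t iDocid,
                            const SketchHit *aHit, int nHit){
  int n = 0, iCol = 0, iPrevPos = 0, iPrevStart = 0, i;
  n += skPutVarint(out+n, (uint64_t)iDocid);
  for(i=0; i<nHit; i++){
    const SketchHit *h = &aHit[i];
    if( h->iColumn!=iCol ){
      // Column 0 is implicit; any other column needs a marker.
      assert( h->iColumn>iCol );
      n += skPutVarint(out+n, SK_POS_COLUMN);
      n += skPutVarint(out+n, (uint64_t)h->iColumn);
      iCol = h->iColumn;
      iPrevPos = 0;            // assumed: deltas restart per column
      iPrevStart = 0;
    }
    assert( h->iPos>=iPrevPos && h->iStartOffset>=iPrevStart );
    assert( h->iEndOffset>=h->iStartOffset );
    n += skPutVarint(out+n, (uint64_t)(h->iPos-iPrevPos) + SK_POS_BASE);
    n += skPutVarint(out+n, (uint64_t)(h->iStartOffset-iPrevStart));
    n += skPutVarint(out+n, (uint64_t)(h->iEndOffset-h->iStartOffset));
    iPrevPos = h->iPos;
    iPrevStart = h->iStartOffset;
  }
  n += skPutVarint(out+n, SK_POS_END);
  return n;
}
```
** For instance, docid 7 with hits at (column 0, position 0, bytes
** 0-3), (0, 2, 8-12) and (2, 1, 4-9) encodes as the varints
** 7 | 2 0 3 | 4 8 4 | 1 2 | 3 4 5 | 0, and a document with no hits
** encodes as 7 0, the same shape as the deletion marker described
** under "Handling of deletions and updates".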
**
**
**** Segment leaf nodes ****
** Segment leaf nodes store terms and doclists, ordered by term.  Leaf
** nodes are written using LeafWriter, and read using LeafReader (to
** iterate through a single leaf node's data) and LeavesReader (to
** iterate through a segment's entire leaf layer).  Leaf nodes have
** the format:
**
** varint iHeight;             (height from leaf level, always 0)
** varint nTerm;               (length of first term)
** char pTerm[nTerm];          (content of first term)
** varint nDoclist;            (length of term's associated doclist)
** char pDoclist[nDoclist];    (content of doclist)
** array {
**                             (further terms are delta-encoded)
**   varint nPrefix;           (length of prefix shared with previous term)
**   varint nSuffix;           (length of unshared suffix)
**   char pTermSuffix[nSuffix];(unshared suffix of next term)
**   varint nDoclist;          (length of term's associated doclist)
**   char pDoclist[nDoclist];  (content of doclist)
** }
**
** Here, array { X } means zero or more occurrences of X, adjacent in
** memory.
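**
** A minimal C sketch of writing one leaf node in this format follows.
** It is not LeafWriter: it assumes the whole node fits in out, that
** terms are NUL-terminated and strictly increasing, and that each
** doclist is already encoded.  The sk-prefixed names are hypothetical:
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Little-endian 7-bit varint writer; returns bytes written.
static int skPutVarint(unsigned char *p, uint64_t v){
  int n = 0;
  do{
    p[n] = (unsigned char)(v & 0x7f);
    v >>= 7;
    if( v ) p[n] |= 0x80;
    n++;
  }while( v );
  return n;
}

// Encode a leaf node holding nTerm terms and their doclists.
// Returns the number of bytes written to out.
static int skEncodeLeaf(unsigned char *out, const char *const *azTerm,
                        const unsigned char *const *aDoclist,
                        const int *anDoclist, int nTerm){
  int n = 0, i;
  const char *zPrev = NULL;
  n += skPutVarint(out+n, 0);                  // iHeight: always 0
  for(i=0; i<nTerm; i++){
    const char *z = azTerm[i];
    int nz = (int)strlen(z);
    int nPrefix = 0;
    if( i==0 ){
      n += skPutVarint(out+n, (uint64_t)nz);   // nTerm, full term
    }else{
      assert( strcmp(zPrev, z)<0 );            // strictly increasing
      while( zPrev[nPrefix]!=0 && zPrev[nPrefix]==z[nPrefix] ) nPrefix++;
      n += skPutVarint(out+n, (uint64_t)nPrefix);
      n += skPutVarint(out+n, (uint64_t)(nz-nPrefix));
    }
    memcpy(out+n, z+nPrefix, (size_t)(nz-nPrefix));  // term or suffix
    n += nz-nPrefix;
    n += skPutVarint(out+n, (uint64_t)anDoclist[i]); // nDoclist
    memcpy(out+n, aDoclist[i], (size_t)anDoclist[i]);
    n += anDoclist[i];
    zPrev = z;
  }
  return n;
}
```
** Encoding "apple", "apply" and "banana" stores "apple" in full, then
** "apply" as nPrefix 4, nSuffix 1, "y", then "banana" as nPrefix 0,
** nSuffix 6, "banana", since it shares no prefix with "apply".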
**
** Leaf nodes are broken into blocks which are stored contiguously in
** the %_segments table in sorted order.  This means that when the end
** of a node is reached, the next term is in the node with the next
** greater node id.
**
** New data is spilled to a new leaf node when the current node
** exceeds LEAF_MAX bytes (default 2048).  New data which itself is
** larger than STANDALONE_MIN (default 1024) is placed in a standalone
** node (a leaf node with a single term and doclist).  The goal of
** these settings is to pack together groups of small doclists while
** making it efficient to directly access large doclists.  The
** assumption is that large doclists represent terms which are more
** likely to be query targets.
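**
** A minimal C sketch of the placement rule just described.  It reads
** "exceeds LEAF_MAX" as "would exceed LEAF_MAX once the new data is
** appended" and checks STANDALONE_MIN first; both readings, like the
** sk-prefixed names, are assumptions rather than LeafWriter's code:
```c
#include <assert.h>

enum { SK_LEAF_MAX = 2048, SK_STANDALONE_MIN = 1024 };

typedef enum SkLeafAction {
  SK_APPEND,       // add the term and doclist to the current leaf
  SK_NEW_LEAF,     // flush the current leaf and start a new one
  SK_STANDALONE    // flush, then give this term a leaf of its own
} SkLeafAction;

// nCur: bytes already in the current leaf; nNew: encoded size of the
// incoming term plus doclist.
static SkLeafAction skLeafPlacement(int nCur, int nNew){
  assert( nCur>=0 && nNew>=0 );
  if( nNew>SK_STANDALONE_MIN ) return SK_STANDALONE;
  if( nCur+nNew>SK_LEAF_MAX ) return SK_NEW_LEAF;
  return SK_APPEND;
}
```
** Under these assumptions a 1500-byte doclist always gets a node of
** its own, while a 100-byte one arriving at a leaf that already holds
** 2000 bytes starts a new shared leaf.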
**
** TODO(shess) It may be useful for blocking decisions to be more
** dynamic.  For instance, it may make more sense to have a 2.5k leaf
** node rather than splitting into 2k and .5k nodes.  My intuition is
** that this might extend through 2x or 4x the pagesize.
**
**
**** Segment interior nodes ****
** Segment interior nodes store blockids for subtree nodes and terms
** to describe what data is stored by each subtree.  Interior
** nodes are written using InteriorWriter, and read using
** InteriorReader.  InteriorWriters are created as needed when
** SegmentWriter creates new leaf nodes, or when an interior node
** itself grows too big and must be split.  The format of interior
** nodes:
**
** varint iHeight;           (height from leaf level, always >0)
** varint iBlockid;          (block id of node's leftmost subtree)
** optional {
**   varint nTerm;           (length of first term)
**   char pTerm[nTerm];      (content of first term)
**   array {
**                                (further terms are delta-encoded)
**     varint nPrefix;            (length of shared prefix with previous term)
**     varint nSuffix;            (length of unshared suffix)
**     char pTermSuffix[nSuffix]; (unshared suffix of next term)
**   }
** }
**
** Here, optional { X } means an optional element, while array { X }
** means zero or more occurrences of X, adjacent in memory.
**
** An interior node encodes n terms separating n+1 subtrees.  The
** subtree blocks are contiguous, so only the first subtree's blockid
** is encoded.  The subtree at iBlockid will contain all terms less
** than the first term encoded (or all terms if no term is encoded).
** Otherwise, numbering the encoded terms from 1, terms greater than
** or equal to the i-th term, but less than the (i+1)-th term if there
** is one, are in the subtree rooted at iBlockid+i.  Interior nodes
** only store enough term data to distinguish adjacent children (if
** the rightmost term of the left child is "something", and the
** leftmost term of the right child is "wicked", only "w" is stored).
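**
** A minimal C sketch of the two rules above: choosing the shortest
** separator between adjacent children, and picking the child subtree
** for a term.  Terms are assumed NUL-terminated and compared with
** strcmp(); the sk-prefixed names are hypothetical:
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Length of the shortest prefix of zRight that still sorts after
// zLeft (which must sort before zRight): the shared prefix plus one
// distinguishing byte.
static int skSeparatorLen(const char *zLeft, const char *zRight){
  int i = 0;
  assert( strcmp(zLeft, zRight)<0 );
  while( zLeft[i]!=0 && zLeft[i]==zRight[i] ) i++;
  return i+1;
}

// Blockid of the child that may hold zTerm, given the node's leftmost
// child iBlockid and its nSep sorted separator terms.  Every separator
// that is <= zTerm moves the search one child to the right.
static int64_t skChildBlockid(int64_t iBlockid, const char *const *azSep,
                              int nSep, const char *zTerm){
  int i = 0;
  while( i<nSep && strcmp(azSep[i], zTerm)<=0 ) i++;
  return iBlockid + i;
}
```
** With the "something" / "wicked" example, skSeparatorLen() returns 1,
** so only "w" is stored, and a lookup for "wicked" descends into the
** child at iBlockid+1.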
**
** New data is spilled to a new interior node at the same height when
** the current node exceeds INTERIOR_MAX bytes (default 2048).
** INTERIOR_MIN_TERMS (default 7) keeps large terms from monopolizing
** interior nodes and making the tree too skinny.  The interior nodes
** at a given height are naturally tracked by interior nodes at
** height+1, and so on.
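**
** One plausible reading of these two limits, as a C sketch: spill
** only when the node would exceed INTERIOR_MAX and already holds at
** least INTERIOR_MIN_TERMS terms, so a few very long terms cannot
** force a split.  This reading and the sk-prefixed names are
** assumptions, not InteriorWriter's actual code:
```c
#include <assert.h>

enum { SK_INTERIOR_MAX = 2048, SK_INTERIOR_MIN_TERMS = 7 };

// Nonzero if a new term of nNew bytes should go into a fresh interior
// node at the same height instead of the current one, which holds
// nCur bytes and nTerms terms.
static int skInteriorShouldSpill(int nCur, int nTerms, int nNew){
  assert( nCur>=0 && nTerms>=0 && nNew>=0 );
  return nCur+nNew>SK_INTERIOR_MAX && nTerms>=SK_INTERIOR_MIN_TERMS;
}
```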
**
**
**** Segment directory ****
** The segment directory in table %_segdir stores meta-information for
** merging and deleting segments, and also the root node of the
** segment's tree.
**
** The root node is the top node of the segment's tree after encoding
** the entire segment, restricted to ROOT_MAX bytes (default 1024).
** This could be either a leaf node or an interior node.  If the top
** node requires more than ROOT_MAX bytes, it is flushed to %_segments
** and a new root interior node is generated (which should always fit
** within ROOT_MAX because it only needs space for 2 varints, the
** height and the blockid of the previous root).
**
** The meta-information in the segment directory is:
**   level               - segment level (see below)
**   idx                 - index within level
**                       - (level,idx uniquely identify a segment)
**   start_block         - first leaf node
**   leaves_end_block    - last leaf node
**   end_block           - last block (including interior nodes)
**   root                - contents of root node
**
** If the root node is a leaf node, then start_block,
** leaves_end_block, and end_block are all 0.
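**
** A minimal C sketch of two consequences of this layout: a root is a
** leaf exactly when its leading iHeight varint is 0, and the
** replacement root written after flushing an oversized top node needs
** only two varints.  It assumes the new root's height is one more
** than the flushed node's; SkSegdirRow and the other sk-prefixed
** names are hypothetical:
```c
#include <assert.h>
#include <stdint.h>

enum { SK_ROOT_MAX = 1024 };

// Little-endian 7-bit varint writer; returns bytes written.
static int skPutVarint(unsigned char *p, uint64_t v){
  int n = 0;
  do{
    p[n] = (unsigned char)(v & 0x7f);
    v >>= 7;
    if( v ) p[n] |= 0x80;
    n++;
  }while( v );
  return n;
}

// Little-endian 7-bit varint reader; returns bytes consumed.
static int skGetVarint(const unsigned char *p, uint64_t *pv){
  uint64_t v = 0;
  int n = 0;
  do{
    assert( n<10 );
    v |= (uint64_t)(p[n] & 0x7f) << (7*n);
  }while( p[n++] & 0x80 );
  *pv = v;
  return n;
}

// The %_segdir columns used here (level and idx omitted).
typedef struct SkSegdirRow {
  int64_t iStartBlock;          // start_block
  int64_t iLeavesEndBlock;      // leaves_end_block
  int64_t iEndBlock;            // end_block
  const unsigned char *pRoot;   // root
  int nRoot;                    // size of root in bytes
} SkSegdirRow;

// Every node starts with varint iHeight, so height 0 means a leaf.
static int skRootIsLeaf(const SkSegdirRow *pRow){
  uint64_t iHeight;
  int bLeaf;
  assert( pRow->nRoot>0 );
  skGetVarint(pRow->pRoot, &iHeight);
  bLeaf = (iHeight==0);
  // A leaf root implies all three block ids are 0.
  assert( !bLeaf || (pRow->iStartBlock==0 && pRow->iLeavesEndBlock==0
                     && pRow->iEndBlock==0) );
  return bLeaf;
}

// Write the root that replaces a top node of height iHeight after that
// node was flushed to %_segments as block iBlockid.
static int skMinimalRoot(unsigned char *out, uint64_t iHeight,
                         int64_t iBlockid){
  int n = skPutVarint(out, iHeight+1);
  n += skPutVarint(out+n, (uint64_t)iBlockid);
  assert( n<=SK_ROOT_MAX );     // at most 20 bytes for 64-bit values
  return n;
}
```
** A height-1 top node flushed as block 300, for example, is replaced
** by the three-byte root 0x02 0xac 0x02.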
**
**
**** Segment merging ****
** To amortize update costs, segments are grouped into levels and
** merged in batches.  Each increase in level represents exponentially
** more documents.
**
** New documents (actually, document updates) are tokenized and
** written individually (using LeafWriter) to a level 0 segment, with
** incrementing idx.  When idx reaches MERGE_COUNT (default 16), all
** level 0 segments are merged into a single level 1 segment.  Level 1
** is populated like level 0, and eventually MERGE_COUNT level 1
** segments are merged to a single level 2 segment (representing
** MERGE_COUNT^2 updates), and so on.
**
** A segment merge traverses all segments at a given level in
** parallel, performing a straightforward sorted merge.  Since segment
** leaf nodes are written into the %_segments table in order, this
** merge traverses the underlying sqlite disk structures efficiently.
** After the merge, all segment blocks from the merged level are
** deleted.
**
** MERGE_COUNT controls how often we merge segments.  16 seems to be
** somewhat of a sweet spot for insertion performance.  32 and 64 show
** very similar performance numbers to 16 on insertion, though they're
** a tiny bit slower (perhaps due to more overhead in merge-time
** sorting).  8 is about 20% slower than 16, 4 about 50% slower than
** 16, 2 about 66% slower than 16.
**
** At query time, high MERGE_COUNT increases the number of segments
** which need to be scanned and merged.  For instance, with 100k docs
** inserted:
**
**    MERGE_COUNT   segments
**       16           25
**        8           12
**        4           10
**        2            6
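**
** The segment counts in this table can be reproduced with a simple
** model in which a level's segments are merged as soon as there are
** MERGE_COUNT of them.  Under that model the number of surviving
** segments is the sum of the digits of the update count written in
** base MERGE_COUNT; for example, 100000 is 0x186a0, and
** 1+8+6+10+0 = 25.  The C sketch below is that model, not fts2's
** merge code, and skSegmentCount() is a hypothetical name:
```c
#include <assert.h>

enum { SK_MAX_LEVELS = 64 };

// Count the segments left after nUpdates single-update segments are
// written at level 0, merging nMerge segments at a level into one
// segment at the next level whenever a level fills up.
static int skSegmentCount(long nUpdates, int nMerge){
  int aCount[SK_MAX_LEVELS] = {0};   // segments currently at each level
  int iLevel, nTotal = 0;
  long i;
  assert( nMerge>=2 );
  for(i=0; i<nUpdates; i++){
    aCount[0]++;
    // Carry full levels upward, like incrementing a base-nMerge counter.
    for(iLevel=0; aCount[iLevel]==nMerge; iLevel++){
      assert( iLevel+1<SK_MAX_LEVELS );
      aCount[iLevel] = 0;
      aCount[iLevel+1]++;
    }
  }
  for(iLevel=0; iLevel<SK_MAX_LEVELS; iLevel++) nTotal += aCount[iLevel];
  return nTotal;
}
```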
**
** This appears to have only a moderate impact on queries for very
** frequent terms (which are somewhat dominated by segment merge
** costs), and infrequent and non-existent terms still seem to be fast
** even with many segments.
**
** TODO(shess) That said, it would be nice to have a better query-side
** argument for MERGE_COUNT of 16.  Also, it is possible/likely that
** optimizations to things like doclist merging will swing the sweet
** spot around.
**
**
**
**** Handling of deletions and updates ****
** Since we're using a segmented structure, with no docid-oriented
** index into the term index, we clearly cannot simply update the term
** index when a document is deleted or updated.  For deletions, we
** write an empty doclist (varint(docid) varint(POS_END)); for updates
** we simply write the new doclist.  Segment merges overwrite older
** data for a particular docid with newer data, so deletes or updates
** will eventually overtake the earlier data and knock it out.  The
** query logic likewise merges doclists so that newer data knocks out
** older data.
3075821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**
3085821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** TODO(shess) Provide a VACUUM type operation to clear out all
3095821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** deletions and duplications.  This would basically be a forced merge
3105821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** into a single segment.
3115821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
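/* Sketch (hypothetical, not part of fts2): the "newer data knocks out
** older data" rule above, modeled on two docid-sorted arrays.  Entry and
** mergeNewerWins() are illustrative names; an entry with nPos==0 stands
** in for the empty doclist written on deletion, and real doclists are
** varint-encoded byte strings, not arrays of structs.  Deletion markers
** are kept in the output so they can still knock out data in even older
** segments; presumably only a merge that produces the oldest segment (or
** a query) could drop them.
```c
#include <assert.h>
#include <stddef.h>

// Stand-in for one document's doclist inside a segment.
typedef struct Entry {
  long long iDocid;  // lists are sorted by ascending docid
  int nPos;          // number of positions; 0 marks a deletion
} Entry;

// Merge an older and a newer docid-sorted list into out[], which must
// have room for nOld+nNew entries.  When both lists contain the same
// docid, the newer entry replaces the older one.  Returns the number of
// entries written.
static size_t mergeNewerWins(const Entry *pOld, size_t nOld,
                             const Entry *pNew, size_t nNew, Entry *out){
  size_t i = 0, j = 0, n = 0;
  while( i<nOld || j<nNew ){
    if( j==nNew || (i<nOld && pOld[i].iDocid<pNew[j].iDocid) ){
      out[n++] = pOld[i++];            // docid only in the older list
    }else if( i==nOld || pNew[j].iDocid<pOld[i].iDocid ){
      out[n++] = pNew[j++];            // docid only in the newer list
    }else{
      out[n++] = pNew[j++];            // same docid: newer data wins
      i++;
    }
  }
  return n;
}
```
** With older docids {1,2,4} and newer docids {2 (deleted),3,4}, the
** result keeps docid 1 from the older list and takes docids 2 (still a
** deletion marker), 3 and 4 from the newer one.
*/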

#if !defined(SQLITE_CORE) || defined(SQLITE_ENABLE_FTS2)

#if defined(SQLITE_ENABLE_FTS2) && !defined(SQLITE_CORE)
# define SQLITE_CORE 1
#endif

#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "fts2.h"
#include "fts2_hash.h"
#include "fts2_tokenizer.h"
#include "sqlite3.h"
#ifndef SQLITE_CORE
# include "sqlite3ext.h"
  SQLITE_EXTENSION_INIT1
#endif


/* TODO(shess) MAN, this thing needs some refactoring.  At minimum, it
** would be nice to order the file better, perhaps something along the
** lines of:
**
**  - utility functions
**  - table setup functions
**  - table update functions
**  - table query functions
**
** Put the query functions last because they're likely to reference
** typedefs or functions from the table update section.
*/

#if 0
# define TRACE(A)  do{ printf A; fflush(stdout); }while(0)
#else
# define TRACE(A)
#endif

#if 0
/* Useful to set breakpoints.  See main.c sqlite3Corrupt(). */
static int fts2Corrupt(void){
  return SQLITE_CORRUPT;
}
# define SQLITE_CORRUPT_BKPT fts2Corrupt()
#else
# define SQLITE_CORRUPT_BKPT SQLITE_CORRUPT
#endif

/* It is not safe to call isspace(), tolower(), or isalnum() on
** hi-bit-set characters: where plain char is signed, such a character
** converts to a negative int, and passing a negative value other than
** EOF to the <ctype.h> functions is undefined behavior.  Their results
** for hi-bit bytes also depend on the current locale.  The ASCII-only
** helpers below are the same solution used in the tokenizer.
*/
/* TODO(shess) The snippet-generation code should be using the
** tokenizer-generated tokens rather than doing its own local
** tokenization.
*/
/* TODO(shess) Is __isascii() a portable version of (c&0x80)==0? */
static int safe_isspace(char c){
  return c==' ' || c=='\t' || c=='\n' || c=='\r' || c=='\v' || c=='\f';
}
static int safe_tolower(char c){
  return (c>='A' && c<='Z') ? (c - 'A' + 'a') : c;
}
static int safe_isalnum(char c){
  return (c>='0' && c<='9') || (c>='A' && c<='Z') || (c>='a' && c<='z');
}
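/* Sketch (hypothetical, not part of fts2): the ASCII-only approach above
** next to the usual portable <ctype.h> idiom.  asciiIsAlnum() and
** asciiToLower() are illustrative copies of safe_isalnum() and
** safe_tolower(); ctypeIsAlnum() shows the cast through unsigned char
** that avoids passing a negative value, at the cost of a locale-dependent
** answer for hi-bit bytes.  The ASCII-only versions treat every hi-bit
** byte as not alphanumeric and leave it unchanged when lowercasing.
```c
#include <assert.h>
#include <ctype.h>

// Locale-independent ASCII classification, mirroring safe_isalnum().
static int asciiIsAlnum(char c){
  return (c>='0' && c<='9') || (c>='A' && c<='Z') || (c>='a' && c<='z');
}

// Locale-independent ASCII lowercasing, mirroring safe_tolower().
static int asciiToLower(char c){
  return (c>='A' && c<='Z') ? (c - 'A' + 'a') : c;
}

// Portable use of <ctype.h>: convert through unsigned char so a hi-bit
// byte never reaches isalnum() as a negative value.  Unlike
// asciiIsAlnum(), the answer for hi-bit bytes may vary with the locale.
static int ctypeIsAlnum(char c){
  return isalnum((unsigned char)c) != 0;
}
```
*/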

typedef enum DocListType {
  DL_DOCIDS,              /* docids only */
  DL_POSITIONS,           /* docids + positions */
  DL_POSITIONS_OFFSETS    /* docids + positions + offsets */
} DocListType;

/*
** By default, only positions and not offsets are stored in the doclists.
** To change this so that offsets are stored too, compile with
**
**          -DDL_DEFAULT=DL_POSITIONS_OFFSETS
**
** If DL_DEFAULT is set to DL_DOCIDS, your table can only be inserted
** into (no deletes or updates).
*/
#ifndef DL_DEFAULT
# define DL_DEFAULT DL_POSITIONS
#endif

enum {
  POS_END = 0,        /* end of this position list */
  POS_COLUMN,         /* followed by new column number */
  POS_BASE
};
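/* Sketch (hypothetical, not part of fts2): how a DL_POSITIONS position
** list could be laid out with the markers above.  Assumed for
** illustration: each hit is written as varint(position delta within its
** column + POS_BASE), deltas restart from 0 after a column change, column
** 0 needs no POS_COLUMN marker, and the list ends with varint(POS_END).
** Adding POS_BASE keeps a zero delta (a hit at position 0) from being
** read as POS_END or POS_COLUMN.  encodePosList(), posPutVarint() and the
** SKETCH_ names are made up; the encoding code elsewhere in this file is
** authoritative.
```c
#include <assert.h>

// Same values as POS_END, POS_COLUMN and POS_BASE above.
enum { SKETCH_POS_END = 0, SKETCH_POS_COLUMN = 1, SKETCH_POS_BASE = 2 };

// Minimal varint writer (7 bits per byte, low bits first), as putVarint().
static int posPutVarint(unsigned char *p, unsigned long long v){
  int n = 0;
  do{
    p[n++] = (unsigned char)((v & 0x7f) | 0x80);
    v >>= 7;
  }while( v!=0 );
  p[n-1] &= 0x7f;
  return n;
}

// Encode nHit hits given as parallel (column, position) arrays sorted by
// column, then position.  Returns the number of bytes written to out.
static int encodePosList(const int *aCol, const int *aPos, int nHit,
                         unsigned char *out){
  int i, n = 0, iCol = 0, iPrev = 0;
  for(i=0; i<nHit; i++){
    if( aCol[i]!=iCol ){               // column switch: marker + number
      n += posPutVarint(out+n, SKETCH_POS_COLUMN);
      n += posPutVarint(out+n, (unsigned long long)aCol[i]);
      iCol = aCol[i];
      iPrev = 0;                       // assumed: deltas restart per column
    }
    n += posPutVarint(out+n,
                      (unsigned long long)(aPos[i]-iPrev+SKETCH_POS_BASE));
    iPrev = aPos[i];
  }
  n += posPutVarint(out+n, SKETCH_POS_END);
  return n;
}
```
** Hits at column 0 positions 3 and 5, then column 1 position 2, encode
** as the bytes 5 4 1 1 4 0.
*/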

/* MERGE_COUNT controls how often we merge segments (see comment at
** top of file).
*/
#define MERGE_COUNT 16
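/* Sketch (hypothetical, not part of fts2): a model of the merge policy,
** assuming the rule from the top-of-file comment that each write (flush
** of new data) adds one level-0 segment and that MERGE_COUNT segments at
** one level are merged into a single segment at the next level.  Under
** that assumption the per-level segment counts behave like the digits of
** a base-MERGE_COUNT counter.  simulateSegments() and the SKETCH_ macros
** are illustrative; the model ignores segment sizes and transactions.
```c
#include <assert.h>

#define SKETCH_MERGE_COUNT 16
#define SKETCH_MAX_LEVEL 8

// After nWrites level-0 segment writes, fill aCount[level] with the number
// of segments at each level, merging whenever a level reaches
// SKETCH_MERGE_COUNT segments.
static void simulateSegments(long nWrites, int aCount[SKETCH_MAX_LEVEL]){
  long w;
  int lvl;
  for(lvl=0; lvl<SKETCH_MAX_LEVEL; lvl++) aCount[lvl] = 0;
  for(w=0; w<nWrites; w++){
    aCount[0]++;
    // Cascade merges upward, like a carry in base SKETCH_MERGE_COUNT.
    for(lvl=0; lvl+1<SKETCH_MAX_LEVEL
               && aCount[lvl]==SKETCH_MERGE_COUNT; lvl++){
      aCount[lvl] = 0;
      aCount[lvl+1]++;
    }
  }
}
```
** In this model each level holds at most MERGE_COUNT-1 segments between
** merges, so a query would consult at most that many segments per level.
*/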

/* utility functions */

/* CLEAR() and SCRAMBLE() abstract memset() on a pointer to a single
** record to prevent errors of the form:
**
** my_function(SomeType *b){
**   memset(b, '\0', sizeof(b));  // sizeof(b)!=sizeof(*b)
** }
*/
/* TODO(shess) Obvious candidates for a header file. */
#define CLEAR(b) memset(b, '\0', sizeof(*(b)))

#ifndef NDEBUG
#  define SCRAMBLE(b) memset(b, 0x55, sizeof(*(b)))
#else
#  define SCRAMBLE(b)
#endif
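/* Sketch (hypothetical, not part of fts2): the sizeof pitfall that
** CLEAR() guards against.  Sample, clearWrong() and clearRight() are
** illustrative names, and SKETCH_CLEAR() has the same body as CLEAR().
** clearWrong() zeroes only sizeof(Sample*) bytes, leaving later fields
** untouched.  SCRAMBLE() is the debug-only counterpart: it fills a record
** with 0x55 bytes (for example in dataBufferDestroy()) so that reads of
** stale fields are easier to spot.
```c
#include <assert.h>
#include <string.h>

typedef struct Sample {
  long long a;
  long long b;
  long long c;
} Sample;

// Same shape as CLEAR(): sizeof(*(b)) is the size of the pointee.
#define SKETCH_CLEAR(b) memset((b), '\0', sizeof(*(b)))

// The mistake from the comment above: sizeof(b) is the pointer's size.
static void clearWrong(Sample *b){
  size_t n = sizeof(b);
  memset(b, '\0', n);
}

// Correct: clears the whole record.
static void clearRight(Sample *b){
  SKETCH_CLEAR(b);
}
```
*/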

/* We may need up to VARINT_MAX bytes to store an encoded 64-bit integer. */
#define VARINT_MAX 10

/* Write a 64-bit variable-length integer to memory starting at p[0].
 * The length of data written will be between 1 and VARINT_MAX bytes.
 * The number of bytes written is returned. */
static int putVarint(char *p, sqlite_int64 v){
  unsigned char *q = (unsigned char *) p;
  sqlite_uint64 vu = v;
  do{
    *q++ = (unsigned char) ((vu & 0x7f) | 0x80);
    vu >>= 7;
  }while( vu!=0 );
  q[-1] &= 0x7f;  /* turn off high bit in final byte */
  assert( q - (unsigned char *)p <= VARINT_MAX );
  return (int) (q - (unsigned char *)p);
}
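/* Sketch (hypothetical, not part of fts2): putVarint()'s byte layout on
** a few values.  sketchPutVarint() copies the algorithm with standard
** types in place of sqlite_int64.  Each byte carries 7 bits, lowest
** group first, and every byte except the last has the high bit set, so
** 300 becomes 0xac 0x02.  A negative value is encoded as its 64-bit
** unsigned pattern and therefore always takes ten bytes, which is why
** VARINT_MAX is 10.
```c
#include <assert.h>

#define SKETCH_VARINT_MAX 10

// Same algorithm as putVarint(): 7 payload bits per byte, least
// significant group first, high bit set on every byte but the last.
static int sketchPutVarint(unsigned char *q, long long v){
  unsigned long long vu = (unsigned long long)v;
  int n = 0;
  do{
    q[n++] = (unsigned char)((vu & 0x7f) | 0x80);
    vu >>= 7;
  }while( vu!=0 );
  q[n-1] &= 0x7f;  // clear the continuation bit on the final byte
  return n;
}
```
*/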

/* Read a 64-bit variable-length integer from memory starting at p[0],
 * examining at most max bytes (never more than VARINT_MAX).
 * Return the number of bytes read, or 0 on error (no terminating byte
 * within the limit).  The value is stored in *v. */
static int getVarintSafe(const char *p, sqlite_int64 *v, int max){
  const unsigned char *q = (const unsigned char *) p;
  sqlite_uint64 x = 0, y = 1;
  if( max>VARINT_MAX ) max = VARINT_MAX;
  while( max && (*q & 0x80) == 0x80 ){
    max--;
    x += y * (*q++ & 0x7f);
    y <<= 7;
  }
  if( !max ){
    assert( 0 );
    return 0;  /* tried to read too much; bad data */
  }
  x += y * (*q++);
  *v = (sqlite_int64) x;
  return (int) (q - (unsigned char *)p);
}

static int getVarint(const char *p, sqlite_int64 *v){
  return getVarintSafe(p, v, VARINT_MAX);
}

static int getVarint32Safe(const char *p, int *pi, int max){
  sqlite_int64 i;
  int ret = getVarintSafe(p, &i, max);
  if( !ret ) return ret;
  *pi = (int) i;
  assert( *pi==i );
  return ret;
}

static int getVarint32(const char *p, int *pi){
  return getVarint32Safe(p, pi, VARINT_MAX);
}
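/* Sketch (hypothetical, not part of fts2): why getVarintSafe() takes a
** byte limit.  A caller can pass the number of bytes left in its buffer,
** so a corrupt doclist whose final byte still has the high bit set yields
** 0 instead of a read past the end.  sketchGetVarintSafe() mirrors the
** algorithm without the assert( 0 ) that the original hits on bad data.
```c
#include <assert.h>

#define SKETCH_VARINT_MAX 10

// Read at most max bytes (capped at SKETCH_VARINT_MAX).  Returns the
// number of bytes consumed, or 0 if no terminating byte (high bit clear)
// appears within the limit.
static int sketchGetVarintSafe(const unsigned char *q, long long *v, int max){
  unsigned long long x = 0, y = 1;
  int n = 0;
  if( max>SKETCH_VARINT_MAX ) max = SKETCH_VARINT_MAX;
  while( max && (q[n] & 0x80)==0x80 ){
    max--;
    x += y * (q[n++] & 0x7f);
    y <<= 7;
  }
  if( !max ) return 0;  // truncated or over-long input
  x += y * q[n++];
  *v = (long long)x;
  return n;
}
```
*/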

/*******************************************************************/
/* DataBuffer is used to collect data into a buffer in piecemeal
** fashion.  It implements the usual distinction between amount of
** data currently stored (nData) and buffer capacity (nCapacity).
**
** dataBufferInit - create a buffer with given initial capacity.
** dataBufferReset - forget buffer's data, retaining capacity.
** dataBufferDestroy - free buffer's data.
** dataBufferSwap - swap contents of two buffers.
** dataBufferExpand - expand capacity without adding data.
** dataBufferAppend - append data.
** dataBufferAppend2 - append two pieces of data at once.
** dataBufferReplace - replace buffer's data.
*/
typedef struct DataBuffer {
  char *pData;          /* Pointer to malloc'ed buffer. */
  int nCapacity;        /* Size of pData buffer. */
  int nData;            /* End of data loaded into pData. */
} DataBuffer;

static void dataBufferInit(DataBuffer *pBuffer, int nCapacity){
  assert( nCapacity>=0 );
  pBuffer->nData = 0;
  pBuffer->nCapacity = nCapacity;
  pBuffer->pData = nCapacity==0 ? NULL : sqlite3_malloc(nCapacity);
}
static void dataBufferReset(DataBuffer *pBuffer){
  pBuffer->nData = 0;
}
static void dataBufferDestroy(DataBuffer *pBuffer){
  if( pBuffer->pData!=NULL ) sqlite3_free(pBuffer->pData);
  SCRAMBLE(pBuffer);
}
static void dataBufferSwap(DataBuffer *pBuffer1, DataBuffer *pBuffer2){
  DataBuffer tmp = *pBuffer1;
  *pBuffer1 = *pBuffer2;
  *pBuffer2 = tmp;
}
static void dataBufferExpand(DataBuffer *pBuffer, int nAddCapacity){
  assert( nAddCapacity>0 );
  /* TODO(shess) Consider expanding more aggressively.  Note that the
  ** underlying malloc implementation may take care of such things for
  ** us already.
  */
  if( pBuffer->nData+nAddCapacity>pBuffer->nCapacity ){
    pBuffer->nCapacity = pBuffer->nData+nAddCapacity;
    pBuffer->pData = sqlite3_realloc(pBuffer->pData, pBuffer->nCapacity);
  }
}
static void dataBufferAppend(DataBuffer *pBuffer,
                             const char *pSource, int nSource){
  assert( nSource>0 && pSource!=NULL );
  dataBufferExpand(pBuffer, nSource);
  memcpy(pBuffer->pData+pBuffer->nData, pSource, nSource);
  pBuffer->nData += nSource;
}
static void dataBufferAppend2(DataBuffer *pBuffer,
                              const char *pSource1, int nSource1,
                              const char *pSource2, int nSource2){
  assert( nSource1>0 && pSource1!=NULL );
  assert( nSource2>0 && pSource2!=NULL );
  dataBufferExpand(pBuffer, nSource1+nSource2);
  memcpy(pBuffer->pData+pBuffer->nData, pSource1, nSource1);
  memcpy(pBuffer->pData+pBuffer->nData+nSource1, pSource2, nSource2);
  pBuffer->nData += nSource1+nSource2;
}
static void dataBufferReplace(DataBuffer *pBuffer,
                              const char *pSource, int nSource){
  dataBufferReset(pBuffer);
  dataBufferAppend(pBuffer, pSource, nSource);
}
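/* Sketch (hypothetical, not part of fts2): the nData/nCapacity split
** above in a standalone buffer.  SketchBuffer and its functions are
** illustrative; they use malloc() and realloc() in place of
** sqlite3_malloc() and sqlite3_realloc() and, unlike dataBufferExpand(),
** report an allocation failure instead of overwriting pData with NULL.
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct SketchBuffer {
  char *pData;     // heap buffer, or NULL when nCapacity==0
  int nCapacity;   // bytes allocated
  int nData;       // bytes in use
} SketchBuffer;

static void sketchBufInit(SketchBuffer *p, int nCapacity){
  p->nData = 0;
  p->nCapacity = nCapacity;
  p->pData = nCapacity==0 ? NULL : malloc(nCapacity);
  if( p->pData==NULL ) p->nCapacity = 0;
}

// Grow so that nAdd more bytes fit.  Returns 0 on success, -1 on failure;
// on failure the old buffer and its contents remain valid.
static int sketchBufExpand(SketchBuffer *p, int nAdd){
  if( p->nData+nAdd>p->nCapacity ){
    char *pNew = realloc(p->pData, p->nData+nAdd);
    if( pNew==NULL ) return -1;
    p->pData = pNew;
    p->nCapacity = p->nData+nAdd;
  }
  return 0;
}

static int sketchBufAppend(SketchBuffer *p, const char *pSource, int nSource){
  if( sketchBufExpand(p, nSource) ) return -1;
  memcpy(p->pData+p->nData, pSource, nSource);
  p->nData += nSource;
  return 0;
}

static void sketchBufDestroy(SketchBuffer *p){
  free(p->pData);
  p->pData = NULL;
  p->nCapacity = p->nData = 0;
}
```
*/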

/* StringBuffer is a null-terminated version of DataBuffer. */
typedef struct StringBuffer {
  DataBuffer b;            /* Includes null terminator. */
} StringBuffer;

static void initStringBuffer(StringBuffer *sb){
  dataBufferInit(&sb->b, 100);
  dataBufferReplace(&sb->b, "", 1);
}
static int stringBufferLength(StringBuffer *sb){
  return sb->b.nData-1;
}
static char *stringBufferData(StringBuffer *sb){
  return sb->b.pData;
}
static void stringBufferDestroy(StringBuffer *sb){
  dataBufferDestroy(&sb->b);
}

static void nappend(StringBuffer *sb, const char *zFrom, int nFrom){
  assert( sb->b.nData>0 );
  if( nFrom>0 ){
    sb->b.nData--;
    dataBufferAppend2(&sb->b, zFrom, nFrom, "", 1);
  }
}
static void append(StringBuffer *sb, const char *zFrom){
  nappend(sb, zFrom, strlen(zFrom));
}

/* Append a list of strings separated by commas. */
static void appendList(StringBuffer *sb, int nString, char **azString){
  int i;
  for(i=0; i<nString; ++i){
    if( i>0 ) append(sb, ", ");
    append(sb, azString[i]);
  }
}

static int endsInWhiteSpace(StringBuffer *p){
  return stringBufferLength(p)>0 &&
    safe_isspace(stringBufferData(p)[stringBufferLength(p)-1]);
}

/* If the StringBuffer ends in something other than white space, add a
** single space character to the end.
*/
static void appendWhiteSpace(StringBuffer *p){
  if( stringBufferLength(p)==0 ) return;
  if( !endsInWhiteSpace(p) ) append(p, " ");
}

/* Remove white space from the end of the StringBuffer */
static void trimWhiteSpace(StringBuffer *p){
  while( endsInWhiteSpace(p) ){
    p->b.pData[--p->b.nData-1] = '\0';
  }
}
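/* Sketch (hypothetical, not part of fts2): the StringBuffer convention
** that nData counts the trailing '\0', shown with a fixed 64-byte array.
** SketchStr and the ss functions are illustrative.  Like nappend(), an
** append overwrites the old terminator and writes a new one; like
** appendList(), ssAppendList() separates items with ", "; and ssTrim()
** removes trailing blanks the way trimWhiteSpace() does (checking only
** ' ' for brevity).
```c
#include <assert.h>
#include <string.h>

typedef struct SketchStr {
  char a[64];
  int nData;       // string length + 1, counting the '\0'
} SketchStr;

static void ssInit(SketchStr *s){
  s->a[0] = '\0';
  s->nData = 1;
}

// Copy z over the old terminator; n+1 bytes also copies z's '\0'.
static void ssAppend(SketchStr *s, const char *z){
  int n = (int)strlen(z);
  assert( s->nData+n<=(int)sizeof(s->a) );
  memcpy(s->a+s->nData-1, z, n+1);
  s->nData += n;
}

static void ssAppendList(SketchStr *s, int n, const char **az){
  int i;
  for(i=0; i<n; i++){
    if( i>0 ) ssAppend(s, ", ");
    ssAppend(s, az[i]);
  }
}

// Overwrite trailing spaces with '\0', shrinking nData to match.
static void ssTrim(SketchStr *s){
  while( s->nData>1 && s->a[s->nData-2]==' ' ){
    s->nData--;
    s->a[s->nData-1] = '\0';
  }
}
```
*/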

/*******************************************************************/
/* DLReader is used to read document elements from a doclist.  The
** current docid is cached, so dlrDocid() is fast.  DLReader does not
** own the doclist buffer.
**
** dlrAtEnd - true if there's no more data to read.
** dlrDocid - docid of current document.
** dlrDocData - doclist data for current document (including docid).
** dlrDocDataBytes - length of same.
** dlrAllDataBytes - length of all remaining data.
** dlrPosData - position data for current document.
** dlrPosDataLen - length of pos data for current document (incl POS_END).
** dlrStep - step to the next document.
** dlrInit - initialize for a doclist of the given type and data.
** dlrDestroy - clean up.
**
** Expected usage is something like:
**
**   DLReader reader;
**   dlrInit(&reader, DL_DEFAULT, pData, nData);
**   while( !dlrAtEnd(&reader) ){
**     // calls to dlrDocid() and kin.
**     dlrStep(&reader);
**   }
**   dlrDestroy(&reader);
*/
typedef struct DLReader {
  DocListType iType;
  const char *pData;
  int nData;

  sqlite_int64 iDocid;
  int nElement;
} DLReader;

static int dlrAtEnd(DLReader *pReader){
  assert( pReader->nData>=0 );
  return pReader->nData<=0;
}
static sqlite_int64 dlrDocid(DLReader *pReader){
  assert( !dlrAtEnd(pReader) );
  return pReader->iDocid;
}
static const char *dlrDocData(DLReader *pReader){
  assert( !dlrAtEnd(pReader) );
  return pReader->pData;
}
static int dlrDocDataBytes(DLReader *pReader){
  assert( !dlrAtEnd(pReader) );
  return pReader->nElement;
}
static int dlrAllDataBytes(DLReader *pReader){
  assert( !dlrAtEnd(pReader) );
  return pReader->nData;
}
/* TODO(shess) Consider adding a field to track iDocid varint length
** to make these two functions faster.  This might matter (a tiny bit)
** for queries.
*/
static const char *dlrPosData(DLReader *pReader){
  sqlite_int64 iDummy;
  int n;
  assert( !dlrAtEnd(pReader) );
  n = getVarintSafe(pReader->pData, &iDummy, pReader->nElement);
  if( !n ) return NULL;
  return pReader->pData+n;
}
static int dlrPosDataLen(DLReader *pReader){
  sqlite_int64 iDummy;
  int n;
  assert( !dlrAtEnd(pReader) );
  n = getVarint(pReader->pData, &iDummy);
  return pReader->nElement-n;
}
static int dlrStep(DLReader *pReader){
  assert( !dlrAtEnd(pReader) );

  /* Skip past current doclist element. */
  assert( pReader->nElement<=pReader->nData );
  pReader->pData += pReader->nElement;
  pReader->nData -= pReader->nElement;

  /* If there is more data, read the next doclist element. */
  if( pReader->nData>0 ){
    sqlite_int64 iDocidDelta;
    int nTotal = 0;
    int iDummy, n = getVarintSafe(pReader->pData, &iDocidDelta, pReader->nData);
    if( !n ) return SQLITE_CORRUPT_BKPT;
    nTotal += n;
    pReader->iDocid += iDocidDelta;
    if( pReader->iType>=DL_POSITIONS ){
      while( 1 ){
        n = getVarint32Safe(pReader->pData+nTotal, &iDummy,
                            pReader->nData-nTotal);
        if( !n ) return SQLITE_CORRUPT_BKPT;
        nTotal += n;
        if( iDummy==POS_END ) break;
        if( iDummy==POS_COLUMN ){
          n = getVarint32Safe(pReader->pData+nTotal, &iDummy,
                              pReader->nData-nTotal);
          if( !n ) return SQLITE_CORRUPT_BKPT;
          nTotal += n;
        }else if( pReader->iType==DL_POSITIONS_OFFSETS ){
          n = getVarint32Safe(pReader->pData+nTotal, &iDummy,
                              pReader->nData-nTotal);
          if( !n ) return SQLITE_CORRUPT_BKPT;
          nTotal += n;
          n = getVarint32Safe(pReader->pData+nTotal, &iDummy,
                              pReader->nData-nTotal);
          if( !n ) return SQLITE_CORRUPT_BKPT;
          nTotal += n;
        }
      }
    }
    pReader->nElement = nTotal;
    assert( pReader->nElement<=pReader->nData );
  }
  return SQLITE_OK;
}
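/* Illustration only, not part of fts2: a minimal self-contained sketch
** of how dlrStep() measures one doclist element.  An element is a
** docid-delta varint followed, for DL_POSITIONS and DL_POSITIONS_OFFSETS,
** by position varints up to and including POS_END.  A POS_COLUMN marker
** carries one extra varint (the column number), and each position in a
** DL_POSITIONS_OFFSETS doclist carries two more.  Assumptions, because
** the definitions live outside this excerpt: varints are little-endian
** base-128 (7 data bits per byte, high bit set on all but the last
** byte), POS_END==0, POS_COLUMN==1 and POS_BASE==2.  The sk_* names are
** hypothetical.
**
```c
#include <assert.h>
#include <stdint.h>

// Assumed marker values and doclist types (hypothetical sk_ copies).
enum { SK_POS_END = 0, SK_POS_COLUMN = 1 };
typedef enum {
  SK_DL_DOCIDS, SK_DL_POSITIONS, SK_DL_POSITIONS_OFFSETS
} SkDocListType;

// Bounds-checked varint decoder: returns bytes consumed, or 0 if the
// varint would run past nMax bytes (or past 10 bytes).
static int sk_get_varint(const unsigned char *p, int nMax, int64_t *pVal){
  uint64_t x = 0;
  int i;
  for(i = 0; i < nMax && i < 10; i++){
    x |= (uint64_t)(p[i] & 0x7f) << (7*i);
    if( (p[i] & 0x80)==0 ){
      *pVal = (int64_t)x;
      return i+1;
    }
  }
  return 0;
}

// Byte length of the element at p (docid varint plus any position data
// through POS_END), or 0 if truncated.  *pDelta receives the docid delta.
static int sk_element_len(SkDocListType iType, const unsigned char *p,
                          int nData, int64_t *pDelta){
  int64_t v;
  int n, nExtra;
  int nTotal = sk_get_varint(p, nData, pDelta);
  if( !nTotal ) return 0;
  if( iType>=SK_DL_POSITIONS ){
    while( 1 ){
      n = sk_get_varint(p+nTotal, nData-nTotal, &v);
      if( !n ) return 0;
      nTotal += n;
      if( v==SK_POS_END ) break;
      // Column change: 1 extra varint (column number).  Position with
      // offsets: 2 extra varints.  Plain position: none.
      nExtra = v==SK_POS_COLUMN ? 1 : (iType==SK_DL_POSITIONS_OFFSETS ? 2 : 0);
      while( nExtra-- > 0 ){
        n = sk_get_varint(p+nTotal, nData-nTotal, &v);
        if( !n ) return 0;
        nTotal += n;
      }
    }
  }
  return nTotal;
}

// Walks a whole doclist, summing deltas into absolute docids.  Returns
// the number of documents, or -1 if the doclist is corrupt.
static int sk_walk_doclist(SkDocListType iType, const unsigned char *p,
                           int nData, int64_t *aDocid, int nMax){
  int64_t iDocid = 0, iDelta;
  int n, nDoc = 0;
  while( nData>0 ){
    n = sk_element_len(iType, p, nData, &iDelta);
    if( !n || nDoc>=nMax ) return -1;
    iDocid += iDelta;
    aDocid[nDoc++] = iDocid;
    p += n;
    nData -= n;
  }
  return nDoc;
}
```
**
** With SK_DL_POSITIONS, the bytes {5, 2, 5, 0, 2, 1, 1, 4, 0} hold docid
** 5 (positions 0 and 3 in column 0) followed by docid 7 (position 2 in
** column 1).  sk_walk_doclist() returns docids 5 and 7, and dropping the
** final POS_END makes it report corruption, much as dlrStep() returns
** SQLITE_CORRUPT_BKPT.
*/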
static void dlrDestroy(DLReader *pReader){
  SCRAMBLE(pReader);
}
static int dlrInit(DLReader *pReader, DocListType iType,
                   const char *pData, int nData){
  int rc;
  assert( pData!=NULL && nData!=0 );
  pReader->iType = iType;
  pReader->pData = pData;
  pReader->nData = nData;
  pReader->nElement = 0;
  pReader->iDocid = 0;

  /* Load the first element's data.  There must be a first element. */
  rc = dlrStep(pReader);
  if( rc!=SQLITE_OK ) dlrDestroy(pReader);
  return rc;
}

#ifndef NDEBUG
/* Verify (via assertions) that the doclist can be validly decoded.
** If pLastDocid is non-NULL, also store the last docid found there,
** because it is convenient in other assertions for DLWriter.
*/
static void docListValidate(DocListType iType, const char *pData, int nData,
                            sqlite_int64 *pLastDocid){
  sqlite_int64 iPrevDocid = 0;
  assert( nData>0 );
  assert( pData!=0 );
  assert( pData+nData>pData );
  while( nData!=0 ){
    sqlite_int64 iDocidDelta;
    int n = getVarint(pData, &iDocidDelta);
    iPrevDocid += iDocidDelta;
    if( iType>DL_DOCIDS ){
      int iDummy;
      while( 1 ){
        n += getVarint32(pData+n, &iDummy);
        if( iDummy==POS_END ) break;
        if( iDummy==POS_COLUMN ){
          n += getVarint32(pData+n, &iDummy);
        }else if( iType>DL_POSITIONS ){
          n += getVarint32(pData+n, &iDummy);
          n += getVarint32(pData+n, &iDummy);
        }
        assert( n<=nData );
      }
    }
    assert( n<=nData );
    pData += n;
    nData -= n;
  }
  if( pLastDocid ) *pLastDocid = iPrevDocid;
}
#define ASSERT_VALID_DOCLIST(i, p, n, o) docListValidate(i, p, n, o)
#else
#define ASSERT_VALID_DOCLIST(i, p, n, o) assert( 1 )
#endif

/*******************************************************************/
/* DLWriter is used to write doclist data to a DataBuffer.  DLWriter
** always appends to the buffer and does not own it.
**
** dlwInit - initialize to write a doclist of the given type to a buffer.
** dlwDestroy - clear the writer's memory.  Does not free buffer.
** dlwAppend - append raw doclist data to buffer, recoding its first docid.
** dlwCopy - copy the reader's current doclist element to the writer.
** dlwAdd - construct doclist element and append to buffer.
**    Only apply dlwAdd() to DL_DOCIDS doclists (else use PLWriter).
*/
typedef struct DLWriter {
  DocListType iType;
  DataBuffer *b;
  sqlite_int64 iPrevDocid;
#ifndef NDEBUG
  int has_iPrevDocid;
#endif
} DLWriter;

static void dlwInit(DLWriter *pWriter, DocListType iType, DataBuffer *b){
  pWriter->b = b;
  pWriter->iType = iType;
  pWriter->iPrevDocid = 0;
#ifndef NDEBUG
  pWriter->has_iPrevDocid = 0;
#endif
}
static void dlwDestroy(DLWriter *pWriter){
  SCRAMBLE(pWriter);
}
/* iFirstDocid is the first docid in the doclist in pData.  It is
** needed because pData may point within a larger doclist, in which
** case the first item is delta-encoded relative to a docid that is
** not part of pData.
**
** iLastDocid is the final docid in the doclist in pData.  It is
** needed to create the new iPrevDocid for future delta-encoding.  The
** code could decode the passed doclist to recreate iLastDocid, but
** the only current user (docListMerge) has already decoded this
** information.
*/
/* TODO(shess) This has become just a helper for docListMerge.
** Consider a refactor to make this cleaner.
*/
static int dlwAppend(DLWriter *pWriter,
                     const char *pData, int nData,
                     sqlite_int64 iFirstDocid, sqlite_int64 iLastDocid){
  sqlite_int64 iDocid = 0;
  char c[VARINT_MAX];
  int nFirstOld, nFirstNew;     /* Old and new varint len of first docid. */
#ifndef NDEBUG
  sqlite_int64 iLastDocidDelta;
#endif

  /* Recode the initial docid as delta from iPrevDocid. */
  nFirstOld = getVarintSafe(pData, &iDocid, nData);
  if( !nFirstOld ) return SQLITE_CORRUPT_BKPT;
  assert( nFirstOld<nData || (nFirstOld==nData && pWriter->iType==DL_DOCIDS) );
  nFirstNew = putVarint(c, iFirstDocid-pWriter->iPrevDocid);

  /* Verify that the incoming doclist is valid AND that it ends with
  ** the expected docid.  This is essential because we'll trust this
  ** docid in future delta-encoding.
  */
  ASSERT_VALID_DOCLIST(pWriter->iType, pData, nData, &iLastDocidDelta);
  assert( iLastDocid==iFirstDocid-iDocid+iLastDocidDelta );

  /* Append the recoded initial docid followed by the rest of pData.
  ** The remaining docids are already delta-encoded relative to their
  ** predecessors within pData, so they are copied unchanged.
  */
  if( nFirstOld<nData ){
    dataBufferAppend2(pWriter->b, c, nFirstNew,
                      pData+nFirstOld, nData-nFirstOld);
  }else{
    dataBufferAppend(pWriter->b, c, nFirstNew);
  }
  pWriter->iPrevDocid = iLastDocid;
  return SQLITE_OK;
}
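/* Illustration only, not part of fts2: a minimal self-contained sketch
** of the first-docid recoding that dlwAppend() performs, shown for a
** DL_DOCIDS doclist.  The slice's first varint is a delta from a docid
** outside the slice, so it is replaced by iFirstDocid minus the
** writer's previous docid, and the remaining bytes are copied as-is.
** Assumptions: the same little-endian base-128 varint layout that
** putVarint() is assumed to use (it is defined outside this excerpt),
** and a caller-sized byte array standing in for DataBuffer.  The sk_*
** names are hypothetical.
**
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Varint encoder (assumed layout): 7 data bits per byte, low bits first,
// high bit set on every byte except the last.  Returns bytes written.
static int sk_put_varint(unsigned char *p, int64_t v){
  uint64_t vu = (uint64_t)v;
  int n = 0;
  do{
    p[n++] = (unsigned char)((vu & 0x7f) | 0x80);
    vu >>= 7;
  }while( vu!=0 );
  p[n-1] &= 0x7f;   // clear the continuation bit on the final byte
  return n;
}

// Returns bytes consumed, or 0 if the varint runs past nMax bytes.
static int sk_get_varint(const unsigned char *p, int nMax, int64_t *pVal){
  uint64_t x = 0;
  int i;
  for(i = 0; i < nMax && i < 10; i++){
    x |= (uint64_t)(p[i] & 0x7f) << (7*i);
    if( (p[i] & 0x80)==0 ){
      *pVal = (int64_t)x;
      return i+1;
    }
  }
  return 0;
}

// Appends slice (pData, nData) at out[*pnOut], recoding its first docid
// relative to *piPrevDocid, then records iLastDocid for the next append.
// out needs room for nData + 10 more bytes.  Returns 0 on corrupt input.
static int sk_append_slice(unsigned char *out, int *pnOut,
                           int64_t *piPrevDocid,
                           const unsigned char *pData, int nData,
                           int64_t iFirstDocid, int64_t iLastDocid){
  int64_t iOldDelta;
  int nFirstOld = sk_get_varint(pData, nData, &iOldDelta);
  if( !nFirstOld ) return 0;
  *pnOut += sk_put_varint(out + *pnOut, iFirstDocid - *piPrevDocid);
  memcpy(out + *pnOut, pData + nFirstOld, (size_t)(nData - nFirstOld));
  *pnOut += nData - nFirstOld;
  *piPrevDocid = iLastDocid;
  return 1;
}
```
**
** For example, if the writer has already emitted docid 12 and the
** slice {5, 7} is taken from the doclist 10, 15, 22 (so iFirstDocid is
** 15 and iLastDocid is 22), the output becomes {12, 3, 7}, which still
** decodes to 12, 15, 22.
*/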
static int dlwCopy(DLWriter *pWriter, DLReader *pReader){
  return dlwAppend(pWriter, dlrDocData(pReader), dlrDocDataBytes(pReader),
                   dlrDocid(pReader), dlrDocid(pReader));
}
static void dlwAdd(DLWriter *pWriter, sqlite_int64 iDocid){
  char c[VARINT_MAX];
  int n = putVarint(c, iDocid-pWriter->iPrevDocid);

  /* Docids must ascend. */
  assert( !pWriter->has_iPrevDocid || iDocid>pWriter->iPrevDocid );
  assert( pWriter->iType==DL_DOCIDS );

  dataBufferAppend(pWriter->b, c, n);
  pWriter->iPrevDocid = iDocid;
#ifndef NDEBUG
  pWriter->has_iPrevDocid = 1;
#endif
}
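/* Illustration only, not part of fts2: a minimal self-contained sketch
** of the bytes that repeated dlwAdd() calls produce for a DL_DOCIDS
** doclist.  Each docid is written as a varint delta from the previous
** one (starting from 0), and docids must strictly ascend, which is the
** condition dlwAdd() asserts.  Assumption: the varint layout is
** little-endian base-128, as putVarint() (defined outside this excerpt)
** is assumed to produce.  The sk_* names are hypothetical.
**
```c
#include <assert.h>
#include <stdint.h>

// Varint encoder (assumed layout): 7 data bits per byte, low bits first,
// high bit set on every byte except the last.  Returns bytes written.
static int sk_put_varint(unsigned char *p, int64_t v){
  uint64_t vu = (uint64_t)v;
  int n = 0;
  do{
    p[n++] = (unsigned char)((vu & 0x7f) | 0x80);
    vu >>= 7;
  }while( vu!=0 );
  p[n-1] &= 0x7f;   // clear the continuation bit on the final byte
  return n;
}

// Encodes nDocid docids into out as delta varints.  Returns bytes
// written, or -1 if the docids do not strictly ascend.  out must hold
// up to 10 bytes per docid.
static int sk_encode_docids(const int64_t *aDocid, int nDocid,
                            unsigned char *out){
  int64_t iPrev = 0;
  int i, nOut = 0;
  for(i = 0; i < nDocid; i++){
    if( i>0 && aDocid[i]<=iPrev ) return -1;
    nOut += sk_put_varint(out + nOut, aDocid[i] - iPrev);
    iPrev = aDocid[i];
  }
  return nOut;
}
```
**
** For docids 1, 130 and 131 the deltas are 1, 129 and 1, giving the
** four bytes 0x01, 0x81 0x01, 0x01: only the middle delta needs a
** second byte.
*/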

/*******************************************************************/
/* PLReader is used to read data from a document's position list.  As
** the caller steps through the list, data is cached so that varints
** only need to be decoded once.
**
** plrInit, plrDestroy - create/destroy a reader.
** plrColumn, plrPosition, plrStartOffset, plrEndOffset - accessors
** plrAtEnd - true at end of stream; once true, only call plrDestroy.
** plrStep - step to the next element.
*/
9005821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)typedef struct PLReader {
9015821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* These refer to the next position's data.  nData will reach 0 when
9025821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  ** reading the last position, so plrStep() signals EOF by setting
9035821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  ** pData to NULL.
9045821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  */
9055821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  const char *pData;
9065821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int nData;
9075821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
9085821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  DocListType iType;
9095821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int iColumn;         /* the last column read */
9105821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int iPosition;       /* the last position read */
9115821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int iStartOffset;    /* the last start offset read */
9125821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int iEndOffset;      /* the last end offset read */
9135821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)} PLReader;
9145821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
9155821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int plrAtEnd(PLReader *pReader){
9165821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return pReader->pData==NULL;
9175821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
9185821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int plrColumn(PLReader *pReader){
9195821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  assert( !plrAtEnd(pReader) );
9205821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return pReader->iColumn;
9215821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
9225821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int plrPosition(PLReader *pReader){
9235821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  assert( !plrAtEnd(pReader) );
9245821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return pReader->iPosition;
9255821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
9265821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int plrStartOffset(PLReader *pReader){
9275821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  assert( !plrAtEnd(pReader) );
9285821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return pReader->iStartOffset;
9295821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
9305821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int plrEndOffset(PLReader *pReader){
9315821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  assert( !plrAtEnd(pReader) );
9325821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return pReader->iEndOffset;
9335821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
9345821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int plrStep(PLReader *pReader){
9355821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int i, n, nTotal = 0;
9365821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
9375821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  assert( !plrAtEnd(pReader) );
9385821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
9395821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( pReader->nData<=0 ){
9405821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    pReader->pData = NULL;
9415821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    return SQLITE_OK;
9425821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
9435821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
9445821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  n = getVarint32Safe(pReader->pData, &i, pReader->nData);
9455821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( !n ) return SQLITE_CORRUPT_BKPT;
9465821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  nTotal += n;
9475821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( i==POS_COLUMN ){
9485821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    n = getVarint32Safe(pReader->pData+nTotal, &pReader->iColumn,
9495821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                        pReader->nData-nTotal);
    if( !n ) return SQLITE_CORRUPT_BKPT;
    nTotal += n;
    pReader->iPosition = 0;
    pReader->iStartOffset = 0;
    n = getVarint32Safe(pReader->pData+nTotal, &i, pReader->nData-nTotal);
    if( !n ) return SQLITE_CORRUPT_BKPT;
    nTotal += n;
  }
  /* Should never see adjacent column changes. */
  assert( i!=POS_COLUMN );

  if( i==POS_END ){
    assert( nTotal<=pReader->nData );
    pReader->nData = 0;
    pReader->pData = NULL;
    return SQLITE_OK;
  }

  pReader->iPosition += i-POS_BASE;
  if( pReader->iType==DL_POSITIONS_OFFSETS ){
    n = getVarint32Safe(pReader->pData+nTotal, &i, pReader->nData-nTotal);
    if( !n ) return SQLITE_CORRUPT_BKPT;
    nTotal += n;
    pReader->iStartOffset += i;
    n = getVarint32Safe(pReader->pData+nTotal, &i, pReader->nData-nTotal);
    if( !n ) return SQLITE_CORRUPT_BKPT;
    nTotal += n;
    pReader->iEndOffset = pReader->iStartOffset+i;
  }
  assert( nTotal<=pReader->nData );
  pReader->pData += nTotal;
  pReader->nData -= nTotal;
  return SQLITE_OK;
}
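/* Sketch (hypothetical, not part of fts2): a self-contained decoder
** that walks a DL_POSITIONS position list the same way plrStep()
** does.  It assumes the enum values POS_END=0, POS_COLUMN=1 and
** POS_BASE=2, and a varint format that stores the low 7 bits first and
** sets the high bit on every byte except the last.  The sk1 names are
** made up for this illustration.  sk1DecodePoslist() returns the
** number of positions found, or -1 when a varint runs off the end of
** the buffer (the case plrStep() reports as SQLITE_CORRUPT_BKPT).
*/
#include <assert.h>

enum { SK1_POS_END = 0, SK1_POS_COLUMN = 1, SK1_POS_BASE = 2 };

/* Bounded read in the spirit of getVarint32Safe(): returns the number
** of bytes consumed, or 0 if the varint is not complete within nBuf
** bytes (or is longer than 5 bytes).
*/
static int sk1GetVarint32(const unsigned char *p, int nBuf, int *pVal){
  unsigned int v = 0;
  int i;
  for(i=0; i<nBuf && i<5; i++){
    v |= (unsigned int)(p[i] & 0x7f) << (7*i);
    if( (p[i] & 0x80)==0 ){
      *pVal = (int)v;
      return i+1;
    }
  }
  return 0;
}

/* Decodes into aCol[]/aPos[]; at most nMax entries are stored. */
static int sk1DecodePoslist(const unsigned char *p, int nBuf,
                            int *aCol, int *aPos, int nMax){
  int iCol = 0, iPos = 0, nOut = 0, nTotal = 0;
  assert( nMax>=0 );
  for(;;){
    int v = 0;
    int n = sk1GetVarint32(p+nTotal, nBuf-nTotal, &v);
    if( n==0 ) return -1;
    nTotal += n;
    if( v==SK1_POS_COLUMN ){
      /* Column change: read the new column, restart position deltas. */
      n = sk1GetVarint32(p+nTotal, nBuf-nTotal, &iCol);
      if( n==0 ) return -1;
      nTotal += n;
      iPos = 0;
      n = sk1GetVarint32(p+nTotal, nBuf-nTotal, &v);
      if( n==0 ) return -1;
      nTotal += n;
      if( v==SK1_POS_COLUMN ) return -1;  /* plrStep() asserts this away */
    }
    if( v==SK1_POS_END ) return nOut;
    iPos += v-SK1_POS_BASE;               /* delta from previous position */
    if( nOut<nMax ){
      aCol[nOut] = iCol;
      aPos[nOut] = iPos;
    }
    nOut++;
  }
}

/* For example, the bytes {5, 4, 1, 2, 3, 0} decode to column 0
** positions 3 and 5, then column 2 position 1, then POS_END.
*/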

static void plrDestroy(PLReader *pReader){
  SCRAMBLE(pReader);
}

static int plrInit(PLReader *pReader, DLReader *pDLReader){
  int rc;
  pReader->pData = dlrPosData(pDLReader);
  pReader->nData = dlrPosDataLen(pDLReader);
  pReader->iType = pDLReader->iType;
  pReader->iColumn = 0;
  pReader->iPosition = 0;
  pReader->iStartOffset = 0;
  pReader->iEndOffset = 0;
  rc = plrStep(pReader);
  if( rc!=SQLITE_OK ) plrDestroy(pReader);
  return rc;
}

/*******************************************************************/
/* PLWriter is used in constructing a document's position list.  As a
** convenience, if iType is DL_DOCIDS, PLWriter becomes a no-op.
** PLWriter writes to the associated DLWriter's buffer.
**
** plwInit - init for writing a document's poslist.
** plwDestroy - clear a writer.
** plwAdd - append position and offset information.
** plwCopy - copy next position's data from reader to writer.
** plwTerminate - add any necessary doclist terminator.
**
** Calling plwAdd() after plwTerminate() may result in a corrupt
** doclist.
*/
/* TODO(shess) Until we've written the second item, we can cache the
** first item's information.  Then we'd have three states:
**
** - initialized with docid, no positions.
** - docid and one position.
** - docid and multiple positions.
**
** Only the last state needs to actually write to dlw->b, which would
** be an improvement in the DLCollector case.
*/
typedef struct PLWriter {
  DLWriter *dlw;

  int iColumn;    /* the last column written */
  int iPos;       /* the last position written */
  int iOffset;    /* the last start offset written */
} PLWriter;

/* TODO(shess) In the case where the parent is reading these values
** from a PLReader, we could optimize to a copy if that PLReader has
** the same type as pWriter.
*/
static void plwAdd(PLWriter *pWriter, int iColumn, int iPos,
                   int iStartOffset, int iEndOffset){
  /* Worst-case space for POS_COLUMN, iColumn, iPosDelta,
  ** iStartOffsetDelta, and iEndOffsetDelta.
  */
  char c[5*VARINT_MAX];
  int n = 0;

  /* Ban plwAdd() after plwTerminate(). */
  assert( pWriter->iPos!=-1 );

  if( pWriter->dlw->iType==DL_DOCIDS ) return;

  if( iColumn!=pWriter->iColumn ){
    n += putVarint(c+n, POS_COLUMN);
    n += putVarint(c+n, iColumn);
    pWriter->iColumn = iColumn;
    pWriter->iPos = 0;
    pWriter->iOffset = 0;
  }
  assert( iPos>=pWriter->iPos );
  n += putVarint(c+n, POS_BASE+(iPos-pWriter->iPos));
  pWriter->iPos = iPos;
  if( pWriter->dlw->iType==DL_POSITIONS_OFFSETS ){
    assert( iStartOffset>=pWriter->iOffset );
    n += putVarint(c+n, iStartOffset-pWriter->iOffset);
    pWriter->iOffset = iStartOffset;
    assert( iEndOffset>=iStartOffset );
    n += putVarint(c+n, iEndOffset-iStartOffset);
  }
  dataBufferAppend(pWriter->dlw->b, c, n);
}
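/* Sketch (hypothetical, not part of fts2): the delta encoding that
** plwAdd() and plwTerminate() perform for a DL_POSITIONS_OFFSETS
** position list, written against a plain byte array so it can run on
** its own.  It assumes POS_END=0, POS_COLUMN=1, POS_BASE=2 and that
** putVarint() writes the low 7 bits first with a continuation bit on
** all but the last byte.  Positions and start offsets must not
** decrease within a column, as the asserts in plwAdd() require.  The
** sk2 names are made up for this illustration.
*/
#include <assert.h>

enum { SK2_POS_END = 0, SK2_POS_COLUMN = 1, SK2_POS_BASE = 2 };

static int sk2PutVarint(unsigned char *p, unsigned int v){
  int n = 0;
  do{
    p[n++] = (unsigned char)((v & 0x7f) | 0x80);
    v >>= 7;
  }while( v!=0 );
  p[n-1] &= 0x7f;          /* the last byte has no continuation bit */
  return n;
}

/* Encodes nHit hits followed by POS_END and returns the byte count.
** out needs room for 25*nHit+5 bytes in the worst case.
*/
static int sk2EncodePoslist(const int *aCol, const int *aPos,
                            const int *aStart, const int *aEnd,
                            int nHit, unsigned char *out){
  int iColumn = 0, iPos = 0, iOffset = 0, nOut = 0, k;
  for(k=0; k<nHit; k++){
    if( aCol[k]!=iColumn ){
      /* New column: emit the marker, then restart the deltas at 0. */
      nOut += sk2PutVarint(out+nOut, SK2_POS_COLUMN);
      nOut += sk2PutVarint(out+nOut, (unsigned int)aCol[k]);
      iColumn = aCol[k];
      iPos = 0;
      iOffset = 0;
    }
    assert( aPos[k]>=iPos && aStart[k]>=iOffset && aEnd[k]>=aStart[k] );
    nOut += sk2PutVarint(out+nOut, (unsigned int)(SK2_POS_BASE+aPos[k]-iPos));
    nOut += sk2PutVarint(out+nOut, (unsigned int)(aStart[k]-iOffset));
    nOut += sk2PutVarint(out+nOut, (unsigned int)(aEnd[k]-aStart[k]));
    iPos = aPos[k];
    iOffset = aStart[k];
  }
  nOut += sk2PutVarint(out+nOut, SK2_POS_END);   /* as plwTerminate() */
  return nOut;
}

/* Only the start offset is delta-encoded against the previous hit; the
** end offset is stored as a length relative to its own start offset,
** exactly as plwAdd() writes iEndOffset-iStartOffset.
*/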
static void plwCopy(PLWriter *pWriter, PLReader *pReader){
  plwAdd(pWriter, plrColumn(pReader), plrPosition(pReader),
         plrStartOffset(pReader), plrEndOffset(pReader));
}
static void plwInit(PLWriter *pWriter, DLWriter *dlw, sqlite_int64 iDocid){
  char c[VARINT_MAX];
  int n;

  pWriter->dlw = dlw;

  /* Docids must ascend. */
  assert( !pWriter->dlw->has_iPrevDocid || iDocid>pWriter->dlw->iPrevDocid );
  n = putVarint(c, iDocid-pWriter->dlw->iPrevDocid);
  dataBufferAppend(pWriter->dlw->b, c, n);
  pWriter->dlw->iPrevDocid = iDocid;
#ifndef NDEBUG
  pWriter->dlw->has_iPrevDocid = 1;
#endif

  pWriter->iColumn = 0;
  pWriter->iPos = 0;
  pWriter->iOffset = 0;
}
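/* Sketch (hypothetical, not part of fts2): plwInit() writes each docid
** as the difference from the previous docid in the same doclist, so an
** ascending run of docids becomes a run of small varints.  This version
** covers only a DL_DOCIDS doclist (no position list after each docid),
** assumes the previous docid starts at 0 (presumably what dlwInit()
** sets up; it is not shown in this section), and assumes the
** low-7-bits-first varint format.  The sk3 names are made up for this
** illustration.
*/
#include <assert.h>

static int sk3PutVarint64(unsigned char *p, unsigned long long v){
  int n = 0;
  do{
    p[n++] = (unsigned char)((v & 0x7f) | 0x80);
    v >>= 7;
  }while( v!=0 );
  p[n-1] &= 0x7f;
  return n;
}

/* Returns bytes consumed, or 0 if the varint is truncated. */
static int sk3GetVarint64(const unsigned char *p, int nBuf,
                          unsigned long long *pv){
  unsigned long long v = 0;
  int i;
  for(i=0; i<nBuf && i<10; i++){
    v |= (unsigned long long)(p[i] & 0x7f) << (7*i);
    if( (p[i] & 0x80)==0 ){
      *pv = v;
      return i+1;
    }
  }
  return 0;
}

static int sk3EncodeDocids(const long long *aDocid, int n,
                           unsigned char *out){
  long long iPrev = 0;
  int k, nOut = 0;
  for(k=0; k<n; k++){
    assert( k==0 || aDocid[k]>iPrev );          /* docids must ascend */
    nOut += sk3PutVarint64(out+nOut, (unsigned long long)(aDocid[k]-iPrev));
    iPrev = aDocid[k];
  }
  return nOut;
}

/* Returns the number of docids decoded, or -1 on a truncated varint. */
static int sk3DecodeDocids(const unsigned char *p, int nBuf,
                           long long *aDocid, int nMax){
  long long iPrev = 0;
  int nTotal = 0, nOut = 0;
  while( nTotal<nBuf && nOut<nMax ){
    unsigned long long d = 0;
    int n = sk3GetVarint64(p+nTotal, nBuf-nTotal, &d);
    if( n==0 ) return -1;
    nTotal += n;
    iPrev += (long long)d;
    aDocid[nOut++] = iPrev;
  }
  return nOut;
}

/* For example, docids 5, 7 and 300 are stored as the deltas 5, 2 and
** 293, which take four bytes in total.
*/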
/* TODO(shess) Should plwDestroy() also terminate the doclist?  But
** then plwDestroy() would no longer be just a destructor, it would
** also be doing work, which isn't consistent with the overall idiom.
** Another option would be for plwAdd() to always append any necessary
** terminator, so that the output is always correct.  But that would
** add incremental work to the common case with the only benefit being
** API elegance.  Punt for now.
*/
static void plwTerminate(PLWriter *pWriter){
  if( pWriter->dlw->iType>DL_DOCIDS ){
    char c[VARINT_MAX];
    int n = putVarint(c, POS_END);
    dataBufferAppend(pWriter->dlw->b, c, n);
  }
#ifndef NDEBUG
  /* Mark as terminated for assert in plwAdd(). */
  pWriter->iPos = -1;
#endif
}
static void plwDestroy(PLWriter *pWriter){
  SCRAMBLE(pWriter);
}

/*******************************************************************/
/* DLCollector wraps PLWriter and DLWriter to provide a
** dynamically-allocated doclist area to use during tokenization.
**
** dlcNew - malloc up and initialize a collector.
** dlcDelete - destroy a collector and all contained items.
** dlcAddPos - append position and offset information.
** dlcAddDoclist - add the collected doclist to the given buffer.
** dlcNext - terminate the current document and open another.
*/
typedef struct DLCollector {
  DataBuffer b;
  DLWriter dlw;
  PLWriter plw;
} DLCollector;

/* TODO(shess) This could also be done by calling plwTerminate() and
** dataBufferAppend().  I tried that, expecting nominal performance
** differences, but it seemed to pretty reliably be worth 1% to code
** it this way.  I suspect it is the incremental malloc overhead (some
** percentage of the plwTerminate() calls will cause a realloc), so
** this might be worth revisiting if the DataBuffer implementation
** changes.
*/
static void dlcAddDoclist(DLCollector *pCollector, DataBuffer *b){
  if( pCollector->dlw.iType>DL_DOCIDS ){
    char c[VARINT_MAX];
    int n = putVarint(c, POS_END);
    dataBufferAppend2(b, pCollector->b.pData, pCollector->b.nData, c, n);
  }else{
    dataBufferAppend(b, pCollector->b.pData, pCollector->b.nData);
  }
}
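/* Sketch (hypothetical, not part of fts2): the idea behind
** dlcAddDoclist() calling dataBufferAppend2() is that the collected
** bytes and the POS_END terminator are copied into the destination
** with a single capacity check, while the collector's own buffer is
** left untouched, so terminating the output never reallocates the
** collector.  Sk5Buffer, its doubling growth policy and the sk5 names
** are assumptions made for this illustration; the real DataBuffer is
** defined outside this section and may behave differently.
*/
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Sk5Buffer {
  unsigned char *pData;
  int nData;       /* bytes in use */
  int nCapacity;   /* bytes allocated */
} Sk5Buffer;

/* Makes room for nAdd more bytes; returns 0 on success, -1 on OOM. */
static int sk5Reserve(Sk5Buffer *b, int nAdd){
  if( b->nData+nAdd>b->nCapacity ){
    int nNew = 2*(b->nData+nAdd);
    unsigned char *p = realloc(b->pData, (size_t)nNew);
    if( p==NULL ) return -1;
    b->pData = p;
    b->nCapacity = nNew;
  }
  return 0;
}

/* Appends p1[0..n1-1] then p2[0..n2-1] after one growth check. */
static int sk5Append2(Sk5Buffer *b, const void *p1, int n1,
                      const void *p2, int n2){
  if( sk5Reserve(b, n1+n2) ) return -1;
  memcpy(b->pData+b->nData, p1, (size_t)n1);
  memcpy(b->pData+b->nData+n1, p2, (size_t)n2);
  b->nData += n1+n2;
  return 0;
}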
static void dlcNext(DLCollector *pCollector, sqlite_int64 iDocid){
  plwTerminate(&pCollector->plw);
  plwDestroy(&pCollector->plw);
  plwInit(&pCollector->plw, &pCollector->dlw, iDocid);
}
static void dlcAddPos(DLCollector *pCollector, int iColumn, int iPos,
                      int iStartOffset, int iEndOffset){
  plwAdd(&pCollector->plw, iColumn, iPos, iStartOffset, iEndOffset);
}

static DLCollector *dlcNew(sqlite_int64 iDocid, DocListType iType){
  DLCollector *pCollector = sqlite3_malloc(sizeof(DLCollector));
  dataBufferInit(&pCollector->b, 0);
  dlwInit(&pCollector->dlw, iType, &pCollector->b);
  plwInit(&pCollector->plw, &pCollector->dlw, iDocid);
  return pCollector;
}
static void dlcDelete(DLCollector *pCollector){
  plwDestroy(&pCollector->plw);
  dlwDestroy(&pCollector->dlw);
  dataBufferDestroy(&pCollector->b);
  SCRAMBLE(pCollector);
  sqlite3_free(pCollector);
}


/* Copy the doclist data of iType in pData/nData into *out, trimming
** unnecessary data as we go.  Only positions in column iColumn are
** copied; all columns are copied if iColumn is -1.  Doclist elements
** with no matching positions are dropped.  The output is an iOutType
** doclist.
*/
/* NOTE(shess) This code is only valid after all doclists are merged.
** If this is run before merges, then doclist items which represent
** deletion will be trimmed, and will thus not effect a deletion
** during the merge.
*/
static int docListTrim(DocListType iType, const char *pData, int nData,
                       int iColumn, DocListType iOutType, DataBuffer *out){
  DLReader dlReader;
  DLWriter dlWriter;
  int rc;

  assert( iOutType<=iType );

  rc = dlrInit(&dlReader, iType, pData, nData);
  if( rc!=SQLITE_OK ) return rc;
  dlwInit(&dlWriter, iOutType, out);

  while( !dlrAtEnd(&dlReader) ){
    PLReader plReader;
    PLWriter plWriter;
    int match = 0;

    rc = plrInit(&plReader, &dlReader);
    if( rc!=SQLITE_OK ) break;

    while( !plrAtEnd(&plReader) ){
      if( iColumn==-1 || plrColumn(&plReader)==iColumn ){
        if( !match ){
          plwInit(&plWriter, &dlWriter, dlrDocid(&dlReader));
          match = 1;
        }
        plwAdd(&plWriter, plrColumn(&plReader), plrPosition(&plReader),
               plrStartOffset(&plReader), plrEndOffset(&plReader));
      }
      rc = plrStep(&plReader);
      if( rc!=SQLITE_OK ){
        plrDestroy(&plReader);
        goto err;
      }
    }
    if( match ){
      plwTerminate(&plWriter);
      plwDestroy(&plWriter);
    }

    plrDestroy(&plReader);
    rc = dlrStep(&dlReader);
    if( rc!=SQLITE_OK ) break;
  }
err:
  dlwDestroy(&dlWriter);
  dlrDestroy(&dlReader);
  return rc;
}
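/* Sketch (hypothetical, not part of fts2): the filtering rule of
** docListTrim() applied to an already-decoded array of hits rather
** than to encoded bytes.  A hit is kept if iColumn is -1 or its column
** equals iColumn, and a document none of whose hits survive vanishes
** from the output, mirroring how docListTrim() only calls plwInit()
** once it sees the first matching position.  Sk6Hit and the sk6 names
** are made up for this illustration; the input is assumed to be sorted
** by docid.
*/
#include <assert.h>

typedef struct Sk6Hit {
  long long iDocid;
  int iColumn;
  int iPos;
} Sk6Hit;

/* Filters aIn[0..nIn-1] into aOut and returns the number of hits kept.
** *pnDoc receives the number of distinct docids that survived.
*/
static int sk6Trim(const Sk6Hit *aIn, int nIn, int iColumn,
                   Sk6Hit *aOut, int *pnDoc){
  int i, nOut = 0, nDoc = 0;
  for(i=0; i<nIn; i++){
    assert( i==0 || aIn[i].iDocid>=aIn[i-1].iDocid );
    if( iColumn!=-1 && aIn[i].iColumn!=iColumn ) continue;
    /* The first kept hit of a new docid starts a new output document. */
    if( nOut==0 || aOut[nOut-1].iDocid!=aIn[i].iDocid ) nDoc++;
    aOut[nOut++] = aIn[i];
  }
  *pnDoc = nDoc;
  return nOut;
}

/* Because a document with no surviving hits writes nothing at all, an
** element that presumably exists only to record a deletion would also
** vanish, which is the hazard the NOTE above describes.
*/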

/* Used by docListMerge() to keep doclists in ascending order by
** docid, then ascending order by age (so the newest comes first).
*/
typedef struct OrderedDLReader {
  DLReader *pReader;

  /* TODO(shess) If we assume that docListMerge()'s pReaders array is
  ** ordered by age (which we do), then we could use pReader
  ** comparisons to break ties.
  */
  int idx;
} OrderedDLReader;

/* Order eof to end, then by docid asc, idx desc. */
static int orderedDLReaderCmp(OrderedDLReader *r1, OrderedDLReader *r2){
  if( dlrAtEnd(r1->pReader) ){
    if( dlrAtEnd(r2->pReader) ) return 0;  /* Both atEnd(). */
    return 1;                              /* Only r1 atEnd(). */
  }
  if( dlrAtEnd(r2->pReader) ) return -1;   /* Only r2 atEnd(). */

  if( dlrDocid(r1->pReader)<dlrDocid(r2->pReader) ) return -1;
  if( dlrDocid(r1->pReader)>dlrDocid(r2->pReader) ) return 1;

  /* Descending on idx. */
  return r2->idx-r1->idx;
}
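/* Sketch (hypothetical, not part of fts2): the ordering implemented by
** orderedDLReaderCmp(), shown on a simplified reader that records only
** an end-of-data flag, the current docid and the reader's index.
** Exhausted readers sort last, live readers sort by ascending docid,
** and ties go to the larger idx, which is the newer doclist given that
** docListMerge() receives its readers oldest first.  Sk7Reader and the
** sk7 names are made up for this illustration, and qsort() is used
** only to display the resulting order; docListMerge() itself keeps its
** readers ordered with orderedDLReaderReorder().
*/
#include <assert.h>
#include <stdlib.h>

typedef struct Sk7Reader {
  int atEnd;          /* nonzero once the doclist is exhausted */
  long long iDocid;   /* current docid, ignored when atEnd is set */
  int idx;            /* index in the input array; larger is newer */
} Sk7Reader;

static int sk7Cmp(const Sk7Reader *r1, const Sk7Reader *r2){
  if( r1->atEnd ) return r2->atEnd ? 0 : 1;   /* r1 sorts after live r2 */
  if( r2->atEnd ) return -1;
  if( r1->iDocid<r2->iDocid ) return -1;
  if( r1->iDocid>r2->iDocid ) return 1;
  return r2->idx - r1->idx;   /* same docid: newer reader first */
}

/* Adapter so the comparison can drive qsort(). */
static int sk7QsortCmp(const void *a, const void *b){
  return sk7Cmp((const Sk7Reader *)a, (const Sk7Reader *)b);
}

/* The subtraction r2->idx - r1->idx cannot overflow because idx values
** are small array indexes, as in docListMerge(), where nReaders is at
** most MERGE_COUNT.
*/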

/* Bubble p[0] into its proper place in p[1..n-1].  Assumes that
** p[1..n-1] is already sorted.
*/
/* TODO(shess) Is this frequent enough to warrant a binary search?
** Before implementing that, instrument the code to check.  In most
** current usage, I expect that p[0] will be less than p[1] a very
** high proportion of the time.
*/
static void orderedDLReaderReorder(OrderedDLReader *p, int n){
  while( n>1 && orderedDLReaderCmp(p, p+1)>0 ){
    OrderedDLReader tmp = p[0];
    p[0] = p[1];
    p[1] = tmp;
    n--;
    p++;
  }
}
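/* Sketch (hypothetical, not part of fts2): the same "bubble p[0] into
** the sorted tail p[1..n-1]" step on plain ints, plus the loop that
** docListMerge() uses to sort all of its readers with it.  Starting
** from the last element and moving left, each call extends the sorted
** suffix by one element, which amounts to an insertion sort.  Once the
** array is sorted, stepping the front reader can only put p[0] out of
** place, so a single call restores the order.  The sk8 names are made
** up for this illustration.
*/
#include <assert.h>

static void sk8Reorder(int *p, int n){
  while( n>1 && p[0]>p[1] ){
    int tmp = p[0];
    p[0] = p[1];
    p[1] = tmp;
    n--;
    p++;
  }
}

static void sk8Sort(int *a, int n){
  int i = n;
  while( i-->0 ){
    sk8Reorder(a+i, n-i);   /* a[i+1..n-1] is already sorted */
  }
}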

/* Given an array of doclist readers, merge their doclist elements
** into out in sorted order (by docid), dropping elements from older
** readers when there is a duplicate docid.  pReaders is assumed to be
** ordered by age, oldest first.
*/
/* TODO(shess) nReaders must be <= MERGE_COUNT.  This should probably
** be fixed.
*/
static int docListMerge(DataBuffer *out,
                        DLReader *pReaders, int nReaders){
  OrderedDLReader readers[MERGE_COUNT];
  DLWriter writer;
  int i, n;
  const char *pStart = 0;
  int nStart = 0;
  sqlite_int64 iFirstDocid = 0, iLastDocid = 0;
  int rc = SQLITE_OK;

  assert( nReaders>0 );
  if( nReaders==1 ){
    dataBufferAppend(out, dlrDocData(pReaders), dlrAllDataBytes(pReaders));
    return SQLITE_OK;
  }

  assert( nReaders<=MERGE_COUNT );
  n = 0;
  for(i=0; i<nReaders; i++){
    assert( pReaders[i].iType==pReaders[0].iType );
    readers[i].pReader = pReaders+i;
    readers[i].idx = i;
    n += dlrAllDataBytes(&pReaders[i]);
  }
  /* Conservatively size output to sum of inputs.  Output should end
  ** up no larger than the input, and smaller whenever duplicate
  ** docids are dropped.
  */
  dataBufferExpand(out, n);

  /* Get the readers into sorted order. */
  while( i-->0 ){
    orderedDLReaderReorder(readers+i, nReaders-i);
  }

  dlwInit(&writer, pReaders[0].iType, out);
  while( !dlrAtEnd(readers[0].pReader) ){
    sqlite_int64 iDocid = dlrDocid(readers[0].pReader);

    /* If this element immediately follows the current run of data to
    ** copy, extend that run.  memcpy() seems to be more efficient when
    ** it has lots of data to copy at once.
    */
    if( dlrDocData(readers[0].pReader)==pStart+nStart ){
      nStart += dlrDocDataBytes(readers[0].pReader);
    }else{
      if( pStart!=0 ){
        rc = dlwAppend(&writer, pStart, nStart, iFirstDocid, iLastDocid);
        if( rc!=SQLITE_OK ) goto err;
      }
      pStart = dlrDocData(readers[0].pReader);
      nStart = dlrDocDataBytes(readers[0].pReader);
      iFirstDocid = iDocid;
    }
    iLastDocid = iDocid;
    rc = dlrStep(readers[0].pReader);
    if( rc!=SQLITE_OK ) goto err;

    /* Drop all of the older elements with the same docid. */
    for(i=1; i<nReaders &&
             !dlrAtEnd(readers[i].pReader) &&
             dlrDocid(readers[i].pReader)==iDocid; i++){
      rc = dlrStep(readers[i].pReader);
      if( rc!=SQLITE_OK ) goto err;
    }

    /* Get the readers back into order. */
    while( i-->0 ){
      orderedDLReaderReorder(readers+i, nReaders-i);
    }
  }

  /* Copy over any remaining elements. */
  if( nStart>0 )
    rc = dlwAppend(&writer, pStart, nStart, iFirstDocid, iLastDocid);
err:
  dlwDestroy(&writer);
  return rc;
}
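
/* Illustrative sketch, not part of fts2: the merge above, minus the
** doclist encoding.  It assumes each input is a plain array of
** (docid, value) entries sorted by docid, with inputs ordered oldest
** first as docListMerge() expects.  Entry, Run and
** kway_merge_newest_wins() are hypothetical names.  Unlike fts2, which
** keeps its OrderedDLReader array sorted, this picks the smallest docid
** with a linear scan; on a tie the newest input's entry is kept and the
** older duplicates are skipped.
**
```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-ins for a doclist element and a DLReader.
typedef struct Entry { long long docid; int value; } Entry;
typedef struct Run { const Entry *a; size_t n; size_t pos; } Run;

// Merge nRuns sorted runs (oldest first) into out, which must have room
// for the sum of all run lengths.  Returns the number of entries written.
// When several runs hold the same docid, only the newest run's entry is
// kept, mirroring the "drop older elements" rule of docListMerge().
static size_t kway_merge_newest_wins(Run *runs, int nRuns, Entry *out){
  size_t nOut = 0;
  assert(nRuns > 0);
  for(;;){
    int best = -1;
    // Find the smallest current docid.  Scanning from the newest run
    // downward with a strict '<' lets the newest run win ties.
    for(int i = nRuns - 1; i >= 0; i--){
      if( runs[i].pos >= runs[i].n ) continue;
      if( best < 0 ||
          runs[i].a[runs[i].pos].docid < runs[best].a[runs[best].pos].docid ){
        best = i;
      }
    }
    if( best < 0 ) break;  // every run is exhausted
    long long docid = runs[best].a[runs[best].pos].docid;
    out[nOut++] = runs[best].a[runs[best].pos];
    // Step every run positioned on this docid, dropping older duplicates.
    for(int i = 0; i < nRuns; i++){
      if( runs[i].pos < runs[i].n && runs[i].a[runs[i].pos].docid == docid ){
        runs[i].pos++;
      }
    }
  }
  return nOut;
}
```
**
** The linear scan costs O(nRuns) per output element, which is fine for
** a small number of inputs; docListMerge() is limited to MERGE_COUNT
** readers anyway.
*/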

/* Helper function for posListUnion().  Compares the current positions
** of left and right, returning, per the standard C idiom, <0 if
** left<right, >0 if left>right, and 0 if left==right.  A reader at
** "end" always compares greater.  DL_DOCIDS doclists carry no
** positions, so they always compare equal; offsets are compared only
** for DL_POSITIONS_OFFSETS doclists.
*/
static int posListCmp(PLReader *pLeft, PLReader *pRight){
  assert( pLeft->iType==pRight->iType );
  if( pLeft->iType==DL_DOCIDS ) return 0;

  if( plrAtEnd(pLeft) ) return plrAtEnd(pRight) ? 0 : 1;
  if( plrAtEnd(pRight) ) return -1;

  if( plrColumn(pLeft)<plrColumn(pRight) ) return -1;
  if( plrColumn(pLeft)>plrColumn(pRight) ) return 1;

  if( plrPosition(pLeft)<plrPosition(pRight) ) return -1;
  if( plrPosition(pLeft)>plrPosition(pRight) ) return 1;
  if( pLeft->iType==DL_POSITIONS ) return 0;

  if( plrStartOffset(pLeft)<plrStartOffset(pRight) ) return -1;
  if( plrStartOffset(pLeft)>plrStartOffset(pRight) ) return 1;

  if( plrEndOffset(pLeft)<plrEndOffset(pRight) ) return -1;
  if( plrEndOffset(pLeft)>plrEndOffset(pRight) ) return 1;

  return 0;
}
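
/* Illustrative sketch, not part of fts2: posListCmp() is a
** lexicographic comparison over (column, position, start offset, end
** offset), with an exhausted reader acting as a sentinel that sorts
** after everything.  PosTuple and pos_tuple_cmp() are hypothetical; a
** plain struct with an atEnd flag stands in for PLReader and
** plrAtEnd(), and withOffsets plays the role of the DL_POSITIONS versus
** DL_POSITIONS_OFFSETS check.  The DL_DOCIDS case, which always
** compares equal, is left out.
**
```c
#include <assert.h>

// Hypothetical flattened view of a PLReader's current element.
typedef struct PosTuple {
  int atEnd;             // nonzero once the reader is exhausted
  int column, position;  // always compared
  int start, end;        // byte offsets, compared only when withOffsets
} PosTuple;

// An exhausted reader compares greater than any live one and equal to
// another exhausted reader.  Live readers are ordered by column, then
// position, then (if withOffsets) start offset and end offset.
static int pos_tuple_cmp(const PosTuple *l, const PosTuple *r,
                         int withOffsets){
  assert(l && r);
  if( l->atEnd ) return r->atEnd ? 0 : 1;
  if( r->atEnd ) return -1;
  if( l->column != r->column ) return l->column < r->column ? -1 : 1;
  if( l->position != r->position ) return l->position < r->position ? -1 : 1;
  if( !withOffsets ) return 0;
  if( l->start != r->start ) return l->start < r->start ? -1 : 1;
  if( l->end != r->end ) return l->end < r->end ? -1 : 1;
  return 0;
}
```
*/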

/* Write the union of position lists in pLeft and pRight to pOut.
** Here "union" means "all unique position tuples".  This should work
** with any doclist type, though both inputs and the output must be
** the same type.
*/
static int posListUnion(DLReader *pLeft, DLReader *pRight, DLWriter *pOut){
  PLReader left, right;
  PLWriter writer;
  int rc;

  assert( dlrDocid(pLeft)==dlrDocid(pRight) );
  assert( pLeft->iType==pRight->iType );
  assert( pLeft->iType==pOut->iType );

  rc = plrInit(&left, pLeft);
  if( rc != SQLITE_OK ) return rc;
  rc = plrInit(&right, pRight);
  if( rc != SQLITE_OK ){
    plrDestroy(&left);
    return rc;
  }
  plwInit(&writer, pOut, dlrDocid(pLeft));

  while( !plrAtEnd(&left) || !plrAtEnd(&right) ){
    int c = posListCmp(&left, &right);
    if( c<0 ){
      plwCopy(&writer, &left);
      rc = plrStep(&left);
      if( rc != SQLITE_OK ) break;
    }else if( c>0 ){
      plwCopy(&writer, &right);
      rc = plrStep(&right);
      if( rc != SQLITE_OK ) break;
    }else{
      plwCopy(&writer, &left);
      rc = plrStep(&left);
      if( rc != SQLITE_OK ) break;
      rc = plrStep(&right);
      if( rc != SQLITE_OK ) break;
    }
  }

  plwTerminate(&writer);
  plwDestroy(&writer);
  plrDestroy(&left);
  plrDestroy(&right);
  return rc;
}
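
/* Illustrative sketch, not part of fts2: the loop in posListUnion() is
** the standard merge step for two sorted sequences, emitting equal
** elements once.  This version works on plain sorted arrays of a
** hypothetical Pos (column, position) pair, i.e. the DL_POSITIONS case
** without offsets, and assumes out has room for nl+nr pairs.  Pos,
** pos_cmp() and pos_list_union() are hypothetical names.
**
```c
#include <assert.h>
#include <stddef.h>

// Hypothetical (column, position) pair, sorted by column then position.
typedef struct Pos { int column, position; } Pos;

static int pos_cmp(Pos a, Pos b){
  if( a.column != b.column ) return a.column < b.column ? -1 : 1;
  if( a.position != b.position ) return a.position < b.position ? -1 : 1;
  return 0;
}

// Write the union of two sorted position lists to out, keeping each
// distinct pair once.  Returns the number of pairs written.
static size_t pos_list_union(const Pos *l, size_t nl,
                             const Pos *r, size_t nr, Pos *out){
  size_t i = 0, j = 0, n = 0;
  while( i < nl || j < nr ){
    // An exhausted side compares greater, as "end" does in posListCmp().
    int c = (i == nl) ? 1 : (j == nr) ? -1 : pos_cmp(l[i], r[j]);
    if( c < 0 ){
      out[n++] = l[i++];
    }else if( c > 0 ){
      out[n++] = r[j++];
    }else{
      out[n++] = l[i++];  // equal: emit once and advance both sides
      j++;
    }
  }
  return n;
}
```
*/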

/* Write the union of doclists in pLeft and pRight to pOut.  For
** docids in common between the inputs, the union of the position
** lists is written.  Inputs and outputs are always type DL_DEFAULT.
*/
static int docListUnion(
  const char *pLeft, int nLeft,
  const char *pRight, int nRight,
  DataBuffer *pOut      /* Write the combined doclist here */
){
  DLReader left, right;
  DLWriter writer;
  int rc;

  if( nLeft==0 ){
    if( nRight!=0 ) dataBufferAppend(pOut, pRight, nRight);
    return SQLITE_OK;
  }
  if( nRight==0 ){
    dataBufferAppend(pOut, pLeft, nLeft);
    return SQLITE_OK;
  }

  rc = dlrInit(&left, DL_DEFAULT, pLeft, nLeft);
  if( rc!=SQLITE_OK ) return rc;
  rc = dlrInit(&right, DL_DEFAULT, pRight, nRight);
  if( rc!=SQLITE_OK ){
    dlrDestroy(&left);
    return rc;
  }
  dlwInit(&writer, DL_DEFAULT, pOut);

  while( !dlrAtEnd(&left) || !dlrAtEnd(&right) ){
    if( dlrAtEnd(&right) ){
      rc = dlwCopy(&writer, &left);
      if( rc!=SQLITE_OK ) break;
      rc = dlrStep(&left);
      if( rc!=SQLITE_OK ) break;
    }else if( dlrAtEnd(&left) ){
      rc = dlwCopy(&writer, &right);
      if( rc!=SQLITE_OK ) break;
      rc = dlrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }else if( dlrDocid(&left)<dlrDocid(&right) ){
      rc = dlwCopy(&writer, &left);
      if( rc!=SQLITE_OK ) break;
      rc = dlrStep(&left);
      if( rc!=SQLITE_OK ) break;
    }else if( dlrDocid(&left)>dlrDocid(&right) ){
      rc = dlwCopy(&writer, &right);
      if( rc!=SQLITE_OK ) break;
      rc = dlrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }else{
      rc = posListUnion(&left, &right, &writer);
      if( rc!=SQLITE_OK ) break;
      rc = dlrStep(&left);
      if( rc!=SQLITE_OK ) break;
      rc = dlrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }
  }

  dlrDestroy(&left);
  dlrDestroy(&right);
  dlwDestroy(&writer);
  return rc;
}
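
/* Illustrative sketch, not part of fts2: docListUnion() applies the
** same merge at the docid level.  Docids found in only one input are
** copied through, and a docid found in both gets a single record whose
** position list is the union of the two.  To stay short, this sketch
** replaces each position list with a hypothetical bitmask of columns,
** so combining two records is a bitwise OR instead of a call to
** posListUnion().  Doc and doc_list_union() are hypothetical names.
**
```c
#include <assert.h>
#include <stddef.h>

// Hypothetical doclist element: a docid plus a bitmask of the columns
// the term occurs in (standing in for a full position list).
typedef struct Doc { long long docid; unsigned columns; } Doc;

// Union of two doclists sorted by docid.  Returns the number of records
// written; out needs room for nl+nr records.
static size_t doc_list_union(const Doc *l, size_t nl,
                             const Doc *r, size_t nr, Doc *out){
  size_t i = 0, j = 0, n = 0;
  while( i < nl || j < nr ){
    if( j == nr || (i < nl && l[i].docid < r[j].docid) ){
      out[n++] = l[i++];                 // only in left, or right is done
    }else if( i == nl || r[j].docid < l[i].docid ){
      out[n++] = r[j++];                 // only in right, or left is done
    }else{
      // Same docid on both sides: emit one record with combined payload.
      out[n].docid = l[i].docid;
      out[n].columns = l[i].columns | r[j].columns;
      n++; i++; j++;
    }
  }
  return n;
}
```
*/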

/* pLeft and pRight are DLReaders positioned to the same docid.
**
** If no position in pLeft is exactly one less than a position in
** pRight in the same column, this routine adds nothing to pOut.
**
** If there are one or more such instances, add a new document record
** to pOut.  If pOut holds positions, include the positions from pRight
** that are one more than a position in pLeft.  In other words:
** pRight.iPos==pLeft.iPos+1.
*/
static int posListPhraseMerge(DLReader *pLeft, DLReader *pRight,
                              DLWriter *pOut){
  PLReader left, right;
  PLWriter writer;
  int match = 0;
  int rc;

  assert( dlrDocid(pLeft)==dlrDocid(pRight) );
  assert( pOut->iType!=DL_POSITIONS_OFFSETS );

  rc = plrInit(&left, pLeft);
  if( rc!=SQLITE_OK ) return rc;
  rc = plrInit(&right, pRight);
  if( rc!=SQLITE_OK ){
    plrDestroy(&left);
    return rc;
  }

  while( !plrAtEnd(&left) && !plrAtEnd(&right) ){
    if( plrColumn(&left)<plrColumn(&right) ){
      rc = plrStep(&left);
      if( rc!=SQLITE_OK ) break;
    }else if( plrColumn(&left)>plrColumn(&right) ){
      rc = plrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }else if( plrPosition(&left)+1<plrPosition(&right) ){
      rc = plrStep(&left);
      if( rc!=SQLITE_OK ) break;
    }else if( plrPosition(&left)+1>plrPosition(&right) ){
      rc = plrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }else{
      if( !match ){
        plwInit(&writer, pOut, dlrDocid(pLeft));
        match = 1;
      }
      plwAdd(&writer, plrColumn(&right), plrPosition(&right), 0, 0);
      rc = plrStep(&left);
      if( rc!=SQLITE_OK ) break;
      rc = plrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }
  }

  if( match ){
    plwTerminate(&writer);
    plwDestroy(&writer);
  }

  plrDestroy(&left);
  plrDestroy(&right);
  return rc;
}
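
/* Illustrative sketch, not part of fts2: within one document,
** posListPhraseMerge() walks both position lists in step, looking for
** a left position immediately followed by a right position in the same
** column.  This version uses sorted arrays of a hypothetical Pos
** (column, position) pair; Pos and phrase_match() are hypothetical
** names.  Like the fts2 code, it keeps the right-hand position of each
** match, which is what would let a caller chain the check across the
** terms of a longer phrase.
**
```c
#include <assert.h>
#include <stddef.h>

// Hypothetical (column, position) pair; lists are sorted by column,
// then position.
typedef struct Pos { int column, position; } Pos;

// Write to out every right-hand position that directly follows a
// left-hand position in the same column (right == left + 1).  Returns
// the number of matches; zero means the document does not match.
static size_t phrase_match(const Pos *l, size_t nl,
                           const Pos *r, size_t nr, Pos *out){
  size_t i = 0, j = 0, n = 0;
  while( i < nl && j < nr ){
    if( l[i].column < r[j].column ){
      i++;
    }else if( l[i].column > r[j].column ){
      j++;
    }else if( l[i].position + 1 < r[j].position ){
      i++;                       // left is too far behind
    }else if( l[i].position + 1 > r[j].position ){
      j++;                       // right is too far behind
    }else{
      out[n++] = r[j];           // adjacent: keep the right-hand position
      i++;
      j++;
    }
  }
  return n;
}
```
*/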

/* We have two doclists with positions:  pLeft and pRight.
** Write the phrase intersection of these two doclists into pOut.
**
** A phrase intersection means that a document matches only if it
** appears in both doclists and has some pair of positions, in the
** same column, with pLeft.iPos+1==pRight.iPos.
**
** iType controls the type of data written to pOut and must not be
** DL_POSITIONS_OFFSETS.  If iType is DL_POSITIONS, the positions are
** those from pRight.
*/
static int docListPhraseMerge(
  const char *pLeft, int nLeft,
  const char *pRight, int nRight,
  DocListType iType,
  DataBuffer *pOut      /* Write the combined doclist here */
){
  DLReader left, right;
  DLWriter writer;
  int rc;

  if( nLeft==0 || nRight==0 ) return SQLITE_OK;

  assert( iType!=DL_POSITIONS_OFFSETS );

  rc = dlrInit(&left, DL_POSITIONS, pLeft, nLeft);
  if( rc!=SQLITE_OK ) return rc;
  rc = dlrInit(&right, DL_POSITIONS, pRight, nRight);
  if( rc!=SQLITE_OK ){
    dlrDestroy(&left);
    return rc;
  }
  dlwInit(&writer, iType, pOut);

  while( !dlrAtEnd(&left) && !dlrAtEnd(&right) ){
    if( dlrDocid(&left)<dlrDocid(&right) ){
      rc = dlrStep(&left);
      if( rc!=SQLITE_OK ) break;
    }else if( dlrDocid(&right)<dlrDocid(&left) ){
      rc = dlrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }else{
      rc = posListPhraseMerge(&left, &right, &writer);
      if( rc!=SQLITE_OK ) break;
      rc = dlrStep(&left);
      if( rc!=SQLITE_OK ) break;
      rc = dlrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }
  }

  dlrDestroy(&left);
  dlrDestroy(&right);
  dlwDestroy(&writer);
  return rc;
}
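
/* Illustrative sketch, not part of fts2: at the document level,
** docListPhraseMerge() is a sorted-list intersection.  It advances
** whichever reader has the smaller docid and runs the position check
** only when both readers sit on the same docid; docListAndMerge(),
** which follows, walks the lists the same way without a position
** check.  This version uses plain sorted docid arrays, and the
** per-document check is a hypothetical callback (DocMatchFn) standing
** in for posListPhraseMerge().  docid_intersect(), accept_all() and
** reject_one() are hypothetical names.
**
```c
#include <assert.h>
#include <stddef.h>

// Hypothetical per-document check.  Returns nonzero if the document
// matches.
typedef int (*DocMatchFn)(long long docid, void *ctx);

// Intersect two docid lists sorted ascending, writing to out the docids
// present in both that also pass match().  Returns the count written.
static size_t docid_intersect(const long long *l, size_t nl,
                              const long long *r, size_t nr,
                              DocMatchFn match, void *ctx, long long *out){
  size_t i = 0, j = 0, n = 0;
  while( i < nl && j < nr ){
    if( l[i] < r[j] ){
      i++;                         // advance the side that is behind
    }else if( r[j] < l[i] ){
      j++;
    }else{
      if( match(l[i], ctx) ) out[n++] = l[i];
      i++;
      j++;
    }
  }
  return n;
}

// Trivial callback: accepting every docid makes this a plain AND merge.
static int accept_all(long long docid, void *ctx){
  (void)docid;
  (void)ctx;
  return 1;
}

// Example callback: rejects the single docid that ctx points to.
static int reject_one(long long docid, void *ctx){
  return docid != *(const long long *)ctx;
}
```
**
** Passing accept_all() reduces the walk to the docid-only intersection
** that docListAndMerge() performs on DL_DOCIDS doclists.
*/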

/* We have two DL_DOCIDS doclists:  pLeft and pRight.
** Write the intersection of these two doclists into pOut as a
** DL_DOCIDS doclist.
*/
static int docListAndMerge(
  const char *pLeft, int nLeft,
  const char *pRight, int nRight,
  DataBuffer *pOut      /* Write the combined doclist here */
){
  DLReader left, right;
  DLWriter writer;
  int rc;

  if( nLeft==0 || nRight==0 ) return SQLITE_OK;

  rc = dlrInit(&left, DL_DOCIDS, pLeft, nLeft);
  if( rc!=SQLITE_OK ) return rc;
  rc = dlrInit(&right, DL_DOCIDS, pRight, nRight);
  if( rc!=SQLITE_OK ){
    dlrDestroy(&left);
    return rc;
  }
  dlwInit(&writer, DL_DOCIDS, pOut);

  while( !dlrAtEnd(&left) && !dlrAtEnd(&right) ){
    if( dlrDocid(&left)<dlrDocid(&right) ){
      rc = dlrStep(&left);
      if( rc!=SQLITE_OK ) break;
    }else if( dlrDocid(&right)<dlrDocid(&left) ){
      rc = dlrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }else{
      dlwAdd(&writer, dlrDocid(&left));
      rc = dlrStep(&left);
      if( rc!=SQLITE_OK ) break;
      rc = dlrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }
  }

  dlrDestroy(&left);
  dlrDestroy(&right);
  dlwDestroy(&writer);
  return rc;
}
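docListAndMerge() is a two-pointer intersection. The reader holding the smaller docid steps forward, and a docid is written only when both readers agree on it. The sketch below shows the same walk over plain ascending `int64_t` arrays. It assumes the doclists have already been decoded, whereas the real code reads encoded doclists through DLReader and writes through DLWriter. The helper name `sorted_intersect` is hypothetical and is not part of fts2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: intersect two strictly ascending docid arrays.
** out must have room for min(nA, nB) entries. Returns the number written. */
static size_t sorted_intersect(const int64_t *a, size_t nA,
                               const int64_t *b, size_t nB, int64_t *out){
  size_t i = 0, j = 0, n = 0;
  while( i<nA && j<nB ){
    if( a[i]<b[j] ){
      i++;               /* left docid is smaller: it cannot be in b */
    }else if( b[j]<a[i] ){
      j++;               /* right docid is smaller: it cannot be in a */
    }else{
      out[n++] = a[i];   /* docid present in both lists */
      i++;
      j++;
    }
  }
  return n;
}
```

Every iteration advances at least one index, so the merge runs in O(nA + nB) time. It stops as soon as either input is exhausted, which matches the `&&` loop condition above.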

/* We have two DL_DOCIDS doclists:  pLeft and pRight.
** Write the union of these two doclists into pOut as a
** DL_DOCIDS doclist.
*/
static int docListOrMerge(
  const char *pLeft, int nLeft,
  const char *pRight, int nRight,
  DataBuffer *pOut      /* Write the combined doclist here */
){
  DLReader left, right;
  DLWriter writer;
  int rc;

  if( nLeft==0 ){
    if( nRight!=0 ) dataBufferAppend(pOut, pRight, nRight);
    return SQLITE_OK;
  }
  if( nRight==0 ){
    dataBufferAppend(pOut, pLeft, nLeft);
    return SQLITE_OK;
  }

  rc = dlrInit(&left, DL_DOCIDS, pLeft, nLeft);
  if( rc!=SQLITE_OK ) return rc;
  rc = dlrInit(&right, DL_DOCIDS, pRight, nRight);
  if( rc!=SQLITE_OK ){
    dlrDestroy(&left);
    return rc;
  }
  dlwInit(&writer, DL_DOCIDS, pOut);

  while( !dlrAtEnd(&left) || !dlrAtEnd(&right) ){
    if( dlrAtEnd(&right) ){
      dlwAdd(&writer, dlrDocid(&left));
      rc = dlrStep(&left);
      if( rc!=SQLITE_OK ) break;
    }else if( dlrAtEnd(&left) ){
      dlwAdd(&writer, dlrDocid(&right));
      rc = dlrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }else if( dlrDocid(&left)<dlrDocid(&right) ){
      dlwAdd(&writer, dlrDocid(&left));
      rc = dlrStep(&left);
      if( rc!=SQLITE_OK ) break;
    }else if( dlrDocid(&right)<dlrDocid(&left) ){
      dlwAdd(&writer, dlrDocid(&right));
      rc = dlrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }else{
      dlwAdd(&writer, dlrDocid(&left));
      rc = dlrStep(&left);
      if( rc!=SQLITE_OK ) break;
      rc = dlrStep(&right);
      if( rc!=SQLITE_OK ) break;
    }
  }

  dlrDestroy(&left);
  dlrDestroy(&right);
  dlwDestroy(&writer);
  return rc;
}
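The union walk differs from the intersection in two ways. First, it continues until both inputs are exhausted. Second, every docid it visits is emitted, and a docid that appears in both lists is written only once. The sketch below shows this on decoded ascending arrays, under the same assumption that the doclists have already been decoded. `sorted_union` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: union of two strictly ascending docid arrays.
** out must have room for nA + nB entries. Returns the number written. */
static size_t sorted_union(const int64_t *a, size_t nA,
                           const int64_t *b, size_t nB, int64_t *out){
  size_t i = 0, j = 0, n = 0;
  while( i<nA || j<nB ){
    if( j>=nB ){
      out[n++] = a[i++];   /* right exhausted: copy the rest of left */
    }else if( i>=nA ){
      out[n++] = b[j++];   /* left exhausted: copy the rest of right */
    }else if( a[i]<b[j] ){
      out[n++] = a[i++];
    }else if( b[j]<a[i] ){
      out[n++] = b[j++];
    }else{
      out[n++] = a[i];     /* same docid on both sides: emit once */
      i++;
      j++;
    }
  }
  return n;
}
```

Because both inputs are ascending and duplicates collapse, the output is itself a valid ascending docid list.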

/* We have two DL_DOCIDS doclists:  pLeft and pRight.
** Write into pOut a DL_DOCIDS doclist containing all documents that
** occur in pLeft but not in pRight.
*/
static int docListExceptMerge(
  const char *pLeft, int nLeft,
  const char *pRight, int nRight,
  DataBuffer *pOut      /* Write the combined doclist here */
){
  DLReader left, right;
  DLWriter writer;
  int rc;

  if( nLeft==0 ) return SQLITE_OK;
  if( nRight==0 ){
    dataBufferAppend(pOut, pLeft, nLeft);
    return SQLITE_OK;
  }

  rc = dlrInit(&left, DL_DOCIDS, pLeft, nLeft);
  if( rc!=SQLITE_OK ) return rc;
  rc = dlrInit(&right, DL_DOCIDS, pRight, nRight);
  if( rc!=SQLITE_OK ){
    dlrDestroy(&left);
    return rc;
  }
  dlwInit(&writer, DL_DOCIDS, pOut);

  while( !dlrAtEnd(&left) ){
    while( !dlrAtEnd(&right) && dlrDocid(&right)<dlrDocid(&left) ){
      rc = dlrStep(&right);
      if( rc!=SQLITE_OK ) goto err;
    }
    if( dlrAtEnd(&right) || dlrDocid(&left)<dlrDocid(&right) ){
      dlwAdd(&writer, dlrDocid(&left));
    }
    rc = dlrStep(&left);
    if( rc!=SQLITE_OK ) break;
  }

err:
  dlrDestroy(&left);
  dlrDestroy(&right);
  dlwDestroy(&writer);
  return rc;
}
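docListExceptMerge() drives the loop from the left list only. Before deciding whether to keep the current left docid, it fast-forwards the right reader past every smaller docid. The sketch below shows that set difference on decoded ascending arrays. `sorted_except` is a hypothetical name, and the inputs are assumed to be already decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: docids in a that are not in b (both strictly
** ascending). out must have room for nA entries. Returns the number written. */
static size_t sorted_except(const int64_t *a, size_t nA,
                            const int64_t *b, size_t nB, int64_t *out){
  size_t i, j = 0, n = 0;
  for( i=0; i<nA; i++ ){
    /* Skip right-hand docids that are smaller than the current left one. */
    while( j<nB && b[j]<a[i] ) j++;
    /* Keep a[i] unless the right list holds the same docid. */
    if( j>=nB || a[i]<b[j] ) out[n++] = a[i];
  }
  return n;
}
```

The right index never moves backwards, so the whole pass is still linear in nA + nB.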

/* Copy the first n bytes of s into a new NUL-terminated string obtained
 * from sqlite3_malloc().  Returns NULL if the allocation fails. */
static char *string_dup_n(const char *s, int n){
  char *str = sqlite3_malloc(n + 1);
  if( str==0 ) return 0;    /* out of memory */
  memcpy(str, s, n);
  str[n] = '\0';
  return str;
}

/* Duplicate a string; the caller must sqlite3_free() the returned string.
 * (We don't use strdup() since it was not part of the ISO C standard
 * library before C23 and may not be available everywhere.) */
static char *string_dup(const char *s){
  return string_dup_n(s, strlen(s));
}

/* Format a string, replacing each occurrence of the % character with
 * zDb.zName.  This may be more convenient than sqlite3_mprintf()
 * when one string is used repeatedly in a format string.
 * The caller must sqlite3_free() the returned string.  Returns NULL
 * if memory allocation fails. */
static char *string_format(const char *zFormat,
                           const char *zDb, const char *zName){
  const char *p;
  size_t len = 0;
  size_t nDb = strlen(zDb);
  size_t nName = strlen(zName);
  size_t nFullTableName = nDb+1+nName;
  char *result;
  char *r;

  /* first compute length needed */
  for(p = zFormat ; *p ; ++p){
    len += (*p=='%' ? nFullTableName : 1);
  }
  len += 1;  /* for null terminator */

  r = result = sqlite3_malloc((int)len);
  if( result==0 ) return 0;  /* out of memory */
  for(p = zFormat; *p; ++p){
    if( *p=='%' ){
      memcpy(r, zDb, nDb);
      r += nDb;
      *r++ = '.';
      memcpy(r, zName, nName);
      r += nName;
    } else {
      *r++ = *p;
    }
  }
  *r++ = '\0';
  assert( r == result + len );
  return result;
}
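Here is a standalone version of string_format()'s two-pass algorithm that shows the substitution concretely. The first pass measures the expanded length and the second pass copies, splicing `zDb.zName` in for each `%`. To compile without SQLite, this version uses the C library's malloc()/free() instead of sqlite3_malloc()/sqlite3_free(). The name `expand_table_format` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical standalone equivalent of string_format(): every '%' in
** zFormat becomes "zDb.zName". The caller frees the result with free().
** Returns NULL if malloc() fails. */
static char *expand_table_format(const char *zFormat,
                                 const char *zDb, const char *zName){
  size_t nDb = strlen(zDb), nName = strlen(zName);
  size_t len = 1;                         /* room for the terminator */
  const char *p;
  char *result, *r;

  /* Pass 1: measure the expanded length. */
  for(p=zFormat; *p; p++){
    len += (*p=='%') ? nDb + 1 + nName : 1;
  }
  result = r = malloc(len);
  if( result==NULL ) return NULL;

  /* Pass 2: copy, substituting each '%'. */
  for(p=zFormat; *p; p++){
    if( *p=='%' ){
      memcpy(r, zDb, nDb);      r += nDb;
      *r++ = '.';
      memcpy(r, zName, nName);  r += nName;
    }else{
      *r++ = *p;
    }
  }
  *r++ = '\0';
  assert( r==result+len );                /* both passes agree on size */
  return result;
}
```

For example, the CONTENT_SELECT template `"select * from %_content where rowid = ?"` with zDb `"main"` and zName `"t"` expands to `"select * from main.t_content where rowid = ?"`. Neither this sketch nor string_format() quotes zDb or zName, so both assume those names are safe to splice into SQL verbatim.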

static int sql_exec(sqlite3 *db, const char *zDb, const char *zName,
                    const char *zFormat){
  char *zCommand = string_format(zFormat, zDb, zName);
  int rc;
  if( zCommand==0 ) return SQLITE_NOMEM;
  TRACE(("FTS2 sql: %s\n", zCommand));
  rc = sqlite3_exec(db, zCommand, NULL, 0, NULL);
  sqlite3_free(zCommand);
  return rc;
}

static int sql_prepare(sqlite3 *db, const char *zDb, const char *zName,
                       sqlite3_stmt **ppStmt, const char *zFormat){
  char *zCommand = string_format(zFormat, zDb, zName);
  int rc;
  if( zCommand==0 ) return SQLITE_NOMEM;
  TRACE(("FTS2 prepare: %s\n", zCommand));
  rc = sqlite3_prepare_v2(db, zCommand, -1, ppStmt, NULL);
  sqlite3_free(zCommand);
  return rc;
}

/* end utility functions */

/* Forward reference */
typedef struct fulltext_vtab fulltext_vtab;

/* A single term in a query is represented by an instance of
** the following structure.
*/
typedef struct QueryTerm {
  short int nPhrase; /* How many following terms are part of the same phrase */
  short int iPhrase; /* This is the i-th term of a phrase. */
  short int iColumn; /* Column of the index that must match this term */
  signed char isOr;  /* this term is preceded by "OR" */
  signed char isNot; /* this term is preceded by "-" */
  signed char isPrefix; /* this term is followed by "*" */
  char *pTerm;       /* text of the term.  '\000' terminated.  malloced */
  int nTerm;         /* Number of bytes in pTerm[] */
} QueryTerm;


/* A query string is parsed into a Query structure.
 *
 * We could, in theory, allow query strings to be complicated
 * nested expressions with precedence determined by parentheses.
 * But none of the major search engines do this.  (Perhaps the
 * feeling is that a parenthesized expression is too complex an
 * idea for the average user to grasp.)  Taking our lead from
 * the major search engines, we will allow queries to be a list
 * of terms (with an implied AND operator) or phrases in double-quotes,
 * with a single optional "-" before each non-phrase term to designate
 * negation and an optional OR connector.
 *
 * OR binds more tightly than the implied AND, which is what the
 * major search engines seem to do.  So, for example:
 *
 *    [one two OR three]     ==>    one AND (two OR three)
 *    [one OR two three]     ==>    (one OR two) AND three
 *
 * A "-" before a term matches all entries that lack that term.
 * The "-" must occur immediately before the term with no intervening
 * space.  This is how the search engines do it.
 *
 * A NOT term cannot be the right-hand operand of an OR.  If this
 * occurs in the query string, the NOT is ignored:
 *
 *    [one OR -two]          ==>    one OR two
 *
 */
typedef struct Query {
  fulltext_vtab *pFts;  /* The full text index */
  int nTerms;           /* Number of terms in the query */
  QueryTerm *pTerms;    /* Array of terms.  Space obtained from malloc() */
  int nextIsOr;         /* Set the isOr flag on the next inserted term */
  int nextColumn;       /* Next word parsed must be in this column */
  int dfltColumn;       /* The default column */
} Query;
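The sketch below illustrates the precedence rule from the Query comment. It groups a flat term list the way that comment describes. Each term without the isOr flag starts a new AND operand, and any immediately following terms flagged isOr join that operand. `DemoTerm` and `render_precedence` are hypothetical simplifications. The real QueryTerm also tracks phrases, columns, negation, and prefixes, and this sketch ignores all of those.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for QueryTerm. */
typedef struct DemoTerm {
  const char *zWord;   /* the term text */
  int isOr;            /* nonzero if the term was preceded by OR */
} DemoTerm;

/* Render n terms with OR binding tighter than the implicit AND.
** Writes a NUL-terminated string into buf (nBuf >= 1 bytes), truncating
** if it does not fit, and returns buf. */
static char *render_precedence(const DemoTerm *a, int n,
                               char *buf, size_t nBuf){
  int i = 0;
  buf[0] = '\0';
  while( i<n ){
    /* A group is one term plus every following term flagged isOr. */
    int j = i + 1;
    while( j<n && a[j].isOr ) j++;
    if( i>0 ) strncat(buf, " AND ", nBuf - strlen(buf) - 1);
    if( j-i>1 ) strncat(buf, "(", nBuf - strlen(buf) - 1);
    for(int k=i; k<j; k++){
      if( k>i ) strncat(buf, " OR ", nBuf - strlen(buf) - 1);
      strncat(buf, a[k].zWord, nBuf - strlen(buf) - 1);
    }
    if( j-i>1 ) strncat(buf, ")", nBuf - strlen(buf) - 1);
    i = j;
  }
  return buf;
}
```

With the two examples from the comment, `[one two OR three]` renders as `one AND (two OR three)` and `[one OR two three]` renders as `(one OR two) AND three`.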


/*
** An instance of the following structure keeps track of generated
** matching-word offset information and snippets.
*/
typedef struct Snippet {
  int nMatch;     /* Total number of matches */
  int nAlloc;     /* Space allocated for aMatch[] */
  struct snippetMatch { /* One entry for each matching term */
    char snStatus;       /* Status flag for use while constructing snippets */
    short int iCol;      /* The column that contains the match */
    short int iTerm;     /* The index in Query.pTerms[] of the matching term */
    short int nByte;     /* Number of bytes in the term */
    int iStart;          /* The offset to the first character of the term */
  } *aMatch;      /* Points to space obtained from malloc */
  char *zOffset;  /* Text rendering of aMatch[] */
  int nOffset;    /* strlen(zOffset) */
  char *zSnippet; /* Snippet text */
  int nSnippet;   /* strlen(zSnippet) */
} Snippet;
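The nMatch/nAlloc pair in Snippet describes a growable array. The sketch below shows one way to append to such an array with geometric growth, so that the cost of each append is amortized O(1). The types `DemoMatch`/`DemoSnippet`, the function `demo_append_match`, and the growth rule `nAlloc*2 + 10` are all assumptions for illustration. None of them is taken from fts2.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for struct snippetMatch / Snippet. */
typedef struct DemoMatch { short iCol, iTerm, nByte; int iStart; } DemoMatch;
typedef struct DemoSnippet { int nMatch, nAlloc; DemoMatch *aMatch; } DemoSnippet;

/* Append one match, growing aMatch when it is full.
** Returns 0 on success, -1 if realloc() fails (snippet left unchanged). */
static int demo_append_match(DemoSnippet *p, int iCol, int iTerm,
                             int iStart, int nByte){
  if( p->nMatch>=p->nAlloc ){
    int nNew = p->nAlloc*2 + 10;          /* assumed growth policy */
    DemoMatch *aNew = realloc(p->aMatch, nNew*sizeof(*aNew));
    if( aNew==NULL ) return -1;           /* old array is still valid */
    p->aMatch = aNew;
    p->nAlloc = nNew;
  }
  DemoMatch *m = &p->aMatch[p->nMatch++];
  m->iCol = (short)iCol;
  m->iTerm = (short)iTerm;
  m->nByte = (short)nByte;
  m->iStart = iStart;
  return 0;
}
```

The result of realloc() is assigned to a temporary first, so that an allocation failure does not leak or lose the existing matches.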


typedef enum QueryType {
  QUERY_GENERIC,   /* table scan */
  QUERY_ROWID,     /* lookup by rowid */
  QUERY_FULLTEXT   /* QUERY_FULLTEXT + [i] is a full-text search for column i*/
} QueryType;
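As the comment on QUERY_FULLTEXT says, a full-text search on column i is encoded as QUERY_FULLTEXT + i, so a single integer carries both the plan type and the column. The sketch below shows encoding and decoding under the assumption that this integer travels through an idxNum-style field, as is usual for SQLite virtual tables. The `Demo*` names are hypothetical copies of the enum above, made so the sketch compiles on its own.

```c
#include <assert.h>

/* Local copy of the QueryType values, so the sketch is self-contained. */
typedef enum DemoQueryType {
  DEMO_QUERY_GENERIC,    /* 0: table scan */
  DEMO_QUERY_ROWID,      /* 1: lookup by rowid */
  DEMO_QUERY_FULLTEXT    /* 2: full-text search on column 0; +i for column i */
} DemoQueryType;

/* Hypothetical: pack "full-text search on column iCol" into one int. */
static int demo_encode_fulltext(int iCol){
  return DEMO_QUERY_FULLTEXT + iCol;
}

/* Hypothetical: recover the column, or -1 for non-full-text plans. */
static int demo_decode_column(int idxNum){
  return idxNum>=DEMO_QUERY_FULLTEXT ? idxNum - DEMO_QUERY_FULLTEXT : -1;
}
```

This scheme works only because QUERY_FULLTEXT is the last enumerator, so every value at or above it unambiguously means "full-text".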

typedef enum fulltext_statement {
  CONTENT_INSERT_STMT,
  CONTENT_SELECT_STMT,
  CONTENT_UPDATE_STMT,
  CONTENT_DELETE_STMT,
  CONTENT_EXISTS_STMT,

  BLOCK_INSERT_STMT,
  BLOCK_SELECT_STMT,
  BLOCK_DELETE_STMT,
  BLOCK_DELETE_ALL_STMT,

  SEGDIR_MAX_INDEX_STMT,
  SEGDIR_SET_STMT,
  SEGDIR_SELECT_LEVEL_STMT,
  SEGDIR_SPAN_STMT,
  SEGDIR_DELETE_STMT,
  SEGDIR_SELECT_SEGMENT_STMT,
  SEGDIR_SELECT_ALL_STMT,
  SEGDIR_DELETE_ALL_STMT,
  SEGDIR_COUNT_STMT,

  MAX_STMT                     /* Always at end! */
} fulltext_statement;

/* These must exactly match the enum above. */
/* TODO(shess): Is there some risk that a statement will be used in two
** cursors at once, e.g.  if a query joins a virtual table to itself?
** If so perhaps we should move some of these to the cursor object.
*/
static const char *const fulltext_zStatement[MAX_STMT] = {
  /* CONTENT_INSERT */ NULL,  /* generated in contentInsertStatement() */
  /* CONTENT_SELECT */ "select * from %_content where rowid = ?",
  /* CONTENT_UPDATE */ NULL,  /* generated in contentUpdateStatement() */
  /* CONTENT_DELETE */ "delete from %_content where rowid = ?",
  /* CONTENT_EXISTS */ "select rowid from %_content limit 1",

  /* BLOCK_INSERT */ "insert into %_segments values (?)",
  /* BLOCK_SELECT */ "select block from %_segments where rowid = ?",
  /* BLOCK_DELETE */ "delete from %_segments where rowid between ? and ?",
  /* BLOCK_DELETE_ALL */ "delete from %_segments",

  /* SEGDIR_MAX_INDEX */ "select max(idx) from %_segdir where level = ?",
  /* SEGDIR_SET */ "insert into %_segdir values (?, ?, ?, ?, ?, ?)",
  /* SEGDIR_SELECT_LEVEL */
  "select start_block, leaves_end_block, root, idx from %_segdir "
  " where level = ? order by idx",
  /* SEGDIR_SPAN */
  "select min(start_block), max(end_block) from %_segdir "
  " where level = ? and start_block <> 0",
19975821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* SEGDIR_DELETE */ "delete from %_segdir where level = ?",
19985821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
19995821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* NOTE(shess): The first three results of the following two
20005821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  ** statements must match.
20015821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  */
20025821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* SEGDIR_SELECT_SEGMENT */
20035821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  "select start_block, leaves_end_block, root from %_segdir "
20045821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  " where level = ? and idx = ?",
20055821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* SEGDIR_SELECT_ALL */
20065821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  "select start_block, leaves_end_block, root from %_segdir "
20075821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  " order by level desc, idx asc",
20085821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* SEGDIR_DELETE_ALL */ "delete from %_segdir",
20095821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* SEGDIR_COUNT */ "select count(*), ifnull(max(level),0) from %_segdir",
20105821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)};
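
/* Illustrative sketch only; it is not compiled as part of fts2.
** The statement strings above refer to the shadow tables through a "%_"
** placeholder that is expanded before the statement is prepared.  The
** sketch below assumes the simplest possible expansion, replacing each
** "%_" with the virtual table name plus an underscore, so that "%_segdir"
** for a table named "t" becomes "t_segdir".  expand_table_prefix() is a
** hypothetical helper; the real expansion is presumably done by
** sql_prepare(), defined elsewhere in this file, which receives zDb and
** zName and may also quote the names and qualify them with the database.
**
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a malloc'ed copy of zFormat in which each
// "%_" is replaced by zName followed by '_'.  The caller frees the result.
// Returns NULL if the allocation fails.
char *expand_table_prefix(const char *zFormat, const char *zName){
  size_t nName = strlen(zName);
  size_t nOut = 0;
  const char *p;
  char *zOut;
  char *q;

  // First pass: compute the length of the expanded string.
  for(p=zFormat; *p; ++p){
    if( p[0]=='%' && p[1]=='_' ){
      nOut += nName + 1;   // zName plus the trailing '_'
      ++p;                 // step over the '_' of the placeholder
    }else{
      ++nOut;
    }
  }

  zOut = malloc(nOut + 1);
  if( zOut==NULL ) return NULL;

  // Second pass: copy the text, expanding each placeholder.
  for(p=zFormat, q=zOut; *p; ++p){
    if( p[0]=='%' && p[1]=='_' ){
      memcpy(q, zName, nName);
      q += nName;
      *q++ = '_';
      ++p;
    }else{
      *q++ = *p;
    }
  }
  *q = '\0';
  assert( (size_t)(q - zOut)==nOut );
  return zOut;
}
```
**
** For example, expanding "delete from %_segdir where level = ?" with the
** name "t" yields "delete from t_segdir where level = ?".
*/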

/*
** A connection to a fulltext index is an instance of the following
** structure.  The xCreate and xConnect methods create an instance
** of this structure and xDestroy and xDisconnect free that instance.
** All other methods receive a pointer to the structure as one of their
** arguments.
*/
struct fulltext_vtab {
  sqlite3_vtab base;               /* Base class used by SQLite core */
  sqlite3 *db;                     /* The database connection */
  const char *zDb;                 /* logical database name */
  const char *zName;               /* virtual table name */
  int nColumn;                     /* number of columns in virtual table */
  char **azColumn;                 /* column names.  malloced */
  char **azContentColumn;          /* column names in content table; malloced */
  sqlite3_tokenizer *pTokenizer;   /* tokenizer for inserts and queries */

  /* Precompiled statements which we keep as long as the table is
  ** open.
  */
  sqlite3_stmt *pFulltextStatements[MAX_STMT];

  /* Precompiled statements used for segment merges.  We run a
  ** separate select across the leaf level of each tree being merged.
  */
  sqlite3_stmt *pLeafSelectStmts[MERGE_COUNT];
  /* The statement used to prepare pLeafSelectStmts. */
#define LEAF_SELECT \
  "select block from %_segments where rowid between ? and ? order by rowid"

  /* These buffer pending index updates during transactions.
  ** nPendingData estimates the memory size of the pending data.  It
  ** doesn't include the hash-bucket overhead, nor any malloc
  ** overhead.  When nPendingData exceeds kPendingThreshold, the
  ** buffer is flushed even before the transaction closes.
  ** pendingTerms stores the data, and is only valid when nPendingData
  ** is >=0 (nPendingData<0 means pendingTerms has not been
  ** initialized).  iPrevDocid is the last docid written, used to make
  ** certain we're inserting in sorted order.
  */
  int nPendingData;
#define kPendingThreshold (1*1024*1024)
  sqlite_int64 iPrevDocid;
  fts2Hash pendingTerms;
};
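
/* Illustrative sketch only; it is not compiled as part of fts2.
** The pending-data fields above follow a small protocol: nPendingData<0
** means the buffer is uninitialized, the size estimate grows as terms are
** added, the buffer is flushed once the estimate exceeds kPendingThreshold,
** and iPrevDocid keeps docids within one buffer in sorted order.  The
** sketch models only that bookkeeping, with no term storage at all.
** PendingState, pending_add() and pending_flush() are hypothetical names,
** and flushing before an out-of-order docid (rather than rejecting it) is
** an assumed policy, not necessarily what fts2 does.
**
```c
#include <assert.h>

// Mirrors kPendingThreshold above.
#define SKETCH_PENDING_THRESHOLD (1*1024*1024)

// Hypothetical stand-in for the pending-data fields of fulltext_vtab.
typedef struct PendingState {
  int nPendingData;       // estimated bytes buffered; <0 means uninitialized
  long long iPrevDocid;   // last docid added to the buffer
  int nFlushes;           // number of flushes so far (for demonstration)
} PendingState;

// Stand-in for writing the buffered terms out as a new segment.
void pending_flush(PendingState *s){
  if( s->nPendingData>=0 ){
    s->nFlushes++;
    s->nPendingData = -1;   // back to "uninitialized"
  }
}

// Buffers nBytes of index data for iDocid.  Flushes first if iDocid would
// break sorted order, and afterwards if the estimate passes the threshold.
void pending_add(PendingState *s, long long iDocid, int nBytes){
  assert( nBytes>=0 );
  if( s->nPendingData>=0 && iDocid<=s->iPrevDocid ){
    pending_flush(s);       // assumed policy: never buffer out of order
  }
  if( s->nPendingData<0 ){
    s->nPendingData = 0;    // lazily initialize the buffer
  }
  s->iPrevDocid = iDocid;
  s->nPendingData += nBytes;
  if( s->nPendingData>SKETCH_PENDING_THRESHOLD ){
    pending_flush(s);
  }
}
```
**
** For example, adding docids 1 and 2 and then docid 2 again causes one
** flush before the repeated docid is buffered.
*/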

/*
** When the core wants to do a query, it creates a cursor using a
** call to xOpen.  This structure is an instance of a cursor.  It
** is destroyed by xClose.
*/
typedef struct fulltext_cursor {
  sqlite3_vtab_cursor base;        /* Base class used by SQLite core */
  QueryType iCursorType;           /* Copy of sqlite3_index_info.idxNum */
  sqlite3_stmt *pStmt;             /* Prepared statement in use by the cursor */
  int eof;                         /* True if at End Of Results */
  Query q;                         /* Parsed query string */
  Snippet snippet;                 /* Cached snippet for the current row */
  int iColumn;                     /* Column being searched */
  DataBuffer result;               /* Doclist results from fulltextQuery */
  DLReader reader;                 /* Result reader if result not empty */
} fulltext_cursor;

static struct fulltext_vtab *cursor_vtab(fulltext_cursor *c){
  return (fulltext_vtab *) c->base.pVtab;
}

static const sqlite3_module fts2Module;   /* forward declaration */

/* Return a dynamically generated statement of the form
 *   insert into %_content (rowid, ...) values (?, ...)
 */
static const char *contentInsertStatement(fulltext_vtab *v){
  StringBuffer sb;
  int i;

  initStringBuffer(&sb);
  append(&sb, "insert into %_content (rowid, ");
  appendList(&sb, v->nColumn, v->azContentColumn);
  append(&sb, ") values (?");
  for(i=0; i<v->nColumn; ++i)
    append(&sb, ", ?");
  append(&sb, ")");
  return stringBufferData(&sb);
}

/* Return a dynamically generated statement of the form
 *   update %_content set [col_0] = ?, [col_1] = ?, ...
 *                    where rowid = ?
 */
static const char *contentUpdateStatement(fulltext_vtab *v){
  StringBuffer sb;
  int i;

  initStringBuffer(&sb);
  append(&sb, "update %_content set ");
  for(i=0; i<v->nColumn; ++i) {
    if( i>0 ){
      append(&sb, ", ");
    }
    append(&sb, v->azContentColumn[i]);
    append(&sb, " = ?");
  }
  append(&sb, " where rowid = ?");
  return stringBufferData(&sb);
}
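
/* Illustrative sketch only; it is not compiled as part of fts2.
** The two builders above fix the parameter numbering that content_insert()
** and content_update() rely on: in the INSERT, parameter 1 is the rowid and
** parameter 2+i is column i; in the UPDATE, parameter 1+i is column i and
** parameter nColumn+1 is the rowid.  The sketch below rebuilds both strings
** using a hypothetical fixed-size SqlBuf in place of StringBuffer, and
** assumes appendList() joins the column names with ", ".
**
```c
#include <assert.h>
#include <string.h>

// Hypothetical fixed-size builder; fts2 uses a growable StringBuffer.
typedef struct SqlBuf { char z[512]; size_t n; } SqlBuf;

void sqlbuf_append(SqlBuf *b, const char *zText){
  size_t len = strlen(zText);
  assert( b->n + len < sizeof(b->z) );   // sketch: no growth, just check
  memcpy(b->z + b->n, zText, len + 1);   // copy including the terminator
  b->n += len;
}

// insert into %_content (rowid, c0, c1, ...) values (?, ?, ?, ...)
// Parameter 1 is the rowid; parameter 2+i is column i.
void build_insert(SqlBuf *b, int nColumn, const char **azColumn){
  int i;
  b->n = 0;
  b->z[0] = '\0';
  sqlbuf_append(b, "insert into %_content (rowid");
  for(i=0; i<nColumn; ++i){
    sqlbuf_append(b, ", ");
    sqlbuf_append(b, azColumn[i]);
  }
  sqlbuf_append(b, ") values (?");
  for(i=0; i<nColumn; ++i) sqlbuf_append(b, ", ?");
  sqlbuf_append(b, ")");
}

// update %_content set c0 = ?, c1 = ?, ... where rowid = ?
// Parameter 1+i is column i; parameter nColumn+1 is the rowid.
void build_update(SqlBuf *b, int nColumn, const char **azColumn){
  int i;
  b->n = 0;
  b->z[0] = '\0';
  sqlbuf_append(b, "update %_content set ");
  for(i=0; i<nColumn; ++i){
    if( i>0 ) sqlbuf_append(b, ", ");
    sqlbuf_append(b, azColumn[i]);
    sqlbuf_append(b, " = ?");
  }
  sqlbuf_append(b, " where rowid = ?");
}
```
**
** With columns c0 and c1 this produces
** "insert into %_content (rowid, c0, c1) values (?, ?, ?)" and
** "update %_content set c0 = ?, c1 = ? where rowid = ?".
*/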

/* Puts a freshly-prepared statement determined by iStmt in *ppStmt.
** If the indicated statement has never been prepared, it is prepared
** and cached, otherwise the cached version is reset.
*/
static int sql_get_statement(fulltext_vtab *v, fulltext_statement iStmt,
                             sqlite3_stmt **ppStmt){
  assert( iStmt<MAX_STMT );
  if( v->pFulltextStatements[iStmt]==NULL ){
    const char *zStmt;
    int rc;
    switch( iStmt ){
      case CONTENT_INSERT_STMT:
        zStmt = contentInsertStatement(v); break;
      case CONTENT_UPDATE_STMT:
        zStmt = contentUpdateStatement(v); break;
      default:
        zStmt = fulltext_zStatement[iStmt];
    }
    rc = sql_prepare(v->db, v->zDb, v->zName, &v->pFulltextStatements[iStmt],
                         zStmt);
    if( zStmt != fulltext_zStatement[iStmt]) sqlite3_free((void *) zStmt);
    if( rc!=SQLITE_OK ) return rc;
  } else {
    int rc = sqlite3_reset(v->pFulltextStatements[iStmt]);
    if( rc!=SQLITE_OK ) return rc;
  }

  *ppStmt = v->pFulltextStatements[iStmt];
  return SQLITE_OK;
}
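
/* Illustrative sketch only; it is not compiled as part of fts2.
** sql_get_statement() prepares each statement lazily on first use, keeps
** it in v->pFulltextStatements[], and on later calls only resets it, so
** each statement's SQL is compiled at most once while the table is open.
** The sketch below shows the same lazy-caching pattern without SQLite:
** FakeStmt, StmtCache, cache_get() and SKETCH_MAX_STMT are hypothetical
** stand-ins, and the calls to sql_prepare() and sqlite3_reset() are
** replaced by counters.
**
```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_STMT 3   // hypothetical number of cacheable statements

// Hypothetical stand-in for sqlite3_stmt: remembers how often it was reset.
typedef struct FakeStmt { int iStmt; int nReset; } FakeStmt;

typedef struct StmtCache {
  FakeStmt slots[SKETCH_MAX_STMT];     // storage backing the cache
  FakeStmt *apStmt[SKETCH_MAX_STMT];   // NULL until first "prepared"
  int nPrepare;                        // number of "prepares" performed
} StmtCache;

// Returns the cached statement for iStmt, "preparing" it on first use and
// "resetting" it on every later use.
FakeStmt *cache_get(StmtCache *c, int iStmt){
  assert( iStmt>=0 && iStmt<SKETCH_MAX_STMT );
  if( c->apStmt[iStmt]==NULL ){
    c->slots[iStmt].iStmt = iStmt;
    c->slots[iStmt].nReset = 0;
    c->apStmt[iStmt] = &c->slots[iStmt];
    c->nPrepare++;
  }else{
    c->apStmt[iStmt]->nReset++;
  }
  return c->apStmt[iStmt];
}
```
**
** Two calls to cache_get() with the same index return the same object; the
** first counts as a prepare and the second as a reset.
*/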

/* Like sqlite3_step(), but convert SQLITE_DONE to SQLITE_OK and
** SQLITE_ROW to SQLITE_ERROR.  Useful for statements like UPDATE,
** where we expect no results.
*/
static int sql_single_step(sqlite3_stmt *s){
  int rc = sqlite3_step(s);
  if( rc==SQLITE_DONE ) return SQLITE_OK;
  if( rc==SQLITE_ROW ) return SQLITE_ERROR;
  return rc;
}
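
/* Illustrative sketch only; it is not compiled as part of fts2.
** sql_single_step() is meant for statements that return no rows, so one
** step has three interesting outcomes: success (SQLITE_DONE), an unexpected
** row (SQLITE_ROW), and a genuine error.  The sketch below isolates the
** result-code mapping described in the comment above as a pure function so
** it can be checked without a database.  The RC_* names are hypothetical
** local constants that copy the values of SQLITE_OK, SQLITE_ERROR,
** SQLITE_BUSY, SQLITE_ROW and SQLITE_DONE from sqlite3.h.
**
```c
#include <assert.h>

// Result-code values as defined in sqlite3.h.
enum { RC_OK = 0, RC_ERROR = 1, RC_BUSY = 5, RC_ROW = 100, RC_DONE = 101 };

// Maps the result of one step of a statement that should return no rows:
// DONE means success, an unexpected ROW is an error, and anything else is
// passed through unchanged.
int single_step_result(int rc){
  if( rc==RC_DONE ) return RC_OK;
  if( rc==RC_ROW ) return RC_ERROR;
  return rc;
}
```
*/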

/* Like sql_get_statement(), but for special replicated LEAF_SELECT
** statements.  idx -1 is a special case for an uncached version of
** the statement (used in the optimize implementation).
*/
/* TODO(shess) Write version for generic statements and then share
** that between the cached-statement functions.
*/
static int sql_get_leaf_statement(fulltext_vtab *v, int idx,
                                  sqlite3_stmt **ppStmt){
  assert( idx>=-1 && idx<MERGE_COUNT );
  if( idx==-1 ){
    return sql_prepare(v->db, v->zDb, v->zName, ppStmt, LEAF_SELECT);
  }else if( v->pLeafSelectStmts[idx]==NULL ){
    int rc = sql_prepare(v->db, v->zDb, v->zName, &v->pLeafSelectStmts[idx],
                         LEAF_SELECT);
    if( rc!=SQLITE_OK ) return rc;
  }else{
    int rc = sqlite3_reset(v->pLeafSelectStmts[idx]);
    if( rc!=SQLITE_OK ) return rc;
  }

  *ppStmt = v->pLeafSelectStmts[idx];
  return SQLITE_OK;
}

/* insert into %_content (rowid, ...) values ([rowid], [pValues]) */
static int content_insert(fulltext_vtab *v, sqlite3_value *rowid,
                          sqlite3_value **pValues){
  sqlite3_stmt *s;
  int i;
  int rc = sql_get_statement(v, CONTENT_INSERT_STMT, &s);
  if( rc!=SQLITE_OK ) return rc;

  rc = sqlite3_bind_value(s, 1, rowid);
  if( rc!=SQLITE_OK ) return rc;

  for(i=0; i<v->nColumn; ++i){
    rc = sqlite3_bind_value(s, 2+i, pValues[i]);
    if( rc!=SQLITE_OK ) return rc;
  }

  return sql_single_step(s);
}

/* update %_content set col0 = pValues[0], col1 = pValues[1], ...
 *                  where rowid = [iRowid] */
static int content_update(fulltext_vtab *v, sqlite3_value **pValues,
                          sqlite_int64 iRowid){
  sqlite3_stmt *s;
  int i;
  int rc = sql_get_statement(v, CONTENT_UPDATE_STMT, &s);
  if( rc!=SQLITE_OK ) return rc;

  for(i=0; i<v->nColumn; ++i){
    rc = sqlite3_bind_value(s, 1+i, pValues[i]);
    if( rc!=SQLITE_OK ) return rc;
  }

  rc = sqlite3_bind_int64(s, 1+v->nColumn, iRowid);
  if( rc!=SQLITE_OK ) return rc;

  return sql_single_step(s);
}

static void freeStringArray(int nString, const char **pString){
  int i;

  for (i=0 ; i < nString ; ++i) {
    if( pString[i]!=NULL ) sqlite3_free((void *) pString[i]);
  }
  sqlite3_free((void *) pString);
}

/* select * from %_content where rowid = [iRow]
 * The caller must free the returned array and all strings in it (for
 * example with freeStringArray()).  SQL NULL fields are returned as NULL
 * pointers in the array.
 *
 * TODO: Perhaps we should return pointer/length strings here for consistency
 * with other code which uses pointer/length. */
static int content_select(fulltext_vtab *v, sqlite_int64 iRow,
                          const char ***pValues){
  sqlite3_stmt *s;
  const char **values;
  int i;
  int rc;

  *pValues = NULL;

  rc = sql_get_statement(v, CONTENT_SELECT_STMT, &s);
  if( rc!=SQLITE_OK ) return rc;

  rc = sqlite3_bind_int64(s, 1, iRow);
  if( rc!=SQLITE_OK ) return rc;

  rc = sqlite3_step(s);
  if( rc!=SQLITE_ROW ) return rc;

  values = (const char **) sqlite3_malloc(v->nColumn * sizeof(const char *));
  if( values==NULL ) return SQLITE_NOMEM;
  for(i=0; i<v->nColumn; ++i){
    if( sqlite3_column_type(s, i)==SQLITE_NULL ){
      values[i] = NULL;
    }else{
      values[i] = string_dup((char*)sqlite3_column_text(s, i));
    }
  }

  /* We expect only one row.  We must execute another sqlite3_step()
   * to complete the iteration; otherwise the table will remain locked. */
  rc = sqlite3_step(s);
  if( rc==SQLITE_DONE ){
    *pValues = values;
    return SQLITE_OK;
  }

  freeStringArray(v->nColumn, values);
  return rc;
}
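
/* Illustrative sketch only; it is not compiled as part of fts2.
** content_select() hands its caller an array of nColumn strings, each
** either NULL (for an SQL NULL) or a separately allocated copy, and the
** caller must release the strings and then the array, as freeStringArray()
** does.  The sketch below shows that ownership contract with malloc() and
** free() in place of sqlite3_malloc() and sqlite3_free(), and plain C
** strings in place of column values.  copy_row() and free_string_array()
** are hypothetical names.
**
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Frees each non-NULL string, then the array itself.
void free_string_array(int n, char **az){
  int i;
  if( az==NULL ) return;
  for(i=0; i<n; ++i) free(az[i]);   // free(NULL) is a no-op
  free(az);
}

// Copies a row of n nullable strings into a newly allocated array; NULL
// entries stay NULL.  On allocation failure, releases anything already
// copied and returns NULL.  The caller owns the result.
char **copy_row(int n, const char **azRow){
  int i;
  char **az = malloc(n * sizeof(char *));
  if( az==NULL ) return NULL;
  for(i=0; i<n; ++i){
    if( azRow[i]==NULL ){
      az[i] = NULL;
    }else{
      size_t len = strlen(azRow[i]);
      az[i] = malloc(len + 1);
      if( az[i]==NULL ){
        free_string_array(i, az);   // only the first i entries are set
        return NULL;
      }
      memcpy(az[i], azRow[i], len + 1);
    }
  }
  return az;
}
```
**
** Copying the row {"alpha", NULL, "gamma"} yields fresh copies of the two
** strings and a NULL middle entry; free_string_array(3, az) releases all of
** it.
*/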

/* delete from %_content where rowid = [iRow] */
static int content_delete(fulltext_vtab *v, sqlite_int64 iRow){
  sqlite3_stmt *s;
  int rc = sql_get_statement(v, CONTENT_DELETE_STMT, &s);
  if( rc!=SQLITE_OK ) return rc;

  rc = sqlite3_bind_int64(s, 1, iRow);
  if( rc!=SQLITE_OK ) return rc;

  return sql_single_step(s);
}
22895821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* Returns SQLITE_ROW if any rows exist in %_content, SQLITE_DONE if
22905821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** no rows exist, and any error in case of failure.
22915821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
22925821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int content_exists(fulltext_vtab *v){
22935821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_stmt *s;
22945821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int rc = sql_get_statement(v, CONTENT_EXISTS_STMT, &s);
22955821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
22965821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
22975821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_step(s);
22985821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_ROW ) return rc;
22995821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23005821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* We expect only one row.  We must execute another sqlite3_step()
23015821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)   * to complete the iteration; otherwise the table will remain locked. */
23025821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_step(s);
23035821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc==SQLITE_DONE ) return SQLITE_ROW;
23045821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc==SQLITE_ROW ) return SQLITE_ERROR;
23055821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return rc;
23065821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
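content_exists() uses an idiom that recurs in segdir_max_index(), segdir_span() and segdir_count(). It reads the one expected row, then steps a second time. That second step confirms no extra row exists and lets the statement run to completion. The sketch below models the idiom without SQLite. MockStmt, mock_step(), expect_at_most_one_row() and the MOCK_* codes are hypothetical stand-ins for sqlite3_stmt, sqlite3_step() and SQLITE_ROW/SQLITE_DONE/SQLITE_ERROR.

```c
/* Hypothetical stand-ins for SQLITE_ERROR, SQLITE_ROW and SQLITE_DONE. */
enum { MOCK_ERROR = 1, MOCK_ROW = 100, MOCK_DONE = 101 };

/* Hypothetical statement: yields nRows rows, then reports DONE. */
typedef struct MockStmt {
  int nRows;      /* rows the query will produce */
  int iNext;      /* rows already returned */
  int finished;   /* set once DONE has been returned */
} MockStmt;

static int mock_step(MockStmt *s){
  if( s->iNext < s->nRows ){
    s->iNext++;
    return MOCK_ROW;
  }
  s->finished = 1;
  return MOCK_DONE;
}

/* Mirrors content_exists(): ROW if exactly one row came back, DONE if
** none did, ERROR if the query unexpectedly produced a second row. */
static int expect_at_most_one_row(MockStmt *s){
  int rc = mock_step(s);
  if( rc!=MOCK_ROW ) return rc;   /* DONE (no rows) or an error */

  /* Step again so the statement runs to completion. */
  rc = mock_step(s);
  if( rc==MOCK_DONE ) return MOCK_ROW;
  if( rc==MOCK_ROW ) return MOCK_ERROR;
  return rc;
}
```

In the one-row case, the statement only reaches its finished state because of the second step.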
23075821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23085821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* insert into %_segments values ([pData])
23095821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**   returns assigned rowid in *piBlockid
23105821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
23115821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int block_insert(fulltext_vtab *v, const char *pData, int nData,
23125821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                        sqlite_int64 *piBlockid){
23135821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_stmt *s;
23145821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int rc = sql_get_statement(v, BLOCK_INSERT_STMT, &s);
23155821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
23165821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23175821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_bind_blob(s, 1, pData, nData, SQLITE_STATIC);
23185821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
23195821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23205821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_step(s);
23215821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc==SQLITE_ROW ) return SQLITE_ERROR;
23225821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_DONE ) return rc;
23235821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23245821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  *piBlockid = sqlite3_last_insert_rowid(v->db);
23255821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return SQLITE_OK;
23265821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
23275821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23285821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* delete from %_segments
23295821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**   where rowid between [iStartBlockid] and [iEndBlockid]
23305821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**
23315821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** Deletes the inclusive range of blocks [iStartBlockid, iEndBlockid];
23325821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** used to delete the blocks which form a segment.
23335821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
23345821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int block_delete(fulltext_vtab *v,
23355821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                        sqlite_int64 iStartBlockid, sqlite_int64 iEndBlockid){
23365821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_stmt *s;
23375821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int rc = sql_get_statement(v, BLOCK_DELETE_STMT, &s);
23385821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
23395821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23405821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_bind_int64(s, 1, iStartBlockid);
23415821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
23425821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23435821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_bind_int64(s, 2, iEndBlockid);
23445821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
23455821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23465821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return sql_single_step(s);
23475821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
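SQL's "between" includes both endpoints, so block_delete() removes iStartBlockid and iEndBlockid as well as every block id between them. The in-memory sketch below illustrates those bounds. It uses an array in place of the %_segments table, and delete_block_range() is a hypothetical helper, not part of fts2.

```c
#include <stddef.h>

/* Hypothetical stand-in for the %_segments table: removes every block id
** in the inclusive range [iStart, iEnd], compacting the array in place,
** and returns the number of ids that remain. */
static size_t delete_block_range(long long *aBlockid, size_t nBlock,
                                 long long iStart, long long iEnd){
  size_t i, nKeep = 0;
  for(i=0; i<nBlock; i++){
    /* Same test as "rowid between ? and ?": both bounds are inclusive. */
    if( aBlockid[i]<iStart || aBlockid[i]>iEnd ){
      aBlockid[nKeep++] = aBlockid[i];
    }
  }
  return nKeep;
}
```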
23485821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23495821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* Returns SQLITE_ROW with *pidx set to the maximum segment idx found
23505821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** at iLevel.  Returns SQLITE_DONE if there are no segments at
23515821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** iLevel.  Otherwise returns an error.
23525821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
23535821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int segdir_max_index(fulltext_vtab *v, int iLevel, int *pidx){
23545821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_stmt *s;
23555821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int rc = sql_get_statement(v, SEGDIR_MAX_INDEX_STMT, &s);
23565821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
23575821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23585821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_bind_int(s, 1, iLevel);
23595821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
23605821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23615821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_step(s);
23625821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* Should always get at least one row due to how max() works. */
23635821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc==SQLITE_DONE ) return SQLITE_DONE;
23645821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_ROW ) return rc;
23655821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23665821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* NULL means that there were no inputs to max(). */
23675821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( SQLITE_NULL==sqlite3_column_type(s, 0) ){
23685821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    rc = sqlite3_step(s);
23695821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( rc==SQLITE_ROW ) return SQLITE_ERROR;
23705821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    return rc;
23715821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
23725821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23735821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  *pidx = sqlite3_column_int(s, 0);
23745821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23755821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* We expect only one row.  We must execute another sqlite3_step()
23765821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)   * to complete the iteration; otherwise the table will remain locked. */
23775821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_step(s);
23785821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc==SQLITE_ROW ) return SQLITE_ERROR;
23795821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_DONE ) return rc;
23805821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return SQLITE_ROW;
23815821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
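segdir_max_index() relies on a property of SQL aggregates: max() over zero rows still returns one row, and that row's value is NULL. The function reports that NULL as SQLITE_DONE. The sketch below models this in memory. SegdirEntry, max_index_at_level() and the SKETCH_* codes are hypothetical stand-ins for a %_segdir row, the SEGDIR_MAX_INDEX_STMT query and SQLITE_ROW/SQLITE_DONE. This excerpt does not show the exact SQL of that statement.

```c
/* Hypothetical stand-ins for SQLITE_ROW and SQLITE_DONE. */
enum { SKETCH_ROW = 100, SKETCH_DONE = 101 };

/* Hypothetical %_segdir row, reduced to the two columns used here. */
typedef struct SegdirEntry {
  int iLevel;   /* level column */
  int idx;      /* idx column */
} SegdirEntry;

/* Models "max(idx) where level = iLevel": DONE when no row matches (the
** aggregate would be NULL), otherwise ROW with *pidx set to the maximum. */
static int max_index_at_level(const SegdirEntry *a, int n, int iLevel,
                              int *pidx){
  int i, found = 0, best = 0;
  for(i=0; i<n; i++){
    if( a[i].iLevel!=iLevel ) continue;
    if( !found || a[i].idx>best ) best = a[i].idx;
    found = 1;
  }
  if( !found ) return SKETCH_DONE;   /* max() over no rows is NULL */
  *pidx = best;
  return SKETCH_ROW;
}
```

As in the real function, *pidx is left untouched on DONE.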
23825821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23835821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* insert into %_segdir values (
23845821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**   [iLevel], [idx],
23855821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**   [iStartBlockid], [iLeavesEndBlockid], [iEndBlockid],
23865821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**   [pRootData]
23875821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** )
23885821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
23895821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int segdir_set(fulltext_vtab *v, int iLevel, int idx,
23905821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                      sqlite_int64 iStartBlockid,
23915821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                      sqlite_int64 iLeavesEndBlockid,
23925821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                      sqlite_int64 iEndBlockid,
23935821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                      const char *pRootData, int nRootData){
23945821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_stmt *s;
23955821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int rc = sql_get_statement(v, SEGDIR_SET_STMT, &s);
23965821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
23975821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
23985821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_bind_int(s, 1, iLevel);
23995821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24005821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24015821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_bind_int(s, 2, idx);
24025821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24035821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24045821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_bind_int64(s, 3, iStartBlockid);
24055821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24065821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24075821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_bind_int64(s, 4, iLeavesEndBlockid);
24085821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24095821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24105821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_bind_int64(s, 5, iEndBlockid);
24115821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24125821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24135821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_bind_blob(s, 6, pRootData, nRootData, SQLITE_STATIC);
24145821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24155821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24165821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return sql_single_step(s);
24175821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
24185821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24195821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* Queries %_segdir for the block span of the segments in level
24205821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** iLevel.  Returns SQLITE_DONE if there are no blocks for iLevel,
24215821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** SQLITE_ROW if there are blocks, else an error.
24225821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
24235821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int segdir_span(fulltext_vtab *v, int iLevel,
24245821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                       sqlite_int64 *piStartBlockid,
24255821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                       sqlite_int64 *piEndBlockid){
24265821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_stmt *s;
24275821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int rc = sql_get_statement(v, SEGDIR_SPAN_STMT, &s);
24285821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24295821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24305821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_bind_int(s, 1, iLevel);
24315821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24325821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24335821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_step(s);
24345821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc==SQLITE_DONE ) return SQLITE_DONE;  /* Should never happen */
24355821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_ROW ) return rc;
24365821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24375821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* This happens if all segments at this level are entirely inline. */
24385821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( SQLITE_NULL==sqlite3_column_type(s, 0) ){
24395821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    /* We expect only one row.  We must execute another sqlite3_step()
24405821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)     * to complete the iteration; otherwise the table will remain locked. */
24415821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    int rc2 = sqlite3_step(s);
24425821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( rc2==SQLITE_ROW ) return SQLITE_ERROR;
24435821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    return rc2;
24445821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
24455821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24465821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  *piStartBlockid = sqlite3_column_int64(s, 0);
24475821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  *piEndBlockid = sqlite3_column_int64(s, 1);
24485821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24495821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* We expect only one row.  We must execute another sqlite3_step()
24505821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)   * to complete the iteration; otherwise the table will remain locked. */
24515821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_step(s);
24525821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc==SQLITE_ROW ) return SQLITE_ERROR;
24535821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_DONE ) return rc;
24545821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return SQLITE_ROW;
24555821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
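segdir_span() reads a minimum start block and a maximum end block for a level. Both come back NULL when every segment at that level is stored entirely inline. The sketch below models that computation. It assumes an inline-only segment is marked by a start block id of 0 and is excluded from the span; this excerpt does not show the SEGDIR_SPAN_STMT query, so treat that detail as an assumption. SegSpanEntry, level_block_span() and the SPAN_* codes are hypothetical.

```c
/* Hypothetical stand-ins for SQLITE_ROW and SQLITE_DONE. */
enum { SPAN_ROW = 100, SPAN_DONE = 101 };

/* Hypothetical %_segdir row; iStartBlockid==0 is ASSUMED to mean the
** segment lives entirely in its root and owns no %_segments blocks. */
typedef struct SegSpanEntry {
  int iLevel;
  long long iStartBlockid;
  long long iEndBlockid;
} SegSpanEntry;

/* Computes the smallest start block and largest end block over the
** non-inline segments at iLevel.  Returns SPAN_DONE if there are none
** (the SQL aggregates would be NULL), else SPAN_ROW with the span set. */
static int level_block_span(const SegSpanEntry *a, int n, int iLevel,
                            long long *piStart, long long *piEnd){
  int i, found = 0;
  long long iStart = 0, iEnd = 0;
  for(i=0; i<n; i++){
    /* Skip other levels and (assumed) inline-only segments. */
    if( a[i].iLevel!=iLevel || a[i].iStartBlockid==0 ) continue;
    if( !found || a[i].iStartBlockid<iStart ) iStart = a[i].iStartBlockid;
    if( !found || a[i].iEndBlockid>iEnd ) iEnd = a[i].iEndBlockid;
    found = 1;
  }
  if( !found ) return SPAN_DONE;
  *piStart = iStart;
  *piEnd = iEnd;
  return SPAN_ROW;
}
```

segdir_delete() passes such a span to block_delete(), and on DONE it skips block deletion and removes only the directory rows.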
24565821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24575821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* Delete the segment blocks and segment directory records for all
24585821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** segments at iLevel.
24595821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
24605821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int segdir_delete(fulltext_vtab *v, int iLevel){
24615821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_stmt *s;
24625821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite_int64 iStartBlockid, iEndBlockid;
24635821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int rc = segdir_span(v, iLevel, &iStartBlockid, &iEndBlockid);
24645821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_ROW && rc!=SQLITE_DONE ) return rc;
24655821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24665821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc==SQLITE_ROW ){
24675821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    rc = block_delete(v, iStartBlockid, iEndBlockid);
24685821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( rc!=SQLITE_OK ) return rc;
24695821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
24705821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24715821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* Delete the segment directory itself. */
24725821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sql_get_statement(v, SEGDIR_DELETE_STMT, &s);
24735821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24745821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24755821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_bind_int64(s, 1, iLevel);
24765821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24775821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24785821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return sql_single_step(s);
24795821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
24805821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24815821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* Delete the entire fts index.  Returns SQLITE_OK on success, or the
24825821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** relevant error code on failure.
24835821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
24845821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int segdir_delete_all(fulltext_vtab *v){
24855821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_stmt *s;
24865821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int rc = sql_get_statement(v, SEGDIR_DELETE_ALL_STMT, &s);
24875821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24885821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24895821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sql_single_step(s);
24905821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24915821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24925821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sql_get_statement(v, BLOCK_DELETE_ALL_STMT, &s);
24935821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
24945821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24955821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return sql_single_step(s);
24965821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
24975821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
24985821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* Returns SQLITE_OK with *pnSegments set to the number of entries in
24995821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** %_segdir and *piMaxLevel set to the highest level which has a
25005821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** segment.  Otherwise returns the SQLite error which caused failure.
25015821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
25025821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int segdir_count(fulltext_vtab *v, int *pnSegments, int *piMaxLevel){
25035821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_stmt *s;
25045821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int rc = sql_get_statement(v, SEGDIR_COUNT_STMT, &s);
25055821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
25065821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
25075821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_step(s);
25085821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* TODO(shess): This case should not be possible?  Should stronger
25095821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  ** measures be taken if it happens?
25105821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  */
25115821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc==SQLITE_DONE ){
25125821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    *pnSegments = 0;
25135821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    *piMaxLevel = 0;
25145821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    return SQLITE_OK;
25155821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
25165821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_ROW ) return rc;
25175821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
25185821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  *pnSegments = sqlite3_column_int(s, 0);
25195821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  *piMaxLevel = sqlite3_column_int(s, 1);
25205821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
25215821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* We expect only one row.  We must execute another sqlite3_step()
25225821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)   * to complete the iteration; otherwise the table will remain locked. */
25235821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = sqlite3_step(s);
25245821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc==SQLITE_DONE ) return SQLITE_OK;
25255821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc==SQLITE_ROW ) return SQLITE_ERROR;
25265821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return rc;
25275821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
25285821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
25295821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* TODO(shess) clearPendingTerms() is far down the file because
25305821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** writeZeroSegment() is far down the file because LeafWriter is far
25315821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** down the file.  Consider refactoring the code to move the non-vtab
25325821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** code above the vtab code so that we don't need this forward
25335821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** reference.
25345821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
25355821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int clearPendingTerms(fulltext_vtab *v);
25365821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
25375821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/*
25385821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** Finalize cached statements, destroy the tokenizer, and free the memory used by a fulltext_vtab structure.
25395821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
25405821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static void fulltext_vtab_destroy(fulltext_vtab *v){
25415821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int iStmt, i;
25425821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
25435821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  TRACE(("FTS2 Destroy %p\n", v));
25445821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  for( iStmt=0; iStmt<MAX_STMT; iStmt++ ){
25455821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( v->pFulltextStatements[iStmt]!=NULL ){
25465821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      sqlite3_finalize(v->pFulltextStatements[iStmt]);
25475821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      v->pFulltextStatements[iStmt] = NULL;
25485821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    }
25495821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
25505821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
25515821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  for( i=0; i<MERGE_COUNT; i++ ){
25525821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( v->pLeafSelectStmts[i]!=NULL ){
25535821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      sqlite3_finalize(v->pLeafSelectStmts[i]);
25545821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      v->pLeafSelectStmts[i] = NULL;
25555821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    }
25565821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
25575821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
25585821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( v->pTokenizer!=NULL ){
25595821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    v->pTokenizer->pModule->xDestroy(v->pTokenizer);
25605821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    v->pTokenizer = NULL;
25615821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
25625821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
25635821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  clearPendingTerms(v);
25645821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
25655821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_free(v->azColumn);
25665821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  for(i = 0; i < v->nColumn; ++i) {
25675821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    sqlite3_free(v->azContentColumn[i]);
25685821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
25695821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_free(v->azContentColumn);
25705821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_free(v);
25715821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
25725821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
25735821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/*
25745821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** Token types for parsing the arguments to xConnect or xCreate.
25755821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
25765821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)#define TOKEN_EOF         0    /* End of file */
25775821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)#define TOKEN_SPACE       1    /* Any kind of whitespace */
25785821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)#define TOKEN_ID          2    /* An identifier */
25795821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)#define TOKEN_STRING      3    /* A string literal */
25805821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)#define TOKEN_PUNCT       4    /* A single punctuation character */
25815821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
25825821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/*
25835821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** If X is a character that can be used in an identifier then
25845821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** IdChar(X) will be true.  Otherwise it is false.
25855821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**
25865821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** Any character with the high-order bit set (a non-ASCII byte) is
25875821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** allowed in an identifier.  For 7-bit characters X >= 0x20,
25885821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** isIdChar[X-0x20] must be 1.
25895821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**
25905821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** Ticket #1066.  The SQL standard does not allow '$' in the
25915821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** middle of identifiers.  But many SQL implementations do.
25925821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** SQLite will allow '$' in identifiers for compatibility.
25935821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** But the feature is undocumented.
25945821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
25955821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static const char isIdChar[] = {
25965821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF */
25975821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  /* 2x */
25985821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0,  /* 3x */
25995821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  /* 4x */
26005821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1,  /* 5x */
26015821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  /* 6x */
26025821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,  /* 7x */
26035821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)};
26045821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)#define IdChar(C)  (((c=C)&0x80)!=0 || (c>0x1f && isIdChar[c-0x20]))
26055821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
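The isIdChar[] table and the IdChar() macro encode a simple rule. A byte qualifies if it is an ASCII digit or letter, '_' or '$', or any byte with the high bit set. Below is an equivalent predicate built from range checks, offered as a cross-check. is_id_char() is a hypothetical helper, not part of fts2. Unlike the macro, it does not assign to a variable named c that the caller must declare.

```c
/* Range-check equivalent of isIdChar[]/IdChar(): digits, ASCII letters,
** '_' and '$' qualify, as does any byte with the high bit set (for
** example the bytes of a UTF-8 multi-byte sequence). */
static int is_id_char(unsigned char c){
  if( c & 0x80 ) return 1;
  if( c>='0' && c<='9' ) return 1;
  if( c>='A' && c<='Z' ) return 1;
  if( c>='a' && c<='z' ) return 1;
  return c=='_' || c=='$';
}
```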
26065821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
26075821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/*
26085821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** Return the length of the token that begins at z[0].
26095821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** Store the token type in *tokenType before returning.
26105821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
26115821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int getToken(const char *z, int *tokenType){
  int i, c;
  switch( *z ){
    case 0: {
      *tokenType = TOKEN_EOF;
      return 0;
    }
    case ' ': case '\t': case '\n': case '\f': case '\r': {
      for(i=1; safe_isspace(z[i]); i++){}
      *tokenType = TOKEN_SPACE;
      return i;
    }
    case '`':
    case '\'':
    case '"': {
      int delim = z[0];
      for(i=1; (c=z[i])!=0; i++){
        if( c==delim ){
          if( z[i+1]==delim ){
            i++;
          }else{
            break;
          }
        }
      }
      *tokenType = TOKEN_STRING;
      return i + (c!=0);
    }
    case '[': {
      for(i=1, c=z[0]; c!=']' && (c=z[i])!=0; i++){}
      *tokenType = TOKEN_ID;
      return i;
    }
    default: {
      if( !IdChar(*z) ){
        break;
      }
      for(i=1; IdChar(z[i]); i++){}
      *tokenType = TOKEN_ID;
      return i;
    }
  }
  *tokenType = TOKEN_PUNCT;
  return 1;
}
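
/*
** Illustrative example, assuming that IdChar() accepts ASCII letters
** and rejects '(' and ')'.  Successive calls to getToken() split the
** tokenizer argument
**
**     tokenize mytokenizer("my arg")
**
** into the following tokens:
**
**     text            *tokenType      return value
**     tokenize        TOKEN_ID        8
**     (one space)     TOKEN_SPACE     1
**     mytokenizer     TOKEN_ID        11
**     (               TOKEN_PUNCT     1
**     "my arg"        TOKEN_STRING    8   (both quotes included)
**     )               TOKEN_PUNCT     1
**     (end of input)  TOKEN_EOF       0
*/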

/*
** A token extracted from a string is an instance of the following
** structure.
*/
typedef struct Token {
  const char *z;       /* Pointer to token text.  Not '\000' terminated */
  short int n;         /* Length of the token text in bytes. */
} Token;

/*
** Given an input string (which is really one of the argv[] parameters
** passed into xConnect or xCreate), split the string up into tokens.
** Return an array of pointers to '\000' terminated strings, one string
** for each non-whitespace token, and write the number of strings to
** *pnToken.
**
** The returned array is terminated by a single NULL pointer.
**
** Space to hold the returned array is obtained from a single
** sqlite3_malloc() and should be freed by passing the return value to
** sqlite3_free().  The individual strings within the token list are all
** a part of the single memory allocation and will all be freed at once.
** NULL is returned if a memory allocation fails.
*/
static char **tokenizeString(const char *z, int *pnToken){
  int nToken = 0;
  /* Every stored token, including the final TOKEN_EOF entry, needs a
  ** slot of its own, and every token other than TOKEN_EOF is at least
  ** one byte long, so strlen(z)+1 slots are always enough. */
  Token *aToken = sqlite3_malloc( (strlen(z)+1) * sizeof(aToken[0]) );
  int n = 1;
  int e, i;
  int totalSize = 0;
  char **azToken;
  char *zCopy;
  *pnToken = 0;
  if( aToken==0 ) return 0;
  while( n>0 ){
    n = getToken(z, &e);
    if( e!=TOKEN_SPACE ){
      aToken[nToken].z = z;
      aToken[nToken].n = n;
      nToken++;
      totalSize += n+1;
    }
    z += n;
  }
  /* nToken includes the TOKEN_EOF entry, whose pointer slot becomes the
  ** NULL terminator of the returned array. */
  azToken = (char**)sqlite3_malloc( nToken*sizeof(char*) + totalSize );
  if( azToken==0 ){
    sqlite3_free(aToken);
    return 0;
  }
  zCopy = (char*)&azToken[nToken];
  nToken--;
  for(i=0; i<nToken; i++){
    azToken[i] = zCopy;
    n = aToken[i].n;
    memcpy(zCopy, aToken[i].z, n);
    zCopy[n] = 0;
    zCopy += n+1;
  }
  azToken[nToken] = 0;
  sqlite3_free(aToken);
  *pnToken = nToken;
  return azToken;
}
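
/*
** Usage sketch (illustrative only).  Because the array and its strings
** share one allocation, a single sqlite3_free() releases everything:
**
**     int n;
**     char **az = tokenizeString("tokenize simple", &n);
**     if( az ){
**       ... n is 2, az[0] is "tokenize", az[1] is "simple", az[2] is 0 ...
**       sqlite3_free(az);
**     }
*/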

/*
** Convert an SQL-style quoted string into a normal string by removing
** the quote characters.  The conversion is done in-place.  If the
** input does not begin with a quote character, then this routine
** is a no-op.
**
** Examples:
**
**     "abc"   becomes   abc
**     'xyz'   becomes   xyz
**     [pqr]   becomes   pqr
**     `mno`   becomes   mno
*/
static void dequoteString(char *z){
  int quote;
  int i, j;
  if( z==0 ) return;
  quote = z[0];
  switch( quote ){
    case '\'':  break;
    case '"':   break;
    case '`':   break;                /* For MySQL compatibility */
    case '[':   quote = ']';  break;  /* For MS SqlServer compatibility */
    default:    return;
  }
  for(i=1, j=0; z[i]; i++){
    if( z[i]==quote ){
      if( z[i+1]==quote ){
        z[j++] = quote;
        i++;
      }else{
        break;
      }
    }else{
      z[j++] = z[i];
    }
  }
  z[j] = 0;   /* Also terminates the result when the closing quote is missing */
}
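
/*
** A further example (illustrative only): a doubled quote character
** inside the string stands for a single literal quote.
**
**     char z[] = "'it''s'";
**     dequoteString(z);         afterwards z holds:  it's
*/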

/*
** The input azIn is a NULL-terminated list of tokens.  Remove the first
** token and all punctuation tokens, where a punctuation token is a
** single character that is not alphanumeric.  Remove the quotes from
** around string literal tokens.
**
** Example:
**
**     input:      tokenize chinese ( 'simplified' , 'mixed' )
**     output:     chinese simplified mixed
**
** Another example:
**
**     input:      delimiters ( '[' , ']' , '...' )
**     output:     [ ] ...
*/
static void tokenListToIdList(char **azIn){
  int i, j;
  if( azIn ){
    for(i=0, j=-1; azIn[i]; i++){
      if( safe_isalnum(azIn[i][0]) || azIn[i][1] ){
        dequoteString(azIn[i]);
        if( j>=0 ){
          azIn[j] = azIn[i];
        }
        j++;
      }
    }
    /* j<0 only if no token survived the filter; return an empty list. */
    azIn[j<0 ? 0 : j] = 0;
  }
}


/*
** Find the first token in the string zIn that is not whitespace.
** NUL-terminate this token, remove any quotation marks, and return a
** pointer to the result.  Return NULL if zIn contains only whitespace.
** The string zIn is modified in place.
*/
static char *firstToken(char *zIn, char **pzTail){
  int n, ttype;
  while(1){
    n = getToken(zIn, &ttype);
    if( ttype==TOKEN_SPACE ){
      zIn += n;
    }else if( ttype==TOKEN_EOF ){
      *pzTail = zIn;
      return 0;
    }else{
      zIn[n] = 0;
      *pzTail = &zIn[1];
      dequoteString(zIn);
      return zIn;
    }
  }
  /*NOTREACHED*/
}
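
/*
** Example (illustrative only) of firstToken() applied to a column
** declaration:
**
**     char zCol[] = "  \"my col\" TEXT";
**     char *zTail;
**     char *zName = firstToken(zCol, &zTail);
**
** zName equals &zCol[2] and holds the six characters  my col  with the
** quotes removed.  The space that followed the closing quote has been
** overwritten with a NUL, so the trailing TEXT is not part of the
** returned string.
*/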

/* Return true if...
**
**   *  s begins with the string t, ignoring case
**   *  The character of s immediately following t (possibly the
**      terminating NUL) is neither alphanumeric nor an underscore
**
** Leading whitespace in s is ignored.
**
** To put it another way, return true if the first token of
** s[] is t[].
*/
static int startsWith(const char *s, const char *t){
  while( safe_isspace(*s) ){ s++; }
  while( *t ){
    if( safe_tolower(*s++)!=safe_tolower(*t++) ) return 0;
  }
  return *s!='_' && !safe_isalnum(*s);
}
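
/*
** Examples (illustrative only):
**
**     startsWith("  Tokenize porter", "tokenize")   returns 1
**     startsWith("tokenize", "tokenize")            returns 1
**     startsWith("tokenizer x", "tokenize")         returns 0
**     startsWith("tokenize_x", "tokenize")          returns 0
*/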

/*
** An instance of this structure defines the "spec" of a
** full text index.  This structure is populated by parseSpec
** and used by fulltextConnect and fulltextCreate.
*/
typedef struct TableSpec {
  const char *zDb;         /* Logical database name */
  const char *zName;       /* Name of the full-text index */
  int nColumn;             /* Number of columns to be indexed */
  char **azColumn;         /* Original names of columns to be indexed */
  char **azContentColumn;  /* Column names for %_content */
  char **azTokenizer;      /* Name of tokenizer and its arguments */
} TableSpec;

/*
** Free the three arrays owned by a TableSpec.  The individual strings
** stored in azContentColumn[] are not freed here.
*/
static void clearTableSpec(TableSpec *p) {
  sqlite3_free(p->azColumn);
  sqlite3_free(p->azContentColumn);
  sqlite3_free(p->azTokenizer);
}

/* Parse a CREATE VIRTUAL TABLE statement, which looks like this:
 *
 * CREATE VIRTUAL TABLE email
 *        USING fts2(subject, body, tokenize mytokenizer(myarg))
 *
 * The parsed information is returned in *pSpec.
 *
 */
static int parseSpec(TableSpec *pSpec, int argc, const char *const*argv,
                     char**pzErr){
  int i, n;
  char *z, *zDummy;
  char **azArg;
  const char *zTokenizer = 0;    /* argv[] entry describing the tokenizer */

  assert( argc>=3 );
  /* Current interface:
  ** argv[0] - module name
  ** argv[1] - database name
  ** argv[2] - table name
  ** argv[3..] - columns, optionally followed by tokenizer specification
  **             and snippet delimiters specification.
  */

  /* Make a copy of the complete argv[][] array in a single allocation.
  ** The argv[][] array is read-only and transient, whereas the copy can
  ** be modified in place and persists after this function returns.
  */
  CLEAR(pSpec);
  for(i=n=0; i<argc; i++){
    n += strlen(argv[i]) + 1;
  }
  azArg = sqlite3_malloc( sizeof(char*)*argc + n );
  if( azArg==0 ){
    return SQLITE_NOMEM;
  }
  z = (char*)&azArg[argc];
  for(i=0; i<argc; i++){
    azArg[i] = z;
    strcpy(z, argv[i]);
    z += strlen(z)+1;
  }

  /* Identify the column names and the tokenizer and delimiter arguments
  ** in the argv[][] array.
  */
  pSpec->zDb = azArg[1];
  pSpec->zName = azArg[2];
  pSpec->nColumn = 0;
  pSpec->azColumn = azArg;
  zTokenizer = "tokenize simple";
  for(i=3; i<argc; ++i){
    if( startsWith(azArg[i],"tokenize") ){
      zTokenizer = azArg[i];
    }else{
      z = azArg[pSpec->nColumn] = firstToken(azArg[i], &zDummy);
      pSpec->nColumn++;
    }
  }
  if( pSpec->nColumn==0 ){
    azArg[0] = "content";
    pSpec->nColumn = 1;
  }

  /*
  ** Construct the list of content column names.
  **
  ** Each content column name will be of the form cNNAAAA
  ** where NN is the column number and AAAA is the sanitized
  ** column name.  "sanitized" means that every character that is
  ** not alphanumeric is converted to "_".  The cNN prefix guarantees
  ** that all column names are unique.
  **
  ** The AAAA suffix is not strictly necessary.  It is included
  ** for the convenience of people who might examine the generated
  ** %_content table and wonder what the columns are used for.
  */
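  /* For example (illustrative only), a column declared as "my col" in
  ** position 3 gets the content column name "c3my_col": sqlite3_mprintf()
  ** below yields "c3my col" and the loop after it turns the space into "_".
  */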
  pSpec->azContentColumn = sqlite3_malloc( pSpec->nColumn * sizeof(char *) );
  if( pSpec->azContentColumn==0 ){
    clearTableSpec(pSpec);
    return SQLITE_NOMEM;
  }
  for(i=0; i<pSpec->nColumn; i++){
    char *p;
    pSpec->azContentColumn[i] = sqlite3_mprintf("c%d%s", i, azArg[i]);
    if( pSpec->azContentColumn[i]==0 ){
      /* Free the names built so far; clearTableSpec() frees only the array */
      while( --i>=0 ) sqlite3_free(pSpec->azContentColumn[i]);
      clearTableSpec(pSpec);
      return SQLITE_NOMEM;
    }
    for (p = pSpec->azContentColumn[i]; *p ; ++p) {
      if( !safe_isalnum(*p) ) *p = '_';
    }
  }

  /*
  ** Parse the tokenizer specification string.
  */
  pSpec->azTokenizer = tokenizeString(zTokenizer, &n);
  if( pSpec->azTokenizer==0 ){
    for(i=0; i<pSpec->nColumn; i++) sqlite3_free(pSpec->azContentColumn[i]);
    clearTableSpec(pSpec);
    return SQLITE_NOMEM;
  }
  tokenListToIdList(pSpec->azTokenizer);

  return SQLITE_OK;
}
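
/*
** Worked example (illustrative only; it assumes the table is created in
** the "main" database).  For the statement shown above parseSpec(),
**
**     CREATE VIRTUAL TABLE email
**            USING fts2(subject, body, tokenize mytokenizer(myarg))
**
** argv[] holds the module name fts2, the database and table names, and
** then the arguments subject, body and tokenize mytokenizer(myarg).
** parseSpec() fills in:
**
**     zDb              "main"
**     zName            "email"
**     nColumn          2
**     azColumn         "subject", "body"    (first nColumn entries)
**     azContentColumn  "c0subject", "c1body"
**     azTokenizer      "mytokenizer", "myarg", NULL
*/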

/*
** Generate a CREATE TABLE statement that describes the schema of
** the virtual table: one column for each entry in azColumn[],
** followed by a final column that has the same name as the table.
** Return a pointer to this schema string.
**
** Space is obtained from sqlite3_mprintf() and should be freed
** using sqlite3_free().
*/
static char *fulltextSchema(
  int nColumn,                  /* Number of columns */
  const char *const* azColumn,  /* List of columns */
  const char *zTableName        /* Name of the table */
){
  int i;
  char *zSchema, *zNext;
  const char *zSep = "(";
  zSchema = sqlite3_mprintf("CREATE TABLE x");
  for(i=0; i<nColumn; i++){
    zNext = sqlite3_mprintf("%s%s%Q", zSchema, zSep, azColumn[i]);
    sqlite3_free(zSchema);
    zSchema = zNext;
    zSep = ",";
  }
  zNext = sqlite3_mprintf("%s,%Q)", zSchema, zTableName);
  sqlite3_free(zSchema);
  return zNext;
}
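
/*
** Example (illustrative only): with azColumn = {"subject", "body"} and
** zTableName = "email", fulltextSchema(2, azColumn, "email") returns
**
**     CREATE TABLE x('subject','body','email')
**
** %Q wraps each name in single quotes and doubles any single quote
** that the name itself contains.
*/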

/*
** Build a new sqlite3_vtab structure that will describe the
** fulltext index defined by spec.
*/
static int constructVtab(
  sqlite3 *db,              /* The SQLite database connection */
  fts2Hash *pHash,          /* Hash table containing tokenizers */
  TableSpec *spec,          /* Parsed spec information from parseSpec() */
  sqlite3_vtab **ppVTab,    /* Write the resulting vtab structure here */
  char **pzErr              /* Write any error message here */
){
  int rc;
  int n;
  fulltext_vtab *v = 0;
  const sqlite3_tokenizer_module *m = NULL;
  char *schema;

  char const *zTok;         /* Name of tokenizer to use for this fts table */
  int nTok;                 /* Length of zTok, including nul terminator */

  v = (fulltext_vtab *) sqlite3_malloc(sizeof(fulltext_vtab));
  if( v==0 ) return SQLITE_NOMEM;
  CLEAR(v);
  /* sqlite will initialize v->base */
  v->db = db;
  v->zDb = spec->zDb;       /* Freed when azColumn is freed */
  v->zName = spec->zName;   /* Freed when azColumn is freed */
  v->nColumn = spec->nColumn;
  v->azContentColumn = spec->azContentColumn;
  spec->azContentColumn = 0;
  v->azColumn = spec->azColumn;
  spec->azColumn = 0;

  if( spec->azTokenizer==0 ){
    /* v now owns the column arrays; release it rather than leak it. */
    rc = SQLITE_NOMEM;
    goto err;
  }

  zTok = spec->azTokenizer[0];
  if( !zTok ){
    zTok = "simple";
  }
  nTok = (int)strlen(zTok)+1;

  m = (sqlite3_tokenizer_module *)sqlite3Fts2HashFind(pHash, zTok, nTok);
  if( !m ){
    /* Report zTok, which is never NULL, rather than azTokenizer[0]. */
    *pzErr = sqlite3_mprintf("unknown tokenizer: %s", zTok);
    rc = SQLITE_ERROR;
    goto err;
  }

  for(n=0; spec->azTokenizer[n]; n++){}
  if( n ){
    rc = m->xCreate(n-1, (const char*const*)&spec->azTokenizer[1],
                    &v->pTokenizer);
  }else{
    rc = m->xCreate(0, 0, &v->pTokenizer);
  }
  if( rc!=SQLITE_OK ) goto err;
  v->pTokenizer->pModule = m;

  /* TODO: verify the existence of backing tables foo_content, foo_segments
  ** and foo_segdir. */

  schema = fulltextSchema(v->nColumn, (const char*const*)v->azColumn,
                          spec->zName);
  rc = sqlite3_declare_vtab(db, schema);
  sqlite3_free(schema);
  if( rc!=SQLITE_OK ) goto err;

  memset(v->pFulltextStatements, 0, sizeof(v->pFulltextStatements));

  /* Indicate that the pending-data buffer is not live. */
  v->nPendingData = -1;

  *ppVTab = &v->base;
  TRACE(("FTS2 Connect %p\n", v));

  return rc;

err:
  fulltext_vtab_destroy(v);
  return rc;
}
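
/*
** Illustrative sketch, not part of fts2: constructVtab() treats
** spec->azTokenizer[] as a NULL-terminated list whose first entry names the
** tokenizer (defaulting to "simple" when it is NULL) and whose remaining
** entries become the argc/argv handed to xCreate().  The sketch replays that
** selection against a plain array instead of the fts2Hash.  TokModule,
** create_tokenizer() and count_args() are hypothetical; a real
** sqlite3_tokenizer_module's xCreate also takes an output tokenizer pointer.
**
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for sqlite3_tokenizer_module: a name plus a
// constructor that receives the tokenizer arguments.
typedef struct TokModule {
  const char *zName;
  int (*xCreate)(int argc, const char *const *argv);
} TokModule;

// Demo constructor: reports how many arguments it was given.
int count_args(int argc, const char *const *argv){
  assert( argc==0 || argv!=NULL );
  return argc;
}

// azTokenizer is NULL-terminated: [name, arg1, arg2, ..., NULL].
// Returns -1 if the tokenizer name is unknown, else xCreate's result.
int create_tokenizer(const TokModule *aMod, int nMod,
                     const char *const *azTokenizer){
  const char *zTok = azTokenizer[0] ? azTokenizer[0] : "simple";
  const TokModule *m = NULL;
  int i, n;

  for(i=0; i<nMod; i++){
    if( strcmp(aMod[i].zName, zTok)==0 ){ m = &aMod[i]; break; }
  }
  if( m==NULL ) return -1;

  for(n=0; azTokenizer[n]; n++){}      // count the name plus its arguments
  if( n ) return m->xCreate(n-1, &azTokenizer[1]);
  return m->xCreate(0, NULL);          // no name given: default, no args
}
```
**
** Returning -1 for an unknown name stands in for the SQLITE_ERROR and
** "unknown tokenizer" message that constructVtab() reports.
*/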

static int fulltextConnect(
  sqlite3 *db,
  void *pAux,
  int argc, const char *const*argv,
  sqlite3_vtab **ppVTab,
  char **pzErr
){
  TableSpec spec;
  int rc = parseSpec(&spec, argc, argv, pzErr);
  if( rc!=SQLITE_OK ) return rc;

  rc = constructVtab(db, (fts2Hash *)pAux, &spec, ppVTab, pzErr);
  clearTableSpec(&spec);
  return rc;
}

/* The %_content table holds the text of each document, with
** the rowid used as the docid.
*/
/* TODO(shess) This comment needs elaboration to match the updated
** code.  Work it into the top-of-file comment at that time.
*/
static int fulltextCreate(sqlite3 *db, void *pAux,
                          int argc, const char * const *argv,
                          sqlite3_vtab **ppVTab, char **pzErr){
  int rc;
  TableSpec spec;
  StringBuffer schema;
  TRACE(("FTS2 Create\n"));

  rc = parseSpec(&spec, argc, argv, pzErr);
  if( rc!=SQLITE_OK ) return rc;

  initStringBuffer(&schema);
  append(&schema, "CREATE TABLE %_content(");
  appendList(&schema, spec.nColumn, spec.azContentColumn);
  append(&schema, ")");
  rc = sql_exec(db, spec.zDb, spec.zName, stringBufferData(&schema));
  stringBufferDestroy(&schema);
  if( rc!=SQLITE_OK ) goto out;

  rc = sql_exec(db, spec.zDb, spec.zName,
                "create table %_segments(block blob);");
  if( rc!=SQLITE_OK ) goto out;

  rc = sql_exec(db, spec.zDb, spec.zName,
                "create table %_segdir("
                "  level integer,"
                "  idx integer,"
                "  start_block integer,"
                "  leaves_end_block integer,"
                "  end_block integer,"
                "  root blob,"
                "  primary key(level, idx)"
                ");");
  if( rc!=SQLITE_OK ) goto out;

  rc = constructVtab(db, (fts2Hash *)pAux, &spec, ppVTab, pzErr);

out:
  clearTableSpec(&spec);
  return rc;
}
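
/*
** Illustrative sketch, not part of fts2: fulltextCreate() names its backing
** tables with a "%_" prefix (%_content, %_segments, %_segdir) and hands the
** statements to sql_exec(), which is defined elsewhere in this file.  The
** sketch ASSUMES sql_exec() expands each '%' to "zDb.zName" before running
** the statement, so table docs in database main would get
** main.docs_content, main.docs_segments and main.docs_segdir.
** expand_table_format() is a hypothetical name, and it does no identifier
** quoting.
**
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Expand every '%' in zFormat to "zDb.zName". Returns a malloc'd string,
// or NULL if allocation fails. The caller releases it with free().
char *expand_table_format(const char *zFormat, const char *zDb,
                          const char *zName){
  size_t nDb = strlen(zDb), nName = strlen(zName);
  size_t nFull = nDb + 1 + nName;      // "db" + '.' + "name"
  size_t len = 1;                      // NUL terminator
  const char *p;
  char *zOut, *r;

  // First pass: compute the exact output length.
  for(p=zFormat; *p; p++) len += (*p=='%') ? nFull : 1;

  zOut = r = malloc(len);
  if( zOut==NULL ) return NULL;

  // Second pass: copy, substituting the qualified table name for '%'.
  for(p=zFormat; *p; p++){
    if( *p=='%' ){
      memcpy(r, zDb, nDb);
      r += nDb;
      *r++ = '.';
      memcpy(r, zName, nName);
      r += nName;
    }else{
      *r++ = *p;
    }
  }
  *r++ = '\0';
  assert( r==zOut+len );
  return zOut;
}
```
**
** Under that assumption, with zDb "main" and zName "docs", the statement
** "create table %_segments(block blob);" becomes
** "create table main.docs_segments(block blob);".
*/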

/* Decide how to handle an SQL query. */
static int fulltextBestIndex(sqlite3_vtab *pVTab, sqlite3_index_info *pInfo){
  int i;
  TRACE(("FTS2 BestIndex\n"));

  for(i=0; i<pInfo->nConstraint; ++i){
    const struct sqlite3_index_constraint *pConstraint;
    pConstraint = &pInfo->aConstraint[i];
    if( pConstraint->usable ) {
      if( pConstraint->iColumn==-1 &&
          pConstraint->op==SQLITE_INDEX_CONSTRAINT_EQ ){
        pInfo->idxNum = QUERY_ROWID;      /* lookup by rowid */
        TRACE(("FTS2 QUERY_ROWID\n"));
      } else if( pConstraint->iColumn>=0 &&
                 pConstraint->op==SQLITE_INDEX_CONSTRAINT_MATCH ){
        /* full-text search */
        pInfo->idxNum = QUERY_FULLTEXT + pConstraint->iColumn;
        TRACE(("FTS2 QUERY_FULLTEXT %d\n", pConstraint->iColumn));
      } else continue;

      pInfo->aConstraintUsage[i].argvIndex = 1;
      pInfo->aConstraintUsage[i].omit = 1;

      /* An arbitrary value for now.
       * TODO: Perhaps rowid matches should be considered cheaper than
       * full-text searches. */
      pInfo->estimatedCost = 1.0;

      return SQLITE_OK;
    }
  }
  pInfo->idxNum = QUERY_GENERIC;
  return SQLITE_OK;
}
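
/*
** Illustrative sketch, not part of fts2: fulltextBestIndex() packs its
** decision into idxNum.  QUERY_GENERIC means no usable constraint (a full
** scan), QUERY_ROWID a rowid lookup, and QUERY_FULLTEXT+i a MATCH against
** column i, where i equal to nColumn is the hidden column named after the
** table (snippetAllOffsets() treats that case as "all columns").  The enum
** values below are ASSUMED to mirror the QUERY_* constants defined earlier
** in this file; Plan and decode_idx_num() are hypothetical.
**
```c
#include <assert.h>

// ASSUMED to mirror the QUERY_* enum defined earlier in fts2.c.
enum { QUERY_GENERIC = 0, QUERY_ROWID = 1, QUERY_FULLTEXT = 2 };

// Hypothetical decoded form of an idxNum.
typedef struct Plan {
  int kind;       // QUERY_GENERIC, QUERY_ROWID or QUERY_FULLTEXT
  int iColumn;    // column searched by MATCH, or -1 if not a MATCH plan
} Plan;

Plan decode_idx_num(int idxNum){
  Plan p;
  assert( idxNum>=QUERY_GENERIC );
  if( idxNum>=QUERY_FULLTEXT ){
    p.kind = QUERY_FULLTEXT;
    p.iColumn = idxNum - QUERY_FULLTEXT;  // may equal nColumn: all columns
  }else{
    p.kind = idxNum;
    p.iColumn = -1;
  }
  return p;
}
```
**
** Encoding the column as an offset from the largest enum value keeps the
** whole plan in the single integer that SQLite passes back to xFilter.
*/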

static int fulltextDisconnect(sqlite3_vtab *pVTab){
  TRACE(("FTS2 Disconnect %p\n", pVTab));
  fulltext_vtab_destroy((fulltext_vtab *)pVTab);
  return SQLITE_OK;
}

static int fulltextDestroy(sqlite3_vtab *pVTab){
  fulltext_vtab *v = (fulltext_vtab *)pVTab;
  int rc;

  TRACE(("FTS2 Destroy %p\n", pVTab));
  rc = sql_exec(v->db, v->zDb, v->zName,
                "drop table if exists %_content;"
                "drop table if exists %_segments;"
                "drop table if exists %_segdir;"
                );
  if( rc!=SQLITE_OK ) return rc;

  fulltext_vtab_destroy((fulltext_vtab *)pVTab);
  return SQLITE_OK;
}

static int fulltextOpen(sqlite3_vtab *pVTab, sqlite3_vtab_cursor **ppCursor){
  fulltext_cursor *c;

  c = (fulltext_cursor *) sqlite3_malloc(sizeof(fulltext_cursor));
  if( c ){
    memset(c, 0, sizeof(fulltext_cursor));
    /* sqlite will initialize c->base */
    *ppCursor = &c->base;
    TRACE(("FTS2 Open %p: %p\n", pVTab, c));
    return SQLITE_OK;
  }else{
    return SQLITE_NOMEM;
  }
}

/* Free all of the dynamically allocated memory held by *q.
*/
static void queryClear(Query *q){
  int i;
  for(i = 0; i < q->nTerms; ++i){
    sqlite3_free(q->pTerms[i].pTerm);
  }
  sqlite3_free(q->pTerms);
  CLEAR(q);
}

/* Free all of the dynamically allocated memory held by the
** Snippet.
*/
static void snippetClear(Snippet *p){
  sqlite3_free(p->aMatch);
  sqlite3_free(p->zOffset);
  sqlite3_free(p->zSnippet);
  CLEAR(p);
}

/*
** Append a single entry to the p->aMatch[] log.
*/
static void snippetAppendMatch(
  Snippet *p,               /* Append the entry to this snippet */
  int iCol, int iTerm,      /* The column and query term */
  int iStart, int nByte     /* Offset and size of the match */
){
  int i;
  struct snippetMatch *pMatch;
  if( p->nMatch+1>=p->nAlloc ){
    struct snippetMatch *aNew;
    int nNew = p->nAlloc*2 + 10;
    aNew = sqlite3_realloc(p->aMatch, nNew*sizeof(p->aMatch[0]) );
    if( aNew==0 ){
      /* On OOM, release the old array instead of leaking it. */
      sqlite3_free(p->aMatch);
      p->aMatch = 0;
      p->nMatch = 0;
      p->nAlloc = 0;
      return;
    }
    p->aMatch = aNew;
    p->nAlloc = nNew;
  }
  i = p->nMatch++;
  pMatch = &p->aMatch[i];
  pMatch->iCol = iCol;
  pMatch->iTerm = iTerm;
  pMatch->iStart = iStart;
  pMatch->nByte = nByte;
}
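
/*
** Illustrative sketch, not part of fts2: snippetAppendMatch() grows
** aMatch[] geometrically (new capacity 2*nAlloc+10), so each append costs
** O(1) amortized copying.  The standalone version below uses malloc, realloc
** and free in place of the sqlite3_* wrappers and keeps the old pointer
** until the reallocation succeeds, so a failed grow frees the log instead
** of losing it.  Match, MatchLog and match_log_append() are hypothetical
** names.
**
```c
#include <assert.h>
#include <stdlib.h>

typedef struct Match { int iCol, iTerm, iStart, nByte; } Match;
typedef struct MatchLog { int nMatch, nAlloc; Match *aMatch; } MatchLog;

// Append one entry, growing capacity to 2*nAlloc+10 when the log is full.
// Returns 0 on success; on out-of-memory frees the log, resets it to empty
// and returns -1.
int match_log_append(MatchLog *p, int iCol, int iTerm, int iStart, int nByte){
  Match *pMatch;
  if( p->nMatch+1>=p->nAlloc ){
    int nNew = p->nAlloc*2 + 10;
    Match *aNew = realloc(p->aMatch, nNew*sizeof(Match));
    if( aNew==NULL ){
      free(p->aMatch);                 // realloc left the old block intact
      p->aMatch = NULL;
      p->nMatch = p->nAlloc = 0;
      return -1;
    }
    p->aMatch = aNew;
    p->nAlloc = nNew;
  }
  assert( p->nMatch<p->nAlloc );
  pMatch = &p->aMatch[p->nMatch++];
  pMatch->iCol = iCol;
  pMatch->iTerm = iTerm;
  pMatch->iStart = iStart;
  pMatch->nByte = nByte;
  return 0;
}
```
**
** Starting from an empty log, capacity steps through 10, 30, 70 and 150, so
** 100 appends trigger only four reallocations.
*/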

/*
** Sizing information for the circular buffer used in
** snippetOffsetsOfColumn().  FTS2_ROTOR_SZ must be a power of two so that
** masking with FTS2_ROTOR_MASK acts as a modulo.
*/
#define FTS2_ROTOR_SZ   (32)
#define FTS2_ROTOR_MASK (FTS2_ROTOR_SZ-1)
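
/*
** Illustrative sketch, not part of fts2: snippetOffsetsOfColumn() records
** the start offset and length of recent tokens in a ring of FTS2_ROTOR_SZ
** slots indexed by iRotor & FTS2_ROTOR_MASK.  Masking equals
** iRotor % FTS2_ROTOR_SZ only because the size is a power of two, and
** (iRotor-j) & mask reaches the token j steps back as long as
** j < FTS2_ROTOR_SZ.  Rotor, rotor_push() and rotor_back() are hypothetical
** names, and only start offsets are kept here.
**
```c
#include <assert.h>

#define ROTOR_SZ   32              // must be a power of two
#define ROTOR_MASK (ROTOR_SZ-1)

typedef struct Rotor {
  unsigned int iRotor;             // number of tokens pushed so far
  int aBegin[ROTOR_SZ];            // start offsets of the most recent tokens
} Rotor;

// Record the start offset of the next token, overwriting the oldest slot.
void rotor_push(Rotor *r, int iBegin){
  r->aBegin[r->iRotor & ROTOR_MASK] = iBegin;
  r->iRotor++;
}

// Start offset of the token pushed j steps before the most recent one
// (j==0 is the most recent). Valid only for j < ROTOR_SZ and j < iRotor.
int rotor_back(const Rotor *r, unsigned int j){
  assert( j<ROTOR_SZ && j<r->iRotor );
  return r->aBegin[(r->iRotor - 1 - j) & ROTOR_MASK];
}
```
**
** After 40 pushes of offsets 0, 10, ..., 390 the ring still holds the last
** 32: rotor_back(&r, 0) is 390 and rotor_back(&r, 31) is 80.
*/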
32495821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
32505821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/*
32515821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** Add entries to pSnippet->aMatch[] for every match that occurs against
32525821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** document zDoc[0..nDoc-1] which is stored in column iColumn.
32535821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
32545821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static void snippetOffsetsOfColumn(
32555821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  Query *pQuery,
32565821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  Snippet *pSnippet,
32575821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int iColumn,
32585821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  const char *zDoc,
32595821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int nDoc
32605821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)){
32615821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  const sqlite3_tokenizer_module *pTModule;  /* The tokenizer module */
32625821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_tokenizer *pTokenizer;             /* The specific tokenizer */
32635821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  sqlite3_tokenizer_cursor *pTCursor;        /* Tokenizer cursor */
32645821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  fulltext_vtab *pVtab;                /* The full text index */
32655821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int nColumn;                         /* Number of columns in the index */
32665821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  const QueryTerm *aTerm;              /* Query string terms */
32675821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int nTerm;                           /* Number of query string terms */
32685821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int i, j;                            /* Loop counters */
32695821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int rc;                              /* Return code */
32705821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  unsigned int match, prevMatch;       /* Phrase search bitmasks */
32715821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  const char *zToken;                  /* Next token from the tokenizer */
32725821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int nToken;                          /* Size of zToken */
32735821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int iBegin, iEnd, iPos;              /* Offsets of beginning and end */
32745821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
32755821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* The following variables keep a circular buffer of the last
32765821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  ** few tokens */
32775821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  unsigned int iRotor = 0;             /* Index of current token */
32785821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int iRotorBegin[FTS2_ROTOR_SZ];      /* Beginning offset of token */
32795821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int iRotorLen[FTS2_ROTOR_SZ];        /* Length of token */
32805821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
32815821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  pVtab = pQuery->pFts;
32825821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  nColumn = pVtab->nColumn;
32835821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  pTokenizer = pVtab->pTokenizer;
32845821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  pTModule = pTokenizer->pModule;
32855821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = pTModule->xOpen(pTokenizer, zDoc, nDoc, &pTCursor);
32865821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc ) return;
32875821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  pTCursor->pTokenizer = pTokenizer;
32885821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  aTerm = pQuery->pTerms;
32895821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  nTerm = pQuery->nTerms;
32905821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( nTerm>=FTS2_ROTOR_SZ ){
32915821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    nTerm = FTS2_ROTOR_SZ - 1;
32925821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
32935821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  prevMatch = 0;
32945821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  while(1){
32955821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    rc = pTModule->xNext(pTCursor, &zToken, &nToken, &iBegin, &iEnd, &iPos);
32965821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( rc ) break;
32975821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    iRotorBegin[iRotor&FTS2_ROTOR_MASK] = iBegin;
32985821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    iRotorLen[iRotor&FTS2_ROTOR_MASK] = iEnd-iBegin;
32995821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    match = 0;
33005821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    for(i=0; i<nTerm; i++){
      int iCol;
      iCol = aTerm[i].iColumn;
      if( iCol>=0 && iCol<nColumn && iCol!=iColumn ) continue;
      if( aTerm[i].nTerm>nToken ) continue;
      if( !aTerm[i].isPrefix && aTerm[i].nTerm<nToken ) continue;
      assert( aTerm[i].nTerm<=nToken );
      if( memcmp(aTerm[i].pTerm, zToken, aTerm[i].nTerm) ) continue;
      if( aTerm[i].iPhrase>1 && (prevMatch & (1<<i))==0 ) continue;
      match |= 1<<i;
      if( i==nTerm-1 || aTerm[i+1].iPhrase==1 ){
        for(j=aTerm[i].iPhrase-1; j>=0; j--){
          int k = (iRotor-j) & FTS2_ROTOR_MASK;
          snippetAppendMatch(pSnippet, iColumn, i-j,
                iRotorBegin[k], iRotorLen[k]);
        }
      }
    }
    prevMatch = match<<1;
    iRotor++;
  }
  pTModule->xClose(pTCursor);
}
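
/*
** Illustrative sketch, not part of fts2.  The loop above tracks phrase
** matches with a bitmask.  Bit i of "match" is set when the current token
** matches query term i and, for a term past the first in its phrase, the
** previous token's mask had bit i-1 set.  Bit i-1 of the previous mask is
** bit i of prevMatch, because prevMatch is the previous mask shifted left
** by one.  The hypothetical countPhraseMatches() below applies the same
** idea to a single phrase over an already-tokenized column.  It assumes
** exact (non-prefix) term matching, no column filter, no rotor of match
** offsets, and at most 31 phrase terms.  With tokens
** {"the","quick","brown","fox","quick","brown"} and phrase
** {"quick","brown"} it returns 2.
*/
#if 0
```c
#include <string.h>

/* Count how many times azPhrase[0..nPhrase-1] occurs as consecutive tokens
** in azTok[0..nTok-1].  Overlapping occurrences are all counted. */
static int countPhraseMatches(const char *const *azTok, int nTok,
                              const char *const *azPhrase, int nPhrase){
  unsigned int prevMatch = 0;  /* bit k: phrase terms 0..k-1 ended at previous token */
  int nFound = 0;
  int iTok, k;
  for(iTok=0; iTok<nTok; iTok++){
    unsigned int match = 0;    /* bit k: phrase terms 0..k end at this token */
    for(k=0; k<nPhrase; k++){
      if( strcmp(azTok[iTok], azPhrase[k])!=0 ) continue;
      if( k>0 && (prevMatch & (1u<<k))==0 ) continue;
      match |= 1u<<k;
      if( k==nPhrase-1 ) nFound++;   /* the whole phrase ends here */
    }
    prevMatch = match<<1;
  }
  return nFound;
}
```
#endif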


/*
** Compute all offsets for the current row of the query.
** If the offsets have already been computed, this routine is a no-op.
*/
static void snippetAllOffsets(fulltext_cursor *p){
  int nColumn;
  int iColumn, i;
  int iFirst, iLast;
  fulltext_vtab *pFts;

  if( p->snippet.nMatch ) return;
  if( p->q.nTerms==0 ) return;
  pFts = p->q.pFts;
  nColumn = pFts->nColumn;
  iColumn = (p->iCursorType - QUERY_FULLTEXT);
  if( iColumn<0 || iColumn>=nColumn ){
    iFirst = 0;
    iLast = nColumn-1;
  }else{
    iFirst = iColumn;
    iLast = iColumn;
  }
  for(i=iFirst; i<=iLast; i++){
    const char *zDoc;
    int nDoc;
    zDoc = (const char*)sqlite3_column_text(p->pStmt, i+1);
    nDoc = sqlite3_column_bytes(p->pStmt, i+1);
    snippetOffsetsOfColumn(&p->q, &p->snippet, i, zDoc, nDoc);
  }
}

/*
** Convert the information in the aMatch[] array of the snippet
** into the string zOffset[0..nOffset-1].  Each match contributes four
** space-separated integers: column number, query term number, byte
** offset of the match, and length of the match in bytes.  If zOffset
** has already been computed, this routine is a no-op.
*/
static void snippetOffsetText(Snippet *p){
  int i;
  int cnt = 0;
  StringBuffer sb;
  char zBuf[200];
  if( p->zOffset ) return;
  initStringBuffer(&sb);
  for(i=0; i<p->nMatch; i++){
    struct snippetMatch *pMatch = &p->aMatch[i];
    zBuf[0] = ' ';
    sqlite3_snprintf(sizeof(zBuf)-1, &zBuf[cnt>0], "%d %d %d %d",
        pMatch->iCol, pMatch->iTerm, pMatch->iStart, pMatch->nByte);
    append(&sb, zBuf);
    cnt++;
  }
  p->zOffset = stringBufferData(&sb);
  p->nOffset = stringBufferLength(&sb);
}
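
/*
** Illustrative sketch, not part of fts2.  snippetOffsetText() emits four
** integers per match (column number, query term number, byte offset of
** the match, and its length in bytes), with every integer separated by a
** single space.  The hypothetical formatOffsets() below produces the same
** layout into a caller-supplied buffer using standard snprintf() instead
** of fts2's StringBuffer.  The Match struct is a stand-in for struct
** snippetMatch.  For matches {0,0,4,5} and {1,1,10,3} it writes
** "0 0 4 5 1 1 10 3".
*/
#if 0
```c
#include <stdio.h>
#include <string.h>

/* Stand-in for struct snippetMatch: only the four reported fields. */
typedef struct Match { int iCol, iTerm, iStart, nByte; } Match;

/* Write the offsets text for aMatch[0..nMatch-1] into zOut (capacity nOut).
** Returns the text length, or -1 if zOut is too small. */
static int formatOffsets(const Match *aMatch, int nMatch,
                         char *zOut, size_t nOut){
  size_t len = 0;
  int i;
  if( nOut==0 ) return -1;
  zOut[0] = 0;
  for(i=0; i<nMatch; i++){
    const Match *m = &aMatch[i];
    /* Every entry after the first is preceded by one space. */
    int n = snprintf(zOut+len, nOut-len, "%s%d %d %d %d", i>0 ? " " : "",
                     m->iCol, m->iTerm, m->iStart, m->nByte);
    if( n<0 || (size_t)n>=nOut-len ) return -1;   /* would be truncated */
    len += (size_t)n;
  }
  return (int)len;
}
```
#endif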

/*
** zDoc[0..nDoc-1] is a phrase of text.  aMatch[0..nMatch-1] is a set
** of matching words, some of which might be in zDoc.  zDoc is column
** number iCol.
**
** iBreak is a suggested spot in zDoc where we could begin or end an
** excerpt.  Return a value similar to iBreak but possibly adjusted
** to be a little left or right so that the break point is better.
** A break within 10 bytes of either end of zDoc snaps to that end;
** otherwise it may move to the start of a nearby match, or to just
** past whitespace within 10 bytes.
*/
static int wordBoundary(
  int iBreak,                   /* The suggested break point */
  const char *zDoc,             /* Document text */
  int nDoc,                     /* Number of bytes in zDoc[] */
  struct snippetMatch *aMatch,  /* Matching words */
  int nMatch,                   /* Number of entries in aMatch[] */
  int iCol                      /* The column number for zDoc[] */
){
  int i;
  if( iBreak<=10 ){
    return 0;
  }
  if( iBreak>=nDoc-10 ){
    return nDoc;
  }
  for(i=0; i<nMatch && aMatch[i].iCol<iCol; i++){}
  while( i<nMatch && aMatch[i].iStart+aMatch[i].nByte<iBreak ){ i++; }
  if( i<nMatch ){
    if( aMatch[i].iStart<iBreak+10 ){
      return aMatch[i].iStart;
    }
    if( i>0 && aMatch[i-1].iStart+aMatch[i-1].nByte>=iBreak ){
      return aMatch[i-1].iStart;
    }
  }
  for(i=1; i<=10; i++){
    if( safe_isspace(zDoc[iBreak-i]) ){
      return iBreak - i + 1;
    }
    if( safe_isspace(zDoc[iBreak+i]) ){
      return iBreak + i + 1;
    }
  }
  return iBreak;
}
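
/*
** Illustrative sketch, not part of fts2.  The hypothetical
** snapToWhitespace() below reproduces only the edge snapping and the
** final whitespace search of wordBoundary(), leaving out the adjustment
** toward nearby matches.  It uses isspace() from <ctype.h> as a stand-in
** for safe_isspace().  At each distance i from 1 to 10 the byte to the
** left is checked before the byte to the right, and the returned offset
** is the byte just after the whitespace found.
*/
#if 0
```c
#include <ctype.h>

/* Return a break offset near iBreak in zDoc[0..nDoc-1]: 0 or nDoc when
** iBreak is within 10 bytes of an end, otherwise the byte after the
** nearest whitespace within 10 bytes, otherwise iBreak unchanged. */
static int snapToWhitespace(const char *zDoc, int nDoc, int iBreak){
  int i;
  if( iBreak<=10 ) return 0;
  if( iBreak>=nDoc-10 ) return nDoc;
  /* Here 10 < iBreak < nDoc-10, so every index probed below is in range. */
  for(i=1; i<=10; i++){
    if( isspace((unsigned char)zDoc[iBreak-i]) ) return iBreak - i + 1;
    if( isspace((unsigned char)zDoc[iBreak+i]) ) return iBreak + i + 1;
  }
  return iBreak;
}
```
#endif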



/*
** Allowed values for Snippet.aMatch[].snStatus
*/
#define SNIPPET_IGNORE  0   /* It is ok to omit this match from the snippet */
#define SNIPPET_DESIRED 1   /* We want to include this match in the snippet */

/*
** Generate the text of a snippet for the current row and store it in
** pCursor->snippet.zSnippet, with its length in nSnippet.
**
** The first match of each query term is marked SNIPPET_DESIRED.  For
** each desired match, an excerpt reaching about 40 bytes before and
** after the match is taken from its column, with both ends adjusted by
** wordBoundary().  Every match inside an excerpt is wrapped in
** zStartMark and zEndMark, and once a term has been shown its later
** matches are no longer desired.  An excerpt that starts at most 20
** bytes past the end of the previous excerpt in the same column
** continues from where that one ended; otherwise zEllipsis separates
** them, and a leading zEllipsis appears when the first excerpt does not
** begin at offset 0.  A trailing zEllipsis is added when the last
** excerpt stops short of the end of its column.
*/
static void snippetText(
  fulltext_cursor *pCursor,   /* The cursor we need the snippet for */
  const char *zStartMark,     /* Markup to appear before each match */
  const char *zEndMark,       /* Markup to appear after each match */
  const char *zEllipsis       /* Ellipsis mark */
){
  int i, j;
  struct snippetMatch *aMatch;
  int nMatch;
  int nDesired;
  StringBuffer sb;
  int tailCol;
  int tailOffset;
  int iCol;
  int nDoc;
  const char *zDoc;
  int iStart, iEnd;
  int tailEllipsis = 0;
  int iMatch;


  sqlite3_free(pCursor->snippet.zSnippet);
  pCursor->snippet.zSnippet = 0;
  aMatch = pCursor->snippet.aMatch;
  nMatch = pCursor->snippet.nMatch;
  initStringBuffer(&sb);

  for(i=0; i<nMatch; i++){
    aMatch[i].snStatus = SNIPPET_IGNORE;
  }
  nDesired = 0;
  for(i=0; i<pCursor->q.nTerms; i++){
    for(j=0; j<nMatch; j++){
      if( aMatch[j].iTerm==i ){
        aMatch[j].snStatus = SNIPPET_DESIRED;
        nDesired++;
        break;
      }
    }
  }

  iMatch = 0;
  tailCol = -1;
  tailOffset = 0;
  for(i=0; i<nMatch && nDesired>0; i++){
    if( aMatch[i].snStatus!=SNIPPET_DESIRED ) continue;
    nDesired--;
    iCol = aMatch[i].iCol;
    zDoc = (const char*)sqlite3_column_text(pCursor->pStmt, iCol+1);
    nDoc = sqlite3_column_bytes(pCursor->pStmt, iCol+1);
    iStart = aMatch[i].iStart - 40;
    iStart = wordBoundary(iStart, zDoc, nDoc, aMatch, nMatch, iCol);
    if( iStart<=10 ){
      iStart = 0;
    }
    if( iCol==tailCol && iStart<=tailOffset+20 ){
      iStart = tailOffset;
    }
    if( (iCol!=tailCol && tailCol>=0) || iStart!=tailOffset ){
      trimWhiteSpace(&sb);
      appendWhiteSpace(&sb);
      append(&sb, zEllipsis);
      appendWhiteSpace(&sb);
    }
    iEnd = aMatch[i].iStart + aMatch[i].nByte + 40;
    iEnd = wordBoundary(iEnd, zDoc, nDoc, aMatch, nMatch, iCol);
    if( iEnd>=nDoc-10 ){
      iEnd = nDoc;
      tailEllipsis = 0;
    }else{
      tailEllipsis = 1;
    }
    while( iMatch<nMatch && aMatch[iMatch].iCol<iCol ){ iMatch++; }
    while( iStart<iEnd ){
      while( iMatch<nMatch && aMatch[iMatch].iStart<iStart
             && aMatch[iMatch].iCol<=iCol ){
        iMatch++;
      }
      if( iMatch<nMatch && aMatch[iMatch].iStart<iEnd
             && aMatch[iMatch].iCol==iCol ){
        nappend(&sb, &zDoc[iStart], aMatch[iMatch].iStart - iStart);
        iStart = aMatch[iMatch].iStart;
        append(&sb, zStartMark);
        nappend(&sb, &zDoc[iStart], aMatch[iMatch].nByte);
        append(&sb, zEndMark);
        iStart += aMatch[iMatch].nByte;
        for(j=iMatch+1; j<nMatch; j++){
          if( aMatch[j].iTerm==aMatch[iMatch].iTerm
              && aMatch[j].snStatus==SNIPPET_DESIRED ){
            nDesired--;
            aMatch[j].snStatus = SNIPPET_IGNORE;
          }
        }
      }else{
        nappend(&sb, &zDoc[iStart], iEnd - iStart);
        iStart = iEnd;
      }
    }
    tailCol = iCol;
    tailOffset = iEnd;
  }
  trimWhiteSpace(&sb);
  if( tailEllipsis ){
    appendWhiteSpace(&sb);
    append(&sb, zEllipsis);
  }
  pCursor->snippet.zSnippet = stringBufferData(&sb);
  pCursor->snippet.nSnippet = stringBufferLength(&sb);
}
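
/*
** Illustrative sketch, not part of fts2.  The hypothetical excerpt() below
** is a much simplified version of snippetText() for a single column.  It
** builds one excerpt around the first match, widened by nContext bytes on
** each side and snapped to the document edges when within 10 bytes of
** them.  It wraps every match inside the excerpt in zOpen/zClose and marks
** truncated ends with "...".  Unlike snippetText(), it does not call
** wordBoundary() and does not merge several excerpts, and its spacing
** around the ellipsis is an assumption.  With zDoc "The quick brown fox
** jumps over the lazy dog", one match at offset 16 of length 3 ("fox")
** and nContext 3, it produces "... wn [fox] ju ...".
*/
#if 0
```c
#include <string.h>

typedef struct Span { int iStart, nByte; } Span;  /* one match: offset, length */

/* Append n bytes of z at zOut[*pLen], keeping zOut NUL-terminated.
** Returns 0 on success or -1 if zOut (capacity nOut) would overflow. */
static int appendBytes(char *zOut, size_t nOut, size_t *pLen,
                       const char *z, size_t n){
  if( *pLen + n + 1 > nOut ) return -1;
  memcpy(zOut + *pLen, z, n);
  *pLen += n;
  zOut[*pLen] = 0;
  return 0;
}

/* Write one highlighted excerpt of zDoc[0..nDoc-1] into zOut.  aMatch[]
** must be sorted by iStart and non-overlapping.  Returns the excerpt
** length, or -1 if zOut is too small. */
static int excerpt(const char *zDoc, int nDoc, const Span *aMatch, int nMatch,
                   int nContext, const char *zOpen, const char *zClose,
                   char *zOut, size_t nOut){
  size_t len = 0;
  int iStart, iEnd, i;
  if( nOut==0 ) return -1;
  zOut[0] = 0;
  if( nMatch==0 ) return 0;
  iStart = aMatch[0].iStart - nContext;           /* context before match */
  if( iStart<=10 ) iStart = 0;                    /* snap to document start */
  iEnd = aMatch[0].iStart + aMatch[0].nByte + nContext;
  if( iEnd>=nDoc-10 ) iEnd = nDoc;                /* snap to document end */
  if( iStart>0 && appendBytes(zOut, nOut, &len, "... ", 4) ) return -1;
  for(i=0; i<nMatch && iStart<iEnd; i++){
    const Span *m = &aMatch[i];
    if( m->iStart<iStart ) continue;              /* before the window */
    if( m->iStart>=iEnd ) break;                  /* past the window */
    if( appendBytes(zOut, nOut, &len, zDoc+iStart, (size_t)(m->iStart-iStart))
     || appendBytes(zOut, nOut, &len, zOpen, strlen(zOpen))
     || appendBytes(zOut, nOut, &len, zDoc+m->iStart, (size_t)m->nByte)
     || appendBytes(zOut, nOut, &len, zClose, strlen(zClose)) ){
      return -1;
    }
    iStart = m->iStart + m->nByte;                /* continue after match */
  }
  if( iStart<iEnd
   && appendBytes(zOut, nOut, &len, zDoc+iStart, (size_t)(iEnd-iStart)) ){
    return -1;
  }
  if( iEnd<nDoc && appendBytes(zOut, nOut, &len, " ...", 4) ) return -1;
  return (int)len;
}
```
#endif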


/*
** Close the cursor.  For additional information see the documentation
** on the xClose method of the virtual table interface.
*/
static int fulltextClose(sqlite3_vtab_cursor *pCursor){
  fulltext_cursor *c = (fulltext_cursor *) pCursor;
  TRACE(("FTS2 Close %p\n", c));
  sqlite3_finalize(c->pStmt);
  queryClear(&c->q);
  snippetClear(&c->snippet);
  if( c->result.nData!=0 ) dlrDestroy(&c->reader);
  dataBufferDestroy(&c->result);
  sqlite3_free(c);
  return SQLITE_OK;
}

/*
** Advance the cursor to the next row.  For additional information see
** the documentation on the xNext method of the virtual table interface.
**
** Cursors that are not full-text queries simply step pStmt.  For a
** full-text query, the next docid is taken from the result doclist and
** bound to pStmt to fetch the matching content row.  If that row does
** not exist, the index is corrupt.
*/
static int fulltextNext(sqlite3_vtab_cursor *pCursor){
  fulltext_cursor *c = (fulltext_cursor *) pCursor;
  int rc;

  TRACE(("FTS2 Next %p\n", pCursor));
  snippetClear(&c->snippet);
  if( c->iCursorType < QUERY_FULLTEXT ){
    /* TODO(shess) Handle SQLITE_SCHEMA AND SQLITE_BUSY. */
    rc = sqlite3_step(c->pStmt);
    switch( rc ){
      case SQLITE_ROW:
        c->eof = 0;
        return SQLITE_OK;
      case SQLITE_DONE:
        c->eof = 1;
        return SQLITE_OK;
      default:
        c->eof = 1;
        return rc;
    }
  } else {  /* full-text query */
    rc = sqlite3_reset(c->pStmt);
    if( rc!=SQLITE_OK ) return rc;

    if( c->result.nData==0 || dlrAtEnd(&c->reader) ){
      c->eof = 1;
      return SQLITE_OK;
    }
    rc = sqlite3_bind_int64(c->pStmt, 1, dlrDocid(&c->reader));
    if( rc!=SQLITE_OK ) return rc;
    rc = dlrStep(&c->reader);
    if( rc!=SQLITE_OK ) return rc;
    /* TODO(shess) Handle SQLITE_SCHEMA AND SQLITE_BUSY. */
    rc = sqlite3_step(c->pStmt);
    if( rc==SQLITE_ROW ){   /* the case we expect */
      c->eof = 0;
      return SQLITE_OK;
    }

    /* Corrupt if the index refers to missing document. */
    if( rc==SQLITE_DONE ) return SQLITE_CORRUPT_BKPT;

    return rc;
  }
}


/* TODO(shess) If LeafReader were moved to the top of the file, or to
** another file, termSelect() could be defined above docListOfTerm()
** and this forward declaration removed.
*/
static int termSelect(fulltext_vtab *v, int iColumn,
                      const char *pTerm, int nTerm, int isPrefix,
                      DocListType iType, DataBuffer *out);

36175821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* Return a DocList corresponding to the query term *pTerm.  If *pTerm
36185821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** is the first term of a phrase query, go ahead and evaluate the phrase
36195821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** query and return the doclist for the entire phrase query.
36205821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**
36215821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** The resulting DL_DOCIDS doclist is stored in pResult, which is
36225821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** overwritten.
36235821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
36245821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int docListOfTerm(
36255821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  fulltext_vtab *v,   /* The full text index */
36265821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int iColumn,        /* column to restrict to.  No restriction if >=nColumn */
36275821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  QueryTerm *pQTerm,  /* Term we are looking for, or 1st term of a phrase */
36285821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  DataBuffer *pResult /* Write the result here */
36295821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)){
36305821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  DataBuffer left, right, new;
36315821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int i, rc;
36325821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
36335821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* No phrase search if no position info. */
36345821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  assert( pQTerm->nPhrase==0 || DL_DEFAULT!=DL_DOCIDS );
36355821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
36365821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* This code should never be called with buffered updates. */
36375821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  assert( v->nPendingData<0 );
36385821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
36395821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  dataBufferInit(&left, 0);
36405821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = termSelect(v, iColumn, pQTerm->pTerm, pQTerm->nTerm, pQTerm->isPrefix,
                  0<pQTerm->nPhrase ? DL_POSITIONS : DL_DOCIDS, &left);
  if( rc ) return rc;
  for(i=1; i<=pQTerm->nPhrase && left.nData>0; i++){
    dataBufferInit(&right, 0);
    rc = termSelect(v, iColumn, pQTerm[i].pTerm, pQTerm[i].nTerm,
                    pQTerm[i].isPrefix, DL_POSITIONS, &right);
    if( rc ){
      dataBufferDestroy(&left);
      return rc;
    }
    dataBufferInit(&new, 0);
    rc = docListPhraseMerge(left.pData, left.nData, right.pData, right.nData,
                            i<pQTerm->nPhrase ? DL_POSITIONS : DL_DOCIDS, &new);
    dataBufferDestroy(&left);
    dataBufferDestroy(&right);
    if( rc!=SQLITE_OK ){
      dataBufferDestroy(&new);
      return rc;
    }
    left = new;
  }
  *pResult = left;
  return rc;
}
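
/*
** Illustrative sketch, not part of fts2: the phrase loop above keeps a
** hit from the next term only where the running doclist already has a
** hit one position earlier in the same document.  This assumes a
** hypothetical flat layout (arrays of SketchHit sorted by docid, then
** position) instead of fts2's varint-encoded doclists, and it always
** keeps positions, whereas docListPhraseMerge() is asked for DL_DOCIDS
** output after the last term.  SketchHit and sketchPhraseMerge() are
** hypothetical names.
*/
#include <assert.h>

/* One occurrence of a term in a document (hypothetical layout). */
typedef struct SketchHit {
  long long iDocid;   /* Document id */
  int iPos;           /* Token position within the document */
} SketchHit;

/* Write to aOut every right-hand hit that directly follows a left-hand
** hit in the same document.  Both inputs must be sorted by (docid, pos)
** and aOut must have room for nRight entries.  Returns the count written.
*/
static int sketchPhraseMerge(
  const SketchHit *aLeft, int nLeft,
  const SketchHit *aRight, int nRight,
  SketchHit *aOut
){
  int i = 0, j = 0, nOut = 0;
  assert( nLeft>=0 && nRight>=0 );
  while( i<nLeft && j<nRight ){
    const SketchHit *l = &aLeft[i];
    const SketchHit *r = &aRight[j];
    if( l->iDocid<r->iDocid
     || (l->iDocid==r->iDocid && l->iPos+1<r->iPos) ){
      i++;                 /* left hit sorts before r's predecessor slot */
    }else if( l->iDocid==r->iDocid && l->iPos+1==r->iPos ){
      aOut[nOut++] = *r;   /* the phrase continues at this position */
      i++;
      j++;
    }else{
      j++;                 /* no left hit can precede r */
    }
  }
  return nOut;
}

/* Usage: for the phrase "a b", merging a={(1,0)} with b={(1,1),(2,0)}
** keeps only (1,1), so document 1 matches and document 2 does not.
*/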

/* Add a new term pTerm[0..nTerm-1] to the query *q.
*/
static void queryAdd(Query *q, const char *pTerm, int nTerm){
  QueryTerm *t;
  ++q->nTerms;
  q->pTerms = sqlite3_realloc(q->pTerms, q->nTerms * sizeof(q->pTerms[0]));
  if( q->pTerms==0 ){
    q->nTerms = 0;
    return;
  }
  t = &q->pTerms[q->nTerms - 1];
  CLEAR(t);
  t->pTerm = sqlite3_malloc(nTerm+1);
  memcpy(t->pTerm, pTerm, nTerm);
  t->pTerm[nTerm] = 0;
  t->nTerm = nTerm;
  t->isOr = q->nextIsOr;
  t->isPrefix = 0;
  q->nextIsOr = 0;
  t->iColumn = q->nextColumn;
  q->nextColumn = q->dfltColumn;
}
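
/*
** Illustrative sketch, not part of fts2: the same append-and-copy
** pattern as queryAdd(), written with standard malloc()/realloc() and
** hypothetical SketchTerm/SketchQuery types.  One design choice
** differs: realloc() goes through a temporary, so on allocation failure
** the existing array is kept and the caller learns of the error from
** the return value.  sketchQueryAdd() is a hypothetical name.
*/
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct SketchTerm {
  char *zTerm;      /* NUL-terminated copy of the term text */
  int nTerm;        /* Length of zTerm in bytes */
  int iColumn;      /* Column this term is restricted to */
} SketchTerm;

typedef struct SketchQuery {
  SketchTerm *aTerm;   /* Array of nTerm terms */
  int nTerm;           /* Number of terms in aTerm */
  int nextColumn;      /* Column for the next term added */
  int dfltColumn;      /* Column to revert to after each term */
} SketchQuery;

/* Append pTerm[0..nTerm-1] to q.  Return 0 on success or -1 if out of
** memory, in which case q is left unchanged.
*/
static int sketchQueryAdd(SketchQuery *q, const char *pTerm, int nTerm){
  SketchTerm *aNew;
  char *zCopy;
  assert( nTerm>=0 );
  zCopy = malloc(nTerm+1);
  if( zCopy==0 ) return -1;
  aNew = realloc(q->aTerm, (q->nTerm+1)*sizeof(q->aTerm[0]));
  if( aNew==0 ){
    free(zCopy);
    return -1;                     /* the old q->aTerm is still valid */
  }
  memcpy(zCopy, pTerm, nTerm);
  zCopy[nTerm] = 0;
  q->aTerm = aNew;
  q->aTerm[q->nTerm].zTerm = zCopy;
  q->aTerm[q->nTerm].nTerm = nTerm;
  q->aTerm[q->nTerm].iColumn = q->nextColumn;
  q->nTerm++;
  q->nextColumn = q->dfltColumn;   /* a "col:" prefix applies to one term */
  return 0;
}

/* Usage: with nextColumn set to 2 by a preceding "col:" specifier, the
** first added term records column 2 and later terms use dfltColumn.
*/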

/*
** Check whether the string zToken[0..nToken-1] exactly matches the
** name of a column in the virtual table.  If it does, return the
** zero-indexed column number.  If not, return -1.
*/
static int checkColumnSpecifier(
  fulltext_vtab *pVtab,    /* The virtual table */
  const char *zToken,      /* Text of the token */
  int nToken               /* Number of bytes in the token */
){
  int i;
  for(i=0; i<pVtab->nColumn; i++){
    if( memcmp(pVtab->azColumn[i], zToken, nToken)==0
        && pVtab->azColumn[i][nToken]==0 ){
      return i;
    }
  }
  return -1;
}
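
/*
** Illustrative sketch, not part of fts2: exact matching of a token that
** is not NUL-terminated against NUL-terminated names.  This variant
** compares lengths before calling memcmp(), which guarantees memcmp()
** never reads beyond the end of a name shorter than the token.
** sketchFindColumn() is a hypothetical name.
*/
#include <assert.h>
#include <string.h>

/* Return the index of the entry in azName[0..nName-1] that equals
** zToken[0..nToken-1] exactly, or -1 if there is none.
*/
static int sketchFindColumn(
  const char *const *azName, int nName,
  const char *zToken, int nToken
){
  int i;
  assert( nToken>=0 );
  for(i=0; i<nName; i++){
    /* Lengths first, so memcmp() stays inside both buffers. */
    if( strlen(azName[i])==(size_t)nToken
     && memcmp(azName[i], zToken, nToken)==0 ){
      return i;
    }
  }
  return -1;
}

/* Usage: for the query text "body:foo" the token is "body" (nToken==4),
** which matches a column named "body"; "bod" or "bodytext" would not.
*/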

/*
** Parse the text at pSegment[0..nSegment-1].  Add additional terms
** to the query being assembled in pQuery.
**
** inPhrase is true if pSegment[0..nSegment-1] is contained within
** double-quotes.  If inPhrase is true, then the first term
** is marked with the number of terms in the phrase less one and
** OR and "-" syntax is ignored.  If inPhrase is false, then every
** term found is marked with nPhrase=0 and OR and "-" syntax is significant.
*/
static int tokenizeSegment(
  sqlite3_tokenizer *pTokenizer,          /* The tokenizer to use */
  const char *pSegment, int nSegment,     /* Query expression being parsed */
  int inPhrase,                           /* True if within "..." */
  Query *pQuery                           /* Append results here */
){
  const sqlite3_tokenizer_module *pModule = pTokenizer->pModule;
  sqlite3_tokenizer_cursor *pCursor;
  int firstIndex = pQuery->nTerms;
  int iCol;
  int nTerm = 1;
  int iEndLast = -1;

  int rc = pModule->xOpen(pTokenizer, pSegment, nSegment, &pCursor);
  if( rc!=SQLITE_OK ) return rc;
  pCursor->pTokenizer = pTokenizer;

  while( 1 ){
    const char *pToken;
    int nToken, iBegin, iEnd, iPos;

    rc = pModule->xNext(pCursor,
                        &pToken, &nToken,
                        &iBegin, &iEnd, &iPos);
    if( rc!=SQLITE_OK ) break;
    if( !inPhrase &&
        iEnd<nSegment && pSegment[iEnd]==':' &&
         (iCol = checkColumnSpecifier(pQuery->pFts, pToken, nToken))>=0 ){
      pQuery->nextColumn = iCol;
      continue;
    }
    if( !inPhrase && pQuery->nTerms>0 && nToken==2
         && pSegment[iBegin]=='O' && pSegment[iBegin+1]=='R' ){
      pQuery->nextIsOr = 1;
      continue;
    }

    /*
     * The ICU tokenizer considers '*' a break character, so the code below
     * sets isPrefix correctly, but since that code doesn't eat the '*', the
     * ICU tokenizer returns it as the next token.  So eat it here until a
     * better solution presents itself.
     */
    if( pQuery->nTerms>0 && nToken==1 && pSegment[iBegin]=='*' &&
        iEndLast==iBegin){
      pQuery->pTerms[pQuery->nTerms-1].isPrefix = 1;
      continue;
    }
    iEndLast = iEnd;

    queryAdd(pQuery, pToken, nToken);
    if( !inPhrase && iBegin>0 && pSegment[iBegin-1]=='-' ){
      pQuery->pTerms[pQuery->nTerms-1].isNot = 1;
    }
    if( iEnd<nSegment && pSegment[iEnd]=='*' ){
      pQuery->pTerms[pQuery->nTerms-1].isPrefix = 1;
    }
    pQuery->pTerms[pQuery->nTerms-1].iPhrase = nTerm;
    if( inPhrase ){
      nTerm++;
    }
  }

  if( inPhrase && pQuery->nTerms>firstIndex ){
    pQuery->pTerms[firstIndex].nPhrase = pQuery->nTerms - firstIndex - 1;
  }

  return pModule->xClose(pCursor);
}
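
/*
** Illustrative sketch, not part of fts2: the per-token rules that
** tokenizeSegment() applies outside a phrase, pulled out into one
** stand-alone function.  It assumes a naive ASCII tokenizer (runs of
** isalnum() bytes) in place of the pluggable sqlite3_tokenizer_module,
** so '*' is simply skipped as a separator and the ICU workaround above
** is not needed.  SketchToken and sketchClassify() are hypothetical.
*/
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef struct SketchToken {
  int iBegin, nToken;           /* Byte range of the term in the segment */
  int iColumn;                  /* Column from a preceding "col:", or -1 */
  int isOr, isNot, isPrefix;    /* Flags as in fts2's QueryTerm */
} SketchToken;

/* Classify the terms of zSeg[0..nSeg-1] against column names
** azCol[0..nCol-1].  Returns the number of terms written to aOut
** (at most nOut).
*/
static int sketchClassify(
  const char *zSeg, int nSeg,           /* Segment outside any "..." */
  const char *const *azCol, int nCol,   /* Column names */
  SketchToken *aOut, int nOut           /* Output terms */
){
  int i = 0, n = 0, nextCol = -1, nextIsOr = 0;
  assert( nSeg>=0 && nOut>=0 );
  while( i<nSeg && n<nOut ){
    int iBegin, iEnd, k;
    if( !isalnum((unsigned char)zSeg[i]) ){ i++; continue; }
    iBegin = i;
    while( i<nSeg && isalnum((unsigned char)zSeg[i]) ) i++;
    iEnd = i;
    if( iEnd<nSeg && zSeg[iEnd]==':' ){   /* possible "col:" specifier */
      for(k=0; k<nCol; k++){
        if( strlen(azCol[k])==(size_t)(iEnd-iBegin)
         && memcmp(azCol[k], zSeg+iBegin, iEnd-iBegin)==0 ) break;
      }
      if( k<nCol ){ nextCol = k; continue; }
    }
    if( n>0 && iEnd-iBegin==2 && zSeg[iBegin]=='O' && zSeg[iBegin+1]=='R' ){
      nextIsOr = 1;                       /* joins the next term */
      continue;
    }
    aOut[n].iBegin = iBegin;
    aOut[n].nToken = iEnd-iBegin;
    aOut[n].iColumn = nextCol;
    aOut[n].isOr = nextIsOr;
    aOut[n].isNot = iBegin>0 && zSeg[iBegin-1]=='-';
    aOut[n].isPrefix = iEnd<nSeg && zSeg[iEnd]=='*';
    n++;
    nextCol = -1;                         /* specifier covers one term */
    nextIsOr = 0;
  }
  return n;
}

/* Usage: "title:foo -bar OR baz* qux" with columns {title, body} yields
** four terms: foo in column 0, bar negated, baz as an ORed prefix term,
** and qux.
*/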

/* Parse the query string zInput[0..nInput-1] into the Query object
** pQuery.  Return SQLITE_ERROR if a double quote is left unmatched.
**
** On success, the caller must call queryClear() to release the
** dynamically allocated memory held by pQuery.
*/
static int parseQuery(
  fulltext_vtab *v,        /* The fulltext index */
  const char *zInput,      /* Input text of the query string */
  int nInput,              /* Bytes in zInput, or <0 to use strlen() */
  int dfltColumn,          /* Default column of the index to match against */
  Query *pQuery            /* Write the parse results here. */
){
  int iInput, inPhrase = 0;

  if( zInput==0 ) nInput = 0;
  if( nInput<0 ) nInput = strlen(zInput);
  pQuery->nTerms = 0;
  pQuery->pTerms = NULL;
  pQuery->nextIsOr = 0;
  pQuery->nextColumn = dfltColumn;
  pQuery->dfltColumn = dfltColumn;
  pQuery->pFts = v;

  for(iInput=0; iInput<nInput; ++iInput){
    int i;
    for(i=iInput; i<nInput && zInput[i]!='"'; ++i){}
    if( i>iInput ){
      tokenizeSegment(v->pTokenizer, zInput+iInput, i-iInput, inPhrase,
                       pQuery);
    }
    iInput = i;
    if( i<nInput ){
      assert( zInput[i]=='"' );
      inPhrase = !inPhrase;
    }
  }

  if( inPhrase ){
    /* unmatched quote */
    queryClear(pQuery);
    return SQLITE_ERROR;
  }
  return SQLITE_OK;
}
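
/*
** Illustrative sketch, not part of fts2: the quote-splitting loop of
** parseQuery() on its own.  Segments are recorded in an array instead
** of being handed to tokenizeSegment(), and an unbalanced quote is
** reported as -1, where parseQuery() also calls queryClear() and
** returns SQLITE_ERROR.  Empty segments (for example between adjacent
** quotes) are skipped, as in parseQuery().  SketchSegment and
** sketchSplitQuotes() are hypothetical names.
*/
#include <assert.h>

typedef struct SketchSegment {
  int iStart, nLen;   /* Byte range within the query string */
  int inPhrase;       /* True if the segment was inside "..." */
} SketchSegment;

/* Split zIn[0..nIn-1] at double quotes; every quote toggles the phrase
** state.  Returns the number of segments written to aOut (at most nOut),
** or -1 if the quotes are unbalanced.
*/
static int sketchSplitQuotes(
  const char *zIn, int nIn,
  SketchSegment *aOut, int nOut
){
  int i = 0, n = 0, inPhrase = 0;
  assert( nIn>=0 && nOut>=0 );
  while( i<nIn ){
    int iStart = i;
    while( i<nIn && zIn[i]!='"' ) i++;
    if( i>iStart && n<nOut ){
      aOut[n].iStart = iStart;
      aOut[n].nLen = i-iStart;
      aOut[n].inPhrase = inPhrase;
      n++;
    }
    if( i<nIn ){            /* zIn[i] is a quote: enter or leave a phrase */
      inPhrase = !inPhrase;
      i++;
    }
  }
  return inPhrase ? -1 : n;
}

/* Usage: the query  a "b c" d  splits into "a ", "b c" (in a phrase)
** and " d".
*/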

/* TODO(shess) Refactor the code to remove this forward decl. */
static int flushPendingTerms(fulltext_vtab *v);

/* Perform a full-text query using the search expression in
** zInput[0..nInput-1].  Return a list of matching documents
** in pResult.
**
** Terms without an explicit "column:" specifier must match column
** iColumn, or may match any column if iColumn>=nColumn.
*/
static int fulltextQuery(
  fulltext_vtab *v,      /* The full text index */
  int iColumn,           /* Match against this column by default */
  const char *zInput,    /* The query string */
  int nInput,            /* Number of bytes in zInput[] */
  DataBuffer *pResult,   /* Write the result doclist here */
  Query *pQuery          /* Put parsed query string here */
){
  int i, iNext, rc;
  DataBuffer left, right, or, new;
  int nNot = 0;
  QueryTerm *aTerm;

  /* TODO(shess) Instead of flushing pendingTerms, we could query for
  ** the relevant term and merge the doclist into what we receive from
  ** the database.  Wait and see if this is a common issue, first.
  **
  ** A good reason not to flush is to not generate update-related
  ** error codes from here.
  */

  /* Flush any buffered updates before executing the query. */
  rc = flushPendingTerms(v);
  if( rc!=SQLITE_OK ) return rc;

  /* TODO(shess) I think that the queryClear() calls below are not
  ** necessary, because fulltextClose() already clears the query.
  */
  rc = parseQuery(v, zInput, nInput, iColumn, pQuery);
  if( rc!=SQLITE_OK ) return rc;

  /* Empty or NULL queries return no results. */
  if( pQuery->nTerms==0 ){
    dataBufferInit(pResult, 0);
    return SQLITE_OK;
  }

  /* Merge AND terms. */
  /* TODO(shess) I think we can early-exit if( i>nNot && left.nData==0 ). */
  aTerm = pQuery->pTerms;
  for(i = 0; i<pQuery->nTerms; i=iNext){
    if( aTerm[i].isNot ){
      /* Handle all NOT terms in a separate pass */
      nNot++;
      iNext = i + aTerm[i].nPhrase+1;
      continue;
    }
    iNext = i + aTerm[i].nPhrase + 1;
    rc = docListOfTerm(v, aTerm[i].iColumn, &aTerm[i], &right);
    if( rc ){
      if( i!=nNot ) dataBufferDestroy(&left);
      queryClear(pQuery);
      return rc;
    }
    while( iNext<pQuery->nTerms && aTerm[iNext].isOr ){
      rc = docListOfTerm(v, aTerm[iNext].iColumn, &aTerm[iNext], &or);
      iNext += aTerm[iNext].nPhrase + 1;
      if( rc ){
        if( i!=nNot ) dataBufferDestroy(&left);
        dataBufferDestroy(&right);
        queryClear(pQuery);
        return rc;
      }
      dataBufferInit(&new, 0);
      rc = docListOrMerge(right.pData, right.nData, or.pData, or.nData, &new);
      dataBufferDestroy(&right);
      dataBufferDestroy(&or);
      if( rc!=SQLITE_OK ){
        if( i!=nNot ) dataBufferDestroy(&left);
        queryClear(pQuery);
        dataBufferDestroy(&new);
        return rc;
      }
      right = new;
    }
    if( i==nNot ){           /* first term processed. */
      left = right;
    }else{
      dataBufferInit(&new, 0);
      rc = docListAndMerge(left.pData, left.nData,
                           right.pData, right.nData, &new);
      dataBufferDestroy(&right);
      dataBufferDestroy(&left);
      if( rc!=SQLITE_OK ){
        queryClear(pQuery);
        dataBufferDestroy(&new);
        return rc;
      }
      left = new;
    }
  }

  if( nNot==pQuery->nTerms ){
    /* We do not yet know how to handle a query of only NOT terms */
    return SQLITE_ERROR;
  }

  /* Do the EXCEPT terms */
  for(i=0; i<pQuery->nTerms; i += aTerm[i].nPhrase + 1){
    if( !aTerm[i].isNot ) continue;
    rc = docListOfTerm(v, aTerm[i].iColumn, &aTerm[i], &right);
    if( rc ){
      queryClear(pQuery);
      dataBufferDestroy(&left);
      return rc;
    }
    dataBufferInit(&new, 0);
    rc = docListExceptMerge(left.pData, left.nData,
                            right.pData, right.nData, &new);
    dataBufferDestroy(&right);
    dataBufferDestroy(&left);
    if( rc!=SQLITE_OK ){
      queryClear(pQuery);
      dataBufferDestroy(&new);
      return rc;
    }
    left = new;
  }

  *pResult = left;
  return rc;
}
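
/*
** Illustrative sketch, not part of fts2: fulltextQuery() first ORs each
** term with any OR terms that follow it, then ANDs those groups
** together, and finally subtracts every NOT term.  The helpers below
** show the three merges on a hypothetical representation (sorted arrays
** of distinct docids) rather than the varint-encoded doclists handled
** by docListOrMerge(), docListAndMerge() and docListExceptMerge().
** The sketch* names are hypothetical.  Each helper writes to aOut, which
** needs room for nA+nB docids (OR) or nA docids (AND, EXCEPT), and
** returns the number of docids written.
*/
#include <assert.h>

static int sketchOrMerge(const long long *aA, int nA,
                         const long long *aB, int nB, long long *aOut){
  int i = 0, j = 0, n = 0;
  assert( nA>=0 && nB>=0 );
  while( i<nA || j<nB ){
    if( j>=nB || (i<nA && aA[i]<aB[j]) ) aOut[n++] = aA[i++];
    else if( i>=nA || aB[j]<aA[i] ) aOut[n++] = aB[j++];
    else { aOut[n++] = aA[i++]; j++; }   /* in both lists: emit once */
  }
  return n;
}

static int sketchAndMerge(const long long *aA, int nA,
                          const long long *aB, int nB, long long *aOut){
  int i = 0, j = 0, n = 0;
  assert( nA>=0 && nB>=0 );
  while( i<nA && j<nB ){
    if( aA[i]<aB[j] ) i++;
    else if( aB[j]<aA[i] ) j++;
    else { aOut[n++] = aA[i++]; j++; }   /* present in both lists */
  }
  return n;
}

static int sketchExceptMerge(const long long *aA, int nA,
                             const long long *aB, int nB, long long *aOut){
  int i = 0, j = 0, n = 0;
  assert( nA>=0 && nB>=0 );
  while( i<nA ){
    if( j>=nB || aA[i]<aB[j] ) aOut[n++] = aA[i++];
    else if( aB[j]<aA[i] ) j++;
    else { i++; j++; }                   /* excluded by the NOT list */
  }
  return n;
}

/* Usage: for the query "a OR b c -d", fulltextQuery() effectively
** computes except(and(or(a, b), c), d).  With a={1,3,5}, b={2,3,8},
** c={2,3,5,9} and d={3}, the helpers yield {2,5}.
*/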
39665821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
39675821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/*
39685821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** This is the xFilter interface for the virtual table.  See
39695821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** the virtual table xFilter method documentation for additional
39705821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** information.
39715821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**
39725821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** If idxNum==QUERY_GENERIC then do a full table scan against
39735821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** the %_content table.
39745821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**
39755821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** If idxNum==QUERY_ROWID then do a rowid lookup for a single entry
39765821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** in the %_content table.
39775821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)**
39785821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** If idxNum>=QUERY_FULLTEXT then use the full text index.  The
39795821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** column on the left-hand side of the MATCH operator is column
39805821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** number idxNum-QUERY_FULLTEXT, 0 indexed.  argv[0] is the right-hand
39815821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** side of the MATCH operator.
39825821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
39835821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* TODO(shess) Upgrade the cursor initialization and destruction to
39845821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** account for fulltextFilter() being called multiple times on the
39855821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** same cursor.  The current solution is very fragile.  Apply fix to
39865821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** fts2 as appropriate.
39875821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
static int fulltextFilter(
  sqlite3_vtab_cursor *pCursor,     /* The cursor used for this query */
  int idxNum, const char *idxStr,   /* Which indexing scheme to use */
  int argc, sqlite3_value **argv    /* Arguments for the indexing scheme */
){
  fulltext_cursor *c = (fulltext_cursor *) pCursor;
  fulltext_vtab *v = cursor_vtab(c);
  int rc;

  TRACE(("FTS2 Filter %p\n",pCursor));

  /* If the cursor has a statement that was not prepared according to
  ** idxNum, clear it.  I believe all calls to fulltextFilter with a
  ** given cursor will have the same idxNum, but in this case it's
  ** easy to be safe.
  */
  if( c->pStmt && c->iCursorType!=idxNum ){
    sqlite3_finalize(c->pStmt);
    c->pStmt = NULL;
  }

  /* Get a fresh statement appropriate to idxNum. */
  /* TODO(shess): Add a prepared-statement cache in the vt structure.
  ** The cache must handle multiple open cursors.  Easier to cache the
  ** statement variants at the vt to reduce malloc/realloc/free here.
  ** Or we could have a StringBuffer variant which allowed stack
  ** construction for small values.
  */
  if( !c->pStmt ){
    char *zSql = sqlite3_mprintf("select rowid, * from %%_content %s",
                                 idxNum==QUERY_GENERIC ? "" : "where rowid=?");
    rc = sql_prepare(v->db, v->zDb, v->zName, &c->pStmt, zSql);
    sqlite3_free(zSql);
    if( rc!=SQLITE_OK ) return rc;
    c->iCursorType = idxNum;
  }else{
    sqlite3_reset(c->pStmt);
    assert( c->iCursorType==idxNum );
  }

  switch( idxNum ){
    case QUERY_GENERIC:
      break;

    case QUERY_ROWID:
      rc = sqlite3_bind_int64(c->pStmt, 1, sqlite3_value_int64(argv[0]));
      if( rc!=SQLITE_OK ) return rc;
      break;

    default:   /* full-text search */
    {
      const char *zQuery = (const char *)sqlite3_value_text(argv[0]);
      assert( idxNum<=QUERY_FULLTEXT+v->nColumn);
      assert( argc==1 );
      queryClear(&c->q);
      if( c->result.nData!=0 ){
        /* This case happens if the same cursor is used repeatedly. */
        dlrDestroy(&c->reader);
        dataBufferReset(&c->result);
      }else{
        dataBufferInit(&c->result, 0);
      }
      rc = fulltextQuery(v, idxNum-QUERY_FULLTEXT, zQuery, -1, &c->result, &c->q);
      if( rc!=SQLITE_OK ) return rc;
      if( c->result.nData!=0 ){
        rc = dlrInit(&c->reader, DL_DOCIDS, c->result.pData, c->result.nData);
        if( rc!=SQLITE_OK ) return rc;
      }
      break;
    }
  }

  return fulltextNext(pCursor);
}
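
/* Illustrative sketch, not part of fts2 and not compiled: how
** fulltextFilter() interprets idxNum.  QUERY_GENERIC selects a full
** scan of %_content, QUERY_ROWID a rowid lookup, and any larger value
** a MATCH against column idxNum-QUERY_FULLTEXT, where column nColumn
** is the extra column named after the table.  The TOY_ constants and
** their values 0, 1 and 2 are assumptions standing in for QUERY_GENERIC,
** QUERY_ROWID and QUERY_FULLTEXT.
**
```c
#include <assert.h>

// Hypothetical stand-ins for fts2's query-type constants; the numeric
// values 0, 1, 2 are assumed for this example.
#define TOY_QUERY_GENERIC  0
#define TOY_QUERY_ROWID    1
#define TOY_QUERY_FULLTEXT 2

// Plan kinds reported by toyDecodeIdxNum().
enum { TOY_PLAN_SCAN, TOY_PLAN_ROWID, TOY_PLAN_MATCH };

// Decodes idxNum the way fulltextFilter() interprets it.  For a MATCH
// plan, *piColumn receives the 0-indexed column on the left of MATCH;
// the value nColumn denotes the extra column named after the table.
static int toyDecodeIdxNum(int idxNum, int nColumn, int *piColumn){
  *piColumn = -1;
  if( idxNum==TOY_QUERY_GENERIC ) return TOY_PLAN_SCAN;
  if( idxNum==TOY_QUERY_ROWID ) return TOY_PLAN_ROWID;
  assert( idxNum<=TOY_QUERY_FULLTEXT+nColumn );  // mirrors fts2's assert
  *piColumn = idxNum - TOY_QUERY_FULLTEXT;
  return TOY_PLAN_MATCH;
}
```
**
** Example: with nColumn==3, toyDecodeIdxNum(5, 3, &iCol) returns
** TOY_PLAN_MATCH and sets iCol to 3, the table-named column.
*/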

/* This is the xEof method of the virtual table.  The SQLite core
** calls this routine to find out if it has reached the end of
** a query's result set.
*/
static int fulltextEof(sqlite3_vtab_cursor *pCursor){
  fulltext_cursor *c = (fulltext_cursor *) pCursor;
  return c->eof;
}

/* This is the xColumn method of the virtual table.  The SQLite
** core calls this method during a query when it needs the value
** of a column from the virtual table.  This method needs to use
** one of the sqlite3_result_*() routines to store the requested
** value back in the pContext.
*/
static int fulltextColumn(sqlite3_vtab_cursor *pCursor,
                          sqlite3_context *pContext, int idxCol){
  fulltext_cursor *c = (fulltext_cursor *) pCursor;
  fulltext_vtab *v = cursor_vtab(c);

  if( idxCol<v->nColumn ){
    sqlite3_value *pVal = sqlite3_column_value(c->pStmt, idxCol+1);
    sqlite3_result_value(pContext, pVal);
  }else if( idxCol==v->nColumn ){
    /* The extra column whose name is the same as the table.
    ** Return a blob containing the bytes of the cursor pointer.
    */
    sqlite3_result_blob(pContext, &c, sizeof(c), SQLITE_TRANSIENT);
  }
  return SQLITE_OK;
}
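
/* Illustrative sketch, not part of fts2 and not compiled: for the
** table-named column, fulltextColumn() returns a blob holding the
** bytes of the cursor pointer itself (sizeof(c) is the size of a
** pointer), not a copy of the cursor.  Code that receives such a blob
** can memcpy() the bytes back into a pointer; the value is meaningful
** only inside the same process while the cursor is still open.
** ToyCursor and the toy helpers below are hypothetical.
**
```c
#include <assert.h>
#include <string.h>

// Hypothetical stand-in for fulltext_cursor.
typedef struct ToyCursor { int iRow; } ToyCursor;

// Serializes the pointer value (not the struct) into aBlob, mirroring
// sqlite3_result_blob(pContext, &c, sizeof(c), SQLITE_TRANSIENT).
// Returns the number of bytes written, or 0 if aBlob is too small.
static int toyPackCursor(ToyCursor *pCur, unsigned char *aBlob, int nBlob){
  if( nBlob<(int)sizeof(pCur) ) return 0;
  memcpy(aBlob, &pCur, sizeof(pCur));
  return (int)sizeof(pCur);
}

// Recovers the pointer; returns NULL if the blob has the wrong size.
static ToyCursor *toyUnpackCursor(const unsigned char *aBlob, int nBlob){
  ToyCursor *pCur;
  if( nBlob!=(int)sizeof(pCur) ) return NULL;
  memcpy(&pCur, aBlob, sizeof(pCur));
  return pCur;
}
```
**
** Checking the blob length before copying guards against being handed
** a value that did not come from this column.
*/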

/* This is the xRowid method.  The SQLite core calls this routine to
** retrieve the rowid for the current row of the result set.  The
** rowid should be written to *pRowid.
*/
static int fulltextRowid(sqlite3_vtab_cursor *pCursor, sqlite_int64 *pRowid){
  fulltext_cursor *c = (fulltext_cursor *) pCursor;

  *pRowid = sqlite3_column_int64(c->pStmt, 0);
  return SQLITE_OK;
}

/* Add all terms in [zText] to the pendingTerms table.  If [iColumn] >= 0,
** we also store positions and offsets in the hash table using that
** column number; if [iColumn] < 0, only the docid is recorded, which is
** how deleteTerms() builds empty doclists.
*/
static int buildTerms(fulltext_vtab *v, sqlite_int64 iDocid,
                      const char *zText, int iColumn){
  sqlite3_tokenizer *pTokenizer = v->pTokenizer;
  sqlite3_tokenizer_cursor *pCursor;
  const char *pToken;
  int nTokenBytes;
  int iStartOffset, iEndOffset, iPosition;
  int rc;

  rc = pTokenizer->pModule->xOpen(pTokenizer, zText, -1, &pCursor);
  if( rc!=SQLITE_OK ) return rc;

  pCursor->pTokenizer = pTokenizer;
  while( SQLITE_OK==(rc=pTokenizer->pModule->xNext(pCursor,
                                                   &pToken, &nTokenBytes,
                                                   &iStartOffset, &iEndOffset,
                                                   &iPosition)) ){
    DLCollector *p;
    int nData;                   /* Size of doclist before our update. */

    /* Positions can't be negative; we use -1 as a terminator
     * internally.  Token can't be NULL or empty. */
    if( iPosition<0 || pToken == NULL || nTokenBytes == 0 ){
      rc = SQLITE_ERROR;
      break;
    }

    p = fts2HashFind(&v->pendingTerms, pToken, nTokenBytes);
    if( p==NULL ){
      nData = 0;
      p = dlcNew(iDocid, DL_DEFAULT);
      fts2HashInsert(&v->pendingTerms, pToken, nTokenBytes, p);

      /* Overhead for our hash table entry, the key, and the value. */
      v->nPendingData += sizeof(struct fts2HashElem)+sizeof(*p)+nTokenBytes;
    }else{
      nData = p->b.nData;
      if( p->dlw.iPrevDocid!=iDocid ) dlcNext(p, iDocid);
    }
    if( iColumn>=0 ){
      dlcAddPos(p, iColumn, iPosition, iStartOffset, iEndOffset);
    }

    /* Accumulate data added by dlcNew or dlcNext, and dlcAddPos. */
    v->nPendingData += p->b.nData-nData;
  }

  /* TODO(shess) Check return?  Should this be able to cause errors at
  ** this point?  Actually, same question about sqlite3_finalize(),
  ** though one could argue that failure there means that the data is
  ** not durable.  *ponder*
  */
  pTokenizer->pModule->xClose(pCursor);
  if( SQLITE_DONE == rc ) return SQLITE_OK;
  return rc;
}
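
/* Illustrative sketch, not part of fts2 and not compiled: buildTerms()
** consumes one (pToken, nTokenBytes, iStartOffset, iEndOffset,
** iPosition) tuple per xNext() call and rejects negative positions and
** NULL or empty tokens.  The toy whitespace splitter below produces
** tuples of that shape.  It is a hypothetical stand-in, not one of
** fts2's real tokenizers, and does no case folding.
**
```c
#include <assert.h>
#include <ctype.h>

// Hypothetical record of the values a tokenizer's xNext() reports.
typedef struct ToyToken {
  const char *pToken;  // start of the token text (not NUL-terminated)
  int nTokenBytes;     // token length in bytes
  int iStartOffset;    // byte offset of the token in the input
  int iEndOffset;      // byte offset just past the token
  int iPosition;       // 0-based token position
} ToyToken;

// Splits zText on ASCII whitespace, filling at most nMax tokens.
// Returns the number of tokens produced.
static int toyTokenize(const char *zText, ToyToken *aTok, int nMax){
  int i = 0, n = 0;
  while( n<nMax ){
    while( zText[i] && isspace((unsigned char)zText[i]) ) i++;
    if( zText[i]==0 ) break;
    aTok[n].iStartOffset = i;
    while( zText[i] && !isspace((unsigned char)zText[i]) ) i++;
    aTok[n].iEndOffset = i;
    aTok[n].pToken = zText + aTok[n].iStartOffset;
    aTok[n].nTokenBytes = i - aTok[n].iStartOffset;
    aTok[n].iPosition = n;
    n++;
  }
  return n;
}

// The same sanity check buildTerms() applies before using a token.
static int toyTokenIsValid(const ToyToken *p){
  return p->iPosition>=0 && p->pToken!=0 && p->nTokenBytes>0;
}
```
**
** For the input "  fts2 full  text" this yields three tokens covering
** bytes [2,6), [7,11) and [13,17), at positions 0, 1 and 2.
*/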

/* Add doclists for all terms in [pValues] to the pendingTerms table. */
static int insertTerms(fulltext_vtab *v, sqlite_int64 iRowid,
                       sqlite3_value **pValues){
  int i;
  for(i = 0; i < v->nColumn ; ++i){
    char *zText = (char*)sqlite3_value_text(pValues[i]);
    int rc = buildTerms(v, iRowid, zText, i);
    if( rc!=SQLITE_OK ) return rc;
  }
  return SQLITE_OK;
}

/* Add empty doclists for all terms in the given row's content to
** pendingTerms.
*/
static int deleteTerms(fulltext_vtab *v, sqlite_int64 iRowid){
  const char **pValues;
  int i, rc;

  /* TODO(shess) Should we allow such tables at all? */
  if( DL_DEFAULT==DL_DOCIDS ) return SQLITE_ERROR;

  rc = content_select(v, iRowid, &pValues);
  if( rc!=SQLITE_OK ) return rc;

  for(i = 0 ; i < v->nColumn; ++i) {
    rc = buildTerms(v, iRowid, pValues[i], -1);
    if( rc!=SQLITE_OK ) break;
  }

  freeStringArray(v->nColumn, pValues);
  return rc;  /* Propagate a buildTerms() failure instead of masking it. */
}

/* TODO(shess) Refactor the code to remove this forward decl. */
static int initPendingTerms(fulltext_vtab *v, sqlite_int64 iDocid);

/* Insert a row into the %_content table; set *piRowid to be the ID of the
** new row.  Add doclists for terms to pendingTerms.
*/
static int index_insert(fulltext_vtab *v, sqlite3_value *pRequestRowid,
                        sqlite3_value **pValues, sqlite_int64 *piRowid){
  int rc;

  rc = content_insert(v, pRequestRowid, pValues);  /* execute an SQL INSERT */
  if( rc!=SQLITE_OK ) return rc;

  *piRowid = sqlite3_last_insert_rowid(v->db);
  rc = initPendingTerms(v, *piRowid);
  if( rc!=SQLITE_OK ) return rc;

  return insertTerms(v, *piRowid, pValues);
}

/* Delete a row from the %_content table; add empty doclists for terms
** to pendingTerms.
*/
static int index_delete(fulltext_vtab *v, sqlite_int64 iRow){
  int rc = initPendingTerms(v, iRow);
  if( rc!=SQLITE_OK ) return rc;

  rc = deleteTerms(v, iRow);
  if( rc!=SQLITE_OK ) return rc;

  return content_delete(v, iRow);  /* execute an SQL DELETE */
}

/* Update a row in the %_content table; add delete doclists to
** pendingTerms for old terms not in the new data, add insert doclists
** to pendingTerms for terms in the new data.
*/
static int index_update(fulltext_vtab *v, sqlite_int64 iRow,
                        sqlite3_value **pValues){
  int rc = initPendingTerms(v, iRow);
  if( rc!=SQLITE_OK ) return rc;

  /* Generate an empty doclist for each term that previously appeared in this
   * row. */
  rc = deleteTerms(v, iRow);
  if( rc!=SQLITE_OK ) return rc;

  rc = content_update(v, pValues, iRow);  /* execute an SQL UPDATE */
  if( rc!=SQLITE_OK ) return rc;

  /* Now add positions for terms which appear in the updated row. */
  return insertTerms(v, iRow, pValues);
}

/*******************************************************************/
/* InteriorWriter is used to collect terms and block references into
** interior nodes in %_segments.  See commentary at top of file for
** format.
*/

/* How large interior nodes can grow. */
#define INTERIOR_MAX 2048

/* Minimum number of terms per interior node (except the root).  This
** prevents large terms from making the tree too skinny.  It must be >0
** so that the tree always makes progress.  Note that the minimum tree
** fanout will be INTERIOR_MIN_TERMS+1.
*/
#define INTERIOR_MIN_TERMS 7
#if INTERIOR_MIN_TERMS<1
# error INTERIOR_MIN_TERMS must be greater than 0.
#endif

/* ROOT_MAX controls how much data is stored inline in the segment
** directory.
*/
/* TODO(shess) Push ROOT_MAX down to whoever is writing things.  It's
** only here so that interiorWriterRootInfo() and leafWriterRootInfo()
** can both see it, but if the caller passed it in, we wouldn't even
** need a define.
*/
#define ROOT_MAX 1024
#if ROOT_MAX<VARINT_MAX*2
# error ROOT_MAX must have enough space for a header.
#endif

/* InteriorBlock stores a linked list of interior blocks while a lower
** layer is being constructed.
*/
typedef struct InteriorBlock {
  DataBuffer term;           /* Leftmost term in block's subtree. */
  DataBuffer data;           /* Accumulated data for the block. */
  struct InteriorBlock *next;
} InteriorBlock;

static InteriorBlock *interiorBlockNew(int iHeight, sqlite_int64 iChildBlock,
                                       const char *pTerm, int nTerm){
  InteriorBlock *block = sqlite3_malloc(sizeof(InteriorBlock));
  char c[VARINT_MAX+VARINT_MAX];
  int n;

  if( block ){
    memset(block, 0, sizeof(*block));
    dataBufferInit(&block->term, 0);
    dataBufferReplace(&block->term, pTerm, nTerm);

    n = putVarint(c, iHeight);
    n += putVarint(c+n, iChildBlock);
    dataBufferInit(&block->data, INTERIOR_MAX);
    dataBufferReplace(&block->data, c, n);
  }
  return block;
}
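
/* Illustrative sketch, not part of fts2 and not compiled: each interior
** block starts with varint(iHeight) followed by varint(iChildBlock),
** which is why interiorBlockNew() sizes c[] as two VARINT_MAX varints
** and ROOT_MAX must be at least VARINT_MAX*2.  The encoding below is
** assumed to match fts2's putVarint() and getVarint(): 7 data bits per
** byte, least-significant group first, high bit set on every byte but
** the last.  TOY_VARINT_MAX==10 is likewise an assumption.
**
```c
#include <assert.h>

#define TOY_VARINT_MAX 10  // assumed to match fts2's VARINT_MAX

// Writes v as a little-endian base-128 varint; returns the byte count.
static int toyPutVarint(unsigned char *p, unsigned long long v){
  int n = 0;
  do{
    p[n++] = (unsigned char)((v & 0x7f) | 0x80);
    v >>= 7;
  }while( v!=0 );
  p[n-1] &= 0x7f;  // clear the continuation bit on the final byte
  return n;
}

// Reads a varint written by toyPutVarint(); returns bytes consumed.
static int toyGetVarint(const unsigned char *p, unsigned long long *pv){
  unsigned long long v = 0;
  int n = 0, shift = 0;
  do{
    v |= (unsigned long long)(p[n] & 0x7f) << shift;
    shift += 7;
  }while( p[n++] & 0x80 );
  *pv = v;
  return n;
}

// Builds the interior-node header laid out by interiorBlockNew():
// varint(iHeight) followed by varint(iChildBlock).
static int toyInteriorHeader(unsigned char *c, int iHeight,
                             unsigned long long iChildBlock){
  int n = toyPutVarint(c, (unsigned long long)iHeight);
  n += toyPutVarint(c+n, iChildBlock);
  assert( n<=2*TOY_VARINT_MAX );
  return n;
}
```
**
** Example: height 1 over child block 300 encodes as the three bytes
** 0x01 0xac 0x02.
*/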

#ifndef NDEBUG
/* Verify that the data is readable as an interior node. */
static void interiorBlockValidate(InteriorBlock *pBlock){
  const char *pData = pBlock->data.pData;
  int nData = pBlock->data.nData;
  int n, iDummy;
  sqlite_int64 iBlockid;

  assert( nData>0 );
  assert( pData!=0 );
  assert( pData+nData>pData );

  /* Must lead with height of node as a varint(n), n>0 */
  n = getVarint32(pData, &iDummy);
  assert( n>0 );
  assert( iDummy>0 );
  assert( n<nData );
  pData += n;
  nData -= n;

  /* Must contain iBlockid. */
  n = getVarint(pData, &iBlockid);
  assert( n>0 );
  assert( n<=nData );
  pData += n;
  nData -= n;

  /* Zero or more terms of positive length */
  if( nData!=0 ){
    /* First term is not delta-encoded. */
43455821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    n = getVarint32(pData, &iDummy);
43465821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    assert( n>0 );
43475821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    assert( iDummy>0 );
43485821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    assert( n+iDummy>0);
43495821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    assert( n+iDummy<=nData );
43505821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    pData += n+iDummy;
43515821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    nData -= n+iDummy;
43525821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
43535821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    /* Following terms delta-encoded. */
43545821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    while( nData!=0 ){
43555821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      /* Length of shared prefix. */
43565821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      n = getVarint32(pData, &iDummy);
43575821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      assert( n>0 );
43585821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      assert( iDummy>=0 );
43595821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      assert( n<nData );
43605821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      pData += n;
43615821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      nData -= n;
43625821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
43635821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      /* Length and data of distinct suffix. */
43645821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      n = getVarint32(pData, &iDummy);
43655821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      assert( n>0 );
43665821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      assert( iDummy>0 );
43675821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      assert( n+iDummy>0);
43685821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      assert( n+iDummy<=nData );
43695821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      pData += n+iDummy;
43705821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      nData -= n+iDummy;
43715821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    }
43725821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
43735821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
43745821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)#define ASSERT_VALID_INTERIOR_BLOCK(x) interiorBlockValidate(x)
43755821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)#else
43765821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)#define ASSERT_VALID_INTERIOR_BLOCK(x) assert( 1 )
43775821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)#endif
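
/* Illustrative sketch only (not part of fts2, and never compiled here):
** the assertions in interiorBlockValidate() imply this interior-node
** layout:
**
**   varint(iHeight): height of the node, always > 0
**   varint(iBlockid): blockid of the leftmost child
**   varint(nTerm), term bytes: the first term, if any
**   varint(nPrefix), varint(nSuffix), suffix bytes: each later term
**
** The walker below checks a buffer against that layout and counts its
** terms.  It assumes the varints use the little-endian base-128 form
** that putVarint() writes (7 data bits per byte, high bit set on every
** byte but the last).  All demo_* names are hypothetical.
**
```c
#include <assert.h>
#include <stdint.h>

// Decode one little-endian base-128 varint from p[0..n-1].  Returns
// the number of bytes consumed, or 0 if the input ends before the
// final byte (high bit clear) or the varint exceeds 10 bytes.
static int demo_get_varint(const unsigned char *p, int n, uint64_t *pv){
  uint64_t v = 0;
  int i;
  for(i=0; i<n && i<10; i++){
    v |= (uint64_t)(p[i] & 0x7f) << (7*i);
    if( (p[i] & 0x80)==0 ){
      *pv = v;
      return i+1;
    }
  }
  return 0;
}

// Walk an interior node.  Stores the header fields in *piHeight and
// *piBlockid and returns the number of terms, or -1 if the buffer does
// not match the layout described above.
static int demo_interior_term_count(const unsigned char *p, int n,
                                    uint64_t *piHeight, uint64_t *piBlockid){
  uint64_t nPrefix, nBytes;
  uint64_t nPrev = 0;   // length of the previously decoded term
  int k, nTerms = 0;
  assert( n>=0 );

  k = demo_get_varint(p, n, piHeight);
  if( k==0 || *piHeight==0 ) return -1;
  p += k; n -= k;
  k = demo_get_varint(p, n, piBlockid);
  if( k==0 ) return -1;
  p += k; n -= k;

  while( n>0 ){
    nPrefix = 0;
    if( nTerms>0 ){
      // Later terms begin with the length of the prefix they share
      // with the previous term.
      k = demo_get_varint(p, n, &nPrefix);
      if( k==0 || nPrefix>nPrev ) return -1;
      p += k; n -= k;
    }
    // Length of the first term, or of a later term's distinct suffix;
    // it must be positive and fit in the remaining bytes.
    k = demo_get_varint(p, n, &nBytes);
    if( k==0 || nBytes==0 || nBytes>(uint64_t)(n-k) ) return -1;
    p += k + (int)nBytes;
    n -= k + (int)nBytes;
    nPrev = nPrefix + nBytes;
    nTerms++;
  }
  return nTerms;
}
```
**
** A node holding only the two header varints has zero terms; that is
** the single-child case mentioned in interiorReaderInit().
*/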

typedef struct InteriorWriter {
  int iHeight;                   /* from 0 at leaves. */
  InteriorBlock *first, *last;
  struct InteriorWriter *parentWriter;

  DataBuffer term;               /* Last term written to block "last". */
  sqlite_int64 iOpeningChildBlock; /* First child block in block "last". */
#ifndef NDEBUG
  sqlite_int64 iLastChildBlock;  /* for consistency checks. */
#endif
} InteriorWriter;

/* Initialize an interior node where pTerm[nTerm] marks the leftmost
** term in the tree.  iChildBlock is the leftmost child block at the
** next level down the tree.
*/
static void interiorWriterInit(int iHeight, const char *pTerm, int nTerm,
                               sqlite_int64 iChildBlock,
                               InteriorWriter *pWriter){
  InteriorBlock *block;
  assert( iHeight>0 );
  CLEAR(pWriter);

  pWriter->iHeight = iHeight;
  pWriter->iOpeningChildBlock = iChildBlock;
#ifndef NDEBUG
  pWriter->iLastChildBlock = iChildBlock;
#endif
  block = interiorBlockNew(iHeight, iChildBlock, pTerm, nTerm);
  pWriter->last = pWriter->first = block;
  ASSERT_VALID_INTERIOR_BLOCK(pWriter->last);
  dataBufferInit(&pWriter->term, 0);
}

/* Append the child node rooted at iChildBlock to the interior node,
** with pTerm[nTerm] as the leftmost term in iChildBlock's subtree.
*/
static void interiorWriterAppend(InteriorWriter *pWriter,
                                 const char *pTerm, int nTerm,
                                 sqlite_int64 iChildBlock){
  char c[VARINT_MAX+VARINT_MAX];
  int n, nPrefix = 0;

  ASSERT_VALID_INTERIOR_BLOCK(pWriter->last);

  /* The first term written into an interior node is actually
  ** associated with the second child added (the first child was added
  ** in interiorWriterInit, or in the if clause at the bottom of this
  ** function).  That term gets encoded straight up, with nPrefix left
  ** at 0.
  */
  if( pWriter->term.nData==0 ){
    n = putVarint(c, nTerm);
  }else{
    while( nPrefix<pWriter->term.nData &&
           pTerm[nPrefix]==pWriter->term.pData[nPrefix] ){
      nPrefix++;
    }

    n = putVarint(c, nPrefix);
    n += putVarint(c+n, nTerm-nPrefix);
  }

#ifndef NDEBUG
  pWriter->iLastChildBlock++;
#endif
  assert( pWriter->iLastChildBlock==iChildBlock );

  /* Overflow to a new block if the new term makes the current block
  ** too big, and the current block already has enough terms.
  */
  if( pWriter->last->data.nData+n+nTerm-nPrefix>INTERIOR_MAX &&
      iChildBlock-pWriter->iOpeningChildBlock>INTERIOR_MIN_TERMS ){
    pWriter->last->next = interiorBlockNew(pWriter->iHeight, iChildBlock,
                                           pTerm, nTerm);
    pWriter->last = pWriter->last->next;
    pWriter->iOpeningChildBlock = iChildBlock;
    dataBufferReset(&pWriter->term);
  }else{
    dataBufferAppend2(&pWriter->last->data, c, n,
                      pTerm+nPrefix, nTerm-nPrefix);
    dataBufferReplace(&pWriter->term, pTerm, nTerm);
  }
  ASSERT_VALID_INTERIOR_BLOCK(pWriter->last);
}
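
/* Illustrative sketch only (not part of fts2, and never compiled here):
** within one block, interiorWriterAppend() stores the first term whole
** and every later term as the length of the prefix it shares with the
** previous term, the length of the remaining suffix, and the suffix
** bytes.  The function below applies that rule to a list of sorted,
** NUL-terminated terms.  It ignores the INTERIOR_MAX overflow split and
** the child blockids, and it assumes the same base-128 varint that
** putVarint() writes.  The demo_* names are hypothetical.
**
```c
#include <assert.h>
#include <string.h>

// Write v as a little-endian base-128 varint; returns bytes written.
static int demo_put_varint(unsigned char *p, unsigned long long v){
  int n = 0;
  do{
    p[n++] = (unsigned char)((v & 0x7f) | 0x80);
    v >>= 7;
  }while( v!=0 );
  p[n-1] &= 0x7f;   // clear the continuation bit on the final byte
  return n;
}

// Prefix-compress nTerms sorted terms into out[], which the caller must
// size generously.  Returns the number of bytes written.
static int demo_encode_terms(const char **azTerm, int nTerms,
                             unsigned char *out){
  int i, n = 0;
  for(i=0; i<nTerms; i++){
    int nTerm = (int)strlen(azTerm[i]);
    int nPrefix = 0;
    if( i==0 ){
      // The first term in a block is not delta-encoded.
      n += demo_put_varint(out+n, (unsigned long long)nTerm);
    }else{
      const char *zPrev = azTerm[i-1];
      int nPrev = (int)strlen(zPrev);
      assert( strcmp(zPrev, azTerm[i])<0 );  // input must be sorted
      while( nPrefix<nPrev && azTerm[i][nPrefix]==zPrev[nPrefix] ){
        nPrefix++;
      }
      n += demo_put_varint(out+n, (unsigned long long)nPrefix);
      n += demo_put_varint(out+n, (unsigned long long)(nTerm-nPrefix));
    }
    memcpy(out+n, azTerm[i]+nPrefix, (size_t)(nTerm-nPrefix));
    n += nTerm-nPrefix;
  }
  return n;
}
```
**
** For "apple", "apricot", "banana" this yields 21 bytes: 6 for
** "apple", 7 for the (2, 5, "ricot") entry, and 8 for "banana", which
** shares no prefix with "apricot".
*/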

/* Free the space used by pWriter, including the linked-list of
** InteriorBlocks, and parentWriter, if present.
*/
static int interiorWriterDestroy(InteriorWriter *pWriter){
  InteriorBlock *block = pWriter->first;

  while( block!=NULL ){
    InteriorBlock *b = block;
    block = block->next;
    dataBufferDestroy(&b->term);
    dataBufferDestroy(&b->data);
    sqlite3_free(b);
  }
  if( pWriter->parentWriter!=NULL ){
    interiorWriterDestroy(pWriter->parentWriter);
    sqlite3_free(pWriter->parentWriter);
  }
  dataBufferDestroy(&pWriter->term);
  SCRAMBLE(pWriter);
  return SQLITE_OK;
}

/* If pWriter consists of a single block smaller than ROOT_MAX bytes,
** return that block's data as the root info directly, leaving
** *piEndBlockid unchanged.  Otherwise, flush pWriter to %_segments,
** building a new layer of interior nodes, and recursively ask for
** their root info.
*/
static int interiorWriterRootInfo(fulltext_vtab *v, InteriorWriter *pWriter,
                                  char **ppRootInfo, int *pnRootInfo,
                                  sqlite_int64 *piEndBlockid){
  InteriorBlock *block = pWriter->first;
  sqlite_int64 iBlockid = 0;
  int rc;

  /* If we can fit the segment inline */
  if( block==pWriter->last && block->data.nData<ROOT_MAX ){
    *ppRootInfo = block->data.pData;
    *pnRootInfo = block->data.nData;
    return SQLITE_OK;
  }

  /* Flush the first block to %_segments, and create a new level of
  ** interior nodes.
  */
  ASSERT_VALID_INTERIOR_BLOCK(block);
  rc = block_insert(v, block->data.pData, block->data.nData, &iBlockid);
  if( rc!=SQLITE_OK ) return rc;
  *piEndBlockid = iBlockid;

  pWriter->parentWriter = sqlite3_malloc(sizeof(*pWriter->parentWriter));
  if( pWriter->parentWriter==NULL ) return SQLITE_NOMEM;
  interiorWriterInit(pWriter->iHeight+1,
                     block->term.pData, block->term.nData,
                     iBlockid, pWriter->parentWriter);

  /* Flush additional blocks and append to the higher interior
  ** node.
  */
  for(block=block->next; block!=NULL; block=block->next){
    ASSERT_VALID_INTERIOR_BLOCK(block);
    rc = block_insert(v, block->data.pData, block->data.nData, &iBlockid);
    if( rc!=SQLITE_OK ) return rc;
    *piEndBlockid = iBlockid;

    interiorWriterAppend(pWriter->parentWriter,
                         block->term.pData, block->term.nData, iBlockid);
  }

  /* Parent node gets the chance to be the root. */
  return interiorWriterRootInfo(v, pWriter->parentWriter,
                                ppRootInfo, pnRootInfo, piEndBlockid);
}
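
/* Illustrative sketch only (not part of fts2, and never compiled here):
** interiorWriterRootInfo() keeps adding interior levels until a level
** consists of a single block below ROOT_MAX.  The real split is driven
** by byte counts (INTERIOR_MAX, INTERIOR_MIN_TERMS, ROOT_MAX); this
** model swaps those for two hypothetical child counts, "fanout"
** children per block and at most "rootChildren" children in an inline
** root, to show how many interior levels end up in %_segments.  The
** demo_* name is hypothetical.
**
```c
#include <assert.h>

// Count the interior levels written to %_segments for nLeaves leaf
// blocks, under the simplified fixed-fanout model described above.
static int demo_interior_levels_flushed(long nLeaves, long fanout,
                                        long rootChildren){
  long nChildren = nLeaves;  // children of the level being built
  int nLevels = 0;
  assert( nLeaves>=1 && fanout>=2 && rootChildren>=1 );
  for(;;){
    long nBlocks = (nChildren + fanout - 1) / fanout;
    // One small-enough block: it becomes the inline root.
    if( nBlocks==1 && nChildren<=rootChildren ) return nLevels;
    // Otherwise every block at this level is flushed, and the parent
    // level receives one child per flushed block.
    nLevels++;
    nChildren = nBlocks;
  }
}
```
**
** The (4, 4, 3) case mirrors the single-child interior node: one block
** that is too large to be the root is flushed, and its parent holds
** just that one child and no terms.
*/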

/****************************************************************/
/* InteriorReader is used to read off the data from an interior node
** (see comment at top of file for the format).
*/
typedef struct InteriorReader {
  const char *pData;
  int nData;

  DataBuffer term;          /* previous term, for decoding term delta. */

  sqlite_int64 iBlockid;
} InteriorReader;

static void interiorReaderDestroy(InteriorReader *pReader){
  dataBufferDestroy(&pReader->term);
  SCRAMBLE(pReader);
}

static int interiorReaderInit(const char *pData, int nData,
                              InteriorReader *pReader){
  int n, nTerm;

  /* These conditions are checked and met by the callers. */
  assert( nData>0 );
  assert( pData[0]!='\0' );

  CLEAR(pReader);

  /* Decode the base blockid, and set the cursor to the first term. */
  n = getVarintSafe(pData+1, &pReader->iBlockid, nData-1);
  if( !n ) return SQLITE_CORRUPT_BKPT;
  pReader->pData = pData+1+n;
  pReader->nData = nData-(1+n);

  /* A single-child interior node (such as when a leaf node was too
  ** large for the segment directory) won't have any terms.
  ** Otherwise, decode the first term.
  */
  if( pReader->nData==0 ){
    dataBufferInit(&pReader->term, 0);
  }else{
    n = getVarint32Safe(pReader->pData, &nTerm, pReader->nData);
    if( !n || nTerm<0 || nTerm>pReader->nData-n) return SQLITE_CORRUPT_BKPT;
    dataBufferInit(&pReader->term, nTerm);
    dataBufferReplace(&pReader->term, pReader->pData+n, nTerm);
    pReader->pData += n+nTerm;
    pReader->nData -= n+nTerm;
  }
  return SQLITE_OK;
}

static int interiorReaderAtEnd(InteriorReader *pReader){
  return pReader->term.nData<=0;
}

static sqlite_int64 interiorReaderCurrentBlockid(InteriorReader *pReader){
  return pReader->iBlockid;
}

static int interiorReaderTermBytes(InteriorReader *pReader){
  assert( !interiorReaderAtEnd(pReader) );
  return pReader->term.nData;
}
static const char *interiorReaderTerm(InteriorReader *pReader){
  assert( !interiorReaderAtEnd(pReader) );
  return pReader->term.pData;
}

/* Step forward to the next term in the node. */
static int interiorReaderStep(InteriorReader *pReader){
  assert( !interiorReaderAtEnd(pReader) );

  /* If the last term has been read, signal eof, else construct the
  ** next term.
  */
  if( pReader->nData==0 ){
    dataBufferReset(&pReader->term);
  }else{
    int n, nPrefix, nSuffix;

    n = getVarint32Safe(pReader->pData, &nPrefix, pReader->nData);
    if( !n ) return SQLITE_CORRUPT_BKPT;
    pReader->nData -= n;
    pReader->pData += n;
    n = getVarint32Safe(pReader->pData, &nSuffix, pReader->nData);
    if( !n ) return SQLITE_CORRUPT_BKPT;
    pReader->nData -= n;
    pReader->pData += n;
    if( nSuffix<0 || nSuffix>pReader->nData ) return SQLITE_CORRUPT_BKPT;
    if( nPrefix<0 || nPrefix>pReader->term.nData ) return SQLITE_CORRUPT_BKPT;

    /* Truncate the current term and append suffix data. */
    pReader->term.nData = nPrefix;
    dataBufferAppend(&pReader->term, pReader->pData, nSuffix);

    pReader->pData += nSuffix;
    pReader->nData -= nSuffix;
  }
  pReader->iBlockid++;
  return SQLITE_OK;
}
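
/* Illustrative sketch only (not part of fts2, and never compiled here):
** interiorReaderStep() rebuilds each term by truncating the previous
** term to nPrefix bytes and appending nSuffix new bytes, rejecting
** lengths that overrun the buffer or the previous term.  The decoder
** below applies the same rule to the term area of a node (the bytes
** after the two header varints).  Per the comment in
** interiorWriterAppend(), term i is the leftmost term of child
** iBlockid+i+1, which is why the reader bumps iBlockid on every step.
** DEMO_MAX_TERM and the demo_* names are hypothetical, and the varint
** format is assumed to match putVarint().
**
```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_TERM 64  // hypothetical cap on decoded term length

// Decode one little-endian base-128 varint; returns bytes consumed or 0.
static int demo_get_varint(const unsigned char *p, int n,
                           unsigned long long *pv){
  unsigned long long v = 0;
  int i;
  for(i=0; i<n && i<10; i++){
    v |= (unsigned long long)(p[i] & 0x7f) << (7*i);
    if( (p[i] & 0x80)==0 ){ *pv = v; return i+1; }
  }
  return 0;
}

// Decode up to nMax terms into azOut (NUL-terminated).  Returns the
// number of terms, or -1 on malformed input or overflow.
static int demo_decode_terms(const unsigned char *p, int n,
                             char azOut[][DEMO_MAX_TERM], int nMax){
  char term[DEMO_MAX_TERM];
  unsigned long long nTerm = 0;   // length of the current term
  int nOut = 0;
  assert( n>=0 );
  while( n>0 ){
    unsigned long long nPrefix = 0, nSuffix;
    int k;
    if( nOut>0 ){                 // the first term has no prefix length
      k = demo_get_varint(p, n, &nPrefix);
      if( k==0 ) return -1;
      p += k; n -= k;
    }
    k = demo_get_varint(p, n, &nSuffix);
    if( k==0 ) return -1;
    p += k; n -= k;
    // Same bounds checks as interiorReaderStep, plus the output limits.
    if( nSuffix>(unsigned long long)n || nPrefix>nTerm ) return -1;
    if( nPrefix+nSuffix>=DEMO_MAX_TERM || nOut>=nMax ) return -1;
    // Keep nPrefix bytes of the previous term, then append the suffix.
    memcpy(term+nPrefix, p, (size_t)nSuffix);
    nTerm = nPrefix + nSuffix;
    term[nTerm] = '\0';
    memcpy(azOut[nOut++], term, (size_t)nTerm+1);
    p += nSuffix;
    n -= (int)nSuffix;
  }
  return nOut;
}
```
**
** If the node's header blockid were 7, the decoded "apple" would be the
** leftmost term of child 8 and "apricot" the leftmost term of child 9.
*/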
46385821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
46395821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* Compare the current term to pTerm[nTerm], returning strcmp-style
46405821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** results.  If isPrefix, equality means equal through nTerm bytes.
46415821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
46425821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int interiorReaderTermCmp(InteriorReader *pReader,
46435821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                                 const char *pTerm, int nTerm, int isPrefix){
46445821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  const char *pReaderTerm = interiorReaderTerm(pReader);
46455821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int nReaderTerm = interiorReaderTermBytes(pReader);
46465821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int c, n = nReaderTerm<nTerm ? nReaderTerm : nTerm;
46475821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
46485821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( n==0 ){
46495821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( nReaderTerm>0 ) return -1;
46505821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( nTerm>0 ) return 1;
46515821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    return 0;
46525821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
46535821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
46545821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  c = memcmp(pReaderTerm, pTerm, n);
46555821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( c!=0 ) return c;
46565821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( isPrefix && n==nTerm ) return 0;
46575821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return nReaderTerm - nTerm;
46585821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
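
/* Illustrative sketch only (hypothetical, not used by fts2): the
** prefix-aware, strcmp-style comparison above restated over two plain
** byte strings.  sketchTermCmp() is an invented name, and unlike
** interiorReaderTermCmp() it does not special-case an empty operand.
** The sketch is kept under #if 0 so it is never compiled into fts2.
*/
#if 0
```c
#include <assert.h>
#include <string.h>

/* Compare a[na] with b[nb] strcmp-style.  When isPrefix is nonzero,
** a compares equal to b if a begins with all nb bytes of b. */
static int sketchTermCmp(const char *a, int na,
                         const char *b, int nb, int isPrefix){
  int n, c;
  assert( na>=0 && nb>=0 );
  n = na<nb ? na : nb;
  c = memcmp(a, b, n);               /* 0 bytes compare equal */
  if( c!=0 ) return c;               /* differ within common length */
  if( isPrefix && n==nb ) return 0;  /* b is a prefix of a */
  return na - nb;                    /* shorter string sorts first */
}
```
#endif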

/****************************************************************/
/* LeafWriter is used to collect terms and associated doclist data
** into leaf blocks in %_segments (see top of file for format info).
** Expected usage is:
**
** LeafWriter writer;
** leafWriterInit(0, 0, &writer);
** while( sorted_terms_left_to_process ){
**   // pData[nData] is the doclist data for pTerm[nTerm].
**   rc = leafWriterStep(v, &writer, pTerm, nTerm, pData, nData);
**   if( rc!=SQLITE_OK ) goto err;
** }
** rc = leafWriterFinalize(v, &writer);
**err:
** leafWriterDestroy(&writer);
** return rc;
**
** leafWriterStep() may write a collected leaf out to %_segments.
** leafWriterFinalize() finishes writing any buffered data and stores
** a root node in %_segdir.  leafWriterDestroy() frees all buffers and
** InteriorWriters allocated as part of writing this segment.
**
** TODO(shess) Document leafWriterStepMerge().
*/

/* Put terms with data this big in their own block. */
#define STANDALONE_MIN 1024

/* Keep leaf blocks below this size. */
#define LEAF_MAX 2048

typedef struct LeafWriter {
  int iLevel;
  int idx;

  /* First and last leaf blockids written, needed to create the
  ** segment's %_segdir entry once writing is done.
  */
  sqlite_int64 iStartBlockid;
  sqlite_int64 iEndBlockid;

  DataBuffer term;                /* previous encoded term */
  DataBuffer data;                /* encoding buffer */

  /* Number of leading bytes of the current node's first term needed
  ** to distinguish it from the last term of the previous node.
  */
  int nTermDistinct;

  InteriorWriter parentWriter;    /* created when a leaf is flushed */
  int has_parent;                 /* nonzero once parentWriter is set */
} LeafWriter;

static void leafWriterInit(int iLevel, int idx, LeafWriter *pWriter){
  CLEAR(pWriter);
  pWriter->iLevel = iLevel;
  pWriter->idx = idx;

  dataBufferInit(&pWriter->term, 32);

  /* Start out with a reasonably sized block, though it can grow. */
  dataBufferInit(&pWriter->data, LEAF_MAX);
}

#ifndef NDEBUG
/* Verify that the data is readable as a leaf node. */
static void leafNodeValidate(const char *pData, int nData){
  int n, iDummy;

  if( nData==0 ) return;
  assert( nData>0 );
  assert( pData!=0 );
  assert( pData+nData>pData );

  /* Must lead with a varint(0) */
  n = getVarint32(pData, &iDummy);
  assert( iDummy==0 );
  assert( n>0 );
  assert( n<nData );
  pData += n;
  nData -= n;

  /* Leading term length and data must fit in buffer. */
  n = getVarint32(pData, &iDummy);
  assert( n>0 );
  assert( iDummy>0 );
  assert( n+iDummy>0 );
  assert( n+iDummy<nData );
  pData += n+iDummy;
  nData -= n+iDummy;

  /* Leading term's doclist length and data must fit. */
  n = getVarint32(pData, &iDummy);
  assert( n>0 );
  assert( iDummy>0 );
  assert( n+iDummy>0 );
  assert( n+iDummy<=nData );
  ASSERT_VALID_DOCLIST(DL_DEFAULT, pData+n, iDummy, NULL);
  pData += n+iDummy;
  nData -= n+iDummy;

  /* Verify that trailing terms and doclists also are readable. */
  while( nData!=0 ){
    n = getVarint32(pData, &iDummy);
    assert( n>0 );
    assert( iDummy>=0 );
    assert( n<nData );
    pData += n;
    nData -= n;
    n = getVarint32(pData, &iDummy);
    assert( n>0 );
    assert( iDummy>0 );
    assert( n+iDummy>0 );
    assert( n+iDummy<nData );
    pData += n+iDummy;
    nData -= n+iDummy;

    n = getVarint32(pData, &iDummy);
    assert( n>0 );
    assert( iDummy>0 );
    assert( n+iDummy>0 );
    assert( n+iDummy<=nData );
    ASSERT_VALID_DOCLIST(DL_DEFAULT, pData+n, iDummy, NULL);
    pData += n+iDummy;
    nData -= n+iDummy;
  }
}
#define ASSERT_VALID_LEAF_NODE(p, n) leafNodeValidate(p, n)
#else
#define ASSERT_VALID_LEAF_NODE(p, n) assert( 1 )
#endif
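
/* Illustrative sketch only (hypothetical, not used by fts2): walks a
** leaf node in the layout that leafNodeValidate() checks, returning a
** term count (or -1) instead of asserting.  The varint decoder assumes
** 7 data bits per byte, least-significant group first, with the high
** bit set on every byte but the last; that this matches getVarint32()
** is an assumption, not verified here.  Doclist bytes are skipped, not
** decoded, and all names are invented.  Kept under #if 0.
*/
#if 0
```c
#include <assert.h>

/* Decode one varint from p[0..n-1] into *pVal.  Returns the number of
** bytes consumed, or 0 if the input is truncated or over 5 bytes. */
static int sketchGetVarint(const unsigned char *p, int n, int *pVal){
  unsigned int v = 0;
  int i;
  for(i=0; i<n && i<5; i++){
    v |= (unsigned int)(p[i] & 0x7f) << (7*i);
    if( (p[i] & 0x80)==0 ){          /* high bit clear: last byte */
      *pVal = (int)v;
      return i+1;
    }
  }
  return 0;
}

/* Count the terms in a leaf node laid out as
**   varint(0) varint(nTerm) term varint(nDoclist) doclist
** followed by zero or more delta-encoded entries
**   varint(nPrefix) varint(nSuffix) suffix varint(nDoclist) doclist
** Returns -1 if the layout is malformed. */
static int sketchCountLeafTerms(const unsigned char *p, int n){
  int nTerms = 0, v = 0, k;
  if( n==0 ) return 0;
  k = sketchGetVarint(p, n, &v);
  if( k==0 || v!=0 ) return -1;      /* must lead with varint(0) */
  p += k; n -= k;
  while( n>0 ){
    if( nTerms>0 ){                  /* delta entries start with nPrefix */
      k = sketchGetVarint(p, n, &v);
      if( k==0 ) return -1;
      p += k; n -= k;
    }
    k = sketchGetVarint(p, n, &v);   /* term (or suffix) length + bytes */
    if( k==0 || v<0 || v>n-k ) return -1;
    p += k+v; n -= k+v;
    k = sketchGetVarint(p, n, &v);   /* doclist length + bytes */
    if( k==0 || v<=0 || v>n-k ) return -1;
    p += k+v; n -= k+v;
    nTerms++;
  }
  return nTerms;
}
```
#endif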

/* Flush nData bytes of leaf-node data, starting at offset iData of the
** writer's buffer, to %_segments, and add the resulting blockid and
** the leaf's distinguishing starting-term prefix to the interior node
** which will contain it.
*/
static int leafWriterInternalFlush(fulltext_vtab *v, LeafWriter *pWriter,
                                   int iData, int nData){
  sqlite_int64 iBlockid = 0;
  const char *pStartingTerm;
  int nStartingTerm, rc, n;

  /* Must have the leading varint(0) flag, plus at least some
  ** valid-looking data.
  */
  assert( nData>2 );
  assert( iData>=0 );
  assert( iData+nData<=pWriter->data.nData );
  ASSERT_VALID_LEAF_NODE(pWriter->data.pData+iData, nData);

  rc = block_insert(v, pWriter->data.pData+iData, nData, &iBlockid);
  if( rc!=SQLITE_OK ) return rc;
  assert( iBlockid!=0 );

  /* Reconstruct the first term in the leaf for purposes of building
  ** the interior node.  The +1 skips the one-byte varint(0) header,
  ** and only the first nTermDistinct bytes of the term are kept.
  */
  n = getVarint32(pWriter->data.pData+iData+1, &nStartingTerm);
  pStartingTerm = pWriter->data.pData+iData+1+n;
  assert( pWriter->data.nData>iData+1+n+nStartingTerm );
  assert( pWriter->nTermDistinct>0 );
  assert( pWriter->nTermDistinct<=nStartingTerm );
  nStartingTerm = pWriter->nTermDistinct;

  if( pWriter->has_parent ){
    interiorWriterAppend(&pWriter->parentWriter,
                         pStartingTerm, nStartingTerm, iBlockid);
  }else{
    interiorWriterInit(1, pStartingTerm, nStartingTerm, iBlockid,
                       &pWriter->parentWriter);
    pWriter->has_parent = 1;
  }

  /* Track the span of this segment's leaf nodes. */
  if( pWriter->iEndBlockid==0 ){
    pWriter->iEndBlockid = pWriter->iStartBlockid = iBlockid;
  }else{
    pWriter->iEndBlockid++;
    assert( iBlockid==pWriter->iEndBlockid );
  }

  return SQLITE_OK;
}
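
/* Illustrative sketch only (hypothetical, not used by fts2): the
** leaf-span bookkeeping at the end of leafWriterInternalFlush() pulled
** out on its own.  The function above asserts that each new leaf's
** blockid directly follows the previous one; this version reports a
** gap as an error instead, so the invariant can be exercised.  Names
** are invented, and long long stands in for sqlite_int64.  Kept under
** #if 0.
*/
#if 0
```c
#include <assert.h>

/* Blockids spanned by a segment's leaves.  0 means "nothing written
** yet", which relies on real blockids being nonzero. */
typedef struct SketchLeafSpan {
  long long iStart;
  long long iEnd;
} SketchLeafSpan;

/* Record a newly written leaf.  Returns 0 on success, or -1 (leaving
** the span unchanged) if iBlockid does not directly follow iEnd. */
static int sketchLeafSpanAdd(SketchLeafSpan *p, long long iBlockid){
  assert( iBlockid!=0 );
  if( p->iEnd==0 ){
    p->iStart = p->iEnd = iBlockid;  /* first leaf of the segment */
    return 0;
  }
  if( iBlockid!=p->iEnd+1 ) return -1;
  p->iEnd = iBlockid;
  return 0;
}
```
#endif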

static int leafWriterFlush(fulltext_vtab *v, LeafWriter *pWriter){
  int rc = leafWriterInternalFlush(v, pWriter, 0, pWriter->data.nData);
  if( rc!=SQLITE_OK ) return rc;

  /* Re-initialize the output buffer. */
  dataBufferReset(&pWriter->data);

  return SQLITE_OK;
}

/* Fetch the root info for the segment.  If the segment is a single
** leaf that fits within ROOT_MAX bytes, that leaf is returned directly;
** otherwise any remaining leaf data is flushed and the root info is
** returned from the interior node.  *piEndBlockid is set to the
** blockid of the last interior or leaf node written to disk (0 if
** none are written at all).
*/
static int leafWriterRootInfo(fulltext_vtab *v, LeafWriter *pWriter,
                              char **ppRootInfo, int *pnRootInfo,
                              sqlite_int64 *piEndBlockid){
  /* we can fit the segment entirely inline */
  if( !pWriter->has_parent && pWriter->data.nData<ROOT_MAX ){
    *ppRootInfo = pWriter->data.pData;
    *pnRootInfo = pWriter->data.nData;
    *piEndBlockid = 0;
    return SQLITE_OK;
  }

  /* Flush remaining leaf data. */
  if( pWriter->data.nData>0 ){
    int rc = leafWriterFlush(v, pWriter);
    if( rc!=SQLITE_OK ) return rc;
  }

  /* We must have flushed a leaf at some point. */
  assert( pWriter->has_parent );

  /* Tentatively set the end leaf blockid as the end blockid.  If the
  ** interior node can be returned inline, this will be the final
  ** blockid, otherwise it will be overwritten by
  ** interiorWriterRootInfo().
  */
  *piEndBlockid = pWriter->iEndBlockid;

  return interiorWriterRootInfo(v, &pWriter->parentWriter,
                                ppRootInfo, pnRootInfo, piEndBlockid);
}

/* Collect the rootInfo data and store it into the segment directory.
** This has the effect of flushing the segment's leaf data to
** %_segments, and also flushing any interior nodes to %_segments.
*/
static int leafWriterFinalize(fulltext_vtab *v, LeafWriter *pWriter){
  sqlite_int64 iEndBlockid;
  char *pRootInfo;
  int rc, nRootInfo;

  rc = leafWriterRootInfo(v, pWriter, &pRootInfo, &nRootInfo, &iEndBlockid);
  if( rc!=SQLITE_OK ) return rc;

  /* Don't bother storing an entirely empty segment. */
  if( iEndBlockid==0 && nRootInfo==0 ) return SQLITE_OK;

  return segdir_set(v, pWriter->iLevel, pWriter->idx,
                    pWriter->iStartBlockid, pWriter->iEndBlockid,
                    iEndBlockid, pRootInfo, nRootInfo);
}

static void leafWriterDestroy(LeafWriter *pWriter){
  if( pWriter->has_parent ) interiorWriterDestroy(&pWriter->parentWriter);
  dataBufferDestroy(&pWriter->term);
  dataBufferDestroy(&pWriter->data);
}

/* Encode a term into the leafWriter, delta-encoding as appropriate.
** Returns the number of leading bytes of the new term needed to
** distinguish it from the previous term (the shared prefix length
** plus one), which is used to set nTermDistinct when a new node is
** started.
*/
static int leafWriterEncodeTerm(LeafWriter *pWriter,
                                const char *pTerm, int nTerm){
  char c[VARINT_MAX+VARINT_MAX];
  int n, nPrefix = 0;

  assert( nTerm>0 );
  while( nPrefix<pWriter->term.nData &&
         pTerm[nPrefix]==pWriter->term.pData[nPrefix] ){
    nPrefix++;
    /* Failing this implies that the terms were not in sorted order. */
    assert( nPrefix<nTerm );
  }

  if( pWriter->data.nData==0 ){
    /* Encode the node header and leading term as:
    **  varint(0)
    **  varint(nTerm)
    **  char pTerm[nTerm]
    */
    n = putVarint(c, 0);
    n += putVarint(c+n, nTerm);
    dataBufferAppend2(&pWriter->data, c, n, pTerm, nTerm);
  }else{
    /* Delta-encode the term as:
    **  varint(nPrefix)
    **  varint(nSuffix)
    **  char pTermSuffix[nSuffix]
    */
    n = putVarint(c, nPrefix);
    n += putVarint(c+n, nTerm-nPrefix);
    dataBufferAppend2(&pWriter->data, c, n, pTerm+nPrefix, nTerm-nPrefix);
  }
  dataBufferReplace(&pWriter->term, pTerm, nTerm);

  return nPrefix+1;
}
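
/* Illustrative sketch only (hypothetical, not used by fts2): the
** per-term encoding of leafWriterEncodeTerm() over a plain byte array.
** Single-byte lengths stand in for the varints, so it handles only
** lengths below 128, and doclists are not appended.  As above, the
** return value is the shared-prefix length plus one.  The name
** sketchEncodeTerm() is invented.  Kept under #if 0.
*/
#if 0
```c
#include <assert.h>
#include <string.h>

/* Append term[nTerm] to out, where *pnOut bytes are already used and
** prev[nPrev] is the previously encoded term.  An empty buffer starts
** a new node: header byte 0, then the full term.  Otherwise the term
** is written as nPrefix, nSuffix, suffix bytes. */
static int sketchEncodeTerm(unsigned char *out, int *pnOut,
                            const char *prev, int nPrev,
                            const char *term, int nTerm){
  int nPrefix = 0;
  while( nPrefix<nPrev && nPrefix<nTerm && term[nPrefix]==prev[nPrefix] ){
    nPrefix++;                        /* length of the shared prefix */
  }
  assert( nTerm<128 && nPrefix<nTerm );  /* sorted, distinct terms */
  if( *pnOut==0 ){
    out[(*pnOut)++] = 0;              /* node header */
    out[(*pnOut)++] = (unsigned char)nTerm;
    memcpy(out+*pnOut, term, nTerm);
    *pnOut += nTerm;
  }else{
    out[(*pnOut)++] = (unsigned char)nPrefix;
    out[(*pnOut)++] = (unsigned char)(nTerm-nPrefix);
    memcpy(out+*pnOut, term+nPrefix, nTerm-nPrefix);
    *pnOut += nTerm-nPrefix;
  }
  /* Bytes of term needed to tell it apart from prev. */
  return nPrefix+1;
}
```
#endif

Note that the distinguishing length is computed against the previous term even when a new node is started, which is what lets a node's first term be truncated when it is pushed to the interior node.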

/* Used to avoid a memmove when a large amount of doclist data is in
** the buffer.  This constructs a node and term header before
** iDoclistData and flushes the resulting complete node using
** leafWriterInternalFlush().
*/
static int leafWriterInlineFlush(fulltext_vtab *v, LeafWriter *pWriter,
                                 const char *pTerm, int nTerm,
                                 int iDoclistData){
  char c[VARINT_MAX+VARINT_MAX];
  int iData, n = putVarint(c, 0);
  n += putVarint(c+n, nTerm);

  /* There should always be room for the header.  Even if pTerm shared
  ** a substantial prefix with the previous term, the entire prefix
  ** could be constructed from earlier data in the doclist, so there
  ** should be room.
  */
  assert( iDoclistData>=n+nTerm );

  iData = iDoclistData-(n+nTerm);
  memcpy(pWriter->data.pData+iData, c, n);
  memcpy(pWriter->data.pData+iData+n, pTerm, nTerm);

  return leafWriterInternalFlush(v, pWriter, iData, pWriter->data.nData-iData);
}
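
/* Illustrative sketch only (hypothetical, not used by fts2): the
** buffer trick in leafWriterInlineFlush().  The doclist (with its
** length prefix) already sits at offset iDoclist; rather than
** memmove() it to make room for a fresh node header, the header and
** full term are written into the bytes just in front of it, which are
** assumed to be no longer needed.  Single-byte lengths stand in for
** varints (nTerm below 128 only).  sketchInlineHeader() is an invented
** name.  Kept under #if 0.
*/
#if 0
```c
#include <assert.h>
#include <string.h>

/* Write "0, nTerm, term" so that it ends exactly at iDoclist, and
** return the offset where the resulting standalone node begins. */
static int sketchInlineHeader(unsigned char *buf, int iDoclist,
                              const char *term, int nTerm){
  int nHeader = 2 + nTerm;           /* header byte, length, term */
  int iStart;
  assert( nTerm<128 );
  assert( iDoclist>=nHeader );       /* room must already exist */
  iStart = iDoclist - nHeader;
  buf[iStart] = 0;
  buf[iStart+1] = (unsigned char)nTerm;
  memcpy(buf+iStart+2, term, nTerm);
  return iStart;                     /* doclist bytes are untouched */
}
```
#endif

The node to flush is then buf[iStart] through the end of the buffer, mirroring the leafWriterInternalFlush(v, pWriter, iData, nData-iData) call above.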
49805821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
49815821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* Push pTerm[nTerm] along with the doclist data to the leaf layer of
49825821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)** %_segments.
49835821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)*/
static int leafWriterStepMerge(fulltext_vtab *v, LeafWriter *pWriter,
                               const char *pTerm, int nTerm,
                               DLReader *pReaders, int nReaders){
  char c[VARINT_MAX+VARINT_MAX];
  int iTermData = pWriter->data.nData, iDoclistData;
  int i, nData, n, nActualData, nActual, rc, nTermDistinct;

  ASSERT_VALID_LEAF_NODE(pWriter->data.pData, pWriter->data.nData);
  nTermDistinct = leafWriterEncodeTerm(pWriter, pTerm, nTerm);

  /* Remember nTermDistinct if opening a new node. */
  if( iTermData==0 ) pWriter->nTermDistinct = nTermDistinct;

  iDoclistData = pWriter->data.nData;

  /* Use the total length of the input doclists as an upper bound on
  ** the length of the merged doclist, so we can reserve space to
  ** encode it.
  */
  for(i=0, nData=0; i<nReaders; i++){
    nData += dlrAllDataBytes(&pReaders[i]);
  }
  n = putVarint(c, nData);
  dataBufferAppend(&pWriter->data, c, n);

  rc = docListMerge(&pWriter->data, pReaders, nReaders);
  if( rc!=SQLITE_OK ) return rc;
  ASSERT_VALID_DOCLIST(DL_DEFAULT,
                       pWriter->data.pData+iDoclistData+n,
                       pWriter->data.nData-iDoclistData-n, NULL);

  /* The actual amount of doclist data at this point could be smaller
  ** than the length we encoded.  Additionally, the space required to
  ** encode this length could be smaller.  For small doclists, this is
  ** not a big deal; we can just use memmove() to adjust things.
  */
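  /* Worked example (illustrative numbers only, not from real data):
  ** suppose the readers report nData==200 bytes in total.
  ** putVarint(c, 200) emits the two bytes 0xC8 0x01, so n==2 bytes are
  ** reserved in front of the merged doclist.  If docListMerge() then
  ** produces only 120 bytes, putVarint(c, 120) needs a single byte
  ** (0x78), so nActual==1.  On the small-doclist path below, the
  ** doclist is shifted left by n-nActual==1 byte, data.nData shrinks
  ** by 1, and the one-byte length is written at iDoclistData.  On the
  ** standalone path, the length is instead written at iDoclistData+1,
  ** leaving the spare byte in front of it.
  */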
  nActualData = pWriter->data.nData-(iDoclistData+n);
  nActual = putVarint(c, nActualData);
  assert( nActualData<=nData );
  assert( nActual<=n );

  /* If the new doclist is big enough to force a standalone leaf
  ** node, we can immediately flush it inline without doing the
  ** memmove().
  */
  /* TODO(shess) This test matches leafWriterStep(), which does this
  ** test before it knows the cost to varint-encode the term and
  ** doclist lengths.  At some point, change to
  ** pWriter->data.nData-iTermData>STANDALONE_MIN.
  */
  if( nTerm+nActualData>STANDALONE_MIN ){
    /* Push leaf node from before this term. */
    if( iTermData>0 ){
      rc = leafWriterInternalFlush(v, pWriter, 0, iTermData);
      if( rc!=SQLITE_OK ) return rc;

      pWriter->nTermDistinct = nTermDistinct;
    }

    /* Fix the encoded doclist length. */
    iDoclistData += n - nActual;
    memcpy(pWriter->data.pData+iDoclistData, c, nActual);

    /* Push the standalone leaf node. */
    rc = leafWriterInlineFlush(v, pWriter, pTerm, nTerm, iDoclistData);
    if( rc!=SQLITE_OK ) return rc;

    /* Leave the node empty. */
    dataBufferReset(&pWriter->data);

    return rc;
  }

  /* At this point, we know that the doclist was small, so do the
  ** memmove if indicated.
  */
  if( nActual<n ){
    memmove(pWriter->data.pData+iDoclistData+nActual,
            pWriter->data.pData+iDoclistData+n,
            pWriter->data.nData-(iDoclistData+n));
    pWriter->data.nData -= n-nActual;
  }

  /* Replace written length with actual length. */
  memcpy(pWriter->data.pData+iDoclistData, c, nActual);

  /* If the node is too large, break things up. */
  /* TODO(shess) This test matches leafWriterStep(), which does this
  ** test before it knows the cost to varint-encode the term and
  ** doclist lengths.  At some point, change to
  ** pWriter->data.nData>LEAF_MAX.
  */
  if( iTermData+nTerm+nActualData>LEAF_MAX ){
    /* Flush out the leading data as a node */
    rc = leafWriterInternalFlush(v, pWriter, 0, iTermData);
    if( rc!=SQLITE_OK ) return rc;

    pWriter->nTermDistinct = nTermDistinct;

    /* Rebuild header using the current term */
    n = putVarint(pWriter->data.pData, 0);
    n += putVarint(pWriter->data.pData+n, nTerm);
    memcpy(pWriter->data.pData+n, pTerm, nTerm);
    n += nTerm;

    /* There should always be room, because the previous encoding
    ** included all data necessary to construct the term.
    */
    assert( n<iDoclistData );
    /* So long as STANDALONE_MIN is half or less of LEAF_MAX, the
    ** following memcpy() is safe (as opposed to needing a memmove).
    */
    assert( 2*STANDALONE_MIN<=LEAF_MAX );
    assert( n+pWriter->data.nData-iDoclistData<iDoclistData );
    memcpy(pWriter->data.pData+n,
           pWriter->data.pData+iDoclistData,
           pWriter->data.nData-iDoclistData);
    pWriter->data.nData -= iDoclistData-n;
  }
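  /* Worked example of the split above (illustrative only, assuming
  ** leafWriterEncodeTerm() wrote the prefix-compressed form that
  ** leafReaderStep() decodes): if the previous term was "apple" and
  ** pTerm is "apply", the new term was stored at iTermData as the bytes
  ** 04 01 'y' (shared prefix of 4, suffix of 1), so
  ** iDoclistData==iTermData+3.  After bytes [0, iTermData) are flushed,
  ** the node restarts with the full term, 00 05 'a' 'p' 'p' 'l' 'y'
  ** (n==7), followed by a copy of the doclist (its length varint plus
  ** data) taken from iDoclistData.
  */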
  ASSERT_VALID_LEAF_NODE(pWriter->data.pData, pWriter->data.nData);

  return SQLITE_OK;
}

/* Push pTerm[nTerm] along with the doclist data to the leaf layer of
** %_segments.
*/
/* TODO(shess) Revise writeZeroSegment() so that doclists are
** constructed directly in pWriter->data.
*/
static int leafWriterStep(fulltext_vtab *v, LeafWriter *pWriter,
                          const char *pTerm, int nTerm,
                          const char *pData, int nData){
  int rc;
  DLReader reader;

  rc = dlrInit(&reader, DL_DEFAULT, pData, nData);
  if( rc!=SQLITE_OK ) return rc;
  rc = leafWriterStepMerge(v, pWriter, pTerm, nTerm, &reader, 1);
  dlrDestroy(&reader);

  return rc;
}


/****************************************************************/
/* LeafReader is used to iterate over an individual leaf node. */
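/* Worked example of a leaf node (illustrative only; doclist bytes are
** elided): a leaf holding "apple" with a 3-byte doclist followed by
** "apply" with a 5-byte doclist is laid out as
**
**   00                       header byte, 0 for a leaf
**   05 'a' 'p' 'p' 'l' 'e'   first term in full: varint nTerm, bytes
**   03 <3 bytes>             varint doclist length, doclist
**   04 01 'y'                varint nPrefix, varint nSuffix, suffix
**   05 <5 bytes>             varint doclist length, doclist
**
** for 20 bytes in total.  leafReaderInit() consumes the header byte and
** the first term.  Each leafReaderStep() skips one doclist, then
** rebuilds the next term from the first nPrefix bytes of the previous
** term plus the suffix.
*/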
typedef struct LeafReader {
  DataBuffer term;          /* copy of current term. */

  const char *pData;        /* data for current term. */
  int nData;
} LeafReader;

static void leafReaderDestroy(LeafReader *pReader){
  dataBufferDestroy(&pReader->term);
  SCRAMBLE(pReader);
}

static int leafReaderAtEnd(LeafReader *pReader){
  return pReader->nData<=0;
}

/* Access the current term. */
static int leafReaderTermBytes(LeafReader *pReader){
  return pReader->term.nData;
}
static const char *leafReaderTerm(LeafReader *pReader){
  assert( pReader->term.nData>0 );
  return pReader->term.pData;
}

/* Access the doclist data for the current term. */
static int leafReaderDataBytes(LeafReader *pReader){
  int nData;
  assert( pReader->term.nData>0 );
  getVarint32(pReader->pData, &nData);
  return nData;
}
static const char *leafReaderData(LeafReader *pReader){
  int n, nData;
  assert( pReader->term.nData>0 );
  n = getVarint32Safe(pReader->pData, &nData, pReader->nData);
  if( !n || nData<0 || nData>pReader->nData-n ) return NULL;
  return pReader->pData+n;
}

static int leafReaderInit(const char *pData, int nData, LeafReader *pReader){
  int nTerm, n;

  /* All callers check this precondition. */
  assert( nData>0 );
  assert( pData[0]=='\0' );

  CLEAR(pReader);

  /* Read the first term, skipping the header byte. */
  n = getVarint32Safe(pData+1, &nTerm, nData-1);
  if( !n || nTerm<0 || nTerm>nData-1-n ) return SQLITE_CORRUPT_BKPT;
  dataBufferInit(&pReader->term, nTerm);
  dataBufferReplace(&pReader->term, pData+1+n, nTerm);

  /* Position after the first term. */
  pReader->pData = pData+1+n+nTerm;
  pReader->nData = nData-1-n-nTerm;
  return SQLITE_OK;
}

/* Step the reader forward to the next term. */
static int leafReaderStep(LeafReader *pReader){
  int n, nData, nPrefix, nSuffix;
  assert( !leafReaderAtEnd(pReader) );

  /* Skip previous entry's data block. */
  n = getVarint32Safe(pReader->pData, &nData, pReader->nData);
  if( !n || nData<0 || nData>pReader->nData-n ) return SQLITE_CORRUPT_BKPT;
  pReader->pData += n+nData;
  pReader->nData -= n+nData;

  if( !leafReaderAtEnd(pReader) ){
    /* Construct the new term using a prefix from the old term plus a
    ** suffix from the leaf data.
    */
    n = getVarint32Safe(pReader->pData, &nPrefix, pReader->nData);
    if( !n ) return SQLITE_CORRUPT_BKPT;
    pReader->nData -= n;
    pReader->pData += n;
    n = getVarint32Safe(pReader->pData, &nSuffix, pReader->nData);
    if( !n ) return SQLITE_CORRUPT_BKPT;
    pReader->nData -= n;
    pReader->pData += n;
    if( nSuffix<0 || nSuffix>pReader->nData ) return SQLITE_CORRUPT_BKPT;
    if( nPrefix<0 || nPrefix>pReader->term.nData ) return SQLITE_CORRUPT_BKPT;
    pReader->term.nData = nPrefix;
    dataBufferAppend(&pReader->term, pReader->pData, nSuffix);

    pReader->pData += nSuffix;
    pReader->nData -= nSuffix;
  }
  return SQLITE_OK;
}

/* strcmp-style comparison of pReader's current term against pTerm.
** If isPrefix, equality means equal through nTerm bytes.
*/
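/* Worked example (illustrative only): with the reader positioned on
** the term "apply",
**   leafReaderTermCmp(pReader, "app", 3, 1) returns 0, because the
**     first 3 bytes match and isPrefix is set;
**   leafReaderTermCmp(pReader, "app", 3, 0) returns 5-3, a positive
**     value, because a term sorts after its own proper prefix;
**   leafReaderTermCmp(pReader, "b", 1, 0) returns the negative result
**     of memcmp("a", "b", 1).
*/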
static int leafReaderTermCmp(LeafReader *pReader,
                             const char *pTerm, int nTerm, int isPrefix){
  int c, n = pReader->term.nData<nTerm ? pReader->term.nData : nTerm;
  if( n==0 ){
    if( pReader->term.nData>0 ) return -1;
    if( nTerm>0 ) return 1;
    return 0;
  }

  c = memcmp(pReader->term.pData, pTerm, n);
  if( c!=0 ) return c;
  if( isPrefix && n==nTerm ) return 0;
  return pReader->term.nData - nTerm;
}


/****************************************************************/
/* LeavesReader wraps LeafReader to allow iterating over the entire
** leaf layer of the tree.
*/
typedef struct LeavesReader {
  int idx;                  /* Index within the segment. */

  sqlite3_stmt *pStmt;      /* Statement we're streaming leaves from. */
  int eof;                  /* we've seen SQLITE_DONE from pStmt. */

  LeafReader leafReader;    /* reader for the current leaf. */
  DataBuffer rootData;      /* root data for inline. */
} LeavesReader;

/* Access the current term. */
static int leavesReaderTermBytes(LeavesReader *pReader){
  assert( !pReader->eof );
  return leafReaderTermBytes(&pReader->leafReader);
}
static const char *leavesReaderTerm(LeavesReader *pReader){
  assert( !pReader->eof );
  return leafReaderTerm(&pReader->leafReader);
}

/* Access the doclist data for the current term. */
static int leavesReaderDataBytes(LeavesReader *pReader){
  assert( !pReader->eof );
  return leafReaderDataBytes(&pReader->leafReader);
}
static const char *leavesReaderData(LeavesReader *pReader){
  assert( !pReader->eof );
  return leafReaderData(&pReader->leafReader);
}

static int leavesReaderAtEnd(LeavesReader *pReader){
  return pReader->eof;
}

/* loadSegmentLeaves() may not read all the way to SQLITE_DONE, thus
** leaving the statement handle open, which locks the table.
*/
/* TODO(shess) This "solution" is not satisfactory.  Really, there
** should be a check-in function for all statement handles which
** arranges to call sqlite3_reset().  This most likely will require
** modification to control flow all over the place, though, so for now
** just punt.
**
** Note that the current system assumes that segment merges will run to
** completion, which is why this particular problem probably hasn't
** arisen in this case.  Probably a brittle assumption.
*/
52955821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int leavesReaderReset(LeavesReader *pReader){
52965821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  return sqlite3_reset(pReader->pStmt);
52975821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
52985821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
52995821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static void leavesReaderDestroy(LeavesReader *pReader){
53005821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  /* If idx is -1, that means we're using a non-cached statement
53015821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  ** handle in the optimize() case, so we need to release it.
53025821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  */
53035821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( pReader->pStmt!=NULL && pReader->idx==-1 ){
53045821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    sqlite3_finalize(pReader->pStmt);
53055821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  }
53065821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  leafReaderDestroy(&pReader->leafReader);
53075821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  dataBufferDestroy(&pReader->rootData);
53085821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  SCRAMBLE(pReader);
53095821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)}
53105821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
/* Initialize pReader with the given root data (if iStartBlockid==0
** the leaf data was entirely contained in the root), or from the
** stream of blocks between iStartBlockid and iEndBlockid, inclusive.
*/
/* TODO(shess): Figure out a means of indicating how many leaves are
** expected, for purposes of detecting corruption.
*/
static int leavesReaderInit(fulltext_vtab *v,
                            int idx,
                            sqlite_int64 iStartBlockid,
                            sqlite_int64 iEndBlockid,
                            const char *pRootData, int nRootData,
                            LeavesReader *pReader){
  CLEAR(pReader);
  pReader->idx = idx;

  dataBufferInit(&pReader->rootData, 0);
  if( iStartBlockid==0 ){
    int rc;
    /* Corrupt if this can't be a leaf node. */
    if( pRootData==NULL || nRootData<1 || pRootData[0]!='\0' ){
      return SQLITE_CORRUPT_BKPT;
    }
    /* Entire leaf level fit in root data. */
    dataBufferReplace(&pReader->rootData, pRootData, nRootData);
    rc = leafReaderInit(pReader->rootData.pData, pReader->rootData.nData,
                        &pReader->leafReader);
    if( rc!=SQLITE_OK ){
      dataBufferDestroy(&pReader->rootData);
      return rc;
    }
  }else{
    sqlite3_stmt *s;
    int rc = sql_get_leaf_statement(v, idx, &s);
    if( rc!=SQLITE_OK ) return rc;

    rc = sqlite3_bind_int64(s, 1, iStartBlockid);
    if( rc!=SQLITE_OK ) goto err;

    rc = sqlite3_bind_int64(s, 2, iEndBlockid);
    if( rc!=SQLITE_OK ) goto err;

    rc = sqlite3_step(s);

    /* Corrupt if interior node referenced missing leaf node. */
    if( rc==SQLITE_DONE ){
      rc = SQLITE_CORRUPT_BKPT;
      goto err;
    }

    if( rc!=SQLITE_ROW ) goto err;
    rc = SQLITE_OK;

    /* Corrupt if leaf data isn't a blob. */
    if( sqlite3_column_type(s, 0)!=SQLITE_BLOB ){
      rc = SQLITE_CORRUPT_BKPT;
    }else{
      const char *pLeafData = sqlite3_column_blob(s, 0);
      int nLeafData = sqlite3_column_bytes(s, 0);

      /* Corrupt if this can't be a leaf node. */
      if( pLeafData==NULL || nLeafData<1 || pLeafData[0]!='\0' ){
        rc = SQLITE_CORRUPT_BKPT;
      }else{
        rc = leafReaderInit(pLeafData, nLeafData, &pReader->leafReader);
      }
    }

 err:
    if( rc!=SQLITE_OK ){
      if( idx==-1 ){
        sqlite3_finalize(s);
      }else{
        sqlite3_reset(s);
      }
      return rc;
    }

    pReader->pStmt = s;
  }
  return SQLITE_OK;
}

/* Step the current leaf forward to the next term.  If we reach the
** end of the current leaf, step forward to the next leaf block.
*/
static int leavesReaderStep(fulltext_vtab *v, LeavesReader *pReader){
  int rc;
  assert( !leavesReaderAtEnd(pReader) );
  rc = leafReaderStep(&pReader->leafReader);
  if( rc!=SQLITE_OK ) return rc;

  if( leafReaderAtEnd(&pReader->leafReader) ){
    if( pReader->rootData.pData ){
      pReader->eof = 1;
      return SQLITE_OK;
    }
    rc = sqlite3_step(pReader->pStmt);
    if( rc!=SQLITE_ROW ){
      pReader->eof = 1;
      return rc==SQLITE_DONE ? SQLITE_OK : rc;
    }

    /* Corrupt if leaf data isn't a blob. */
    if( sqlite3_column_type(pReader->pStmt, 0)!=SQLITE_BLOB ){
      return SQLITE_CORRUPT_BKPT;
    }else{
      LeafReader tmp;
      const char *pLeafData = sqlite3_column_blob(pReader->pStmt, 0);
      int nLeafData = sqlite3_column_bytes(pReader->pStmt, 0);

      /* Corrupt if this can't be a leaf node. */
      if( pLeafData==NULL || nLeafData<1 || pLeafData[0]!='\0' ){
        return SQLITE_CORRUPT_BKPT;
      }

      rc = leafReaderInit(pLeafData, nLeafData, &tmp);
      if( rc!=SQLITE_OK ) return rc;
      leafReaderDestroy(&pReader->leafReader);
      pReader->leafReader = tmp;
    }
  }
  return SQLITE_OK;
}
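
/* Illustrative sketch only, not used by fts2; every name below is
** hypothetical.  It models the control flow of leavesReaderStep() on
** in-memory data: ints stand in for terms, and each "block" stands in
** for one leaf row returned by the leaf statement.  Stepping past the
** last term of a block loads the next block, and running out of blocks
** sets eof, much as SQLITE_DONE does above.
*/
typedef struct SketchBlockCursor {
  const int *const *apBlock;  /* apBlock[b] holds the terms of block b */
  const int *anTerm;          /* anTerm[b] is the term count of block b */
  int nBlock;                 /* Number of blocks */
  int iBlock;                 /* Index of the current block */
  int iTerm;                  /* Index of the current term in that block */
  int eof;                    /* Set once every block is exhausted */
} SketchBlockCursor;

/* Advance past empty blocks (real leaves are assumed non-empty). */
static void sketchCursorSkipEmpty(SketchBlockCursor *p){
  while( p->iBlock<p->nBlock && p->anTerm[p->iBlock]==0 ) p->iBlock++;
  if( p->iBlock>=p->nBlock ) p->eof = 1;
}

static void sketchCursorInit(SketchBlockCursor *p,
                             const int *const *apBlock,
                             const int *anTerm, int nBlock){
  p->apBlock = apBlock;
  p->anTerm = anTerm;
  p->nBlock = nBlock;
  p->iBlock = 0;
  p->iTerm = 0;
  p->eof = 0;
  sketchCursorSkipEmpty(p);
}

/* Current term; only valid while !p->eof. */
static int sketchCursorTerm(const SketchBlockCursor *p){
  return p->apBlock[p->iBlock][p->iTerm];
}

/* Step to the next term; the caller must not step a cursor at eof. */
static void sketchCursorStep(SketchBlockCursor *p){
  p->iTerm++;
  if( p->iTerm<p->anTerm[p->iBlock] ) return;   /* Same block. */
  p->iTerm = 0;                                 /* Next block. */
  p->iBlock++;
  sketchCursorSkipEmpty(p);
}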

/* Order LeavesReaders by their term, ignoring idx.  Readers at eof
** always sort to the end.
*/
static int leavesReaderTermCmp(LeavesReader *lr1, LeavesReader *lr2){
  if( leavesReaderAtEnd(lr1) ){
    if( leavesReaderAtEnd(lr2) ) return 0;
    return 1;
  }
  if( leavesReaderAtEnd(lr2) ) return -1;

  return leafReaderTermCmp(&lr1->leafReader,
                           leavesReaderTerm(lr2), leavesReaderTermBytes(lr2),
                           0);
}

/* Similar to leavesReaderTermCmp(), with additional ordering by idx
** so that older segments sort before newer segments.
*/
static int leavesReaderCmp(LeavesReader *lr1, LeavesReader *lr2){
  int c = leavesReaderTermCmp(lr1, lr2);
  if( c!=0 ) return c;
  return lr1->idx-lr2->idx;
}
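
/* Illustrative sketch only, not used by fts2; the names are
** hypothetical.  It shows the ordering implemented by the two
** comparators above: readers at eof sort after live readers, live
** readers sort by term, and term ties are broken by idx so the older
** segment (lower idx) comes first.  Terms are compared here as
** NUL-terminated strings with strcmp(); fts2 compares length-delimited
** byte strings via leafReaderTermCmp(), so the term comparison is a
** simplifying assumption.
*/
#include <string.h>

typedef struct SketchReader {
  int eof;            /* Nonzero once the reader is exhausted */
  const char *zTerm;  /* Current term (ignored when eof) */
  int idx;            /* Segment index; lower means older */
} SketchReader;

/* Analogue of leavesReaderTermCmp(): eof last, then by term. */
static int sketchTermCmp(const SketchReader *a, const SketchReader *b){
  if( a->eof ) return b->eof ? 0 : 1;
  if( b->eof ) return -1;
  return strcmp(a->zTerm, b->zTerm);
}

/* Analogue of leavesReaderCmp(): term order, then age. */
static int sketchCmp(const SketchReader *a, const SketchReader *b){
  int c = sketchTermCmp(a, b);
  if( c!=0 ) return c;
  return a->idx - b->idx;
}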

/* Assume that pLr[1]..pLr[nLr-1] are sorted.  Bubble pLr[0] into its
** sorted position.
*/
static void leavesReaderReorder(LeavesReader *pLr, int nLr){
  while( nLr>1 && leavesReaderCmp(pLr, pLr+1)>0 ){
    LeavesReader tmp = pLr[0];
    pLr[0] = pLr[1];
    pLr[1] = tmp;
    nLr--;
    pLr++;
  }
}
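
/* Illustrative sketch only, not used by fts2; the names are
** hypothetical and plain ints stand in for readers.  The function
** above is a single insertion step: with a[1..n-1] already sorted, it
** swaps a[0] rightward until it is no larger than its right neighbour.
** leavesReadersInit() below applies that step to ever-longer suffixes,
** from the back of the array to the front, which amounts to insertion
** sort.
*/
static void sketchBubble(int *a, int n){
  while( n>1 && a[0]>a[1] ){
    int tmp = a[0];
    a[0] = a[1];
    a[1] = tmp;
    n--;
    a++;
  }
}

/* Mirrors: while( i-- ){ leavesReaderReorder(pReaders+i, n-i); } */
static void sketchSortBackToFront(int *a, int n){
  int i = n;
  while( i-- ){
    sketchBubble(a+i, n-i);
  }
}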

/* Initializes pReaders with the segments from level iLevel, returning
** the number of segments in *piReaders.  Leaves pReaders in sorted
** order.
*/
static int leavesReadersInit(fulltext_vtab *v, int iLevel,
                             LeavesReader *pReaders, int *piReaders){
  sqlite3_stmt *s;
  int i, rc = sql_get_statement(v, SEGDIR_SELECT_LEVEL_STMT, &s);
  if( rc!=SQLITE_OK ) return rc;

  rc = sqlite3_bind_int(s, 1, iLevel);
  if( rc!=SQLITE_OK ) return rc;

  i = 0;
  while( (rc = sqlite3_step(s))==SQLITE_ROW ){
    sqlite_int64 iStart = sqlite3_column_int64(s, 0);
    sqlite_int64 iEnd = sqlite3_column_int64(s, 1);
    const char *pRootData = sqlite3_column_blob(s, 2);
    int nRootData = sqlite3_column_bytes(s, 2);
    sqlite_int64 iIndex = sqlite3_column_int64(s, 3);

    /* Corrupt if we get back different types than we stored. */
    /* Also corrupt if the index is not sequential starting at 0. */
    if( sqlite3_column_type(s, 0)!=SQLITE_INTEGER ||
        sqlite3_column_type(s, 1)!=SQLITE_INTEGER ||
        sqlite3_column_type(s, 2)!=SQLITE_BLOB ||
        i!=iIndex ||
        i>=MERGE_COUNT ){
      rc = SQLITE_CORRUPT_BKPT;
      break;
    }

    rc = leavesReaderInit(v, i, iStart, iEnd, pRootData, nRootData,
                          &pReaders[i]);
    if( rc!=SQLITE_OK ) break;

    i++;
  }
  if( rc!=SQLITE_DONE ){
    while( i-->0 ){
      leavesReaderDestroy(&pReaders[i]);
    }
    sqlite3_reset(s);          /* So we don't leave a lock. */
    return rc;
  }

  *piReaders = i;

  /* Leave our results sorted by term, then age. */
  while( i-- ){
    leavesReaderReorder(pReaders+i, *piReaders-i);
  }
  return SQLITE_OK;
}

/* Merge doclists from pReaders[0..nReaders-1] into a single doclist,
** which is written to pWriter.  Assumes pReaders is ordered oldest to
** newest.
*/
/* TODO(shess) Consider putting this inline in segmentMerge(). */
static int leavesReadersMerge(fulltext_vtab *v,
                              LeavesReader *pReaders, int nReaders,
                              LeafWriter *pWriter){
  DLReader dlReaders[MERGE_COUNT];
  const char *pTerm = leavesReaderTerm(pReaders);
  int i, nTerm = leavesReaderTermBytes(pReaders);
  int rc = SQLITE_OK;   /* Defined even if the loop below never runs. */

  assert( nReaders<=MERGE_COUNT );

  for(i=0; i<nReaders; i++){
    const char *pData = leavesReaderData(pReaders+i);
    if( pData==NULL ){
      rc = SQLITE_CORRUPT_BKPT;
      break;
    }
    rc = dlrInit(&dlReaders[i], DL_DEFAULT,
                 pData,
                 leavesReaderDataBytes(pReaders+i));
    if( rc!=SQLITE_OK ) break;
  }
  if( rc!=SQLITE_OK ){
    while( i-->0 ){
      dlrDestroy(&dlReaders[i]);
    }
    return rc;
  }

  return leafWriterStepMerge(v, pWriter, pTerm, nTerm, dlReaders, nReaders);
}

/* Forward ref due to mutual recursion with segdirNextIndex(). */
static int segmentMerge(fulltext_vtab *v, int iLevel);

/* Put the next available index at iLevel into *pidx.  If iLevel
** already has MERGE_COUNT segments, they are merged to a higher
** level to make room.
*/
static int segdirNextIndex(fulltext_vtab *v, int iLevel, int *pidx){
  int rc = segdir_max_index(v, iLevel, pidx);
  if( rc==SQLITE_DONE ){              /* No segments at iLevel. */
    *pidx = 0;
  }else if( rc==SQLITE_ROW ){
    if( *pidx==(MERGE_COUNT-1) ){
      rc = segmentMerge(v, iLevel);
      if( rc!=SQLITE_OK ) return rc;
      *pidx = 0;
    }else{
      (*pidx)++;
    }
  }else{
    return rc;
  }
  return SQLITE_OK;
}
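
/* Illustrative sketch only, not used by fts2; the names are
** hypothetical.  It simulates the cascade between segdirNextIndex()
** and segmentMerge() with a per-level segment count in place of the
** segdir table.  A level that already holds nMerge segments is merged
** into one new segment at the next level (which may cascade further)
** before it accepts another.  fts2 uses MERGE_COUNT for nMerge; a small
** value is easier to trace.  Levels are not bounds-checked, so
** SKETCH_MAX_LEVELS must exceed the depth the simulation reaches.
*/
#define SKETCH_MAX_LEVELS 16

typedef struct SketchLevels {
  int nMerge;                    /* Segments per level before merging */
  int anSeg[SKETCH_MAX_LEVELS];  /* anSeg[l] is the segment count at l */
} SketchLevels;

static int sketchNextIndex(SketchLevels *p, int iLevel);

/* Like segmentMerge(): reserve a slot at iLevel+1 first, write the
** merged segment there, then delete every segment at iLevel. */
static void sketchMergeLevel(SketchLevels *p, int iLevel){
  int idx = sketchNextIndex(p, iLevel+1);
  p->anSeg[iLevel+1] = idx+1;
  p->anSeg[iLevel] = 0;
}

/* Like segdirNextIndex(): return the next free index at iLevel. */
static int sketchNextIndex(SketchLevels *p, int iLevel){
  if( p->anSeg[iLevel]==0 ) return 0;            /* Level is empty. */
  if( p->anSeg[iLevel]==p->nMerge ){             /* Level is full. */
    sketchMergeLevel(p, iLevel);
    return 0;
  }
  return p->anSeg[iLevel];                       /* max index + 1 */
}

/* Add one new segment at level 0. */
static void sketchAddSegment(SketchLevels *p){
  int idx = sketchNextIndex(p, 0);
  p->anSeg[0] = idx+1;
}

With nMerge==4, each level behaves like a digit that is allowed to
reach 4 before carrying, so 21 added segments leave one segment on each
of levels 0, 1 and 2 (1 + 4 + 16 == 21).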

/* Merge MERGE_COUNT segments at iLevel into a new segment at
** iLevel+1.  If iLevel+1 is already full of segments, those will be
** merged to make room.
*/
static int segmentMerge(fulltext_vtab *v, int iLevel){
  LeafWriter writer;
  LeavesReader lrs[MERGE_COUNT];
  int i, rc, idx = 0;

  /* Determine the next available segment index at the next level,
  ** merging as necessary.
  */
  rc = segdirNextIndex(v, iLevel+1, &idx);
  if( rc!=SQLITE_OK ) return rc;

  /* TODO(shess) This assumes that we'll always see exactly
  ** MERGE_COUNT segments to merge at a given level.  That will be
  ** broken if we allow the developer to request preemptive or
  ** deferred merging.
  */
  memset(&lrs, '\0', sizeof(lrs));
  rc = leavesReadersInit(v, iLevel, lrs, &i);
  if( rc!=SQLITE_OK ) return rc;

  leafWriterInit(iLevel+1, idx, &writer);

  if( i!=MERGE_COUNT ){
    rc = SQLITE_CORRUPT_BKPT;
    goto err;
  }

  /* Since leavesReaderReorder() pushes readers at eof to the end,
  ** once the first reader is at eof, all of them are.
  */
  while( !leavesReaderAtEnd(lrs) ){
    /* Figure out how many readers share their next term. */
    for(i=1; i<MERGE_COUNT && !leavesReaderAtEnd(lrs+i); i++){
      if( 0!=leavesReaderTermCmp(lrs, lrs+i) ) break;
    }

    rc = leavesReadersMerge(v, lrs, i, &writer);
    if( rc!=SQLITE_OK ) goto err;

    /* Step forward those that were merged. */
    while( i-->0 ){
      rc = leavesReaderStep(v, lrs+i);
      if( rc!=SQLITE_OK ) goto err;

      /* Reorder by term, then by age. */
      leavesReaderReorder(lrs+i, MERGE_COUNT-i);
    }
  }

  for(i=0; i<MERGE_COUNT; i++){
    leavesReaderDestroy(&lrs[i]);
  }

  rc = leafWriterFinalize(v, &writer);
  leafWriterDestroy(&writer);
  if( rc!=SQLITE_OK ) return rc;

  /* Delete the merged segment data. */
  return segdir_delete(v, iLevel);

 err:
  for(i=0; i<MERGE_COUNT; i++){
    leavesReaderDestroy(&lrs[i]);
  }
  leafWriterDestroy(&writer);
  return rc;
}

/* Accumulate the union of the doclists *acc and pData/nData into *acc. */
static int docListAccumulateUnion(DataBuffer *acc,
                                  const char *pData, int nData) {
  DataBuffer tmp = *acc;
  int rc;
  dataBufferInit(acc, tmp.nData+nData);
  rc = docListUnion(tmp.pData, tmp.nData, pData, nData, acc);
  dataBufferDestroy(&tmp);
  return rc;
}
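/* Illustrative sketch only; not part of fts2 and not compiled.  The
** union computed by docListAccumulateUnion() is a sorted merge of two
** doclists.  The example assumes a simplified model in which a doclist
** is just an ascending array of docids.  Real fts2 doclists are
** varint-encoded and may carry position data, which this sketch omits.
** The helper name sketch_docid_union() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: writes the union of the ascending docid arrays
// a[0..nA) and b[0..nB) to out, which must have room for nA+nB
// entries.  Returns the number of docids written.
static size_t sketch_docid_union(const int64_t *a, size_t nA,
                                 const int64_t *b, size_t nB,
                                 int64_t *out){
  size_t i = 0, j = 0, n = 0;
  while( i<nA || j<nB ){
    if( j==nB || (i<nA && a[i]<b[j]) ){
      out[n++] = a[i++];          // a holds the smallest remaining docid
    }else if( i==nA || b[j]<a[i] ){
      out[n++] = b[j++];          // b holds the smallest remaining docid
    }else{
      out[n++] = a[i];            // same docid in both: emit it once
      i++;
      j++;
    }
  }
  assert( n<=nA+nB );
  return n;
}
```

** A docid that appears in both inputs is emitted once.  With inputs
** {1, 3, 5} and {2, 3, 6}, the output is {1, 2, 3, 5, 6}.  Each input
** is read exactly once, so the merge is linear in the total size.
*/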

/* TODO(shess) It might be interesting to explore different merge
** strategies here.  For instance, since this is a sorted merge, we
** could easily merge many doclists in parallel.  With some
** comprehension of the storage format, we could merge all of the
** doclists within a leaf node directly from the leaf node's storage.
** It may be worthwhile to merge smaller doclists before larger
** doclists, since they can be traversed more quickly, but the
** results may have less overlap, making them more expensive in a
** different way.
*/

/* Scan pReader for pTerm/nTerm (or, if isPrefix is true, for every
** term beginning with pTerm/nTerm), and merge the matching doclists
** over *out (any doclists with duplicate docids overwrite those in
** *out).  Internal function for loadSegmentLeaf() and
** loadSegmentLeaves().
*/
static int loadSegmentLeavesInt(fulltext_vtab *v, LeavesReader *pReader,
                                const char *pTerm, int nTerm, int isPrefix,
                                DataBuffer *out){
  /* Doclist data is accumulated into pBuffers much as one increments a
  ** binary counter.  If index 0 is empty, the data is stored there.
  ** If index 0 already holds data, the two are merged and the result
  ** is carried into position 1, with further merge-and-carry until an
  ** empty position is found.
  */
  DataBuffer *pBuffers = NULL;
  int nBuffers = 0, nMaxBuffers = 0, rc;

  assert( nTerm>0 );

  for(rc=SQLITE_OK; rc==SQLITE_OK && !leavesReaderAtEnd(pReader);
      rc=leavesReaderStep(v, pReader)){
    /* TODO(shess) Really want leavesReaderTermCmp(), but that name is
    ** already taken to compare the terms of two LeavesReaders.  Think
    ** on a better name.  [Meanwhile, break encapsulation rather than
    ** use a confusing name.]
    */
    int c = leafReaderTermCmp(&pReader->leafReader, pTerm, nTerm, isPrefix);
    if( c>0 ) break;      /* Past any possible matches. */
    if( c==0 ){
      int iBuffer, nData;
      const char *pData = leavesReaderData(pReader);
      if( pData==NULL ){
        rc = SQLITE_CORRUPT_BKPT;
        break;
      }
      nData = leavesReaderDataBytes(pReader);

      /* Find the first empty buffer. */
      for(iBuffer=0; iBuffer<nBuffers; ++iBuffer){
        if( 0==pBuffers[iBuffer].nData ) break;
      }

      /* Out of buffers, add an empty one. */
      if( iBuffer==nBuffers ){
        if( nBuffers==nMaxBuffers ){
          DataBuffer *p;
          nMaxBuffers += 20;

          /* Manual realloc so we can handle NULL appropriately. */
          p = sqlite3_malloc(nMaxBuffers*sizeof(*pBuffers));
          if( p==NULL ){
            rc = SQLITE_NOMEM;
            break;
          }

          if( nBuffers>0 ){
            assert(pBuffers!=NULL);
            memcpy(p, pBuffers, nBuffers*sizeof(*pBuffers));
            sqlite3_free(pBuffers);
          }
          pBuffers = p;
        }
        dataBufferInit(&(pBuffers[nBuffers]), 0);
        nBuffers++;
      }

      /* At this point, there must be an empty buffer at iBuffer. */
      assert(iBuffer<nBuffers && pBuffers[iBuffer].nData==0);

      /* If the first buffer was empty, no merge is needed. */
      if( iBuffer==0 ){
        dataBufferReplace(&(pBuffers[0]), pData, nData);
      }else{
        /* pAcc is the empty buffer the merged data will end up in. */
        DataBuffer *pAcc = &(pBuffers[iBuffer]);
        DataBuffer *p = &(pBuffers[0]);

        /* Handle position 0 specially to avoid the need to prime pAcc
        ** with pData/nData.
        */
        dataBufferSwap(p, pAcc);
        rc = docListAccumulateUnion(pAcc, pData, nData);
        if( rc!=SQLITE_OK ) goto err;

        /* Accumulate remaining doclists into pAcc. */
        for(++p; p<pAcc; ++p){
          rc = docListAccumulateUnion(pAcc, p->pData, p->nData);
          if( rc!=SQLITE_OK ) goto err;

          /* dataBufferReset() keeps the buffer's existing allocation, so
          ** reusing a buffer that once held a large doclist could blow up
          ** our memory requirements.  Free large buffers instead.
          */
          if( p->nCapacity<1024 ){
            dataBufferReset(p);
          }else{
            dataBufferDestroy(p);
            dataBufferInit(p, 0);
          }
        }
      }
    }
  }

  /* Union all the doclists together into *out. */
  /* TODO(shess) What if *out is big?  Sigh. */
  if( rc==SQLITE_OK && nBuffers>0 ){
    int iBuffer;
    for(iBuffer=0; iBuffer<nBuffers; ++iBuffer){
      if( pBuffers[iBuffer].nData>0 ){
        if( out->nData==0 ){
          dataBufferSwap(out, &(pBuffers[iBuffer]));
        }else{
          rc = docListAccumulateUnion(out, pBuffers[iBuffer].pData,
                                      pBuffers[iBuffer].nData);
          if( rc!=SQLITE_OK ) break;
        }
      }
    }
  }

err:
  while( nBuffers-- ){
    dataBufferDestroy(&(pBuffers[nBuffers]));
  }
  if( pBuffers!=NULL ) sqlite3_free(pBuffers);

  return rc;
}
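
/* Illustrative sketch only; not part of fts2 and not compiled.  It
** models the pBuffers scheme in loadSegmentLeavesInt() as a binary
** counter.  As a simplification, each buffer is reduced to a count of
** the input doclists folded into it (0 meaning empty), so no doclist
** data is actually merged.  SketchAccum and sketch_accum_add() are
** hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_SLOTS 32

// Hypothetical model of pBuffers: slot[k] counts how many input
// doclists have been folded into buffer k (0 means the buffer is empty).
typedef struct SketchAccum {
  size_t slot[SKETCH_MAX_SLOTS];
  size_t nSlots;                // buffers allocated so far
} SketchAccum;

// Add one doclist, following the same steps as the real loop: find the
// first empty buffer (adding one if none is free), fold the new doclist
// and every lower buffer into it, then empty the lower buffers.
// Returns the index of the buffer that received the merged data.
static size_t sketch_accum_add(SketchAccum *p){
  size_t i, k;
  for(k=0; k<p->nSlots; k++){
    if( p->slot[k]==0 ) break;  // first empty buffer
  }
  if( k==p->nSlots ){
    assert( p->nSlots<SKETCH_MAX_SLOTS );
    p->slot[p->nSlots++] = 0;   // out of buffers: add an empty one
  }
  p->slot[k] = 1;               // the new doclist itself
  for(i=0; i<k; i++){
    p->slot[k] += p->slot[i];   // merge lower buffer i into buffer k
    p->slot[i] = 0;             // and leave buffer i empty
  }
  return k;
}
```

** After n doclists have been added, buffer k is non-empty exactly when
** bit k of n is set, and it then holds 2^k doclists.  Each input
** therefore takes part in roughly log2(n) merges.  For similarly sized
** doclists with little overlap, that keeps the total merge work near
** O(n log n), rather than the O(n^2) of folding every doclist into one
** ever-growing buffer.
*/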

/* Call loadSegmentLeavesInt() with pData/nData as input. */
static int loadSegmentLeaf(fulltext_vtab *v, const char *pData, int nData,
                           const char *pTerm, int nTerm, int isPrefix,
                           DataBuffer *out){
  LeavesReader reader;
  int rc;

  assert( nData>1 );
  assert( *pData=='\0' );
  rc = leavesReaderInit(v, 0, 0, 0, pData, nData, &reader);
  if( rc!=SQLITE_OK ) return rc;

  rc = loadSegmentLeavesInt(v, &reader, pTerm, nTerm, isPrefix, out);
  leavesReaderReset(&reader);
  leavesReaderDestroy(&reader);
  return rc;
}

/* Call loadSegmentLeavesInt() with the leaf nodes from iStartLeaf to
** iEndLeaf (inclusive) as input, and merge the resulting doclist into
** out.
*/
static int loadSegmentLeaves(fulltext_vtab *v,
                             sqlite_int64 iStartLeaf, sqlite_int64 iEndLeaf,
                             const char *pTerm, int nTerm, int isPrefix,
                             DataBuffer *out){
  int rc;
  LeavesReader reader;

  assert( iStartLeaf<=iEndLeaf );
  rc = leavesReaderInit(v, 0, iStartLeaf, iEndLeaf, NULL, 0, &reader);
  if( rc!=SQLITE_OK ) return rc;

  rc = loadSegmentLeavesInt(v, &reader, pTerm, nTerm, isPrefix, out);
  leavesReaderReset(&reader);
  leavesReaderDestroy(&reader);
  return rc;
}

/* Taking pData/nData as an interior node, find the sequence of child
** nodes which could include pTerm/nTerm/isPrefix.  Note that the
** interior node terms logically come between the blocks, so there is
** one more blockid than there are terms (the final block contains
** terms >= the last interior-node term).
*/
/* TODO(shess) The calling code may already know that the end child is
** not worth calculating, because the end may be in a later sibling
** node.  Consider whether breaking symmetry is worthwhile.  I suspect
** it is not worthwhile.
*/
static int getChildrenContaining(const char *pData, int nData,
                                 const char *pTerm, int nTerm, int isPrefix,
                                 sqlite_int64 *piStartChild,
                                 sqlite_int64 *piEndChild){
  InteriorReader reader;
  int rc;

  assert( nData>1 );
  assert( *pData!='\0' );
  rc = interiorReaderInit(pData, nData, &reader);
  if( rc!=SQLITE_OK ) return rc;

  /* Scan for the first child which could contain pTerm/nTerm. */
  while( !interiorReaderAtEnd(&reader) ){
    if( interiorReaderTermCmp(&reader, pTerm, nTerm, 0)>0 ) break;
    rc = interiorReaderStep(&reader);
    if( rc!=SQLITE_OK ){
      interiorReaderDestroy(&reader);
      return rc;
    }
  }
  *piStartChild = interiorReaderCurrentBlockid(&reader);

  /* Keep scanning to find a term greater than our term, using prefix
  ** comparison if indicated.  If isPrefix is false, this will be the
  ** same blockid as the starting block.
  */
  while( !interiorReaderAtEnd(&reader) ){
    if( interiorReaderTermCmp(&reader, pTerm, nTerm, isPrefix)>0 ) break;
    rc = interiorReaderStep(&reader);
    if( rc!=SQLITE_OK ){
      interiorReaderDestroy(&reader);
      return rc;
    }
  }
  *piEndChild = interiorReaderCurrentBlockid(&reader);

  interiorReaderDestroy(&reader);

  /* Children must ascend, and if !isPrefix, both must be the same. */
  assert( *piEndChild>=*piStartChild );
  assert( isPrefix || *piStartChild==*piEndChild );
  return rc;
}
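
/* Illustrative sketch only; not part of fts2 and not compiled.  It
** mirrors the two scans in getChildrenContaining() over a simplified
** interior node.  Separator terms are NUL-terminated strings, and the
** children are assumed to have consecutive blockids iFirst through
** iFirst+nSep.  sketch_term_cmp() and sketch_children_containing() are
** hypothetical stand-ins for interiorReaderTermCmp() and the reader
** loops.

```c
#include <assert.h>
#include <string.h>

// Hypothetical stand-in for interiorReaderTermCmp(): compares separator
// zSep with zTerm.  With isPrefix, a separator beginning with zTerm
// compares equal.
static int sketch_term_cmp(const char *zSep, const char *zTerm,
                           int isPrefix){
  size_t nTerm = strlen(zTerm);
  if( isPrefix && strncmp(zSep, zTerm, nTerm)==0 ) return 0;
  return strcmp(zSep, zTerm);
}

// Child iFirst+i holds terms >= separator i-1 (if any) and < separator i
// (if any), so there are nSep+1 children.  Computes the first and last
// child that could hold zTerm.
static void sketch_children_containing(const char *const *azSep, int nSep,
                                       long long iFirst,
                                       const char *zTerm, int isPrefix,
                                       long long *piStart,
                                       long long *piEnd){
  int i = 0;
  // First scan: skip separators <= zTerm, comparing exactly.
  while( i<nSep && sketch_term_cmp(azSep[i], zTerm, 0)<=0 ) i++;
  *piStart = iFirst + i;
  // Second scan: with isPrefix, also skip separators starting with zTerm.
  while( i<nSep && sketch_term_cmp(azSep[i], zTerm, isPrefix)<=0 ) i++;
  *piEnd = iFirst + i;
  // As in the original: children ascend, and without isPrefix they match.
  assert( *piEnd>=*piStart );
  assert( isPrefix || *piStart==*piEnd );
}
```

** Take separators {"apple", "banana", "bandit", "cherry"} and iFirst
** of 10.  An exact lookup of "ban" selects child 11 only.  A prefix
** lookup of "ban" selects children 11 through 13, since "banana" and
** "bandit" both start with "ban".
*/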

/* Read block at iBlockid and pass it with other params to
** getChildrenContaining().
*/
static int loadAndGetChildrenContaining(
  fulltext_vtab *v,
  sqlite_int64 iBlockid,
  const char *pTerm, int nTerm, int isPrefix,
  sqlite_int64 *piStartChild, sqlite_int64 *piEndChild
){
  sqlite3_stmt *s = NULL;
  int rc;

  assert( iBlockid!=0 );
  assert( pTerm!=NULL );
  assert( nTerm!=0 );        /* TODO(shess) Why not allow this? */
  assert( piStartChild!=NULL );
  assert( piEndChild!=NULL );

  rc = sql_get_statement(v, BLOCK_SELECT_STMT, &s);
  if( rc!=SQLITE_OK ) return rc;

  rc = sqlite3_bind_int64(s, 1, iBlockid);
  if( rc!=SQLITE_OK ) return rc;

  rc = sqlite3_step(s);
  /* Corrupt if the interior node references a missing child node. */
  if( rc==SQLITE_DONE ) return SQLITE_CORRUPT_BKPT;
  if( rc!=SQLITE_ROW ) return rc;

  /* Corrupt if the child node isn't a blob. */
  if( sqlite3_column_type(s, 0)!=SQLITE_BLOB ){
    sqlite3_reset(s);         /* So we don't leave a lock. */
    return SQLITE_CORRUPT_BKPT;
  }else{
    const char *pData = sqlite3_column_blob(s, 0);
    int nData = sqlite3_column_bytes(s, 0);

    /* Corrupt if the child is not a valid interior node. */
    if( pData==NULL || nData<1 || pData[0]=='\0' ){
      sqlite3_reset(s);         /* So we don't leave a lock. */
      return SQLITE_CORRUPT_BKPT;
    }

    rc = getChildrenContaining(pData, nData, pTerm, nTerm,
                               isPrefix, piStartChild, piEndChild);
    if( rc!=SQLITE_OK ){
      sqlite3_reset(s);
      return rc;
    }
  }

  /* We expect only one row.  We must execute another sqlite3_step()
  ** to complete the iteration; otherwise the table will remain
  ** locked.
  */
  rc = sqlite3_step(s);
  if( rc==SQLITE_ROW ){
    sqlite3_reset(s);         /* So we don't leave a lock. */
    return SQLITE_ERROR;
  }
  if( rc!=SQLITE_DONE ) return rc;

  return SQLITE_OK;
}

/* Traverse the tree represented by pData[nData] looking for
** pTerm[nTerm], placing its doclist into *out.  This is a helper for
** loadSegment(), split out to keep its error handling clean.
*/
static int loadSegmentInt(fulltext_vtab *v, const char *pData, int nData,
                          sqlite_int64 iLeavesEnd,
                          const char *pTerm, int nTerm, int isPrefix,
                          DataBuffer *out){
  /* Special case where root is a leaf. */
  if( *pData=='\0' ){
    return loadSegmentLeaf(v, pData, nData, pTerm, nTerm, isPrefix, out);
  }else{
    int rc;
    sqlite_int64 iStartChild, iEndChild;

    /* Process pData as an interior node, then loop down the tree
    ** until we find the set of leaf nodes to scan for the term.
    */
    rc = getChildrenContaining(pData, nData, pTerm, nTerm, isPrefix,
                               &iStartChild, &iEndChild);
    if( rc!=SQLITE_OK ) return rc;
    while( iStartChild>iLeavesEnd ){
      sqlite_int64 iNextStart, iNextEnd;
      rc = loadAndGetChildrenContaining(v, iStartChild, pTerm, nTerm, isPrefix,
                                        &iNextStart, &iNextEnd);
      if( rc!=SQLITE_OK ) return rc;

      /* If we've branched, follow the end branch, too. */
      if( iStartChild!=iEndChild ){
        sqlite_int64 iDummy;
        rc = loadAndGetChildrenContaining(v, iEndChild, pTerm, nTerm, isPrefix,
                                          &iDummy, &iNextEnd);
        if( rc!=SQLITE_OK ) return rc;
      }

      assert( iNextStart<=iNextEnd );
      iStartChild = iNextStart;
      iEndChild = iNextEnd;
    }
    assert( iStartChild<=iLeavesEnd );
    assert( iEndChild<=iLeavesEnd );

    /* Scan through the leaf nodes for doclists. */
    return loadSegmentLeaves(v, iStartChild, iEndChild,
                             pTerm, nTerm, isPrefix, out);
  }
}

/* Call loadSegmentInt() to collect the doclist for pTerm/nTerm, then
** merge it over *out.  Where both contain an entry for the same docid,
** the entry read from the segment rooted at pData replaces the one
** already in *out.
*/
/* TODO(shess) Consider changing this to determine the depth of the
** leaves using either the first characters of interior nodes (when
** ==1, we're one level above the leaves), or the first character of
** the root (which will describe the height of the tree directly).
** Either feels somewhat tricky to me.
*/
/* TODO(shess) The current merge is likely to be slow for large
** doclists (though it should process from newest/smallest to
** oldest/largest, so it may not be that bad).  It might be useful to
** modify things to allow for N-way merging.  This could either be
** within a segment, with pairwise merges across segments, or across
** all segments at once.
*/
static int loadSegment(fulltext_vtab *v, const char *pData, int nData,
                       sqlite_int64 iLeavesEnd,
                       const char *pTerm, int nTerm, int isPrefix,
                       DataBuffer *out){
  DataBuffer result;
  int rc;

  /* Corrupt if segment root can't be valid. */
  if( pData==NULL || nData<1 ) return SQLITE_CORRUPT_BKPT;

  /* This code should never be called with buffered updates. */
  assert( v->nPendingData<0 );

  dataBufferInit(&result, 0);
  rc = loadSegmentInt(v, pData, nData, iLeavesEnd,
                      pTerm, nTerm, isPrefix, &result);
  if( rc==SQLITE_OK && result.nData>0 ){
    if( out->nData==0 ){
      DataBuffer tmp = *out;
      *out = result;
      result = tmp;
    }else{
      DataBuffer merged;
      DLReader readers[2];

      rc = dlrInit(&readers[0], DL_DEFAULT, out->pData, out->nData);
      if( rc==SQLITE_OK ){
        rc = dlrInit(&readers[1], DL_DEFAULT, result.pData, result.nData);
        if( rc==SQLITE_OK ){
          dataBufferInit(&merged, out->nData+result.nData);
          rc = docListMerge(&merged, readers, 2);
          dataBufferDestroy(out);
          *out = merged;
          dlrDestroy(&readers[1]);
        }
        dlrDestroy(&readers[0]);
      }
    }
  }

  dataBufferDestroy(&result);
  return rc;
}
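
/* Illustrative sketch only: this is not fts2 code, and because it sits
** inside a comment it is never compiled.  It models the merge rule that
** loadSegment() relies on, assuming doclists are reduced to arrays of
** (docid, payload) pairs sorted by ascending docid.  The names Posting
** and mergeNewerWins are hypothetical; real fts2 doclists are
** varint-encoded and merged by docListMerge().  termSelect() feeds
** segments from oldest to newest, so pOld plays the role of *out and
** pNew the segment just loaded.
**
```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for one doclist element: a docid plus an opaque
// payload (real fts2 elements carry varint-encoded position data).
typedef struct Posting {
  long long iDocid;
  int payload;
} Posting;

// Merge two docid-ascending arrays into out[], which must have room for
// nOld+nNew elements.  When a docid appears in both inputs, only the
// element from pNew (the newer segment) is kept.  Returns the number of
// elements written.
static size_t mergeNewerWins(const Posting *pOld, size_t nOld,
                             const Posting *pNew, size_t nNew,
                             Posting *out){
  size_t i = 0, j = 0, k = 0;
  while( i<nOld && j<nNew ){
    if( pOld[i].iDocid<pNew[j].iDocid ){
      out[k++] = pOld[i++];
    }else if( pOld[i].iDocid>pNew[j].iDocid ){
      out[k++] = pNew[j++];
    }else{
      // Same docid in both lists: the newer element replaces the older.
      out[k++] = pNew[j++];
      i++;
    }
  }
  // At most one of these tails is non-empty.
  while( i<nOld ) out[k++] = pOld[i++];
  while( j<nNew ) out[k++] = pNew[j++];
  assert( k<=nOld+nNew );
  return k;
}
```
** With the older list holding docids {1, 3, 5} and the newer one
** holding {3, 4}, the result is {1, 3, 4, 5}, and docid 3 carries the
** newer payload.  A two-pointer merge keeps the cost linear in
** nOld+nNew, which is also why the TODO above worries about repeatedly
** re-merging a large accumulated *out.
*/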

/* Scan the database and merge together the posting lists for the term
** into *out.
*/
static int termSelect(fulltext_vtab *v, int iColumn,
                      const char *pTerm, int nTerm, int isPrefix,
                      DocListType iType, DataBuffer *out){
  DataBuffer doclist;
  sqlite3_stmt *s;
  int rc = sql_get_statement(v, SEGDIR_SELECT_ALL_STMT, &s);
  if( rc!=SQLITE_OK ) return rc;

  /* This code should never be called with buffered updates. */
  assert( v->nPendingData<0 );

  dataBufferInit(&doclist, 0);

  /* Traverse the segments from oldest to newest so that newer doclist
  ** elements for given docids overwrite older elements.
  */
  while( (rc = sqlite3_step(s))==SQLITE_ROW ){
    const char *pData = sqlite3_column_blob(s, 2);
    const int nData = sqlite3_column_bytes(s, 2);
    const sqlite_int64 iLeavesEnd = sqlite3_column_int64(s, 1);

    /* Corrupt if we get back different types than we stored. */
    if( sqlite3_column_type(s, 1)!=SQLITE_INTEGER ||
        sqlite3_column_type(s, 2)!=SQLITE_BLOB ){
      rc = SQLITE_CORRUPT_BKPT;
      goto err;
    }

    rc = loadSegment(v, pData, nData, iLeavesEnd, pTerm, nTerm, isPrefix,
                     &doclist);
    if( rc!=SQLITE_OK ) goto err;
  }
  if( rc==SQLITE_DONE ){
    rc = SQLITE_OK;
    if( doclist.nData!=0 ){
      /* TODO(shess) The old term_select_all() code applied the column
      ** restrict as we merged segments, leading to smaller buffers.
      ** This is probably worthwhile to bring back, once the new storage
      ** system is checked in.
      */
      if( iColumn==v->nColumn ) iColumn = -1;
      rc = docListTrim(DL_DEFAULT, doclist.pData, doclist.nData,
                       iColumn, iType, out);
    }
  }

 err:
  sqlite3_reset(s);         /* So we don't leave a lock. */
  dataBufferDestroy(&doclist);
  return rc;
}

/****************************************************************/
/* Used to hold hashtable data for sorting. */
typedef struct TermData {
  const char *pTerm;
  int nTerm;
  DLCollector *pCollector;
} TermData;

/* Orders TermData elements in strcmp fashion (<0 for less-than, 0
** for equal, >0 for greater-than).
*/
static int termDataCmp(const void *av, const void *bv){
  const TermData *a = (const TermData *)av;
  const TermData *b = (const TermData *)bv;
  int n = a->nTerm<b->nTerm ? a->nTerm : b->nTerm;
  int c = memcmp(a->pTerm, b->pTerm, n);
  if( c!=0 ) return c;
  return a->nTerm-b->nTerm;
}
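
/* Illustrative sketch only (never compiled, since it sits inside this
** comment; Term, termCmp and sortTerms are hypothetical names).  Terms
** in the pending hash are length-counted and need not be NUL-terminated,
** which is why termDataCmp() runs memcmp() over the shorter length and
** breaks ties by length instead of calling strcmp().  The same rule,
** applied to slices of a single buffer:
**
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical mirror of TermData without the collector: a term is a
// pointer plus a byte count and need not be NUL-terminated.
typedef struct Term {
  const char *pTerm;
  int nTerm;
} Term;

// Same rule as termDataCmp(): compare the shared prefix bytewise, and if
// it ties, the shorter term sorts first (so "ab" comes before "abc").
static int termCmp(const void *av, const void *bv){
  const Term *a = (const Term *)av;
  const Term *b = (const Term *)bv;
  int n = a->nTerm<b->nTerm ? a->nTerm : b->nTerm;
  int c = memcmp(a->pTerm, b->pTerm, (size_t)n);
  if( c!=0 ) return c;
  return a->nTerm-b->nTerm;
}

// Sort terms in place, as writeZeroSegment() does before writing leaves.
static void sortTerms(Term *aTerm, size_t n){
  assert( aTerm!=NULL || n==0 );
  if( n>1 ) qsort(aTerm, n, sizeof(*aTerm), termCmp);
}
```
** Sorting the slices "abc", "b" and "ab" of the buffer "abcb" yields
** "ab", "abc", "b": the shared prefix ties, so the shorter term wins,
** and no slice ever needs a terminating NUL.
*/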

/* Order pTerms data by term, then write a new level 0 segment using
** LeafWriter.
*/
static int writeZeroSegment(fulltext_vtab *v, fts2Hash *pTerms){
  fts2HashElem *e;
  int idx, rc, i, n;
  TermData *pData;
  LeafWriter writer;
  DataBuffer dl;

  /* Determine the next index at level 0, merging as necessary. */
  rc = segdirNextIndex(v, 0, &idx);
  if( rc!=SQLITE_OK ) return rc;

  n = fts2HashCount(pTerms);
  pData = sqlite3_malloc(n*sizeof(TermData));
  /* sqlite3_malloc() returns NULL on failure, and also when asked for
  ** zero bytes, so only treat NULL as an error when n>0.
  */
  if( pData==NULL && n>0 ) return SQLITE_NOMEM;

  for(i = 0, e = fts2HashFirst(pTerms); e; i++, e = fts2HashNext(e)){
    assert( i<n );
    pData[i].pTerm = fts2HashKey(e);
    pData[i].nTerm = fts2HashKeysize(e);
    pData[i].pCollector = fts2HashData(e);
  }
  assert( i==n );

  /* TODO(shess) Should we allow user-defined collation sequences
  ** here?  I think we only need that once we support prefix searches.
  */
  if( n>1 ) qsort(pData, n, sizeof(*pData), termDataCmp);

  /* TODO(shess) Refactor so that we can write directly to the segment
  ** DataBuffer, as happens for segment merges.
  */
  leafWriterInit(0, idx, &writer);
  dataBufferInit(&dl, 0);
  for(i=0; i<n; i++){
    dataBufferReset(&dl);
    dlcAddDoclist(pData[i].pCollector, &dl);
    rc = leafWriterStep(v, &writer,
                        pData[i].pTerm, pData[i].nTerm, dl.pData, dl.nData);
    if( rc!=SQLITE_OK ) goto err;
  }
  rc = leafWriterFinalize(v, &writer);

 err:
  dataBufferDestroy(&dl);
  sqlite3_free(pData);
  leafWriterDestroy(&writer);
  return rc;
}

/* If pendingTerms has data, free it. */
static int clearPendingTerms(fulltext_vtab *v){
  if( v->nPendingData>=0 ){
    fts2HashElem *e;
    for(e=fts2HashFirst(&v->pendingTerms); e; e=fts2HashNext(e)){
      dlcDelete(fts2HashData(e));
    }
    fts2HashClear(&v->pendingTerms);
    v->nPendingData = -1;
  }
  return SQLITE_OK;
}

/* If pendingTerms has data, flush it to a level-zero segment, and
** free it.
*/
static int flushPendingTerms(fulltext_vtab *v){
  if( v->nPendingData>=0 ){
    int rc = writeZeroSegment(v, &v->pendingTerms);
    if( rc==SQLITE_OK ) clearPendingTerms(v);
    return rc;
  }
  return SQLITE_OK;
}

/* If pendingTerms is "too big", or docid is out of order, flush it.
** Regardless, be certain that pendingTerms is initialized for use.
*/
static int initPendingTerms(fulltext_vtab *v, sqlite_int64 iDocid){
  /* TODO(shess) Explore whether partially flushing the buffer on
  ** forced-flush would provide better performance.  I suspect that if
  ** we ordered the doclists by size and flushed the largest until the
  ** buffer was half empty, that would let the less frequent terms
  ** generate longer doclists.
  */
  if( iDocid<=v->iPrevDocid || v->nPendingData>kPendingThreshold ){
    int rc = flushPendingTerms(v);
    if( rc!=SQLITE_OK ) return rc;
  }
  if( v->nPendingData<0 ){
    fts2HashInit(&v->pendingTerms, FTS2_HASH_STRING, 1);
    v->nPendingData = 0;
  }
  v->iPrevDocid = iDocid;
  return SQLITE_OK;
}
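
/* Illustrative sketch only (never compiled, since it sits inside this
** comment; PendingModel, SKETCH_THRESHOLD and sketchInitPending are
** hypothetical): a state model of the policy in initPendingTerms().
** nPending<0 means no buffer exists, as with nPendingData.  A flush is
** forced when the incoming docid does not increase or the buffer has
** grown past the threshold.  Presumably the docid check exists because
** pending doclists are built in ascending docid order, so a smaller
** docid cannot simply be appended.
**
```c
#include <assert.h>

// Arbitrary threshold for the example; fts2's real limit is
// kPendingThreshold, whose value is not shown in this section.
#define SKETCH_THRESHOLD 100

typedef struct PendingModel {
  int nPending;          // bytes buffered, or -1 when no buffer exists
  long long iPrevDocid;  // last docid passed to sketchInitPending()
  int nFlushes;          // number of simulated level-0 segment writes
} PendingModel;

// Mirrors initPendingTerms(): flush if the docid does not increase or the
// buffer is over the threshold, then make sure a buffer exists.
static void sketchInitPending(PendingModel *p, long long iDocid){
  if( iDocid<=p->iPrevDocid || p->nPending>SKETCH_THRESHOLD ){
    // flushPendingTerms() does nothing when there is no buffer.
    if( p->nPending>=0 ){
      p->nFlushes++;     // stands in for writeZeroSegment()
      p->nPending = -1;  // stands in for clearPendingTerms()
    }
  }
  if( p->nPending<0 ) p->nPending = 0;  // fresh, empty buffer
  p->iPrevDocid = iDocid;
  assert( p->nPending>=0 );
}
```
** Starting from {-1, 0, 0}, docids 5 then 6 cause no flush; a later
** docid 3 forces one because it does not exceed the previous docid,
** and a buffer holding more than SKETCH_THRESHOLD bytes forces another
** even when the next docid is ascending.
*/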
62485821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
62495821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)/* This function implements the xUpdate callback; it is the top-level entry
62505821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles) * point for inserting, deleting or updating a row in a full-text table. */
62515821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)static int fulltextUpdate(sqlite3_vtab *pVtab, int nArg, sqlite3_value **ppArg,
62525821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                   sqlite_int64 *pRowid){
62535821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  fulltext_vtab *v = (fulltext_vtab *) pVtab;
62545821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int rc;
62555821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
62565821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  TRACE(("FTS2 Update %p\n", pVtab));
62575821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
62585821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( nArg<2 ){
62595821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    rc = index_delete(v, sqlite3_value_int64(ppArg[0]));
62605821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( rc==SQLITE_OK ){
62615821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      /* If we just deleted the last row in the table, clear out the
62625821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      ** index data.
62635821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      */
62645821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      rc = content_exists(v);
62655821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      if( rc==SQLITE_ROW ){
62665821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)        rc = SQLITE_OK;
62675821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      }else if( rc==SQLITE_DONE ){
62685821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)        /* Clear the pending terms so we don't flush a useless level-0
62695821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)        ** segment when the transaction closes.
62705821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)        */
62715821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)        rc = clearPendingTerms(v);
62725821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)        if( rc==SQLITE_OK ){
62735821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)          rc = segdir_delete_all(v);
62745821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)        }
62755821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      }
62765821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    }
62775821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  } else if( sqlite3_value_type(ppArg[0]) != SQLITE_NULL ){
62785821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    /* An update:
62795821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)     * ppArg[0] = old rowid
62805821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)     * ppArg[1] = new rowid
62815821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)     * ppArg[2..2+v->nColumn-1] = values
     * ppArg[2+v->nColumn] = value for magic column (we ignore this)
     */
    sqlite_int64 rowid = sqlite3_value_int64(ppArg[0]);
    if( sqlite3_value_type(ppArg[1]) != SQLITE_INTEGER ||
      sqlite3_value_int64(ppArg[1]) != rowid ){
      rc = SQLITE_ERROR;  /* we don't allow changing the rowid */
    } else {
      assert( nArg==2+v->nColumn+1);
      rc = index_update(v, rowid, &ppArg[2]);
    }
  } else {
    /* An insert:
     * ppArg[1] = requested rowid
     * ppArg[2..2+v->nColumn-1] = values
     * ppArg[2+v->nColumn] = value for magic column (we ignore this)
     */
    assert( nArg==2+v->nColumn+1);
    rc = index_insert(v, ppArg[1], &ppArg[2], pRowid);
  }

  return rc;
}
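
/* Illustrative sketch only (hypothetical helper, not called by fts2):
** a self-contained model of how the xUpdate() arguments handled above
** are classified.  Following the SQLite virtual-table convention, a
** single argument is a DELETE of rowid ppArg[0], a NULL ppArg[0] is
** an INSERT, and anything else is an UPDATE, which this module
** rejects when the new rowid is not an integer or differs from the
** old one.  The enum, the function, and the use of plain integers in
** place of sqlite3_value are assumptions made for illustration.
*/
#include <assert.h>

typedef enum ExampleUpdateKind {
  EXAMPLE_DELETE,         /* nArg<2: delete row ppArg[0] */
  EXAMPLE_INSERT,         /* ppArg[0] is NULL: insert a new row */
  EXAMPLE_UPDATE,         /* old and new rowid match: update in place */
  EXAMPLE_REJECT_REKEY    /* new rowid differs or is not an integer */
} ExampleUpdateKind;

/* hasOldRowid models "ppArg[0] is not SQL NULL"; newIsInteger models
** sqlite3_value_type(ppArg[1])==SQLITE_INTEGER.
*/
static ExampleUpdateKind exampleClassifyUpdate(int nArg,
                                               int hasOldRowid,
                                               long long oldRowid,
                                               int newIsInteger,
                                               long long newRowid){
  if( nArg<2 ) return EXAMPLE_DELETE;
  if( !hasOldRowid ) return EXAMPLE_INSERT;
  if( !newIsInteger || newRowid!=oldRowid ) return EXAMPLE_REJECT_REKEY;
  return EXAMPLE_UPDATE;
}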

static int fulltextSync(sqlite3_vtab *pVtab){
  TRACE(("FTS2 xSync()\n"));
  return flushPendingTerms((fulltext_vtab *)pVtab);
}

static int fulltextBegin(sqlite3_vtab *pVtab){
  fulltext_vtab *v = (fulltext_vtab *) pVtab;
  TRACE(("FTS2 xBegin()\n"));

  /* Any buffered updates should have been cleared by the previous
  ** transaction.
  */
  assert( v->nPendingData<0 );
  return clearPendingTerms(v);
}

static int fulltextCommit(sqlite3_vtab *pVtab){
  fulltext_vtab *v = (fulltext_vtab *) pVtab;
  TRACE(("FTS2 xCommit()\n"));

  /* Buffered updates should have been cleared by fulltextSync(). */
  assert( v->nPendingData<0 );
  return clearPendingTerms(v);
}

static int fulltextRollback(sqlite3_vtab *pVtab){
  TRACE(("FTS2 xRollback()\n"));
  return clearPendingTerms((fulltext_vtab *)pVtab);
}
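
/* Illustrative sketch only (hypothetical types and helpers): a model
** of the pending-terms life cycle that the transaction hooks above
** appear to rely on.  A negative nPendingData stands for "nothing
** buffered"; updates buffer terms, and xSync flushes them to a new
** segment, so xBegin and xCommit can assert that the buffer is empty,
** while xRollback simply discards it.  ExamplePending and its helpers
** are assumptions, not part of fts2.
*/
#include <assert.h>

typedef struct ExamplePending {
  int nPendingData;   /* -1 when empty, else bytes buffered */
  int nFlushed;       /* number of flushes that wrote a segment */
} ExamplePending;

static void examplePendingClear(ExamplePending *p){
  p->nPendingData = -1;
}

/* Models an xUpdate call that buffers nBytes of term data. */
static void examplePendingBuffer(ExamplePending *p, int nBytes){
  if( p->nPendingData<0 ) p->nPendingData = 0;
  p->nPendingData += nBytes;
}

/* Models xSync: write any buffered terms as a segment, then clear. */
static void examplePendingSync(ExamplePending *p){
  if( p->nPendingData>=0 ){
    p->nFlushed++;
    examplePendingClear(p);
  }
}

static void examplePendingBegin(ExamplePending *p){
  assert( p->nPendingData<0 );  /* previous transaction left it empty */
  examplePendingClear(p);
}

static void examplePendingCommit(ExamplePending *p){
  assert( p->nPendingData<0 );  /* xSync already flushed */
  examplePendingClear(p);
}

static void examplePendingRollback(ExamplePending *p){
  examplePendingClear(p);       /* buffered terms are simply dropped */
}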

/*
** Implementation of the snippet() function for FTS2
*/
static void snippetFunc(
  sqlite3_context *pContext,
  int argc,
  sqlite3_value **argv
){
  fulltext_cursor *pCursor;
  if( argc<1 ) return;
  if( sqlite3_value_type(argv[0])!=SQLITE_BLOB ||
      sqlite3_value_bytes(argv[0])!=sizeof(pCursor) ){
    sqlite3_result_error(pContext, "illegal first argument to snippet",-1);
  }else{
    const char *zStart = "<b>";
    const char *zEnd = "</b>";
    const char *zEllipsis = "<b>...</b>";
    memcpy(&pCursor, sqlite3_value_blob(argv[0]), sizeof(pCursor));
    if( argc>=2 ){
      zStart = (const char*)sqlite3_value_text(argv[1]);
      if( argc>=3 ){
        zEnd = (const char*)sqlite3_value_text(argv[2]);
        if( argc>=4 ){
          zEllipsis = (const char*)sqlite3_value_text(argv[3]);
        }
      }
    }
    snippetAllOffsets(pCursor);
    snippetText(pCursor, zStart, zEnd, zEllipsis);
    sqlite3_result_text(pContext, pCursor->snippet.zSnippet,
                        pCursor->snippet.nSnippet, SQLITE_STATIC);
  }
}

/*
** Implementation of the offsets() function for FTS2
*/
static void snippetOffsetsFunc(
  sqlite3_context *pContext,
  int argc,
  sqlite3_value **argv
){
  fulltext_cursor *pCursor;
  if( argc<1 ) return;
  if( sqlite3_value_type(argv[0])!=SQLITE_BLOB ||
      sqlite3_value_bytes(argv[0])!=sizeof(pCursor) ){
    sqlite3_result_error(pContext, "illegal first argument to offsets",-1);
  }else{
    memcpy(&pCursor, sqlite3_value_blob(argv[0]), sizeof(pCursor));
    snippetAllOffsets(pCursor);
    snippetOffsetText(&pCursor->snippet);
    sqlite3_result_text(pContext,
                        pCursor->snippet.zOffset, pCursor->snippet.nOffset,
                        SQLITE_STATIC);
  }
}
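
/* Illustrative sketch only (hypothetical helpers): snippet() and
** offsets() receive the cursor pointer packed into a BLOB, which is
** why they check that the blob is exactly sizeof(pCursor) bytes before
** copying it back with memcpy().  The pack/unpack pair below models
** that round trip, assuming producer and consumer share one process
** and one pointer size; ExampleCursor is a stand-in for
** fulltext_cursor.
*/
#include <assert.h>
#include <string.h>

typedef struct ExampleCursor { int iRow; } ExampleCursor;

/* Copy the pointer's bytes into buf; returns bytes written, or 0 if
** buf is too small.
*/
static int examplePackCursor(ExampleCursor *pCursor, unsigned char *buf,
                             int nBuf){
  if( nBuf<(int)sizeof(pCursor) ) return 0;
  memcpy(buf, &pCursor, sizeof(pCursor));
  return (int)sizeof(pCursor);
}

/* Recover the pointer, rejecting blobs of the wrong size in the same
** way as the argument checks above.  Returns NULL on mismatch.
*/
static ExampleCursor *exampleUnpackCursor(const unsigned char *blob,
                                          int nBlob){
  ExampleCursor *pCursor;
  if( blob==NULL || nBlob!=(int)sizeof(pCursor) ) return NULL;
  memcpy(&pCursor, blob, sizeof(pCursor));
  return pCursor;
}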

/* OptLeavesReader is nearly identical to LeavesReader, except that
** where LeavesReader is geared towards the merging of complete
** segment levels (with exactly MERGE_COUNT segments), OptLeavesReader
** is geared towards implementation of the optimize() function, and
** can merge all segments simultaneously.  This version may be
** somewhat less efficient than LeavesReader because it merges into an
** accumulator rather than doing an N-way merge, but since segment
** size grows exponentially (so segment count grows logarithmically)
** this is probably not an immediate problem.
*/
/* TODO(shess): Prove that assertion, or extend the merge code to
** merge in tree fashion (like the prefix-searching code does).
*/
/* TODO(shess): OptLeavesReader and LeavesReader could probably be
** merged with little or no loss of performance for LeavesReader.  The
** merged code would need to handle >MERGE_COUNT segments, and would
** also need to be able to optionally optimize away deletes.
*/
typedef struct OptLeavesReader {
  /* Segment number, to order readers by age. */
  int segment;
  LeavesReader reader;
} OptLeavesReader;

static int optLeavesReaderAtEnd(OptLeavesReader *pReader){
  return leavesReaderAtEnd(&pReader->reader);
}
static int optLeavesReaderTermBytes(OptLeavesReader *pReader){
  return leavesReaderTermBytes(&pReader->reader);
}
static const char *optLeavesReaderData(OptLeavesReader *pReader){
  return leavesReaderData(&pReader->reader);
}
static int optLeavesReaderDataBytes(OptLeavesReader *pReader){
  return leavesReaderDataBytes(&pReader->reader);
}
static const char *optLeavesReaderTerm(OptLeavesReader *pReader){
  return leavesReaderTerm(&pReader->reader);
}
static int optLeavesReaderStep(fulltext_vtab *v, OptLeavesReader *pReader){
  return leavesReaderStep(v, &pReader->reader);
}
static int optLeavesReaderTermCmp(OptLeavesReader *lr1, OptLeavesReader *lr2){
  return leavesReaderTermCmp(&lr1->reader, &lr2->reader);
}
/* Order by term ascending, segment ascending (oldest to newest), with
** exhausted readers sorted to the end.
*/
static int optLeavesReaderCmp(OptLeavesReader *lr1, OptLeavesReader *lr2){
  int c = optLeavesReaderTermCmp(lr1, lr2);
  if( c!=0 ) return c;
  return lr1->segment-lr2->segment;
}
/* Bubble pLr[0] to its appropriate place in pLr[1..nLr-1].  Assumes
** that pLr[1..nLr-1] is already sorted.
*/
static void optLeavesReaderReorder(OptLeavesReader *pLr, int nLr){
  while( nLr>1 && optLeavesReaderCmp(pLr, pLr+1)>0 ){
    OptLeavesReader tmp = pLr[0];
    pLr[0] = pLr[1];
    pLr[1] = tmp;
    nLr--;
    pLr++;
  }
}
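
/* Illustrative sketch only (hypothetical types): the ordering and the
** one-element bubble used by optLeavesReaderCmp() and
** optLeavesReaderReorder(), rebuilt over a toy reader whose term is a
** NUL-terminated string (NULL meaning exhausted).  strcmp() stands in
** for the real length-aware term comparison.  Bubbling elements into
** place starting from the back, as the "Order the readers" loop in
** optimizeInternal() does, yields a fully sorted array.
*/
#include <assert.h>
#include <string.h>

typedef struct ExampleReader {
  const char *term;   /* NULL when the reader is exhausted */
  int segment;        /* lower is older */
} ExampleReader;

/* Term order, with exhausted readers after all live ones. */
static int exampleTermCmp(const ExampleReader *a, const ExampleReader *b){
  if( a->term==NULL ) return b->term==NULL ? 0 : 1;
  if( b->term==NULL ) return -1;
  return strcmp(a->term, b->term);
}

/* Term ascending, then segment ascending (oldest to newest). */
static int exampleReaderCmp(const ExampleReader *a, const ExampleReader *b){
  int c = exampleTermCmp(a, b);
  if( c!=0 ) return c;
  return a->segment - b->segment;
}

/* Bubble p[0] into place, assuming p[1..n-1] is already sorted. */
static void exampleReorder(ExampleReader *p, int n){
  while( n>1 && exampleReaderCmp(p, p+1)>0 ){
    ExampleReader tmp = p[0];
    p[0] = p[1];
    p[1] = tmp;
    n--;
    p++;
  }
}

/* Sort the whole array by bubbling elements in from the back. */
static void exampleSortReaders(ExampleReader *p, int n){
  int i = n;
  while( i-- > 0 ) exampleReorder(&p[i], n-i);
}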

/* optimize() helper function.  Put the readers in order and iterate
** through them, merging doclists for matching terms into pWriter.
** Returns SQLITE_OK on success, or the SQLite error code which
** prevented success.
*/
static int optimizeInternal(fulltext_vtab *v,
                            OptLeavesReader *readers, int nReaders,
                            LeafWriter *pWriter){
  int i, rc = SQLITE_OK;
  DataBuffer doclist, merged, tmp;
  const char *pData;

  /* Order the readers. */
  i = nReaders;
  while( i-- > 0 ){
    optLeavesReaderReorder(&readers[i], nReaders-i);
  }

  dataBufferInit(&doclist, LEAF_MAX);
  dataBufferInit(&merged, LEAF_MAX);

  /* Exhausted readers bubble to the end, so when the first reader is
  ** at eof, all are at eof.
  */
  while( !optLeavesReaderAtEnd(&readers[0]) ){

    /* Figure out how many readers share the next term. */
    for(i=1; i<nReaders && !optLeavesReaderAtEnd(&readers[i]); i++){
      if( 0!=optLeavesReaderTermCmp(&readers[0], &readers[i]) ) break;
    }

    pData = optLeavesReaderData(&readers[0]);
    if( pData==NULL ){
      rc = SQLITE_CORRUPT_BKPT;
      break;
    }

    /* Special-case for no merge. */
    if( i==1 ){
      /* Trim deletions from the doclist. */
      dataBufferReset(&merged);
      rc = docListTrim(DL_DEFAULT,
                       pData,
                       optLeavesReaderDataBytes(&readers[0]),
                       -1, DL_DEFAULT, &merged);
      if( rc!=SQLITE_OK ) break;
    }else{
      DLReader dlReaders[MERGE_COUNT];
      int iReader, nDlReaders;  /* nDlReaders avoids shadowing nReaders */

      /* Prime the pipeline with the first reader's doclist.  After
      ** one pass index 0 will reference the accumulated doclist.
      */
      rc = dlrInit(&dlReaders[0], DL_DEFAULT,
                   pData,
                   optLeavesReaderDataBytes(&readers[0]));
      if( rc!=SQLITE_OK ) break;
      iReader = 1;

      assert( iReader<i );  /* Must execute the loop at least once. */
      while( iReader<i ){
        /* Merge up to MERGE_COUNT inputs per pass: the accumulator in
        ** slot 0 plus up to MERGE_COUNT-1 new doclists.
        */
        for( nDlReaders=1; iReader<i && nDlReaders<MERGE_COUNT;
             iReader++, nDlReaders++ ){
          pData = optLeavesReaderData(&readers[iReader]);
          if( pData==NULL ){
            rc = SQLITE_CORRUPT_BKPT;
            break;
          }
          rc = dlrInit(&dlReaders[nDlReaders], DL_DEFAULT,
                       pData,
                       optLeavesReaderDataBytes(&readers[iReader]));
          if( rc!=SQLITE_OK ) break;
        }

        /* Merge doclists and swap result into accumulator. */
        if( rc==SQLITE_OK ){
          dataBufferReset(&merged);
          rc = docListMerge(&merged, dlReaders, nDlReaders);
          tmp = merged;
          merged = doclist;
          doclist = tmp;
        }

        while( nDlReaders-- > 0 ){
          dlrDestroy(&dlReaders[nDlReaders]);
        }

        if( rc!=SQLITE_OK ) goto err;

        /* Load the accumulated doclist into reader 0 for the next pass. */
        rc = dlrInit(&dlReaders[0], DL_DEFAULT, doclist.pData, doclist.nData);
        if( rc!=SQLITE_OK ) goto err;
      }

      /* Destroy reader that was left in the pipeline. */
      dlrDestroy(&dlReaders[0]);

      /* Trim deletions from the doclist. */
      dataBufferReset(&merged);
      rc = docListTrim(DL_DEFAULT, doclist.pData, doclist.nData,
                       -1, DL_DEFAULT, &merged);
      if( rc!=SQLITE_OK ) goto err;
    }

    /* Only pass doclists with hits (skip if all hits deleted). */
    if( merged.nData>0 ){
      rc = leafWriterStep(v, pWriter,
                          optLeavesReaderTerm(&readers[0]),
                          optLeavesReaderTermBytes(&readers[0]),
                          merged.pData, merged.nData);
      if( rc!=SQLITE_OK ) goto err;
    }

    /* Step merged readers to next term and reorder. */
    while( i-- > 0 ){
      rc = optLeavesReaderStep(v, &readers[i]);
      if( rc!=SQLITE_OK ) goto err;

      optLeavesReaderReorder(&readers[i], nReaders-i);
    }
  }

 err:
  dataBufferDestroy(&doclist);
  dataBufferDestroy(&merged);
  return rc;
}
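
/* Illustrative sketch only (hypothetical types and constants): the
** accumulator strategy of optimizeInternal(), modelled over sorted
** arrays of docids.  Slot 0 of every pass holds the running result and
** up to EXAMPLE_MERGE_COUNT-1 new lists join it, so N inputs need
** ceil((N-1)/(EXAMPLE_MERGE_COUNT-1)) passes.  Pairwise set unions
** stand in for docListMerge(); real doclists also carry positions and
** deletion markers, which this model ignores.
*/
#include <assert.h>

#define EXAMPLE_MERGE_COUNT 4   /* hypothetical; fts2 uses MERGE_COUNT */
#define EXAMPLE_MAX_DOCS 64

typedef struct ExampleList {
  int n;
  int a[EXAMPLE_MAX_DOCS];      /* sorted, duplicate-free docids */
} ExampleList;

/* Union of two sorted lists into *pOut (pOut must not alias x or y). */
static void exampleUnion(const ExampleList *x, const ExampleList *y,
                         ExampleList *pOut){
  int i = 0, j = 0, n = 0;
  while( i<x->n || j<y->n ){
    int v;
    if( j>=y->n || (i<x->n && x->a[i]<y->a[j]) ){
      v = x->a[i++];
    }else if( i>=x->n || y->a[j]<x->a[i] ){
      v = y->a[j++];
    }else{
      v = x->a[i++];            /* same docid in both lists */
      j++;
    }
    assert( n<EXAMPLE_MAX_DOCS );
    pOut->a[n++] = v;
  }
  pOut->n = n;
}

/* Fold lists[1..nLists-1] into an accumulator seeded with lists[0],
** at most EXAMPLE_MERGE_COUNT inputs per pass.  Returns pass count.
*/
static int exampleAccumulate(const ExampleList *lists, int nLists,
                             ExampleList *pAcc){
  int iList = 1, nPass = 0;
  ExampleList tmp;
  *pAcc = lists[0];
  while( iList<nLists ){
    int nIn;
    for( nIn=1; iList<nLists && nIn<EXAMPLE_MERGE_COUNT; iList++, nIn++ ){
      exampleUnion(pAcc, &lists[iList], &tmp);
      *pAcc = tmp;
    }
    nPass++;
  }
  return nPass;
}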

/* Implement optimize() function for FTS2.  optimize(t) merges all
** segments in the fts index into a single segment.  't' is the magic
** table-named column.
*/
static void optimizeFunc(sqlite3_context *pContext,
                         int argc, sqlite3_value **argv){
  fulltext_cursor *pCursor;
  if( argc>1 ){
    sqlite3_result_error(pContext, "excess arguments to optimize()",-1);
  }else if( sqlite3_value_type(argv[0])!=SQLITE_BLOB ||
            sqlite3_value_bytes(argv[0])!=sizeof(pCursor) ){
    sqlite3_result_error(pContext, "illegal first argument to optimize",-1);
  }else{
    fulltext_vtab *v;
    int i, rc, iMaxLevel;
    OptLeavesReader *readers;
    int nReaders;
    LeafWriter writer;
    sqlite3_stmt *s;

    memcpy(&pCursor, sqlite3_value_blob(argv[0]), sizeof(pCursor));
    v = cursor_vtab(pCursor);

    /* Flush any buffered updates before optimizing. */
    rc = flushPendingTerms(v);
    if( rc!=SQLITE_OK ) goto err;

    rc = segdir_count(v, &nReaders, &iMaxLevel);
    if( rc!=SQLITE_OK ) goto err;
    if( nReaders==0 || nReaders==1 ){
      sqlite3_result_text(pContext, "Index already optimal", -1,
                          SQLITE_STATIC);
      return;
    }

66225821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    rc = sql_get_statement(v, SEGDIR_SELECT_ALL_STMT, &s);
66235821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( rc!=SQLITE_OK ) goto err;
66245821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
66255821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    readers = sqlite3_malloc(nReaders*sizeof(readers[0]));
66265821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( readers==NULL ) goto err;

    /* Note that there will already be a segment at this position
    ** until we call segdir_delete() on iMaxLevel.
    */
    leafWriterInit(iMaxLevel, 0, &writer);

    i = 0;
    while( (rc = sqlite3_step(s))==SQLITE_ROW ){
      sqlite_int64 iStart = sqlite3_column_int64(s, 0);
      sqlite_int64 iEnd = sqlite3_column_int64(s, 1);
      const char *pRootData = sqlite3_column_blob(s, 2);
      int nRootData = sqlite3_column_bytes(s, 2);

      /* Corrupt if we get back different types than we stored. */
      if( sqlite3_column_type(s, 0)!=SQLITE_INTEGER ||
          sqlite3_column_type(s, 1)!=SQLITE_INTEGER ||
          sqlite3_column_type(s, 2)!=SQLITE_BLOB ){
        rc = SQLITE_CORRUPT_BKPT;
        break;
      }

      assert( i<nReaders );
      rc = leavesReaderInit(v, -1, iStart, iEnd, pRootData, nRootData,
                            &readers[i].reader);
      if( rc!=SQLITE_OK ) break;

      readers[i].segment = i;
      i++;
    }

    /* If we managed to successfully read them all, optimize them. */
    if( rc==SQLITE_DONE ){
      assert( i==nReaders );
      rc = optimizeInternal(v, readers, nReaders, &writer);
    }else{
      sqlite3_reset(s);      /* So we don't leave a lock. */
    }

    while( i-- > 0 ){
      leavesReaderDestroy(&readers[i].reader);
    }
    sqlite3_free(readers);

    /* If we've successfully gotten to here, delete the old segments
    ** and flush the interior structure of the new segment.
    */
    if( rc==SQLITE_OK ){
      for( i=0; i<=iMaxLevel; i++ ){
        rc = segdir_delete(v, i);
        if( rc!=SQLITE_OK ) break;
      }

      if( rc==SQLITE_OK ) rc = leafWriterFinalize(v, &writer);
    }

    leafWriterDestroy(&writer);

    if( rc!=SQLITE_OK ) goto err;

    sqlite3_result_text(pContext, "Index optimized", -1, SQLITE_STATIC);
    return;

    /* TODO(shess): Error-handling needs to be improved along the
    ** lines of the dump_ functions.
    */
 err:
    {
      char buf[512];
      sqlite3_snprintf(sizeof(buf), buf, "Error in optimize: %s",
                       sqlite3_errmsg(sqlite3_context_db_handle(pContext)));
      sqlite3_result_error(pContext, buf, -1);
    }
  }
}
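
/* Illustrative sketch (hypothetical helper, not part of fts2): the
** optimize code above opens a LeavesReader on every segment and hands
** them all to optimizeInternal(), which writes one merged segment.  At
** the term level this resembles a k-way merge of sorted term lists in
** which a term present in several segments is emitted once.  This
** sketch assumes NUL-terminated terms compared with strcmp() and
** ignores doclists; the real merge also combines the doclists of
** equal terms and compares terms by bytes and length.
*/
#include <string.h>

typedef struct ExampleSegment {
  const char **azTerm;   /* Terms in ascending strcmp() order */
  int nTerm;             /* Number of entries in azTerm */
  int iNext;             /* Index of the next unread term */
} ExampleSegment;

/* Merge the terms of aSeg[0..nSeg-1] into azOut.  Returns the number
** of distinct terms written, or -1 if azOut (nOutMax slots) is too
** small.
*/
static int exampleMergeSegments(ExampleSegment *aSeg, int nSeg,
                                const char **azOut, int nOutMax){
  int nOut = 0;
  for(;;){
    const char *zMin = NULL;
    int i;

    /* Find the smallest unread term across all segments. */
    for(i=0; i<nSeg; i++){
      if( aSeg[i].iNext<aSeg[i].nTerm ){
        const char *z = aSeg[i].azTerm[aSeg[i].iNext];
        if( zMin==NULL || strcmp(z, zMin)<0 ) zMin = z;
      }
    }
    if( zMin==NULL ) break;            /* Every segment is exhausted. */
    if( nOut>=nOutMax ) return -1;     /* Output array is too small. */
    azOut[nOut++] = zMin;

    /* Advance every segment positioned on zMin so duplicates collapse. */
    for(i=0; i<nSeg; i++){
      if( aSeg[i].iNext<aSeg[i].nTerm &&
          strcmp(aSeg[i].azTerm[aSeg[i].iNext], zMin)==0 ){
        aSeg[i].iNext++;
      }
    }
  }
  return nOut;
}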

#ifdef SQLITE_TEST
/* Generate an error of the form "<prefix>: <msg>".  If msg is NULL,
** pull the error from the context's db handle.
*/
static void generateError(sqlite3_context *pContext,
                          const char *prefix, const char *msg){
  char buf[512];
  if( msg==NULL ) msg = sqlite3_errmsg(sqlite3_context_db_handle(pContext));
  sqlite3_snprintf(sizeof(buf), buf, "%s: %s", prefix, msg);
  sqlite3_result_error(pContext, buf, -1);
}

/* Helper function to collect the set of terms in the segment into
** pTerms.  The segment is defined by the leaf nodes between
** iStartBlockid and iEndBlockid, inclusive, or by the contents of
** pRootData if iStartBlockid is 0 (in which case the entire segment
** fit in a leaf).
*/
static int collectSegmentTerms(fulltext_vtab *v, sqlite3_stmt *s,
                               fts2Hash *pTerms){
  const sqlite_int64 iStartBlockid = sqlite3_column_int64(s, 0);
  const sqlite_int64 iEndBlockid = sqlite3_column_int64(s, 1);
  const char *pRootData = sqlite3_column_blob(s, 2);
  const int nRootData = sqlite3_column_bytes(s, 2);
  int rc;
  LeavesReader reader;

  /* Corrupt if we get back different types than we stored. */
  if( sqlite3_column_type(s, 0)!=SQLITE_INTEGER ||
      sqlite3_column_type(s, 1)!=SQLITE_INTEGER ||
      sqlite3_column_type(s, 2)!=SQLITE_BLOB ){
    return SQLITE_CORRUPT_BKPT;
  }

  rc = leavesReaderInit(v, 0, iStartBlockid, iEndBlockid,
                        pRootData, nRootData, &reader);
  if( rc!=SQLITE_OK ) return rc;

  while( rc==SQLITE_OK && !leavesReaderAtEnd(&reader) ){
    const char *pTerm = leavesReaderTerm(&reader);
    const int nTerm = leavesReaderTermBytes(&reader);
    /* The hash data pointer doubles as an occurrence counter: a term
    ** not seen before reads back as NULL, and adding one keeps the
    ** stored value non-NULL.
    */
    void *oldValue = sqlite3Fts2HashFind(pTerms, pTerm, nTerm);
    void *newValue = (void *)((char *)oldValue+1);

    /* From the comment before sqlite3Fts2HashInsert in fts2_hash.c,
    ** the data value passed is returned in case of malloc failure.
    */
    if( newValue==sqlite3Fts2HashInsert(pTerms, pTerm, nTerm, newValue) ){
      rc = SQLITE_NOMEM;
    }else{
      rc = leavesReaderStep(v, &reader);
    }
  }

  leavesReaderDestroy(&reader);
  return rc;
}
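
/* Illustrative sketch (hypothetical helpers, not part of fts2): the
** loop in collectSegmentTerms() allocates nothing per term; it uses
** the hash element's void* data slot as a small counter.  A term seen
** for the first time reads back as NULL, and storing "old plus one"
** guarantees the inserted value is never NULL.  The helpers below
** express the same idea with uintptr_t arithmetic instead of char*
** arithmetic, assuming (as on common platforms) that NULL converts to
** the integer 0.
*/
#include <stddef.h>
#include <stdint.h>

/* Return the counter encoded in data slot p (NULL reads as 0). */
static unsigned long exampleCounterValue(void *p){
  return (unsigned long)(uintptr_t)p;
}

/* Return a data slot encoding the counter held in pOld plus one. */
static void *exampleCounterIncrement(void *pOld){
  return (void *)((uintptr_t)pOld + 1);
}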

/* Helper function to build the result string for dump_terms(). */
static int generateTermsResult(sqlite3_context *pContext, fts2Hash *pTerms){
  int iTerm, nTerms, nResultBytes, iByte;
  char *result;
  TermData *pData;
  fts2HashElem *e;

  /* Iterate pTerms to generate an array of terms in pData for
  ** sorting.
  */
  nTerms = fts2HashCount(pTerms);
  assert( nTerms>0 );
  pData = sqlite3_malloc(nTerms*sizeof(TermData));
  if( pData==NULL ) return SQLITE_NOMEM;

  nResultBytes = 0;
  for(iTerm = 0, e = fts2HashFirst(pTerms); e; iTerm++, e = fts2HashNext(e)){
    nResultBytes += fts2HashKeysize(e)+1;   /* Term plus trailing space */
    assert( iTerm<nTerms );
    pData[iTerm].pTerm = fts2HashKey(e);
    pData[iTerm].nTerm = fts2HashKeysize(e);
    pData[iTerm].pCollector = fts2HashData(e);  /* unused */
  }
  assert( iTerm==nTerms );

  assert( nResultBytes>0 );   /* nTerms>0, so nResultBytes must be, too. */
  result = sqlite3_malloc(nResultBytes);
  if( result==NULL ){
    sqlite3_free(pData);
    return SQLITE_NOMEM;
  }

  if( nTerms>1 ) qsort(pData, nTerms, sizeof(*pData), termDataCmp);

  /* Read the terms in order to build the result. */
  iByte = 0;
  for(iTerm=0; iTerm<nTerms; ++iTerm){
    memcpy(result+iByte, pData[iTerm].pTerm, pData[iTerm].nTerm);
    iByte += pData[iTerm].nTerm;
    result[iByte++] = ' ';
  }
  assert( iByte==nResultBytes );
  assert( result[nResultBytes-1]==' ' );
  result[nResultBytes-1] = '\0';

  /* Passes ownership of result to pContext. */
  sqlite3_result_text(pContext, result, nResultBytes-1, sqlite3_free);
  sqlite3_free(pData);
  return SQLITE_OK;
}
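
/* Illustrative sketch (hypothetical helper, not part of fts2):
** generateTermsResult() above sizes its buffer as the sum of
** (term length + 1), so every term is followed by one space, and then
** overwrites the final space with the terminator.  The same sizing
** trick for NUL-terminated terms, sorted with strcmp() instead of
** termDataCmp() and allocated with malloc() instead of
** sqlite3_malloc(), looks like this.  The caller frees the result.
*/
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of const char* strings. */
static int exampleStrPtrCmp(const void *pA, const void *pB){
  return strcmp(*(const char *const *)pA, *(const char *const *)pB);
}

/* Sort azTerm[0..nTerm-1] in place and return the terms joined by
** single spaces in a malloc()ed string, or NULL if nTerm<1 or on OOM.
*/
static char *exampleJoinSortedTerms(const char **azTerm, int nTerm){
  size_t nBytes = 0, iByte = 0;
  char *zResult;
  int i;

  if( nTerm<1 ) return NULL;
  for(i=0; i<nTerm; i++) nBytes += strlen(azTerm[i])+1;  /* term + space */
  zResult = malloc(nBytes);
  if( zResult==NULL ) return NULL;

  qsort(azTerm, (size_t)nTerm, sizeof(azTerm[0]), exampleStrPtrCmp);
  for(i=0; i<nTerm; i++){
    size_t n = strlen(azTerm[i]);
    memcpy(zResult+iByte, azTerm[i], n);
    iByte += n;
    zResult[iByte++] = ' ';
  }
  zResult[nBytes-1] = '\0';   /* Replace the trailing space. */
  return zResult;
}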

/* Implements dump_terms() for use in inspecting the fts2 index from
** tests.  Returns a TEXT result containing the ordered list of terms,
** joined by spaces.  dump_terms(t, level, idx) dumps the terms for the
** segment specified by level, idx (in %_segdir), while dump_terms(t)
** dumps all terms in the index.  In both cases t is the fts table's
** magic table-named column.
*/
static void dumpTermsFunc(
  sqlite3_context *pContext,
  int argc, sqlite3_value **argv
){
  fulltext_cursor *pCursor;
  if( argc!=3 && argc!=1 ){
    generateError(pContext, "dump_terms", "incorrect arguments");
  }else if( sqlite3_value_type(argv[0])!=SQLITE_BLOB ||
            sqlite3_value_bytes(argv[0])!=sizeof(pCursor) ){
    generateError(pContext, "dump_terms", "illegal first argument");
  }else{
    fulltext_vtab *v;
    fts2Hash terms;
    sqlite3_stmt *s = NULL;
    int rc;

    memcpy(&pCursor, sqlite3_value_blob(argv[0]), sizeof(pCursor));
    v = cursor_vtab(pCursor);

    /* If passed only the cursor column, get all segments.  Otherwise
    ** get the segment described by the following two arguments.
    */
    if( argc==1 ){
      rc = sql_get_statement(v, SEGDIR_SELECT_ALL_STMT, &s);
    }else{
      rc = sql_get_statement(v, SEGDIR_SELECT_SEGMENT_STMT, &s);
      if( rc==SQLITE_OK ){
        rc = sqlite3_bind_int(s, 1, sqlite3_value_int(argv[1]));
        if( rc==SQLITE_OK ){
          rc = sqlite3_bind_int(s, 2, sqlite3_value_int(argv[2]));
        }
      }
    }

    if( rc!=SQLITE_OK ){
      generateError(pContext, "dump_terms", NULL);
      return;
    }

    /* Collect the terms for each segment. */
    sqlite3Fts2HashInit(&terms, FTS2_HASH_STRING, 1);
    while( (rc = sqlite3_step(s))==SQLITE_ROW ){
      rc = collectSegmentTerms(v, s, &terms);
      if( rc!=SQLITE_OK ) break;
    }

    if( rc!=SQLITE_DONE ){
      sqlite3_reset(s);
      generateError(pContext, "dump_terms", NULL);
    }else{
      const int nTerms = fts2HashCount(&terms);
      if( nTerms>0 ){
        rc = generateTermsResult(pContext, &terms);
        if( rc==SQLITE_NOMEM ){
          generateError(pContext, "dump_terms", "out of memory");
        }else{
          assert( rc==SQLITE_OK );
        }
      }else if( argc==3 ){
        /* The specific segment asked for could not be found. */
        generateError(pContext, "dump_terms", "segment not found");
      }else{
        /* No segments found. */
        /* TODO(shess): It should be impossible to reach this.  This
        ** case can only happen for an empty table, in which case
        ** SQLite has no rows to call this function on.
        */
        sqlite3_result_null(pContext);
      }
    }
    sqlite3Fts2HashClear(&terms);
  }
}
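
/* Illustrative sketch (hypothetical helper, not part of fts2):
** dumpTermsFunc() above receives its cursor as a BLOB holding the raw
** bytes of a fulltext_cursor pointer.  It checks that the BLOB is
** exactly sizeof(pointer) bytes and memcpy()s it back into a pointer
** variable, which avoids any alignment assumption about the BLOB
** data.  The same round trip with plain buffers:
*/
#include <string.h>

/* Copy a pointer out of the nBlob-byte buffer pBlob.  Returns 0 and
** sets *pp on success, or -1 if nBlob is not sizeof(void*).
*/
static int exampleUnpackPointer(const void *pBlob, int nBlob, void **pp){
  if( pBlob==NULL || nBlob!=(int)sizeof(*pp) ) return -1;
  memcpy(pp, pBlob, sizeof(*pp));
  return 0;
}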

/* Expand the DL_DEFAULT doclist in pData into a text result in
** pContext.  Returns SQLITE_OK on success, or the error code from the
** doclist or position-list readers (no result is set in that case).
*/
static int createDoclistResult(sqlite3_context *pContext,
                               const char *pData, int nData){
68975821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  DataBuffer dump;
68985821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  DLReader dlReader;
68995821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  int rc;
69005821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
69015821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  assert( pData!=NULL && nData>0 );
69025821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
69035821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  rc = dlrInit(&dlReader, DL_DEFAULT, pData, nData);
69045821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  if( rc!=SQLITE_OK ) return rc;
69055821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  dataBufferInit(&dump, 0);
69065821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)  for( ; rc==SQLITE_OK && !dlrAtEnd(&dlReader); rc = dlrStep(&dlReader) ){
69075821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    char buf[256];
69085821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    PLReader plReader;
69095821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
69105821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    rc = plrInit(&plReader, &dlReader);
69115821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( rc!=SQLITE_OK ) break;
69125821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    if( DL_DEFAULT==DL_DOCIDS || plrAtEnd(&plReader) ){
69135821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      sqlite3_snprintf(sizeof(buf), buf, "[%lld] ", dlrDocid(&dlReader));
69145821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      dataBufferAppend(&dump, buf, strlen(buf));
69155821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)    }else{
69165821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      int iColumn = plrColumn(&plReader);
69175821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
69185821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      sqlite3_snprintf(sizeof(buf), buf, "[%lld %d[",
69195821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                       dlrDocid(&dlReader), iColumn);
69205821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      dataBufferAppend(&dump, buf, strlen(buf));
69215821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)
69225821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)      for( ; !plrAtEnd(&plReader); rc = plrStep(&plReader) ){
69235821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)        if( rc!=SQLITE_OK ) break;
69245821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)        if( plrColumn(&plReader)!=iColumn ){
69255821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)          iColumn = plrColumn(&plReader);
69265821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)          sqlite3_snprintf(sizeof(buf), buf, "] %d[", iColumn);
69275821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)          assert( dump.nData>0 );
69285821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)          dump.nData--;                     /* Overwrite trailing space. */
69295821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)          assert( dump.pData[dump.nData]==' ');
69305821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)          dataBufferAppend(&dump, buf, strlen(buf));
69315821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)        }
69325821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)        if( DL_DEFAULT==DL_POSITIONS_OFFSETS ){
69335821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)          sqlite3_snprintf(sizeof(buf), buf, "%d,%d,%d ",
69345821806d5e7f356e8fa4b058a389a808ea183019Torne (Richard Coles)                           plrPosition(&plReader),
                           plrStartOffset(&plReader), plrEndOffset(&plReader));
        }else if( DL_DEFAULT==DL_POSITIONS ){
          sqlite3_snprintf(sizeof(buf), buf, "%d ", plrPosition(&plReader));
        }else{
          assert( NULL=="Unhandled DL_DEFAULT value");
        }
        dataBufferAppend(&dump, buf, strlen(buf));
      }
      plrDestroy(&plReader);
      if( rc!=SQLITE_OK ) break;

      assert( dump.nData>0 );
      dump.nData--;                     /* Overwrite trailing space. */
      assert( dump.pData[dump.nData]==' ');
      dataBufferAppend(&dump, "]] ", 3);
    }
  }
  dlrDestroy(&dlReader);
  if( rc!=SQLITE_OK ){
    dataBufferDestroy(&dump);
    return rc;
  }

  assert( dump.nData>0 );
  dump.nData--;                     /* Overwrite trailing space. */
  assert( dump.pData[dump.nData]==' ');
  dump.pData[dump.nData] = '\0';
  assert( dump.nData>0 );

  /* Passes ownership of dump's buffer to pContext. */
  sqlite3_result_text(pContext, dump.pData, dump.nData, sqlite3_free);
  dump.pData = NULL;
  dump.nData = dump.nCapacity = 0;
  return SQLITE_OK;
}
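
/*
** Illustrative sketch (not part of fts2): the code above builds its
** result by appending pieces that each end in a separator space, then
** steps back over the final space and overwrites it with a NUL instead
** of tracking which append is the last one.  The hypothetical helper
** sketchJoinDocids() applies the same trick to a caller-supplied
** fixed-size buffer, producing the DL_DOCIDS style "[1] [3] [7]".  It
** assumes at least one docid and a buffer large enough for the result.
*/
#include <assert.h>
#include <stdio.h>
#include <string.h>

static void sketchJoinDocids(const int *aDocid, int nDocid,
                             char *zOut, size_t nOut){
  size_t n = 0;
  int i;
  assert( nDocid>0 );
  for(i=0; i<nDocid; i++){
    /* Each piece carries its own trailing separator, e.g. "[7] ". */
    int k = snprintf(zOut+n, nOut-n, "[%d] ", aDocid[i]);
    assert( k>0 && (size_t)k<nOut-n );
    n += (size_t)k;
  }
  n--;                          /* Step back over the trailing space. */
  assert( zOut[n]==' ' );
  zOut[n] = '\0';               /* Overwrite it with the terminator. */
}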

/* Implements dump_doclist() for use in inspecting the fts2 index from
** tests.  Returns a TEXT result containing a string representation of
** the doclist for the indicated term.  dump_doclist(t, term, level, idx)
** dumps the doclist for term from the segment specified by level, idx
** (in %_segdir), while dump_doclist(t, term) dumps the logical doclist
** for the term across all segments.  The per-segment doclist can
** contain deletions, while the full-index doclist will not (deletions
** are omitted).
**
** Result formats differ with the setting of DL_DEFAULT.  Examples:
**
** DL_DOCIDS: [1] [3] [7]
** DL_POSITIONS: [1 0[0 4] 1[17]] [3 1[5]]
** DL_POSITIONS_OFFSETS: [1 0[0,0,3 4,23,26] 1[17,102,105]] [3 1[5,20,23]]
**
** In each case the number after the outer '[' is the docid.  In the
** latter two cases, the number before the inner '[' is the column
** associated with the values within.  For DL_POSITIONS the numbers
** within are the positions; for DL_POSITIONS_OFFSETS each
** comma-separated triple is a position, its start offset, and its end
** offset.
*/
static void dumpDoclistFunc(
  sqlite3_context *pContext,
  int argc, sqlite3_value **argv
){
  fulltext_cursor *pCursor;
  if( argc!=2 && argc!=4 ){
    generateError(pContext, "dump_doclist", "incorrect arguments");
  }else if( sqlite3_value_type(argv[0])!=SQLITE_BLOB ||
            sqlite3_value_bytes(argv[0])!=sizeof(pCursor) ){
    generateError(pContext, "dump_doclist", "illegal first argument");
  }else if( sqlite3_value_text(argv[1])==NULL ||
            sqlite3_value_text(argv[1])[0]=='\0' ){
    generateError(pContext, "dump_doclist", "empty second argument");
  }else{
    const char *pTerm = (const char *)sqlite3_value_text(argv[1]);
    const int nTerm = strlen(pTerm);
    fulltext_vtab *v;
    int rc;
    DataBuffer doclist;

    memcpy(&pCursor, sqlite3_value_blob(argv[0]), sizeof(pCursor));
    v = cursor_vtab(pCursor);

    dataBufferInit(&doclist, 0);

    /* termSelect() yields the same logical doclist that queries are
    ** run against.
    */
    if( argc==2 ){
      rc = termSelect(v, v->nColumn, pTerm, nTerm, 0, DL_DEFAULT, &doclist);
    }else{
      sqlite3_stmt *s = NULL;

      /* Get our specific segment's information. */
      rc = sql_get_statement(v, SEGDIR_SELECT_SEGMENT_STMT, &s);
      if( rc==SQLITE_OK ){
        rc = sqlite3_bind_int(s, 1, sqlite3_value_int(argv[2]));
        if( rc==SQLITE_OK ){
          rc = sqlite3_bind_int(s, 2, sqlite3_value_int(argv[3]));
        }
      }

      if( rc==SQLITE_OK ){
        rc = sqlite3_step(s);

        if( rc==SQLITE_DONE ){
          dataBufferDestroy(&doclist);
          generateError(pContext, "dump_doclist", "segment not found");
          return;
        }

        /* Found a segment, load it into doclist. */
        if( rc==SQLITE_ROW ){
          const sqlite_int64 iLeavesEnd = sqlite3_column_int64(s, 1);
          const char *pData = sqlite3_column_blob(s, 2);
          const int nData = sqlite3_column_bytes(s, 2);

          /* loadSegment() is used by termSelect() to load each
          ** segment's data.
          */
          rc = loadSegment(v, pData, nData, iLeavesEnd, pTerm, nTerm, 0,
                           &doclist);
          if( rc==SQLITE_OK ){
            rc = sqlite3_step(s);

            /* Should not have more than one matching segment. */
            if( rc!=SQLITE_DONE ){
              sqlite3_reset(s);
              dataBufferDestroy(&doclist);
              generateError(pContext, "dump_doclist", "invalid segdir");
              return;
            }
            rc = SQLITE_OK;
          }
        }
      }

      sqlite3_reset(s);
    }

    if( rc==SQLITE_OK ){
      if( doclist.nData>0 ){
        createDoclistResult(pContext, doclist.pData, doclist.nData);
      }else{
        /* TODO(shess): This can happen if the term is not present, or
        ** if all instances of the term have been deleted and this is
        ** an all-index dump.  It may be interesting to distinguish
        ** these cases.
        */
        sqlite3_result_text(pContext, "", 0, SQLITE_STATIC);
      }
    }else if( rc==SQLITE_NOMEM ){
      /* Handle out-of-memory cases specially because if they are
      ** generated in fts2 code they may not be reflected in the db
      ** handle.
      */
      /* TODO(shess): Handle this more comprehensively.
      ** sqlite3ErrStr() has what I need, but is internal.
      */
      generateError(pContext, "dump_doclist", "out of memory");
    }else{
      generateError(pContext, "dump_doclist", NULL);
    }

    dataBufferDestroy(&doclist);
  }
}
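
/*
** Illustrative sketch (not part of fts2): the DL_POSITIONS_OFFSETS
** format documented above dumpDoclistFunc() can be reproduced from an
** already-decoded doclist.  The hypothetical SketchDoc, SketchColumn and
** SketchHit types below stand in for what the doclist readers used by
** createDoclistResult() yield; the real doclist is an encoded buffer,
** not these arrays.  sketchAppend() assumes the output buffer is large
** enough and asserts otherwise.
*/
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct SketchHit { int iPos, iStart, iEnd; } SketchHit;
typedef struct SketchColumn {
  int iColumn;              /* Printed before the inner '[' */
  int nHit;
  const SketchHit *aHit;    /* Position, start offset, end offset */
} SketchColumn;
typedef struct SketchDoc {
  long long iDocid;         /* Printed after the outer '[' */
  int nColumn;
  const SketchColumn *aColumn;
} SketchDoc;

/* printf-style append at zOut+*pn, advancing *pn. */
static void sketchAppend(char *zOut, size_t nOut, size_t *pn,
                         const char *zFmt, ...){
  va_list ap;
  int k;
  va_start(ap, zFmt);
  k = vsnprintf(zOut + *pn, nOut - *pn, zFmt, ap);
  va_end(ap);
  assert( k>=0 && (size_t)k < nOut - *pn );
  *pn += (size_t)k;
}

static void sketchFormatDoclist(const SketchDoc *aDoc, int nDoc,
                                char *zOut, size_t nOut){
  size_t n = 0;
  int i, j, h;
  zOut[0] = '\0';
  for(i=0; i<nDoc; i++){
    const SketchDoc *pDoc = &aDoc[i];
    sketchAppend(zOut, nOut, &n, "%s[%lld", i ? " " : "", pDoc->iDocid);
    for(j=0; j<pDoc->nColumn; j++){
      const SketchColumn *pCol = &pDoc->aColumn[j];
      sketchAppend(zOut, nOut, &n, " %d[", pCol->iColumn);
      for(h=0; h<pCol->nHit; h++){
        const SketchHit *pHit = &pCol->aHit[h];
        sketchAppend(zOut, nOut, &n, "%s%d,%d,%d", h ? " " : "",
                     pHit->iPos, pHit->iStart, pHit->iEnd);
      }
      sketchAppend(zOut, nOut, &n, "]");    /* Close the column. */
    }
    sketchAppend(zOut, nOut, &n, "]");      /* Close the document. */
  }
}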
#endif
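
/*
** Illustrative sketch (not part of fts2): dumpDoclistFunc() receives
** the cursor as a BLOB holding the raw bytes of a fulltext_cursor
** pointer, which is why it requires the BLOB to be exactly
** sizeof(pCursor) bytes and copies it out with memcpy().  Copying avoids
** assuming the BLOB's bytes are suitably aligned for a pointer.  The
** hypothetical SketchCursor type and helpers below show the same round
** trip on a plain byte array.
*/
#include <assert.h>
#include <string.h>

typedef struct SketchCursor { int iMagic; } SketchCursor;

/* Store the pointer's bytes in aBlob; returns the number of bytes written. */
static size_t sketchPackPointer(SketchCursor *p,
                                unsigned char *aBlob, size_t nBlob){
  assert( nBlob>=sizeof(p) );
  memcpy(aBlob, &p, sizeof(p));
  return sizeof(p);
}

/* Recover the pointer, returning NULL if the blob has the wrong size
** (the same test behind the "illegal first argument" error above). */
static SketchCursor *sketchUnpackPointer(const unsigned char *aBlob,
                                         size_t nBlob){
  SketchCursor *p = NULL;
  if( nBlob!=sizeof(p) ) return NULL;
  memcpy(&p, aBlob, sizeof(p));
  return p;
}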

/*
** This routine implements the xFindFunction method for the FTS2
** virtual table.  It returns 1 and sets *pxFunc when zName is one of
** the functions fts2 overloads, and returns 0 otherwise so that SQLite
** uses the ordinary function of that name.
*/
static int fulltextFindFunction(
  sqlite3_vtab *pVtab,
  int nArg,
  const char *zName,
  void (**pxFunc)(sqlite3_context*,int,sqlite3_value**),
  void **ppArg
){
  if( strcmp(zName,"snippet")==0 ){
    *pxFunc = snippetFunc;
    return 1;
  }else if( strcmp(zName,"offsets")==0 ){
    *pxFunc = snippetOffsetsFunc;
    return 1;
  }else if( strcmp(zName,"optimize")==0 ){
    *pxFunc = optimizeFunc;
    return 1;
#ifdef SQLITE_TEST
    /* NOTE(shess): These functions are present only for testing
    ** purposes.  No particular effort is made to optimize their
    ** execution or how they build their results.
    */
  }else if( strcmp(zName,"dump_terms")==0 ){
    *pxFunc = dumpTermsFunc;
    return 1;
  }else if( strcmp(zName,"dump_doclist")==0 ){
    *pxFunc = dumpDoclistFunc;
    return 1;
#endif
  }
  return 0;
}
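
/*
** Illustrative sketch (not part of fts2): the strcmp() chain in
** fulltextFindFunction() can equivalently be written as a lookup table.
** The hypothetical SketchFunc type and empty stub functions below only
** stand in for the real scalar-function implementations, and the
** test-only dump_terms()/dump_doclist() entries are left out.
*/
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*SketchFunc)(void);

static void sketchSnippet(void){}
static void sketchOffsets(void){}
static void sketchOptimize(void){}

static const struct {
  const char *zName;
  SketchFunc xFunc;
} aSketchFunc[] = {
  { "snippet",  sketchSnippet  },
  { "offsets",  sketchOffsets  },
  { "optimize", sketchOptimize },
};

/* Same return convention as xFindFunction: 1 and *pxFunc set when
** zName is overloaded, otherwise 0 with *pxFunc left untouched. */
static int sketchFindFunction(const char *zName, SketchFunc *pxFunc){
  size_t i;
  for(i=0; i<sizeof(aSketchFunc)/sizeof(aSketchFunc[0]); i++){
    if( strcmp(zName, aSketchFunc[i].zName)==0 ){
      *pxFunc = aSketchFunc[i].xFunc;
      return 1;
    }
  }
  return 0;
}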

/*
** Rename an fts2 table by renaming its %_content, %_segments and
** %_segdir shadow tables to match the new name.
*/
static int fulltextRename(
  sqlite3_vtab *pVtab,
  const char *zName
){
  fulltext_vtab *p = (fulltext_vtab *)pVtab;
  int rc = SQLITE_NOMEM;
  char *zSql = sqlite3_mprintf(
    "ALTER TABLE %Q.'%q_content'  RENAME TO '%q_content';"
    "ALTER TABLE %Q.'%q_segments' RENAME TO '%q_segments';"
    "ALTER TABLE %Q.'%q_segdir'   RENAME TO '%q_segdir';"
    , p->zDb, p->zName, zName
    , p->zDb, p->zName, zName
    , p->zDb, p->zName, zName
  );
  if( zSql ){
    rc = sqlite3_exec(p->db, zSql, 0, 0, 0);
    sqlite3_free(zSql);
  }
  return rc;
}
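
/*
** Illustrative sketch (not part of fts2): fulltextRename() relies on
** sqlite3_mprintf()'s %q (double every single quote) and %Q (the same,
** plus surrounding single quotes, or NULL for a NULL pointer) so that
** names containing quotes cannot break out of the generated ALTER TABLE
** statements.  The hypothetical helpers below reproduce %q by hand to
** show the statement generated for the %_content table only; they are
** not a substitute for sqlite3_mprintf() and assume short names.
*/
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy z into zOut, doubling each single quote (the effect of %q). */
static void sketchQuote(const char *z, char *zOut, size_t nOut){
  size_t n = 0;
  for(; *z; z++){
    size_t need = (*z=='\'') ? 2 : 1;
    assert( n+need<nOut );              /* Leave room for the NUL. */
    if( *z=='\'' ) zOut[n++] = '\'';
    zOut[n++] = *z;
  }
  zOut[n] = '\0';
}

/* Build the statement renaming zOld's %_content table in database zDb. */
static void sketchRenameContentSql(const char *zDb, const char *zOld,
                                   const char *zNew,
                                   char *zSql, size_t nSql){
  char zQDb[64], zQOld[64], zQNew[64];
  int k;
  sketchQuote(zDb, zQDb, sizeof(zQDb));
  sketchQuote(zOld, zQOld, sizeof(zQOld));
  sketchQuote(zNew, zQNew, sizeof(zQNew));
  k = snprintf(zSql, nSql,
               "ALTER TABLE '%s'.'%s_content'  RENAME TO '%s_content';",
               zQDb, zQOld, zQNew);
  assert( k>=0 && (size_t)k<nSql );
}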

static const sqlite3_module fts2Module = {
  /* iVersion      */ 0,
  /* xCreate       */ fulltextCreate,
  /* xConnect      */ fulltextConnect,
  /* xBestIndex    */ fulltextBestIndex,
  /* xDisconnect   */ fulltextDisconnect,
  /* xDestroy      */ fulltextDestroy,
  /* xOpen         */ fulltextOpen,
  /* xClose        */ fulltextClose,
  /* xFilter       */ fulltextFilter,
  /* xNext         */ fulltextNext,
  /* xEof          */ fulltextEof,
  /* xColumn       */ fulltextColumn,
  /* xRowid        */ fulltextRowid,
  /* xUpdate       */ fulltextUpdate,
  /* xBegin        */ fulltextBegin,
  /* xSync         */ fulltextSync,
  /* xCommit       */ fulltextCommit,
  /* xRollback     */ fulltextRollback,
  /* xFindFunction */ fulltextFindFunction,
  /* xRename       */ fulltextRename,
};

static void hashDestroy(void *p){
  fts2Hash *pHash = (fts2Hash *)p;
  sqlite3Fts2HashClear(pHash);
  sqlite3_free(pHash);
}

/*
** The fts2 built-in tokenizers - "simple" and "porter" - are implemented
** in files fts2_tokenizer1.c and fts2_porter.c respectively.  An optional
** "icu" tokenizer is also available when SQLITE_ENABLE_ICU is defined.
** The following forward declarations are for the functions that
** retrieve the respective implementations.
**
** Calling sqlite3Fts2SimpleTokenizerModule() sets the value pointed
** to by the argument to point at the "simple" tokenizer implementation.
** sqlite3Fts2PorterTokenizerModule() sets *ppModule to point at the
** porter tokenizer/stemmer implementation, and
** sqlite3Fts2IcuTokenizerModule() does the same for the ICU tokenizer.
*/
void sqlite3Fts2SimpleTokenizerModule(sqlite3_tokenizer_module const**ppModule);
void sqlite3Fts2PorterTokenizerModule(sqlite3_tokenizer_module const**ppModule);
void sqlite3Fts2IcuTokenizerModule(sqlite3_tokenizer_module const**ppModule);

int sqlite3Fts2InitHashTable(sqlite3 *, fts2Hash *, const char *);
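
/*
** Illustrative sketch (not part of fts2): the tokenizer-module getters
** declared above return their result through a pointer-to-pointer
** out-parameter.  A typical implementation simply points it at a
** statically allocated, read-only module, so no allocation or cleanup
** is involved.  The hypothetical SketchTokenizerModule struct below
** stands in for sqlite3_tokenizer_module and is not its real layout.
*/
#include <assert.h>
#include <stddef.h>

typedef struct SketchTokenizerModule {
  int iVersion;
  const char *zName;        /* Hypothetical field, for illustration only */
} SketchTokenizerModule;

static const SketchTokenizerModule sketchSimpleModule = { 0, "simple" };

static void sketchSimpleTokenizerModule(
  SketchTokenizerModule const **ppModule
){
  *ppModule = &sketchSimpleModule;   /* Same object on every call. */
}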

/*
** Initialise the fts2 extension.  If this extension is built as part
** of the SQLite library, this function is called directly by SQLite.
** If fts2 is built as a dynamically loadable extension, it is called
** by the sqlite3_extension_init() entry point.
*/
int sqlite3Fts2Init(sqlite3 *db){
  int rc = SQLITE_OK;
  fts2Hash *pHash = 0;
  const sqlite3_tokenizer_module *pSimple = 0;
  const sqlite3_tokenizer_module *pPorter = 0;
  const sqlite3_tokenizer_module *pIcu = 0;

  sqlite3Fts2SimpleTokenizerModule(&pSimple);
  sqlite3Fts2PorterTokenizerModule(&pPorter);
#ifdef SQLITE_ENABLE_ICU
  sqlite3Fts2IcuTokenizerModule(&pIcu);
#endif

  /* Allocate and initialise the hash table used to store tokenizers. */
  pHash = sqlite3_malloc(sizeof(fts2Hash));
  if( !pHash ){
    rc = SQLITE_NOMEM;
  }else{
    sqlite3Fts2HashInit(pHash, FTS2_HASH_STRING, 1);
  }

  /* Load the built-in tokenizers into the hash table.  Each key length
  ** includes the name's NUL terminator.
  */
  if( rc==SQLITE_OK ){
    if( sqlite3Fts2HashInsert(pHash, "simple", 7, (void *)pSimple)
     || sqlite3Fts2HashInsert(pHash, "porter", 7, (void *)pPorter)
     || (pIcu && sqlite3Fts2HashInsert(pHash, "icu", 4, (void *)pIcu))
    ){
      rc = SQLITE_NOMEM;
    }
  }

  /* Register the fts2_tokenizer() SQL function, which wraps the hash
  ** table (omitted from Gears builds without SQLITE_TEST), and overload
  ** the snippet(), offsets() and optimize() scalar functions, plus
  ** dump_terms() and dump_doclist() in test builds.  If all of this
  ** succeeds, register the module with SQLite.
  */
  if( SQLITE_OK==rc
#if GEARS_FTS2_CHANGES && !SQLITE_TEST
      /* fts2_tokenizer() disabled for security reasons. */
#else
   && SQLITE_OK==(rc = sqlite3Fts2InitHashTable(db, pHash, "fts2_tokenizer"))
#endif
   && SQLITE_OK==(rc = sqlite3_overload_function(db, "snippet", -1))
   && SQLITE_OK==(rc = sqlite3_overload_function(db, "offsets", -1))
   && SQLITE_OK==(rc = sqlite3_overload_function(db, "optimize", -1))
#ifdef SQLITE_TEST
   && SQLITE_OK==(rc = sqlite3_overload_function(db, "dump_terms", -1))
   && SQLITE_OK==(rc = sqlite3_overload_function(db, "dump_doclist", -1))
#endif
  ){
    return sqlite3_create_module_v2(
        db, "fts2", &fts2Module, (void *)pHash, hashDestroy
    );
  }

  /* An error has occurred. Delete the hash table and return the error code. */
  assert( rc!=SQLITE_OK );
  if( pHash ){
    sqlite3Fts2HashClear(pHash);
    sqlite3_free(pHash);
  }
  return rc;
}
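
/*
** Illustrative sketch (not part of fts2): sqlite3Fts2Init() inserts the
** built-in tokenizers with key lengths of 7, 7 and 4, i.e. strlen()+1,
** so the NUL terminator is part of each key, and it chains the inserts
** with || so the first non-zero (failed) insert short-circuits the
** rest.  The hypothetical fixed-capacity SketchRegistry below models
** both points; sketchRegister() returns non-zero on failure, the way
** sqlite3Fts2Init() treats a non-NULL result from sqlite3Fts2HashInsert().
*/
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_ENTRY 4

typedef struct SketchRegistry {
  int nEntry;
  const char *azKey[SKETCH_MAX_ENTRY];
  int anKey[SKETCH_MAX_ENTRY];        /* Key length, including the NUL */
  const void *apData[SKETCH_MAX_ENTRY];
} SketchRegistry;

/* Returns 0 on success, non-zero if the registry is full. */
static int sketchRegister(SketchRegistry *p, const char *zKey, int nKey,
                          const void *pData){
  if( p->nEntry>=SKETCH_MAX_ENTRY ) return 1;
  p->azKey[p->nEntry] = zKey;
  p->anKey[p->nEntry] = nKey;
  p->apData[p->nEntry] = pData;
  p->nEntry++;
  return 0;
}

/* Look up zKey using the same strlen()+1 length convention. */
static const void *sketchFind(const SketchRegistry *p, const char *zKey){
  int nKey = (int)strlen(zKey) + 1;
  int i;
  for(i=0; i<p->nEntry; i++){
    if( p->anKey[i]==nKey && memcmp(p->azKey[i], zKey, nKey)==0 ){
      return p->apData[i];
    }
  }
  return NULL;
}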

#if !SQLITE_CORE
int sqlite3_extension_init(
  sqlite3 *db,
  char **pzErrMsg,
  const sqlite3_api_routines *pApi
){
  SQLITE_EXTENSION_INIT2(pApi)
  return sqlite3Fts2Init(db);
}
#endif

#endif /* !defined(SQLITE_CORE) || defined(SQLITE_ENABLE_FTS2) */