Commit Graph

186 Commits

Author SHA1 Message Date
Sam McCall 329fc143fd [clangd] Query dex index using query-style trigrams, not identifier-style trigrams
llvm-svn: 343453
2018-10-01 10:42:51 +00:00
Eric Liu d5d6a60a78 [clangd] Fix header mapping for std::string. NFC
Some implementation has std::string declared in <iosfwd>.

llvm-svn: 343448
2018-10-01 08:50:49 +00:00
Eric Liu 670c147d83 [clangd] Initial supoprt for cross-namespace global code completion.
Summary:
When no scope qualifier is specified, allow completing index symbols
from any scope and insert proper automatically. This is still experimental and
hidden behind a flag.

Things missing:
- Scope proximity based scoring.
- FuzzyFind supports weighted scopes.

Reviewers: sammccall

Reviewed By: sammccall

Subscribers: kbobyrev, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D52364

llvm-svn: 343248
2018-09-27 18:46:00 +00:00
Eric Liu ee7fe93fa8 [clangd] Add more tracing to index queries. NFC
Reviewers: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D52611

llvm-svn: 343247
2018-09-27 18:23:23 +00:00
Kirill Bobyrev ea4f20c6be [clangd] Fix bugs with incorrect memory estimate report
* With the current implementation, `sizeof(std::vector<Chunk>)` is added
twice to the `Dex` memory estimate which is incorrect
* `Dex` logs memory usage estimation before `BackingDataSize` is set and
hence the log report excludes size of the external `SymbolSlab` which is
coupled with `Dex` instance

Reviewed By: ioeric

Differential Revision: https://reviews.llvm.org/D52503

llvm-svn: 343117
2018-09-26 15:06:23 +00:00
Kirill Bobyrev 0cdf629394 [docs] Update PostingList string representation format
Because `PostingList` objects are compressed, it is now impossible to
see elements other than the current one and the documentation doesn't
match implementation anymore.

Reviewed By: ioeric

Differential Revision: https://reviews.llvm.org/D52545

llvm-svn: 343116
2018-09-26 14:59:49 +00:00
Simon Pilgrim 3462e76ba5 Removed extra semicolon to fix Wpedantic. (NFCI).
llvm-svn: 343083
2018-09-26 09:02:45 +00:00
Sam McCall 321d5d4802 [clangd] Extract mapper logic from clangd-indexer into a library.
Summary: Soon we can drop support for MR-via-YAML.
I need to modify some out-of-tree versions to use the library, first.

Reviewers: kadircet

Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits

Differential Revision: https://reviews.llvm.org/D52465

llvm-svn: 343019
2018-09-25 20:02:36 +00:00
Sam McCall e38c7f8d96 [clangd] Fix reversed RIFF/YAML serialization
llvm-svn: 343017
2018-09-25 19:53:33 +00:00
Sam McCall 02d600d267 [clangd] Merge binary + YAML serialization behind a (mostly) common interface.
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
 - symbol -> YAML I expect to keep for tools like dexp
 - YAML -> symbol is used for the MR-style indexer, I think we can eliminate
   this (merge-on-the-fly, else use a different serialization)

Reviewers: kbobyrev

Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D52453

llvm-svn: 342999
2018-09-25 18:06:43 +00:00
Kirill Bobyrev d041f8a9d0 [clangd] NFC: Simplify code, enforce LLVM Coding Standards
For consistency, functional-style code pieces are replaced with their
simple counterparts to improve readability.

Also, file headers are fixed to comply with LLVM Coding Standards.

`static` member of anonymous namespace is not marked `static` anymore,
because it is redundant.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D52466

llvm-svn: 342974
2018-09-25 13:58:48 +00:00
Kirill Bobyrev 69e6388564 [clangd] Fix some buildbots after r342965
Some compilers fail to parse struct default member initializer.

llvm-svn: 342970
2018-09-25 13:14:11 +00:00
Kirill Bobyrev 6c2f5bd0f1 [clangd] Implement VByte PostingList compression
This patch implements Variable-length Byte compression of `PostingList`s
to sacrifice some performance for lower memory consumption.

`PostingList` compression and decompression was extensively tested using
fuzzer for multiple hours and runnning significant number of realistic
`FuzzyFindRequests`. AddressSanitizer and UndefinedBehaviorSanitizer
were used to ensure the correct behaviour.

Performance evaluation was conducted with recent LLVM symbol index (292k
symbols) and the collection of user-recorded queries (7751
`FuzzyFindRequest` JSON dumps):

| Metrics | Before| After | Change (%)
| -----  | -----  | -----   | -----
| Memory consumption (posting lists only), MB  |  54.4 | 23.5 | -60%
| Time to process queries, sec | 7.70 | 9.4 | +25%

Reviewers: sammccall, ioeric

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D52300

llvm-svn: 342965
2018-09-25 11:54:51 +00:00
Sam McCall 3ca9759a21 [clangd] Fix uninit bool in r342888
llvm-svn: 342903
2018-09-24 16:52:48 +00:00
Sam McCall 8fb7bb2482 [clangd] Do bounds checks while reading data, otherwise var-length records are too painful. NFC
llvm-svn: 342888
2018-09-24 14:51:15 +00:00
Kirill Bobyrev 94af0612e0 [clangd] Force Dex to respect symbol collector flags
`Dex` should utilize `FuzzyFindRequest.RestrictForCodeCompletion` flags
and omit symbols not meant for code completion when asked for it.

The measurements below were conducted with setting
`FuzzyFindRequest.RestrictForCodeCompletion` to `true` (so that it's
more realistic). Sadly, the average latency goes down, I suspect that is
mostly because of the empty queries where the number of posting lists is
critical.

| Metrics  | Before | After | Relative difference
| -----  | -----  | -----   | -----
| Cumulative query latency (7000 `FuzzyFindRequest`s over LLVM static index)  | 6182735043 ns    | 7202442053 ns | +16%
| Whole Index size | 81.24 MB    | 81.79 MB | +0.6%

Out of 292252 symbols collected from LLVM codebase 136926 appear to be
restricted for code completion.

Reviewers: ioeric

Differential Revision: https://reviews.llvm.org/D52357

llvm-svn: 342866
2018-09-24 08:45:18 +00:00
Eric Liu c275fb2a5d [clangd] Remember to serialize symbol origin in YAML.
llvm-svn: 342730
2018-09-21 13:04:57 +00:00
Eric Liu 467c5f9ce0 [clangd] Store preamble macros in dynamic index.
Summary:
Pros:
o Loading macros from preamble for every completion is slow (see profile).
o Calculating macro USR is also slow (see profile).
o Sema can provide a lot of macro completion results (e.g. when filter is empty,
60k for some large TUs!).

Cons:
o Slight memory increase in dynamic index (~1%).
o Some extra work during preamble build (should be fine as preamble build and
indexAST is way slower).

Before:
{F7195645}

After:
{F7195646}

Reviewers: ilya-biryukov, sammccall

Reviewed By: sammccall

Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D52078

llvm-svn: 342529
2018-09-19 09:35:04 +00:00
Sam McCall 46b5555844 [clangd] Fix error handling for SymbolID parsing (notably YAML and dexp)
llvm-svn: 342505
2018-09-18 19:00:59 +00:00
Eric Liu 764f461f9c [clangd] Get rid of Decls parameter in indexMainDecls. NFC
It's already available in ParsedAST.

llvm-svn: 342473
2018-09-18 13:35:16 +00:00
Eric Liu 821a116818 [clangd] Merge ClangdServer::DynamicIndex into FileIndex. NFC.
Summary:
FileIndex now provides explicit interfaces for preamble and main file updates.
This avoids growing parameter list when preamble and main symbols diverge
further (e.g. D52078). This also gets rid of the hack in `indexAST` that
inferred main file index based on `TopLevelDecls`.

Also separate `indexMainDecls` from `indexAST`.

Reviewers: sammccall

Reviewed By: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D52222

llvm-svn: 342460
2018-09-18 10:30:44 +00:00
Sam McCall 3bf9b6d920 [clangd] dexp tool uses llvm::cl to parse its flags.
Summary:
We can use cl::ResetCommandLineParser() to support different types of
command-lines, as long as we're careful about option lifetimes.
(I tried using subcommands, but the error messages were bad)
I found a mostly-reasonable pattern to isolate the fiddly parts.

Added -scope and -limit flags to the `find` command to demonstrate.
(Note that scope support seems to be broken in dex?)

Fixed symbol lookup to parse symbol IDs.

Caveats:
 - with command help (e.g. `find -help`), you also get some spam
   about required arguments. This is a bug in llvm::cl, which prints
   these to errs() rather than the designated stream.

Reviewers: kbobyrev

Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D51989

llvm-svn: 342456
2018-09-18 09:49:57 +00:00
Eric Liu f736766659 [clangd] Adapt API change after 342451.
llvm-svn: 342452
2018-09-18 08:52:14 +00:00
Eric Liu a57afd091f [clangd] Get rid of AST matchers in SymbolCollector. NFC
Reviewers: ilya-biryukov, kadircet

Subscribers: MaskRay, jkorous, arphaman, cfe-commits

Differential Revision: https://reviews.llvm.org/D52089

llvm-svn: 342362
2018-09-17 07:43:49 +00:00
Kirill Bobyrev 249c5864cf [clangd] Introduce PostingList interface
This patch abstracts `PostingList` interface and reuses existing
implementation. It will be used later to test different `PostingList`
representations.

No functionality change is introduced, this patch is mostly refactoring
so that the following patches could focus on functionality while not
being too hard to review.

Reviewed By: sammccall, ioeric

Differential Revision: https://reviews.llvm.org/D51982

llvm-svn: 342155
2018-09-13 17:11:03 +00:00
Kirill Bobyrev bd72b08eb3 [clangd] Fix Dexp build
%s/MaxCandidateCount/Limit/g after rL342138.

llvm-svn: 342143
2018-09-13 15:35:55 +00:00
Kirill Bobyrev e6dd0806c7 [clangd] Cleanup FuzzyFindRequest filtering limit semantics
As discussed during D51860 review, it is better to use `llvm::Optional`
here as it has clear semantics which reflect intended behavior.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D52028

llvm-svn: 342138
2018-09-13 14:27:03 +00:00
Kirill Bobyrev d9f33b129c [clangd] Don't create child AND and OR iterators with one posting list
`AND( AND( Child ) ... )` -> `AND( Child ... )`
`AND( OR( Child ) ... )` -> `AND( Child ... )`

This simple optimization results in 5-6% performance improvement in the
benchmark with 2000 serialized `FuzzyFindRequest`s.

Reviewed By: ilya-biryukov

Differential Revision: https://reviews.llvm.org/D52016

llvm-svn: 342124
2018-09-13 10:02:48 +00:00
Heejin Ahn 386d272387 [clangd] Add missing clangBasic target_link_libraries
Without this, builds with `-DSHARED_LIB=ON` fail.

llvm-svn: 342037
2018-09-12 09:40:13 +00:00
Kirill Bobyrev e1e19c7b75 [clangd] Implement a Proof-of-Concept tool for symbol index exploration
Reviewed By: sammccall, ilya-biryukov

Differential Revision: https://reviews.llvm.org/D51628

llvm-svn: 342025
2018-09-12 07:32:54 +00:00
Kirill Bobyrev 0dee397e06 [clangd] NFC: Use uint32_t for FuzzyFindRequest limits
Reviewed By: ioeric

Differential Revision: https://reviews.llvm.org/D51860

llvm-svn: 341921
2018-09-11 10:31:38 +00:00
Kirill Bobyrev 5faf8a3d84 [clangd] Unbreak buildbots after r341802
Solution: use std::move when returning result from toJSON(...).
llvm-svn: 341832
2018-09-10 14:31:38 +00:00
Kirill Bobyrev 09f00dcf69 [clangd] Implement FuzzyFindRequest JSON (de)serialization
JSON (de)serialization of `FuzzyFindRequest` might be useful for both
D51090 and D51628. Also, this allows precise logging of the fuzzy find
requests.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D51852

llvm-svn: 341802
2018-09-10 11:51:05 +00:00
Kirill Bobyrev 38a889c185 [clangd] Add symbol slab size to index memory consumption estimates
Currently, `SymbolIndex::estimateMemoryUsage()` returns the "overhead"
estimate, i.e. the estimate of the Index data structure excluding
backing data (such as Symbol Slab and Reference Slab). This patch
propagates information about paired data size where necessary.

Reviewed By: ioeric, sammccall

Differential Revision: https://reviews.llvm.org/D51539

llvm-svn: 341800
2018-09-10 11:46:07 +00:00
Kirill Bobyrev 5abe478a3d [clangd] NFC: Rename DexIndex to Dex
Also, cleanup some redundant includes.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D51774

llvm-svn: 341784
2018-09-10 08:23:53 +00:00
Kirill Bobyrev 59491a1fa9 [clangd] Make advanceTo() faster on Posting Lists
If the current element is already beyond advanceTo()'s DocID, just
return instead of doing binary search. This simple optimization saves up
to 6-7% performance,

Reviewed By: ilya-biryukov

Differential Revision: https://reviews.llvm.org/D51802

llvm-svn: 341781
2018-09-10 07:57:28 +00:00
Eric Liu f76886859f [clangd] Canonicalize include paths in clangd.
Get rid of "../"  and "../../".

llvm-svn: 341645
2018-09-07 09:40:36 +00:00
Eric Liu 6df66001ee [clangd] Add "Deprecated" field to Symbol and CodeCompletion.
Summary: Also set "deprecated" field in LSP CompletionItem.

Reviewers: sammccall, kadircet

Reviewed By: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, cfe-commits

Differential Revision: https://reviews.llvm.org/D51724

llvm-svn: 341576
2018-09-06 18:52:26 +00:00
Kirill Bobyrev 049b2d4345 [clangd] Fix Dex initialization
This patch sets URI schemes of Dex to SymbolCollector's default schemes
in case callers tried to pass empty list of schemes. This was the case
for initialization in Clangd main and was a reason of incorrect
behavior.

Also, it fixes a bug with missed `continue;` after spotting invalid URI
scheme conversion.

llvm-svn: 341552
2018-09-06 15:10:10 +00:00
Kirill Bobyrev afbf31854d [clangd] NFC: Use TopN instead of std::priority_queue
Quality.cpp defines a structure for convenient storage of Top N items,
it should be used instead of the `std::priority_queue` with slightly
obscure semantics.

This patch does not affect functionality.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D51676

llvm-svn: 341544
2018-09-06 13:15:03 +00:00
Kirill Bobyrev e4ee0213d4 [clangd] NFC: mark single-parameter constructors explicit
Code health: prevent implicit conversions to user-defined types.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D51690

llvm-svn: 341543
2018-09-06 13:06:04 +00:00
Kirill Bobyrev 19a9461e5f [clangd] Implement proximity path boosting for Dex
This patch introduces `PathURI` Search Token kind and utilizes it to
uprank symbols which are defined in files with small distance to the
directory where the fuzzy find request is coming from (e.g. files user
is editing).

Reviewed By: ioeric

Reviewers: ioeric, sammccall

Differential Revision: https://reviews.llvm.org/D51481

llvm-svn: 341542
2018-09-06 12:54:43 +00:00
Eric Liu d25f1214a8 [clangd] Set SymbolID for sema macros so that they can be merged with index macros.
Reviewers: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D51688

llvm-svn: 341534
2018-09-06 09:59:37 +00:00
Sam McCall e4fa7b8418 [clangd] make zlib compression optional for binary format
llvm-svn: 341465
2018-09-05 13:17:47 +00:00
Sam McCall d85264bf53 [clangd] Fix buildbot failures on older compilers from r341375
llvm-svn: 341451
2018-09-05 07:52:49 +00:00
Sam McCall 76c4c3af52 [clangd] Load static index asynchronously, add tracing.
Summary:
Like D51475 but simplified based on recent patches.
While here, clarify that loadIndex() takes a filename, not file content.

Reviewers: ioeric

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D51638

llvm-svn: 341376
2018-09-04 16:19:40 +00:00
Sam McCall 50f3631057 [clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
  llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.

The format is a RIFF container (chunks of (type, size, data)) with:
 - a compressed string table
 - simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.

There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.

Alternatives considered:
 - compressed YAML or JSON: bulky and slow to load
 - llvm bitstream: confusing model and libraries are hard to use. My attempt
   produced slightly larger files, and the code was longer and slower.
 - protobuf or similar: would be really nice (esp for back-compat) but the
   dependency is a big hassle
 - ad-hoc binary format without a container: it seems clear we're going
   to add posting lists and occurrences here, and that they will benefit
   from sharing a string table. The container makes it easy to debug
   these pieces in isolation, and make them optional.

Reviewers: ioeric

Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D51585

llvm-svn: 341375
2018-09-04 16:16:50 +00:00
Kirill Bobyrev cc8b507a60 [clangd] NFC: Change quality type to float
Reviewed by: sammccall

Differential Revision: https://reviews.llvm.org/D51636

llvm-svn: 341374
2018-09-04 15:45:56 +00:00
Kirill Bobyrev d5bc65444c [clangd] Move buildStaticIndex() to SymbolYAML
`buildStaticIndex()` is used by two other tools that I'm building, now
it's useful outside of `tool/ClangdMain.cpp`.

Also, slightly refactor the code while moving it to the different source
file.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D51626

llvm-svn: 341369
2018-09-04 15:10:40 +00:00
Sam McCall b0138317d6 [clangd] SymbolOccurrences -> Refs and cleanup
Summary:
A few things that I noticed while merging the SwapIndex patch:
 - SymbolOccurrences and particularly SymbolOccurrenceSlab are unwieldy names,
   and these names appear *a lot*. Ref, RefSlab, etc seem clear enough
   and read/format much better.
 - The asymmetry between SymbolSlab and RefSlab (build() vs freeze()) is
   confusing and irritating, and doesn't even save much code.
   Avoiding RefSlab::Builder was my idea, but it was a bad one; add it.
 - DenseMap<SymbolID, ArrayRef<Ref>> seems like a reasonable compromise for
   constructing MemIndex - and means many less wasted allocations than the
   current DenseMap<SymbolID, vector<Ref*>> for FileIndex, and none for
   slabs.
 - RefSlab::find() is not actually used for anything, so we can throw
   away the DenseMap and keep the representation much more compact.
 - A few naming/consistency fixes: e.g. Slabs,Refs -> Symbols,Refs.

Reviewers: ioeric

Subscribers: ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D51605

llvm-svn: 341368
2018-09-04 14:39:56 +00:00