JavaScript's scopeCompletionSource yields properties that are inaccessible with the `obj.prop` syntax

Using these extensions:

import { basicSetup } from "codemirror"
import { javascript, javascriptLanguage, scopeCompletionSource } from "@codemirror/lang-javascript"
const extensions = [
  basicSetup,
  javascript(),
  javascriptLanguage.data.of({
    autocomplete: scopeCompletionSource(window),
  }),
]

When I type “location.href.” in the editor, the autocompletion gives

0
1
10
11
12
13
14

etc. (which are string indices), and if I type “RegExp.” instead, it yields

$_
$'
$*
$&
$`
$+

most of which would be invalid if used as property names in the usual obj.prop syntax. They need the bracket syntax, like location.href[0] or RegExp["$`"]. Therefore I don’t think these properties should be suggested after typing the dot; only names that are valid identifiers should be offered.
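To illustrate the behaviour I have in mind, here is a minimal sketch. The helper name dotAccessibleNames is hypothetical and not part of @codemirror/lang-javascript, and the ASCII-only pattern is just an assumption for the example:

```javascript
// Hypothetical helper, not part of @codemirror/lang-javascript.
// ASCII-only check: a name passes only if it can be written as obj.name.
const asciiIdentifier = /^[a-zA-Z_$][\w$]*$/;

function dotAccessibleNames(value) {
  // Object() wraps primitives such as strings so their own properties can be listed.
  return Object.getOwnPropertyNames(Object(value)).filter((name) =>
    asciiIdentifier.test(name)
  );
}

const names = dotAccessibleNames("abc");
console.log(names); // [ 'length' ] (the index keys "0", "1", "2" are dropped)
```

Names that fail the check could still be offered in a bracket-access context; this sketch only covers completion after a dot.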

Perhaps I should post this in the GitHub issues instead?

This patch should fix that.

Thanks, but there may still be a problem. People may name their properties in other languages (I don’t do that myself, though), and in that case /^[a-zA-Z_$][\w$]*$/ just isn’t enough. However, accurately detecting whether a string is a valid identifier seems to require either a regular expression with the Unicode (u) flag or a lot of Unicode data.

After consulting the ECMAScript spec, I wrote a (JavaScript) RegExp that matches only valid identifiers, keywords and reserved words:

/^[\p{ID_Start}$_][\p{ID_Continue}$\u200c\u200d]*$/u

This will be much harder to do without the u flag, as ID_Start and ID_Continue are very complicated. They are defined in the Unicode Character Database’s DerivedCoreProperties.txt (Ctrl+F for “Derived Property: ID_Start”).
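As a quick check of the pattern above, a small sketch (the sample names are my own choice) that should run in any engine supporting Unicode property escapes:

```javascript
const identifierLike = /^[\p{ID_Start}$_][\p{ID_Continue}$\u200c\u200d]*$/u;

// Sample property names chosen only for illustration.
const samples = ["href", "$_", "名前", "café", "if", "0", "$`", "a-b"];

for (const name of samples) {
  console.log(JSON.stringify(name), identifierLike.test(name));
}
// "href", "$_", "名前", "café" and "if" match; "0", "$`" and "a-b" do not.
```

Matching keywords such as "if" is harmless for this purpose, since reserved words are allowed as property names after a dot (obj.if is valid).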

If it is necessary to work around that complexity, we may as well accept an inaccurate, much wider range of non-ASCII characters and only exclude the invalid characters within ASCII, such as:

/^[a-zA-Z_$\xaa-\uffdc][$\w\xaa-\uffdc]*$/

(This pattern also allows \u{10000}-\u{10ffff}: without the u flag the regexp matches by UTF-16 code unit, and the ranges include the UTF-16 surrogates.)

\p isn’t universally supported yet, but using a wide range like that seems reasonable.
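One possible way to combine the two, sketched under the assumption that a runtime feature check is acceptable (makeIdentifierTest is a hypothetical name, not an existing CodeMirror function):

```javascript
// Wide, inaccurate fallback that needs neither the u flag nor \p{...}.
const wide = /^[a-zA-Z_$\xaa-\uffdc][$\w\xaa-\uffdc]*$/;

// Hypothetical helper: prefer the exact Unicode pattern, fall back to the wide range.
function makeIdentifierTest() {
  try {
    // Built from a string so that engines without \p{...} support throw here
    // instead of failing to parse the whole script.
    return new RegExp("^[\\p{ID_Start}$_][\\p{ID_Continue}$\\u200c\\u200d]*$", "u");
  } catch (_) {
    return wide;
  }
}

const isIdentifier = makeIdentifierTest();
console.log(isIdentifier.test("名前")); // true
console.log(isIdentifier.test("0")); // false

// The fallback over-accepts: U+00D7 (multiplication sign) is not a valid identifier.
console.log(wide.test("\u00d7")); // true
// A supplementary character is matched as two surrogate code units.
console.log(wide.test("\u{1d465}")); // true
```

The trade-off is that the fallback may suggest a few names that are not valid identifiers, but it never suggests the ASCII cases like "0" or "$`".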
