Coverage for C:\leo.repo\leo-editor\leo\core\leoAst.py : 99%

Hot-keys on this page
r m x p toggle line displays
j k next/prev highlighted chunk
0 (zero) top of page
1 (one) first highlighted chunk
1# -*- coding: utf-8 -*-
2#@+leo-ver=5-thin
3#@+node:ekr.20141012064706.18389: * @file leoAst.py
4#@@first
5# This file is part of Leo: https://leoeditor.com
6# Leo's copyright notice is based on the MIT license: http://leoeditor.com/license.html
7#@+<< docstring >>
8#@+node:ekr.20200113081838.1: ** << docstring >> (leoAst.py)
9"""
10leoAst.py: This file does not depend on Leo in any way.
12The classes in this file unify python's token-based and ast-based worlds by
13creating two-way links between tokens in the token list and ast nodes in
14the parse tree. For more details, see the "Overview" section below.
17**Stand-alone operation**
19usage:
20 leoAst.py --help
21 leoAst.py [--fstringify | --fstringify-diff | --orange | --orange-diff] PATHS
22 leoAst.py --py-cov [ARGS]
23 leoAst.py --pytest [ARGS]
24 leoAst.py --unittest [ARGS]
26examples:
27 --py-cov "-f TestOrange"
28 --pytest "-f TestOrange"
29 --unittest TestOrange
31positional arguments:
32 PATHS directory or list of files
34optional arguments:
35 -h, --help show this help message and exit
36 --fstringify leonine fstringify
37 --fstringify-diff show fstringify diff
38 --orange leonine Black
39 --orange-diff show orange diff
40 --py-cov run pytest --cov on leoAst.py
41 --pytest run pytest on leoAst.py
42 --unittest run unittest on leoAst.py
45**Overview**
47leoAst.py unifies python's token-oriented and ast-oriented worlds.
49leoAst.py defines classes that create two-way links between tokens
50created by python's tokenize module and parse tree nodes created by
51python's ast module:
53The Token Order Generator (TOG) class quickly creates the following
54links:
56- An *ordered* children array from each ast node to its children.
58- A parent link from each ast.node to its parent.
60- Two-way links between tokens in the token list, a list of Token
61 objects, and the ast nodes in the parse tree:
63 - For each token, token.node contains the ast.node "responsible" for
64 the token.
66 - For each ast node, node.first_i and node.last_i are indices into
67 the token list. These indices give the range of tokens that can be
68 said to be "generated" by the ast node.
70Once the TOG class has inserted parent/child links, the Token Order
71Traverser (TOT) class traverses trees annotated with parent/child
72links extremely quickly.
75**Applicability and importance**
77Many python developers will find asttokens meets all their needs.
78asttokens is well documented and easy to use. Nevertheless, two-way
79links are significant additions to python's tokenize and ast modules:
81- Links from tokens to nodes are assigned to the nearest possible ast
82 node, not the nearest statement, as in asttokens. Links can easily
83 be reassigned, if desired.
85- The TOG and TOT classes are intended to be the foundation of tools
86 such as fstringify and black.
88- The TOG class solves real problems, such as:
89 https://stackoverflow.com/questions/16748029/
91**Known bug**
93This file has no known bugs *except* for Python version 3.8.
95For Python 3.8, syncing tokens will fail for function call such as:
97 f(1, x=2, *[3, 4], y=5)
99that is, for calls where keywords appear before non-keyword args.
101There are no plans to fix this bug. The workaround is to use Python version
1023.9 or above.
105**Figures of merit**
107Simplicity: The code consists primarily of a set of generators, one
108for every kind of ast node.
110Speed: The TOG creates two-way links between tokens and ast nodes in
111roughly the time taken by python's tokenize.tokenize and ast.parse
112library methods. This is substantially faster than the asttokens,
113black or fstringify tools. The TOT class traverses trees annotated
114with parent/child links even more quickly.
116Memory: The TOG class makes no significant demands on python's
117resources. Generators add nothing to python's call stack.
118TOG.node_stack is the only variable-length data. This stack resides in
119python's heap, so its length is unimportant. In the worst case, it
120might contain a few thousand entries. The TOT class uses no
121variable-length data at all.
123**Links**
125Leo...
126Ask for help: https://groups.google.com/forum/#!forum/leo-editor
127Report a bug: https://github.com/leo-editor/leo-editor/issues
128leoAst.py docs: http://leoeditor.com/appendices.html#leoast-py
130Other tools...
131asttokens: https://pypi.org/project/asttokens
132black: https://pypi.org/project/black/
133fstringify: https://pypi.org/project/fstringify/
135Python modules...
136tokenize.py: https://docs.python.org/3/library/tokenize.html
137ast.py https://docs.python.org/3/library/ast.html
139**Studying this file**
141I strongly recommend that you use Leo when studying this code so that you
142will see the file's intended outline structure.
144Without Leo, you will see only special **sentinel comments** that create
145Leo's outline structure. These comments have the form::
147 `#@<comment-kind>:<user-id>.<timestamp>.<number>: <outline-level> <headline>`
148"""
149#@-<< docstring >>
150#@+<< imports >>
151#@+node:ekr.20200105054219.1: ** << imports >> (leoAst.py)
152import argparse
153import ast
154import codecs
155import difflib
156import glob
157import io
158import os
159import re
160import sys
161import textwrap
162import tokenize
163import traceback
164from typing import List, Optional
165#@-<< imports >>
166v1, v2, junk1, junk2, junk3 = sys.version_info
167py_version = (v1, v2)
169# Async tokens exist only in Python 3.5 and 3.6.
170# https://docs.python.org/3/library/token.html
171has_async_tokens = (3, 5) <= py_version <= (3, 6)
173# has_position_only_params = (v1, v2) >= (3, 8)
174#@+others
175#@+node:ekr.20191226175251.1: ** class LeoGlobals
176#@@nosearch
179class LeoGlobals: # pragma: no cover
180 """
181 Simplified version of functions in leoGlobals.py.
182 """
184 total_time = 0.0 # For unit testing.
186 #@+others
187 #@+node:ekr.20191226175903.1: *3* LeoGlobals.callerName
188 def callerName(self, n):
189 """Get the function name from the call stack."""
190 try:
191 f1 = sys._getframe(n)
192 code1 = f1.f_code
193 return code1.co_name
194 except Exception:
195 return ''
196 #@+node:ekr.20191226175426.1: *3* LeoGlobals.callers
197 def callers(self, n=4):
198 """
199 Return a string containing a comma-separated list of the callers
200 of the function that called g.callerList.
201 """
202 i, result = 2, []
203 while True:
204 s = self.callerName(n=i)
205 if s:
206 result.append(s)
207 if not s or len(result) >= n:
208 break
209 i += 1
210 return ','.join(reversed(result))
211 #@+node:ekr.20191226190709.1: *3* leoGlobals.es_exception & helper
212 def es_exception(self, full=True):
213 typ, val, tb = sys.exc_info()
214 for line in traceback.format_exception(typ, val, tb):
215 print(line)
216 fileName, n = self.getLastTracebackFileAndLineNumber()
217 return fileName, n
218 #@+node:ekr.20191226192030.1: *4* LeoGlobals.getLastTracebackFileAndLineNumber
219 def getLastTracebackFileAndLineNumber(self):
220 typ, val, tb = sys.exc_info()
221 if typ == SyntaxError:
222 # IndentationError is a subclass of SyntaxError.
223 # SyntaxError *does* have 'filename' and 'lineno' attributes.
224 return val.filename, val.lineno # type:ignore
225 #
226 # Data is a list of tuples, one per stack entry.
227 # The tuples have the form (filename, lineNumber, functionName, text).
228 data = traceback.extract_tb(tb)
229 item = data[-1] # Get the item at the top of the stack.
230 filename, n, functionName, text = item
231 return filename, n
232 #@+node:ekr.20200220065737.1: *3* LeoGlobals.objToString
233 def objToString(self, obj, tag=None):
234 """Simplified version of g.printObj."""
235 result = []
236 if tag:
237 result.append(f"{tag}...")
238 if isinstance(obj, str):
239 obj = g.splitLines(obj)
240 if isinstance(obj, list):
241 result.append('[')
242 for z in obj:
243 result.append(f" {z!r}")
244 result.append(']')
245 elif isinstance(obj, tuple):
246 result.append('(')
247 for z in obj:
248 result.append(f" {z!r}")
249 result.append(')')
250 else:
251 result.append(repr(obj))
252 result.append('')
253 return '\n'.join(result)
254 #@+node:ekr.20191226190425.1: *3* LeoGlobals.plural
255 def plural(self, obj):
256 """Return "s" or "" depending on n."""
257 if isinstance(obj, (list, tuple, str)):
258 n = len(obj)
259 else:
260 n = obj
261 return '' if n == 1 else 's'
262 #@+node:ekr.20191226175441.1: *3* LeoGlobals.printObj
263 def printObj(self, obj, tag=None):
264 """Simplified version of g.printObj."""
265 print(self.objToString(obj, tag))
266 #@+node:ekr.20191226190131.1: *3* LeoGlobals.splitLines
267 def splitLines(self, s):
268 """Split s into lines, preserving the number of lines and
269 the endings of all lines, including the last line."""
270 # g.stat()
271 if s:
272 return s.splitlines(True)
273 # This is a Python string function!
274 return []
275 #@+node:ekr.20191226190844.1: *3* LeoGlobals.toEncodedString
276 def toEncodedString(self, s, encoding='utf-8'):
277 """Convert unicode string to an encoded string."""
278 if not isinstance(s, str):
279 return s
280 try:
281 s = s.encode(encoding, "strict")
282 except UnicodeError:
283 s = s.encode(encoding, "replace")
284 print(f"toEncodedString: Error converting {s!r} to {encoding}")
285 return s
286 #@+node:ekr.20191226190006.1: *3* LeoGlobals.toUnicode
287 def toUnicode(self, s, encoding='utf-8'):
288 """Convert bytes to unicode if necessary."""
289 tag = 'g.toUnicode'
290 if isinstance(s, str):
291 return s
292 if not isinstance(s, bytes):
293 print(f"{tag}: bad s: {s!r}")
294 return ''
295 b: bytes = s
296 try:
297 s2 = b.decode(encoding, 'strict')
298 except(UnicodeDecodeError, UnicodeError):
299 s2 = b.decode(encoding, 'replace')
300 print(f"{tag}: unicode error. encoding: {encoding!r}, s2:\n{s2!r}")
301 g.trace(g.callers())
302 except Exception:
303 g.es_exception()
304 print(f"{tag}: unexpected error! encoding: {encoding!r}, s2:\n{s2!r}")
305 g.trace(g.callers())
306 return s2
307 #@+node:ekr.20191226175436.1: *3* LeoGlobals.trace
308 def trace(self, *args):
309 """Print a tracing message."""
310 # Compute the caller name.
311 try:
312 f1 = sys._getframe(1)
313 code1 = f1.f_code
314 name = code1.co_name
315 except Exception:
316 name = ''
317 print(f"{name}: {' '.join(str(z) for z in args)}")
318 #@+node:ekr.20191226190241.1: *3* LeoGlobals.truncate
319 def truncate(self, s, n):
320 """Return s truncated to n characters."""
321 if len(s) <= n:
322 return s
323 s2 = s[: n - 3] + f"...({len(s)})"
324 return s2 + '\n' if s.endswith('\n') else s2
325 #@-others
326#@+node:ekr.20200702114522.1: ** leoAst.py: top-level commands
327#@+node:ekr.20200702114557.1: *3* command: fstringify_command
328def fstringify_command(files):
329 """
330 Entry point for --fstringify.
332 Fstringify the given file, overwriting the file.
333 """
334 for filename in files: # pragma: no cover
335 if os.path.exists(filename):
336 print(f"fstringify {filename}")
337 Fstringify().fstringify_file_silent(filename)
338 else:
339 print(f"file not found: {filename}")
340#@+node:ekr.20200702121222.1: *3* command: fstringify_diff_command
341def fstringify_diff_command(files):
342 """
343 Entry point for --fstringify-diff.
345 Print the diff that would be produced by fstringify.
346 """
347 for filename in files: # pragma: no cover
348 if os.path.exists(filename):
349 print(f"fstringify-diff {filename}")
350 Fstringify().fstringify_file_diff(filename)
351 else:
352 print(f"file not found: {filename}")
353#@+node:ekr.20200702115002.1: *3* command: orange_command
354def orange_command(files):
356 for filename in files: # pragma: no cover
357 if os.path.exists(filename):
358 print(f"orange {filename}")
359 Orange().beautify_file(filename)
360 else:
361 print(f"file not found: {filename}")
362#@+node:ekr.20200702121315.1: *3* command: orange_diff_command
363def orange_diff_command(files):
365 for filename in files: # pragma: no cover
366 if os.path.exists(filename):
367 print(f"orange-diff {filename}")
368 Orange().beautify_file_diff(filename)
369 else:
370 print(f"file not found: {filename}")
371#@+node:ekr.20160521104628.1: ** leoAst.py: top-level utils
372if 1: # pragma: no cover
373 #@+others
374 #@+node:ekr.20200702102239.1: *3* function: main (leoAst.py)
375 def main():
376 """Run commands specified by sys.argv."""
377 description = textwrap.dedent("""\
378 leo-editor/leo/unittests/core/test_leoAst.py contains unit tests (100% coverage).
379 """)
380 parser = argparse.ArgumentParser(description=description, formatter_class=argparse.RawTextHelpFormatter)
381 parser.add_argument('PATHS', nargs='*', help='directory or list of files')
382 group = parser.add_mutually_exclusive_group(required=False) # Don't require any args.
383 add = group.add_argument
384 add('--fstringify', dest='f', action='store_true', help='leonine fstringify')
385 add('--fstringify-diff', dest='fd', action='store_true', help='show fstringify diff')
386 add('--orange', dest='o', action='store_true', help='leonine Black')
387 add('--orange-diff', dest='od', action='store_true', help='show orange diff')
388 args = parser.parse_args()
389 files = args.PATHS
390 if len(files) == 1 and os.path.isdir(files[0]):
391 files = glob.glob(f"{files[0]}{os.sep}*.py")
392 if args.f:
393 fstringify_command(files)
394 if args.fd:
395 fstringify_diff_command(files)
396 if args.o:
397 orange_command(files)
398 if args.od:
399 orange_diff_command(files)
400 #@+node:ekr.20200107114409.1: *3* functions: reading & writing files
401 #@+node:ekr.20200218071822.1: *4* function: regularize_nls
402 def regularize_nls(s):
403 """Regularize newlines within s."""
404 return s.replace('\r\n', '\n').replace('\r', '\n')
405 #@+node:ekr.20200106171502.1: *4* function: get_encoding_directive
406 encoding_pattern = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')
407 # This is the pattern in PEP 263.
409 def get_encoding_directive(bb):
410 """
411 Get the encoding from the encoding directive at the start of a file.
413 bb: The bytes of the file.
415 Returns the codec name, or 'UTF-8'.
417 Adapted from pyzo. Copyright 2008 to 2020 by Almar Klein.
418 """
419 for line in bb.split(b'\n', 2)[:2]:
420 # Try to make line a string
421 try:
422 line2 = line.decode('ASCII').strip()
423 except Exception:
424 continue
425 # Does the line match the PEP 263 pattern?
426 m = encoding_pattern.match(line2)
427 if not m:
428 continue
429 # Is it a known encoding? Correct the name if it is.
430 try:
431 c = codecs.lookup(m.group(1))
432 return c.name
433 except Exception:
434 pass
435 return 'UTF-8'
436 #@+node:ekr.20200103113417.1: *4* function: read_file
437 def read_file(filename, encoding='utf-8'):
438 """
439 Return the contents of the file with the given name.
440 Print an error message and return None on error.
441 """
442 tag = 'read_file'
443 try:
444 # Translate all newlines to '\n'.
445 with open(filename, 'r', encoding=encoding) as f:
446 s = f.read()
447 return regularize_nls(s)
448 except Exception:
449 print(f"{tag}: can not read {filename}")
450 return None
451 #@+node:ekr.20200106173430.1: *4* function: read_file_with_encoding
452 def read_file_with_encoding(filename):
453 """
454 Read the file with the given name, returning (e, s), where:
456 s is the string, converted to unicode, or '' if there was an error.
458 e is the encoding of s, computed in the following order:
460 - The BOM encoding if the file starts with a BOM mark.
461 - The encoding given in the # -*- coding: utf-8 -*- line.
462 - The encoding given by the 'encoding' keyword arg.
463 - 'utf-8'.
464 """
465 # First, read the file.
466 tag = 'read_with_encoding'
467 try:
468 with open(filename, 'rb') as f:
469 bb = f.read()
470 except Exception:
471 print(f"{tag}: can not read {filename}")
472 if not bb:
473 return 'UTF-8', ''
474 # Look for the BOM.
475 e, bb = strip_BOM(bb)
476 if not e:
477 # Python's encoding comments override everything else.
478 e = get_encoding_directive(bb)
479 s = g.toUnicode(bb, encoding=e)
480 s = regularize_nls(s)
481 return e, s
482 #@+node:ekr.20200106174158.1: *4* function: strip_BOM
483 def strip_BOM(bb):
484 """
485 bb must be the bytes contents of a file.
487 If bb starts with a BOM (Byte Order Mark), return (e, bb2), where:
489 - e is the encoding implied by the BOM.
490 - bb2 is bb, stripped of the BOM.
492 If there is no BOM, return (None, bb)
493 """
494 assert isinstance(bb, bytes), bb.__class__.__name__
495 table = (
496 # Test longer bom's first.
497 (4, 'utf-32', codecs.BOM_UTF32_BE),
498 (4, 'utf-32', codecs.BOM_UTF32_LE),
499 (3, 'utf-8', codecs.BOM_UTF8),
500 (2, 'utf-16', codecs.BOM_UTF16_BE),
501 (2, 'utf-16', codecs.BOM_UTF16_LE),
502 )
503 for n, e, bom in table:
504 assert len(bom) == n
505 if bom == bb[: len(bom)]:
506 return e, bb[len(bom) :]
507 return None, bb
508 #@+node:ekr.20200103163100.1: *4* function: write_file
509 def write_file(filename, s, encoding='utf-8'):
510 """
511 Write the string s to the file whose name is given.
513 Handle all exeptions.
515 Before calling this function, the caller should ensure
516 that the file actually has been changed.
517 """
518 try:
519 # Write the file with platform-dependent newlines.
520 with open(filename, 'w', encoding=encoding) as f:
521 f.write(s)
522 except Exception as e:
523 g.trace(f"Error writing {filename}\n{e}")
524 #@+node:ekr.20200113154120.1: *3* functions: tokens
525 #@+node:ekr.20191223093539.1: *4* function: find_anchor_token
526 def find_anchor_token(node, global_token_list):
527 """
528 Return the anchor_token for node, a token such that token.node == node.
530 The search starts at node, and then all the usual child nodes.
531 """
533 node1 = node
535 def anchor_token(node):
536 """Return the anchor token in node.token_list"""
537 # Careful: some tokens in the token list may have been killed.
538 for token in get_node_token_list(node, global_token_list):
539 if is_ancestor(node1, token):
540 return token
541 return None
543 # This table only has to cover fields for ast.Nodes that
544 # won't have any associated token.
546 fields = (
547 # Common...
548 'elt', 'elts', 'body', 'value',
549 # Less common...
550 'dims', 'ifs', 'names', 's',
551 'test', 'values', 'targets',
552 )
553 while node:
554 # First, try the node itself.
555 token = anchor_token(node)
556 if token:
557 return token
558 # Second, try the most common nodes w/o token_lists:
559 if isinstance(node, ast.Call):
560 node = node.func
561 elif isinstance(node, ast.Tuple):
562 node = node.elts # type:ignore
563 # Finally, try all other nodes.
564 else:
565 # This will be used rarely.
566 for field in fields:
567 node = getattr(node, field, None)
568 if node:
569 token = anchor_token(node)
570 if token:
571 return token
572 else:
573 break
574 return None
575 #@+node:ekr.20191231160225.1: *4* function: find_paren_token (changed signature)
576 def find_paren_token(i, global_token_list):
577 """Return i of the next paren token, starting at tokens[i]."""
578 while i < len(global_token_list):
579 token = global_token_list[i]
580 if token.kind == 'op' and token.value in '()':
581 return i
582 if is_significant_token(token):
583 break
584 i += 1
585 return None
586 #@+node:ekr.20200113110505.4: *4* function: get_node_tokens_list
587 def get_node_token_list(node, global_tokens_list):
588 """
589 tokens_list must be the global tokens list.
590 Return the tokens assigned to the node, or [].
591 """
592 i = getattr(node, 'first_i', None)
593 j = getattr(node, 'last_i', None)
594 return [] if i is None else global_tokens_list[i : j + 1]
595 #@+node:ekr.20191124123830.1: *4* function: is_significant & is_significant_token
596 def is_significant(kind, value):
597 """
598 Return True if (kind, value) represent a token that can be used for
599 syncing generated tokens with the token list.
600 """
601 # Making 'endmarker' significant ensures that all tokens are synced.
602 return (
603 kind in ('async', 'await', 'endmarker', 'name', 'number', 'string') or
604 kind == 'op' and value not in ',;()')
606 def is_significant_token(token):
607 """Return True if the given token is a syncronizing token"""
608 return is_significant(token.kind, token.value)
609 #@+node:ekr.20191224093336.1: *4* function: match_parens
610 def match_parens(filename, i, j, tokens):
611 """Match parens in tokens[i:j]. Return the new j."""
612 if j >= len(tokens):
613 return len(tokens)
614 # Calculate paren level...
615 level = 0
616 for n in range(i, j + 1):
617 token = tokens[n]
618 if token.kind == 'op' and token.value == '(':
619 level += 1
620 if token.kind == 'op' and token.value == ')':
621 if level == 0:
622 break
623 level -= 1
624 # Find matching ')' tokens *after* j.
625 if level > 0:
626 while level > 0 and j + 1 < len(tokens):
627 token = tokens[j + 1]
628 if token.kind == 'op' and token.value == ')':
629 level -= 1
630 elif token.kind == 'op' and token.value == '(':
631 level += 1
632 elif is_significant_token(token):
633 break
634 j += 1
635 if level != 0: # pragma: no cover.
636 line_n = tokens[i].line_number
637 raise AssignLinksError(
638 f"\n"
639 f"Unmatched parens: level={level}\n"
640 f" file: {filename}\n"
641 f" line: {line_n}\n")
642 return j
643 #@+node:ekr.20191223053324.1: *4* function: tokens_for_node
644 def tokens_for_node(filename, node, global_token_list):
645 """Return the list of all tokens descending from node."""
646 # Find any token descending from node.
647 token = find_anchor_token(node, global_token_list)
648 if not token:
649 if 0: # A good trace for debugging.
650 print('')
651 g.trace('===== no tokens', node.__class__.__name__)
652 return []
653 assert is_ancestor(node, token)
654 # Scan backward.
655 i = first_i = token.index
656 while i >= 0:
657 token2 = global_token_list[i - 1]
658 if getattr(token2, 'node', None):
659 if is_ancestor(node, token2):
660 first_i = i - 1
661 else:
662 break
663 i -= 1
664 # Scan forward.
665 j = last_j = token.index
666 while j + 1 < len(global_token_list):
667 token2 = global_token_list[j + 1]
668 if getattr(token2, 'node', None):
669 if is_ancestor(node, token2):
670 last_j = j + 1
671 else:
672 break
673 j += 1
674 last_j = match_parens(filename, first_i, last_j, global_token_list)
675 results = global_token_list[first_i : last_j + 1]
676 return results
677 #@+node:ekr.20200101030236.1: *4* function: tokens_to_string
678 def tokens_to_string(tokens):
679 """Return the string represented by the list of tokens."""
680 if tokens is None:
681 # This indicates an internal error.
682 print('')
683 g.trace('===== token list is None ===== ')
684 print('')
685 return ''
686 return ''.join([z.to_string() for z in tokens])
687 #@+node:ekr.20191231072039.1: *3* functions: utils...
688 # General utility functions on tokens and nodes.
689 #@+node:ekr.20191119085222.1: *4* function: obj_id
690 def obj_id(obj):
691 """Return the last four digits of id(obj), for dumps & traces."""
692 return str(id(obj))[-4:]
693 #@+node:ekr.20191231060700.1: *4* function: op_name
694 #@@nobeautify
696 # https://docs.python.org/3/library/ast.html
698 _op_names = {
699 # Binary operators.
700 'Add': '+',
701 'BitAnd': '&',
702 'BitOr': '|',
703 'BitXor': '^',
704 'Div': '/',
705 'FloorDiv': '//',
706 'LShift': '<<',
707 'MatMult': '@', # Python 3.5.
708 'Mod': '%',
709 'Mult': '*',
710 'Pow': '**',
711 'RShift': '>>',
712 'Sub': '-',
713 # Boolean operators.
714 'And': ' and ',
715 'Or': ' or ',
716 # Comparison operators
717 'Eq': '==',
718 'Gt': '>',
719 'GtE': '>=',
720 'In': ' in ',
721 'Is': ' is ',
722 'IsNot': ' is not ',
723 'Lt': '<',
724 'LtE': '<=',
725 'NotEq': '!=',
726 'NotIn': ' not in ',
727 # Context operators.
728 'AugLoad': '<AugLoad>',
729 'AugStore': '<AugStore>',
730 'Del': '<Del>',
731 'Load': '<Load>',
732 'Param': '<Param>',
733 'Store': '<Store>',
734 # Unary operators.
735 'Invert': '~',
736 'Not': ' not ',
737 'UAdd': '+',
738 'USub': '-',
739 }
741 def op_name(node):
742 """Return the print name of an operator node."""
743 class_name = node.__class__.__name__
744 assert class_name in _op_names, repr(class_name)
745 return _op_names[class_name].strip()
746 #@+node:ekr.20200107114452.1: *3* node/token creators...
747 #@+node:ekr.20200103082049.1: *4* function: make_tokens
748 def make_tokens(contents):
749 """
750 Return a list (not a generator) of Token objects corresponding to the
751 list of 5-tuples generated by tokenize.tokenize.
753 Perform consistency checks and handle all exeptions.
754 """
756 def check(contents, tokens):
757 result = tokens_to_string(tokens)
758 ok = result == contents
759 if not ok:
760 print('\nRound-trip check FAILS')
761 print('Contents...\n')
762 g.printObj(contents)
763 print('\nResult...\n')
764 g.printObj(result)
765 return ok
767 try:
768 five_tuples = tokenize.tokenize(
769 io.BytesIO(contents.encode('utf-8')).readline)
770 except Exception:
771 print('make_tokens: exception in tokenize.tokenize')
772 g.es_exception()
773 return None
774 tokens = Tokenizer().create_input_tokens(contents, five_tuples)
775 assert check(contents, tokens)
776 return tokens
777 #@+node:ekr.20191027075648.1: *4* function: parse_ast
778 def parse_ast(s):
779 """
780 Parse string s, catching & reporting all exceptions.
781 Return the ast node, or None.
782 """
784 def oops(message):
785 print('')
786 print(f"parse_ast: {message}")
787 g.printObj(s)
788 print('')
790 try:
791 s1 = g.toEncodedString(s)
792 tree = ast.parse(s1, filename='before', mode='exec')
793 return tree
794 except IndentationError:
795 oops('Indentation Error')
796 except SyntaxError:
797 oops('Syntax Error')
798 except Exception:
799 oops('Unexpected Exception')
800 g.es_exception()
801 return None
802 #@+node:ekr.20191231110051.1: *3* node/token dumpers...
803 #@+node:ekr.20191027074436.1: *4* function: dump_ast
804 def dump_ast(ast, tag='dump_ast'):
805 """Utility to dump an ast tree."""
806 g.printObj(AstDumper().dump_ast(ast), tag=tag)
807 #@+node:ekr.20191228095945.4: *4* function: dump_contents
808 def dump_contents(contents, tag='Contents'):
809 print('')
810 print(f"{tag}...\n")
811 for i, z in enumerate(g.splitLines(contents)):
812 print(f"{i+1:<3} ", z.rstrip())
813 print('')
814 #@+node:ekr.20191228095945.5: *4* function: dump_lines
815 def dump_lines(tokens, tag='Token lines'):
816 print('')
817 print(f"{tag}...\n")
818 for z in tokens:
819 if z.line.strip():
820 print(z.line.rstrip())
821 else:
822 print(repr(z.line))
823 print('')
824 #@+node:ekr.20191228095945.7: *4* function: dump_results
825 def dump_results(tokens, tag='Results'):
826 print('')
827 print(f"{tag}...\n")
828 print(tokens_to_string(tokens))
829 print('')
830 #@+node:ekr.20191228095945.8: *4* function: dump_tokens
831 def dump_tokens(tokens, tag='Tokens'):
832 print('')
833 print(f"{tag}...\n")
834 if not tokens:
835 return
836 print("Note: values shown are repr(value) *except* for 'string' tokens.")
837 tokens[0].dump_header()
838 for i, z in enumerate(tokens):
839 # Confusing.
840 # if (i % 20) == 0: z.dump_header()
841 print(z.dump())
842 print('')
843 #@+node:ekr.20191228095945.9: *4* function: dump_tree
844 def dump_tree(tokens, tree, tag='Tree'):
845 print('')
846 print(f"{tag}...\n")
847 print(AstDumper().dump_tree(tokens, tree))
848 #@+node:ekr.20200107040729.1: *4* function: show_diffs
849 def show_diffs(s1, s2, filename=''):
850 """Print diffs between strings s1 and s2."""
851 lines = list(difflib.unified_diff(
852 g.splitLines(s1),
853 g.splitLines(s2),
854 fromfile=f"Old {filename}",
855 tofile=f"New {filename}",
856 ))
857 print('')
858 tag = f"Diffs for {filename}" if filename else 'Diffs'
859 g.printObj(lines, tag=tag)
860 #@+node:ekr.20191223095408.1: *3* node/token nodes...
861 # Functions that associate tokens with nodes.
862 #@+node:ekr.20200120082031.1: *4* function: find_statement_node
863 def find_statement_node(node):
864 """
865 Return the nearest statement node.
866 Return None if node has only Module for a parent.
867 """
868 if isinstance(node, ast.Module):
869 return None
870 parent = node
871 while parent:
872 if is_statement_node(parent):
873 return parent
874 parent = parent.parent
875 return None
876 #@+node:ekr.20191223054300.1: *4* function: is_ancestor
877 def is_ancestor(node, token):
878 """Return True if node is an ancestor of token."""
879 t_node = token.node
880 if not t_node:
881 assert token.kind == 'killed', repr(token)
882 return False
883 while t_node:
884 if t_node == node:
885 return True
886 t_node = t_node.parent
887 return False
888 #@+node:ekr.20200120082300.1: *4* function: is_long_statement
889 def is_long_statement(node):
890 """
891 Return True if node is an instance of a node that might be split into
892 shorter lines.
893 """
894 return isinstance(node, (
895 ast.Assign, ast.AnnAssign, ast.AsyncFor, ast.AsyncWith, ast.AugAssign,
896 ast.Call, ast.Delete, ast.ExceptHandler, ast.For, ast.Global,
897 ast.If, ast.Import, ast.ImportFrom,
898 ast.Nonlocal, ast.Return, ast.While, ast.With, ast.Yield, ast.YieldFrom))
899 #@+node:ekr.20200120110005.1: *4* function: is_statement_node
900 def is_statement_node(node):
901 """Return True if node is a top-level statement."""
902 return is_long_statement(node) or isinstance(node, (
903 ast.Break, ast.Continue, ast.Pass, ast.Try))
904 #@+node:ekr.20191231082137.1: *4* function: nearest_common_ancestor
905 def nearest_common_ancestor(node1, node2):
906 """
907 Return the nearest common ancestor node for the given nodes.
909 The nodes must have parent links.
910 """
912 def parents(node):
913 aList = []
914 while node:
915 aList.append(node)
916 node = node.parent
917 return list(reversed(aList))
919 result = None
920 parents1 = parents(node1)
921 parents2 = parents(node2)
922 while parents1 and parents2:
923 parent1 = parents1.pop(0)
924 parent2 = parents2.pop(0)
925 if parent1 == parent2:
926 result = parent1
927 else:
928 break
929 return result
930 #@+node:ekr.20191225061516.1: *3* node/token replacers...
931 # Functions that replace tokens or nodes.
932 #@+node:ekr.20191231162249.1: *4* function: add_token_to_token_list
933 def add_token_to_token_list(token, node):
934 """Insert token in the proper location of node.token_list."""
935 if getattr(node, 'first_i', None) is None:
936 node.first_i = node.last_i = token.index
937 else:
938 node.first_i = min(node.first_i, token.index)
939 node.last_i = max(node.last_i, token.index)
940 #@+node:ekr.20191225055616.1: *4* function: replace_node
941 def replace_node(new_node, old_node):
942 """Replace new_node by old_node in the parse tree."""
943 parent = old_node.parent
944 new_node.parent = parent
945 new_node.node_index = old_node.node_index
946 children = parent.children
947 i = children.index(old_node)
948 children[i] = new_node
949 fields = getattr(old_node, '_fields', None)
950 if fields:
951 for field in fields:
952 field = getattr(old_node, field)
953 if field == old_node:
954 setattr(old_node, field, new_node)
955 break
956 #@+node:ekr.20191225055626.1: *4* function: replace_token
957 def replace_token(token, kind, value):
958 """Replace kind and value of the given token."""
959 if token.kind in ('endmarker', 'killed'):
960 return
961 token.kind = kind
962 token.value = value
963 token.node = None # Should be filled later.
964 #@-others
965#@+node:ekr.20191027072910.1: ** Exception classes
966class AssignLinksError(Exception):
967 """Assigning links to ast nodes failed."""
970class AstNotEqual(Exception):
971 """The two given AST's are not equivalent."""
974class FailFast(Exception):
975 """Abort tests in TestRunner class."""
976#@+node:ekr.20141012064706.18390: ** class AstDumper
977class AstDumper: # pragma: no cover
978 """A class supporting various kinds of dumps of ast nodes."""
979 #@+others
980 #@+node:ekr.20191112033445.1: *3* dumper.dump_tree & helper
981 def dump_tree(self, tokens, tree):
982 """Briefly show a tree, properly indented."""
983 self.tokens = tokens
984 result = [self.show_header()]
985 self.dump_tree_and_links_helper(tree, 0, result)
986 return ''.join(result)
987 #@+node:ekr.20191125035321.1: *4* dumper.dump_tree_and_links_helper
988 def dump_tree_and_links_helper(self, node, level, result):
989 """Return the list of lines in result."""
990 if node is None:
991 return
992 # Let block.
993 indent = ' ' * 2 * level
994 children: List[ast.AST] = getattr(node, 'children', [])
995 node_s = self.compute_node_string(node, level)
996 # Dump...
997 if isinstance(node, (list, tuple)):
998 for z in node:
999 self.dump_tree_and_links_helper(z, level, result)
1000 elif isinstance(node, str):
1001 result.append(f"{indent}{node.__class__.__name__:>8}:{node}\n")
1002 elif isinstance(node, ast.AST):
1003 # Node and parent.
1004 result.append(node_s)
1005 # Children.
1006 for z in children:
1007 self.dump_tree_and_links_helper(z, level + 1, result)
1008 else:
1009 result.append(node_s)
1010 #@+node:ekr.20191125035600.1: *3* dumper.compute_node_string & helpers
1011 def compute_node_string(self, node, level):
1012 """Return a string summarizing the node."""
1013 indent = ' ' * 2 * level
1014 parent = getattr(node, 'parent', None)
1015 node_id = getattr(node, 'node_index', '??')
1016 parent_id = getattr(parent, 'node_index', '??')
1017 parent_s = f"{parent_id:>3}.{parent.__class__.__name__} " if parent else ''
1018 class_name = node.__class__.__name__
1019 descriptor_s = f"{node_id}.{class_name}: " + self.show_fields(
1020 class_name, node, 30)
1021 tokens_s = self.show_tokens(node, 70, 100)
1022 lines = self.show_line_range(node)
1023 full_s1 = f"{parent_s:<16} {lines:<10} {indent}{descriptor_s} "
1024 node_s = f"{full_s1:<62} {tokens_s}\n"
1025 return node_s
1026 #@+node:ekr.20191113223424.1: *4* dumper.show_fields
1027 def show_fields(self, class_name, node, truncate_n):
1028 """Return a string showing interesting fields of the node."""
1029 val = ''
1030 if class_name == 'JoinedStr':
1031 values = node.values
1032 assert isinstance(values, list)
1033 # Str tokens may represent *concatenated* strings.
1034 results = []
1035 fstrings, strings = 0, 0
1036 for z in values:
1037 assert isinstance(z, (ast.FormattedValue, ast.Str))
1038 if isinstance(z, ast.Str):
1039 results.append(z.s)
1040 strings += 1
1041 else:
1042 results.append(z.__class__.__name__)
1043 fstrings += 1
1044 val = f"{strings} str, {fstrings} f-str"
1045 elif class_name == 'keyword':
1046 if isinstance(node.value, ast.Str):
1047 val = f"arg={node.arg}..Str.value.s={node.value.s}"
1048 elif isinstance(node.value, ast.Name):
1049 val = f"arg={node.arg}..Name.value.id={node.value.id}"
1050 else:
1051 val = f"arg={node.arg}..value={node.value.__class__.__name__}"
1052 elif class_name == 'Name':
1053 val = f"id={node.id!r}"
1054 elif class_name == 'NameConstant':
1055 val = f"value={node.value!r}"
1056 elif class_name == 'Num':
1057 val = f"n={node.n}"
1058 elif class_name == 'Starred':
1059 if isinstance(node.value, ast.Str):
1060 val = f"s={node.value.s}"
1061 elif isinstance(node.value, ast.Name):
1062 val = f"id={node.value.id}"
1063 else:
1064 val = f"s={node.value.__class__.__name__}"
1065 elif class_name == 'Str':
1066 val = f"s={node.s!r}"
1067 elif class_name in ('AugAssign', 'BinOp', 'BoolOp', 'UnaryOp'): # IfExp
1068 name = node.op.__class__.__name__
1069 val = f"op={_op_names.get(name, name)}"
1070 elif class_name == 'Compare':
1071 ops = ','.join([op_name(z) for z in node.ops])
1072 val = f"ops='{ops}'"
1073 else:
1074 val = ''
1075 return g.truncate(val, truncate_n)
1076 #@+node:ekr.20191114054726.1: *4* dumper.show_line_range
1077 def show_line_range(self, node):
1079 token_list = get_node_token_list(node, self.tokens)
1080 if not token_list:
1081 return ''
1082 min_ = min([z.line_number for z in token_list])
1083 max_ = max([z.line_number for z in token_list])
1084 return f"{min_}" if min_ == max_ else f"{min_}..{max_}"
1085 #@+node:ekr.20191113223425.1: *4* dumper.show_tokens
1086 def show_tokens(self, node, n, m, show_cruft=False):
1087 """
1088 Return a string showing node.token_list.
1090 Split the result if n + len(result) > m
1091 """
1092 token_list = get_node_token_list(node, self.tokens)
1093 result = []
1094 for z in token_list:
1095 val = None
1096 if z.kind == 'comment':
1097 if show_cruft:
1098 val = g.truncate(z.value, 10) # Short is good.
1099 result.append(f"{z.kind}.{z.index}({val})")
1100 elif z.kind == 'name':
1101 val = g.truncate(z.value, 20)
1102 result.append(f"{z.kind}.{z.index}({val})")
1103 elif z.kind == 'newline':
1104 # result.append(f"{z.kind}.{z.index}({z.line_number}:{len(z.line)})")
1105 result.append(f"{z.kind}.{z.index}")
1106 elif z.kind == 'number':
1107 result.append(f"{z.kind}.{z.index}({z.value})")
1108 elif z.kind == 'op':
1109 if z.value not in ',()' or show_cruft:
1110 result.append(f"{z.kind}.{z.index}({z.value})")
1111 elif z.kind == 'string':
1112 val = g.truncate(z.value, 30)
1113 result.append(f"{z.kind}.{z.index}({val})")
1114 elif z.kind == 'ws':
1115 if show_cruft:
1116 result.append(f"{z.kind}.{z.index}({len(z.value)})")
1117 else:
1118 # Indent, dedent, encoding, etc.
1119 # Don't put a blank.
1120 continue
1121 if result and result[-1] != ' ':
1122 result.append(' ')
1123 #
1124 # split the line if it is too long.
1125 # g.printObj(result, tag='show_tokens')
1126 if 1:
1127 return ''.join(result)
1128 line, lines = [], []
1129 for r in result:
1130 line.append(r)
1131 if n + len(''.join(line)) >= m:
1132 lines.append(''.join(line))
1133 line = []
1134 lines.append(''.join(line))
1135 pad = '\n' + ' ' * n
1136 return pad.join(lines)
1137 #@+node:ekr.20191110165235.5: *3* dumper.show_header
1138 def show_header(self):
1139 """Return a header string, but only the fist time."""
1140 return (
1141 f"{'parent':<16} {'lines':<10} {'node':<34} {'tokens'}\n"
1142 f"{'======':<16} {'=====':<10} {'====':<34} {'======'}\n")
1143 #@+node:ekr.20141012064706.18392: *3* dumper.dump_ast & helper
1144 annotate_fields = False
1145 include_attributes = False
1146 indent_ws = ' '
1148 def dump_ast(self, node, level=0):
1149 """
1150 Dump an ast tree. Adapted from ast.dump.
1151 """
1152 sep1 = '\n%s' % (self.indent_ws * (level + 1))
1153 if isinstance(node, ast.AST):
1154 fields = [(a, self.dump_ast(b, level + 1)) for a, b in self.get_fields(node)]
1155 if self.include_attributes and node._attributes:
1156 fields.extend([(a, self.dump_ast(getattr(node, a), level + 1))
1157 for a in node._attributes])
1158 if self.annotate_fields:
1159 aList = ['%s=%s' % (a, b) for a, b in fields]
1160 else:
1161 aList = [b for a, b in fields]
1162 name = node.__class__.__name__
1163 sep = '' if len(aList) <= 1 else sep1
1164 return '%s(%s%s)' % (name, sep, sep1.join(aList))
1165 if isinstance(node, list):
1166 sep = sep1
1167 return 'LIST[%s]' % ''.join(
1168 ['%s%s' % (sep, self.dump_ast(z, level + 1)) for z in node])
1169 return repr(node)
1170 #@+node:ekr.20141012064706.18393: *4* dumper.get_fields
1171 def get_fields(self, node):
1173 return (
1174 (a, b) for a, b in ast.iter_fields(node)
1175 if a not in ['ctx',] and b not in (None, [])
1176 )
1177 #@-others
1178#@+node:ekr.20191227170628.1: ** TOG classes...
1179#@+node:ekr.20191113063144.1: *3* class TokenOrderGenerator
1180class TokenOrderGenerator:
1181 """
1182 A class that traverses ast (parse) trees in token order.
1184 Overview: https://github.com/leo-editor/leo-editor/issues/1440#issue-522090981
1186 Theory of operation:
1187 - https://github.com/leo-editor/leo-editor/issues/1440#issuecomment-573661883
1188 - http://leoeditor.com/appendices.html#tokenorder-classes-theory-of-operation
1190 How to: http://leoeditor.com/appendices.html#tokenorder-class-how-to
1192 Project history: https://github.com/leo-editor/leo-editor/issues/1440#issuecomment-574145510
1193 """
1195 n_nodes = 0 # The number of nodes that have been visited.
1196 #@+others
1197 #@+node:ekr.20200103174914.1: *4* tog: Init...
1198 #@+node:ekr.20191228184647.1: *5* tog.balance_tokens
1199 def balance_tokens(self, tokens):
1200 """
1201 TOG.balance_tokens.
1203 Insert two-way links between matching paren tokens.
1204 """
1205 count, stack = 0, []
1206 for token in tokens:
1207 if token.kind == 'op':
1208 if token.value == '(':
1209 count += 1
1210 stack.append(token.index)
1211 if token.value == ')':
1212 if stack:
1213 index = stack.pop()
1214 tokens[index].matching_paren = token.index
1215 tokens[token.index].matching_paren = index
1216 else:
1217 g.trace(f"unmatched ')' at index {token.index}")
1218 # g.trace(f"tokens: {len(tokens)} matched parens: {count}")
1219 if stack:
1220 g.trace("unmatched '(' at {','.join(stack)}")
1221 return count
1222 #@+node:ekr.20191113063144.4: *5* tog.create_links
1223 def create_links(self, tokens, tree, file_name=''):
1224 """
1225 A generator creates two-way links between the given tokens and ast-tree.
1227 Callers should call this generator with list(tog.create_links(...))
1229 The sync_tokens method creates the links and verifies that the resulting
1230 tree traversal generates exactly the given tokens in exact order.
1232 tokens: the list of Token instances for the input.
1233 Created by make_tokens().
1234 tree: the ast tree for the input.
1235 Created by parse_ast().
1236 """
1237 #
1238 # Init all ivars.
1239 self.file_name = file_name
1240 # For tests.
1241 self.level = 0
1242 # Python indentation level.
1243 self.node = None
1244 # The node being visited.
1245 # The parent of the about-to-be visited node.
1246 self.tokens = tokens
1247 # The immutable list of input tokens.
1248 self.tree = tree
1249 # The tree of ast.AST nodes.
1250 #
1251 # Traverse the tree.
1252 try:
1253 while True:
1254 next(self.visitor(tree))
1255 except StopIteration:
1256 pass
1257 #
1258 # Ensure that all tokens are patched.
1259 self.node = tree
1260 yield from self.gen_token('endmarker', '')
1261 #@+node:ekr.20191229071733.1: *5* tog.init_from_file
1262 def init_from_file(self, filename): # pragma: no cover
1263 """
1264 Create the tokens and ast tree for the given file.
1265 Create links between tokens and the parse tree.
1266 Return (contents, encoding, tokens, tree).
1267 """
1268 self.level = 0
1269 self.filename = filename
1270 encoding, contents = read_file_with_encoding(filename)
1271 if not contents:
1272 return None, None, None, None
1273 self.tokens = tokens = make_tokens(contents)
1274 self.tree = tree = parse_ast(contents)
1275 list(self.create_links(tokens, tree))
1276 return contents, encoding, tokens, tree
1277 #@+node:ekr.20191229071746.1: *5* tog.init_from_string
1278 def init_from_string(self, contents, filename): # pragma: no cover
1279 """
1280 Tokenize, parse and create links in the contents string.
1282 Return (tokens, tree).
1283 """
1284 self.filename = filename
1285 self.level = 0
1286 self.tokens = tokens = make_tokens(contents)
1287 self.tree = tree = parse_ast(contents)
1288 list(self.create_links(tokens, tree))
1289 return tokens, tree
1290 #@+node:ekr.20191223052749.1: *4* tog: Traversal...
1291 #@+node:ekr.20191113063144.3: *5* tog.begin_visitor
1292 begin_end_stack: List[str] = []
1293 node_index = 0 # The index into the node_stack.
1294 node_stack: List[ast.AST] = [] # The stack of parent nodes.
1296 def begin_visitor(self, node):
1297 """Enter a visitor."""
1298 # Update the stats.
1299 self.n_nodes += 1
1300 # Do this first, *before* updating self.node.
1301 node.parent = self.node
1302 if self.node:
1303 children = getattr(self.node, 'children', []) # type:ignore
1304 children.append(node)
1305 self.node.children = children
1306 # Inject the node_index field.
1307 assert not hasattr(node, 'node_index'), g.callers()
1308 node.node_index = self.node_index
1309 self.node_index += 1
1310 # begin_visitor and end_visitor must be paired.
1311 self.begin_end_stack.append(node.__class__.__name__)
1312 # Push the previous node.
1313 self.node_stack.append(self.node)
1314 # Update self.node *last*.
1315 self.node = node
1316 #@+node:ekr.20200104032811.1: *5* tog.end_visitor
1317 def end_visitor(self, node):
1318 """Leave a visitor."""
1319 # begin_visitor and end_visitor must be paired.
1320 entry_name = self.begin_end_stack.pop()
1321 assert entry_name == node.__class__.__name__, f"{entry_name!r} {node.__class__.__name__}"
1322 assert self.node == node, (repr(self.node), repr(node))
1323 # Restore self.node.
1324 self.node = self.node_stack.pop()
1325 #@+node:ekr.20200110162044.1: *5* tog.find_next_significant_token
1326 def find_next_significant_token(self):
1327 """
1328 Scan from *after* self.tokens[px] looking for the next significant
1329 token.
1331 Return the token, or None. Never change self.px.
1332 """
1333 px = self.px + 1
1334 while px < len(self.tokens):
1335 token = self.tokens[px]
1336 px += 1
1337 if is_significant_token(token):
1338 return token
1339 # This will never happen, because endtoken is significant.
1340 return None # pragma: no cover
1341 #@+node:ekr.20191121180100.1: *5* tog.gen*
1342 # Useful wrappers...
1344 def gen(self, z):
1345 yield from self.visitor(z)
1347 def gen_name(self, val):
1348 yield from self.visitor(self.sync_name(val)) # type:ignore
1350 def gen_op(self, val):
1351 yield from self.visitor(self.sync_op(val)) # type:ignore
1353 def gen_token(self, kind, val):
1354 yield from self.visitor(self.sync_token(kind, val)) # type:ignore
1355 #@+node:ekr.20191113063144.7: *5* tog.sync_token & set_links
1356 px = -1 # Index of the previously synced token.
1358 def sync_token(self, kind, val):
1359 """
1360 Sync to a token whose kind & value are given. The token need not be
1361 significant, but it must be guaranteed to exist in the token list.
1363 The checks in this method constitute a strong, ever-present, unit test.
1365 Scan the tokens *after* px, looking for a token T matching (kind, val).
1366 raise AssignLinksError if a significant token is found that doesn't match T.
1367 Otherwise:
1368 - Create two-way links between all assignable tokens between px and T.
1369 - Create two-way links between T and self.node.
1370 - Advance by updating self.px to point to T.
1371 """
1372 node, tokens = self.node, self.tokens
1373 assert isinstance(node, ast.AST), repr(node)
1374 # g.trace(
1375 # f"px: {self.px:2} "
1376 # f"node: {node.__class__.__name__:<10} "
1377 # f"kind: {kind:>10}: val: {val!r}")
1378 #
1379 # Step one: Look for token T.
1380 old_px = px = self.px + 1
1381 while px < len(self.tokens):
1382 token = tokens[px]
1383 if (kind, val) == (token.kind, token.value):
1384 break # Success.
1385 if kind == token.kind == 'number':
1386 val = token.value
1387 break # Benign: use the token's value, a string, instead of a number.
1388 if is_significant_token(token): # pragma: no cover
1389 line_s = f"line {token.line_number}:"
1390 val = str(val) # for g.truncate.
1391 raise AssignLinksError(
1392 f" file: {self.filename}\n"
1393 f"{line_s:>12} {token.line.strip()}\n"
1394 f"Looking for: {kind}.{g.truncate(val, 40)!r}\n"
1395 f" found: {token.kind}.{token.value!r}\n"
1396 f"token.index: {token.index}\n")
1397 # Skip the insignificant token.
1398 px += 1
1399 else: # pragma: no cover
1400 val = str(val) # for g.truncate.
1401 raise AssignLinksError(
1402 f" file: {self.filename}\n"
1403 f"Looking for: {kind}.{g.truncate(val, 40)}\n"
1404 f" found: end of token list")
1405 #
1406 # Step two: Assign *secondary* links only for newline tokens.
1407 # Ignore all other non-significant tokens.
1408 while old_px < px:
1409 token = tokens[old_px]
1410 old_px += 1
1411 if token.kind in ('comment', 'newline', 'nl'):
1412 self.set_links(node, token)
1413 #
1414 # Step three: Set links in the found token.
1415 token = tokens[px]
1416 self.set_links(node, token)
1417 #
1418 # Step four: Advance.
1419 self.px = px
1420 #@+node:ekr.20191125120814.1: *6* tog.set_links
1421 last_statement_node = None
1423 def set_links(self, node, token):
1424 """Make two-way links between token and the given node."""
1425 # Don't bother assigning comment, comma, parens, ws and endtoken tokens.
1426 if token.kind == 'comment':
1427 # Append the comment to node.comment_list.
1428 comment_list = getattr(node, 'comment_list', []) # type:ignore
1429 node.comment_list = comment_list + [token]
1430 return
1431 if token.kind in ('endmarker', 'ws'):
1432 return
1433 if token.kind == 'op' and token.value in ',()':
1434 return
1435 # *Always* remember the last statement.
1436 statement = find_statement_node(node)
1437 if statement:
1438 self.last_statement_node = statement # type:ignore
1439 assert not isinstance(self.last_statement_node, ast.Module)
1440 if token.node is not None: # pragma: no cover
1441 line_s = f"line {token.line_number}:"
1442 raise AssignLinksError(
1443 f" file: {self.filename}\n"
1444 f"{line_s:>12} {token.line.strip()}\n"
1445 f"token index: {self.px}\n"
1446 f"token.node is not None\n"
1447 f" token.node: {token.node.__class__.__name__}\n"
1448 f" callers: {g.callers()}")
1449 # Assign newlines to the previous statement node, if any.
1450 if token.kind in ('newline', 'nl'):
1451 # Set an *auxilliary* link for the split/join logic.
1452 # Do *not* set token.node!
1453 token.statement_node = self.last_statement_node
1454 return
1455 if is_significant_token(token):
1456 # Link the token to the ast node.
1457 token.node = node # type:ignore
1458 # Add the token to node's token_list.
1459 add_token_to_token_list(token, node)
1460 #@+node:ekr.20191124083124.1: *5* tog.sync_name and sync_op
1461 # It's valid for these to return None.
1463 def sync_name(self, val):
1464 aList = val.split('.')
1465 if len(aList) == 1:
1466 self.sync_token('name', val)
1467 else:
1468 for i, part in enumerate(aList):
1469 self.sync_token('name', part)
1470 if i < len(aList) - 1:
1471 self.sync_op('.')
1473 def sync_op(self, val):
1474 """
1475 Sync to the given operator.
1477 val may be '(' or ')' *only* if the parens *will* actually exist in the
1478 token list.
1479 """
1480 self.sync_token('op', val)
1481 #@+node:ekr.20191113081443.1: *5* tog.visitor (calls begin/end_visitor)
1482 def visitor(self, node):
1483 """Given an ast node, return a *generator* from its visitor."""
1484 # This saves a lot of tests.
1485 trace = False
1486 if node is None:
1487 return
1488 if trace:
1489 # Keep this trace. It's useful.
1490 cn = node.__class__.__name__ if node else ' '
1491 caller1, caller2 = g.callers(2).split(',')
1492 g.trace(f"{caller1:>15} {caller2:<14} {cn}")
1493 # More general, more convenient.
1494 if isinstance(node, (list, tuple)):
1495 for z in node or []:
1496 if isinstance(z, ast.AST):
1497 yield from self.visitor(z)
1498 else: # pragma: no cover
1499 # Some fields may contain ints or strings.
1500 assert isinstance(z, (int, str)), z.__class__.__name__
1501 return
1502 # We *do* want to crash if the visitor doesn't exist.
1503 method = getattr(self, 'do_' + node.__class__.__name__)
1504 # Allow begin/end visitor to be generators.
1505 self.begin_visitor(node)
1506 yield from method(node)
1507 self.end_visitor(node)
1508 #@+node:ekr.20191113063144.13: *4* tog: Visitors...
1509 #@+node:ekr.20191113063144.32: *5* tog.keyword: not called!
1510 # keyword arguments supplied to call (NULL identifier for **kwargs)
1512 # keyword = (identifier? arg, expr value)
1514 def do_keyword(self, node): # pragma: no cover
1515 """A keyword arg in an ast.Call."""
1516 # This should never be called.
1517 # tog.hande_call_arguments calls self.gen(kwarg_arg.value) instead.
1518 filename = getattr(self, 'filename', '<no file>')
1519 raise AssignLinksError(
1520 f"file: {filename}\n"
1521 f"do_keyword should never be called\n"
1522 f"{g.callers(8)}")
1523 #@+node:ekr.20191113063144.14: *5* tog: Contexts
1524 #@+node:ekr.20191113063144.28: *6* tog.arg
1525 # arg = (identifier arg, expr? annotation)
1527 def do_arg(self, node):
1528 """This is one argument of a list of ast.Function or ast.Lambda arguments."""
1529 yield from self.gen_name(node.arg)
1530 annotation = getattr(node, 'annotation', None)
1531 if annotation is not None:
1532 yield from self.gen_op(':')
1533 yield from self.gen(node.annotation)
1534 #@+node:ekr.20191113063144.27: *6* tog.arguments
1535 # arguments = (
1536 # arg* posonlyargs, arg* args, arg? vararg, arg* kwonlyargs,
1537 # expr* kw_defaults, arg? kwarg, expr* defaults
1538 # )
1540 def do_arguments(self, node):
1541 """Arguments to ast.Function or ast.Lambda, **not** ast.Call."""
1542 #
1543 # No need to generate commas anywhere below.
1544 #
1545 # Let block. Some fields may not exist pre Python 3.8.
1546 n_plain = len(node.args) - len(node.defaults)
1547 posonlyargs = getattr(node, 'posonlyargs', []) # type:ignore
1548 vararg = getattr(node, 'vararg', None)
1549 kwonlyargs = getattr(node, 'kwonlyargs', []) # type:ignore
1550 kw_defaults = getattr(node, 'kw_defaults', []) # type:ignore
1551 kwarg = getattr(node, 'kwarg', None)
1552 if 0:
1553 g.printObj(ast.dump(node.vararg) if node.vararg else 'None', tag='node.vararg')
1554 g.printObj([ast.dump(z) for z in node.args], tag='node.args')
1555 g.printObj([ast.dump(z) for z in node.defaults], tag='node.defaults')
1556 g.printObj([ast.dump(z) for z in posonlyargs], tag='node.posonlyargs')
1557 g.printObj([ast.dump(z) for z in kwonlyargs], tag='kwonlyargs')
1558 g.printObj([ast.dump(z) if z else 'None' for z in kw_defaults], tag='kw_defaults')
1559 # 1. Sync the position-only args.
1560 if posonlyargs:
1561 for n, z in enumerate(posonlyargs):
1562 # g.trace('pos-only', ast.dump(z))
1563 yield from self.gen(z)
1564 yield from self.gen_op('/')
1565 # 2. Sync all args.
1566 for i, z in enumerate(node.args):
1567 yield from self.gen(z)
1568 if i >= n_plain:
1569 yield from self.gen_op('=')
1570 yield from self.gen(node.defaults[i - n_plain])
1571 # 3. Sync the vararg.
1572 if vararg:
1573 # g.trace('vararg', ast.dump(vararg))
1574 yield from self.gen_op('*')
1575 yield from self.gen(vararg)
1576 # 4. Sync the keyword-only args.
1577 if kwonlyargs:
1578 if not vararg:
1579 yield from self.gen_op('*')
1580 for n, z in enumerate(kwonlyargs):
1581 # g.trace('keyword-only', ast.dump(z))
1582 yield from self.gen(z)
1583 val = kw_defaults[n]
1584 if val is not None:
1585 yield from self.gen_op('=')
1586 yield from self.gen(val)
1587 # 5. Sync the kwarg.
1588 if kwarg:
1589 # g.trace('kwarg', ast.dump(kwarg))
1590 yield from self.gen_op('**')
1591 yield from self.gen(kwarg)
1593 #@+node:ekr.20191113063144.15: *6* tog.AsyncFunctionDef
1594 # AsyncFunctionDef(identifier name, arguments args, stmt* body, expr* decorator_list,
1595 # expr? returns)
1597 def do_AsyncFunctionDef(self, node):
1599 if node.decorator_list:
1600 for z in node.decorator_list:
1601 # '@%s\n'
1602 yield from self.gen_op('@')
1603 yield from self.gen(z)
1604 # 'asynch def (%s): -> %s\n'
1605 # 'asynch def %s(%s):\n'
1606 async_token_type = 'async' if has_async_tokens else 'name'
1607 yield from self.gen_token(async_token_type, 'async')
1608 yield from self.gen_name('def')
1609 yield from self.gen_name(node.name) # A string
1610 yield from self.gen_op('(')
1611 yield from self.gen(node.args)
1612 yield from self.gen_op(')')
1613 returns = getattr(node, 'returns', None)
1614 if returns is not None:
1615 yield from self.gen_op('->')
1616 yield from self.gen(node.returns)
1617 yield from self.gen_op(':')
1618 self.level += 1
1619 yield from self.gen(node.body)
1620 self.level -= 1
1621 #@+node:ekr.20191113063144.16: *6* tog.ClassDef
1622 def do_ClassDef(self, node, print_body=True):
1624 for z in node.decorator_list or []:
1625 # @{z}\n
1626 yield from self.gen_op('@')
1627 yield from self.gen(z)
1628 # class name(bases):\n
1629 yield from self.gen_name('class')
1630 yield from self.gen_name(node.name) # A string.
1631 if node.bases:
1632 yield from self.gen_op('(')
1633 yield from self.gen(node.bases)
1634 yield from self.gen_op(')')
1635 yield from self.gen_op(':')
1636 # Body...
1637 self.level += 1
1638 yield from self.gen(node.body)
1639 self.level -= 1
1640 #@+node:ekr.20191113063144.17: *6* tog.FunctionDef
1641 # FunctionDef(
1642 # identifier name, arguments args,
1643 # stmt* body,
1644 # expr* decorator_list,
1645 # expr? returns,
1646 # string? type_comment)
1648 def do_FunctionDef(self, node):
1650 # Guards...
1651 returns = getattr(node, 'returns', None)
1652 # Decorators...
1653 # @{z}\n
1654 for z in node.decorator_list or []:
1655 yield from self.gen_op('@')
1656 yield from self.gen(z)
1657 # Signature...
1658 # def name(args): -> returns\n
1659 # def name(args):\n
1660 yield from self.gen_name('def')
1661 yield from self.gen_name(node.name) # A string.
1662 yield from self.gen_op('(')
1663 yield from self.gen(node.args)
1664 yield from self.gen_op(')')
1665 if returns is not None:
1666 yield from self.gen_op('->')
1667 yield from self.gen(node.returns)
1668 yield from self.gen_op(':')
1669 # Body...
1670 self.level += 1
1671 yield from self.gen(node.body)
1672 self.level -= 1
1673 #@+node:ekr.20191113063144.18: *6* tog.Interactive
1674 def do_Interactive(self, node): # pragma: no cover
1676 yield from self.gen(node.body)
1677 #@+node:ekr.20191113063144.20: *6* tog.Lambda
1678 def do_Lambda(self, node):
1680 yield from self.gen_name('lambda')
1681 yield from self.gen(node.args)
1682 yield from self.gen_op(':')
1683 yield from self.gen(node.body)
1684 #@+node:ekr.20191113063144.19: *6* tog.Module
1685 def do_Module(self, node):
1687 # Encoding is a non-syncing statement.
1688 yield from self.gen(node.body)
1689 #@+node:ekr.20191113063144.21: *5* tog: Expressions
1690 #@+node:ekr.20191113063144.22: *6* tog.Expr
1691 def do_Expr(self, node):
1692 """An outer expression."""
1693 # No need to put parentheses.
1694 yield from self.gen(node.value)
1695 #@+node:ekr.20191113063144.23: *6* tog.Expression
1696 def do_Expression(self, node): # pragma: no cover
1697 """An inner expression."""
1698 # No need to put parentheses.
1699 yield from self.gen(node.body)
1700 #@+node:ekr.20191113063144.24: *6* tog.GeneratorExp
1701 def do_GeneratorExp(self, node):
1703 # '<gen %s for %s>' % (elt, ','.join(gens))
1704 # No need to put parentheses or commas.
1705 yield from self.gen(node.elt)
1706 yield from self.gen(node.generators)
1707 #@+node:ekr.20210321171703.1: *6* tog.NamedExpr
1708 # NamedExpr(expr target, expr value)
1710 def do_NamedExpr(self, node): # Python 3.8+
1712 yield from self.gen(node.target)
1713 yield from self.gen_op(':=')
1714 yield from self.gen(node.value)
1715 #@+node:ekr.20191113063144.26: *5* tog: Operands
1716 #@+node:ekr.20191113063144.29: *6* tog.Attribute
1717 # Attribute(expr value, identifier attr, expr_context ctx)
1719 def do_Attribute(self, node):
1721 yield from self.gen(node.value)
1722 yield from self.gen_op('.')
1723 yield from self.gen_name(node.attr) # A string.
1724 #@+node:ekr.20191113063144.30: *6* tog.Bytes
1725 def do_Bytes(self, node):
1727 """
1728 It's invalid to mix bytes and non-bytes literals, so just
1729 advancing to the next 'string' token suffices.
1730 """
1731 token = self.find_next_significant_token()
1732 yield from self.gen_token('string', token.value)
1733 #@+node:ekr.20191113063144.33: *6* tog.comprehension
1734 # comprehension = (expr target, expr iter, expr* ifs, int is_async)
1736 def do_comprehension(self, node):
1738 # No need to put parentheses.
1739 yield from self.gen_name('for') # #1858.
1740 yield from self.gen(node.target) # A name
1741 yield from self.gen_name('in')
1742 yield from self.gen(node.iter)
1743 for z in node.ifs or []:
1744 yield from self.gen_name('if')
1745 yield from self.gen(z)
1746 #@+node:ekr.20191113063144.34: *6* tog.Constant
1747 def do_Constant(self, node): # pragma: no cover
1748 """
1750 https://greentreesnakes.readthedocs.io/en/latest/nodes.html
1752 A constant. The value attribute holds the Python object it represents.
1753 This can be simple types such as a number, string or None, but also
1754 immutable container types (tuples and frozensets) if all of their
1755 elements are constant.
1756 """
1758 # Support Python 3.8.
1759 if node.value is None or isinstance(node.value, bool):
1760 # Weird: return a name!
1761 yield from self.gen_token('name', repr(node.value))
1762 elif node.value == Ellipsis:
1763 yield from self.gen_op('...')
1764 elif isinstance(node.value, str):
1765 yield from self.do_Str(node)
1766 elif isinstance(node.value, (int, float)):
1767 yield from self.gen_token('number', repr(node.value))
1768 elif isinstance(node.value, bytes):
1769 yield from self.do_Bytes(node)
1770 elif isinstance(node.value, tuple):
1771 yield from self.do_Tuple(node)
1772 elif isinstance(node.value, frozenset):
1773 yield from self.do_Set(node)
1774 else:
1775 # Unknown type.
1776 g.trace('----- Oops -----', repr(node.value), g.callers())
1777 #@+node:ekr.20191113063144.35: *6* tog.Dict
1778 # Dict(expr* keys, expr* values)
1780 def do_Dict(self, node):
1782 assert len(node.keys) == len(node.values)
1783 yield from self.gen_op('{')
1784 # No need to put commas.
1785 for i, key in enumerate(node.keys):
1786 key, value = node.keys[i], node.values[i]
1787 yield from self.gen(key) # a Str node.
1788 yield from self.gen_op(':')
1789 if value is not None:
1790 yield from self.gen(value)
1791 yield from self.gen_op('}')
1792 #@+node:ekr.20191113063144.36: *6* tog.DictComp
1793 # DictComp(expr key, expr value, comprehension* generators)
1795 # d2 = {val: key for key, val in d}
1797 def do_DictComp(self, node):
1799 yield from self.gen_token('op', '{')
1800 yield from self.gen(node.key)
1801 yield from self.gen_op(':')
1802 yield from self.gen(node.value)
1803 for z in node.generators or []:
1804 yield from self.gen(z)
1805 yield from self.gen_token('op', '}')
1806 #@+node:ekr.20191113063144.37: *6* tog.Ellipsis
1807 def do_Ellipsis(self, node): # pragma: no cover (Does not exist for python 3.8+)
1809 yield from self.gen_op('...')
1810 #@+node:ekr.20191113063144.38: *6* tog.ExtSlice
1811 # https://docs.python.org/3/reference/expressions.html#slicings
1813 # ExtSlice(slice* dims)
1815 def do_ExtSlice(self, node): # pragma: no cover (deprecated)
1817 # ','.join(node.dims)
1818 for i, z in enumerate(node.dims):
1819 yield from self.gen(z)
1820 if i < len(node.dims) - 1:
1821 yield from self.gen_op(',')
1822 #@+node:ekr.20191113063144.40: *6* tog.Index
1823 def do_Index(self, node): # pragma: no cover (deprecated)
1825 yield from self.gen(node.value)
1826 #@+node:ekr.20191113063144.39: *6* tog.FormattedValue: not called!
1827 # FormattedValue(expr value, int? conversion, expr? format_spec)
1829 def do_FormattedValue(self, node): # pragma: no cover
1830 """
1831 This node represents the *components* of a *single* f-string.
1833 Happily, JoinedStr nodes *also* represent *all* f-strings,
1834 so the TOG should *never visit this node!
1835 """
1836 filename = getattr(self, 'filename', '<no file>')
1837 raise AssignLinksError(
1838 f"file: {filename}\n"
1839 f"do_FormattedValue should never be called")
1841 # This code has no chance of being useful...
1843 # conv = node.conversion
1844 # spec = node.format_spec
1845 # yield from self.gen(node.value)
1846 # if conv is not None:
1847 # yield from self.gen_token('number', conv)
1848 # if spec is not None:
1849 # yield from self.gen(node.format_spec)
1850 #@+node:ekr.20191113063144.41: *6* tog.JoinedStr & helpers
1851 # JoinedStr(expr* values)
1853 def do_JoinedStr(self, node):
1854 """
1855 JoinedStr nodes represent at least one f-string and all other strings
1856 concatentated to it.
1858 Analyzing JoinedStr.values would be extremely tricky, for reasons that
1859 need not be explained here.
1861 Instead, we get the tokens *from the token list itself*!
1862 """
1863 for z in self.get_concatenated_string_tokens():
1864 yield from self.gen_token(z.kind, z.value)
1865 #@+node:ekr.20191113063144.42: *6* tog.List
1866 def do_List(self, node):
1868 # No need to put commas.
1869 yield from self.gen_op('[')
1870 yield from self.gen(node.elts)
1871 yield from self.gen_op(']')
1872 #@+node:ekr.20191113063144.43: *6* tog.ListComp
1873 # ListComp(expr elt, comprehension* generators)
1875 def do_ListComp(self, node):
1877 yield from self.gen_op('[')
1878 yield from self.gen(node.elt)
1879 for z in node.generators:
1880 yield from self.gen(z)
1881 yield from self.gen_op(']')
1882 #@+node:ekr.20191113063144.44: *6* tog.Name & NameConstant
1883 def do_Name(self, node):
1885 yield from self.gen_name(node.id)
1887 def do_NameConstant(self, node): # pragma: no cover (Does not exist in Python 3.8+)
1889 yield from self.gen_name(repr(node.value))
1891 #@+node:ekr.20191113063144.45: *6* tog.Num
1892 def do_Num(self, node): # pragma: no cover (Does not exist in Python 3.8+)
1894 yield from self.gen_token('number', node.n)
1895 #@+node:ekr.20191113063144.47: *6* tog.Set
1896 # Set(expr* elts)
1898 def do_Set(self, node):
1900 yield from self.gen_op('{')
1901 yield from self.gen(node.elts)
1902 yield from self.gen_op('}')
1903 #@+node:ekr.20191113063144.48: *6* tog.SetComp
1904 # SetComp(expr elt, comprehension* generators)
1906 def do_SetComp(self, node):
1908 yield from self.gen_op('{')
1909 yield from self.gen(node.elt)
1910 for z in node.generators or []:
1911 yield from self.gen(z)
1912 yield from self.gen_op('}')
1913 #@+node:ekr.20191113063144.49: *6* tog.Slice
1914 # slice = Slice(expr? lower, expr? upper, expr? step)
1916 def do_Slice(self, node):
1918 lower = getattr(node, 'lower', None)
1919 upper = getattr(node, 'upper', None)
1920 step = getattr(node, 'step', None)
1921 if lower is not None:
1922 yield from self.gen(lower)
1923 # Always put the colon between upper and lower.
1924 yield from self.gen_op(':')
1925 if upper is not None:
1926 yield from self.gen(upper)
1927 # Put the second colon if it exists in the token list.
1928 if step is None:
1929 token = self.find_next_significant_token()
1930 if token and token.value == ':':
1931 yield from self.gen_op(':')
1932 else:
1933 yield from self.gen_op(':')
1934 yield from self.gen(step)
1935 #@+node:ekr.20191113063144.50: *6* tog.Str & helper
1936 def do_Str(self, node):
1937 """This node represents a string constant."""
1938 # This loop is necessary to handle string concatenation.
1939 for z in self.get_concatenated_string_tokens():
1940 yield from self.gen_token(z.kind, z.value)
1941 #@+node:ekr.20200111083914.1: *7* tog.get_concatenated_tokens
1942 def get_concatenated_string_tokens(self):
1943 """
1944 Return the next 'string' token and all 'string' tokens concatenated to
1945 it. *Never* update self.px here.
1946 """
1947 trace = False
1948 tag = 'tog.get_concatenated_string_tokens'
1949 i = self.px
1950 # First, find the next significant token. It should be a string.
1951 i, token = i + 1, None
1952 while i < len(self.tokens):
1953 token = self.tokens[i]
1954 i += 1
1955 if token.kind == 'string':
1956 # Rescan the string.
1957 i -= 1
1958 break
1959 # An error.
1960 if is_significant_token(token): # pragma: no cover
1961 break
1962 # Raise an error if we didn't find the expected 'string' token.
1963 if not token or token.kind != 'string': # pragma: no cover
1964 if not token:
1965 token = self.tokens[-1]
1966 filename = getattr(self, 'filename', '<no filename>')
1967 raise AssignLinksError(
1968 f"\n"
1969 f"{tag}...\n"
1970 f"file: {filename}\n"
1971 f"line: {token.line_number}\n"
1972 f" i: {i}\n"
1973 f"expected 'string' token, got {token!s}")
1974 # Accumulate string tokens.
1975 assert self.tokens[i].kind == 'string'
1976 results = []
1977 while i < len(self.tokens):
1978 token = self.tokens[i]
1979 i += 1
1980 if token.kind == 'string':
1981 results.append(token)
1982 elif token.kind == 'op' or is_significant_token(token):
1983 # Any significant token *or* any op will halt string concatenation.
1984 break
1985 # 'ws', 'nl', 'newline', 'comment', 'indent', 'dedent', etc.
1986 # The (significant) 'endmarker' token ensures we will have result.
1987 assert results
1988 if trace:
1989 g.printObj(results, tag=f"{tag}: Results")
1990 return results
1991 #@+node:ekr.20191113063144.51: *6* tog.Subscript
1992 # Subscript(expr value, slice slice, expr_context ctx)
1994 def do_Subscript(self, node):
1996 yield from self.gen(node.value)
1997 yield from self.gen_op('[')
1998 yield from self.gen(node.slice)
1999 yield from self.gen_op(']')
2000 #@+node:ekr.20191113063144.52: *6* tog.Tuple
2001 # Tuple(expr* elts, expr_context ctx)
2003 def do_Tuple(self, node):
2005 # Do not call gen_op for parens or commas here.
2006 # They do not necessarily exist in the token list!
2007 yield from self.gen(node.elts)
2008 #@+node:ekr.20191113063144.53: *5* tog: Operators
2009 #@+node:ekr.20191113063144.55: *6* tog.BinOp
2010 def do_BinOp(self, node):
2012 op_name_ = op_name(node.op)
2013 yield from self.gen(node.left)
2014 yield from self.gen_op(op_name_)
2015 yield from self.gen(node.right)
2016 #@+node:ekr.20191113063144.56: *6* tog.BoolOp
2017 # BoolOp(boolop op, expr* values)
2019 def do_BoolOp(self, node):
2021 # op.join(node.values)
2022 op_name_ = op_name(node.op)
2023 for i, z in enumerate(node.values):
2024 yield from self.gen(z)
2025 if i < len(node.values) - 1:
2026 yield from self.gen_name(op_name_)
2027 #@+node:ekr.20191113063144.57: *6* tog.Compare
2028 # Compare(expr left, cmpop* ops, expr* comparators)
2030 def do_Compare(self, node):
2032 assert len(node.ops) == len(node.comparators)
2033 yield from self.gen(node.left)
2034 for i, z in enumerate(node.ops):
2035 op_name_ = op_name(node.ops[i])
2036 if op_name_ in ('not in', 'is not'):
2037 for z in op_name_.split(' '):
2038 yield from self.gen_name(z)
2039 elif op_name_.isalpha():
2040 yield from self.gen_name(op_name_)
2041 else:
2042 yield from self.gen_op(op_name_)
2043 yield from self.gen(node.comparators[i])
2044 #@+node:ekr.20191113063144.58: *6* tog.UnaryOp
2045 def do_UnaryOp(self, node):
2047 op_name_ = op_name(node.op)
2048 if op_name_.isalpha():
2049 yield from self.gen_name(op_name_)
2050 else:
2051 yield from self.gen_op(op_name_)
2052 yield from self.gen(node.operand)
2053 #@+node:ekr.20191113063144.59: *6* tog.IfExp (ternary operator)
2054 # IfExp(expr test, expr body, expr orelse)
2056 def do_IfExp(self, node):
2058 #'%s if %s else %s'
2059 yield from self.gen(node.body)
2060 yield from self.gen_name('if')
2061 yield from self.gen(node.test)
2062 yield from self.gen_name('else')
2063 yield from self.gen(node.orelse)
2064 #@+node:ekr.20191113063144.60: *5* tog: Statements
2065 #@+node:ekr.20191113063144.83: *6* tog.Starred
2066 # Starred(expr value, expr_context ctx)
2068 def do_Starred(self, node):
2069 """A starred argument to an ast.Call"""
2070 yield from self.gen_op('*')
2071 yield from self.gen(node.value)
2072 #@+node:ekr.20191113063144.61: *6* tog.AnnAssign
2073 # AnnAssign(expr target, expr annotation, expr? value, int simple)
2075 def do_AnnAssign(self, node):
2077 # {node.target}:{node.annotation}={node.value}\n'
2078 yield from self.gen(node.target)
2079 yield from self.gen_op(':')
2080 yield from self.gen(node.annotation)
2081 if node.value is not None: # #1851
2082 yield from self.gen_op('=')
2083 yield from self.gen(node.value)
2084 #@+node:ekr.20191113063144.62: *6* tog.Assert
2085 # Assert(expr test, expr? msg)
2087 def do_Assert(self, node):
2089 # Guards...
2090 msg = getattr(node, 'msg', None)
2091 # No need to put parentheses or commas.
2092 yield from self.gen_name('assert')
2093 yield from self.gen(node.test)
2094 if msg is not None:
2095 yield from self.gen(node.msg)
2096 #@+node:ekr.20191113063144.63: *6* tog.Assign
2097 def do_Assign(self, node):
2099 for z in node.targets:
2100 yield from self.gen(z)
2101 yield from self.gen_op('=')
2102 yield from self.gen(node.value)
2103 #@+node:ekr.20191113063144.64: *6* tog.AsyncFor
2104 def do_AsyncFor(self, node):
2106 # The def line...
2107 # Py 3.8 changes the kind of token.
2108 async_token_type = 'async' if has_async_tokens else 'name'
2109 yield from self.gen_token(async_token_type, 'async')
2110 yield from self.gen_name('for')
2111 yield from self.gen(node.target)
2112 yield from self.gen_name('in')
2113 yield from self.gen(node.iter)
2114 yield from self.gen_op(':')
2115 # Body...
2116 self.level += 1
2117 yield from self.gen(node.body)
2118 # Else clause...
2119 if node.orelse:
2120 yield from self.gen_name('else')
2121 yield from self.gen_op(':')
2122 yield from self.gen(node.orelse)
2123 self.level -= 1
2124 #@+node:ekr.20191113063144.65: *6* tog.AsyncWith
2125 def do_AsyncWith(self, node):
2127 async_token_type = 'async' if has_async_tokens else 'name'
2128 yield from self.gen_token(async_token_type, 'async')
2129 yield from self.do_With(node)
2130 #@+node:ekr.20191113063144.66: *6* tog.AugAssign
2131 # AugAssign(expr target, operator op, expr value)
2133 def do_AugAssign(self, node):
2135 # %s%s=%s\n'
2136 op_name_ = op_name(node.op)
2137 yield from self.gen(node.target)
2138 yield from self.gen_op(op_name_ + '=')
2139 yield from self.gen(node.value)
2140 #@+node:ekr.20191113063144.67: *6* tog.Await
2141 # Await(expr value)
2143 def do_Await(self, node):
2145 #'await %s\n'
2146 async_token_type = 'await' if has_async_tokens else 'name'
2147 yield from self.gen_token(async_token_type, 'await')
2148 yield from self.gen(node.value)
2149 #@+node:ekr.20191113063144.68: *6* tog.Break
2150 def do_Break(self, node):
2152 yield from self.gen_name('break')
2153 #@+node:ekr.20191113063144.31: *6* tog.Call & helpers
2154 # Call(expr func, expr* args, keyword* keywords)
2156 # Python 3 ast.Call nodes do not have 'starargs' or 'kwargs' fields.
2158 def do_Call(self, node):
2160 # The calls to gen_op(')') and gen_op('(') do nothing by default.
2161 # Subclasses might handle them in an overridden tog.set_links.
2162 yield from self.gen(node.func)
2163 yield from self.gen_op('(')
2164 # No need to generate any commas.
2165 yield from self.handle_call_arguments(node)
2166 yield from self.gen_op(')')
2167 #@+node:ekr.20191204114930.1: *7* tog.arg_helper
2168 def arg_helper(self, node):
2169 """
2170 Yield the node, with a special case for strings.
2171 """
2172 if isinstance(node, str):
2173 yield from self.gen_token('name', node)
2174 else:
2175 yield from self.gen(node)
2176 #@+node:ekr.20191204105506.1: *7* tog.handle_call_arguments
2177 def handle_call_arguments(self, node):
2178 """
2179 Generate arguments in the correct order.
2181 Call(expr func, expr* args, keyword* keywords)
2183 https://docs.python.org/3/reference/expressions.html#calls
2185 Warning: This code will fail on Python 3.8 only for calls
2186 containing kwargs in unexpected places.
2187 """
2188 # *args: in node.args[]: Starred(value=Name(id='args'))
2189 # *[a, 3]: in node.args[]: Starred(value=List(elts=[Name(id='a'), Num(n=3)])
2190 # **kwargs: in node.keywords[]: keyword(arg=None, value=Name(id='kwargs'))
2191 #
2192 # Scan args for *name or *List
2193 args = node.args or []
2194 keywords = node.keywords or []
2196 def get_pos(obj):
2197 line1 = getattr(obj, 'lineno', None)
2198 col1 = getattr(obj, 'col_offset', None)
2199 return line1, col1, obj
2201 def sort_key(aTuple):
2202 line, col, obj = aTuple
2203 return line * 1000 + col
2205 if 0:
2206 g.printObj([ast.dump(z) for z in args], tag='args')
2207 g.printObj([ast.dump(z) for z in keywords], tag='keywords')
2209 if py_version >= (3, 9):
2210 places = [get_pos(z) for z in args + keywords]
2211 places.sort(key=sort_key)
2212 ordered_args = [z[2] for z in places]
2213 for z in ordered_args:
2214 if isinstance(z, ast.Starred):
2215 yield from self.gen_op('*')
2216 yield from self.gen(z.value)
2217 elif isinstance(z, ast.keyword):
2218 if getattr(z, 'arg', None) is None:
2219 yield from self.gen_op('**')
2220 yield from self.arg_helper(z.value)
2221 else:
2222 yield from self.arg_helper(z.arg)
2223 yield from self.gen_op('=')
2224 yield from self.arg_helper(z.value)
2225 else:
2226 yield from self.arg_helper(z)
2227 else: # pragma: no cover
2228 #
2229 # Legacy code: May fail for Python 3.8
2230 #
2231 # Scan args for *arg and *[...]
2232 kwarg_arg = star_arg = None
2233 for z in args:
2234 if isinstance(z, ast.Starred):
2235 if isinstance(z.value, ast.Name): # *Name.
2236 star_arg = z
2237 args.remove(z)
2238 break
2239 elif isinstance(z.value, (ast.List, ast.Tuple)): # *[...]
2240 # star_list = z
2241 break
2242 raise AttributeError(f"Invalid * expression: {ast.dump(z)}") # pragma: no cover
2243 # Scan keywords for **name.
2244 for z in keywords:
2245 if hasattr(z, 'arg') and z.arg is None:
2246 kwarg_arg = z
2247 keywords.remove(z)
2248 break
2249 # Sync the plain arguments.
2250 for z in args:
2251 yield from self.arg_helper(z)
2252 # Sync the keyword args.
2253 for z in keywords:
2254 yield from self.arg_helper(z.arg)
2255 yield from self.gen_op('=')
2256 yield from self.arg_helper(z.value)
2257 # Sync the * arg.
2258 if star_arg:
2259 yield from self.arg_helper(star_arg)
2260 # Sync the ** kwarg.
2261 if kwarg_arg:
2262 yield from self.gen_op('**')
2263 yield from self.gen(kwarg_arg.value)
2264 #@+node:ekr.20191113063144.69: *6* tog.Continue
2265 def do_Continue(self, node):
2267 yield from self.gen_name('continue')
2268 #@+node:ekr.20191113063144.70: *6* tog.Delete
2269 def do_Delete(self, node):
2271 # No need to put commas.
2272 yield from self.gen_name('del')
2273 yield from self.gen(node.targets)
2274 #@+node:ekr.20191113063144.71: *6* tog.ExceptHandler
2275 def do_ExceptHandler(self, node):
2277 # Except line...
2278 yield from self.gen_name('except')
2279 if getattr(node, 'type', None):
2280 yield from self.gen(node.type)
2281 if getattr(node, 'name', None):
2282 yield from self.gen_name('as')
2283 yield from self.gen_name(node.name)
2284 yield from self.gen_op(':')
2285 # Body...
2286 self.level += 1
2287 yield from self.gen(node.body)
2288 self.level -= 1
2289 #@+node:ekr.20191113063144.73: *6* tog.For
2290 def do_For(self, node):
2292 # The def line...
2293 yield from self.gen_name('for')
2294 yield from self.gen(node.target)
2295 yield from self.gen_name('in')
2296 yield from self.gen(node.iter)
2297 yield from self.gen_op(':')
2298 # Body...
2299 self.level += 1
2300 yield from self.gen(node.body)
2301 # Else clause...
2302 if node.orelse:
2303 yield from self.gen_name('else')
2304 yield from self.gen_op(':')
2305 yield from self.gen(node.orelse)
2306 self.level -= 1
2307 #@+node:ekr.20191113063144.74: *6* tog.Global
2308 # Global(identifier* names)
2310 def do_Global(self, node):
2312 yield from self.gen_name('global')
2313 for z in node.names:
2314 yield from self.gen_name(z)
2315 #@+node:ekr.20191113063144.75: *6* tog.If & helpers
2316 # If(expr test, stmt* body, stmt* orelse)
2318 def do_If(self, node):
2319 #@+<< do_If docstring >>
2320 #@+node:ekr.20191122222412.1: *7* << do_If docstring >>
2321 """
2322 The parse trees for the following are identical!
2324 if 1: if 1:
2325 pass pass
2326 else: elif 2:
2327 if 2: pass
2328 pass
2330 So there is *no* way for the 'if' visitor to disambiguate the above two
2331 cases from the parse tree alone.
2333 Instead, we scan the tokens list for the next 'if', 'else' or 'elif' token.
2334 """
2335 #@-<< do_If docstring >>
2336 # Use the next significant token to distinguish between 'if' and 'elif'.
2337 token = self.find_next_significant_token()
2338 yield from self.gen_name(token.value)
2339 yield from self.gen(node.test)
2340 yield from self.gen_op(':')
2341 #
2342 # Body...
2343 self.level += 1
2344 yield from self.gen(node.body)
2345 self.level -= 1
2346 #
2347 # Else and elif clauses...
2348 if node.orelse:
2349 self.level += 1
2350 token = self.find_next_significant_token()
2351 if token.value == 'else':
2352 yield from self.gen_name('else')
2353 yield from self.gen_op(':')
2354 yield from self.gen(node.orelse)
2355 else:
2356 yield from self.gen(node.orelse)
2357 self.level -= 1
2358 #@+node:ekr.20191113063144.76: *6* tog.Import & helper
2359 def do_Import(self, node):
2361 yield from self.gen_name('import')
2362 for alias in node.names:
2363 yield from self.gen_name(alias.name)
2364 if alias.asname:
2365 yield from self.gen_name('as')
2366 yield from self.gen_name(alias.asname)
2367 #@+node:ekr.20191113063144.77: *6* tog.ImportFrom
2368 # ImportFrom(identifier? module, alias* names, int? level)
2370 def do_ImportFrom(self, node):
2372 yield from self.gen_name('from')
2373 for i in range(node.level):
2374 yield from self.gen_op('.')
2375 if node.module:
2376 yield from self.gen_name(node.module)
2377 yield from self.gen_name('import')
2378 # No need to put commas.
2379 for alias in node.names:
2380 if alias.name == '*': # #1851.
2381 yield from self.gen_op('*')
2382 else:
2383 yield from self.gen_name(alias.name)
2384 if alias.asname:
2385 yield from self.gen_name('as')
2386 yield from self.gen_name(alias.asname)
2387 #@+node:ekr.20191113063144.78: *6* tog.Nonlocal
2388 # Nonlocal(identifier* names)
2390 def do_Nonlocal(self, node):
2392 # nonlocal %s\n' % ','.join(node.names))
2393 # No need to put commas.
2394 yield from self.gen_name('nonlocal')
2395 for z in node.names:
2396 yield from self.gen_name(z)
2397 #@+node:ekr.20191113063144.79: *6* tog.Pass
2398 def do_Pass(self, node):
2400 yield from self.gen_name('pass')
2401 #@+node:ekr.20191113063144.81: *6* tog.Raise
2402 # Raise(expr? exc, expr? cause)
2404 def do_Raise(self, node):
2406 # No need to put commas.
2407 yield from self.gen_name('raise')
2408 exc = getattr(node, 'exc', None)
2409 cause = getattr(node, 'cause', None)
2410 tback = getattr(node, 'tback', None)
2411 yield from self.gen(exc)
2412 yield from self.gen(cause)
2413 yield from self.gen(tback)
2414 #@+node:ekr.20191113063144.82: *6* tog.Return
2415 def do_Return(self, node):
2417 yield from self.gen_name('return')
2418 yield from self.gen(node.value)
2419 #@+node:ekr.20191113063144.85: *6* tog.Try
2420 # Try(stmt* body, excepthandler* handlers, stmt* orelse, stmt* finalbody)
2422 def do_Try(self, node):
2424 # Try line...
2425 yield from self.gen_name('try')
2426 yield from self.gen_op(':')
2427 # Body...
2428 self.level += 1
2429 yield from self.gen(node.body)
2430 yield from self.gen(node.handlers)
2431 # Else...
2432 if node.orelse:
2433 yield from self.gen_name('else')
2434 yield from self.gen_op(':')
2435 yield from self.gen(node.orelse)
2436 # Finally...
2437 if node.finalbody:
2438 yield from self.gen_name('finally')
2439 yield from self.gen_op(':')
2440 yield from self.gen(node.finalbody)
2441 self.level -= 1
2442 #@+node:ekr.20191113063144.88: *6* tog.While
2443 def do_While(self, node):
2445 # While line...
2446 # while %s:\n'
2447 yield from self.gen_name('while')
2448 yield from self.gen(node.test)
2449 yield from self.gen_op(':')
2450 # Body...
2451 self.level += 1
2452 yield from self.gen(node.body)
2453 # Else clause...
2454 if node.orelse:
2455 yield from self.gen_name('else')
2456 yield from self.gen_op(':')
2457 yield from self.gen(node.orelse)
2458 self.level -= 1
2459 #@+node:ekr.20191113063144.89: *6* tog.With
2460 # With(withitem* items, stmt* body)
2462 # withitem = (expr context_expr, expr? optional_vars)
2464 def do_With(self, node):
2466 expr: Optional[ast.AST] = getattr(node, 'context_expression', None)
2467 items: List[ast.AST] = getattr(node, 'items', [])
2468 yield from self.gen_name('with')
2469 yield from self.gen(expr)
2470 # No need to put commas.
2471 for item in items:
2472 yield from self.gen(item.context_expr) # type:ignore
2473 optional_vars = getattr(item, 'optional_vars', None)
2474 if optional_vars is not None:
2475 yield from self.gen_name('as')
2476 yield from self.gen(item.optional_vars) # type:ignore
2477 # End the line.
2478 yield from self.gen_op(':')
2479 # Body...
2480 self.level += 1
2481 yield from self.gen(node.body)
2482 self.level -= 1
2483 #@+node:ekr.20191113063144.90: *6* tog.Yield
2484 def do_Yield(self, node):
2486 yield from self.gen_name('yield')
2487 if hasattr(node, 'value'):
2488 yield from self.gen(node.value)
2489 #@+node:ekr.20191113063144.91: *6* tog.YieldFrom
2490 # YieldFrom(expr value)
2492 def do_YieldFrom(self, node):
2494 yield from self.gen_name('yield')
2495 yield from self.gen_name('from')
2496 yield from self.gen(node.value)
2497 #@-others
2498#@+node:ekr.20191226195813.1: *3* class TokenOrderTraverser
2499class TokenOrderTraverser:
2500 """
2501 Traverse an ast tree using the parent/child links created by the
2502 TokenOrderInjector class.
2503 """
2504 #@+others
2505 #@+node:ekr.20191226200154.1: *4* TOT.traverse
2506 def traverse(self, tree):
2507 """
2508 Call visit, in token order, for all nodes in tree.
2510 Recursion is not allowed.
2512 The code follows p.moveToThreadNext exactly.
2513 """
2515 def has_next(i, node, stack):
2516 """Return True if stack[i] is a valid child of node.parent."""
2517 # g.trace(node.__class__.__name__, stack)
2518 parent = node.parent
2519 return bool(parent and parent.children and i < len(parent.children))
2521 # Update stats
2523 self.last_node_index = -1 # For visit
2524 # The stack contains child indices.
2525 node, stack = tree, [0]
2526 seen = set()
2527 while node and stack:
2528 # g.trace(
2529 # f"{node.node_index:>3} "
2530 # f"{node.__class__.__name__:<12} {stack}")
2531 # Visit the node.
2532 assert node.node_index not in seen, node.node_index
2533 seen.add(node.node_index)
2534 self.visit(node)
2535 # if p.v.children: p.moveToFirstChild()
2536 children: List[ast.AST] = getattr(node, 'children', [])
2537 if children:
2538 # Move to the first child.
2539 stack.append(0)
2540 node = children[0]
2541 # g.trace(' child:', node.__class__.__name__, stack)
2542 continue
2543 # elif p.hasNext(): p.moveToNext()
2544 stack[-1] += 1
2545 i = stack[-1]
2546 if has_next(i, node, stack):
2547 node = node.parent.children[i]
2548 continue
2549 # else...
2550 # p.moveToParent()
2551 node = node.parent
2552 stack.pop()
2553 # while p:
2554 while node and stack:
2555 # if p.hasNext():
2556 stack[-1] += 1
2557 i = stack[-1]
2558 if has_next(i, node, stack):
2559 # Move to the next sibling.
2560 node = node.parent.children[i]
2561 break # Found.
2562 # p.moveToParent()
2563 node = node.parent
2564 stack.pop()
2565 # not found.
2566 else:
2567 break # pragma: no cover
2568 return self.last_node_index
2569 #@+node:ekr.20191227160547.1: *4* TOT.visit
2570 def visit(self, node):
2572 self.last_node_index += 1
2573 assert self.last_node_index == node.node_index, (
2574 self.last_node_index, node.node_index)
2575 #@-others
2576#@+node:ekr.20200107165250.1: *3* class Orange
2577class Orange:
2578 """
2579 A flexible and powerful beautifier for Python.
2580 Orange is the new black.
2582 *Important*: This is a predominantly a *token*-based beautifier.
2583 However, orange.colon and orange.possible_unary_op use the parse
2584 tree to provide context that would otherwise be difficult to
2585 deduce.
2586 """
2587 # This switch is really a comment. It will always be false.
2588 # It marks the code that simulates the operation of the black tool.
2589 black_mode = False
2591 # Patterns...
2592 nobeautify_pat = re.compile(r'\s*#\s*pragma:\s*no\s*beautify\b|#\s*@@nobeautify')
2594 # Patterns from FastAtRead class, specialized for python delims.
2595 node_pat = re.compile(r'^(\s*)#@\+node:([^:]+): \*(\d+)?(\*?) (.*)$') # @node
2596 start_doc_pat = re.compile(r'^\s*#@\+(at|doc)?(\s.*?)?$') # @doc or @
2597 at_others_pat = re.compile(r'^(\s*)#@(\+|-)others\b(.*)$') # @others
2599 # Doc parts end with @c or a node sentinel. Specialized for python.
2600 end_doc_pat = re.compile(r"^\s*#@(@(c(ode)?)|([+]node\b.*))$")
2601 #@+others
2602 #@+node:ekr.20200107165250.2: *4* orange.ctor
2603 def __init__(self, settings=None):
2604 """Ctor for Orange class."""
2605 if settings is None:
2606 settings = {}
2607 valid_keys = (
2608 'allow_joined_strings',
2609 'max_join_line_length',
2610 'max_split_line_length',
2611 'orange',
2612 'tab_width',
2613 )
2614 # For mypy...
2615 self.kind: str = ''
2616 # Default settings...
2617 self.allow_joined_strings = False # EKR's preference.
2618 self.max_join_line_length = 88
2619 self.max_split_line_length = 88
2620 self.tab_width = 4
2621 # Override from settings dict...
2622 for key in settings: # pragma: no cover
2623 value = settings.get(key)
2624 if key in valid_keys and value is not None:
2625 setattr(self, key, value)
2626 else:
2627 g.trace(f"Unexpected setting: {key} = {value!r}")
2628 #@+node:ekr.20200107165250.51: *4* orange.push_state
2629 def push_state(self, kind, value=None):
2630 """Append a state to the state stack."""
2631 state = ParseState(kind, value)
2632 self.state_stack.append(state)
2633 #@+node:ekr.20200107165250.8: *4* orange: Entries
2634 #@+node:ekr.20200107173542.1: *5* orange.beautify (main token loop)
2635 def oops(self):
2636 g.trace(f"Unknown kind: {self.kind}")
2638 def beautify(self, contents, filename, tokens, tree, max_join_line_length=None, max_split_line_length=None):
2639 """
2640 The main line. Create output tokens and return the result as a string.
2641 """
2642 # Config overrides
2643 if max_join_line_length is not None:
2644 self.max_join_line_length = max_join_line_length
2645 if max_split_line_length is not None:
2646 self.max_split_line_length = max_split_line_length
2647 # State vars...
2648 self.curly_brackets_level = 0 # Number of unmatched '{' tokens.
2649 self.decorator_seen = False # Set by do_name for do_op.
2650 self.in_arg_list = 0 # > 0 if in an arg list of a def.
2651 self.level = 0 # Set only by do_indent and do_dedent.
2652 self.lws = '' # Leading whitespace.
2653 self.paren_level = 0 # Number of unmatched '(' tokens.
2654 self.square_brackets_stack: List[bool] = [] # A stack of bools, for self.word().
2655 self.state_stack: List["ParseState"] = [] # Stack of ParseState objects.
2656 self.val = None # The input token's value (a string).
2657 self.verbatim = False # True: don't beautify.
2658 #
2659 # Init output list and state...
2660 self.code_list: List[Token] = [] # The list of output tokens.
2661 self.code_list_index = 0 # The token's index.
2662 self.tokens = tokens # The list of input tokens.
2663 self.tree = tree
2664 self.add_token('file-start', '')
2665 self.push_state('file-start')
2666 for i, token in enumerate(tokens):
2667 self.token = token
2668 self.kind, self.val, self.line = token.kind, token.value, token.line
2669 if self.verbatim:
2670 self.do_verbatim()
2671 else:
2672 func = getattr(self, f"do_{token.kind}", self.oops)
2673 func()
2674 # Any post pass would go here.
2675 return tokens_to_string(self.code_list)
2676 #@+node:ekr.20200107172450.1: *5* orange.beautify_file (entry)
2677 def beautify_file(self, filename): # pragma: no cover
2678 """
2679 Orange: Beautify the the given external file.
2681 Return True if the file was changed.
2682 """
2683 tag = 'beautify-file'
2684 self.filename = filename
2685 tog = TokenOrderGenerator()
2686 contents, encoding, tokens, tree = tog.init_from_file(filename)
2687 if not contents or not tokens or not tree:
2688 print(f"{tag}: Can not beautify: {filename}")
2689 return False
2690 # Beautify.
2691 results = self.beautify(contents, filename, tokens, tree)
2692 # Something besides newlines must change.
2693 if regularize_nls(contents) == regularize_nls(results):
2694 print(f"{tag}: Unchanged: {filename}")
2695 return False
2696 if 0: # This obscures more import error messages.
2697 # Show the diffs.
2698 show_diffs(contents, results, filename=filename)
2699 # Write the results
2700 print(f"{tag}: Wrote {filename}")
2701 write_file(filename, results, encoding=encoding)
2702 return True
2703 #@+node:ekr.20200107172512.1: *5* orange.beautify_file_diff (entry)
2704 def beautify_file_diff(self, filename): # pragma: no cover
2705 """
2706 Orange: Print the diffs that would resulf from the orange-file command.
2708 Return True if the file would be changed.
2709 """
2710 tag = 'diff-beautify-file'
2711 self.filename = filename
2712 tog = TokenOrderGenerator()
2713 contents, encoding, tokens, tree = tog.init_from_file(filename)
2714 if not contents or not tokens or not tree:
2715 print(f"{tag}: Can not beautify: {filename}")
2716 return False
2717 # fstringify.
2718 results = self.beautify(contents, filename, tokens, tree)
2719 # Something besides newlines must change.
2720 if regularize_nls(contents) == regularize_nls(results):
2721 print(f"{tag}: Unchanged: {filename}")
2722 return False
2723 # Show the diffs.
2724 show_diffs(contents, results, filename=filename)
2725 return True
2726 #@+node:ekr.20200107165250.13: *4* orange: Input token handlers
2727 #@+node:ekr.20200107165250.14: *5* orange.do_comment
2728 in_doc_part = False
2730 def do_comment(self):
2731 """Handle a comment token."""
2732 val = self.val
2733 #
2734 # Leo-specific code...
2735 if self.node_pat.match(val):
2736 # Clear per-node state.
2737 self.in_doc_part = False
2738 self.verbatim = False
2739 self.decorator_seen = False
2740 # Do *not clear other state, which may persist across @others.
2741 # self.curly_brackets_level = 0
2742 # self.in_arg_list = 0
2743 # self.level = 0
2744 # self.lws = ''
2745 # self.paren_level = 0
2746 # self.square_brackets_stack = []
2747 # self.state_stack = []
2748 else:
2749 # Keep track of verbatim mode.
2750 if self.beautify_pat.match(val):
2751 self.verbatim = False
2752 elif self.nobeautify_pat.match(val):
2753 self.verbatim = True
2754 # Keep trace of @doc parts, to honor the convention for splitting lines.
2755 if self.start_doc_pat.match(val):
2756 self.in_doc_part = True
2757 if self.end_doc_pat.match(val):
2758 self.in_doc_part = False
2759 #
2760 # General code: Generate the comment.
2761 self.clean('blank')
2762 entire_line = self.line.lstrip().startswith('#')
2763 if entire_line:
2764 self.clean('hard-blank')
2765 self.clean('line-indent')
2766 # #1496: No further munging needed.
2767 val = self.line.rstrip()
2768 else:
2769 # Exactly two spaces before trailing comments.
2770 val = ' ' + self.val.rstrip()
2771 self.add_token('comment', val)
2772 #@+node:ekr.20200107165250.15: *5* orange.do_encoding
2773 def do_encoding(self):
2774 """
2775 Handle the encoding token.
2776 """
2777 pass
2778 #@+node:ekr.20200107165250.16: *5* orange.do_endmarker
2779 def do_endmarker(self):
2780 """Handle an endmarker token."""
2781 # Ensure exactly one blank at the end of the file.
2782 self.clean_blank_lines()
2783 self.add_token('line-end', '\n')
2784 #@+node:ekr.20200107165250.18: *5* orange.do_indent & do_dedent & helper
2785 def do_dedent(self):
2786 """Handle dedent token."""
2787 self.level -= 1
2788 self.lws = self.level * self.tab_width * ' '
2789 self.line_indent()
2790 if self.black_mode: # pragma: no cover (black)
2791 state = self.state_stack[-1]
2792 if state.kind == 'indent' and state.value == self.level:
2793 self.state_stack.pop()
2794 state = self.state_stack[-1]
2795 if state.kind in ('class', 'def'):
2796 self.state_stack.pop()
2797 self.handle_dedent_after_class_or_def(state.kind)
2799 def do_indent(self):
2800 """Handle indent token."""
2801 new_indent = self.val
2802 old_indent = self.level * self.tab_width * ' '
2803 if new_indent > old_indent:
2804 self.level += 1
2805 elif new_indent < old_indent: # pragma: no cover (defensive)
2806 g.trace('\n===== can not happen', repr(new_indent), repr(old_indent))
2807 self.lws = new_indent
2808 self.line_indent()
2809 #@+node:ekr.20200220054928.1: *6* orange.handle_dedent_after_class_or_def
2810 def handle_dedent_after_class_or_def(self, kind): # pragma: no cover (black)
2811 """
2812 Insert blank lines after a class or def as the result of a 'dedent' token.
2814 Normal comment lines may precede the 'dedent'.
2815 Insert the blank lines *before* such comment lines.
2816 """
2817 #
2818 # Compute the tail.
2819 i = len(self.code_list) - 1
2820 tail: List[Token] = []
2821 while i > 0:
2822 t = self.code_list.pop()
2823 i -= 1
2824 if t.kind == 'line-indent':
2825 pass
2826 elif t.kind == 'line-end':
2827 tail.insert(0, t)
2828 elif t.kind == 'comment':
2829 # Only underindented single-line comments belong in the tail.
2830 # @+node comments must never be in the tail.
2831 single_line = self.code_list[i].kind in ('line-end', 'line-indent')
2832 lws = len(t.value) - len(t.value.lstrip())
2833 underindent = lws <= len(self.lws)
2834 if underindent and single_line and not self.node_pat.match(t.value):
2835 # A single-line comment.
2836 tail.insert(0, t)
2837 else:
2838 self.code_list.append(t)
2839 break
2840 else:
2841 self.code_list.append(t)
2842 break
2843 #
2844 # Remove leading 'line-end' tokens from the tail.
2845 while tail and tail[0].kind == 'line-end':
2846 tail = tail[1:]
2847 #
2848 # Put the newlines *before* the tail.
2849 # For Leo, always use 1 blank lines.
2850 n = 1 # n = 2 if kind == 'class' else 1
2851 # Retain the token (intention) for debugging.
2852 self.add_token('blank-lines', n)
2853 for i in range(0, n + 1):
2854 self.add_token('line-end', '\n')
2855 if tail:
2856 self.code_list.extend(tail)
2857 self.line_indent()
2858 #@+node:ekr.20200107165250.20: *5* orange.do_name
2859 def do_name(self):
2860 """Handle a name token."""
2861 name = self.val
2862 if self.black_mode and name in ('class', 'def'): # pragma: no cover (black)
2863 # Handle newlines before and after 'class' or 'def'
2864 self.decorator_seen = False
2865 state = self.state_stack[-1]
2866 if state.kind == 'decorator':
2867 # Always do this, regardless of @bool clean-blank-lines.
2868 self.clean_blank_lines()
2869 # Suppress split/join.
2870 self.add_token('hard-newline', '\n')
2871 self.add_token('line-indent', self.lws)
2872 self.state_stack.pop()
2873 else:
2874 # Always do this, regardless of @bool clean-blank-lines.
2875 self.blank_lines(2 if name == 'class' else 1)
2876 self.push_state(name)
2877 self.push_state('indent', self.level)
2878 # For trailing lines after inner classes/defs.
2879 self.word(name)
2880 return
2881 #
2882 # Leo mode...
2883 if name in ('class', 'def'):
2884 self.word(name)
2885 elif name in (
2886 'and', 'elif', 'else', 'for', 'if', 'in', 'not', 'not in', 'or', 'while'
2887 ):
2888 self.word_op(name)
2889 else:
2890 self.word(name)
2891 #@+node:ekr.20200107165250.21: *5* orange.do_newline & do_nl
2892 def do_newline(self):
2893 """Handle a regular newline."""
2894 self.line_end()
2896 def do_nl(self):
2897 """Handle a continuation line."""
2898 self.line_end()
2899 #@+node:ekr.20200107165250.22: *5* orange.do_number
2900 def do_number(self):
2901 """Handle a number token."""
2902 self.blank()
2903 self.add_token('number', self.val)
2904 #@+node:ekr.20200107165250.23: *5* orange.do_op
2905 def do_op(self):
2906 """Handle an op token."""
2907 val = self.val
2908 if val == '.':
2909 self.clean('blank')
2910 self.add_token('op-no-blanks', val)
2911 elif val == '@':
2912 if self.black_mode: # pragma: no cover (black)
2913 if not self.decorator_seen:
2914 self.blank_lines(1)
2915 self.decorator_seen = True
2916 self.clean('blank')
2917 self.add_token('op-no-blanks', val)
2918 self.push_state('decorator')
2919 elif val == ':':
2920 # Treat slices differently.
2921 self.colon(val)
2922 elif val in ',;':
2923 # Pep 8: Avoid extraneous whitespace immediately before
2924 # comma, semicolon, or colon.
2925 self.clean('blank')
2926 self.add_token('op', val)
2927 self.blank()
2928 elif val in '([{':
2929 # Pep 8: Avoid extraneous whitespace immediately inside
2930 # parentheses, brackets or braces.
2931 self.lt(val)
2932 elif val in ')]}':
2933 # Ditto.
2934 self.rt(val)
2935 elif val == '=':
2936 # Pep 8: Don't use spaces around the = sign when used to indicate
2937 # a keyword argument or a default parameter value.
2938 if self.paren_level:
2939 self.clean('blank')
2940 self.add_token('op-no-blanks', val)
2941 else:
2942 self.blank()
2943 self.add_token('op', val)
2944 self.blank()
2945 elif val in '~+-':
2946 self.possible_unary_op(val)
2947 elif val == '*':
2948 self.star_op()
2949 elif val == '**':
2950 self.star_star_op()
2951 else:
2952 # Pep 8: always surround binary operators with a single space.
2953 # '==','+=','-=','*=','**=','/=','//=','%=','!=','<=','>=','<','>',
2954 # '^','~','*','**','&','|','/','//',
2955 # Pep 8: If operators with different priorities are used,
2956 # consider adding whitespace around the operators with the lowest priority(ies).
2957 self.blank()
2958 self.add_token('op', val)
2959 self.blank()
2960 #@+node:ekr.20200107165250.24: *5* orange.do_string
2961 def do_string(self):
2962 """Handle a 'string' token."""
2963 # Careful: continued strings may contain '\r'
2964 val = regularize_nls(self.val)
2965 self.add_token('string', val)
2966 self.blank()
2967 #@+node:ekr.20200210175117.1: *5* orange.do_verbatim
2968 beautify_pat = re.compile(
2969 r'#\s*pragma:\s*beautify\b|#\s*@@beautify|#\s*@\+node|#\s*@[+-]others|#\s*@[+-]<<')
2971 def do_verbatim(self):
2972 """
2973 Handle one token in verbatim mode.
2974 End verbatim mode when the appropriate comment is seen.
2975 """
2976 kind = self.kind
2977 #
2978 # Careful: tokens may contain '\r'
2979 val = regularize_nls(self.val)
2980 if kind == 'comment':
2981 if self.beautify_pat.match(val):
2982 self.verbatim = False
2983 val = val.rstrip()
2984 self.add_token('comment', val)
2985 return
2986 if kind == 'indent':
2987 self.level += 1
2988 self.lws = self.level * self.tab_width * ' '
2989 if kind == 'dedent':
2990 self.level -= 1
2991 self.lws = self.level * self.tab_width * ' '
2992 self.add_token('verbatim', val)
2993 #@+node:ekr.20200107165250.25: *5* orange.do_ws
2994 def do_ws(self):
2995 """
2996 Handle the "ws" pseudo-token.
2998 Put the whitespace only if if ends with backslash-newline.
2999 """
3000 val = self.val
3001 # Handle backslash-newline.
3002 if '\\\n' in val:
3003 self.clean('blank')
3004 self.add_token('op-no-blanks', val)
3005 return
3006 # Handle start-of-line whitespace.
3007 prev = self.code_list[-1]
3008 inner = self.paren_level or self.square_brackets_stack or self.curly_brackets_level
3009 if prev.kind == 'line-indent' and inner:
3010 # Retain the indent that won't be cleaned away.
3011 self.clean('line-indent')
3012 self.add_token('hard-blank', val)
3013 #@+node:ekr.20200107165250.26: *4* orange: Output token generators
3014 #@+node:ekr.20200118145044.1: *5* orange.add_line_end
3015 def add_line_end(self):
3016 """Add a line-end request to the code list."""
3017 # This may be called from do_name as well as do_newline and do_nl.
3018 assert self.token.kind in ('newline', 'nl'), self.token.kind
3019 self.clean('blank') # Important!
3020 self.clean('line-indent')
3021 t = self.add_token('line-end', '\n')
3022 # Distinguish between kinds of 'line-end' tokens.
3023 t.newline_kind = self.token.kind
3024 return t
3025 #@+node:ekr.20200107170523.1: *5* orange.add_token
3026 def add_token(self, kind, value):
3027 """Add an output token to the code list."""
3028 tok = Token(kind, value)
3029 tok.index = self.code_list_index # For debugging only.
3030 self.code_list_index += 1
3031 self.code_list.append(tok)
3032 return tok
3033 #@+node:ekr.20200107165250.27: *5* orange.blank
3034 def blank(self):
3035 """Add a blank request to the code list."""
3036 prev = self.code_list[-1]
3037 if prev.kind not in (
3038 'blank',
3039 'blank-lines',
3040 'file-start',
3041 'hard-blank', # Unique to orange.
3042 'line-end',
3043 'line-indent',
3044 'lt',
3045 'op-no-blanks',
3046 'unary-op',
3047 ):
3048 self.add_token('blank', ' ')
3049 #@+node:ekr.20200107165250.29: *5* orange.blank_lines (black only)
3050 def blank_lines(self, n): # pragma: no cover (black)
3051 """
3052 Add a request for n blank lines to the code list.
3053 Multiple blank-lines request yield at least the maximum of all requests.
3054 """
3055 self.clean_blank_lines()
3056 prev = self.code_list[-1]
3057 if prev.kind == 'file-start':
3058 self.add_token('blank-lines', n)
3059 return
3060 for i in range(0, n + 1):
3061 self.add_token('line-end', '\n')
3062 # Retain the token (intention) for debugging.
3063 self.add_token('blank-lines', n)
3064 self.line_indent()
3065 #@+node:ekr.20200107165250.30: *5* orange.clean
3066 def clean(self, kind):
3067 """Remove the last item of token list if it has the given kind."""
3068 prev = self.code_list[-1]
3069 if prev.kind == kind:
3070 self.code_list.pop()
3071 #@+node:ekr.20200107165250.31: *5* orange.clean_blank_lines
3072 def clean_blank_lines(self):
3073 """
3074 Remove all vestiges of previous blank lines.
3076 Return True if any of the cleaned 'line-end' tokens represented "hard" newlines.
3077 """
3078 cleaned_newline = False
3079 table = ('blank-lines', 'line-end', 'line-indent')
3080 while self.code_list[-1].kind in table:
3081 t = self.code_list.pop()
3082 if t.kind == 'line-end' and getattr(t, 'newline_kind', None) != 'nl':
3083 cleaned_newline = True
3084 return cleaned_newline
3085 #@+node:ekr.20200107165250.32: *5* orange.colon
3086 def colon(self, val):
3087 """Handle a colon."""
3089 def is_expr(node):
3090 """True if node is any expression other than += number."""
3091 if isinstance(node, (ast.BinOp, ast.Call, ast.IfExp)):
3092 return True
3093 return isinstance(
3094 node, ast.UnaryOp) and not isinstance(node.operand, ast.Num)
3096 node = self.token.node
3097 self.clean('blank')
3098 if not isinstance(node, ast.Slice):
3099 self.add_token('op', val)
3100 self.blank()
3101 return
3102 # A slice.
3103 lower = getattr(node, 'lower', None)
3104 upper = getattr(node, 'upper', None)
3105 step = getattr(node, 'step', None)
3106 if any(is_expr(z) for z in (lower, upper, step)):
3107 prev = self.code_list[-1]
3108 if prev.value not in '[:':
3109 self.blank()
3110 self.add_token('op', val)
3111 self.blank()
3112 else:
3113 self.add_token('op-no-blanks', val)
3114 #@+node:ekr.20200107165250.33: *5* orange.line_end
3115 def line_end(self):
3116 """Add a line-end request to the code list."""
3117 # This should be called only be do_newline and do_nl.
3118 node, token = self.token.statement_node, self.token
3119 assert token.kind in ('newline', 'nl'), (token.kind, g.callers())
3120 # Create the 'line-end' output token.
3121 self.add_line_end()
3122 # Attempt to split the line.
3123 was_split = self.split_line(node, token)
3124 # Attempt to join the line only if it has not just been split.
3125 if not was_split and self.max_join_line_length > 0:
3126 self.join_lines(node, token)
3127 self.line_indent()
3128 # Add the indentation for all lines
3129 # until the next indent or unindent token.
3130 #@+node:ekr.20200107165250.40: *5* orange.line_indent
3131 def line_indent(self):
3132 """Add a line-indent token."""
3133 self.clean('line-indent')
3134 # Defensive. Should never happen.
3135 self.add_token('line-indent', self.lws)
3136 #@+node:ekr.20200107165250.41: *5* orange.lt & rt
3137 #@+node:ekr.20200107165250.42: *6* orange.lt
3138 def lt(self, val):
3139 """Generate code for a left paren or curly/square bracket."""
3140 assert val in '([{', repr(val)
3141 if val == '(':
3142 self.paren_level += 1
3143 elif val == '[':
3144 self.square_brackets_stack.append(False)
3145 else:
3146 self.curly_brackets_level += 1
3147 self.clean('blank')
3148 prev = self.code_list[-1]
3149 if prev.kind in ('op', 'word-op'):
3150 self.blank()
3151 self.add_token('lt', val)
3152 elif prev.kind == 'word':
3153 # Only suppress blanks before '(' or '[' for non-keyworks.
3154 if val == '{' or prev.value in ('if', 'else', 'return', 'for'):
3155 self.blank()
3156 elif val == '(':
3157 self.in_arg_list += 1
3158 self.add_token('lt', val)
3159 else:
3160 self.clean('blank')
3161 self.add_token('op-no-blanks', val)
3162 #@+node:ekr.20200107165250.43: *6* orange.rt
3163 def rt(self, val):
3164 """Generate code for a right paren or curly/square bracket."""
3165 assert val in ')]}', repr(val)
3166 if val == ')':
3167 self.paren_level -= 1
3168 self.in_arg_list = max(0, self.in_arg_list - 1)
3169 elif val == ']':
3170 self.square_brackets_stack.pop()
3171 else:
3172 self.curly_brackets_level -= 1
3173 self.clean('blank')
3174 self.add_token('rt', val)
3175 #@+node:ekr.20200107165250.45: *5* orange.possible_unary_op & unary_op
3176 def possible_unary_op(self, s):
3177 """Add a unary or binary op to the token list."""
3178 node = self.token.node
3179 self.clean('blank')
3180 if isinstance(node, ast.UnaryOp):
3181 self.unary_op(s)
3182 else:
3183 self.blank()
3184 self.add_token('op', s)
3185 self.blank()
3187 def unary_op(self, s):
3188 """Add an operator request to the code list."""
3189 assert s and isinstance(s, str), repr(s)
3190 self.clean('blank')
3191 prev = self.code_list[-1]
3192 if prev.kind == 'lt':
3193 self.add_token('unary-op', s)
3194 else:
3195 self.blank()
3196 self.add_token('unary-op', s)
3197 #@+node:ekr.20200107165250.46: *5* orange.star_op
3198 def star_op(self):
3199 """Put a '*' op, with special cases for *args."""
3200 val = '*'
3201 self.clean('blank')
3202 if self.paren_level > 0:
3203 prev = self.code_list[-1]
3204 if prev.kind == 'lt' or (prev.kind, prev.value) == ('op', ','):
3205 self.blank()
3206 self.add_token('op', val)
3207 return
3208 self.blank()
3209 self.add_token('op', val)
3210 self.blank()
3211 #@+node:ekr.20200107165250.47: *5* orange.star_star_op
3212 def star_star_op(self):
3213 """Put a ** operator, with a special case for **kwargs."""
3214 val = '**'
3215 self.clean('blank')
3216 if self.paren_level > 0:
3217 prev = self.code_list[-1]
3218 if prev.kind == 'lt' or (prev.kind, prev.value) == ('op', ','):
3219 self.blank()
3220 self.add_token('op', val)
3221 return
3222 self.blank()
3223 self.add_token('op', val)
3224 self.blank()
3225 #@+node:ekr.20200107165250.48: *5* orange.word & word_op
3226 def word(self, s):
3227 """Add a word request to the code list."""
3228 assert s and isinstance(s, str), repr(s)
3229 if self.square_brackets_stack:
3230 # A previous 'op-no-blanks' token may cancel this blank.
3231 self.blank()
3232 self.add_token('word', s)
3233 elif self.in_arg_list > 0:
3234 self.add_token('word', s)
3235 self.blank()
3236 else:
3237 self.blank()
3238 self.add_token('word', s)
3239 self.blank()
3241 def word_op(self, s):
3242 """Add a word-op request to the code list."""
3243 assert s and isinstance(s, str), repr(s)
3244 self.blank()
3245 self.add_token('word-op', s)
3246 self.blank()
3247 #@+node:ekr.20200118120049.1: *4* orange: Split/join
3248 #@+node:ekr.20200107165250.34: *5* orange.split_line & helpers
3249 def split_line(self, node, token):
3250 """
3251 Split token's line, if possible and enabled.
3253 Return True if the line was broken into two or more lines.
3254 """
3255 assert token.kind in ('newline', 'nl'), repr(token)
3256 # Return if splitting is disabled:
3257 if self.max_split_line_length <= 0: # pragma: no cover (user option)
3258 return False
3259 # Return if the node can't be split.
3260 if not is_long_statement(node):
3261 return False
3262 # Find the *output* tokens of the previous lines.
3263 line_tokens = self.find_prev_line()
3264 line_s = ''.join([z.to_string() for z in line_tokens])
3265 # Do nothing for short lines.
3266 if len(line_s) < self.max_split_line_length:
3267 return False
3268 # Return if the previous line has no opening delim: (, [ or {.
3269 if not any(z.kind == 'lt' for z in line_tokens): # pragma: no cover (defensive)
3270 return False
3271 prefix = self.find_line_prefix(line_tokens)
3272 # Calculate the tail before cleaning the prefix.
3273 tail = line_tokens[len(prefix) :]
3274 # Cut back the token list: subtract 1 for the trailing line-end.
3275 self.code_list = self.code_list[: len(self.code_list) - len(line_tokens) - 1]
3276 # Append the tail, splitting it further, as needed.
3277 self.append_tail(prefix, tail)
3278 # Add the line-end token deleted by find_line_prefix.
3279 self.add_token('line-end', '\n')
3280 return True
3281 #@+node:ekr.20200107165250.35: *6* orange.append_tail
3282 def append_tail(self, prefix, tail):
3283 """Append the tail tokens, splitting the line further as necessary."""
3284 tail_s = ''.join([z.to_string() for z in tail])
3285 if len(tail_s) < self.max_split_line_length:
3286 # Add the prefix.
3287 self.code_list.extend(prefix)
3288 # Start a new line and increase the indentation.
3289 self.add_token('line-end', '\n')
3290 self.add_token('line-indent', self.lws + ' ' * 4)
3291 self.code_list.extend(tail)
3292 return
3293 # Still too long. Split the line at commas.
3294 self.code_list.extend(prefix)
3295 # Start a new line and increase the indentation.
3296 self.add_token('line-end', '\n')
3297 self.add_token('line-indent', self.lws + ' ' * 4)
3298 open_delim = Token(kind='lt', value=prefix[-1].value)
3299 value = open_delim.value.replace('(', ')').replace('[', ']').replace('{', '}')
3300 close_delim = Token(kind='rt', value=value)
3301 delim_count = 1
3302 lws = self.lws + ' ' * 4
3303 for i, t in enumerate(tail):
3304 if t.kind == 'op' and t.value == ',':
3305 if delim_count == 1:
3306 # Start a new line.
3307 self.add_token('op-no-blanks', ',')
3308 self.add_token('line-end', '\n')
3309 self.add_token('line-indent', lws)
3310 # Kill a following blank.
3311 if i + 1 < len(tail):
3312 next_t = tail[i + 1]
3313 if next_t.kind == 'blank':
3314 next_t.kind = 'no-op'
3315 next_t.value = ''
3316 else:
3317 self.code_list.append(t)
3318 elif t.kind == close_delim.kind and t.value == close_delim.value:
3319 # Done if the delims match.
3320 delim_count -= 1
3321 if delim_count == 0:
3322 # Start a new line
3323 self.add_token('op-no-blanks', ',')
3324 self.add_token('line-end', '\n')
3325 self.add_token('line-indent', self.lws)
3326 self.code_list.extend(tail[i:])
3327 return
3328 lws = lws[:-4]
3329 self.code_list.append(t)
3330 elif t.kind == open_delim.kind and t.value == open_delim.value:
3331 delim_count += 1
3332 lws = lws + ' ' * 4
3333 self.code_list.append(t)
3334 else:
3335 self.code_list.append(t)
3336 g.trace('BAD DELIMS', delim_count)
3337 #@+node:ekr.20200107165250.36: *6* orange.find_prev_line
3338 def find_prev_line(self):
3339 """Return the previous line, as a list of tokens."""
3340 line = []
3341 for t in reversed(self.code_list[:-1]):
3342 if t.kind in ('hard-newline', 'line-end'):
3343 break
3344 line.append(t)
3345 return list(reversed(line))
3346 #@+node:ekr.20200107165250.37: *6* orange.find_line_prefix
3347 def find_line_prefix(self, token_list):
3348 """
3349 Return all tokens up to and including the first lt token.
3350 Also add all lt tokens directly following the first lt token.
3351 """
3352 result = []
3353 for i, t in enumerate(token_list):
3354 result.append(t)
3355 if t.kind == 'lt':
3356 break
3357 return result
3358 #@+node:ekr.20200107165250.39: *5* orange.join_lines
3359 def join_lines(self, node, token):
3360 """
3361 Join preceding lines, if possible and enabled.
3362 token is a line_end token. node is the corresponding ast node.
3363 """
3364 if self.max_join_line_length <= 0: # pragma: no cover (user option)
3365 return
3366 assert token.kind in ('newline', 'nl'), repr(token)
3367 if token.kind == 'nl':
3368 return
3369 # Scan backward in the *code* list,
3370 # looking for 'line-end' tokens with tok.newline_kind == 'nl'
3371 nls = 0
3372 i = len(self.code_list) - 1
3373 t = self.code_list[i]
3374 assert t.kind == 'line-end', repr(t)
3375 # Not all tokens have a newline_kind ivar.
3376 assert t.newline_kind == 'newline' # type:ignore
3377 i -= 1
3378 while i >= 0:
3379 t = self.code_list[i]
3380 if t.kind == 'comment':
3381 # Can't join.
3382 return
3383 if t.kind == 'string' and not self.allow_joined_strings:
3384 # An EKR preference: don't join strings, no matter what black does.
3385 # This allows "short" f-strings to be aligned.
3386 return
3387 if t.kind == 'line-end':
3388 if getattr(t, 'newline_kind', None) == 'nl':
3389 nls += 1
3390 else:
3391 break # pragma: no cover
3392 i -= 1
3393 # Retain at the file-start token.
3394 if i <= 0:
3395 i = 1
3396 if nls <= 0: # pragma: no cover (rare)
3397 return
3398 # Retain line-end and and any following line-indent.
3399 # Required, so that the regex below won't eat too much.
3400 while True:
3401 t = self.code_list[i]
3402 if t.kind == 'line-end':
3403 if getattr(t, 'newline_kind', None) == 'nl': # pragma: no cover (rare)
3404 nls -= 1
3405 i += 1
3406 elif self.code_list[i].kind == 'line-indent':
3407 i += 1
3408 else:
3409 break # pragma: no cover (defensive)
3410 if nls <= 0: # pragma: no cover (defensive)
3411 return
3412 # Calculate the joined line.
3413 tail = self.code_list[i:]
3414 tail_s = tokens_to_string(tail)
3415 tail_s = re.sub(r'\n\s*', ' ', tail_s)
3416 tail_s = tail_s.replace('( ', '(').replace(' )', ')')
3417 tail_s = tail_s.rstrip()
3418 # Don't join the lines if they would be too long.
3419 if len(tail_s) > self.max_join_line_length: # pragma: no cover (defensive)
3420 return
3421 # Cut back the code list.
3422 self.code_list = self.code_list[:i]
3423 # Add the new output tokens.
3424 self.add_token('string', tail_s)
3425 self.add_token('line-end', '\n')
3426 #@-others
3427#@+node:ekr.20200107170847.1: *3* class OrangeSettings
3428class OrangeSettings:
3430 pass
3431#@+node:ekr.20200107170126.1: *3* class ParseState
3432class ParseState:
3433 """
3434 A class representing items in the parse state stack.
3436 The present states:
3438 'file-start': Ensures the stack stack is never empty.
3440 'decorator': The last '@' was a decorator.
3442 do_op(): push_state('decorator')
3443 do_name(): pops the stack if state.kind == 'decorator'.
3445 'indent': The indentation level for 'class' and 'def' names.
3447 do_name(): push_state('indent', self.level)
3448 do_dendent(): pops the stack once or twice if state.value == self.level.
3450 """
3452 def __init__(self, kind, value):
3453 self.kind = kind
3454 self.value = value
3456 def __repr__(self):
3457 return f"State: {self.kind} {self.value!r}"
3459 __str__ = __repr__
3460#@+node:ekr.20200122033203.1: ** TOT classes...
3461#@+node:ekr.20191222083453.1: *3* class Fstringify (TOT)
3462class Fstringify(TokenOrderTraverser):
3463 """A class to fstringify files."""
3465 silent = True # for pytest. Defined in all entries.
3466 line_number = 0
3467 line = ''
3469 #@+others
3470 #@+node:ekr.20191222083947.1: *4* fs.fstringify
3471 def fstringify(self, contents, filename, tokens, tree):
3472 """
3473 Fstringify.fstringify:
3475 f-stringify the sources given by (tokens, tree).
3477 Return the resulting string.
3478 """
3479 self.filename = filename
3480 self.tokens = tokens
3481 self.tree = tree
3482 # Prepass: reassign tokens.
3483 ReassignTokens().reassign(filename, tokens, tree)
3484 # Main pass.
3485 self.traverse(self.tree)
3486 results = tokens_to_string(self.tokens)
3487 return results
3488 #@+node:ekr.20200103054101.1: *4* fs.fstringify_file (entry)
3489 def fstringify_file(self, filename): # pragma: no cover
3490 """
3491 Fstringify.fstringify_file.
3493 The entry point for the fstringify-file command.
3495 f-stringify the given external file with the Fstrinfify class.
3497 Return True if the file was changed.
3498 """
3499 tag = 'fstringify-file'
3500 self.filename = filename
3501 self.silent = False
3502 tog = TokenOrderGenerator()
3503 try:
3504 contents, encoding, tokens, tree = tog.init_from_file(filename)
3505 if not contents or not tokens or not tree:
3506 print(f"{tag}: Can not fstringify: {filename}")
3507 return False
3508 results = self.fstringify(contents, filename, tokens, tree)
3509 except Exception as e:
3510 print(e)
3511 return False
3512 # Something besides newlines must change.
3513 changed = regularize_nls(contents) != regularize_nls(results)
3514 status = 'Wrote' if changed else 'Unchanged'
3515 print(f"{tag}: {status:>9}: {filename}")
3516 if changed:
3517 write_file(filename, results, encoding=encoding)
3518 return changed
3519 #@+node:ekr.20200103065728.1: *4* fs.fstringify_file_diff (entry)
3520 def fstringify_file_diff(self, filename): # pragma: no cover
3521 """
3522 Fstringify.fstringify_file_diff.
3524 The entry point for the diff-fstringify-file command.
3526 Print the diffs that would resulf from the fstringify-file command.
3528 Return True if the file would be changed.
3529 """
3530 tag = 'diff-fstringify-file'
3531 self.filename = filename
3532 self.silent = False
3533 tog = TokenOrderGenerator()
3534 try:
3535 contents, encoding, tokens, tree = tog.init_from_file(filename)
3536 if not contents or not tokens or not tree:
3537 return False
3538 results = self.fstringify(contents, filename, tokens, tree)
3539 except Exception as e:
3540 print(e)
3541 return False
3542 # Something besides newlines must change.
3543 changed = regularize_nls(contents) != regularize_nls(results)
3544 if changed:
3545 show_diffs(contents, results, filename=filename)
3546 else:
3547 print(f"{tag}: Unchanged: {filename}")
3548 return changed
3549 #@+node:ekr.20200112060218.1: *4* fs.fstringify_file_silent (entry)
3550 def fstringify_file_silent(self, filename): # pragma: no cover
3551 """
3552 Fstringify.fstringify_file_silent.
3554 The entry point for the silent-fstringify-file command.
3556 fstringify the given file, suppressing all but serious error messages.
3558 Return True if the file would be changed.
3559 """
3560 self.filename = filename
3561 self.silent = True
3562 tog = TokenOrderGenerator()
3563 try:
3564 contents, encoding, tokens, tree = tog.init_from_file(filename)
3565 if not contents or not tokens or not tree:
3566 return False
3567 results = self.fstringify(contents, filename, tokens, tree)
3568 except Exception as e:
3569 print(e)
3570 return False
3571 # Something besides newlines must change.
3572 changed = regularize_nls(contents) != regularize_nls(results)
3573 status = 'Wrote' if changed else 'Unchanged'
3574 # Write the results.
3575 print(f"{status:>9}: {filename}")
3576 if changed:
3577 write_file(filename, results, encoding=encoding)
3578 return changed
3579 #@+node:ekr.20191222095754.1: *4* fs.make_fstring & helpers
3580 def make_fstring(self, node):
3581 """
3582 node is BinOp node representing an '%' operator.
3583 node.left is an ast.Str node.
3584 node.right reprsents the RHS of the '%' operator.
3586 Convert this tree to an f-string, if possible.
3587 Replace the node's entire tree with a new ast.Str node.
3588 Replace all the relevant tokens with a single new 'string' token.
3589 """
3590 trace = False
3591 assert isinstance(node.left, ast.Str), (repr(node.left), g.callers())
3592 # Careful: use the tokens, not Str.s. This preserves spelling.
3593 lt_token_list = get_node_token_list(node.left, self.tokens)
3594 if not lt_token_list: # pragma: no cover
3595 print('')
3596 g.trace('Error: no token list in Str')
3597 dump_tree(self.tokens, node)
3598 print('')
3599 return
3600 lt_s = tokens_to_string(lt_token_list)
3601 if trace:
3602 g.trace('lt_s:', lt_s)
3603 # Get the RHS values, a list of token lists.
3604 values = self.scan_rhs(node.right)
3605 if trace:
3606 for i, z in enumerate(values):
3607 dump_tokens(z, tag=f"RHS value {i}")
3608 # Compute rt_s, self.line and self.line_number for later messages.
3609 token0 = lt_token_list[0]
3610 self.line_number = token0.line_number
3611 self.line = token0.line.strip()
3612 rt_s = ''.join(tokens_to_string(z) for z in values)
3613 # Get the % specs in the LHS string.
3614 specs = self.scan_format_string(lt_s)
3615 if len(values) != len(specs): # pragma: no cover
3616 self.message(
3617 f"can't create f-fstring: {lt_s!r}\n"
3618 f":f-string mismatch: "
3619 f"{len(values)} value{g.plural(len(values))}, "
3620 f"{len(specs)} spec{g.plural(len(specs))}")
3621 return
3622 # Replace specs with values.
3623 results = self.substitute_values(lt_s, specs, values)
3624 result = self.compute_result(lt_s, results)
3625 if not result:
3626 return
3627 # Remove whitespace before ! and :.
3628 result = self.clean_ws(result)
3629 # Show the results
3630 if trace: # pragma: no cover
3631 before = (lt_s + ' % ' + rt_s).replace('\n', '<NL>')
3632 after = result.replace('\n', '<NL>')
3633 self.message(
3634 f"trace:\n"
3635 f":from: {before!s}\n"
3636 f": to: {after!s}")
3637 # Adjust the tree and the token list.
3638 self.replace(node, result, values)
3639 #@+node:ekr.20191222102831.3: *5* fs.clean_ws
3640 ws_pat = re.compile(r'(\s+)([:!][0-9]\})')
3642 def clean_ws(self, s):
3643 """Carefully remove whitespace before ! and : specifiers."""
3644 s = re.sub(self.ws_pat, r'\2', s)
3645 return s
3646 #@+node:ekr.20191222102831.4: *5* fs.compute_result & helpers
3647 def compute_result(self, lt_s, tokens):
3648 """
3649 Create the final result, with various kinds of munges.
3651 Return the result string, or None if there are errors.
3652 """
3653 # Fail if there is a backslash within { and }.
3654 if not self.check_back_slashes(lt_s, tokens):
3655 return None # pragma: no cover
3656 # Ensure consistent quotes.
3657 if not self.change_quotes(lt_s, tokens):
3658 return None # pragma: no cover
3659 return tokens_to_string(tokens)
3660 #@+node:ekr.20200215074309.1: *6* fs.check_back_slashes
3661 def check_back_slashes(self, lt_s, tokens):
3662 """
3663 Return False if any backslash appears with an {} expression.
3665 Tokens is a list of lokens on the RHS.
3666 """
3667 count = 0
3668 for z in tokens:
3669 if z.kind == 'op':
3670 if z.value == '{':
3671 count += 1
3672 elif z.value == '}':
3673 count -= 1
3674 if (count % 2) == 1 and '\\' in z.value:
3675 if not self.silent:
3676 self.message( # pragma: no cover (silent during unit tests)
3677 f"can't create f-fstring: {lt_s!r}\n"
3678 f":backslash in {{expr}}:")
3679 return False
3680 return True
3681 #@+node:ekr.20191222102831.7: *6* fs.change_quotes
3682 def change_quotes(self, lt_s, aList):
3683 """
3684 Carefully check quotes in all "inner" tokens as necessary.
3686 Return False if the f-string would contain backslashes.
3688 We expect the following "outer" tokens.
3690 aList[0]: ('string', 'f')
3691 aList[1]: ('string', a single or double quote.
3692 aList[-1]: ('string', a single or double quote matching aList[1])
3693 """
3694 # Sanity checks.
3695 if len(aList) < 4:
3696 return True # pragma: no cover (defensive)
3697 if not lt_s: # pragma: no cover (defensive)
3698 self.message("can't create f-fstring: no lt_s!")
3699 return False
3700 delim = lt_s[0]
3701 # Check tokens 0, 1 and -1.
3702 token0 = aList[0]
3703 token1 = aList[1]
3704 token_last = aList[-1]
3705 for token in token0, token1, token_last:
3706 # These are the only kinds of tokens we expect to generate.
3707 ok = (
3708 token.kind == 'string' or
3709 token.kind == 'op' and token.value in '{}')
3710 if not ok: # pragma: no cover (defensive)
3711 self.message(
3712 f"unexpected token: {token.kind} {token.value}\n"
3713 f": lt_s: {lt_s!r}")
3714 return False
3715 # These checks are important...
3716 if token0.value != 'f':
3717 return False # pragma: no cover (defensive)
3718 val1 = token1.value
3719 if delim != val1:
3720 return False # pragma: no cover (defensive)
3721 val_last = token_last.value
3722 if delim != val_last:
3723 return False # pragma: no cover (defensive)
3724 #
3725 # Check for conflicting delims, preferring f"..." to f'...'.
3726 for delim in ('"', "'"):
3727 aList[1] = aList[-1] = Token('string', delim)
3728 for z in aList[2:-1]:
3729 if delim in z.value:
3730 break
3731 else:
3732 return True
3733 if not self.silent: # pragma: no cover (silent unit test)
3734 self.message(
3735 f"can't create f-fstring: {lt_s!r}\n"
3736 f": conflicting delims:")
3737 return False
3738 #@+node:ekr.20191222102831.6: *5* fs.munge_spec
3739 def munge_spec(self, spec):
3740 """
3741 Return (head, tail).
3743 The format is spec !head:tail or :tail
3745 Example specs: s2, r3
3746 """
3747 # To do: handle more specs.
3748 head, tail = [], []
3749 if spec.startswith('+'):
3750 pass # Leave it alone!
3751 elif spec.startswith('-'):
3752 tail.append('>')
3753 spec = spec[1:]
3754 if spec.endswith('s'):
3755 spec = spec[:-1]
3756 if spec.endswith('r'):
3757 head.append('r')
3758 spec = spec[:-1]
3759 tail_s = ''.join(tail) + spec
3760 head_s = ''.join(head)
3761 return head_s, tail_s
3762 #@+node:ekr.20191222102831.9: *5* fs.scan_format_string
3763 # format_spec ::= [[fill]align][sign][#][0][width][,][.precision][type]
3764 # fill ::= <any character>
3765 # align ::= "<" | ">" | "=" | "^"
3766 # sign ::= "+" | "-" | " "
3767 # width ::= integer
3768 # precision ::= integer
3769 # type ::= "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"
3771 format_pat = re.compile(r'%(([+-]?[0-9]*(\.)?[0.9]*)*[bcdeEfFgGnoxrsX]?)')
3773 def scan_format_string(self, s):
3774 """Scan the format string s, returning a list match objects."""
3775 result = list(re.finditer(self.format_pat, s))
3776 return result
3777 #@+node:ekr.20191222104224.1: *5* fs.scan_rhs
3778 def scan_rhs(self, node):
3779 """
3780 Scan the right-hand side of a potential f-string.
3782 Return a list of the token lists for each element.
3783 """
3784 trace = False
3785 # First, Try the most common cases.
3786 if isinstance(node, ast.Str):
3787 token_list = get_node_token_list(node, self.tokens)
3788 return [token_list]
3789 if isinstance(node, (list, tuple, ast.Tuple)):
3790 result = []
3791 elts = node.elts if isinstance(node, ast.Tuple) else node
3792 for i, elt in enumerate(elts):
3793 tokens = tokens_for_node(self.filename, elt, self.tokens)
3794 result.append(tokens)
3795 if trace:
3796 g.trace(f"item: {i}: {elt.__class__.__name__}")
3797 g.printObj(tokens, tag=f"Tokens for item {i}")
3798 return result
3799 # Now we expect only one result.
3800 tokens = tokens_for_node(self.filename, node, self.tokens)
3801 return [tokens]
3802 #@+node:ekr.20191226155316.1: *5* fs.substitute_values
3803 def substitute_values(self, lt_s, specs, values):
3804 """
3805 Replace specifiers with values in lt_s string.
3807 Double { and } as needed.
3808 """
3809 i, results = 0, [Token('string', 'f')]
3810 for spec_i, m in enumerate(specs):
3811 value = tokens_to_string(values[spec_i])
3812 start, end, spec = m.start(0), m.end(0), m.group(1)
3813 if start > i:
3814 val = lt_s[i:start].replace('{', '{{').replace('}', '}}')
3815 results.append(Token('string', val[0]))
3816 results.append(Token('string', val[1:]))
3817 head, tail = self.munge_spec(spec)
3818 results.append(Token('op', '{'))
3819 results.append(Token('string', value))
3820 if head:
3821 results.append(Token('string', '!'))
3822 results.append(Token('string', head))
3823 if tail:
3824 results.append(Token('string', ':'))
3825 results.append(Token('string', tail))
3826 results.append(Token('op', '}'))
3827 i = end
3828 # Add the tail.
3829 tail = lt_s[i:]
3830 if tail:
3831 tail = tail.replace('{', '{{').replace('}', '}}')
3832 results.append(Token('string', tail[:-1]))
3833 results.append(Token('string', tail[-1]))
3834 return results
3835 #@+node:ekr.20200214142019.1: *4* fs.message
3836 def message(self, message): # pragma: no cover.
3837 """
3838 Print one or more message lines aligned on the first colon of the message.
3839 """
3840 # Print a leading blank line.
3841 print('')
3842 # Calculate the padding.
3843 lines = g.splitLines(message)
3844 pad = max(lines[0].find(':'), 30)
3845 # Print the first line.
3846 z = lines[0]
3847 i = z.find(':')
3848 if i == -1:
3849 print(z.rstrip())
3850 else:
3851 print(f"{z[:i+2].strip():>{pad+1}} {z[i+2:].strip()}")
3852 # Print the remaining message lines.
3853 for z in lines[1:]:
3854 if z.startswith('<'):
3855 # Print left aligned.
3856 print(z[1:].strip())
3857 elif z.startswith(':') and -1 < z[1:].find(':') <= pad:
3858 # Align with the first line.
3859 i = z[1:].find(':')
3860 print(f"{z[1:i+2].strip():>{pad+1}} {z[i+2:].strip()}")
3861 elif z.startswith('>'):
3862 # Align after the aligning colon.
3863 print(f"{' ':>{pad+2}}{z[1:].strip()}")
3864 else:
3865 # Default: Put the entire line after the aligning colon.
3866 print(f"{' ':>{pad+2}}{z.strip()}")
3867 # Print the standard message lines.
3868 file_s = f"{'file':>{pad}}"
3869 ln_n_s = f"{'line number':>{pad}}"
3870 line_s = f"{'line':>{pad}}"
3871 print(
3872 f"{file_s}: {self.filename}\n"
3873 f"{ln_n_s}: {self.line_number}\n"
3874 f"{line_s}: {self.line!r}")
3875 #@+node:ekr.20191225054848.1: *4* fs.replace
3876 def replace(self, node, s, values):
3877 """
3878 Replace node with an ast.Str node for s.
3879 Replace all tokens in the range of values with a single 'string' node.
3880 """
3881 # Replace the tokens...
3882 tokens = tokens_for_node(self.filename, node, self.tokens)
3883 i1 = i = tokens[0].index
3884 replace_token(self.tokens[i], 'string', s)
3885 j = 1
3886 while j < len(tokens):
3887 replace_token(self.tokens[i1 + j], 'killed', '')
3888 j += 1
3889 # Replace the node.
3890 new_node = ast.Str()
3891 new_node.s = s
3892 replace_node(new_node, node)
3893 # Update the token.
3894 token = self.tokens[i1]
3895 token.node = new_node # type:ignore
3896 # Update the token list.
3897 add_token_to_token_list(token, new_node)
3898 #@+node:ekr.20191231055008.1: *4* fs.visit
3899 def visit(self, node):
3900 """
3901 FStringify.visit. (Overrides TOT visit).
3903 Call fs.makes_fstring if node is a BinOp that might be converted to an
3904 f-string.
3905 """
3906 if (
3907 isinstance(node, ast.BinOp)
3908 and op_name(node.op) == '%'
3909 and isinstance(node.left, ast.Str)
3910 ):
3911 self.make_fstring(node)
3912 #@-others
3913#@+node:ekr.20191231084514.1: *3* class ReassignTokens (TOT)
3914class ReassignTokens(TokenOrderTraverser):
3915 """A class that reassigns tokens to more appropriate ast nodes."""
3916 #@+others
3917 #@+node:ekr.20191231084640.1: *4* reassign.reassign
3918 def reassign(self, filename, tokens, tree):
3919 """The main entry point."""
3920 self.filename = filename
3921 self.tokens = tokens
3922 self.tree = tree
3923 self.traverse(tree)
3924 #@+node:ekr.20191231084853.1: *4* reassign.visit
3925 def visit(self, node):
3926 """ReassignTokens.visit"""
3927 # For now, just handle call nodes.
3928 if not isinstance(node, ast.Call):
3929 return
3930 tokens = tokens_for_node(self.filename, node, self.tokens)
3931 node0, node9 = tokens[0].node, tokens[-1].node
3932 nca = nearest_common_ancestor(node0, node9)
3933 if not nca:
3934 return
3935 # g.trace(f"{self.filename:20} nca: {nca.__class__.__name__}")
3936 # Associate () with the call node.
3937 i = tokens[-1].index
3938 j = find_paren_token(i + 1, self.tokens)
3939 if j is None:
3940 return # pragma: no cover
3941 k = find_paren_token(j + 1, self.tokens)
3942 if k is None:
3943 return # pragma: no cover
3944 self.tokens[j].node = nca # type:ignore
3945 self.tokens[k].node = nca # type:ignore
3946 add_token_to_token_list(self.tokens[j], nca)
3947 add_token_to_token_list(self.tokens[k], nca)
3948 #@-others
3949#@+node:ekr.20191227170803.1: ** Token classes
3950#@+node:ekr.20191110080535.1: *3* class Token
3951class Token:
3952 """
3953 A class representing a 5-tuple, plus additional data.
3955 The TokenOrderTraverser class creates a list of such tokens.
3956 """
3958 def __init__(self, kind, value):
3960 self.kind = kind
3961 self.value = value
3962 #
3963 # Injected by Tokenizer.add_token.
3964 self.five_tuple = None
3965 self.index = 0
3966 self.line = ''
3967 # The entire line containing the token.
3968 # Same as five_tuple.line.
3969 self.line_number = 0
3970 # The line number, for errors and dumps.
3971 # Same as five_tuple.start[0]
3972 #
3973 # Injected by Tokenizer.add_token.
3974 self.level = 0
3975 self.node = None
3977 def __repr__(self):
3978 nl_kind = getattr(self, 'newline_kind', '')
3979 s = f"{self.kind:}.{self.index:<3}"
3980 return f"{s:>18}:{nl_kind:7} {self.show_val(80)}"
3982 def __str__(self):
3983 nl_kind = getattr(self, 'newline_kind', '')
3984 return f"{self.kind}.{self.index:<3}{nl_kind:8} {self.show_val(80)}"
3986 def to_string(self):
3987 """Return the contribution of the token to the source file."""
3988 return self.value if isinstance(self.value, str) else ''
3989 #@+others
3990 #@+node:ekr.20191231114927.1: *4* token.brief_dump
3991 def brief_dump(self): # pragma: no cover
3992 """Dump a token."""
3993 return (
3994 f"{self.index:>3} line: {self.line_number:<2} "
3995 f"{self.kind:>11} {self.show_val(100)}")
3996 #@+node:ekr.20200223022950.11: *4* token.dump
3997 def dump(self): # pragma: no cover
3998 """Dump a token and related links."""
3999 # Let block.
4000 node_id = self.node.node_index if self.node else ''
4001 node_cn = self.node.__class__.__name__ if self.node else ''
4002 return (
4003 f"{self.line_number:4} "
4004 f"{node_id:5} {node_cn:16} "
4005 f"{self.index:>5} {self.kind:>11} "
4006 f"{self.show_val(100)}")
4007 #@+node:ekr.20200121081151.1: *4* token.dump_header
4008 def dump_header(self): # pragma: no cover
4009 """Print the header for token.dump"""
4010 print(
4011 f"\n"
4012 f" node {'':10} token token\n"
4013 f"line index class {'':10} index kind value\n"
4014 f"==== ===== ===== {'':10} ===== ==== =====\n")
4015 #@+node:ekr.20191116154328.1: *4* token.error_dump
4016 def error_dump(self): # pragma: no cover
4017 """Dump a token or result node for error message."""
4018 if self.node:
4019 node_id = obj_id(self.node)
4020 node_s = f"{node_id} {self.node.__class__.__name__}"
4021 else:
4022 node_s = "None"
4023 return (
4024 f"index: {self.index:<3} {self.kind:>12} {self.show_val(20):<20} "
4025 f"{node_s}")
4026 #@+node:ekr.20191113095507.1: *4* token.show_val
4027 def show_val(self, truncate_n): # pragma: no cover
4028 """Return the token.value field."""
4029 if self.kind in ('ws', 'indent'):
4030 val = len(self.value)
4031 elif self.kind == 'string':
4032 # Important: don't add a repr for 'string' tokens.
4033 # repr just adds another layer of confusion.
4034 val = g.truncate(self.value, truncate_n) # type:ignore
4035 else:
4036 val = g.truncate(repr(self.value), truncate_n) # type:ignore
4037 return val
4038 #@-others
4039#@+node:ekr.20191110165235.1: *3* class Tokenizer
4040class Tokenizer:
4042 """Create a list of Tokens from contents."""
4044 results: List[Token] = []
4046 #@+others
4047 #@+node:ekr.20191110165235.2: *4* tokenizer.add_token
4048 token_index = 0
4049 prev_line_token = None
4051 def add_token(self, kind, five_tuple, line, s_row, value):
4052 """
4053 Add a token to the results list.
4055 Subclasses could override this method to filter out specific tokens.
4056 """
4057 tok = Token(kind, value)
4058 tok.five_tuple = five_tuple
4059 tok.index = self.token_index
4060 # Bump the token index.
4061 self.token_index += 1
4062 tok.line = line
4063 tok.line_number = s_row
4064 self.results.append(tok)
4065 #@+node:ekr.20191110170551.1: *4* tokenizer.check_results
4066 def check_results(self, contents):
4068 # Split the results into lines.
4069 result = ''.join([z.to_string() for z in self.results])
4070 result_lines = g.splitLines(result)
4071 # Check.
4072 ok = result == contents and result_lines == self.lines
4073 assert ok, (
4074 f"\n"
4075 f" result: {result!r}\n"
4076 f" contents: {contents!r}\n"
4077 f"result_lines: {result_lines}\n"
4078 f" lines: {self.lines}"
4079 )
4080 #@+node:ekr.20191110165235.3: *4* tokenizer.create_input_tokens
4081 def create_input_tokens(self, contents, tokens):
4082 """
4083 Generate a list of Token's from tokens, a list of 5-tuples.
4084 """
4085 # Create the physical lines.
4086 self.lines = contents.splitlines(True)
4087 # Create the list of character offsets of the start of each physical line.
4088 last_offset, self.offsets = 0, [0]
4089 for line in self.lines:
4090 last_offset += len(line)
4091 self.offsets.append(last_offset)
4092 # Handle each token, appending tokens and between-token whitespace to results.
4093 self.prev_offset, self.results = -1, []
4094 for token in tokens:
4095 self.do_token(contents, token)
4096 # Print results when tracing.
4097 self.check_results(contents)
4098 # Return results, as a list.
4099 return self.results
4100 #@+node:ekr.20191110165235.4: *4* tokenizer.do_token (the gem)
4101 header_has_been_shown = False
4103 def do_token(self, contents, five_tuple):
4104 """
4105 Handle the given token, optionally including between-token whitespace.
4107 This is part of the "gem".
4109 Links:
4111 - 11/13/19: ENB: A much better untokenizer
4112 https://groups.google.com/forum/#!msg/leo-editor/DpZ2cMS03WE/VPqtB9lTEAAJ
4114 - Untokenize does not round-trip ws before bs-nl
4115 https://bugs.python.org/issue38663
4116 """
4117 import token as token_module
4118 # Unpack..
4119 tok_type, val, start, end, line = five_tuple
4120 s_row, s_col = start # row/col offsets of start of token.
4121 e_row, e_col = end # row/col offsets of end of token.
4122 kind = token_module.tok_name[tok_type].lower()
4123 # Calculate the token's start/end offsets: character offsets into contents.
4124 s_offset = self.offsets[max(0, s_row - 1)] + s_col
4125 e_offset = self.offsets[max(0, e_row - 1)] + e_col
4126 # tok_s is corresponding string in the line.
4127 tok_s = contents[s_offset:e_offset]
4128 # Add any preceding between-token whitespace.
4129 ws = contents[self.prev_offset:s_offset]
4130 if ws:
4131 # No need for a hook.
4132 self.add_token('ws', five_tuple, line, s_row, ws)
4133 # Always add token, even if it contributes no text!
4134 self.add_token(kind, five_tuple, line, s_row, tok_s)
4135 # Update the ending offset.
4136 self.prev_offset = e_offset
4137 #@-others
4138#@-others
4139g = LeoGlobals()
4140if __name__ == '__main__':
4141 main()
4142#@@language python
4143#@@tabwidth -4
4144#@@pagewidth 70
4145#@-leo