[prev in list] [next in list] [prev in thread] [next in thread]
List: sbcl-devel
Subject: Re: [Sbcl-devel] [Sbcl-commits] master: Hash-cons all MEMBER types
From: Douglas Katzman via Sbcl-devel <sbcl-devel () lists ! sourceforge ! net>
Date: 2023-01-31 2:53:10
Message-ID: CAOrNasyC_WHRbkijEfFOVdYRcvLRcPqvgg5-L5=T8cYGNN_h6w () mail ! gmail ! com
[Download RAW message or body]
[Attachment #2 (multipart/alternative)]
I added some printing if there are any elements in that table, and I've not
seen it print anything, so I don't know what it's doing for you. But it
does depend on GC behavior so I made it tolerant of worse GC. Not sure why
that should be needed, but I think it's still a valid test.
On Mon, Jan 30, 2023 at 9:29 PM Stas Boukarev <stassats@gmail.com> wrote:
> I'm randomly getting things like
> ::: UNEXPECTED-FAILURE :XSET-STABLE-HASH-WEAKNESS due to SIMPLE-ERROR:
> "The assertion
> (< (HASH-TABLE-COUNT SB-KERNEL::*XSET-STABLE-HASHES*) 100) failed
> with
> (HASH-TABLE-COUNT SB-KERNEL::*XSET-STABLE-HASHES*) = 145."
>
> On Sat, Dec 31, 2022 at 8:07 PM snuglas via Sbcl-commits
> <sbcl-commits@lists.sourceforge.net> wrote:
> >
> > The branch "master" has been updated in SBCL:
> > via c7c449b50fb072cabd9517879ac7b1e75d68c768 (commit)
> > from ef088c6fbce7dca5032aa08a3702b7f622b41d32 (commit)
> >
> > - Log -----------------------------------------------------------------
> > commit c7c449b50fb072cabd9517879ac7b1e75d68c768
> > Author: Douglas Katzman <dougk@google.com>
> > Date: Sat Dec 31 03:11:15 2022 -0500
> >
> > Hash-cons all MEMBER types
> >
> > Types whose members include arbitrary objects such as lists or
> strings will now
> > choose and memoize per-element pseudorandom hashes as needed, while
> trying to
> > avoid creating a forever-growing set of objects that acquired a
> stable hash,
> > as the object->hash association is dropped when no XSET refers to an
> object.
> > ---
> > src/code/type-class.lisp | 121
> ++++++++++++--------------------------
> > src/code/type.lisp | 62 +++++++++++++++----
> > src/code/xset.lisp | 106
> +++++++++++++++++++++++----------
> > src/compiler/ctype.lisp | 2 +-
> > src/compiler/generic/genesis.lisp | 19 ++++--
> > tests/typehashmix.impure.lisp | 30 ++++++++++
> > 6 files changed, 207 insertions(+), 133 deletions(-)
> >
> > diff --git a/src/code/type-class.lisp b/src/code/type-class.lisp
> > index e04e04c5e..f14dadab9 100644
> > --- a/src/code/type-class.lisp
> > +++ b/src/code/type-class.lisp
> > @@ -734,9 +734,10 @@
> >
> > ;; Singleton MEMBER types are best dealt with via a weak-value
> hash-table because:
> > ;; * (MEMBER THING) might lack an address-insensitive hash for THING
> > -;; but src/code/hashset can not use address-based hashes. This
> limitation is unique
> > -;; to MEMBER types because other CTYPE instances are compositions of
> CTYPES
> > -;; where all subparts have assigned hashes.
> > +;; but src/code/hashset goes through a lot of rigmarole to handle
> address-bashed
> > +;; hashing, and the end result for a single key would laboriously
> emulate an EQL table.
> > +;; This is especially important for the compiler because each time it
> asks itself the
> > +;; CTYPE-OF a constant leaf, the answer might be a singleton MEMBER
> type.
> > ;; * Symbols have slightly bad SXHASH values (by language requirement):
> > ;; "For any two objects, x and y which are symbols and which are
> similar
> > ;; (sxhash x) and (sxhash y) yield the same mathematical value even
> if x and y exist
> > @@ -747,35 +748,6 @@
> > ;; will cause the hashset to exceed its probe sequence length limit.
> > ;; This isn't to say we couldn't assign some bits of SYMBOL-HASH
> pseudorandomly,
> > ;; and mask them out in the value returned by CL:SXHASH.
> > -;;
> > -;; Also: XSETs containing arbitrary objects such as strings and lists
> don't have
> > -;; a good hash at all. There is not really a way to compute a mixed hash
> > -;; other than by pinning all objects in the XSET and taking their
> addresses.
> > -;; Then we'd need to figure out that GC happened, and it becomes a pain.
> > -;; This seems more complicated than the situation warrants.
> > -;; So we'll just give up on hash-consing which should fix lp#1999687
> > -;;
> > -;; An outline of a better design would be as follows:
> > -;; - create a global hash-table of objects which were placed in an XSET
> > -;; other than the nicely hashable object types. Call this
> xset-key->hash-mapping.
> > -;; This mapping does not need to be weak, because it will have a way
> of purging it.
> > -;; Each value in the table is a cons of a random hash and the number
> of XSETs
> > -;; using the key. Increment the refcount when making a new XSET with
> that key.
> > -;; - when hashing the XSET, look up its keys (other than EQL hashable)
> in the global table
> > -;; - attach a finalizer to the XSET. The finalizer's job is to
> decrement the refcount
> > -;; on each key in the global mappping. When the count reaches zero
> there are no
> > -;; XSETs that refer to the key, and the random hash can be removed.
> > -;; Why make a refcounted table? Because otherwise there is no way to
> remove mapping
> > -;; entries for objects that outlive the MEMBER-TYPE's use of the object
> but where the
> > -;; MEMBER type itself is dead. Worst-case, every object in the lisp
> image could at some
> > -;; point appear in a MEMBER type but then never be needed again with
> regard to type
> > -;; system operations. So you'd have created a permanent mapping of
> every object to
> > -;; a random hash for no good reason.
> > -
> > -;; Why the singleton table is so important is that any time the
> compiler asks itself
> > -;; the ctype-of a constant leaf, it might yield `(MEMBER ,the-constant).
> > -;; So then you end up with an assortment of random objects that don't
> hash
> > -;; nicely in a ctype hashset, but are OK in a hash-table.
> > (define-load-time-global *eql-type-cache* ; like EQL-SPECIALIZER-TABLE
> in PCL
> > (sb-impl::make-system-hash-table :test 'eql :weakness :value
> :synchronized nil))
> >
> > @@ -784,9 +756,8 @@
> > (heap-allocated-p ,obj)))
> >
> > (defvar *hashsets-preloaded* nil)
> > -(defmacro new-ctype (pseudonym &rest initargs)
> > - (let* ((name (if (eq pseudonym 'eql) 'member-type pseudonym))
> > - (allocator (package-symbolicate "SB-KERNEL" "!ALLOC-" name))
> > +(defmacro new-ctype (name &rest initargs)
> > + (let* ((allocator (package-symbolicate "SB-KERNEL" "!ALLOC-" name))
> > (hashset (package-symbolicate "SB-KERNEL" "*" name
> "-HASHSET*"))
> > (bits (ctype-class-bits (ctype-instance->type-class name))))
> > #+sb-xc-host ; allocate permanent data, and insert into cache if
> not found
> > @@ -803,32 +774,7 @@
> > (unless *hashsets-preloaded*
> > (write-string "CTYPE hashset preload failure")
> > (sb-vm:ldb-monitor))
> > - ,(case pseudonym
> > - (eql ; as per above remarks: use hash-table, not hashset
> > - `(let* ((xset ,(first initargs))
> > - (zeros ,(second initargs))
> > - (key (first (or zeros (xset-data xset))))
> > - (table *eql-type-cache*))
> > - (with-system-mutex ((hash-table-lock table))
> > - ;; This is like ENSURE-GETHASH but it potentially
> copies the key
> > - (or (gethash key table)
> > - ;; hope no off-heap pointers buried within KEY
> > - (let ((key (cond ((numberp key)
> (sb-vm:copy-number-to-heap key))
> > - ((safe-member-type-elt-p key) key)
> > - (t
> > - (warn "Off-heap hash-table key @
> ~X"
> > - (get-lisp-obj-address key))
> > - key))))
> > - (setf (gethash key table) (copy-ctype temp)))))))
> > - (member-type ; problem case: don't always know how to hash
> well
> > - `(let ((xset ,(first initargs)))
> > - (flet ((hashable (x) (typep x '(or symbol number
> character instance))))
> > - (if (xset-every #'hashable xset)
> > - (hashset-insert-if-absent ,hashset temp
> #'copy-ctype)
> > - ;; otherwise just copy it always (for now)
> > - (copy-ctype temp)))))
> > - (t
> > - `(hashset-insert-if-absent ,hashset temp #'copy-ctype))))))
> > + (hashset-insert-if-absent ,hashset temp #'copy-ctype))))
> >
> > ;;; The NAMED-TYPE is used to represent *, T and NIL, the standard
> > ;;; special cases, as well as other special cases needed to
> > @@ -878,6 +824,14 @@
> > (def-type-model (member-type (:constructor* nil (xset fp-zeroes)))
> > (xset nil :type xset :hasher xset-elts-hash :test xset=)
> > (fp-zeroes nil :type list :hasher hash-fp-zeros :test fp-zeros=)))
> > +(define-load-time-global *xset-mutex* (or #-sb-xc-host
> (sb-thread:make-mutex)))
> > +;;; This hashset is guarded by *XSET-MUTEX*. It is _not_ declared as
> synchronized
> > +;;; so that HASHSET-INSERT-IF-ABSENT should not acquire a mutex inside
> a mutex
> > +;;; (stable hashes have to be assigned while holding the lock)
> > +(define-load-time-global *member/eq-type-hashset*
> > + (make-hashset 32 #'member-type-equiv #'calc-member-type-hash
> > + :weakness t :synchronized nil))
> > +(pushnew '*member/eq-type-hashset* *ctype-hashsets*)
> >
> > ;;; An ARRAY-TYPE is used to represent any array type, including
> > ;;; things such as SIMPLE-BASE-STRING.
> > @@ -1248,6 +1202,8 @@
> >
> > ;;; Return the name of the global hashset that OBJ (a CTYPE instance)
> > ;;; would be stored in, if it were stored in one.
> > +;;; This is only for bootstrap, and not 100% precise as it does not know
> > +;;; about the other MEMBER type containers.
> > (defun ctype->hashset-sym (obj)
> > (macrolet ((generate ()
> > (collect ((clauses))
> > @@ -1331,25 +1287,21 @@
> > #-sb-xc-host
> > (progn
> > (defglobal *!initial-ctypes* nil)
> > -(defun preload-ctype-hashsets (&aux permtypes)
> > - (declare (ignorable permtypes))
> > +(defun preload-ctype-hashsets ()
> > (dolist (pair (nreverse *!initial-ctypes*))
> > - (destructuring-bind (instance . hashset-symbol) pair
> > - (cond ((not hashset-symbol)
> > - ;; There are very few which aren't in a hashset:
> > - ;; - (6) NAMED-TYPEs
> > - ;; - (1) MEMBER-TYPE NULL
> > - ;; - (3) BASE-CHAR, EXTENDED-CHAR, CHARACTER
> > - ;; - (1) CONS
> > - (push instance permtypes))
> > - ;; Mandatory special-case for singleton MEMBER types
> > - ((and (member-type-p instance) (not (cdr
> (member-type-members instance))))
> > - (setf (gethash (car (member-type-members instance))
> *eql-type-cache*)
> > - instance))
> > + (let ((instance (car pair))
> > + (container (symbol-value (cdr pair))))
> > + (cond ((hash-table-p container)
> > + (aver (member-type-p instance))
> > + ;; As of this writing there are only two EQL types to
> preload:
> > + ;; one is in the IR1-transform of FORMAT with stream (EQL
> T),
> > + ;; the other is CHECK-ARG-TYPE looking for (EQL DUMMY)
> type.
> > + (let ((key (first (member-type-members instance))))
> > + (aver (not (gethash key container)))
> > + (setf (gethash key container) instance)))
> > (t
> > - (let ((hashset (symbol-value hashset-symbol)))
> > - (aver (not (hashset-find hashset instance))) ; instances
> are dumped bottom-up
> > - (hashset-insert hashset instance))))
> > + (aver (not (hashset-find container instance))) ; instances
> are built bottom-up
> > + (hashset-insert container instance)))
> > (labels ((ensure-interned-list (list hashset)
> > (let ((found (hashset-find hashset list)))
> > (when (and found (neq found list))
> > @@ -1367,7 +1319,7 @@
> > (bug "genesis dumped bad instance within ~X"
> > (get-lisp-obj-address instance)))))))
> > (etypecase instance
> > - ((or named-type numeric-type member-type character-set-type ;
> nothing extra to do
> > + ((or numeric-type member-type character-set-type ; nothing
> extra to do
> > #+sb-simd-pack simd-pack-type #+sb-simd-pack-256
> simd-pack-256-type
> > hairy-type))
> > (args-type
> > @@ -1387,7 +1339,6 @@
> > (ensure-interned-list (compound-type-types instance)
> *ctype-set-hashset*))
> > (negation-type
> > (check (negation-type-type instance)))))))
> > - (aver (= (length permtypes) (+ 11 #-sb-unicode -2)))
> > #+sb-devel (setq *hashsets-preloaded* t))
> > (preload-ctype-hashsets))
> >
> > @@ -1530,10 +1481,16 @@
> >
> > ;;; Copy X to the heap, give it a random hash, and if it is a MEMBER
> type
> > ;;; then assert that all members are cacheable.
> > +#+sb-xc-host
> > +(defun copy-ctype (x)
> > + (let ((bits (logior (logand (ctype-random) +type-hash-mask+)
> (type-%bits x))))
> > + (etypecase x
> > + (member-type
> > + (!alloc-member-type bits (member-type-xset x)
> (member-type-fp-zeroes x))))))
> > #-sb-xc-host
> > (defun copy-ctype (x)
> > (declare (type ctype x))
> > - (declare (sb-c::tlab :system) (inline !copy-xset))
> > + (declare (sb-c::tlab :system) (inline !new-xset))
> > #+c-stack-is-control-stack (aver (stack-allocated-p x))
> > (labels ((copy (x)
> > ;; Return a heap copy of X if X was arena or
> stack-allocated.
> > @@ -1559,7 +1516,7 @@
> > ;; While we could use (load-time-value) to referece
> a constant empty xset
> > ;; there's really no point to doing that.
> > (collect ((elts))
> > - (dolist (x data (!copy-xset (xset-list-size xset)
> (elts)))
> > + (dolist (x data (!new-xset (elts) (xset-extra
> xset)))
> > (elts (cond ((numberp x)
> (sb-vm:copy-number-to-heap x))
> > ((safe-member-type-elt-p x) x)
> > ;; surely things will go haywire if
> this occurs
> > diff --git a/src/code/type.lisp b/src/code/type.lisp
> > index 71b35509a..73c4da809 100644
> > --- a/src/code/type.lisp
> > +++ b/src/code/type.lisp
> > @@ -341,6 +341,9 @@
> > unparsed
> > (nconc unparsed '(&optional))))))
> >
> > +#+sb-xc-host ; why only on the host? Shouldn't we always declaim the
> ftype??
> > +(declaim (ftype (sfunction (ctype ctype) (values t t)) type=))
> > +
> > ;;; Return true if LIST1 and LIST2 have the same elements in the same
> > ;;; positions according to TYPE=. We return NIL, NIL if there is an
> > ;;; uncertain comparison.
> > @@ -3814,6 +3817,13 @@ used for a COMPLEX component.~:@>"
> > ;; Return possibly a union of a MEMBER type and a NUMERIC type,
> > ;; or just one or the other, or *EMPTY-TYPE* depending on what's in the
> XSET
> > ;; and the FP-ZEROES. XSET must not contains characters or real numbers.
> > +;; MEMBER types go into one of three hash containers:
> > +;; - *EQL-TYPE-CACHE* holds singleton types. A weak hash-table
> suffices for this.
> > +;; - *MEMBER-TYPE-HASHSET* holds types whose members are
> {NUMBER|CHARACTER|SYMBOL}.
> > +;; Intrinsically each element has a stable hash, making it possible
> to
> > +;; hash-cons XSETs without complications for EQ-comparable keys.
> > +;; - *MEMBER/EQ-TYPE-HASHSET* is the general case, allowing a mixture
> of objects
> > +;;; hashed by content-dependent hash and/or pseudorandom opaque hash.
> > (defun make-member-type (xset fp-zeroes)
> > ;; if we have a pair of zeros (e.g. 0.0d0 and -0.0d0), then we can
> > ;; canonicalize to (DOUBLE-FLOAT 0.0d0 0.0d0), because numeric
> > @@ -3851,18 +3861,48 @@ used for a COMPLEX component.~:@>"
> > ;; Bypass the hashset for type NULL because it's so important
> > (return-from make-member-type
> > (inline-cache-ctype (!alloc-member-type
> (pack-interned-ctype-bits 'member)
> > - (xset-from-list '(nil))
> > + (!new-xset '(nil) 1)
> > '())
> > null))))
> > - (let ((member-type
> > - (case (+ (length unpaired) (xset-count xset))
> > - (0 nil) ; nil
> > - ;; It's slightly suboptimal to use two DX-lets, but to
> remedy that,
> > - ;; a single macro invocation would need to select which
> hash collection
> > - ;; to look in. More easy would be to paste the macro guts
> here
> > - ;; with suitable alteration, which I don't want to do.
> > - (1 (new-ctype eql xset unpaired)) ; most common case
> > - (t (new-ctype member-type xset unpaired)))))
> > + (let* ((count (+ (length unpaired) (xset-count xset)))
> > + (member-type
> > + (unless (= count 0)
> > + (dx-let ((temp (!alloc-member-type (ctype-class-bits
> 'member)
> > + xset unpaired)))
> > + (cond
> > + ((= count 1)
> > + (let ((container *eql-type-cache*)
> > + (key (first (or unpaired (xset-data xset)))))
> > + (with-system-mutex ((hash-table-lock container))
> > + ;; This is like ENSURE-GETHASH but it
> potentially copies the key
> > + (or (gethash key container)
> > + (let ((copy (copy-ctype temp)))
> > + ;; re-fetch KEY from XSET in case it was
> copied.
> > + ;; hope no off-heap pointers buried within
> KEY.
> > + (setf (gethash (first (member-type-members
> copy)) container)
> > + copy))))))
> > + ((xset-every (lambda (x) (typep x '(or symbol number
> character))) xset)
> > + (hashset-insert-if-absent *member-type-hashset* temp
> #'copy-ctype))
> > + (t
> > + (binding*
> > + ((container *member/eq-type-hashset*)
> > + ((result foundp)
> > + (with-system-mutex (*xset-mutex*)
> > + (xset-generate-stable-hashes xset)
> > + (acond ((hashset-find container temp)
> > + (xset-delete-stable-hashes xset) ;
> inside the mutex scope
> > + (values it t))
> > + (t
> > + (values (hashset-insert container
> (copy-ctype temp))
> > + nil))))))
> > + (unless foundp ; "use" the var binding if
> #+sb-xc-host
> > + #-sb-xc-host ; attach finalizer (outside the
> mutex scope)
> > + (let ((xset (member-type-xset result))) ; in
> case XSET was copied
> > + (finalize
> > + result (lambda ()
> > + (with-system-mutex (*xset-mutex*)
> > + (xset-delete-stable-hashes
> xset))))))
> > + result)))))))
> > ;; The actual member-type contains the XSET (with no FP zeroes),
> > ;; and a list of unpaired zeroes.
> > (if (not float-types)
> > @@ -4631,8 +4671,6 @@ used for a COMPLEX component.~:@>"
> >
> > (define-type-class cons :enumerable nil :might-contain-other-types nil)
> >
> > -#+sb-xc-host
> > -(declaim (ftype (sfunction (ctype ctype) (values t t)) type=))
> > (defun make-cons-type (car-type cdr-type)
> > (aver (not (or (eq car-type *wild-type*)
> > (eq cdr-type *wild-type*))))
> > diff --git a/src/code/xset.lisp b/src/code/xset.lisp
> > index a6a86cb16..c8faa2bfb 100644
> > --- a/src/code/xset.lisp
> > +++ b/src/code/xset.lisp
> > @@ -22,25 +22,31 @@
> > ;;;; requires a function as the first argument -- not a function
> > ;;;; designator.
> > ;;;;
> > -;;;; XSET-LIST-SIZE is true only for XSETs whose data is stored into a
> > -;;;; list -- XSET-COUNT returns the real value.
> > -;;;;
> > ;;;; Note: XSET always uses EQL as the equivalence test
> >
> > (in-package "SB-KERNEL")
> >
> > (defstruct (xset (:constructor alloc-xset)
> > - (:constructor !copy-xset (list-size data))
> > + (:constructor !new-xset (data extra))
> > (:copier nil)
> > (:predicate nil))
> > - (list-size 0 :type index)
> > - (data nil :type (or list hash-table)))
> > + (data nil :type (or list hash-table))
> > + ;; EXTRA is a dual-purpose slot: initially it holds the number of
> items
> > + ;; in LIST. If the list becomes a hash-table, then EXTRA becomes 0.
> > + ;; An XSET can be optionally have a vector of stable hashes, 1 per
> element.
> > + ;; The hash vector if present goes in EXTRA, and the vector length
> > + ;; is the same as the list length. After creating a hash vector, it
> is forbidden
> > + ;; to add more elements to the set. In this manner we can avoid
> adding a subtype
> > + ;; of XSET stably-hashed-xset, or wasting a slot that would almost
> never be used.
> > + ;; (99.999% of all XSETs do not need stable hashes)
> > + (extra 0 :type (or simple-vector index)))
> > (declaim (freeze-type xset))
> >
> > (defun xset-count (xset)
> > (let ((data (xset-data xset)))
> > (if (listp data)
> > - (xset-list-size xset)
> > + (let ((extra (xset-extra xset)))
> > + (if (fixnump extra) extra (length extra)))
> > (hash-table-count data))))
> >
> > (defun map-xset (function xset)
> > @@ -60,28 +66,21 @@
> >
> > ;;; Checks that the element is not in the set yet.
> > (defun add-to-xset (elt xset)
> > - (let ((data (xset-data xset))
> > - (size (xset-list-size xset)))
> > + (let ((data (xset-data xset)))
> > (if (listp data)
> > - (if (< size +xset-list-size-limit+)
> > - (unless (member elt data :test #'eql)
> > - (setf (xset-list-size xset) (1+ size)
> > - (xset-data xset) (cons elt data)))
> > - (let ((table (make-hash-table :size (* 2 size) :test
> #'eql)))
> > - (setf (gethash elt table) t)
> > - (dolist (x data)
> > - (setf (gethash x table) t))
> > - (setf (xset-data xset) table)))
> > + (let ((size (xset-extra xset)))
> > + (if (< size +xset-list-size-limit+)
> > + (unless (member elt data :test #'eql)
> > + (setf (xset-extra xset) (1+ size)
> > + (xset-data xset) (cons elt data)))
> > + (let ((table (make-hash-table :size (* 2 size) :test
> #'eql)))
> > + (setf (gethash elt table) t)
> > + (dolist (x data)
> > + (setf (gethash x table) t))
> > + (setf (xset-extra xset) 0 ; looks nice to clear it
> > + (xset-data xset) table))))
> > (setf (gethash elt data) t))))
> >
> > -;; items must be canonical - no duplicates - and few in number.
> > -(defun xset-from-list (items)
> > - (let ((n (length items)))
> > - (aver (<= n +xset-list-size-limit+))
> > - (let ((xset (alloc-xset)))
> > - (setf (xset-list-size xset) n (xset-data xset) items)
> > - xset)))
> > -
> > (defun xset-union (a b)
> > (let ((xset (alloc-xset)))
> > (map-xset (lambda (x)
> > @@ -186,12 +185,16 @@
> > (defun xset-elts-hash (xset)
> > (let ((h 0))
> > (declare (sb-xc:fixnum h))
> > - (map-xset (lambda (x)
> > - ;; Rather than masking each intermediate result to
> MOST-POSITIVE-FIXNUM,
> > - ;; allow bits to rollover into the sign bit
> > - (when (typep x '(or symbol number character
> #-sb-xc-host instance))
> > - (setq h (plus-mod-fixnum (sb-xc:sxhash x) h))))
> > - xset)
> > + ;; Rather than masking each intermediate result to
> MOST-POSITIVE-FIXNUM,
> > + ;; allow bits to rollover into the sign bit
> > + (let ((hashes (xset-extra xset)))
> > + (if (simple-vector-p hashes)
> > + (dovector (x hashes)
> > + (setq h (plus-mod-fixnum h (truly-the fixnum (if (listp x)
> (cdr x) x)))))
> > + (map-xset (lambda (x)
> > + (when (typep x '(or symbol number character))
> > + (setq h (plus-mod-fixnum (sb-xc:sxhash x) h))))
> > + xset)))
> > ;; Now mix the bits thoroughly and then mask to a positive fixnum.
> > ;; I think this does not need to be compatible between host and
> target.
> > ;; But I'm trying to make it compatible anyway because I'm not 100%
> sure
> > @@ -203,3 +206,42 @@
> > #+sb-xc-host (ldb (byte sb-vm:n-word-bits 0) (ash h
> sb-vm:n-fixnum-tag-bits))
> > #-sb-xc-host (get-lisp-obj-address h)))
> > (logand (sb-impl::murmur3-fmix-word word-bits)
> most-positive-fixnum))))
> > +
> > +;;; Stably-hashed XSETs that have elements which are not nicely
> EQL-hashable
> > +;;; rely on a global table that maps any object to a pseudorandom hash.
> > +;;; The table keys are refcounted so that they can be deleted when no
> XSET
> > +;;; references a particular key. Caller MUST provide synchronization.
> > +(define-load-time-global *xset-stable-hashes* (make-hash-table :test
> 'eq))
> > +
> > +(defun xset-generate-stable-hashes (xset &aux (hashmap
> *xset-stable-hashes*))
> > + #-sb-xc-host (declare (notinline sb-impl::eql-hash)) ; forward ref
> > + (flet ((get-stable-hash-cell (obj)
> > + (let ((cell (gethash obj hashmap)))
> > + (cond (cell
> > + (incf (car cell))
> > + cell)
> > + (t
> > + (setf (gethash obj hashmap) (cons 1
> (ctype-random))))))))
> > + (let ((hashes (make-array (xset-count xset)))
> > + (i 0))
> > + (map-xset (lambda (elt)
> > + (multiple-value-bind (hashval eq?)
> > + #+sb-xc-host (if (sb-xc:typep elt '(or symbol
> character number))
> > + (values (sb-xc:sxhash elt) nil)
> > + (values 4 ; chosen by algorithm
> of https://xkcd.com/221/
> > + t)) ; yes, it's
> address-based
> > + #-sb-xc-host (sb-impl::eql-hash elt)
> > + (setf (aref hashes i) (if eq? (get-stable-hash-cell
> elt) hashval))
> > + (incf i)))
> > + xset)
> > + (setf (xset-extra xset) hashes)))
> > + xset)
> > +(defun xset-delete-stable-hashes (xset &aux (hashmap
> *xset-stable-hashes*))
> > + (let ((hashes (the simple-vector (xset-extra xset)))
> > + (index -1))
> > + ;; Iteration order will be the same as it was in
> GENERATE-STABLE-HASHES
> > + (map-xset (lambda (elt &aux (cell (aref hashes (incf index))))
> > + (when (and (listp cell)
> > + (zerop (decf (car cell)))) ; element is not
> used in any XSET
> > + (remhash elt hashmap)))
> > + xset)))
> > diff --git a/src/compiler/ctype.lisp b/src/compiler/ctype.lisp
> > index 18deeec53..7b846ee2e 100644
> > --- a/src/compiler/ctype.lisp
> > +++ b/src/compiler/ctype.lisp
> > @@ -242,7 +242,7 @@ and no value was provided for it." name))))))))))
> > (let* ((ctype (lvar-type lvar))
> > (int (funcall *ctype-test-fun* ctype type)))
> > (cond ((not int)
> > - (unless (type= ctype (specifier-type '(member dummy)))
> > + (unless (type= ctype (specifier-type '(eql dummy)))
> > (note-lossage "The ~:R argument is a ~S, not a ~S."
> > n (type-specifier ctype) (type-specifier
> type)))
> > nil)
> > diff --git a/src/compiler/generic/genesis.lisp
> b/src/compiler/generic/genesis.lisp
> > index 18b0a5aad..dd83acc31 100644
> > --- a/src/compiler/generic/genesis.lisp
> > +++ b/src/compiler/generic/genesis.lisp
> > @@ -1084,6 +1084,8 @@ core and return a descriptor to it."
> > (loop (if (cold-null list) (return n))
> > (incf n)
> > (setq list (cold-cdr list)))))
> > +(defun cold-push (item symbol)
> > + (cold-set symbol (cold-cons item (cold-symbol-value symbol))))
> >
> > ;;; Make a simple-vector on the target that holds the specified
> > ;;; OBJECTS, and return its descriptor.
> > @@ -1489,13 +1491,18 @@ core and return a descriptor to it."
> > (let ((cell (cold-find-classoid-cell (classoid-name obj)
> :create t)))
> > (write-slots cell :classoid result)))
> > ((ctype-p obj)
> > + ;; If OBJ belongs in a hash container, then deduce which
> > (let* ((hashset (sb-kernel::ctype->hashset-sym obj))
> > - (entry-p (and hashset (hashset-find (symbol-value
> hashset) obj))))
> > - ;; Record for preloading in hashset
> > - (cold-set 'sb-kernel::*!initial-ctypes*
> > - (cold-cons (cold-cons result (if entry-p (cold-intern
> hashset)
> > - *nil-descriptor*))
> > - (cold-symbol-value
> 'sb-kernel::*!initial-ctypes*))))))
> > + (preload
> > + (cond ((and hashset (hashset-find (symbol-value
> hashset) obj))
> > + hashset)
> > + ((and (member-type-p obj)
> > + ;; NULL is a hardwired case in the
> MEMBER type constructor
> > + (neq obj (specifier-type 'null))
> > + (type-singleton-p obj))
> > + 'sb-kernel::*eql-type-cache*))))
> > + (when preload ; Record it
> > + (cold-push (cold-cons result preload)
> 'sb-kernel::*!initial-ctypes*)))))
> > result))))
> >
> > ;;; Convert a layout to a wrapper and back.
> > diff --git a/tests/typehashmix.impure.lisp
> b/tests/typehashmix.impure.lisp
> > index 70290c9ea..0aac4027c 100644
> > --- a/tests/typehashmix.impure.lisp
> > +++ b/tests/typehashmix.impure.lisp
> > @@ -370,3 +370,33 @@
> > collect (list i tp))
> > (let ((post (compute-max-psl hs)))
> > (assert (<= (- post pre) 2)))))
> > +
> > +(defvar a "foo")
> > +(defvar b '(nil t))
> > +(defvar c #*101)
> > +(with-test (:name :hash-cons-member-type)
> > + (assert (eq (sb-kernel:specifier-type `(member ,a ,b ,c))
> > + (sb-kernel:specifier-type `(member ,c ,a ,b))))
> > + (assert (eq (sb-kernel:specifier-type `(member ,a ,b ,c))
> > + (sb-kernel:specifier-type `(member ,b ,c ,a)))))
> > +
> > +(with-test (:name :hash-cons-member-type-large)
> > + (let ((numbers ; force the XSET to be represented as a hash-table
> > + (loop for i below 30 collect (complex (coerce i 'single-float)
> i)))
> > + (list '(thing)))
> > + (assert (hash-table-p
> > + (sb-kernel::xset-data
> > + (sb-kernel::member-type-xset
> > + (sb-kernel:specifier-type `(member ,@numbers))))))
> > + (assert (eq (sb-kernel:specifier-type `(member ,a ,@numbers ,list))
> > + (sb-kernel:specifier-type `(member ,list ,a
> ,@numbers))))))
> > +
> > +;(print sb-kernel::*xset-stable-hashes*)
> > +(gc :full t)
> > +#+sb-thread (sb-kernel:run-pending-finalizers)
> > +(with-test (:name :xset-stable-hash-weakness)
> > + ;; After running the :MEMBER-TYPE-HASH-MIXER test, there were >5000
> entries
> > + ;; in the *XSET-STABLE-HASHES* table for me.
> > + ;; The preceding GC should have had some effect.
> > + (assert (< (hash-table-count sb-kernel::*xset-stable-hashes*)
> > + 100)))
> >
> > -----------------------------------------------------------------------
> >
> >
> > hooks/post-receive
> > --
> > SBCL
> >
> >
> > _______________________________________________
> > Sbcl-commits mailing list
> > Sbcl-commits@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/sbcl-commits
>
[Attachment #5 (text/html)]
<div dir="ltr"><div dir="ltr">I added some printing if there are any elements in that \
table, and I've not seen it print anything, so I don't know what it's \
doing for you. But it does depend on GC behavior so I made it tolerant of worse GC. \
Not sure why that should be needed, but I think it's still a valid \
test.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan \
30, 2023 at 9:29 PM Stas Boukarev <<a \
href="mailto:stassats@gmail.com">stassats@gmail.com</a>> \
wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I'm randomly \
getting things like<br> ::: UNEXPECTED-FAILURE :XSET-STABLE-HASH-WEAKNESS due to \
SIMPLE-ERROR:<br> "The assertion<br>
(< (HASH-TABLE-COUNT SB-KERNEL::*XSET-STABLE-HASHES*) 100) failed \
with<br>
(HASH-TABLE-COUNT SB-KERNEL::*XSET-STABLE-HASHES*) = 145."<br>
<br>
On Sat, Dec 31, 2022 at 8:07 PM snuglas via Sbcl-commits<br>
<<a href="mailto:sbcl-commits@lists.sourceforge.net" \
target="_blank">sbcl-commits@lists.sourceforge.net</a>> wrote:<br> ><br>
> The branch "master" has been updated in SBCL:<br>
> via c7c449b50fb072cabd9517879ac7b1e75d68c768 (commit)<br>
> from ef088c6fbce7dca5032aa08a3702b7f622b41d32 (commit)<br>
><br>
> - Log -----------------------------------------------------------------<br>
> commit c7c449b50fb072cabd9517879ac7b1e75d68c768<br>
> Author: Douglas Katzman <<a href="mailto:dougk@google.com" \
target="_blank">dougk@google.com</a>><br> > Date: Sat Dec 31 03:11:15 2022 \
-0500<br> ><br>
> Hash-cons all MEMBER types<br>
><br>
> Types whose members include arbitrary objects such as lists or strings \
will now<br> > choose and memoize per-element pseudorandom hashes as \
needed, while trying to<br> > avoid creating a forever-growing set of \
objects that acquired a stable hash,<br> > as the object->hash \
association is dropped when no XSET refers to an object.<br> > ---<br>
> src/code/type-class.lisp | 121 \
++++++++++++--------------------------<br> > src/code/type.lisp \
| 62 +++++++++++++++----<br> > src/code/xset.lisp | 106 \
+++++++++++++++++++++++----------<br> > src/compiler/ctype.lisp \
| 2 +-<br> > src/compiler/generic/genesis.lisp | 19 ++++--<br>
> tests/typehashmix.impure.lisp | 30 ++++++++++<br>
> 6 files changed, 207 insertions(+), 133 deletions(-)<br>
><br>
> diff --git a/src/code/type-class.lisp b/src/code/type-class.lisp<br>
> index e04e04c5e..f14dadab9 100644<br>
> --- a/src/code/type-class.lisp<br>
> +++ b/src/code/type-class.lisp<br>
> @@ -734,9 +734,10 @@<br>
><br>
> ;; Singleton MEMBER types are best dealt with via a weak-value hash-table \
because:<br> > ;; * (MEMBER THING) might lack an address-insensitive hash for \
THING<br> > -;; but src/code/hashset can not use address-based hashes. This \
limitation is unique<br> > -;; to MEMBER types because other CTYPE instances \
are compositions of CTYPES<br> > -;; where all subparts have assigned \
hashes.<br> > +;; but src/code/hashset goes through a lot of rigmarole to \
handle address-bashed<br> > +;; hashing, and the end result for a single key \
would laboriously emulate an EQL table.<br> > +;; This is especially important \
for the compiler because each time it asks itself the<br> > +;; CTYPE-OF a \
constant leaf, the answer might be a singleton MEMBER type.<br> > ;; * Symbols \
have slightly bad SXHASH values (by language requirement):<br> > ;; \
"For any two objects, x and y which are symbols and which are similar<br> > \
;; (sxhash x) and (sxhash y) yield the same mathematical value even if x and y \
exist<br> > @@ -747,35 +748,6 @@<br>
> ;; will cause the hashset to exceed its probe sequence length limit.<br>
> ;; This isn't to say we couldn't assign some bits of SYMBOL-HASH \
pseudorandomly,<br> > ;; and mask them out in the value returned by \
CL:SXHASH.<br> > -;;<br>
> -;; Also: XSETs containing arbitrary objects such as strings and lists don't \
have<br> > -;; a good hash at all. There is not really a way to compute a mixed \
hash<br> > -;; other than by pinning all objects in the XSET and taking their \
addresses.<br> > -;; Then we'd need to figure out that GC happened, and it \
becomes a pain.<br> > -;; This seems more complicated than the situation \
warrants.<br> > -;; So we'll just give up on hash-consing which should fix \
lp#1999687<br> > -;;<br>
> -;; An outline of a better design would be as follows:<br>
> -;; - create a global hash-table of objects which were placed in an XSET<br>
> -;; other than the nicely hashable object types. Call this \
xset-key->hash-mapping.<br> > -;; This mapping does not need to be weak, \
because it will have a way of purging it.<br> > -;; Each value in the table is \
a cons of a random hash and the number of XSETs<br> > -;; using the key. \
Increment the refcount when making a new XSET with that key.<br> > -;; - when \
hashing the XSET, look up its keys (other than EQL hashable) in the global table<br> \
> -;; - attach a finalizer to the XSET. The finalizer's job is to decrement \
the refcount<br> > -;; on each key in the global mappping. When the count \
reaches zero there are no<br> > -;; XSETs that refer to the key, and the \
random hash can be removed.<br> > -;; Why make a refcounted table? Because \
otherwise there is no way to remove mapping<br> > -;; entries for objects that \
outlive the MEMBER-TYPE's use of the object but where the<br> > -;; MEMBER \
type itself is dead. Worst-case, every object in the lisp image could at some<br> \
> -;; point appear in a MEMBER type but then never be needed again with regard to \
type<br> > -;; system operations. So you'd have created a permanent mapping of \
every object to<br> > -;; a random hash for no good reason.<br>
> -<br>
> -;; Why the singleton table is so important is that any time the compiler asks \
itself<br> > -;; the ctype-of a constant leaf, it might yield `(MEMBER \
,the-constant).<br> > -;; So then you end up with an assortment of random objects \
that don't hash<br> > -;; nicely in a ctype hashset, but are OK in a \
hash-table.<br> > (define-load-time-global *eql-type-cache* ; like \
EQL-SPECIALIZER-TABLE in PCL<br> > (sb-impl::make-system-hash-table :test \
'eql :weakness :value :synchronized nil))<br> ><br>
> @@ -784,9 +756,8 @@<br>
> (heap-allocated-p ,obj)))<br>
><br>
> (defvar *hashsets-preloaded* nil)<br>
> -(defmacro new-ctype (pseudonym &rest initargs)<br>
> - (let* ((name (if (eq pseudonym 'eql) 'member-type pseudonym))<br>
> - (allocator (package-symbolicate "SB-KERNEL" \
"!ALLOC-" name))<br> > +(defmacro new-ctype (name &rest \
initargs)<br> > + (let* ((allocator (package-symbolicate "SB-KERNEL" \
"!ALLOC-" name))<br> > (hashset (package-symbolicate \
"SB-KERNEL" "*" name "-HASHSET*"))<br> > \
(bits (ctype-class-bits (ctype-instance->type-class name))))<br> > \
#+sb-xc-host ; allocate permanent data, and insert into cache if not found<br> > \
@@ -803,32 +774,7 @@<br> > (unless *hashsets-preloaded*<br>
> (write-string "CTYPE hashset preload failure")<br>
> (sb-vm:ldb-monitor))<br>
> - ,(case pseudonym<br>
> - (eql ; as per above remarks: use hash-table, not hashset<br>
> - `(let* ((xset ,(first initargs))<br>
> - (zeros ,(second initargs))<br>
> - (key (first (or zeros (xset-data xset))))<br>
> - (table *eql-type-cache*))<br>
> - (with-system-mutex ((hash-table-lock table))<br>
> - ;; This is like ENSURE-GETHASH but it potentially \
copies the key<br> > - (or (gethash key table)<br>
> - ;; hope no off-heap pointers buried within \
KEY<br> > - (let ((key (cond ((numberp key) \
(sb-vm:copy-number-to-heap key))<br> > - \
((safe-member-type-elt-p key) key)<br> > - \
(t<br> > - (warn \
"Off-heap hash-table key @ ~X"<br> > - \
(get-lisp-obj-address key))<br> > - \
key))))<br> > - (setf (gethash key table) \
(copy-ctype temp)))))))<br> > - (member-type ; problem case: \
don't always know how to hash well<br> > - `(let ((xset \
,(first initargs)))<br> > - (flet ((hashable (x) (typep x \
'(or symbol number character instance))))<br> > - (if \
(xset-every #'hashable xset)<br> > - \
(hashset-insert-if-absent ,hashset temp #'copy-ctype)<br> > - \
;; otherwise just copy it always (for now)<br> > - \
(copy-ctype temp)))))<br> > - (t<br>
> - `(hashset-insert-if-absent ,hashset temp \
#'copy-ctype))))))<br> > + (hashset-insert-if-absent ,hashset temp \
#'copy-ctype))))<br> ><br>
> ;;; The NAMED-TYPE is used to represent *, T and NIL, the standard<br>
> ;;; special cases, as well as other special cases needed to<br>
> @@ -878,6 +824,14 @@<br>
> (def-type-model (member-type (:constructor* nil (xset fp-zeroes)))<br>
> (xset nil :type xset :hasher xset-elts-hash :test xset=)<br>
> (fp-zeroes nil :type list :hasher hash-fp-zeros :test fp-zeros=)))<br>
> +(define-load-time-global *xset-mutex* (or #-sb-xc-host \
(sb-thread:make-mutex)))<br> > +;;; This hashset is guarded by *XSET-MUTEX*. It is \
_not_ declared as synchronized<br> > +;;; so that HASHSET-INSERT-IF-ABSENT should \
not acquire a mutex inside a mutex<br> > +;;; (stable hashes have to be assigned \
while holding the lock)<br> > +(define-load-time-global \
*member/eq-type-hashset*<br> > + (make-hashset 32 #'member-type-equiv \
#'calc-member-type-hash<br> > + :weakness t \
:synchronized nil))<br> > +(pushnew '*member/eq-type-hashset* \
*ctype-hashsets*)<br> ><br>
> ;;; An ARRAY-TYPE is used to represent any array type, including<br>
> ;;; things such as SIMPLE-BASE-STRING.<br>
> @@ -1248,6 +1202,8 @@<br>
><br>
> ;;; Return the name of the global hashset that OBJ (a CTYPE instance)<br>
> ;;; would be stored in, if it were stored in one.<br>
> +;;; This is only for bootstrap, and not 100% precise as it does not know<br>
> +;;; about the other MEMBER type containers.<br>
> (defun ctype->hashset-sym (obj)<br>
> (macrolet ((generate ()<br>
> (collect ((clauses))<br>
> @@ -1331,25 +1287,21 @@<br>
> #-sb-xc-host<br>
> (progn<br>
> (defglobal *!initial-ctypes* nil)<br>
> -(defun preload-ctype-hashsets (&aux permtypes)<br>
> - (declare (ignorable permtypes))<br>
> +(defun preload-ctype-hashsets ()<br>
> (dolist (pair (nreverse *!initial-ctypes*))<br>
> - (destructuring-bind (instance . hashset-symbol) pair<br>
> - (cond ((not hashset-symbol)<br>
> - ;; There are very few which aren't in a hashset:<br>
> - ;; - (6) NAMED-TYPEs<br>
> - ;; - (1) MEMBER-TYPE NULL<br>
> - ;; - (3) BASE-CHAR, EXTENDED-CHAR, CHARACTER<br>
> - ;; - (1) CONS<br>
> - (push instance permtypes))<br>
> - ;; Mandatory special-case for singleton MEMBER types<br>
> - ((and (member-type-p instance) (not (cdr (member-type-members \
instance))))<br> > - (setf (gethash (car (member-type-members \
instance)) *eql-type-cache*)<br> > - instance))<br>
> + (let ((instance (car pair))<br>
> + (container (symbol-value (cdr pair))))<br>
> + (cond ((hash-table-p container)<br>
> + (aver (member-type-p instance))<br>
> + ;; As of this writing there are only two EQL types to \
preload:<br> > + ;; one is in the IR1-transform of FORMAT with \
stream (EQL T),<br> > + ;; the other is CHECK-ARG-TYPE looking \
for (EQL DUMMY) type.<br> > + (let ((key (first \
(member-type-members instance))))<br> > + (aver (not \
(gethash key container)))<br> > + (setf (gethash key \
container) instance)))<br> > (t<br>
> - (let ((hashset (symbol-value hashset-symbol)))<br>
> - (aver (not (hashset-find hashset instance))) ; instances \
are dumped bottom-up<br> > - (hashset-insert hashset \
instance))))<br> > + (aver (not (hashset-find container \
instance))) ; instances are built bottom-up<br> > + \
(hashset-insert container instance)))<br> > (labels \
((ensure-interned-list (list hashset)<br> > (let \
((found (hashset-find hashset list)))<br> > (when \
(and found (neq found list))<br> > @@ -1367,7 +1319,7 @@<br>
> (bug "genesis dumped bad instance \
within ~X"<br> > \
(get-lisp-obj-address instance)))))))<br> > (etypecase instance<br>
> - ((or named-type numeric-type member-type character-set-type ; \
nothing extra to do<br> > + ((or numeric-type member-type \
character-set-type ; nothing extra to do<br> > #+sb-simd-pack \
simd-pack-type #+sb-simd-pack-256 simd-pack-256-type<br> > \
hairy-type))<br> > (args-type<br>
> @@ -1387,7 +1339,6 @@<br>
> (ensure-interned-list (compound-type-types instance) \
*ctype-set-hashset*))<br> > (negation-type<br>
> (check (negation-type-type instance)))))))<br>
> - (aver (= (length permtypes) (+ 11 #-sb-unicode -2)))<br>
> #+sb-devel (setq *hashsets-preloaded* t))<br>
> (preload-ctype-hashsets))<br>
><br>
> @@ -1530,10 +1481,16 @@<br>
><br>
> ;;; Copy X to the heap, give it a random hash, and if it is a MEMBER type<br>
> ;;; then assert that all members are cacheable.<br>
> +#+sb-xc-host<br>
> +(defun copy-ctype (x)<br>
> + (let ((bits (logior (logand (ctype-random) +type-hash-mask+) (type-%bits \
x))))<br> > + (etypecase x<br>
> + (member-type<br>
> + (!alloc-member-type bits (member-type-xset x) (member-type-fp-zeroes \
x))))))<br> > #-sb-xc-host<br>
> (defun copy-ctype (x)<br>
> (declare (type ctype x))<br>
> - (declare (sb-c::tlab :system) (inline !copy-xset))<br>
> + (declare (sb-c::tlab :system) (inline !new-xset))<br>
> #+c-stack-is-control-stack (aver (stack-allocated-p x))<br>
> (labels ((copy (x)<br>
> ;; Return a heap copy of X if X was arena or \
stack-allocated.<br> > @@ -1559,7 +1516,7 @@<br>
> ;; While we could use (load-time-value) to \
referece a constant empty xset<br> > ;; \
there's really no point to doing that.<br> > \
(collect ((elts))<br> > - (dolist (x data \
(!copy-xset (xset-list-size xset) (elts)))<br> > + \
(dolist (x data (!new-xset (elts) (xset-extra xset)))<br> > \
(elts (cond ((numberp x) (sb-vm:copy-number-to-heap x))<br> > \
((safe-member-type-elt-p x) x)<br> > \
;; surely things will go haywire if this occurs<br> > diff --git \
a/src/code/type.lisp b/src/code/type.lisp<br> > index 71b35509a..73c4da809 \
100644<br> > --- a/src/code/type.lisp<br>
> +++ b/src/code/type.lisp<br>
> @@ -341,6 +341,9 @@<br>
> unparsed<br>
> (nconc unparsed '(&optional))))))<br>
><br>
> +#+sb-xc-host ; why only on the host? Shouldn't we always declaim the \
ftype??<br> > +(declaim (ftype (sfunction (ctype ctype) (values t t)) type=))<br>
> +<br>
> ;;; Return true if LIST1 and LIST2 have the same elements in the same<br>
> ;;; positions according to TYPE=. We return NIL, NIL if there is an<br>
> ;;; uncertain comparison.<br>
> @@ -3814,6 +3817,13 @@ used for a COMPLEX component.~:@>"<br>
> ;; Return possibly a union of a MEMBER type and a NUMERIC type,<br>
> ;; or just one or the other, or *EMPTY-TYPE* depending on what's in the \
XSET<br> > ;; and the FP-ZEROES. XSET must not contains characters or real \
numbers.<br> > +;; MEMBER types go into one of three hash containers:<br>
> +;; - *EQL-TYPE-CACHE* holds singleton types. A weak hash-table suffices for \
this.<br> > +;; - *MEMBER-TYPE-HASHSET* holds types whose members are \
{NUMBER|CHARACTER|SYMBOL}.<br> > +;; Intrinsically each element has a stable \
hash, making it possible to<br> > +;; hash-cons XSETs without complications \
for EQ-comparable keys.<br> > +;; - *MEMBER/EQ-TYPE-HASHSET* is the general \
case, allowing a mixture of objects<br> > +;;; hashed by content-dependent \
hash and/or pseudorandom opaque hash.<br> > (defun make-member-type (xset \
fp-zeroes)<br> > ;; if we have a pair of zeros (e.g. 0.0d0 and -0.0d0), then \
we can<br> > ;; canonicalize to (DOUBLE-FLOAT 0.0d0 0.0d0), because \
numeric<br> > @@ -3851,18 +3861,48 @@ used for a COMPLEX \
component.~:@>"<br> > ;; Bypass the hashset for type NULL \
because it's so important<br> > (return-from make-member-type<br>
> (inline-cache-ctype (!alloc-member-type (pack-interned-ctype-bits \
'member)<br> > - \
(xset-from-list '(nil))<br> > + \
(!new-xset '(nil) 1)<br> > \
'())<br> > null))))<br>
> - (let ((member-type<br>
> - (case (+ (length unpaired) (xset-count xset))<br>
> - (0 nil) ; nil<br>
> - ;; It's slightly suboptimal to use two DX-lets, but to \
remedy that,<br> > - ;; a single macro invocation would need to \
select which hash collection<br> > - ;; to look in. More easy \
would be to paste the macro guts here<br> > - ;; with suitable \
alteration, which I don't want to do.<br> > - (1 (new-ctype \
eql xset unpaired)) ; most common case<br> > - (t (new-ctype \
member-type xset unpaired)))))<br> > + (let* ((count (+ (length unpaired) \
(xset-count xset)))<br> > + (member-type<br>
> + (unless (= count 0)<br>
> + (dx-let ((temp (!alloc-member-type (ctype-class-bits \
'member)<br> > + \
xset unpaired)))<br> > + (cond<br>
> + ((= count 1)<br>
> + (let ((container *eql-type-cache*)<br>
> + (key (first (or unpaired (xset-data \
xset)))))<br> > + (with-system-mutex \
((hash-table-lock container))<br> > + ;; This is \
like ENSURE-GETHASH but it potentially copies the key<br> > + \
(or (gethash key container)<br> > + (let \
((copy (copy-ctype temp)))<br> > + ;; \
re-fetch KEY from XSET in case it was copied.<br> > + \
;; hope no off-heap pointers buried within KEY.<br> > + \
(setf (gethash (first (member-type-members copy)) container)<br> > + \
copy))))))<br> > + ((xset-every (lambda (x) (typep x \
'(or symbol number character))) xset)<br> > + \
(hashset-insert-if-absent *member-type-hashset* temp #'copy-ctype))<br> > + \
(t<br> > + (binding*<br>
> + ((container *member/eq-type-hashset*)<br>
> + ((result foundp)<br>
> + (with-system-mutex (*xset-mutex*)<br>
> + (xset-generate-stable-hashes xset)<br>
> + (acond ((hashset-find container \
temp)<br> > + \
(xset-delete-stable-hashes xset) ; inside the mutex scope<br> > + \
(values it t))<br> > + (t<br>
> + (values (hashset-insert \
container (copy-ctype temp))<br> > + \
nil))))))<br> > + (unless foundp ; "use" \
the var binding if #+sb-xc-host<br> > + \
#-sb-xc-host ; attach finalizer (outside the mutex scope)<br> > + \
(let ((xset (member-type-xset result))) ; in case XSET was copied<br> > + \
(finalize<br> > + result (lambda ()<br>
> + (with-system-mutex \
(*xset-mutex*)<br> > + \
(xset-delete-stable-hashes xset))))))<br> > + \
result)))))))<br> > ;; The actual member-type contains the XSET (with \
no FP zeroes),<br> > ;; and a list of unpaired zeroes.<br>
> (if (not float-types)<br>
> @@ -4631,8 +4671,6 @@ used for a COMPLEX component.~:@>"<br>
><br>
> (define-type-class cons :enumerable nil :might-contain-other-types nil)<br>
><br>
> -#+sb-xc-host<br>
> -(declaim (ftype (sfunction (ctype ctype) (values t t)) type=))<br>
> (defun make-cons-type (car-type cdr-type)<br>
> (aver (not (or (eq car-type *wild-type*)<br>
> (eq cdr-type *wild-type*))))<br>
> diff --git a/src/code/xset.lisp b/src/code/xset.lisp<br>
> index a6a86cb16..c8faa2bfb 100644<br>
> --- a/src/code/xset.lisp<br>
> +++ b/src/code/xset.lisp<br>
> @@ -22,25 +22,31 @@<br>
> ;;;; requires a function as the first argument -- not a function<br>
> ;;;; designator.<br>
> ;;;;<br>
> -;;;; XSET-LIST-SIZE is true only for XSETs whose data is stored into a<br>
> -;;;; list -- XSET-COUNT returns the real value.<br>
> -;;;;<br>
> ;;;; Note: XSET always uses EQL as the equivalence test<br>
><br>
> (in-package "SB-KERNEL")<br>
><br>
> (defstruct (xset (:constructor alloc-xset)<br>
> - (:constructor !copy-xset (list-size data))<br>
> + (:constructor !new-xset (data extra))<br>
> (:copier nil)<br>
> (:predicate nil))<br>
> - (list-size 0 :type index)<br>
> - (data nil :type (or list hash-table)))<br>
> + (data nil :type (or list hash-table))<br>
> + ;; EXTRA is a dual-purpose slot: initially it holds the number of items<br>
> + ;; in LIST. If the list becomes a hash-table, then EXTRA becomes 0.<br>
> + ;; An XSET can be optionally have a vector of stable hashes, 1 per \
element.<br> > + ;; The hash vector if present goes in EXTRA, and the vector \
length<br> > + ;; is the same as the list length. After creating a hash vector, \
it is forbidden<br> > + ;; to add more elements to the set. In this manner we \
can avoid adding a subtype<br> > + ;; of XSET stably-hashed-xset, or wasting a \
slot that would almost never be used.<br> > + ;; (99.999% of all XSETs do not \
need stable hashes)<br> > + (extra 0 :type (or simple-vector index)))<br>
> (declaim (freeze-type xset))<br>
><br>
> (defun xset-count (xset)<br>
> (let ((data (xset-data xset)))<br>
> (if (listp data)<br>
> - (xset-list-size xset)<br>
> + (let ((extra (xset-extra xset)))<br>
> + (if (fixnump extra) extra (length extra)))<br>
> (hash-table-count data))))<br>
><br>
> (defun map-xset (function xset)<br>
> @@ -60,28 +66,21 @@<br>
><br>
> ;;; Checks that the element is not in the set yet.<br>
> (defun add-to-xset (elt xset)<br>
> - (let ((data (xset-data xset))<br>
> - (size (xset-list-size xset)))<br>
> + (let ((data (xset-data xset)))<br>
> (if (listp data)<br>
> - (if (< size +xset-list-size-limit+)<br>
> - (unless (member elt data :test #'eql)<br>
> - (setf (xset-list-size xset) (1+ size)<br>
> - (xset-data xset) (cons elt data)))<br>
> - (let ((table (make-hash-table :size (* 2 size) :test \
#'eql)))<br> > - (setf (gethash elt table) t)<br>
> - (dolist (x data)<br>
> - (setf (gethash x table) t))<br>
> - (setf (xset-data xset) table)))<br>
> + (let ((size (xset-extra xset)))<br>
> + (if (< size +xset-list-size-limit+)<br>
> + (unless (member elt data :test #'eql)<br>
> + (setf (xset-extra xset) (1+ size)<br>
> + (xset-data xset) (cons elt data)))<br>
> + (let ((table (make-hash-table :size (* 2 size) :test \
#'eql)))<br> > + (setf (gethash elt table) t)<br>
> + (dolist (x data)<br>
> + (setf (gethash x table) t))<br>
> + (setf (xset-extra xset) 0 ; looks nice to clear it<br>
> + (xset-data xset) table))))<br>
> (setf (gethash elt data) t))))<br>
><br>
> -;; items must be canonical - no duplicates - and few in number.<br>
> -(defun xset-from-list (items)<br>
> - (let ((n (length items)))<br>
> - (aver (<= n +xset-list-size-limit+))<br>
> - (let ((xset (alloc-xset)))<br>
> - (setf (xset-list-size xset) n (xset-data xset) items)<br>
> - xset)))<br>
> -<br>
> (defun xset-union (a b)<br>
> (let ((xset (alloc-xset)))<br>
> (map-xset (lambda (x)<br>
> @@ -186,12 +185,16 @@<br>
> (defun xset-elts-hash (xset)<br>
> (let ((h 0))<br>
> (declare (sb-xc:fixnum h))<br>
> - (map-xset (lambda (x)<br>
> - ;; Rather than masking each intermediate result to \
MOST-POSITIVE-FIXNUM,<br> > - ;; allow bits to rollover \
into the sign bit<br> > - (when (typep x '(or symbol \
number character #-sb-xc-host instance))<br> > - (setq h \
(plus-mod-fixnum (sb-xc:sxhash x) h))))<br> > - xset)<br>
> + ;; Rather than masking each intermediate result to \
MOST-POSITIVE-FIXNUM,<br> > + ;; allow bits to rollover into the sign bit<br>
> + (let ((hashes (xset-extra xset)))<br>
> + (if (simple-vector-p hashes)<br>
> + (dovector (x hashes)<br>
> + (setq h (plus-mod-fixnum h (truly-the fixnum (if (listp x) \
(cdr x) x)))))<br> > + (map-xset (lambda (x)<br>
> + (when (typep x '(or symbol number \
character))<br> > + (setq h (plus-mod-fixnum \
(sb-xc:sxhash x) h))))<br> > + xset)))<br>
> ;; Now mix the bits thoroughly and then mask to a positive fixnum.<br>
> ;; I think this does not need to be compatible between host and \
target.<br> > ;; But I'm trying to make it compatible anyway because \
I'm not 100% sure<br> > @@ -203,3 +206,42 @@<br>
> #+sb-xc-host (ldb (byte sb-vm:n-word-bits 0) (ash h \
sb-vm:n-fixnum-tag-bits))<br> > #-sb-xc-host \
(get-lisp-obj-address h)))<br> > (logand (sb-impl::murmur3-fmix-word \
word-bits) most-positive-fixnum))))<br> > +<br>
> +;;; Stably-hashed XSETs that have elements which are not nicely \
EQL-hashable<br> > +;;; rely on a global table that maps any object to a \
pseudorandom hash.<br> > +;;; The table keys are refcounted so that they can be \
deleted when no XSET<br> > +;;; references a particular key. Caller MUST provide \
synchronization.<br> > +(define-load-time-global *xset-stable-hashes* \
(make-hash-table :test 'eq))<br> > +<br>
> +(defun xset-generate-stable-hashes (xset &aux (hashmap \
*xset-stable-hashes*))<br> > + #-sb-xc-host (declare (notinline \
sb-impl::eql-hash)) ; forward ref<br> > + (flet ((get-stable-hash-cell (obj)<br>
> + (let ((cell (gethash obj hashmap)))<br>
> + (cond (cell<br>
> + (incf (car cell))<br>
> + cell)<br>
> + (t<br>
> + (setf (gethash obj hashmap) (cons 1 \
(ctype-random))))))))<br> > + (let ((hashes (make-array (xset-count \
xset)))<br> > + (i 0))<br>
> + (map-xset (lambda (elt)<br>
> + (multiple-value-bind (hashval eq?)<br>
> + #+sb-xc-host (if (sb-xc:typep elt '(or \
symbol character number))<br> > + \
(values (sb-xc:sxhash elt) nil)<br> > + \
(values 4 ; chosen by algorithm of <a href="https://xkcd.com/221/" rel="noreferrer" \
target="_blank">https://xkcd.com/221/</a><br> > + \
t)) ; yes, it's address-based<br> > + \
#-sb-xc-host (sb-impl::eql-hash elt)<br> > + (setf \
(aref hashes i) (if eq? (get-stable-hash-cell elt) hashval))<br> > + \
(incf i)))<br> > + xset)<br>
> + (setf (xset-extra xset) hashes)))<br>
> + xset)<br>
> +(defun xset-delete-stable-hashes (xset &aux (hashmap \
*xset-stable-hashes*))<br> > + (let ((hashes (the simple-vector (xset-extra \
xset)))<br> > + (index -1))<br>
> + ;; Iteration order will be the same as it was in \
GENERATE-STABLE-HASHES<br> > + (map-xset (lambda (elt &aux (cell (aref \
hashes (incf index))))<br> > + (when (and (listp cell)<br>
> + (zerop (decf (car cell)))) ; element \
is not used in any XSET<br> > + (remhash elt \
hashmap)))<br> > + xset)))<br>
> diff --git a/src/compiler/ctype.lisp b/src/compiler/ctype.lisp<br>
> index 18deeec53..7b846ee2e 100644<br>
> --- a/src/compiler/ctype.lisp<br>
> +++ b/src/compiler/ctype.lisp<br>
> @@ -242,7 +242,7 @@ and no value was provided for it." name))))))))))<br>
> (let* ((ctype (lvar-type lvar))<br>
> (int (funcall *ctype-test-fun* ctype type)))<br>
> (cond ((not int)<br>
> - (unless (type= ctype (specifier-type '(member \
dummy)))<br> > + (unless (type= ctype (specifier-type \
'(eql dummy)))<br> > (note-lossage "The ~:R \
argument is a ~S, not a ~S."<br> > \
n (type-specifier ctype) (type-specifier type)))<br> > \
nil)<br> > diff --git a/src/compiler/generic/genesis.lisp \
b/src/compiler/generic/genesis.lisp<br> > index 18b0a5aad..dd83acc31 100644<br>
> --- a/src/compiler/generic/genesis.lisp<br>
> +++ b/src/compiler/generic/genesis.lisp<br>
> @@ -1084,6 +1084,8 @@ core and return a descriptor to it."<br>
> (loop (if (cold-null list) (return n))<br>
> (incf n)<br>
> (setq list (cold-cdr list)))))<br>
> +(defun cold-push (item symbol)<br>
> + (cold-set symbol (cold-cons item (cold-symbol-value symbol))))<br>
><br>
> ;;; Make a simple-vector on the target that holds the specified<br>
> ;;; OBJECTS, and return its descriptor.<br>
> @@ -1489,13 +1491,18 @@ core and return a descriptor to it."<br>
> (let ((cell (cold-find-classoid-cell (classoid-name obj) \
:create t)))<br> > (write-slots cell :classoid \
result)))<br> > ((ctype-p obj)<br>
> + ;; If OBJ belongs in a hash container, then deduce \
which<br> > (let* ((hashset \
(sb-kernel::ctype->hashset-sym obj))<br> > - \
(entry-p (and hashset (hashset-find (symbol-value hashset) obj))))<br> > - \
;; Record for preloading in hashset<br> > - (cold-set \
'sb-kernel::*!initial-ctypes*<br> > - (cold-cons \
(cold-cons result (if entry-p (cold-intern hashset)<br> > - \
*nil-descriptor*))<br> > - \
(cold-symbol-value 'sb-kernel::*!initial-ctypes*))))))<br> > + \
(preload<br> > + (cond ((and hashset (hashset-find \
(symbol-value hashset) obj))<br> > + \
hashset)<br> > + ((and (member-type-p \
obj)<br> > + ;; NULL is a \
hardwired case in the MEMBER type constructor<br> > + \
(neq obj (specifier-type 'null))<br> > + \
(type-singleton-p obj))<br> > + \
'sb-kernel::*eql-type-cache*))))<br> > + (when preload \
; Record it<br> > + (cold-push (cold-cons result \
preload) 'sb-kernel::*!initial-ctypes*)))))<br> > result))))<br>
><br>
> ;;; Convert a layout to a wrapper and back.<br>
> diff --git a/tests/typehashmix.impure.lisp b/tests/typehashmix.impure.lisp<br>
> index 70290c9ea..0aac4027c 100644<br>
> --- a/tests/typehashmix.impure.lisp<br>
> +++ b/tests/typehashmix.impure.lisp<br>
> @@ -370,3 +370,33 @@<br>
> collect (list i tp))<br>
> (let ((post (compute-max-psl hs)))<br>
> (assert (<= (- post pre) 2)))))<br>
> +<br>
> +(defvar a "foo")<br>
> +(defvar b '(nil t))<br>
> +(defvar c #*101)<br>
> +(with-test (:name :hash-cons-member-type)<br>
> + (assert (eq (sb-kernel:specifier-type `(member ,a ,b ,c))<br>
> + (sb-kernel:specifier-type `(member ,c ,a ,b))))<br>
> + (assert (eq (sb-kernel:specifier-type `(member ,a ,b ,c))<br>
> + (sb-kernel:specifier-type `(member ,b ,c ,a)))))<br>
> +<br>
> +(with-test (:name :hash-cons-member-type-large)<br>
> + (let ((numbers ; force the XSET to be represented as a hash-table<br>
> + (loop for i below 30 collect (complex (coerce i \
'single-float) i)))<br> > + (list '(thing)))<br>
> + (assert (hash-table-p<br>
> + (sb-kernel::xset-data<br>
> + (sb-kernel::member-type-xset<br>
> + (sb-kernel:specifier-type `(member ,@numbers))))))<br>
> + (assert (eq (sb-kernel:specifier-type `(member ,a ,@numbers ,list))<br>
> + (sb-kernel:specifier-type `(member ,list ,a \
,@numbers))))))<br> > +<br>
> +;(print sb-kernel::*xset-stable-hashes*)<br>
> +(gc :full t)<br>
> +#+sb-thread (sb-kernel:run-pending-finalizers)<br>
> +(with-test (:name :xset-stable-hash-weakness)<br>
> + ;; After running the :MEMBER-TYPE-HASH-MIXER test, there were >5000 \
entries<br> > + ;; in the *XSET-STABLE-HASHES* table for me.<br>
> + ;; The preceding GC should have had some effect.<br>
> + (assert (< (hash-table-count sb-kernel::*xset-stable-hashes*)<br>
> + 100)))<br>
><br>
> -----------------------------------------------------------------------<br>
><br>
><br>
> hooks/post-receive<br>
> --<br>
> SBCL<br>
><br>
><br>
> _______________________________________________<br>
> Sbcl-commits mailing list<br>
> <a href="mailto:Sbcl-commits@lists.sourceforge.net" \
target="_blank">Sbcl-commits@lists.sourceforge.net</a><br> > <a \
href="https://lists.sourceforge.net/lists/listinfo/sbcl-commits" rel="noreferrer" \
target="_blank">https://lists.sourceforge.net/lists/listinfo/sbcl-commits</a><br> \
</blockquote></div></div>
_______________________________________________
Sbcl-devel mailing list
Sbcl-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sbcl-devel
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic