SQL Server lock timeout exceeded when deleting records in a loop

I found the answer: my looped delete conflicts with the ghost cleanup process.

Following Nicholas's suggestion, I added a BEGIN TRANSACTION and a COMMIT. I wrapped the delete loop in a BEGIN TRY / BEGIN CATCH. In the BEGIN CATCH, right before the ROLLBACK, I ran sp_lock and sp_who2. (I added the code changes to the question above.)
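The diagnostic pattern described here could look roughly like the sketch below. It is an illustration, not the exact code from the question: dbo.MyTable, MyField, and the literal value are placeholder names, and the 100 ms lock timeout is an assumed setting that makes a blocked delete fail with error 1222 instead of waiting indefinitely.

```sql
-- Sketch only: dbo.MyTable, MyField and 'SomeValue' are hypothetical placeholders.
SET LOCK_TIMEOUT 100;  -- assumed value; a blocked statement then raises error 1222

BEGIN TRY
    BEGIN TRANSACTION;

    DELETE TOP (1000) FROM dbo.MyTable
    WHERE MyField = 'SomeValue';

    COMMIT;
END TRY
BEGIN CATCH
    -- Capture lock and session state before rolling back,
    -- while the blocking session is likely still holding its locks.
    EXEC sp_lock;
    EXEC sp_who2;

    IF @@TRANCOUNT > 0
        ROLLBACK;
END CATCH;
```

Running the diagnostics inside the CATCH block means the snapshot is taken at the moment of failure, which is what made the blocker visible.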

When my process got blocked, I saw the following output:

spid   dbid   ObjId       IndId  Type Resource                         Mode     Status
------ ------ ----------- ------ ---- -------------------------------- -------- ------
20     2      1401108082  0      TAB                                   IX       GRANT
20     2      1401108082  1      PAG  1:102368                         X        GRANT

SPID  Status     Login HostName BlkBy DBName Command       CPUTime DiskIO
----  ---------- ----- -------- ----- ------ ------------- ------- ------
20    BACKGROUND sa    .        .     tempdb GHOST CLEANUP 31      0
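Reading the output: session 20 is the GHOST CLEANUP background task, and it holds an exclusive (X) lock on page 1:102368 plus an intent-exclusive (IX) lock on the table, in database 2 (tempdb). sp_lock is kept only for backward compatibility, so on newer versions similar information can be pulled from the dynamic management views. The query below is a sketch of that approach, not something from the original investigation, and the filter on resource types is an arbitrary choice for illustration.

```sql
-- Sketch only: a DMV-based view of roughly what sp_lock and sp_who2 report.
SELECT
    l.request_session_id   AS spid,
    l.resource_database_id AS dbid,
    l.resource_type,
    l.resource_description,
    l.request_mode,
    l.request_status,
    r.command,
    r.blocking_session_id
FROM sys.dm_tran_locks AS l
LEFT JOIN sys.dm_exec_requests AS r
    ON r.session_id = l.request_session_id
WHERE l.resource_type IN ('PAGE', 'OBJECT')  -- illustrative filter: page and table level locks
ORDER BY l.request_session_id;
```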

For future reference: when SQL Server deletes records, it does not remove them right away; it sets a bit on them to mark them as "ghost records". Periodically, an internal background process called ghost cleanup runs to reclaim pages whose records have all been deleted (i.e. every record on the page is a ghost record).
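To see whether ghost records are piling up in a table, one option is the ghost_record_count column of sys.dm_db_index_physical_stats. The query below is a sketch under the assumption that the table is called dbo.MyTable (a placeholder name); it was not part of the original troubleshooting.

```sql
-- Sketch only: dbo.MyTable is a hypothetical placeholder name.
-- ghost_record_count is NULL in LIMITED mode, so SAMPLED or DETAILED is required.
SELECT
    index_id,
    index_level,
    page_count,
    record_count,
    ghost_record_count,
    version_ghost_record_count
FROM sys.dm_db_index_physical_stats(
    DB_ID(), OBJECT_ID(N'dbo.MyTable'), NULL, NULL, 'DETAILED');
```

DETAILED mode scans every page of each index, so on a large table this can be expensive.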

The ghost cleanup process was discussed on ServerFault in this question.

Here is Paul S. Randal's explanation of the ghost cleanup process.

It is possible to disable the ghost cleanup process with a trace flag, but I did not have to do so in this case.
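For completeness, the trace flag in question is 661, which disables the ghost record removal process. The commands below are only a sketch of how it would be toggled; they assume sysadmin rights and were not needed for the fix described here. With the flag on, ghost records are never cleaned up, so the space they occupy is not reclaimed until cleanup is re-enabled.

```sql
-- Sketch only: trace flag 661 disables the ghost record removal process.
-- Assumption: run by a sysadmin, ideally on a test instance first.
DBCC TRACEON (661, -1);   -- -1 applies the flag globally
DBCC TRACESTATUS (661);   -- check whether the flag is active
DBCC TRACEOFF (661, -1);  -- re-enable ghost cleanup
```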

I ended up adding a lock wait timeout of 100 ms. This means my deletes occasionally hit a lock timeout when they collide with the ghost record cleanup process, but that is acceptable. I also added a loop that retries after a lock timeout, up to 5 times. With these two changes, my process now usually completes. It now only times out when a very long-running process is moving a lot of data around and acquires table or page locks on the data my process needs to clean up.
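A minimal sketch of the timeout-plus-retry idea follows. It is not the final code from this answer: dbo.MyTable, MyField, the batch size, and the 200 ms pause are placeholder assumptions, and "up to 5 times" is interpreted here as 5 consecutive lock timeouts.

```sql
-- Sketch only: dbo.MyTable, MyField and 'SomeValue' are hypothetical placeholders.
SET LOCK_TIMEOUT 100;  -- wait at most 100 ms for any lock

DECLARE @Retries INT, @MaxRetries INT, @Done BIT, @ErrMsg NVARCHAR(2048)
SET @Retries = 0
SET @MaxRetries = 5    -- assumed meaning: 5 consecutive lock timeouts
SET @Done = 0

WHILE @Done = 0
BEGIN
    BEGIN TRY
        DELETE TOP (1000) FROM dbo.MyTable
        WHERE MyField = 'SomeValue';

        IF @@ROWCOUNT = 0
            SET @Done = 1      -- nothing left to delete
        ELSE
            SET @Retries = 0   -- progress was made, so reset the retry budget
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 1222 AND @Retries < @MaxRetries
        BEGIN
            -- Lock timeout: pause briefly (assumed delay) and try the batch again.
            SET @Retries = @Retries + 1
            WAITFOR DELAY '00:00:00.200'
        END
        ELSE
        BEGIN
            -- Retries exhausted or a different error: stop and report it.
            SET @Done = 1
            SET @ErrMsg = ERROR_MESSAGE()
            RAISERROR('Delete loop stopped: %s', 16, 1, @ErrMsg)
        END
    END CATCH
END
```

Each DELETE runs as its own autocommit transaction here, so a timed-out batch is rolled back on its own and earlier batches stay deleted.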

EDIT 2016-07-20

The final code looks like this:

-- Do not block long if records are locked.
SET LOCK_TIMEOUT 100

-- This process volunteers to be a deadlock victim in the case of a deadlock.
SET DEADLOCK_PRIORITY LOW

DECLARE @Error BIT
SET @Error = 0

DECLARE @ErrMsg VARCHAR(1000)
DECLARE @DeletedCount INT
SELECT @DeletedCount = 0

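DECLARE @LockTimeoutCount INT
SET @LockTimeoutCount = 0

-- Declarations missing from the snippet as posted.
-- In the actual script, @ProcGUID is set to the value being cleaned up before the loop runs.
DECLARE @SQL NVARCHAR(MAX)
DECLARE @ProcGUID UNIQUEIDENTIFIER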

DECLARE @ContinueDeleting BIT,
    @LastDeleteSuccessful BIT

SET @ContinueDeleting = 1
SET @LastDeleteSuccessful = 1

WHILE @ContinueDeleting = 1
BEGIN
    DECLARE @RowCount INT
    SET @RowCount = 0

    BEGIN TRY

        BEGIN TRANSACTION

        -- The READPAST below attempts to skip over locked records.
        -- However, it might still cause a lock wait error (1222) if a page or index is locked, because the delete has to modify indexes.
        -- The threshold for row lock escalation to table locks is around 5,000 records,
        -- so keep the deleted number smaller than this limit in case we are deleting a large chunk of data.
        -- Table name, field, and value are all set dynamically in the actual script;
        -- MyTable, MyField, and @ProcGuid stand in for them here.
        SET @SQL = N'DELETE TOP (1000) MyTable WITH (ROWLOCK, READPAST) WHERE MyField = @ProcGuid'
        EXEC sp_executesql @SQL, N'@ProcGuid uniqueidentifier', @ProcGUID

        SET @RowCount = @@ROWCOUNT

        COMMIT

        SET @LastDeleteSuccessful = 1

        SET @DeletedCount = @DeletedCount + @RowCount
        IF @RowCount = 0
        BEGIN
            SET @ContinueDeleting = 0
        END

    END TRY
    BEGIN CATCH

        IF @@TRANCOUNT > 0
            ROLLBACK

        IF Error_Number() = 1222 -- Lock timeout
        BEGIN

            IF @LastDeleteSuccessful = 1
            BEGIN
                -- If we hit a lock timeout, and we had already deleted something successfully, try again.
                SET @LastDeleteSuccessful = 0
            END
            ELSE
            BEGIN
                -- The last delete failed, too.  Give up for now.  The job will run again shortly.
                SET @ContinueDeleting = 0
            END
        END
        ELSE -- On anything other than a lock timeout, report an error.
        BEGIN
            SET @ErrMsg = 'An error occurred cleaning up data.  Table: MyTable Column: MyField Value: SomeValue.  Message: ' + ERROR_MESSAGE() + ' Error Number: ' + CONVERT(VARCHAR(20), ERROR_NUMBER()) + ' Line: ' + CONVERT(VARCHAR(20), ERROR_LINE())
            PRINT @ErrMsg -- this error message will be included in the SQL Server job history
            SET @Error = 1
            SET @ContinueDeleting = 0
        END

    END CATCH

END

IF @Error <> 0
    RAISERROR('Not all data could be cleaned up.  See previous messages.', 16, 1)