Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-7127

Broken InboundTransferTask may lose segments on restart

    XMLWordPrintable

Details

    Description

      Found this in a failed StateTransferRestartTest log:

      10:33:56,560 TRACE (transport-thread-Test-NodeC-p46436-t3:[Topology-___defaultcache]) [CacheTopology] Current consistent hash's routing table: 0: 0 1, 1: 0 1, 2: 0 1, 3: 0 1, 4: 0 1, 5: 1 0, 6: 1 0, 7: 0 1, 8: 1 0, 9: 0 1, 10: 0 1, 11: 0 1, 12: 1 0, 13: 1 0, 14: 0 1, 15: 1 0, 16: 0 1, 17: 0 1, 18: 1 0, 19: 1 0, 20: 0 1, 21: 0 1, 22: 1 0, 23: 0 1, 24: 1 0, 25: 1 0, 26: 0 1, 27: 0 1, 28: 0 1, 29: 1 0, 30: 0 1, 31: 0 1, 32: 1 0, 33: 1 0, 34: 1 0, 35: 0 1, 36: 0 1, 37: 1 0, 38: 1 0, 39: 0 1, 40: 0 1, 41: 1 0, 42: 0 1, 43: 0 1, 44: 1 0, 45: 1 0, 46: 1 0, 47: 1 0, 48: 0 1, 49: 0 1, 50: 0 1, 51: 1 0, 52: 1 0, 53: 0 1, 54: 1 0, 55: 1 0, 56: 0 1, 57: 0 1, 58: 1 0, 59: 0 1, 60: 1 0, 61: 0 1, 62: 0 1, 63: 0 1, 64: 1 0, 65: 1 0, 66: 1 0, 67: 1 0, 68: 1 0, 69: 0 1, 70: 1 0, 71: 0 1, 72: 0 1, 73: 0 1, 74: 1 0, 75: 1 0, 76: 0 1, 77: 0 1, 78: 1 0, 79: 1 0, 80: 1 0, 81: 1 0, 82: 0 1, 83: 0 1, 84: 0 1, 85: 0 1, 86: 0 1, 87: 1 0, 88: 0 1, 89: 1 0, 90: 0 1, 91: 1 0, 92: 0 1, 93: 0 1, 94: 1 0, 95: 0 1, 96: 1 0, 97: 1 0, 98: 0 1, 99: 1 0, 100: 0 1, 101: 1 0, 102: 0 1, 103: 0 1, 104: 0 1, 105: 0 1, 106: 0 1, 107: 0 1, 108: 1 0, 109: 1 0, 110: 1 0, 111: 1 0, 112: 0 1, 113: 1 0, 114: 0 1, 115: 0 1, 116: 1 0, 117: 0 1, 118: 0 1, 119: 1 0, 120: 0 1, 121: 1 0, 122: 0 1, 123: 1 0, 124: 0 1, 125: 1 0, 126: 1 0, 127: 1 0, 128: 1 0, 129: 1 0, 130: 0 1, 131: 1 0, 132: 0 1, 133: 1 0, 134: 1 0, 135: 1 0, 136: 1 0, 137: 0 1, 138: 1 0, 139: 0 1, 140: 0 1, 141: 1 0, 142: 1 0, 143: 1 0, 144: 1 0, 145: 0 1, 146: 1 0, 147: 0 1, 148: 0 1, 149: 0 1, 150: 0 1, 151: 1 0, 152: 1 0, 153: 0 1, 154: 1 0, 155: 1 0, 156: 1 0, 157: 0 1, 158: 0 1, 159: 0 1, 160: 0 1, 161: 1 0, 162: 1 0, 163: 1 0, 164: 0 1, 165: 1 0, 166: 1 0, 167: 1 0, 168: 1 0, 169: 0 1, 170: 1 0, 171: 1 0, 172: 0 1, 173: 0 1, 174: 1 0, 175: 1 0, 176: 0 1, 177: 1 0, 178: 1 0, 179: 0 1, 180: 1 0, 181: 1 0, 182: 0 1, 183: 1 0, 184: 0 1, 185: 1 0, 186: 0 1, 187: 0 1, 188: 1 0, 189: 0 1, 190: 0 1, 191: 1 0, 192: 1 0, 193: 1 0, 194: 1 0, 195: 1 0, 196: 0 1, 197: 1 0, 198: 0 1, 199: 0 1, 200: 1 0, 201: 1 0, 202: 0 1, 203: 1 0, 204: 1 0, 205: 1 0, 206: 1 0, 207: 1 0, 208: 0 1, 209: 1 0, 210: 1 0, 211: 0 1, 212: 0 1, 213: 1 0, 214: 1 0, 215: 1 0, 216: 0 1, 217: 0 1, 218: 0 1, 219: 0 1, 220: 1 0, 221: 1 0, 222: 1 0, 223: 0 1, 224: 0 1, 225: 1 0, 226: 0 1, 227: 1 0, 228: 0 1, 229: 1 0, 230: 0 1, 231: 0 1, 232: 0 1, 233: 0 1, 234: 0 1, 235: 1 0, 236: 1 0, 237: 1 0, 238: 1 0, 239: 0 1, 240: 1 0, 241: 0 1, 242: 0 1, 243: 0 1, 244: 0 1, 245: 0 1, 246: 1 0, 247: 0 1, 248: 1 0, 249: 0 1, 250: 0 1, 251: 1 0, 252: 1 0, 253: 1 0, 254: 0 1, 255: 0 1
      10:33:56,562 TRACE (transport-thread-Test-NodeC-p46436-t3:[Topology-___defaultcache]) [CacheTopology] Pending consistent hash's routing table: 0: 0 1, 1: 0 1, 2: 0 1, 3: 0 1, 4: 0 1, 5: 1 2, 6: 2 1, 7: 2 1, 8: 1 0, 9: 0 2, 10: 2 1, 11: 2 1, 12: 1 2, 13: 2 0, 14: 0 1, 15: 1 0, 16: 0 1, 17: 0 1, 18: 1 2, 19: 1 2, 20: 2 0, 21: 0 1, 22: 1 0, 23: 0 2, 24: 2 0, 25: 2 0, 26: 0 2, 27: 0 2, 28: 2 1, 29: 1 0, 30: 0 1, 31: 0 1, 32: 1 2, 33: 1 2, 34: 2 0, 35: 0 2, 36: 0 2, 37: 2 0, 38: 2 0, 39: 0 2, 40: 2 1, 41: 1 0, 42: 0 2, 43: 0 2, 44: 2 1, 45: 2 1, 46: 1 0, 47: 1 0, 48: 0 2, 49: 0 2, 50: 0 2, 51: 2 1, 52: 1 0, 53: 0 2, 54: 2 1, 55: 1 2, 56: 2 0, 57: 0 1, 58: 1 2, 59: 2 1, 60: 1 0, 61: 0 2, 62: 0 2, 63: 0 2, 64: 2 1, 65: 1 0, 66: 1 0, 67: 1 0, 68: 1 0, 69: 0 2, 70: 2 0, 71: 0 2, 72: 0 2, 73: 2 1, 74: 1 2, 75: 2 0, 76: 0 2, 77: 0 2, 78: 2 1, 79: 1 2, 80: 1 2, 81: 2 1, 82: 2 1, 83: 2 1, 84: 2 1, 85: 2 1, 86: 2 1, 87: 1 0, 88: 0 1, 89: 1 0, 90: 0 2, 91: 2 0, 92: 0 2, 93: 2 1, 94: 1 0, 95: 0 1, 96: 1 2, 97: 2 0, 98: 0 2, 99: 2 0, 100: 2 0, 101: 2 0, 102: 0 2, 103: 0 2, 104: 2 0, 105: 0 2, 106: 2 1, 107: 2 1, 108: 1 2, 109: 1 2, 110: 1 2, 111: 1 2, 112: 2 1, 113: 1 0, 114: 0 2, 115: 2 1, 116: 1 0, 117: 0 1, 118: 0 1, 119: 1 2, 120: 2 1, 121: 1 2, 122: 2 1, 123: 1 0, 124: 0 1, 125: 1 2, 126: 1 2, 127: 1 2, 128: 2 1, 129: 1 2, 130: 2 0, 131: 2 0, 132: 0 1, 133: 1 2, 134: 2 0, 135: 2 0, 136: 2 0, 137: 2 0, 138: 2 0, 139: 0 1, 140: 0 1, 141: 1 0, 142: 1 0, 143: 1 0, 144: 1 0, 145: 0 2, 146: 2 0, 147: 0 2, 148: 0 2, 149: 0 2, 150: 2 1, 151: 1 2, 152: 1 2, 153: 2 1, 154: 1 0, 155: 1 0, 156: 1 0, 157: 0 2, 158: 0 2, 159: 0 2, 160: 2 1, 161: 1 2, 162: 1 2, 163: 2 0, 164: 0 1, 165: 1 2, 166: 1 2, 167: 1 2, 168: 2 0, 169: 0 2, 170: 2 1, 171: 1 0, 172: 0 2, 173: 2 1, 174: 1 0, 175: 1 0, 176: 0 1, 177: 1 0, 178: 1 0, 179: 0 1, 180: 1 2, 181: 2 0, 182: 0 2, 183: 2 0, 184: 0 2, 185: 2 0, 186: 2 0, 187: 0 2, 188: 2 0, 189: 0 2, 190: 2 1, 191: 1 2, 192: 2 1, 193: 1 0, 194: 1 0, 195: 1 0, 196: 0 1, 197: 1 2, 198: 2 0, 199: 0 1, 200: 1 0, 201: 1 0, 202: 0 1, 203: 1 0, 204: 1 0, 205: 1 0, 206: 1 0, 207: 1 0, 208: 0 1, 209: 1 0, 210: 1 0, 211: 0 2, 212: 2 0, 213: 2 0, 214: 2 0, 215: 2 0, 216: 2 0, 217: 2 0, 218: 0 1, 219: 0 1, 220: 1 2, 221: 1 2, 222: 1 2, 223: 2 0, 224: 0 2, 225: 2 0, 226: 2 0, 227: 2 0, 228: 2 0, 229: 2 0, 230: 2 0, 231: 0 2, 232: 2 0, 233: 2 0, 234: 0 2, 235: 2 1, 236: 2 1, 237: 1 0, 238: 1 0, 239: 0 1, 240: 1 0, 241: 0 2, 242: 0 2, 243: 0 2, 244: 0 2, 245: 2 1, 246: 1 0, 247: 0 1, 248: 1 0, 249: 0 2, 250: 0 2, 251: 2 1, 252: 1 0, 253: 1 0, 254: 0 2, 255: 2 0
      10:33:56,586 TRACE (stateTransferExecutor-thread-Test-NodeC-p46440-t1:[StateRequest-___defaultcache]) [InboundTransferTask] Requesting segments [128, 129, 131, 5, 133, 6, 134, 135, 136, 138, 12, 13, 18, 146, 19, 151, 24, 152, 25, 32, 33, 161, 34, 162, 163, 37, 165, 38, 166, 167, 168, 170, 44, 45, 51, 180, 181, 54, 55, 183, 185, 58, 188, 191, 64, 192, 197, 70, 74, 75, 78, 79, 80, 81, 213, 214, 215, 91, 220, 221, 222, 96, 97, 225, 99, 227, 101, 229, 235, 108, 236, 109, 110, 111, 119, 121, 251, 125, 126, 127] of cache ___defaultcache from node Test-NodeB-26514
      10:34:01,027 WARN  (stateTransferExecutor-thread-Test-NodeC-p46440-t1:[StateRequest-___defaultcache]) [InboundTransferTask] ISPN000210: Failed to request segments [128, 129, 131, 5, 133, 6, 134, 135, 136, 138, 12, 13, 18, 146, 19, 151, 24, 152, 25, 32, 33, 161, 34, 162, 163, 37, 165, 38, 166, 167, 168, 170, 44, 45, 51, 180, 181, 54, 55, 183, 185, 58, 188, 191, 64, 192, 197, 70, 74, 75, 78, 79, 80, 81, 213, 214, 215, 91, 220, 221, 222, 96, 97, 225, 99, 227, 101, 229, 235, 108, 236, 109, 110, 111, 119, 121, 251, 125, 126, 127] of cache ___defaultcache from node Test-NodeB-26514 (node will not be retried)
      org.infinispan.commons.CacheException: CacheNotFoundResponse
      10:34:01,031 TRACE (transport-thread-Test-NodeC-p46436-t5:[Topology-___defaultcache]) [CacheTopology] Current consistent hash's routing table: 0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 38: 0, 39: 0, 40: 0, 41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 0, 51: 0, 52: 0, 53: 0, 54: 0, 55: 0, 56: 0, 57: 0, 58: 0, 59: 0, 60: 0, 61: 0, 62: 0, 63: 0, 64: 0, 65: 0, 66: 0, 67: 0, 68: 0, 69: 0, 70: 0, 71: 0, 72: 0, 73: 0, 74: 0, 75: 0, 76: 0, 77: 0, 78: 0, 79: 0, 80: 0, 81: 0, 82: 0, 83: 0, 84: 0, 85: 0, 86: 0, 87: 0, 88: 0, 89: 0, 90: 0, 91: 0, 92: 0, 93: 0, 94: 0, 95: 0, 96: 0, 97: 0, 98: 0, 99: 0, 100: 0, 101: 0, 102: 0, 103: 0, 104: 0, 105: 0, 106: 0, 107: 0, 108: 0, 109: 0, 110: 0, 111: 0, 112: 0, 113: 0, 114: 0, 115: 0, 116: 0, 117: 0, 118: 0, 119: 0, 120: 0, 121: 0, 122: 0, 123: 0, 124: 0, 125: 0, 126: 0, 127: 0, 128: 0, 129: 0, 130: 0, 131: 0, 132: 0, 133: 0, 134: 0, 135: 0, 136: 0, 137: 0, 138: 0, 139: 0, 140: 0, 141: 0, 142: 0, 143: 0, 144: 0, 145: 0, 146: 0, 147: 0, 148: 0, 149: 0, 150: 0, 151: 0, 152: 0, 153: 0, 154: 0, 155: 0, 156: 0, 157: 0, 158: 0, 159: 0, 160: 0, 161: 0, 162: 0, 163: 0, 164: 0, 165: 0, 166: 0, 167: 0, 168: 0, 169: 0, 170: 0, 171: 0, 172: 0, 173: 0, 174: 0, 175: 0, 176: 0, 177: 0, 178: 0, 179: 0, 180: 0, 181: 0, 182: 0, 183: 0, 184: 0, 185: 0, 186: 0, 187: 0, 188: 0, 189: 0, 190: 0, 191: 0, 192: 0, 193: 0, 194: 0, 195: 0, 196: 0, 197: 0, 198: 0, 199: 0, 200: 0, 201: 0, 202: 0, 203: 0, 204: 0, 205: 0, 206: 0, 207: 0, 208: 0, 209: 0, 210: 0, 211: 0, 212: 0, 213: 0, 214: 0, 215: 0, 216: 0, 217: 0, 218: 0, 219: 0, 220: 0, 221: 0, 222: 0, 223: 0, 224: 0, 225: 0, 226: 0, 227: 0, 228: 0, 229: 0, 230: 0, 231: 0, 232: 0, 233: 0, 234: 0, 235: 0, 236: 0, 237: 0, 238: 0, 239: 0, 240: 0, 241: 0, 242: 0, 243: 0, 244: 0, 245: 0, 246: 0, 247: 0, 248: 0, 249: 0, 250: 0, 251: 0, 252: 0, 253: 0, 254: 0, 255: 0
      10:34:01,032 TRACE (transport-thread-Test-NodeC-p46436-t5:[Topology-___defaultcache]) [CacheTopology] Pending consistent hash's routing table: 0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 0, 9: 0 1, 10: 1, 11: 1, 12: 1, 13: 1 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 1, 19: 1, 20: 1 0, 21: 0, 22: 0, 23: 0 1, 24: 1 0, 25: 1 0, 26: 0 1, 27: 0 1, 28: 1, 29: 0, 30: 0, 31: 0, 32: 1, 33: 1, 34: 1 0, 35: 0 1, 36: 0 1, 37: 1 0, 38: 1 0, 39: 0 1, 40: 1, 41: 0, 42: 0 1, 43: 0 1, 44: 1, 45: 1, 46: 0, 47: 0, 48: 0 1, 49: 0 1, 50: 0 1, 51: 1, 52: 0, 53: 0 1, 54: 1, 55: 1, 56: 1 0, 57: 0, 58: 1, 59: 1, 60: 0, 61: 0 1, 62: 0 1, 63: 0 1, 64: 1, 65: 0, 66: 0, 67: 0, 68: 0, 69: 0 1, 70: 1 0, 71: 0 1, 72: 0 1, 73: 1, 74: 1, 75: 1 0, 76: 0 1, 77: 0 1, 78: 1, 79: 1, 80: 1, 81: 1, 82: 1, 83: 1, 84: 1, 85: 1, 86: 1, 87: 0, 88: 0, 89: 0, 90: 0 1, 91: 1 0, 92: 0 1, 93: 1, 94: 0, 95: 0, 96: 1, 97: 1 0, 98: 0 1, 99: 1 0, 100: 1 0, 101: 1 0, 102: 0 1, 103: 0 1, 104: 1 0, 105: 0 1, 106: 1, 107: 1, 108: 1, 109: 1, 110: 1, 111: 1, 112: 1, 113: 0, 114: 0 1, 115: 1, 116: 0, 117: 0, 118: 0, 119: 1, 120: 1, 121: 1, 122: 1, 123: 0, 124: 0, 125: 1, 126: 1, 127: 1, 128: 1, 129: 1, 130: 1 0, 131: 1 0, 132: 0, 133: 1, 134: 1 0, 135: 1 0, 136: 1 0, 137: 1 0, 138: 1 0, 139: 0, 140: 0, 141: 0, 142: 0, 143: 0, 144: 0, 145: 0 1, 146: 1 0, 147: 0 1, 148: 0 1, 149: 0 1, 150: 1, 151: 1, 152: 1, 153: 1, 154: 0, 155: 0, 156: 0, 157: 0 1, 158: 0 1, 159: 0 1, 160: 1, 161: 1, 162: 1, 163: 1 0, 164: 0, 165: 1, 166: 1, 167: 1, 168: 1 0, 169: 0 1, 170: 1, 171: 0, 172: 0 1, 173: 1, 174: 0, 175: 0, 176: 0, 177: 0, 178: 0, 179: 0, 180: 1, 181: 1 0, 182: 0 1, 183: 1 0, 184: 0 1, 185: 1 0, 186: 1 0, 187: 0 1, 188: 1 0, 189: 0 1, 190: 1, 191: 1, 192: 1, 193: 0, 194: 0, 195: 0, 196: 0, 197: 1, 198: 1 0, 199: 0, 200: 0, 201: 0, 202: 0, 203: 0, 204: 0, 205: 0, 206: 0, 207: 0, 208: 0, 209: 0, 210: 0, 211: 0 1, 212: 1 0, 213: 1 0, 214: 1 0, 215: 1 0, 216: 1 0, 217: 1 0, 218: 0, 219: 0, 220: 1, 221: 1, 222: 1, 223: 1 0, 224: 0 1, 225: 1 0, 226: 1 0, 227: 1 0, 228: 1 0, 229: 1 0, 230: 1 0, 231: 0 1, 232: 1 0, 233: 1 0, 234: 0 1, 235: 1, 236: 1, 237: 0, 238: 0, 239: 0, 240: 0, 241: 0 1, 242: 0 1, 243: 0 1, 244: 0 1, 245: 1, 246: 0, 247: 0, 248: 0, 249: 0 1, 250: 0 1, 251: 1, 252: 0, 253: 0, 254: 0 1, 255: 1 010:34:01,034 TRACE (stateTransferExecutor-thread-Test-NodeC-p46440-t1:[StateRequest-___defaultcache]) [StateConsumerImpl] Retrying failed task: InboundTransferTask{segments=[128, 129, 131, 5, 133, 6, 134, 135, 136, 138, 12, 13, 18, 146, 19, 151, 24, 152, 25, 32, 33, 161, 34, 162, 163, 37, 165, 38, 166, 167, 168, 170, 44, 45, 51, 180, 181, 54, 55, 183, 185, 58, 188, 191, 64, 192, 197, 70, 74, 75, 78, 79, 80, 81, 213, 214, 215, 91, 220, 221, 222, 96, 97, 225, 99, 227, 101, 229, 235, 108, 236, 109, 110, 111, 119, 121, 251, 125, 126, 127], finishedSegments=[], unfinishedSegments=[128, 129, 131, 5, 133, 6, 134, 135, 136, 138, 12, 13, 18, 146, 19, 151, 24, 152, 25, 32, 33, 161, 34, 162, 163, 37, 165, 38, 166, 167, 168, 170, 44, 45, 51, 180, 181, 54, 55, 183, 185, 58, 188, 191, 64, 192, 197, 70, 74, 75, 78, 79, 80, 81, 213, 214, 215, 91, 220, 221, 222, 96, 97, 225, 99, 227, 101, 229, 235, 108, 236, 109, 110, 111, 119, 121, 251, 125, 126, 127], source=Test-NodeB-26514, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@4a6bf15e[Completed exceptionally], topologyId=4, timeout=20000, cacheName=___defaultcache}
      10:34:01,037 TRACE (transport-thread-Test-NodeC-p46436-t5:[Topology-___defaultcache]) [StateConsumerImpl] Received new topology for cache ___defaultcache, isRebalance = false, isMember = true, topology = CacheTopology{id=5, rebalanceId=3, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (1)[Test-NodeA-5117: 256+0]}, pendingCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[Test-NodeA-5117: 133+48, Test-NodeC-8198: 123+52]}, unionCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[Test-NodeA-5117: 256+0, Test-NodeC-8198: 0+175]}, actualMembers=[Test-NodeA-5117, Test-NodeC-8198], persistentUUIDs=[7efdf21a-a495-44f1-a9a8-d2278d6ba43b, cb5f0659-41fc-4c75-a982-85ba98a9c992]}
      10:34:01,040 TRACE (stateTransferExecutor-thread-Test-NodeC-p46440-t1:[StateRequest-___defaultcache]) [InboundTransferTask] Requesting segments [130, 7, 9, 137, 10, 11, 145, 147, 20, 148, 149, 150, 23, 153, 26, 27, 28, 157, 158, 159, 160, 35, 36, 39, 40, 169, 42, 43, 172, 173, 48, 49, 50, 53, 182, 56, 184, 186, 59, 187, 61, 189, 62, 190, 63, 69, 198, 71, 72, 73, 76, 77, 82, 83, 211, 84, 212, 85, 86, 216, 217, 90, 92, 93, 223, 224, 98, 226, 100, 228, 102, 230, 103, 231, 104, 232, 105, 233, 106, 234, 107, 112, 241, 114, 242, 115, 243, 244, 245, 120, 249, 122, 250, 254, 255] of cache ___defaultcache from node Test-NodeA-5117
      

      NodeB stopping, which made the transfer of segments 128, 129, ... on NodeC fail. Thread stateTransferExecutor-thread-Test-NodeC-p46440-t1 started retrying the transfer, but instead of requesting the same segments, it requested segments 130, 7, ... – the new segments that it became an owner of in topology 5.

      As a side node, we should also avoid NodeB becoming the only owner in topology 5's pending CH, as it's going to have to send it back to NodeA soon anyway.

      Attachments

        Activity

          People

            dberinde@redhat.com Dan Berindei
            dberinde@redhat.com Dan Berindei
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: