Hive Task Optimization
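Contents
Error background
Locating the error
Client-side log
Application log
Individual map and reduce error logs
Error analysis
Solutions
1. Disable the virtual memory check (not recommended)
2. Increase mapreduce.map.memory.mb or mapreduce.reduce.memory.mb (recommended)
3. Increase vmem-pmem-ratio appropriately
4. Switch to a Spark SQL job (much faster, strongly recommended)
Summary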
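Error background
I wrote an HQL statement, ran it on the big data platform, and it failed. The likely cause: the job used more memory than the limit configured for its map and reduce tasks, so the tasks were killed.
Locating the error
Client-side log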
IFO : converting to local hdfs://hacluster/tenant/yxs/product/resources/resources/jar/fc06465-4af1-4756-894e-ce74ec11b9c.jar
IFO : Added [/opt/huawei/Bigdata/tmp/hivelocaltmp/session_resources/2d0a2efc-776c-4ccc-957d-927079862ab2_resources/fc06465-4af1-4756-894e-ce74ec11b9c.jar] to class path
IFO : Added resources: [hdfs://hacluster/tenant/yxs/product/resources/resources/jar/fc06465-4af1-4756-894e-ce74ec11b9c.jar]
IFO : umber of reduce tasks not specified. Estimated from input data size: 2
IFO : In order to change the average load for a reducer (in bytes):
IFO : set reducers.bytes.per.reducer=<number>
IFO : In order to limit the maximum number of reducers:
IFO : set =<number>
IFO : In order to set a ctant number of reducers:
IFO : set mapreduce.job.reduces=<number>
IFO : number of splits:10
IFO : Submitting tokens for job: job_1567609664100_85580
IFO : Kind: HDFS_DELEGATIO_TOKE, Service: ha-hdfs:hacluster
IFO : Kind: HIVE_DELEGATIO_TOKE, Service: HiveServer2ImpersonationToken
IFO : The url to track the job: https://yiclouddata0-szzb:26001/proxy/application_1567609664100_85580/
IFO : Starting Job = job_1567609664100_85580, Tracking URL = https://yiclouddata0-szzb:26001/proxy/application_1567609664100_85580/
IFO : Kill Command = /opt/huawei/Bigdata/FusionInsight_HD_V100R002C80SPC20/install/FusionInsight-Hive-1..0/hive-1..0/bin/..//../hadoop/bin/hadoop job -kill job_1567609664100_85580
IFO : Hadoop job information for Stage-6: number of mappers: 10; number of reducers: 2
IFO : 2019-09-24 16:16:17,686 Stage-6 map = 0%, reduce = 0%
IFO : 2019-09-24 16:16:27,299 Stage-6 map = 20%, reduce = 0%, Cumulative CPU 10.12 sec
IFO : 2019-09-24 16:16:28,474 Stage-6 map = 0%, reduce = 0%, Cumulative CPU 0.4 sec
IFO : 2019-09-24 16:16:29,664 Stage-6 map = 70%, reduce = 0%, Cumulative CPU 8.44 sec
IFO : 2019-09-24 16:16:0,841 Stage-6 map = 90%, reduce = 0%, Cumulative CPU 115.79 sec
IFO : 2019-09-24 16:16:2,004 Stage-6 map = 91%, reduce = 0%, Cumulative CPU 14.7 sec
IFO : 2019-09-24 16:16:44,928 Stage-6 map = 92%, reduce = 0%, Cumulative CPU 22.25 sec
IFO : 2019-09-24 16:16:55,61 Stage-6 map = 9%, reduce = 0%, Cumulative CPU 284.27 sec
IFO : 2019-09-24 16:17:0,797 Stage-6 map = 94%, reduce = 0%, Cumulative CPU 1.69 sec
IFO : 2019-09-24 16:17:11,881 Stage-6 map = 90%, reduce = 0%, Cumulative CPU 115.79 sec
IFO : 2019-09-24 16:18:12,546 Stage-6 map = 90%, reduce = 0%, Cumulative CPU 115.79 sec
IFO : 2019-09-24 16:19:04,47 Stage-6 map = 91%, reduce = 0%, Cumulative CPU 185.47 sec
IFO : 2019-09-24 16:19:1,68 Stage-6 map = 92%, reduce = 0%, Cumulative CPU 22.5 sec
IFO : 2019-09-24 16:19:22,825 Stage-6 map = 9%, reduce = 0%, Cumulative CPU 281.97 sec
IFO : 2019-09-24 16:19:2,05 Stage-6 map = 94%, reduce = 0%, Cumulative CPU 14.97 sec
IFO : 2019-09-24 16:19:54,14 Stage-6 map = 95%, reduce = 0%, Cumulative CPU 77.6 sec
IFO : 2019-09-24 16:19:56,520 Stage-6 map = 90%, reduce = 0%, Cumulative CPU 115.79 sec
IFO : 2019-09-24 16:20:09,8 Stage-6 map = 91%, reduce = 0%, Cumulative CPU 181.59 sec
IFO : 2019-09-24 16:20:18,574 Stage-6 map = 92%, reduce = 0%, Cumulative CPU 217.27 sec
IFO : 2019-09-24 16:20:27,772 Stage-6 map = 9%, reduce = 0%, Cumulative CPU 266.25 sec
IFO : 2019-09-24 16:20:40,49 Stage-6 map = 94%, reduce = 0%, Cumulative CPU 05.2 sec
IFO : 2019-09-24 16:20:57,751 Stage-6 map = 90%, reduce = 0%, Cumulative CPU 115.79 sec
IFO : 2019-09-24 16:21:11,624 Stage-6 map = 91%, reduce = 0%, Cumulative CPU 18.87 sec
IFO : 2019-09-24 16:21:20,948 Stage-6 map = 92%, reduce = 0%, Cumulative CPU 219.12 sec
IFO : 2019-09-24 16:21:1,427 Stage-6 map = 9%, reduce = 0%, Cumulative CPU 282.71 sec
IFO : 2019-09-24 16:21:9,754 Stage-6 map = 94%, reduce = 0%, Cumulative CPU 17.99 sec
IFO : 2019-09-24 16:21:45,519 Stage-6 map = 100%, reduce = 100%, Cumulative CPU 115.79 sec
IFO : MapReduce Total cumulative CPU time: 1 minutes 55 seconds 790 msec
ERROR : Ended Job = job_1567609664100_85580 with errors
Task-T_626089799950704_20190924161555945_1_1 failed; reason: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.MapRedTask
at org.apache.hive.jdbc.(HiveStatement.java:28)
at org.apache.hive.jdbc.Query(HiveStatement.java:79)
at com.dtwave.dipper.runner.impl.Hive2TaskRunner.doRun(Hive2TaskRunner.java:244)
at com.dtwave.dipper.runner.(BasicTaskRunner.java:100)
at com.dtwave.dipper.TaskExecutor.run(TaskExecutor.java:2)
at java.Executors$(Executors.java:511)
at java.FutureTask.run(FutureTask.java:266)
at java.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Task run failed (Failed)
Did that error leave you completely baffled, staring blankly and questioning your life choices? Ha ha...
Application log
The client output says nothing about the real cause, so the next step is the application's run log in YARN, shown below:
2019-09-24 16:16:27,712 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: um completed Tasks:
2019-09-24 16:16:27,712 IFO [ContainerLauncher #2] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: COTAIER_REMOTE_CLEAUP for container container_e29_1567609664100_85580_01_000011 taskAttempt attempt_1567609664100_85580_m_000009_0
2019-09-24 16:16:27,71 IFO [ContainerLauncher #2] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLIG attempt_1567609664100_85580_m_000009_0
2019-09-24 16:16:27,71 IFO [ContainerLauncher #2] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata04-SZZB:26009
2019-09-24 16:16:27,997 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:10 AssignedReds:0 CompletedMaps: CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:28,005 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000009
2019-09-24 16:16:28,006 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000011
2019-09-24 16:16:28,006 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_00000
2019-09-24 16:16:28,006 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:125952, vCores:6>
2019-09-24 16:16:28,006 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:28,006 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:7 AssignedReds:0 CompletedMaps: CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:28,006 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000008_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 14
Container exited with a non-zero exit code 142019-09-24 16:16:28,006 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000009_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 14
Container exited with a non-zero exit code 142019-09-24 16:16:28,006 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000007_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 14
Container exited with a non-zero exit code 142019-09-24 16:16:28,557 IFO [IPC Server handler 7 on 27102] org.apache.TaskAttemptListenerImpl: Done acknowledgement from attempt_1567609664100_85580_m_000006_0
2019-09-24 16:16:28,558 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Task Attempt attempt_1567609664100_85580_m_000006_0 finished. Firing COTAIER_AVAILABLE_FOR_REUSE event to ContainerAllocator
2019-09-24 16:16:28,558 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: attempt_1567609664100_85580_m_000006_0 TaskAttempt Transitioned from RUIG to SUCCEEDED
2019-09-24 16:16:28,558 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1567609664100_85580_m_000006_0
2019-09-24 16:16:28,558 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: task_1567609664100_85580_m_000006 Task Transitioned from RUIG to SUCCEEDED
2019-09-24 16:16:28,559 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: um completed Tasks: 4
2019-09-24 16:16:28,560 IFO [ContainerLauncher #5] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: COTAIER_REMOTE_CLEAUP for container container_e29_1567609664100_85580_01_000007 taskAttempt attempt_1567609664100_85580_m_000006_0
2019-09-24 16:16:28,560 IFO [ContainerLauncher #5] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLIG attempt_1567609664100_85580_m_000006_0
2019-09-24 16:16:28,560 IFO [ContainerLauncher #5] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata05-SZZB:26009
2019-09-24 16:16:28,851 IFO [IPC Server handler 10 on 27102] org.apache.TaskAttemptListenerImpl: Done acknowledgement from attempt_1567609664100_85580_m_000005_0
2019-09-24 16:16:28,852 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Task Attempt attempt_1567609664100_85580_m_000005_0 finished. Firing COTAIER_AVAILABLE_FOR_REUSE event to ContainerAllocator
2019-09-24 16:16:28,852 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: attempt_1567609664100_85580_m_000005_0 TaskAttempt Transitioned from RUIG to SUCCEEDED
2019-09-24 16:16:28,852 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1567609664100_85580_m_000005_0
2019-09-24 16:16:28,852 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: task_1567609664100_85580_m_000005 Task Transitioned from RUIG to SUCCEEDED
2019-09-24 16:16:28,85 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: um completed Tasks: 5
2019-09-24 16:16:28,856 IFO [ContainerLauncher #8] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: COTAIER_REMOTE_CLEAUP for container container_e29_1567609664100_85580_01_000008 taskAttempt attempt_1567609664100_85580_m_000005_0
2019-09-24 16:16:28,856 IFO [ContainerLauncher #8] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLIG attempt_1567609664100_85580_m_000005_0
2019-09-24 16:16:28,856 IFO [ContainerLauncher #8] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata16-SZZB:26009
2019-09-24 16:16:28,986 IFO [IPC Server handler 16 on 27102] org.apache.TaskAttemptListenerImpl: Done acknowledgement from attempt_1567609664100_85580_m_000004_0
2019-09-24 16:16:28,987 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Task Attempt attempt_1567609664100_85580_m_000004_0 finished. Firing COTAIER_AVAILABLE_FOR_REUSE event to ContainerAllocator
2019-09-24 16:16:28,987 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: attempt_1567609664100_85580_m_000004_0 TaskAttempt Transitioned from RUIG to SUCCEEDED
2019-09-24 16:16:28,987 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1567609664100_85580_m_000004_0
2019-09-24 16:16:28,988 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: task_1567609664100_85580_m_000004 Task Transitioned from RUIG to SUCCEEDED
2019-09-24 16:16:28,989 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: um completed Tasks: 6
2019-09-24 16:16:28,989 IFO [ContainerLauncher #6] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: COTAIER_REMOTE_CLEAUP for container container_e29_1567609664100_85580_01_000005 taskAttempt attempt_1567609664100_85580_m_000004_0
2019-09-24 16:16:28,990 IFO [ContainerLauncher #6] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLIG attempt_1567609664100_85580_m_000004_0
2019-09-24 16:16:28,990 IFO [ContainerLauncher #6] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata10-SZZB:26009
2019-09-24 16:16:29,006 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:7 AssignedReds:0 CompletedMaps:6 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:29,008 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000008
2019-09-24 16:16:29,009 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000007
2019-09-24 16:16:29,009 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000005_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 14
Container exited with a non-zero exit code 142019-09-24 16:16:29,009 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:10048, vCores:8>
2019-09-24 16:16:29,009 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:29,009 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000006_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 14
Container exited with a non-zero exit code 142019-09-24 16:16:29,009 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 AssignedReds:0 CompletedMaps:6 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:29,582 IFO [IPC Server handler 12 on 27102] org.apache.TaskAttemptListenerImpl: Done acknowledgement from attempt_1567609664100_85580_m_000002_0
2019-09-24 16:16:29,584 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Task Attempt attempt_1567609664100_85580_m_000002_0 finished. Firing COTAIER_AVAILABLE_FOR_REUSE event to ContainerAllocator
2019-09-24 16:16:29,584 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: attempt_1567609664100_85580_m_000002_0 TaskAttempt Transitioned from RUIG to SUCCEEDED
2019-09-24 16:16:29,584 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1567609664100_85580_m_000002_0
2019-09-24 16:16:29,584 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: task_1567609664100_85580_m_000002 Task Transitioned from RUIG to SUCCEEDED
2019-09-24 16:16:29,584 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: um completed Tasks: 7
2019-09-24 16:16:29,585 IFO [ContainerLauncher #4] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: COTAIER_REMOTE_CLEAUP for container container_e29_1567609664100_85580_01_000010 taskAttempt attempt_1567609664100_85580_m_000002_0
2019-09-24 16:16:29,586 IFO [ContainerLauncher #4] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLIG attempt_1567609664100_85580_m_000002_0
2019-09-24 16:16:29,586 IFO [ContainerLauncher #4] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata14-SZZB:26009
2019-09-24 16:16:0,009 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 AssignedReds:0 CompletedMaps:7 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:0,01 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000010
2019-09-24 16:16:0,01 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000005
2019-09-24 16:16:0,01 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000002_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 14
Container exited with a non-zero exit code 142019-09-24 16:16:0,01 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:14144, vCores:10>
2019-09-24 16:16:0,01 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:0,01 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps: AssignedReds:0 CompletedMaps:7 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:0,01 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000004_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 14
Container exited with a non-zero exit code 142019-09-24 16:16:0,416 IFO [IPC Server handler 6 on 27102] org.apache.TaskAttemptListenerImpl: Done acknowledgement from attempt_1567609664100_85580_m_000001_0
2019-09-24 16:16:0,417 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Task Attempt attempt_1567609664100_85580_m_000001_0 finished. Firing COTAIER_AVAILABLE_FOR_REUSE event to ContainerAllocator
2019-09-24 16:16:0,417 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: attempt_1567609664100_85580_m_000001_0 TaskAttempt Transitioned from RUIG to SUCCEEDED
2019-09-24 16:16:0,417 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1567609664100_85580_m_000001_0
2019-09-24 16:16:0,418 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: task_1567609664100_85580_m_000001 Task Transitioned from RUIG to SUCCEEDED
2019-09-24 16:16:0,418 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: um completed Tasks: 8
2019-09-24 16:16:0,419 IFO [ContainerLauncher #] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: COTAIER_REMOTE_CLEAUP for container container_e29_1567609664100_85580_01_000004 taskAttempt attempt_1567609664100_85580_m_000001_0
2019-09-24 16:16:0,419 IFO [ContainerLauncher #] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLIG attempt_1567609664100_85580_m_000001_0
2019-09-24 16:16:0,419 IFO [ContainerLauncher #] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata12-SZZB:26009
2019-09-24 16:16:0,440 IFO [IPC Server handler 7 on 27102] org.apache.TaskAttemptListenerImpl: Done acknowledgement from attempt_1567609664100_85580_m_00000_0
2019-09-24 16:16:0,442 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Task Attempt attempt_1567609664100_85580_m_00000_0 finished. Firing COTAIER_AVAILABLE_FOR_REUSE event to ContainerAllocator
2019-09-24 16:16:0,442 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: attempt_1567609664100_85580_m_00000_0 TaskAttempt Transitioned from RUIG to SUCCEEDED
2019-09-24 16:16:0,442 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1567609664100_85580_m_00000_0
2019-09-24 16:16:0,442 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: task_1567609664100_85580_m_00000 Task Transitioned from RUIG to SUCCEEDED
2019-09-24 16:16:0,442 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: um completed Tasks: 9
2019-09-24 16:16:0,44 IFO [ContainerLauncher #7] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: COTAIER_REMOTE_CLEAUP for container container_e29_1567609664100_85580_01_000002 taskAttempt attempt_1567609664100_85580_m_00000_0
2019-09-24 16:16:0,446 IFO [ContainerLauncher #7] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLIG attempt_1567609664100_85580_m_00000_0
2019-09-24 16:16:0,447 IFO [ContainerLauncher #7] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata11-SZZB:26009
2019-09-24 16:16:0,556 IFO [IPC Server handler 8 on 27102] org.apache.TaskAttemptListenerImpl: JVM with ID : jvm_1567609664100_85580_m_188587205506 asked for a task
2019-09-24 16:16:0,556 IFO [IPC Server handler 8 on 27102] org.apache.TaskAttemptListenerImpl: JVM with ID: jvm_1567609664100_85580_m_188587205506 is invalid and will be killed.
2019-09-24 16:16:1,01 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps: AssignedReds:0 CompletedMaps:9 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:1,017 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000004
2019-09-24 16:16:1,017 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000002
2019-09-24 16:16:1,017 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:18240, vCores:12>
2019-09-24 16:16:1,017 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000001_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 14
Container exited with a non-zero exit code 142019-09-24 16:16:1,017 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:1,017 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:9 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:1,017 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_00000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 14
Container exited with a non-zero exit code 142019-09-24 16:16:4,026 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:128000, vCores:10>
2019-09-24 16:16:4,026 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:6,02 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:125952, vCores:9>
2019-09-24 16:16:6,02 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:47,061 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:115712, vCores:7>
2019-09-24 16:16:47,061 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:58,089 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:105472, vCores:5>
2019-09-24 16:16:58,090 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:59,092 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:84992, vCores:1>
2019-09-24 16:16:59,092 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:17:06,109 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:125952, vCores:9>
2019-09-24 16:17:06,109 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:17:08,11 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:115712, vCores:7>
2019-09-24 16:17:08,11 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:17:09,115 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:9522, vCores:>
2019-09-24 16:17:09,115 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:17:10,117 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:84992, vCores:1>
2019-09-24 16:17:10,117 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:17:11,121 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000006
2019-09-24 16:17:11,122 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:76800, vCores:0>
2019-09-24 16:17:11,122 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:17:11,122 IFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:9 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:17:11,122 IFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000000_0: Container [pid=44860,containerID=container_e29_1567609664100_85580_01_000006] is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 4.0 GB of 16.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e29_1567609664100_85580_01_000006 :|- PID PPID PGRPID SESSID CMD_AME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LIE|- 44881 44860 44860 44860 (java) 21865 1198 418670784 526521 /opt/huawei/Bigdata/common/runtime0/jdk1.8.0_162//bin/java -Djava.security.auth.=/opt/huawei/Bigdata/FusionInsight_Current/1_11_odeManager/etc/ -Dzookeeper.server.principal=zookeeper/hadoop.hadoop -Dzookeeper.=120000 -server -XX:ewRatio=8 -Djava.preferIPv4Stack=true -Xmx2048M -Djava.preferIPv4Stack=true -Djava.security.=/opt/huawei/Bigdata/common/runtime/ -Djava.=/srv/BigData/hadoop/data6/nm/localdir/usercache/yxs_product/appcache/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006/tmp =container-log4j.properties -Dyarn.log.dir=/srv/BigData/hadoop/data10/nm/containerlogs/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006 -Dyarn.log.filesize=0 -Dhadoop.root.logger=IFO,CLA -Dhadoop.root.logfile=syslog org.apache.YarnChild 10.240.250.1 27102 attempt_1567609664100_85580_m_000000_0 188587205510 |- 44860 44857 44860 44860 (bash) 2 1 11601488 74 /bin/bash -c /opt/huawei/Bigdata/common/runtime0/jdk1.8.0_162//bin/java -Djava.security.auth.=/opt/huawei/Bigdata/FusionInsight_Current/1_11_odeManager/etc/ -Dzookeeper.server.principal=zookeeper/hadoop.hadoop -Dzookeeper.=120000 -server -XX:ewRatio=8 -Djava.preferIPv4Stack=true -Xmx2048M -Djava.preferIPv4Stack=true -Djava.security.=/opt/huawei/Bigdata/common/runtime/ -Djava.=/srv/BigData/hadoop/data6/nm/localdir/usercache/yxs_product/appcache/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006/tmp =container-log4j.properties -Dyarn.log.dir=/srv/BigData/hadoop/data10/nm/containerlogs/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006 -Dyarn.log.filesize=0 -Dhadoop.root.logger=IFO,CLA -Dhadoop.root.logfile=syslog org.apache.YarnChild 10.240.250.1 27102 
attempt_1567609664100_85580_m_000000_0 188587205510 1>/srv/BigData/hadoop/data10/nm/containerlogs/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006/stdout 2>/srv/BigData/hadoop/data10/nm/containerlogs/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006/stderr Container killed on request. Exit code is 14
Container exited with a non-zero exit code 14
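An application log like the one above can usually be pulled from the command line as well. The sketch below assumes the standard `yarn logs -applicationId` command is available and that log aggregation is enabled on the cluster. The helper names are hypothetical. It only extracts the application ID from the client output and builds the command, without running it.

```python
import re
from typing import List, Optional


def application_id_from_client_log(log_text: str) -> Optional[str]:
    """Return the first YARN application ID that appears in Hive client output."""
    match = re.search(r"application_\d+_\d+", log_text)
    return match.group(0) if match else None


def yarn_logs_command(application_id: str) -> List[str]:
    """Build the argument list for fetching the aggregated application log."""
    return ["yarn", "logs", "-applicationId", application_id]


# Tracking URL line taken from the client-side log in this post.
client_line = (
    "The url to track the job: "
    "https://yiclouddata0-szzb:26001/proxy/application_1567609664100_85580/"
)
app_id = application_id_from_client_log(client_line)
if app_id is not None:
    print(" ".join(yarn_logs_command(app_id)))
```

The resulting list can be passed to `subprocess.run` on a node where the `yarn` client is installed and authenticated.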
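Individual map and reduce error logs
Even so, I still could not see what the error was, so I went on to the detailed error reported by the individual map and reduce tasks.
The error log is as follows: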
Container [pid=44860,containerID=container_e29_1567609664100_85580_01_000006] is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 4.0 GB of 16.2 GB virtual memory used. Killing container. Dump of the process-tree for container_e29_1567609664100_85580_01_000006 : |- PID PPID PGRPID SESSID CMD_AME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LIE |- 44881 44860 44860 44860 (java) 21865 1198 418670784 526521 /opt/huawei/Bigdata/common/runtime0/jdk1.8.0_162//bin/java -Djava.security.auth.=/opt/huawei/Bigdata/FusionInsight_Current/1_11_odeManager/etc/ -Dzookeeper.server.principal=zookeeper/hadoop.hadoop -Dzookeeper.=120000 -server -XX:ewRatio=8 -Djava.preferIPv4Stack=true -Xmx2048M -Djava.preferIPv4Stack=true -Djava.security.=/opt/huawei/Bigdata/common/runtime/ -Djava.=/srv/BigData/hadoop/data6/nm/localdir/usercache/yxs_product/appcache/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006/tmp =container-log4j.properties -Dyarn.log.dir=/srv/BigData/hadoop/data10/nm/containerlogs/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006 -Dyarn.log.filesize=0 -Dhadoop.root.logger=IFO,CLA -Dhadoop.root.logfile=syslog org.apache.YarnChild 10.240.250.1 27102 attempt_1567609664100_85580_m_000000_0 188587205510 |- 44860 44857 44860 44860 (bash) 2 1 11601488 74 /bin/bash -c /opt/huawei/Bigdata/common/runtime0/jdk1.8.0_162//bin/java -Djava.security.auth.=/opt/huawei/Bigdata/FusionInsight_Current/1_11_odeManager/etc/ -Dzookeeper.server.principal=zookeeper/hadoop.hadoop -Dzookeeper.=120000 -server -XX:ewRatio=8 -Djava.preferIPv4Stack=true -Xmx2048M -Djava.preferIPv4Stack=true -Djava.security.=/opt/huawei/Bigdata/common/runtime/ -Djava.=/srv/BigData/hadoop/data6/nm/localdir/usercache/yxs_product/appcache/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006/tmp =container-log4j.properties 
-Dyarn.log.dir=/srv/BigData/hadoop/data10/nm/containerlogs/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006 -Dyarn.log.filesize=0 -Dhadoop.root.logger=IFO,CLA -Dhadoop.root.logfile=syslog org.apache.YarnChild 10.240.250.1 27102 attempt_1567609664100_85580_m_000000_0 188587205510 1>/srv/BigData/hadoop/data10/nm/containerlogs/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006/stdout 2>/srv/BigData/hadoop/data10/nm/containerlogs/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006/stderr Container killed on request. Exit code is 14 Container exited with a non-zero exit code 14
Container [pid=44860,containerID=container_e29_1567609664100_85580_01_000006] is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 4.0 GB of 16.2 GB virtual memory used. Killing container.
OK, at this point the cause of the error is finally visible.
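Finding that one line in a long log is easier with a small filter. The following is a minimal sketch, assuming the diagnostics message has the same wording as the line quoted above. The pattern and function name are illustrative and are not part of any Hadoop tooling.

```python
import re
from typing import Dict, Optional

# Matches the NodeManager diagnostics wording seen in this post.
SIZE = r"[\d.]+ [KMGT]?B"
LIMIT_PATTERN = re.compile(
    r"running beyond (?P<kind>physical|virtual) memory limits\. "
    rf"Current usage: (?P<pmem_used>{SIZE}) of (?P<pmem_limit>{SIZE}) physical memory used; "
    rf"(?P<vmem_used>{SIZE}) of (?P<vmem_limit>{SIZE}) virtual memory used"
)


def parse_memory_kill(line: str) -> Optional[Dict[str, str]]:
    """Extract which limit was hit and the reported usage figures, if present."""
    match = LIMIT_PATTERN.search(line)
    return match.groupdict() if match else None


diagnostics = (
    "Container [pid=44860,containerID=container_e29_1567609664100_85580_01_000006] "
    "is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB "
    "physical memory used; 4.0 GB of 16.2 GB virtual memory used. Killing container."
)
result = parse_memory_kill(diagnostics)
print(result)
```

The `kind` field shows whether the physical or the virtual limit triggered the kill, which decides which of the fixes further down applies.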
Error analysis
First, check the configuration on YARN.
ERROR:Container [pid=44860,containerID=container_e29_1567609664100_85580_01_000006] is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 4.0 GB of 16.2 GB virtual memory used. Killing container.
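2.0 GB: the physical memory used by the task
2 GB: the configured value of mapreduce.map.memory.mb, which is the physical memory limit of a map container
4.0 GB: the virtual memory used by the program
16.2 GB: mapreduce.map.memory.mb multiplied by yarn.nodemanager.vmem-pmem-ratio
yarn.nodemanager.vmem-pmem-ratio is the ratio of virtual memory to physical memory. It is set in yarn-site.xml and defaults to 2.1.
Clearly, the container used more physical memory than the task is allowed (running beyond physical memory limits), so the container was killed.
The error above came from a map task. It can just as well occur in a reduce task, in which case the virtual memory limit is mapreduce.reduce.memory.mb * yarn.nodemanager.vmem-pmem-ratio.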
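The relationship between the two limits can be written down as a small calculation. This is a simplified sketch, not the NodeManager's actual monitoring logic. It assumes the default ratio of 2.1 and uses made-up usage figures to show which check would trip.

```python
from typing import List


def virtual_limit_mb(container_mb: float, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory limit = physical container limit * vmem-pmem-ratio."""
    return container_mb * vmem_pmem_ratio


def violated_limits(
    pmem_used_mb: float,
    vmem_used_mb: float,
    container_mb: float,
    vmem_pmem_ratio: float = 2.1,
) -> List[str]:
    """Return which limits a container exceeds (simplified comparison)."""
    violations = []
    if pmem_used_mb > container_mb:
        violations.append("physical")
    if vmem_used_mb > virtual_limit_mb(container_mb, vmem_pmem_ratio):
        violations.append("virtual")
    return violations


# Hypothetical figures: a 2048 MB map container using slightly too much RAM.
print(virtual_limit_mb(2048))             # virtual limit for a 2048 MB container
print(violated_limits(2100, 4096, 2048))  # only the physical limit is exceeded
```

For a reduce task the same calculation applies with mapreduce.reduce.memory.mb as the container size.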
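Physical memory: the real hardware (the RAM modules).
Virtual memory: logical memory backed by disk space. The disk space used for this is called swap space. It is a strategy for coping with a shortage of physical memory.
When physical memory runs short, Linux uses the virtual memory in the swap partition. The kernel writes memory blocks that are temporarily unused to swap space, which frees that physical memory for other purposes. When the original content is needed again, it is read back from swap space into physical memory.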
Solutions
1. Disable the virtual memory check (not recommended)
Set yarn.nodemanager.vmem-check-enabled to false in yarn-site.xml or in the program:
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers.</description>
</property>
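The failure may come from the physical memory limit rather than the virtual one, as it did in this case. The physical memory check can be disabled in the same way:
yarn.nodemanager.pmem-check-enabled: false
Personally I do not think this is a good approach. If the program has a memory leak or a similar problem, disabling the check may crash the cluster.
2. Increase mapreduce.map.memory.mb or mapreduce.reduce.memory.mb (recommended)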
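For the recommended fix, raising the map or reduce container memory, the settings can be issued as Hive `set` statements before the query. The sketch below generates them. The 4096 MB figure and the 0.8 heap fraction are assumptions chosen for illustration, not values from this cluster. In the process dump earlier in this post the JVM ran with -Xmx2048M inside a 2 GB container, so the heap alone could reach the limit, and leaving headroom between heap and container size is one common way to avoid that.

```python
from typing import List


def memory_settings(task: str, container_mb: int, heap_fraction: float = 0.8) -> List[str]:
    """Build Hive 'set' statements for one task type ('map' or 'reduce').

    heap_fraction is an assumed rule of thumb: the JVM heap is kept below
    the container size to leave room for non-heap memory.
    """
    if task not in ("map", "reduce"):
        raise ValueError("task must be 'map' or 'reduce'")
    heap_mb = int(container_mb * heap_fraction)
    return [
        f"set mapreduce.{task}.memory.mb={container_mb};",
        f"set mapreduce.{task}.java.opts=-Xmx{heap_mb}m;",
    ]


statements = memory_settings("map", 4096)
for statement in statements:
    print(statement)
```

Note that setting mapreduce.map.java.opts replaces whatever JVM options the cluster configured by default, so on a managed platform it may be safer to change only the memory.mb value, or to check first which options the platform allows a session to override.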
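3. Increase vmem-pmem-ratio appropriately
This gives each unit of physical memory a larger virtual memory allowance, but the value should not be set unreasonably high.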
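Summary
Task memory problems come in two kinds: physical memory and virtual memory. Exceeding either limit makes the task fail, and adjusting the corresponding parameter appropriately lets the task run again. If the memory a task uses is wildly excessive, the first things to consider are whether the program has a memory leak and whether the data is skewed, and such problems should be fixed in the program first. The ultimate fix is to split the data evenly across multiple tasks.
Or switch to Spark, which is far faster.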