Hive Task Optimization

2025-07-20 18:28:23
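Contents

Error background

Locating the error

Client-side log

Application log

Individual map and reduce error logs

Error analysis

Solutions

1. Disable the virtual memory check (not recommended):

2. Increase mapreduce.map.memory.mb or mapreduce.reduce.memory.mb (recommended)

3. Moderately increase yarn.nodemanager.vmem-pmem-ratio

4. Switch to a Spark SQL job (works great, strongly recommended)

Summary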


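Error background

I wrote an HQL statement and ran it on the big data platform, and it failed. Roughly speaking, the job exceeded the memory size configured for its map and reduce tasks, which caused the task to fail.

Locating the error

Client-side log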
INFO  : converting to local hdfs://hacluster/tenant/yxs/product/resources/resources/jar/fc06465-4af1-4756-894e-ce74ec11b9c.jar
INFO  : Added [/opt/huawei/Bigdata/tmp/hivelocaltmp/session_resources/2d0a2efc-776c-4ccc-957d-927079862ab2_resources/fc06465-4af1-4756-894e-ce74ec11b9c.jar] to class path
INFO  : Added resources: [hdfs://hacluster/tenant/yxs/product/resources/resources/jar/fc06465-4af1-4756-894e-ce74ec11b9c.jar]
INFO  : Number of reduce tasks not specified. Estimated from input data size: 2
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:10
INFO  : Submitting tokens for job: job_1567609664100_85580
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster
INFO  : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken
INFO  : The url to track the job: https://yiclouddata0-szzb:26001/proxy/application_1567609664100_85580/
INFO  : Starting Job = job_1567609664100_85580, Tracking URL = https://yiclouddata0-szzb:26001/proxy/application_1567609664100_85580/
INFO  : Kill Command = /opt/huawei/Bigdata/FusionInsight_HD_V100R002C80SPC20/install/FusionInsight-Hive-1..0/hive-1..0/bin/..//../hadoop/bin/hadoop job  -kill job_1567609664100_85580
INFO  : Hadoop job information for Stage-6: number of mappers: 10; number of reducers: 2
INFO  : 2019-09-24 16:16:17,686 Stage-6 map = 0%,  reduce = 0%
INFO  : 2019-09-24 16:16:27,299 Stage-6 map = 20%,  reduce = 0%, Cumulative CPU 10.12 sec
INFO  : 2019-09-24 16:16:28,474 Stage-6 map = 0%,  reduce = 0%, Cumulative CPU 0.4 sec
INFO  : 2019-09-24 16:16:29,664 Stage-6 map = 70%,  reduce = 0%, Cumulative CPU 8.44 sec
INFO  : 2019-09-24 16:16:0,841 Stage-6 map = 90%,  reduce = 0%, Cumulative CPU 115.79 sec
INFO  : 2019-09-24 16:16:2,004 Stage-6 map = 91%,  reduce = 0%, Cumulative CPU 14.7 sec
INFO  : 2019-09-24 16:16:44,928 Stage-6 map = 92%,  reduce = 0%, Cumulative CPU 22.25 sec
INFO  : 2019-09-24 16:16:55,61 Stage-6 map = 9%,  reduce = 0%, Cumulative CPU 284.27 sec
INFO  : 2019-09-24 16:17:0,797 Stage-6 map = 94%,  reduce = 0%, Cumulative CPU 1.69 sec
INFO  : 2019-09-24 16:17:11,881 Stage-6 map = 90%,  reduce = 0%, Cumulative CPU 115.79 sec
INFO  : 2019-09-24 16:18:12,546 Stage-6 map = 90%,  reduce = 0%, Cumulative CPU 115.79 sec
INFO  : 2019-09-24 16:19:04,47 Stage-6 map = 91%,  reduce = 0%, Cumulative CPU 185.47 sec
INFO  : 2019-09-24 16:19:1,68 Stage-6 map = 92%,  reduce = 0%, Cumulative CPU 22.5 sec
INFO  : 2019-09-24 16:19:22,825 Stage-6 map = 9%,  reduce = 0%, Cumulative CPU 281.97 sec
INFO  : 2019-09-24 16:19:2,05 Stage-6 map = 94%,  reduce = 0%, Cumulative CPU 14.97 sec
INFO  : 2019-09-24 16:19:54,14 Stage-6 map = 95%,  reduce = 0%, Cumulative CPU 77.6 sec
INFO  : 2019-09-24 16:19:56,520 Stage-6 map = 90%,  reduce = 0%, Cumulative CPU 115.79 sec
INFO  : 2019-09-24 16:20:09,8 Stage-6 map = 91%,  reduce = 0%, Cumulative CPU 181.59 sec
INFO  : 2019-09-24 16:20:18,574 Stage-6 map = 92%,  reduce = 0%, Cumulative CPU 217.27 sec
INFO  : 2019-09-24 16:20:27,772 Stage-6 map = 9%,  reduce = 0%, Cumulative CPU 266.25 sec
INFO  : 2019-09-24 16:20:40,49 Stage-6 map = 94%,  reduce = 0%, Cumulative CPU 05.2 sec
INFO  : 2019-09-24 16:20:57,751 Stage-6 map = 90%,  reduce = 0%, Cumulative CPU 115.79 sec
INFO  : 2019-09-24 16:21:11,624 Stage-6 map = 91%,  reduce = 0%, Cumulative CPU 18.87 sec
INFO  : 2019-09-24 16:21:20,948 Stage-6 map = 92%,  reduce = 0%, Cumulative CPU 219.12 sec
INFO  : 2019-09-24 16:21:1,427 Stage-6 map = 9%,  reduce = 0%, Cumulative CPU 282.71 sec
INFO  : 2019-09-24 16:21:9,754 Stage-6 map = 94%,  reduce = 0%, Cumulative CPU 17.99 sec
INFO  : 2019-09-24 16:21:45,519 Stage-6 map = 100%,  reduce = 100%, Cumulative CPU 115.79 sec
INFO  : MapReduce Total cumulative CPU time: 1 minutes 55 seconds 790 msec
ERROR : Ended Job = job_1567609664100_85580 with errors
Task T_626089799950704_20190924161555945_1_1 failed. Cause: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.MapRedTask
at org.apache.hive.jdbc.(HiveStatement.java:28)
at org.apache.hive.jdbc.Query(HiveStatement.java:79)
at com.dtwave.dipper.runner.impl.Hive2TaskRunner.doRun(Hive2TaskRunner.java:244)
at com.dtwave.dipper.runner.(BasicTaskRunner.java:100)
at com.dtwave.dipper.TaskExecutor.run(TaskExecutor.java:2)
at java.Executors$(Executors.java:511)
at java.FutureTask.run(FutureTask.java:266)
at java.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Task run failed (Failed)

After reading this error, are you completely baffled, staring blankly and questioning everything? Haha...

Application log

You cannot tell what went wrong from the client output alone. You need to go to YARN and look at the application's run log, shown below:

2019-09-24 16:16:27,712 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: Num completed Tasks:
2019-09-24 16:16:27,712 INFO [ContainerLauncher #2] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e29_1567609664100_85580_01_000011 taskAttempt attempt_1567609664100_85580_m_000009_0
2019-09-24 16:16:27,71 INFO [ContainerLauncher #2] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1567609664100_85580_m_000009_0
2019-09-24 16:16:27,71 INFO [ContainerLauncher #2] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata04-SZZB:26009
2019-09-24 16:16:27,997 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:10 AssignedReds:0 CompletedMaps: CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:28,005 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000009
2019-09-24 16:16:28,006 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000011
2019-09-24 16:16:28,006 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_00000
2019-09-24 16:16:28,006 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:125952, vCores:6>
2019-09-24 16:16:28,006 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:28,006 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:7 AssignedReds:0 CompletedMaps: CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:28,006 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000008_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
2019-09-24 16:16:28,006 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000009_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
2019-09-24 16:16:28,006 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000007_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
2019-09-24 16:16:28,557 INFO [IPC Server handler 7 on 27102] org.apache.TaskAttemptListenerImpl: Done acknowledgement from attempt_1567609664100_85580_m_000006_0
2019-09-24 16:16:28,558 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Task Attempt attempt_1567609664100_85580_m_000006_0 finished. Firing CONTAINER_AVAILABLE_FOR_REUSE event to ContainerAllocator
2019-09-24 16:16:28,558 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: attempt_1567609664100_85580_m_000006_0 TaskAttempt Transitioned from RUNNING to SUCCEEDED
2019-09-24 16:16:28,558 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1567609664100_85580_m_000006_0
2019-09-24 16:16:28,558 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: task_1567609664100_85580_m_000006 Task Transitioned from RUNNING to SUCCEEDED
2019-09-24 16:16:28,559 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: Num completed Tasks: 4
2019-09-24 16:16:28,560 INFO [ContainerLauncher #5] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e29_1567609664100_85580_01_000007 taskAttempt attempt_1567609664100_85580_m_000006_0
2019-09-24 16:16:28,560 INFO [ContainerLauncher #5] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1567609664100_85580_m_000006_0
2019-09-24 16:16:28,560 INFO [ContainerLauncher #5] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata05-SZZB:26009
2019-09-24 16:16:28,851 INFO [IPC Server handler 10 on 27102] org.apache.TaskAttemptListenerImpl: Done acknowledgement from attempt_1567609664100_85580_m_000005_0
2019-09-24 16:16:28,852 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Task Attempt attempt_1567609664100_85580_m_000005_0 finished. Firing CONTAINER_AVAILABLE_FOR_REUSE event to ContainerAllocator
2019-09-24 16:16:28,852 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: attempt_1567609664100_85580_m_000005_0 TaskAttempt Transitioned from RUNNING to SUCCEEDED
2019-09-24 16:16:28,852 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1567609664100_85580_m_000005_0
2019-09-24 16:16:28,852 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: task_1567609664100_85580_m_000005 Task Transitioned from RUNNING to SUCCEEDED
2019-09-24 16:16:28,85 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: Num completed Tasks: 5
2019-09-24 16:16:28,856 INFO [ContainerLauncher #8] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e29_1567609664100_85580_01_000008 taskAttempt attempt_1567609664100_85580_m_000005_0
2019-09-24 16:16:28,856 INFO [ContainerLauncher #8] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1567609664100_85580_m_000005_0
2019-09-24 16:16:28,856 INFO [ContainerLauncher #8] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata16-SZZB:26009
2019-09-24 16:16:28,986 INFO [IPC Server handler 16 on 27102] org.apache.TaskAttemptListenerImpl: Done acknowledgement from attempt_1567609664100_85580_m_000004_0
2019-09-24 16:16:28,987 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Task Attempt attempt_1567609664100_85580_m_000004_0 finished. Firing CONTAINER_AVAILABLE_FOR_REUSE event to ContainerAllocator
2019-09-24 16:16:28,987 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: attempt_1567609664100_85580_m_000004_0 TaskAttempt Transitioned from RUNNING to SUCCEEDED
2019-09-24 16:16:28,987 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1567609664100_85580_m_000004_0
2019-09-24 16:16:28,988 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: task_1567609664100_85580_m_000004 Task Transitioned from RUNNING to SUCCEEDED
2019-09-24 16:16:28,989 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: Num completed Tasks: 6
2019-09-24 16:16:28,989 INFO [ContainerLauncher #6] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e29_1567609664100_85580_01_000005 taskAttempt attempt_1567609664100_85580_m_000004_0
2019-09-24 16:16:28,990 INFO [ContainerLauncher #6] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1567609664100_85580_m_000004_0
2019-09-24 16:16:28,990 INFO [ContainerLauncher #6] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata10-SZZB:26009
2019-09-24 16:16:29,006 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:7 AssignedReds:0 CompletedMaps:6 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:29,008 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000008
2019-09-24 16:16:29,009 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000007
2019-09-24 16:16:29,009 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000005_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
2019-09-24 16:16:29,009 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:10048, vCores:8>
2019-09-24 16:16:29,009 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:29,009 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000006_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
2019-09-24 16:16:29,009 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 AssignedReds:0 CompletedMaps:6 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:29,582 INFO [IPC Server handler 12 on 27102] org.apache.TaskAttemptListenerImpl: Done acknowledgement from attempt_1567609664100_85580_m_000002_0
2019-09-24 16:16:29,584 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Task Attempt attempt_1567609664100_85580_m_000002_0 finished. Firing CONTAINER_AVAILABLE_FOR_REUSE event to ContainerAllocator
2019-09-24 16:16:29,584 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: attempt_1567609664100_85580_m_000002_0 TaskAttempt Transitioned from RUNNING to SUCCEEDED
2019-09-24 16:16:29,584 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1567609664100_85580_m_000002_0
2019-09-24 16:16:29,584 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: task_1567609664100_85580_m_000002 Task Transitioned from RUNNING to SUCCEEDED
2019-09-24 16:16:29,584 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: Num completed Tasks: 7
2019-09-24 16:16:29,585 INFO [ContainerLauncher #4] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e29_1567609664100_85580_01_000010 taskAttempt attempt_1567609664100_85580_m_000002_0
2019-09-24 16:16:29,586 INFO [ContainerLauncher #4] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1567609664100_85580_m_000002_0
2019-09-24 16:16:29,586 INFO [ContainerLauncher #4] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata14-SZZB:26009
2019-09-24 16:16:0,009 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 AssignedReds:0 CompletedMaps:7 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:0,01 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000010
2019-09-24 16:16:0,01 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000005
2019-09-24 16:16:0,01 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000002_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
2019-09-24 16:16:0,01 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:14144, vCores:10>
2019-09-24 16:16:0,01 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:0,01 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps: AssignedReds:0 CompletedMaps:7 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:0,01 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000004_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
2019-09-24 16:16:0,416 INFO [IPC Server handler 6 on 27102] org.apache.TaskAttemptListenerImpl: Done acknowledgement from attempt_1567609664100_85580_m_000001_0
2019-09-24 16:16:0,417 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Task Attempt attempt_1567609664100_85580_m_000001_0 finished. Firing CONTAINER_AVAILABLE_FOR_REUSE event to ContainerAllocator
2019-09-24 16:16:0,417 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: attempt_1567609664100_85580_m_000001_0 TaskAttempt Transitioned from RUNNING to SUCCEEDED
2019-09-24 16:16:0,417 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1567609664100_85580_m_000001_0
2019-09-24 16:16:0,418 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: task_1567609664100_85580_m_000001 Task Transitioned from RUNNING to SUCCEEDED
2019-09-24 16:16:0,418 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: Num completed Tasks: 8
2019-09-24 16:16:0,419 INFO [ContainerLauncher #] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e29_1567609664100_85580_01_000004 taskAttempt attempt_1567609664100_85580_m_000001_0
2019-09-24 16:16:0,419 INFO [ContainerLauncher #] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1567609664100_85580_m_000001_0
2019-09-24 16:16:0,419 INFO [ContainerLauncher #] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata12-SZZB:26009
2019-09-24 16:16:0,440 INFO [IPC Server handler 7 on 27102] org.apache.TaskAttemptListenerImpl: Done acknowledgement from attempt_1567609664100_85580_m_00000_0
2019-09-24 16:16:0,442 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Task Attempt attempt_1567609664100_85580_m_00000_0 finished. Firing CONTAINER_AVAILABLE_FOR_REUSE event to ContainerAllocator
2019-09-24 16:16:0,442 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: attempt_1567609664100_85580_m_00000_0 TaskAttempt Transitioned from RUNNING to SUCCEEDED
2019-09-24 16:16:0,442 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1567609664100_85580_m_00000_0
2019-09-24 16:16:0,442 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskImpl: task_1567609664100_85580_m_00000 Task Transitioned from RUNNING to SUCCEEDED
2019-09-24 16:16:0,442 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.JobImpl: Num completed Tasks: 9
2019-09-24 16:16:0,44 INFO [ContainerLauncher #7] org.apache.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e29_1567609664100_85580_01_000002 taskAttempt attempt_1567609664100_85580_m_00000_0
2019-09-24 16:16:0,446 INFO [ContainerLauncher #7] org.apache.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1567609664100_85580_m_00000_0
2019-09-24 16:16:0,447 INFO [ContainerLauncher #7] org.apache.hadoop.api.impl.ContainerManagementProtocolProxy: Opening proxy : yiclouddata11-SZZB:26009
2019-09-24 16:16:0,556 INFO [IPC Server handler 8 on 27102] org.apache.TaskAttemptListenerImpl: JVM with ID : jvm_1567609664100_85580_m_188587205506 asked for a task
2019-09-24 16:16:0,556 INFO [IPC Server handler 8 on 27102] org.apache.TaskAttemptListenerImpl: JVM with ID: jvm_1567609664100_85580_m_188587205506 is invalid and will be killed.
2019-09-24 16:16:1,01 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps: AssignedReds:0 CompletedMaps:9 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:1,017 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000004
2019-09-24 16:16:1,017 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000002
2019-09-24 16:16:1,017 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:18240, vCores:12>
2019-09-24 16:16:1,017 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000001_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
2019-09-24 16:16:1,017 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:1,017 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:9 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:16:1,017 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_00000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
2019-09-24 16:16:4,026 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:128000, vCores:10>
2019-09-24 16:16:4,026 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:6,02 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:125952, vCores:9>
2019-09-24 16:16:6,02 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:47,061 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:115712, vCores:7>
2019-09-24 16:16:47,061 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:58,089 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:105472, vCores:5>
2019-09-24 16:16:58,090 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:16:59,092 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:84992, vCores:1>
2019-09-24 16:16:59,092 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:17:06,109 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:125952, vCores:9>
2019-09-24 16:17:06,109 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:17:08,11 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:115712, vCores:7>
2019-09-24 16:17:08,11 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:17:09,115 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:9522, vCores:>
2019-09-24 16:17:09,115 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:17:10,117 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:84992, vCores:1>
2019-09-24 16:17:10,117 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:17:11,121 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Received completed container container_e29_1567609664100_85580_01_000006
2019-09-24 16:17:11,122 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:76800, vCores:0>
2019-09-24 16:17:11,122 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 10
2019-09-24 16:17:11,122 INFO [RMCommunicator Allocator] org.apache.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:2 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:9 CompletedReds:0 ContAlloc:10 ContRel:0 HostLocal:8 RackLocal:1
2019-09-24 16:17:11,122 INFO [AsyncDispatcher event handler] org.apache.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1567609664100_85580_m_000000_0: Container [pid=44860,containerID=container_e29_1567609664100_85580_01_000006] is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 4.0 GB of 16.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e29_1567609664100_85580_01_000006 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 44881 44860 44860 44860 (java) 21865 1198 418670784 526521 /opt/huawei/Bigdata/common/runtime0/jdk1.8.0_162//bin/java -Djava.security.auth.=/opt/huawei/Bigdata/FusionInsight_Current/1_11_NodeManager/etc/ -Dzookeeper.server.principal=zookeeper/hadoop.hadoop -Dzookeeper.=120000 -server -XX:NewRatio=8 -Djava.preferIPv4Stack=true -Xmx2048M -Djava.preferIPv4Stack=true -Djava.security.=/opt/huawei/Bigdata/common/runtime/ -Djava.=/srv/BigData/hadoop/data6/nm/localdir/usercache/yxs_product/appcache/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006/tmp =container-log4j.properties -Dyarn.log.dir=/srv/BigData/hadoop/data10/nm/containerlogs/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006 -Dyarn.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.YarnChild 10.240.250.1 27102 attempt_1567609664100_85580_m_000000_0 188587205510
|- 44860 44857 44860 44860 (bash) 2 1 11601488 74 /bin/bash -c /opt/huawei/Bigdata/common/runtime0/jdk1.8.0_162//bin/java -Djava.security.auth.=/opt/huawei/Bigdata/FusionInsight_Current/1_11_NodeManager/etc/ -Dzookeeper.server.principal=zookeeper/hadoop.hadoop -Dzookeeper.=120000 -server -XX:NewRatio=8 -Djava.preferIPv4Stack=true -Xmx2048M -Djava.preferIPv4Stack=true -Djava.security.=/opt/huawei/Bigdata/common/runtime/ -Djava.=/srv/BigData/hadoop/data6/nm/localdir/usercache/yxs_product/appcache/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006/tmp =container-log4j.properties -Dyarn.log.dir=/srv/BigData/hadoop/data10/nm/containerlogs/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006 -Dyarn.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.YarnChild 10.240.250.1 27102 attempt_1567609664100_85580_m_000000_0 188587205510 1>/srv/BigData/hadoop/data10/nm/containerlogs/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006/stdout 2>/srv/BigData/hadoop/data10/nm/containerlogs/application_1567609664100_85580/container_e29_1567609664100_85580_01_000006/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
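
If the YARN web UI is not convenient, the same application log can usually be pulled from the command line with `yarn logs -applicationId <id>`, provided log aggregation is enabled on the cluster. The sketch below only builds that command from the job ID printed in the client log; the helper names are hypothetical, and it relies on the convention visible above that a job ID and its application ID share the same numeric suffix.

```python
import re
from typing import List


def application_id_from_job_id(job_id: str) -> str:
    """Turn a MapReduce job ID (job_<cluster ts>_<seq>) into the matching
    YARN application ID (application_<cluster ts>_<seq>)."""
    match = re.fullmatch(r"job_(\d+)_(\d+)", job_id)
    if match is None:
        raise ValueError(f"not a MapReduce job ID: {job_id!r}")
    return f"application_{match.group(1)}_{match.group(2)}"


def yarn_logs_command(job_id: str) -> List[str]:
    """Build the argument list for fetching the aggregated logs of a job."""
    return ["yarn", "logs", "-applicationId", application_id_from_job_id(job_id)]


# The job ID comes from the client-side log of this article.
print(" ".join(yarn_logs_command("job_1567609664100_85580")))
```

The returned list can be passed to `subprocess.run` on a machine that has the Hadoop client installed and the right permissions.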
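Individual map and reduce error logs

Honestly, I still could not see what the error was, so I went on to look at the detailed error messages of the individual map and reduce tasks:

The error log is as follows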

Container [pid=44860,containerID=container_e29_1567609664100_85580_01_000006] is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 4.0 GB of 16.2 GB virtual memory used. Killing container.

OK, at this point we have finally found the cause of the error.
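
Scrolling through hundreds of log lines to find this one message is tedious. As a rough sketch, a regular expression can pull the memory-kill diagnostics out of a saved log; the pattern is written against the exact wording of the message shown in this article and may need adjusting for other Hadoop versions, and the function name is hypothetical.

```python
import re
from typing import Dict, List

# Matches the NodeManager diagnostic quoted in this article.
MEMORY_KILL = re.compile(
    r"containerID=(?P<container>[^\]]+)\] is running beyond "
    r"(?P<kind>physical|virtual) memory limits\. "
    r"Current usage: (?P<pmem_used>[\d.]+ \w+) of (?P<pmem_limit>[\d.]+ \w+) "
    r"physical memory used; "
    r"(?P<vmem_used>[\d.]+ \w+) of (?P<vmem_limit>[\d.]+ \w+) virtual memory used"
)


def find_memory_kills(log_text: str) -> List[Dict[str, str]]:
    """Return one dict per container that was killed for exceeding a memory limit."""
    return [m.groupdict() for m in MEMORY_KILL.finditer(log_text)]


sample = (
    "Container [pid=44860,containerID=container_e29_1567609664100_85580_01_000006] "
    "is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB "
    "physical memory used; 4.0 GB of 16.2 GB virtual memory used. Killing container."
)

for kill in find_memory_kills(sample):
    print(kill["container"], kill["kind"], kill["pmem_used"], "of", kill["pmem_limit"])
```

The `kind` field tells at a glance whether the physical or the virtual limit was the one that fired, which decides which of the parameters discussed below is relevant.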

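Error analysis

First, check the configuration on YARN.

ERROR:Container [pid=44860,containerID=container_e29_1567609664100_85580_01_000006] is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 4.0 GB of 16.2 GB virtual memory used. Killing container.

2.0 GB: the physical memory used by the task
2 GB: the configured value of mapreduce.map.memory.mb
4.0 GB: the virtual memory used by the program
16.2 GB: mapreduce.map.memory.mb multiplied by yarn.nodemanager.vmem-pmem-ratio

Here yarn.nodemanager.vmem-pmem-ratio is the ratio of virtual memory to physical memory. It is set in yarn-site.xml and defaults to 2.1. (A 16.2 GB limit against a 2 GB container implies that this cluster uses a ratio of about 8.1 rather than the default.)

Clearly the container needed more than the task's physical memory limit (running beyond physical memory limits), so the container was killed.

The error above came from a map task. It can of course also occur in a reduce task, in which case the virtual memory limit is mapreduce.reduce.memory.mb * yarn.nodemanager.vmem-pmem-ratio.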
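
The arithmetic behind these limits can be sketched in a few lines. The function names are hypothetical, and the numbers are the ones from the error message in this article; the same formula applies to reduce tasks with mapreduce.reduce.memory.mb as the container size.

```python
def virtual_memory_limit_mb(container_mb: float, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory limit of a container: physical limit times the ratio.

    container_mb is mapreduce.map.memory.mb (or mapreduce.reduce.memory.mb);
    2.1 is the default yarn.nodemanager.vmem-pmem-ratio.
    """
    return container_mb * vmem_pmem_ratio


def implied_ratio(pmem_limit_gb: float, vmem_limit_gb: float) -> float:
    """Work backwards from the two limits in a log message to the ratio in effect."""
    return vmem_limit_gb / pmem_limit_gb


# With the default ratio, a 2048 MB container gets about 4300 MB of virtual memory.
print(virtual_memory_limit_mb(2048))

# The log in this article reports limits of 2 GB physical and 16.2 GB virtual.
print(implied_ratio(2.0, 16.2))
```

Working backwards like this is a quick way to check which ratio a cluster actually applies without access to its yarn-site.xml.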

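Physical memory: the real hardware (RAM modules).
Virtual memory: a block of logical memory backed by disk space. The disk space used for it is called swap space. (It is a strategy introduced to make up for insufficient physical memory.)
When physical memory runs short, Linux uses the virtual memory in the swap partition. The kernel writes memory blocks that are temporarily unused to swap space, which frees physical memory for other purposes. When the original content is needed again, it is read back from swap space into physical memory.
Note that the virtual memory figure in the YARN message is the size of the virtual address space of the container's process tree, which is usually much larger than what actually sits in swap.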

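Solutions

1. Disable the virtual memory check (not recommended):

Set yarn.nodemanager.vmem-check-enabled to false in yarn-site.xml or in the program:

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers.</description>
</property>

The check that fired in this case was the physical memory one, but a task can just as well exceed the virtual memory limit. The physical memory check can be disabled in the same way:

yarn.nodemanager.pmem-check-enabled: false

Personally I do not think this is a good approach. If the program has a memory leak or a similar problem, disabling the check may bring down the cluster.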

2. Increase mapreduce.map.memory.mb or mapreduce.reduce.memory.mb (recommended)
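
As a hedged illustration, the sketch below generates the `set` statements one might run at the top of a Hive session before the failing query. The helper name and the 0.8 heap fraction are assumptions (a common rule of thumb, not something the original article prescribes). The idea is to keep the JVM heap below the container size so that non-heap memory has room; in the process dump above the JVM runs with -Xmx2048M inside a 2 GB container, which leaves no such headroom.

```python
from typing import List


def hive_memory_settings(container_mb: int, heap_fraction: float = 0.8) -> List[str]:
    """Build Hive `set` statements that raise map and reduce container memory.

    heap_fraction is an assumed rule of thumb: the JVM heap (-Xmx) is kept
    below the container size to leave room for non-heap memory.
    """
    heap_mb = int(container_mb * heap_fraction)
    return [
        f"set mapreduce.map.memory.mb={container_mb};",
        f"set mapreduce.map.java.opts=-Xmx{heap_mb}m;",
        f"set mapreduce.reduce.memory.mb={container_mb};",
        f"set mapreduce.reduce.java.opts=-Xmx{heap_mb}m;",
    ]


# Example: double the 2 GB container from the failing job.
for statement in hive_memory_settings(4096):
    print(statement)
```

Setting the java.opts properties replaces whatever value the cluster had configured, so any JVM flags the platform relies on would need to be carried over by hand. The requested size also has to stay within the scheduler's maximum container allocation.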

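3. Moderately increase yarn.nodemanager.vmem-pmem-ratio

This gives each unit of physical memory a larger virtual memory allowance, but the value should not be set unreasonably high.

4. Switch to a Spark SQL job (works great, strongly recommended)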

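Summary

Task memory problems fall into two parts: physical memory and virtual memory. Exceeding either limit makes the task fail, and adjusting the corresponding parameters appropriately lets the task run again. If a task's memory usage is wildly out of proportion, the first things to consider are whether the program leaks memory and whether the data is skewed, and such problems should be fixed in the program first. The ultimate fix is to split the data evenly across multiple tasks and process it that way.

Or just choose Spark, which really flies!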
