Posted: Sat May 09, 2020 12:18 Post subject: Memory Leak leading to a crash
Hello,
I am running build 43078 on a TP-Link WR1043ND V2.
Apparently, every night when I turn off my PC (Windows 10), which is connected by cable to the first LAN port of the TP-Link, the router's free memory starts dropping rapidly. The device has 64 MB of RAM and over 30 MB is free during normal operation, so it takes a couple of hours until free memory reaches 0 and the router reboots.
This morning I monitored memory and uptime from my phone, which is connected via WiFi. Uptime was 3 hours (so the router had already crashed once during the night) and free memory was down to 8 MB. I kept monitoring while I turned on my PC, and as soon as Windows 10 reached the login screen (past the BIOS and the Windows loading screen), free memory instantly went back to 30 MB.
I am running the router as a WDS AP with an adblock script and the default DHCP. The router connected to it is also a TP-Link, a WR740N on the same build, running as a WDS Station with a VAP.
I have also tried default settings; the problem persisted but was less pronounced (the router lasted longer before rebooting).
I have experienced this issue on earlier builds as well, but it is not present on builds prior to 37xxx.
Basically, memory gets eaten up while the device connected via LAN is off, and it is freed once the device turns on.
I don't know how to approach troubleshooting this issue and any help is appreciated.
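As a rough sanity check on the numbers above, here is a minimal Python sketch that estimates the leak rate and the time left before free memory runs out. It assumes the router starts at about 30 MB free after a reboot and that memory drains at a constant rate; both are simplifying assumptions, not measurements.

```python
def leak_rate_mb_per_hour(start_free_mb, end_free_mb, hours):
    """Average rate at which free memory shrinks, in MB per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (start_free_mb - end_free_mb) / hours


def hours_until_exhausted(free_mb, rate_mb_per_hour):
    """Hours left until free memory reaches 0 at a constant leak rate."""
    if rate_mb_per_hour <= 0:
        return float("inf")  # not leaking, or memory is recovering
    return free_mb / rate_mb_per_hour


# Figures from the post: about 30 MB free after boot, 8 MB free at 3 hours uptime.
rate = leak_rate_mb_per_hour(30, 8, 3)
remaining = hours_until_exhausted(8, rate)
print(f"leak rate: {rate:.1f} MB/h, time left: {remaining:.1f} h")
```

With those figures the drain works out to roughly 7 MB per hour, which would leave about an hour before the next reboot.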
Joined: 08 May 2018 Posts: 14125 Location: Texas, USA
Posted: Sun May 10, 2020 2:14 Post subject:
Hard to diagnose without a connected client running top or some other tool to see what is eating up RAM, since there are no logs or other telemetry provided. So, this happens with no wireless devices connected and no clients connected at all?
EDIT: The odd thing is I know of several memory leaks that were fixed between the 3**** builds and present builds, so this makes zero logical sense to me.
EDIT2: Of course, now I am wondering if the missing reset I referenced in this ticket has something to do with it:
I thought it could be WOL-related, but playing around with WOL on the PC (the one that frees up memory when turned on) didn't show any change.
I have 8 devices connected: 2 on LAN (the PC and a VoIP device) and the rest on WiFi, and they are all always connected. The reboot happens when there is almost no activity (only if the PC is off). Once the PC turns on and loads Windows 10, the memory gets freed straight away (reconnecting other devices or connecting another Windows 10 machine via WiFi does not do the trick, only the PC connected via LAN).
I posted about this issue during the 38xxx builds; back then, loading the web UI from the same PC seemed to release the memory, whereas this time only turning on the PC does the trick.
If I catch something using top on ssh I will post back for sure.
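For anyone who wants to collect the kind of telemetry asked for above, here is a small Python sketch, meant to run on a PC or phone rather than on the router. It parses text in the standard Linux /proc/meminfo format (for example, pasted from a telnet or ssh session) and turns it into one CSV row per sample. The sample values and the helper names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def log_line(timestamp, info):
    """One CSV row: timestamp, MemFree, Buffers, Cached (all in kB)."""
    cols = [info.get(k, 0) for k in ("MemFree", "Buffers", "Cached")]
    return ",".join([timestamp] + [str(c) for c in cols])


# Made-up sample for illustration only, not output from the router in this thread.
SAMPLE = """MemTotal:          60000 kB
MemFree:            8000 kB
Buffers:            2000 kB
Cached:             4000 kB
"""

print(log_line("sample-1", parse_meminfo(SAMPLE)))
```

Collecting a row every few minutes while the PC is off, and again after it is turned on, would show whether MemFree really drops steadily and whether Buffers or Cached move with it.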
Joined: 08 May 2018 Posts: 14125 Location: Texas, USA
Posted: Sun May 10, 2020 10:16 Post subject:
Oddly enough, this was reported in the build thread for 43078:
Alozaros wrote:
Router Model TP-Link WR1043ND V2
Firmware Version DD-WRT v3.0-r43078 std (05/07/20)
Kernel Version Linux 3.18.140-d4 #77616 Thu May 7 06:29:07 +04 2020 mips
update: CLI
reset: NO
status: Operational 24h+
errors: Nothing new
all working as it should, no CPU overload, no memleak
what i run on it.... can see my sig....
Joined: 16 Nov 2015 Posts: 6410 Location: UK, London, just across the river..
Posted: Sun May 10, 2020 10:29 Post subject:
yep, i do have things plugged into the LAN ports, 3 out of 4
i also monitored mem and cpu, 'cause i read that complaint too...
more likely related to WDS or a misconfig...
p.s. i ran top for a day to see if a mem leak occurs..
mem never goes lower than 9k...
in general it stays around 10-15k, and i run quite a setup on this unit... also have another router on port 1, a PC on port 2, and a linux laptop on port 3... PC and laptop constantly on....
1043v2 VPN can do max 15-20MBit, but no streaming/gaming and torrents on that network...just browsing and work related...
I personally, like port isolation on switch side, so i don't need clients to see each other, but if i need that, there is another way to achieve it...
_________________
Atheros
TP-Link WR740Nv1 ---DD-WRT 55179 WAP
TP-Link WR1043NDv2 -DD-WRT 55303 Gateway/DoT,Forced DNS,Ad-Block,Firewall,x4VLAN,VPN
TP-Link WR1043NDv2 -Gargoyle OS 1.15.x AP,DNS,QoS,Quotas
Qualcomm-Atheros
Netgear XR500 --DD-WRT 55460 Gateway/DoH,Forced DNS,AP Isolation,4VLAN,Ad-Block,Firewall,Vanilla
Netgear R7800 --DD-WRT 55460 Gateway/DoT,AD-Block,Forced DNS,AP&Net Isolation,x3VLAN,Firewall,Vanilla
Netgear R9000 --DD-WRT 55363 Gateway/DoT,AD-Block,AP Isolation,Firewall,Forced DNS,x2VLAN,Vanilla
Broadcom
Netgear R7000 --DD-WRT 55460 Gateway/SmartDNS/DoH,AD-Block,Firewall,Forced DNS,x3VLAN,VPN
NOT USING 5Ghz ANYWHERE
------------------------------------------------------
Stubby DNS over TLS I DNSCrypt v2 by mac913
Last edited by Alozaros on Sun May 10, 2020 12:37; edited 2 times in total
Joined: 08 May 2018 Posts: 14125 Location: Texas, USA
Posted: Sun May 10, 2020 11:03 Post subject:
If you look at the driver file comparisons I did between the 3.2, 3.5, 3.10, and 3.18 kernels, there is a missing reset in the code that I am not sure if it is 1) of consequence and 2) relates to this. I would like to rule out the possibility of a configuration issue, but I would also like to see the missing code added and tested. Or at least an answer as to whether or not I'm looking at things correctly. I am probably going to be doing some comparison to some upstream code and otherwise for giggles soon enough.
The same thing happens on default settings (without WDS). I've taken a look at top over telnet: httpd uses the most memory, but the processes together don't add up to the "used" amount it reports. They sum to about 30 MB, while "used" reports a lot more, 48 MB in this monitoring session. Cached wasn't high enough to make up the difference, only 4 MB.
Once I turn on the PC on the first LAN port, used memory drops to 30 MB with all the processes remaining almost identical, so this time "used" roughly equals the sum.
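To make the gap explicit, here is the arithmetic as a tiny Python sketch. It assumes that the "used" figure from top includes cached memory, so the unexplained part is what remains after subtracting the per-process total and the cache. One possible reading (an assumption, not something confirmed in this thread) is that the remainder is held by the kernel itself, which top does not attribute to any process.

```python
def unaccounted_mb(used_mb, process_mb, cached_mb=0):
    """Memory reported as used that neither processes nor cache explain."""
    return used_mb - process_mb - cached_mb


# PC off: 48 MB used, processes sum to about 30 MB, 4 MB cached.
pc_off = unaccounted_mb(48, 30, 4)
# PC on: 30 MB used, processes still sum to about 30 MB (cached not reported).
pc_on = unaccounted_mb(30, 30)

print(f"unaccounted with PC off: {pc_off} MB")
print(f"unaccounted with PC on:  {pc_on} MB")
```

So roughly 14 MB is not attributable to any process while the PC is off, and that gap closes once the PC is on.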
It would be too time-consuming to test hundreds of builds going back. So far the issue seems to be solved by plugging my PC into LAN port 2 and the VoIP device into port 1.
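If someone does want to narrow down the build, testing every one is not necessary: a binary search needs only about log2(N) flashes for N candidate builds. Here is a minimal Python sketch. The build numbers and the `is_bad` check are hypothetical placeholders; in practice `is_bad` means flashing that build and watching for the leak by hand, and the search only works if every build after the first bad one is also bad.

```python
def first_bad_build(builds, is_bad):
    """Binary search for the first build that shows the problem.

    builds: ascending list; builds[0] is known good, builds[-1] is known bad.
    is_bad: callable returning True if the given build shows the leak.
    Returns (first_bad_build, list_of_builds_tested).
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    tested = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested.append(builds[mid])
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi], tested


# Hypothetical build numbers; pretend the leak first appears in 37650.
builds = list(range(37000, 38050, 50))
culprit, tested = first_bad_build(builds, lambda b: b >= 37650)
print(f"first bad build: {culprit}, builds tested: {tested}")
```

Since each test here means waiting a few hours for the memory to drain, cutting the number of flashes matters more than anything else.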
Joined: 16 Nov 2015 Posts: 6410 Location: UK, London, just across the river..
Posted: Sat May 16, 2020 13:57 Post subject:
it's very likely related to a bad client, i guess......
i don't have any troubles with my v2, all is ok,
but there are mem leaks that are present on svn so....
Alozaros wrote:
its a very related to a bad clent, i guess......
i dont have any trobles with my v2 all its ok,
but there are mem leaks that are present on svn so....
Might be, but it is still strange that it happened when they were connected to different LAN ports.
Do you have a v2 working as a WDS AP? I've noticed that on every startup mine logs that SFE was successfully stopped, even though it is enabled as the WDS linking guides recommend. Also, after a few days WiFi drops and cannot be used until I restart either the WiFi or the whole device.
Posted: Sun Aug 30, 2020 14:41 Post subject: Memory leak -- changes between 39956 and 40009
I'm seeing a very similar issue on my TL-WR740N v4. I believe I've isolated it to changes between 39956 and 40009. 39956 will run stably for days, while 40009 will consistently reboot after 2-4 hours.
At the moment my not-super-informed hunch is that it may have something to do with this SFE change. The majority of other changes seem to be areas that aren't relevant to this device (zfs, k4.4, VHT80, and header file changes for GCC9 compatibility).
Update: still seeing the same reboot behavior with SFE disabled, so my SFE hunch may have been wrong.
It looks like I don't have a working dmesg command at all. I assume it is omitted on 4 MB devices as a space-saving measure, but it will likely be difficult to troubleshoot this further without it.
As a next step, I'd like to see if I can build my own firmware with dmesg enabled, but that will require a side trip into figuring out how to get a working build environment for myself.
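In the meantime, one low-tech signal that does not need dmesg is uptime: if a client polls the router's uptime at intervals, any sample lower than the previous one implies a reboot in between. Here is a minimal Python sketch, assuming /proc/uptime is readable on the device in its standard Linux format (seconds since boot as the first field). The sample values are made up.

```python
def parse_uptime(text):
    """Return seconds since boot from /proc/uptime-style text."""
    return float(text.split()[0])


def count_reboots(uptimes_s):
    """Count reboots in a chronological list of uptime samples (seconds).

    A reboot is inferred whenever a sample is lower than the one before it.
    """
    return sum(1 for prev, cur in zip(uptimes_s, uptimes_s[1:]) if cur < prev)


# Made-up samples taken an hour apart; the drop to 300 s implies one reboot.
samples = [600.0, 4200.0, 7800.0, 300.0, 3900.0]
print(f"reboots detected: {count_reboots(samples)}")
print(f"parsed uptime: {parse_uptime('10800.25 9500.10')} s")
```

Pairing each uptime sample with a free-memory reading would also show how long each build survives and how fast memory drains before it reboots.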