[DSM-devel] dsme code cleanup, comments before merging?

Lars Wirzenius liw at liw.iki.fi
Thu Aug 24 19:28:05 EEST 2006


On Fri, 2006-08-18 at 16:12 +0300, Ismo.Laitinen at nokia.com wrote:
> >> Additional remarks for discussion:
> >> 
> >> * Error handling needs to become robust, no core dumping on broken
> >>   connections (dsmetool), and at the very least, all memory
> >>   allocations should be checked (strdup often isn't, now). Related
> >>   to this, a decision on what to do upon these errors should happen:
> >>   do we try to continue even though there is no memory? Or do we
> >>   simply die, after trying to report the error? How critical is it
> >>   for dsme to never die?
> 
> We need to come back to this. Basically, on the N770 dsme should never
> die, and if it does, the hardware watchdog will reboot the device. Also,
> on the N770 dsme mallocs will practically never fail. If memory is really
> running out, the device gets so slow in the kernel that dsme (or userspace
> in general) doesn't get enough runtime to kick the hardware watchdog.
> But that's another topic :-)
> 
> But yes, there should be a consistent way to deal with errors. On
> rather rare but fatal errors (such as a malloc failure) it's IMHO better
> to die and let the hardware WD reboot the device instead of trying to
> recover, because dsme (and state management in general) should
> be in a known state, and recovering from severe problems might be
> tricky and unreliable. Smaller errors (mostly in the modules)
> should be bypassed, even if that means limited functionality.

Based on this and my own thinking, here's what I propose: let's make
dsme die if a memory allocation fails, and rely on an external agent to
reboot or restart dsme or whatever is the appropriate response. Further,
if it later turns out that this strategy is bad, we'll need to either
come up with a simple recovery strategy for these situations (and that's
going to be difficult, if there's threading and plugins involved), or
better, rewrite dsme to not use dynamic memory allocation, thus avoiding
the problem in the first place.

I doubt the "further" scenario is going to become relevant, however. In
the interest of not doing work that can be avoided, sticking to the
simple approach of using malloc (et al) and aborting on errors is the
best course of action.

Thus, I'll write a dsme_malloc function that checks for a NULL return
from malloc and kills the process if it sees one. I'll also write
dsme_strdup, and maybe other similar functions. Then I'll change all the
code to use these, and see if I can come up with a macro trick to
prevent the use of malloc, realloc, calloc, or strdup by mistake.
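
To make the idea concrete, here is a rough sketch of what such wrappers
might look like. This is only an illustration, not the code that will
go into dsme: the helper dsme_alloc_failed, the error message, and the
dsme_error_* names in the macros are all made up for this example, and
aborting via abort() is just one possible way to "kill the process".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from dsme): report the failure, then die.
 * Plain fputs calls keep the error path as simple as possible. */
static void dsme_alloc_failed(const char *what)
{
    fputs("dsme: out of memory in ", stderr);
    fputs(what, stderr);
    fputs("\n", stderr);
    abort();
}

/* Like malloc, but never returns NULL: the process dies instead. */
void *dsme_malloc(size_t size)
{
    void *p;

    /* malloc(0) may legitimately return NULL, so ask for one byte. */
    if (size == 0)
        size = 1;

    p = malloc(size);
    if (p == NULL)
        dsme_alloc_failed("dsme_malloc");
    return p;
}

/* Like strdup, but built on dsme_malloc, so it also never returns NULL. */
char *dsme_strdup(const char *s)
{
    size_t n = strlen(s) + 1;   /* include the terminating '\0' */
    char *copy = dsme_malloc(n);

    memcpy(copy, s, n);
    return copy;
}

/* One possible macro trick (an assumption, other approaches exist):
 * after the wrappers are defined, turn the raw calls into references
 * to undeclared identifiers, so any accidental use fails to compile
 * with a hint in the error message. Redefining standard library names
 * is formally reserved by the C standard, so this relies on typical
 * compiler behaviour. In a real tree these defines would live in a
 * header that the wrapper's own source file does not include. */
#define malloc(size)       dsme_error_use_dsme_malloc_instead
#define calloc(n, size)    dsme_error_use_a_dsme_wrapper_instead
#define realloc(ptr, size) dsme_error_use_a_dsme_wrapper_instead
#define strdup(s)          dsme_error_use_dsme_strdup_instead
```

Callers would then write dsme_malloc(n) and dsme_strdup(s) without any
NULL checks and release the memory with free() as usual; a stray
malloc(n) after the defines would stop the build.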

If there's no opposition to this, I'll do it tomorrow.

-- 
The days are like barbed wire, they batter me.


